Using Ollama on Katana

Ollama is a lightweight tool for running large language models locally on your machine (no cloud required). It lets you download, run, and interact with models like LLaMA-style chat models through a simple CLI or API.

Ollama is available on Katana as an environment module. You do not need to install it manually.

This guide shows a simple workflow for running Ollama on Katana.

Minimal Working Workflow

qsub -I -l select=1:ncpus=4:mem=32gb:ngpus=1 

module load ollama 

export OLLAMA_MODELS=/srv/scratch/$USER/ollama/models 

mkdir -p $OLLAMA_MODELS 

ollama serve &> ollama.$(date +%s).log & 

ollama pull phi3 

ollama run phi3 

To exit the chat, type /exit

1. Start an Interactive Job

Ollama should run on a compute node, not a login node.

GPU job (recommended)

qsub -I -l select=1:ncpus=4:mem=32gb:ngpus=1

CPU job (for small models)

qsub -I -l select=1:ncpus=4:mem=16gb

After the job starts, you will see something like:

z1234567@k001:~

This means you are now on compute node k001.
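
A script can guard against accidentally running on a login node. The sketch below assumes that PBS exports PBS_JOBID inside every job, which is standard PBS behaviour but worth confirming on Katana. The names in_pbs_job and describe_shell are hypothetical helpers, not Katana commands.

```shell
# Sketch: detect whether the current shell belongs to a PBS job.
# Assumption: PBS sets PBS_JOBID for interactive and batch jobs alike.
in_pbs_job() {
    [ -n "${PBS_JOBID:-}" ]
}

# Hypothetical helper: print a one-line status for the current shell.
describe_shell() {
    if in_pbs_job; then
        echo "inside PBS job ${PBS_JOBID} on $(uname -n)"
    else
        echo "not inside a PBS job; start one with qsub -I first"
    fi
}
```

Calling describe_shell right after the job starts should report the job ID and the node name shown in your prompt.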


2. Load the Ollama Module

module load ollama

Verify the Ollama Installation

After loading the module, check that Ollama is available:

ollama --version

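Because the server has not been started yet, you may see warnings similar to the following. They are expected at this stage and still confirm that the client is installed: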

Warning: could not connect to a running Ollama instance
Warning: client version is 0.17.7
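
The same check can be scripted so that later steps stop early when the module is not loaded. This is a minimal sketch; require_cmd is a hypothetical helper, and its hint assumes the module carries the same name as the command, which holds for ollama in this guide but not in general.

```shell
# Sketch: report whether a command is on PATH in the current shell.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        # Assumption: the module is named after the command.
        echo "$1 not found; try: module load $1" >&2
        return 1
    fi
}
```

For example, require_cmd ollama prints the path of the binary provided by the module, or returns a non-zero status if the module has not been loaded.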


3. Set Model Storage Location (Recommended)

Models are large, so store them in scratch rather than in your home directory. On Katana, home directories are quota-controlled and not intended for very large model files; using scratch avoids filling your home quota. Set OLLAMA_MODELS to a scratch path and create that directory before pulling models:

export OLLAMA_MODELS=/srv/scratch/$USER/ollama/models
mkdir -p $OLLAMA_MODELS
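
The two commands above can be wrapped in a small function that also fails early if the directory is not writable. This is a sketch only; setup_models_dir is a hypothetical name, and the base directory is passed in as an argument rather than assumed.

```shell
# Sketch: create the model directory under a base path and export OLLAMA_MODELS.
setup_models_dir() {
    models_base=$1
    if [ -z "$models_base" ]; then
        echo "usage: setup_models_dir BASE_DIR" >&2
        return 1
    fi
    OLLAMA_MODELS="$models_base/ollama/models"
    mkdir -p "$OLLAMA_MODELS" || return 1
    # Stop early if the directory cannot be written to.
    if [ ! -w "$OLLAMA_MODELS" ]; then
        echo "cannot write to $OLLAMA_MODELS" >&2
        return 1
    fi
    export OLLAMA_MODELS
    echo "models will be stored in $OLLAMA_MODELS"
}
```

With the path used in this guide, the call would be setup_models_dir "/srv/scratch/$USER". Running du -sh "$OLLAMA_MODELS" afterwards shows how much space the downloaded models occupy.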

4. Start the Ollama Service

export OLLAMA_MODELS=/srv/scratch/$USER/ollama/models
module load netsandbox
safe_ollama

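Expected output:

Ollama serve starting, log file at ollama.xxxx

To confirm that the server is responding, list the installed models (the list is empty until you pull one):

ollama list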
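
If the server is started in the background, a command issued immediately afterwards may run before the server accepts connections. The sketch below retries a probe command until it succeeds or a timeout expires. wait_for is a hypothetical helper, and using ollama list as the probe is an assumption about what counts as "ready".

```shell
# Sketch: retry a probe command once per second until it succeeds
# or the timeout (in seconds) is reached.
wait_for() {
    timeout_s=$1
    shift
    elapsed=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout_s" ]; then
            echo "gave up after ${timeout_s}s: $*" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "ready after ${elapsed}s: $*"
}
```

For example, wait_for 30 ollama list waits up to 30 seconds for the server to answer before you pull or run a model.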

5. Recommended Method: Open Another Terminal via SSH

This is the recommended way to run multiple commands while keeping the server running.

Open a new terminal (CMD / PowerShell / Terminal).


Step 1 — SSH to Katana login node

ssh zID@katana.restech.unsw.edu.au

Step 2 — SSH to the same compute node

Use the node name shown earlier.

Example:

z1234567@k001:~

Then run:

ssh k001

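You are now connected to the same compute node as your running job.

This allows you to control Ollama from this terminal while the server keeps running in the first one. If the ollama command is not found in the new shell, run module load ollama again.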


Verify you are on the correct node

hostname

Expected:

k001

Start Using Ollama

Pull a model

ollama pull phi3

The phi3 model is recommended for first-time users because it is small and runs reliably on both CPU and GPU sessions. It is a small language model developed by Microsoft, designed to be fast, efficient, and easy to run on standard hardware.

Run the model

ollama run phi3

After a couple of seconds, the chat prompt appears:

>>> Send a message (/? for help)

You can now chat with the model by typing into the terminal.
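
Besides the interactive chat, ollama run also accepts a prompt as an argument and prints a single reply, which is convenient in scripts. The sketch below wraps that form; ask is a hypothetical helper name, and it assumes the server is running and the model has already been pulled.

```shell
# Sketch: send one prompt to a model and print the reply, without the chat prompt.
# Usage: ask MODEL WORDS...
ask() {
    model=$1
    shift
    # All remaining words are joined into a single prompt string.
    ollama run "$model" "$*"
}
```

For example, ask phi3 Why is the sky blue sends the question to phi3 and returns to the shell once the answer has been printed.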


Useful Commands

Exit current model:

/exit

List models:

ollama list

Show running models:

ollama ps

Stop Ollama:

pkill ollama

Ending the Session

When finished:

pkill ollama
exit

This will:

  • stop Ollama
  • release the compute resources
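
Before leaving the job, it can be worth confirming that the server has actually stopped. The sketch below is one way to do that; stop_ollama is a hypothetical helper, and the one-second pause is an arbitrary assumption about how long shutdown takes.

```shell
# Sketch: stop your own ollama processes and check that none are left.
stop_ollama() {
    me=$(id -un)
    pkill -u "$me" ollama
    # Assumption: one second is enough for the processes to exit.
    sleep 1
    if pgrep -u "$me" ollama >/dev/null; then
        echo "ollama is still running" >&2
        return 1
    fi
    echo "ollama stopped"
}
```

If stop_ollama reports success, typing exit ends the interactive job and releases the compute resources.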

GPU vs CPU Recommendations

CPU-friendly models

phi3
gemma:2b
tinyllama

Recommended GPU models

llama3
mistral
gemma:7b

Large models (require high VRAM)

llama3:70b
mixtral
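
As a rough illustration of the lists above, a script can pick a starting model from the number of GPUs in the job. This is a sketch under stated assumptions: suggest_model is a hypothetical helper, and choosing llama3 for GPU jobs and phi3 for CPU jobs is just one pick from each list, not a Katana recommendation.

```shell
# Sketch: suggest a starting model based on the number of GPUs available.
# Usage: suggest_model NGPUS
suggest_model() {
    ngpus=${1:-0}
    if [ "$ngpus" -gt 0 ]; then
        echo "llama3"    # from the recommended GPU models
    else
        echo "phi3"      # from the CPU-friendly models
    fi
}
```

On a node with NVIDIA GPUs, the count can be obtained with nvidia-smi -L | wc -l, for example ollama pull "$(suggest_model "$(nvidia-smi -L | wc -l)")". On a CPU-only job, suggest_model 0 prints phi3.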