Using Ollama on Katana
Ollama is a lightweight tool for running large language models locally, without relying on a cloud service. It lets you download, run, and interact with models such as Llama-family chat models through a simple CLI or an HTTP API.
Ollama is available on Katana as an environment module. You do not need to install it manually.
This guide shows a simple workflow for running Ollama on Katana.
Minimal Working Workflow
qsub -I -l select=1:ncpus=4:mem=32gb:ngpus=1
module load ollama
export OLLAMA_MODELS=/srv/scratch/$USER/ollama/models
mkdir -p "$OLLAMA_MODELS"
ollama serve &> ollama.$(date +%s).log &
ollama pull phi3
ollama run phi3
To exit, type /exit.
1. Start an Interactive Job
Ollama should run on a compute node, not a login node.
GPU job (recommended)
qsub -I -l select=1:ncpus=4:mem=32gb:ngpus=1
CPU job (for small models)
qsub -I -l select=1:ncpus=4:mem=16gb
After the job starts, you will see something like:
z1234567@k001:~
This means you are now on compute node k001.
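If you are unsure whether a shell is running inside a job, one quick check is the PBS_JOBID variable, which PBS sets inside jobs. The following is a minimal sketch; the function name in_pbs_job is made up for illustration and is not a Katana command.

```shell
# Hypothetical helper: succeeds when the shell appears to be inside a PBS job.
in_pbs_job() {
  [ -n "${PBS_JOBID:-}" ]
}

if in_pbs_job; then
  echo "Inside PBS job ${PBS_JOBID} on $(hostname)"
else
  echo "No PBS job detected; this may be a login node"
fi
```

If the second message appears, start an interactive job before loading Ollama.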
2. Load the Ollama Module
module load ollama
Verify the Ollama Installation
After loading the module, check that Ollama is available:
module load ollama
ollama --version
You may see warnings similar to the following. They are expected at this stage because the Ollama server has not been started yet:
Warning: could not connect to a running Ollama instance
Warning: client version is 0.17.7
3. Set Model Storage Location (Recommended)
Models are large, so store them in scratch instead of home. On Katana, home directories are quota-controlled and not intended for storing very large model files; using scratch keeps downloads fast and avoids filling your home directory. Set OLLAMA_MODELS to a scratch path and create that directory before pulling models:
export OLLAMA_MODELS=/srv/scratch/$USER/ollama/models
mkdir -p "$OLLAMA_MODELS"
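The two commands above can be wrapped in a small function that also reports how much space the models currently use. This is only a sketch: prepare_models_dir is a hypothetical name, and the usage line reuses the scratch path from this guide.

```shell
# Hypothetical helper: create the model directory, export OLLAMA_MODELS,
# and print the directory's current disk usage.
prepare_models_dir() {
  dir="$1"
  mkdir -p "$dir" || return 1
  # The server reads OLLAMA_MODELS from its environment, so export it
  # before starting the Ollama service.
  export OLLAMA_MODELS="$dir"
  du -sh "$dir"
}

# Usage on Katana (path taken from this guide):
#   prepare_models_dir "/srv/scratch/$USER/ollama/models"
```

Running it again later is a quick way to see how much scratch space the downloaded models occupy.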
4. Start the Ollama Service
export OLLAMA_MODELS=/srv/scratch/$USER/ollama/models
module load netsandbox
safe_ollama
Expected output:
Ollama serve starting, log file at ollama.xxxx
Once the server is running, confirm that it responds:
ollama list
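The server can take a moment to start accepting connections, so a command issued immediately afterwards may fail. A minimal sketch of a retry loop follows. It assumes that ollama list exits with a non-zero status while the server is unreachable; wait_for_server is a hypothetical helper name.

```shell
# Hypothetical helper: retry a probe command once per second until it
# succeeds or the given number of attempts is used up.
wait_for_server() {
  max_tries="$1"
  shift
  i=0
  while [ "$i" -lt "$max_tries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage on the compute node, after starting the service:
#   wait_for_server 30 ollama list && echo "Ollama is ready"
```

If the loop gives up, the log file reported when the service started is the first place to look.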
5. Recommended Method: Open Another Terminal via SSH
This is the recommended way to run multiple commands while keeping the server running.
Open a new terminal on your local machine (CMD, PowerShell, or Terminal).
Step 1 — SSH to Katana login node
ssh zID@katana.restech.unsw.edu.au
Step 2 — SSH to the same compute node
Use the node name shown in the prompt when your interactive job started.
For example, if the prompt was:
z1234567@k001:~
Then run:
ssh k001
You are now on the same compute node as your running job.
From here you can issue Ollama commands while the server keeps running in the first terminal.
Verify you are on the correct node
hostname
Expected:
k001
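The hostname check can also be scripted. The sketch below assumes the new SSH shell does not inherit the modules loaded in the first terminal, so the usage lines load the module again; on_expected_node is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds when the current (or given) hostname matches
# the expected node, comparing short names only.
on_expected_node() {
  expected="$1"
  actual="${2:-$(hostname)}"
  [ "${actual%%.*}" = "${expected%%.*}" ]
}

# Usage in the second terminal (k001 is the example node from this guide):
#   on_expected_node k001 || echo "Wrong node: $(hostname)"
#   module load ollama
```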
Start Using Ollama
Pull a model
ollama pull phi3
phi3 is a small model designed to be fast, efficient, and easy to run on standard hardware.
Run the model
ollama run phi3
>>> Send a message (/? for help)
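For scripted use, a prompt can also be passed on the command line instead of being typed at the >>> prompt, in which case ollama run prints the reply and returns. The wrapper below is a sketch; ask_once is a hypothetical name, and it assumes the model has already been pulled and the server is running.

```shell
# Hypothetical helper: send a single prompt to a model and print the reply.
ask_once() {
  model="$1"
  prompt="$2"
  if ! command -v ollama >/dev/null 2>&1; then
    echo "ollama not found; run 'module load ollama' first" >&2
    return 127
  fi
  ollama run "$model" "$prompt"
}

# Usage:
#   ask_once phi3 "Explain what a compute node is in one sentence."
```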
Useful Commands
Exit current model:
/exit
List models:
ollama list
Show running models:
ollama ps
Stop Ollama:
pkill ollama
Ending the Session
When finished:
pkill ollama
exit
This will:
- stop Ollama
- release the compute resources
GPU vs CPU Recommendations
CPU-friendly models
phi3
gemma:2b
tinyllama
Recommended GPU models
llama3
mistral
gemma:7b
Large models (require high VRAM)
llama3:70b
mixtral
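As a rough illustration of these recommendations, a script can choose a default model depending on whether a GPU is visible in the job. The sketch assumes NVIDIA GPUs that are reported by nvidia-smi; pick_model is a hypothetical helper, and the two model names are taken from the lists above.

```shell
# Hypothetical helper: print a GPU model name when nvidia-smi can list at
# least one device, otherwise print a CPU-friendly model name.
pick_model() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
    echo "llama3"
  else
    echo "phi3"
  fi
}

# Usage:
#   ollama pull "$(pick_model)"
```

Adjust the two names to suit the memory you requested for the job.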