Run Ollama in Google Colab

Harsh Prajapati
2 min read · Jul 17, 2024


Get up and running with large language models. (Courtesy of https://ollama.com/)

Please refer to my previous article to learn more about running Ollama locally: https://machinelearningengineer.medium.com/running-llm-locally-457a4e745433.

I ran Ollama on my local machine and got the answer (using Llama 3 8B), but it took almost 3 minutes and 20 seconds. The same prompt in Google Colab finished in 24 seconds.

API response on the local machine
API response in Google Colab

Fine-tuning the model on my local machine could take a month or more with 50k training samples. To cut that time down, a powerful GPU is needed.

Question in mind: why do we need to fine-tune a model served through Ollama when many powerful LLMs already answer well in zero-shot settings? Consider a scenario where we work on financial documents, or any other data where privacy is a must.

If you don’t have the resources and need to work with Ollama on the free tier of Google Colab with a T4 GPU, follow the steps below.

This article presents the setup of Ollama in Google Colab.

1. Install Ollama:
! curl -fsSL https://ollama.com/install.sh | sh
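
To confirm the install succeeded, you can print the installed version (a quick sanity check of my own, not part of the original steps):

! ollama --version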

2. Start the server:
! nohup ollama serve &
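
The server needs a moment to start listening on port 11434, so pulling a model immediately can fail with a connection error. A short wait plus a health check helps; this is my addition, and the /api/tags endpoint simply lists the locally installed models once the server is up:

! sleep 5 && curl -s http://localhost:11434/api/tags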

3. Pull the LLM model:
! ollama pull llama3
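
Optionally, confirm the model finished downloading (again my addition, not in the original steps):

! ollama list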

4. Use the API:

! curl http://localhost:11434/api/generate -d '{"model": "llama3", "stream": false, "prompt": "Write a vector addition code in CUDA C"}'
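
Because "stream" is set to false, the API returns a single JSON object whose response field holds the generated text. A minimal sketch for extracting just that field, assuming the python3 that ships with the Colab runtime:

! curl -s http://localhost:11434/api/generate -d '{"model": "llama3", "stream": false, "prompt": "Write a vector addition code in CUDA C"}' | python3 -c "import sys, json; print(json.load(sys.stdin)['response'])"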

