Run Ollama in Google Colab

Harsh Prajapati
2 min read · Jul 17, 2024


Get up and running with large language models. (Courtesy of https://ollama.com/)

Please refer to my previous article to learn more about running Ollama locally: https://machinelearningengineer.medium.com/running-llm-locally-457a4e745433.

I ran Ollama on my local machine and got the answer (using Llama 3 8B), but it took almost 3 minutes and 20 seconds. The same prompt in Google Colab finished in 24 seconds.

API response on the local machine
API response in Google Colab

Fine-tuning the model on my local machine could take a month or more with 50k training samples. To cut that time down, a powerful GPU is needed.

Question in mind: why do we need to fine-tune a model served through Ollama when many powerful LLMs already answer well in zero-shot settings? Consider a scenario where we work on financial documents, or any other data where privacy is a must.

If you don’t have the resources and need to work with Ollama on the free tier of Google Colab with a T4 GPU, follow the steps below.

This article presents the setup of Ollama in Google Colab.

1. Install Ollama:
! curl -fsSL https://ollama.com/install.sh | sh
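
To confirm the install succeeded, you can print the installed version (a quick sanity check of my own, not part of the original steps):

! ollama --version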

2. Start the server:
! nohup ollama serve &
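
The server needs a moment to start listening on port 11434, so pulling a model immediately can fail with a connection error. A short wait plus a health check helps; this is my addition, and the /api/tags endpoint simply lists the locally installed models once the server is up:

! sleep 5 && curl -s http://localhost:11434/api/tags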

3. Pull the LLM model:
! ollama pull llama3
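
Optionally, confirm the model finished downloading (again my addition, not in the original steps):

! ollama list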

4. Use the API:

! curl http://localhost:11434/api/generate -d '{"model": "llama3", "stream": false, "prompt": "Write a vector addition code in CUDA C"}'
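
Because "stream" is set to false, the API returns a single JSON object whose response field holds the generated text. A minimal sketch for extracting just that field, assuming the python3 that ships with the Colab runtime:

! curl -s http://localhost:11434/api/generate -d '{"model": "llama3", "stream": false, "prompt": "Write a vector addition code in CUDA C"}' | python3 -c "import sys, json; print(json.load(sys.stdin)['response'])"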

