Running LLMs Locally

Harsh Prajapati
3 min read · Jul 6, 2024


The introduction of the transformer architecture sparked a significant shift in natural language processing, and many major language models now use it as their primary architecture. As we all know, large language models (LLMs) make our lives easier.

Think about the size of an LLM. GPT-3 has 175 billion parameters☺️, each stored in 16 bits (2 bytes), so the weights alone need roughly 175 billion × 2 bytes ≈ 350 GB🙄 of storage (courtesy of https://en.wikipedia.org/wiki/GPT-3). There are many other models whose parameter counts exceed 1 trillion, which makes them even more computationally expensive. We can’t even think of running them locally.

However, this article is all about the Ollama framework, which allows us to run large language models locally.

Ollama is an open-source platform that lets us run large language models like Llama 3, Mistral, and many others. It is also versatile, supporting everything from customizing a model for our application to deploying it in production with little effort.

Ollama Landing Page

Here, we focus on running the Llama 3 model on our local system; the same process works for other models.

Ollama is compatible with Windows, Linux, and macOS. To download it, use the links below (a Linux install one-liner is shown after them).

Mac: https://ollama.com/download/mac

Linux: https://ollama.com/download/linux

Windows: https://ollama.com/download/windows
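
On Linux, the download page also offers a one-line install script. A minimal sketch (this assumes you are comfortable piping the official install script to your shell):

curl -fsSL https://ollama.com/install.sh | sh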

Let’s try Llama 3:

Refer to the model page: https://ollama.com/library/llama3:8b. It has all the information about the model size, variants, benchmarks, and API.

First, enter “ollama --version” in your terminal or command prompt to make sure Ollama is installed correctly.
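
For example (the exact version string will differ depending on your install):

ollama --version
# prints the installed version, something like: ollama version is 0.1.x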

Download Llama 3:

Use the “ollama pull llama3” command to download the model tuned for the chat/dialogue use case.

Other models are listed here: https://ollama.com/library. Use the same command to pull other models for your use cases (see the example below).

Ollama Pull command to download the model on the local machine
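
For example, pulling the default llama3 tag or a specific variant:

ollama pull llama3        # default tag
ollama pull llama3:8b     # a specific variant (8B parameters)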

Run Llama 3:

You can run the model directly using the “ollama run llama3” command.

Run Command
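
A quick sketch (the prompt here is only an example; omitting it drops you into an interactive chat):

ollama run llama3                           # interactive chat session
ollama run llama3 "Why is the sky blue?"    # one-off prompt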

Use the API:

If you want to use the API for the same model, follow a few straightforward steps.

  • Use “ollama serve” to start the server locally.
Command to start the Ollama server
  • Now, you can use “curl” or any code to send the request; a sample curl request is shown after this list. Here, I used the Postman tool to send a request to the llama3 model.
Request to Llama3 model
  • There are other parameters you can pass with this request. Refer to the link below to explore the API in more detail.
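
A minimal sketch of the same request with curl, assuming Ollama is serving on its default port 11434 (the prompt is just an example; setting stream to false returns a single JSON response instead of a token stream):

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'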

API documentation: https://github.com/ollama/ollama/blob/main/docs/api.md

Useful links:
