Running LLMs Locally

Harsh Prajapati
3 min read · Jul 6, 2024


The introduction of the transformer architecture sparked a significant shift in natural language processing, and many major language models now use it as their primary architecture. As we all know, large language models (LLMs) make our lives easier.

Think about the size of an LLM. GPT-3 has 175 billion parameters☺️, each stored in 16 bits (2 bytes), so the weights alone need roughly 175 billion × 2 bytes ≈ 350 GB🙄 of storage (courtesy of https://en.wikipedia.org/wiki/GPT-3). There are many other models whose parameter counts exceed 1 trillion, which makes them even more computationally expensive. We can’t even think of running them locally.

However, this article is all about the Ollama framework, which allows us to run large language models locally.

Ollama is an open-source platform that lets us run large language models like Llama 3, Mistral, and many others. It is also versatile, supporting everything from customizing a model for our application to deploying it in production with little effort.

Ollama Landing Page

Here, we focus on running the Llama 3 model on our local system; the same process works for other models.

Ollama is compatible with Windows, Linux, and macOS. To download it, use the links below (a Linux install one-liner is shown after them).

Mac: https://ollama.com/download/mac

Linux: https://ollama.com/download/linux

Windows: https://ollama.com/download/windows
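
On Linux, the download page also offers a one-line install script. A minimal sketch (this assumes you are comfortable piping the official install script to your shell):

curl -fsSL https://ollama.com/install.sh | sh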

Let’s try Llama 3:

Refer to the model page: https://ollama.com/library/llama3:8b. It has all the information about the model size, variants, benchmarks, and API.

First, enter “ollama --version” in your terminal or command prompt to make sure Ollama is installed correctly.
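
For example (the exact version string will differ depending on your install):

ollama --version
# prints the installed version, something like: ollama version is 0.1.x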

Download Llama 3:

Use the “ollama pull llama3” command to download the model tuned for the chat/dialogue use case.

Other models are listed here: https://ollama.com/library. Use the same command to pull other models for your use cases (see the example below).

Ollama Pull command to download the model on the local machine
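
For example, pulling the default llama3 tag or a specific variant:

ollama pull llama3        # default tag
ollama pull llama3:8b     # a specific variant (8B parameters)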

Run Llama 3:

You can run the model directly using the “ollama run llama3” command.

Run Command
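
A quick sketch (the prompt here is only an example; omitting it drops you into an interactive chat):

ollama run llama3                           # interactive chat session
ollama run llama3 "Why is the sky blue?"    # one-off prompt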

Use the API:

If you want to use the API for the same model, follow a few straightforward steps.

  • Use “ollama serve” to start the server locally.
Command to start the Ollama server
  • Now, you can use “curl” or any code to send the request; a sample curl request is shown after this list. Here, I used the Postman tool to send a request to the llama3 model.
Request to Llama3 model
  • There are other parameters you can pass with this request. Refer to the link below to explore the API in more detail.
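
A minimal sketch of the same request with curl, assuming Ollama is serving on its default port 11434 (the prompt is just an example; setting stream to false returns a single JSON response instead of a token stream):

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'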

API documentation: https://github.com/ollama/ollama/blob/main/docs/api.md

Useful links:
