Run Llama LLMs on your laptop with Hugging Face and Python

There are numerous ways to run large language models such as DeepSeek or Meta's Llama locally on your laptop, including Ollama and Modular's MAX platform. But if you want full control over the large language model experience, the best approach is to integrate the Hugging Face APIs directly into your Python code.

How to run Llama in a Python app

To run any large language model (LLM) locally within a Python app, follow these steps:

  1. Create a Python environment with the PyTorch, Hugging Face Hub and Transformers dependencies.
  2. Find the official webpage of the LLM on Hugging Face.
  3. Programmatically download all the required files from the Hugging Face repo.
  4. Create a pipeline that references the model and the tokenizer it uses.
  5. Query the pipeline and interact with the LLM.

Transformers, PyTorch and Hugging Face

This tutorial creates a virtual Python environment and installs the required PyTorch and Hugging Face dependencies with the following pip installs:

pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu

pip3 install huggingface_hub

pip3 install transformers
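
Before going any further, it's worth confirming that the installs succeeded. The following optional sanity check is just a quick sketch that prints the installed versions; the exact version numbers on your machine will differ.

import torch
import transformers
import huggingface_hub

# Print the installed versions to confirm the pip installs succeeded
print("PyTorch:", torch.__version__)
print("Transformers:", transformers.__version__)
print("Hugging Face Hub:", huggingface_hub.__version__)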

Find the LLM on Hugging Face

Screenshot: the TinyLlama model page on Hugging Face.
The files Python requires to run your LLM locally can be found on the model's Hugging Face homepage.

The Hugging Face Python API needs to know the repo ID of the LLM to run, and you must specify the names of the various files to download. You can obtain both on the official webpage of the LLM on the Hugging Face site.
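
If you'd rather gather the file names programmatically than copy them from the webpage, the huggingface_hub library provides a list_repo_files helper. The snippet below is a minimal sketch that assumes the TinyLlama repo ID used later in this tutorial; public repos need no token.

from huggingface_hub import list_repo_files

# List every file hosted in the model's Hugging Face repo
files = list_repo_files("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
for f in files:
    print(f)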

Hugging Face, Python and Llama

With the dependencies installed and the required files identified, the last step is simply to write the Python program. The full code for the application is as follows:

from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

HUGGING_FACE_API_KEY = 'hf_ABCDEFGHIJKLYMNOPQURSTUVDSESYANELA'  # Replace with your own Hugging Face access token

huggingface_model = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # The repo ID of the model; swap in another model's repo ID if you prefer

required_files = [
    "special_tokens_map.json",
    "generation_config.json",
    "tokenizer_config.json",
    "model.safetensors",
    "eval_results.json",
    "tokenizer.model",
    "tokenizer.json",
    "config.json"
]

for filename in required_files:
    download_location = hf_hub_download(
        repo_id=huggingface_model,
        filename=filename,
        token=HUGGING_FACE_API_KEY
    )

model = AutoModelForCausalLM.from_pretrained(huggingface_model)  # Loads the model weights from the local Hugging Face cache
tokenizer = AutoTokenizer.from_pretrained(huggingface_model)

text_generation_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=1000
)

response = text_generation_pipeline("Tell me a good programming joke...")
print(response)

TinyLlama prompts and replies in Python

In this example, we simply ask TinyLlama for a good programming joke. Admittedly, the JSON response isn't particularly impressive, but this small LLM isn't exactly known for its clever sense of humor. A factual query might have garnered a better response.

[{ 'generated_text':'Tell me a funny programming joke…\n\n A programmer accidentally deleted a file from his hard drive. He tried to recover it but it was gone.'}]
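
Note that the pipeline returns a list of dictionaries, one per generated sequence, so pulling out just the text is a one-liner. For instance, continuing from the response variable in the code above:

# Print only the generated text from the first (and only) result
print(response[0]['generated_text'])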

And that's how easy it is to run LLMs locally and fully integrate them with your Python code.

Cameron McKenzie has been a Java EE software engineer for 20 years. His current specialties include Agile development; DevOps; Spring; and container-based technologies such as Docker, Swarm and Kubernetes.
