One-Line Summary: Create custom models using Ollama's Modelfile system — set system prompts, adjust parameters, and build specialized models for different use cases.

Prerequisites: Ollama running with Llama 3.1 pulled (Steps 2-3)


What Is a Modelfile

A Modelfile is Ollama's configuration format for creating custom models. Think of it like a Dockerfile but for LLMs. You start from a base model and customize:

  • System prompt: persistent instructions that define the model's behavior
  • Temperature: how creative or deterministic the responses are
  • Context window: the maximum number of tokens (prompt, conversation history, and response) the model can work with at once
  • Stop sequences: strings that end generation when the model produces them

Your First Custom Model

Create a file called Modelfile.coder in your project directory:

# Modelfile.coder — A coding assistant based on Llama 3.1
FROM llama3.1:8b
 
# System prompt that shapes every response
SYSTEM """You are a senior software engineer. You write clean, well-documented code.
 
Rules:
- Always include type hints in Python code
- Explain your reasoning before writing code
- Point out potential edge cases
- Suggest tests for the code you write
- Keep responses concise — no unnecessary preamble"""
 
# Lower temperature for more deterministic code output
PARAMETER temperature 0.3
 
# Increase context window for longer code discussions
PARAMETER num_ctx 8192

Build and run it:

# Create the custom model
ollama create coder -f Modelfile.coder
 
# Test it
ollama run coder "Write a Python function to find the longest palindrome substring"

The model now has your system prompt baked in. Every conversation starts with those instructions, and the lower temperature produces more consistent code output.
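If you edit the Modelfile often, it can help to script the rebuild. The following is a minimal sketch that wraps the same two CLI commands with Python's subprocess module. The helper names (create_command, run_command, rebuild_and_test) are hypothetical, and the sketch assumes the ollama binary is on your PATH.

```python
import shutil
import subprocess
from typing import List


def create_command(name: str, modelfile: str) -> List[str]:
    """Argument list equivalent to `ollama create <name> -f <modelfile>`."""
    return ["ollama", "create", name, "-f", modelfile]


def run_command(name: str, prompt: str) -> List[str]:
    """Argument list equivalent to `ollama run <name> "<prompt>"`."""
    return ["ollama", "run", name, prompt]


def rebuild_and_test(name: str, modelfile: str, prompt: str) -> str:
    """Rebuild the model, then return its answer to a single prompt."""
    if shutil.which("ollama") is None:
        raise RuntimeError("ollama CLI not found on PATH")
    # check=True raises CalledProcessError if either command fails.
    subprocess.run(create_command(name, modelfile), check=True)
    result = subprocess.run(
        run_command(name, prompt), check=True, capture_output=True, text=True
    )
    return result.stdout
```

A call such as rebuild_and_test("coder", "Modelfile.coder", "Write a binary search in Python") would rebuild the model and return its reply as a string.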

A Research Assistant

# Modelfile.researcher — A focused research assistant
FROM llama3.1:8b
 
SYSTEM """You are a research analyst. Your job is to analyze information thoroughly and present findings clearly.
 
Rules:
- Structure your responses with clear headings
- Cite specific data points when available
- Distinguish between facts and opinions
- Present multiple perspectives on controversial topics
- End with a brief summary of key takeaways"""
 
PARAMETER temperature 0.5
PARAMETER num_ctx 4096

Build and run it:

ollama create researcher -f Modelfile.researcher
ollama run researcher "Analyze the pros and cons of microservices vs monoliths"

A Creative Writer

# Modelfile.writer — A creative writing assistant
FROM llama3.1:8b
 
SYSTEM """You are a creative writing partner. You help with stories, dialogue, and prose.
 
Style:
- Use vivid, specific language
- Show, don't tell
- Vary sentence length for rhythm
- Avoid clichés and purple prose"""
 
# Higher temperature for more creative output
PARAMETER temperature 0.8
PARAMETER num_ctx 4096
 
# Stop generating at these tokens to keep responses focused
PARAMETER stop "---"
PARAMETER stop "THE END"

Build and run it:

ollama create writer -f Modelfile.writer
ollama run writer "Write the opening paragraph of a noir detective story set in Tokyo"

Modelfile Reference

Here are the most useful parameters:

Parameter       Default  Description
temperature     0.8      Randomness (0.0 = deterministic, 1.0 = creative)
num_ctx         2048     Context window size in tokens
top_p           0.9      Nucleus sampling threshold
top_k           40       Top-k sampling: consider only the k most likely tokens
repeat_penalty  1.1      Penalty for repeating tokens
stop            -        Stop sequence: generation stops when the model produces it
seed            -        Random seed for reproducible output
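To make the sampling parameters concrete, here is a toy sketch of how temperature, top_k, top_p, and seed can interact when choosing the next token. It is an illustration of the general technique only, not Ollama's actual implementation. The function name sample_token and the example scores are hypothetical, and real samplers may apply these filters in a different order.

```python
import math
import random
from typing import Dict, Optional


def sample_token(
    logits: Dict[str, float],
    temperature: float = 0.8,
    top_k: int = 40,
    top_p: float = 0.9,
    seed: Optional[int] = None,
) -> str:
    """Pick one token from raw scores using temperature, top-k, and top-p."""
    # Rank tokens by score, highest first.
    ranked = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    # top_k: keep only the k highest-scoring tokens.
    ranked = ranked[:top_k]
    if temperature <= 0:
        # Treat zero temperature as greedy decoding.
        return ranked[0][0]
    # temperature: divide scores before the softmax; lower values sharpen it.
    peak = ranked[0][1]
    weights = [math.exp((score - peak) / temperature) for _, score in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for (token, _), p in zip(ranked, probs):
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # seed: a fixed seed makes the random draw repeatable.
    rng = random.Random(seed)
    tokens, kept_probs = zip(*kept)
    return rng.choices(tokens, weights=kept_probs, k=1)[0]


scores = {"def": 5.0, "class": 1.0, "import": 0.0}  # hypothetical next-token scores
print(sample_token(scores, temperature=0.0))
```

Raising temperature flattens the probabilities so that lower-ranked tokens are picked more often, while smaller top_k and top_p values shrink the pool of candidates.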

Using Custom Models from Python

Your custom models work with the same OpenAI-compatible API:

# Use a custom model from Python
from openai import OpenAI
 
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",
)
 
# Call the coder model
response = client.chat.completions.create(
    model="coder",  # Your custom model name
    messages=[
        {"role": "user", "content": "Write a binary search in Python"},
    ],
)
 
print(response.choices[0].message.content)
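The Modelfile values act as defaults, and a request can override some of them. The sketch below assumes that Ollama's OpenAI-compatible endpoint honors the standard temperature and seed fields and that request-level values take precedence over the Modelfile. The helper names chat_request and ask are hypothetical.

```python
from typing import Any, Dict, Optional


def chat_request(
    model: str,
    prompt: str,
    temperature: Optional[float] = None,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for client.chat.completions.create()."""
    kwargs: Dict[str, Any] = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    # Only send overrides that were set, so the Modelfile values apply otherwise.
    if temperature is not None:
        kwargs["temperature"] = temperature
    if seed is not None:
        kwargs["seed"] = seed
    return kwargs


def ask(model: str, prompt: str, **overrides: Any) -> str:
    """Send one prompt to a local Ollama server and return the reply text."""
    from openai import OpenAI  # imported here so chat_request works without it

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    response = client.chat.completions.create(
        **chat_request(model, prompt, **overrides)
    )
    return response.choices[0].message.content
```

For example, ask("coder", "Write a binary search in Python", temperature=0.0, seed=42) requests a more repeatable answer for that one call without rebuilding the model.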

Managing Custom Models

# List all models (including custom ones)
ollama list
 
# Show the Modelfile of an existing model
ollama show coder --modelfile
 
# Copy a model (to experiment with variations)
ollama cp coder coder-v2
 
# Delete a custom model
ollama rm coder
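The same listing is available to scripts. As a rough sketch, assuming the Ollama server exposes its REST endpoint GET /api/tags on the default port and that each entry in the response carries a name field, you could collect model names like this. The function names and the sample payload are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List


def model_names(tags_payload: Dict[str, Any]) -> List[str]:
    """Extract sorted model names from a /api/tags style response body."""
    return sorted(entry["name"] for entry in tags_payload.get("models", []))


def fetch_tags(host: str = "http://localhost:11434") -> Dict[str, Any]:
    """GET /api/tags from a running Ollama server."""
    with urllib.request.urlopen(f"{host}/api/tags") as response:
        return json.load(response)


# Hypothetical payload, trimmed to the one field this sketch reads.
sample = {"models": [{"name": "llama3.1:8b"}, {"name": "coder:latest"}]}
print(model_names(sample))
```

With a server running, model_names(fetch_tags()) would return the names that ollama list shows in its first column.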

When to Use Modelfiles

Use Case                   What to Customize
Domain-specific assistant  System prompt with domain knowledge and rules
Code generation            Low temperature (0.2-0.4), specific coding instructions
Creative writing           High temperature (0.7-0.9), style guidelines
Strict Q&A                 Low temperature, instruction to only answer from context
Chat interface             Moderate temperature, conversational tone
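One way to keep several of these variants consistent is to generate the Modelfile text from a small table of presets. The sketch below is illustrative only: the preset names, prompts, and the modelfile_for helper are hypothetical, and the temperatures are simply values inside the ranges listed above.

```python
from typing import Dict, NamedTuple


class Preset(NamedTuple):
    temperature: float
    system: str


# Hypothetical presets; temperatures sit inside the ranges listed in the table.
PRESETS: Dict[str, Preset] = {
    "code": Preset(0.3, "You are a senior software engineer. Write clean, tested code."),
    "creative": Preset(0.8, "You are a creative writing partner. Show, don't tell."),
}


def modelfile_for(use_case: str, base: str = "llama3.1:8b") -> str:
    """Render Modelfile text for one of the presets above."""
    preset = PRESETS[use_case]
    return (
        f"FROM {base}\n"
        f'SYSTEM """{preset.system}"""\n'
        f"PARAMETER temperature {preset.temperature}\n"
    )


print(modelfile_for("code"))
```

Writing the returned text to a file and passing it to ollama create -f would build the corresponding model.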

Modelfiles are the simplest way to specialize a model. For deeper customization (changing the model's actual knowledge), you would fine-tune — which we cover in the next step.

