ollama-mcp-server
The Ollama MCP Server is designed to execute Ollama models asynchronously, facilitating integration with Claude Desktop. It offers comprehensive job, script, and workflow management capabilities and supports fast-agent workflows for advanced agent operations.
Ollama MCP Server with fast-agent
Currently, Claude can probably use the Ollama features but not the agents.
See the project wiki for some examples.
I am working on a hybrid GUI approach: Claude writes the agent files and the user can run them via a GUI. The fast-agent GUI is mostly broken at the moment, but you can still use MCP with your local Ollama models this way, which is cool to me lol.
You can run this with an activated venv. You will have to add your own MCP servers AND local model to:
fastagent.config.yaml
and
basic_agent.py
It can run OpenAI API models too.
Model configuration (API fees apply if an API key is set in your OS environment variables), e.g.:
Pay: default_model: "openai.gpt-4o"
or
Free (a local tool-capable Ollama LLM): default_model: "generic.qwen3:30b-a3b"
Edit the available model and MCP servers here after copying the example file and removing "example_" from the file name:
fastagent.config.yaml
Then edit basic_agent.py to add the MCP servers (defined in the config YAML) that you want the agent to use.
# In repo root
uv venv --python 3.11 --seed
source .venv/bin/activate
# uv pip install -r requirements.txt changed to:
uv pip install -r pyproject.toml --all-extras  # installs all extras (including dependencies) for all project packages
You can just right-click and run in VS Code, or run it from a terminal with the venv activated:
cd fast-agent-scripts
./basic_agent.py
A Model Context Protocol (MCP) server that enables Claude to run Ollama models asynchronously, with outputs stored for later retrieval. Built with uv for Python environment management.
Features
- Run Ollama models without waiting for completion (async)
- Save and manage script templates with variable substitution
- Execute bash commands and multi-step workflows
- All outputs saved to a dedicated directory
- Simple configuration for Claude Desktop
Tools Overview
The server provides these tools to Claude:
Model Management
- list_ollama_models: Lists all locally installed Ollama models
Prompt Execution
- run_ollama_prompt: Run a text prompt through an Ollama model
- run_script: Run a script template with variable substitution
Job Management
- get_job_status: Check if a job is completed or still running
- list_jobs: View all running and completed jobs
- cancel_job: Terminate a running job
Script Management
- save_script: Create a new script template
- list_scripts: View available script templates
- get_script: Retrieve the content of a saved script
Bash and Workflow
- run_bash_command: Execute shell commands
- run_workflow: Run a sequence of steps as a workflow
Claude Desktop Integration
To use this server with Claude Desktop:
- Copy the content of claude_desktop_config.json to your Claude Desktop configuration, with your own paths:
{
  "mcpServers": {
    "OllamaMCPServer": {
      "command": "uv",
      "args": [
        "--directory", "/home/ty/Repositories/ai_workspace/ollama-mcp-server/src/ollama_mcp_server",
        "run",
        "server.py"
      ]
    }
  }
}
- Adjust the file paths if needed to match your system
Usage Examples
Running a Model
# Run a prompt without waiting for completion
await run_ollama_prompt(
model="llama3",
prompt="Explain the concept of quantum entanglement",
wait_for_result=False
)
# Get the result later
await get_job_status(job_id="job-id-from-previous-response")
Using Script Templates
# Run a template with variable substitution
await run_script(
    script_name="expert_analysis",
    model="llama3",
    variables={
        "domain": "machine learning",
        "content_type": "research paper",
        "topic": "transformer architecture",
        "content": "Paper content goes here..."
    }
)
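The expert_analysis template above is assumed to already exist. If you need to create one first, save_script is the tool for that; here's a rough sketch (the exact parameter names are assumptions, so check the tool's schema in server.py):
# Hypothetical sketch: save a template with {placeholder} variables (parameter names are assumptions)
await save_script(
    name="expert_analysis",
    content=(
        "You are an expert in {domain}. Analyze the following {content_type} "
        "about {topic}:\n\n{content}"
    )
)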
Running Shell Commands
# Execute a bash command
await run_bash_command(
command="ollama pull llama3",
wait_for_result=False
)
Multi-step Workflows
# Execute multiple steps in sequence
await run_workflow(
    steps=[
        {
            "tool": "run_bash_command",
            "params": {
                "command": "ollama pull llama3"
            }
        },
        {
            "tool": "run_ollama_prompt",
            "params": {
                "model": "llama3",
                "prompt": "Explain quantum computing"
            }
        }
    ],
    wait_for_completion=False
)
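Because the workflow above is started with wait_for_completion=False, you can check on it later the same way as a single prompt job; a minimal sketch (the exact response fields are whatever the server returns):
# List running and completed jobs, then poll the workflow job
await list_jobs()
await get_job_status(job_id="workflow-job-id-from-previous-response")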
Fast-Agent Scripts Guide - Don't quote me version!
Setting Up Fast-Agent with Ollama MCP: A Comprehensive Guide
I'll guide you through setting up and using fast-agent with Ollama MCP server, focusing on multi-agent workflows and Ollama models that support tool calling.
Getting Started with Ollama MCP Server
The Ollama MCP Server provides an interface for running Ollama models with capabilities like model management, prompt execution, job management, script handling, and fast-agent integration. It's implemented in src/ollama_mcp_server/server.py in your repository.
Prerequisites
First, let's ensure you have everything set up correctly:
- Verify Ollama is installed and running:
ollama serve
- Check available models:
ollama list
- Make sure your fastagent.config.yaml is properly configured
Ollama Models with Tool Calling Support
Based on my research, several Ollama models now support tool/function calling. Ollama has added tool-calling support for various popular models, and recent improvements to the Ollama Python library (version 0.4) have enhanced function calling.
Tool calling allows a model to use external tools or APIs during inference. Models that support it include Llama 3, Qwen2, Phi3, and other models specifically tagged with tool support.
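As a quick way to confirm a model handles tools, here is a minimal sketch using the Ollama Python library (0.4+). The model name and the tool function are just placeholders; swap in any tool-tagged model you have pulled locally:
# Minimal tool-calling check with the Ollama Python library (0.4+)
# Model name and tool function below are placeholders, not part of this repo
import ollama

def add_two_numbers(a: int, b: int) -> int:
    """Add two integers and return the sum."""
    return a + b

response = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "What is 3 + 4? Use the tool."}],
    tools=[add_two_numbers],  # 0.4+ accepts plain Python functions as tools
)
print(response.message.tool_calls)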
Fast-Agent Workflow Patterns
Fast-agent supports several workflow patterns based on Anthropic's "Building Effective Agents" methodology. You can set up various workflow types in your fast-agent scripts.
Let's go through the main workflow patterns:
1. Basic Agent
The simplest integration is a basic agent that uses Ollama models. Here's a template:
#!/usr/bin/env python
"""
Basic Fast-Agent using Ollama
"""
import asyncio
from mcp_agent.core.fastagent import FastAgent
# Create FastAgent instance
fast = FastAgent("My Agent")
@fast.agent(
name="my_agent",
instruction="You are a helpful AI assistant.",
model="phi4-reasoning:14b-plus-q4_K_M", # Specify an Ollama model
servers=["ollama_server"] # Reference the MCP server
)
async def main():
    # Run the agent
    async with fast.run() as agent:
        # Start interactive mode
        await agent.interactive()

if __name__ == "__main__":
    asyncio.run(main())
2. Chain Workflow
For more complex scenarios, you can create a chain of agents where the output from one agent is passed to another:
@fast.agent(
name="researcher",
instruction="Research topics thoroughly.",
model="phi4-reasoning:14b-plus-q4_K_M",
servers=["ollama_server"]
)
@fast.agent(
name="summarizer",
instruction="Summarize information concisely.",
model="qwen3:0.6b", # Using a different model for summarization
servers=["ollama_server"]
)
@fast.chain(
name="research_workflow",
sequence=["researcher", "summarizer"],
instruction="Research and summarize information."
)
async def main():
    async with fast.run() as agent:
        await agent.interactive()
This approach follows the chain pattern described in the fast-agent documentation. The chain workflow offers a declarative approach to calling agents in sequence:
@fast.chain(
"post_writer",
sequence=["url_fetcher","social_media"]
)
# we can then prompt it directly:
async with fast.run() as agent:
    await agent.interactive(agent="post_writer")
When a chain is prompted, it returns to a chat with the last agent in the chain. You can switch agents by typing @agent-name.
3. Parallel Workflow
You can run multiple models in parallel and aggregate their outputs:
@fast.agent(
name="model1_agent",
instruction="First model perspective.",
model="phi4-reasoning:14b-plus-q4_K_M",
servers=["ollama_server"]
)
@fast.agent(
name="model2_agent",
instruction="Second model perspective.",
model="gemma3:latest",
servers=["ollama_server"]
)
@fast.agent(
name="aggregator",
instruction="Combine and analyze multiple perspectives.",
model="phi4-reasoning:14b-plus-q4_K_M",
servers=["ollama_server"]
)
@fast.parallel(
name="ensemble_workflow",
fan_out=["model1_agent", "model2_agent"],
fan_in="aggregator"
)
The Parallel Workflow is particularly useful for creating model ensembles. It sends the same message to multiple agents simultaneously (fan-out), then uses the fan-in agent to process the combined content:
@fast.agent("translate_fr", "Translate the text to French")
@fast.agent("translate_de", "Translate the text to German")
@fast.agent("translate_es", "Translate the text to Spanish")
@fast.parallel(
name="translate",
fan_out=["translate_fr","translate_de","translate_es"]
)
@fast.chain(
"post_writer",
sequence=["url_fetcher","social_media","translate"]
)
If you don't specify a fan-in agent, the parallel workflow returns the combined agent results verbatim. The parallel pattern is useful for ensembling ideas from different LLMs.
4. Evaluator-Optimizer Pattern
For iterative content improvement:
@fast.agent(
name="generator",
instruction="Generate content based on requests.",
model="phi4-reasoning:14b-plus-q4_K_M",
servers=["ollama_server"]
)
@fast.agent(
name="evaluator",
instruction="Evaluate content and provide feedback. Rate as POOR, FAIR, GOOD, or EXCELLENT.",
model="phi4-reasoning:14b-plus-q4_K_M",
servers=["ollama_server"]
)
@fast.evaluator_optimizer(
name="optimizer_workflow",
generator="generator",
evaluator="evaluator",
min_rating="EXCELLENT",
max_refinements=3
)
This creates a workflow that iteratively improves content until it reaches the specified quality threshold or maximum number of refinements.
Evaluator-Optimizers combine two agents: one to generate content (the generator), and another to judge that content and provide actionable feedback (the evaluator). Messages are sent to the generator first, then the pair run in a loop until either the evaluator is satisfied with the quality, or the maximum number of refinements is reached.
@fast.evaluator_optimizer(
name="researcher",
generator="web_searcher",
evaluator="quality_assurance",
min_rating="EXCELLENT",
max_refinements=3
)
async with fast.run() as agent:
    await agent.researcher.send("produce a report on how to make the perfect espresso")
When used in a workflow, it returns the last generator message as the result.
5. Router Workflow
For dynamic agent selection based on the input:
@fast.router(
name="router_workflow",
agents=["agent1", "agent2", "agent3"],
instruction="Route the request to the most appropriate agent"
)
Routers use an LLM to assess a message and route it to the most appropriate agent. The routing prompt is automatically generated based on the agent instructions and available servers:
@fast.router(
name="route",
agents=["agent1","agent2","agent3"]
)
Note that if only one agent is supplied to the router, it forwards directly.
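As with chains, you can prompt a router workflow directly by name once fast.run() is active; a minimal sketch, assuming the router above is named route:
# Send a single message through the router; it picks the best-suited agent
async with fast.run() as agent:
    result = await agent.route.send("Refactor this function and explain the changes")
    print(result)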
6. Orchestrator for Complex Tasks
For complex workflows with multiple steps:
@fast.orchestrator(
name="orchestrator_workflow",
agents=["agent1", "agent2", "agent3"],
instruction="Plan and execute a complex task"
)
Given a complex task, the Orchestrator uses an LLM to generate a plan that divides the task amongst the available agents. The planning and aggregation prompts are generated by the Orchestrator, which benefits from using more capable models. Plans can either be built once at the beginning (plan_type="full") or generated iteratively (plan_type="iterative"):
@fast.orchestrator(
name="orchestrate",
agents=["task1","task2","task3"]
)
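If you want to pick the planning mode explicitly, here is a hedged sketch (I believe the parameter is plan_type, but verify against your installed fast-agent version; the names below are illustrative):
# Sketch: orchestrator with an explicit planning mode (parameter name assumed to be plan_type)
@fast.orchestrator(
    name="orchestrate_iterative",
    agents=["task1", "task2", "task3"],
    plan_type="iterative"  # or "full" to build the entire plan up front
)
async def run_plan():
    async with fast.run() as agent:
        await agent.orchestrate_iterative.send("Research, draft, and review the report")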
Running Fast-Agent Scripts
To run a fast-agent script:
cd /path/to/ollama-mcp-server
uv run src/fast-agent-scripts/your_script.py
For specific agent targets:
uv run src/fast-agent-scripts/your_script.py --agent my_agent_name
With a specific message:
uv run src/fast-agent-scripts/your_script.py --agent my_agent_name --message "Your prompt"
Configuration
The fastagent.config.yaml file defines the available MCP servers:
# MCP Servers configuration
mcp:
  ollama_server:
    # Direct reference to local Ollama MCP server
    command: "uv"
    args: ["run", "-m", "src.ollama_mcp_server.server"]
This configuration makes the Ollama MCP server available to fast-agent scripts.
Practical Example: Building a Multi-Agent Research Assistant
Let's create a practical example that combines different workflow patterns for a research assistant:
#!/usr/bin/env python
"""
Multi-agent research assistant using Ollama models
"""
import asyncio
from mcp_agent.core.fastagent import FastAgent
# Create FastAgent instance
fast = FastAgent("Research Assistant")
# Define individual agents
@fast.agent(
name="web_searcher",
instruction="Search the web for relevant information on the given topic.",
model="llama3:latest", # Llama3 has good tool-using capabilities
servers=["ollama_server", "fetch_server"]
)
@fast.agent(
name="fact_checker",
instruction="Verify facts and identify potential inaccuracies.",
model="qwen2:latest", # Qwen2 has good reasoning
servers=["ollama_server"]
)
@fast.agent(
name="summarizer",
instruction="Create concise, well-structured summaries of information.",
model="phi3:latest",
servers=["ollama_server"]
)
@fast.agent(
name="evaluator",
instruction="Evaluate content quality and accuracy. Rate as POOR, FAIR, GOOD, or EXCELLENT.",
model="llama3:latest",
servers=["ollama_server"]
)
# Define workflows
@fast.chain(
name="research_chain",
sequence=["web_searcher", "fact_checker", "summarizer"],
instruction="Research, verify, and summarize information on a topic."
)
@fast.evaluator_optimizer(
name="refined_research",
generator="research_chain",
evaluator="evaluator",
min_rating="EXCELLENT",
max_refinements=3
)
async def main():
    async with fast.run() as agent:
        await agent.interactive()

if __name__ == "__main__":
    asyncio.run(main())
This example combines chain and evaluator-optimizer patterns to create a research assistant that:
- Searches for information
- Fact-checks it
- Summarizes the verified information
- Evaluates the quality of the summary
- Refines it until it reaches excellent quality
Troubleshooting Tips
If you encounter issues:
- Ensure Ollama is running (ollama serve)
- Check model availability with ollama list
- Verify the fastagent.config.yaml configuration
- Check script syntax and model names
- Restart Claude Desktop after making changes to the Ollama MCP server
Next Steps
To advance your fast-agent setup:
- Experiment with different agent combinations
- Test various Ollama models for different tasks
- Create specialized agents for specific domains
- Develop more complex workflows for real-world applications