GeminiMCP
Gemini MCP Server integrates the Model Context Protocol (MCP) with Google's Gemini API, offering advanced capabilities such as dynamic model access, context caching, and seamless file integration. It is designed for reliable performance and ease of use across a variety of applications.
Gemini MCP Server
MCP (Model Context Protocol) server integrating with Google's Gemini API.
Key Advantages
- Single Self-Contained Binary: Written in Go, the project compiles to a single binary with no dependencies, eliminating package manager issues and preventing the server from changing unexpectedly without the user's knowledge
- Dynamic Model Access: Automatically fetches the latest available Gemini models at startup
- Advanced Context Handling: Efficient caching system with TTL control for repeated queries
- Enhanced File Handling: Seamless file integration with intelligent MIME detection
- Production Reliability: Robust error handling, automatic retries, and graceful degradation
- Comprehensive Capabilities: Full support for code analysis, general queries, and search with grounding
Installation and Configuration
Prerequisites
- Google Gemini API key
Building from Source
# Clone and build
git clone https://github.com/chew-z/GeminiMCP
cd GeminiMCP
go build -o mcp-gemini
# Start the server with environment variables
export GEMINI_API_KEY=your_api_key
export GEMINI_MODEL=gemini-1.5-pro
./mcp-gemini
Client Configuration
Add this server to any MCP-compatible client, such as Claude Desktop, by adding an entry like the following to your client's configuration:
{
"gemini": {
"command": "/Users/<user>/Path/to/bin/mcp-gemini",
"env": {
"GEMINI_API_KEY": "YOUR_API_KEY_HERE",
"GEMINI_MODEL": "gemini-2.5-pro-exp-03-25",
"GEMINI_SEARCH_MODEL": "gemini-2.5-flash-preview-04-17",
"GEMINI_SYSTEM_PROMPT": "You are a senior developer. Your job is to do a thorough code review of this code...",
"GEMINI_SEARCH_SYSTEM_PROMPT": "You are a search assistant. Your job is to find the most relevant information about this topic..."
}
}
}
Important Notes:
- Environment Variables: For the Claude Desktop app, all configuration variables must be included in the MCP configuration JSON shown above (in the env section), not as system environment variables or in .env files. Variables set outside the config JSON will not take effect for the client application.
- Claude Desktop Config Location:
  - On macOS: ~/Library/Application\ Support/Claude/claude_desktop_config.json
  - On Windows: %APPDATA%\Claude\claude_desktop_config.json
- Configuration Help: If you encounter any issues configuring the Claude Desktop app, refer to the MCP Quickstart Guide for additional assistance.
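For reference, a complete claude_desktop_config.json nests the server entry shown above under the client's mcpServers key. A minimal sketch (the binary path and API key are placeholders):
{
  "mcpServers": {
    "gemini": {
      "command": "/Users/<user>/Path/to/bin/mcp-gemini",
      "env": {
        "GEMINI_API_KEY": "YOUR_API_KEY_HERE",
        "GEMINI_MODEL": "gemini-2.5-pro-exp-03-25"
      }
    }
  }
}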
Using This MCP Server from the Claude Desktop App
You can use Gemini tools directly from an LLM console by creating prompt examples that invoke the tools. Here are some example prompts for different use cases:
Listing Available Models
Say to your LLM:
Please use the gemini_models tool to show me the list of available Gemini models.
The LLM will invoke the gemini_models tool and return the list of available models, organized by preference and capability. The output prioritizes recommended models for specific tasks, then organizes the remaining models by version (newest to oldest).
Code Analysis with gemini_ask
Say to your LLM:
Use the gemini_ask tool to analyze this Go code for potential concurrency issues:

func processItems(items []string) []string {
    var wg sync.WaitGroup
    results := make([]string, len(items))
    for i, item := range items {
        wg.Add(1)
        go func(i int, item string) {
            results[i] = processItem(item)
            wg.Done()
        }(i, item)
    }
    wg.Wait()
    return results
}
Please use a system prompt that focuses on code review and performance optimization.
Creative Writing with gemini_ask
Say to your LLM:
Use the gemini_ask tool to create a short story about a space explorer discovering a new planet. Set a custom system prompt that encourages creative, descriptive writing with vivid imagery.
Factual Research with gemini_search
Say to your LLM:
Use the gemini_search tool to find the latest information about advancements in fusion energy research from the past year. Set the start_time to one year ago and end_time to today. Include sources in your response.
Complex Reasoning with Thinking Mode
Say to your LLM:
Use the gemini_ask tool with a thinking-capable model to solve this algorithmic problem: "Given an array of integers, find the longest consecutive sequence of integers. For example, given [100, 4, 200, 1, 3, 2], the longest consecutive sequence is [1, 2, 3, 4], so return 4."
Enable thinking mode with a high budget level so I can see the detailed step-by-step reasoning process.
This will show both the final answer and the model's comprehensive reasoning process with maximum detail.
Simple Project Analysis with Caching
Say to your LLM:
Please use a caching-enabled Gemini model to analyze our project files. Include the main.go, config.go and models.go files and ask Gemini a series of questions about our project architecture and how it could be improved. Use appropriate system prompts for each question.
With this simple prompt, the LLM will:
- Select a caching-compatible model (with -001 suffix)
- Include the specified project files
- Enable caching automatically
- Ask multiple questions while maintaining context
- Customize system prompts for each question type
This approach makes it easy to have an extended conversation about your codebase without complex configuration.
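Behind the scenes, the LLM's questions map onto gemini_ask calls like the following sketch (the query and system prompt wording are illustrative; the arguments are the documented ones):
// First question against the cached project files
{
  "name": "gemini_ask",
  "arguments": {
    "query": "Describe the overall architecture of this project",
    "model": "gemini-2.0-flash-001",
    "systemPrompt": "You are a software architect reviewing project structure...",
    "file_paths": ["main.go", "config.go", "models.go"],
    "use_cache": true,
    "cache_ttl": "1h"
  }
}
Follow-up questions then benefit from the stored context rather than re-uploading the files (see Combined File Attachments with Caching below).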
Combined File Attachments with Caching
For programming tasks, you can directly use the file attachments feature with caching to create a more efficient workflow:
Use gemini_ask with model gemini-2.0-flash-001 to analyze these Go files. Please add both structs.go and models.go to the context, enable caching with a 30-minute TTL, and ask about how the model management system works in this application.
The server has special optimizations for this use case, particularly useful when:
- Working with complex codebases requiring multiple files for context
- Planning to ask follow-up questions about the same code
- Debugging issues that require file context
- Code review scenarios discussing implementation details
When combining file attachments with caching, files are analyzed once and stored in the cache, making subsequent queries much faster and more cost-effective.
Managing Multiple Caches and Reducing Costs
During a conversation, you can create and use multiple caches for different sets of files or contexts:
Please create a new cache for our frontend code (App.js, components/*.js) and analyze it separately from the backend code cache we created earlier.
The LLM can intelligently manage these different caches, switching between them as needed based on your queries. This capability is particularly valuable for projects with distinct components that require different analysis approaches.
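Under the hood this corresponds to separate cached gemini_ask calls, one per file set. A sketch using the documented arguments (file names and queries are illustrative):
// Frontend cache, kept separate from the earlier backend cache
{
  "name": "gemini_ask",
  "arguments": {
    "query": "Analyze the component structure of our frontend code",
    "model": "gemini-2.0-flash-001",
    "file_paths": ["App.js", "components/Header.js", "components/Footer.js"],
    "use_cache": true,
    "cache_ttl": "1h"
  }
}
Because the server associates files with their cache context (see Caching System below), backend questions continue to hit the earlier cache while frontend questions use this one.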
Cost Savings: Using caching significantly reduces API costs, especially when working with large codebases or having extended conversations. By caching the context:
- Files are processed and tokenized only once instead of with every query
- Follow-up questions reuse the existing context instead of creating new API requests
- Complex analyses can be performed incrementally without re-uploading files
- Multi-session analysis becomes more economical, with some users reporting 40-60% cost reductions for extended code reviews
Customizing System Prompts
The gemini_ask and gemini_search tools are highly versatile and not limited to programming-related queries. You can customize the system prompt for various use cases:
- Educational content: "You are an expert teacher who explains complex concepts in simple terms..."
- Creative writing: "You are a creative writer specializing in vivid, engaging narratives..."
- Technical documentation: "You are a technical writer creating clear, structured documentation..."
- Data analysis: "You are a data scientist analyzing patterns and trends in information..."
When using these tools from an LLM console, always encourage the LLM to set appropriate system prompts and parameters for the specific use case. The flexibility of system prompts allows these tools to be effective for virtually any type of query.
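For example, a non-programming request might pair gemini_ask with the educational prompt above (the query and model choice here are illustrative):
{
  "name": "gemini_ask",
  "arguments": {
    "query": "Explain how public-key cryptography works to a high-school student",
    "systemPrompt": "You are an expert teacher who explains complex concepts in simple terms...",
    "model": "gemini-2.5-pro-exp-03-25"
  }
}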
Detailed Documentation
Available Tools
The server provides three primary tools:
1. gemini_ask
For code analysis, general queries, and creative tasks with optional file context.
{
"name": "gemini_ask",
"arguments": {
"query": "Review this Go code for concurrency issues...",
"model": "gemini-2.0-flash-001",
"systemPrompt": "Optional custom instructions",
"file_paths": ["main.go", "config.go"],
"use_cache": true,
"cache_ttl": "1h"
}
}
Simple code analysis with file attachments:
{
"name": "gemini_ask",
"arguments": {
"query": "Analyze this code and suggest improvements",
"model": "gemini-2.5-pro-exp-03-25",
"file_paths": ["models.go"]
}
}
Combining file attachments with caching for repeated queries:
{
"name": "gemini_ask",
"arguments": {
"query": "Explain the main data structures in these files and how they interact",
"model": "gemini-2.0-flash-001",
"file_paths": ["models.go", "structs.go"],
"use_cache": true,
"cache_ttl": "30m"
}
}
2. gemini_search
Provides grounded answers using Google Search integration with enhanced model capabilities.
{
"name": "gemini_search",
"arguments": {
"query": "What is the current population of Warsaw, Poland?",
"systemPrompt": "Optional custom search instructions",
"enable_thinking": true,
"thinking_budget": 8192,
"thinking_budget_level": "medium",
"max_tokens": 4096,
"model": "gemini-2.5-pro-exp-03-25",
"start_time": "2024-01-01T00:00:00Z",
"end_time": "2024-12-31T23:59:59Z"
}
}
Returns structured responses with sources and optional thinking process:
{
"answer": "Detailed answer text based on search results...",
"thinking": "Optional detailed reasoning process when thinking mode is enabled",
"sources": [
{
"title": "Source Title",
"url": "https://example.com/source-page",
"type": "web"
}
],
"search_queries": ["population Warsaw Poland 2025"]
}
3. gemini_models
Lists all available Gemini models with capabilities and caching support.
{
"name": "gemini_models",
"arguments": {}
}
Returns comprehensive model information including:
- Complete list of available models (dynamically fetched at startup)
- Model IDs and descriptions
- Caching support status
- Usage examples
Model Management
The server dynamically fetches available Gemini models from the Google API at startup, preserving pre-defined descriptions and filtering out non-relevant models like embedding and visual models. Models are organized by preference and capability:
Recommended Models for Specific Tasks
Model ID | Description | Recommended For |
---|---|---|
gemini-2.5-pro-exp-03-25 | Advanced Pro model with superior thinking support | Complex reasoning with thinking mode |
gemini-2.0-flash-001 | Cacheable Flash model optimized for repeated tasks | Programming tasks with caching |
gemini-2.5-flash-preview-04-17 | Fast Flash model with excellent search capabilities | Search queries and web browsing |
Models are organized by preference first, then by version (newest to oldest) when displayed in the gemini_models tool output. Use the gemini_models tool for a complete, up-to-date list.
Caching System
The server offers sophisticated context caching:
- Model Compatibility: Only models with version suffixes (e.g., -001) support caching
- Cache Control: Set use_cache: true and specify cache_ttl (e.g., "10m", "2h")
- File Association: Automatically stores files and associates them with the cache context
- Performance Optimization: Local metadata caching for quick lookups
Example with caching:
{
"name": "gemini_ask",
"arguments": {
"query": "Follow up on our previous discussion...",
"model": "gemini-1.5-pro-001",
"use_cache": true,
"cache_ttl": "1h"
}
}
File Handling
Robust file processing with:
- Direct Path Integration: Simply specify local file paths in the file_paths array
- Automatic Validation: Size checking, MIME type detection, and content validation
- Wide Format Support: Handles common code, text, and document formats
- Metadata Caching: Stores file information for quick future reference
Advanced Features
Thinking Mode
The server supports "thinking mode" for compatible models (primarily Gemini 2.5 Pro models):
- Enhanced Reasoning: Shows the model's step-by-step reasoning process
- Complex Problem Solving: Particularly useful for debugging, mathematical reasoning, and complex analysis
- Model Compatibility: Automatically validates thinking capability based on requested model
- Tool Support: Available in both gemini_ask and gemini_search tools
- Configurable Budget: Control thinking depth with budget levels or explicit token counts
Example with thinking mode:
{
"name": "gemini_ask",
"arguments": {
"query": "Analyze the algorithmic complexity of merge sort vs. quick sort",
"model": "gemini-2.5-pro-exp-03-25",
"enable_thinking": true,
"thinking_budget_level": "high"
}
}
Thinking Budget Control
Configure the depth and detail of the model's thinking process:
- Predefined Budget Levels:
  - none: 0 tokens (thinking disabled)
  - low: 4096 tokens (default, quick analysis)
  - medium: 16384 tokens (detailed reasoning)
  - high: 24576 tokens (maximum depth for complex problems)
- Custom Token Budget: Alternatively, set a specific token count with the thinking_budget parameter (0-24576)
Examples:
// Using predefined level
{
"name": "gemini_ask",
"arguments": {
"query": "Analyze this algorithm...",
"model": "gemini-2.5-pro-exp-03-25",
"enable_thinking": true,
"thinking_budget_level": "medium"
}
}
// Using explicit token count
{
"name": "gemini_search",
"arguments": {
"query": "Research quantum computing developments...",
"model": "gemini-2.5-pro-exp-03-25",
"enable_thinking": true,
"thinking_budget": 12000
}
}
Context Window Size Management
The server intelligently manages token limits:
- Custom Sizing: Set the max_tokens parameter to control response length
- Model-Aware Defaults: Automatically sets appropriate defaults based on model capabilities
- Capacity Warnings: Provides warnings when requested tokens exceed model limits
- Proportional Defaults: Uses percentage-based defaults (75% for general queries, 50% for search)
Example with context window size management:
{
"name": "gemini_ask",
"arguments": {
"query": "Generate a detailed analysis of this code...",
"model": "gemini-1.5-pro-001",
"max_tokens": 8192
}
}
Configuration Options
Essential Environment Variables
Variable | Description | Default |
---|---|---|
GEMINI_API_KEY | Google Gemini API key | Required |
GEMINI_MODEL | Default model ID for gemini_ask | gemini-1.5-pro |
GEMINI_SEARCH_MODEL | Default model ID for gemini_search | gemini-2.0-flash |
GEMINI_SYSTEM_PROMPT | System prompt for general queries | Custom review prompt |
GEMINI_SEARCH_SYSTEM_PROMPT | System prompt for search | Custom search prompt |
GEMINI_MAX_FILE_SIZE | Max upload size (bytes) | 10485760 (10MB) |
GEMINI_ALLOWED_FILE_TYPES | Comma-separated MIME types | [Common text/code types] |
Optimization Variables
Variable | Description | Default |
---|---|---|
GEMINI_TIMEOUT | API timeout in seconds | 90 |
GEMINI_MAX_RETRIES | Max API retries | 2 |
GEMINI_TEMPERATURE | Model temperature (0.0-1.0) | 0.4 |
GEMINI_ENABLE_CACHING | Enable context caching | true |
GEMINI_DEFAULT_CACHE_TTL | Default cache time-to-live | 1h |
GEMINI_ENABLE_THINKING | Enable thinking mode capability | true |
GEMINI_THINKING_BUDGET_LEVEL | Default thinking budget level (none/low/medium/high) | low |
GEMINI_THINKING_BUDGET | Explicit thinking token budget (0-24576) | 4096 |
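Since the Claude Desktop app only reads variables from the MCP configuration JSON (see Important Notes above), these optimization variables belong in the server entry's env block. A sketch with illustrative values:
{
  "gemini": {
    "command": "/Users/<user>/Path/to/bin/mcp-gemini",
    "env": {
      "GEMINI_API_KEY": "YOUR_API_KEY_HERE",
      "GEMINI_TIMEOUT": "120",
      "GEMINI_MAX_RETRIES": "3",
      "GEMINI_TEMPERATURE": "0.2",
      "GEMINI_DEFAULT_CACHE_TTL": "2h",
      "GEMINI_THINKING_BUDGET_LEVEL": "medium"
    }
  }
}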
Operational Features
- Degraded Mode: Automatically enters safe mode on initialization errors
- Retry Logic: Configurable exponential backoff for reliable API communication
- Structured Logging: Comprehensive event logging with severity levels
- File Validation: Secure handling with size and type restrictions
Development
Running Tests
go test -v
Running Linter
./run_lint.sh
Formatting Code
./run_format.sh
Recent Changes
- Time Range Filtering: Added time range filtering to the gemini_search tool with start_time and end_time parameters to filter search results by publication date
- Improved Model Management: Enhanced model handling with preference-based organization, filtering of embedding/visual models, and preservation of custom descriptions
- Model Task Preferences: Added model recommendations for specific tasks (thinking, caching, search)
- Advanced Usage Examples: Added documentation for combining file attachments with caching for programming tasks
- File Context Optimizations: Improved handling of file content with caching for more efficient follow-up queries
- Model Display Organization: Reorganized model output to prioritize recommended models and newer versions
- Thinking Budget Control: Added configurable thinking budget levels and explicit token control for fine-tuning reasoning depth
- Model Selection for Search: Added support for custom model selection in the gemini_search tool
- Enhanced Thinking Mode Support: Added thinking capability across compatible models, enabling more detailed reasoning processes
- Conflict Management: Improved handling of caching and thinking mode interactions to prevent conflicts
- Context Window Sizing: Better management of token limits with automatic adjustments for model capabilities
- Advanced Model Selection: Enhanced dynamic model validation and selection based on requested capabilities
- Improved Error Handling: Better error messages and logging for troubleshooting API interactions
- Code Optimization: Removed unnecessary whitespace and improved formatting for better maintainability
- Dynamic Model Fetching: Automatic retrieval of available Gemini models at startup
- Enhanced Client Integration: Added configuration guides for MCP clients
- Expanded Model Support: Updated compatibility with latest Gemini 2.5 Pro and 2.0 Flash models
- Search Capabilities: Added Google Search integration with source attribution
- Improved File Handling: Enhanced MIME detection and validation
- Caching Enhancements: Better support for models with version suffixes
License
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the project
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request