GeminiMCP

Gemini MCP Server integrates the Model Context Protocol (MCP) with Google's Gemini API, offering capabilities such as dynamic model access, context caching, and seamless file integration. It is designed for reliable performance and ease of use across a variety of applications.

Gemini MCP Server

An MCP (Model Context Protocol) server integrating with Google's Gemini API.

Key Advantages

  • Single Self-Contained Binary: Written in Go, the project compiles to a single binary with no external dependencies, avoiding package-manager issues and preventing the server from changing unexpectedly without the user's knowledge
  • Dynamic Model Access: Automatically fetches the latest available Gemini models at startup
  • Advanced Context Handling: Efficient caching system with TTL control for repeated queries
  • Enhanced File Handling: Seamless file integration with intelligent MIME detection
  • Production Reliability: Robust error handling, automatic retries, and graceful degradation
  • Comprehensive Capabilities: Full support for code analysis, general queries, and search with grounding

Installation and Configuration

Prerequisites

  • Google Gemini API key

Building from Source

# Clone and build
git clone https://github.com/chew-z/GeminiMCP
cd GeminiMCP
go build -o mcp-gemini

# Start server with environment variables
export GEMINI_API_KEY=your_api_key
export GEMINI_MODEL=gemini-1.5-pro
./mcp-gemini

Client Configuration

Add this server to any MCP-compatible client, such as Claude Desktop, by adding the following to your client's configuration:

{
    "gemini": {
        "command": "/Users/<user>/Path/to/bin/mcp-gemini",
        "env": {
            "GEMINI_API_KEY": "YOUR_API_KEY_HERE",
            "GEMINI_MODEL": "gemini-2.5-pro-exp-03-25",
            "GEMINI_SEARCH_MODEL": "gemini-2.5-flash-preview-04-17",
            "GEMINI_SYSTEM_PROMPT": "You are a senior developer. Your job is to do a thorough code review of this code...",
            "GEMINI_SEARCH_SYSTEM_PROMPT": "You are a search assistant. Your job is to find the most relevant information about this topic..."
        }
    }
}

Important Notes:

  • Environment Variables: For the Claude Desktop app, all configuration variables must be set in the env section of the MCP configuration JSON shown above, not as system environment variables or in .env files. Variables set outside the config JSON have no effect on the client application.

  • Claude Desktop Config Location:

    • On macOS: ~/Library/Application\ Support/Claude/claude_desktop_config.json
    • On Windows: %APPDATA%\Claude\claude_desktop_config.json
  • Configuration Help: If you encounter issues configuring the Claude Desktop app, refer to the MCP Quickstart Guide for additional assistance.
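
For reference, the "gemini" block shown above typically sits under the top-level mcpServers key in claude_desktop_config.json. A minimal sketch (the binary path is a placeholder for your own install location):

{
    "mcpServers": {
        "gemini": {
            "command": "/Users/<user>/Path/to/bin/mcp-gemini",
            "env": {
                "GEMINI_API_KEY": "YOUR_API_KEY_HERE"
            }
        }
    }
}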

Using this MCP server from the Claude Desktop app

You can use Gemini tools directly from an LLM console by creating prompt examples that invoke the tools. Here are some example prompts for different use cases:

Listing Available Models

Say to your LLM:

Please use the gemini_models tool to show me the list of available Gemini models.

The LLM will invoke the gemini_models tool and return the list of available models, organized by preference and capability. The output prioritizes recommended models for specific tasks, then organizes remaining models by version (newest to oldest).
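
Behind the scenes, this prompt resolves to a tool call with no arguments (see the gemini_models reference later in this document):

{
    "name": "gemini_models",
    "arguments": {}
}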

Code Analysis with gemini_ask

Say to your LLM:

Use the gemini_ask tool to analyze this Go code for potential concurrency issues:

func processItems(items []string) []string {
    var wg sync.WaitGroup
    results := make([]string, len(items))

    for i, item := range items {
        wg.Add(1)
        go func(i int, item string) {
            results[i] = processItem(item)
            wg.Done()
        }(i, item)
    }

    wg.Wait()
    return results
}

Please use a system prompt that focuses on code review and performance optimization.

Creative Writing with gemini_ask

Say to your LLM:

Use the gemini_ask tool to create a short story about a space explorer discovering a new planet. Set a custom system prompt that encourages creative, descriptive writing with vivid imagery.

Factual Research with gemini_search

Say to your LLM:

Use the gemini_search tool to find the latest information about advancements in fusion energy research from the past year. Set the start_time to one year ago and end_time to today. Include sources in your response.
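
As a sketch, this prompt would resolve to a gemini_search call along the following lines; the timestamps here are illustrative and would be computed by the LLM:

{
    "name": "gemini_search",
    "arguments": {
        "query": "latest advancements in fusion energy research",
        "start_time": "2024-05-01T00:00:00Z",
        "end_time": "2025-05-01T00:00:00Z"
    }
}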

Complex Reasoning with Thinking Mode

Say to your LLM:

Use the gemini_ask tool with a thinking-capable model to solve this algorithmic problem:

"Given an array of integers, find the longest consecutive sequence of integers. For example, given [100, 4, 200, 1, 3, 2], the longest consecutive sequence is [1, 2, 3, 4], so return 4."

Enable thinking mode with a high budget level so I can see the detailed step-by-step reasoning process.

This will show both the final answer and the model's comprehensive reasoning process with maximum detail.
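
Expressed as a direct tool call, the request might look like the following sketch, using the thinking parameters documented later in this README:

{
    "name": "gemini_ask",
    "arguments": {
        "query": "Given an array of integers, find the longest consecutive sequence...",
        "model": "gemini-2.5-pro-exp-03-25",
        "enable_thinking": true,
        "thinking_budget_level": "high"
    }
}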

Simple Project Analysis with Caching

Say to your LLM:

Please use a caching-enabled Gemini model to analyze our project files. Include the main.go, config.go and models.go files and ask Gemini a series of questions about our project architecture and how it could be improved. Use appropriate system prompts for each question.

With this simple prompt, the LLM will:

  • Select a caching-compatible model (with -001 suffix)
  • Include the specified project files
  • Enable caching automatically
  • Ask multiple questions while maintaining context
  • Customize system prompts for each question type

This approach makes it easy to have an extended conversation about your codebase without complex configuration.
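
As a sketch, each question in such a conversation maps to a gemini_ask call like the one below; the first call creates the cache and later calls reuse it (file names taken from the prompt above, model per the caching recommendation):

{
    "name": "gemini_ask",
    "arguments": {
        "query": "How could our project architecture be improved?",
        "model": "gemini-2.0-flash-001",
        "file_paths": ["main.go", "config.go", "models.go"],
        "use_cache": true,
        "cache_ttl": "1h"
    }
}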

Combined File Attachments with Caching

For programming tasks, you can directly use the file attachments feature with caching to create a more efficient workflow:

Use gemini_ask with model gemini-2.0-flash-001 to analyze these Go files. Please add both structs.go and models.go to the context, enable caching with a 30-minute TTL, and ask about how the model management system works in this application.

The server includes special optimizations for this use case, which are particularly useful when:

  • Working with complex codebases requiring multiple files for context
  • Planning to ask follow-up questions about the same code
  • Debugging issues that require file context
  • Code review scenarios discussing implementation details

When combining file attachments with caching, files are analyzed once and stored in the cache, making subsequent queries much faster and more cost-effective.

Managing Multiple Caches and Reducing Costs

During a conversation, you can create and use multiple caches for different sets of files or contexts:

Please create a new cache for our frontend code (App.js, components/*.js) and analyze it separately from our backend code cache we created earlier.

The LLM can intelligently manage these different caches, switching between them as needed based on your queries. This capability is particularly valuable for projects with distinct components that require different analysis approaches.

Cost Savings: Using caching significantly reduces API costs, especially when working with large codebases or having extended conversations. By caching the context:

  • Files are processed and tokenized only once instead of with every query
  • Follow-up questions reuse the existing context instead of creating new API requests
  • Complex analyses can be performed incrementally without re-uploading files
  • Multi-session analysis becomes more economical, with some users reporting 40-60% cost reductions for extended code reviews

Customizing System Prompts

The gemini_ask and gemini_search tools are highly versatile and not limited to programming-related queries. You can customize the system prompt for various use cases:

  • Educational content: "You are an expert teacher who explains complex concepts in simple terms..."
  • Creative writing: "You are a creative writer specializing in vivid, engaging narratives..."
  • Technical documentation: "You are a technical writer creating clear, structured documentation..."
  • Data analysis: "You are a data scientist analyzing patterns and trends in information..."

When using these tools from an LLM console, always encourage the LLM to set appropriate system prompts and parameters for the specific use case. The flexibility of system prompts allows these tools to be effective for virtually any type of query.
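
For instance, a non-programming request with an educational system prompt might be issued as the following sketch (the query text is illustrative):

{
    "name": "gemini_ask",
    "arguments": {
        "query": "Explain how public-key cryptography works to a beginner",
        "systemPrompt": "You are an expert teacher who explains complex concepts in simple terms...",
        "model": "gemini-1.5-pro"
    }
}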

Detailed Documentation

Available Tools

The server provides three primary tools:

1. gemini_ask

For code analysis, general queries, and creative tasks with optional file context.

{
    "name": "gemini_ask",
    "arguments": {
        "query": "Review this Go code for concurrency issues...",
        "model": "gemini-2.0-flash-001",
        "systemPrompt": "Optional custom instructions",
        "file_paths": ["main.go", "config.go"],
        "use_cache": true,
        "cache_ttl": "1h"
    }
}

Simple code analysis with file attachments:

{
    "name": "gemini_ask",
    "arguments": {
        "query": "Analyze this code and suggest improvements",
        "model": "gemini-2.5-pro-exp-03-25",
        "file_paths": ["models.go"]
    }
}

Combining file attachments with caching for repeated queries:

{
    "name": "gemini_ask",
    "arguments": {
        "query": "Explain the main data structures in these files and how they interact",
        "model": "gemini-2.0-flash-001",
        "file_paths": ["models.go", "structs.go"],
        "use_cache": true,
        "cache_ttl": "30m"
    }
}

2. gemini_search

Provides grounded answers using Google Search integration with enhanced model capabilities.

{
    "name": "gemini_search",
    "arguments": {
        "query": "What is the current population of Warsaw, Poland?",
        "systemPrompt": "Optional custom search instructions",
        "enable_thinking": true,
        "thinking_budget": 8192,
        "thinking_budget_level": "medium",
        "max_tokens": 4096,
        "model": "gemini-2.5-pro-exp-03-25",
        "start_time": "2024-01-01T00:00:00Z",
        "end_time": "2024-12-31T23:59:59Z"
    }
}

Returns structured responses with sources and optional thinking process:

{
    "answer": "Detailed answer text based on search results...",
    "thinking": "Optional detailed reasoning process when thinking mode is enabled",
    "sources": [
        {
            "title": "Source Title",
            "url": "https://example.com/source-page",
            "type": "web"
        }
    ],
    "search_queries": ["population Warsaw Poland 2025"]
}

3. gemini_models

Lists all available Gemini models with capabilities and caching support.

{
    "name": "gemini_models",
    "arguments": {}
}

Returns comprehensive model information including:

  • Complete list of available models (dynamically fetched at startup)
  • Model IDs and descriptions
  • Caching support status
  • Usage examples

Model Management

The server dynamically fetches available Gemini models from the Google API at startup, preserving pre-defined descriptions and filtering out non-relevant models like embedding and visual models. Models are organized by preference and capability:

Recommended Models for Specific Tasks
Model ID                       | Description                                          | Recommended For
gemini-2.5-pro-exp-03-25       | Advanced Pro model with superior thinking support    | Complex reasoning with thinking mode
gemini-2.0-flash-001           | Cacheable Flash model optimized for repeated tasks   | Programming tasks with caching
gemini-2.5-flash-preview-04-17 | Fast Flash model with excellent search capabilities  | Search queries and web browsing

Models are organized by preference first, then by version (newest to oldest) when displayed in the gemini_models tool output. Use the gemini_models tool for a complete, up-to-date list.

Caching System

The server offers sophisticated context caching:

  • Model Compatibility: Only models with version suffixes (e.g., -001) support caching
  • Cache Control: Set use_cache: true and specify cache_ttl (e.g., "10m", "2h")
  • File Association: Automatically stores files and associates them with the cache context
  • Performance Optimization: Local metadata caching for quick lookups

Example with caching:

{
    "name": "gemini_ask",
    "arguments": {
        "query": "Follow up on our previous discussion...",
        "model": "gemini-1.5-pro-001",
        "use_cache": true,
        "cache_ttl": "1h"
    }
}

File Handling

Robust file processing with:

  • Direct Path Integration: Simply specify local file paths in file_paths array
  • Automatic Validation: Size checking, MIME type detection, and content validation
  • Wide Format Support: Handles common code, text, and document formats
  • Metadata Caching: Stores file information for quick future reference
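
A minimal sketch of direct path integration (the file name is illustrative; the server handles validation and MIME detection automatically):

{
    "name": "gemini_ask",
    "arguments": {
        "query": "Summarize what this file does",
        "file_paths": ["config.go"]
    }
}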

Advanced Features

Thinking Mode

The server supports "thinking mode" for compatible models (primarily Gemini 2.5 Pro models):

  • Enhanced Reasoning: Shows the model's step-by-step reasoning process
  • Complex Problem Solving: Particularly useful for debugging, mathematical reasoning, and complex analysis
  • Model Compatibility: Automatically validates thinking capability based on requested model
  • Tool Support: Available in both gemini_ask and gemini_search tools
  • Configurable Budget: Control thinking depth with budget levels or explicit token counts

Example with thinking mode:

{
    "name": "gemini_ask",
    "arguments": {
        "query": "Analyze the algorithmic complexity of merge sort vs. quick sort",
        "model": "gemini-2.5-pro-exp-03-25",
        "enable_thinking": true,
        "thinking_budget_level": "high"
    }
}

Thinking Budget Control

Configure the depth and detail of the model's thinking process:

  • Predefined Budget Levels:

    • none: 0 tokens (thinking disabled)
    • low: 4096 tokens (default, quick analysis)
    • medium: 16384 tokens (detailed reasoning)
    • high: 24576 tokens (maximum depth for complex problems)
  • Custom Token Budget: Alternatively, set a specific token count with thinking_budget parameter (0-24576)

Examples:

// Using predefined level
{
  "name": "gemini_ask",
  "arguments": {
    "query": "Analyze this algorithm...",
    "model": "gemini-2.5-pro-exp-03-25",
    "enable_thinking": true,
    "thinking_budget_level": "medium"
  }
}

// Using explicit token count
{
  "name": "gemini_search",
  "arguments": {
    "query": "Research quantum computing developments...",
    "model": "gemini-2.5-pro-exp-03-25",
    "enable_thinking": true,
    "thinking_budget": 12000
  }
}

Context Window Size Management

The server intelligently manages token limits:

  • Custom Sizing: Set max_tokens parameter to control response length
  • Model-Aware Defaults: Automatically sets appropriate defaults based on model capabilities
  • Capacity Warnings: Provides warnings when requested tokens exceed model limits
  • Proportional Defaults: Uses percentage-based defaults (75% for general queries, 50% for search)

Example with context window size management:

{
    "name": "gemini_ask",
    "arguments": {
        "query": "Generate a detailed analysis of this code...",
        "model": "gemini-1.5-pro-001",
        "max_tokens": 8192
    }
}

Configuration Options

Essential Environment Variables
Variable                    | Description                         | Default
GEMINI_API_KEY              | Google Gemini API key               | Required
GEMINI_MODEL                | Default model ID for gemini_ask     | gemini-1.5-pro
GEMINI_SEARCH_MODEL         | Default model ID for gemini_search  | gemini-2.0-flash
GEMINI_SYSTEM_PROMPT        | System prompt for general queries   | Custom review prompt
GEMINI_SEARCH_SYSTEM_PROMPT | System prompt for search            | Custom search prompt
GEMINI_MAX_FILE_SIZE        | Max upload size (bytes)             | 10485760 (10MB)
GEMINI_ALLOWED_FILE_TYPES   | Comma-separated MIME types          | Common text/code types

Optimization Variables
Variable                     | Description                                           | Default
GEMINI_TIMEOUT               | API timeout in seconds                                | 90
GEMINI_MAX_RETRIES           | Max API retries                                       | 2
GEMINI_TEMPERATURE           | Model temperature (0.0-1.0)                           | 0.4
GEMINI_ENABLE_CACHING        | Enable context caching                                | true
GEMINI_DEFAULT_CACHE_TTL     | Default cache time-to-live                            | 1h
GEMINI_ENABLE_THINKING       | Enable thinking mode capability                       | true
GEMINI_THINKING_BUDGET_LEVEL | Default thinking budget level (none/low/medium/high)  | low
GEMINI_THINKING_BUDGET       | Explicit thinking token budget (0-24576)              | 4096
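
These can be exported before launching the binary, as in the startup example earlier; the values below are illustrative. Remember that for the Claude Desktop app these must go in the config JSON's env section instead:

# Tune timeouts, retries, temperature, and caching before starting the server
export GEMINI_API_KEY=your_api_key
export GEMINI_TIMEOUT=120
export GEMINI_MAX_RETRIES=3
export GEMINI_TEMPERATURE=0.2
export GEMINI_DEFAULT_CACHE_TTL=2h
./mcp-gemini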

Operational Features

  • Degraded Mode: Automatically enters safe mode on initialization errors
  • Retry Logic: Configurable exponential backoff for reliable API communication
  • Structured Logging: Comprehensive event logging with severity levels
  • File Validation: Secure handling with size and type restrictions

Development

Running Tests

go test -v

Running Linter

./run_lint.sh

Formatting Code

./run_format.sh

Recent Changes

  • Time Range Filtering: Added time range filtering to gemini_search tool with start_time and end_time parameters to filter search results by publication date
  • Improved Model Management: Enhanced model handling with preference-based organization, filtering of embedding/visual models, and preservation of custom descriptions
  • Model Task Preferences: Added model recommendations for specific tasks (thinking, caching, search)
  • Advanced Usage Examples: Added documentation for combining file attachments with caching for programming tasks
  • File Context Optimizations: Improved handling of file content with caching for more efficient follow-up queries
  • Model Display Organization: Reorganized model output to prioritize recommended models and newer versions
  • Thinking Budget Control: Added configurable thinking budget levels and explicit token control for fine-tuning reasoning depth
  • Model Selection for Search: Added support for custom model selection in the gemini_search tool
  • Enhanced Thinking Mode Support: Added thinking capability across compatible models, enabling more detailed reasoning processes
  • Conflict Management: Improved handling of caching and thinking mode interactions to prevent conflicts
  • Context Window Sizing: Better management of token limits with automatic adjustments for model capabilities
  • Advanced Model Selection: Enhanced dynamic model validation and selection based on requested capabilities
  • Improved Error Handling: Better error messages and logging for troubleshooting API interactions
  • Code Optimization: Removed unnecessary whitespace and improved formatting for better maintainability
  • Dynamic Model Fetching: Automatic retrieval of available Gemini models at startup
  • Enhanced Client Integration: Added configuration guides for MCP clients
  • Expanded Model Support: Updated compatibility with latest Gemini 2.5 Pro and 2.0 Flash models
  • Search Capabilities: Added Google Search integration with source attribution
  • Improved File Handling: Enhanced MIME detection and validation
  • Caching Enhancements: Better support for models with version suffixes

License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the project
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request