GeminiMCP
Gemini MCP Server integrates the Model Context Protocol (MCP) with Google's Gemini API, offering advanced capabilities such as dynamic model access, context caching, and seamless file integration. It is designed for reliable performance and ease of use across a variety of applications.
Gemini MCP Server
MCP (Model Context Protocol) server integrating with Google's Gemini API.
Key Advantages
- Single Self-Contained Binary: Written in Go, the project compiles to a single binary with no dependencies, eliminating package manager issues and preventing the server from changing unexpectedly without the user's knowledge
- Dynamic Model Access: Automatically fetches the latest available Gemini models at startup
- Advanced Context Handling: Efficient caching system with TTL control for repeated queries
- Enhanced File Handling: Seamless file integration with intelligent MIME detection
- Production Reliability: Robust error handling, automatic retries, and graceful degradation
- Comprehensive Capabilities: Full support for code analysis, general queries, and search with grounding
Installation and Configuration
Prerequisites
- Google Gemini API key
Building from Source
# Clone and build
git clone https://github.com/chew-z/GeminiMCP
cd GeminiMCP
go build -o mcp-gemini
# Start the server with environment variables
export GEMINI_API_KEY=your_api_key
export GEMINI_MODEL=gemini-1.5-pro
./mcp-gemini
Client Configuration
Add this server to any MCP-compatible client, such as Claude Desktop, by adding an entry like the following to your client's configuration:
{
"gemini": {
"command": "/Users/<user>/Path/to/bin/mcp-gemini",
"env": {
"GEMINI_API_KEY": "YOUR_API_KEY_HERE",
"GEMINI_MODEL": "gemini-2.5-pro-exp-03-25",
"GEMINI_SEARCH_MODEL": "gemini-2.5-flash-preview-04-17",
"GEMINI_SYSTEM_PROMPT": "You are a senior developer. Your job is to do a thorough code review of this code...",
"GEMINI_SEARCH_SYSTEM_PROMPT": "You are a search assistant. Your job is to find the most relevant information about this topic..."
}
}
}
Important Notes:
- Environment Variables: For the Claude Desktop app, all configuration variables must be included in the MCP configuration JSON shown above (in the env section), not as system environment variables or in .env files. Variables set outside the config JSON will not take effect for the client application.
- Claude Desktop Config Location:
  - On macOS: ~/Library/Application\ Support/Claude/claude_desktop_config.json
  - On Windows: %APPDATA%\Claude\claude_desktop_config.json
- Configuration Help: If you encounter any issues configuring the Claude Desktop app, refer to the MCP Quickstart Guide for additional assistance.
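For reference, a complete claude_desktop_config.json nests the server entry shown above under the client's mcpServers key. A minimal sketch (the binary path and API key are placeholders):
{
  "mcpServers": {
    "gemini": {
      "command": "/Users/<user>/Path/to/bin/mcp-gemini",
      "env": {
        "GEMINI_API_KEY": "YOUR_API_KEY_HERE",
        "GEMINI_MODEL": "gemini-2.5-pro-exp-03-25"
      }
    }
  }
}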
Using This MCP Server from the Claude Desktop App
You can use Gemini tools directly from an LLM console by creating prompt examples that invoke the tools. Here are some example prompts for different use cases:
Listing Available Models
Say to your LLM:
Please use the gemini_models tool to show me the list of available Gemini models.
The LLM will invoke the gemini_models tool and return the list of available models, organized by preference and capability. The output prioritizes recommended models for specific tasks, then organizes the remaining models by version (newest to oldest).
Code Analysis with gemini_ask
Say to your LLM:
Use the gemini_ask tool to analyze this Go code for potential concurrency issues:

func processItems(items []string) []string {
    var wg sync.WaitGroup
    results := make([]string, len(items))
    for i, item := range items {
        wg.Add(1)
        go func(i int, item string) {
            results[i] = processItem(item)
            wg.Done()
        }(i, item)
    }
    wg.Wait()
    return results
}
Please use a system prompt that focuses on code review and performance optimization.
Creative Writing with gemini_ask
Say to your LLM:
Use the gemini_ask tool to create a short story about a space explorer discovering a new planet. Set a custom system prompt that encourages creative, descriptive writing with vivid imagery.
Factual Research with gemini_search
Say to your LLM:
Use the gemini_search tool to find the latest information about advancements in fusion energy research from the past year. Set the start_time to one year ago and end_time to today. Include sources in your response.
Complex Reasoning with Thinking Mode
Say to your LLM:
Use the gemini_ask tool with a thinking-capable model to solve this algorithmic problem: "Given an array of integers, find the longest consecutive sequence of integers. For example, given [100, 4, 200, 1, 3, 2], the longest consecutive sequence is [1, 2, 3, 4], so return 4."
Enable thinking mode with a high budget level so I can see the detailed step-by-step reasoning process.
This will show both the final answer and the model's comprehensive reasoning process with maximum detail.
Simple Project Analysis with Caching
Say to your LLM:
Please use a caching-enabled Gemini model to analyze our project files. Include the main.go, config.go and models.go files and ask Gemini a series of questions about our project architecture and how it could be improved. Use appropriate system prompts for each question.
With this simple prompt, the LLM will:
- Select a caching-compatible model (with -001 suffix)
- Include the specified project files
- Enable caching automatically
- Ask multiple questions while maintaining context
- Customize system prompts for each question type
This approach makes it easy to have an extended conversation about your codebase without complex configuration.
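Behind the scenes, the LLM's questions map onto gemini_ask calls like the following sketch (the query and system prompt wording are illustrative; the arguments are the documented ones):
// First question against the cached project files
{
  "name": "gemini_ask",
  "arguments": {
    "query": "Describe the overall architecture of this project",
    "model": "gemini-2.0-flash-001",
    "systemPrompt": "You are a software architect reviewing project structure...",
    "file_paths": ["main.go", "config.go", "models.go"],
    "use_cache": true,
    "cache_ttl": "1h"
  }
}
Follow-up questions then benefit from the stored context rather than re-uploading the files (see Combined File Attachments with Caching below).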
Combined File Attachments with Caching
For programming tasks, you can directly use the file attachments feature with caching to create a more efficient workflow:
Use gemini_ask with model gemini-2.0-flash-001 to analyze these Go files. Please add both structs.go and models.go to the context, enable caching with a 30-minute TTL, and ask about how the model management system works in this application.
The server has special optimizations for this use case, particularly useful when:
- Working with complex codebases requiring multiple files for context
- Planning to ask follow-up questions about the same code
- Debugging issues that require file context
- Code review scenarios discussing implementation details
When combining file attachments with caching, files are analyzed once and stored in the cache, making subsequent queries much faster and more cost-effective.
Managing Multiple Caches and Reducing Costs
During a conversation, you can create and use multiple caches for different sets of files or contexts:
Please create a new cache for our frontend code (App.js, components/*.js) and analyze it separately from the backend code cache we created earlier.
The LLM can intelligently manage these different caches, switching between them as needed based on your queries. This capability is particularly valuable for projects with distinct components that require different analysis approaches.
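Under the hood this corresponds to separate cached gemini_ask calls, one per file set. A sketch using the documented arguments (file names and queries are illustrative):
// Frontend cache, kept separate from the earlier backend cache
{
  "name": "gemini_ask",
  "arguments": {
    "query": "Analyze the component structure of our frontend code",
    "model": "gemini-2.0-flash-001",
    "file_paths": ["App.js", "components/Header.js", "components/Footer.js"],
    "use_cache": true,
    "cache_ttl": "1h"
  }
}
Because the server associates files with their cache context (see Caching System below), backend questions continue to hit the earlier cache while frontend questions use this one.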
Cost Savings: Using caching significantly reduces API costs, especially when working with large codebases or having extended conversations. By caching the context:
- Files are processed and tokenized only once instead of with every query
- Follow-up questions reuse the existing context instead of creating new API requests
- Complex analyses can be performed incrementally without re-uploading files
- Multi-session analysis becomes more economical, with some users reporting 40-60% cost reductions for extended code reviews
Customizing System Prompts
The gemini_ask and gemini_search tools are highly versatile and not limited to programming-related queries. You can customize the system prompt for various use cases:
- Educational content: "You are an expert teacher who explains complex concepts in simple terms..."
- Creative writing: "You are a creative writer specializing in vivid, engaging narratives..."
- Technical documentation: "You are a technical writer creating clear, structured documentation..."
- Data analysis: "You are a data scientist analyzing patterns and trends in information..."
When using these tools from an LLM console, always encourage the LLM to set appropriate system prompts and parameters for the specific use case. The flexibility of system prompts allows these tools to be effective for virtually any type of query.
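For example, a non-programming request might pair gemini_ask with the educational prompt above (the query and model choice here are illustrative):
{
  "name": "gemini_ask",
  "arguments": {
    "query": "Explain how public-key cryptography works to a high-school student",
    "systemPrompt": "You are an expert teacher who explains complex concepts in simple terms...",
    "model": "gemini-2.5-pro-exp-03-25"
  }
}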
Detailed Documentation
Available Tools
The server provides three primary tools:
1. gemini_ask
For code analysis, general queries, and creative tasks with optional file context.
{
"name": "gemini_ask",
"arguments": {
"query": "Review this Go code for concurrency issues...",
"model": "gemini-2.0-flash-001",
"systemPrompt": "Optional custom instructions",
"file_paths": ["main.go", "config.go"],
"use_cache": true,
"cache_ttl": "1h"
}
}
Simple code analysis with file attachments:
{
"name": "gemini_ask",
"arguments": {
"query": "Analyze this code and suggest improvements",
"model": "gemini-2.5-pro-exp-03-25",
"file_paths": ["models.go"]
}
}
Combining file attachments with caching for repeated queries:
{
"name": "gemini_ask",
"arguments": {
"query": "Explain the main data structures in these files and how they interact",
"model": "gemini-2.0-flash-001",
"file_paths": ["models.go", "structs.go"],
"use_cache": true,
"cache_ttl": "30m"
}
}
2. gemini_search
Provides grounded answers using Google Search integration with enhanced model capabilities.
{
"name": "gemini_search",
"arguments": {
"query": "What is the current population of Warsaw, Poland?",
"systemPrompt": "Optional custom search instructions",
"enable_thinking": true,
"thinking_budget": 8192,
"thinking_budget_level": "medium",
"max_tokens": 4096,
"model": "gemini-2.5-pro-exp-03-25",
"start_time": "2024-01-01T00:00:00Z",
"end_time": "2024-12-31T23:59:59Z"
}
}
Returns structured responses with sources and optional thinking process:
{
"answer": "Detailed answer text based on search results...",
"thinking": "Optional detailed reasoning process when thinking mode is enabled",
"sources": [
{
"title": "Source Title",
"url": "https://example.com/source-page",
"type": "web"
}
],
"search_queries": ["population Warsaw Poland 2025"]
}
3. gemini_models
Lists all available Gemini models with capabilities and caching support.
{
"name": "gemini_models",
"arguments": {}
}
Returns comprehensive model information including:
- Complete list of available models (dynamically fetched at startup)
- Model IDs and descriptions
- Caching support status
- Usage examples
Model Management
The server dynamically fetches available Gemini models from the Google API at startup, preserving pre-defined descriptions and filtering out non-relevant models like embedding and visual models. Models are organized by preference and capability:
Recommended Models for Specific Tasks
Model ID | Description | Recommended For |
---|---|---|
gemini-2.5-pro-exp-03-25 | Advanced Pro model with superior thinking support | Complex reasoning with thinking mode |
gemini-2.0-flash-001 | Cacheable Flash model optimized for repeated tasks | Programming tasks with caching |
gemini-2.5-flash-preview-04-17 | Fast Flash model with excellent search capabilities | Search queries and web browsing |
Models are organized by preference first, then by version (newest to oldest) when displayed in the gemini_models tool output. Use the gemini_models tool for a complete, up-to-date list.
Caching System
The server offers sophisticated context caching:
- Model Compatibility: Only models with version suffixes (e.g., -001) support caching
- Cache Control: Set use_cache: true and specify cache_ttl (e.g., "10m", "2h")
- File Association: Automatically stores files and associates them with the cache context
- Performance Optimization: Local metadata caching for quick lookups
Example with caching:
{
"name": "gemini_ask",
"arguments": {
"query": "Follow up on our previous discussion...",
"model": "gemini-1.5-pro-001",
"use_cache": true,
"cache_ttl": "1h"
}
}
File Handling
Robust file processing with:
- Direct Path Integration: Simply specify local file paths in the file_paths array
- Automatic Validation: Size checking, MIME type detection, and content validation
- Wide Format Support: Handles common code, text, and document formats
- Metadata Caching: Stores file information for quick future reference
Advanced Features
Thinking Mode
The server supports "thinking mode" for compatible models (primarily Gemini 2.5 Pro models):
- Enhanced Reasoning: Shows the model's step-by-step reasoning process
- Complex Problem Solving: Particularly useful for debugging, mathematical reasoning, and complex analysis
- Model Compatibility: Automatically validates thinking capability based on requested model
- Tool Support: Available in both gemini_ask and gemini_search tools
- Configurable Budget: Control thinking depth with budget levels or explicit token counts
Example with thinking mode:
{
"name": "gemini_ask",
"arguments": {
"query": "Analyze the algorithmic complexity of merge sort vs. quick sort",
"model": "gemini-2.5-pro-exp-03-25",
"enable_thinking": true,
"thinking_budget_level": "high"
}
}
Thinking Budget Control
Configure the depth and detail of the model's thinking process:
- Predefined Budget Levels:
  - none: 0 tokens (thinking disabled)
  - low: 4096 tokens (default, quick analysis)
  - medium: 16384 tokens (detailed reasoning)
  - high: 24576 tokens (maximum depth for complex problems)
- Custom Token Budget: Alternatively, set a specific token count with the thinking_budget parameter (0-24576)
Examples:
// Using predefined level
{
"name": "gemini_ask",
"arguments": {
"query": "Analyze this algorithm...",
"model": "gemini-2.5-pro-exp-03-25",
"enable_thinking": true,
"thinking_budget_level": "medium"
}
}
// Using explicit token count
{
"name": "gemini_search",
"arguments": {
"query": "Research quantum computing developments...",
"model": "gemini-2.5-pro-exp-03-25",
"enable_thinking": true,
"thinking_budget": 12000
}
}
Context Window Size Management
The server intelligently manages token limits:
- Custom Sizing: Set the max_tokens parameter to control response length
- Model-Aware Defaults: Automatically sets appropriate defaults based on model capabilities
- Capacity Warnings: Provides warnings when requested tokens exceed model limits
- Proportional Defaults: Uses percentage-based defaults (75% for general queries, 50% for search)
Example with context window size management:
{
"name": "gemini_ask",
"arguments": {
"query": "Generate a detailed analysis of this code...",
"model": "gemini-1.5-pro-001",
"max_tokens": 8192
}
}
Configuration Options
Essential Environment Variables
Variable | Description | Default |
---|---|---|
GEMINI_API_KEY | Google Gemini API key | Required |
GEMINI_MODEL | Default model ID for gemini_ask | gemini-1.5-pro |
GEMINI_SEARCH_MODEL | Default model ID for gemini_search | gemini-2.0-flash |
GEMINI_SYSTEM_PROMPT | System prompt for general queries | Custom review prompt |
GEMINI_SEARCH_SYSTEM_PROMPT | System prompt for search | Custom search prompt |
GEMINI_MAX_FILE_SIZE | Max upload size (bytes) | 10485760 (10MB) |
GEMINI_ALLOWED_FILE_TYPES | Comma-separated MIME types | [Common text/code types] |
Optimization Variables
Variable | Description | Default |
---|---|---|
GEMINI_TIMEOUT | API timeout in seconds | 90 |
GEMINI_MAX_RETRIES | Max API retries | 2 |
GEMINI_TEMPERATURE | Model temperature (0.0-1.0) | 0.4 |
GEMINI_ENABLE_CACHING | Enable context caching | true |
GEMINI_DEFAULT_CACHE_TTL | Default cache time-to-live | 1h |
GEMINI_ENABLE_THINKING | Enable thinking mode capability | true |
GEMINI_THINKING_BUDGET_LEVEL | Default thinking budget level (none/low/medium/high) | low |
GEMINI_THINKING_BUDGET | Explicit thinking token budget (0-24576) | 4096 |
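Since the Claude Desktop app only reads variables from the MCP configuration JSON (see Important Notes above), these optimization variables belong in the server entry's env block. A sketch with illustrative values:
{
  "gemini": {
    "command": "/Users/<user>/Path/to/bin/mcp-gemini",
    "env": {
      "GEMINI_API_KEY": "YOUR_API_KEY_HERE",
      "GEMINI_TIMEOUT": "120",
      "GEMINI_MAX_RETRIES": "3",
      "GEMINI_TEMPERATURE": "0.2",
      "GEMINI_DEFAULT_CACHE_TTL": "2h",
      "GEMINI_THINKING_BUDGET_LEVEL": "medium"
    }
  }
}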
Operational Features
- Degraded Mode: Automatically enters safe mode on initialization errors
- Retry Logic: Configurable exponential backoff for reliable API communication
- Structured Logging: Comprehensive event logging with severity levels
- File Validation: Secure handling with size and type restrictions
Development
Running Tests
go test -v
Running Linter
./run_lint.sh
Formatting Code
./run_format.sh
Recent Changes
- Time Range Filtering: Added time range filtering to the gemini_search tool with start_time and end_time parameters to filter search results by publication date
- Improved Model Management: Enhanced model handling with preference-based organization, filtering of embedding/visual models, and preservation of custom descriptions
- Model Task Preferences: Added model recommendations for specific tasks (thinking, caching, search)
- Advanced Usage Examples: Added documentation for combining file attachments with caching for programming tasks
- File Context Optimizations: Improved handling of file content with caching for more efficient follow-up queries
- Model Display Organization: Reorganized model output to prioritize recommended models and newer versions
- Thinking Budget Control: Added configurable thinking budget levels and explicit token control for fine-tuning reasoning depth
- Model Selection for Search: Added support for custom model selection in the gemini_search tool
- Enhanced Thinking Mode Support: Added thinking capability across compatible models, enabling more detailed reasoning processes
- Conflict Management: Improved handling of caching and thinking mode interactions to prevent conflicts
- Context Window Sizing: Better management of token limits with automatic adjustments for model capabilities
- Advanced Model Selection: Enhanced dynamic model validation and selection based on requested capabilities
- Improved Error Handling: Better error messages and logging for troubleshooting API interactions
- Code Optimization: Removed unnecessary whitespace and improved formatting for better maintainability
- Dynamic Model Fetching: Automatic retrieval of available Gemini models at startup
- Enhanced Client Integration: Added configuration guides for MCP clients
- Expanded Model Support: Updated compatibility with latest Gemini 2.5 Pro and 2.0 Flash models
- Search Capabilities: Added Google Search integration with source attribution
- Improved File Handling: Enhanced MIME detection and validation
- Caching Enhancements: Better support for models with version suffixes
License
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the project
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request