mcp-gemini-server
The MCP Gemini Server is an MCP-compatible server that simplifies integration with Google's Gemini models for other large language models (LLMs) and MCP-compatible systems. It provides a consistent interface to Gemini's capabilities, including text and image generation, audio transcription, and URL processing. Designed as a robust backend component, it offers an extensive set of tools for content processing and interaction.
Overview
This project provides a dedicated MCP (Model Context Protocol) server that wraps the `@google/genai` SDK (v0.10.0). It exposes Google's Gemini model capabilities as standard MCP tools, allowing other LLMs (such as Claude) or MCP-compatible systems to use Gemini as a backend workhorse. The server aims to simplify integration with Gemini by providing a consistent, tool-based interface managed via the MCP standard, and it supports recent Gemini models including `gemini-1.5-pro-latest`, `gemini-1.5-flash-latest`, and `gemini-2.5-pro`.
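To illustrate the wrapping described above, here is a minimal, hypothetical sketch of how a Gemini text-generation capability could be registered as an MCP tool using the MCP TypeScript SDK and `@google/genai`. The tool name, environment variable, and exact SDK entry points are assumptions for illustration and may differ from this server's actual implementation.

```typescript
// Hypothetical sketch: exposing Gemini text generation as an MCP tool.
// Tool name and GOOGLE_GEMINI_API_KEY are illustrative assumptions.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { GoogleGenAI } from "@google/genai";
import { z } from "zod";

const ai = new GoogleGenAI({ apiKey: process.env.GOOGLE_GEMINI_API_KEY });
const server = new McpServer({ name: "mcp-gemini-server-sketch", version: "0.0.1" });

// Register a minimal text-generation tool that forwards prompts to Gemini.
server.tool(
  "gemini_generate_text", // hypothetical tool name
  {
    prompt: z.string(),
    model: z.string().default("gemini-1.5-flash-latest"),
  },
  async ({ prompt, model }) => {
    const response = await ai.models.generateContent({ model, contents: prompt });
    return { content: [{ type: "text", text: response.text ?? "" }] };
  }
);

// Serve over stdio so MCP clients (e.g. Claude Desktop) can spawn the server.
await server.connect(new StdioServerTransport());
```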
Features
- Core Generation: Standard and streaming text generation with support for system instructions and cached content (see the example after this list).
- Function Calling: Enables Gemini models to request client-defined functions.
- Stateful Chat: Manages conversational context across multiple turns.
- File Handling: Upload, list, retrieve, and delete files.
- Caching: Manage cached content to optimize prompts.
- Image Generation: Generate images from text prompts.
- Object Detection: Detect objects in images and return bounding box coordinates.
- Visual Content Understanding: Extract information from visual content.
- Audio Transcription: Transcribe audio files with optional timestamps and multilingual support.
- URL Context Processing: Fetch and analyze web content directly from URLs.
- MCP Client: Connect to and interact with external MCP servers.
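Each of these features is surfaced as one or more MCP tools. As a rough sketch of how a client might discover and invoke them, the following assumes the MCP TypeScript client SDK, a locally built server entry point, and an illustrative tool name; consult the server's actual tool listing for the real names and argument schemas.

```typescript
// Hypothetical sketch: calling a text-generation tool from an MCP client.
// The tool name, argument shape, and server path are assumptions.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const client = new Client({ name: "gemini-demo-client", version: "0.0.1" });

// Spawn the server as a child process and connect over stdio.
await client.connect(
  new StdioClientTransport({ command: "node", args: ["dist/index.js"] })
);

// Discover the tools the server exposes (generation, chat, files, etc.).
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));

// Invoke a text-generation tool with a prompt.
const result = await client.callTool({
  name: "gemini_generate_text", // hypothetical tool name
  arguments: { prompt: "Summarize the MCP protocol in two sentences." },
});
console.log(result.content);
```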