google-research-mcp

The Google Researcher MCP Server gives AI assistants web research tools: Google Search, page scraping, and content analysis with Gemini AI. This open-source project includes persistent caching and supports multiple transports for integration.

Google Researcher MCP Server

License: MIT

Empower AI assistants with web research capabilities through Google Search, content scraping, and Gemini AI analysis.

This server implements the Model Context Protocol (MCP), allowing AI clients to perform research tasks with persistent caching for improved performance and reduced API costs.

Quick Start

# Clone and install
git clone <repository-url>
cd <repository-directory>
npm install

# Configure environment variables (copy .env.example to .env and fill in)
cp .env.example .env
# (Edit .env with your API keys)

# Run in development mode (auto-reloads on changes)
npm run dev

# Or build and run for production
# npm run build
# npm start

Features

  • Research Tools

    • google_search: Find information via Google Search API
    • scrape_page: Extract content from websites and YouTube videos
    • analyze_with_gemini: Process text using Google's Gemini AI
    • research_topic: Combine search, scraping, and analysis in one operation
  • Performance & Reliability

    • Persistent caching system (memory + disk)
    • Session resumption for web clients
    • Multiple transport options (STDIO, HTTP+SSE)
    • Management API endpoints for monitoring and control
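
The research_topic tool above is described as combining search, scraping, and analysis in one operation. The following is a minimal sketch of how such a composite step could be wired, not the server's actual implementation. The `tools` object and its `search`, `scrape`, and `analyze` functions are hypothetical stand-ins injected by the caller.

```javascript
// Hypothetical composite workflow: search -> scrape -> analyze.
// `tools` is an injected object of async functions; none of these
// signatures are taken from the server's source.
async function researchTopic(query, tools, numResults = 3) {
  // 1. Search: expect an array of URLs, capped at numResults.
  const urls = (await tools.search(query, numResults)).slice(0, numResults);

  // 2. Scrape every hit in parallel; one failed page should not sink the run.
  const settled = await Promise.allSettled(urls.map((url) => tools.scrape(url)));
  const pages = settled
    .filter((r) => r.status === "fulfilled")
    .map((r) => r.value);

  if (pages.length === 0) {
    throw new Error(`No pages could be scraped for query: ${query}`);
  }

  // 3. Analyze the combined text in a single call.
  const combined = pages.join("\n\n---\n\n");
  return tools.analyze(`Research topic: ${query}\n\n${combined}`);
}
```

Injecting the three steps keeps the sketch independent of any API key and makes it easy to exercise with stubs.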

Why Use This?

  • Extend AI Capabilities: Give AI assistants access to real-time web information
  • Save Money: Reduce API calls through persistent caching
  • Improve Performance: Get faster responses for repeated queries
  • Flexible Integration: Works with any MCP-compatible client
  • Open Source: MIT licensed, free to use and modify

Installation

Requirements

  • Node.js and npm
  • A Google Custom Search API key and search engine ID
  • A Google Gemini API key

Setup

  1. Clone and install:

    git clone <repository-url>
    cd <repository-directory>
    npm install
    
  2. Configure environment:

    Copy the example environment file and fill in your API keys:

    cp .env.example .env
    # Now edit the .env file with your actual keys
    

    The server automatically loads variables from the .env file if it exists. See .env.example for details on required and optional variables.

  3. Run the server:

    • Development: For development with automatic reloading on file changes:
      npm run dev
      
    • Production: Build the project and run the compiled JavaScript:
      npm run build
      npm start
      
  4. Verify: The server should show:

    ✅ stdio transport ready
    🌐 SSE server listening on http://127.0.0.1:3000/mcp
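
If the server fails to start, a missing API key is a likely cause. The sketch below checks for the three variables named in the Roo Code configuration later in this README. It assumes all three are required; .env.example remains the authoritative list. `missingEnvVars` is a hypothetical helper, not part of the project.

```javascript
// Variable names are taken from the Roo Code example in this README.
// Whether each one is strictly required is an assumption; check .env.example.
const REQUIRED_ENV_VARS = [
  "GOOGLE_CUSTOM_SEARCH_API_KEY",
  "GOOGLE_CUSTOM_SEARCH_ID",
  "GOOGLE_GEMINI_API_KEY",
];

// Returns the names of variables that are unset or blank in `env`.
function missingEnvVars(env, required = REQUIRED_ENV_VARS) {
  return required.filter((name) => !env[name] || env[name].trim() === "");
}

// Usage: report anything missing before starting the server.
const missing = missingEnvVars(process.env);
if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(", ")}`);
}
```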
    

Usage

Available Tools

| Tool | Description | Parameters |
|------|-------------|------------|
| google_search | Search the web | query (string), num_results (number, default: 5) |
| scrape_page | Extract content from URLs | url (string) |
| analyze_with_gemini | Process text with AI | text (string), model (string, default: "gemini-2.0-flash-001") |
| research_topic | Combined research workflow | query (string), num_results (number, default: 3) |
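
As a rough illustration of the parameters listed above, the following sketch builds the `{ name, arguments }` object passed to `client.callTool()` and fills in the documented defaults on the client side. `buildToolCall` is a hypothetical helper, and treating the parameter without a default as required is an assumption based on the list, not on the server's schema.

```javascript
// Defaults copied from the tool list above; the helper itself is hypothetical.
const TOOL_DEFAULTS = {
  google_search: { num_results: 5 },
  scrape_page: {},
  analyze_with_gemini: { model: "gemini-2.0-flash-001" },
  research_topic: { num_results: 3 },
};

// The string parameter each tool lists without a default (assumed required).
const REQUIRED_PARAM = {
  google_search: "query",
  scrape_page: "url",
  analyze_with_gemini: "text",
  research_topic: "query",
};

// Builds the { name, arguments } object that client.callTool() accepts.
function buildToolCall(name, args) {
  if (!Object.hasOwn(TOOL_DEFAULTS, name)) {
    throw new Error(`Unknown tool: ${name}`);
  }
  const required = REQUIRED_PARAM[name];
  if (typeof args[required] !== "string" || args[required] === "") {
    throw new Error(`${name} requires a non-empty string "${required}"`);
  }
  // Caller-supplied values override the documented defaults.
  return { name, arguments: { ...TOOL_DEFAULTS[name], ...args } };
}
```

For example, `buildToolCall("google_search", { query: "MCP protocol" })` yields arguments with `num_results` set to 5.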

Management Endpoints

  • GET /mcp/cache-stats: View cache statistics
  • GET /mcp/event-store-stats: View event store statistics
  • POST /mcp/cache-invalidate: Clear cache entries (requires mcp:admin:cache:invalidate scope)
  • POST /mcp/cache-persist: Force cache persistence (requires mcp:admin:cache:persist scope)
  • GET /mcp/oauth-scopes: View OAuth scopes documentation (public)
  • GET /mcp/oauth-config: View server OAuth configuration (public)
  • GET /mcp/oauth-token-info: View details of the provided token (requires authentication)
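
The endpoints above can be called with any HTTP client. The sketch below prepares the URL and options for `fetch`, using the methods from the list. `managementRequest` and the base URL argument are hypothetical; the list only states which endpoints are public or need specific scopes, so the helper attaches a Bearer token whenever one is supplied.

```javascript
// HTTP methods copied from the endpoint list above.
const MANAGEMENT_ENDPOINTS = {
  "cache-stats": "GET",
  "event-store-stats": "GET",
  "cache-invalidate": "POST",
  "cache-persist": "POST",
  "oauth-scopes": "GET",
  "oauth-config": "GET",
  "oauth-token-info": "GET",
};

// Returns { url, init } suitable for fetch(url, init).
// `token` is optional; pass it for endpoints that require authentication.
function managementRequest(baseUrl, endpoint, token) {
  if (!Object.hasOwn(MANAGEMENT_ENDPOINTS, endpoint)) {
    throw new Error(`Unknown management endpoint: ${endpoint}`);
  }
  const headers = {};
  if (token) headers.Authorization = `Bearer ${token}`;
  return {
    url: new URL(`/mcp/${endpoint}`, baseUrl).toString(),
    init: { method: MANAGEMENT_ENDPOINTS[endpoint], headers },
  };
}

// Usage (requires a running server):
//   const { url, init } = managementRequest("http://127.0.0.1:3000", "cache-stats", token);
//   const stats = await (await fetch(url, init)).json();
```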

Security & OAuth Scopes

The server implements OAuth 2.1 authorization for secure access to its HTTP endpoints. OAuth scopes provide granular permission control:

Tool Execution Scopes
  • mcp:tool:google_search:execute: Permission to execute the Google Search tool
  • mcp:tool:scrape_page:execute: Permission to scrape web pages
  • mcp:tool:analyze_with_gemini:execute: Permission to use Gemini AI for analysis
  • mcp:tool:research_topic:execute: Permission to use the composite research tool
Administrative Scopes
  • mcp:admin:cache:read: Permission to view cache statistics
  • mcp:admin:cache:invalidate: Permission to clear cache entries
  • mcp:admin:cache:persist: Permission to force cache persistence
  • mcp:admin:event-store:read: Permission to view event store statistics
  • mcp:admin:config:read: Permission to view server configuration
  • mcp:admin:logs:read: Permission to access server logs

For detailed documentation on OAuth scopes, visit the /mcp/oauth-scopes endpoint when the server is running.
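
The scope names above follow a regular pattern, so a client can check whether its token covers an operation before calling it. This is a minimal sketch; it assumes the granted scopes are available as a space-delimited string, the format OAuth uses for the `scope` parameter. All three function names are hypothetical.

```javascript
// Scope name for executing a tool, following the pattern in the list above.
function toolScope(toolName) {
  return `mcp:tool:${toolName}:execute`;
}

// Assumes scopes arrive as a space-delimited string.
function parseScopes(scopeString) {
  return new Set((scopeString ?? "").split(/\s+/).filter(Boolean));
}

// True only if every required scope has been granted.
function hasScopes(granted, required) {
  return required.every((scope) => granted.has(scope));
}

// Usage: can this token run google_search and read cache statistics?
const granted = parseScopes("mcp:tool:google_search:execute mcp:admin:cache:read");
const allowed = hasScopes(granted, [toolScope("google_search"), "mcp:admin:cache:read"]);
```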

Architecture

The server uses a layered architecture with:

  1. Transport Layer: STDIO and HTTP+SSE communication
  2. MCP Core: Request handling and routing
  3. Tools Layer: Research capabilities implementation
  4. Support Systems: Caching and event store

For detailed information, see the project's architecture documentation.
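
To make the caching support system concrete, here is a minimal sketch of an in-memory cache with per-entry expiry and a snapshot hook that a persistence layer could write to disk. It is an illustration under stated assumptions, not the server's cache: the class name, the key format, and the TTL policy are all hypothetical.

```javascript
// Stable cache key: tool name plus arguments with sorted top-level keys,
// so { a: 1, b: 2 } and { b: 2, a: 1 } map to the same entry.
function cacheKey(tool, args) {
  const sorted = Object.keys(args).sort().map((k) => [k, args[k]]);
  return `${tool}:${JSON.stringify(sorted)}`;
}

class TtlCache {
  // `now` is injectable so expiry can be tested without waiting.
  constructor(now = () => Date.now()) {
    this.now = now;
    this.entries = new Map();
  }

  set(key, value, ttlMs) {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }

  // Returns the cached value, or undefined if absent or expired.
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  // Plain-object snapshot that a persistence layer could serialize.
  snapshot() {
    return Object.fromEntries(this.entries);
  }
}
```

A tool handler would look up `cacheKey(toolName, args)` first and call the external API only on a miss, which is where the savings in API calls come from.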

Client Integration

STDIO Client (Direct Process)

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Create client
const transport = new StdioClientTransport({
  command: "node",
  args: ["dist/server.js"]
});
// The SDK's client info expects both a name and a version
const client = new Client({ name: "test-client", version: "1.0.0" });
await client.connect(transport);

// Call a tool
const result = await client.callTool({
  name: "google_search",
  arguments: { query: "MCP protocol" }
});
console.log(result.content[0].text);

HTTP+SSE Client (Web)

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Create client
// NOTE: The client MUST obtain a valid OAuth 2.1 Bearer token from the
// configured external Authorization Server before making requests.
const transport = new StreamableHTTPClientTransport(
  new URL("http://localhost:3000/mcp"),
  {
    // requestInit is applied to every HTTP request the transport makes;
    // replace the placeholder with the token obtained above
    requestInit: {
      headers: { Authorization: "Bearer YOUR_ACCESS_TOKEN" }
    }
  }
);
// The SDK's client info expects both a name and a version
const client = new Client({ name: "test-client", version: "1.0.0" });
await client.connect(transport);

// Call a tool
const result = await client.callTool({
  name: "google_search",
  arguments: { query: "MCP protocol" }
});
console.log(result.content[0].text);
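
Both examples read `result.content[0].text`, which assumes the first content item is text. A slightly more defensive sketch is shown below. It relies on the MCP tool result shape (`content` as an array of typed items, plus an optional `isError` flag); `textFromResult` is a hypothetical helper.

```javascript
// Joins the text of every text-type content item in a tool result.
// Throws if the tool reported an error via the isError flag.
function textFromResult(result) {
  const text = (result.content ?? [])
    .filter((item) => item.type === "text")
    .map((item) => item.text)
    .join("\n");
  if (result.isError) {
    throw new Error(`Tool call failed: ${text || "no details provided"}`);
  }
  return text;
}

// Usage with the client examples above:
//   console.log(textFromResult(result));
```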

Using with Roo Code

Note: The following example uses the STDIO transport. Using Roo Code with the HTTP transport requires completing the OAuth 2.1 flow, which may need specific configuration in Roo Code or a proxy setup. That path has not yet been reviewed against the mandatory OAuth requirement for HTTP.

Roo Code (VS Code extension) can use this server via STDIO:

  1. Enable MCP Servers in Roo Code settings
  2. Create .roo/mcp.json in your project:
    {
      "mcpServers": {
        "google-researcher-mcp": {
          "command": "node",
          "args": ["~/Documents/Cline/MCP/google-researcher-mcp/dist/server.js"],
          "cwd": "~/Documents/Cline/MCP/google-researcher-mcp/dist/",
          "env": {
            "GOOGLE_CUSTOM_SEARCH_API_KEY": "${env:GOOGLE_CUSTOM_SEARCH_API_KEY}",
            "GOOGLE_CUSTOM_SEARCH_ID": "${env:GOOGLE_CUSTOM_SEARCH_ID}",
            "GOOGLE_GEMINI_API_KEY": "${env:GOOGLE_GEMINI_API_KEY}"
          },
          "alwaysAllow": [
            "google_search",
            "scrape_page",
            "analyze_with_gemini",
            "research_topic"
          ],
          "disabled": false
        }
      }
    }

  3. Start the server and use Roo Code to ask research questions

Tests

The project uses a focused testing approach that combines end-to-end validation with targeted unit/integration tests.

Test Scripts

| Script | Description |
|--------|-------------|
| npm test | Runs Jest tests for internal components |
| npm run test:e2e | Runs both STDIO and SSE end-to-end tests |
| npm run test:e2e:stdio | Runs only the STDIO end-to-end test |
| npm run test:e2e:sse | Runs only the SSE end-to-end test |
| npm run test:coverage | Generates detailed coverage reports |

Testing Approach

Our testing strategy has two main components:

  1. End-to-End Tests: Validate the server's overall functionality through its MCP interface:

    • e2e_stdio_mcp_client_test.mjs: Tests the server using STDIO transport
    • e2e_sse_mcp_client_test.mjs: Tests the server using HTTP+SSE transport
  2. Focused Component Tests: Jest tests for the stateful logic unique to this server:

    • Cache System: Unit and integration tests for the in-memory cache, persistence manager, and persistence strategies
    • Event Store: Unit and integration tests for the event store and event persistence manager

This approach provides comprehensive validation while keeping tests simple, focused, and fast.

Contributing

We welcome contributions! This project is open source under the MIT license.

  • Star this repo if you find it useful
  • Fork it to create your own version
  • Submit PRs for bug fixes or new features
  • Report issues if you find bugs or have suggestions

To contribute code:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Submit a pull request

License

This project is licensed under the MIT License. See the LICENSE file for details.