Search Engine with RAG and MCP

A powerful search engine that combines LangChain, Model Context Protocol (MCP), Retrieval-Augmented Generation (RAG), and Ollama to create an agentic AI system capable of searching the web, retrieving information, and providing relevant answers.

Features

  • Web search capabilities using the Exa API
  • Web content retrieval using FireCrawl
  • RAG (Retrieval-Augmented Generation) for more relevant information extraction
  • MCP (Model Context Protocol) server for standardized tool invocation
  • Support for both local LLMs via Ollama and cloud-based LLMs via OpenAI
  • Flexible architecture supporting direct search, agent-based search, or server mode
  • Comprehensive error handling and graceful fallbacks
  • Python 3.13+ with type hints
  • Asynchronous processing for efficient web operations

Architecture

This project integrates several key components:

  1. Search Module: Uses the Exa API to search the web and FireCrawl to retrieve page content
  2. RAG Module: Chunks retrieved documents, embeds them, and stores them in a FAISS vector store (see the sketch below)
  3. MCP Server: Provides a standardized protocol for tool invocation
  4. Agent: LangChain-based agent that uses the search and RAG capabilities
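
For illustration, here is a minimal sketch of the RAG flow from step 2, assuming LangChain's text splitter, Ollama embeddings, and FAISS as named above; the actual rag.py may structure this differently:

# Minimal RAG sketch: chunk -> embed -> index -> retrieve (illustrative only)
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS

# Pretend these strings came back from the Exa/FireCrawl search step
pages = ["...content of page one...", "...content of page two..."]

# 1. Split the raw pages into overlapping chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.create_documents(pages)

# 2. Embed the chunks and index them in a FAISS vector store
store = FAISS.from_documents(chunks, OllamaEmbeddings(model="mistral:latest"))

# 3. Retrieve the chunks most relevant to the query
for doc in store.similarity_search("your search query", k=4):
    print(doc.page_content[:200])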

Project Structure

search-engine-with-rag-and-mcp/
├── LICENSE              # MIT License
├── README.md            # Project documentation
├── data/                # Data directories
├── docs/                # Documentation
│   └── env_template.md  # Environment variables documentation
├── logs/                # Log files directory (auto-created)
├── src/                 # Main package (source code)
│   ├── __init__.py
│   ├── core/            # Core functionality
│   │   ├── __init__.py
│   │   ├── main.py      # Main entry point
│   │   ├── search.py    # Web search module
│   │   ├── rag.py       # RAG implementation
│   │   ├── agent.py     # LangChain agent
│   │   └── mcp_server.py # MCP server implementation
│   └── utils/           # Utility modules
│       ├── __init__.py
│       ├── env.py       # Environment variable loading
│       └── logger.py    # Logging configuration
├── pyproject.toml       # Poetry configuration
├── requirements.txt     # Project dependencies
└── tests/               # Test directory

Getting Started

Prerequisites

  • Python 3.13+
  • Poetry (optional, for development)
  • API keys for Exa and FireCrawl
  • (Optional) Ollama installed locally
  • (Optional) OpenAI API key

Installation

  1. Clone the repository
git clone https://github.com/yourusername/search-engine-with-rag-and-mcp.git
cd search-engine-with-rag-and-mcp
  2. Install dependencies
# Using pip
pip install -r requirements.txt

# Or using poetry
poetry install
  3. Create a .env file (use docs/env_template.md as a reference)
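
For reference, a .env might look like the following; the Exa, FireCrawl, and OpenAI key names here are illustrative assumptions, and docs/env_template.md is authoritative:

# Example .env: key names other than the Ollama ones are assumptions
EXA_API_KEY=your-exa-api-key
FIRECRAWL_API_KEY=your-firecrawl-api-key
OPENAI_API_KEY=sk-...                     # optional, for cloud LLMs
OLLAMA_BASE_URL=http://localhost:11434    # optional, for local LLMs
OLLAMA_MODEL=mistral:latest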

Usage

The application has three main modes of operation:

1. Direct Search Mode (Default)
# Using pip
python -m src.core.main "your search query"

# Or using poetry
poetry run python -m src.core.main "your search query"
2. Agent Mode
python -m src.core.main --agent "your search query"
3. MCP Server Mode
python -m src.core.main --server

You can also specify a custom host and port:

python -m src.core.main --server --host 0.0.0.0 --port 8080
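
For context, a tool-serving skeleton like mcp_server.py might look roughly like the sketch below, using FastMCP from the official MCP Python SDK; the tool name and body are illustrative, not the project's actual code:

# Hedged sketch of an MCP server exposing search as a tool
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("search-engine", host="0.0.0.0", port=8080)

@mcp.tool()
async def web_search(query: str) -> str:
    """Search the web and return RAG-filtered snippets."""
    # The real implementation would call the search and RAG modules
    return f"results for {query!r}"

if __name__ == "__main__":
    mcp.run(transport="sse")  # serve over SSE so MCP clients can connect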

Using Ollama (Optional)

To use Ollama for local embeddings and LLM capabilities:

  1. Install Ollama: https://ollama.ai/
  2. Pull a model:
ollama pull mistral:latest
  3. Set the appropriate environment variables in your .env file:
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=mistral:latest
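
If the model is not responding, a quick way to confirm Ollama is reachable is its standard tags endpoint, which lists the installed models:

curl http://localhost:11434/api/tags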

Development

This project follows these best practices:

  • Code formatting: Black and isort for consistent code style
  • Type checking: mypy for static type checking
  • Linting: flake8 for code quality
  • Testing: pytest for unit and integration tests
  • Environment Management: python-dotenv for managing environment variables
  • Logging: Structured logging to both console and file
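
The corresponding commands are the standard ones for these tools (prefix each with poetry run if you installed via Poetry):

# Format, lint, type-check, and test
black src tests
isort src tests
flake8 src tests
mypy src
pytest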

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements