Coffee MCP Server

By vijay-fs

A document extraction and processing API server built with FastAPI. It uses OCR to extract text and tables from documents, then generates text embeddings from the extracted content. Processing is asynchronous. Submitting a document creates a job, and the client polls that job for status and results, so large files can be processed without blocking the client.

Key Features

Asynchronous document processing
Page-by-page PDF processing
Text extraction using OCR
Table detection and extraction
Generation of text embeddings
MongoDB storage for job tracking
RESTful API for submitting documents, checking job status, and retrieving results
Background thread processing
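The asynchronous, page-by-page, background-thread model listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the server's code. An in-memory dictionary stands in for the MongoDB job collection, and `extract_page` is a caller-supplied placeholder for the OCR step. The names `submit_job` and `get_job` and the status values `processing`, `completed`, and `failed` are hypothetical.

```python
import threading
import time
import uuid
from typing import Callable, Dict, List

# In-memory stand-in for the MongoDB jobs collection (assumption: the real
# server keeps equivalent job documents in MongoDB).
JOBS: Dict[str, dict] = {}
JOBS_LOCK = threading.Lock()


def _update_job(job_id: str, **fields) -> None:
    """Apply a partial update to a job record, similar to a MongoDB $set."""
    with JOBS_LOCK:
        JOBS[job_id].update(fields)


def _process_pages(job_id: str, pages: List[bytes],
                   extract_page: Callable[[bytes], str]) -> None:
    """Worker body: extract one page at a time and record progress after each."""
    results: List[str] = []
    try:
        for index, page in enumerate(pages, start=1):
            results.append(extract_page(page))
            _update_job(job_id, pages_done=index)
        _update_job(job_id, status="completed", result=results)
    except Exception as exc:  # record the failure on the job instead of losing it
        _update_job(job_id, status="failed", error=str(exc))


def submit_job(pages: List[bytes], extract_page: Callable[[bytes], str]) -> str:
    """Create a job record, start a background thread, and return immediately."""
    job_id = uuid.uuid4().hex
    with JOBS_LOCK:
        JOBS[job_id] = {"status": "processing", "pages_done": 0,
                        "pages_total": len(pages)}
    worker = threading.Thread(target=_process_pages,
                              args=(job_id, pages, extract_page), daemon=True)
    worker.start()
    return job_id


def get_job(job_id: str) -> dict:
    """Return a snapshot copy of the job record, as a status endpoint might."""
    with JOBS_LOCK:
        return dict(JOBS[job_id])


if __name__ == "__main__":
    jid = submit_job([b"page one", b"page two"], lambda p: p.decode().upper())
    while get_job(jid)["status"] == "processing":
        time.sleep(0.01)
    print(get_job(jid))
```

In this sketch the job record is updated after every page. That is what would let a status endpoint report partial progress on a large PDF while the worker thread is still running.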

Setup Guide

Prerequisites

Python 3.8+
MongoDB
Tesseract OCR engine
Poppler
API keys for the external model providers the server calls (OpenAI, Anthropic)
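Before installing, a quick check like the following can confirm that the prerequisites are in place. This is a sketch, not part of the project. It assumes Tesseract and Poppler are detected through their `tesseract` and `pdftoppm` executables and that MongoDB runs locally as `mongod`; drop that entry if you connect to a remote MongoDB. The helper name `missing_prerequisites` is hypothetical.

```python
import shutil
import sys
from typing import Callable, List, Optional, Sequence, Tuple

# Assumed executables: `tesseract` (Tesseract OCR), `pdftoppm` (Poppler),
# `mongod` (a local MongoDB server; omit it if MongoDB runs elsewhere).
REQUIRED_TOOLS = ("tesseract", "pdftoppm", "mongod")


def missing_prerequisites(
    tools: Sequence[str] = REQUIRED_TOOLS,
    min_python: Tuple[int, int] = (3, 8),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return human-readable problems; an empty list means everything was found."""
    problems: List[str] = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d+ required" % min_python)
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    issues = missing_prerequisites()
    print("\n".join(issues) if issues else "All prerequisites found.")
```

The API keys are not checked here because they are configuration rather than installed software.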

Installation Steps

Clone the repository and change into the coffee_mcp_server directory.
Install the system dependencies (Tesseract OCR, Poppler) and the Python dependencies.
Configure environment variables in a .env file.
Start MongoDB and run the server with uvicorn.
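The server reads its configuration from a .env file. The variable names below (`MONGODB_URI`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`) are assumptions for illustration; check the repository for the names it actually expects. The sketch parses simple `KEY=VALUE` lines with only the standard library. Real environment variables override file values, which matches the default precedence of `python-dotenv`'s `load_dotenv`.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, e.g. "abc" or 'abc'.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_settings(env_text: str,
                  environ: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Merge .env values with the process environment (the environment wins)."""
    file_values = parse_env_file(env_text)

    def get(name: str, default: Optional[str] = None) -> Optional[str]:
        if name in environ:
            return environ[name]
        return file_values.get(name, default)

    # Hypothetical variable names; replace them with the ones the project uses.
    return {
        "mongodb_uri": get("MONGODB_URI", "mongodb://localhost:27017"),
        "openai_api_key": get("OPENAI_API_KEY"),
        "anthropic_api_key": get("ANTHROPIC_API_KEY"),
    }


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    # Report which settings are present without printing secret values.
    for key, value in load_settings(text).items():
        print(f"{key}: {'set' if value else 'missing'}")
```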

API Endpoints

Document Extraction API

POST /v1/extract_data: Submit a document for extraction.
GET /v1/extract_data_job: Check the status of a document extraction job.
GET /v1/extract_data_result: Retrieve the result of a completed document extraction job.
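A client typically submits a document, polls the job endpoint until the job finishes, and then fetches the result. The sketch below covers the polling and retrieval half using only the standard library. The `job_id` query parameter, the JSON response shape, the `completed`/`failed` status values, and the `http://localhost:8000` base URL are all assumptions. The submission request (POST /v1/extract_data) is omitted because its upload format is not documented here.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8000"  # assumption: uvicorn's default port


def get_json(path: str, job_id: str, base_url: str = BASE_URL) -> dict:
    """GET a job-scoped endpoint; passing `job_id` as a query parameter is hypothetical."""
    query = urllib.parse.urlencode({"job_id": job_id})
    with urllib.request.urlopen(f"{base_url}{path}?{query}") as resp:
        return json.load(resp)


def fetch_job_status(job_id: str) -> dict:
    return get_json("/v1/extract_data_job", job_id)


def wait_for_job(job_id: str,
                 fetch_status: Callable[[str], dict] = fetch_job_status,
                 interval: float = 2.0,
                 timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job reaches a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(job_id)
        # Hypothetical terminal status values; check the server's real responses.
        if status.get("status") in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
        sleep(interval)
```

Given a job ID from the submission response, a caller might run `wait_for_job(job_id)` and, if the returned status is `completed`, call `get_json("/v1/extract_data_result", job_id)`. Passing `fetch_status` and `sleep` as parameters keeps the polling logic testable without a running server.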

Architecture

Coffee MCP Server is composed of the FastAPI application, the API routes, a document processor, a text extractor, format handlers, and an embedding generator. It uses MongoDB to store job records and updates each job's status in real time as its document is processed.
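The component split described above (document processor, format handlers, text extractor, embedding generator) can be illustrated with a small dispatch pipeline. Everything here is hypothetical: the handler registry, `process_document`, and `toy_embedding`. `toy_embedding` is a deterministic hashed bag-of-words stand-in for a provider's embedding API, so the example runs offline. In the real server, a PDF handler would likely rasterize pages with Poppler and OCR them with Tesseract. That step is left out here.

```python
import hashlib
import math
import os
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical interface: a format handler turns raw bytes into per-page text.
FormatHandler = Callable[[bytes], List[str]]
FORMAT_HANDLERS: Dict[str, FormatHandler] = {}


def register_handler(extension: str) -> Callable[[FormatHandler], FormatHandler]:
    """Decorator that registers a handler for a file extension such as '.txt'."""
    def decorator(func: FormatHandler) -> FormatHandler:
        FORMAT_HANDLERS[extension.lower()] = func
        return func
    return decorator


@register_handler(".txt")
def handle_text(data: bytes) -> List[str]:
    # Plain text needs no OCR; form feeds are treated as page breaks.
    return data.decode("utf-8").split("\f")


def toy_embedding(text: str, dims: int = 8) -> List[float]:
    """Offline stand-in for an embedding call: hashed bag of words, L2-normalized."""
    vector = [0.0] * dims
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dims
        vector[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


@dataclass
class ProcessedDocument:
    pages: List[str]
    embeddings: List[List[float]] = field(default_factory=list)


def process_document(filename: str, data: bytes,
                     embed: Callable[[str], List[float]] = toy_embedding) -> ProcessedDocument:
    """Pick a format handler by extension, extract pages, and embed each page."""
    extension = os.path.splitext(filename)[1].lower()
    handler = FORMAT_HANDLERS.get(extension)
    if handler is None:
        raise ValueError(f"unsupported format: {filename!r}")
    pages = handler(data)
    return ProcessedDocument(pages=pages, embeddings=[embed(page) for page in pages])
```

Keying handlers by file extension means a new format can be supported by registering one function, without changing the processor itself.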