openai-ocr-mcp
OpenAI OCR MCP Server is a Model Context Protocol server designed to leverage OpenAI's vision capabilities for OCR tasks. It integrates with Cursor IDE for efficient text extraction from images, providing features like content-based file management and robust error handling.
OpenAI OCR MCP Server
A Model Context Protocol (MCP) server that provides Optical Character Recognition (OCR) functionality using OpenAI's vision capabilities. This server integrates with Cursor IDE for seamless text extraction from images.
Features
- Image Text Extraction using OpenAI's GPT-4.1-mini vision model.
- Automatic Text File Creation saves extracted text alongside the source image.
- Content-Based File Naming for organized file management.
- Support for multiple image formats including JPG, PNG, GIF, and WebP.
- Robust Error Handling and detailed logging for troubleshooting.
Technical Details
The server uses OpenAI's GPT-4.1-mini model optimized for text extraction, with high-detail image analysis, processing images through OpenAI's vision API. It supports automatic text file creation, content-based hash generation, and built-in file size validation.
Usage
In Cursor IDE, configure the MCP server and use the OCR tool through Cursor's command palette. Select an image file to process. The extracted text is displayed in Cursor and saved as a text file.
File Size Limits
Max file size: 5MB. Files over this limit get rejected with an error.
Error Handling
The server provides detailed error messages for invalid image formats, file size issues, file access problems, API key issues, and text extraction failures.