mcp-server-whisper
MCP Server Whisper is an audio processing server that implements the Model Context Protocol (MCP). It makes OpenAI's audio models available to MCP-compatible AI tools and supports audio file management, transcription, and text-to-speech.
Overview
MCP Server Whisper provides a standardized interface to OpenAI's transcription and speech services, so AI assistants like Claude can work with audio files directly. Key features include:
- Advanced file searching with metadata filtering and sorting
- Parallel batch processing for audio files
- Format conversion between audio types
- Automatic file compression
- Multi-model transcription and interactive audio chat
- Enhanced transcription with specialized prompts and timestamp support
- Text-to-speech generation with customizable settings
- High-performance caching for operations
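The file search and parallel batch features listed above can be illustrated with a minimal sketch. This is not the server's actual code: `AUDIO_EXTENSIONS`, `find_audio_files`, `process_batch`, and `fake_transcribe` are hypothetical names, the extension set is an assumption, and the worker is a stand-in for a real call to a transcription API.

```python
import asyncio
from pathlib import Path

# Assumed set of audio extensions; the real server's supported formats may differ.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def find_audio_files(root, min_size=0, max_size=None, sort_by="name", reverse=False):
    """Return audio files under root, filtered by size and sorted by a metadata field."""
    sort_keys = {
        "name": lambda item: item[0].name,
        "size": lambda item: item[1].st_size,
        "modified": lambda item: item[1].st_mtime,
    }
    matches = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in AUDIO_EXTENSIONS:
            continue
        info = path.stat()  # stat once, reuse for filtering and sorting
        if info.st_size < min_size:
            continue
        if max_size is not None and info.st_size > max_size:
            continue
        matches.append((path, info))
    matches.sort(key=sort_keys[sort_by], reverse=reverse)
    return [path for path, _ in matches]


async def process_batch(paths, worker, max_concurrency=4):
    """Run worker(path) for every path, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(path):
        async with semaphore:
            return await worker(path)

    # gather returns results in the same order as the input paths
    return await asyncio.gather(*(run_one(p) for p in paths))


async def fake_transcribe(path):
    """Hypothetical stand-in for a real transcription request."""
    await asyncio.sleep(0)
    return f"transcript of {path.name}"
```

A call such as `asyncio.run(process_batch(find_audio_files("recordings"), fake_transcribe))` would then process every matching file concurrently. The semaphore bounds concurrency, which is a common way to stay under an API's rate limits.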
Installation and Usage
To get started, clone the repository, set the required environment variables, and run the MCP server in development mode. Once running, the server can be used for audio file management, processing, transcription, and text-to-speech generation.
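Before starting the server, it can help to confirm that the environment is set up. The sketch below assumes the variable names `OPENAI_API_KEY` (the variable the OpenAI Python SDK reads by default) and `AUDIO_FILES_PATH`; the second name and the helper `missing_env_vars` are assumptions for illustration, so check the repository for the variables it actually requires.

```python
import os

# Assumed variable names; confirm against the repository's own documentation.
REQUIRED_VARS = ("OPENAI_API_KEY", "AUDIO_FILES_PATH")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Environment looks complete.")
```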
Configuration and Development
To use the server from Claude Desktop, register it in Claude Desktop's MCP server configuration. Development relies on modern Python tooling for testing and building.
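Claude Desktop reads MCP servers from an `mcpServers` object in its JSON configuration file, where each entry has a `command`, `args`, and optional `env`. The sketch below builds such an entry in Python. The server name `whisper`, the launch command `uvx mcp-server-whisper`, the environment variable names, and the placeholder values are all assumptions; substitute whatever the repository documents.

```python
import json


def build_claude_desktop_entry(name, command, args, env):
    """Build an mcpServers entry in the shape Claude Desktop expects."""
    return {
        "mcpServers": {
            name: {
                "command": command,
                "args": list(args),
                "env": dict(env),
            }
        }
    }


# All values below are hypothetical placeholders, not the project's documented settings.
config = build_claude_desktop_entry(
    name="whisper",
    command="uvx",
    args=["mcp-server-whisper"],
    env={
        "OPENAI_API_KEY": "your-api-key",
        "AUDIO_FILES_PATH": "/path/to/audio",
    },
)

print(json.dumps(config, indent=2))
```

The printed JSON can be merged into the existing Claude Desktop configuration file; Claude Desktop typically needs a restart to pick up new servers.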
License
MIT License.