mcp-server-whisper
MCP Server Whisper is an audio processing server that implements the Model Context Protocol (MCP). It lets MCP-compatible AI tools manage audio files, transcribe speech, and generate text-to-speech output through OpenAI's models.
What audio formats are supported?
Transcription supports flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm. Chat supports mp3 and wav.
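As a rough illustration of the format lists above, a client could check a file's extension before sending it to the server. The function name `supports` and the `purpose` argument are hypothetical and not part of the server's interface; only the format names come from the answer above.

```python
from pathlib import Path

# Format names taken from the FAQ answer; everything else here is illustrative.
TRANSCRIPTION_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
CHAT_FORMATS = {"mp3", "wav"}


def supports(path: str, purpose: str = "transcription") -> bool:
    """Return True if the file extension is in the list for the given purpose."""
    allowed = CHAT_FORMATS if purpose == "chat" else TRANSCRIPTION_FORMATS
    # Compare case-insensitively and without the leading dot.
    extension = Path(path).suffix.lower().lstrip(".")
    return extension in allowed
```

For example, `supports("interview.flac")` is true, while `supports("interview.flac", "chat")` is false because chat accepts only mp3 and wav.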
How does the server handle large files?
Files larger than 25MB are automatically compressed to meet API limits.
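The size check behind this behavior might look like the sketch below. It assumes the limit is 25 * 1024 * 1024 bytes; the server may define the threshold differently, and the function names are hypothetical. The compression step itself is not shown because the source does not describe how it is done.

```python
import os

# Assumption: the 25MB limit is interpreted as 25 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def needs_compression(size_bytes: int, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True when a file of this size is over the limit."""
    return size_bytes > limit


def file_needs_compression(path: str) -> bool:
    """Look up the size of the file on disk and apply the same check."""
    return needs_compression(os.path.getsize(path))
```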
Can I customize the text-to-speech output?
Yes. You can choose the voice, adjust the speed, and provide instructions that guide how the speech is generated.
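A minimal sketch of how these three options could be collected into one request follows. The keys mirror the parameters of OpenAI's speech endpoint (`model`, `voice`, `input`, `speed`, `instructions`); the helper name, the default model, the default voice, and the 0.25 to 4.0 speed range are assumptions here, and the server's own tool parameters may be named differently.

```python
from typing import Optional


def build_speech_request(
    text: str,
    voice: str = "alloy",              # assumed default voice
    speed: float = 1.0,
    instructions: Optional[str] = None,
    model: str = "gpt-4o-mini-tts",    # assumed default model
) -> dict:
    """Assemble text-to-speech options into a single dictionary."""
    # Assumed valid range for speed; reject values outside it early.
    if not 0.25 <= speed <= 4.0:
        raise ValueError(f"speed must be between 0.25 and 4.0, got {speed}")
    request = {"model": model, "voice": voice, "input": text, "speed": speed}
    # Only include instructions when the caller supplied some.
    if instructions:
        request["instructions"] = instructions
    return request
```

A call such as `build_speech_request("Welcome back.", voice="nova", speed=1.2, instructions="Speak calmly.")` returns a dictionary that could then be passed to whichever client performs the actual request.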
What models are used for transcription?
The server supports OpenAI's whisper-1, gpt-4o-transcribe, and gpt-4o-mini-transcribe models.
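To show where the model name is used, here is a hedged sketch of a direct transcription call. It assumes `client` is an instance of `OpenAI` from the official `openai` Python package and uses its `audio.transcriptions.create` method; the helper names are hypothetical, and the server's internal code may be organized differently.

```python
# Model names taken from the FAQ answer.
SUPPORTED_MODELS = ("whisper-1", "gpt-4o-transcribe", "gpt-4o-mini-transcribe")


def validate_model(model: str) -> str:
    """Return the model name unchanged, or raise if it is not in the list."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported transcription model: {model!r}")
    return model


def transcribe(client, path: str, model: str = "whisper-1") -> str:
    """Send one audio file for transcription and return the text.

    Assumption: `client` is an `openai.OpenAI` instance.
    """
    model = validate_model(model)
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(model=model, file=audio)
    return result.text
```

With the `openai` package installed and an API key configured, this could be called as `transcribe(OpenAI(), "meeting.mp3", model="gpt-4o-transcribe")`.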
Is there support for batch processing?
Yes, the server supports parallel batch processing for multiple audio files.
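The idea of parallel batch processing can be sketched with a thread pool. The source does not say which concurrency mechanism the server uses, so the thread pool, the `process_batch` name, and the `worker` callback are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def process_batch(
    paths: Iterable[str],
    worker: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run `worker` on every path concurrently and map each path to its result."""
    path_list = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with path_list.
        results = list(pool.map(worker, path_list))
    return dict(zip(path_list, results))
```

Here `worker` stands in for any per-file operation, such as a transcription call. In this sketch an exception raised for one file propagates to the caller; a production version would likely record failures per file instead.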