Fast-Whisper-MCP-Server
A high-performance speech recognition MCP server based on Faster Whisper, providing efficient audio transcription capabilities.
Features
- Integrated with Faster Whisper for efficient speech recognition
- Batch processing acceleration for improved transcription speed
- Automatic CUDA acceleration (if available)
- Support for multiple model sizes (tiny to large-v3)
- Output formats include VTT subtitles, SRT, and JSON
- Support for batch transcription of audio files in a folder
- Model instance caching to avoid repeated loading
- Dynamic batch size adjustment based on GPU memory
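The last two features can be sketched in a few lines. This is an illustrative outline only, assuming hypothetical helper names and memory thresholds, not the server's actual code:

```python
# Sketch of model-instance caching and memory-based batch sizing.
# get_model/pick_batch_size and the GB thresholds are illustrative assumptions.

_model_cache = {}

def get_model(model_size, device, loader):
    """Return a cached model instance, loading each (size, device) pair only once."""
    key = (model_size, device)
    if key not in _model_cache:
        _model_cache[key] = loader(model_size, device)
    return _model_cache[key]

def pick_batch_size(free_gpu_mem_gb):
    """Choose a batch size from the free GPU memory (thresholds are assumptions)."""
    if free_gpu_mem_gb >= 16:
        return 32
    if free_gpu_mem_gb >= 8:
        return 16
    if free_gpu_mem_gb >= 4:
        return 8
    return 4
```

Caching avoids reloading multi-gigabyte model weights on every request, and picking the batch size at call time lets the same server run on GPUs of different sizes.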
Usage
Starting the Server
On Windows, run the provided batch file; on other platforms, run the Python server script directly.
Configuring Claude Desktop
Edit the Claude Desktop configuration file so that it registers the Whisper MCP server.
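The entry follows Claude Desktop's standard `mcpServers` format; the server name and script path below are placeholders you would replace with your own:

```json
{
  "mcpServers": {
    "fast-whisper": {
      "command": "python",
      "args": ["/path/to/whisper_server.py"]
    }
  }
}
```

After saving the file, restart Claude Desktop so it picks up the new server.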
Available Tools
- get_model_info - Retrieve information about the available Whisper models
- transcribe - Transcribe a single audio file
- batch_transcribe - Batch transcribe audio files in a folder
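As a rough sketch of what batch_transcribe does, the tool can be thought of as walking a folder, filtering for audio files, and handing each one to the single-file transcription path. The extension set and function names here are assumptions, not the server's actual signatures:

```python
# Illustrative outline of folder-level batch transcription.
from pathlib import Path

AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".flac"}  # assumed supported extensions

def batch_transcribe(folder, transcribe_one):
    """Run transcribe_one on every audio file in folder; return {filename: result}."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in AUDIO_EXTS:
            results[path.name] = transcribe_one(str(path))
    return results
```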
Error Handling
The server handles common failure modes, including missing or unreadable audio files, model loading failures, and GPU memory exhaustion.
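One common pattern for this kind of handling is to validate inputs up front and convert exceptions into structured results rather than letting them propagate to the client. The field names and wrapper below are assumptions for illustration:

```python
# Hedged sketch: validate the audio file, then report any transcription
# failure (e.g. model loading or CUDA OOM) as a structured error result.
import os

def safe_transcribe(audio_path, transcribe):
    if not os.path.isfile(audio_path):
        return {"status": "error", "message": f"audio file not found: {audio_path}"}
    try:
        return {"status": "ok", "text": transcribe(audio_path)}
    except Exception as exc:  # model load failure, GPU OOM, decode error, ...
        return {"status": "error", "message": str(exc)}
```

Returning an error object instead of raising keeps a single bad file from aborting a whole batch run.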