claude-voice-mcp
Claude Voice MCP is an MCP server implementation that enhances Claude Desktop with voice conversation capabilities, focusing on text-to-speech (TTS) conversion. It supports real-time WebSocket communication and integrates with the CogentEcho.ai ecosystem.
⚠️ PRE-ALPHA WARNING ⚠️
This project is in a pre-alpha stage. Its content was drafted conceptually and has not been tested. Proceed with caution, as significant changes may occur before the first stable release.
MCP server implementation that enables voice conversations with Claude Desktop, initially focusing on Text-to-Speech (TTS) capabilities.
Project Overview
This project implements a Model Context Protocol (MCP) server that extends Claude Desktop with voice conversation capabilities. The initial focus is on Text-to-Speech functionality, converting Claude's text responses into spoken audio.
Features
- MCP server implementation compatible with Claude Desktop
- Text-to-Speech conversion for Claude's responses
- WebSocket-based real-time communication
- Multiple language and voice support
- Simple test client for verification
CogentEcho.ai Ecosystem Integration
This repository is part of the CogentEcho.ai ecosystem:
- Strategic Layer: Orchestrate-AI - Strategic orchestration and business logic
- Tactical Layer: Automated-Dev-Agents - Tactical task execution and agent management
- Foundation Layer: Multi-Tiered Memory Architecture - Memory services for persistence
- Tool Manager: MCP Manager - Manages Claude MCP servers, including this one
Development Roadmap
- Phase 1 (Current): Text-to-Speech Implementation
  - Basic MCP server setup
  - Text-to-Speech integration
  - Configuration options for voice selection
- Phase 2 (Future): Speech-to-Text Implementation
  - Audio capture and processing
  - Speech recognition integration
  - Full duplex conversation support
Getting Started
Prerequisites
- Node.js 18.x or higher
- Claude Desktop application
- Web browser for running the test client
Installation
# Clone the repository
git clone https://github.com/gregmulvihill/claude-voice-mcp.git
# Navigate to project directory
cd claude-voice-mcp
# Install dependencies
npm install
# Copy environment example and modify as needed
cp .env.example .env
# Start the server
npm start
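The contents of .env.example are not shown here, so the following is only a minimal sketch of how a Node.js server might read such settings. The variable names PORT, TTS_LANGUAGE, and TTS_VOICE and the fallback values "en-US" and "default" are assumptions for illustration; only the default port 3000 comes from this README.

```javascript
// Hypothetical variable names: PORT, TTS_LANGUAGE and TTS_VOICE are NOT taken
// from the project's .env.example; check that file for the real keys.
function loadConfig(env = process.env) {
  // 3000 is the default port mentioned in this README.
  const port = Number.parseInt(env.PORT ?? "3000", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT value: ${env.PORT}`);
  }
  return {
    port,
    language: env.TTS_LANGUAGE ?? "en-US", // assumed fallback
    voice: env.TTS_VOICE ?? "default", // assumed fallback
  };
}
```

Validating the port at startup makes a misconfigured .env fail immediately instead of surfacing later as a connection error.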
Testing the MCP Server
The repository includes a simple test client to verify the functionality of the MCP server:
- Start the MCP server using npm start
- Open the test-client/index.html file in a web browser
- Connect to the MCP server using the default WebSocket URL (ws://localhost:3000/api/v1/ws)
- Enter text and click "Generate Speech" to test the TTS functionality
Development
Branch Protection
The main branch is protected and requires pull requests with at least one approval before merging. This ensures code quality and proper review of all changes.
Development Workflow
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request for review
- Address any feedback
- Your changes will be merged after approval
Integrating with Claude Desktop
Claude Desktop supports connection to MCP servers for enhanced functionality. To integrate this voice MCP server with Claude Desktop:
Method 1: Using the Claude Desktop UI
- Open Claude Desktop
- Go to Settings > Extensions
- Click "Add MCP Server"
- Enter the server URL: http://localhost:3000/api/v1
- Click "Connect" and follow the authentication prompts if required
Method 2: Using the Command Line
If your Claude Desktop application supports command-line installation of MCP servers:
# Run the MCP server
npm start
# In a separate terminal, use the Claude Desktop CLI to add the MCP server
claude-desktop extensions add --url=http://localhost:3000/api/v1 --name="Claude Voice"
Method 3: Using npx (For Development)
For development and testing purposes, you can install the MCP server directly in Claude Desktop:
cd claude-voice-mcp
npm run build
# Install the server into Claude Desktop
npx @anthropic/claude-desktop-mcp install --path=./dist
Verification
After installation, verify the integration:
- In Claude Desktop, go to Settings > Extensions
- Confirm "Claude Voice" is listed and shows "Connected" status
- Start a conversation with Claude
- Click the voice icon that appears in the interface to activate voice output
Technical Architecture
The MCP server acts as an intermediary between Claude Desktop and voice processing services:
- Claude Desktop sends text responses to the MCP server
- The MCP server processes the text through a TTS engine
- Audio is streamed back to Claude Desktop for playback
The primary components are:
- MCP Protocol Implementation: Handles API endpoints and WebSocket communication
- TTS Service: Processes text into speech using Google's TTS API
- Session Management: Maintains connection state and client information
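The text-to-audio flow described above can be sketched as a small streaming pipeline. This is an illustration under stated assumptions, not the project's implementation: fakeSynthesize stands in for a real TTS engine so the example runs offline, and the chunk fields seq, last, and audio are hypothetical names.

```javascript
// Stand-in for a real TTS engine (this README names Google's TTS API).
// It returns the UTF-8 bytes of the text as fake "audio" so the sketch runs offline.
function fakeSynthesize(text) {
  return Buffer.from(text, "utf8");
}

// Split text into sentence-sized pieces so audio can be produced incrementally.
function splitIntoSentences(text) {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((sentence) => sentence.trim())
    .filter((sentence) => sentence.length > 0);
}

// Yield one chunk per sentence, tagged with a sequence number and a flag
// marking the final chunk. The chunk shape is an assumption.
function* streamSpeech(text, synthesize = fakeSynthesize) {
  const sentences = splitIntoSentences(text);
  for (let i = 0; i < sentences.length; i++) {
    yield {
      seq: i,
      last: i === sentences.length - 1,
      audio: synthesize(sentences[i]),
    };
  }
}

for (const chunk of streamSpeech("Hello there. How are you?")) {
  console.log(chunk.seq, chunk.last, chunk.audio.length);
}
```

Splitting by sentence lets playback of the first chunk begin before the rest of a long response has been synthesized.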
API Documentation
REST Endpoints
- GET /api/v1/info: Returns information about the MCP server
- GET /api/v1/health: Health check endpoint
- POST /api/v1/register: Registers a client with the MCP server
- GET /api/v1/tts/config: Returns TTS configuration options
- POST /api/v1/tts: Processes a TTS request
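As a rough illustration of calling the TTS endpoint, the sketch below builds a POST request for /api/v1/tts. The endpoint path comes from the list above, but the request body fields (text, language, voice) and their defaults are assumptions, since the request schema is not documented here.

```javascript
// The /tts path comes from the endpoint list; the JSON body fields
// (text, language, voice) are assumptions for illustration only.
function buildTtsRequest(baseUrl, text, options = {}) {
  if (typeof text !== "string" || text.trim() === "") {
    throw new Error("text must be a non-empty string");
  }
  const base = baseUrl.replace(/\/+$/, ""); // strip trailing slashes
  return {
    url: `${base}/tts`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        text,
        language: options.language ?? "en-US",
        voice: options.voice ?? "default",
      }),
    },
  };
}

// Usage with the fetch API built into Node.js 18+ (requires a running server):
//   const { url, init } = buildTtsRequest("http://localhost:3000/api/v1", "Hello");
//   const response = await fetch(url, init);
```

Keeping request construction separate from the network call makes the request shape easy to inspect and adjust once the real schema is known.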
WebSocket Messages
- Client to Server:
  - tts_request: Request to convert text to speech
  - tts_cancel: Cancel an in-progress TTS request
  - ping: Keepalive message
- Server to Client:
  - tts_response: Response containing audio data
  - tts_status: Status updates for TTS processing
  - error: Error messages
  - pong: Response to ping messages
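To make the message flow concrete, here is a minimal sketch of a per-connection handler that dispatches on message type. Only the type names come from the list above. The JSON envelope ({ type, id, payload }), the state values "queued", "cancelled", and "unknown", and the createSession helper are all hypothetical.

```javascript
// Assumed envelope: { type, id, payload }. Only the type names come from this README.
// `send` is any function that delivers a message object to the client.
function createSession(send) {
  const active = new Set(); // ids of in-progress TTS requests

  function handle(raw) {
    let msg;
    try {
      msg = JSON.parse(raw);
    } catch {
      return send({ type: "error", payload: { message: "invalid JSON" } });
    }
    if (msg === null || typeof msg !== "object") {
      return send({ type: "error", payload: { message: "message must be an object" } });
    }
    switch (msg.type) {
      case "ping":
        return send({ type: "pong" });
      case "tts_request":
        active.add(msg.id);
        return send({ type: "tts_status", id: msg.id, payload: { state: "queued" } });
      case "tts_cancel": {
        // Set.delete returns true only if the id was being tracked.
        const wasActive = active.delete(msg.id);
        const state = wasActive ? "cancelled" : "unknown";
        return send({ type: "tts_status", id: msg.id, payload: { state } });
      }
      default:
        return send({ type: "error", payload: { message: `unknown type: ${msg.type}` } });
    }
  }

  return { active, handle };
}
```

Passing send in as a parameter keeps the handler independent of any particular WebSocket library, so it can be exercised with a plain array in tests.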
Troubleshooting
If you encounter issues with the MCP server:
- Connection Issues:
  - Verify the server is running (npm start)
  - Check that the port (default 3000) is not blocked by a firewall
  - Ensure Claude Desktop has permission to connect to local servers
- TTS Issues:
  - Check server logs for specific error messages
  - Verify internet connectivity (required for Google TTS API)
  - Try with shorter text samples to isolate problems
- Integration Issues:
  - Restart both the MCP server and Claude Desktop
  - Check Claude Desktop logs for connection errors
  - Verify the server URL is correctly configured
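When checking that the server URL is configured correctly, it can help to derive the WebSocket URL from the REST base URL instead of typing both by hand. The sketch below assumes the WebSocket endpoint is always the REST base path plus /ws, which is inferred from the two default URLs in this README (http://localhost:3000/api/v1 and ws://localhost:3000/api/v1/ws) and is not a documented rule. The toWebSocketUrl helper is hypothetical.

```javascript
// Derive the WebSocket URL from the REST base URL.
// Assumption: the WebSocket endpoint is the REST base path plus "/ws".
function toWebSocketUrl(httpUrl) {
  const url = new URL(httpUrl); // throws TypeError on malformed input
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Expected an http(s) URL, got ${url.protocol}`);
  }
  const scheme = url.protocol === "https:" ? "wss:" : "ws:";
  const path = url.pathname.replace(/\/+$/, ""); // drop trailing slashes
  return `${scheme}//${url.host}${path}/ws`;
}

console.log(toWebSocketUrl("http://localhost:3000/api/v1"));
```

A malformed or non-HTTP URL fails immediately with a clear error, which helps separate configuration mistakes from genuine network problems.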
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.