gradio-transcript-mcp

The gradio-transcript-mcp project is a Gradio application that runs as an MCP server and transcribes audio and video from URLs to text using OpenAI's Whisper. MCP clients can call it to process multimedia inputs, and it converts media formats as needed before transcription.

gradio-transcript-mcp: A Gradio MCP Server for Audio/Video Transcription from URLs

Overview

gradio-transcript-mcp is a Gradio application configured to function as an MCP (Model Context Protocol) server. It transcribes audio and video from URLs into text. Using OpenAI's Whisper and ffmpeg (via yt-dlp), the server lets MCP clients (such as Cline) process multimedia inputs by downloading and converting content from a given URL. To handle varied inputs reliably, it converts media to WAV and selects the compute device (CPU or GPU) dynamically.
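The WAV conversion and device selection mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function names `build_wav_command` and `pick_device` are hypothetical, the 16 kHz mono settings are an assumption based on the input Whisper models typically expect, and PyTorch is treated as an optional dependency.

```python
def build_wav_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that converts `src` to a 16 kHz mono WAV file."""
    # -y overwrites dst if it exists; -ar sets the sample rate; -ac sets channels.
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst]


def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed: fall back to CPU
    return "cuda" if torch.cuda.is_available() else "cpu"
```

The command list can be passed to `subprocess.run(..., check=True)` once ffmpeg is installed; returning a list rather than a shell string avoids quoting problems with unusual file names.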

The repository contains the following main components:

  • app.py: The main Gradio application file that runs the MCP server.
  • transcription_tool.py: The core logic for handling file conversion and calling the transcription function.
  • transcription.py: Contains the implementation for Whisper transcription using the transformers library.
  • requirements.txt: Lists the necessary Python dependencies.
  • ffmpeg_setup.py: Script to ensure ffmpeg is available.
  • logging_config.py: Configuration for logging.
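To illustrate what the two helper modules at the end of this list might do, here is a small sketch that configures logging and checks that ffmpeg is reachable. The contents of ffmpeg_setup.py and logging_config.py are not shown in this document, so the function name `ensure_ffmpeg`, the logger name, and the log format are all assumptions.

```python
import logging
import shutil

# Hypothetical logging setup; the project's real format may differ.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger("transcript_mcp")  # hypothetical logger name


def ensure_ffmpeg() -> str:
    """Return the path to ffmpeg on PATH, or raise if it is missing."""
    path = shutil.which("ffmpeg")
    if path is None:
        raise RuntimeError("ffmpeg was not found on PATH; install it first")
    logger.info("Using ffmpeg at %s", path)
    return path
```

Failing early with a clear message is usually easier to debug than a download that succeeds and a conversion step that then fails.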

Usage

Running the Gradio App / MCP Server

To run the Gradio application, which also starts the MCP server, execute app.py with Python (for example, python app.py).

This will launch a local Gradio web interface and start the MCP server. The server will expose the transcribe_url function as an MCP tool.
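For readers unfamiliar with how a Gradio app doubles as an MCP server, the sketch below shows the general pattern. It assumes a Gradio release with MCP support (installed with the `gradio[mcp]` extra) and uses a placeholder body for `transcribe_url`; the real app.py may be structured differently.

```python
def transcribe_url(url: str) -> str:
    """Transcribe audio or video from a URL (placeholder body).

    Args:
        url: The URL of the audio or video file.

    Returns:
        The transcription text, or an error message.
    """
    # Placeholder: the real tool downloads, converts, and transcribes.
    return f"(transcription of {url} would appear here)"


def main() -> None:
    """Launch the web UI and expose transcribe_url as an MCP tool."""
    # Imported here so the module can be loaded without Gradio installed.
    import gradio as gr

    demo = gr.Interface(fn=transcribe_url, inputs="text", outputs="text")
    # mcp_server=True asks Gradio to serve the function as an MCP tool
    # alongside the web interface.
    demo.launch(mcp_server=True)
```

Calling `main()` starts the server. Gradio derives the tool's description and parameter documentation from the function's docstring and type hints, which is why the placeholder keeps both.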

Using as an MCP Server

When you run the application, it starts an MCP server accessible to MCP clients.

  • Exposed Tool: transcribe_url
    • Description: Transcribes audio or video from a given URL. Downloads the media from the URL, converts it to WAV format, and then uses the TranscriptTool to perform the transcription in English.
    • Input: url (string): The URL of the audio or video file.
    • Output: (string): The transcription of the audio/video in English, or an error message if download or transcription fails.
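The tool's contract described above (download, convert to WAV, transcribe, and return either text or an error message) can be sketched as a small pipeline. The three steps are passed in as callables so the example stays self-contained; the name `transcribe_from_url` and the error message wording are hypothetical and not taken from the project.

```python
from typing import Callable


def transcribe_from_url(
    url: str,
    download: Callable[[str], str],
    to_wav: Callable[[str], str],
    transcribe: Callable[[str], str],
) -> str:
    """Run download -> WAV conversion -> transcription and return text.

    Failures are reported as an error string rather than raised, matching
    the tool's documented output.
    """
    if not url.startswith(("http://", "https://")):
        return "Error: expected an http(s) URL"
    try:
        media_path = download(url)      # fetch the media to a local file
        wav_path = to_wav(media_path)   # convert it to WAV
        return transcribe(wav_path)     # run speech-to-text on the WAV
    except Exception as exc:  # report any step's failure as a message
        return f"Error: {exc}"


# Usage with stand-in steps; real ones would call yt-dlp, ffmpeg, and Whisper.
text = transcribe_from_url(
    "https://example.com/talk.mp4",
    download=lambda u: "talk.mp4",
    to_wav=lambda p: "talk.wav",
    transcribe=lambda p: "hello world",
)
print(text)  # hello world
```

Returning a string in both the success and failure cases keeps the tool's output type stable for MCP clients, at the cost of callers having to inspect the text to detect errors.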

Connecting to the Hosted Server on Hugging Face Spaces

This application is also hosted on Hugging Face Spaces, providing a publicly accessible MCP server.
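As a rough guide to connecting a client, the sketch below builds an `mcpServers` configuration entry of the kind URL-based MCP clients commonly accept. Several details are assumptions: the `/gradio_api/mcp/sse` path reflects Gradio's SSE-based MCP endpoint and should be checked against the Gradio version in use, the server name `transcript` is hypothetical, and the Space's URL is not given in this document, so the example uses a local address on Gradio's default port.

```python
import json


def mcp_client_config(base_url: str, name: str = "transcript") -> dict:
    """Build an `mcpServers` entry pointing at a Gradio MCP SSE endpoint."""
    # Assumed endpoint path; verify it against your Gradio version.
    endpoint = base_url.rstrip("/") + "/gradio_api/mcp/sse"
    return {"mcpServers": {name: {"url": endpoint}}}


# Local run on Gradio's default port; substitute the Space URL when hosted.
print(json.dumps(mcp_client_config("http://localhost:7860"), indent=2))
```

The printed JSON can be pasted into the client's MCP settings file, replacing the base URL with the hosted Space's address where applicable.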