mcp-web-extractor

MCP Web Extractor is a Model Context Protocol server that extracts web content using Readability.js, ideal for saving clean, readable versions of articles to Obsidian notes.

MCP Web Extractor is a server that fetches a web page and extracts its main content using Readability.js. It is aimed at users who want to save clean, readable versions of articles, free of ads and other distractions, directly into their Obsidian notes. Given a URL, the server returns the main text content along with metadata such as the title and an excerpt. It integrates with Obsidian through the Model Context Protocol, which makes it useful for anyone who regularly saves web content for later reference or research.

Features

  • Extracts readable content from any URL
  • Removes ads, sidebars, and other distractions
  • Returns clean text along with metadata (title, excerpt, etc.)
  • Easy integration with Obsidian via MCP
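
As a rough illustration of how the returned text and metadata could end up in an Obsidian note, the sketch below formats a result as Markdown with YAML frontmatter. The `ExtractedArticle` shape, its field names, and the `toObsidianNote` and `yamlQuote` helpers are hypothetical and introduced only for this example; the server's actual response format may differ.

```typescript
// Hypothetical result shape: field names mirror the metadata listed above
// (title, excerpt) and are assumptions, not the server's documented schema.
interface ExtractedArticle {
  url: string;
  title: string;
  excerpt: string;
  textContent: string;
}

// Collapse whitespace and escape backslashes and double quotes so the value
// is safe inside a double-quoted YAML string.
function yamlQuote(value: string): string {
  const flat = value.replace(/\s+/g, " ").trim();
  return `"${flat.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
}

// Build a Markdown note: YAML frontmatter, a heading, then the article text.
function toObsidianNote(article: ExtractedArticle): string {
  const frontmatter = [
    "---",
    `title: ${yamlQuote(article.title)}`,
    `source: ${yamlQuote(article.url)}`,
    `excerpt: ${yamlQuote(article.excerpt)}`,
    "---",
  ].join("\n");
  return `${frontmatter}\n\n# ${article.title}\n\n${article.textContent.trim()}\n`;
}

// Example input standing in for a real extraction result.
const note = toObsidianNote({
  url: "https://example.com/post",
  title: 'A "quoted" title',
  excerpt: "Short summary.",
  textContent: "  Body text of the article.  ",
});
console.log(note);
```

Keeping the source URL in the frontmatter makes it possible to trace a saved note back to the original page later.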

Usage with Different Platforms

Standalone service

bash
ts-node-esm client-example.ts

Obsidian integration

typescript
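// obsidian-integration.ts example
import { MCPWebExtractor } from './mcp-web-extractor';

const extractor = new MCPWebExtractor();

// Extract the main content of the page and log the result.
extractor
  .extractContent('https://example.com')
  .then(content => {
    console.log(content);
  })
  .catch(error => {
    // Report failures here instead of leaving an unhandled promise rejection.
    console.error('Extraction failed:', error);
  });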