keywords:web-scraping

html-content-processor

A professional library for processing, cleaning, filtering, and converting HTML content to Markdown. Features advanced customization options, presets, plugin support, fluent API, and TypeScript integration for reliable content extraction.

kamjin3086

published version 1.0.5, 7 days ago, 0 dependents, licensed under MIT

218

crawlee-storage-extensions

Package for Apify/Crawlee that allows storing encrypted text values in the Storages

jetmar

published version 1.0.10, 10 months ago, 0 dependents, licensed under MIT

179

webscraping-ai-mcp

Model Context Protocol server for WebScraping.AI API. Provides LLM-powered web scraping tools with Chromium JavaScript rendering, rotating proxies, and HTML parsing.

webscraping-ai

published version 1.0.2, a month ago, 0 dependents, licensed under MIT

187

lightfeed

Lightfeed API Client for Node.js

lightfeed

published version 0.1.5, 7 days ago, 0 dependents, licensed under MIT

176

octoscrape

Advanced web scraping framework built on Puppeteer, designed to bypass rate limits with smart proxy rotation and browser fingerprinting protection

hemantdua

published version 1.0.1, a month ago, 0 dependents, licensed under ISC

146

scraperis-mcp

Model Context Protocol (MCP) integration for Scraper.is - A web scraping tool for AI assistants

tuanvt

published version 0.1.22, 3 months ago, 0 dependents, licensed under MIT

164

content-web-extractor

MCP server for extracting content from web pages

bmen25125

published version 1.0.2, a month ago, 0 dependents, licensed under MIT

134

mult-fetch-mcp-server

An MCP-based tool for fetching web page content that supports multiple modes and formats and can be integrated with AI assistants such as Claude

martinguo

published version 1.0.0, 3 months ago, 0 dependents, licensed under MIT

122

@mseep/firecrawl-mcp

MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, batch processing, structured data extraction, and LLM-powered content analysis.

skydeckai

published version 1.9.0, a month ago, 0 dependents, licensed under MIT

116

node-curl-impersonate

A wrapper around cURL-impersonate, a binary which can be used to bypass TLS fingerprinting.

wearr

published version 1.5.4, 7 months ago, 0 dependents, licensed under ISC

125

@mseep/brave-deep-research-mcp

DeepSearch MCP Server with Brave Search API and Puppeteer content extraction

skydeckai

published version 0.0.1, 23 days ago, 0 dependents, licensed under MIT

119

nemo-webminer

Nemo-webminer is a Node.js toolkit for scraping content from any website.

ananddey

published version 1.0.1, a month ago, 0 dependents, licensed under MIT

137

waterfall-fetch

Utility for web scraping that fetches the HTML from a URL or uses Puppeteer to interact with the page. getHtml tries various strategies in a 'waterfall' approach to get the content of the URL, depending on priorities such as stealth, speed, and freshness.

andytyler

published version 1.0.11, 4 months ago, 0 dependents, licensed under MIT

121

@watercrawl/mcp

A Model Context Protocol (MCP) server for WaterCrawl, enabling AI systems to perform web crawling and search operations

amir.asaran

published version 1.0.1, 24 days ago, 0 dependents, licensed under ISC

134

puremd-mcp

Model Context Protocol (MCP) server for pure.md, the markdown delivery network for LLMs

jasonbarry

published version 1.0.3, 2 months ago, 0 dependents, licensed under MIT

116

firecrawl-simple-mcp

Model Context Protocol (MCP) server for Firecrawl Simple - provides web scraping and crawling capabilities to LLMs

sacode

published version 1.0.2, 2 months ago, 0 dependents, licensed under MIT

109

@rpidanny/google-scholar

A minimal TypeScript library for fetching and parsing Google Scholar pages.

mabhishek

published version 3.3.0, 10 months ago, 1 dependent, licensed under MIT

107

@nrjdalal/google-parser

Google parser is a lightweight yet powerful Google Search result scraper/parser, based on an HTTP client, that sends browser-like requests out of the box. This is essential in the web scraping industry for blending in with regular website traffic.