A Model Context Protocol (MCP) server for WaterCrawl, built with FastMCP. This package provides AI systems with web crawling, scraping, and search capabilities through a standardized interface.
Use WaterCrawl MCP directly without installation using npx:
```shell
npx @watercrawl/mcp --api-key YOUR_API_KEY
```
Configure Codeium or Windsurf to use this package without installing it:
```json
{
  "mcpServers": {
    "watercrawl": {
      "command": "npx",
      "args": [
        "@watercrawl/mcp",
        "--api-key",
        "YOUR_API_KEY",
        "--base-url",
        "https://app.watercrawl.dev"
      ]
    }
  }
}
```
Run WaterCrawl MCP in SSE mode:
```shell
npx @watercrawl/mcp sse --port 3000 --endpoint /sse --api-key YOUR_API_KEY
```
Then configure Claude Desktop to connect to your SSE server.
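If your Claude Desktop version cannot connect to SSE servers directly, a common workaround is to bridge through the community `mcp-remote` package in `claude_desktop_config.json`. This is a sketch under that assumption; `mcp-remote` is a separate tool, not part of this package:

```json
{
  "mcpServers": {
    "watercrawl": {
      "command": "npx",
      "args": ["mcp-remote", "http://localhost:3000/sse"]
    }
  }
}
```

Adjust the host, port, and endpoint path to match the `sse` options you started the server with.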
- `-b, --base-url <url>`: WaterCrawl API base URL (default: `https://app.watercrawl.dev`)
- `-k, --api-key <key>`: your WaterCrawl API key (required)
- `-h, --help`: display help information
- `-V, --version`: display version information

SSE mode additional options:

- `-p, --port <number>`: port for the SSE server (default: `3000`)
- `-e, --endpoint <path>`: SSE endpoint path (default: `/sse`)
```
wc-mcp/
├── src/                # Source code
│   ├── cli/            # Command-line interface
│   ├── config/         # Configuration management
│   ├── mcp/            # MCP implementation
│   ├── services/       # WaterCrawl API services
│   └── tools/          # MCP tools implementation
├── tests/              # Test suite
├── dist/               # Compiled JavaScript
├── tsconfig.json       # TypeScript configuration
├── package.json        # npm package configuration
└── README.md           # This file
```
- Clone the repository and install dependencies:

  ```shell
  git clone https://github.com/watercrawl/watercrawl-mcp
  cd watercrawl-mcp
  npm install
  ```

- Build the project:

  ```shell
  npm run build
  ```

- Link the package for local development:

  ```shell
  npm run dev:link
  ```
The project includes tests for both SSE and npx modes:
```shell
# Run all tests
npm test

# Run only SSE tests
npm run test:sse

# Run only npx tests
npm run test:npx
```
Tests require a valid WaterCrawl API key, set in the `.env` file or passed as an environment variable.
- Fork the repository
- Create a feature branch (`git checkout -b feature/your-feature`)
- Commit your changes (`git commit -m 'Add your feature'`)
- Push to the branch (`git push origin feature/your-feature`)
- Open a Pull Request
```shell
# Global installation
npm install -g @watercrawl/mcp

# Or as a local project dependency
npm install @watercrawl/mcp
```
Configure WaterCrawl MCP using environment variables or command-line parameters.
Create a `.env` file or set environment variables:

```shell
WATERCRAWL_BASE_URL=https://app.watercrawl.dev
WATERCRAWL_API_KEY=YOUR_API_KEY
SSE_PORT=3000      # Optional, for SSE mode
SSE_ENDPOINT=/sse  # Optional, for SSE mode
```
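Since both sources are supported, the usual convention is that command-line parameters take precedence over environment variables, which in turn fall back to the documented default base URL. A minimal TypeScript sketch of that precedence; `resolveConfig` and `CliOptions` are illustrative names, not this package's actual API:

```typescript
// Hypothetical config resolution: CLI flag > environment variable > default.
interface CliOptions {
  apiKey?: string;
  baseUrl?: string;
}

function resolveConfig(
  cli: CliOptions,
  env: Record<string, string | undefined>
): { apiKey?: string; baseUrl: string } {
  return {
    apiKey: cli.apiKey ?? env.WATERCRAWL_API_KEY,
    baseUrl: cli.baseUrl ?? env.WATERCRAWL_BASE_URL ?? "https://app.watercrawl.dev",
  };
}

// Example: env var supplies the key, the default base URL applies.
const cfg = resolveConfig({}, { WATERCRAWL_API_KEY: "YOUR_API_KEY" });
console.log(cfg.baseUrl); // "https://app.watercrawl.dev"
```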
The WaterCrawl MCP server provides the following tools:
Scrape content from a URL with customizable options.
```json
{
  "url": "https://example.com",
  "pageOptions": {
    "exclude_tags": ["script", "style"],
    "include_tags": ["p", "h1", "h2"],
    "wait_time": 1000,
    "only_main_content": true,
    "include_html": false,
    "include_links": true,
    "timeout": 15000,
    "accept_cookies_selector": ".cookies-accept-button",
    "locale": "en-US",
    "extra_headers": {
      "User-Agent": "Custom User Agent"
    },
    "actions": [
      { "type": "screenshot" },
      { "type": "pdf" }
    ]
  },
  "sync": true,
  "download": true
}
```
Search the web using WaterCrawl.
```json
{
  "query": "artificial intelligence latest developments",
  "searchOptions": {
    "language": "en",
    "country": "us",
    "time_range": "recent",
    "search_type": "web",
    "depth": "deep"
  },
  "resultLimit": 5,
  "sync": true,
  "download": true
}
```
Download a sitemap from a crawl request in different formats.
```json
{
  "crawlRequestId": "uuid-of-crawl-request",
  "format": "json" // or "graph" or "markdown"
}
```
Manage crawl requests: list, get details, stop, or download results.
```json
{
  "action": "list", // or "get", "stop", "download"
  "crawlRequestId": "uuid-of-crawl-request", // for get, stop, and download actions
  "page": 1,
  "pageSize": 10
}
```
Manage search requests: list, get details, or stop running searches.
```json
{
  "action": "list", // or "get", "stop"
  "searchRequestId": "uuid-of-search-request", // for get and stop actions
  "page": 1,
  "pageSize": 10,
  "download": true
}
```
Monitor a crawl or search request in real-time, with timeout control.
```json
{
  "type": "crawl", // or "search"
  "requestId": "uuid-of-request",
  "timeout": 30, // in seconds
  "download": true
}
```
ISC