This package contains custom n8n nodes that integrate with the FireCrawl API for web scraping, crawling, extraction, and data analysis.
The FireCrawl Scraper node provides several powerful resources for web data extraction:
- Scrape: Scrape a single URL with various output formats and data extraction capabilities
- Batch Scrape: Process multiple URLs in a batch operation
- Crawler: Crawl an entire website starting from a specific URL
- Extract: Extract structured data from web pages using AI
- Map: Map URLs and extract structured data
- LLMs.txt: Generate LLMs.txt files from websites for LLM training and analysis
- Deep Research: AI-powered deep research on any topic
This package includes a Docker Compose configuration for easy setup with n8n:
- Clone this repository
- Create a `.env` file with your FireCrawl API key: `FIRECRAWL_API_KEY=your_api_key_here`
- Run Docker Compose: `docker-compose up -d`
- Access n8n at http://localhost:5678
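The steps above assume a compose file along these lines. This is a minimal sketch, not the file shipped in this repository; the service name, image tag, and volume layout are assumptions:

```yaml
# Hypothetical docker-compose.yml sketch; the repository's actual file may differ.
services:
  n8n:
    image: n8nio/n8n:latest
    ports:
      - "5678:5678"                               # matches http://localhost:5678 above
    environment:
      - FIRECRAWL_API_KEY=${FIRECRAWL_API_KEY}    # read from the .env file
    volumes:
      - n8n_data:/home/node/.n8n                  # persist workflows and credentials
volumes:
  n8n_data:
```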
Alternatively, install the package manually in your n8n instance:
npm install n8n-nodes-firecrawl-scraper-optimized
Before using the nodes, you need to set up the FireCrawl API credentials in n8n:
- Go to Settings > Credentials
- Click on New Credential
- Select FireCrawl API
- Enter your API key
- Save the credential
The Scrape resource allows you to extract content from a single URL.
Key Features:
- Multiple output formats (Markdown, HTML, Screenshots, etc.)
- Change tracking between scrapes
- Structured data extraction
- Page action support for dynamic content
- FIRE-1 agent integration for advanced capabilities
Example Configuration:
- Add the FireCrawl Scraper node
- Select Scrape as the resource
- Enter the URL to scrape
- Select output formats
- Enable additional options as needed
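Conceptually, this configuration maps onto a request to FireCrawl's scrape endpoint. The sketch below illustrates the shape of such a request body; the field names follow FireCrawl's public v1 API and are an assumption here, not taken from this package's source:

```javascript
// Hypothetical sketch of the request body a scrape operation builds.
// Field names (url, formats, onlyMainContent) follow FireCrawl's public
// v1 scrape API and are assumptions, not this node's verified internals.
function buildScrapePayload(url, formats, options = {}) {
  const payload = { url, formats };
  if (options.onlyMainContent !== undefined) {
    payload.onlyMainContent = options.onlyMainContent;
  }
  return payload;
}

// A scrape of example.com returning Markdown and HTML:
const body = buildScrapePayload("https://example.com", ["markdown", "html"]);
```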
Process multiple URLs in a batch operation.
Key Features:
- Batch processing of multiple URLs
- Synchronous or asynchronous operation modes
- Data extraction across multiple pages
Crawl an entire website starting from a specific URL.
Key Features:
- Configurable crawl depth and limits
- Path inclusion and exclusion patterns
- LLM extraction during crawling
- Change tracking
Extract specific structured data from web pages using AI.
Key Features:
- Schema-based extraction
- Simple prompt-based extraction
- Multiple URL processing
- V2 support with FIRE-1 agent
Perform AI-powered deep research on any topic.
Key Features:
- Multi-depth research capabilities
- Customizable system and analysis prompts
- JSON structured output
- Activity tracking
Generate LLMs.txt files from websites.
Key Features:
- Create LLMs.txt files for LLM training
- Configurable crawl limits
- Full text generation option
Map URLs and extract structured data.
Key Features:
- Site mapping capabilities
- Structure discovery
Scrape Options:

| Option | Description |
| --- | --- |
| URL | The URL to scrape |
| Output Formats | Formats to return (Markdown, HTML, Screenshots, etc.) |
| Track Changes | Track differences between scrapes |
| Include Extract | Extract structured data |
| Include Page Actions | Perform actions before scraping |
| Use FIRE-1 Agent | Enable advanced agent capabilities |
| Location Settings | Specify geographic location |
Crawler Options:

| Option | Description |
| --- | --- |
| URL | The starting URL to crawl |
| Limit | Maximum number of pages to crawl |
| Maximum Depth | How deep to crawl (1-10) |
| Include/Exclude Paths | Regular expressions for paths |
| Operation Mode | Synchronous or asynchronous |
| Enable LLM Extraction | Extract data during crawling |
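The include/exclude patterns can be pictured as regex filters over URL paths. The following is an illustrative sketch of that filtering logic, not the node's actual implementation:

```javascript
// Illustrative sketch: how include/exclude regex patterns narrow a crawl.
// Not this node's actual implementation.
function shouldCrawl(path, includes = [], excludes = []) {
  // With no include patterns, everything is included by default.
  const included =
    includes.length === 0 || includes.some((p) => new RegExp(p).test(path));
  // Any matching exclude pattern drops the path, even if it was included.
  const excluded = excludes.some((p) => new RegExp(p).test(path));
  return included && !excluded;
}

shouldCrawl("/blog/post-1", ["^/blog"], ["/draft"]); // crawled
shouldCrawl("/blog/draft-2", ["^/blog"], ["draft"]); // skipped
```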
Extract Options:

| Option | Description |
| --- | --- |
| URL(s) | URLs to extract data from |
| Version | API version (V1 or V2 with FIRE-1) |
| Extraction Method | Simple or schema-based |
| Operation Mode | Single, batch, or URL-less |
| Enable Web Search | Follow external links for context |
| Track Changes | Track differences between extractions |
[n8n Workflow]
1. FireCrawl Scraper (Scrape)
- URL: https://example.com
- Output Formats: Markdown, HTML
- Include Extract: Yes
- Extraction Method: Schema Based
- Schema: JSON schema defining product information
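The schema in the workflow above could be a minimal JSON Schema like the following; the property names are illustrative, not prescribed by the node:

```json
{
  "type": "object",
  "properties": {
    "name": { "type": "string", "description": "Product name" },
    "price": { "type": "number", "description": "Price in the page's currency" },
    "inStock": { "type": "boolean", "description": "Whether the product is available" }
  },
  "required": ["name", "price"]
}
```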
[n8n Workflow]
1. FireCrawl Scraper (Crawler)
- URL: https://example.com
- Limit: 100
- Maximum Depth: 3
- Enable LLM Extraction: Yes
2. Google Sheets
- Action: Append
- Sheet: Crawl Results
[n8n Workflow]
1. FireCrawl Scraper (Deep Research)
- Query: "Latest advancements in renewable energy"
- Maximum Depth: 7
- Wait for Completion: Yes
2. Text Formatter
- Format research results
3. Email
- Send formatted research
The nodes implement error handling with the option to continue workflow execution on failures. Each response includes a `success` field indicating whether the operation succeeded, along with detailed error messages when applicable.
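Downstream nodes can branch on that field. As a sketch (only the `success` field is documented above; the `data` and `error` field names are assumptions), an n8n Code node might split results like this:

```javascript
// Sketch of splitting FireCrawl responses on the `success` field.
// The `data` and `error` field names are assumptions for illustration.
function partitionResults(items) {
  const ok = items.filter((item) => item.success === true);
  const failed = items.filter((item) => item.success !== true);
  return { ok, failed };
}

const { ok, failed } = partitionResults([
  { success: true, data: { markdown: "# Example" } },
  { success: false, error: "Request timed out" },
]);
// `ok` holds successful scrapes; `failed` holds items to retry or report.
```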
To build the package:
npm run build
To run the tests:
npm run test
For support with the FireCrawl API, visit FireCrawl Documentation.
For issues with these custom nodes, please open an issue on GitHub.