n8n-nodes-firecrawl-scraper-optimized

FireCrawl Scraper Custom Nodes for n8n

This package contains custom n8n nodes that integrate with the FireCrawl API for web scraping, crawling, extraction, and data analysis.

Features

The FireCrawl Scraper node provides several powerful resources for web data extraction:

  • Scrape: Scrape a single URL with various output formats and data extraction capabilities
  • Batch Scrape: Process multiple URLs in a batch operation
  • Crawler: Crawl an entire website starting from a specific URL
  • Extract: Extract structured data from web pages using AI
  • Map: Map URLs and extract structured data
  • LLMs.txt: Generate LLMs.txt files from websites for LLM training and analysis
  • Deep Research: AI-powered deep research on any topic

Installation

Option 1: Using Docker Compose

This package includes a Docker Compose configuration for easy setup with n8n:

  1. Clone this repository
  2. Create a .env file with your FireCrawl API key:
    FIRECRAWL_API_KEY=your_api_key_here
  3. Run Docker Compose:
    docker-compose up -d
  4. Access n8n at http://localhost:5678

Option 2: Installing in an existing n8n instance

npm install n8n-nodes-firecrawl-scraper-optimized

Configuration

Before using the nodes, you need to set up the FireCrawl API credentials in n8n:

  1. Go to Settings > Credentials
  2. Click on New Credential
  3. Select FireCrawl API
  4. Enter your API key
  5. Save the credential
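
To verify the key works before building a workflow, you can call the FireCrawl API directly. A minimal sketch in TypeScript (Node 18+ with global fetch), assuming the public v1 scrape endpoint:

// Optional sanity check for a FireCrawl API key outside n8n.
// The v1 scrape endpoint used here is an assumption; adjust if your plan differs.
const apiKey = process.env.FIRECRAWL_API_KEY ?? '';

async function checkKey(): Promise<void> {
  const res = await fetch('https://api.firecrawl.dev/v1/scrape', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ url: 'https://example.com', formats: ['markdown'] }),
  });
  // A 401/403 status means the key was rejected; 200 means it is usable.
  console.log(res.ok ? 'API key accepted' : `Request failed with status ${res.status}`);
}

checkKey();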

Usage

Scrape Resource

The Scrape resource allows you to extract content from a single URL.

Key Features:

  • Multiple output formats (Markdown, HTML, Screenshots, etc.)
  • Change tracking between scrapes
  • Structured data extraction
  • Page action support for dynamic content
  • FIRE-1 agent integration for advanced capabilities

Example Configuration:

  1. Add the FireCrawl Scraper node
  2. Select Scrape as the resource
  3. Enter the URL to scrape
  4. Select output formats
  5. Enable additional options as needed
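
Behind the scenes, this configuration corresponds to a single FireCrawl scrape request. A sketch of the equivalent direct call, assuming the v1 endpoint and response shape (the node performs this call and parsing for you):

// Approximately what the Scrape operation does (v1 endpoint assumed).
async function scrapePage(url: string): Promise<void> {
  const res = await fetch('https://api.firecrawl.dev/v1/scrape', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.FIRECRAWL_API_KEY}`,
    },
    // formats mirrors the node's Output Formats option.
    body: JSON.stringify({ url, formats: ['markdown', 'html'] }),
  });
  const body = await res.json();
  if (body.success) {
    // Scraped content is returned per requested format, plus page metadata.
    console.log(body.data.markdown);
  }
}

scrapePage('https://example.com');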

Batch Scrape Resource

Process multiple URLs in a batch operation.

Key Features:

  • Batch processing of multiple URLs
  • Synchronous or asynchronous operation modes
  • Data extraction across multiple pages
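
The asynchronous mode maps to a job-based API: submit the URLs, receive a job id, and poll for completion. A sketch of that pattern, assuming FireCrawl's v1 batch endpoints (the synchronous mode simply performs this wait for you):

// Submit a batch scrape job, then poll until it completes (v1 endpoints assumed).
const headers = {
  'Content-Type': 'application/json',
  Authorization: `Bearer ${process.env.FIRECRAWL_API_KEY}`,
};

async function batchScrape(urls: string[]): Promise<unknown> {
  const job = await fetch('https://api.firecrawl.dev/v1/batch/scrape', {
    method: 'POST',
    headers,
    body: JSON.stringify({ urls, formats: ['markdown'] }),
  }).then((r) => r.json());

  for (;;) {
    const status = await fetch(
      `https://api.firecrawl.dev/v1/batch/scrape/${job.id}`,
      { headers },
    ).then((r) => r.json());
    if (status.status === 'completed') return status.data;
    await new Promise((resolve) => setTimeout(resolve, 2000)); // wait before re-polling
  }
}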

Crawler Resource

Crawl an entire website starting from a specific URL.

Key Features:

  • Configurable crawl depth and limits
  • Path inclusion and exclusion patterns
  • LLM extraction during crawling
  • Change tracking
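
As a rough sketch of how those options translate into a crawl request (v1 endpoint and field names assumed; the node builds this request from its parameters):

// Start a crawl job; the request fields mirror the node's options (v1 assumed).
async function startCrawl(startUrl: string): Promise<string> {
  const res = await fetch('https://api.firecrawl.dev/v1/crawl', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.FIRECRAWL_API_KEY}`,
    },
    body: JSON.stringify({
      url: startUrl,
      limit: 100,                  // Limit: maximum pages to crawl
      maxDepth: 3,                 // Maximum Depth
      includePaths: ['blog/.*'],   // Include Paths patterns
      excludePaths: ['admin/.*'],  // Exclude Paths patterns
    }),
  });
  const body = await res.json();
  return body.id; // poll this job id for results in asynchronous mode
}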

Extract Resource

Extract specific structured data from web pages using AI.

Key Features:

  • Schema-based extraction
  • Simple prompt-based extraction
  • Multiple URL processing
  • V2 support with FIRE-1 agent
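
A sketch of simple prompt-based extraction against the API directly (v1 extract endpoint assumed; the prompt is a hypothetical example):

// Prompt-based extraction: describe the desired data in plain language.
async function extractWithPrompt(urls: string[]): Promise<unknown> {
  const res = await fetch('https://api.firecrawl.dev/v1/extract', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.FIRECRAWL_API_KEY}`,
    },
    body: JSON.stringify({
      urls,
      prompt: 'List each article title with its publication date.',
    }),
  });
  return (await res.json()).data; // structured result produced by the LLM
}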

Deep Research Resource

Perform AI-powered deep research on any topic.

Key Features:

  • Multi-depth research capabilities
  • Customizable system and analysis prompts
  • JSON structured output
  • Activity tracking
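
Deep Research is also a job-based operation: submit a query, then poll (or let the node wait) for the result. A sketch under the assumption that the endpoint is v1 deep-research and accepts the query and maxDepth fields exposed by the node; check the FireCrawl documentation for the current paths:

// Submit a deep research job (endpoint path and field names are assumptions).
async function startDeepResearch(query: string): Promise<string> {
  const res = await fetch('https://api.firecrawl.dev/v1/deep-research', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.FIRECRAWL_API_KEY}`,
    },
    body: JSON.stringify({ query, maxDepth: 7 }),
  });
  const body = await res.json();
  return body.id; // poll this id for activities and the final analysis
}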

LLMs.txt Resource

Generate LLMs.txt files from websites.

Key Features:

  • Create LLMs.txt files for LLM training
  • Configurable crawl limits
  • Full text generation option
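
Generation follows the same submit-and-poll pattern. A sketch under the assumption of a v1 llmstxt endpoint with url, maxUrls, and showFullText fields mirroring the node's options; verify the exact path in the FireCrawl documentation:

// Start LLMs.txt generation (endpoint path and field names are assumptions).
async function generateLlmsTxt(url: string): Promise<string> {
  const res = await fetch('https://api.firecrawl.dev/v1/llmstxt', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.FIRECRAWL_API_KEY}`,
    },
    body: JSON.stringify({
      url,
      maxUrls: 10,        // crawl limit for generation
      showFullText: true, // also produce the full-text variant
    }),
  });
  return (await res.json()).id; // poll this id until generation completes
}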

Map Resource

Map URLs and extract structured data.

Key Features:

  • Site mapping capabilities
  • Structure discovery
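
A sketch of the underlying call, assuming the v1 map endpoint, which returns the list of URLs discovered on the site:

// Map a site's URLs (v1 endpoint assumed); returns the discovered links.
async function mapSite(url: string): Promise<string[]> {
  const res = await fetch('https://api.firecrawl.dev/v1/map', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.FIRECRAWL_API_KEY}`,
    },
    body: JSON.stringify({ url }),
  });
  const body = await res.json();
  return body.links ?? []; // array of discovered URLs
}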

Resource Details

Scrape Resource Options

  • URL: The URL to scrape
  • Output Formats: Formats to return (Markdown, HTML, Screenshots, etc.)
  • Track Changes: Track differences between scrapes
  • Include Extract: Extract structured data
  • Include Page Actions: Perform actions before scraping
  • Use FIRE-1 Agent: Enable advanced agent capabilities
  • Location Settings: Specify geographic location

Crawler Resource Options

  • URL: The starting URL to crawl
  • Limit: Maximum number of pages to crawl
  • Maximum Depth: How deep to crawl (1-10)
  • Include/Exclude Paths: Regular expressions for paths
  • Operation Mode: Synchronous or asynchronous
  • Enable LLM Extraction: Extract data during crawling

Extract Resource Options

  • URL(s): URLs to extract data from
  • Version: API version (V1, or V2 with FIRE-1)
  • Extraction Method: Simple or schema-based
  • Operation Mode: Single, batch, or URL-less
  • Enable Web Search: Follow external links for context
  • Track Changes: Track differences between extractions

Examples

Example: Scraping a Website and Extracting Structured Data

[n8n Workflow]
1. FireCrawl Scraper (Scrape)
   - URL: https://example.com
   - Output Formats: Markdown, HTML
   - Include Extract: Yes
   - Extraction Method: Schema Based
   - Schema: JSON schema defining product information
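
The Schema field takes a JSON Schema describing the fields to extract. A hypothetical product-information schema (the field names here are illustrative):

// Hypothetical product schema; paste the object below (as JSON) into the
// node's Schema field.
const productSchema = {
  type: 'object',
  properties: {
    name: { type: 'string', description: 'Product name' },
    price: { type: 'number', description: 'Numeric price as shown on the page' },
    inStock: { type: 'boolean', description: 'Whether the product is available' },
  },
  required: ['name', 'price'],
};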

Example: Crawling a Website and Generating a Report

[n8n Workflow]
1. FireCrawl Scraper (Crawler)
   - URL: https://example.com
   - Limit: 100
   - Maximum Depth: 3
   - Enable LLM Extraction: Yes
2. Google Sheets
   - Action: Append
   - Sheet: Crawl Results

Example: Performing Deep Research on a Topic

[n8n Workflow]
1. FireCrawl Scraper (Deep Research)
   - Query: "Latest advancements in renewable energy"
   - Maximum Depth: 7
   - Wait for Completion: Yes
2. Text Formatter
   - Format research results
3. Email
   - Send formatted research

Error Handling

The nodes implement error handling and support n8n's Continue On Fail option, so a failed request does not have to stop the workflow. Each response includes a success field indicating whether the operation succeeded, along with a detailed error message when applicable.
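
A downstream node can branch on that field. A sketch for an n8n Code node (Run Once for All Items mode), assuming each incoming item carries the success flag plus url and error fields as described above:

// Separate successful FireCrawl items from failures for later handling.
const succeeded = [];
const failed = [];

for (const item of $input.all()) {
  if (item.json.success) {
    succeeded.push(item);
  } else {
    // Keep enough context to log or alert on the failure.
    failed.push({ json: { url: item.json.url, error: item.json.error } });
  }
}

return succeeded.concat(failed);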

Development

Building the Package

To build the package:

npm run build

Testing

npm run test

License

MIT

Support

For support with the FireCrawl API, see the FireCrawl documentation at https://docs.firecrawl.dev.

For issues with these custom nodes, please open an issue on GitHub.
