Logging in Scrapy

Last Updated : 28 Apr, 2025

Scrapy is a fast high-level web crawling and scraping framework written in Python used to crawl websites and extract structured data from their pages. It can be used for many purposes, from data mining to monitoring and automated testing.

As developers, we spend most of our time debugging than writing new code. Logging is one of the techniques that is used to make debugging easier. It refers to keeping track of the log of events, including errors, problems, etc., that arise during the code's runtime.

Logging in Scrapy:

Initially, Scrapy provided the logging feature through the scrapy.log module.

But it is deprecated now and no longer supported.

Instead, python's built-in logging module can be used along with Scrapy to log its events.

Python’s built-in logging has defined 5 different levels to indicate the severity of a given log message as listed below in Decreasing order of severity:

Level 5: logging.CRITICAL - for critical errors [Highest severity]

Python3

import logging
logging.critical("Scrapy Log to display Critical messages")

Level 4: logging.ERROR - for regular errors

Python3

import logging
logging.error("Scrapy Log to display Error messages")

Level 3: logging.WARNING - for warning messages

Python3

import logging
logging.warning("Scrapy Log to display Warning messages")

Level 2: logging.INFO - for informational messages

Python3

import logging
logging.info("Scrapy Log to display Info messages")

Level 1: logging.DEBUG - for debugging messages [Lowest severity]

Python3

import logging
logging.debug("Scrapy log to display Debugging messages")

Scrapy Spider Logs:

Scrapy supports a Logger inside each Spider instance. It can be accessed and used as shown below:

A step-by-step method for logging in spiders:

1. Installation of packages – run the following command from the terminal

pip install scrapy

2. Create a Scrapy project – run the following command from the terminal

scrapy startproject scrapy_log
cd scrapy_log
scrapy genspider log http://books.toscrape.com/

Here,

Project Name: "scrapy_log"
Spider Name: "log"
Domain to be Scraped: "http://books.toscrape.com/"

4. Define the Parse function - Add the following code to "scrapy_log\spiders\log.py"

To create a logger with the name of the spider: (i.e. "log")

Python3

import scrapy

class LogSpider(scrapy.Spider):
    name = 'log'
    allowed_domains = ['books.toscrape.com']
    start_urls = ['http://books.toscrape.com/']

    def parse(self, response):
        self.logger.info('Parse function called on %s', response.url)

To create a user-defined custom-named logger: (i.e. "GFG_logger")

Python3

import scrapy
import logging

logger = logging.getLogger('GFG_logger')

class LogSpider(scrapy.Spider):
    name = 'log'
    allowed_domains = ['books.toscrape.com']
    start_urls = ['http://books.toscrape.com/']

    def parse(self, response):
        logger.info('Parse function called on %s', response.url)

To create a custom Logger Format:

The Logging basic configuration is defined in the below code as follows:

level - Defines till which level of messages should be logged starting from level 1
format - Defines the general format of the log messages - ("[DateTime] {LoggerName} LevelName: Message")
datefmt - Defines the format of the Timestamp that is displayed

Python3

import scrapy
import logging

logging.basicConfig(level=logging.CRITICAL,
                    format='[%(asctime)s] {%(name)s} %(levelname)s:  %(message)s',
                    datefmt='%y-%m-%d %H:%M:%S')
logger = logging.getLogger('GFG_logger')

class LogSpider(scrapy.Spider):
    name = 'log'
    allowed_domains = ['books.toscrape.com']
    start_urls = ['http://books.toscrape.com/']

    def parse(self, response):
        logger.info('Parse function called on %s', response.url)

To Export the logs to a Log File:

The logs can be saved to a Log File as shown in the below code where it saves the logs to a file named ("saved_logs.log")

Python3

import scrapy
import logging

logging.basicConfig(level=logging.CRITICAL,
                    format='[%(asctime)s] {%(name)s} %(levelname)s:  %(message)s',
                    datefmt='%y-%m-%d %H:%M:%S',
                    filename="saved_logs.log")
logger = logging.getLogger('GFG_logger')

class LogSpider(scrapy.Spider):
    name = 'log'
    allowed_domains = ['books.toscrape.com']
    start_urls = ['http://books.toscrape.com/']

    def parse(self, response):
        logger.info('Parse function called on %s', response.url)

5. Run the spider using either of the following commands:

scrapy crawl log

The above command lists all the logs.

scrapy crawl log -L INFO

Here, "-L" is used to specify the Log level that needs to be listed (i.e. INFO/DEBUG/CRITICAL/WARN/ERROR)

How to use Scrapy to parse PDF pages online?

qwerty_gfg

Improve

Article Tags :

Logging in Scrapy

Logging in Scrapy:

Scrapy Spider Logs:

A step-by-step method for logging in spiders:

Similar Reads

Getting Started With Scrapy

Scrapy Basics

Data Collection and Management

Data Extraction and Export

Appliaction And Projects

Thank You!

What kind of Experience do you want to share?