
BeautifulSoup - Error Handling

Last Updated : 31 Jul, 2025

When scraping data from websites, we often run into errors. Some are caused by incorrect URLs or server issues, others by incorrect usage of scraping libraries such as requests and BeautifulSoup.

In this tutorial, we’ll explore some common exceptions encountered during web scraping and how to handle them.

1. HTTPError

An HTTPError occurs when the server responds with an HTTP error status code, such as 404 (Not Found) or 500 (Internal Server Error).

Example 1 (Valid URL):

import requests

url = 'https://www.geeksforgeeks.org/python/implementing-web-scraping-python-beautiful-soup/'

try:
    response = requests.get(url)
    response.raise_for_status()
except requests.exceptions.HTTPError as e:
    print("HTTP Error:", e)
else:
    print("Request successful")

Output:

Request successful

Explanation:

  • raise_for_status() automatically raises an HTTPError if the response status code indicates an error.
  • Since the URL exists, the request succeeds.
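Under the hood, raise_for_status() only inspects response.status_code, so its behavior can be demonstrated offline. The sketch below builds a Response object by hand purely for illustration; in real code these fields are populated by the server's reply:

```python
import requests

# Construct a Response manually (normally requests.get() does this)
# so we can see raise_for_status() trigger without a network call.
response = requests.models.Response()
response.status_code = 404

try:
    response.raise_for_status()
except requests.exceptions.HTTPError as e:
    print("HTTP Error:", e)
```

Any status code in the 400–599 range causes raise_for_status() to raise HTTPError; a 200 response returns silently.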

Example 2 (Invalid URL triggering HTTPError):

import requests

url = 'https://www.geeksforgeeks.org/page-that-does-not-exist'

try:
    response = requests.get(url)
    response.raise_for_status()
except requests.exceptions.HTTPError as e:
    print("HTTP Error:", e)
else:
    print("Request successful")

Output:

HTTP Error: 404 Client Error: Not Found for url: https://www.geeksforgeeks.org/page-that-does-not-exist/

Explanation: This URL does not exist, so a 404 Not Found error is raised.

2. URLError

URLError typically occurs when the URL is invalid or there is a network connection issue.

Note: URLError belongs to Python's built-in urllib module; the requests library does not raise it directly. For connection failures, requests raises requests.exceptions.ConnectionError instead.
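Since URLError itself lives in the standard library's urllib.error module, here is a minimal sketch that does raise it, using an unresolvable domain:

```python
import urllib.request
import urllib.error

url = 'https://thiswebsitedoesnotexist123456789.com'

try:
    with urllib.request.urlopen(url, timeout=5) as response:
        print("Request successful")
except urllib.error.URLError as e:
    # e.reason describes the underlying failure, e.g. a DNS error
    print("URL Error:", e.reason)
```

Because the domain cannot be resolved, the except branch runs and prints the underlying reason.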

Example:

import requests

url = 'https://thiswebsitedoesnotexist123456789.com'

try:
    response = requests.get(url)
    response.raise_for_status()
except requests.exceptions.ConnectionError as e:
    print("Connection Error:", e)
else:
    print("Request successful")

Output:

Connection Error: HTTPSConnectionPool(host='thiswebsitedoesnotexist123456789.com', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x783ef8211210>: Failed to resolve 'thiswebsitedoesnotexist123456789.com' ([Errno -2] Name or service not known)"))

Explanation:

  • If the domain name is incorrect or unreachable, ConnectionError is raised.
  • Always handle connection-related exceptions when scraping.

3. AttributeError (BeautifulSoup specific)

AttributeError in BeautifulSoup is raised when an invalid attribute reference is made or when an attribute assignment fails. In particular, accessing a tag by attribute (e.g. soup.div) returns None when that tag is not present in the document, and any further attribute access on that None raises AttributeError.

Example:

import requests
from bs4 import BeautifulSoup

url = 'https://www.geeksforgeeks.org/python/implementing-web-scraping-python-beautiful-soup/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Accessing a non-existing tag will raise AttributeError
print(soup.NonExistingTag.SomeTag)

Output:

AttributeError: 'NoneType' object has no attribute 'SomeTag'

Explanation:

  • If NonExistingTag does not exist, soup.NonExistingTag returns None.
  • Trying to access SomeTag on None triggers AttributeError.

Safer way to avoid AttributeError:

tag = soup.find('NonExistingTag')
if tag:
    print(tag.SomeTag)
else:
    print("Tag not found")
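An equivalent pattern wraps the chained access in try/except. The sketch below is self-contained, parsing a small inline HTML string instead of a live page:

```python
from bs4 import BeautifulSoup

html = "<html><body><p>Hello</p></body></html>"
soup = BeautifulSoup(html, 'html.parser')

# soup.table is None because the document has no <table> tag,
# so the chained access raises AttributeError, which we catch.
try:
    print(soup.table.tr)
except AttributeError:
    print("Tag not found")
```

This prints "Tag not found". The if-check shown above is usually preferred for a single lookup; try/except is handy when several chained accesses could each fail.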

4. XMLParserError (Parsing Errors)

When parsing invalid or incomplete XML data with BeautifulSoup, you might face parsing errors or get None or empty results when using find() or find_all().

Syntax:

soup = bs4.BeautifulSoup(response.text, 'xml')

or

soup = bs4.BeautifulSoup(response.text, 'lxml-xml')

Rather than raising an exception, BeautifulSoup often fails quietly here: when the element passed to find() or find_all() is missing from the document (or no element is passed at all), find() returns None and find_all() returns an empty list [].

Example:

import requests
from bs4 import BeautifulSoup

url = 'https://www.geeksforgeeks.org/python/implementing-web-scraping-python-beautiful-soup/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'xml')

# Trying to find a non-existent element
result = soup.find('div', class_='non-existent-class')
print(result)

Output:

None
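Putting the pieces together, here is a hedged sketch of a small helper that handles all of the request-level exceptions covered above. The name fetch_soup is ours, not part of either library:

```python
import requests
from bs4 import BeautifulSoup

def fetch_soup(url, timeout=10):
    """Fetch url and return parsed soup, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
    except requests.exceptions.HTTPError as e:
        print("HTTP Error:", e)
    except requests.exceptions.ConnectionError as e:
        print("Connection Error:", e)
    except requests.exceptions.RequestException as e:
        # Catch-all for other requests failures (timeouts, bad redirects, ...)
        print("Request failed:", e)
    else:
        return BeautifulSoup(response.text, 'html.parser')
    return None

soup = fetch_soup('https://thiswebsitedoesnotexist123456789.com')
print(soup)
```

Because the domain cannot be resolved, the ConnectionError branch runs and the helper returns None, so the caller can test the result before touching any tags.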

