4. Web Scraping
Web scraping is the process of extracting information from websites. It is one of the most important
techniques for collecting data from the internet: it takes unstructured data from web pages and
converts it into structured data.
BASIC STEPS FOR WEB SCRAPING
• Select website
• Authenticate
• Generate request
• Process information
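The four steps above can be sketched with Python's standard library alone. This is a minimal illustration, not a definitive implementation: the function names, the client name in the User-Agent header, and the use of a bearer token for authentication are all assumptions made for the example, and real sites authenticate in many different ways.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TitleParser(HTMLParser):
    """Collects the text inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def build_request(url, token=None):
    """Steps 1-3: pick a site, attach credentials if any, prepare the request."""
    headers = {"User-Agent": "example-scraper/0.1"}  # hypothetical client name
    if token is not None:
        # Assumption: the site accepts a bearer token; many sites differ.
        headers["Authorization"] = f"Bearer {token}"
    return Request(url, headers=headers)


def process(html_text):
    """Step 4: turn raw HTML into a structured record."""
    parser = TitleParser()
    parser.feed(html_text)
    return {"title": parser.title.strip()}


def scrape(url, token=None):
    """Runs all four steps against a live site (needs network access)."""
    with urlopen(build_request(url, token)) as response:
        return process(response.read().decode("utf-8", errors="replace"))
```

Keeping the request-building and processing steps in separate functions means each one can be checked on its own, without contacting a website.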
5. Web Scraping Applications
Web scraping plays a major role in data extraction, which in turn supports business improvement.
Almost every business now has a website, so a large amount of business information is available
online. This is why web scraping matters for information extraction.
Let’s look at some applications of web scraping.
• Data Science
• E-Commerce
• Sales
• Finance
• Marketing
6. Different Methods of Web Scraping
There are different methods of extracting information from websites. Authentication is an important
aspect of web scraping, since most websites place restrictions on how their content can be extracted.
Web scraping focuses on extracting data such as product prices, weather data, pollution readings, crime
data, and stock price movements into a local database for analysis.
• Copying
• API Keys
• Socket Programming
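As an illustration of the socket programming approach, the sketch below builds a minimal HTTP GET request by hand and sends it over a plain TCP socket. The function names are hypothetical, and the optional extra_headers argument is only one assumed way of passing an API key (the header name a real service expects will vary). It speaks unencrypted HTTP on port 80, so an HTTPS site would additionally need the ssl module.

```python
import socket


def build_http_get(host, path="/", extra_headers=None):
    """Builds the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close so recv() reaches EOF
    ]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # A blank line terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch_via_socket(host, path="/", port=80, extra_headers=None):
    """Sends the request over a TCP socket and returns the raw response bytes."""
    request = build_http_get(host, path, extra_headers)
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # empty read: the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

In practice a higher-level client such as urllib handles these details, but writing the request out shows what is actually sent to the server.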
7. Web Scraping in Python
Python is one of the most popular languages for web scraping. Scraping is useful in data analysis
whenever the information to be analyzed lives on a website.
The important Python libraries that assist with web scraping are:
• Beautiful Soup: Allows you to scrape information from a website in simple steps.
• Mechanize: Web scraping and automation tool.
8. Beautiful Soup Installation Steps
Execute conda install -c anaconda beautifulsoup4 in the Anaconda Prompt
or
Execute pip install beautifulsoup4 in the command prompt
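After running either command, a quick check can confirm that the package is importable. The helper below is a small sketch (the name is_installed is made up for this example). It relies on the fact that Beautiful Soup is imported under the name bs4, and it also checks for lxml, which is a separate package needed only when the 'lxml' parser is requested.

```python
import importlib.util


def is_installed(module_name):
    """Returns True if the module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Beautiful Soup installs under the import name "bs4".
    print("bs4 available:", is_installed("bs4"))
    # Optional parser used with BeautifulSoup(html, 'lxml').
    print("lxml available:", is_installed("lxml"))
```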
13. Do it yourself: Web Scraping Using Beautiful Soup
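# Install first (run in a terminal, not in Python): pip install beautifulsoup4 lxml
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "https://timesofindia.com"
html = urlopen(url)                # fetch the page
s = BeautifulSoup(html, 'lxml')    # parse it; the 'lxml' parser must be installed
type(s)                            # bs4.BeautifulSoup
title = s.title                    # the <title> tag
title
text = s.get_text()                # all text in the page; s.text is equivalent
links = s.find_all('a')            # every anchor tag
for link in links:
    print(link.get("href"))        # None when an anchor has no href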
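The same calls can be tried without a network connection by parsing a string. The sketch below uses a made-up HTML snippet and two hypothetical helper functions. It also handles two things a loop over every anchor tag will run into: anchors that have no href, and relative links, which are resolved against an assumed base URL with urljoin. It uses the "html.parser" parser that ships with Python, so lxml is not required.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical page content, standing in for a downloaded document.
SAMPLE_HTML = """
<html>
  <head><title>Sample News</title></head>
  <body>
    <a href="/sports">Sports</a>
    <a href="https://example.com/world">World</a>
    <a name="top">No link here</a>
  </body>
</html>
"""


def extract_links(html, base_url):
    """Returns absolute URLs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")  # parser bundled with Python
    links = []
    for anchor in soup.find_all("a", href=True):  # skip anchors without href
        links.append(urljoin(base_url, anchor["href"]))
    return links


def extract_title(html):
    """Returns the text of the <title> tag, or None if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.string if soup.title else None
```

Calling extract_links(SAMPLE_HTML, "https://example.com/") returns the two real links as absolute URLs and leaves out the anchor that has no href.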
15. Django
Django is a popular, high-level Python framework for web development. It is free and open
source, and it lets you build web apps with less code. As a framework, it is used for both
backend and front-end web development.
Key qualities: Fast, Secure, Scalable
17. Important Attributes of Django
• A web browser is the interface through which a user requests a URL.
• A URL is a web address; assigning view functions to URLs is called mapping.
• A Django template is simply a text document or a Python string marked up
using the Django template language. All the HTML files are stored in the
templates folder.
• The static folder is used to store CSS files, JavaScript files, images, etc.
• Functions related to the web app are written inside views. A view also renders
content to templates, puts information into models, and gets information
from the database.
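The relationship between URL mapping, views, and templates can be illustrated in plain Python. The sketch below is a simplified stand-in and not Django's actual API: in a Django project the mapping is declared in urls.py with path(), views receive an HttpRequest object, and templates use the Django template language. Here a dictionary, ordinary functions, and string.Template play those roles, and every name is made up for the example.

```python
from string import Template

# "Template": text with placeholders, filled in by a view.
GREETING_TEMPLATE = Template("<h1>Hello, $name!</h1>")


def home_view(request):
    """A view: takes a request and returns the response body."""
    return "<h1>Home</h1>"


def greet_view(request):
    """A view that renders a template with data from the request."""
    return GREETING_TEMPLATE.substitute(name=request.get("name", "guest"))


# "URL mapping": each path is assigned to the function that handles it.
URL_PATTERNS = {
    "/": home_view,
    "/greet/": greet_view,
}


def dispatch(path, request=None):
    """Looks up the view for a path and calls it; unknown paths give a 404."""
    view = URL_PATTERNS.get(path)
    if view is None:
        return 404, "Not Found"
    return 200, view(request or {})
```

For instance, dispatch("/greet/", {"name": "Ada"}) looks up greet_view, which fills the template and returns the rendered HTML with a 200 status.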
18. Important Attributes of Django
• A form fetches data from an HTML form and helps connect it to the model.
• A model holds information about the structure of the objects stored in a
database. It contains the essential fields and behavior of the data. Information
can also be edited directly in the database.
• Django automatically looks for an admin module in each application and
imports it. Registering a model with the admin makes its objects manageable
through the admin site, which is a common first step for database management.
• The database is the collection of data at the backend.
22. Knowledge Check 1
Which of the following is a web scraping library in Python?
a. Beautiful Soup
b. Pandas
c. NumPy
d. None of the above
The correct answer is a.
Beautiful Soup is for web scraping, Pandas is for data analysis, and NumPy is for numerical computing.
25. Knowledge Check 2
Data extraction is the most important aspect of web scraping.
a. False
b. True
The correct answer is b.
Web scraping means extracting information from a URL, so data extraction is its most important aspect.
26. Knowledge Check 3
In the Python statement a = BeautifulSoup(), a is
a. A constructor
b. An object
c. A class
d. A value-returning function
The correct answer is b.
a is an object created by calling BeautifulSoup().
28. Knowledge Check 4
What is the role of the render_to_response method in Django?
a. Generating web response
b. Rendering data from web
c. Rendering an HTML response
d. None of the above
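The correct answer is c.
In Django, render_to_response was used to easily render an HTML response. It was deprecated in
Django 2.0 and removed in Django 3.0; current code uses django.shortcuts.render instead.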
30. Key Takeaways
• Web scraping is a method of extracting information from a URL.
• Beautiful Soup is one of the simplest and most useful web scraping libraries in Python.
• Django is a high-level web framework used for web development in Python.