50 Page PYTHON Notes
UNIT – I
BASICS OF PYTHON
PYTHON
Python is a high-level, interpreted, interactive and object-oriented scripting language.
Python is designed to be highly readable. It uses English keywords frequently whereas the
other languages use punctuations. It has fewer syntactical constructions than other
languages.
Python is Interpreted: Python is processed at runtime by the interpreter. You do
not need to compile your program before executing it. This is similar to PERL and
PHP.
Python is Interactive: You can actually sit at a Python prompt and interact with
the interpreter directly to write your programs.
Python is Object-Oriented: Python supports Object-Oriented style or technique
of programming that encapsulates code within objects.
Python is a Beginner's Language: Python is a great language for beginner-level
programmers and supports the development of a wide range of applications, from simple text
processing to WWW browsers to games.
HISTORY OF PYTHON
Python was developed by Guido van Rossum in the late eighties and early nineties at the
National Research Institute for Mathematics and Computer Science in the Netherlands.
i. Python is derived from many other languages, including ABC, Modula-3, C, C++,
Algol-68, SmallTalk, and Unix shell and other scripting languages.
ii. Python is copyrighted. Its source code is available under the Python Software
Foundation License, an open-source license that is compatible with the GNU General
Public License (GPL).
iii. Python 1.0 was released in January 1994. In 2000, Python 2.0 was released.
Python 2.7.18, released in April 2020, was the final release of Python 2.
iv. Meanwhile, Python 3.0 was released in 2008. Python 3 is not backward
compatible with Python 2. The emphasis in Python 3 was on the removal of
duplicate programming constructs and modules so that "There should be one--
and preferably only one --obvious way to do it." Python 3.5.1 was the latest version
of Python 3 when this overview was written; newer 3.x releases have followed.
PYTHON'S FEATURES
Easy-to-learn: Python has few keywords, simple structure, and a clearly defined
syntax. This allows a student to pick up the language quickly.
Easy-to-read: Python code is more clearly defined and visible to the eyes.
Easy-to-maintain: Python's source code is fairly easy-to-maintain.
A broad standard library: The bulk of Python's library is very portable and cross-platform
compatible on UNIX, Windows, and Macintosh.
Interactive Mode: Python has support for an interactive mode, which allows
interactive testing and debugging of snippets of code.
Portable: Python can run on a wide variety of hardware platforms and has the
same interface on all platforms.
Extendable: You can add low-level modules to the Python interpreter. These
modules enable programmers to add to or customize their tools to be more
efficient.
Databases: Python provides interfaces to all major commercial databases.
GUI Programming: Python supports GUI applications that can be created and
ported to many system calls, libraries and windows systems, such as Windows MFC,
Macintosh, and the X Window system of Unix.
Scalable: Python provides a better structure and support for large programs than
shell scripting.
Apart from the above-mentioned features, Python has a big list of good features. A few
are listed below-
It supports functional and structured programming methods as well as OOP.
It can be used as a scripting language or can be compiled to byte-code for building
large applications.
It provides very high-level dynamic data types and supports dynamic type
checking.
It supports automatic garbage collection.
It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.
KEYWORDS IN PYTHON
Python keywords are predefined, reserved words that have special
meanings. Keywords define the syntax of the language. A keyword cannot be
used as an identifier, function, or variable name. All the keywords in Python are written in
lowercase except True, False, and None. There are 35 keywords in Python 3.11.
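The keyword list can be checked from Python itself. The following is a minimal sketch using the standard keyword module; the exact count printed depends on the Python version that runs it.

```python
import keyword

# keyword.kwlist holds the reserved words of the running interpreter.
print(len(keyword.kwlist))
print(keyword.kwlist)

# Collect the keywords that start with an uppercase letter.
capitalized = [kw for kw in keyword.kwlist if kw[0].isupper()]
print(capitalized)  # ['False', 'None', 'True']

# A keyword cannot be used as a variable name.
print(keyword.iskeyword("for"))    # True
print(keyword.iskeyword("total"))  # False
```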
IDENTIFIERS IN PYTHON
An identifier is a user-defined name given to a variable, function, class, module, etc. An
identifier is a combination of letters, digits, and underscores, and it cannot begin with a
digit. Identifiers are case-sensitive, i.e.,
'num', 'Num', and 'NUM' are three different identifiers in Python. It is a good
programming practice to give meaningful names to identifiers to make the code
understandable.
FORMATTED OUTPUT
q = 459
p = 0.098
print(q, p, p * q)
OUTPUT:
INPUT STATEMENT
input(): This function reads a line of input from the user and returns it as a string. The type of the
returned object is always <class 'str'>.
Syntax:
    inp = input('STATEMENT')
Example:
    >>> name = input('What is your name?\n')  # \n is a newline; it causes a line break
    What is your name?
IF Statement
The if statement is similar to that of other languages. It contains a logical
expression that compares data, and a decision is made based on the result of the
comparison.
Syntax
    if expression:
        statement(s)
If the boolean expression evaluates to TRUE, then the block of statement(s) inside the if
statement is executed. In Python, statements in a block are uniformly indented after the :
symbol. If boolean expression evaluates to FALSE, then the first set of code after the end of
block is executed.
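A minimal sketch of the rule above. The variable names and the value 31 are made up for illustration.

```python
temperature = 31  # hypothetical reading
advice = "No action needed."

if temperature > 30:
    # Both indented lines belong to the if block.
    advice = "Drink some water."
    print("It is hot today.")

# This line is outside the block, so it always runs.
print(advice)
```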
IF...ELIF...ELSE Statements
An else statement can be combined with an if statement. An else statement contains a block
of code that executes if the conditional expression in the if statement resolves to 0 or a
FALSE value. The else statement is an optional statement and there could be at the most only
one else statement following if.
Syntax
The syntax of the if...else statement is
    if expression:
        statement(s)
    else:
        statement(s)
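The sketch below adds elif branches between if and else. The marks and the grade boundaries are assumptions chosen only for illustration; Python checks the conditions from top to bottom and runs the first block whose condition is true.

```python
marks = 72  # hypothetical score

if marks >= 90:
    grade = "A"
elif marks >= 60:
    grade = "B"
else:
    # Runs only when none of the conditions above is true.
    grade = "C"

print(grade)  # B
```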
Nested IF Statements
There may be a situation when you want to check for another condition after a condition
resolves to true. In such a situation, you can use the nested if construct. In a nested if
construct, you can have an if...elif...else construct inside another if...elif...else construct.
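A small illustrative sketch of a nested if, with a made-up number. The inner if...elif...else is checked only after the outer condition has resolved to true.

```python
number = 12  # hypothetical input

if number >= 0:
    # This inner construct runs only for non-negative numbers.
    if number == 0:
        kind = "zero"
    elif number % 2 == 0:
        kind = "positive even"
    else:
        kind = "positive odd"
else:
    kind = "negative"

print(kind)  # positive even
```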
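while loop: Repeats a statement or group of statements while a given condition is
true. It tests the condition before executing the loop body.
for loop: Executes a sequence of statements multiple times and abbreviates
the code that manages the loop variable.
Nested loop: You can use one or more loops inside another while or for
loop.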
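The loop types can be sketched as follows. The values are arbitrary and chosen only to keep the output short.

```python
# for loop: the loop variable is managed for us.
squares = []
for n in range(1, 4):
    squares.append(n * n)
print(squares)  # [1, 4, 9]

# while loop: the condition is tested before each pass.
countdown = []
count = 3
while count > 0:
    countdown.append(count)
    count -= 1
print(countdown)  # [3, 2, 1]

# Nested loop: the inner loop runs fully for every outer pass.
pairs = []
for row in range(2):
    for col in range(2):
        pairs.append((row, col))
print(pairs)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```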
Break statement
The break statement is used for premature termination of the current loop.
After abandoning the loop, execution resumes at the next statement, just like
the traditional break statement in C.
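A minimal sketch of break, using a made-up list of numbers: the loop stops at the first value greater than 9 and the remaining items are never visited.

```python
numbers = [4, 7, 10, 13, 16]  # hypothetical data
first_over_nine = None
checked = []

for value in numbers:
    if value > 9:
        first_over_nine = value
        break  # leave the loop immediately
    checked.append(value)

# Execution resumes here after the break.
print(checked)          # [4, 7]
print(first_over_nine)  # 10
```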
continue Statement
FUNCTIONS
Dividing a complex problem into smaller chunks makes our program easy to understand and
reuse.
Types of function
Standard library functions - These are built-in functions in Python that are available
to use.
User-defined functions - We can create our own functions based on our
requirements.
    def function_name(arguments):
        # function body
        return
Here,
    def greet():
        print('Hello World!')
Here, we have created a function named greet(). It simply prints the text Hello World!.
    def greet():
        print('Hello World!')

    # call the function
    greet()

    print('Outside function')
Output
Hello World!
Outside function
For example,
If we create a function with arguments, we need to pass the corresponding values while
calling them.
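A short sketch of a function with arguments. The function name add_numbers and the values are chosen only for illustration.

```python
def add_numbers(num1, num2):
    """Return the sum of the two arguments."""
    return num1 + num2

# Values are passed in the same order as the parameters.
total = add_numbers(5, 4)
print("Sum:", total)  # Sum: 9

# Leaving out an argument raises a TypeError.
try:
    add_numbers(5)
except TypeError as error:
    print("TypeError:", error)
```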
UNIT – II
LIST
List is a collection of ordered items.
For example,
numbers = [1, 2, 5]
print(numbers)
Output: [1, 2, 5]
A list can have any number of items and they may be of different types (integer, float, string,
etc.). For example,
# empty list
my_list = []
In Python, each item in a list is associated with a number. The number is known as a list
index.
We can access elements of a list using the index number (0, 1, 2 …).
For example,
languages = ["Python", "Swift", "C++"]
print(languages[0]) # Python
print(languages[2]) # C++
Python allows negative indexing for its sequences. The index of -1 refers to the last item, -2
to the second last item and so on.
Example
print(languages[-1]) # C++
print(languages[-3]) # Python
In Python it is possible to access a section of items from the list using the slicing operator :,
not just a single item.
For example,
my_list = ['p','r','o','g','r','a','m','i','z']
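Continuing with the same list, a few slices can be sketched as follows. A slice start:stop includes the item at start and stops just before stop.

```python
my_list = ['p', 'r', 'o', 'g', 'r', 'a', 'm', 'i', 'z']

print(my_list[2:5])   # ['o', 'g', 'r']
print(my_list[5:])    # ['a', 'm', 'i', 'z']
print(my_list[:-5])   # ['p', 'r', 'o', 'g']
print(my_list[:])     # a shallow copy of the whole list
```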
1. Using append()
For example,
Python has many useful list methods that make it really easy to work with lists.
Method Description
extend() add items of lists and other iterables to the end of the list
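The difference between append() and extend() can be sketched with made-up numbers:

```python
numbers = [21, 34, 54, 12]

# append() adds a single item to the end of the list.
numbers.append(32)
print(numbers)  # [21, 34, 54, 12, 32]

# extend() adds every item of another iterable to the end.
numbers.extend([7, 8])
print(numbers)  # [21, 34, 54, 12, 32, 7, 8]

# Appending a list adds it as ONE nested item instead.
nested = [1, 2]
nested.append([3, 4])
print(nested)  # [1, 2, [3, 4]]
```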
For example,
Output
Python
Swift
C++
TUPLE
A tuple in Python is similar to a list. The difference between the two is that we cannot change
the elements of a tuple once it is assigned whereas we can change the elements of a list.
Creating a Tuple
A tuple is created by placing all the items (elements) inside parentheses (), separated by
commas. The parentheses are optional, however, it is a good practice to use them.
A tuple can have any number of items and they may be of different types (integer, float, list,
string, etc.).
# Empty tuple
my_tuple = ()
print(my_tuple)
Like a list, each element of a tuple is represented by index numbers (0, 1, ...) where the first
element is at index 0.
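A few ways of creating tuples, as a sketch with made-up values. Note the trailing comma needed for a one-item tuple, and that item assignment is rejected.

```python
# Parentheses are optional, but they make the intent clear.
numbers = (1, 2, 3)
mixed = 1, "Hello", 3.4      # still a tuple
single = ("hello",)          # the trailing comma makes a one-item tuple
not_a_tuple = ("hello")      # just a string in parentheses

print(type(mixed).__name__)        # tuple
print(type(not_a_tuple).__name__)  # str

# Tuples cannot be changed after creation.
try:
    numbers[0] = 10
except TypeError as error:
    print("TypeError:", error)
```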
1. Indexing
We can use the index operator [] to access an item in a tuple, where the index starts from 0.
So, a tuple having 6 elements will have indices from 0 to 5. Trying to access an index outside
of the tuple index range (6, 7, ... in this example) will raise an IndexError.
The index must be an integer, so we cannot use float or other types. This will result in
TypeError.
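Both errors can be reproduced with a six-item tuple. The tuple contents here are an arbitrary choice for illustration.

```python
letters = ('p', 'e', 'r', 'm', 'i', 't')  # six items, indices 0 to 5

print(letters[0])  # p
print(letters[5])  # t

try:
    letters[6]      # one past the last valid index
except IndexError as error:
    print("IndexError:", error)

try:
    letters[2.0]    # a float is not a valid index
except TypeError as error:
    print("TypeError:", error)
```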
2. Negative Indexing
The index of -1 refers to the last item, -2 to the second last item and so on. For example,
3. Slicing
We can access a range of items in a tuple by using the slicing operator colon :.
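Negative indexing and slicing on a tuple, sketched with the same kind of six-letter tuple (an illustrative choice):

```python
letters = ('p', 'e', 'r', 'm', 'i', 't')

print(letters[-1])    # t
print(letters[-6])    # p
print(letters[1:4])   # ('e', 'r', 'm')
print(letters[:-4])   # ('p', 'e')
print(letters[3:])    # ('m', 'i', 't')
```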
In Python, methods that add or remove items are not available with tuples. Only
two methods are available: count() and index().
We can use the for loop to iterate over the elements of a tuple. For example,
Output
Python
Swift
C++
SETS
A set is a collection of unique data. That is, the elements of a set cannot be duplicated.
In Python, we create sets by placing all the elements inside curly braces {}, separated by
commas.
A set can have any number of items and they may be of different types (integer, float, tuple,
string etc.). But a set cannot have mutable elements like lists, sets, or dictionaries as its elements.
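These rules can be checked with a short sketch. All values are made up for illustration.

```python
student_ids = {112, 114, 116, 118, 115}  # hypothetical IDs
mixed = {"Hello", 101, -2.5, (1, 2)}

# Duplicates are dropped automatically.
numbers = {2, 4, 6, 6, 2, 8}
print(numbers == {2, 4, 6, 8})  # True

# {} creates an empty dictionary; use set() for an empty set.
empty = set()
print(type({}).__name__, type(empty).__name__)  # dict set

# A list is mutable, so it cannot be a set element.
try:
    bad = {1, [2, 3]}
except TypeError as error:
    print("TypeError:", error)
```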
Python Dictionary
Python dictionary is an ordered collection (starting from Python 3.7) of items. It stores
elements in key/value pairs. Here, keys are unique identifiers that are associated with
each value.
Let's see an example,
If we want to store information about countries and their capitals, we can create a dictionary
with country names as keys and capitals as values.
Keys Values
Nepal Kathmandu
Italy Rome
England London
Output
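The table above can be written as a dictionary. This is a minimal sketch; the variable name capital_city and the extra Japan entry are assumptions added for illustration.

```python
capital_city = {"Nepal": "Kathmandu", "Italy": "Rome", "England": "London"}
print(capital_city)

# Look up a value by its key.
print(capital_city["Italy"])  # Rome

# Add a new key/value pair (hypothetical extra entry).
capital_city["Japan"] = "Tokyo"

# Items come back in insertion order.
for country, capital in capital_city.items():
    print(country, "->", capital)
```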
Python Strings
In computer programming, a string is a sequence of characters. For example, "hello" is a
string containing a sequence of characters 'h', 'e', 'l', 'l', and 'o'.
We use single quotes or double quotes to represent a string in Python. For example,
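A short sketch of creating strings and reading characters from them. The variable names are illustrative.

```python
name = "Python"
message = 'I love Python.'

print(name)
print(message)

# Strings are sequences, so indexing and slicing work on them.
greeting = "hello"
print(greeting[0])     # h
print(greeting[-1])    # o
print(greeting[1:4])   # ell
print(len(greeting))   # 5
```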
REGULAR EXPRESSION
A Regular Expression (RegEx) is a sequence of characters that defines a search pattern.
For example,
^a...s$
The pattern is: any five-letter string starting with a and ending with s.
    import re

    pattern = '^a...s$'
    test_string = 'abyss'
    result = re.match(pattern, test_string)

    if result:
        print("Search successful.")
    else:
        print("Search unsuccessful.")
Here, we used re.match() function to search pattern within the test_string. The method
returns a match object if the search is successful. If not, it returns None.
Specify Pattern Using RegEx
To specify regular expressions, metacharacters are used. In the above example, ^ and $ are
metacharacters.
Modules
As our program grows bigger, it may contain many lines of code. Instead of putting
everything in a single file, we can use modules to separate codes in separate files as per their
functionality. This makes our code organized and easier to maintain.
Module is a file that contains code to perform a specific task. A module may contain
variables, functions, classes etc.
Custom Modules
Let us create a module. Type the following and save it as example.py.
    # Python module: example.py
    def add(a, b):
        result = a + b
        return result
Here, we have defined a function add() inside a module named example. The function takes
in two numbers and returns their sum.
Import modules in Python
We can import the definitions inside a module to another module or the interactive interpreter
in Python.
We use the import keyword to do this. To import our previously defined module example, we write import example.
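The sketch below shows the import in action. So that it runs on its own, it first writes a stand-in copy of example.py into a temporary folder and adds that folder to sys.path; in normal use the file would simply sit next to your script and those setup lines would not be needed.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a stand-in example.py to a temporary folder so this sketch runs anywhere.
module_dir = tempfile.mkdtemp()
Path(module_dir, "example.py").write_text(
    "def add(a, b):\n"
    "    result = a + b\n"
    "    return result\n"
)
sys.path.insert(0, module_dir)  # let the import system find the file
importlib.invalidate_caches()

import example                   # binds the name 'example'
print(example.add(4, 5.5))       # 9.5

from example import add          # import a single name directly
print(add(2, 3))                 # 5
```

With a plain import, functions are reached through the module name using the dot operator, as in example.add().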
PACKAGE
A package is a container that holds related modules, each providing functions to perform
specific tasks. For example, the standard library's math module includes the sqrt()
function to compute the square root of a number.
While working on big projects, we have to deal with a large amount of code, and writing
everything together in the same file will make our code look messy. Instead, we can separate
our code into multiple files by keeping the related code together in packages.
import Game.Level.start
Now, if this module contains a function named select_difficulty() , we must use the full
name to reference it.
Game.Level.start.select_difficulty(2)
UNIT – III
FILE INTRODUCTION
FILE PATH
If the file is located in a different location, you will have to specify the file path, like this:
Example
f = open("D:\\myfiles\\welcome.txt", "r")
print(f.read())
Output
Welcome to this text file!
This file is located in a folder named "myfiles", on the D drive.
Mode    Description
w       Open a file for writing. Creates a new file if it does not exist or truncates the file if it exists.
x       Open a file for exclusive creation. If the file already exists, the operation fails.
a       Open a file for appending at the end of the file without truncating it. Creates a new file if it does not exist.
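The three modes can be compared with a small sketch. It works in a temporary folder, and the file name notes.txt is made up for illustration. The with statement used here closes the file automatically at the end of each block.

```python
import os
import tempfile

# Work in a temporary folder so no real files are touched.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "notes.txt")  # hypothetical file name

with open(path, "w") as f:      # 'w' creates the file
    f.write("first\n")

with open(path, "w") as f:      # 'w' again truncates the old content
    f.write("second\n")

with open(path, "a") as f:      # 'a' keeps the content and adds to the end
    f.write("third\n")

with open(path, "r") as f:
    content = f.read()
print(content)                  # second, then third

try:
    open(path, "x")             # 'x' fails because the file already exists
except FileExistsError:
    print("File already exists.")
```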
Read Lines
One line of the file can be returned by using the readline() method:
finally:
    # close the file
    file1.close()
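A complete sketch that combines readline() with try...finally. The sample file and its contents are created here only so the example can run by itself.

```python
import os
import tempfile

# Create a small sample file first (hypothetical content).
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("Hello!\nWelcome to the demo file.\n")

file1 = open(path, "r")
try:
    first = file1.readline()    # reads up to and including the newline
    second = file1.readline()   # the next call continues from there
    print(first, end="")
    print(second, end="")
finally:
    # close the file even if reading raised an exception
    file1.close()

print(file1.closed)  # True
```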
Unpickling is the inverse operation. A byte stream from a binary file or bytes-like object is
converted back into an object hierarchy. To de-serialize a data stream, you call the loads()
function (or load() when reading from a file).
Pickling and unpickling are alternatively known as serialization and deserialization.
What can be pickled and unpickled?
In Python, the following types can be pickled −
None, True, and False.
integers, floating-point numbers, complex numbers.
strings, bytes, bytearrays.
tuples, lists, sets, and dictionaries containing only picklable objects.
functions (built-in and user-defined) that are accessible from the top level of a module.
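A minimal round trip with the standard pickle module. The record is made-up data; dumps() and loads() work on bytes in memory, while dump() and load() do the same with a file. Only unpickle data that comes from a source you trust.

```python
import pickle

# Hypothetical record built only from picklable types.
record = {"name": "Asha", "marks": (81, 92.5), "passed": True, "tags": {"a", "b"}}

# Pickling: object hierarchy -> byte stream.
data = pickle.dumps(record)
print(type(data).__name__)  # bytes

# Unpickling: byte stream -> object hierarchy.
restored = pickle.loads(data)
print(restored == record)   # True

# A lambda is not reachable by name from the top level of a module,
# so it cannot be pickled.
try:
    pickle.dumps(lambda x: x)
except (pickle.PicklingError, AttributeError) as error:
    print("Cannot pickle:", type(error).__name__)
```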
record and your program crashes, there are very few chances that you detect the cause of
the problem. And if you detect the cause, it will consume a lot of time.
Why Printing is not a good option?
Some developers use the concept of printing the statements to validate if the statements are
executed correctly or some error has occurred. But printing is not a good idea. It may solve
your issues for simple scripts but for complex scripts, the printing approach will fail.
Python has a built-in module, logging, which allows writing status messages to a file or any
other output stream. The file can contain information on which part of the code was
executed and what problems have arisen.
Levels of Log Message
There are five built-in levels of log message.
Debug: Gives detailed information, typically of interest only when
diagnosing problems.
Info: Confirms that things are working as expected.
Warning: Indicates that something unexpected happened, or that some problem
may occur in the near future.
Error: Indicates that, due to a more serious problem, the software has not been able to
perform some function.
Critical: Indicates a serious error, after which the program itself may be unable to
continue running.
If required, developers have the option to create more levels, but these five are sufficient
for most situations. Each built-in level has been assigned a numeric value.
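The sketch below prints the numeric value of each level and then logs a few messages. The logger name and the messages are made up; the output goes to an in-memory stream so the example is self-contained, and a logging.FileHandler could be used in the same way to write to a file.

```python
import io
import logging

# Numeric values of the five built-in levels.
for name in ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"):
    print(name, getattr(logging, name))   # 10, 20, 30, 40, 50

stream = io.StringIO()
logger = logging.getLogger("notes_demo")   # hypothetical logger name
logger.setLevel(logging.WARNING)           # ignore DEBUG and INFO
logger.propagate = False                   # keep output in our handler only
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
logger.addHandler(handler)

logger.debug("not recorded")
logger.info("not recorded either")
logger.warning("disk space is low")
logger.error("could not save the file")

print(stream.getvalue())
```

Only the WARNING and ERROR messages are recorded, because the logger's level filters out anything with a lower numeric value.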
EXCEPTION HANDLING
Errors in Python can be of two types: syntax errors and exceptions. Errors are
problems in a program due to which the program will stop the execution. On the other
hand, exceptions are raised when some internal events occur which change the normal flow
of the program.
In Python, there are several built-in exceptions that can be raised when an error occurs
during the execution of a program. Here are some of the most common types of exceptions
in Python:
SyntaxError: This exception is raised when the interpreter encounters a syntax error in
the code, such as a misspelled keyword, a missing colon, or an unbalanced parenthesis.
TypeError: This exception is raised when an operation or function is applied to an
object of the wrong type, such as adding a string to an integer.
NameError: This exception is raised when a variable or function name is not found in
the current scope.
IndexError: This exception is raised when an index is out of range for a list, tuple, or
other sequence types.
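Each of the exceptions listed above can be triggered on purpose and caught with try...except. This is an illustrative sketch; the helper describe_error() is a made-up name, not a built-in.

```python
def describe_error(action):
    """Run the callable and report which built-in exception it raised."""
    try:
        action()
    except (TypeError, NameError, IndexError) as error:
        return type(error).__name__
    return "no error"

print(describe_error(lambda: "age: " + 25))       # TypeError
print(describe_error(lambda: undefined_name))     # NameError
print(describe_error(lambda: [1, 2, 3][5]))       # IndexError

# A SyntaxError is detected when the code is compiled, before it runs.
try:
    compile("if True print('hi')", "<example>", "exec")
except SyntaxError as error:
    print("SyntaxError:", error.msg)
```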
Syntax Error: As the name suggests this error is caused by the wrong syntax in the code. It
leads to the termination of the program.
Improved program reliability: By handling exceptions properly, you can prevent your
program from crashing or producing incorrect results due to unexpected errors or input.
Simplified error handling: Exception handling allows you to separate error handling
code from the main program logic, making it easier to read and maintain your code.
Cleaner code: With exception handling, you can avoid using complex conditional
statements to check for errors, leading to cleaner and more readable code.
Easier debugging: When an exception is raised, the Python interpreter prints a
traceback that shows the exact location where the exception occurred, making it easier
to debug your code.
Exceptions need to be derived from the Exception class, either directly or indirectly.
Although not mandatory, most exceptions are given names that end
in "Error", similar to the naming of the standard exceptions in Python. For example,
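A minimal sketch of a user-defined exception. The class InsufficientFundsError and the function withdraw() are hypothetical names invented for this illustration.

```python
class InsufficientFundsError(Exception):
    """Hypothetical exception raised when a withdrawal exceeds the balance."""

    def __init__(self, balance, amount):
        super().__init__(f"cannot withdraw {amount}; balance is only {balance}")
        self.balance = balance
        self.amount = amount


def withdraw(balance, amount):
    if amount > balance:
        raise InsufficientFundsError(balance, amount)
    return balance - amount


try:
    withdraw(100, 250)
except InsufficientFundsError as error:
    print("Error:", error)

print(withdraw(100, 40))  # 60
```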
UNIT – IV
OBJECT ORIENTED PROGRAMMING
CLASSES AND OBJECTS
In Python, object-oriented Programming (OOPs) is a programming paradigm that uses
objects and classes in programming. It aims to implement real-world entities like
inheritance, polymorphisms, encapsulation, etc. in the programming. The main concept of
OOPs is to bind the data and the functions that work on that together as a single unit so that
no other part of the code can access this data.
OOPs Concepts in Python
Class
Objects
Polymorphism
Encapsulation
Inheritance
Data Abstraction
Python Class
A class is a collection of objects. A class contains the blueprints or the prototype from
which the objects are being created. It is a logical entity that contains some attributes and
methods.
Classes are created by keyword class.
Attributes are the variables that belong to a class.
Attributes are always public and can be accessed using the dot (.) operator. Eg.:
Myclass.Myattribute
Class Definition Syntax:
    class ClassName:
        # Statement-1
        # Statement-N
The object is an entity that has a state and behavior associated with it. It may be any real-
world object like a mouse, keyboard, chair, table, pen, etc. Integers, strings, floating-point
numbers, even arrays, and dictionaries, are all objects.
Creating an Object
This will create an object named obj of the class Dog defined above.
Example:
obj = Dog()
1. Methods of a class must have an extra first parameter in the method definition. We do not
give a value for this parameter when we call the method; Python provides it.
2. If we have a method that takes no arguments, then we still have to have one parameter.
self represents the instance of the class. By using "self" we can access the attributes
and methods of the class in Python. It binds the attributes with the given arguments.
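A small sketch of a class with a method that uses self. The body of this Dog class is an assumption written for illustration.

```python
class Dog:
    # Class attribute shared by all instances.
    species = "mammal"

    def speak(self):
        # Python passes the instance as 'self' automatically.
        return f"I am a {self.species}"

    def same_object(self, other):
        return self is other


obj = Dog()
print(obj.species)            # mammal
print(obj.speak())            # I am a mammal
print(Dog.speak(obj))         # the same call with 'self' passed explicitly
print(obj.same_object(obj))   # True
```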
CONSTRUCTOR IN PYTHON
Constructors are generally used for instantiating an object. The task of a constructor is to
initialize (assign values to) the data members of the class when an object of the class is
created. In Python the __init__() method is called the constructor and is always called when
an object is created.
Syntax of constructor declaration :
def __init__(self):
DESTRUCTOR IN PYTHON
Destructors are called when an object gets destroyed. Python has a garbage collector that
handles memory management automatically.
The __del__() method is known as a destructor method in Python. It is called when all
references to the object have been deleted, i.e. when the object is garbage collected.
Syntax of destructor declaration :
def __del__(self):
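The constructor and destructor can be seen together in the sketch below. The class and the name used are made up; the events list only records the order in which things happen. The timing of __del__() shown here is how CPython behaves, and other Python implementations may run it later.

```python
events = []  # records what happened, in order


class Employee:
    def __init__(self, name):
        # Runs automatically when Employee(...) is called.
        self.name = name
        events.append(f"created {name}")

    def __del__(self):
        # Runs when the object is about to be destroyed.
        events.append(f"destroyed {self.name}")


worker = Employee("Ravi")   # hypothetical name
print(worker.name)          # Ravi

# In CPython, deleting the last reference destroys the object right away.
del worker
print(events)
```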
GETTER AND SETTER METHOD
In Python, getters and setters are not the same as those in other object-oriented
programming languages. Basically, the main purpose of using getters and setters in object-
oriented programs is to ensure data encapsulation.
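One common way to write a getter and a setter in Python is the built-in property decorator, sketched below. The Student class and the validation rule are assumptions made for illustration.

```python
class Student:
    def __init__(self, age=0):
        self._age = age   # the underscore marks the attribute as internal

    @property
    def age(self):        # getter
        return self._age

    @age.setter
    def age(self, value): # setter with validation
        if value < 0:
            raise ValueError("age cannot be negative")
        self._age = value


raj = Student()
raj.age = 21          # calls the setter
print(raj.age)        # calls the getter -> 21

try:
    raj.age = -5
except ValueError as error:
    print("Rejected:", error)
```

The caller still writes raj.age as if it were a plain attribute, while the class keeps control over which values are accepted.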
ENCAPSULATION
Encapsulation is one of the fundamental concepts in object-oriented programming (OOP).
It describes the idea of wrapping data and the methods that work on data within one unit.
This puts restrictions on accessing variables and methods directly and can prevent the
accidental modification of data.
A class is an example of encapsulation as it encapsulates all the data that is member
functions, variables, etc. The goal of information hiding is to ensure that an object’s state is
always valid by controlling access to attributes that are hidden from the outside world.
PROTECTED MEMBERS
Protected members (in C++ and JAVA) are those members of the class that cannot be
accessed outside the class but can be accessed from within the class and its subclasses. To
accomplish this in Python, just follow the convention by prefixing the name of the member
by a single underscore “_”.
PRIVATE MEMBERS
Private members are similar to protected members; the difference is that class
members declared private should neither be accessed outside the class nor by any derived
class. In Python, there are no truly private instance variables that cannot be
accessed except inside a class. However, prefixing a member name with a double
underscore "__" triggers name mangling, which makes accidental access from outside the
class harder.
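The two naming conventions can be compared in a short sketch. The Account classes and values are made up. A name with two leading underscores is stored by Python under a mangled name of the form _ClassName__member.

```python
class Account:
    def __init__(self):
        self._owner = "Meera"     # protected by convention only
        self.__balance = 500      # name-mangled to _Account__balance

    def balance(self):
        return self.__balance     # fine inside the class


class SavingsAccount(Account):
    def owner(self):
        return self._owner        # subclasses may use protected members


acct = SavingsAccount()
print(acct.owner())     # Meera
print(acct.balance())   # 500

try:
    acct.__balance
except AttributeError:
    print("__balance is not reachable under that name")

# The mangled name still works, so this is not true privacy.
print(acct._Account__balance)  # 500
```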
INHERITANCE
It provides code reusability. We don't have to write the same code again and
again. Also, it allows us to add more features to a class without modifying it.
It is transitive in nature, which means that if class B inherits from another class A, then
all the subclasses of B would automatically inherit from class A.
Inheritance offers a simple, understandable model structure.
Inheritance reduces development and maintenance costs.
Syntax
    class BaseClass:
        {Body}
    class DerivedClass(BaseClass):
        {Body}
Adding Properties
One of the features that inheritance provides is inheriting the properties of the parent class
as well as adding new properties of our own to the child class. Let us see this with an
example:
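A minimal sketch with hypothetical Person and Student classes: the child class inherits introduce() from the parent and adds a roll_no property and a details() method of its own.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def introduce(self):
        return f"My name is {self.name}"


class Student(Person):
    def __init__(self, name, roll_no):
        super().__init__(name)    # reuse the parent's initialisation
        self.roll_no = roll_no    # new property added by the child

    def details(self):
        return f"{self.introduce()}, roll number {self.roll_no}"


s = Student("Anita", 17)      # hypothetical values
print(s.introduce())          # inherited from Person
print(s.details())            # defined in Student
print(isinstance(s, Person))  # True
```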
Single inheritance:
Single inheritance enables a derived class to inherit properties from a single parent
class, thus enabling code reusability and the addition of new features to existing code.
Multiple Inheritance:
When a class is derived from more than one base class, this type of inheritance is called
multiple inheritance. In multiple inheritance, all the features of the base classes are
inherited into the derived class.
Multilevel Inheritance:
In multilevel inheritance, features of the base class and the derived class are further
inherited into the new derived class. This is similar to a relationship representing a
child and a grandfather.
Hierarchical Inheritance:
When more than one derived class is created from a single base, this type of inheritance is
called hierarchical inheritance. In this program, we have a parent (base) class and two child
(derived) classes.
Hybrid Inheritance:
Inheritance consisting of multiple types of inheritance is called hybrid inheritance.
POLYMORPHISM
An abstract
An abstract class can be considered as a blueprint for other classes. It allows you to create
a set of methods that must be created within any child classes built from the abstract class.
A class which contains one or more abstract methods is called an abstract class. An
abstract method is a method that has a declaration but does not have an implementation.
While we are designing large functional units we use an abstract class. When we want to
provide a common interface for different implementations of a component, we use an
abstract class.
By default, Python does not provide abstract classes. Python comes with a module that
provides the base for defining Abstract Base Classes (ABC), and that module's name is
abc. It works by decorating methods of the base class as abstract and then registering
concrete classes as implementations of the abstract base. A method becomes abstract when
decorated with @abstractmethod.
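A short sketch using the abc module. The Shape and Rectangle classes are hypothetical examples. The abstract class cannot be instantiated, while a child class that implements every abstract method can.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self):
        """Declared here, implemented by child classes."""


class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


print(Rectangle(3, 4).area())  # 12

# An abstract class cannot be instantiated directly.
try:
    Shape()
except TypeError as error:
    print("TypeError:", error)
```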
INTERFACE
An interface in Python is a collection of method signatures that should be provided by
the implementing class.
An interface contains methods that are abstract in nature. The abstract methods
have only a declaration, as there is no implementation.
An interface in Python is defined using Python class and is a subclass of an interface.
Interface which is the parent interface for all interfaces.
The implementations will be done by the classes which will inherit the
interface. Interfaces in Python are a little different from other languages like Java or
C# or C++.
Implementing an interface is a way of writing organized code.
INFORMAL INTERFACE
A Python informal interface is also a class that defines methods that can be overridden, but
without enforcement. Informal interfaces are also referred to as protocols or duck typing. With
duck typing, we simply call the method we expect an object to have,
instead of checking the type of the object.
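Duck typing can be sketched with two unrelated, made-up classes that both happen to provide a speak() method:

```python
class Duck:
    def speak(self):
        return "Quack"


class Robot:
    # Unrelated to Duck, but it offers the same method.
    def speak(self):
        return "Beep"


def make_it_speak(thing):
    # No isinstance() check: any object with speak() is accepted.
    return thing.speak()


print(make_it_speak(Duck()))   # Quack
print(make_it_speak(Robot()))  # Beep

try:
    make_it_speak(42)          # int has no speak() method
except AttributeError as error:
    print("AttributeError:", error)
```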
FORMAL INTERFACE
the object of the interface. So we use a base class to create an object, and we can say that the
object implements an interface. And we will use the type function to confirm that the object
implements a particular interface or not.
UNIT – V
PYTHON WEB APPLICATION PROJECT TEMPLATES
Python in Visual Studio supports developing web projects in Bottle, Flask, and Django
frameworks through project templates and a debug launcher that can be configured to handle
various frameworks. These templates include a requirements.txt file to declare the necessary
dependencies. When creating a project from one of these templates, Visual Studio prompts
you to install those packages (see Install project requirements later in this article).
You can also use the generic Web Project template for other frameworks such as Pyramid.
In this case, no frameworks are installed with the template. Instead, install the necessary
packages into the environment you're using for the project (see Python environments window
- Package tab).
The generic Web Project template, mentioned earlier, provides only an empty Visual Studio
project with no code and no assumptions other than being a Python project.
All the other templates are based on the Bottle, Flask, or Django web frameworks, and fall
into three general groups as described in the following sections. The apps created by any of
these templates contain sufficient code to run and debug the app locally. Each one also
provides the necessary WSGI app object (python.org) for use with production web servers.
Web group
All <Framework> Web Project templates create a starter web app with an identical design
regardless of the chosen framework. The app has Home, About, and Contact pages, along
with a nav bar and responsive design using Bootstrap. Each app is appropriately configured
to serve static files (CSS, JavaScript, and fonts), and uses a page template mechanism
appropriate for the framework.
Template              Description
Bottle Web Project    Generates an app whose static files are contained in the static folder and handled
                      through code in app.py. Routing for the individual pages is contained in routes.py,
                      and the views folder contains the page templates.
Django Web Project    Generates a Django project and a Django app with three pages, authentication
                      support, and a SQLite database (but no data models). For more information,
                      see Django templates and Learn Django Step 4.
Flask Web Project     Generates an app whose static files are contained in the static folder. Code
                      in views.py handles routing, with page templates using the Jinja engine contained in
                      the templates folder. The runserver.py file provides startup code. See
Before reading further, try out a working version of this app. The complete code for the app
is in the folder named actors_app.
1. You type an actor’s name into the form and submit it.
2. If the actor’s name is in the data source (ACTORS), the app loads a detail page for that
actor. (Photos of bears 🐻 stand in for real photos of the actors.)
3. Otherwise, you stay on the same page, the form is cleared, and a message tells you that
actor is not in the database.
First we have the route, as usual, but with a new addition for handling form data: methods.
Bootstrap 4 was used in all templates in the Books Hopper app, but Bootstrap-Flask was not.
Bootstrap styles were all coded in the usual ways.
Templates
Folder structure for a Flask app
A proper Flask app is going to use multiple files — some of which will be template files. The
organization of these files has to follow rules so the app will work. Here is a diagram of the
typical structure:
my-flask-app
├── static/
│ └── css/
│ └── main.css
├── templates/
│ ├── index.html
│ └── student.html
├── data.py
└── students.py
Summary: The route tells Flask, “When this URL is received, run the following function.”
Then everything up to the final return in the function is preparing the data that will be in
the render_template() function. We also have an except clause, in case the route’s variable
value is unusable.
import sqlite3
conn = sqlite3.connect('test.db')
Create a Table
Following Python program will be used to create a table in the previously created database.
#!/usr/bin/python
import sqlite3
conn = sqlite3.connect('test.db')
print("Opened database successfully")
conn.close()
Following Python code shows how to use UPDATE statement to update any record and then
fetch and display the updated records from the COMPANY table.
#!/usr/bin/python
import sqlite3
conn = sqlite3.connect('test.db')
print("Opened database successfully")
ID = 2
NAME = Allen
ADDRESS = Texas
SALARY = 15000.0
ID = 3
NAME = Teddy
ADDRESS = Norway
SALARY = 20000.0
ID = 4
NAME = Mark
ADDRESS = Rich-Mond
SALARY = 65000.0
Following Python code shows how to use DELETE statement to delete any record and then
fetch and display the remaining records from the COMPANY table.
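The create, update, and delete steps can be put together in one Python 3 sketch. It uses an in-memory database instead of test.db so it leaves no file behind. The column types, the first row, and the new salary value are assumptions; the remaining rows follow the sample output shown above.

```python
import sqlite3

# An in-memory database keeps this sketch self-contained.
conn = sqlite3.connect(":memory:")

conn.execute(
    """CREATE TABLE COMPANY (
           ID      INTEGER PRIMARY KEY,
           NAME    TEXT NOT NULL,
           ADDRESS TEXT,
           SALARY  REAL
       )"""
)

rows = [
    (1, "Paul", "California", 20000.0),   # hypothetical first row
    (2, "Allen", "Texas", 15000.0),
    (3, "Teddy", "Norway", 20000.0),
    (4, "Mark", "Rich-Mond", 65000.0),
]
conn.executemany("INSERT INTO COMPANY VALUES (?, ?, ?, ?)", rows)

# UPDATE one record, then DELETE another; '?' placeholders keep values safe.
conn.execute("UPDATE COMPANY SET SALARY = ? WHERE ID = ?", (25000.0, 1))
conn.execute("DELETE FROM COMPANY WHERE ID = ?", (2,))
conn.commit()

remaining = conn.execute(
    "SELECT ID, NAME, ADDRESS, SALARY FROM COMPANY ORDER BY ID"
).fetchall()
for row in remaining:
    print(row)

conn.close()
```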
Suppose you want to get some information from a website, say an article from
the geeksforgeeks website or some news article. What will you do? The first thing that may
come to mind is to copy and paste the information into your local storage. But what if
you want a large amount of data on a daily basis and as quickly as possible? In such
situations, copy and paste will not work, and that is where you will need web scraping.
In this article, we will discuss how to perform web scraping using the requests library and
beautifulsoup library in Python.
Requests Module
The Requests library is used for making HTTP requests to a specific URL, and it returns the
response. It provides built-in functionality for managing both the request and the
response.
Installation
Installing Requests depends on the operating system, but the basic command
on any platform is to open a command terminal and run,
pip install requests
Making a Request
The Python requests module has several built-in methods to make HTTP requests to a specified
URI using GET, POST, PUT, PATCH, or HEAD requests. An HTTP request is meant to
either retrieve data from a specified URI or to push data to a server. It works as a
request-response protocol between a client and a server. Here we will be using the GET request.
The GET method is used to retrieve information from the given server using a given URI. The
GET method sends the encoded user information appended to the page request.
Response object
When one makes a request to a URI, it returns a response. In Python this Response object
is returned by requests.method(), where method is get, post, put, etc. Response is a
powerful object with many attributes and methods for inspecting the returned data. For
example, response.status_code gives the HTTP status code of the response, so one can check
whether the request was processed successfully.
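A small sketch of checking a response, kept offline on purpose: the two Response objects
are built by hand as stand-ins for real server replies (normally requests creates them),
and describe is a hypothetical helper name.

```python
import requests


def describe(response):
    """Summarise a Response using its ok and status_code attributes."""
    if response.ok:  # ok is True when the status code is below 400
        return f"success ({response.status_code})"
    return f"failed ({response.status_code})"


# Assumption: hand-built objects stand in for replies from a real server.
ok_reply = requests.Response()
ok_reply.status_code = 200

missing_reply = requests.Response()
missing_reply.status_code = 404

print(describe(ok_reply))
print(describe(missing_reply))
```

In a real script the argument would be the object returned by requests.get(url).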
Program:
import requests
# Making a GET request
r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')
# print the URL of the response
print(r.url)
# print status code
print(r.status_code)
Output:
https://www.geeksforgeeks.org/python-programming-language/
200
BeautifulSoup Library
BeautifulSoup is a Python library used to extract information from HTML and XML files for
web scraping purposes. It builds a parse tree from the page source and provides functions
to navigate, search, or modify this tree, so that data can be pulled out hierarchically
and more readably.
It was first released by Leonard Richardson, who still contributes to the project, and
the project is also supported by Tidelift (a paid subscription service for open-source
maintenance).
Beautiful Soup 3 was officially released in May 2006. The latest version at the time of
writing was 4.9.2, which supports Python 3 as well as Python 2.
Beautiful Soup is a Python library designed for quick-turnaround projects like
screen-scraping. Three features make it powerful:
1. Beautiful Soup provides a few simple methods and Pythonic idioms for navigating,
searching, and modifying a parse tree: a toolkit for dissecting a document and extracting
what you need. It doesn’t take much code to write an application.
2. Beautiful Soup automatically converts incoming documents to Unicode and outgoing
documents to UTF-8. You don’t have to think about encodings unless the document doesn’t
specify an encoding and Beautiful Soup can’t detect one. Then you just have to specify the
original encoding.
3. Beautiful Soup sits on top of popular Python parsers like lxml and html5lib, allowing
you to try different parsing strategies or trade speed for flexibility.
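The first feature (navigating, searching, and modifying the parse tree) can be sketched on
a tiny document. The HTML string below is hypothetical and stands in for a downloaded
page; the parser is Python's built-in html.parser.

```python
from bs4 import BeautifulSoup

# A tiny hypothetical document, used in place of a downloaded page.
html = """
<html>
  <head><title>Demo page</title></head>
  <body>
    <p class="intro">Hello</p>
    <p>World</p>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# Navigate: attribute access walks down the tree to the first matching tag.
title_text = soup.title.string

# Search: find_all collects every matching tag.
paragraphs = [p.get_text() for p in soup.find_all("p")]

# Modify: assigning to .string replaces the text of a tag in place.
soup.find("p", class_="intro").string = "Hi"
first_paragraph = str(soup.p)

print(title_text)
print(paragraphs)
print(first_paragraph)
```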
Installation
To install BeautifulSoup on Windows, Linux, or any other operating system, you need the
pip package manager. Once pip is available, run the command below in the terminal:
pip install beautifulsoup4
Inspecting Website
Before extracting any information from the HTML of a page, we must understand its
structure, so that we can select the desired data from the entire page. We can do this by
right-clicking on the page we want to scrape and selecting Inspect (Inspect Element).
Clicking Inspect opens the browser's Developer Tools. Almost all browsers now come with
developer tools built in; we will be using Chrome for this tutorial.
The developer tools let you see the site’s Document Object Model (DOM). If you are not
familiar with the DOM, don’t worry: just treat the displayed text as the HTML structure
of the page.
After getting the HTML of the page, let’s see how to parse this raw HTML into useful
information. First of all, we create a BeautifulSoup object by specifying the parser we
want to use.
Program
import requests
from bs4 import BeautifulSoup
# Making a GET request
r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')
# check status code for response received
# success code - 200
print(r)
# Parsing the HTML
soup = BeautifulSoup(r.content, 'html.parser')
print(soup.prettify())
Output:
This output is still not very useful to us, so let’s look at another example to get a
clearer picture. Let’s try to extract the title of the page.
Program
import requests
from bs4 import BeautifulSoup
# Making a GET request
r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')
# Parsing the HTML
soup = BeautifulSoup(r.content, 'html.parser')
# Getting the text of the title tag
print(soup.title.string)
Finding Elements
Now we would like to extract some useful data from the HTML content. The soup object
holds all the data in a nested structure that can be extracted programmatically. The
website we want to scrape contains a lot of text, so let’s scrape all of that content.
First, let’s inspect the webpage we want to scrape.
In the above image, we can see that all the content of the page is under the div with
class entry-content. We will use the find method, which returns the first tag that matches
the given name and attributes. In our case, it will find the div whose class is
entry-content. This gives us all the content from the site, but all the images and links
are scraped along with it. So our next task is to pick out only the text content from the
parsed HTML. Let’s inspect the HTML of our website again.
We can see that the content of the page is under the <p> tag. Now we have to find all the
p tags present in this div. We can use the find_all method of BeautifulSoup.
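A minimal offline sketch of the find / find_all pattern. The inline HTML is hypothetical
and only imitates the structure described above (a div with class entry-content holding
paragraphs alongside other tags).

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates the structure described above.
html = """
<div class="sidebar"><p>Not part of the article</p></div>
<div class="entry-content">
  <p>First paragraph.</p>
  <img src="figure.png">
  <p>Second paragraph.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find returns the first match, or None when nothing matches, so check it first.
container = soup.find("div", class_="entry-content")
paragraphs = container.find_all("p") if container is not None else []

print(len(paragraphs))
print(paragraphs[0])
```

The keyword is spelled class_ because class is a reserved word in Python.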
Program
import requests
from bs4 import BeautifulSoup
# Making a GET request
r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')
# Parsing the HTML
soup = BeautifulSoup(r.content, 'html.parser')
s = soup.find('div', class_='entry-content')
content = s.find_all('p')
print(content)
Output:
Finding Elements by ID
In the above example we found the elements by class name; now let’s see how to find
elements by id. For this task, let’s scrape the content of the leftbar of the page. The
first step is to inspect the page and see which tag the leftbar falls under.
The above image shows that the leftbar falls under the <div> tag with id main. Now
let’s get the HTML content under this tag, and inspect more of the page to get the
content of the leftbar.
We can see that the list in the leftbar is under the <ul> tag with the class leftBarList,
and our task is to find all the li elements under this ul.
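The lookup described here (div by id, then ul by class, then every li) can be sketched
offline as follows. The inline HTML and its list items are hypothetical. The last step
shows the same lookup written as a CSS selector with select, an alternative that is not
part of the walkthrough above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the leftbar structure described above.
html = """
<div id="main">
  <ul class="leftBarList">
    <li>Python Basics</li>
    <li>Python Functions</li>
  </ul>
  <ul class="other"><li>Unrelated</li></ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Step by step: the div with id "main", then the ul, then all of its li tags.
main = soup.find("div", id="main")
left_bar = main.find("ul", class_="leftBarList")
items = [li.get_text() for li in left_bar.find_all("li")]

# Alternative: one CSS selector (# selects by id, . selects by class).
selected = [li.get_text() for li in soup.select("#main ul.leftBarList li")]

print(items)
print(selected)
```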
Program
import requests
from bs4 import BeautifulSoup
# Making a GET request
r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')
# Parsing the HTML
soup = BeautifulSoup(r.content, 'html.parser')
# Finding by id
s = soup.find('div', id='main')
# Getting the leftbar
leftbar = s.find('ul', class_='leftBarList')
# All the li under the above ul
content = leftbar.find_all('li')
print(content)
Output:
In the above examples, you must have seen that the tags get scraped along with the data.
What if we want only the text, without any tags? For this we use the text property, which
returns only the text inside a tag. We will take the above example and remove all the
tags from its output.
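A short offline sketch of the text property on hypothetical inline HTML. It also shows
get_text with a separator and strip=True, a related call that trims surrounding
whitespace; that variant is an addition here, not something the walkthrough above uses.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: nested tags and stray whitespace inside the paragraphs.
html = (
    '<div class="entry-content">'
    "<p>Python is <b>easy</b> to learn.</p>"
    "<p>  It is widely used.  </p>"
    "</div>"
)

soup = BeautifulSoup(html, "html.parser")
paragraphs = soup.find("div", class_="entry-content").find_all("p")

# .text joins all the text inside a tag and drops the markup.
texts = [p.text for p in paragraphs]

# get_text(" ", strip=True) strips each piece of text and joins the pieces with a space.
cleaned = [p.get_text(" ", strip=True) for p in paragraphs]

print(texts)
print(cleaned)
```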
Program
import requests
from bs4 import BeautifulSoup
# Making a GET request
r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')
# Parsing the HTML
soup = BeautifulSoup(r.content, 'html.parser')
s = soup.find('div', class_='entry-content')
lines = s.find_all('p')
# Printing only the text of each paragraph
for line in lines:
    print(line.text)
Extracting Links
Program
import requests
from bs4 import BeautifulSoup
# Making a GET request
r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')
# Parsing the HTML
soup = BeautifulSoup(r.content, 'html.parser')
# find all the anchor tags and print their "href" attribute
for link in soup.find_all('a'):
    print(link.get('href'))
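Anchor tags without an href attribute make link.get('href') return None, and many links
on real pages are relative. The sketch below shows one way to handle both, on hypothetical
inline HTML with a hypothetical base URL. Using urljoin from the standard library is an
addition here, not part of the program above.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical base URL and markup, used only for illustration.
base_url = "https://example.com/articles/"
html = """
<a href="/about">About</a>
<a href="python.html">Python</a>
<a name="top">Anchor without a link</a>
<a href="https://www.python.org/">Python.org</a>
"""

soup = BeautifulSoup(html, "html.parser")

# href=True keeps only the anchors that actually carry an href attribute.
# urljoin resolves relative links against the base URL; absolute links pass through.
links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]

for link in links:
    print(link)
```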
Inspecting the page again, we can see that images lie inside the img tag and the link to
each image is in its src attribute. See the image below.
Program:
import requests
from bs4 import BeautifulSoup
# Making a GET request
r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')
# Parsing the HTML