Introduction to Pandas

The document outlines an agenda for a session on using Pandas in Python, covering topics such as data import, manipulation, analysis, and visualization. It introduces key concepts like Series and DataFrame objects, their features, and comparisons with Numpy. The document also provides practical examples and methods for working with data in Pandas.

Uploaded by

Satyam Kumar

Agenda for Today's Session

01 Introduction to Pandas
02 Import Convention
03 Data Manipulation
04 Data Analysis
05 Data Visualization
06 Summary

www.intellipaat.com
Python Certification Course

Introduction to Pandas

Agenda for Today's Session

01 What is Pandas?
02 Who created Pandas?
03 Features of Pandas
04 Numpy vs Pandas

What is Pandas?

01 Open-source Python library
02 Simple yet powerful and expressive tool
03 Data Manipulation & Analysis

Where did the name Pandas come from?

• The name Pandas is derived from the term "panel data"
• Panel data is multi-dimensional data involving measurements over time

Who created Pandas?

Created in 2008 by Wes McKinney

Features of Pandas:

01 Series object and DataFrame
02 Handling of missing data
03 Data alignment
04 Group by functionality
05 Slicing, indexing, and subsetting
06 Merging and joining
07 Reshaping
08 Hierarchical labeling of axes
09 Robust input/output tool
10 Time series-specific functionality

Pandas vs Numpy

• Pandas performs better than Numpy for 500k rows or more. Numpy performs better for 50k rows or less.
• A Pandas Series object is more flexible, as you can define your own labeled index to index and access elements of an array. Elements in NumPy arrays are accessed by their default integer position.
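The difference between label-based and position-based access can be sketched as follows. This is a minimal illustration, not taken from the slides; the values and the labels "x", "y", "z" are assumptions chosen only for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical values and labels, chosen only for illustration.
values = [10, 20, 30]

arr = np.array(values)
ser = pd.Series(values, index=["x", "y", "z"])

by_position = arr[1]            # NumPy: access by integer position
by_label = ser["y"]             # pandas: access by a user-defined label
also_by_position = ser.iloc[1]  # pandas still supports positional access via iloc

print(by_position, by_label, also_by_position)
```

All three expressions reach the same element; the Series simply offers an extra, label-based way to address it.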

India : +91-7847955955

US : 1-800-216-8930 (TOLL FREE)

[email protected]

How to import Pandas in Python?

import pandas as pd

What kind of data suits Pandas best?

• Tabular data
• Arbitrary matrix data
• Time series data

Data-set in Pandas

• Series Object: One Dimensional
• DataFrame: Multi Dimensional

What is a Series object?

• One-dimensional labeled array
• Contains data of similar or mixed types
• Example:

data = [1, 2, 3, 4]
series1 = pd.Series(data)
series1

How to check the type?

type(series1)

Create a Series object from different data types

• Array
• Dictionary
• Scalar
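A minimal sketch of the three input kinds listed above. The variable names and values are assumptions made for illustration; only `pd.Series` itself comes from the slides.

```python
import numpy as np
import pandas as pd

# From an array: the index defaults to 0, 1, 2, ...
from_array = pd.Series(np.array([1, 2, 3]))

# From a dictionary: the keys become the index labels.
from_dict = pd.Series({"a": 1, "b": 2})

# From a scalar: the value is repeated once per index label.
from_scalar = pd.Series(5, index=["a", "b", "c"])

print(from_array, from_dict, from_scalar, sep="\n\n")
```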
How to create a Series object?

pd.Series(data)

How to change the index name?

Pass the desired labels (here a, b, c, d) through the index argument:

series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series1

What is a DataFrame?

• Two-dimensional labeled data structure with columns of potentially different types

Features of DataFrame

• Different column types
• Mutable size
• Labeled axes
• Arithmetic operations on rows and columns

How to create a DataFrame?

pd.DataFrame(data)

A DataFrame can be created from:

01 List
02 Dictionary
03 Series
04 Numpy ND array

Create a DataFrame from a List

data = [1, 2, 3, 4, 5]
df = pd.DataFrame(data)
df

Create a DataFrame from a Dictionary

dict1 = {'fruit': ['apple', 'mango', 'banana'], 'count': [10, 12, 13]}
df = pd.DataFrame(dict1)
df

Create a DataFrame from a Series

data = pd.Series([6, 12], index=['a', 'b'])
df = pd.DataFrame([data])
df

Create a DataFrame from a numpy ND array

import numpy as np

data = np.array([['a', 'b'], [6, 12]])
df = pd.DataFrame({'A': data[:, 0], 'B': data[:, 1]})
df
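As an alternative sketch, a 2-D NumPy array can also be passed to `pd.DataFrame` directly, with the column labels supplied through the `columns` argument. The array contents and the labels "A" and "B" are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 numeric array; each row of the array becomes a DataFrame row.
arr = np.array([[1, 2], [3, 4], [5, 6]])

# The columns argument labels the two array columns.
df = pd.DataFrame(arr, columns=["A", "B"])
print(df)
```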
Understanding Pandas Operations with an Example

Hands-on Demonstration

• Import Convention
• Data Analysis
• Data Manipulation
• Data Visualization

Exploring the Data-set

• Dataset is based on product reviews from Amazon
• Stored in the .csv format

Importing Data-set with Pandas

First read the data:

import pandas as pd
Product_Review = pd.read_csv("Amazon_Products_Review.csv")

Importing Data with Pandas

Let's explore the type:

type(Product_Review)

Importing Data with Pandas

For files other than CSV format:

pd.read_table("filename")
pd.read_excel("filename")
pd.read_sql(query, connection_object)
pd.read_json(json_string)
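The readers above can be tried without any files on disk by wrapping text in `io.StringIO`. This is a sketch under that assumption; the sample text and variable names are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical in-memory data standing in for real files.
csv_text = "fruit,count\napple,10\nmango,12\n"
json_text = '[{"fruit": "apple", "count": 10}, {"fruit": "mango", "count": 12}]'

df_csv = pd.read_csv(io.StringIO(csv_text))

# read_table uses a tab as its default separator.
df_tab = pd.read_table(io.StringIO(csv_text.replace(",", "\t")))

# Recent pandas versions expect a file-like object rather than a raw JSON string.
df_json = pd.read_json(io.StringIO(json_text))

print(df_csv, df_tab, df_json, sep="\n\n")
```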
Importing Data with Pandas

Read from SQL Query or Database Table:

>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///:memory:')
>>> pd.read_sql("SELECT * FROM my_table;", engine)
>>> pd.read_sql_table('my_table', engine)
>>> pd.read_sql_query("SELECT * FROM my_table;", engine)
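A self-contained sketch of reading a query result, using Python's built-in `sqlite3` module instead of SQLAlchemy so that it runs without extra dependencies. The table layout and its rows are assumptions made for illustration. Note that `pd.read_sql_table` requires SQLAlchemy, so only the query form is shown here.

```python
import sqlite3
import pandas as pd

# Build a throwaway in-memory database with a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (fruit TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("apple", 10), ("mango", 12)],
)
conn.commit()

# Run a query and load the result set into a DataFrame.
df = pd.read_sql_query("SELECT * FROM my_table;", conn)
conn.close()

print(df)
```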
Analyzing Data-set

Basic DataFrame Functionality

Print the first 5 rows of the DataFrame:
Product_Review.head()

Print the last 5 rows of the DataFrame:
Product_Review.tail()

Print the number of rows and columns:
Product_Review.shape

Information on index, datatypes, and memory:
Product_Review.info()
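Since the Amazon review file is not included here, the following sketch uses a small made-up DataFrame to show what these calls return. The column names mirror the ones used later in the slides; the values are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the Product_Review dataset.
reviews = pd.DataFrame({
    "Product_Title": [f"item{i}" for i in range(8)],
    "Product_Rating": [5, 3, 4, 2, 5, 1, 4, 3],
})

first_rows = reviews.head()    # first 5 rows by default
last_rows = reviews.tail(3)    # an explicit row count can be passed
n_rows, n_cols = reviews.shape # shape is an attribute, not a method

reviews.info()                 # info() is a method and prints its report
```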
Merge, Join and Concatenate

Merge and join types for two DataFrames, df1 and df2:

• Inner Merge / Inner Join
• Left Merge / Left Join
• Right Merge / Right Join
• Outer Merge / Outer Join

DataFrame for Pandas Merge

DataFrame-1

player = ['Player1', 'Player2', 'Player3']
points = [8, 9, 5]
title = ['Game1', 'Game2', 'Game3']
df1 = pd.DataFrame({'Player': player, 'Points': points, 'Title': title})
df1 = df1[['Player', 'Points', 'Title']]
df1

DataFrame for Pandas Merge

DataFrame-2

player = ['Player1', 'Player5', 'Player6']
power = ['Punch', 'Kick', 'Elbow']
title = ['Game1', 'Game5', 'Game6']
df2 = pd.DataFrame({'Player': player, 'Power': power, 'Title': title})
df2 = df2[['Player', 'Power', 'Title']]
df2

Merge, Join and Concatenate

Inner Merge:
df1.merge(df2, on='Title', how='inner')

Left Merge:
df1.merge(df2, on='Title', how='left')

Right Merge:
df1.merge(df2, on='Title', how='right')

Outer Merge:
df1.merge(df2, on='Title', how='outer')
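To make the four merge types concrete, the sketch below rebuilds the two DataFrames from the slides and counts the rows each `how` value produces. The `indicator=True` argument is an addition not used in the slides; it adds a `_merge` column showing where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Player": ["Player1", "Player2", "Player3"],
    "Points": [8, 9, 5],
    "Title": ["Game1", "Game2", "Game3"],
})
df2 = pd.DataFrame({
    "Player": ["Player1", "Player5", "Player6"],
    "Power": ["Punch", "Kick", "Elbow"],
    "Title": ["Game1", "Game5", "Game6"],
})

# Only 'Game1' appears in both frames, so the row counts differ per merge type.
row_counts = {
    how: len(df1.merge(df2, on="Title", how=how))
    for how in ["inner", "left", "right", "outer"]
}
print(row_counts)

# indicator=True labels each row as left_only, right_only, or both.
outer = df1.merge(df2, on="Title", how="outer", indicator=True)
print(outer)
```

Because both frames also carry a Player column that is not the merge key, pandas keeps both copies and names them Player_x and Player_y.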
DataFrame for Pandas Join

DataFrame-1

player = ['Player1', 'Player2', 'Player3']
points = [8, 9, 5]
title = ['Game1', 'Game2', 'Game3']
df1 = pd.DataFrame({'Player': player, 'Points': points, 'Title': title})
df1 = df1.set_index('Player')  # set_index returns a new DataFrame, so assign it back
df1

DataFrame-2

player = ['Player1', 'Player5', 'Player6']
power = ['Punch', 'Kick', 'Elbow']
title = ['Game1', 'Game5', 'Game6']
df2 = pd.DataFrame({'Player': player, 'Power': power, 'Title': title})
df2 = df2.set_index('Player')  # set_index returns a new DataFrame, so assign it back
df2

Merge, Join and Concatenate

Inner Join:
df1.join(df2, how='inner', lsuffix='_x', rsuffix='_y')

Left Join:
df1.join(df2, how='left', lsuffix='_x', rsuffix='_y')

Right Join:
df1.join(df2, how='right', lsuffix='_x', rsuffix='_y')

Outer Join:
df1.join(df2, how='outer', lsuffix='_x', rsuffix='_y')
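`join` aligns rows on the index, so the sketch below sets Player as the index of both frames before joining. The names `left` and `right` are introduced here only for illustration; the data is the same as in the slides.

```python
import pandas as pd

left = pd.DataFrame({
    "Player": ["Player1", "Player2", "Player3"],
    "Points": [8, 9, 5],
    "Title": ["Game1", "Game2", "Game3"],
}).set_index("Player")

right = pd.DataFrame({
    "Player": ["Player1", "Player5", "Player6"],
    "Power": ["Punch", "Kick", "Elbow"],
    "Title": ["Game1", "Game5", "Game6"],
}).set_index("Player")

# Both frames have a Title column, so suffixes are needed to tell them apart.
inner = left.join(right, how="inner", lsuffix="_x", rsuffix="_y")
right_join = left.join(right, how="right", lsuffix="_x", rsuffix="_y")

print(inner)
print(right_join)
```

Only Player1 exists in both indexes, so the inner join has one row, while the right join keeps all three players from the second frame and fills the missing Points with NaN.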
Merge, Join and Concatenate

Concatenate:
pd.concat([df1, df2])
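A short sketch of what concatenation does with the two example frames: rows are stacked, columns missing on one side are filled with NaN, and the original index values repeat unless `ignore_index=True` is passed. The `ignore_index` variant is an addition not shown in the slides.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Player": ["Player1", "Player2", "Player3"],
    "Points": [8, 9, 5],
    "Title": ["Game1", "Game2", "Game3"],
})
df2 = pd.DataFrame({
    "Player": ["Player1", "Player5", "Player6"],
    "Power": ["Punch", "Kick", "Elbow"],
    "Title": ["Game1", "Game5", "Game6"],
})

stacked = pd.concat([df1, df2])                        # keeps each frame's own index
renumbered = pd.concat([df1, df2], ignore_index=True)  # builds a fresh 0..n-1 index

print(stacked)
print(renumbered)
```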
Pandas DataFrame Methods

Mean:
Product_Review.mean(numeric_only=True)  # numeric_only=True skips text columns; pandas 2.0+ raises an error without it

Median:
Product_Review.median(numeric_only=True)

Standard Deviation:
Product_Review.std(numeric_only=True)

Maximum of each column:
Product_Review.max()

Minimum of each column:
Product_Review.min()

Count of non-null values in each column:
Product_Review.count()

Summary statistics for numerical columns:
Product_Review.describe()
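The following sketch runs two of these methods on a small made-up DataFrame with one text column and one numeric column. The data is an assumption; it only stands in for the review dataset.

```python
import pandas as pd

# Hypothetical stand-in for the Product_Review dataset.
reviews = pd.DataFrame({
    "Product_Title": ["a", "b", "c", "d"],
    "Product_Rating": [5, 3, 4, 2],
})

# numeric_only=True restricts the reduction to numeric columns.
means = reviews.mean(numeric_only=True)

# describe() summarises only the numeric columns of a mixed DataFrame by default.
summary = reviews.describe()

print(means)
print(summary)
```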
Pandas Mathematical Operations

e.g.: Divide every value in the Product_Rating column by 2:

Product_Review["Product_Rating"] / 2

Manipulating Data-set

DataFrame Indexing

01 Selecting by Position
02 Selecting by Label

DataFrame Indexing

Selecting by Position:

Product_Review.iloc[:, 0]      # all rows, first column
Product_Review.iloc[0:5, 4]    # first five rows, fifth column
Product_Review.iloc[:, :]      # all rows, all columns
Product_Review.iloc[6:, 4:]    # from the seventh row and the fifth column onward

Product_Reviews = Product_Review.iloc[:, 1]    # store the second column in a new variable
Product_Reviews.head()

DataFrame Indexing

Selecting by label:

Product_Review.loc[:5, "Product_Title"]
Product_Review.loc[:5, ["Product_Title", "Product_Rating"]]    # several columns go in a list
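One detail worth seeing in action is that `iloc` slices exclude the end position while `loc` slices include the end label. The sketch below uses a made-up DataFrame with the default integer index; the data is an assumption.

```python
import pandas as pd

# Hypothetical stand-in for the Product_Review dataset, with index labels 0..7.
reviews = pd.DataFrame({
    "Product_Title": [f"item{i}" for i in range(8)],
    "Product_Rating": [5, 3, 4, 2, 5, 1, 4, 3],
})

# Position-based: rows 0 to 4 (end excluded), second column.
by_position = reviews.iloc[0:5, 1]

# Label-based: rows labelled 0 to 5 (end included), two named columns.
by_label = reviews.loc[:5, ["Product_Title", "Product_Rating"]]

print(len(by_position), len(by_label))
```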
DataFrame Setting

Setting a value to one specific column:

Product_Review['Platform'] = 6
Product_Review

Applying Functions on DataFrame

Double up all numeric values using a lambda function:

f = lambda x: x*2
df.apply(f)
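If the DataFrame also holds text columns, multiplying by 2 would repeat the strings rather than double a number. One possible approach, sketched here with made-up data, is to select the numeric columns first with `select_dtypes`.

```python
import pandas as pd

# Hypothetical DataFrame with one text column and two numeric columns.
df = pd.DataFrame({"name": ["a", "b"], "x": [1, 2], "y": [1.5, 2.5]})

f = lambda x: x * 2

# Keep only numeric columns, then apply the function column by column.
doubled = df.select_dtypes(include="number").apply(f)
print(doubled)
```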
Pandas DataFrame Sorting

By default, sorting is in ascending order:

Product_Review.sort_values(by='Product_Rating')

For descending order, set ascending=False:

Product_Review.sort_values('Product_Rating', ascending=False)
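A small sketch of both sort directions on made-up ratings. `sort_values` returns a new DataFrame and keeps the original index labels attached to their rows.

```python
import pandas as pd

# Hypothetical stand-in for the Product_Review dataset.
reviews = pd.DataFrame({
    "Product_Title": ["a", "b", "c"],
    "Product_Rating": [3, 5, 1],
})

ascending = reviews.sort_values(by="Product_Rating")
descending = reviews.sort_values("Product_Rating", ascending=False)

print(ascending)
print(descending)
```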
Pandas DataFrame Ranking

Rank the Product_Rating column:

Product_Review["Product_Rating"].rank()
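To illustrate how ties are handled, the sketch below ranks a made-up ratings Series. By default tied values share the average of their ranks; the `method` and `ascending` arguments shown in the second call are additions not used in the slides.

```python
import pandas as pd

# Hypothetical ratings with a tie between the two 5s.
ratings = pd.Series([5, 3, 5, 1])

# Default: ascending ranks, ties receive the average rank.
default_rank = ratings.rank()

# Highest rating gets rank 1, ties receive the lowest rank in the group.
top_first = ratings.rank(method="min", ascending=False)

print(default_rank.tolist(), top_first.tolist())
```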
Pandas DataFrame Dropping

Drop the Product_Rating column from the dataset:

Product_Review.drop('Product_Rating', axis=1)
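Note that `drop` returns a new DataFrame and leaves the original untouched unless the result is assigned back. A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical stand-in for the Product_Review dataset.
reviews = pd.DataFrame({
    "Product_Title": ["a", "b"],
    "Product_Rating": [5, 3],
})

# axis=1 means "drop a column"; the result is a new DataFrame.
without_rating = reviews.drop("Product_Rating", axis=1)

print(list(reviews.columns))
print(list(without_rating.columns))
```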
Pandas DataFrame Filtering

Filtering the column by value:

filter1 = Product_Review["Product_Rating"] > 3
filter1.head()

filtered_new = Product_Review[filter1]
filtered_new.head()

Filtering the column by numeric and Boolean value:

filter2 = (Product_Review["Product_Rating"] > 3) & (Product_Review["Product_Category"] == "Footwear")
filtered_review = Product_Review[filter2]
filtered_review
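The filter is a Boolean Series with one True/False value per row, and indexing the DataFrame with it keeps only the True rows. The sketch below shows this on made-up data; each condition is wrapped in parentheses because `&` binds more tightly than the comparison operators.

```python
import pandas as pd

# Hypothetical stand-in for the Product_Review dataset.
reviews = pd.DataFrame({
    "Product_Category": ["Footwear", "Toys", "Footwear", "Toys"],
    "Product_Rating": [5, 4, 2, 1],
})

# Combine two conditions element-wise with & (logical AND).
mask = (reviews["Product_Rating"] > 3) & (reviews["Product_Category"] == "Footwear")

filtered = reviews[mask]
print(mask.tolist())
print(filtered)
```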
Data-set Visualization

Data Visualization

Histogram:

%matplotlib inline
Product_Review[Product_Review["Product_Category"] == "Footwear"]["Product_Rating"].plot(kind="hist")

Data Visualization Using Pandas

Scatter plot:

Product_Review.plot.scatter(x="Product_Launch_Year", y="Product_Rating")

Data Visualization

DataFrame.plot():

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(20, 4), index=pd.date_range('1/1/2019', periods=20), columns=list('PQRS'))
df.plot()

Data Visualization

DataFrame.plot.bar():

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(20, 4), columns=['p', 'q', 'r', 's'])
df.plot.bar()

Data Visualization

DataFrame.plot.bar(stacked=True):

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(20, 4), columns=['p', 'q', 'r', 's'])
df.plot.bar(stacked=True)

Data Visualization

DataFrame.plot.barh(stacked=True):

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(20, 4), columns=['p', 'q', 'r', 's'])
df.plot.barh(stacked=True)

Data Visualization

DataFrame.plot.hist():

import pandas as pd
import numpy as np

# The dictionary keys must match the names in columns; a mismatched key yields an empty column
df = pd.DataFrame({'p': np.random.randn(500) + 1, 'q': np.random.randn(500), 'r': np.random.randn(500) - 1}, columns=['p', 'q', 'r'])
df.plot.hist(bins=20)

Summary

• Introduction to Pandas
• Importing Convention
• Data Analyzing
• Data Manipulation
• Data Visualization