Exercise 3

Uploaded by

Ram Aypn

Exercise 3: Working with Pandas DataFrames

Aim:

To perform functions for analyzing, cleaning, exploring, and manipulating data

Description:

1. Import Pandas:

Once Pandas is installed, import it in your applications by adding the import keyword:

import pandas
mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}
myvar = pandas.DataFrame(mydataset)
print(myvar)
Create an alias with the as keyword while importing:

import pandas as pd

Now the Pandas package can be referred to as pd instead of pandas.

Checking Pandas Version:

The version string is stored in the __version__ attribute.

import pandas as pd
print(pd.__version__)

Output: 1.0.3 (the exact value depends on the installed version)

2. Pandas Series:

A Pandas Series is like a column in a table. It is a one-dimensional array holding data of any
type.
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
Return the first value of the Series:

print(myvar[0])

Output: 1

Create Labels:

With the index argument, you can name your own labels.

import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)
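
Once labels have been created, an item can be accessed by referring to its label. A minimal sketch, reusing the same values as above:

```python
import pandas as pd

a = [1, 7, 2]
myvar = pd.Series(a, index=["x", "y", "z"])

# Access a single value by its label instead of its position
y_value = myvar["y"]
print(y_value)
```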

Key/Value Objects as Series:

Create a simple Pandas Series from a dictionary:

import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)
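
The keys of the dictionary become the labels. As a small sketch, the index argument can also be used to keep only some of the dictionary items (the choice of "day1" and "day2" here is just for illustration):

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# Only the items whose keys are listed in index are included
myvar = pd.Series(calories, index=["day1", "day2"])
print(myvar)
```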
3. DataFrames:

Data sets in Pandas are usually multi-dimensional tables, called DataFrames. A Series is
like a column; a DataFrame is the whole table.

Create a DataFrame from two columns of data:

import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df)

Locate Row:

The DataFrame is like a table with rows and columns. Pandas uses the loc attribute to
return one or more specified row(s).

To return row 0:

#refer to the row index

print(df.loc[0])
To return rows 0 and 1:

#use a list of indexes:

print(df.loc[[0, 1]])
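
The two forms of loc return different kinds of objects. The following self-contained sketch, which rebuilds the same small table, shows that a single index gives a Series while a list of indexes gives a DataFrame:

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data)

row = df.loc[0]        # single index: the row comes back as a Series
rows = df.loc[[0, 1]]  # list of indexes: the rows come back as a DataFrame

print(type(row).__name__)
print(type(rows).__name__)
```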

Named Indexes:
With the index argument, you can name your own indexes.
Add a list of names to give each row a name:
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df)
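
With named indexes, loc can be given the name of a row instead of its number. A short sketch using the same data:

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# Refer to the named index to return that row
day2 = df.loc["day2"]
print(day2)
```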

Load a CSV file into a Pandas DataFrame:

import pandas as pd
df = pd.read_csv('data.csv')
print(df)
Read CSV Files:
A simple way to store big data sets is to use CSV files (comma-separated values files). CSV files
contain plain text and are a well-known format that can be read by most tools, including Pandas.
Here a CSV file called 'data.csv' is used. to_string() is used to print the entire DataFrame.

import pandas as pd
df = pd.read_csv('data.csv')
print(df.to_string())
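
If 'data.csv' is not at hand, the same calls can be tried on CSV text held in memory. This is only a sketch: the three rows below are made-up sample values, not the contents of the real file, and only the column names "Duration" and "Calories" are taken from this exercise.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the contents of data.csv
csv_text = "Duration,Calories\n60,409.1\n45,282.4\n30,195.0\n"

# read_csv accepts a file-like object as well as a file name
df = pd.read_csv(io.StringIO(csv_text))
print(df.to_string())
```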

Find max_rows:

The number of rows returned when printing a DataFrame is defined in the Pandas option
settings. You can check your system's maximum rows with the pd.options.display.max_rows statement.

Check the number of maximum returned rows:

import pandas as pd
print(pd.options.display.max_rows)

Output: 60
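
The setting can also be changed. One possible approach, sketched here with a made-up 100-row DataFrame, is pd.option_context, which changes the option only inside the with block:

```python
import pandas as pd

# Hypothetical data, just large enough to be truncated when printed
df = pd.DataFrame({"value": range(100)})

before = pd.get_option("display.max_rows")

# Temporarily lower the limit; the previous setting is restored on exit
with pd.option_context("display.max_rows", 10):
    inside = pd.get_option("display.max_rows")
    print(df)  # printed in shortened form

after = pd.get_option("display.max_rows")
```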

4. Analyzing DataFrames:

Viewing the Data

The head() method returns the headers and a specified number of rows, starting from the top.
import pandas as pd
df = pd.read_csv('data.csv')
print(df.head(10))

Print the first 5 rows of the DataFrame:

import pandas as pd
df = pd.read_csv('data.csv')
print(df.head())

Print the last 5 rows of the DataFrame:

There is also a tail() method for viewing the last rows of the DataFrame.
The tail() method returns the headers and a specified number of rows, starting from the
bottom.

print(df.tail())
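
Both methods accept the number of rows as an argument. A small sketch with made-up values shows which rows are returned:

```python
import pandas as pd

# Hypothetical values; any DataFrame works the same way
df = pd.DataFrame({"Duration": [60, 60, 45, 45, 30, 60, 45, 30]})

first_three = df.head(3)  # rows 0, 1 and 2
last_two = df.tail(2)     # rows 6 and 7

print(first_three)
print(last_two)
```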
5. Data Cleaning

Data cleaning means fixing bad data in the data set.

Bad data could be:

• Empty cells
• Data in wrong format
• Wrong data
• Duplicates

Empty Cells

Empty cells can potentially give you a wrong result when you analyze data.

Remove Rows

One way to deal with empty cells is to remove rows that contain empty cells. This is usually
acceptable, since data sets can be very big and removing a few rows will not have a big impact on the result.

import pandas as pd
df = pd.read_csv('data.csv')
new_df = df.dropna()
print(new_df.to_string())
# In the result, some rows have been removed (rows 18, 22 and 28).
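
By default, dropna() returns a new DataFrame and leaves the original unchanged. A self-contained sketch with made-up values, one of which is empty:

```python
import pandas as pd

# Hypothetical data with one empty cell (None becomes NaN)
df = pd.DataFrame({
    "Duration": [60, 45, 30],
    "Calories": [409.1, None, 195.0],
})

new_df = df.dropna()  # the row with the empty cell is dropped in the copy

print(len(df), len(new_df))
```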

Replace Empty Values:

Another way of dealing with empty cells is to insert a new value instead. The fillna() method
allows us to replace empty cells with a value:

import pandas as pd
df = pd.read_csv('data.csv')
df.fillna(130, inplace = True)

# Empty cells got the value 130 (in rows 18, 22 and 28)
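
The call above replaces empty cells in every column. To fill only one column, fillna() can be applied to that column alone. The sketch below uses made-up values and, as one possible choice, the column mean as the replacement:

```python
import pandas as pd

# Hypothetical data with one empty cell in each column
df = pd.DataFrame({
    "Duration": [60, 45, None],
    "Calories": [400.0, None, 200.0],
})

# Fill only the "Calories" column, here with the mean of that column
mean_calories = df["Calories"].mean()
df["Calories"] = df["Calories"].fillna(mean_calories)

print(df)
```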

Remove Duplicates:

To remove duplicates, use the drop_duplicates() method.

import pandas as pd
df = pd.read_csv('data.csv')
df.drop_duplicates(inplace = True)
print(df.to_string())

#Notice that row 12 has been removed from the result
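
Before removing anything, the duplicated() method can be used to see which rows are duplicates. It returns True for every row that repeats an earlier one. A sketch with made-up values:

```python
import pandas as pd

# Hypothetical data in which the last row repeats the first
df = pd.DataFrame({
    "Duration": [60, 45, 60],
    "Calories": [409.1, 282.4, 409.1],
})

mask = df.duplicated()          # True where a row repeats an earlier row
deduped = df.drop_duplicates()  # returns a copy without those rows

print(mask)
print(deduped)
```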

6. Data Correlations
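A great aspect of the Pandas module is the corr() method. The corr() method calculates the
relationship (correlation) between each pair of columns in your data set. The result is a number
from -1 to 1: values close to 1 or -1 indicate a strong relationship, values close to 0 a weak one.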
import pandas as pd
df = pd.read_csv('data.csv')
print(df.corr())
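
To see what the result looks like, here is a sketch on made-up numbers in which "Calories" is always exactly ten times "Duration", so the two columns are perfectly related:

```python
import pandas as pd

# Hypothetical data: Calories is exactly 10 times Duration
df = pd.DataFrame({
    "Duration": [30, 45, 60, 90],
    "Calories": [300, 450, 600, 900],
})

corr = df.corr()  # table with one row and one column per input column

# Look up the value for one pair of columns
duration_vs_calories = corr.loc["Duration", "Calories"]
print(corr)
```

Each column always has a perfect relationship with itself, so the diagonal of the table is 1.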

7. Plotting

Pandas uses the plot() method to create diagrams. Pyplot, a submodule of the Matplotlib
library, is used to display the diagram on the screen.

import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv')
df.plot()
plt.show()
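
When no screen is available (for example on a server), the diagram can be written out as an image instead of being shown. The sketch below is one way to do this, assuming Matplotlib's non-interactive 'Agg' backend; the data values are made up and the image is kept in memory rather than in a file:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values standing in for data.csv
df = pd.DataFrame({"Duration": [60, 45, 30], "Calories": [409.1, 282.4, 195.0]})

ax = df.plot()  # plot() returns the Matplotlib Axes it drew on

# Write the figure as PNG into an in-memory buffer
buffer = io.BytesIO()
ax.figure.savefig(buffer, format="png")
plt.close(ax.figure)

png_bytes = buffer.getvalue()
```

Passing a file name such as 'plot.png' to savefig() instead of the buffer would store the image on disk.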
Scatter Plot:

Specify that you want a scatter plot with the kind argument:
kind = 'scatter'
A scatter plot needs an x- and a y-axis. Here, "Duration" is used for the x-axis and "Calories" for the y-axis.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')
df.plot(kind = 'scatter', x = 'Duration', y = 'Calories')
plt.show()

Histogram:

The kind argument is used to specify a histogram:

kind = 'hist'

A histogram needs only one column. It shows the frequency of each interval, e.g. how many
workouts lasted between 50 and 60 minutes.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')
df["Duration"].plot(kind = 'hist')
plt.show()
