0% found this document useful (0 votes)

5 views

Python Pandas 1

The document is a comprehensive guide on data cleaning using Python Pandas, covering various topics such as Series and DataFrame operations, data manipulation techniques, and statistical functions. It includes practical examples and code snippets for creating, modifying, and analyzing data structures. Key sections include slicing, appending, and performing operations on Series and DataFrames, as well as handling NULL values and descriptive statistics.

Uploaded by

germastudio5b

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

5 views

Python Pandas 1

Uploaded by

germastudio5b

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 86

Master

Data Cleaning
with Python Pandas

Rajesh Jogi

1/86
Table of Contents
 1 Different Things on Series with Pandas
1.1 Slicing Series
1.2 Append Series
1.3 Operation on Series
 2 DataFrame
2.1 Create DataFrame
2.2 Dataframe of Random Numbers with Date Indices
2.3 Delete Column in DataFrame
2.4 Data Selection in Dataframe
2.5 Set Value
2.6 Dealing with NULL Values
2.7 Descriptive Statistics
 3 Apply function on Dataframe
3.1 Merge Dataframes
3.2 Importing multiple CSV files in DataFrame
4 LIKE OPERATION IN PANDAS
5 Regex in Pandas dataframe
6 Replace values in dataframe
7 Group By
8 Loading Data in Chunks
9 Stack & unstack in Pandas
10 PIVOT Tables
11 Hierarchical indexing
12 SWAP Columns in Hierarchical indexing
13 Crosstab in Pandas
 14 Row & Column Bind
14.1 Row Bind
14.2 Column Bind

In [275]: import pandas as pd

import numpy as np
import matplotlib.pyplot as plt
import glob
import re
import math

In [2]:▾ # For apply Theme

# ! pip install --upgrade jupyterthemes
#jt -t oceans16
# For reset theme
# !jt -r

2/86
In [3]:▾ # Create series from numpy array
v = np.array([1,2,3,4,5,6,7])
s1 = pd.Series(v)
s1

Out[3]: 0 1
1 2
2 3
3 4
4 5
5 6
6 7
dtype: int32

In [4]:▾ #Datatype of Series

s1.dtype

Out[4]: dtype('int32')

In [5]:▾ # number of bytes allocated to each item

s1.itemsize

D:\Software_installed\python 3.8.3\lib\site-packages\ipykernel_launcher.py:2: F
utureWarning: Series.itemsize is deprecated and will be removed in a future ver
sion

Out[5]: 4

In [6]:▾ # number of bytes consumed by series

s1.nbytes

Out[6]: 28

In [7]:▾ # shape of series

s1.shape

Out[7]: (7,)

In [8]:▾ # number of dimensions

s1.ndim

Out[8]: 1

In [9]:▾ # Length of Series

len(s1)

Out[9]: 7

In [10]: s1.count()

Out[10]: 7

3/86
In [11]: s1.size

Out[11]: 7

In [12]:▾ # Create Series from list

s0 = pd.Series([1,2,3], index = ['a','b','c'])
s0

Out[12]: a 1
b 2
c 3
dtype: int64

In [13]:▾ # Modifying index in Series

X= np.array(['a','b','c','d','e','f','g'])
s1.index = X
s1

Out[13]: a 1
b 2
c 3
d 4
e 5
f 6
g 7
dtype: int32

In [14]:▾ # Creating series using Random and Range function

v2 = np.random.random(10) ind2 = np.arange(0,10)
s = pd.Series(v2,ind2) v2

Out[14]: array([0.2557229 , 0.33307615, 0.52462706, 0.95613077, 0.93835988,

0.28024331, 0.32528214, 0.66792633, 0.92824013, 0.22945246])

In [15]: ind2,v2

Out[15]: (array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),

array([0.2557229 , 0.33307615, 0.52462706, 0.95613077, 0.93835988,
0.28024331, 0.32528214, 0.66792633, 0.92824013, 0.22945246]))

In [16]: dict = {'a1':10, 'a2':20, 'a3':30, 'a4':40}

s3 = pd.Series(dict)
s3

Out[16]: a1 10
a2 20
a3 30
a4 40
dtype: int64

4/86
In [17]: pd.Series(99, index = [0,1,2,3,4,5])

Out[17]: 0 99
1 99
2 99
3 99
4 99
5 99
dtype: int64

 1 Different Things on Series with

Pandas

 1.1 Slicing Series

In [18]: s

Out[18]: 0 0.255723
1 0.333076
2 0.524627
3 0.956131
4 0.938360
5 0.280243
6 0.325282
7 0.667926
8 0.928240
9 0.229452
dtype: float64

In [19]:▾ # Return all elements of the series

s[:]

Out[19]: 0 0.255723
1 0.333076
2 0.524627
3 0.956131
4 0.938360
5 0.280243
6 0.325282
7 0.667926
8 0.928240
9 0.229452
dtype: float64

In [20]:▾ # First three element of the Series

s[0:3]

Out[20]: 0 0.255723
1 0.333076
2 0.524627
dtype: float64
5/86
In [21]:▾ # Last element of the Series
s[-1:]

Out[21]: 9 0.229452
dtype: float64

In [22]:▾ # Fetch first 4 elements in a series

s[:4]

Out[22]: 0 0.255723
1 0.333076
2 0.524627
3 0.956131
dtype: float64

In [23]:▾ # Return all elements of the series except last two elements.
s[:-2]

Out[23]: 0 0.255723
1 0.333076
2 0.524627
3 0.956131
4 0.938360
5 0.280243
6 0.325282
7 0.667926
dtype: float64

In [24]:▾ # Return all elements of the series except last element.

s[:-1]

Out[24]: 0 0.255723
1 0.333076
2 0.524627
3 0.956131
4 0.938360
5 0.280243
6 0.325282
7 0.667926
8 0.928240
dtype: float64

In [25]:▾ # Return last two elements of the series

s[-2:]

Out[25]: 8 0.928240
9 0.229452
dtype: float64

In [26]:▾ # Return last element of the series

s[-1:]

Out[26]: 9 0.229452
dtype: float64

6/86
In [27]: s[-3:-1]

Out[27]: 7 0.667926
8 0.928240
dtype: float64

 1.2 Append Series

In [28]: s2 = s1.copy()
s2

Out[28]: a 1
b 2
c 3
d 4
e 5
f 6
g 7
dtype: int32

In [29]: s3

Out[29]: a1 10
a2 20
a3 30
a4 40
dtype: int64

In [30]:▾ # Append S2 & S3 Series

s4 = s2.append(s3)
s4

Out[30]: a 1
b 2
c 3
d 4
e 5
f 6
g 7
a1 10
a2 20
a3 30
a4 40
dtype: int64

7/86
In [31]:▾ # When "inplace=False" it will return a new copy of data with the operation per
s4.drop('a4', inplace=False)

Out[31]: a 1
b 2
c 3
d 4
e 5
f 6
g 7
a1 10
a2 20
a3 30
dtype: int64

In [32]: s4

Out[32]: a 1
b 2
c 3
d 4
e 5
f 6
g 7
a1 10
a2 20
a3 30
a4 40
dtype: int64

In [33]:▾ # When we use "inplace=True" it will affect the dataframe

s4.drop('a4', inplace=True)
s4

Out[33]: a 1
b 2
c 3
d 4
e 5
f 6
g 7
a1 10
a2 20
a3 30
dtype: int64

8/86
In [34]: s4 = s4.append(pd.Series({'a4':7}))
s4

Out[34]: a 1
b 2
c 3
d 4
e 5
f 6
g 7
a1 10
a2 20
a3 30
a4 7
dtype: int64

 1.3 Operation on Series

In [35]: v1 = np.array([10,20,30])
v2 = np.array([1,2,3]) s1
= pd.Series(v1) s2 =
pd.Series(v2) s1, s2

Out[35]: (0 10
1 20
2 30
dtype: int32, 0 1
1 2
2 3
dtype: int32)

In [36]:▾ # Addition of two series

s1.add(s2)

Out[36]: 0 11
1 22
2 33
dtype: int32

In [37]:▾ # Subtraction of two series

s1.sub(s2)

Out[37]: 0 9
1 18
2 27
dtype: int32

9/86
In [38]:▾ # Subtraction of two series
s1.subtract(s2)

Out[38]: 0 9
1 18
2 27
dtype: int32

In [39]:▾ # Increment all numbers in a series by 9

s1.add(9)

Out[39]: 0 19
1 29
2 39
dtype: int32

In [40]:▾ # Multiplication of two series

s1.mul(s2)

Out[40]: 0 10
1 40
2 90
dtype: int32

In [41]:▾ # Multiplication of two series

s1.multiply(s2)

Out[41]: 0 10
1 40
2 90
dtype: int32

In [42]:▾ # Multiply each element by 1000

s1.mul(1000)

Out[42]: 0 10000
1 20000
2 30000
dtype: int32

In [43]:▾ # Division
s1.divide(s2)

Out[43]: 0 10.0
1 10.0
2 10.0
dtype: float64

10/86
In [44]:▾ # Division
s1.div(s2)

Out[44]: 0 10.0
1 10.0
2 10.0
dtype: float64

In [45]:▾ # MAX number in a series

s1.max()

Out[45]: 30

In [46]:▾ # Min number in a series

s1.min()

Out[46]: 10

In [47]:▾ # Average
s1.mean()

Out[47]: 20.0

In [48]:▾ # median
s1.median()

Out[48]: 20.0

In [49]:▾ # Standard Deviation

s1.std()

Out[49]: 10.0

In [50]:▾ # Series comparison

s1.equals(s2)

Out[50]: False

In [51]: s4 =s1

In [52]:▾ # Series comparison

s1.equals(s4)

Out[52]: True

11/86
In [53]: s5 = pd.Series([1,1,2,2,3,3], index = [0,1,2,3,4,5])
s5

Out[53]: 0 1
1 1
2 2
3 2
4 3
5 3
dtype: int64

In [54]:▾ # afind frequency

s5.value_counts()

Out[54]: 3 2
2 2
1 2
dtype: int64

 2 DataFrame

 2.1 Create DataFrame

In [55]: df = pd.DataFrame()
df

Out[55]:

In [56]:▾ # Create Dataframe using List

lang = ['Java', 'Python', 'C', 'C++' ]
df = pd.DataFrame(lang)
df

Out[56]:
0

0 Java 1

Python 2C

3C++

12/86
In [57]:▾ # Add column in the Dataframe
rating = [1,2,3,4]
df[1] = rating
df

Out[57]:
0 1

0 Java 1 1

Python 2C 2

3C++ 3

In [58]: df.columns = ['Language', 'Rating']

In [59]: df

Out[59]:
Language Rating

0 Java 1

1 Python 2

2 C 3

3 C++ 4

In [60]:▾ # Create Dataframe from Dictionary

data = [{'a':1, 'b':2}, {'a':5, 'b':10, 'c':20}]

df2 = pd.DataFrame(data)
df3 = pd.DataFrame(data, index=['row1', 'row2'], columns = ['a','b'])
df4 = pd.DataFrame(data, index=['row1', 'row2'], columns = ['a','b','c'])
df5 = pd.DataFrame(data, index=['row1', 'row2'], columns = ['a','b','c','d'])

In [61]: df2

Out[61]:
a b c

0 1 2 NaN

1 5 10 20.0

13/86
In [62]: df3

Out[62]:
a b

row 1 2

1 5 10

row

In [63]: df4

Out[63]:
a b c

row 1 2 NaN

1 5 10 20.0

row

2
In [64]: df5

Out[64]:
a b c d

row 1 2 NaN NaN

1 5 10 20.0 NaN

row

2
In [65]:▾ # Create Dataframe from Dictionary
df0 = pd.DataFrame({'ID' :[1,2,3,4], 'Name' :['Aryan','Nayan','John','Rose']})
df0

Out[65]:
ID Name

0 1 Aryan

1 2 Nayan

2 3 John

3 4 Rose

14/86
In [66]:▾ # Create a DataFrame from Dictionary of Series
▾ dict = {'A': pd.Series([1,2,3,], index = ['a','b','c']),
'B': pd.Series([1,2,3,4], index = ['a','b','c','d'])}
df1 = pd.DataFrame(dict)
df1

Out[66]:
A B
a1.0
1
b2.0
2
c3.0
3
dNaN
4

 2.2 Dataframe of Random Numbers with Date

Indices
In [67]: dates = pd.date_range(start='2020-01-20', end='2020-01-26')
dates

Out[67]: DatetimeIndex(['2020-01-20', '2020-01-21', '2020-01-22', '2020-01-23',

'2020-01-24', '2020-01-25', '2020-01-26'],
dtype='datetime64[ns]', freq='D')

In [68]: dates = pd.date_range('today',periods= 7)

dates

Out[68]: DatetimeIndex(['2020-07-11 17:23:40.629449', '2020-07-12 17:23:40.629449',

'2020-07-13 17:23:40.629449', '2020-07-14 17:23:40.629449',
'2020-07-15 17:23:40.629449', '2020-07-16 17:23:40.629449',
'2020-07-17 17:23:40.629449'],
dtype='datetime64[ns]', freq='D')

In [69]: dates = pd.date_range(start='2020-01-20',periods= 7)

dates

Out[69]: DatetimeIndex(['2020-01-20', '2020-01-21', '2020-01-22', '2020-01-23',

'2020-01-24', '2020-01-25', '2020-01-26'],
dtype='datetime64[ns]', freq='D')

15/86
In [70]: M = np.random.random((7,7))
M

Out[70]: array([[0.68096777, 0.6898421 , 0.33241401, 0.8300275 , 0.13412361,

0.6369128 , 0.01566236],
[0.09504884, 0.32464077, 0.85838129, 0.8566076 , 0.78199635,
0.45819325, 0.77872014],
[0.66575749, 0.91854776, 0.61560233, 0.49874779, 0.76408254,
0.1064303 , 0.8557539 ],
[0.67773746, 0.51461509, 0.37812294, 0.82915894, 0.13519876,
0.27856374, 0.37872909],
[0.3849658 , 0.20786193, 0.89258391, 0.89102031, 0.92297737,
0.7354544 , 0.51148957],
[0.77159037, 0.16750849, 0.21146916, 0.96990117, 0.70372491,
0.7523114 , 0.59170043],
[0.03378018, 0.94361344, 0.74223457, 0.66373513, 0.43151117,
0.55326425, 0.23994092]])

In [71]: dframe = pd.DataFrame(M, index= dates)

dframe

Out[71]:
0 1 2 3 4 5 6

2020-01-20 0.680968 0.689842 0.332414 0.830028 0.134124 0.636913 0.015662

2020-01-21 0.095049 0.324641 0.858381 0.856608 0.781996 0.458193 0.778720

2020-01-22 0.665757 0.918548 0.615602 0.498748 0.764083 0.106430 0.855754

2020-01-23 0.677737 0.514615 0.378123 0.829159 0.135199 0.278564 0.378729

2020-01-24 0.384966 0.207862 0.892584 0.891020 0.922977 0.735454 0.511490

2020-01-25 0.771590 0.167508 0.211469 0.969901 0.703725 0.752311 0.591700

2020-01-26 0.033780 0.943613 0.742235 0.663735 0.431511 0.553264 0.239941

16/86
In [72]:▾ #Changing Column Names
dframe.columns = ['C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7']
dframe

Out[72]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 0.680968 0.689842 0.332414 0.830028 0.134124 0.636913 0.015662

2020-01-21 0.095049 0.324641 0.858381 0.856608 0.781996 0.458193 0.778720

2020-01-22 0.665757 0.918548 0.615602 0.498748 0.764083 0.106430 0.855754

2020-01-23 0.677737 0.514615 0.378123 0.829159 0.135199 0.278564 0.378729

2020-01-24 0.384966 0.207862 0.892584 0.891020 0.922977 0.735454 0.511490

2020-01-25 0.771590 0.167508 0.211469 0.969901 0.703725 0.752311 0.591700

2020-01-26 0.033780 0.943613 0.742235 0.663735 0.431511 0.553264 0.239941

In [73]:▾ # List Index

dframe.index

Out[73]: DatetimeIndex(['2020-01-20', '2020-01-21', '2020-01-22', '2020-01-23',

'2020-01-24', '2020-01-25', '2020-01-26'],
dtype='datetime64[ns]', freq='D')

In [74]:▾ # List Column Names

dframe.columns

Out[74]: Index(['C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7'], dtype='object')

In [75]:▾ # Datatype of each column

dframe.dtypes

Out[75]: C1 float64
C2 float64
C3 float64
C4 float64
C5 float64
C6 float64
C7 float64
dtype: object

17/86
In [76]:▾ # Sort Dataframe by Column 'C1' in Ascending Order
dframe.sort_values(by='C1')

Out[76]:
C1 C2 C3 C4 C5 C6 C7

2020-01-26 0.033780 0.943613 0.742235 0.663735 0.431511 0.553264 0.239941

2020-01-21 0.095049 0.324641 0.858381 0.856608 0.781996 0.458193 0.778720

2020-01-24 0.384966 0.207862 0.892584 0.891020 0.922977 0.735454 0.511490

2020-01-22 0.665757 0.918548 0.615602 0.498748 0.764083 0.106430 0.855754

2020-01-23 0.677737 0.514615 0.378123 0.829159 0.135199 0.278564 0.378729

2020-01-20 0.680968 0.689842 0.332414 0.830028 0.134124 0.636913 0.015662

2020-01-25 0.771590 0.167508 0.211469 0.969901 0.703725 0.752311 0.591700

In [77]:▾ # Sort Dataframe by Column 'C1' in Descending Order

dframe.sort_values(by='C1' , ascending=False)

Out[77]:
C1 C2 C3 C4 C5 C6 C7

2020-01-25 0.771590 0.167508 0.211469 0.969901 0.703725 0.752311 0.591700

2020-01-20 0.680968 0.689842 0.332414 0.830028 0.134124 0.636913 0.015662

2020-01-23 0.677737 0.514615 0.378123 0.829159 0.135199 0.278564 0.378729

2020-01-22 0.665757 0.918548 0.615602 0.498748 0.764083 0.106430 0.855754

2020-01-24 0.384966 0.207862 0.892584 0.891020 0.922977 0.735454 0.511490

2020-01-21 0.095049 0.324641 0.858381 0.856608 0.781996 0.458193 0.778720

2020-01-26 0.033780 0.943613 0.742235 0.663735 0.431511 0.553264 0.239941

2.3 Delete Column in DataFrame

In [78]: df1

Out[78]:
A B
a1.0
1
b2.0
2
c3.0
3
dNaN
4

18/86
In [79]: del df1['B']

In [80]: df1

Out[80]:
A
a1.0

b2.0

c3.0

dNaN

In [81]: df5

Out[81]:
a b c d

row 1 2 NaN NaN

1 5 10 20.0 NaN

row

2
In [82]:▾ # Delete Column using pop()
df5.pop('c')

Out[82]: row1 NaN

row2 20.0
Name: c, dtype: float64

In [83]: df5

Out[83]:
a b d

row 1 2 NaN

1 5 10 NaN

row

 2.4 Data Selection in Dataframe

19/86
In [84]: df

Out[84]:
Language Rating

0 Java 1

1 Python 2

2 C 3

3 C++ 4

In [85]: df.index = [1,2,3,4]

Out[85]:
Language Rating

1 Java 1

2 Python 2

3 C 3

4 C++ 4

In [86]:▾ # Data selection using row label

df.loc[1]

Out[86]: Language Java

Rating 1
Name: 1, dtype: object

In [87]: df.iloc[1]

Out[87]: Language Python

Rating 2
Name: 2, dtype: object

In [88]: df.loc[1:2]

Out[88]:
Language Rating

1 Java 1

2 Python 2

20/86
In [89]: df.iloc[1:2]

Out[89]:
Language Rating

2 Python 2

In [90]:▾ # Data selection based on Condition

df.loc[df.Rating>2]

Out[90]:
Language Rating

3 C 3

4 C++ 4

In [91]: df1

Out[91]:
A
a1.0

b2.0

c3.0

dNaN

In [92]:▾ # Row & Column label based selection

df1.loc['a']

Out[92]: A 1.0
Name: a, dtype: float64

In [93]:▾ # df1.iloc['a']# This will throw error because iloc will not work on labels

21/86
In [94]: dframe

Out[94]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 0.680968 0.689842 0.332414 0.830028 0.134124 0.636913 0.015662

2020-01-21 0.095049 0.324641 0.858381 0.856608 0.781996 0.458193 0.778720

2020-01-22 0.665757 0.918548 0.615602 0.498748 0.764083 0.106430 0.855754

2020-01-23 0.677737 0.514615 0.378123 0.829159 0.135199 0.278564 0.378729

2020-01-24 0.384966 0.207862 0.892584 0.891020 0.922977 0.735454 0.511490

2020-01-25 0.771590 0.167508 0.211469 0.969901 0.703725 0.752311 0.591700

2020-01-26 0.033780 0.943613 0.742235 0.663735 0.431511 0.553264 0.239941

In [95]:▾ # Data selection using Row Label

dframe['2020-01-20' : '2020-01-22' ]

Out[95]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 0.680968 0.689842 0.332414 0.830028 0.134124 0.636913 0.015662

2020-01-21 0.095049 0.324641 0.858381 0.856608 0.781996 0.458193 0.778720

2020-01-22 0.665757 0.918548 0.615602 0.498748 0.764083 0.106430 0.855754

In [96]:▾ # Selecting all rows & selected columns

dframe.loc[:, ['C1','C7']]

Out[96]:
C1 C7

2020-01-20 0.680968 0.015662

2020-01-21 0.095049 0.778720

2020-01-22 0.665757 0.855754

2020-01-23 0.677737 0.378729

2020-01-24 0.384966 0.511490

2020-01-25 0.771590 0.591700

2020-01-26 0.033780 0.239941

22/86
In [97]:▾ #row & column label based selection
dframe.loc['2020-01-20':'2020-01-22',['C1','C7']]

Out[97]:
C1 C7

2020-01-20 0.680968 0.015662

2020-01-21 0.095049 0.778720

2020-01-22 0.665757 0.855754

In [98]:▾ # Data selection based on Condition

dframe[dframe['C1']>0.5]

Out[98]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 0.680968 0.689842 0.332414 0.830028 0.134124 0.636913 0.015662

2020-01-22 0.665757 0.918548 0.615602 0.498748 0.764083 0.106430 0.855754

2020-01-23 0.677737 0.514615 0.378123 0.829159 0.135199 0.278564 0.378729

2020-01-25 0.771590 0.167508 0.211469 0.969901 0.703725 0.752311 0.591700

In [99]:▾ # Data selection based on Condition

dframe[(dframe['C1'] > 0.5) & (dframe['C4']>0.5)]

Out[99]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 0.680968 0.689842 0.332414 0.830028 0.134124 0.636913 0.015662

2020-01-23 0.677737 0.514615 0.378123 0.829159 0.135199 0.278564 0.378729

2020-01-25 0.771590 0.167508 0.211469 0.969901 0.703725 0.752311 0.591700

In [100]:▾ # Data selection using position (Integer Index based)

dframe.iloc[0][0]

Out[100]: 0.6809677686418316

23/86
In [101]:▾ # # Select all rows & first three columns
dframe.iloc[:,0:3]

Out[101]:
C1 C2 C3

2020-01-20 0.680968 0.689842 0.332414

2020-01-21 0.095049 0.324641 0.858381

2020-01-22 0.665757 0.918548 0.615602

2020-01-23 0.677737 0.514615 0.378123

2020-01-24 0.384966 0.207862 0.892584

2020-01-25 0.771590 0.167508 0.211469

2020-01-26 0.033780 0.943613 0.742235

In [102]: dframe.iloc[0][0] = 10

In [103]:▾ # Display all rows where C1 has value of 10 or 20

dframe[dframe['C1'].isin([10,20])]

Out[103]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 10.0 0.689842 0.332414 0.830028 0.134124 0.636913 0.015662

 2.5 Set Value

In [104]:▾ # Set value of 888 for all elements in column 'C1'
dframe['C1'] = 888
dframe

Out[104]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 888 0.689842 0.332414 0.830028 0.134124 0.636913 0.015662

2020-01-21 888 0.324641 0.858381 0.856608 0.781996 0.458193 0.778720

2020-01-22 888 0.918548 0.615602 0.498748 0.764083 0.106430 0.855754

2020-01-23 888 0.514615 0.378123 0.829159 0.135199 0.278564 0.378729

2020-01-24 888 0.207862 0.892584 0.891020 0.922977 0.735454 0.511490

2020-01-25 888 0.167508 0.211469 0.969901 0.703725 0.752311 0.591700

2020-01-26 888 0.943613 0.742235 0.663735 0.431511 0.553264 0.239941

24/86
In [105]:▾ # Set value of 777 for first three rows in Column 'C6'
dframe.at[0:3,'C6']=777
dframe

Out[105]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 888 0.689842 0.332414 0.830028 0.134124 777.000000 0.015662

2020-01-21 888 0.324641 0.858381 0.856608 0.781996 777.000000 0.778720

2020-01-22 888 0.918548 0.615602 0.498748 0.764083 777.000000 0.855754

2020-01-23 888 0.514615 0.378123 0.829159 0.135199 0.278564 0.378729

2020-01-24 888 0.207862 0.892584 0.891020 0.922977 0.735454 0.511490

2020-01-25 888 0.167508 0.211469 0.969901 0.703725 0.752311 0.591700

2020-01-26 888 0.943613 0.742235 0.663735 0.431511 0.553264 0.239941

In [106]:▾ # Set value of 333 in first row and third column

dframe.iat[0,2] = 333
dframe

Out[106]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 888 0.689842 333.000000 0.830028 0.134124 777.000000 0.015662

2020-01-21 888 0.324641 0.858381 0.856608 0.781996 777.000000 0.778720

2020-01-22 888 0.918548 0.615602 0.498748 0.764083 777.000000 0.855754

2020-01-23 888 0.514615 0.378123 0.829159 0.135199 0.278564 0.378729

2020-01-24 888 0.207862 0.892584 0.891020 0.922977 0.735454 0.511490

2020-01-25 888 0.167508 0.211469 0.969901 0.703725 0.752311 0.591700

2020-01-26 888 0.943613 0.742235 0.663735 0.431511 0.553264 0.239941

25/86
In [107]: dframe.iloc[0,2] = 555
dframe

Out[107]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 888 0.689842 555.000000 0.830028 0.134124 777.000000 0.015662

2020-01-21 888 0.324641 0.858381 0.856608 0.781996 777.000000 0.778720

2020-01-22 888 0.918548 0.615602 0.498748 0.764083 777.000000 0.855754

2020-01-23 888 0.514615 0.378123 0.829159 0.135199 0.278564 0.378729

2020-01-24 888 0.207862 0.892584 0.891020 0.922977 0.735454 0.511490

2020-01-25 888 0.167508 0.211469 0.969901 0.703725 0.752311 0.591700

2020-01-26 888 0.943613 0.742235 0.663735 0.431511 0.553264 0.239941

In [108]:▾ # Create Copy of the calling objects data along with indices.
# Modifications to the data or indices of the copy will not be reflected in the
dframe1 = dframe.copy(deep=True)

In [109]: dframe1[(dframe1['C1']>0.5) & (dframe1['C4']>0.5)]=0

In [110]: dframe1[dframe1['C1']==0]

Out[110]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 0 0.0 0.0 0.0 0.0 0.0 0.0

2020-01-21 0 0.0 0.0 0.0 0.0 0.0 0.0

2020-01-23 0 0.0 0.0 0.0 0.0 0.0 0.0

2020-01-24 0 0.0 0.0 0.0 0.0 0.0 0.0

2020-01-25 0 0.0 0.0 0.0 0.0 0.0 0.0

2020-01-26 0 0.0 0.0 0.0 0.0 0.0 0.0

26/86
In [111]:▾ # Replace zeros in Column C1 with 99
dframe1[dframe1['C1'].isin([0])]=99
dframe1

Out[111]:
C1 C2 C3 C4 C5 C6 C7
99.000000 99.000000 99.000000 99.000000 99.000000
2020-01-20 99 99.0
99.000000 99.000000 99.000000 99.000000 99.000000
2020-01-21 99 99.0
0.918548 0.615602 0.498748 0.764083 0.855754
2020-01-22 888 777.0
99.000000 99.000000 99.000000 99.000000 99.000000
2020-01-23 99 99.0
99.000000 99.000000 99.000000 99.000000 99.000000
2020-01-24 99 99.0
99.000000 99.000000 99.000000 99.000000 99.000000
2020-01-25 99 99.0
99.000000 99.000000 99.000000 99.000000 99.000000
2020-01-26 99 99.0

In [112]: dframe

Out[112]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 888 0.689842 555.000000 0.830028 0.134124 777.000000 0.015662

2020-01-21 888 0.324641 0.858381 0.856608 0.781996 777.000000 0.778720

2020-01-22 888 0.918548 0.615602 0.498748 0.764083 777.000000 0.855754

2020-01-23 888 0.514615 0.378123 0.829159 0.135199 0.278564 0.378729

2020-01-24 888 0.207862 0.892584 0.891020 0.922977 0.735454 0.511490

2020-01-25 888 0.167508 0.211469 0.969901 0.703725 0.752311 0.591700

2020-01-26 888 0.943613 0.742235 0.663735 0.431511 0.553264 0.239941

In [113]:▾ # Display all rows where value of C1 is 99

dframe1[dframe1['C1']==99]

Out[113]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 99 99.0 99.0 99.0 99.0 99.0 99.0

2020-01-21 99 99.0 99.0 99.0 99.0 99.0 99.0

2020-01-23 99 99.0 99.0 99.0 99.0 99.0 99.0

2020-01-24 99 99.0 99.0 99.0 99.0 99.0 99.0

2020-01-25 99 99.0 99.0 99.0 99.0 99.0 99.0

2020-01-26 99 99.0 99.0 99.0 99.0 99.0 99.0

27/86
 2.6 Dealing with NULL Values
In [114]: dframe.at[0:8,'C7']=np.NaN
dframe.at[0:2,'C6']=np.NaN
dframe.at[5:6,'C5']=np.NaN
dframe

Out[114]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 888 0.689842 555.000000 0.830028 0.134124 NaN NaN

2020-01-21 888 0.324641 0.858381 0.856608 0.781996 NaN NaN

2020-01-22 888 0.918548 0.615602 0.498748 0.764083 777.000000 NaN

2020-01-23 888 0.514615 0.378123 0.829159 0.135199 0.278564 NaN

2020-01-24 888 0.207862 0.892584 0.891020 0.922977 0.735454 NaN

2020-01-25 888 0.167508 0.211469 0.969901 NaN 0.752311 NaN

2020-01-26 888 0.943613 0.742235 0.663735 0.431511 0.553264 NaN

In [115]:▾ # Detect Non-Missing Values

# It will return True for NOT-NULL values and False for NULL values
dframe.notna()

Out[115]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 Tru Tru Tru Tru True False False

2020-01-21 e e e e True False False

2020-01-22 Tru Tru Tru Tru True True False

2020-01-23 e e e e True True False

2020-01-24 Tru Tru Tru Tru True True False

2020-01-25 e e e e False True False

2020-01-26 Tru Tru Tru Tru True True False

e e e e

Tru Tru Tru Tru

e e e e

Tru Tru Tru Tru

e e e e

Tru Tru Tru Tru

e e e e

28/86
In [116]:▾ # Detect Missing or NULL Values
# It will return True for NULL values and False for NOT-NULL values
dframe.isna()

Out[116]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 False False False False False True Tru

2020-01-21 False False False False False True e

2020-01-22 False False False False False False Tru

2020-01-23 False False False False False False e

2020-01-24 False False False False False False Tru

2020-01-25 False False False False True False e

2020-01-26 False False False False False False Tru

Tru

Tru
In [117]:▾ # Fill all NULL values with 1020
dframe= dframe.fillna(1020) e
dframe Tru

Out[117]: e
C1 C2 C3 C4 C5 C6 C7
1020.0
2020-01-20 888 0.689842 555.000000 0.830028 0.134124 1020.000000
1020.0
2020-01-21 888 0.324641 0.858381 0.856608 0.781996 1020.000000
1020.0
2020-01-22 888 0.918548 0.615602 0.498748 0.764083 777.000000
1020.0
2020-01-23 888 0.514615 0.378123 0.829159 0.135199 0.278564
1020.0
2020-01-24 888 0.207862 0.892584 0.891020 0.922977 0.735454
1020.0
2020-01-25 888 0.167508 0.211469 0.969901 1020.000000 0.752311
1020.0
2020-01-26 888 0.943613 0.742235 0.663735 0.431511 0.553264

29/86
In [118]: dframe.at[0:5 , 'C7'] = np.NaN
dframe.at[0:2 , 'C6'] = np.NaN
dframe.at[5:6 , 'C5'] = np.NaN
dframe

Out[118]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 888 0.689842 555.000000 0.830028 0.134124 NaN NaN

2020-01-21 888 0.324641 0.858381 0.856608 0.781996 NaN NaN

2020-01-22 888 0.918548 0.615602 0.498748 0.764083 777.000000 NaN

2020-01-23 888 0.514615 0.378123 0.829159 0.135199 0.278564 NaN

2020-01-24 888 0.207862 0.892584 0.891020 0.922977 0.735454 NaN

2020-01-25 888 0.167508 0.211469 0.969901 NaN 0.752311 1020.0

2020-01-26 888 0.943613 0.742235 0.663735 0.431511 0.553264 1020.0

In [119]:▾ # Replace Null values in Column 'C5' with number 123

# Replace Null values in Column 'C6' with number 789
dframe.fillna(value={'C5':123, 'C6':789})

Out[119]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 888 0.689842 555.000000 0.830028 0.134124 789.000000 NaN

2020-01-21 888 0.324641 0.858381 0.856608 0.781996 789.000000 NaN

2020-01-22 888 0.918548 0.615602 0.498748 0.764083 777.000000 NaN

2020-01-23 888 0.514615 0.378123 0.829159 0.135199 0.278564 NaN 0.891020

2020-01-24 888 0.207862 0.892584 0.922977 0.735454 NaN 0.969901

2020-01-25 888 0.167508 0.211469 123.000000 0.752311 1020.0 0.663735

2020-01-26 888 0.943613 0.742235 0.431511 0.553264 1020.0

30/86
In [120]:▾ #Replace first NULL value in Column C7 with 789
dframe.fillna(value={'C7':789}, limit=1)

Out[120]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 888 0.689842 555.000000 0.830028 0.134124 NaN 789.0

2020-01-21 888 0.324641 0.858381 0.856608 0.781996 NaN NaN

2020-01-22 888 0.918548 0.615602 0.498748 0.764083 777.000000 NaN

2020-01-23 888 0.514615 0.378123 0.829159 0.135199 0.278564 NaN

2020-01-24 888 0.207862 0.892584 0.891020 0.922977 0.735454 NaN

2020-01-25 888 0.167508 0.211469 0.969901 NaN 0.752311 1020.0

2020-01-26 888 0.943613 0.742235 0.663735 0.431511 0.553264 1020.0

In [121]:▾ # Drop Rows with NULL values

dframe.dropna()

Out[121]:
C1 C2 C3 C4 C5 C6 C7
0.431511 1020.0
2020-01-26 888 0.943613 0.742235 0.663735 0.553264

In [122]:▾ # Drop Columns with NULL values

dframe.dropna(axis='columns')

Out[122]:
C1 C2 C3 C4

2020-01-20 888 0.689842 555.000000 0.830028

2020-01-21 888 0.324641 0.858381 0.856608

2020-01-22 888 0.918548 0.615602 0.498748

2020-01-23 888 0.514615 0.378123 0.829159

2020-01-24 888 0.207862 0.892584 0.891020

2020-01-25 888 0.167508 0.211469 0.969901

2020-01-26 888 0.943613 0.742235 0.663735

31/86
In [123]: dframe

Out[123]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 888 0.689842 555.000000 0.830028 0.134124 NaN NaN

2020-01-21 888 0.324641 0.858381 0.856608 0.781996 NaN NaN

2020-01-22 888 0.918548 0.615602 0.498748 0.764083 777.000000 NaN

2020-01-23 888 0.514615 0.378123 0.829159 0.135199 0.278564 NaN

2020-01-24 888 0.207862 0.892584 0.891020 0.922977 0.735454 NaN

2020-01-25 888 0.167508 0.211469 0.969901 NaN 0.752311 1020.0

2020-01-26 888 0.943613 0.742235 0.663735 0.431511 0.553264 1020.0

In [124]:▾ # Drop Rows with NULL values present in C5 or C6

dframe.dropna(subset=['C5' ,'C6'])

Out[124]:
C1 C2 C3 C4 C5 C6 C7

2020-01-22 888 0.918548 0.615602 0.498748 0.764083 777.000000 NaN

2020-01-23 888 0.514615 0.378123 0.829159 0.135199 0.278564 NaN

2020-01-24 888 0.207862 0.892584 0.891020 0.922977 0.735454 NaN

2020-01-26 888 0.943613 0.742235 0.663735 0.431511 0.553264 1020.0

 2.7 Descriptive Statistics

32/86
In [125]:▾ # Fill NULL values with 55
dframe.fillna(55, inplace=True)
dframe

Out[125]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 888 0.689842 555.000000 0.830028 0.134124 55.000000 55.0 0.856608

2020-01-21 888 0.324641 0.858381 0.781996 55.000000 55.0 0.498748 0.764083

2020-01-22 888 0.918548 0.615602 777.000000 55.0 0.829159 0.135199

2020-01-23 888 0.514615 0.378123 0.278564 55.0 0.891020 0.922977 0.735454

2020-01-24 888 0.207862 0.892584 55.0 0.969901 55.000000 0.752311 1020.0

2020-01-25 888 0.167508 0.211469 0.663735 0.431511 0.553264 1020.0

2020-01-26 888 0.943613 0.742235

In [126]:▾ # Mean of all Columns

dframe.mean()

Out[126]: C1 888.000000
C2 0.538090
C3 79.814056
C4 0.791314
C5 8.309984
C6 127.045656
C7 330.714286
dtype: float64

In [127]:▾ # Max value per column

dframe.max()

Out[127]: C1 888.000000
C2 0.943613
C3 555.000000
C4 0.969901
C5 55.000000
C6 777.000000
C7 1020.000000
dtype: float64

In [128]:▾ # Min value per column

dframe.min()

Out[128]: C1 888.000000
C2 0.167508
C3 0.211469
C4 0.498748
C5 0.134124
C6 0.278564
C7 55.000000
dtype: float64

33/86
In [129]:▾ # Median
dframe.median()

Out[129]: C1 888.000000
C2 0.514615
C3 0.742235
C4 0.830028
C5 0.764083
C6 0.752311
C7 55.000000
dtype: float64

In [130]:▾ #Standard Deviation

dframe.std()

Out[130]: C1 0.000000
C2 0.322676
C3 209.537453
C4 0.158588
C5 20.590770
C6 287.748820
C7 470.871785
dtype: float64

In [131]:▾ # Variance
dframe.var()

Out[131]: C1 0.000000
C2 0.104120
C3 43905.944333
C4 0.025150
C5 423.979805
C6 82799.383192
C7 221720.238095
dtype: float64

In [132]:▾ #Lower Quartile / First Quartile

dframe.quantile(0.25)

Out[132]: C1 888.000000
C2 0.266251
C3 0.496863
C4 0.746447
C5 0.283355
C6 0.644359
C7 55.000000
Name: 0.25, dtype: float64

34/86
In [133]:▾ #Second Quartile / Median
dframe.quantile(0.5)

Out[133]: C1 888.000000
C2 0.514615
C3 0.742235
C4 0.830028
C5 0.764083
C6 0.752311
C7 55.000000
Name: 0.5, dtype: float64

In [134]:▾ # Upper Quartile

dframe.quantile(0.75)

Out[134]: C1 888.000000
C2 0.804195
C3 0.875483
C4 0.873814
C5 0.852487
C6 55.000000
C7 537.500000
Name: 0.75, dtype: float64

In [135]:▾ # IQR (Inter Quartile range)

dframe.quantile(0.75)-dframe.quantile(0.25)

Out[135]: C1 0.000000
C2 0.537944
C3 0.378620
C4 0.127367
C5 0.569132
C6 54.355641
C7 482.500000
dtype: float64

In [136]:▾ # SUM of column values

dframe.sum()

Out[136]: C1 6216.000000
C2 3.766630
C3 558.698394
C4 5.539198
C5 58.169890
C6 889.319594
C7 2315.000000
dtype: float64

35/86
In [137]:▾ # GENERATES DESCRIPTIVE STATS
dframe.describe()

Out[137]:
C1 C2 C3 C4 C5 C6 C7

count 7.0 7.000000 7.000000 7.000000 7.000000 7.000000 7.000000

mean 888.0 0.538090 79.814056 0.791314 8.309984 127.045656 330.714286

std 0.0 0.322676 209.537453 0.158588 20.590770 287.748820 470.871785

min 888.0 0.167508 0.211469 0.498748 0.134124 0.278564 55.000000

25% 888.0 0.266251 0.496863 0.746447 0.283355 0.644359 55.000000

50% 888.0 0.514615 0.742235 0.830028 0.764083 0.752311 55.000000

75% 888.0 0.804195 0.875483 0.873814 0.852487 55.000000 537.500000

max 888.0 0.943613 555.000000 0.969901 55.000000 777.000000 1020.000000

In [138]:▾ #Return unbiased skew

# https://www.youtube.com/watch?v=HnMGKsupF8Q
dframe.skew()

Out[138]: C1 0.000000
C2 0.198711
C3 2.645743
C4 -1.172444
C5 2.644451
C6 2.602402
C7 1.229634
dtype: float64

In [139]:▾ # Return unbiased kurtosis using Fisher’s definition of kurtosis

# https://www.youtube.com/watch?v=HnMGKsupF8Q
dframe.kurt()

Out[139]: C1 0.000000
C2 -1.897991
C3 6.999968
C4 1.040791
C5 6.994764
C6 6.820734
C7 -0.840000
dtype: float64

36/86
In [140]:▾ # Correlation
# https://www.youtube.com/watch?v=qtaqvPAeEJY&list=PLblh5JKOoLUK0FLuzwntyYI10UQ
dframe.corr()

Out[140]:
C1 C2 C3 C4 C5 C6 C7

C1 NaN NaN NaN NaN NaN NaN NaN

C2 NaN 1.000000 0.207536 -0.882971 -0.510997 0.523999 0.036988

C3 NaN 0.207536 1.000000 0.107374 -0.175931 -0.110371 -0.258654

C4 NaN -0.882971 0.107374 1.000000 0.494935 -0.808822 0.109861

C5 NaN -0.510997 -0.175931 0.494935 1.000000 -0.188575 0.643816

C6 NaN 0.523999 -0.110371 -0.808822 -0.188575 1.000000 -0.300063

C7 NaN 0.036988 -0.258654 0.109861 0.643816 -0.300063 1.000000

In [141]:▾ # Covariance
# https://www.youtube.com/watch?v=xZ_z8KWkhXE&list=PLblh5JKOoLUK0FLuzwntyYI10UQ
dframe.cov()

Out[141]:
C1 C2 C3 C4 C5 C6 C7

C1 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000

C2 0.0 0.104120 14.032072 -0.045184 -3.395137 48.653092 5.619847

C3 0.0 14.032072 43905.944333 3.568046 -759.061432 -6654.723105 -25520.134097

C4 0.0 -0.045184 3.568046 0.025150 1.616182 -36.909328 8.203815

C5 0.0 -3.395137 -759.061432 1.616182 423.979805 -1117.303460 6242.189778

C6 0.0 48.653092 -6654.723105 -36.909328 -1117.303460 82799.383192 -40656.372679

C7 0.0 5.619847 -25520.134097 8.203815 6242.189778 -40656.372679 221720.238095

37/86
In [142]: import statistics as st
dframe.at[3:6,'C1'] = 22
dframe

Out[142]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 888 0.689842 555.000000 0.830028 0.134124 55.000000 55.0 0.856608

2020-01-21 888 0.324641 0.858381 0.781996 55.000000 55.0 0.498748 0.764083

2020-01-22 888 0.918548 0.615602 777.000000 55.0 0.829159 0.135199

2020-01-23 22 0.514615 0.378123 0.278564 55.0 0.891020 0.922977 0.735454

2020-01-24 22 0.207862 0.892584 55.0 0.969901 55.000000 0.752311 1020.0

2020-01-25 22 0.167508 0.211469 0.663735 0.431511 0.553264 1020.0

2020-01-26 888 0.943613 0.742235

In [143]:▾ # Average
st.mean(dframe['C1'])
# dframe['C1'].mean()

Out[143]: 516.8571428571429

In [144]:▾ # Hormonic Mean

st.harmonic_mean(dframe['C1'])

Out[144]: 49.69186046511628

In [145]:▾ #Returns average of the two middle numbers when length is EVEN
arr = np.array([1,2,3,4,5,6,7,8])
st.median(arr)

Out[145]: 4.5

In [146]:▾ # low median of the data with EVEN length

st.median_low(arr)

Out[146]: 4

In [147]:▾ # High median of the data with EVEN length

st.median_high(arr)

Out[147]: 5

In [148]:▾ # Mode of Dataset

st.mode(dframe['C7'])

Out[148]: 55.0

38/86
In [149]:▾ # Sample Variance
st.variance(dframe['C1'])

Out[149]: 214273.14285714287

In [150]:▾ #Population Variance

st.pvariance(dframe['C1'])

Out[150]: 183662.69387755104

In [151]:▾ #Sample Standard Deviation

st.stdev(dframe['C1'])

Out[151]: 462.89647099231905

In [152]:▾ #Population Standard Deviation

st.pstdev(dframe['C1'])

Out[152]: 428.5588569584708

 3 Apply function on Dataframe

In [153]: dframe

Out[153]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 888 0.689842 555.000000 0.830028 0.134124 55.000000 55.0 0.856608

2020-01-21 888 0.324641 0.858381 0.781996 55.000000 55.0 0.498748 0.764083

2020-01-22 888 0.918548 0.615602 777.000000 55.0 0.829159 0.135199

2020-01-23 22 0.514615 0.378123 0.278564 55.0 0.891020 0.922977 0.735454

2020-01-24 22 0.207862 0.892584 55.0 0.969901 55.000000 0.752311 1020.0

2020-01-25 22 0.167508 0.211469 0.663735 0.431511 0.553264 1020.0

2020-01-26 888 0.943613 0.742235

In [154]:▾ # Finding MAX value in Columns

dframe.apply(max)

Out[154]: C1 888.000000
C2 0.943613
C3 555.000000
C4 0.969901
C5 55.000000
C6 777.000000
C7 1020.000000
dtype: float64

39/86
In [155]:▾ # Finding minimum value in Columns
dframe.apply(min)

Out[155]: C1 22.000000
C2 0.167508
C3 0.211469
C4 0.498748
C5 0.134124
C6 0.278564
C7 55.000000
dtype: float64

In [156]:▾ #Sum of Column Values

dframe.apply(sum)

Out[156]: C1 3618.000000
C2 3.766630
C3 558.698394
C4 5.539198
C5 58.169890
C6 889.319594
C7 2315.000000
dtype: float64

In [157]:▾ #Sum of Column Values

dframe.apply(np.sum)

Out[157]: C1 3618.000000
C2 3.766630
C3 558.698394
C4 5.539198
C5 58.169890
C6 889.319594
C7 2315.000000
dtype: float64

In [158]:▾ # sum of row

dframe.apply(np.sum, axis=1)

Out[158]: 2020-01-20 1554.653993

2020-01-21 1000.821626
2020-01-22 1722.796980
2020-01-23 79.135659
2020-01-24 80.649898
2020-01-25 1099.101190
2020-01-26 1911.334359
Freq: D, dtype: float64

40/86
In [159]:▾ # Square root of all values in dataframe
dframe.applymap(np.sqrt)

Out[159]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 29.799329 0.830567 23.558438 0.911058 0.366229 7.416198 7.416198

2020-01-21 29.799329 0.569773 0.926489 0.925531 0.884306 7.416198 7.416198

2020-01-22 29.799329 0.958409 0.784603 0.706221 0.874118 27.874720 7.416198

2020-01-23 4.690416 0.717367 0.614917 0.910582 0.367694 0.527791 7.416198

2020-01-24 4.690416 0.455919 0.944767 0.943939 0.960717 0.857586 7.416198

2020-01-25 4.690416 0.409278 0.459858 0.984836 7.416198 0.867359 31.937439

2020-01-26 29.799329 0.971398 0.861530 0.814699 0.656895 0.743817 31.937439

In [160]:▾ # Square root of all values in a DataFrame

dframe.applymap(math.sqrt)

Out[160]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 29.799329 0.830567 23.558438 0.911058 0.366229 7.416198 7.416198

2020-01-21 29.799329 0.569773 0.926489 0.925531 0.884306 7.416198 7.416198

2020-01-22 29.799329 0.958409 0.784603 0.706221 0.874118 27.874720 7.416198

2020-01-23 4.690416 0.717367 0.614917 0.910582 0.367694 0.527791 7.416198

2020-01-24 4.690416 0.455919 0.944767 0.943939 0.960717 0.857586 7.416198

2020-01-25 4.690416 0.409278 0.459858 0.984836 7.416198 0.867359 31.937439

2020-01-26 29.799329 0.971398 0.861530 0.814699 0.656895 0.743817 31.937439

41/86
In [161]: dframe.applymap(float)

Out[161]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 888.0 0.689842 555.000000 0.830028 0.134124 55.000000 55.0 0.856608

2020-01-21 888.0 0.324641 0.858381 0.781996 55.000000 55.0 0.498748 0.764083

2020-01-22 888.0 0.918548 0.615602 777.000000 55.0 0.829159 0.135199

2020-01-23 22.0 0.514615 0.378123 0.278564 55.0 0.891020 0.922977 0.735454

2020-01-24 22.0 0.207862 0.892584 55.0 0.969901 55.000000 0.752311 1020.0

2020-01-25 22.0 0.167508 0.211469 0.663735 0.431511 0.553264 1020.0

2020-01-26 888.0 0.943613 0.742235

In [162]:▾ # Using Lambda function in Dataframes for find min value

dframe.apply(lambda x: min(x))

Out[162]: C1 22.000000
C2 0.167508
C3 0.211469
C4 0.498748
C5 0.134124
C6 0.278564
C7 55.000000
dtype: float64

In [163]:▾ # Using Lambda function in Dataframes for square all values

dframe.apply(lambda x: x*x)

Out[163]:
C1 C2 C3 C4 C5 C6 C7

2020-01-20 788544 0.475882 308025.000000 0.688946 0.017989 3025.000000 3025.0

2020-01-21 788544 0.105392 0.736818 0.733777 0.611518 3025.000000 3025.0

2020-01-22 788544 0.843730 0.378966 0.248749 0.583822 603729.000000 3025.0

2020-01-23 484 0.264829 0.142977 0.687505 0.018279 0.077598 3025.0

2020-01-24 484 0.043207 0.796706 0.793917 0.851887 0.540893 3025.0

2020-01-25 484 0.028059 0.044719 0.940708 3025.000000 0.565972 1040400.0

2020-01-26 788544 0.890406 0.550912 0.440544 0.186202 0.306101 1040400.0

 3.1 Merge Dataframes

42/86
In [164]: daf1 = pd.DataFrame({'Id': ['1','2','3','4','5'], 'Name': ['Aryan','Rose','Bran
daf1

Out[164]:
Id Name

0 1 Aryan

1 2 Rose

2 3 Bran

3 4 Shiv

4 5 Joy

In [165]: daf2 = pd.DataFrame({'Id':['1','2','6','7','8'], 'Score':[40, 60, 80, 90, 70]})

daf2

Out[165]:
Id Score

0 1 40

1 2 60

2 6 80

3 7 90

4 8 70

In [166]:▾ # Inner Join

pd.merge(daf1,daf2, on = 'Id',how ='inner')

Out[166]:
Id Name Score

0 1 Aryan 40

1 2 Rose 60

43/86
In [167]:▾ # Full Outer Join
pd.merge(daf1,daf2, on= 'Id', how = 'outer')

Out[167]:
Id Name Score

0 1 Aryan 40.0

1 2 Rose 60.0

2 3 Bran NaN

3 4 Shiv NaN

4 5 Joy NaN

5 6 NaN 80.0

6 7 NaN 90.0

7 8 NaN 70.0

In [168]:▾ # Left Outer Join

pd.merge(daf1,daf2, on= 'Id', how='left')

Out[168]:
Id Name Score

0 1 Aryan 40.0

1 2 Rose 60.0

2 3 Bran NaN

3 4 Shiv NaN

4 5 Joy NaN

In [169]:▾ # Left Outer Join

pd.merge(daf1,daf2, on= 'Id', how='right')

Out[169]:
Id Name Score

0 1 Aryan 40

1 2 Rose 60

2 6 NaN 80

3 7 NaN 90

4 8 NaN 70

 3.2 Importing multiple CSV files in DataFrame

44/86
In [170]:▾ # Append all CSV files csv1 =
pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/m csv2 =
pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/m csv3 =
pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/m

# csv1.to_csv('/Data_File1.csv')
# csv2.to_csv('/Data_File2.csv')
# csv3.to_csv('/Data_File3.csv')

In [171]:▾ # Append all CSV files

path =r'D:\\MObile\\'
filenames = glob.glob(path + "/*.csv")
covid = pd.DataFrame()
▾ for f in filenames:
df = pd.read_csv(f)
covid = covid.append(df,ignore_index=True,sort=True)

In [172]:▾ # Top 10 rows of the Dataframe

covid.head(3)

Out[172]:
Case-
# Active Admin2 Attack Fatality_Ratio Combined_Key Confirmed Country_R

0 NaN 133.0 Abbeville NaN 0.746269 Abbeville, South 134.0

Carolina, US

1 NaN 1025.0 Acadia NaN 4.026217 Acadia, Louisiana, 1068.0

US
Accomack,
2 NaN 1028.0 Accomack NaN 1.343570 Virginia, US 1042.0

3 rows × 28 columns

In [173]:▾ # Bottom 10 rows of the Dataframe

covid.tail(3)

Out[173]:
Case-
# Active Admin2 Attack Fatality_Ratio Combined_Key Confirmed Count

13801 720.0 NaN NaN 110.0 NaN NaN NaN

13802 720.0 NaN NaN 160.0 NaN NaN NaN

13803 721.0 NaN NaN 110.0 NaN NaN NaN

3 rows × 28 columns

45/86
In [174]:▾ # Reading columns
covid['Country_Region'].head(3)

Out[174]: 0 US
1 US
2 US
Name: Country_Region, dtype: object

In [175]:▾ # Reading columns

df1 = covid[['Country_Region' ,'Province_State','Confirmed' , 'Last_Update']]
df1.head(5)

Out[175]:
Country_Region Province_State Confirmed Last_Update
2020-07-08 05:33:48
0 US South Carolina 134.0
2020-07-08 05:33:48
1 US Louisiana 1068.0
2020-07-08 05:33:48
2 US Virginia 1042.0
2020-07-08 05:33:48
3 US Idaho 3252.0
2020-07-08 05:33:48
4 US Iowa 16.0

In [176]:▾ #Read specific rows

df1.iloc[1:4]

Out[176]:
Country_Region Province_State Confirmed Last_Update

1 US Louisiana 1068.02020-07-08 05:33:48

2 US Virginia 1042.02020-07-08 05:33:48

3 US Idaho 3252.02020-07-08 05:33:48

46/86
In [177]:▾ #Filter data
df1.loc[df1['Country_Region']=='India']

Out[177]:
Country_Region Province_State Confirmed Last_Update
Andaman and Nicobar Islands 2020-07-08 05:33:48
3141 India 147.0
Andhra Pradesh 2020-07-08 05:33:48
3142 India 21197.0
Arunachal Pradesh 2020-07-08 05:33:48
3156 India 276.0
Assam 2020-07-08 05:33:48
3157 India 12522.0
Bihar 2020-07-08 05:33:48
3179 India 12570.0
... ...
... ... ...
Tripura 2020-07-06 04:33:57
11174 India 1568.0
Unknown 2020-07-06 04:33:57
11187 India 4913.0
Uttar Pradesh 2020-07-06 04:33:57
11193 India 27707.0
Uttarakhand 2020-07-06 04:33:57
11194 India 3124.0
West Bengal 2020-07-06 04:33:57
11217 India 22126.0

108 rows × 4 columns

In [178]:▾ #Sort Data Frame

display('Sorted Data Frame', df1.sort_values(['Country_Region'], ascending=True

'Sorted Data Frame'

Country_Region Province_State Confirmed Last_Update

2020-07-07 04:34:00
7434 Afghanistan NaN 33190.0
2020-07-06 04:33:57
11234 Afghanistan NaN 32951.0
2020-07-08 05:33:48
3632 Afghanistan NaN 33384.0
2020-07-06 04:33:57
11235 Albania NaN 2893.0
2020-07-08 05:33:48
3633 Albania NaN 3038.0

47/86
In [179]:▾ #Sort Data Frame
display('Sorted Data Frame', df1.sort_values(['Country_Region'], ascending=Fals

'Sorted Data Frame'

Country_Region Province_State Confirmed Last_Update

2020-07-07 04:34:00
7602 Zimbabwe NaN 734.0
2020-07-06 04:33:57
11402 Zimbabwe NaN 716.0
2020-07-08 05:33:48
3800 Zimbabwe NaN 787.0
2020-07-08 05:33:48
3799 Zambia NaN 1895.0
2020-07-07 04:34:00
7601 Zambia NaN 1632.0

In [180]:▾ #Sort Data Frame - Ascending on "Country" & descending on "Last update"
display('Sorted data Frame', df1.sort_values(['Country_Region', 'Last_Update'],

'Sorted data Frame'

Country_Region Province_State Confirmed Last_Update

2020-07-08 05:33:48
3632 Afghanistan NaN 33384.0
2020-07-07 04:34:00
7434 Afghanistan NaN 33190.0
2020-07-06 04:33:57
11234 Afghanistan NaN 32951.0
2020-07-08 05:33:48
3633 Albania NaN 3038.0
2020-07-07 04:34:00
7435 Albania NaN 2964.0
2020-07-06 04:33:57
11235 Albania NaN 2893.0

In [181]:▾ #Iterating through the dataset

▾ for index, row in df1.iterrows():
▾ if(row['Country_Region']=='Indonesia'):
display(row[['Country_Region','Confirmed']])

Country_Region Indonesia
Confirmed 66226
Name: 3704, dtype: object

Country_Region Indonesia
Confirmed 64958
Name: 7506, dtype: object

Country_Region Indonesia
Confirmed 63749
Name: 11306, dtype: object

48/86
In [182]:▾ #Unique Values
covid['Country_Region'].drop_duplicates(keep='first').head(10)

Out[182]: 0 US
3124 Italy
3125 Brazil
3126 Russia
3127 Mexico
3128 Japan
3131 Canada
3136 Colombia
3137 Peru
3140 Spain
Name: Country_Region, dtype: object

49/86
In [183]:▾ # Countries impacted with Coronavirus
countries= covid['Country_Region'].unique()
type(countries), countries

Out[183]: (numpy.ndarray,
array(['US', 'Italy', 'Brazil', 'Russia', 'Mexico', 'Japan', 'Canada',
'Colombia', 'Peru', 'Spain', 'India', 'United Kingdom', 'China',
'Chile', 'Netherlands', 'Australia', 'Pakistan', 'Germany',
'Sweden', 'Ukraine', 'Denmark', 'France', 'Afghanistan', 'Albania',
'Algeria', 'Andorra', 'Angola', 'Antigua and Barbuda', 'Argentina',
'Armenia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain',
'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin',
'Bhutan', 'Bolivia', 'Bosnia and Herzegovina', 'Botswana',
'Brunei', 'Bulgaria', 'Burkina Faso', 'Burma', 'Burundi',
'Cabo Verde', 'Cambodia', 'Cameroon', 'Central African Republic',
'Chad', 'Comoros', 'Congo (Brazzaville)', 'Congo (Kinshasa)',
'Costa Rica', "Cote d'Ivoire", 'Croatia', 'Cuba', 'Cyprus',
'Czechia', 'Diamond Princess', 'Djibouti', 'Dominica',
'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador',
'Equatorial Guinea', 'Eritrea', 'Estonia', 'Eswatini', 'Ethiopia',
'Fiji', 'Finland', 'Gabon', 'Gambia', 'Georgia', 'Ghana', 'Greece',
'Grenada', 'Guatemala', 'Guinea', 'Guinea-Bissau', 'Guyana',
'Haiti', 'Holy See', 'Honduras', 'Hungary', 'Iceland', 'Indonesia',
'Iran', 'Iraq', 'Ireland', 'Israel', 'Jamaica', 'Jordan',
'Kazakhstan', 'Kenya', 'Korea, South', 'Kosovo', 'Kuwait',
'Kyrgyzstan', 'Laos', 'Latvia', 'Lebanon', 'Lesotho', 'Liberia',
'Libya', 'Liechtenstein', 'Lithuania', 'Luxembourg', 'MS Zaandam',
'Madagascar', 'Malawi', 'Malaysia', 'Maldives', 'Mali', 'Malta',
'Mauritania', 'Mauritius', 'Moldova', 'Monaco', 'Mongolia',
'Montenegro', 'Morocco', 'Mozambique', 'Namibia', 'Nepal',
'New Zealand', 'Nicaragua', 'Niger', 'Nigeria', 'North Macedonia',
'Norway', 'Oman', 'Panama', 'Papua New Guinea', 'Paraguay',
'Philippines', 'Poland', 'Portugal', 'Qatar', 'Romania', 'Rwanda',
'Saint Kitts and Nevis', 'Saint Lucia',
'Saint Vincent and the Grenadines', 'San Marino',
'Sao Tome and Principe', 'Saudi Arabia', 'Senegal', 'Serbia',
'Seychelles', 'Sierra Leone', 'Singapore', 'Slovakia', 'Slovenia',
'Somalia', 'South Africa', 'South Sudan', 'Sri Lanka', 'Sudan',
'Suriname', 'Switzerland', 'Syria', 'Taiwan*', 'Tajikistan',
'Tanzania', 'Thailand', 'Timor-Leste', 'Togo',
'Trinidad and Tobago', 'Tunisia', 'Turkey', 'Uganda',
'United Arab Emirates', 'Uruguay', 'Uzbekistan', 'Venezuela',
'Vietnam', 'West Bank and Gaza', 'Western Sahara', 'Yemen',
'Zambia', 'Zimbabwe', nan], dtype=object))

50/86
In [184]:▾ #https://data.world/data-society/pokemon-with-stats

df2 = pd.read_csv('Pokemon.csv')
df2.head(5)

Out[184]:
Type Type Sp. Sp.
# Name 1 2 Total HP Attack Defense Atk De Speed
f
0 1 Bulbasaur Grass Poison 318 45 49 49 65 65 4

1 2 Ivysaur Grass Poison 405 60 62 63 80 80 6

2 3 Venusaur Grass Poison 525 80 82 83 100 100 8

3 3 VenusaurMega Grass Poison 625 80 100 123 122 120 8

Venusaur
Charmander
4 4 Fire NaN 309 39 52 43 60 50 6

In [185]:▾ # Sum of Columns

df2['Total'] = df2['HP'] + df2['Attack']
df2.head(5)

Out[185]:
Type Type Sp. Sp.
# Name 1 2 Total HP Attack Defense Atk De Speed
f
0 1 Bulbasaur Grass Poison 94 45 49 49 65 65 4

1 2 Ivysaur Grass Poison 122 60 62 63 80 80 6

2 3 Venusaur Grass Poison 162 80 82 83 100 100 8

3 3 VenusaurMega Grass Poison 180 80 100 123 122 120 8

Venusaur
Charmander
4 4 Fire NaN 91 39 52 43 60 50 6

51/86
In [186]:▾ # Sum of Columns
df2['Total'] = df2.iloc[:,5:11].sum(axis=1)
df2.head()

Out[186]:
Type Type Sp. Sp.
# Name 1 2 Total HP Attack Defense Atk De Speed
f
0 1 Bulbasaur Grass Poison 318 45 49 49 65 65 4

1 2 Ivysaur Grass Poison 405 60 62 63 80 80 6

2 3 Venusaur Grass Poison 525 80 82 83 100 100 8

3 3 VenusaurMega Grass Poison 625 80 100 123 122 120 8

Venusaur
Charmander
4 4 Fire NaN 309 39 52 43 60 50 6

In [187]:▾ #Shifting "Total" column

cols = list(df2.columns)

df2 = df2[cols[:4] + cols[5:11] + [cols[4]] + cols[11:]]

df2.head()

Out[187]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Total
f
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 31

1 2 Ivysaur Grass Poison 60 62 63 80 80 60 40

2 3 Venusaur Grass Poison 80 82 83 100 100 80 52

3 3 VenusaurMega Grass Poison 80 100 123 122 120 80 62

Venusaur
Charmander
4 4 Fire NaN 39 52 43 60 50 65 30

52/86
In [188]:▾ #Shifting "Legendary" column - Its Index location -1 or 12

cols = list(df2.columns)

df2 = df2[cols[:10] + [cols[-1]] + cols[10:12]]

df2.head()

Out[188]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Legen
f
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45

1 2 Ivysaur Grass Poison 60 62 63 80 80 60

2 3 Venusaur Grass Poison 80 82 83 100 100 80

3 3 VenusaurMega Grass Poison 80 100 123 122 120 80

Venusaur
Charmander
4 4 Fire NaN 39 52 43 60 50 65

In [189]:▾ #Shifting "Generation" column - Index location -1 or 12

cols = list(df2.columns)
df2 = df2[cols[0:10] + [cols[12]] + cols[10:12]]
df2.head(5)

Out[189]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Gener
f
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45

1 2 Ivysaur Grass Poison 60 62 63 80 80 60

2 3 Venusaur Grass Poison 80 82 83 100 100 80

3 3 VenusaurMega Grass Poison 80 100 123 122 120 80

Venusaur
Charmander
4 4 Fire NaN 39 52 43 60 50 65

In [190]:▾ #Save to CSV file

df2.to_csv('poke_updated.csv')

In [191]:▾ #Save to CSV file without index column

df2.to_csv('poke_updated1.csv', index=False)

53/86
In [192]: df2.head(7)

Out[192]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Gener
f
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45

1 2 Ivysaur Grass Poison 60 62 63 80 80 60

2 3 Venusaur Grass Poison 80 82 83 100 100 80

3 3 VenusaurMega Grass Poison 80 100 123 122 120 80

Venusaur
Charmander
4 4 Fire NaN 39 52 43 60 50 65
Charmeleon
5 5 Fire NaN 58 64 58 80 65 80
Charizard
6 6 Fire Flying 78 84 78 109 85 100

In [193]:▾ # Save Dataframe as text file

df2.to_csv('poke.txt', sep = '\t', index= False)

In [194]:▾ # Save Dataframe as xlsx file

df2.to_excel('poke.xlsx')

In [195]:▾ # Save Dataframe as xlsx file without row names

df2.to_excel('poke.xlsx',index = 0)

54/86
In [196]:▾ #Filtering using loc
df2.loc[df2['Type 2']=='Dragon']

Out[196]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Gen
f
7 6 CharizardMega Fire Dragon 78 130 111 130 85 100
Charizard X

196 181 AmpharosMega Electric Dragon 90 95 105 165 110 45

Ampharos
Kingdra
249 230 Water Drago 75 95 95 95 95 85
SceptileMega
275 254 Sceptile Grass n 70 110 75 145 85 145
Vibrava Drago
360 329 Flygon Ground Drago 50 70 50 50 50 70
n
361 330 Ground n 80 100 80 80 80 100
Dialga
540 483 Steel Drago 100 120 120 150 100 90
Palkia
541 484 Water n 90 120 100 150 120 100
GiratinaAltered
544 487 Forme Ghost Drago 150 100 120 100 120 90
GiratinaOrigin
n
Forme
545 487 Deino Ghost Dragon
Drago 150 120 100 120 100 90

Zweilous n
694 633 Dark Drago 52 65 50 45 50 38
Hydreigon Drago
695 634 Dark n 72 85 70 65 70 58
Dragalge n
696 635 Dark Drago 92 105 90 125 90 98
Tyrunt
761 691 Poison n 65 75 90 97 123 44
Tyrantrum
766 696 Rock Drago 58 89 77 45 45 48
Noibat
767 697 Rock n 82 121 119 69 59 71
Noivern
790 714 Flying Drago 40 30 35 45 40 55

791 n 85 70 80 97 80 123
715 Flying
Drago

Drago

55/86
In [197]:▾ #Filtering using loc
df3 = df2.loc[(df2['Type 2']=='Dragon') & (df2['Type 1']=='Dark')]
df3

Out[197]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Generatio
f
694 633 Deino 634 Dar Drago 52 65 50 45 50 38

695 Zweilous 635 k n 72 85 70 65 70 58

696 Hydreigon Dar Drago 92 105 90 125 90 98

k n

Dar Drago

k n

In [198]:▾ #Reset index for Dataframe df3 keeping old index column
df4 = df3.reset_index()
df4

Out[198]:
Type Type Sp. Sp.
index # Name 1 2 HP Attack Defense Atk De Speed G
f
0 694 633 Deino 634 Dar Drago 52 65 50 45 50 38

1 695 Zweilous 635 k n 72 85 70 65 70 58

2 696 Hydreigon Dar Drago 92 105 90 125 90 98

k n

Dar Drago

In [199]:▾ k df3n removing old index column

#Reset index for Dataframe
df3.reset_index(drop=True, inplace=True)
df3

Out[199]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Generation
f
0 633 Deino 634 Dar Drago 52 65 50 45 50 38

1 Zweilous 635 k n 72 85 70 65 70 58

2 Hydreigon Dar Drago 92 105 90 125 90 98

k n

Dar Drago

k n

56/86
In [200]: df2.head(5)

Out[200]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Gener
f
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45

1 2 Ivysaur Grass Poison 60 62 63 80 80 60

2 3 Venusaur Grass Poison 80 82 83 100 100 80

3 3 VenusaurMega Grass Poison 80 100 123 122 120 80

Venusaur
Charmander
4 4 Fire NaN 39 52 43 60 50 65

 4 LIKE OPERATION IN PANDAS

In [201]: df2.Name.str.contains('rill').head(7)

Out[201]: 0 False
1 False
2 False
3 False
4 False
5 False
6 False
Name: Name, dtype: bool

57/86
In [202]:▾ # Display all rows containing Name "rill"
df2.loc[df2.Name.str.contains("rill")]

Out[202]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Genera
f
18 15 Beedrill Bug Poison 65 90 40 45 80 75

19 15 BeedrillMega Bug Poison 65 150 40 15 80 145

Beedrill
Marill
198 183 Water Fairy 70 20 50 20 50 40
Azumarill
199 184 Water Fairy 100 50 80 60 80 50
Azurill
322 298 Normal Fairy 50 20 40 20 40 20
Excadrill
589 530 Ground Steel 110 135 60 50 65 88
Frillish
653 592 Water Ghost 55 40 50 65 85 40

In [203]:▾ # Exclude all rows containing "rill"

df2.loc[~df2.Name.str.contains("rill")].head(7)

Out[203]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Gener
f
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45

1 2 Ivysaur Grass Poison 60 62 63 80 80 60

2 3 Venusaur Grass Poison 80 82 83 100 100 80

3 3 VenusaurMega Grass Poison 80 100 123 122 120 80

Venusaur
Charmander
4 4 Fire NaN 39 52 43 60 50 65
Charmeleon
5 5 Fire NaN 58 64 58 80 65 80
Charizard
6 6 Fire Flying 78 84 78 109 85 100

58/86
In [204]:▾ #Display all rows with Type-1 as "Grass" and Type-2 as "Poison"
df2.loc[df2['Type 1'].str.contains("Grass") & df2['Type 2'].str.contains("Poiso

Out[204]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Gene
f
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45

1 2 Ivysaur Grass Poison 60 62 63 80 80 60

2 3 Venusaur Grass Poison 80 82 83 100 100 80

3 3 VenusaurMega Grass Poison 80 100 123 122 120 80

Venusaur
Oddish
48 43 Grass Poison 45 50 55 75 65 30
Gloom
49 44 Grass Poison 60 65 70 85 75 40
Vileplume
50 45 Grass Poison 75 80 85 110 90 50
Bellsprout
75 69 Grass Poison 50 75 35 70 30 40
Weepinbell
76 70 Grass Poison 65 90 50 85 45 55
Victreebel
77 71 Grass Poison 80 105 65 100 70 70
Roselia
344 315 Grass Poison 50 60 45 100 80 65
Budew
451 406 Grass Poison 40 30 35 50 70 55
Roserade
452 407 Grass Poison 60 70 65 125 105 90

651 590 Foongus Grass Poison 69 55 45 55 55 15

652 591 Amoonguss Grass Poison 114 85 70 85 80 30

59/86
In [205]: df2.loc[df2['Type 1'].str.contains('Grass|Water',regex = True)].head(7)

Out[205]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Gener
f
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45

1 2 Ivysaur Grass Poison 60 62 63 80 80 60

2 3 Venusaur Grass Poison 80 82 83 100 100 80

3 3 VenusaurMega Grass Poison 80 100 123 122 120 80

Venusaur
Squirtle
9 7 Wate NaN 44 48 65 50 64 43
Wartortle
10 8 r NaN 59 63 80 65 80 58
Blastoise
11 9 Wate NaN 79 83 100 85 105 78

Wate

In [206]:▾ # Due to Case-sensitive it will not return any data

df2.loc[df2['Type 1'].str.contains("grass|water",regex=True)].head(7)

Out[206]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Generation
f

In [207]:▾ # To ignore case we can use "case = False"

df2.loc[df2['Type 1'].str.contains("grass|water",case=False,regex=True)].head(7

Out[207]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Gener
f
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45

1 2 Ivysaur Grass Poison 60 62 63 80 80 60

2 3 Venusaur Grass Poison 80 82 83 100 100 80

3 3 VenusaurMega Grass Poison 80 100 123 122 120 80

Venusaur
Squirtle
9 7 Wate NaN 44 48 65 50 64 43
Wartortle
10 8 r NaN 59 63 80 65 80 58
Blastoise
11 9 Wate NaN 79 83 100 85 105 78

Wate

r
60/86
In [208]:▾ # To ignore case we can use "Flags = re.I"
df2.loc[df2['Type 1'].str.contains("grass|water", flags = re.I, regex=True)].he

Out[208]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Gener
f
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45

1 2 Ivysaur Grass Poison 60 62 63 80 80 60

2 3 Venusaur Grass Poison 80 82 83 100 100 80

3 3 VenusaurMega Grass Poison 80 100 123 122 120 80

Venusaur
Squirtle
9 7 Wate NaN 44 48 65 50 64 43
Wartortle
10 8 r NaN 59 63 80 65 80 58
Blastoise
11 9 Wate NaN 79 83 100 85 105 78

Wate

r
 5 Regex in Pandas dataframe
In [209]:▾ #Get all rows with name starting with "wa"
df2.loc[df2.Name.str.contains('^Wa', flags =re.I, regex = True)].head(7)

Out[209]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Generation
8 Wartortle f
10 Water NaN 59 63 80 65 80 58
Wailmer
350 320 Water NaN 130 70 35 70 35 60
Wailord
351 321 Water NaN 170 90 45 90 45 60
Walrein
400 365 Ice Water 110 80 90 95 90 65
Watchog
564 505 Normal NaN 60 85 69 60 69 77

61/86
In [210]:▾ #Get all rows with name starting with "wa" followed by any letter between a-l
df2.loc[df2.Name.str.contains('^wa[a-l]+',flags = re.I, regex= True)].head(7)

Out[210]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Generation
f
350 320 Wailmer Water NaN 130 70 35 70 35 60

351 321 Wailord Water NaN 170 90 45 90 45 60

400 365 Walrein Ice Water 110 80 90 95 90 65

In [211]:▾ #Get all rows with name starting with x , y, z

df2.loc[df2.Name.str.contains('^[x-z]', flags =re.I,regex=True)]

Out[211]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Genera
f
46 41 Zubat Poison Flying 40 45 35 30 40 55

157 145 Zapdos Electric Flying 90 90 85 125 90 100

192 178 Xatu Psychic Flying 65 75 70 95 70 95

208 193 Yanma Bug Flying 65 65 45 75 45 95

286 263 Zigzagoon Normal NaN 38 30 41 30 41 60

367 335 Zangoose Normal NaN 73 115 60 60 60 90

520 469 Yanmega Bug Flying 86 76 86 116 56 95

582 523 Zebstrika Electric NaN 75 100 63 80 63 116

623 562 Yamask Ghost NaN 38 30 85 55 65 30

631 570 Zorua Dark NaN 40 65 40 80 40 65

632 571 Zoroark Dark NaN 60 105 60 120 60 105

695 634 Zweilous Dark Dragon 72 85 70 65 70 58

707 644 Zekrom Dragon Electric 100 150 120 120 100 90

792 716 Xerneas Fairy NaN 126 131 95 131 98 99

793 717 Yveltal Dark Flying 126 131 95 131 98 99

794 718 Zygarde50% Dragon Ground 108 100 121 81 95 95

Forme

62/86
In [212]:▾ # Extracting first 3 characters from "Name" column
df2['Name2'] = df2.Name.str.extract(r'(^\w{3})')
df2.head(5)

Out[212]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Gener
f
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45

1 2 Ivysaur Grass Poison 60 62 63 80 80 60

2 3 Venusaur Grass Poison 80 82 83 100 100 80

3 3 VenusaurMega Grass Poison 80 100 123 122 120 80

Venusaur
Charmander
4 4 Fire NaN 39 52 43 60 50 65

In [213]:▾ # Return all rows with "Name" starting with character 'B or b'
df2.loc[df2.Name.str.match(r'(^[B|b].*)')].head(5)

Out[213]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Gener
f
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45

11 9 Blastoise Wate NaN 79 83 100 85 105 78

12 9 BlastoiseMega r NaN 79 103 120 135 115 78

Blastoise
Wate
Butterfree
15 12 Bug Flying 60 45 50 90 80 70
r
Beedrill
18 15 Bug Poison 65 90 40 45 80 75

 6 Replace values in dataframe

63/86
In [214]: df2.head(7)

Out[214]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Gener
f
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45

1 2 Ivysaur Grass Poison 60 62 63 80 80 60

2 3 Venusaur Grass Poison 80 82 83 100 100 80

3 3 VenusaurMega Grass Poison 80 100 123 122 120 80

Venusaur
Charmander
4 4 Fire NaN 39 52 43 60 50 65
Charmeleon
5 5 Fire NaN 58 64 58 80 65 80
Charizard
6 6 Fire Flying 78 84 78 109 85 100

In [215]: df2['Type 1'] = df2['Type 1'].replace({"Grass": "Medow", "Fire":"Blaze"})

df2.head(7)

Out[215]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Gener
f
0 1 Bulbasaur Medo Poison 45 49 49 65 65 45

1 2 Ivysaur w Poison 60 62 63 80 80 60

2 3 Venusaur Medo Poison 80 82 83 100 100 80

3 3 VenusaurMega w Poison 80 100 123 122 120 80

Venusaur
Medo
Charmander
4 4
wBlaze NaN 39 52 43 60 50 65
Charmeleon
5 5 Blaze
Medo NaN 58 64 58 80 65 80
Charizard
6 6 wBlaze Flying 78 84 78 109 85 100

64/86
In [216]: df2['Type 2'] = df2['Type 2'].replace({"Poison": "Venom"})
df2.head()

Out[216]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Gener
f
0 1 Bulbasaur Medo Veno 45 49 49 65 65 45

1 2 Ivysaur w m 60 62 63 80 80 60

2 3 Venusaur Medo Veno 80 82 83 100 100 80

3 3 VenusaurMega w m 80 100 123 122 120 80

Venusaur
Medo Veno
Charmander
4 4 Blaze NaN 39 52 43 60 50 65
w m

Medo Veno

w m

In [217]: df2["Type 2"] = df2['Type 2'].replace(['Venom','Dragon'],'DANGER')

df2.head(7)

Out[217]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Gen
f
0 1 Bulbasaur Medow DANGER 45 49 49 65 65 45

1 2 Ivysaur Medow DANGER 60 62 63 80 80 60

2 3 Venusaur Medow DANGER 80 82 83 100 100 80

3 3 VenusaurMega Medow DANGER 80 100 123 122 120 80

Venusaur
Charmander
4 4 Blaze NaN 39 52 43 60 50 65
Charmeleon
5 5 Blaze NaN 58 64 58 80 65 80
Charizard
6 6 Blaze Flying 78 84 78 109 85 100

65/86
In [218]: df2.loc[df2['Type 2'] == 'DANGER', 'Name2'] =np.NaN
df2.head(7)

Out[218]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Gen
f
0 1 Bulbasaur Medow DANGER 45 49 49 65 65 45

1 2 Ivysaur Medow DANGER 60 62 63 80 80 60

2 3 Venusaur Medow DANGER 80 82 83 100 100 80

3 3 VenusaurMega Medow DANGER 80 100 123 122 120 80

Venusaur
Charmander
4 4 Blaze NaN 39 52 43 60 50 65
Charmeleon
5 5 Blaze NaN 58 64 58 80 65 80
Charizard
6 6 Blaze Flying 78 84 78 109 85 100

In [219]: df2.loc[df2['Total'] > 400, ['Name2', 'Legendary']]='ALERT'

df2.head(7)

Out[219]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Gen
f
0 1 Bulbasaur Medow DANGER 45 49 49 65 65 45

1 2 Ivysaur Medow DANGER 60 62 63 80 80 60

2 3 Venusaur Medow DANGER 80 82 83 100 100 80

3 3 VenusaurMega Medow DANGER 80 100 123 122 120 80

Venusaur
Charmander
4 4 Blaze NaN 39 52 43 60 50 65
Charmeleon
5 5 Blaze NaN 58 64 58 80 65 80
Charizard
6 6 Blaze Flying 78 84 78 109 85 100

66/86
In [220]: df2.loc[df2['Total'] > 100, ['Legendary','Name2']] = ['Aleart-1','Aleart-2']
df2.head(7)

Out[220]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Gen
f
0 1 Bulbasaur Medow DANGER 45 49 49 65 65 45

1 2 Ivysaur Medow DANGER 60 62 63 80 80 60

2 3 Venusaur Medow DANGER 80 82 83 100 100 80

3 3 VenusaurMega Medow DANGER 80 100 123 122 120 80

Venusaur
Charmander
4 4 Blaze NaN 39 52 43 60 50 65
Charmeleon
5 5 Blaze NaN 58 64 58 80 65 80
Charizard
6 6 Blaze Flying 78 84 78 109 85 100

 7 Group By
In [221]: df = pd.read_csv('poke_updated.csv')
df.head()

Out[221]:
Unnamed: Type Type Sp. Sp.
0 # Name 1 2 HP Attack Defense Atk De Sp
f
0 0 1 Bulbasaur Grass Poison 45 49 49 65 65

1 1 2 Ivysaur Grass Poison 60 62 63 80 80

2 2 3 Venusaur Grass Poison 80 82 83 100 100

3 3 3 VenusaurMega Grass Poison 80 100 123 122 120

Venusaur
Charmander
4 4 4 Fire NaN 39 52 43 60 50

67/86
In [222]: df.groupby(['Type 1']).mean().head(7)

Out[222]:
Unnamed:
0 # HP Attack Defense Sp. Atk Sp. Def Speed

Type
1

Bug 368.072464 334.492754 56.884058 70.971014 70.724638 53.869565 64.797101 61.681159

Dark 507.387097 461.354839 66.806452 88.387097 70.225806 74.645161 69.516129 76.161290

Dragon 521.843750 474.375000 83.312500 112.125000 86.375000 96.843750 88.843750 83.031250

Electric 400.590909 363.500000 59.795455 69.090909 66.295455 90.022727 73.704545 84.500000

Fairy 494.529412 449.529412 74.117647 61.529412 65.705882 78.529412 84.705882 48.588235

Fighting 400.444444 363.851852 69.851852 96.777778 65.925926 53.111111 64.703704 66.074074

Fire 360.942308 327.403846 69.903846 84.769231 67.769231 88.980769 72.211538 74.442308

In [223]: df.groupby(['Type 1']).mean().sort_values('Attack', ascending = False).head(7)

Out[223]:
Unnamed:
0 # HP Attack Defense Sp. Atk Sp. Def Speed

Type
1

Dragon 521.843750 474.375000 83.312500 112.125000 86.375000 96.843750 88.843750 83.031250 400.444444

Fighting 363.851852 69.851852 96.777778 65.925926 53.111111 64.703704 66.074074 392.312500 356.281250

Ground 73.781250 95.750000 84.843750 56.468750 62.750000 63.906250 431.840909 392.727273 65.363636

Rock 92.863636 100.795455 63.340909 75.477273 55.909091 486.296296 442.851852 65.222222 92.703704

Steel 126.370370 67.518519 80.629630 55.259259 507.387097 461.354839 66.806452 88.387097 70.225806

Dark 74.645161 69.516129 76.161290 360.942308 327.403846 69.903846 84.769231 67.769231 88.980769

Fire 72.211538 74.442308

68/86
In [224]: df.groupby(['Type 1']).mean().sort_values('Defense' , ascending = False).head(1

Out[224]:
Unnamed:
0 # HP Attack Defense Sp. Atk Sp. Def Speed

Type
1

Steel 486.296296 442.851852 65.222222 92.703704 126.370370 67.518519 80.629630 55.259259 431.840909

Rock 392.727273 65.363636 92.863636 100.795455 63.340909 75.477273 55.909091 521.843750 474.375000

Dragon 83.312500 112.125000 86.375000 96.843750 88.843750 83.031250 392.312500 356.281250 73.781250

Ground 95.750000 84.843750 56.468750 62.750000 63.906250 536.281250 486.500000 64.437500 73.781250

Ghost 81.187500 79.343750 76.468750 64.343750 333.312500 303.089286 72.062500 74.151786 72.946429

Water 74.812500 70.517857 65.964286 465.666667 423.541667 72.000000 72.750000 71.416667 77.541667

Ice 76.291667 63.458333 380.414286 344.871429 67.271429 73.214286 70.800000 77.500000 70.428571

Grass 61.928571 368.072464 334.492754 56.884058 70.971014 70.724638 53.869565 64.797101 61.681159

Bug 507.387097 461.354839 66.806452 88.387097 70.225806 74.645161 69.516129 76.161290

Dark

In [225]: df.groupby(['Type 1']).mean().sort_values('Speed', ascending= False).head(7)

Out[225]:
Unnamed:
0 # HP Attack Defense Sp. Atk Sp. Def Speed

Type
1

Flying 746.500000 677.750000 70.750000 78.750000 66.250000 94.250000 72.500000 102.50000

Electric 400.590909 363.500000 59.795455 69.090909 66.295455 90.022727 73.704545 84.50000

Dragon 521.843750 474.375000 83.312500 112.125000 86.375000 96.843750 88.843750 83.03125

Psychic 420.105263 380.807018 70.631579 71.456140 67.684211 98.403509 86.280702 81.49122

Dark 507.387097 461.354839 66.806452 88.387097 70.225806 74.645161 69.516129 76.16129

Fire 360.942308 327.403846 69.903846 84.769231 67.769231 88.980769 72.211538 74.44230

Normal 351.581633 319.173469 77.275510 73.469388 59.846939 55.816327 63.724490 71.55102

69/86
In [226]: df.sum()

Out[226]: Unnamed: 0 319600

# 290251
Name BulbasaurIvysaurVenusaurVenusaurMega VenusaurC...
Type 1 GrassGrassGrassGrassFireFireFireFireFireWaterW...
HP 55407
Attack 63201
Defense 59074
Sp. Atk 58256
Sp. Def 57522
Speed 54622
Generation 2659
Legendary 65
Total 348082
dtype: object

In [227]: df.groupby(['Type 2']).sum().head()

Out[227]:
Unnamed: Sp. Sp.
0 # HP Attack Defense Atk De Speed Generation Lege
f
Type
2

Bug 1260 1146 160 270 240 140 185 185 10

Dark 9112 8277 1511 2196 1441 1636 1397 1507 75

Dragon 9578 8686 1479 1700 1567 1773 1502 1450 75

Electric 3068 2794 529 436 410 487 441 429 24

Fairy 9572 8718 1479 1417 1699 1725 1885 1408 82

In [228]: df.count()

Out[228]: Unnamed: 0 800

# 800
Name 800
Type 1 800
Type 2 414
HP 800
Attack 800
Defense 800
Sp. Atk 800
Sp. Def 800
Speed 800
Generation 800
Legendary 800
Total 800
dtype: int64

70/86
In [229]: df['count1'] = 0
df.groupby(['Type 2']).count()['count1']

Out[229]: Type 2
Bug 3
Dark 20
Dragon 18
Electric 6
Fairy 23
Fighting 26
Fire 12
Flying 97
Ghost 14
Grass 25
Ground 35
Ice 14
Normal 4
Poison 34
Psychic 33
Rock 14
Steel 22
Water 14
Name: count1, dtype: int64

In [230]: df['count1'] = 0
df.groupby(['Type 1']).count()['count1']

Out[230]: Type 1
Bug 69
Dark 31
Dragon 32
Electric 44
Fairy 17
Fighting 27
Fire 52
Flying 4
Ghost 32
Grass 70
Ground 32
Ice 24
Normal 98
Poison 28
Psychic 57
Rock 44
Steel 27
Water 112
Name: count1, dtype: int64

71/86
In [231]: df['count1'] = 0
df.groupby(['Type 1','Type 2','Legendary']).count()['count1']

Out[231]: Type 1 Type 2 Legendary

Bug Electric False 2
Fighting False 2
Fire False 2
Flying False 14
Ghost False 1
..
Water Ice False 3
Poison False 3
Psychic False 5
Rock False 4
Steel False 1
Name: count1, Length: 150, dtype: int64

 8 Loading Data in Chunks

In [232]:▾ for df in pd.read_csv('poke_updated1.csv', chunksize=3):
print(df)

# Name Type 1 Type 2 HP Attack Defense Sp. Atk Sp. Def Speed
\
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45
1 2 Ivysaur Grass Poison 60 62 63 80 80 60
2 3 Venusaur Grass Poison 80 82 83 100 100 80

Generation Legendary Total

0 1 False 318
1 1 False 405
2 1 False 525
# Name Type 1 Type 2 HP Attack Defense Sp. Atk \
3 3 VenusaurMega Venusaur Grass Poison 80 100 123 122
4 4 Charmander Fire NaN 39 52 43 60
5 5 Charmeleon Fire NaN 58 64 58 80

Sp. Def Speed Generation Legendary Total

3 120 80 1 False 625
4 50 65 1 False 309
5 65 80 1 False 405

72/86
In [233]: df

Out[233]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Genera
HoopaHoopa f
798 720 Unbound Psychic Dark 80 160 60 170 130 80

Volcanion
799 721 Fire Water 80 110 120 130 90 70

In [234]: df1 = pd.DataFrame()

▾ for df in pd.read_csv('poke_updated1.csv',chunksize=10):
df1 = pd.concat([df1,df])
df1.head(15)

Out[234]:
Type Type Sp. Sp.
# Name 1 2 HP Attack Defense Atk De Speed Gene
f
0 1 Bulbasaur Grass Poison 45 49 49 65 65 45

1 2 Ivysaur Grass Poison 60 62 63 80 80 60

2 3 Venusaur Grass Poison 80 82 83 100 100 80

3 3 VenusaurMega Grass Poison 80 100 123 122 120 80

Venusaur
Charmander
4 4 Fire NaN 39 52 43 60 50 65
Charmeleon
5 5 Fire NaN 58 64 58 80 65 80
Charizard
6 6 Fire Flying 78 84 78 109 85 100
CharizardMega
7 6 Charizard X Fire Dragon 78 130 111 130 85 100

CharizardMega
8 6 Charizard Y Fire Flying 78 104 78 159 115 100
Squirtle

9 7 Wartortle Wate NaN 44 48 65 50 64 43

10 8 Blastoise r NaN 59 63 80 65 80 58

11 9 BlastoiseMega Wate NaN 79 83 100 85 105 78

Blastoise
12 9 r NaN 79 103 120 135 115 78
Caterpie
Wate
Metapod
13 10 r Bug NaN 45 30 35 20 20 45

14 11 Bug
Wate NaN 50 20 55 25 25 30

 9 Stack & unstack in Pandas

73/86
In [235]: col = pd.MultiIndex.from_product([['2010','2015'],['Literacy','GDP']])

data = ([[80,7,88,6],[90,8,92,7],[89,7,91,8],[87,6,93,8]])

df6 = pd.DataFrame(data, index=['India','USA' , 'Russia' , 'China'], columns=co

df6

Out[235]:
2010 2015

Literacy GDP Literacy GDP

India 80 7 88 6

USA 90 8 92 7

Russia 89 7 91 8

China 87 6 93 8

In [236]:▾ # Stack() Function stacks the columns to rows.

st_df = df6.stack()
st_df

Out[236]:
2010 2015

India GDP 7 6

Literacy 80 88

USA GDP 8 7

Literacy 90 92

Russia GDP 7 8

Literacy 89 91

China GDP 6 8

Literacy 87 93

74/86
In [237]:▾ #Unstacks the row to columns
unst_df = st_df.unstack()
unst_df

Out[237]:
201 201

0 Literacy 5 Literacy

India GDP 7 80 GDP 6 88

USA 8 90 7 92

Russia 7 89 8 91

China 6 87 8 93

In [238]: unst_df = unst_df.unstack()

unst_df

Out[238]: 2010 GDP

India 7
USA 8
Russia 7
China 6
Literacy India 80
USA 90
Russia 89
China 87
2015 GDP
India 6
USA 7
Russia 8
China 8
Literacy India 88
USA 92
Russia 91
China 93
dtype: int64

In [239]: unst_df = unst_df.unstack()

unst_df

Out[239]:
India USA Russia China

2010 GDP 7 8 7 6

Literacy 80 90 89 87

2015 GDP 6 7 8 8

Literacy 88 92 91 93

 10 PIVOT Tables

75/86
In [240]: data = { 'Country':['India','USA','Russia' , 'China','India','USA' ,
'Russia','China','I 'Year':['2010','2010','2010' , '2010','2010','2010' ,
'2015','2015','2015' , '2 'Literacy/GDP':['GDP' , 'GDP' , 'GDP' ,
'GDP','Literacy' , 'Literacy', 'Literac 'Value':[7,8,7,6,80,90,89,87,6,7]} df7 =
pd.DataFrame(data,columns=['Country','Year','Literacy/GDP','Value']) df7

Out[240]:
Country Year Literacy/GDP Value

0 India 2010 GDP 7

1 USA 2010 GDP 8

2 Russia 2010 GDP 7

3 China 2010 GDP 6

4 India 2010 Literacy 80

5 USA 2010 Literacy 90

6 Russia 2015 Literacy 89

7 China 2015 Literacy 87

8 India 2015 GDP 6

9 USA 2015 GDP 7

In [241]: pd.pivot_table(df7, index = ['Year', 'Literacy/GDP'], aggfunc='sum')

Out[241]:
Value

Year Literacy/GDP

2010 GDP 28

Literacy 170

2015 GDP 13

Literacy 176

76/86
In [242]:▾ #Pivot table with MEAN aggregation
pd.pivot_table(df7 , index= ['Year' , 'Literacy/GDP'] , aggfunc='mean')

Out[242]:
Value

Year Literacy/GDP

2010 GDP 7.0

Literacy 85.0

2015 GDP 6.5

Literacy 88.0

 11 Hierarchical indexing
In [243]: df7.head()

Out[243]:
Country Year Literacy/GDP Value

0 India 2010 GDP 7

1 USA 2010 GDP 8

2 Russia 2010 GDP 7

3 China 2010 GDP 6

4 India 2010 Literacy 80

77/86
In [244]: df8 = df7.set_index(['Year', 'Literacy/GDP'])
df8

Out[244]:
Country Value

Year Literacy/GDP

2010 GDP India 7

GDP USA 8

GDP Russia 7

GDP China 6

Literacy India 80

Literacy USA 90

2015 Literacy Russia 89

Literacy China 87

GDP India 6

GDP USA 7

In [245]: df8.index

Out[245]: MultiIndex([('2010', 'GDP'),

('2010', 'GDP'),
('2010', 'GDP'),
('2010', 'GDP'),
('2010', 'Literacy'),
('2010', 'Literacy'),
('2015', 'Literacy'),
('2015', 'Literacy'),
('2015', 'GDP'),
('2015', 'GDP')],
names=['Year', 'Literacy/GDP'])

78/86
In [246]: df8.loc['2010']

Out[246]:
Country Value

Literacy/GDP

GDP India 7

GDP USA 8

GDP Russia 7

GDP China 6

Literacy India 80

Literacy USA 90

In [247]: df8.loc[['2010']]

Out[247]:
Country Value

Year Literacy/GDP

2010 GDP India 7

GDP USA 8

GDP Russia 7

GDP China 6

Literacy India 80

Literacy USA 90

In [248]: df8.loc['2015','Literacy']

D:\Software_installed\python 3.8.3\lib\site-packages\ipykernel_launcher.py:1: P
erformanceWarning: indexing past lexsort depth may impact performance.
"""Entry point for launching an IPython kernel.

Out[248]:
Country Value

Year Literacy/GDP

2015 Literacy Russia 89

Literacy China 87

79/86
In [249]: df8 = df7.set_index(['Year', 'Literacy/GDP', 'Country'])
df8

Out[249]:
Value

Year Literacy/GDP Country

2010 GDP India 7

USA 8

Russia 7

China 6

Literacy India 80

USA 90

2015 Literacy Russia 89

China 87

India 6
GDP
USA 7

 12 SWAP Columns in Hierarchical

indexing
In [250]: df7.head()

Out[250]:
Country Year Literacy/GDP Value

0 India 2010 GDP 7

1 USA 2010 GDP 8

2 Russia 2010 GDP 7

3 China 2010 GDP 6

4 India 2010 Literacy 80

80/86
In [251]: df8=df7.set_index(['Year', 'Literacy/GDP'])
df8

Out[251]:
Country Value

Year Literacy/GDP

2010 GDP India 7

GDP USA 8

GDP Russia 7

GDP China 6

Literacy India 80

Literacy USA 90

2015 Literacy Russia 89

Literacy China 87

GDP India 6

GDP USA 7

In [252]:▾ # Swaping the columns in Hierarchical index

df9 = df8.swaplevel('Year','Literacy/GDP')
df9

Out[252]:
Country Value

Literacy/GDP Year

GDP 2010 India 7

2010 USA 8

2010 Russia 7

2010 China 6

Literacy 2010 India 80

2010 USA 90

2015 Russia 89

2015 China 87

2015 India 6
GDP
2015 USA 7

81/86
In [253]:▾ # Swaping the columns in Hierarchical index
df9 = df9.swaplevel('Year', 'Literacy/GDP')
df9

Out[253]:
Country Value

Year Literacy/GDP

2010 GDP India 7

GDP USA 8

GDP Russia 7

GDP China 6

Literacy India 80

Literacy USA 90

2015 Literacy Russia 89

Literacy China 87

GDP India 6

GDP USA 7

 13 Crosstab in Pandas
In [254]: df7.head()

Out[254]:
Country Year Literacy/GDP Value

0 India 2010 GDP 7

1 USA 2010 GDP 8

2 Russia 2010 GDP 7

3 China 2010 GDP 6

4 India 2010 Literacy 80

82/86
In [255]: pd.crosstab(df7['Literacy/GDP'],df7.Value, margins =True)

Out[255]:
Value 6 7 8 80 87 89 90 All

Literacy/GDP

GDP 2 3 1 0 0 0 0 6

Literacy 0 0 0 1 1 1 1 4

All 2 3 1 1 1 1 1 10

In [256]:▾ # 2 way cross table

pd.crosstab(df7.Year, df7['Literacy/GDP'], margins = True)

Out[256]:
Literacy/GDP GDP Literacy All

Year

2010 4 2 6

2015 2 2 4

All 6 4 10

In [257]:▾ # 3 way cross table

pd.crosstab([df7.Year, df7['Literacy/GDP']],df7.Country, margins = True)

Out[257]:
Country China India Russia USA All

Year Literacy/GDP

2010 GDP 1 1 1 1 4

Literacy 0 1 0 1 2

2015 GDP 0 1 0 1 2

Literacy 1 0 1 0 2

All 2 3 2 3 10

 14 Row & Column Bind

 14.1 Row Bind

83/86
In [258]: df8 = pd.DataFrame({'ID' :[1,2,3,4], 'Name':['Aryan' , 'Basitro' , 'Rose' , 'Jo
df8

Out[258]:
ID Name Score

0 1 Aryan 99

1 2 Basitro 66

2 3 Rose 44

3 4 John 33

In [259]: df9 = pd.DataFrame({'ID' :[5,6,7,8] , 'Name' :['Michelle' , 'Ramiro' , 'Vignesh

df9

Out[259]:
ID Name Score

0 5 Michelle 78

1 6 Ramiro 55

2 7 Vignesh 77

3 8 Damon 87

In [260]:▾ # Row Bind with concat() function

pd.concat([df8,df9])

Out[260]:
ID Name Score

0 1 Aryan 99

1 2 Basitro 66

2 3 Rose 44

3 4 John 33

0 5 Michelle 78

1 6 Ramiro 55

2 7 Vignesh 77

3 8 Damon 87

84/86
In [261]:▾ # Row Bind with append() function
df8.append(df9)

Out[261]:
ID Name Score

0 1 Aryan 99

1 2 Basitro 66

2 3 Rose 44

3 4 John 33

0 5 Michelle 78

1 6 Ramiro 55

2 7 Vignesh 77

3 8 Damon 87

 14.2 Column Bind

In [262]: df10 = pd.DataFrame({'ID' :[1,2,3,4] , 'Name' :['Aryan' , 'Basitro' , 'Rose' ,
df10

Out[262]:
ID Name

0 1 Aryan

1 2 Basitro

2 3 Rose

3 4 John

In [263]: df11 = pd.DataFrame({'Age' :[20,30,35,40] , 'Score' :[99 , 66 , 44 , 33]})

df11

Out[263]:
Age Score

0 20 99

1 30 66

2 35 44

3 40 33

85/86
In [264]: pd.concat([df10,df11] , axis = 1)

Out[264]:
ID Name Age Score

0 1 Aryan 20 99

1 2 Basitro 30 66

2 3 Rose 35 44

3 4 John 40 33

86/86

C#.NET 17.0 Material - Harsha
100% (1)
C#.NET 17.0 Material - Harsha
1,266 pages
PANDAS - Series Dataframes
No ratings yet
PANDAS - Series Dataframes
118 pages
? Pandas ?
No ratings yet
? Pandas ?
116 pages
IpNotes
No ratings yet
IpNotes
72 pages
Introduction to Pandas & Data Structures
No ratings yet
Introduction to Pandas & Data Structures
11 pages
Pandas Notes 1
No ratings yet
Pandas Notes 1
6 pages
IP NOTES
No ratings yet
IP NOTES
20 pages
CH 02 - Data Handling Using Pandas Leip102 EDITED Smaller 01 Codes Only
No ratings yet
CH 02 - Data Handling Using Pandas Leip102 EDITED Smaller 01 Codes Only
15 pages
Unit-1 Python Pandas (1)
No ratings yet
Unit-1 Python Pandas (1)
56 pages
Basic Data Processing with Pandas
No ratings yet
Basic Data Processing with Pandas
29 pages
1 IP 12 NOTES PythonPandas 2022 PDF
100% (3)
1 IP 12 NOTES PythonPandas 2022 PDF
66 pages
11.2 Pandas
No ratings yet
11.2 Pandas
24 pages
Pandas Pract
No ratings yet
Pandas Pract
48 pages
12 Pandas
No ratings yet
12 Pandas
9 pages
Pandas
No ratings yet
Pandas
86 pages
12ip 22 23
No ratings yet
12ip 22 23
188 pages
Pandas Ip PDF
100% (1)
Pandas Ip PDF
48 pages
Python Code
No ratings yet
Python Code
44 pages
DATA HANDLING AND CSV 2024- 2025
No ratings yet
DATA HANDLING AND CSV 2024- 2025
12 pages
DV FINAL QB
No ratings yet
DV FINAL QB
60 pages
Class 12 Practical File
No ratings yet
Class 12 Practical File
29 pages
Pandas - Series - Short - Notes
No ratings yet
Pandas - Series - Short - Notes
7 pages
Exp 25_26
No ratings yet
Exp 25_26
17 pages
09_Pandas slides
No ratings yet
09_Pandas slides
33 pages
Exp8 SBLC
No ratings yet
Exp8 SBLC
9 pages
ATA Andling - 25 MARKS: D H Pandas
No ratings yet
ATA Andling - 25 MARKS: D H Pandas
102 pages
Class XII Python Practical File
No ratings yet
Class XII Python Practical File
19 pages
PANDAS
No ratings yet
PANDAS
24 pages
XII IP Ch 1 Python Pandas - I Series
No ratings yet
XII IP Ch 1 Python Pandas - I Series
45 pages
12 IP Questions
No ratings yet
12 IP Questions
181 pages
Practical Xii 11-25
No ratings yet
Practical Xii 11-25
14 pages
Vector Operations On Pandas Series
No ratings yet
Vector Operations On Pandas Series
7 pages
Project FileRonak IP
No ratings yet
Project FileRonak IP
27 pages
Practical Record Programs - Solutions
No ratings yet
Practical Record Programs - Solutions
23 pages
Chapter 2 Data Handling using pandas - I(Series)
No ratings yet
Chapter 2 Data Handling using pandas - I(Series)
13 pages
XII - Informatics Practices (LAB MANUAL)
100% (1)
XII - Informatics Practices (LAB MANUAL)
42 pages
Pandas & Numpy
No ratings yet
Pandas & Numpy
32 pages
Ip Project Work 2
No ratings yet
Ip Project Work 2
52 pages
ip study
No ratings yet
ip study
18 pages
12 IP Notes On Series
No ratings yet
12 IP Notes On Series
5 pages
Study Material IP 2022
No ratings yet
Study Material IP 2022
55 pages
Series1 Sol
No ratings yet
Series1 Sol
6 pages
Unit 2
No ratings yet
Unit 2
81 pages
numpy_dataframe
No ratings yet
numpy_dataframe
12 pages
IP Practical File Project
No ratings yet
IP Practical File Project
60 pages
Panda
No ratings yet
Panda
33 pages
P03 Introduction To Pandas Ans
No ratings yet
P03 Introduction To Pandas Ans
45 pages
NumPy Arrays and Pandas Series Object
No ratings yet
NumPy Arrays and Pandas Series Object
19 pages
Working With Pandas Notes
No ratings yet
Working With Pandas Notes
27 pages
pandas notes
No ratings yet
pandas notes
19 pages
pandas notes
No ratings yet
pandas notes
4 pages
Pandas
No ratings yet
Pandas
14 pages
11722121469601
No ratings yet
11722121469601
49 pages
11377681249398
No ratings yet
11377681249398
49 pages
Pandas Class 12 Ncertttt
No ratings yet
Pandas Class 12 Ncertttt
48 pages
Data Handlinng Using Pandas
No ratings yet
Data Handlinng Using Pandas
46 pages
11688821574779
No ratings yet
11688821574779
49 pages
Practical File IP Class 12 2022 23
No ratings yet
Practical File IP Class 12 2022 23
49 pages
11745063235731
No ratings yet
11745063235731
49 pages
Dataframe Notes
No ratings yet
Dataframe Notes
47 pages
Profound Python Data Science
From Everand
Profound Python Data Science
Onder Teker
No ratings yet
apex trading and manufacturing PLC
No ratings yet
apex trading and manufacturing PLC
5 pages
Event Management-Company Profile
No ratings yet
Event Management-Company Profile
2 pages
ARMSTRONG. The Architecture of The Intelligible Universe in The Philosophy
100% (1)
ARMSTRONG. The Architecture of The Intelligible Universe in The Philosophy
148 pages
Intensive Forestry and Biodiversity-Use of Poplar Plantation by Woodpecker in A Lowland Area of Northern Italy 2021
No ratings yet
Intensive Forestry and Biodiversity-Use of Poplar Plantation by Woodpecker in A Lowland Area of Northern Italy 2021
13 pages
Characterization of Acetone-Solution Casting Film of PMMA
No ratings yet
Characterization of Acetone-Solution Casting Film of PMMA
3 pages
FCE Project - UOE Part 4 Key Word Transformations
100% (1)
FCE Project - UOE Part 4 Key Word Transformations
12 pages
Checklist For Administrative Monitoring
No ratings yet
Checklist For Administrative Monitoring
5 pages
Exp22 Excel Ch01 ML1 Rentals Instructions
No ratings yet
Exp22 Excel Ch01 ML1 Rentals Instructions
3 pages
Eportfolio Copy of Jayda Ziegler Resume
No ratings yet
Eportfolio Copy of Jayda Ziegler Resume
3 pages
Al 2 Cu MG
No ratings yet
Al 2 Cu MG
10 pages
2.2.4 Boost Converter Example
No ratings yet
2.2.4 Boost Converter Example
7 pages
Advanced 7 American Headways 5 Final Exam Unit 4
No ratings yet
Advanced 7 American Headways 5 Final Exam Unit 4
5 pages
Learning Activity SG 2
No ratings yet
Learning Activity SG 2
28 pages
Grade 9 Revision q4
No ratings yet
Grade 9 Revision q4
6 pages
Sika PDS E SikaProof Membrane
No ratings yet
Sika PDS E SikaProof Membrane
2 pages
The Allegory of The Cave Essay
100% (2)
The Allegory of The Cave Essay
3 pages
Summer Fun!: Phil-IRI (English - Oral) Grade Two
No ratings yet
Summer Fun!: Phil-IRI (English - Oral) Grade Two
5 pages
Ultrasonic Level Transmitter and Sensor: SW 838023 GB Shuttle Manual 100316
No ratings yet
Ultrasonic Level Transmitter and Sensor: SW 838023 GB Shuttle Manual 100316
70 pages
Job Description (Bharti Airtel)
No ratings yet
Job Description (Bharti Airtel)
2 pages
Servo Prime & LP: Description Application
No ratings yet
Servo Prime & LP: Description Application
1 page
UCSP Presentation 2
No ratings yet
UCSP Presentation 2
8 pages
The Effect of Welding Parameters On The Corrosion Resistance of Austenitic Stainless Steel
No ratings yet
The Effect of Welding Parameters On The Corrosion Resistance of Austenitic Stainless Steel
7 pages
Comp 1 Synthesis Essay
No ratings yet
Comp 1 Synthesis Essay
4 pages
Bugreport Davinci QKQ1.190825.002 2023 11 22 22 26 02 Dumpstate - Log 29351
No ratings yet
Bugreport Davinci QKQ1.190825.002 2023 11 22 22 26 02 Dumpstate - Log 29351
30 pages
Tau 2k Army List
No ratings yet
Tau 2k Army List
4 pages
D-1 D-2 D-3 D-4: Door Schedule
No ratings yet
D-1 D-2 D-3 D-4: Door Schedule
1 page
Bromma Yard
100% (1)
Bromma Yard
20 pages
Modular Arithmetic
No ratings yet
Modular Arithmetic
12 pages
BESNIK SYKJA HIGH SCHOOL. 2017docx
No ratings yet
BESNIK SYKJA HIGH SCHOOL. 2017docx
10 pages