
Dataframe PDF

The document provides an overview of DataFrames in Python's pandas library, describing their structure, properties, and methods for creation. It details various ways to create DataFrames, including from Series, lists, dictionaries, and CSV files, along with examples of each method. Additionally, it explains how to manipulate row and column indices during DataFrame creation.

Uploaded by

jkgvpcsgww
Copyright
© © All Rights Reserved

PYTHON PANDAS

CLASS : XII
DATAFRAMES
DataFrame is a two-dimensional object in the python pandas library which stores
heterogeneous data (differing data types) in form of rows and columns. It is similar to
an Excel spreadsheet or a SQL table. This is the most commonly used pandas object.
Once we store the data into the Dataframe, we can perform various operations that are
useful in analyzing and understanding the data.

PROPERTIES OF DATAFRAME :
1. A DataFrame has two axes (indices):
   - axis 0 represents rows (vertical movement) and
   - axis 1 represents columns (horizontal movement).
2. It is similar to a spreadsheet, whose
   - row index is called Index and
   - column index is called Columns.
3. A DataFrame contains heterogeneous data.
4. A DataFrame's size is mutable.
5. A DataFrame's data is mutable.
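
The properties listed above can be checked with a short sketch. The sample names, ages and marks below are made up for illustration; only the pandas calls themselves are standard.

```python
import pandas as pd

# Illustrative data (not taken from any file in this handout)
df = pd.DataFrame({"name": ["abc", "def"], "age": [15, 16]})

# Size is mutable: a new column can be added after creation
df["marks"] = [76, 80]

# Data is mutable: an existing value can be changed in place
df.loc[0, "age"] = 17

# axis=0 moves down the rows, giving one result per column
col_totals = df[["age", "marks"]].sum(axis=0)

# axis=1 moves across the columns, giving one result per row
row_totals = df[["age", "marks"]].sum(axis=1)

print(df.index)    # row index (Index)
print(df.columns)  # column index (Columns)
print(col_totals)
print(row_totals)
```

Here the column "name" holds strings while "age" and "marks" hold integers, which is what heterogeneous data means in practice.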

FOR USING THE DATAFRAME OBJECT WE MUST IMPORT THE PANDAS LIBRARY BY USING THE STATEMENT:
import pandas as pd
A DATA FRAME CAN BE CREATED USING ANY OF THE FOLLOWING
1. Series
2. Lists of Lists
3. 2D dictionary - Dictionary of Series / ndarrays / Lists
4. List of Dictionaries
5. CSV Files
6. A numpy 2D array
7. Another DataFrame Object
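
The last two sources in the list (a numpy 2D array and another DataFrame object) can be sketched as follows. The array values and column names are assumptions chosen only for illustration; copy=True is used so that the new DataFrame does not share data with the original.

```python
import numpy as np
import pandas as pd

# A 2D numpy array: each inner row becomes one row of the DataFrame
arr = np.array([[1, 2, 3], [4, 5, 6]])
df_from_array = pd.DataFrame(arr, columns=["a", "b", "c"])

# A new DataFrame built from an existing DataFrame object
df_copy = pd.DataFrame(df_from_array, copy=True)

# Changing the copy leaves the original untouched
df_copy.loc[0, "a"] = 100

print(df_from_array)
print(df_copy)
```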

1. CREATION OF EMPTY DATAFRAMES :

EXAMPLE - PROGRAM

import pandas as pd
df = pd.DataFrame()
print("The DataFrame is:\n", df)

OUTPUT

The DataFrame is:
Empty DataFrame
Columns: []
Index: []

2. CREATION OF DATAFRAMES from SERIES :

EXAMPLE - PROGRAM

import pandas as pd
s = pd.Series(["XII A", "XII B", "XII C", "XII D", "XII E"])
df = pd.DataFrame(s)
print("The DataFrame is:\n", df)

OUTPUT

The DataFrame is:
       0
0  XII A
1  XII B
2  XII C
3  XII D
4  XII E

Note : The default column name is 0.

EXAMPLE - PROGRAM

import pandas as pd
s = pd.Series(["XII A", "XII B", "XII C", "XII D", "XII E"])
df = pd.DataFrame(s, columns=["CLASS NAME"])
print("The DataFrame is:\n", df)

OUTPUT

The DataFrame is:
  CLASS NAME
0      XII A
1      XII B
2      XII C
3      XII D
4      XII E
3. CREATION OF DATAFRAMES from Lists of Lists :

A two-dimensional nested list can be used to create a DataFrame. The columns
parameter is used to pass the names of the columns as a list.

EXAMPLE - PROGRAM

import pandas as pd
L = [['abc', 15], ['def', 16], ['ghi', 17]]
df1 = pd.DataFrame(L, columns=['name', 'age'])
print('The DataFrame is:\n', df1)

OUTPUT

The DataFrame is:
  name  age
0  abc   15
1  def   16
2  ghi   17

EXAMPLE - PROGRAM

# Creating a DataFrame from List of Lists - WITHOUT COLUMNS PARAMETER
import pandas as pd
L = [['abc', 15], ['def', 16], ['ghi', 17]]
df1 = pd.DataFrame(L)
print("The Data Frame is :\n", df1)

OUTPUT

The Data Frame is :
     0   1
0  abc  15
1  def  16
2  ghi  17

EXAMPLE - PROGRAM

# Creating a DataFrame from List of Lists - WITH columns & index PARAMETER
import pandas as pd
L = [['abc', 15], ['def', 16], ['ghi', 17]]
df1 = pd.DataFrame(L, columns=['name', 'age'], index=['a', 'b', 'c'])
print('The DataFrame is:\n', df1)

OUTPUT

The DataFrame is:
  name  age
a  abc   15
b  def   16
c  ghi   17

Note : The columns parameter sets the column index and the index parameter sets the row index.
2D DICTIONARY : A 2D dictionary is a dictionary having items as {key:value}, where the value part is a data structure of any type: another dictionary, ndarray, list, Series etc.
Note : The value parts of all the keys should have a similar structure and the same length.

4. CREATION OF DATAFRAMES from Dictionary of SERIES / Lists / ndarray

A dictionary can also be used to create a DataFrame.
The keys of the dictionary become the column labels, and the values (which can be lists / ndarrays / Series objects) become the elements appearing under those columns.
The row labels can be specified by passing a list of row labels to the index parameter (so that the default row labels 0, 1, 2, ... are replaced by the user's choice).
EXAMPLE - PROGRAM

# creating a dataframe from Dictionary of Series
import pandas as pd
clas = pd.Series(["XII A", "XII B", "XII C", "XII D", "XII E"])
name = pd.Series(["vikrant", "Kevin", "Nitisha", "Manoj", "Artha"])
dic = {"Class": clas, "Name": name}
df = pd.DataFrame(dic)
print(df)

OUTPUT

   Class     Name
0  XII A  vikrant
1  XII B    Kevin
2  XII C  Nitisha
3  XII D    Manoj
4  XII E    Artha

EXAMPLE - PROGRAM

# creating dataframe from Dictionary of Lists
import pandas as pd
L1 = ['abc', 'def', 'ghi']
L2 = [15, 16, 17]
d5 = {'name': L1, 'age': L2}
df6 = pd.DataFrame(d5)
print('The DataFrame is : \n', df6)

OUTPUT

The DataFrame is :
  name  age
0  abc   15
1  def   16
2  ghi   17

EXAMPLE - PROGRAM

# creating dataframe from Dictionary of Lists (with a missing value)
import pandas as pd
L1 = ['abc', 'def', 'ghi']
L2 = [15, None, 17]
d1 = {'name': L1, 'age': L2}
df1 = pd.DataFrame(d1)
print('The DataFrame is : \n', df1)

OUTPUT

The DataFrame is :
  name   age
0  abc  15.0
1  def   NaN
2  ghi  17.0

EXAMPLE - PROGRAM

# creating dataframe from Dictionary of Lists (with row index)
import pandas as pd
L1 = ['abc', 'def', 'ghi']
L2 = [15, 16, 17]
d1 = {'name': L1, 'age': L2}
df1 = pd.DataFrame(d1, index=["ROW1", "ROW2", "ROW3"])
print('The DataFrame is : \n', df1)

OUTPUT

The DataFrame is :
     name  age
ROW1  abc   15
ROW2  def   16
ROW3  ghi   17

EXAMPLE - PROGRAM

# creating dataframe from Dictionary of Lists (lists of unequal length)
import pandas as pd
L1 = ['abc', 'def', 'ghi']
L2 = [15, 17]
d1 = {'name': L1, 'age': L2}
df1 = pd.DataFrame(d1)    # raises ValueError, so the next line never runs
print('The DataFrame is : \n', df1)

OUTPUT

ValueError : arrays must all be same length
(the exact wording of the message varies between pandas versions)

Reason - mismatch in the number of elements between the lists L1 & L2.

EXAMPLE - PROGRAM

# Creating a DataFrame from Dictionary of ndarrays
import pandas as pd
import numpy as np
a1 = np.array(['jkl', 'mno', 'pqr'])
a2 = np.array([20, 21, 22])
d2 = {'name': a1, 'age': a2}
df2 = pd.DataFrame(d2, index=['r1', 'r2', 'r3'])
print('The DataFrame is : \n', df2)

OUTPUT

The DataFrame is :
   name  age
r1  jkl   20
r2  mno   21
r3  pqr   22

5. CREATION OF DATAFRAMES from LIST of DICTIONARIES :

A DataFrame can be created from a list of dictionaries. Each dictionary holds {key:value} pairs and supplies one row of the DataFrame. The keys of the dictionaries become the column names in the DataFrame object and the values of the dictionaries become the column values of the DataFrame object. If a column name (key) is missing from a particular dictionary, then the row built from that dictionary has NaN in that column.

EXAMPLE - PROGRAM

# Creating a DataFrame from List of Dictionaries
import pandas as pd
L1 = [{'name': 'abc', 'age': 15}, {'name': 'def', 'age': 16, 'class': 5}]
df1 = pd.DataFrame(L1)
print(" The Data Frame is \n", df1)

OUTPUT

The Data Frame is
  name  age  class
0  abc   15    NaN
1  def   16    5.0

( Note : The DataFrame is created using a list L1 which in turn contains dictionaries. The keys become the columns of the DataFrame; row 1 follows dictionary 1, row 2 follows dictionary 2, and so on. )

EXAMPLE - PROGRAM

# Creating a DataFrame from List of Dictionaries with row index
import pandas as pd
d1 = [{'name': 'abc', 'age': 15}, {'name': 'def', 'age': 16, 'class': 5}]
df1 = pd.DataFrame(d1, index=['r1', 'r2'])
print(" The Data Frame is \n", df1)

OUTPUT

The Data Frame is
   name  age  class
r1  abc   15    NaN
r2  def   16    5.0

Note : We can specify our own row labels by using the index=[list_of_row_labels] parameter of DataFrame() in all of the ways of creating a DataFrame.

Define index & columns parameters of DataFrame Object :

We can use index=[list_of_row_labels] and columns=[list_of_column_labels] to specify the row index as well as the column index in DataFrame creation.

While specifying the column labels, we have the flexibility of listing only some of the column names, in which case only the columns appearing in the list appear in the DataFrame object.

Another flexibility is that if a column name is specified which does not exist in any of the dictionaries, then that column is created in the DataFrame object and all the values under that column appear as NaN.

EXAMPLE - PROGRAM

# Creating a DataFrame from List of Dictionaries with row / column index
import pandas as pd
L1 = [{'name': 'abc', 'age': 15}, {'name': 'def', 'age': 16, 'class': 5}]

df1 = pd.DataFrame(L1, index=['r1', 'r2'], columns=['name', 'age'])  # column class is left out
print('DataFrame1 :\n', df1)
print()

df2 = pd.DataFrame(L1, index=['r1', 'r2'], columns=['name', 'age', 'marks'])  # column marks is added
print('DataFrame2 : \n', df2)

OUTPUT

DataFrame1 :
   name  age
r1  abc   15
r2  def   16

DataFrame2 :
   name  age  marks
r1  abc   15    NaN
r2  def   16    NaN

6. CREATING A DATAFRAME USING CSV FILES / WRITING TO CSV FILE ( Comma Separated Values )

A csv file can be imported directly into a DataFrame object using the read_csv() function.
read_csv() has many parameters to control the kind of data imported.
The parameter sep='char' specifies the character used to separate the column values; by default it is the comma (,).
The parameter index_col=int specifies which column the row labels are taken from. The int is the position of the column containing the row labels: the first column has index 0, the second column has index 1, and so on.
Similar to importing data from a csv file, data from a DataFrame object can be exported to a csv file using the to_csv() method. The to_csv() method has many parameters to control the kind of data exported. The parameter index=False does not export the index as a column in the csv file. The parameter header=False omits the column names from the exported csv file.
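
The sep and index_col parameters described above can be tried without any file on disk. The sketch below reads from an in-memory string through io.StringIO; the semicolon-separated text and its roll numbers are assumptions made up for illustration and are not the STUDENT.CSV data.

```python
import io
import pandas as pd

# Hypothetical csv text that uses ';' instead of ',' as the separator
csv_text = "ROLLNO;NAME;AGE\n1;ANOOP;16\n2;RAVI;15\n"

# sep=';' tells read_csv how the values are separated;
# index_col=0 takes the row labels from the first column (ROLLNO)
df = pd.read_csv(io.StringIO(csv_text), sep=";", index_col=0)
print(df)
```

Because ROLLNO is now the row index, it no longer appears among the columns.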
STUDENT.CSV
ROLLNO NAME GENDER AGE MARKS
1201 A ANOOP M 16 76
1201 B RAVI M 15 80
1201 C SARANYA F 14 94
1201 D RAHUL M 16 55
1201 E DIYA F 15 65

EXAMPLE - PROGRAM

# Creating a DataFrame using csv file - "student.csv"
import pandas as pd
df1 = pd.read_csv('student.csv')
print('DataFrame is :\n', df1)

Note : The above DataFrame 'df1' is created with the entire contents of the csv file (student.csv saved in the same folder as the python file). The topmost row of the csv file becomes the column index of the DataFrame and the row index is automatically assigned the values 0, 1, 2, 3, 4.

Writing header=0, or not writing header at all, makes the first row of the csv the header of the DataFrame object. header=<row number> marks the header row and also marks the start of the data to be fetched: the data begins with the row after the header.

EXAMPLE - PROGRAM

# Creating a DataFrame using csv file - "student.csv" stored in desktop
df2 = pd.read_csv('C:/Users/abc/Desktop/student.csv')
print('DataFrame is :\n', df2)

Note : The above DataFrame 'df2' is created with the entire contents of the csv file (student.csv saved on the desktop). The topmost row of the csv file becomes the column index of the DataFrame and the row index is automatically assigned the values 0, 1, 2, 3, 4.

EXAMPLE - PROGRAM

# Creating a DataFrame using csv file - "student.csv" with parameters
import pandas as pd
df2 = pd.read_csv('student.csv', sep=',', index_col=0)
print('DataFrame is :\n', df2)

Note : In this DataFrame, the column with index value 0 has become the row index.

EXAMPLE - PROGRAM

# Creating a DataFrame using csv file - "student.csv"
import pandas as pd
df3 = pd.read_csv('student.csv', index_col=1)
print('DataFrame is :\n', df3)

Note : In this DataFrame, the column with index value 1 has become the row index.

EXAMPLE - PROGRAM

# student.csv stored in python folder
df1 = pd.read_csv('STUDENT.csv', header=None)
print('DataFrame is :\n', df1)
print()

Note : With header=None, pandas does not treat any row as the header. The first row of the csv (which holds the actual column names) becomes the first data row of the DataFrame, so the columns no longer have names and are labelled 0, 1, 2, ... instead.

EXAMPLE - PROGRAM

# student.csv stored in python folder
df1 = pd.read_csv('STUDENT.csv', header=3)
print('DataFrame is with header adjustments:\n', df1)
print()

OUTPUT

DataFrame is with header adjustments:
  1201 C SARANYA F 14 94
0 1201 D RAHUL M 16 55
1 1201 E DIYA F 15 65

Note : The row with index number 3 of the csv is made the header, and the rows following index 3 become the rows of the DataFrame.
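
The effect of the header parameter can be compared side by side on a small in-memory csv. This is only a sketch: the text below is hypothetical sample data read through io.StringIO, not the STUDENT.csv file.

```python
import io
import pandas as pd

# Hypothetical csv text: one header row followed by three data rows
csv_text = "NAME,AGE\nANOOP,16\nRAVI,15\nSARANYA,14\n"

# Default (same as header=0): row 0 supplies the column names
default_header = pd.read_csv(io.StringIO(csv_text))

# header=None: no row is a header, columns are labelled 0, 1
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# header=1: row 1 supplies the column names, rows above it are skipped
second_row_header = pd.read_csv(io.StringIO(csv_text), header=1)

print(default_header)
print(no_header)
print(second_row_header)
```

Note that with header=1 the values of the chosen row ("ANOOP" and "16") are used as column names, and they are read as text.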

EXAMPLE - PROGRAM

# STUDENT5.csv (a csv without a column-name row) stored in python folder
df1 = pd.read_csv('STUDENT5.csv', names=['Rno', 'Nam', 'Gen', 'Ag', 'Mk'])
print('DataFrame is :\n', df1)
print()

OUTPUT

DataFrame is :
  Rno Nam Gen Ag Mk
0 1201 A ANOOP M 16 76
1 1201 B RAVI M 15 80
2 1201 C SARANYA F 14 94
3 1201 D RAHUL M 16 55
4 1201 E DIYA F 15 65

Note : names=[col1, col2, ...] gives customized headings for the DataFrame. In this case, use a csv that does not have a column-name row.
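
If the csv does have a column-name row, names can still be used by combining it with header=0, which replaces the existing names. The sketch below uses hypothetical in-memory csv text to compare the cases; the column names and values are assumptions for illustration.

```python
import io
import pandas as pd

new_names = ["Rno", "Nam", "Ag"]

# Case 1: csv without a column-name row, names supplies the headings
no_header_csv = "1,ANOOP,16\n2,RAVI,15\n"
df_a = pd.read_csv(io.StringIO(no_header_csv), names=new_names)

# Case 2: csv with a column-name row, header=0 replaces the old names
with_header_csv = "ROLLNO,NAME,AGE\n1,ANOOP,16\n2,RAVI,15\n"
df_b = pd.read_csv(io.StringIO(with_header_csv), names=new_names, header=0)

# Case 3: names alone on a csv with a column-name row,
# the old names end up as the first data row
df_c = pd.read_csv(io.StringIO(with_header_csv), names=new_names)

print(df_a)
print(df_b)
print(df_c)
```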

EXAMPLE - PROGRAM

# student.csv stored in python folder
df1 = pd.read_csv('STUDENT.csv', usecols=['NAME', 'AGE'], nrows=3)
print('DataFrame is :\n', df1)
print()

Note : usecols=['col1', 'col2', ...] retrieves only the specified columns from the csv, and nrows=<int> retrieves only that many rows from the top of the csv.
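
A self-contained sketch of usecols and nrows, using hypothetical in-memory csv text rather than the STUDENT.csv file:

```python
import io
import pandas as pd

# Hypothetical csv text with four data rows
csv_text = (
    "ROLLNO,NAME,GENDER,AGE\n"
    "1,ANOOP,M,16\n"
    "2,RAVI,M,15\n"
    "3,SARANYA,F,14\n"
    "4,RAHUL,M,16\n"
)

# Keep only two columns and only the first three data rows
df = pd.read_csv(io.StringIO(csv_text), usecols=["NAME", "AGE"], nrows=3)

# The order inside usecols does not change the column order,
# which follows the order in the csv
df_swapped = pd.read_csv(io.StringIO(csv_text), usecols=["AGE", "NAME"], nrows=3)

print(df)
print(df_swapped)
```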

EXAMPLE - PROGRAM

# Creating a csv file "student1.csv" from the DataFrame df1
df1.to_csv('student1.csv')
print("student1.csv is created in python folder , pls check")

EXAMPLE - PROGRAM

# Creating a csv file "student2.csv" from the DataFrame df1
df1.to_csv('student2.csv', index=False)
( no row index because index = False )

OUTPUT ( contents of student2.csv )

ROLLNO NAME GENDER AGE MARKS
1201 A ANOOP M 16 76
1201 B RAVI M 15 80
1201 C SARANYA F 14 94
1201 D RAHUL M 16 55
1201 E DIYA F 15 65

EXAMPLE - PROGRAM

# Creating a csv file "student3.csv" from the DataFrame df1
df1.to_csv('student3.csv', index=False, header=False)
( no row index because index = False and no column header because header = False )

OUTPUT ( contents of student3.csv )

1201 A ANOOP M 16 76
1201 B RAVI M 15 80
1201 C SARANYA F 14 94
1201 D RAHUL M 16 55
1201 E DIYA F 15 65
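
To see the exact text that to_csv() writes, the method can be called without a file name, in which case it returns the csv text as a string. The sketch below uses a small made-up DataFrame, not the student data.

```python
import pandas as pd

# Illustrative DataFrame with the default row index 0, 1
df = pd.DataFrame({"NAME": ["ANOOP", "RAVI"], "AGE": [16, 15]})

# No file name given, so to_csv returns the csv text instead of writing a file
with_index = df.to_csv()                          # index and header both written
no_index = df.to_csv(index=False)                 # header only
data_only = df.to_csv(index=False, header=False)  # neither index nor header

for text in (with_index, no_index, data_only):
    print(text.splitlines())
```

In the first case the header line starts with an empty field, because the index column has no name.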

ATTRIBUTES / PROPERTIES OF DATAFRAMES

All information related to a DataFrame object is made available to the user through the attributes of the DataFrame object.
Some common attributes of the DataFrame object are discussed below :

Sl No   ATTRIBUTES   DESCRIPTION
1       index        Returns the index (row labels) of the DataFrame
2       columns      Returns the column labels of the DataFrame
3       axes         Returns a list containing both the axes' index values (row index, then column index)
4       dtypes       Returns the data type of each individual column
5       shape        Returns a tuple representing the dimensions (rows, columns) of the DataFrame
6       ndim         Returns an int representing the number of axes
7       size         Returns the number of elements in the DataFrame object
8       empty        Returns True if the DataFrame is empty, else False
9       T            Returns the transpose of the DataFrame
10      len(df)      Returns the number of rows of the DataFrame (len is a built-in function, not an attribute)
11      values       Returns a numpy representation of the DataFrame

EXAMPLES FOR DATAFRAME ATTRIBUTES

PROGRAM ( known output is shown as a comment under the statement )

import pandas as pd
age = pd.Series([16, 15, 13, 15, 15])
clas = pd.Series(["XII A", "XII B", "XII C", "XII D", "XII E"])
name = pd.Series(["vikrant", "Kevin", "Nitisha", "Manoj", "Artha"])
dic = {"Class": clas, "Name": name, "Age": age}
df = pd.DataFrame(dic)
print(df)

print("The index is : \n ", df.index)
# OUTPUT : The index is : RangeIndex(start=0, stop=5, step=1)

print(" The columns are : \n ", df.columns)

print(" The axes are : \n ", df.axes)

print(" The data types are : \n ", df.dtypes)

print(" The shape is : \t ", df.shape)
# OUTPUT : The shape is : (5, 3)

print(" The dimension is : \t ", df.ndim)
# OUTPUT : The dimension is : 2

print(" The size ( no. of elements) is : \t ", df.size)
# OUTPUT : The size ( no. of elements) is : 15

print(" Is the DataFrame Empty : \t ", df.empty)
# OUTPUT : Is the DataFrame Empty : False

print(" The transpose of the DataFrame is : \n ", df.T)

print(" The no. of rows in the dataframe is : \t ", len(df))
# OUTPUT : The no. of rows in the dataframe is : 5

print(" The values of the dataframe is : \n ", df.values)

**************************
