pandas: Get the number of rows, columns, elements (size) in DataFrame

Tags: Python, pandas

This article explains how to get the number of rows, columns, and total elements (i.e., size) in a pandas.DataFrame and pandas.Series.

As an example, we will use the Titanic survivor dataset, which can be downloaded from Kaggle.

import pandas as pd

print(pd.__version__)
# 2.0.0

df = pd.read_csv('data/src/titanic_train.csv')
print(df.head())
#    PassengerId  Survived  Pclass   
# 0            1         0       3  \
# 1            2         1       1   
# 2            3         1       3   
# 3            4         1       1   
# 4            5         0       3   
# 
#                                                 Name     Sex   Age  SibSp   
# 0                            Braund, Mr. Owen Harris    male  22.0      1  \
# 1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
# 2                             Heikkinen, Miss. Laina  female  26.0      0   
# 3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
# 4                           Allen, Mr. William Henry    male  35.0      0   
# 
#    Parch            Ticket     Fare Cabin Embarked  
# 0      0         A/5 21171   7.2500   NaN        S  
# 1      0          PC 17599  71.2833   C85        C  
# 2      0  STON/O2. 3101282   7.9250   NaN        S  
# 3      0            113803  53.1000  C123        S  
# 4      0            373450   8.0500   NaN        S  

Get the number of rows, columns, and elements in a pandas.DataFrame

Display the number of rows and columns: df.info()

The info() method of a DataFrame displays a summary that includes the number of rows and columns, memory usage, data types of each column, and the number of non-null values.

df.info()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 891 entries, 0 to 890
# Data columns (total 12 columns):
#  #   Column       Non-Null Count  Dtype  
# ---  ------       --------------  -----  
#  0   PassengerId  891 non-null    int64  
#  1   Survived     891 non-null    int64  
#  2   Pclass       891 non-null    int64  
#  3   Name         891 non-null    object 
#  4   Sex          891 non-null    object 
#  5   Age          714 non-null    float64
#  6   SibSp        891 non-null    int64  
#  7   Parch        891 non-null    int64  
#  8   Ticket       891 non-null    object 
#  9   Fare         891 non-null    float64
#  10  Cabin        204 non-null    object 
#  11  Embarked     889 non-null    object 
# dtypes: float64(2), int64(5), object(5)
# memory usage: 83.7+ KB

info() prints the summary to standard output and returns None, so its result cannot be assigned to a variable or used in calculations.
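If you do need the summary as a string (for logging, for example), one option is to pass a writable buffer to the buf parameter of info(). The following is a minimal sketch; df_small and its columns are made up for illustration and are not part of the Titanic data.

```python
import io

import pandas as pd

# Hypothetical toy data, used only to keep the example self-contained.
df_small = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', None, 'z']})

# With buf specified, info() writes the summary to the buffer instead of stdout.
buffer = io.StringIO()
result = df_small.info(buf=buffer)
summary = buffer.getvalue()

print(result)
# None

print(summary.splitlines()[1])
# RangeIndex: 3 entries, 0 to 2
```

The captured string is meant for display; for row and column counts as numbers, the attributes described in the following sections are more convenient.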

Get the number of rows and columns: df.shape

The shape attribute of a DataFrame returns a tuple in the form (number of rows, number of columns).

print(df.shape)
# (891, 12)

print(df.shape[0])
# 891

print(df.shape[1])
# 12

You can unpack this tuple to assign the row and column counts to individual variables:

row, col = df.shape
print(row)
# 891

print(col)
# 12

Get the number of rows: len(df)

You can get the number of rows in a DataFrame using the built-in len() function:

print(len(df))
# 891

Get the number of columns: len(df.columns)

To get the number of columns, apply len() to the columns attribute:

print(len(df.columns))
# 12

Get the total number of elements: df.size

The total number of elements in a DataFrame is available via the size attribute, which equals row_count * column_count.

print(df.size)
# 10692

print(df.shape[0] * df.shape[1])
# 10692
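Note that size counts every cell, including missing values (NaN). As a rough sketch on a hand-built DataFrame (df_small is a made-up example, not the Titanic data), you can compare size with the number of non-missing cells obtained by summing the result of count():

```python
import pandas as pd

# Hypothetical toy data with one missing value in column 'a'.
df_small = pd.DataFrame({'a': [1.0, None, 3.0], 'b': ['x', 'y', 'z']})

print(df_small.shape)
# (3, 2)

# size is rows * columns and includes the NaN cell.
print(df_small.size)
# 6

# count() returns the non-missing count per column; sum() adds them up.
non_null = int(df_small.count().sum())
print(non_null)
# 5
```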

Notes when setting an index

When using the set_index() method to set one or more columns as the index, those columns are removed from the main data (i.e., they are no longer part of the values). Consequently, they are excluded from the total column count.

df_multiindex = df.set_index(['Sex', 'Pclass', 'Embarked', 'PassengerId'])

print(df_multiindex.shape)
# (891, 8)

print(len(df_multiindex))
# 891

print(len(df_multiindex.columns))
# 8

print(df_multiindex.size)
# 7128
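
The columns moved into the index are not lost: the number of index levels is available via index.nlevels, and reset_index() turns them back into regular columns. A minimal sketch on a made-up DataFrame (df_small and its column names are illustrative assumptions):

```python
import pandas as pd

# Hypothetical toy data, used only to keep the example self-contained.
df_small = pd.DataFrame({
    'key1': ['a', 'a', 'b'],
    'key2': [1, 2, 1],
    'value': [10, 20, 30],
})

# Two columns become index levels, so only one data column remains.
df_indexed = df_small.set_index(['key1', 'key2'])

print(df_indexed.shape)
# (3, 1)

print(df_indexed.index.nlevels)
# 2

# reset_index() moves the index levels back into the columns.
print(df_indexed.reset_index().shape)
# (3, 3)
```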

Get the number of elements in a pandas.Series

To demonstrate with a Series, we extract a single column from a DataFrame:

s = df['PassengerId']
print(s.head())
# 0    1
# 1    2
# 2    3
# 3    4
# 4    5
# Name: PassengerId, dtype: int64

Get the number of elements: len(s) and s.size

Since a Series is one-dimensional, you can obtain its total number of elements using len(), the size attribute, or the shape attribute. Note that shape returns a one-element tuple.

print(len(s))
# 891

print(s.size)
# 891

print(s.shape)
# (891,)

print(type(s.shape))
# <class 'tuple'>

Series also has an info() method, added in pandas 1.4. It provides metadata similar to that of DataFrame.info(), including the number of non-null values and memory usage.

s.info()
# <class 'pandas.core.series.Series'>
# RangeIndex: 891 entries, 0 to 890
# Series name: PassengerId
# Non-Null Count  Dtype
# --------------  -----
# 891 non-null    int64
# dtypes: int64(1)
# memory usage: 7.1 KB
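
If your code may run on a pandas version older than 1.4, one possible fallback is to convert the Series to a one-column DataFrame with to_frame() and call DataFrame.info() instead. This is only a suggested sketch, not an official recipe; s_small is a made-up Series, and the summary is captured in a buffer so that it can be inspected.

```python
import io

import pandas as pd

# Hypothetical toy Series, used only to keep the example self-contained.
s_small = pd.Series([1, 2, 3], name='n')

buffer = io.StringIO()
if hasattr(s_small, 'info'):
    # pandas 1.4 or later: Series.info() is available.
    s_small.info(buf=buffer)
else:
    # Older pandas: fall back to DataFrame.info() on a one-column DataFrame.
    s_small.to_frame().info(buf=buffer)

summary = buffer.getvalue()
print('3 non-null' in summary)
# True
```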
