pandas: Get the number of rows, columns, elements (size) in DataFrame
This article explains how to get the number of rows, columns, and total elements (i.e., size) in a pandas.DataFrame and pandas.Series.
As an example, we will use the Titanic survivor dataset, which can be downloaded from Kaggle.
import pandas as pd
print(pd.__version__)
# 2.0.0
df = pd.read_csv('data/src/titanic_train.csv')
print(df.head())
# PassengerId Survived Pclass
# 0 1 0 3 \
# 1 2 1 1
# 2 3 1 3
# 3 4 1 1
# 4 5 0 3
#
# Name Sex Age SibSp
# 0 Braund, Mr. Owen Harris male 22.0 1 \
# 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1
# 2 Heikkinen, Miss. Laina female 26.0 0
# 3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1
# 4 Allen, Mr. William Henry male 35.0 0
#
# Parch Ticket Fare Cabin Embarked
# 0 0 A/5 21171 7.2500 NaN S
# 1 0 PC 17599 71.2833 C85 C
# 2 0 STON/O2. 3101282 7.9250 NaN S
# 3 0 113803 53.1000 C123 S
# 4 0 373450 8.0500 NaN S
Get the number of rows, columns, and elements in a pandas.DataFrame
Display the number of rows and columns: df.info()
The info() method of a DataFrame displays a summary that includes the number of rows and columns, memory usage, the data type of each column, and the number of non-null values.
df.info()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 891 entries, 0 to 890
# Data columns (total 12 columns):
# # Column Non-Null Count Dtype
# --- ------ -------------- -----
# 0 PassengerId 891 non-null int64
# 1 Survived 891 non-null int64
# 2 Pclass 891 non-null int64
# 3 Name 891 non-null object
# 4 Sex 891 non-null object
# 5 Age 714 non-null float64
# 6 SibSp 891 non-null int64
# 7 Parch 891 non-null int64
# 8 Ticket 891 non-null object
# 9 Fare 891 non-null float64
# 10 Cabin 204 non-null object
# 11 Embarked 889 non-null object
# dtypes: float64(2), int64(5), object(5)
# memory usage: 83.7+ KB
info() prints the summary to standard output and returns None, so the result cannot be assigned to a variable or used in calculations.
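If the summary text itself is needed, one option is the buf parameter of info(), which redirects the report to a writable text buffer. The following is a minimal sketch; df_demo is a small illustrative frame made up for this example, not the Titanic data.

```python
import io

import pandas as pd

# Small illustrative frame (an assumption for this sketch) with one missing value.
df_demo = pd.DataFrame({'a': [1, 2, 3], 'b': [1.5, None, 3.5]})

# Write the report into an in-memory text buffer instead of standard output.
buffer = io.StringIO()
result = df_demo.info(buf=buffer)

# info() still returns None; the text lives in the buffer.
print(result)
# None

info_text = buffer.getvalue()
print(info_text.splitlines()[0])
# <class 'pandas.core.frame.DataFrame'>
```

The captured string is meant for reading or logging. For counts that feed into calculations, the shape, len(), and size approaches described in the rest of this article are more direct.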
Get the number of rows and columns: df.shape
The shape attribute of a DataFrame returns a tuple in the form (number of rows, number of columns).
print(df.shape)
# (891, 12)
print(df.shape[0])
# 891
print(df.shape[1])
# 12
You can unpack this tuple to assign the row and column counts to individual variables:
row, col = df.shape
print(row)
# 891
print(col)
# 12
Get the number of rows: len(df)
You can get the number of rows in a DataFrame using the built-in len() function:
print(len(df))
# 891
Get the number of columns: len(df.columns)
To get the number of columns, apply len() to the columns attribute:
print(len(df.columns))
# 12
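As a quick cross-check, the len()-based counts should agree with the corresponding entries of shape. The sketch below uses a small made-up frame (df_demo is an assumption, not part of the Titanic example) and also shows len(df.index) as an equivalent way to count rows.

```python
import pandas as pd

# Small illustrative frame (an assumption for this sketch).
df_demo = pd.DataFrame({'x': [10, 20, 30], 'y': ['a', 'b', 'c']})

n_rows = len(df_demo)
n_cols = len(df_demo.columns)

# len(df), len(df.index), and shape[0] all report the row count.
print(n_rows == len(df_demo.index) == df_demo.shape[0])
# True

# len(df.columns) matches shape[1].
print(n_cols == df_demo.shape[1])
# True

print((n_rows, n_cols) == df_demo.shape)
# True
```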
Get the total number of elements: df.size
The total number of elements in a DataFrame is available via the size attribute, which equals row_count * column_count.
print(df.size)
# 10692
print(df.shape[0] * df.shape[1])
# 10692
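Because size is simply rows times columns, it counts missing values as elements too. If only non-missing values should be counted, count() may be closer to what is wanted. A minimal sketch on a made-up frame (df_demo is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Small illustrative frame with two missing values (an assumption for this sketch).
df_demo = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': ['x', 'y', None]})

# size counts every cell, missing or not: 3 rows * 2 columns.
print(df_demo.size)
# 6

# count() gives non-missing values per column; summing gives the overall total.
print(df_demo.count().sum())
# 4

# isna() marks missing cells; summing twice counts them across the frame.
print(df_demo.isna().sum().sum())
# 2
```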
Notes when setting an index
When using the set_index() method to set one or more columns as the index, those columns are removed from the main data (i.e., they are no longer part of the values). Consequently, they are excluded from the total column count.
df_multiindex = df.set_index(['Sex', 'Pclass', 'Embarked', 'PassengerId'])
print(df_multiindex.shape)
# (891, 8)
print(len(df_multiindex))
# 891
print(len(df_multiindex.columns))
# 8
print(df_multiindex.size)
# 7128
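The index columns are not lost; they have moved into the index levels. The sketch below, on a small made-up frame (df_demo and its column names are assumptions for illustration), shows that index.nlevels accounts for the moved columns and that reset_index() brings them back into the column count.

```python
import pandas as pd

# Small illustrative frame (an assumption for this sketch).
df_demo = pd.DataFrame({
    'key': ['a', 'b', 'c'],
    'group': [1, 1, 2],
    'value': [0.1, 0.2, 0.3],
})

# Moving two columns into the index leaves one data column.
df_indexed = df_demo.set_index(['key', 'group'])
print(df_indexed.shape)
# (3, 1)

# The moved columns are now index levels.
print(df_indexed.index.nlevels)
# 2

# reset_index() turns the index levels back into ordinary columns.
df_restored = df_indexed.reset_index()
print(df_restored.shape)
# (3, 3)
```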
Get the number of elements in a pandas.Series
To demonstrate with a Series, we extract a single column from a DataFrame:
s = df['PassengerId']
print(s.head())
# 0 1
# 1 2
# 2 3
# 3 4
# 4 5
# Name: PassengerId, dtype: int64
Get the number of elements: len(s) and s.size
Since a Series is one-dimensional, you can obtain its total number of elements using len(), the size attribute, or the shape attribute. Note that shape returns a one-element tuple.
print(len(s))
# 891
print(s.size)
# 891
print(s.shape)
# (891,)
print(type(s.shape))
# <class 'tuple'>
The info() method was added to Series in pandas 1.4. It provides metadata similar to DataFrame.info(), including the number of non-null values and memory usage.
s.info()
# <class 'pandas.core.series.Series'>
# RangeIndex: 891 entries, 0 to 890
# Series name: PassengerId
# Non-Null Count Dtype
# -------------- -----
# 891 non-null int64
# dtypes: int64(1)
# memory usage: 7.1 KB
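Since shape on a Series is a one-element tuple rather than a plain integer, it has to be indexed or unpacked before it can be used as a number. A minimal sketch, using a made-up Series (s_demo is an assumption for illustration):

```python
import pandas as pd

# Small illustrative Series (an assumption for this sketch).
s_demo = pd.Series([10, 20, 30], name='demo')

# Unpack the one-element tuple; the trailing comma is required.
(n,) = s_demo.shape
print(n)
# 3

# The unpacked value agrees with len() and size.
print(n == len(s_demo) == s_demo.size)
# True
```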