pandas: Get the mode (the most frequent value) with mode()

Posted: | Tags: Python, pandas

In pandas, the mode() method is used to find the mode, the most frequent value, of a column or row in a DataFrame. This method is also available on Series.

To get unique values and their counts, use the unique(), value_counts(), and nunique() methods.

The describe() method computes summary statistics. For non-numeric columns, these include the mode (top) and its frequency (freq).

The pandas version used in this article is as follows. Note that functionality may vary between versions.

import pandas as pd

print(pd.__version__)
# 2.1.4

pandas.Series.mode()

mode() on a Series returns a Series, even if there is only one mode.

s = pd.Series(['X', 'X', 'X', 'Y'])
print(s)
# 0    X
# 1    X
# 2    X
# 3    Y
# dtype: object

print(s.mode())
# 0    X
# dtype: object

print(type(s.mode()))
# <class 'pandas.core.series.Series'>

print(s.mode()[0])
# X

print(type(s.mode()[0]))
# <class 'str'>

If there are multiple modes, all of them are returned. The resulting Series can be converted to a list using the tolist() method.

s_multi = pd.Series(['X', 'X', 'Y', 'Y'])
print(s_multi)
# 0    X
# 1    X
# 2    Y
# 3    Y
# dtype: object

print(s_multi.mode())
# 0    X
# 1    Y
# dtype: object

print(s_multi.mode()[0])
# X

print(s_multi.mode().tolist())
# ['X', 'Y']

print(type(s_multi.mode().tolist()))
# <class 'list'>
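
According to the pandas documentation, the modes are returned in sorted order, not in order of first appearance. The following is a small sketch with a hypothetical Series, s_rev, in which 'Y' appears before 'X'.

```python
import pandas as pd

# Hypothetical example: 'Y' appears first, but both values occur twice.
s_rev = pd.Series(['Y', 'Y', 'X', 'X'])

modes = s_rev.mode().tolist()
print(modes)
# ['X', 'Y']
```

So [0] on the result gives the smallest of the tied modes, not necessarily the one that appears first in the data.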

By default, the missing value NaN is excluded. If you set the dropna argument to False, NaN will also be counted.

s_nan = pd.Series(['X', float('nan'), float('nan'), float('nan')])
print(s_nan)
# 0      X
# 1    NaN
# 2    NaN
# 3    NaN
# dtype: object

print(s_nan.mode())
# 0    X
# dtype: object

print(s_nan.mode(dropna=False))
# 0    NaN
# dtype: object
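
Because NaN is excluded by default, a Series that contains only NaN has no mode, and mode() returns an empty Series. In that case, [0] raises an error. The helper below is a minimal sketch that returns a default value when no mode exists; first_mode() is a hypothetical name used for illustration, not a pandas function.

```python
import pandas as pd


def first_mode(s, default=None):
    """Return the first mode of s, or default if mode() is empty."""
    modes = s.mode()
    return default if modes.empty else modes.iat[0]


s_all_nan = pd.Series([float('nan'), float('nan')])

print(s_all_nan.mode().empty)
# True

print(first_mode(s_all_nan))
# None

print(first_mode(pd.Series(['X', 'X', 'Y'])))
# X
```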

pandas.DataFrame.mode()

Consider the following DataFrame.

df = pd.DataFrame({'col1': ['X', 'X', 'X', 'Y'],
                   'col2': ['X', 'X', 'Y', 'Y']},
                  index=['row1', 'row2', 'row3', 'row4'])
print(df)
#      col1 col2
# row1    X    X
# row2    X    X
# row3    X    Y
# row4    Y    Y

Get the mode for each column

By default, the mode() method on a DataFrame returns a DataFrame with the modes of each column as elements. Even if there is only one mode, a one-row DataFrame is returned.

If the number of modes varies by column, the empty part is filled with the missing value NaN.

print(df.mode())
#   col1 col2
# 0    X    X
# 1  NaN    Y

print(type(df.mode()))
# <class 'pandas.core.frame.DataFrame'>

The number of modes in each column can be obtained using the count() method, which counts the number of non-NaN elements.

print(df.mode().count())
# col1    1
# col2    2
# dtype: int64

The first row of the resulting DataFrame shows the mode for each column, or one of the modes if multiple exist. The first row can be obtained using iloc[0].

print(df.mode().iloc[0])
# col1    X
# col2    X
# Name: 0, dtype: object

A column selected from the result of DataFrame.mode() may contain NaN as padding. Selecting the column first and then calling mode() on the Series returns only the actual modes.

print(df.mode()['col1'])
# 0      X
# 1    NaN
# Name: col1, dtype: object

print(df['col1'].mode())
# 0    X
# Name: col1, dtype: object

By applying the mode() and tolist() methods to each column using apply(), you can obtain a Series where each element is a list of modes.

s_list = df.apply(lambda x: x.mode().tolist())
print(s_list)
# col1       [X]
# col2    [X, Y]
# dtype: object

print(s_list.at['col2'])
# ['X', 'Y']

print(type(s_list.at['col2']))
# <class 'list'>
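
A dictionary comprehension is an alternative sketch of the same idea. It always yields one list per column, whereas the shape of the apply() result may differ when every returned list happens to have the same length. The variable name modes_by_col is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['X', 'X', 'X', 'Y'],
                   'col2': ['X', 'X', 'Y', 'Y']},
                  index=['row1', 'row2', 'row3', 'row4'])

# Build {column name: list of modes} by calling Series.mode() per column.
modes_by_col = {col: df[col].mode().tolist() for col in df.columns}
print(modes_by_col)
# {'col1': ['X'], 'col2': ['X', 'Y']}
```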

Get the mode for each row: axis

Setting the axis argument to 1 or 'columns' allows you to obtain the mode for each row. The count() method, which counts the number of non-NaN elements, also has the axis argument.

print(df.mode(axis=1))
#       0    1
# row1  X  NaN
# row2  X  NaN
# row3  X    Y
# row4  Y  NaN

print(df.mode(axis=1).count(axis=1))
# row1    1
# row2    1
# row3    2
# row4    1
# dtype: int64
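
Row-wise modes are most meaningful when all columns hold the same kind of data. The following sketch uses a hypothetical DataFrame, df_scores, in which each row has exactly one mode, and takes column 0 of the result as the per-row mode. The astype(int) call is only there to make the printed values independent of the dtype that mode() returns.

```python
import pandas as pd

# Hypothetical data: three answers (q1 to q3) per respondent.
df_scores = pd.DataFrame({'q1': [1, 2],
                          'q2': [1, 3],
                          'q3': [2, 3]},
                         index=['a', 'b'])

# Column 0 of the result holds the first mode of each row.
row_modes = df_scores.mode(axis=1)[0].astype(int).tolist()
print(row_modes)
# [1, 3]
```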

Note that pandas assigns a data type (dtype) to each column, on the assumption that a column holds the same kind of data. If the same kind of data runs along the rows instead, transposing the DataFrame may be more suitable.

print(df.T)
#      row1 row2 row3 row4
# col1    X    X    X    Y
# col2    X    X    Y    Y

print(df.T.mode())
#   row1 row2 row3 row4
# 0    X    X    X    Y
# 1  NaN  NaN    Y  NaN

Specify whether to include missing values NaN: dropna

By default, missing values NaN are excluded. If you set the dropna argument to False, NaN will also be counted.

df_nan = df.copy()
df_nan.iloc[1:, 1] = float('nan')
print(df_nan)
#      col1 col2
# row1    X    X
# row2    X  NaN
# row3    X  NaN
# row4    Y  NaN

print(df_nan.mode())
#   col1 col2
# 0    X    X

print(df_nan.mode(dropna=False))
#   col1 col2
# 0    X  NaN

Specify whether to target only numerical columns: numeric_only

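By default, mode() targets both numerical and non-numerical columns. Setting the numeric_only argument to True restricts it to numerical columns. Note that with the default, an integer column that is padded with NaN is shown as float, as col3 is in the example below.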

df_num = df.copy()
df_num['col3'] = [1, 1, 1, 0]
print(df_num)
#      col1 col2  col3
# row1    X    X     1
# row2    X    X     1
# row3    X    Y     1
# row4    Y    Y     0

print(df_num.mode())
#   col1 col2  col3
# 0    X    X   1.0
# 1  NaN    Y   NaN

print(df_num.mode(numeric_only=True))
#    col3
# 0     1

To exclusively target non-numeric columns, you can use the select_dtypes() method, which filters columns based on their data type (dtype).

print(df_num.select_dtypes(exclude='number').mode())
#   col1 col2
# 0    X    X
# 1  NaN    Y
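
Conversely, select_dtypes(include='number') keeps only the numeric columns, which for this data gives the same modes as numeric_only=True. A short sketch that rebuilds the same df_num as above:

```python
import pandas as pd

df_num = pd.DataFrame({'col1': ['X', 'X', 'X', 'Y'],
                       'col2': ['X', 'X', 'Y', 'Y'],
                       'col3': [1, 1, 1, 0]},
                      index=['row1', 'row2', 'row3', 'row4'])

by_flag = df_num.mode(numeric_only=True)
by_dtype = df_num.select_dtypes(include='number').mode()

print(by_dtype)
#    col3
# 0     1

# Both approaches report the same mode for col3.
print(by_flag['col3'].tolist() == by_dtype['col3'].tolist())
# True
```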

Get the frequency (number of occurrences) of the mode

The frequency (number of occurrences) of the mode can be obtained using the value_counts() method of a Series.

value_counts() returns a Series with unique elements as labels and their counts as elements. By default, it is sorted in descending order of frequency, so the first value of the returned Series is the frequency of the mode.

df = pd.DataFrame({'col1': ['X', 'X', 'X', 'Y'],
                   'col2': ['X', 'X', 'Y', 'Y']},
                  index=['row1', 'row2', 'row3', 'row4'])
print(df)
#      col1 col2
# row1    X    X
# row2    X    X
# row3    X    Y
# row4    Y    Y

print(df['col1'].value_counts())
# col1
# X    3
# Y    1
# Name: count, dtype: int64

print(df['col1'].value_counts().iat[0])
# 3

The original Series elements become the index labels in the resulting Series. In the above example, the labels are strings, so [number] is interpreted as a position. When the index is numerical, however, [number] is interpreted as a label, so it is safer to use iat[number], which always selects by position.
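
The difference matters when the values being counted are integers, because the counts are then indexed by integer labels. A minimal sketch with a hypothetical Series, s_int:

```python
import pandas as pd

# Hypothetical data: the counted values are integers.
s_int = pd.Series([10, 10, 20])
vc = s_int.value_counts()

print(vc.iat[0])  # position 0: the frequency of the mode
# 2

print(vc[10])     # label 10: the count of the value 10
# 2

try:
    vc[0]         # looked up as label 0, which does not exist
    raised = False
except KeyError:
    raised = True

print(raised)
# True
```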

The describe() method, which calculates summary statistics for each column, can also provide the mode and its frequency.

print(df.describe())
#        col1 col2
# count     4    4
# unique    2    2
# top       X    X
# freq      3    2

top is the mode and freq is its frequency. If there are multiple modes, only one of them is shown. Since the result is a DataFrame, rows and specific elements can be accessed using loc or at.

print(df.describe().loc['freq'])
# col1    3
# col2    2
# Name: freq, dtype: object

print(df.describe().at['freq', 'col2'])
# 2
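
Because describe() reports only one value in top, ties are not visible there. One way to list every mode together with the shared frequency is to filter the result of value_counts() by its maximum. This is a sketch; the names counts, top_freq, and tied are arbitrary.

```python
import pandas as pd

s = pd.Series(['X', 'X', 'Y', 'Y'])

counts = s.value_counts()
top_freq = counts.max()  # the highest frequency

# All values whose count equals the highest frequency, sorted for a stable order.
tied = sorted(counts[counts == top_freq].index)

print(top_freq)
# 2

print(tied)
# ['X', 'Y']
```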

describe() does not have the axis argument, so if you want to apply it to rows, transpose the DataFrame first.

print(df.T.describe())
#        row1 row2 row3 row4
# count     2    2    2    2
# unique    1    1    2    1
# top       X    X    X    Y
# freq      2    2    1    2
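
Note that top and freq appear only for non-numeric columns. When a DataFrame also contains numeric columns, describe() summarizes only the numeric ones by default, so the include argument is needed to see top and freq. A sketch using the same data as df_num in the numeric_only section:

```python
import pandas as pd

df_num = pd.DataFrame({'col1': ['X', 'X', 'X', 'Y'],
                       'col2': ['X', 'X', 'Y', 'Y'],
                       'col3': [1, 1, 1, 0]},
                      index=['row1', 'row2', 'row3', 'row4'])

# By default, only the numeric column is summarized (no top or freq rows).
print(df_num.describe().columns.tolist())
# ['col3']

# include='object' summarizes the string columns instead.
desc_obj = df_num.describe(include='object')
print(desc_obj.loc[['top', 'freq']])
#      col1 col2
# top     X    X
# freq    3    2
```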
