pandas: Filter rows/columns by labels with filter()

Modified: | Tags: Python, pandas

In pandas, use the filter() method to select rows or columns in the DataFrame based on their labels (names). This method is provided for both DataFrame and Series.

To extract rows that meet conditions based on values, rather than row labels, refer to the following articles.

You can also select columns based on data types.

The pandas version used in this article is as follows. Note that functionality may vary between versions. The following DataFrame is used as an example.

import pandas as pd

print(pd.__version__)
# 2.1.4

df = pd.DataFrame([[0, 1, 2], [3, 4, 5], [6, 7, 8]],
                  index=['apple', 'banana', 'pineapple'],
                  columns=['A', 'B', 'C'])
print(df)
#            A  B  C
# apple      0  1  2
# banana     3  4  5
# pineapple  6  7  8

Basic usage of pandas.DataFrame.filter()

The filter() method of a DataFrame can be used to extract rows or columns based on their labels (index and columns).

Specify the condition using either items, like, or regex as an argument. Specifying them simultaneously results in an error. Details on each are discussed later.

print(df.filter(items=['A', 'C']))
#            A  C
# apple      0  2
# banana     3  5
# pineapple  6  8

# print(df.filter(items=['A', 'C'], like='A'))
# TypeError: Keyword arguments `items`, `like`, or `regex` are mutually exclusive

If no row or column meets the condition, an empty DataFrame is returned. It does not result in an error.

print(df.filter(items=['X']))
# Empty DataFrame
# Columns: []
# Index: [apple, banana, pineapple]

The axis to filter on: axis

You can specify the axis to filter on with the axis argument. Use 0 or 'index' for rows and 1 or 'columns' for columns.

print(df.filter(items=['apple', 'pineapple'], axis=0))
#            A  B  C
# apple      0  1  2
# pineapple  6  7  8

print(df.filter(items=['apple', 'pineapple'], axis='index'))
#            A  B  C
# apple      0  1  2
# pineapple  6  7  8

print(df.filter(items=['A', 'C'], axis=1))
#            A  C
# apple      0  2
# banana     3  5
# pineapple  6  8

print(df.filter(items=['A', 'C'], axis='columns'))
#            A  C
# apple      0  2
# banana     3  5
# pineapple  6  8

If omitted, columns are filtered by default.

print(df.filter(items=['A', 'C']))
#            A  C
# apple      0  2
# banana     3  5
# pineapple  6  8

It is not possible to filter both rows and columns simultaneously. To filter both, repeat filter().

print(df.filter(items=['A', 'C']).filter(items=['apple', 'pineapple'], axis=0))
#            A  C
# apple      0  2
# pineapple  6  8

Filter by exact match: items

To extract rows or columns with exactly matching labels, specify a list in the items argument. The order of the rows or columns in the result corresponds to the order in the specified list, not the original order.

print(df.filter(items=['A', 'C']))
#            A  C
# apple      0  2
# banana     3  5
# pineapple  6  8

print(df.filter(items=['C', 'A']))
#            C  A
# apple      2  0
# banana     5  3
# pineapple  8  6

You can do the same by specifying a list of labels with [] or loc[].

print(df[['C', 'A']])
#            C  A
# apple      2  0
# banana     5  3
# pineapple  8  6

print(df.loc[:, ['C', 'A']])
#            C  A
# apple      2  0
# banana     5  3
# pineapple  8  6

print(df.loc[['pineapple', 'apple']])
#            A  B  C
# pineapple  6  7  8
# apple      0  1  2

Filter by partial match: like

To extract rows or columns with partially matching labels, specify a string in the like argument.

print(df.filter(like='apple', axis=0))
#            A  B  C
# apple      0  1  2
# pineapple  6  7  8

Rows or columns whose labels contain the specified string in like (as in like in label == True) are extracted. For more on the in operator, see the following article.

To specify multiple conditions, use regular expressions (regex) as described next.

Filter by regex: regex

To extract rows or columns based on regex for their labels, use a regex pattern specified in the regex argument.

See the official documentation for special characters in regex.

Extract rows whose labels end with 'e':

print(df.filter(regex='e$', axis=0))
#            A  B  C
# apple      0  1  2
# pineapple  6  7  8

Extract rows whose labels begin with 'a' or 'b'.

print(df.filter(regex='^(a|b)', axis=0))
#         A  B  C
# apple   0  1  2
# banana  3  4  5

Extract rows whose labels contain 'na' or 'ne':

print(df.filter(regex='(na|ne)', axis=0))
#            A  B  C
# banana     3  4  5
# pineapple  6  7  8

Rows or columns whose labels match the regex pattern (as in re.search(regex, label) == True) are extracted. For more information on re.search(), refer to the following article.

pandas.Series.filter()

The filter() method is also provided for Series. The usage is the same as with DataFrame.

s = pd.Series([0, 1, 2], index=['apple', 'banana', 'pineapple'])
print(s)
# apple        0
# banana       1
# pineapple    2
# dtype: int64

print(s.filter(items=['pineapple', 'banana']))
# pineapple    2
# banana       1
# dtype: int64

print(s.filter(like='apple'))
# apple        0
# pineapple    2
# dtype: int64

print(s.filter(regex='^(a|b)'))
# apple     0
# banana    1
# dtype: int64

Related Categories

Related Articles