pandas: Filter rows/columns by labels with filter()

Modified: 2024-01-24 | Tags: Python, pandas

In pandas, use the filter() method to select rows or columns in the DataFrame based on their labels (names). This method is provided for both DataFrame and Series.

Contents

Basic usage of pandas.DataFrame.filter()
- The axis to filter on: axis
Filter by exact match: items
Filter by partial match: like
Filter by regex: regex
pandas.Series.filter()

To extract rows that meet conditions based on values, rather than row labels, refer to the following articles.

You can also select columns based on data types.

pandas: Select columns by dtype with select_dtypes()

The pandas version used in this article is as follows. Note that functionality may vary between versions. The following DataFrame is used as an example.

import pandas as pd

print(pd.__version__)
# 2.1.4

df = pd.DataFrame([[0, 1, 2], [3, 4, 5], [6, 7, 8]],
                  index=['apple', 'banana', 'pineapple'],
                  columns=['A', 'B', 'C'])
print(df)
#            A  B  C
# apple      0  1  2
# banana     3  4  5
# pineapple  6  7  8

source: pandas_filter.py

Basic usage of `pandas.DataFrame.filter()`

The filter() method of a DataFrame can be used to extract rows or columns based on their labels (index and columns).

Specify the condition using either items, like, or regex as an argument. Specifying them simultaneously results in an error. Details on each are discussed later.

print(df.filter(items=['A', 'C']))
#            A  C
# apple      0  2
# banana     3  5
# pineapple  6  8

# print(df.filter(items=['A', 'C'], like='A'))
# TypeError: Keyword arguments `items`, `like`, or `regex` are mutually exclusive

source: pandas_filter.py

If no row or column meets the condition, an empty DataFrame is returned. It does not result in an error.

pandas: Check if DataFrame/Series is empty

print(df.filter(items=['X']))
# Empty DataFrame
# Columns: []
# Index: [apple, banana, pineapple]

source: pandas_filter.py

The axis to filter on: `axis`

You can specify the axis to filter on with the axis argument. Use 0 or 'index' for rows and 1 or 'columns' for columns.

print(df.filter(items=['apple', 'pineapple'], axis=0))
#            A  B  C
# apple      0  1  2
# pineapple  6  7  8

print(df.filter(items=['apple', 'pineapple'], axis='index'))
#            A  B  C
# apple      0  1  2
# pineapple  6  7  8

print(df.filter(items=['A', 'C'], axis=1))
#            A  C
# apple      0  2
# banana     3  5
# pineapple  6  8

print(df.filter(items=['A', 'C'], axis='columns'))
#            A  C
# apple      0  2
# banana     3  5
# pineapple  6  8

source: pandas_filter.py

If omitted, columns are filtered by default.

print(df.filter(items=['A', 'C']))
#            A  C
# apple      0  2
# banana     3  5
# pineapple  6  8

source: pandas_filter.py

It is not possible to filter both rows and columns simultaneously. To filter both, repeat filter().

print(df.filter(items=['A', 'C']).filter(items=['apple', 'pineapple'], axis=0))
#            A  C
# apple      0  2
# pineapple  6  8

source: pandas_filter.py

Filter by exact match: `items`

To extract rows or columns with exactly matching labels, specify a list in the items argument. The order of the rows or columns in the result corresponds to the order in the specified list, not the original order.

print(df.filter(items=['A', 'C']))
#            A  C
# apple      0  2
# banana     3  5
# pineapple  6  8

print(df.filter(items=['C', 'A']))
#            C  A
# apple      2  0
# banana     5  3
# pineapple  8  6

source: pandas_filter.py

You can do the same by specifying a list of labels with [] or loc[].

print(df[['C', 'A']])
#            C  A
# apple      2  0
# banana     5  3
# pineapple  8  6

print(df.loc[:, ['C', 'A']])
#            C  A
# apple      2  0
# banana     5  3
# pineapple  8  6

print(df.loc[['pineapple', 'apple']])
#            A  B  C
# pineapple  6  7  8
# apple      0  1  2

source: pandas_filter.py

Filter by partial match: `like`

To extract rows or columns with partially matching labels, specify a string in the like argument.

print(df.filter(like='apple', axis=0))
#            A  B  C
# apple      0  1  2
# pineapple  6  7  8

source: pandas_filter.py

Rows or columns whose labels contain the specified string in like (as in like in label == True) are extracted. For more on the in operator, see the following article.

The in operator in Python (for list, string, dictionary, etc.)

To specify multiple conditions, use regular expressions (regex) as described next.

Filter by regex: `regex`

To extract rows or columns based on regex for their labels, use a regex pattern specified in the regex argument.

See the official documentation for special characters in regex.

re - Regular Expression Syntax — Python 3.12.1 documentation

Extract rows whose labels end with 'e':

print(df.filter(regex='e$', axis=0))
#            A  B  C
# apple      0  1  2
# pineapple  6  7  8

source: pandas_filter.py

Extract rows whose labels begin with 'a' or 'b'.

print(df.filter(regex='^(a|b)', axis=0))
#         A  B  C
# apple   0  1  2
# banana  3  4  5

source: pandas_filter.py

Extract rows whose labels contain 'na' or 'ne':

print(df.filter(regex='(na|ne)', axis=0))
#            A  B  C
# banana     3  4  5
# pineapple  6  7  8

source: pandas_filter.py

Rows or columns whose labels match the regex pattern (as in re.search(regex, label) == True) are extracted. For more information on re.search(), refer to the following article.

Regular expressions with the re module in Python

`pandas.Series.filter()`

The filter() method is also provided for Series. The usage is the same as with DataFrame.

s = pd.Series([0, 1, 2], index=['apple', 'banana', 'pineapple'])
print(s)
# apple        0
# banana       1
# pineapple    2
# dtype: int64

print(s.filter(items=['pineapple', 'banana']))
# pineapple    2
# banana       1
# dtype: int64

print(s.filter(like='apple'))
# apple        0
# pineapple    2
# dtype: int64

print(s.filter(regex='^(a|b)'))
# apple     0
# banana    1
# dtype: int64

source: pandas_filter.py

pandas: Filter rows/columns by labels with filter()

Basic usage of `pandas.DataFrame.filter()`

The axis to filter on: `axis`

Filter by exact match: `items`

Filter by partial match: `like`

Filter by regex: `regex`

`pandas.Series.filter()`

Related Categories

Related Articles

pandas: Filter rows/columns by labels with filter()

Basic usage of pandas.DataFrame.filter()

The axis to filter on: axis

Filter by exact match: items

Filter by partial match: like

Filter by regex: regex

pandas.Series.filter()

Related Categories

Related Articles

Basic usage of `pandas.DataFrame.filter()`

The axis to filter on: `axis`

Filter by exact match: `items`

Filter by partial match: `like`

Filter by regex: `regex`

`pandas.Series.filter()`