pandas: Filter rows/columns by labels with filter()
In pandas, use the filter()
method to select rows or columns in the DataFrame
based on their labels (names). This method is provided for both DataFrame
and Series
.
- pandas.DataFrame.filter — pandas 2.1.4 documentation
- pandas.Series.filter — pandas 2.1.4 documentation
To extract rows that meet conditions based on values, rather than row labels, refer to the following articles.
- pandas: Query DataFrame and extract rows with query()
- pandas: Select rows by multiple conditions
- pandas: Extract rows that contain specific strings from a DataFrame
You can also select columns based on data types.
The pandas version used in this article is as follows. Note that functionality may vary between versions. The following DataFrame
is used as an example.
import pandas as pd
print(pd.__version__)
# 2.1.4
df = pd.DataFrame([[0, 1, 2], [3, 4, 5], [6, 7, 8]],
index=['apple', 'banana', 'pineapple'],
columns=['A', 'B', 'C'])
print(df)
# A B C
# apple 0 1 2
# banana 3 4 5
# pineapple 6 7 8
Basic usage of pandas.DataFrame.filter()
The filter()
method of a DataFrame
can be used to extract rows or columns based on their labels (index
and columns
).
Specify the condition using either items
, like
, or regex
as an argument. Specifying them simultaneously results in an error. Details on each are discussed later.
print(df.filter(items=['A', 'C']))
# A C
# apple 0 2
# banana 3 5
# pineapple 6 8
# print(df.filter(items=['A', 'C'], like='A'))
# TypeError: Keyword arguments `items`, `like`, or `regex` are mutually exclusive
If no row or column meets the condition, an empty DataFrame
is returned. It does not result in an error.
print(df.filter(items=['X']))
# Empty DataFrame
# Columns: []
# Index: [apple, banana, pineapple]
The axis to filter on: axis
You can specify the axis to filter on with the axis
argument. Use 0
or 'index'
for rows and 1
or 'columns'
for columns.
print(df.filter(items=['apple', 'pineapple'], axis=0))
# A B C
# apple 0 1 2
# pineapple 6 7 8
print(df.filter(items=['apple', 'pineapple'], axis='index'))
# A B C
# apple 0 1 2
# pineapple 6 7 8
print(df.filter(items=['A', 'C'], axis=1))
# A C
# apple 0 2
# banana 3 5
# pineapple 6 8
print(df.filter(items=['A', 'C'], axis='columns'))
# A C
# apple 0 2
# banana 3 5
# pineapple 6 8
If omitted, columns are filtered by default.
print(df.filter(items=['A', 'C']))
# A C
# apple 0 2
# banana 3 5
# pineapple 6 8
It is not possible to filter both rows and columns simultaneously. To filter both, repeat filter()
.
print(df.filter(items=['A', 'C']).filter(items=['apple', 'pineapple'], axis=0))
# A C
# apple 0 2
# pineapple 6 8
Filter by exact match: items
To extract rows or columns with exactly matching labels, specify a list in the items
argument. The order of the rows or columns in the result corresponds to the order in the specified list, not the original order.
print(df.filter(items=['A', 'C']))
# A C
# apple 0 2
# banana 3 5
# pineapple 6 8
print(df.filter(items=['C', 'A']))
# C A
# apple 2 0
# banana 5 3
# pineapple 8 6
You can do the same by specifying a list of labels with []
or loc[]
.
- pandas: Select rows/columns by index (numbers and names)
- pandas: Get/Set values with loc, iloc, at, iat
print(df[['C', 'A']])
# C A
# apple 2 0
# banana 5 3
# pineapple 8 6
print(df.loc[:, ['C', 'A']])
# C A
# apple 2 0
# banana 5 3
# pineapple 8 6
print(df.loc[['pineapple', 'apple']])
# A B C
# pineapple 6 7 8
# apple 0 1 2
Filter by partial match: like
To extract rows or columns with partially matching labels, specify a string in the like
argument.
print(df.filter(like='apple', axis=0))
# A B C
# apple 0 1 2
# pineapple 6 7 8
Rows or columns whose labels contain the specified string in like
(as in like in label == True
) are extracted. For more on the in
operator, see the following article.
To specify multiple conditions, use regular expressions (regex) as described next.
Filter by regex: regex
To extract rows or columns based on regex for their labels, use a regex pattern specified in the regex
argument.
See the official documentation for special characters in regex.
Extract rows whose labels end with 'e'
:
print(df.filter(regex='e$', axis=0))
# A B C
# apple 0 1 2
# pineapple 6 7 8
Extract rows whose labels begin with 'a'
or 'b'
.
print(df.filter(regex='^(a|b)', axis=0))
# A B C
# apple 0 1 2
# banana 3 4 5
Extract rows whose labels contain 'na'
or 'ne'
:
print(df.filter(regex='(na|ne)', axis=0))
# A B C
# banana 3 4 5
# pineapple 6 7 8
Rows or columns whose labels match the regex pattern (as in re.search(regex, label) == True
) are extracted. For more information on re.search()
, refer to the following article.
pandas.Series.filter()
The filter()
method is also provided for Series
. The usage is the same as with DataFrame
.
s = pd.Series([0, 1, 2], index=['apple', 'banana', 'pineapple'])
print(s)
# apple 0
# banana 1
# pineapple 2
# dtype: int64
print(s.filter(items=['pineapple', 'banana']))
# pineapple 2
# banana 1
# dtype: int64
print(s.filter(like='apple'))
# apple 0
# pineapple 2
# dtype: int64
print(s.filter(regex='^(a|b)'))
# apple 0
# banana 1
# dtype: int64