pandas: Select rows/columns by index (numbers and names)
You can select and get rows, columns, and elements in pandas.DataFrame
and pandas.Series
by index (numbers and names) using []
(square brackets).
You can use at
, iat
, loc
, and iloc
to select a range more explicitly. It is also possible to select columns by slice and rows by row name/number or a list of them.
Use set_index()
to assign a column to index
.
The sample code in this article is based on pandas version 2.0.3
. The following DataFrame
is used as an example.
import pandas as pd
print(pd.__version__)
# 2.0.3
df = pd.DataFrame({'col_0': ['00', '10', '20', '30', '40'],
'col_1': ['01', '11', '21', '31', '41'],
'col_2': ['02', '12', '22', '32', '42'],
'col_3': ['03', '13', '23', '33', '43']},
index=['row_0', 'row_1', 'row_2', 'row_3', 'row_4'])
print(df)
# col_0 col_1 col_2 col_3
# row_0 00 01 02 03
# row_1 10 11 12 13
# row_2 20 21 22 23
# row_3 30 31 32 33
# row_4 40 41 42 43
Select columns by column numbers/names using []
[Column name]
: Get a single column as pandas.Series
You can get the column as Series
by specifying the column name in []
.
print(df['col_2'])
print(type(df['col_2']))
# row_0 02
# row_1 12
# row_2 22
# row_3 32
# row_4 42
# Name: col_2, dtype: object
# <class 'pandas.core.series.Series'>
You may also specify a column name like .<column_name>
. Note that if the column name conflicts with existing method names, the method takes precedence.
print(df.col_2)
# row_0 02
# row_1 12
# row_2 22
# row_3 32
# row_4 42
# Name: col_2, dtype: object
[List of column names]
: Get single or multiple columns as pandas.DataFrame
You can get multiple columns as DataFrame
by specifying a list of column names in []
. The columns will be in the order of the specified list.
print(df[['col_2', 'col_0']])
print(type(df[['col_2', 'col_0']]))
# col_2 col_0
# row_0 02 00
# row_1 12 10
# row_2 22 20
# row_3 32 30
# row_4 42 40
# <class 'pandas.core.frame.DataFrame'>
If you specify a list with one element, a single-column DataFrame
is returned, not a Series
.
print(df[['col_2']])
print(type(df[['col_2']]))
# col_2
# row_0 02
# row_1 12
# row_2 22
# row_3 32
# row_4 42
# <class 'pandas.core.frame.DataFrame'>
loc
and iloc
You can also use loc
to specify a slice based on column names, and iloc
to select columns by their numbers, either individually or as a range (list or slice).
print(df.loc[:, 'col_1':'col_3'])
# col_1 col_2 col_3
# row_0 01 02 03
# row_1 11 12 13
# row_2 21 22 23
# row_3 31 32 33
# row_4 41 42 43
print(df.iloc[:, 2])
# row_0 02
# row_1 12
# row_2 22
# row_3 32
# row_4 42
# Name: col_2, dtype: object
Select rows by row numbers/names using []
[Slice of row number/name]
: Get single or multiple rows as pandas.DataFrame
You can get multiple rows as a DataFrame
by specifying a slice in []
.
print(df[1:4])
print(type(df[1:4]))
# col_0 col_1 col_2 col_3
# row_1 10 11 12 13
# row_2 20 21 22 23
# row_3 30 31 32 33
# <class 'pandas.core.frame.DataFrame'>
You may specify a negative value and step
(start:stop:step
) as in a normal slice. For example, you can use slices to extract odd or even rows.
print(df[:-3])
# col_0 col_1 col_2 col_3
# row_0 00 01 02 03
# row_1 10 11 12 13
print(df[::2])
# col_0 col_1 col_2 col_3
# row_0 00 01 02 03
# row_2 20 21 22 23
# row_4 40 41 42 43
print(df[1::2])
# col_0 col_1 col_2 col_3
# row_1 10 11 12 13
# row_3 30 31 32 33
An error is raised if a row number is specified alone instead of a slice.
# print(df[1])
# KeyError: 1
If only one row is selected, a single-row DataFrame
is returned, not a Series
.
print(df[1:2])
print(type(df[1:2]))
# col_0 col_1 col_2 col_3
# row_1 10 11 12 13
# <class 'pandas.core.frame.DataFrame'>
You may also specify a slice of row name instead of row number. In the case of a slice with row name, the stop
row is included.
print(df['row_1':'row_3'])
# col_0 col_1 col_2 col_3
# row_1 10 11 12 13
# row_2 20 21 22 23
# row_3 30 31 32 33
loc
and iloc
You can also use loc
to select rows by their names, and iloc
to select rows by their numbers, either individually or in a list. See the following article for details.
print(df.loc[['row_1', 'row_3']])
# col_0 col_1 col_2 col_3
# row_1 10 11 12 13
# row_3 30 31 32 33
print(df.iloc[1])
# col_0 10
# col_1 11
# col_2 12
# col_3 13
# Name: row_1, dtype: object
[Boolean array/Series]
: Get True
rows as pandas.DataFrame
By specifying a Boolean array (list
or numpy.ndarray
) in []
, you can extract the True
rows as DataFrame
.
l_bool = [True, False, False, True, True]
print(df[l_bool])
print(type(df[l_bool]))
# col_0 col_1 col_2 col_3
# row_0 00 01 02 03
# row_3 30 31 32 33
# row_4 40 41 42 43
# <class 'pandas.core.frame.DataFrame'>
If there is only one True
, a single-row DataFrame
is returned, not a Series
.
l_bool = [True, False, False, False, False]
print(df[l_bool])
print(type(df[l_bool]))
# col_0 col_1 col_2 col_3
# row_0 00 01 02 03
# <class 'pandas.core.frame.DataFrame'>
An error is raised if the number of elements does not match.
# print(df[[True, False, False]])
# ValueError: Item wrong length 3 instead of 5.
You can also specify the Boolean Series
. Rows are extracted based on labels, not order.
s_bool = pd.Series([True, False, False, True, True], index=reversed(df.index))
print(s_bool)
# row_4 True
# row_3 False
# row_2 False
# row_1 True
# row_0 True
# dtype: bool
print(df[s_bool])
# col_0 col_1 col_2 col_3
# row_0 00 01 02 03
# row_1 10 11 12 13
# row_4 40 41 42 43
An error is raised if the number of elements or labels does not match.
s_bool_wrong = pd.Series([True, False, False, True, True],
index=['A', 'B', 'C', 'D', 'E'])
# print(df[s_bool_wrong])
# IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
Using a Boolean Series
, you can select rows by conditions. Refer to the following article for details.
Select elements of pandas.Series
by numbers/names
Consider the following Series
as an example.
s = df['col_0']
print(s)
# row_0 00
# row_1 10
# row_2 20
# row_3 30
# row_4 40
# Name: col_0, dtype: object
[Number/name]
: Get the value of a single element
You can get the value of the element by specifying the numbers (positions) or names (labels).
print(s[3])
print(type(s[3]))
# 30
# <class 'str'>
When specifying by number, you can use a negative value to indicate the position from the end. -1
represents the last element.
You may also specify the name like .<name>
. Note that if the name conflicts with existing method names, the method takes precedence.
print(s[-1])
# 40
print(s['row_0'])
# 00
print(s.row_0)
# 00
[List of numbers/names]
: Get single or multiple elements as pandas.Series
You can select multiple values as Series
by specifying a list of numbers/names. The elements will be in the order of the specified list. If a list with one element is specified, a single-element Series
is returned.
print(s[[3, 1]])
print(type(s[[3, 1]]))
# row_3 30
# row_1 10
# Name: col_0, dtype: object
# <class 'pandas.core.series.Series'>
print(s[[1]])
print(type(s[[1]]))
# row_1 10
# Name: col_0, dtype: object
# <class 'pandas.core.series.Series'>
You can specify a list of names.
print(s[['row_3', 'row_1']])
# row_3 30
# row_1 10
# Name: col_0, dtype: object
print(s[['row_1']])
# row_1 10
# Name: col_0, dtype: object
[Slice of numbers/names]
: Get single or multiple elements as pandas.Series
You can also select multiple values as Series
by specifying a slice of numbers/names. If only one element is selected, a single-element Series
is returned.
print(s[1:3])
print(type(s[1:3]))
# row_1 10
# row_2 20
# Name: col_0, dtype: object
# <class 'pandas.core.series.Series'>
print(s[1:2])
print(type(s[1:2]))
# row_1 10
# Name: col_0, dtype: object
# <class 'pandas.core.series.Series'>
In the case of name slices, the stop
element of start:stop
is also included.
print(s['row_1':'row_3'])
# row_1 10
# row_2 20
# row_3 30
# Name: col_0, dtype: object
print(s['row_1':'row_1'])
# row_1 10
# Name: col_0, dtype: object
[Boolean array/Series]
: Get True
elements as pandas.Series
By specifying a Boolean array (list
or numpy.ndarray
) in []
, you can extract the True
elements as Series
.
l_bool = [True, False, False, True, True]
print(s[l_bool])
print(type(s[l_bool]))
# row_0 00
# row_3 30
# row_4 40
# Name: col_0, dtype: object
# <class 'pandas.core.series.Series'>
If there is only one True
, a single-element Series
is returned.
l_bool = [True, False, False, False, False]
print(s[l_bool])
print(type(s[l_bool]))
# row_0 00
# Name: col_0, dtype: object
# <class 'pandas.core.series.Series'>
An error is raised if the number of elements does not match.
# print(s[[True, False, False]])
# IndexError: Boolean index has wrong length: 3 instead of 5
You can also specify the Boolean Series
. Elements are extracted based on labels, not order.
s_bool = pd.Series(l_bool, index=reversed(df.index))
print(s_bool)
# row_4 True
# row_3 False
# row_2 False
# row_1 False
# row_0 False
# dtype: bool
print(s[s_bool])
# row_4 40
# Name: col_0, dtype: object
An error is raised if the number of elements or labels does not match.
s_bool_wrong = pd.Series([True, False, False, True, True],
index=['A', 'B', 'C', 'D', 'E'])
# print(s[s_bool_wrong])
# IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
Select elements of pandas.DataFrame
You can get the value of an element from a DataFrame
by first extracting a Series
and then retrieving the value from that Series
.
print(df['col_1']['row_2'])
# 21
You may also extract any group by slices or lists.
print(df['row_1':'row_3'][['col_1', 'col_3']])
# col_1 col_3
# row_1 11 13
# row_2 21 23
# row_3 31 33
However, using this approach ([...][...]
), known as chained indexing, might trigger a SettingWithCopyWarning
when you assign values.
You can select rows or columns at once with at
, iat
, loc
, or iloc
.
print(df.at['row_2', 'col_1'])
# 21
print(df.loc['row_1':'row_3', ['col_1', 'col_3']])
# col_1 col_3
# row_1 11 13
# row_2 21 23
# row_3 31 33
Caution when row and column names are integers
Be careful when row and column names are integers.
Consider the following DataFrame
as an example.
df = pd.DataFrame([[0, 10, 20], [30, 40, 50], [60, 70, 80]],
index=[2, 0, 1], columns=[1, 2, 0])
print(df)
# 1 2 0
# 2 0 10 20
# 0 30 40 50
# 1 60 70 80
If you use a [scalar value]
or [list]
, the specified value is interpreted as a column name.
print(df[0])
# 2 20
# 0 50
# 1 80
# Name: 0, dtype: int64
print(df[[0, 2]])
# 0 2
# 2 20 10
# 0 50 40
# 1 80 70
If [slice]
, the specified value is considered a row number, not a row name. Negative values are also allowed.
print(df[:2])
# 1 2 0
# 2 0 10 20
# 0 30 40 50
print(df[-2:])
# 1 2 0
# 0 30 40 50
# 1 60 70 80
Use loc
or iloc
to clearly specify whether it is a name (label) or a number (position). In the case of loc
, the stop
element of start:stop
is also included.
print(df.loc[:2])
# 1 2 0
# 2 0 10 20
print(df.iloc[:2])
# 1 2 0
# 2 0 10 20
# 0 30 40 50
For Series
:
s = df[2]
print(s)
# 2 10
# 0 40
# 1 70
# Name: 2, dtype: int64
In Series
, the specified value is considered a name, not a number.
print(s[0])
# 40
Use at
or iat
to clearly specify whether it is a name or a number.
print(s.at[0])
# 40
print(s.iat[0])
# 10
Note that specifying [-1]
is considered as referring to a label named -1
, not the last element. You can use iat
.
# print(s[-1])
# KeyError: -1
print(s.iat[-1])
# 70
Thus, it is better to use at
, iat
, loc
, or iloc
when the row or column name is an integer.