pandas: Select rows/columns by index (numbers and names)

Modified: | Tags: Python, pandas

You can select and get rows, columns, and elements in pandas.DataFrame and pandas.Series by index (numbers and names) using [] (square brackets).

You can use at, iat, loc, and iloc to select a range more explicitly. It is also possible to select columns by slice and rows by row name/number or a list of them.

Use set_index() to assign a column to index.

The sample code in this article is based on pandas version 2.0.3. The following DataFrame is used as an example.

import pandas as pd

print(pd.__version__)
# 2.0.3

df = pd.DataFrame({'col_0': ['00', '10', '20', '30', '40'],
                   'col_1': ['01', '11', '21', '31', '41'],
                   'col_2': ['02', '12', '22', '32', '42'],
                   'col_3': ['03', '13', '23', '33', '43']},
                  index=['row_0', 'row_1', 'row_2', 'row_3', 'row_4'])
print(df)
#       col_0 col_1 col_2 col_3
# row_0    00    01    02    03
# row_1    10    11    12    13
# row_2    20    21    22    23
# row_3    30    31    32    33
# row_4    40    41    42    43

Select columns by column numbers/names using []

[Column name]: Get a single column as pandas.Series

You can get the column as Series by specifying the column name in [].

print(df['col_2'])
print(type(df['col_2']))
# row_0    02
# row_1    12
# row_2    22
# row_3    32
# row_4    42
# Name: col_2, dtype: object
# <class 'pandas.core.series.Series'>

You may also specify a column name like .<column_name>. Note that if the column name conflicts with existing method names, the method takes precedence.

print(df.col_2)
# row_0    02
# row_1    12
# row_2    22
# row_3    32
# row_4    42
# Name: col_2, dtype: object

[List of column names]: Get single or multiple columns as pandas.DataFrame

You can get multiple columns as DataFrame by specifying a list of column names in []. The columns will be in the order of the specified list.

print(df[['col_2', 'col_0']])
print(type(df[['col_2', 'col_0']]))
#       col_2 col_0
# row_0    02    00
# row_1    12    10
# row_2    22    20
# row_3    32    30
# row_4    42    40
# <class 'pandas.core.frame.DataFrame'>

If you specify a list with one element, a single-column DataFrame is returned, not a Series.

print(df[['col_2']])
print(type(df[['col_2']]))
#       col_2
# row_0    02
# row_1    12
# row_2    22
# row_3    32
# row_4    42
# <class 'pandas.core.frame.DataFrame'>

loc and iloc

You can also use loc to specify a slice based on column names, and iloc to select columns by their numbers, either individually or as a range (list or slice).

print(df.loc[:, 'col_1':'col_3'])
#       col_1 col_2 col_3
# row_0    01    02    03
# row_1    11    12    13
# row_2    21    22    23
# row_3    31    32    33
# row_4    41    42    43

print(df.iloc[:, 2])
# row_0    02
# row_1    12
# row_2    22
# row_3    32
# row_4    42
# Name: col_2, dtype: object

Select rows by row numbers/names using []

[Slice of row number/name]: Get single or multiple rows as pandas.DataFrame

You can get multiple rows as a DataFrame by specifying a slice in [].

print(df[1:4])
print(type(df[1:4]))
#       col_0 col_1 col_2 col_3
# row_1    10    11    12    13
# row_2    20    21    22    23
# row_3    30    31    32    33
# <class 'pandas.core.frame.DataFrame'>

You may specify a negative value and step (start:stop:step) as in a normal slice. For example, you can use slices to extract odd or even rows.

print(df[:-3])
#       col_0 col_1 col_2 col_3
# row_0    00    01    02    03
# row_1    10    11    12    13

print(df[::2])
#       col_0 col_1 col_2 col_3
# row_0    00    01    02    03
# row_2    20    21    22    23
# row_4    40    41    42    43

print(df[1::2])
#       col_0 col_1 col_2 col_3
# row_1    10    11    12    13
# row_3    30    31    32    33

An error is raised if a row number is specified alone instead of a slice.

# print(df[1])
# KeyError: 1

If only one row is selected, a single-row DataFrame is returned, not a Series.

print(df[1:2])
print(type(df[1:2]))
#       col_0 col_1 col_2 col_3
# row_1    10    11    12    13
# <class 'pandas.core.frame.DataFrame'>

You may also specify a slice of row name instead of row number. In the case of a slice with row name, the stop row is included.

print(df['row_1':'row_3'])
#       col_0 col_1 col_2 col_3
# row_1    10    11    12    13
# row_2    20    21    22    23
# row_3    30    31    32    33

loc and iloc

You can also use loc to select rows by their names, and iloc to select rows by their numbers, either individually or in a list. See the following article for details.

print(df.loc[['row_1', 'row_3']])
#       col_0 col_1 col_2 col_3
# row_1    10    11    12    13
# row_3    30    31    32    33

print(df.iloc[1])
# col_0    10
# col_1    11
# col_2    12
# col_3    13
# Name: row_1, dtype: object

[Boolean array/Series]: Get True rows as pandas.DataFrame

By specifying a Boolean array (list or numpy.ndarray) in [], you can extract the True rows as DataFrame.

l_bool = [True, False, False, True, True]
print(df[l_bool])
print(type(df[l_bool]))
#       col_0 col_1 col_2 col_3
# row_0    00    01    02    03
# row_3    30    31    32    33
# row_4    40    41    42    43
# <class 'pandas.core.frame.DataFrame'>

If there is only one True, a single-row DataFrame is returned, not a Series.

l_bool = [True, False, False, False, False]
print(df[l_bool])
print(type(df[l_bool]))
#       col_0 col_1 col_2 col_3
# row_0    00    01    02    03
# <class 'pandas.core.frame.DataFrame'>

An error is raised if the number of elements does not match.

# print(df[[True, False, False]])
# ValueError: Item wrong length 3 instead of 5.

You can also specify the Boolean Series. Rows are extracted based on labels, not order.

s_bool = pd.Series([True, False, False, True, True], index=reversed(df.index))
print(s_bool)
# row_4     True
# row_3    False
# row_2    False
# row_1     True
# row_0     True
# dtype: bool

print(df[s_bool])
#       col_0 col_1 col_2 col_3
# row_0    00    01    02    03
# row_1    10    11    12    13
# row_4    40    41    42    43

An error is raised if the number of elements or labels does not match.

s_bool_wrong = pd.Series([True, False, False, True, True],
                         index=['A', 'B', 'C', 'D', 'E'])
# print(df[s_bool_wrong])
# IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

Using a Boolean Series, you can select rows by conditions. Refer to the following article for details.

Select elements of pandas.Series by numbers/names

Consider the following Series as an example.

s = df['col_0']
print(s)
# row_0    00
# row_1    10
# row_2    20
# row_3    30
# row_4    40
# Name: col_0, dtype: object

[Number/name]: Get the value of a single element

You can get the value of the element by specifying the numbers (positions) or names (labels).

print(s[3])
print(type(s[3]))
# 30
# <class 'str'>

When specifying by number, you can use a negative value to indicate the position from the end. -1 represents the last element.

You may also specify the name like .<name>. Note that if the name conflicts with existing method names, the method takes precedence.

print(s[-1])
# 40

print(s['row_0'])
# 00

print(s.row_0)
# 00

[List of numbers/names]: Get single or multiple elements as pandas.Series

You can select multiple values as Series by specifying a list of numbers/names. The elements will be in the order of the specified list. If a list with one element is specified, a single-element Series is returned.

print(s[[3, 1]])
print(type(s[[3, 1]]))
# row_3    30
# row_1    10
# Name: col_0, dtype: object
# <class 'pandas.core.series.Series'>

print(s[[1]])
print(type(s[[1]]))
# row_1    10
# Name: col_0, dtype: object
# <class 'pandas.core.series.Series'>

You can specify a list of names.

print(s[['row_3', 'row_1']])
# row_3    30
# row_1    10
# Name: col_0, dtype: object

print(s[['row_1']])
# row_1    10
# Name: col_0, dtype: object

[Slice of numbers/names]: Get single or multiple elements as pandas.Series

You can also select multiple values as Series by specifying a slice of numbers/names. If only one element is selected, a single-element Series is returned.

print(s[1:3])
print(type(s[1:3]))
# row_1    10
# row_2    20
# Name: col_0, dtype: object
# <class 'pandas.core.series.Series'>

print(s[1:2])
print(type(s[1:2]))
# row_1    10
# Name: col_0, dtype: object
# <class 'pandas.core.series.Series'>

In the case of name slices, the stop element of start:stop is also included.

print(s['row_1':'row_3'])
# row_1    10
# row_2    20
# row_3    30
# Name: col_0, dtype: object

print(s['row_1':'row_1'])
# row_1    10
# Name: col_0, dtype: object

[Boolean array/Series]: Get True elements as pandas.Series

By specifying a Boolean array (list or numpy.ndarray) in [], you can extract the True elements as Series.

l_bool = [True, False, False, True, True]
print(s[l_bool])
print(type(s[l_bool]))
# row_0    00
# row_3    30
# row_4    40
# Name: col_0, dtype: object
# <class 'pandas.core.series.Series'>

If there is only one True, a single-element Series is returned.

l_bool = [True, False, False, False, False]
print(s[l_bool])
print(type(s[l_bool]))
# row_0    00
# Name: col_0, dtype: object
# <class 'pandas.core.series.Series'>

An error is raised if the number of elements does not match.

# print(s[[True, False, False]])
# IndexError: Boolean index has wrong length: 3 instead of 5

You can also specify the Boolean Series. Elements are extracted based on labels, not order.

s_bool = pd.Series(l_bool, index=reversed(df.index))
print(s_bool)
# row_4     True
# row_3    False
# row_2    False
# row_1    False
# row_0    False
# dtype: bool

print(s[s_bool])
# row_4    40
# Name: col_0, dtype: object

An error is raised if the number of elements or labels does not match.

s_bool_wrong = pd.Series([True, False, False, True, True],
                         index=['A', 'B', 'C', 'D', 'E'])
# print(s[s_bool_wrong])
# IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

Select elements of pandas.DataFrame

You can get the value of an element from a DataFrame by first extracting a Series and then retrieving the value from that Series.

print(df['col_1']['row_2'])
# 21

You may also extract any group by slices or lists.

print(df['row_1':'row_3'][['col_1', 'col_3']])
#       col_1 col_3
# row_1    11    13
# row_2    21    23
# row_3    31    33

However, using this approach ([...][...]), known as chained indexing, might trigger a SettingWithCopyWarning when you assign values.

You can select rows or columns at once with at, iat, loc, or iloc.

print(df.at['row_2', 'col_1'])
# 21

print(df.loc['row_1':'row_3', ['col_1', 'col_3']])
#       col_1 col_3
# row_1    11    13
# row_2    21    23
# row_3    31    33

Caution when row and column names are integers

Be careful when row and column names are integers.

Consider the following DataFrame as an example.

df = pd.DataFrame([[0, 10, 20], [30, 40, 50], [60, 70, 80]],
                  index=[2, 0, 1], columns=[1, 2, 0])
print(df)
#     1   2   0
# 2   0  10  20
# 0  30  40  50
# 1  60  70  80

If you use a [scalar value] or [list], the specified value is interpreted as a column name.

print(df[0])
# 2    20
# 0    50
# 1    80
# Name: 0, dtype: int64

print(df[[0, 2]])
#     0   2
# 2  20  10
# 0  50  40
# 1  80  70

If [slice], the specified value is considered a row number, not a row name. Negative values are also allowed.

print(df[:2])
#     1   2   0
# 2   0  10  20
# 0  30  40  50

print(df[-2:])
#     1   2   0
# 0  30  40  50
# 1  60  70  80

Use loc or iloc to clearly specify whether it is a name (label) or a number (position). In the case of loc, the stop element of start:stop is also included.

print(df.loc[:2])
#    1   2   0
# 2  0  10  20

print(df.iloc[:2])
#     1   2   0
# 2   0  10  20
# 0  30  40  50

For Series:

s = df[2]
print(s)
# 2    10
# 0    40
# 1    70
# Name: 2, dtype: int64

In Series, the specified value is considered a name, not a number.

print(s[0])
# 40

Use at or iat to clearly specify whether it is a name or a number.

print(s.at[0])
# 40

print(s.iat[0])
# 10

Note that specifying [-1] is considered as referring to a label named -1, not the last element. You can use iat.

# print(s[-1])
# KeyError: -1

print(s.iat[-1])
# 70

Thus, it is better to use at, iat, loc, or iloc when the row or column name is an integer.

Related Categories

Related Articles