pandas: How to fix SettingWithCopyWarning: A value is trying to be set on ...
SettingWithCopyWarning is a frequently encountered warning in pandas. As the message indicates, it typically arises when a value is being set on a copy of a slice from a pandas.DataFrame.
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
Although it is a warning rather than an error and execution continues, the assignment may silently fail to update the original data, so it should not be left unaddressed.
Although not recommended, you can suppress the warning using the warnings module.
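As a minimal sketch of that suppression (not a recommendation), the standard-library warnings module can silence warnings inside a with block. The filter below is deliberately broad and ignores every warning category raised in the block. In pandas 2.0.3 the category is also exposed as pd.errors.SettingWithCopyWarning, which could be passed as category= to narrow the filter, though that name may not exist in every pandas version. The DataFrame and variable names are illustrative.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]}, index=['x', 'y', 'z'])
df_slice = df.loc['x':'y']

# Filters set inside catch_warnings() are restored when the block exits,
# so warnings raised elsewhere in the program are still reported.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # broad: hides all warning categories here
    df_slice['a'] = 100  # would normally emit SettingWithCopyWarning

print(df_slice['a'].tolist())
```

Suppressing the warning only hides the message. Whether the original DataFrame is updated is unaffected, which is why the approaches described in the rest of this article are preferable.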
Please note that the sample code used in this article is based on pandas version 2.0.3, and behavior may vary with different versions.
import pandas as pd
print(pd.__version__)
# 2.0.3
Chained indexing and assignment
How chained indexing and assignment causes SettingWithCopyWarning
As mentioned in the official documentation, the primary cause of SettingWithCopyWarning is chained indexing and chained assignment.
Chained indexing refers to the consecutive use of [], loc[], and iloc[].
df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]}, index=['x', 'y', 'z'])
print(df)
#    a  b
# x  0  3
# y  1  4
# z  2  5
print(df.loc['x':'y']['a'])
# x    0
# y    1
# Name: a, dtype: int64
Chained assignment refers to assigning a value through chained indexing, which sometimes results in a SettingWithCopyWarning.
df.loc['x':'y']['a'] = 100
# /var/folders/rf/b7l8_vgj5mdgvghn_326rn_c0000gn/T/ipykernel_40458/3771299631.py:1: SettingWithCopyWarning:
# A value is trying to be set on a copy of a slice from a DataFrame.
# Try using .loc[row_indexer,col_indexer] = value instead
#
# See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
#   df.loc['x':'y']['a'] = 100
print(df)
#    a  b
# x  0  3
# y  1  4
# z  2  5
As the warning message indicates, "A value is trying to be set on a copy of a slice from a DataFrame". If the first indexing operation returns a copy, the second indexing operation assigns the value to that temporary copy. As a result, the value in the original DataFrame remains unchanged.
Note that the results may vary depending on the pandas version. In version 0.25.1, the same code updated the value in the DataFrame.
There are also cases where the DataFrame value doesn't change even though no SettingWithCopyWarning is issued.
print(df.loc[['x', 'y']]['a'])
# x    0
# y    1
# Name: a, dtype: int64
df.loc[['x', 'y']]['a'] = 100
print(df)
#    a  b
# x  0  3
# y  1  4
# z  2  5
It's not always possible to determine whether [], loc[], or iloc[] returns a copy or a view. Hence, the occurrence of a SettingWithCopyWarning doesn't necessarily indicate a problem, just as its absence doesn't guarantee there isn't one.
For robust code, avoid chained indexing when assigning values to a DataFrame or Series.
Solution: Avoid chaining
To avoid chained indexing, combine the indexing operations into a single one, as the warning message suggests.
Try using .loc[row_indexer,col_indexer] = value instead
The two examples above can be rewritten with loc as follows:
df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]}, index=['x', 'y', 'z'])
print(df)
#    a  b
# x  0  3
# y  1  4
# z  2  5
df.loc['x':'y', 'a'] = 100
print(df)
#      a  b
# x  100  3
# y  100  4
# z    2  5
df.loc[['x', 'y'], 'a'] = 0
print(df)
#    a  b
# x  0  3
# y  0  4
# z  2  5
Performing a single indexing operation ensures assignment of the value to the original DataFrame and offers a speed advantage.
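The same pattern applies to conditional updates, which are a common source of accidental chaining (for example, df[mask]['b'] = value). A minimal sketch, assuming an illustrative condition on column a, places the boolean mask in the row position of a single loc call:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]}, index=['x', 'y', 'z'])

# Chained form to avoid: df[df['a'] > 0]['b'] = -1
# Single loc call: the boolean mask selects rows, 'b' selects the column.
df.loc[df['a'] > 0, 'b'] = -1

print(df['b'].tolist())
# [3, -1, -1]
```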
When specifying a range using a combination of row/column names and row/column numbers, you may be tempted to chain indexing operations. However, loc requires row/column names and iloc requires row/column numbers, so you cannot mix them in a single call, for example by specifying row numbers together with column names.
df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]}, index=['x', 'y', 'z'])
print(df)
#    a  b
# x  0  3
# y  1  4
# z  2  5
print(df.iloc[[0, 1]]['a'])
# x    0
# y    1
# Name: a, dtype: int64
# df.loc[[0, 1], 'a']
# KeyError: "None of [Index([0, 1], dtype='int64')] are in the [index]"
# df.iloc[[0, 1], 'a']
# ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types
In such cases, you can use the index and columns attributes to convert row/column numbers to row/column names.
print(df.index[0])
# x
print(df.index[1])
# y
print(df.columns[0])
# a
print(df.columns[1])
# b
Although this makes the code somewhat longer, it enables you to specify everything using row/column names.
print(df.loc[[df.index[0], df.index[1]], 'a'])
# x    0
# y    1
# Name: a, dtype: int64
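Since the goal is usually assignment, a minimal sketch of both directions may help: converting row positions to labels for loc, or converting a column label to a position for iloc. Index.get_loc() is a standard pandas method; the assigned values are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]}, index=['x', 'y', 'z'])

# Positions -> labels: indexing df.index with a list of positions returns the labels.
df.loc[df.index[[0, 1]], 'a'] = 100

# Label -> position: get_loc() returns the integer position of column 'b'
# (this assumes the column name is unique).
df.iloc[[0, 1], df.columns.get_loc('b')] = -1

print(df['a'].tolist())
# [100, 100, 2]
print(df['b'].tolist())
# [-1, -1, 5]
```

Either form is a single indexing operation, so the value is assigned to the original DataFrame.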
Be careful when specifying the stop value in a slice start:stop:step. In iloc, which uses row/column numbers, the stop value is excluded from the result. In loc, which uses row/column names, the stop value is included in the result.
When converting a positional stop value to a row/column name with the index and columns attributes, subtract 1 from the position first.
print(df.iloc[:2]['a'])
# x    0
# y    1
# Name: a, dtype: int64
print(df.loc[: df.index[2], 'a'])
# x    0
# y    1
# z    2
# Name: a, dtype: int64
print(df.loc[: df.index[2 - 1], 'a'])
# x    0
# y    1
# Name: a, dtype: int64
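One way to sidestep the off-by-one adjustment, sketched here as an alternative rather than a replacement for the approach above, is to slice the index itself by position and pass the resulting labels to loc. Positional slicing of df.index excludes the stop value, just like iloc. The variable name labels is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]}, index=['x', 'y', 'z'])

# df.index[:2] is sliced by position, so the stop value 2 is excluded.
labels = df.index[:2]
print(labels.tolist())
# ['x', 'y']

# Passing the labels to loc selects exactly those rows, with no "- 1" needed.
df.loc[labels, 'a'] = 100
print(df['a'].tolist())
# [100, 100, 2]
```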
Chained indexing and assignment via variables
Problem with variable assignment
The same problem occurs when the first indexing result is assigned to a variable.
Even though it might appear that there's no chaining, the example below is equivalent to df.loc['x':'y']['a'], and assigning a value to it will trigger a SettingWithCopyWarning.
df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]}, index=['x', 'y', 'z'])
print(df)
#    a  b
# x  0  3
# y  1  4
# z  2  5
df_slice = df.loc['x':'y']
print(df_slice)
#    a  b
# x  0  3
# y  1  4
print(df_slice['a'])
# x    0
# y    1
# Name: a, dtype: int64
df_slice['a'] = 100
# /var/folders/rf/b7l8_vgj5mdgvghn_326rn_c0000gn/T/ipykernel_40458/3718525832.py:1: SettingWithCopyWarning:
# A value is trying to be set on a copy of a slice from a DataFrame.
# Try using .loc[row_indexer,col_indexer] = value instead
#
# See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
#   df_slice['a'] = 100
print(df_slice)
#      a  b
# x  100  3
# y  100  4
print(df)
#    a  b
# x  0  3
# y  1  4
# z  2  5
As mentioned above, be aware that the results may vary depending on the pandas version. In version 0.25.1, the same code also updated the value in the original DataFrame.
Solution: Create a copy with copy()
It is not always possible to tell in advance whether an indexing operation with [], loc[], or iloc[] returns a view or a copy, and you cannot always obtain a view.
If you're working with throwaway code or won't be using the original DataFrame after indexing (i.e., whether its values change isn't a concern), there's no need to exercise extreme caution. However, it is risky to assume that a view will be returned in code that performs various operations.
A reliable solution is to explicitly create a copy using the copy() method. If you always make the result of an indexing operation stored in a variable an explicit copy, you won't encounter a SettingWithCopyWarning.
df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]}, index=['x', 'y', 'z'])
print(df)
#    a  b
# x  0  3
# y  1  4
# z  2  5
df_slice_copy = df.loc['x':'y'].copy()
print(df_slice_copy)
#    a  b
# x  0  3
# y  1  4
df_slice_copy['a'] = 100
print(df_slice_copy)
#      a  b
# x  100  3
# y  100  4
print(df)
#    a  b
# x  0  3
# y  1  4
# z  2  5
However, be aware that if an indexing operation returns a copy and you call copy(), extra memory may be temporarily allocated. This can be a concern when dealing with large data. If the data or processing is limited and you can confirm there will be no issues in advance, you may consider not using copy().
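When the actual intent is to update the original DataFrame rather than work on an independent subset, one option is to keep the row selector in a variable instead of the indexing result, and apply it in a single loc call. A minimal sketch; the variable name rows is illustrative:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]}, index=['x', 'y', 'z'])

# Store the selector, not the selected data.
rows = slice('x', 'y')  # same as 'x':'y' inside loc; the stop label is included

# A single loc call assigns directly to the original DataFrame.
df.loc[rows, 'a'] = 100

print(df['a'].tolist())
# [100, 100, 2]
```

This involves no intermediate object, so the question of view versus copy does not arise and no extra copy of the data is made.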