pandas: How to fix SettingWithCopyWarning: A value is trying to be set on ...


SettingWithCopyWarning is one of the most frequently encountered warnings in pandas. As the message indicates, it is issued when an assignment may be writing to a copy of a slice of a pandas.DataFrame rather than to the original object.

SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

Because it is a warning and not an error, the code keeps running. However, the assignment may silently fail to update the original DataFrame, so it should not be ignored.

Although not recommended, you can suppress the warning using the warnings module.
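As a minimal sketch of such suppression, the code below limits the filter to a single block with a context manager instead of applying it globally. It assumes the warning class is exposed as pd.errors.SettingWithCopyWarning, which is the case in pandas 1.5 and the 2.x releases; the getattr() fallback and the names warning_class, df_slice, and caught are only for illustration.

```python
import warnings

import pandas as pd

# Assumption: the class lives at pd.errors.SettingWithCopyWarning (pandas 1.5+).
# Fall back to the base Warning class if this pandas version does not expose it.
warning_class = getattr(pd.errors, 'SettingWithCopyWarning', Warning)

df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]}, index=['x', 'y', 'z'])
df_slice = df.loc['x':'y']

# record=True collects every warning that is not filtered out,
# so it is possible to check afterwards that nothing slipped through.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('ignore', category=warning_class)
    df_slice['a'] = 100  # the kind of assignment that triggers the warning

print(any(issubclass(w.category, warning_class) for w in caught))
# False
```

The filter is restored when the with block ends, so the warning stays active for the rest of the program. Suppressing it only hides the message; it does not change which object receives the value.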

Please note that the sample code used in this article is based on pandas version 2.0.3 and behavior may vary with different versions.

import pandas as pd

print(pd.__version__)
# 2.0.3

Chained indexing and assignment

How chained indexing and assignment causes SettingWithCopyWarning

As mentioned in the official documentation, the primary cause of SettingWithCopyWarning is chained indexing and chained assignment.

Chained indexing means applying two or more indexing operations ([], loc[], or iloc[]) back to back.

df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]}, index=['x', 'y', 'z'])
print(df)
#    a  b
# x  0  3
# y  1  4
# z  2  5

print(df.loc['x':'y']['a'])
# x    0
# y    1
# Name: a, dtype: int64

Chained assignment means assigning a value to the result of chained indexing. This can trigger a SettingWithCopyWarning.

df.loc['x':'y']['a'] = 100
# /var/folders/rf/b7l8_vgj5mdgvghn_326rn_c0000gn/T/ipykernel_40458/3771299631.py:1: SettingWithCopyWarning: 
# A value is trying to be set on a copy of a slice from a DataFrame.
# Try using .loc[row_indexer,col_indexer] = value instead
# 
# See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
#   df.loc['x':'y']['a'] = 100

print(df)
#    a  b
# x  0  3
# y  1  4
# z  2  5

The warning message states the problem directly: "A value is trying to be set on a copy of a slice from a DataFrame". If the first indexing operation (here, loc[]) returns a copy, the second [] assigns the value to that temporary copy, which is then discarded. As a result, the value in the original DataFrame remains unchanged.

Note that the results may vary depending on the pandas version. In version 0.25.1, the DataFrame value was updated with the same code as above.

Conversely, there are cases where the assignment fails to change the DataFrame and no SettingWithCopyWarning is issued at all.

print(df.loc[['x', 'y']]['a'])
# x    0
# y    1
# Name: a, dtype: int64

df.loc[['x', 'y']]['a'] = 100
print(df)
#    a  b
# x  0  3
# y  1  4
# z  2  5

It's not always possible to determine whether [], loc[], or iloc[] returns a copy or a view. Hence, the occurrence of a SettingWithCopyWarning doesn't necessarily indicate a problem, just as its absence doesn't guarantee there isn't one.
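As a rough diagnostic only, NumPy's shares_memory() can show whether an indexing result currently shares its buffer with the original. This is a sketch under the assumption that the column has a NumPy-backed dtype; the variable names are illustrative. The result for the label slice is not asserted here because it depends on the pandas version and the dtypes involved.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]}, index=['x', 'y', 'z'])

original = df['a'].to_numpy()

# Label slice: whether memory is shared depends on the pandas version and dtypes.
slice_shares = bool(np.shares_memory(original, df.loc['x':'y']['a'].to_numpy()))

# List of labels: the selected rows are gathered into newly allocated data.
list_shares = bool(np.shares_memory(original, df.loc[['x', 'y']]['a'].to_numpy()))

print(list_shares)
# False
```

Shared memory at the time of the check does not guarantee that a later chained assignment reaches the original: in the slice example above, the original DataFrame stayed unchanged. Treat this as a way to explore the behavior, not as a basis for writing code.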

For robust code, avoid chained indexing when assigning values to a DataFrame or Series.

Solution: Avoid chaining

To avoid chained indexing, combine the indexing operations into a single one, as the warning message suggests.

Try using .loc[row_indexer,col_indexer] = value instead

The two examples above can be rewritten with loc as follows:

df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]}, index=['x', 'y', 'z'])
print(df)
#    a  b
# x  0  3
# y  1  4
# z  2  5

df.loc['x':'y', 'a'] = 100
print(df)
#      a  b
# x  100  3
# y  100  4
# z    2  5

df.loc[['x', 'y'], 'a'] = 0
print(df)
#    a  b
# x  0  3
# y  0  4
# z  2  5

A single indexing operation assigns the value to the original DataFrame directly. It is also generally faster, because pandas handles one indexing call instead of two separate ones.
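The same approach applies to a common case not covered by the examples above: selecting rows with a boolean condition and then assigning to one column. A minimal sketch, with the condition chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]}, index=['x', 'y', 'z'])

# Chained form to avoid: df[df['a'] > 0]['b'] = -1
# Single-step form: the row condition and the column label go into one loc[] call.
df.loc[df['a'] > 0, 'b'] = -1
print(df)
#    a  b
# x  0  3
# y  1 -1
# z  2 -1
```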

Chained indexing is tempting when you want to combine row/column names with row/column numbers. However, loc accepts only row/column names and iloc accepts only row/column numbers, so you cannot mix the two in a single call, for example by passing row numbers together with column names.

df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]}, index=['x', 'y', 'z'])
print(df)
#    a  b
# x  0  3
# y  1  4
# z  2  5

print(df.iloc[[0, 1]]['a'])
# x    0
# y    1
# Name: a, dtype: int64

# df.loc[[0, 1], 'a']
# KeyError: "None of [Index([0, 1], dtype='int64')] are in the [index]"

# df.iloc[[0, 1], 'a']
# ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types

In such cases, you can use the index and columns attributes to convert row/column numbers to row/column names.

print(df.index[0])
# x

print(df.index[1])
# y

print(df.columns[0])
# a

print(df.columns[1])
# b

Although this makes the code somewhat longer, it enables you to specify everything using row/column names.

print(df.loc[[df.index[0], df.index[1]], 'a'])
# x    0
# y    1
# Name: a, dtype: int64
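The conversion also works in the other direction. The following sketch, which assumes the column labels are unique, turns a column name into a position with the get_loc() method of the columns Index, so that everything can be passed to iloc in one step. The name col_pos is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]}, index=['x', 'y', 'z'])

# get_loc() returns an integer position when the label is unique.
col_pos = df.columns.get_loc('a')
print(col_pos)
# 0

# Row numbers and the converted column number in a single iloc[] call.
df.iloc[[0, 1], col_pos] = 100
print(df)
#      a  b
# x  100  3
# y  100  4
# z    2  5
```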

Be careful with the stop value in a slice start:stop:step. With iloc, which uses row/column numbers, the stop position is excluded from the result. With loc, which uses row/column names, the stop label is included.

Therefore, when you convert an iloc stop position into a label for loc using the index or columns attribute, subtract 1 from the position first.

print(df.iloc[:2]['a'])
# x    0
# y    1
# Name: a, dtype: int64

print(df.loc[: df.index[2], 'a'])
# x    0
# y    1
# z    2
# Name: a, dtype: int64

print(df.loc[: df.index[2 - 1], 'a'])
# x    0
# y    1
# Name: a, dtype: int64
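One way to sidestep the off-by-one adjustment, sketched here under the assumption that the index labels are unique, is to slice the index itself by position and pass the resulting labels to loc. The name labels is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]}, index=['x', 'y', 'z'])

# Slicing the Index by position follows the usual "stop excluded" rule,
# and the resulting labels can be handed to loc[] directly.
labels = df.index[:2]
print(list(labels))
# ['x', 'y']

df.loc[labels, 'a'] = 100
print(df)
#      a  b
# x  100  3
# y  100  4
# z    2  5
```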

Chained indexing and assignment via variables

Problem with variable assignment

The same problem occurs when the first indexing result is assigned to a variable.

Even though it might appear that there's no chaining, the example below is equivalent to df.loc['x':'y']['a'], and assigning a value to it will trigger a SettingWithCopyWarning.

df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]}, index=['x', 'y', 'z'])
print(df)
#    a  b
# x  0  3
# y  1  4
# z  2  5

df_slice = df.loc['x':'y']
print(df_slice)
#    a  b
# x  0  3
# y  1  4

print(df_slice['a'])
# x    0
# y    1
# Name: a, dtype: int64

df_slice['a'] = 100
# /var/folders/rf/b7l8_vgj5mdgvghn_326rn_c0000gn/T/ipykernel_40458/3718525832.py:1: SettingWithCopyWarning: 
# A value is trying to be set on a copy of a slice from a DataFrame.
# Try using .loc[row_indexer,col_indexer] = value instead
# 
# See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
#   df_slice['a'] = 100

print(df_slice)
#      a  b
# x  100  3
# y  100  4

print(df)
#    a  b
# x  0  3
# y  1  4
# z  2  5

As mentioned above, be aware that the results may vary depending on the pandas version. In version 0.25.1, the original DataFrame value was also updated with the same code as above.

Solution: Create a copy with copy()

You cannot reliably predict whether a [], loc[], or iloc[] indexing operation returns a view or a copy, and you cannot force it to return a view.

In throwaway code, or when the original DataFrame is not used again after indexing (so it does not matter whether its values change), you don't need to be overly careful. In longer code that performs many operations, however, relying on a view being returned is risky.

A reliable solution is to explicitly create a copy using the copy() method. If you always treat the result of an indexing operation stored in a variable as a copy, you won't encounter a SettingWithCopyWarning.

df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]}, index=['x', 'y', 'z'])
print(df)
#    a  b
# x  0  3
# y  1  4
# z  2  5

df_slice_copy = df.loc['x':'y'].copy()
print(df_slice_copy)
#    a  b
# x  0  3
# y  1  4

df_slice_copy['a'] = 100
print(df_slice_copy)
#      a  b
# x  100  3
# y  100  4

print(df)
#    a  b
# x  0  3
# y  1  4
# z  2  5

However, be aware that if an indexing operation returns a copy and you call copy(), extra memory may be temporarily allocated. This can be a concern when dealing with large data. If the data or processing is limited and you can confirm there will be no issues in advance, you may consider not using copy().
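If the changes made on the copy should eventually reach the original DataFrame as well, one possible pattern is to write them back with a single loc assignment. This is only a sketch, not a method taken from the pandas documentation; it assumes the index labels are unique and that the labels of the copy still exist in the original.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]}, index=['x', 'y', 'z'])

# Work on an explicit, independent copy.
df_slice_copy = df.loc['x':'y'].copy()
df_slice_copy['a'] = df_slice_copy['a'] + 100

# Write the result back in one step; the Series is aligned on the index labels.
df.loc[df_slice_copy.index, 'a'] = df_slice_copy['a']
print(df)
#      a  b
# x  100  3
# y  101  4
# z    2  5
```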
