NumPy: Replace NaN (np.nan) using np.nan_to_num() and np.isnan()

Tags: Python, NumPy

In NumPy, to replace NaN (np.nan) in an array (ndarray) with another value such as 0, use np.nan_to_num(). Although np.isnan() is primarily used to identify NaN, its result can also be used as a mask to replace NaN. Both approaches can replace NaN with the mean of the non-NaN values.

To delete rows or columns containing NaN instead of replacing the values, see the following article.

For handling missing values in pandas, see the following article.

The NumPy version used in this article is as follows. Note that functionality may vary between versions.

import numpy as np

print(np.__version__)
# 1.26.1

NaN (np.nan) in NumPy

When you read a CSV file with np.genfromtxt(), by default, missing data is represented as NaN (Not a Number). These are displayed as nan when output with print().

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]

If you want to generate NaN explicitly, use np.nan or float('nan'). You can also import the math module of the standard library and use math.nan. All three are ordinary Python float values that represent NaN, so they behave the same way in an ndarray.

a_nan = np.array([0, 1, np.nan, float('nan')])
print(a_nan)
# [ 0.  1. nan nan]
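As a quick check of the statement above, the following sketch confirms that the three spellings produce a float that is recognized as NaN. The variable names candidates and arr are illustrative only.

```python
import math

import numpy as np

# Three ways to spell NaN; each one is a plain Python float.
candidates = [np.nan, float('nan'), math.nan]

print([type(v).__name__ for v in candidates])
# ['float', 'float', 'float']

print([math.isnan(v) for v in candidates])
# [True, True, True]

# Once stored in an ndarray, they are indistinguishable.
arr = np.array(candidates)
print(np.isnan(arr).all())
# True
```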

NaN compares unequal to every value, including itself, so a comparison with == always returns False. Use np.isnan() to check whether a value is NaN.

print(np.nan == np.nan)
# False

print(np.isnan(np.nan))
# True

np.isnan() can also check if each element of an ndarray is NaN.

print(a_nan == np.nan)
# [False False False False]

print(np.isnan(a_nan))
# [False False  True  True]
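Because np.isnan() returns a boolean ndarray, it can also be used to test whether an array contains any NaN and to count them. The sketch below builds the array inline (with the same values as the CSV example) so that it runs without the sample file. The name mask is illustrative.

```python
import numpy as np

# Same values as the CSV example, built inline so no file is needed.
a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

mask = np.isnan(a)

# Does the array contain at least one NaN?
print(mask.any())
# True

# Total number of NaN values (True counts as 1).
print(mask.sum())
# 3

# Number of NaN values in each column.
print(mask.sum(axis=0))
# [0 1 2 0]
```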

Replace NaN using np.genfromtxt() with filling_values

To fill missing data in a CSV file, use the filling_values argument with np.genfromtxt().

For example, fill NaN with 0:

a_fill = np.genfromtxt('data/src/sample_nan.csv', delimiter=',',
                       filling_values=0)
print(a_fill)
# [[11. 12.  0. 14.]
#  [21.  0.  0. 24.]
#  [31. 32. 33. 34.]]

Note that filling with the mean of the non-NaN values is not possible during the initial read with np.genfromtxt(). For this, refer to the method described below.
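To try filling_values without the sample file, you can pass a file-like object to np.genfromtxt(). In the sketch below, the CSV text is an assumption about what sample_nan.csv might contain (empty fields for the missing data). The actual file is not shown in this article.

```python
import io

import numpy as np

# Assumed file contents: empty fields stand for missing data.
csv_text = "11,12,,14\n21,,,24\n31,32,33,34\n"

# Default behavior: missing fields become NaN.
a = np.genfromtxt(io.StringIO(csv_text), delimiter=',')
print(np.isnan(a).sum())
# 3

# With filling_values, missing fields are filled while reading.
a_fill = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                       filling_values=0)
print(a_fill)
# [[11. 12.  0. 14.]
#  [21.  0.  0. 24.]
#  [31. 32. 33. 34.]]
```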

Replace NaN using np.nan_to_num()

You can use np.nan_to_num() to replace NaN.

Note that np.nan_to_num() also replaces infinity (inf). See the following article for details.

When you specify the array (ndarray) as the first argument to np.nan_to_num(), by default, a new ndarray is generated with NaN replaced by 0. The original ndarray remains unchanged.

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]

print(np.nan_to_num(a))
# [[11. 12.  0. 14.]
#  [21.  0.  0. 24.]
#  [31. 32. 33. 34.]]

print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]

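Setting the second argument (copy) to False replaces the values in the original ndarray in place. According to the NumPy documentation, the in-place operation only occurs if casting the input to an array does not require a copy.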

np.nan_to_num(a, copy=False)
print(a)
# [[11. 12.  0. 14.]
#  [21.  0.  0. 24.]
#  [31. 32. 33. 34.]]
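The difference between the default and copy=False can be checked with np.shares_memory(). This is a minimal sketch using a small inline float array. The names original, cleaned, and result are illustrative.

```python
import numpy as np

# Default (copy=True): the input array is left untouched.
original = np.array([1.0, np.nan, 3.0])
cleaned = np.nan_to_num(original)
print(np.isnan(original).any())
# True
print(np.shares_memory(cleaned, original))
# False

# copy=False: the values are replaced in the input array itself.
a = np.array([[11.0, np.nan], [np.nan, 24.0]])
result = np.nan_to_num(a, copy=False)
print(np.shares_memory(result, a))
# True
print(a.tolist())
# [[11.0, 0.0], [0.0, 24.0]]
```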

From NumPy version 1.17, the third argument (nan) allows you to specify the value to replace NaN.

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]

print(np.nan_to_num(a, nan=-1))
# [[11. 12. -1. 14.]
#  [21. -1. -1. 24.]
#  [31. 32. 33. 34.]]
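As noted at the start of this section, np.nan_to_num() also replaces infinity. The sketch below shows the default behavior (positive and negative infinity become the largest and smallest finite values of the dtype) and the posinf and neginf arguments, which were added together with nan in NumPy 1.17. The array b and the replacement values are arbitrary examples.

```python
import numpy as np

b = np.array([1.0, np.nan, np.inf, -np.inf])

# Default: NaN -> 0, inf -> largest finite float, -inf -> smallest finite float.
default = np.nan_to_num(b)
finfo = np.finfo(np.float64)
print(default[1] == 0.0)
# True
print(default[2] == finfo.max)
# True
print(default[3] == finfo.min)
# True

# Explicit replacement values for each special value.
custom = np.nan_to_num(b, nan=0.0, posinf=999.0, neginf=-999.0)
print(custom.tolist())
# [1.0, 0.0, 999.0, -999.0]
```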

To replace NaN with the mean of the non-NaN values, compute the mean with np.nanmean() and pass it as the nan argument. The mean can be taken over the entire array or separately for each row or column.

print(np.nanmean(a))
# 23.555555555555557

print(np.nan_to_num(a, nan=np.nanmean(a)))
# [[11.         12.         23.55555556 14.        ]
#  [21.         23.55555556 23.55555556 24.        ]
#  [31.         32.         33.         34.        ]]

print(np.nanmean(a, axis=0, keepdims=True))
# [[21. 22. 33. 24.]]

print(np.nan_to_num(a, nan=np.nanmean(a, axis=0, keepdims=True)))
# [[11. 12. 33. 14.]
#  [21. 22. 33. 24.]
#  [31. 32. 33. 34.]]

print(np.nanmean(a, axis=1, keepdims=True))
# [[12.33333333]
#  [22.5       ]
#  [32.5       ]]

print(np.nan_to_num(a, nan=np.nanmean(a, axis=1, keepdims=True)))
# [[11.         12.         12.33333333 14.        ]
#  [21.         22.5        22.5        24.        ]
#  [31.         32.         33.         34.        ]]

If you specify an ndarray as the third argument (nan) in np.nan_to_num(), it will be broadcast to match the shape of the ndarray specified as the first argument.

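If keepdims is set to True in np.nanmean(), the result keeps the reduced axis with length 1, so it broadcasts correctly against the original array. With keepdims=False (the default), the axis=0 result still broadcasts because its shape matches the last dimension of the array, but the axis=1 result generally does not. It is less error-prone to always set keepdims=True regardless of the axis.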
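The effect of keepdims on broadcasting can be seen by comparing shapes. The sketch below uses np.broadcast_shapes(), which is available from NumPy 1.20, to check shape compatibility without performing the replacement. The array is built inline with the same values as the CSV example, and the variable names are illustrative.

```python
import numpy as np

a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

col_means = np.nanmean(a, axis=0)                    # one value per column
row_means = np.nanmean(a, axis=1)                    # one value per row, 1D
row_means_2d = np.nanmean(a, axis=1, keepdims=True)  # one value per row, 2D

print(col_means.shape, row_means.shape, row_means_2d.shape)
# (4,) (3,) (3, 1)

# (4,) and (3, 1) both broadcast against (3, 4).
print(np.broadcast_shapes(a.shape, col_means.shape))
# (3, 4)
print(np.broadcast_shapes(a.shape, row_means_2d.shape))
# (3, 4)

# (3,) does not broadcast against (3, 4).
try:
    np.broadcast_shapes(a.shape, row_means.shape)
    compatible = True
except ValueError:
    compatible = False
print(compatible)
# False
```

Note that for a square array, the 1D row means would have a compatible shape and would silently be applied per column instead of per row, which is another reason to prefer keepdims=True.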

For versions before 1.17, where the nan argument is not implemented, use the following method to replace NaN with values other than 0.

Identify and replace NaN using np.isnan()

You can use np.isnan() to check if values in an ndarray are NaN.

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]

print(np.isnan(a))
# [[False False  True False]
#  [False  True  True False]
#  [False False False False]]

With the result from np.isnan(), you can assign a specific value to replace NaN.

a[np.isnan(a)] = 0
print(a)
# [[11. 12.  0. 14.]
#  [21.  0.  0. 24.]
#  [31. 32. 33. 34.]]

You can also use np.nanmean() to replace NaN with the mean of the non-missing values.

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')

a[np.isnan(a)] = np.nanmean(a)
print(a)
# [[11.         12.         23.55555556 14.        ]
#  [21.         23.55555556 23.55555556 24.        ]
#  [31.         32.         33.         34.        ]]

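Boolean-mask assignment cannot broadcast an array of per-row or per-column means onto the selected elements. To replace NaN with the mean value for each row or column, use np.where(), which broadcasts its arguments and returns a new ndarray.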

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')

print(np.where(np.isnan(a), np.nanmean(a, axis=0, keepdims=True), a))
# [[11. 12. 33. 14.]
#  [21. 22. 33. 24.]
#  [31. 32. 33. 34.]]

print(np.where(np.isnan(a), np.nanmean(a, axis=1, keepdims=True), a))
# [[11.         12.         12.33333333 14.        ]
#  [21.         22.5        22.5        24.        ]
#  [31.         32.         33.         34.        ]]
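np.where() returns a new ndarray. If you want to replace NaN with the column means in place instead, one possible approach (not taken from the examples above) is to get the row and column indices of the NaN values and use the column indices to look up the matching means. This is a minimal sketch. The array is built inline, and the names col_means, rows, and cols are illustrative.

```python
import numpy as np

a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

# Mean of the non-NaN values in each column (1D, length 4).
col_means = np.nanmean(a, axis=0)

# Row and column indices of every NaN element.
rows, cols = np.where(np.isnan(a))

# Assign to each NaN the mean of the column it belongs to.
a[rows, cols] = col_means[cols]

print(a.tolist())
# [[11.0, 12.0, 33.0, 14.0], [21.0, 22.0, 33.0, 24.0], [31.0, 32.0, 33.0, 34.0]]
```

For per-row means, the same idea would use np.nanmean(a, axis=1) indexed with rows instead of cols.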
