NumPy: Replace NaN (np.nan) using np.nan_to_num() and np.isnan()

Modified: 2024-01-23 | Tags: Python, NumPy

In NumPy, use np.nan_to_num() to replace NaN (np.nan) in an array (ndarray) with a value such as 0. np.isnan() is mainly used to identify NaN, but its result can also be used to replace NaN. You can also replace NaN with the mean of the non-NaN values.

Contents

NaN (np.nan) in NumPy
Replace NaN using np.genfromtxt() with filling_values
Replace NaN using np.nan_to_num()
Identify and replace NaN using np.isnan()

To delete rows or columns containing NaN instead of replacing the NaN values, see the following article.

NumPy: Remove NaN (np.nan) from an array

For handling missing values in pandas, see the following article.

Missing values in pandas (nan, None, pd.NA)

The NumPy version used in this article is as follows. Note that functionality may vary between versions.

import numpy as np

print(np.__version__)
# 1.26.1

source: numpy_nan_replace.py

`NaN` (`np.nan`) in NumPy

When you read a CSV file with np.genfromtxt(), missing data is represented as NaN (Not a Number) by default. print() displays these values as nan.

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]

source: numpy_nan_replace.py
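The examples in this article read data/src/sample_nan.csv. If you don't have that file, the following sketch builds an equivalent array from an in-memory string. The CSV contents are an assumption reconstructed from the printed output above, not the actual file; np.genfromtxt() accepts any file-like object, so io.StringIO works in place of a path.

```python
import io

import numpy as np

# Hypothetical stand-in for data/src/sample_nan.csv: the empty fields are
# assumed so that the result matches the array printed above
csv_text = """11,12,,14
21,,,24
31,32,33,34
"""

# Empty fields are treated as missing and become NaN in a float array
a = np.genfromtxt(io.StringIO(csv_text), delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]
```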

To create NaN explicitly, use np.nan or float('nan'). You can also import the math module from the standard library and use math.nan. All of these represent the same floating-point NaN value.

What is nan in Python (float('nan'), math.nan, np.nan)

a_nan = np.array([0, 1, np.nan, float('nan')])
print(a_nan)
# [ 0.  1. nan nan]

source: numpy_nan_replace.py
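Because NaN is a floating-point value, it affects the dtype of the array that holds it. The sketch below checks that the three spellings are ordinary Python floats, shows that a list mixing integers and NaN becomes float64, and shows that an integer array rejects NaN outright.

```python
import math

import numpy as np

# All three spellings produce an ordinary Python float whose value is NaN
values = [np.nan, float('nan'), math.nan]
print([type(v).__name__ for v in values])
# ['float', 'float', 'float']
print([math.isnan(v) for v in values])
# [True, True, True]

# A list mixing ints and NaN is converted to a float64 array
a_nan = np.array([0, 1, np.nan, float('nan')])
print(a_nan.dtype)
# float64

# An integer array cannot store NaN: the assignment raises ValueError
a_int = np.array([0, 1, 2])
raised = False
try:
    a_int[0] = np.nan
except ValueError:
    raised = True
print(raised)
# True
```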

Comparing NaN with == always returns False, even when both sides are NaN, so use np.isnan() to check whether a value is NaN.

numpy.isnan — NumPy v1.26 Manual

print(np.nan == np.nan)
# False

print(np.isnan(np.nan))
# True

source: numpy_nan_replace.py
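The flip side of == returning False is that NaN is the only value not equal to itself. The sketch below uses that property as a cross-check; for float arrays, a != a yields the same mask as np.isnan(a), although np.isnan() states the intent more clearly.

```python
import numpy as np

x = np.nan
# NaN is the only value for which x != x is True
print(x != x)
# True

a_nan = np.array([0, 1, np.nan, float('nan')])
# Element-wise self-comparison marks exactly the NaN positions
print(a_nan != a_nan)
# [False False  True  True]
```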

np.isnan() can also check if each element of an ndarray is NaN.

print(a_nan == np.nan)
# [False False False False]

print(np.isnan(a_nan))
# [False False  True  True]

source: numpy_nan_replace.py
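The boolean array returned by np.isnan() can be reduced or used as a mask. A minimal sketch: sum() counts True as 1, any() checks for at least one NaN, and the inverted mask (~) selects the non-NaN elements.

```python
import numpy as np

a_nan = np.array([0, 1, np.nan, float('nan')])
mask = np.isnan(a_nan)

# Number of NaN elements
print(mask.sum())
# 2

# Is there at least one NaN?
print(mask.any())
# True

# Keep only the non-NaN elements
print(a_nan[~mask])
# [0. 1.]
```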

Replace `NaN` using `np.genfromtxt()` with `filling_values`

To fill missing data while reading a CSV file, use the filling_values argument of np.genfromtxt().

For example, fill NaN with 0:

a_fill = np.genfromtxt('data/src/sample_nan.csv', delimiter=',',
                       filling_values=0)
print(a_fill)
# [[11. 12.  0. 14.]
#  [21.  0.  0. 24.]
#  [31. 32. 33. 34.]]

source: numpy_nan_replace.py
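filling_values also accepts a dict that maps a column index to a fill value, so each column can get its own default. In the sketch below, the CSV contents are an assumption reconstructed from the array printed earlier, and the values -1 and 99 are arbitrary examples; columns not listed keep the default NaN.

```python
import io

import numpy as np

# Assumed contents of data/src/sample_nan.csv
csv_text = """11,12,,14
21,,,24
31,32,33,34
"""

# Column 1 -> -1.0, column 2 -> 99.0 (arbitrary example values)
a_fill = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                       filling_values={1: -1.0, 2: 99.0})
print(a_fill)
# [[11. 12. 99. 14.]
#  [21. -1. 99. 24.]
#  [31. 32. 33. 34.]]
```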

Note that np.genfromtxt() cannot fill missing data with the mean of the non-NaN values while reading. For that, use one of the methods described below.

Replace `NaN` using `np.nan_to_num()`

You can use np.nan_to_num() to replace NaN.

numpy.nan_to_num — NumPy v1.26 Manual

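Note that np.nan_to_num() also replaces infinity (inf): by default, positive infinity becomes the largest finite value of the array's dtype, and negative infinity becomes the most negative finite value. See the following article for details.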

Infinity (inf) in Python
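A small sketch of the default infinity handling, assuming a float64 array. The replacement values can be compared with np.finfo(); the posinf and neginf arguments, like nan, are available from NumPy 1.17.

```python
import numpy as np

x = np.array([1.0, np.nan, np.inf, -np.inf])

# Defaults: NaN -> 0.0, +inf -> largest finite float64, -inf -> most negative
y = np.nan_to_num(x)
info = np.finfo(x.dtype)
print(y[2] == info.max, y[3] == info.min)
# True True

# posinf and neginf override the infinity replacements
z = np.nan_to_num(x, posinf=1e6, neginf=-1e6)
print(z.tolist())
# [1.0, 0.0, 1000000.0, -1000000.0]
```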

If you pass an array (ndarray) as the first argument to np.nan_to_num(), by default it returns a new ndarray in which NaN is replaced by 0. The original ndarray remains unchanged.

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]

print(np.nan_to_num(a))
# [[11. 12.  0. 14.]
#  [21.  0.  0. 24.]
#  [31. 32. 33. 34.]]

print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]

source: numpy_nan_replace.py

Setting the second argument, copy, to False modifies the original ndarray in place.

np.nan_to_num(a, copy=False)
print(a)
# [[11. 12.  0. 14.]
#  [21.  0.  0. 24.]
#  [31. 32. 33. 34.]]

source: numpy_nan_replace.py
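With copy=False, the returned array shares its data with the input float array. This in-place behavior only applies to an existing ndarray: a Python list is first converted to a new array, so the list itself is not modified. A minimal sketch checking both cases:

```python
import math

import numpy as np

a = np.array([[1.0, np.nan], [np.nan, 4.0]])

# In-place replacement: the return value refers to the same data as a
out = np.nan_to_num(a, copy=False)
print(np.shares_memory(out, a))
# True
print(a)
# [[1. 0.]
#  [0. 4.]]

# A list is converted to a new array, so the list keeps its NaN
lst = [1.0, float('nan')]
res = np.nan_to_num(lst, copy=False)
print(res)
# [1. 0.]
print(math.isnan(lst[1]))
# True
```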

Since NumPy 1.17, the third argument, nan, lets you specify the value that replaces NaN.

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]

print(np.nan_to_num(a, nan=-1))
# [[11. 12. -1. 14.]
#  [21. -1. -1. 24.]
#  [31. 32. 33. 34.]]

source: numpy_nan_replace.py
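A common reason to replace NaN with a sentinel such as -1 is to convert the data to an integer dtype, which cannot represent NaN. The sketch below replaces first and then calls astype(int); the choice of -1 as a "missing" marker is an assumption. Calling astype(int) while NaN is still present does not raise an error but produces arbitrary, platform-dependent integers.

```python
import numpy as np

a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

# Replace NaN with an assumed sentinel, then convert to integers
a_int = np.nan_to_num(a, nan=-1).astype(int)
print(a_int)
# [[11 12 -1 14]
#  [21 -1 -1 24]
#  [31 32 33 34]]
```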

You can use np.nanmean() to replace NaN with the mean of non-NaN values. This replacement can be done for the entire array or separately for each row or column.

NumPy: Functions ignoring NaN (np.nansum, np.nanmean, etc.)

print(np.nanmean(a))
# 23.555555555555557

print(np.nan_to_num(a, nan=np.nanmean(a)))
# [[11.         12.         23.55555556 14.        ]
#  [21.         23.55555556 23.55555556 24.        ]
#  [31.         32.         33.         34.        ]]

print(np.nanmean(a, axis=0, keepdims=True))
# [[21. 22. 33. 24.]]

print(np.nan_to_num(a, nan=np.nanmean(a, axis=0, keepdims=True)))
# [[11. 12. 33. 14.]
#  [21. 22. 33. 24.]
#  [31. 32. 33. 34.]]

print(np.nanmean(a, axis=1, keepdims=True))
# [[12.33333333]
#  [22.5       ]
#  [32.5       ]]

print(np.nan_to_num(a, nan=np.nanmean(a, axis=1, keepdims=True)))
# [[11.         12.         12.33333333 14.        ]
#  [21.         22.5        22.5        24.        ]
#  [31.         32.         33.         34.        ]]

source: numpy_nan_replace.py
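One edge case: if a row or column contains only NaN, np.nanmean() returns NaN for it (with a "Mean of empty slice" RuntimeWarning), so the replacement leaves that NaN in place. The sketch below uses a hypothetical array with an all-NaN column and falls back to 0 for missing means; the fallback value is an assumption, not the only reasonable choice.

```python
import warnings

import numpy as np

# Hypothetical array whose second column has no valid values
b = np.array([[1.0, np.nan],
              [3.0, np.nan]])

# Suppress the "Mean of empty slice" warning for the all-NaN column
with warnings.catch_warnings():
    warnings.simplefilter('ignore', category=RuntimeWarning)
    col_means = np.nanmean(b, axis=0, keepdims=True)

# Column 1 still contains NaN after the replacement
print(np.isnan(np.nan_to_num(b, nan=col_means)).any())
# True

# Assumed fallback: use 0 wherever a column mean could not be computed
col_means = np.nan_to_num(col_means, nan=0.0)
filled = np.nan_to_num(b, nan=col_means)
print(filled)
# [[1. 0.]
#  [3. 0.]]
```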

If you pass an ndarray as the third argument (nan) of np.nan_to_num(), it is broadcast to the shape of the ndarray passed as the first argument.

NumPy: Broadcasting rules and examples

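Setting keepdims=True in np.nanmean() keeps the reduced axis with length 1, so the result broadcasts correctly against the original array. With keepdims=False (the default), axis=0 still works because the result's shape (4,) matches the last axis of the (3, 4) array, but for axis=1 the result's shape (3,) cannot be broadcast against it. It is less error-prone to always set keepdims=True, regardless of the axis.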

NumPy: Meaning of the axis parameter (0, 1, -1)
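The shape difference can be checked directly with np.broadcast_shapes() (available from NumPy 1.20), without attempting the replacement. A minimal sketch using the same values as the sample array:

```python
import numpy as np

a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

row_means = np.nanmean(a, axis=1)                    # shape (3,)
row_means_kd = np.nanmean(a, axis=1, keepdims=True)  # shape (3, 1)
print(row_means.shape, row_means_kd.shape)
# (3,) (3, 1)

# (3, 1) stretches across the 4 columns of the (3, 4) array
print(np.broadcast_shapes(a.shape, row_means_kd.shape))
# (3, 4)

# (3,) is aligned with the last axis (length 4), so broadcasting fails
try:
    np.broadcast_shapes(a.shape, row_means.shape)
except ValueError:
    print('shape (3,) cannot be broadcast to (3, 4)')
```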

For versions earlier than 1.17, which lack the nan argument, use the following method to replace NaN with values other than 0.

Identify and replace `NaN` using `np.isnan()`

np.isnan() returns a boolean array indicating which elements of an ndarray are NaN.

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]

print(np.isnan(a))
# [[False False  True False]
#  [False  True  True False]
#  [False False False False]]

source: numpy_nan_replace.py
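The same boolean array can also tell you where the NaN values are. A short sketch: np.argwhere() lists the (row, column) index of each True, and sum() with an axis counts NaN per column or per row.

```python
import numpy as np

a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

# (row, column) index of every NaN
print(np.argwhere(np.isnan(a)))
# [[0 2]
#  [1 1]
#  [1 2]]

# Number of NaN in each column and in each row
print(np.isnan(a).sum(axis=0))
# [0 1 2 0]
print(np.isnan(a).sum(axis=1))
# [1 2 0]
```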

Using this boolean array as an index (boolean indexing), you can assign a value to the NaN elements.

a[np.isnan(a)] = 0
print(a)
# [[11. 12.  0. 14.]
#  [21.  0.  0. 24.]
#  [31. 32. 33. 34.]]

source: numpy_nan_replace.py
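Boolean-index assignment modifies the array in place, so copy it first if the original must be kept. np.copyto() with its where argument is an equivalent in-place form; the sketch below confirms both give the same result while the original stays untouched.

```python
import numpy as np

a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

# Boolean-index assignment on a copy
b = a.copy()
b[np.isnan(b)] = 0

# Equivalent in-place form with np.copyto()
c = a.copy()
np.copyto(c, 0, where=np.isnan(c))

print(np.array_equal(b, c))
# True

# The original array still has its three NaN values
print(np.isnan(a).sum())
# 3
```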

You can also use np.nanmean() to replace NaN with the mean of the non-missing values.

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')

a[np.isnan(a)] = np.nanmean(a)
print(a)
# [[11.         12.         23.55555556 14.        ]
#  [21.         23.55555556 23.55555556 24.        ]
#  [31.         32.         33.         34.        ]]

source: numpy_nan_replace.py
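The same pattern works with other NaN-ignoring reductions. As a swapped-in alternative to the mean, the sketch below uses np.nanmedian(), which is less affected by extreme values; whether the median is appropriate depends on your data.

```python
import numpy as np

a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

# Median of the nine non-NaN values
med = np.nanmedian(a)
print(med)
# 24.0

a[np.isnan(a)] = med
print(a)
# [[11. 12. 24. 14.]
#  [21. 24. 24. 24.]
#  [31. 32. 33. 34.]]
```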

To replace NaN with the mean of each row or column, use np.where(). Unlike the assignment above, np.where() returns a new array.

numpy.where(): Manipulate elements depending on conditions

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')

print(np.where(np.isnan(a), np.nanmean(a, axis=0, keepdims=True), a))
# [[11. 12. 33. 14.]
#  [21. 22. 33. 24.]
#  [31. 32. 33. 34.]]

print(np.where(np.isnan(a), np.nanmean(a, axis=1, keepdims=True), a))
# [[11.         12.         12.33333333 14.        ]
#  [21.         22.5        22.5        24.        ]
#  [31.         32.         33.         34.        ]]

source: numpy_nan_replace.py
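If you prefer to modify the array in place rather than create a new one, you can combine index arrays with per-column means. A minimal sketch: np.nonzero() gives the row and column indices of each NaN, and indexing the 1D column-mean array with the column indices picks the matching mean for each position.

```python
import numpy as np

a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

col_means = np.nanmean(a, axis=0)  # shape (4,)

# Row and column indices of each NaN
rows, cols = np.nonzero(np.isnan(a))

# Write the mean of each NaN's own column back in place
a[rows, cols] = col_means[cols]
print(a)
# [[11. 12. 33. 14.]
#  [21. 22. 33. 24.]
#  [31. 32. 33. 34.]]
```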
