NumPy: Extract or delete elements, rows, and columns that satisfy the conditions
This article describes how to extract or delete elements, rows, and columns that satisfy a condition from a NumPy array (ndarray).
- Extract elements that satisfy the conditions
- Extract rows and columns that satisfy the conditions
  - All elements satisfy the condition: numpy.all()
  - At least one element satisfies the condition: numpy.any()
- Delete elements, rows, and columns that satisfy the conditions
  - Use ~ (NOT)
  - Use numpy.delete() and numpy.where()
- Multiple conditions
See the following article for an example when ndarray contains missing values NaN.
If you want to replace or count an element that satisfies the conditions, see the following article.
- numpy.where(): Manipulate elements depending on conditions
- NumPy: Count values in an array with conditions
To extract rows and columns by slices or lists rather than by conditions, see the following article.
Extract elements that satisfy the conditions
To extract the elements that satisfy a condition, use ndarray[condition].
Even if the original ndarray is multidimensional, a flattened one-dimensional array is returned.
import numpy as np
a = np.arange(12).reshape((3, 4))
print(a)
# [[ 0 1 2 3]
# [ 4 5 6 7]
# [ 8 9 10 11]]
print(a < 5)
# [[ True True True True]
# [ True False False False]
# [False False False False]]
print(a[a < 5])
# [0 1 2 3 4]
print(a < 10)
# [[ True True True True]
# [ True True True True]
# [ True True False False]]
print(a[a < 10])
# [0 1 2 3 4 5 6 7 8 9]
A new ndarray is returned, and the original ndarray is unchanged. The same is true for the following examples.
b = a[a < 10]
print(b)
# [0 1 2 3 4 5 6 7 8 9]
print(a)
# [[ 0 1 2 3]
# [ 4 5 6 7]
# [ 8 9 10 11]]
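The copy behavior described above can be checked directly. The following is a minimal sketch that reuses the array a from this article and relies on np.shares_memory() to confirm that the extracted array is independent of the original.

```python
import numpy as np

a = np.arange(12).reshape((3, 4))

# Boolean indexing produces a copy, so the result does not share memory with a.
b = a[a < 10]
print(np.shares_memory(a, b))
# False

# Modifying the extracted array leaves the original untouched.
b[0] = 100
print(b[0])
# 100
print(a[0, 0])
# 0
```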
It is possible to calculate the sum, average, maximum value, minimum value, standard deviation, etc., of elements that satisfy the condition.
print(a[a < 5].sum())
# 10
print(a[a < 5].mean())
# 2.0
print(a[a < 5].max())
# 4
print(a[a < 10].min())
# 0
print(a[a < 10].std())
# 2.8722813232690143
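One case worth keeping in mind with these statistics is a condition that no element satisfies. The sketch below, which assumes the same array a, shows that sum() still works on the empty selection while max() raises ValueError, and that checking size first avoids the exception.

```python
import numpy as np

a = np.arange(12).reshape((3, 4))

selected = a[a > 100]  # no element satisfies this condition
print(selected.size)
# 0

# sum() of an empty array is 0, but max() has no identity and raises ValueError.
print(selected.sum())
# 0
try:
    selected.max()
except ValueError:
    print("max() is undefined for an empty selection")

# Checking size first avoids the exception.
result = selected.max() if selected.size > 0 else None
print(result)
# None
```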
Extract rows and columns that satisfy the conditions
The element extraction above returns a one-dimensional array, but with np.all() and np.any() you can extract whole rows and columns while keeping the number of dimensions of the original ndarray.
All elements satisfy the condition: numpy.all()
np.all() returns True when all elements of the ndarray passed as the first argument are True, and False otherwise.
If you specify the axis parameter, the test is applied along that axis and an ndarray of results is returned. For a two-dimensional array, axis=0 gives one result per column and axis=1 gives one result per row.
print(a < 5)
# [[ True True True True]
# [ True False False False]
# [False False False False]]
print(np.all(a < 5))
# False
print(np.all(a < 5, axis=0))
# [False False False False]
print(np.all(a < 5, axis=1))
# [ True False False]
print(a < 10)
# [[ True True True True]
# [ True True True True]
# [ True True False False]]
print(np.all(a < 10, axis=0))
# [ True True False False]
print(np.all(a < 10, axis=1))
# [ True True False]
Rows and columns are extracted by passing each result to [rows, :] or [:, columns]. For [rows, :], the trailing , : can be omitted.
print(a[:, np.all(a < 10, axis=0)])
# [[0 1]
# [4 5]
# [8 9]]
print(a[np.all(a < 10, axis=1), :])
# [[0 1 2 3]
# [4 5 6 7]]
print(a[np.all(a < 10, axis=1)])
# [[0 1 2 3]
# [4 5 6 7]]
If no row or column satisfies the condition, an empty ndarray is returned.
print(a[:, np.all(a < 5, axis=0)])
# []
Even if only one row or one column is extracted, the number of dimensions does not change.
print(a[np.all(a < 5, axis=1)])
# [[0 1 2 3]]
print(a[np.all(a < 5, axis=1)].ndim)
# 2
print(a[np.all(a < 5, axis=1)].shape)
# (1, 4)
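The same row selection can be applied to other data. The following is a minimal sketch in which the array scores and the threshold 60 are hypothetical values chosen for illustration. It stores the result of np.all() in a named mask before indexing, which makes it easier to see which rows were kept.

```python
import numpy as np

# Hypothetical data: each row is one sample, each column one measurement.
scores = np.array([[72, 85, 90],
                   [40, 95, 88],
                   [65, 70, 61]])

# One boolean per row: True when every measurement in that row is at least 60.
row_mask = np.all(scores >= 60, axis=1)
print(row_mask.tolist())
# [True, False, True]

# The mask length equals the number of rows, so it can index axis 0.
passed = scores[row_mask]
print(passed.shape)
# (2, 3)

# Row numbers that were kept.
print(np.flatnonzero(row_mask).tolist())
# [0, 2]
```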
At least one element satisfies the condition: numpy.any()
np.any() returns True when the ndarray passed as the first argument contains at least one True element, and False otherwise.
If you specify the axis parameter, the test is applied along that axis and an ndarray of results is returned. For a two-dimensional array, axis=0 gives one result per column and axis=1 gives one result per row.
print(a < 5)
# [[ True True True True]
# [ True False False False]
# [False False False False]]
print(np.any(a < 5))
# True
print(np.any(a < 5, axis=0))
# [ True True True True]
print(np.any(a < 5, axis=1))
# [ True True False]
You can extract rows and columns that match the conditions in the same way as with np.all().
print(a[:, np.any(a < 5, axis=0)])
# [[ 0 1 2 3]
# [ 4 5 6 7]
# [ 8 9 10 11]]
print(a[np.any(a < 5, axis=1)])
# [[0 1 2 3]
# [4 5 6 7]]
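As a further illustration of np.any() with axis=0, the sketch below selects the columns that contain at least one negative value. The array readings and the idea that a negative value marks a fault are hypothetical.

```python
import numpy as np

# Hypothetical readings; a negative value is treated as a fault here.
readings = np.array([[1.5, 2.0, -1.0],
                     [0.5, 3.2, 4.1],
                     [2.2, 1.1, 0.3]])

# One boolean per column: True when the column contains any negative value.
col_has_fault = np.any(readings < 0, axis=0)
print(col_has_fault.tolist())
# [False, False, True]

# Select the affected columns; the result is still two-dimensional.
faulty = readings[:, col_has_fault]
print(faulty.shape)
# (3, 1)
```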
Delete elements, rows, and columns that satisfy the conditions
If you want to delete elements, rows, or columns instead of extracting them depending on conditions, there are the following two methods.
Use ~ (NOT)
If you add the negation operator ~ to a condition, the elements, rows, and columns that do not satisfy the condition are extracted. This is equivalent to deleting the elements, rows, or columns that satisfy the condition.
print(a[~(a < 5)])
# [ 5 6 7 8 9 10 11]
print(a[:, np.all(a < 10, axis=0)])
# [[0 1]
# [4 5]
# [8 9]]
print(a[:, ~np.all(a < 10, axis=0)])
# [[ 2 3]
# [ 6 7]
# [10 11]]
print(a[np.any(a < 5, axis=1)])
# [[0 1 2 3]
# [4 5 6 7]]
print(a[~np.any(a < 5, axis=1)])
# [[ 8 9 10 11]]
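The sketch below, which assumes the same array a, checks two properties of ~ that are easy to overlook: it is the element-wise NOT of a boolean array, and negating np.any() gives the same mask as applying np.all() to the negated condition. It also shows that ~ is a bitwise NOT when applied to integers, so it should be applied to the boolean condition rather than to the data.

```python
import numpy as np

a = np.arange(12).reshape((3, 4))

# ~ on a boolean array is element-wise NOT, the same as np.logical_not().
mask = a < 5
print(np.array_equal(~mask, np.logical_not(mask)))
# True

# "No element satisfies the condition" equals "all elements fail it".
rows_without = ~np.any(a < 5, axis=1)
rows_all_fail = np.all(~(a < 5), axis=1)
print(np.array_equal(rows_without, rows_all_fail))
# True

# On an integer array, ~ is bitwise NOT rather than logical NOT.
print((~np.array([0, 1])).tolist())
# [-1, -2]
```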
Use numpy.delete() and numpy.where()
Rows and columns can also be deleted using np.delete() and np.where().
np.delete() takes the target ndarray, the indices to delete, and the target axis.
For a two-dimensional array, rows are deleted if axis=0 and columns are deleted if axis=1.
print(a)
# [[ 0 1 2 3]
# [ 4 5 6 7]
# [ 8 9 10 11]]
print(np.delete(a, [0, 2], axis=0))
# [[4 5 6 7]]
print(np.delete(a, [0, 2], axis=1))
# [[ 1 3]
# [ 5 7]
# [ 9 11]]
See also the following article for np.delete().
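A few more properties of np.delete() are useful when combining it with conditions. The following is a minimal sketch, assuming the same array a: the function returns a new array, it accepts a single integer or a slice built with np.s_ as the index, and it works on the flattened array when axis is omitted.

```python
import numpy as np

a = np.arange(12).reshape((3, 4))

# np.delete() returns a new array; a itself keeps its shape.
without_row1 = np.delete(a, 1, axis=0)
print(without_row1.shape)
# (2, 4)
print(a.shape)
# (3, 4)

# A slice object built with np.s_ is also accepted as the index.
every_other_col = np.delete(a, np.s_[::2], axis=1)
print(every_other_col.tolist())
# [[1, 3], [5, 7], [9, 11]]

# Without axis, the array is flattened before deletion.
print(np.delete(a, [0, 1]).shape)
# (10,)
```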
When only a condition is passed, np.where() returns the indices of the elements that satisfy the condition.
For a multidimensional array, it returns a tuple with one ndarray of indices per dimension (row numbers first, then column numbers for a two-dimensional array).
print(a < 2)
# [[ True True False False]
# [False False False False]
# [False False False False]]
print(np.where(a < 2))
# (array([0, 0]), array([0, 1]))
print(np.where(a < 2)[0])
# [0 0]
print(np.where(a < 2)[1])
# [0 1]
See also the following article for np.where().
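To make the tuple returned by np.where() easier to read, the sketch below pairs the row and column indices into coordinates. It also compares the result with np.nonzero(), which returns the same indices for a condition, and with np.argwhere(), which returns the coordinates as a single array. It assumes the same array a.

```python
import numpy as np

a = np.arange(12).reshape((3, 4))

# With only a condition, np.where() gives the same result as np.nonzero().
rows, cols = np.where(a < 2)
rows_nz, cols_nz = np.nonzero(a < 2)
print(np.array_equal(rows, rows_nz) and np.array_equal(cols, cols_nz))
# True

# Pair the two index arrays to get (row, column) coordinates.
print(list(zip(rows.tolist(), cols.tolist())))
# [(0, 0), (0, 1)]

# np.argwhere() returns the same coordinates as one array, one row per match.
print(np.argwhere(a < 2).tolist())
# [[0, 0], [0, 1]]
```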
By combining these two functions, you can delete the rows and columns that satisfy the condition.
print(np.delete(a, np.where(a < 2)[0], axis=0))
# [[ 4 5 6 7]
# [ 8 9 10 11]]
print(np.delete(a, np.where(a < 2)[1], axis=1))
# [[ 2 3]
# [ 6 7]
# [10 11]]
print(a == 6)
# [[False False False False]
# [False False True False]
# [False False False False]]
print(np.where(a == 6))
# (array([1]), array([2]))
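# Without axis, the indices are applied to the flattened array,
# so this deletes the elements at positions 1 and 2, not the value 6.
print(np.delete(a, np.where(a == 6)))
# [ 0 3 4 5 6 7 8 9 10 11]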
print(np.delete(a, np.where(a == 6)[0], axis=0))
# [[ 0 1 2 3]
# [ 8 9 10 11]]
print(np.delete(a, np.where(a == 6)[1], axis=1))
# [[ 0 1 3]
# [ 4 5 7]
# [ 8 9 11]]
As in the examples above, every row or column that contains at least one element satisfying the condition is deleted. This gives the same result as extracting with ~ and np.any().
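The equivalence between the two deletion methods can be verified with np.array_equal(). This is a minimal sketch, assuming the same array a and the condition a < 2 used earlier.

```python
import numpy as np

a = np.arange(12).reshape((3, 4))
cond = a < 2

# Delete rows by index. np.where() repeats row 0 here, which np.delete() tolerates.
by_delete = np.delete(a, np.where(cond)[0], axis=0)

# Keep rows where no element satisfies the condition.
by_mask = a[~np.any(cond, axis=1)]

print(np.array_equal(by_delete, by_mask))
# True

# The same comparison for columns.
cols_by_delete = np.delete(a, np.where(cond)[1], axis=1)
cols_by_mask = a[:, ~np.any(cond, axis=0)]
print(np.array_equal(cols_by_delete, cols_by_mask))
# True
```

The boolean mask version avoids the intermediate index arrays, so it is often the shorter of the two to write.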
Multiple conditions
To combine multiple conditions, enclose each condition in () and join them with & (AND) or | (OR).
print(a[(a < 10) & (a % 2 == 1)])
# [1 3 5 7 9]
print(a[np.any((a == 2) | (a == 10), axis=1)])
# [[ 0 1 2 3]
# [ 8 9 10 11]]
print(a[:, ~np.any((a == 2) | (a == 10), axis=0)])
# [[ 0 1 3]
# [ 4 5 7]
# [ 8 9 11]]
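The reason for the parentheses and for using & instead of the keyword and can be seen in the following sketch, which assumes the same array a. For boolean arrays, & gives the same result as np.logical_and(), while and tries to convert a whole array to a single True or False and raises ValueError. Omitting the parentheses fails in a similar way, because & binds more tightly than the comparison operators.

```python
import numpy as np

a = np.arange(12).reshape((3, 4))

# For boolean arrays, & matches np.logical_and() element by element.
with_operator = (a < 10) & (a % 2 == 1)
with_function = np.logical_and(a < 10, a % 2 == 1)
print(np.array_equal(with_operator, with_function))
# True

# The Python keyword `and` cannot combine arrays element-wise.
try:
    (a < 10) and (a % 2 == 1)
except ValueError:
    print("use & instead of and")

# Without parentheses, & is evaluated before < and ==, and the
# resulting chained comparison also raises ValueError.
try:
    a[a < 10 & a % 2 == 1]
except ValueError:
    print("parentheses are required")
```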