NumPy — Detailed Notes
<font face="Arial" size="5"> This canvas covers the requested NumPy topics in detail with clear explanations,
short code examples, common pitfalls, and quick tips.
1. Introduction to NumPy
What it is: NumPy (Numerical Python) is the fundamental package for numerical computing in Python. It
provides the ndarray — a fast, homogeneous N-dimensional array — plus a suite of vectorized
operations implemented in C for speed.
Why use NumPy for Data Analysis:
- Efficient storage and fast computation for large numerical datasets.
- Vectorized operations (no Python loops) → concise and much faster code.
- Foundation for Pandas, scikit-learn, and many scientific libraries.
Key idea: Think of an ndarray as a multi-dimensional table of numbers with a fixed data type (dtype).
Minimal example:
import numpy as np
x = np.array([1, 2, 3]) # 1D array
y = np.array([[1,2,3],[4,5,6]]) # 2D array (matrix)
Tip: For numeric computations, prefer NumPy arrays and vectorized operations over Python lists and explicit loops.
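To make the vectorization point concrete, here is a minimal sketch that computes the same result with a Python loop and with a single array expression. The temperature values and variable names are illustrative assumptions, not part of the notes.

```python
import numpy as np

# Illustrative readings in degrees Celsius (assumed values)
celsius = np.array([0.0, 25.0, 100.0])

# Loop version: Python-level iteration, one element at a time
fahrenheit_loop = [c * 9 / 5 + 32 for c in celsius]

# Vectorized version: one expression applied to the whole array
fahrenheit_vec = celsius * 9 / 5 + 32

print(fahrenheit_vec)                               # values 32, 77, 212
print(np.allclose(fahrenheit_loop, fahrenheit_vec)) # True
```

Both versions give the same numbers; the vectorized form avoids per-element Python overhead, which matters as arrays grow.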
2. Creating Arrays
Common constructors & when to use them:
- np.array(sequence): create from Python lists/tuples (useful for small, explicit arrays).
- np.zeros(shape) / np.ones(shape): initialize placeholder arrays.
- np.arange(start, stop, step): evenly stepped sequences with stop excluded (an array counterpart to Python range; prefer integer steps, since float steps can give an unexpected length).
- np.linspace(start, stop, num): num evenly spaced values with stop included by default; good for plotting or sampling.
- np.eye(n): identity matrix (useful in linear algebra).
- np.empty(shape): allocate without initializing (faster, but the contents are arbitrary; fill before use).
- np.fromfile, np.loadtxt, np.genfromtxt, np.load (.npy): load data from disk.
Examples:
np.zeros((3,4)) # 3x4 zeros matrix
np.arange(0, 10, 2) # [0,2,4,6,8]
np.linspace(0,1,5) # [0,.25,.5,.75,1]
Tip: Use the dtype argument to save memory (e.g., dtype=np.float32 for large datasets).
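As a rough illustration of the dtype tip, the sketch below compares the memory footprint of the default float64 with float32 for the same shape. The array size is an arbitrary choice made for the example.

```python
import numpy as np

shape = (1000, 1000)  # arbitrary size chosen for illustration

a64 = np.zeros(shape)                    # default dtype is float64 (8 bytes per element)
a32 = np.zeros(shape, dtype=np.float32)  # 4 bytes per element

print(a64.dtype, a64.nbytes)  # float64 8000000
print(a32.dtype, a32.nbytes)  # float32 4000000

# np.empty skips initialization, so fill it before reading from it
buf = np.empty(5, dtype=np.float32)
buf.fill(1.5)
```

Halving the element size halves the memory, at the cost of reduced precision (roughly 7 significant decimal digits for float32).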
3. Array Attributes
Important attributes:
- arr.shape: tuple of axis lengths, e.g., (rows, cols) for 2D.
- arr.ndim: number of axes (dimensions).
- arr.size: total number of elements (product of shape).
- arr.dtype: data type (e.g., int64, float32).
- arr.itemsize: bytes per element.
- arr.nbytes: total bytes (size * itemsize).
Why they matter:
- Shape guides broadcasting and axis-based computations.
- dtype affects precision and memory; choose float32 or integer types intentionally.
- nbytes helps estimate memory usage for large arrays.
Example:
a = np.arange(12).reshape(3,4)
print(a.shape, a.ndim, a.size, a.dtype, a.nbytes)
Pitfall: Mixing up the shape order. shape lists the length of each axis as (axis0, axis1, ...); for a 2D array the first entry is the number of rows and the second is the number of columns.
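The axis order can be checked with a small sketch that reduces along each axis in turn. It reuses the 3x4 array from the example above; the variable names are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # 3 rows (axis 0), 4 columns (axis 1)
n_rows, n_cols = a.shape

col_sums = a.sum(axis=0)  # collapses axis 0: one value per column
row_sums = a.sum(axis=1)  # collapses axis 1: one value per row

print(col_sums.shape, row_sums.shape)   # (4,) (3,)
print(a.size * a.itemsize == a.nbytes)  # True
```

A reduction over an axis removes that axis from the shape, which is a quick way to confirm which axis is which.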
4. Indexing & Slicing
Basics:
- arr[i], arr[i, j] access elements (0-based indexing).
- Slicing uses start:stop:step syntax and returns views, not copies: arr[1:4], arr[:, 0].
2D examples:
M = np.arange(12).reshape(3,4)
M[0, 2] # element in 1st row, 3rd col
M[1] # second row (1D slice)
M[:, 1] # second column
M[0:2, 1:3] # 2x2 submatrix: rows 0-1, columns 1-2
Views vs Copies:
- Slicing returns a view. Modifying the slice changes the original.
r = np.arange(6)
s = r[1:4]
s[0] = 99 # r is modified too
- Use .copy() to get an independent array.
Tip: Use slices for in-place transformations to save memory, but be careful about unintentional side-effects.
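One way to check whether two arrays share data is np.shares_memory. The sketch below contrasts a slice with an explicit copy; the variable names are illustrative.

```python
import numpy as np

r = np.arange(6)
view = r[1:4]          # basic slice: shares memory with r
indep = r[1:4].copy()  # independent copy of the same elements

view[0] = 99   # also changes r[1]
indep[1] = -1  # leaves r untouched

print(r[1], r[2])                  # 99 2
print(np.shares_memory(r, view))   # True
print(np.shares_memory(r, indep))  # False
```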
5. Boolean Indexing and Filtering
Concept: Create a boolean mask from a condition and apply it to select elements or rows.
Examples:
arr = np.array([10, 15, 20, 5, 30])
mask = arr > 15 # [False, False, True, False, True]
arr[mask] # [20, 30]
# Combine conditions
arr[(arr > 10) & (arr <= 30)]
2D filtering (rows matching a condition):
A = np.array([[1,20],[2,15],[3,30]])
rows = A[:,1] > 18 # check second column
A[rows] # rows where second column > 18
Use cases in Data Analysis:
- Filtering outliers (e.g., data[data < threshold] keeps only the values below threshold)
- Conditional transformations (e.g., set negative values to 0: arr[arr < 0] = 0)
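Pitfall: Combining conditions requires parentheses because & and | bind more tightly than comparisons: write (arr > a) & (arr < b), not arr > a & arr < b. Use &, |, and ~ rather than and, or, and not, which raise a ValueError on multi-element arrays.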
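The filtering patterns above can be sketched together as follows. The data values and bounds are made up for illustration.

```python
import numpy as np

arr = np.array([10, -3, 20, 5, -8, 30])  # illustrative values

# Each comparison is parenthesized before combining with &
in_range = (arr > 5) & (arr <= 20)
selected = arr[in_range]   # elements 10 and 20

# ~ negates a mask: everything outside the range
outside = arr[~in_range]

# Conditional transformation on a copy, so arr itself is left unchanged
cleaned = arr.copy()
cleaned[cleaned < 0] = 0   # negative values become 0
```

Working on a copy is a design choice for the example; assigning through the mask on arr directly would modify it in place.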
6. Fancy Indexing
Definition: Indexing using integer arrays/lists; returns a new array (copy), not a view.
Examples:
x = np.array([5,10,15,20,25])
idx = [0, 2, 4]
x[idx] # [5, 15, 25]
# 2D fancy indexing
M = np.arange(16).reshape(4,4)
rows = [0,2]
cols = [1,3]
M[rows][:, cols] # fine for reading, but M[rows] is a copy, so assigning through this chain does not change M
# Preferred: np.ix_ selects every combination of the given rows and cols
M[np.ix_(rows, cols)] # 2x2 submatrix [[1, 3], [9, 11]]
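Key differences from slicing:
- Fancy indexing returns copies when reading (safe to modify the result without affecting the original); assigning through a fancy index, as in x[idx] = 0, still writes to the original.
- The shape of the index array determines the output shape, and the order of the indices determines the order of the selected elements.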
When to use:
- Select specific rows/cols by index (non-contiguous selections)
- Re-order data arbitrarily, e.g., arr[[2, 0, 1]]
Performance note: Fancy indexing creates copies, so avoid it inside tight loops or memory-constrained code when working with very large arrays.
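A frequent source of confusion is that passing two index lists directly pairs them element by element, while np.ix_ selects every row/column combination. The sketch below shows both, along with the copy behaviour; the variable names are illustrative.

```python
import numpy as np

M = np.arange(16).reshape(4, 4)
rows = [0, 2]
cols = [1, 3]

pairs = M[rows, cols]          # pairs the indices: elements (0, 1) and (2, 3)
block = M[np.ix_(rows, cols)]  # all combinations: a 2x2 submatrix

block[0, 0] = -1               # block is a copy, so M is not affected
value_after_copy_edit = M[0, 1]

M[np.ix_(rows, cols)] = 0      # assigning through the index does modify M
```

Reading through a fancy index gives an independent array, whereas assignment with the same index expression on the left-hand side writes into the original.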
Summary & Suggested Exercises
• Convert a CSV column to NumPy, compute mean/std, and standardize values.
• Given a 2D array of sales (rows=days, cols=stores): filter days where total sales < threshold and
extract top N stores using fancy indexing.
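As a starting point for the exercises, here is one possible sketch, not a reference solution. The data, threshold, and top_n are hypothetical, the CSV column is assumed to be already loaded into an array, and np.argsort is one choice among several for ranking the stores.

```python
import numpy as np

# Exercise 1: standardize a column (assumed already loaded from a CSV)
column = np.array([2.0, 4.0, 6.0])            # hypothetical values
z = (column - column.mean()) / column.std()   # zero mean, unit standard deviation

# Exercise 2: hypothetical sales, 4 days (rows) x 3 stores (columns)
sales = np.array([[10, 20, 30],
                  [5, 5, 5],
                  [40, 10, 20],
                  [1, 2, 3]])
threshold = 20  # assumed cutoff for a low-sales day
top_n = 2       # assumed number of stores to keep

daily_totals = sales.sum(axis=1)             # one total per day
low_days = sales[daily_totals < threshold]   # boolean mask selects rows

store_totals = sales.sum(axis=0)                     # one total per store
top_stores = np.argsort(store_totals)[::-1][:top_n]  # indices of the largest totals
top_sales = sales[:, top_stores]                     # fancy indexing on the column axis
```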
</font>