Num Py Detailed - Intro To Indexing & Filtering

Uploaded by

SIVAVEDATHRI

NumPy — Detailed Notes

<font face="Arial" size="5"> These notes cover core NumPy topics in detail, with clear explanations,
short code examples, common pitfalls, and quick tips.

1. Introduction to NumPy
What it is: NumPy (Numerical Python) is the fundamental package for numerical computing in Python. It
provides the ndarray — a fast, homogeneous N-dimensional array — plus a suite of vectorized
operations implemented in C for speed.

Why use NumPy for Data Analysis:
- Efficient storage and fast computation for large numerical datasets.
- Vectorized operations (no Python loops) → concise and much faster code.
- Foundation for Pandas, scikit-learn, and many scientific libraries.
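
The "no Python loops" point can be illustrated with a minimal sketch. The variable names and the tiny input are illustrative assumptions; the speed benefit only becomes noticeable on much larger arrays.

```python
import numpy as np

values = list(range(1, 6))            # plain Python list: [1, 2, 3, 4, 5]

# Loop version: square each element one at a time in Python
squares_loop = [v * v for v in values]

# Vectorized version: a single expression applied to the whole array
arr = np.array(values)
squares_vec = arr * arr

print(squares_loop)                   # [1, 4, 9, 16, 25]
print(squares_vec.tolist())           # [1, 4, 9, 16, 25]
```

Both versions compute the same result; the vectorized one moves the per-element work out of the Python interpreter.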

Key idea: Think of an ndarray as a multi-dimensional table of numbers with a fixed data type ( dtype ).

Minimal example:

import numpy as np
x = np.array([1, 2, 3]) # 1D array
y = np.array([[1,2,3],[4,5,6]]) # 2D array (matrix)

Tip: Prefer NumPy arrays and their vectorized operations over Python lists for numeric computations on anything but very small inputs.

2. Creating Arrays
Common constructors & when to use them:
- np.array(sequence): create from Python lists/tuples (useful for small, explicit arrays).
- np.zeros(shape) / np.ones(shape): initialize placeholder arrays filled with 0s or 1s.
- np.arange(start, stop, step): evenly stepped sequences with stop excluded (the array counterpart of Python range; prefer np.linspace for non-integer steps).
- np.linspace(start, stop, num): num evenly spaced values with both endpoints included, good for plotting or sampling.
- np.eye(n): n x n identity matrix (useful in linear algebra).
- np.empty(shape): allocate without initializing (faster, but contains arbitrary values, so fill before use).
- np.fromfile, np.loadtxt, np.genfromtxt, np.load (.npy): load data from disk.

Examples:

np.zeros((3,4)) # 3x4 zeros matrix
np.arange(0, 10, 2) # [0,2,4,6,8]
np.linspace(0,1,5) # [0,.25,.5,.75,1]

Tip: Use the dtype argument to save memory (e.g., dtype=np.float32 for large datasets).
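
As a rough illustration of the memory saving, the sketch below allocates the same number of elements with two dtypes. The array length is an arbitrary assumption chosen to make the arithmetic easy to read.

```python
import numpy as np

n = 1_000_000                          # arbitrary size for illustration

a64 = np.zeros(n)                      # default dtype is float64: 8 bytes per element
a32 = np.zeros(n, dtype=np.float32)    # float32: 4 bytes per element

print(a64.dtype, a64.nbytes)           # float64 8000000
print(a32.dtype, a32.nbytes)           # float32 4000000
```

The trade-off is precision: float32 carries roughly 7 significant decimal digits, against roughly 15 to 16 for float64.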

3. Array Attributes
Important attributes:
- arr.shape: tuple of axis lengths, e.g., (rows, cols) for 2D.
- arr.ndim: number of axes (dimensions).
- arr.size: total number of elements (product of shape).
- arr.dtype: data type (e.g., int64, float32).
- arr.itemsize: bytes per element.
- arr.nbytes: total bytes (size * itemsize).

Why they matter:
- Shape guides broadcasting and axis-based computations.
- dtype affects precision and memory, so choose float32 or integer types intentionally.
- nbytes helps estimate memory usage for large arrays.

Example:

a = np.arange(12).reshape(3,4)
print(a.shape, a.ndim, a.size, a.dtype, a.nbytes)

Pitfall: Mistaking shape order. shape is (axis0, axis1, ...); for a 2D array, the length of axis 0 is the number of rows
and the length of axis 1 is the number of columns, so shape is (rows, cols).
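
The sketch below prints the attributes for a 3x4 array and shows how the axis order affects a reduction. The dtype is set explicitly here (an assumption made so the byte counts are the same on every platform), and the sum calls are one illustrative example of an axis-based computation.

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)   # 3 rows, 4 columns

print(a.shape)      # (3, 4)
print(a.ndim)       # 2
print(a.size)       # 12
print(a.itemsize)   # 8
print(a.nbytes)     # 96

# axis=0 runs down the rows, giving one value per column
col_sums = a.sum(axis=0)
# axis=1 runs across the columns, giving one value per row
row_sums = a.sum(axis=1)

print(col_sums.tolist())   # [12, 15, 18, 21]
print(row_sums.tolist())   # [6, 22, 38]
```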

4. Indexing & Slicing


Basics:
- arr[i], arr[i, j] access elements (0-based indexing).
- Slicing uses start:stop:step syntax (stop is excluded) and returns a view, not a copy: arr[1:4], arr[:, 0].

2D examples:

M = np.arange(12).reshape(3,4)
M[0, 2] # element in 1st row, 3rd col
M[1] # second row (1D slice)
M[:, 1] # second column
M[0:2, 1:3] # 2x2 submatrix: rows 0-1, columns 1-2

Views vs Copies:
- Slicing returns a view. Modifying the slice changes the original.

r = np.arange(6)
s = r[1:4]
s[0] = 99 # r is modified too

- Use .copy() to get an independent array.

Tip: Use slices for in-place transformations to save memory, but be careful about unintentional side-effects.
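
The side-effect risk can be checked directly. This is a minimal sketch with illustrative variable names; np.shares_memory reports whether two arrays overlap in memory.

```python
import numpy as np

r = np.arange(6)                  # [0, 1, 2, 3, 4, 5]
view = r[1:4]                     # basic slice: shares memory with r
independent = r[1:4].copy()       # separate buffer

view[0] = 99                      # writes through to r
independent[1] = -1               # does not touch r

print(r.tolist())                          # [0, 99, 2, 3, 4, 5]
print(np.shares_memory(r, view))           # True
print(np.shares_memory(r, independent))    # False
```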

5. Boolean Indexing and Filtering


Concept: Create a boolean mask from a condition and apply it to select elements or rows.

Examples:

arr = np.array([10, 15, 20, 5, 30])
mask = arr > 15 # [False, False, True, False, True]
arr[mask] # [20, 30]
# Combine conditions
arr[(arr > 10) & (arr <= 30)] # [15, 20, 30]

2D filtering (rows matching a condition):

A = np.array([[1,20],[2,15],[3,30]])
rows = A[:,1] > 18 # check second column
A[rows] # rows where second column > 18

Use cases in Data Analysis:
- Filtering outliers (e.g., data[data < threshold])
- Conditional transformations (e.g., set negative values to 0: arr[arr < 0] = 0)
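
Both use cases can be sketched on a small array. The data values and the threshold are made up for illustration, and np.where is shown only as one possible non-mutating alternative to mask assignment.

```python
import numpy as np

data = np.array([3.0, -1.5, 8.0, 120.0, -0.2, 5.5])   # made-up sample values
threshold = 100.0                                     # hypothetical cutoff

# Filtering: keep only the values below the cutoff (returns a new array)
kept = data[data < threshold]
print(kept.tolist())              # [3.0, -1.5, 8.0, -0.2, 5.5]

# Conditional transformation: assign through the mask, in place on a copy
cleaned = data.copy()
cleaned[cleaned < 0] = 0
print(cleaned.tolist())           # [3.0, 0.0, 8.0, 120.0, 0.0, 5.5]

# Same transformation without mutating anything
replaced = np.where(data < 0, 0.0, data)
print(replaced.tolist())          # [3.0, 0.0, 8.0, 120.0, 0.0, 5.5]
```

Filtering changes the array length, while mask assignment and np.where keep it, which matters when the result must stay aligned with other columns.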

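Pitfall: Combining boolean conditions requires parentheses because & binds more tightly than the comparison operators:
write (arr > a) & (arr < b), not arr > a & arr < b. Use &, | and ~ for element-wise logic; Python's and, or and not do not operate element-wise on arrays.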
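
The precedence problem is easy to reproduce. In this sketch the bounds a and b are hypothetical values, and the failing expression is wrapped in try/except so the script runs to the end.

```python
import numpy as np

arr = np.array([10, 15, 20, 5, 30])
a, b = 8, 25                                   # hypothetical bounds

inside = arr[(arr > a) & (arr < b)]            # element-wise AND
print(inside.tolist())                         # [10, 15, 20]

outside = arr[(arr <= a) | (arr >= b)]         # element-wise OR
print(outside.tolist())                        # [5, 30]

negated = arr[~((arr > a) & (arr < b))]        # ~ inverts a mask
print(negated.tolist())                        # [5, 30]

raised = False
try:
    # Without parentheses this parses as arr > (a & arr) < b,
    # a chained comparison that needs the truth value of a whole array.
    arr[arr > a & arr < b]
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```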

6. Fancy Indexing
Definition: Indexing using integer arrays/lists; returns a new array (copy), not a view.

Examples:

x = np.array([5,10,15,20,25])
idx = [0, 2, 4]
x[idx] # [5, 15, 25]
# 2D fancy indexing
M = np.arange(16).reshape(4,4)
rows = [0,2]
cols = [1,3]
M[rows][:, cols] # fine for reading, but M[rows] is a temporary copy, so assigning through this form does not change M
# Select the row/column cross product in a single indexing step
M[np.ix_(rows, cols)] # 2x2 submatrix: rows 0 and 2, columns 1 and 3

Key differences from slicing:
- Fancy indexing returns copies (safe to modify without affecting original).
- The index array determines the result: its shape sets the output shape and its order sets the element order.

When to use:
- Select specific rows/cols by index (non-contiguous selections)
- Re-order data arbitrarily, e.g., arr[[2,0,1]]

Performance note: Fancy indexing creates copies, so avoid it inside tight loops or on very large arrays when memory is limited.
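
Two behaviours of fancy indexing tend to surprise newcomers: passing two index lists pairs them up element by element, and the result is a copy. The sketch below illustrates both; it reuses the 4x4 example array, and the assigned values are arbitrary.

```python
import numpy as np

M = np.arange(16).reshape(4, 4)
rows = [0, 2]
cols = [1, 3]

# Two index lists are paired: this picks M[0, 1] and M[2, 3]
paired = M[rows, cols]
print(paired.tolist())            # [1, 11]

# np.ix_ builds the cross product: rows 0 and 2 by columns 1 and 3
block = M[np.ix_(rows, cols)]
print(block.tolist())             # [[1, 3], [9, 11]]

# The result is a copy, so editing it leaves M alone
block[0, 0] = -1
print(int(M[0, 1]))               # 1

# Assigning through the index expression itself does modify M
M[np.ix_(rows, cols)] = 0
print(M[0].tolist())              # [0, 0, 2, 0]
```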

Summary & Suggested Exercises


• Convert a CSV column to NumPy, compute mean/std, and standardize values.
• Given a 2D array of sales (rows=days, cols=stores): filter days where total sales < threshold and
extract top N stores using fancy indexing.

</font>
