Unit 3_Numpy_VP

Unit 3 covers the basics of NumPy, a fundamental library for numerical computing in Python, focusing on arrays, vectorized computation, and basic operations. It explains how to install and import NumPy, create arrays, perform element-wise operations, and utilize aggregation functions. Additionally, it introduces Pandas, highlighting its data structures, DataFrame and Series, and common tasks for data manipulation and analysis.

Unit 3: Basics of Numpy

21BCA2T452 : Python Programming

Prof. Vishnu Priya P M


Assistant Professor Dept. of Computer
Science
Kristu Jayanti College,
Autonomous
(Reaccredited A++ Grade by NAAC with CGPA 3.78/4)
Bengaluru – 560077, India
NUMPY BASICS: ARRAYS AND VECTORIZED COMPUTATION

NumPy (Numerical Python) is a fundamental library in Python for numerical and
scientific computing. It provides support for arrays (multi-dimensional,
homogeneous data structures) and a wide range of mathematical functions to
perform vectorized computations efficiently.
Installing NumPy

Before using NumPy, you need to make sure it's installed. You can install it using
pip:

pip install numpy


VISHNU PRIYA P M 2
Importing NumPy
To use NumPy in your Python code, you should import it:

import numpy as np
By convention, it's common to import NumPy as np for brevity.

Why Use Arrays?


NumPy arrays are generally more efficient than Python lists for numerical operations,
because they store elements of a single data type and run operations in compiled code.
For example, if you want to add 2 to every element in a list, you would need a loop in
plain Python. But with NumPy, you can do this in a single line:

arr = np.array([1, 2, 3, 4, 5])
new_arr = arr + 2 # Adds 2 to every element in the array
print(new_arr)
Output: [3 4 5 6 7]
Creating NumPy Arrays
You can create NumPy arrays using various methods:

1. From Python Lists:

arr = np.array([1, 2, 3, 4, 5])

2. Using NumPy Functions:

zeros_arr = np.zeros(5) # Creates an array of zeros with 5 elements
ones_arr = np.ones(3) # Creates an array of ones with 3 elements
rand_arr = np.random.rand(3, 3) # Creates a 3x3 array with random values between 0 and 1

3. Using NumPy's Range Function:

range_arr = np.arange(0, 10, 2) # Creates an array with values [0, 2, 4, 6, 8]
BASIC ARRAY OPERATIONS

Once you have NumPy arrays, you can perform various operations on them:

1. Element-wise Operations:

NumPy allows you to perform element-wise operations, like addition, subtraction,
multiplication, and division:

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = a + b # Element-wise addition: [5, 7, 9]
d = a * b # Element-wise multiplication: [4, 10, 18]

2. Indexing and Slicing:

Indexing means accessing a specific element in an array by its position
(index). In NumPy, indices start from 0.

arr = np.array([0, 1, 2, 3, 4, 5])
element = arr[2] # Access element at index 2 (value: 2)
sub_array = arr[2:5] # Slice from index 2 to 4 (values: [2, 3, 4])

Slicing: Slicing allows you to access a range or subset of elements from
an array. It is done using the syntax arr[start:end], where start is the
index where the slice begins (inclusive), and end is where it stops
(exclusive).

arr = np.array([10, 20, 30, 40, 50])

# Getting a slice of elements from index 1 to 3 (exclusive of 3)
print(arr[1:3]) # Output: [20 30]

# Getting a slice from the start till the third element
print(arr[:3]) # Output: [10 20 30]

# Getting a slice from index 2 to the end of the array
print(arr[2:]) # Output: [30 40 50]
Negative Indexing:
You can also use negative indices to access elements from the end of the array. For
example, -1 refers to the last element, -2 refers to the second last element, and so
on.
Example:

arr = np.array([10, 20, 30, 40, 50])

# Accessing the last element
print(arr[-1]) # Output: 50

# Accessing the second last element
print(arr[-2]) # Output: 40

Slicing with Steps: You can also specify a step value, which tells how
many positions to advance between selected elements. The syntax is arr[start:end:step].

Example:

arr = np.array([10, 20, 30, 40, 50, 60])

# Getting every second element from index 1 to 5
print(arr[1:5:2]) # Output: [20 40]

# Reversing the array using negative step
print(arr[::-1]) # Output: [60 50 40 30 20 10]

•The array is [10, 20, 30, 40, 50, 60].
•Index positions: [0, 1, 2, 3, 4, 5].
•The slice starts at index 1, which is 20.
•2 is the step value, which means "take every second element."
•It skips the next element and picks the element at index 3, which is 40.
•The slice stops before reaching index 5.
3. Array Shape and Reshaping:
The shape of an array tells us how many elements it contains along each
dimension (or axis). You can check the shape of an array using
the .shape attribute.

You can check and change the shape of NumPy arrays:


arr = np.array([[1, 2, 3], [4, 5, 6]])
shape = arr.shape # Get the shape (2, 3)
reshaped = arr.reshape(3, 2) # Reshape the array to (3, 2)

Reshaping:
Reshaping allows you to change the shape of an array without changing
its data. You can convert a 1D array to a 2D array, or a 2D array to a 3D
array, etc., as long as the total number of elements stays the same.
Example:
# Creating a 1D array with 6 elements
arr = np.array([1, 2, 3, 4, 5, 6])

# Reshaping the 1D array into a 2D array (2 rows, 3 columns)
reshaped_arr = arr.reshape(2, 3)

print(reshaped_arr)

Reshape Rules:
When reshaping an array, the new shape must contain the same total number of
elements as the original array. For example, if you have an array with 12 elements,
you could reshape it to:
•A 2x6 array (2 rows x 6 columns)
•A 3x4 array (3 rows x 4 columns)
•A 4x3 array (4 rows x 3 columns)
Example

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

# Reshaping into 3 rows and 4 columns
reshaped_arr = arr.reshape(3, 4)
print(reshaped_arr)
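As a small sketch of the reshape rule above (the variable names are illustrative and not from the original slides): passing -1 for one dimension asks NumPy to infer it from the total element count, while a shape whose product does not match that count raises a ValueError.

```python
import numpy as np

arr = np.arange(1, 13)          # 12 elements: 1..12

# -1 lets NumPy infer the missing dimension (12 / 3 = 4 columns)
inferred = arr.reshape(3, -1)
print(inferred.shape)           # (3, 4)

# 5 x 3 = 15 != 12, so this reshape is rejected
try:
    arr.reshape(5, 3)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)           # True
```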
Flattening an Array: If you want to convert a multi-dimensional array back into
a 1D array, you can flatten it using the .flatten() method.

Example

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])

# Flattening the 2D array into a 1D array
flat_arr = arr_2d.flatten()

print(flat_arr)

Output: [1 2 3 4 5 6]

Shape: Tells you the dimensions of an array (rows, columns, etc.).
Reshaping: Lets you change the shape of an array while keeping the same number
of elements.
Aggregation Functions:

Aggregation functions are used to perform calculations on an entire array or along a specific
axis (e.g., summing all elements, finding the maximum, etc.). These functions are essential
for data analysis and numerical computations.

Common Aggregation Functions: Here are some of the most commonly used aggregation
functions in NumPy:
1. Sum: The sum() function adds all the elements of an array.
2. Mean: The mean() function calculates the average of the elements.
3. Maximum and Minimum: max() gives the maximum value in the array. min() gives the
minimum value in the array.
4. Product: The prod() function returns the product of all elements in the array (i.e.,
multiplies all elements together).
5. Standard Deviation and Variance: std() calculates the standard deviation (how spread out
the numbers are). var() calculates the variance (the square of the standard deviation).
6. Cumulative Sum and Product: cumsum() gives the cumulative sum (the sum of the
elements up to each index). cumprod() gives the cumulative product (the product of
elements up to each index).

NumPy provides functions to compute statistics on arrays:

arr = np.array([1, 2, 3, 4, 5])
print(arr.sum(), arr.mean(), arr.max(), arr.min()) # Output: 15 3.0 5 1
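The aggregation functions listed above can be sketched as follows. The 2D array named grid is an illustrative addition, used only to show how the axis argument restricts an aggregation to columns or rows.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
print(arr.prod())      # 120
print(arr.var())       # 2.0 (population variance)
print(arr.cumsum())    # [ 1  3  6 10 15]
print(arr.cumprod())   # [  1   2   6  24 120]

# Aggregating along an axis of a 2D array
grid = np.array([[1, 2, 3], [4, 5, 6]])
col_sums = grid.sum(axis=0)   # collapse rows: one sum per column
row_sums = grid.sum(axis=1)   # collapse columns: one sum per row
print(col_sums)        # [5 7 9]
print(row_sums)        # [ 6 15]
```

axis=0 runs down the rows (one result per column) and axis=1 runs across the columns (one result per row).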
VECTORIZED COMPUTATION

Vectorized computation in Python refers to performing operations on entire arrays or
sequences of data without the need for explicit loops. This approach leverages highly
optimized, low-level code to achieve faster and more efficient computations. The
primary library for vectorized computation in Python is NumPy.

Traditional Loop-Based Computation

In traditional Python programming, you might use explicit loops to perform
operations on arrays or lists. For example:
# Using loops to add two lists element-wise
list1 = [1, 2, 3]
list2 = [4, 5, 6]
result = []
for i in range(len(list1)):
    result.append(list1[i] + list2[i]) # Result: [5, 7, 9]
Vectorized Computation with NumPy

NumPy allows you to perform operations on entire arrays, making code more concise and
efficient. Here's how you can achieve the same result using NumPy:

import numpy as np

# Using NumPy for element-wise addition
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = arr1 + arr2

# Result: array([5, 7, 9])

INTRODUCTION TO PANDAS DATA STRUCTURES
Pandas is a popular Python library for data manipulation and analysis. It provides two
primary data structures: the DataFrame and the Series. These data structures are
designed to handle structured data, making it easier to work with datasets in a tabular
format.
DataFrame:

• A DataFrame is a 2-dimensional, labeled data structure that resembles a
spreadsheet or SQL table.
• It consists of rows and columns, where each column can have a different data type
(e.g., integers, floats, strings, or even custom data types).
• You can think of a DataFrame as a collection of Series objects, where each Series is
a column.
Here's a basic example of how to create a DataFrame using Pandas:

import pandas as pd

# Creating a DataFrame from a dictionary of data
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}

df = pd.DataFrame(data)

# Displaying the DataFrame
print(df)

• Importing pandas: import pandas as pd brings in the pandas library so you can use its features.
• Creating Data: A dictionary called data holds your information.
• DataFrame: pd.DataFrame(data) converts the dictionary into a DataFrame.
• Displaying Data: print(df) shows the table.
Series:

• A Series is a one-dimensional labeled array that can hold data of any data type.
• It is like a column in a DataFrame or a single variable in statistics.
• Series objects are commonly used for time series data, as well as other one-dimensional
data.
Key characteristics of a Pandas Series:

• Homogeneous Data: Like a NumPy array, and unlike a Python list, a Pandas Series has a
single data type (dtype) for all of its values. For example, if you create a Series with
integer values, its dtype will be an integer type such as int64. A Series built from mixed
values is stored with the generic object dtype.

• Labeled Data: Series have two parts: the data itself and an associated index. The index
provides labels or names for each data point in the Series. By default, Series have a numeric
index starting from 0, but you can specify custom labels if needed.

• Size and Shape: A Series has a size (the number of elements) and shape (1-dimensional) but
does not have columns or rows like a DataFrame.
import pandas as pd

# Create a Series from a list
data = [10, 20, 30, 40, 50]
series = pd.Series(data)

# Display the Series
print(series)

Output:
0    10
1    20
2    30
3    40
4    50
dtype: int64

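To illustrate the labeled index and single dtype described above, here is a minimal sketch. The names and scores are hypothetical values chosen for illustration and are not part of the original slides.

```python
import pandas as pd

# Hypothetical labels and values, for illustration only
scores = pd.Series([85, 90, 88], index=['Alice', 'Bob', 'Charlie'])

by_label = scores['Bob']          # label-based lookup
by_position = scores.iloc[0]      # position-based lookup
mixed = pd.Series([1, 'a', 2.5])  # mixed values fall back to the object dtype

print(by_label, by_position)      # 90 85
print(scores.dtype, mixed.dtype)  # int64 object
```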
Some common tasks you can perform with Pandas:

• Data Loading: Pandas can read data from various sources, including CSV files, Excel
spreadsheets, SQL databases, and more.

• Data Cleaning: You can clean and preprocess data by handling missing values, removing
duplicates, and transforming data types.

• Data Selection: Easily select specific rows and columns of interest using various indexing
techniques.

• Data Aggregation: Perform groupby operations, calculate statistics, and aggregate data
based on specific criteria.

• Data Visualization: You can use Pandas in conjunction with visualization libraries like
Matplotlib and Seaborn to create informative plots and charts.

DataFrame

A DataFrame in Python typically refers to a two-dimensional, size-mutable, and potentially
heterogeneous tabular data structure provided by the popular library called Pandas. It is a
fundamental data structure for data manipulation and analysis in Python.

Here's how you can work with DataFrames in Python using Pandas:

1. Import Pandas:
First, you need to import the Pandas library.

import pandas as pd

2. Creating a DataFrame:
You can create a DataFrame in several ways. Here are a few common methods:

• From a dictionary:

data = {'Column1': [value1, value2, ...],
        'Column2': [value1, value2, ...]}
df = pd.DataFrame(data)

• From a list of lists:

data = [[value1, value2],
        [value3, value4]]
df = pd.DataFrame(data, columns=['Column1', 'Column2'])

• From a CSV file:

df = pd.read_csv('file.csv')

3. Viewing Data:
You can use various methods to view and explore your DataFrame:

df.head(): Displays the first few rows of the DataFrame.
df.tail(): Displays the last few rows of the DataFrame.
df.shape: Returns the number of rows and columns.
df.columns: Returns the column names.
df.info(): Provides information about the DataFrame, including data types and non-null counts.
4. Selecting Data:
You can select specific columns or rows from a DataFrame using indexing or filtering. For
example:
df['Column1'] # Select a specific column
df[['Column1', 'Column2']] # Select multiple columns
df[df['Column1'] > 5] # Filter rows based on a condition
5. Modifying Data:
You can modify the DataFrame by adding or modifying columns, updating values, or appending
rows. For example:
df['NewColumn'] = [new_value1, new_value2, ...] # Add a new column
df.at[index, 'Column1'] = new_value # Update a specific value
new_row = pd.DataFrame([{'Column1': value1, 'Column2': value2}])
df = pd.concat([df, new_row], ignore_index=True) # Append a new row (DataFrame.append() was removed in pandas 2.0)

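The selection and modification steps above can be sketched on a small concrete table. The data values below are illustrative, and pd.concat() is used for the row append because DataFrame.append() is not available in pandas 2.0 and later.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Add a new column
df['City'] = ['New York', 'San Francisco']

# Update a single value by row label and column name
df.at[0, 'Age'] = 26

# Append a row by concatenating a one-row DataFrame
new_row = pd.DataFrame([{'Name': 'Charlie', 'Age': 35, 'City': 'Los Angeles'}])
df = pd.concat([df, new_row], ignore_index=True)

# Filter rows based on a condition
older = df[df['Age'] > 28]
print(older['Name'].tolist())   # ['Bob', 'Charlie']
```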
6. Data Analysis:
Pandas provides various functions for data
analysis, such as describe(), groupby(), agg(), and
more.
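A minimal sketch of describe(), groupby(), and agg() follows. The sales table and its column names are hypothetical, chosen only to make the grouping visible.

```python
import pandas as pd

# Hypothetical data for illustration
sales = pd.DataFrame({'City': ['NY', 'NY', 'LA', 'LA'],
                      'Sales': [10, 20, 30, 50]})

# Summary statistics (count, mean, std, min, quartiles, max) for one column
stats = sales['Sales'].describe()

# Total sales per city
totals = sales.groupby('City')['Sales'].sum()

# Several aggregations per city at once
summary = sales.groupby('City')['Sales'].agg(['mean', 'max'])

print(stats['mean'])                # 27.5
print(totals['LA'], totals['NY'])   # 80 30
print(summary.loc['NY', 'mean'])    # 15.0
```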

7. Saving Data:
You can save the DataFrame to a CSV file or other formats:
df.to_csv('output.csv', index=False)

INDEX OBJECTS-INDEXING, SELECTION, AND FILTERING

In Pandas, the Index object is a fundamental component of both Series and
DataFrame data structures. It provides the labels or names for the rows or columns of
your data. You can use indexing, selection, and filtering techniques with these indexes
to access specific data points or subsets of your data. Here's how you can work with
index objects in Pandas:
1. Indexing:
Indexing allows you to access specific elements or rows in your data using labels. You can
use .loc[] for label-based indexing and .iloc[] for integer-based indexing.

• Label-based indexing:

df.loc['label'] # Access a specific row by its label
df.loc['label', 'column_name'] # Access a specific element by label and column name

• Integer-based indexing:

df.iloc[0] # Access the first row
df.iloc[0, 1] # Access an element by row and column index

2. Selection:
You can use various methods to select specific data based on conditions or criteria.

• Select rows based on a condition:

df[df['Column'] > 5] # Select rows where 'Column' is greater than 5

• Select rows by multiple conditions:

df[(df['Column1'] > 5) & (df['Column2'] < 10)] # Rows where 'Column1' > 5 and 'Column2' < 10

3. Filtering:
Filtering allows you to create a boolean mask based on a
condition and then apply that mask to your DataFrame to
select rows meeting the condition.

• Create a boolean mask:

condition = df['Column'] > 5

• Apply the mask to the DataFrame:

filtered_df = df[condition]

A boolean mask is like a checklist that goes through each row in your DataFrame and
marks whether it meets the condition (True) or not (False).
Boolean Mask Example:

Name      Age   Score   Meets Condition? (Age > 25)
Alice     24    85      False
Bob       27    90      True
Charlie   22    88      False
David     32    95      True
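The boolean mask table above can be reproduced directly in code. This sketch rebuilds the same four rows and applies the Age > 25 condition.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [24, 27, 22, 32],
                   'Score': [85, 90, 88, 95]})

# One True/False entry per row
mask = df['Age'] > 25

# Keep only the rows where the mask is True
filtered_df = df[mask]

print(mask.tolist())                  # [False, True, False, True]
print(filtered_df['Name'].tolist())   # ['Bob', 'David']
```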

4. Setting a New Index:


You can set a specific column as the index of your DataFrame using the .set_index() method.

df.set_index('Column_Name', inplace=True)

5. Resetting the Index:
If you've set a column as the index and want to revert to the default integer-based index, you
can use the .reset_index() method.

df.reset_index(inplace=True)

6. Multi-level Indexing:
You can create DataFrames with multi-level indexes, allowing you to work with more complex
hierarchical data structures.

df.set_index(['Index1', 'Index2'], inplace=True)

Index objects in Pandas are versatile and powerful for working with data because
they enable you to access and manipulate your data in various ways, whether it's for
data retrieval, filtering, or restructuring.
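Setting, resetting, and multi-level indexing can be sketched together on one small table. The column names and values are hypothetical; the sketch returns new DataFrames rather than using inplace=True, which is an equally valid style.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'City': ['NY', 'NY', 'LA'],
                   'Age': [25, 30, 35]})

# Use one column as the index, then look up by label
by_name = df.set_index('Name')
bob_age = by_name.loc['Bob', 'Age']

# Use two columns as a multi-level index (sorted for predictable lookups)
multi = df.set_index(['City', 'Name']).sort_index()
ny_bob_age = multi.loc[('NY', 'Bob'), 'Age']

# Move the index levels back into ordinary columns
restored = multi.reset_index()

print(bob_age, ny_bob_age)      # 30 30
print(list(restored.columns))   # ['City', 'Name', 'Age']
```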

ARITHMETIC AND DATA ALIGNMENT IN PANDAS
Arithmetic and data alignment in Pandas refer to how mathematical operations are performed
between Series and DataFrames when they have different shapes or indices. Pandas
automatically aligns data based on the labels of the objects involved in the operation, which
ensures that the result of the operation maintains data integrity and is aligned correctly. Here are
some key aspects of arithmetic and data alignment in Pandas:

1. Automatic Alignment:
When you perform mathematical operations (e.g., addition, subtraction, multiplication, division)
between two Series or DataFrames, Pandas aligns the data based on their labels (index or column
names). It aligns the data based on common labels and performs the operation only on matching
labels.

series1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
series2 = pd.Series([4, 5, 6], index=['B', 'C', 'D'])

result = series1 + series2

In this example, the result Series will have NaN values for the 'A' and 'D' labels because those labels
do not appear in both Series. The shared labels 'B' and 'C' hold 6.0 (2 + 4) and 8.0 (3 + 5).
2. Missing Data (NaN):
When labels don't match, Pandas fills in the result with NaN (Not-a-Number) to indicate missing
values.
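As a sketch of the alignment behaviour above: the + operator leaves NaN for unmatched labels, while the add() method with fill_value treats a label missing from one side as that value instead.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
series2 = pd.Series([4, 5, 6], index=['B', 'C', 'D'])

# Plain addition: unmatched labels 'A' and 'D' become NaN
plain = series1 + series2

# add() with fill_value=0: a label missing on one side counts as 0
filled = series1.add(series2, fill_value=0)

print(plain.isna().tolist())   # [True, False, False, True]
print(filled.tolist())         # [1.0, 6.0, 8.0, 6.0]
```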

3. DataFrame Alignment:
The same principles apply to DataFrames when performing operations between them. The
alignment occurs both for rows (based on the index) and columns (based on column names).

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y'])
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]}, index=['Y', 'Z'])

result = df1 + df2

In this case, result has rows 'X', 'Y', 'Z' and columns 'A', 'B', 'C'. Every cell is NaN except
row 'Y', column 'B' (4 + 5 = 9), because that is the only row label and column label present in
both df1 and df2.

4. Handling Missing Data:

You can use methods like .fillna() to replace NaN values with a specific value or use .dropna() to
remove rows or columns with missing data.

result_filled = result.fillna(0) # Replace NaN with 0
result_dropped = result.dropna() # Remove rows containing any NaN (use axis=1 to drop columns instead)
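A short sketch of what these calls return for the df1 + df2 example. The how='all' argument is an addition here, shown because the default dropna() removes every row of this particular result.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y'])
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]}, index=['Y', 'Z'])
result = df1 + df2

nan_count = int(result.isna().sum().sum())          # 8 of the 9 cells are NaN
result_filled = result.fillna(0)                    # NaN replaced by 0
result_dropped = result.dropna()                    # every row has a NaN, so nothing is left
rows_with_data = result.dropna(how='all')           # keep rows with at least one value
cols_with_data = result.dropna(axis=1, how='all')   # keep columns with at least one value

print(result.loc['Y', 'B'])            # 9.0
print(nan_count)                       # 8
print(result_dropped.empty)            # True
print(list(rows_with_data.index))      # ['Y']
print(list(cols_with_data.columns))    # ['B']
```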
5. Alignment with Broadcasting:
Pandas allows you to perform operations between a Series and a scalar value, and it broadcasts
the scalar to match the shape of the Series.

series = pd.Series([1, 2, 3])


scalar = 2

result = series * scalar


In this example, result will be a Series with values [2, 4, 6].

Automatic alignment in Pandas is a powerful feature that simplifies data manipulation and
allows you to work with datasets of different shapes without needing to manually align them. It
ensures that operations are performed in a way that maintains the integrity and structure of
your data.

ARITHMETIC AND DATA ALIGNMENT IN NUMPY
NumPy, like Pandas, performs element-wise arithmetic when working with arrays.
However, unlike Pandas, NumPy has no labels to align on: arrays are matched purely by position
and shape, and NumPy is primarily focused on numerical computations with
homogeneous arrays (arrays of the same data type). Here's how arithmetic and shape matching
work in NumPy:

Automatic Alignment (Broadcasting):
NumPy arrays perform element-wise operations. If you perform an operation
between two NumPy arrays of different shapes, NumPy will try to broadcast the smaller array to
match the shape of the larger one. This only works when the shapes are compatible.

import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]]) # Shape (2, 3)
arr2 = np.array([10, 20, 30]) # Shape (3,), broadcast across both rows

result = arr1 + arr2 # [[11 22 33] [14 25 36]]

Note that arrays of shapes (3,) and (2,), such as np.array([1, 2, 3]) and np.array([4, 5]),
are not compatible, and adding them raises a ValueError.

Broadcasting Rules:
NumPy follows specific rules when broadcasting arrays:

If the arrays have a different number of dimensions, pad the smaller shape with ones on the left
side.
Compare the shapes element-wise, starting from the right. If dimensions are equal or one of them
is 1, they are compatible.
If the dimensions are incompatible, NumPy raises a "ValueError: operands could not be broadcast
together" error.
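The broadcasting rules above can be checked with a few shapes. This is a minimal sketch; the array names are illustrative.

```python
import numpy as np

matrix = np.ones((2, 3))
row = np.arange(3)                    # shape (3,) is treated as (1, 3)
column = np.arange(2).reshape(2, 1)   # shape (2, 1)

# (2, 3) with (3,): trailing dimensions match, so the row is repeated
print((matrix + row).shape)           # (2, 3)

# (2, 1) with (3,): each size-1 dimension is stretched
print((column + row).shape)           # (2, 3)

# (3,) with (2,): trailing dimensions 3 and 2 differ and neither is 1
try:
    np.array([1, 2, 3]) + np.array([4, 5])
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```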

Handling Missing Data:

NumPy does not insert NaN to mark mismatches the way Pandas does for unmatched labels. If you
perform operations between arrays with mismatched shapes, NumPy will either broadcast or raise
an error, depending on whether broadcasting is possible.

Element-Wise Operations:
NumPy performs arithmetic operations element-wise by default. This means that each element in
the resulting array is the result of applying the operation to the corresponding elements in the
input arrays.
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = arr1 + arr2 # Element-wise addition: [5 7 9]
APPLYING FUNCTIONS AND MAPPING

In NumPy, you can apply functions and perform element-wise operations on arrays using various
techniques, including built-in vectorized functions, np.apply_along_axis(), and np.vectorize(),
which can also be used for mapping operations. Here's an overview of these approaches:

Vectorized Functions:
NumPy is designed to work efficiently with vectorized operations, meaning you can apply
functions to entire arrays or elements of arrays without the need for explicit loops. NumPy
provides built-in functions that can be applied element-wise to arrays.

import numpy as np

arr = np.array([1, 2, 3, 4])

# Applying a function element-wise
result = np.square(arr) # Square each element

In this example, the np.square() function is applied element-wise to the arr array, giving [1, 4, 9, 16].
np.apply_along_axis():
You can use the np.apply_along_axis() function to apply a function along a specified axis of a
multi-dimensional array. This is useful when you want to apply a function to each row or column
of a 2D array.

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Apply a function along the rows (axis=1)
def sum_of_row(row):
    return np.sum(row)

result = np.apply_along_axis(sum_of_row, axis=1, arr=arr)

In this example, sum_of_row is applied to each row along axis=1, resulting in a new 1D array: [6, 15].

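As a further sketch of np.apply_along_axis(), the helper below (a hypothetical function named value_range, not from the original slides) computes max minus min for each row or column. The last line shows the same result obtained with built-in axis-aware methods, which is usually the simpler choice when such methods exist.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

def value_range(values):
    """Return max - min of a 1D slice."""
    return values.max() - values.min()

# axis=1: the function receives each row
row_ranges = np.apply_along_axis(value_range, 1, arr)
# axis=0: the function receives each column
col_ranges = np.apply_along_axis(value_range, 0, arr)

# Same row result using built-in reductions with an axis argument
builtin_row_ranges = arr.max(axis=1) - arr.min(axis=1)

print(row_ranges)   # [2 2]
print(col_ranges)   # [3 3 3]
```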
np.vectorize():
The np.vectorize() function allows you to create a vectorized version of a Python function, which
can then be applied element-wise to NumPy arrays.

import numpy as np

arr = np.array([1, 2, 3, 4])

# Define a Python function
def my_function(x):
    return x * 2

# Create a vectorized version of the function
vectorized_func = np.vectorize(my_function)

# Apply the vectorized function to the array
result = vectorized_func(arr)
This approach is useful when you have a custom function that you want to apply to an array.
Note that np.vectorize() is provided mainly for convenience: it still calls the Python function
once per element, so it is not faster than an explicit loop.
Mapping with np.vectorize():
You can use np.vectorize() to map a function to each element of an array.

import numpy as np

arr = np.array([1, 2, 3, 4])

# Define a Python function
def my_function(x):
    return x * 2

# Create a vectorized version of the function
vectorized_func = np.vectorize(my_function)

# Map the function to each element
result = vectorized_func(arr)
This approach is the same as applying a function element-wise, and it is most useful for
mapping functions whose logic (such as branching) has no ready-made NumPy equivalent.
These methods allow you to apply functions and perform mapping operations
conveniently on NumPy arrays, making it a powerful library for numerical and scientific
computing tasks.
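As a sketch of a mapping that involves branching, the hypothetical function clip_negative below is mapped with np.vectorize() and then compared with np.where(), a built-in alternative that expresses the same rule without calling a Python function per element.

```python
import numpy as np

arr = np.array([-2, -1, 0, 1, 2])

# Hypothetical mapping rule: negative values become 0
def clip_negative(x):
    return x if x > 0 else 0

mapped = np.vectorize(clip_negative)(arr)

# Built-in equivalent: choose arr where the condition holds, else 0
mapped_where = np.where(arr > 0, arr, 0)

print(mapped)         # [0 0 0 1 2]
print(mapped_where)   # [0 0 0 1 2]
```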
SORTING AND RANKING

Sorting and ranking are common data manipulation operations in data analysis and are widely
supported in Python through libraries like NumPy and Pandas. These operations help organize
data in a desired order or rank elements based on specific criteria. Here's how to perform
sorting and ranking in both libraries:

Sorting in NumPy:
In NumPy, you can sort NumPy arrays using the np.sort() and np.argsort() functions.

np.sort(): This function returns a new sorted array without modifying the original array.

import numpy as np

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])

sorted_arr = np.sort(arr) # [1 1 2 3 3 4 5 5 6 9]
np.argsort(): This function returns the indices that would sort the array. You can use these
indices to sort the original array.

import numpy as np

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])

indices = np.argsort(arr)
sorted_arr = arr[indices]
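One common use of np.argsort() is reordering one array by the values of another. The sketch below uses hypothetical names and ages for illustration.

```python
import numpy as np

# Hypothetical parallel arrays
names = np.array(['Alice', 'Bob', 'Charlie', 'David'])
ages = np.array([25, 30, 22, 35])

# Indices that would sort ages in ascending order
order = np.argsort(ages)

names_by_age = names[order]            # names reordered by ascending age
ages_descending = ages[order[::-1]]    # reverse the order for descending

print(order)               # [2 0 1 3]
print(names_by_age)        # ['Charlie' 'Alice' 'Bob' 'David']
print(ages_descending)     # [35 30 25 22]
```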
Sorting in Pandas:
In Pandas, you can sort Series and DataFrames using the sort_values() method. You can specify
the column(s) to sort by and the sorting order.

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 22, 35]}

df = pd.DataFrame(data)

# Sort by 'Age' column in ascending order
sorted_df = df.sort_values(by='Age', ascending=True)
Ranking in NumPy:

NumPy doesn't have a built-in ranking function, but you can apply np.argsort() twice to get the
rank of each element. With this approach, tied values receive distinct, consecutive ranks.

import numpy as np

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])

indices = np.argsort(arr)
ranked_arr = np.argsort(indices) + 1 # Add 1 to start ranking from 1 instead of 0
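To see the ranks this produces, here is a sketch using kind='stable', an added assumption that guarantees tied values are ranked in their original order so the output is reproducible.

```python
import numpy as np

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])

# Stable sort: among equal values, the earlier element comes first
order = np.argsort(arr, kind='stable')

# Position of each original element in the sorted order, starting at 1
ranks = np.argsort(order, kind='stable') + 1

print(ranks)   # [ 4  1  6  2  7 10  3  9  8  5]
```

The two 1s get ranks 1 and 2, and the two 3s get ranks 4 and 5; unlike the Pandas rank() method, no averaging of tied ranks takes place.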
Ranking in Pandas:
In Pandas, you can rank data using the rank() method. You can specify the sorting order and
how to handle ties (e.g., assigning the average rank to tied values).

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 22, 30]}

df = pd.DataFrame(data)

# Rank by 'Age' column in descending order and assign average rank to tied values
df['Rank'] = df['Age'].rank(ascending=False, method='average')
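The effect of the tie-handling method can be compared on the same ages. This sketch uses a Series with the names as its index; 'min' and 'dense' are two other values accepted by the method parameter.

```python
import pandas as pd

ages = pd.Series([25, 30, 22, 30], index=['Alice', 'Bob', 'Charlie', 'David'])

# Ties share the average of the ranks they span (1 and 2 -> 1.5)
average_rank = ages.rank(ascending=False, method='average')
# Ties share the lowest rank in the group; the next rank is skipped
min_rank = ages.rank(ascending=False, method='min')
# Ties share a rank and no ranks are skipped afterwards
dense_rank = ages.rank(ascending=False, method='dense')

print(average_rank.tolist())   # [3.0, 1.5, 4.0, 1.5]
print(min_rank.tolist())       # [3.0, 1.0, 4.0, 1.0]
print(dense_rank.tolist())     # [2.0, 1.0, 3.0, 1.0]
```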
SUMMARIZING AND COMPUTING DESCRIPTIVE STATISTICS

1. Summary Statistics:
NumPy provides functions to compute summary statistics directly on arrays.

import numpy as np

data = np.array([25, 30, 22, 35, 28])

mean = np.mean(data)
median = np.median(data)
std_dev = np.std(data)
variance = np.var(data)

2. Percentiles and Quartiles:
You can compute specific percentiles and quartiles using the np.percentile() function.

percentile_25 = np.percentile(data, 25)
percentile_75 = np.percentile(data, 75)
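As a worked check of the summary statistics and percentiles for the data array above: the sorted values are 22, 25, 28, 30, 35. The ddof=1 variant and the interquartile range are additions shown for comparison.

```python
import numpy as np

data = np.array([25, 30, 22, 35, 28])

mean = np.mean(data)                      # 140 / 5 = 28.0
median = np.median(data)                  # middle of the sorted values = 28.0
variance = np.var(data)                   # population variance: 98 / 5 = 19.6
sample_variance = np.var(data, ddof=1)    # sample variance: 98 / 4 = 24.5
std_dev = np.std(data)                    # square root of the population variance

# Several percentiles in one call
q1, q3 = np.percentile(data, [25, 75])    # 25.0 and 30.0
iqr = q3 - q1                             # interquartile range = 5.0

print(mean, median, q1, q3, iqr)
```

By default np.var() and np.std() divide by N (population statistics); passing ddof=1 divides by N - 1 instead.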

3. Correlation and Covariance:

You can compute correlation and covariance between arrays using np.corrcoef() and np.cov().

# data1 and data2 stand for two equal-length 1D arrays
correlation_matrix = np.corrcoef(data1, data2)
covariance_matrix = np.cov(data1, data2)

CORRELATION AND COVARIANCE

In NumPy, you can compute correlation and covariance between arrays using the np.corrcoef()
and np.cov() functions, respectively. These functions are useful for analyzing relationships and
dependencies between variables. Here's how to use them:

Computing Correlation Coefficient (Correlation):


The correlation coefficient measures the strength and direction of a linear relationship between
two variables. It ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation),
with 0 indicating no linear correlation.

import numpy as np

# Create two arrays representing variables
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])

# Compute the correlation coefficient between x and y
correlation_matrix = np.corrcoef(x, y)

# The correlation coefficient is in the (0, 1) element of the matrix
correlation_coefficient = correlation_matrix[0, 1]
In this example, correlation_coefficient will contain the Pearson correlation coefficient between
x and y. Here it is 1.0 (up to floating-point rounding), because y = x + 1 is a perfect linear
relationship.

Computing Covariance:
Covariance measures the degree to which two variables change together. Positive values
indicate a positive relationship (both variables increase or decrease together), while negative
values indicate an inverse relationship (one variable increases as the other decreases).

import numpy as np

# Create two arrays representing variables
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])

# Compute the covariance between x and y
covariance_matrix = np.cov(x, y)

# The covariance is in the (0, 1) element of the matrix
covariance = covariance_matrix[0, 1]
In this example, covariance will contain the covariance between x and y. By default, np.cov()
computes the sample covariance (dividing by N - 1), which is 2.5 here.

Both np.corrcoef() and np.cov() can also take a single 2D array in which each row is a variable,
allowing you to compute correlations and covariances for multiple variables simultaneously. For
example, if you have a dataset with multiple columns, you can compute the correlation matrix or
covariance matrix for all pairs of variables (pass rowvar=False when the variables are stored in
columns).
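A minimal sketch of the multi-variable case follows. The third variable, a reversed copy of the first, is an illustrative addition that produces a perfect negative correlation.

```python
import numpy as np

# Three variables, one per row (illustrative values)
data = np.array([[1, 2, 3, 4, 5],
                 [2, 3, 4, 5, 6],
                 [5, 4, 3, 2, 1]])

corr = np.corrcoef(data)   # 3 x 3 correlation matrix
cov = np.cov(data)         # 3 x 3 sample covariance matrix

print(corr.shape)                  # (3, 3)
print(round(corr[0, 1], 6))        # 1.0
print(round(corr[0, 2], 6))        # -1.0
print(cov[0, 0], cov[0, 2])        # 2.5 -2.5

# If variables are stored in columns instead, pass rowvar=False
corr_from_columns = np.corrcoef(data.T, rowvar=False)
```

The diagonal of the covariance matrix holds each variable's own sample variance.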
HANDLING MISSING DATA

Handling missing data in NumPy is an important aspect of data analysis and manipulation.
NumPy provides several ways to work with missing or undefined values, typically represented
as NaN (Not-a-Number). Here are some common techniques for handling missing data in
NumPy:

Using np.nan: NumPy represents missing data using np.nan. You can create arrays with missing
values like this:

import numpy as np

arr = np.array([1.0, 2.0, np.nan, 4.0])


Now, arr contains a missing value represented as np.nan.

Checking for Missing Data: You can check for missing values using the np.isnan() function. For
example:

np.isnan(arr) # Returns a boolean array indicating which elements are NaN.

Filtering Missing Data: To filter out missing values from an array, you can use boolean indexing.
For example:

arr[~np.isnan(arr)] # Returns an array without NaN values.

Replacing Missing Data: You can replace missing values with a specific value using boolean
indexing or np.nan_to_num(), or with a statistic such as the mean computed by np.nanmean(),
which ignores NaN values. For example:

arr[np.isnan(arr)] = 0 # Replace NaN with 0

Or, starting again from the original array, to replace NaN with the mean of the non-missing values:

mean = np.nanmean(arr)
arr[np.isnan(arr)] = mean

Ignoring Missing Data: Sometimes, you may want to perform operations while ignoring missing
values. You can use functions like np.nanmax(), np.nanmin(), np.nansum(), etc., which ignore
NaN values when computing the result.

Interpolation: If you have a time series or ordered data, you can use interpolation methods to
fill missing values. NumPy provides functions like np.interp() for this purpose.

Masked Arrays: NumPy also supports masked arrays (numpy.ma) that allow you to work with
missing data more explicitly by creating a mask that specifies which values are missing. This
can be useful for certain computations.

Handling Missing Data in Multidimensional Arrays: If you're working with multidimensional
arrays, you can apply the above techniques along a specific axis. np.isnan() itself has no axis
parameter, but you can combine its result with an axis-aware reduction such as np.any(..., axis=1),
or pass axis to functions like np.nanmean(), to handle missing data along specific dimensions.
Keep in mind that the specific method you choose to handle missing data depends
on your data analysis goals and the context of your data. Some methods may be
more appropriate than others, depending on your use case.

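The NaN-aware functions, interpolation, masked arrays, and per-axis handling described above can be sketched together. The 2D array named grid is an illustrative addition.

```python
import numpy as np

arr = np.array([1.0, 2.0, np.nan, 4.0])

# Ignore NaN when aggregating
total = np.nansum(arr)                       # 1 + 2 + 4 = 7.0

# Fill the gap by linear interpolation over the positions
positions = np.arange(arr.size)
missing = np.isnan(arr)
filled = arr.copy()
filled[missing] = np.interp(positions[missing], positions[~missing], arr[~missing])

# Masked array: NaN entries are hidden from computations
masked = np.ma.masked_invalid(arr)
masked_mean = masked.mean()                  # mean of 1, 2, 4

# Per-axis handling in a 2D array
grid = np.array([[1.0, np.nan],
                 [3.0, 4.0]])
rows_with_nan = np.isnan(grid).any(axis=1)   # which rows contain a NaN
col_means = np.nanmean(grid, axis=0)         # column means ignoring NaN

print(total)                   # 7.0
print(filled)                  # [1. 2. 3. 4.]
print(rows_with_nan.tolist())  # [True, False]
print(col_means)               # [2. 4.]
```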
HIERARCHICAL INDEXING
Hierarchical indexing, often referred to as "MultiIndexing", is a Pandas feature (NumPy itself
has no MultiIndex class). It allows you to label the rows or columns of a Series or DataFrame
with multiple index levels. This is particularly useful when you want to represent
higher-dimensional data with more complex hierarchical structures in a two-dimensional table.

You can create a MultiIndex in Pandas using the pd.MultiIndex class. Here's a basic example:

import numpy as np
import pandas as pd

# Create a MultiIndex with three levels
multi_index = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B', 'B'], [1, 2, 1, 2], ['X', 'Y', 'X', 'Y']])

# Create a random data array
data = np.random.rand(4, 3)

# Create a DataFrame with MultiIndex
df = pd.DataFrame(data, index=multi_index, columns=['Value1', 'Value2', 'Value3'])

In this example, we've created a MultiIndex with three levels: 'A' and 'B' as the first level, 1 and
2 as the second level, and 'X' and 'Y' as the third level. Then, we've created a DataFrame with this
MultiIndex and some random data. The column names 'Value2' and 'Value3' are illustrative, chosen
to match the 'Value1' column used below.

You can access data from this DataFrame using hierarchical indexing. For example:

# Accessing data using hierarchical indexing
value_A1_X = df.loc[('A', 1, 'X'), 'Value1'] # Access Value1 for 'A', 1, 'X'

Some common operations with hierarchical indexing include:

Slicing: You can perform slices at each level of the index, allowing you to select specific subsets
of the data.

Stacking and Unstacking: You can stack or unstack levels to convert between a wide and long
format, which can be useful for different types of analyses.

Swapping Levels: You can swap levels to change the order of the levels in the index.

Grouping and Aggregating: You can group data based on levels of the index and perform
aggregation functions like mean, sum, etc.

Reordering Levels: You can change the order of levels in the index.

Resetting Index: You can reset the index to move the hierarchical index levels back to columns.
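Several of the operations listed above can be sketched on a small two-level example. The level names 'group' and 'number' and the Value column are hypothetical, introduced only for this illustration.

```python
import pandas as pd

# Two-level index built from all combinations of the level values
index = pd.MultiIndex.from_product([['A', 'B'], [1, 2]],
                                   names=['group', 'number'])
df = pd.DataFrame({'Value': [10, 20, 30, 40]}, index=index)

# Slicing: select every row whose first level is 'A'
group_a = df.loc['A']

# Unstacking: move the 'number' level into columns (long to wide)
wide = df['Value'].unstack('number')

# Swapping levels: 'number' becomes the outer level
swapped = df.swaplevel('group', 'number')

# Grouping and aggregating by an index level
totals = df.groupby(level='group')['Value'].sum()

# Resetting the index: levels become ordinary columns
flat = df.reset_index()

print(group_a['Value'].tolist())   # [10, 20]
print(wide.loc['B', 2])            # 40
print(list(swapped.index.names))   # ['number', 'group']
print(totals['A'], totals['B'])    # 30 70
print(list(flat.columns))          # ['group', 'number', 'Value']
```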

Hierarchical indexing is especially valuable when dealing with multi-dimensional
data, such as panel data or data with multiple categorical variables. It allows for
more expressive data organization and manipulation. The pd.MultiIndex class from the
pandas library provides the functionality for working with hierarchical data
structures, including various methods for creating and manipulating MultiIndex
objects (for example, from_arrays(), from_tuples(), and from_product()).
