Unit 3_Numpy_VP

Unit 3 covers the basics of NumPy, a fundamental library for numerical computing in Python, focusing on arrays, vectorized computation, and basic operations. It explains how to install and import NumPy, create arrays, perform element-wise operations, and utilize aggregation functions. Additionally, it introduces Pandas, highlighting its data structures, DataFrame and Series, and common tasks for data manipulation and analysis.

Uploaded by

ashupersonal12345

Unit 3: Basics of Numpy

21BCA2T452 : Python Programming

Prof. Vishnu Priya P M

Assistant Professor, Dept. of Computer Science
Kristu Jayanti College, Autonomous
(Reaccredited A++ Grade by NAAC with CGPA 3.78/4)
Bengaluru – 560077, India

NUMPY BASICS: ARRAYS AND VECTORIZED COMPUTATION

NumPy (Numerical Python) is a fundamental library in Python for numerical and
scientific computing. It provides support for arrays (multi-dimensional,
homogeneous data structures) and a wide range of mathematical functions to
perform vectorized computations efficiently.

Installing NumPy

Before using NumPy, you need to make sure it's installed. You can install it using
pip:

pip install numpy

Importing NumPy
To use NumPy in your Python code, you should import it:

import numpy as np
By convention, it's common to import NumPy as np for brevity.

Why Use Arrays?

Arrays are more efficient than lists when performing operations. For example, if you
want to add 2 to every element in the list, you would need a loop in plain Python. But
with NumPy, you can do this in a single line:

arr = np.array([1, 2, 3, 4, 5])

new_arr = arr + 2 # Adds 2 to every element in the array

print(new_arr)
Output: [3 4 5 6 7]
Creating NumPy Arrays
You can create NumPy arrays using various methods:

1. From Python Lists:

arr = np.array([1, 2, 3, 4, 5])

2. Using NumPy Functions:

zeros_arr = np.zeros(5) # Creates an array of zeros with 5 elements

ones_arr = np.ones(3) # Creates an array of ones with 3 elements
rand_arr = np.random.rand(3, 3) # Creates a 3x3 array with random values between 0 and 1

3. Using NumPy's Range Function:

range_arr = np.arange(0, 10, 2) # Creates an array with values [0, 2, 4, 6, 8]

BASIC ARRAY OPERATIONS

Once you have NumPy arrays, you can perform various operations on them:

1. Element-wise Operations:

NumPy allows you to perform element-wise operations, like addition, subtraction,
multiplication, and division:

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = a + b # Element-wise addition: [5, 7, 9]
d = a * b # Element-wise multiplication: [4, 10, 18]

2. Indexing and Slicing:

Indexing means accessing a specific element in an array by its position
(index). In NumPy, indices start from 0.

arr = np.array([0, 1, 2, 3, 4, 5])
element = arr[2] # Access element at index 2 (value: 2)
sub_array = arr[2:5] # Slice from index 2 to 4 (values: [2, 3, 4])

Slicing: Slicing allows you to access a range or subset of elements from
an array. It is done using the syntax arr[start:end], where start is the
index where the slice begins (inclusive), and end is where it stops
(exclusive).

arr = np.array([10, 20, 30, 40, 50])

# Getting a slice of elements from index 1 to 3 (exclusive of 3)

print(arr[1:3]) # Output: [20 30]

# Getting a slice from the start till the third element

print(arr[:3]) # Output: [10 20 30]

# Getting a slice from index 2 to the end of the array

print(arr[2:]) # Output: [30 40 50]
Negative Indexing:
You can also use negative indices to access elements from the end of the array. For
example, -1 refers to the last element, -2 refers to the second last element, and so
on.
Example:

arr = np.array([10, 20, 30, 40, 50])

# Accessing the last element

print(arr[-1]) # Output: 50

# Accessing the second last element

print(arr[-2]) # Output: 40

Slicing with Steps: You can also specify a step value, which sets the distance
between the indices that are picked (a step of 2 takes every second element).
The syntax is arr[start:end:step].

Example:

arr = np.array([10, 20, 30, 40, 50, 60])

# Getting every second element from index 1 to 5

print(arr[1:5:2]) # Output: [20 40]

# Reversing the array using negative step

print(arr[::-1]) # Output: [60 50 40 30 20 10]

How arr[1:5:2] is evaluated:
•The array is [10, 20, 30, 40, 50, 60].
•Index positions: [0, 1, 2, 3, 4, 5].
•The slice starts at index 1, which is 20.
•2 is the step value, which means "take every second element".
•It skips the next element and picks the element at index 3, which is 40.
•The slice stops before reaching index 5.

3. Array Shape and Reshaping:
The shape of an array tells us how many elements it contains along each
dimension (or axis). You can check the shape of an array using
the .shape attribute.

You can check and change the shape of NumPy arrays:

arr = np.array([[1, 2, 3], [4, 5, 6]])
shape = arr.shape # Get the shape (2, 3)
reshaped = arr.reshape(3, 2) # Reshape the array to (3, 2)

Reshaping:
Reshaping allows you to change the shape of an array without changing
its data. You can convert a 1D array to a 2D array, or a 2D array to a 3D
array, etc., as long as the total number of elements stays the same.
Example:
# Creating a 1D array with 6 elements
arr = np.array([1, 2, 3, 4, 5, 6])

# Reshaping the 1D array into a 2D array (2 rows, 3 columns)

reshaped_arr = arr.reshape(2, 3)

print(reshaped_arr)

Reshape Rules:
When reshaping an array, the new shape must contain the same total number of
elements as the original array. For example, if you have an array with 12 elements,
you could reshape it to:
•A 2x6 array (2 rows x 6 columns)
•A 3x4 array (3 rows x 4 columns)
•A 4x3 array (4 rows x 3 columns)
Example

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

# Reshaping into 3 rows and 4 columns


reshaped_arr = arr.reshape(3, 4)
print(reshaped_arr)
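
The reshape rule above can be checked directly. The following is a minimal sketch, assuming the same 12-element array; the variable names inferred and mismatch_error are illustrative only. It also shows the -1 placeholder, which asks NumPy to work out one dimension from the total size.

```python
import numpy as np

arr = np.arange(1, 13)            # 12 elements: 1 to 12

# -1 lets NumPy infer the missing dimension: 12 / 3 = 4 columns
inferred = arr.reshape(3, -1)
print(inferred.shape)             # (3, 4)

# A shape whose product is not 12 is rejected with a ValueError
try:
    arr.reshape(5, 2)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
print(mismatch_error)
```
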
Flattening an Array: If you want to convert a multi-dimensional array back into
a 1D array, you can flatten it using the .flatten() method.

Example

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])

# Flattening the 2D array into a 1D array

flat_arr = arr_2d.flatten()

print(flat_arr)

O/P

[1 2 3 4 5 6]

Shape: Tells you the dimensions of an array (rows, columns, etc.).

Reshaping: Lets you change the shape of an array while keeping the same number
of elements.
Aggregation Functions:

Aggregation functions are used to perform calculations on an entire array or along a specific
axis (e.g., summing all elements, finding the maximum, etc.). These functions are essential
for data analysis and numerical computations.

Common Aggregation Functions: Here are some of the most commonly used aggregation
functions in NumPy:
1. Sum: The sum() function adds all the elements of an array.
2. Mean: The mean() function calculates the average of the elements.
3. Maximum and Minimum: max() gives the maximum value in the array. min() gives the
minimum value in the array.
4. Product: The prod() function returns the product of all elements in the array (i.e.,
multiplies all elements together).
5. Standard Deviation and Variance: std() calculates the standard deviation (how spread out
the numbers are). var() calculates the variance (the square of the standard deviation).
By default, both use the population formula (ddof=0).
6. Cumulative Sum and Product: cumsum() gives the cumulative sum (the sum of the
elements up to each index). cumprod() gives the cumulative product (the product of
elements up to each index).

NumPy provides functions to compute statistics on arrays:

arr = np.array([1, 2, 3, 4, 5])
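
As a sketch of the aggregation functions listed above, applied to this same five-element array (the 2D array grid and all result names are assumptions added for illustration):

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

total = arr.sum()              # 15
average = arr.mean()           # 3.0
largest = arr.max()            # 5
smallest = arr.min()           # 1
product = arr.prod()           # 120
variance = arr.var()           # 2.0 (population variance, ddof=0)
std_dev = arr.std()            # square root of the variance
running_sum = arr.cumsum()     # [ 1  3  6 10 15]
running_prod = arr.cumprod()   # [  1   2   6  24 120]

# Aggregating along an axis of a 2D array
grid = np.array([[1, 2, 3], [4, 5, 6]])
col_sums = grid.sum(axis=0)    # [5 7 9]  (one sum per column)
row_sums = grid.sum(axis=1)    # [ 6 15]  (one sum per row)
```
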
VECTORIZED COMPUTATION

Vectorized computation in Python refers to performing operations on entire arrays or
sequences of data without the need for explicit loops. This approach leverages highly
optimized, low-level code to achieve faster and more efficient computations. The
primary library for vectorized computation in Python is NumPy.

Traditional Loop-Based Computation

In traditional Python programming, you might use explicit loops to perform
operations on arrays or lists. For example:
# Using loops to add two lists element-wise
list1 = [1, 2, 3]
list2 = [4, 5, 6]
result = []
for i in range(len(list1)):
    result.append(list1[i] + list2[i])
# result is now [5, 7, 9]
Vectorized Computation with NumPy

NumPy allows you to perform operations on entire arrays, making code more concise and
efficient. Here's how you can achieve the same result using NumPy:

import numpy as np

# Using NumPy for element-wise addition

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = arr1 + arr2

# Result: array([5, 7, 9])

INTRODUCTION TO PANDAS DATA STRUCTURES
Pandas is a popular Python library for data manipulation and analysis. It provides two
primary data structures: the DataFrame and the Series. These data structures are
designed to handle structured data, making it easier to work with datasets in a tabular
format.
DataFrame:

• A DataFrame is a 2-dimensional, labeled data structure that resembles a
spreadsheet or SQL table.
• It consists of rows and columns, where each column can have a different data type
(e.g., integers, floats, strings, or even custom data types).
• You can think of a DataFrame as a collection of Series objects, where each Series is
a column.

Here's a basic example of how to create a DataFrame using
Pandas:
import pandas as pd

# Creating a DataFrame from a dictionary of data

data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'San Francisco', 'Los Angeles']}

df = pd.DataFrame(data)

# Displaying the DataFrame

print(df)
• Importing pandas: import pandas as pd brings in the pandas library so you can use its features.
• Creating Data: A dictionary called data holds your information.
• DataFrame: pd.DataFrame(data) converts the dictionary into a DataFrame.
• Displaying Data: print(df) shows the table.
Series:

• A Series is a one-dimensional labeled array that can hold data of any data type.
• It is like a column in a DataFrame or a single variable in statistics.
• Series objects are commonly used for time series data, as well as other one-dimensional
data.
Key characteristics of a Pandas Series:

• Homogeneous Data: Like a NumPy array, and unlike a Python list, a Pandas Series has a
single data type (dtype) for all of its values. For example, if you create a Series with
integer values, its dtype will be an integer type such as int64. If you mix types, Pandas
either upcasts the values (e.g., integers and floats become floats) or falls back to the
generic object dtype.

• Labeled Data: Series have two parts: the data itself and an associated index. The index
provides labels or names for each data point in the Series. By default, Series have a numeric
index starting from 0, but you can specify custom labels if needed.

• Size and Shape: A Series has a size (the number of elements) and shape (1-dimensional) but
does not have columns or rows like a DataFrame.
import pandas as pd

# Create a Series from a list
data = [10, 20, 30, 40, 50]
series = pd.Series(data)

# Display the Series
print(series)

Output:
0    10
1    20
2    30
3    40
4    50
dtype: int64
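
The characteristics above mention custom labels and a single dtype. A small sketch of both follows; the names scores and mixed and their values are made up for illustration.

```python
import pandas as pd

# A Series with custom labels instead of the default 0, 1, 2, ...
scores = pd.Series([85, 90, 88], index=['alice', 'bob', 'charlie'])

by_label = scores['bob']          # 90, looked up by label
by_position = scores.iloc[0]      # 85, looked up by position

# Mixing unrelated types falls back to the generic object dtype
mixed = pd.Series([1, 'two', 3.0])
mixed_dtype = str(mixed.dtype)    # 'object'

# Integers mixed with floats are upcast to float64
upcast_dtype = str(pd.Series([1, 2.5]).dtype)   # 'float64'
```
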

Some common tasks you can perform with Pandas:

• Data Loading: Pandas can read data from various sources, including CSV files, Excel
spreadsheets, SQL databases, and more.

• Data Cleaning: You can clean and preprocess data by handling missing values, removing
duplicates, and transforming data types.

• Data Selection: Easily select specific rows and columns of interest using various indexing
techniques.

• Data Aggregation: Perform groupby operations, calculate statistics, and aggregate data
based on specific criteria.

• Data Visualization: You can use Pandas in conjunction with visualization libraries like
Matplotlib and Seaborn to create informative plots and charts.

DataFrame

A DataFrame in Python typically refers to a two-dimensional, size-mutable, and potentially
heterogeneous tabular data structure provided by the popular library called Pandas. It is a
fundamental data structure for data manipulation and analysis in Python.

Here's how you can work with DataFrames in Python using Pandas:

1. Import Pandas:
First, you need to import the Pandas library.

import pandas as pd

2. Creating a DataFrame:
You can create a DataFrame in several ways. Here
are a few common methods:

• From a dictionary:

data = {'Column1': [value1, value2, ...],
        'Column2': [value1, value2, ...]}
df = pd.DataFrame(data)

• From a list of lists:

data = [[value1, value2],

[value3, value4]]
df = pd.DataFrame(data, columns=['Column1',
'Column2'])

• From a CSV file:

df = pd.read_csv('file.csv')

3. Viewing Data:
You can use various methods to view and explore your DataFrame:

df.head(): Displays the first few rows of the DataFrame.

df.tail(): Displays the last few rows of the DataFrame.
df.shape: Returns the number of rows and columns.
df.columns: Returns the column names.
df.info(): Provides information about the DataFrame, including data types and non-null counts.
4. Selecting Data:
You can select specific columns or rows from a DataFrame using indexing or filtering. For
example:
df['Column1'] # Select a specific column
df[['Column1', 'Column2']] # Select multiple columns
df[df['Column1'] > 5] # Filter rows based on a condition
5. Modifying Data:
You can modify the DataFrame by adding or modifying columns, updating values, or appending
rows. For example:
df['NewColumn'] = [new_value1, new_value2, ...] # Add a new column
df.at[index, 'Column1'] = new_value # Update a specific value
new_row = pd.DataFrame([{'Column1': value1, 'Column2': value2}])
df = pd.concat([df, new_row], ignore_index=True) # Append a new row (DataFrame.append() was removed in pandas 2.0)

6. Data Analysis:
Pandas provides various functions for data
analysis, such as describe(), groupby(), agg(), and
more.

7. Saving Data:
You can save the DataFrame to a CSV file or other formats:

df.to_csv('output.csv', index=False)

INDEX OBJECTS-INDEXING, SELECTION, AND FILTERING

In Pandas, the Index object is a fundamental component of both Series and
DataFrame data structures. It provides the labels or names for the rows or columns of
your data. You can use indexing, selection, and filtering techniques with these indexes
to access specific data points or subsets of your data. Here's how you can work with
index objects in Pandas:
1. Indexing:
Indexing allows you to access specific elements or rows in your data using labels. You can
use .loc[] for label-based indexing and .iloc[] for integer-based indexing.

• Label-based indexing:

df.loc['label'] # Access a specific row by its label

df.loc['label', 'column_name'] # Access a specific element by label and column name

• Integer-based indexing:

df.iloc[0] # Access the first row

df.iloc[0, 1] # Access an element by row and column index
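
To make the difference between .loc[] and .iloc[] concrete, here is a minimal sketch on a small DataFrame. The names, ages, and scores are sample values chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'Age': [24, 27, 22], 'Score': [85, 90, 88]},
    index=['Alice', 'Bob', 'Charlie'],
)

# Label-based access with .loc
value_by_label = df.loc['Bob', 'Score']    # 90

# Position-based access with .iloc
value_by_pos = df.iloc[0, 1]               # 85 (first row, second column)

# Label slices include the end label; position slices exclude the end position
label_slice = df.loc['Alice':'Bob']        # 2 rows
pos_slice = df.iloc[0:1]                   # 1 row
```
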

2. Selection:
You can use various methods to select specific data based on conditions or criteria.

• Select rows based on a condition:

df[df['Column'] > 5] # Select rows where 'Column' is greater than 5

• Select rows by multiple conditions:

df[(df['Column1'] > 5) & (df['Column2'] < 10)] # Rows where 'Column1' > 5 and 'Column2' < 10

3. Filtering:
Filtering allows you to create a boolean mask based on a
condition and then apply that mask to your DataFrame to
select rows meeting the condition.

Create a boolean mask:

condition = df['Column'] > 5

Apply the mask to the DataFrame:

filtered_df = df[condition]
A boolean mask is like a checklist that goes through each row in your DataFrame and
marks whether it meets the condition (True) or not (False).
Boolean Mask Example:

Name     Age  Score  Meets Condition? (Age > 25)
Alice    24   85     False
Bob      27   90     True
Charlie  22   88     False
David    32   95     True
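
The table above can be reproduced in code. This sketch uses the same four rows; the second mask on Score is an extra condition added only to show how masks combine.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'Score': [85, 90, 88, 95],
})

# One True/False entry per row
mask = df['Age'] > 25
print(mask.tolist())              # [False, True, False, True]

# Keep only the rows where the mask is True
filtered_df = df[mask]
print(filtered_df['Name'].tolist())   # ['Bob', 'David']

# Masks combine with & (and) and | (or); each condition needs parentheses
both = df[(df['Age'] > 25) & (df['Score'] > 90)]
```
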

4. Setting a New Index:

You can set a specific column as the index of your DataFrame using the .set_index() method.

df.set_index('Column_Name', inplace=True)

5. Resetting the Index:
If you've set a column as the index and want to revert to the default integer-based index, you
can use the .reset_index() method.

df.reset_index(inplace=True)

6. Multi-level Indexing:
You can create DataFrames with multi-level indexes, allowing you to work with more complex
hierarchical data structures.

df.set_index(['Index1', 'Index2'], inplace=True)

Index objects in Pandas are versatile and powerful for working with data because
they enable you to access and manipulate your data in various ways, whether it's for
data retrieval, filtering, or restructuring.
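
A short sketch of setting, resetting, and multi-level indexing on one small table follows. The column names and values are illustrative assumptions, and the calls return new DataFrames here instead of using inplace=True.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['Delhi', 'Pune', 'Pune'],
    'Year': [2022, 2022, 2023],
    'Sales': [7, 10, 12],
})

# Use one column as the row labels
indexed = df.set_index('City')

# Move the index back into an ordinary column
restored = indexed.reset_index()

# Two columns together form a multi-level index
multi = df.set_index(['City', 'Year'])
pune_2023 = multi.loc[('Pune', 2023), 'Sales']   # 12
```
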

ARITHMETIC AND DATA ALIGNMENT IN PANDAS
Arithmetic and data alignment in Pandas refer to how mathematical operations are performed
between Series and DataFrames when they have different shapes or indices. Pandas
automatically aligns data based on the labels of the objects involved in the operation, which
ensures that the result of the operation maintains data integrity and is aligned correctly. Here are
some key aspects of arithmetic and data alignment in Pandas:

1. Automatic Alignment:
When you perform mathematical operations (e.g., addition, subtraction, multiplication, division)
between two Series or DataFrames, Pandas aligns the data based on their labels (index or column
names). It aligns the data based on common labels and performs the operation only on matching
labels.

series1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])

series2 = pd.Series([4, 5, 6], index=['B', 'C', 'D'])

result = series1 + series2

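In this example, the result Series will have NaN values for the 'A' and 'D' labels because those labels
appear in only one of the two Series. The matching labels are added: 'B' gives 6.0 and 'C' gives 8.0.
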
2. Missing Data (NaN):
When labels don't match, Pandas fills in the result with NaN (Not-a-Number) to indicate missing
values.
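
The alignment behaviour can be seen by printing the result. This sketch reuses the two Series from above and also shows Series.add() with fill_value, one way to treat a missing label as 0 instead of producing NaN.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
series2 = pd.Series([4, 5, 6], index=['B', 'C', 'D'])

# Labels present in only one Series produce NaN
result = series1 + series2
print(result)
# A    NaN
# B    6.0
# C    8.0
# D    NaN

# fill_value substitutes 0 for the side that lacks a label
filled = series1.add(series2, fill_value=0)
print(filled.tolist())    # [1.0, 6.0, 8.0, 6.0]
```
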

3. DataFrame Alignment:
The same principles apply to DataFrames when performing operations between them. The
alignment occurs both for rows (based on the index) and columns (based on column names).

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y'])

df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]}, index=['Y', 'Z'])

result = df1 + df2

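In this case, result will have NaN values in columns 'A' and 'C' because those columns don't exist in
both df1 and df2. Rows 'X' and 'Z' are also NaN because each exists in only one DataFrame, so the
only non-missing value is 9.0 at row 'Y', column 'B'.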
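
A sketch that inspects the aligned DataFrame follows, using the same df1 and df2. The DataFrame.add() call with fill_value is an additional illustration and not part of the example above.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y'])
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]}, index=['Y', 'Z'])

result = df1 + df2

# The result covers the union of row and column labels
print(list(result.index))      # ['X', 'Y', 'Z']
print(list(result.columns))    # ['A', 'B', 'C']

# Only the cell present in both inputs has a value
overlap = result.loc['Y', 'B']             # 9.0
non_missing = int(result.notna().sum().sum())   # 1

# With fill_value, a cell stays NaN only if it is missing from both inputs
filled = df1.add(df2, fill_value=0)
still_missing = int(filled.isna().sum().sum())  # 2: ('X', 'C') and ('Z', 'A')
```
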

4. Handling Missing Data:

You can use methods like .fillna() to replace NaN values with a specific value or use .dropna() to
remove rows or columns with missing data.

result_filled = result.fillna(0) # Replace NaN with 0

result_dropped = result.dropna() # Remove rows containing NaN values (use axis=1 to remove columns instead)
5. Alignment with Broadcasting:
Pandas allows you to perform operations between a Series and a scalar value, and it broadcasts
the scalar to match the shape of the Series.

series = pd.Series([1, 2, 3])

scalar = 2

result = series * scalar

In this example, result will be a Series with values [2, 4, 6].

Automatic alignment in Pandas is a powerful feature that simplifies data manipulation and
allows you to work with datasets of different shapes without needing to manually align them. It
ensures that operations are performed in a way that maintains the integrity and structure of
your data.

ARITHMETIC AND DATA ALIGNMENT IN NUMPY
NumPy does not align data by labels the way Pandas does. Arrays have no index, so arithmetic
is purely positional. NumPy is primarily focused on numerical computations with
homogeneous arrays (arrays of a single data type). Here's how arithmetic works in NumPy
when the shapes of the arrays differ:

Shape Matching (Broadcasting):
NumPy arrays perform element-wise operations. If you perform an operation between two
NumPy arrays of different but compatible shapes, NumPy broadcasts the smaller array so that
it behaves as if it had the shape of the larger one. If the shapes are not compatible, the
operation raises a ValueError.

import numpy as np

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5])

result = arr1 + arr2 # Raises ValueError: shapes (3,) and (2,) cannot be broadcast together

Broadcasting Rules:
NumPy follows specific rules when broadcasting arrays:

If the arrays have a different number of dimensions, pad the smaller shape with ones on the left
side.
Compare the shapes element-wise, starting from the right. If dimensions are equal or one of them
is 1, they are compatible.
If the dimensions are incompatible, NumPy raises a "ValueError: operands could not be broadcast
together" error.
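
The rules above can be traced on small arrays. The following is a sketch with made-up values; the names matrix, row, and col are illustrative.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])            # shape (3,), padded on the left to (1, 3)
col = np.array([[100], [200]])          # shape (2, 1)

# (2, 3) with (1, 3): the row is reused for every row of the matrix
plus_row = matrix + row
print(plus_row.tolist())    # [[11, 22, 33], [14, 25, 36]]

# (2, 3) with (2, 1): the column is reused for every column of the matrix
plus_col = matrix + col
print(plus_col.tolist())    # [[101, 102, 103], [204, 205, 206]]

# (2, 3) with (2,): trailing dimensions 3 and 2 differ and neither is 1
try:
    matrix + np.array([1, 2])
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
```
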

Handling Missing Data:

Unlike Pandas, NumPy never inserts NaN for unmatched positions, because arrays carry no labels
to align. If you perform operations between arrays with mismatched shapes, NumPy will either
broadcast or raise an error, depending on whether broadcasting is possible. (NaN itself does
exist in NumPy as the floating-point value np.nan.)

Element-Wise Operations:
NumPy performs arithmetic operations element-wise by default. This means that each element in
the resulting array is the result of applying the operation to the corresponding elements in the
input arrays.
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
APPLYING FUNCTIONS AND MAPPING

In NumPy, you can apply functions and perform element-wise operations on arrays using various
techniques, including built-in vectorized functions, np.apply_along_axis(), and np.vectorize(),
which can also be used for mapping operations. Here's an overview of these approaches:

Vectorized Functions:
NumPy is designed to work efficiently with vectorized operations, meaning you can apply
functions to entire arrays or elements of arrays without the need for explicit loops. NumPy
provides built-in functions that can be applied element-wise to arrays.

import numpy as np

arr = np.array([1, 2, 3, 4])

# Applying a function element-wise

result = np.square(arr) # Square each element

In this example, the np.square() function is applied element-wise to the arr array.
np.apply_along_axis():
You can use the np.apply_along_axis() function to apply a function along a specified axis of a
multi-dimensional array. This is useful when you want to apply a function to each row or column
of a 2D array.

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Apply a function along the rows (axis=1)

def sum_of_row(row):
    return np.sum(row)

result = np.apply_along_axis(sum_of_row, axis=1, arr=arr)

In this example, sum_of_row is applied to each row along axis=1, resulting in a new 1D array.
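
For simple reductions such as a row sum, the axis argument of the built-in function gives the same answer without np.apply_along_axis(). The sketch below compares the two; value_range is a hypothetical helper used to show a case where a custom function is applied per row.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Same result two ways
via_apply = np.apply_along_axis(np.sum, 1, arr)
via_axis = arr.sum(axis=1)
print(via_apply.tolist(), via_axis.tolist())   # [6, 15] [6, 15]

# A custom per-row function (hypothetical helper)
def value_range(row):
    return row.max() - row.min()

ranges = np.apply_along_axis(value_range, 1, arr)
print(ranges.tolist())                          # [2, 2]
```
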

np.vectorize():
The np.vectorize() function allows you to create a vectorized version of a Python function, which
can then be applied element-wise to NumPy arrays.

import numpy as np

arr = np.array([1, 2, 3, 4])

# Define a Python function

def my_function(x):
    return x * 2

# Create a vectorized version of the function

vectorized_func = np.vectorize(my_function)

# Apply the vectorized function to the array

result = vectorized_func(arr)
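This approach is useful when you have a custom function that you want to apply to an array.
Note that np.vectorize() is provided mainly for convenience: it calls the Python function once
per element, so it is usually slower than built-in NumPy operations.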
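
As a sketch of when np.vectorize() helps, consider a function with an if/else branch, which cannot be applied to a whole array directly. The function clip_negative is a hypothetical example; np.where() and np.maximum() are shown as built-in alternatives that give the same result.

```python
import numpy as np

arr = np.array([-2, -1, 0, 1, 2])

# A scalar function with a branch (hypothetical example)
def clip_negative(x):
    return x if x > 0 else 0

vectorized_clip = np.vectorize(clip_negative)
via_vectorize = vectorized_clip(arr)

# Built-in equivalents that avoid the per-element Python call
via_where = np.where(arr > 0, arr, 0)
via_maximum = np.maximum(arr, 0)

print(via_vectorize.tolist())   # [0, 0, 0, 1, 2]
```
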
Mapping with np.vectorize():
You can use np.vectorize() to map a function to each element of an array.

import numpy as np

arr = np.array([1, 2, 3, 4])

# Define a Python function

def my_function(x):
    return x * 2

# Create a vectorized version of the function

vectorized_func = np.vectorize(my_function)

# Map the function to each element

result = vectorized_func(arr)
This approach is similar to applying a function element-wise but can be used for more
complex mapping operations.
These methods allow you to apply functions and perform mapping operations
efficiently on NumPy arrays, making it a powerful library for numerical and scientific
computing tasks.
SORTING AND RANKING

Sorting and ranking are common data manipulation operations in data analysis and are widely
supported in Python through libraries like NumPy and Pandas. These operations help organize
data in a desired order or rank elements based on specific criteria. Here's how to perform
sorting and ranking in both libraries:

Sorting in NumPy:
In NumPy, you can sort NumPy arrays using the np.sort() and np.argsort() functions.

np.sort(): This function returns a new sorted array without modifying the original array.

import numpy as np

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])

sorted_arr = np.sort(arr)

np.argsort(): This function returns the indices that would sort the array. You can use these
indices to sort the original array.

import numpy as np

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])

indices = np.argsort(arr)
sorted_arr = arr[indices]
Sorting in Pandas:
In Pandas, you can sort Series and DataFrames using the sort_values() method. You can specify
the column(s) to sort by and the sorting order.

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],

'Age': [25, 30, 22, 35]}

df = pd.DataFrame(data)

# Sort by 'Age' column in ascending order

sorted_df = df.sort_values(by='Age', ascending=True)
Ranking in NumPy:

NumPy doesn't have a built-in ranking function, but you can use np.argsort() to get the ranking
of elements. You can then use these rankings to create a ranked array.

import numpy as np

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])

indices = np.argsort(arr)
ranked_arr = np.argsort(indices) + 1 # Add 1 to start ranking from 1 instead of 0
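
A worked sketch of the argsort-based ranking on a shorter array makes the tie behaviour visible. The kind='stable' argument is an addition here so that tied values are ranked in order of appearance; tied values still receive different ranks with this approach.

```python
import numpy as np

arr = np.array([3, 1, 4, 1, 5])

# Indices that would sort the array; a stable sort keeps ties in original order
order = np.argsort(arr, kind='stable')
print(order.tolist())      # [1, 3, 0, 2, 4]

# Place rank 1..n at the position each element came from
ranks = np.empty_like(order)
ranks[order] = np.arange(1, arr.size + 1)
print(ranks.tolist())      # [3, 1, 4, 2, 5]

# The double-argsort form gives the same ranks
double_argsort = np.argsort(order, kind='stable') + 1
```

The two 1s receive ranks 1 and 2 here, whereas the Pandas rank() method can assign tied values a shared rank.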
Ranking in Pandas:
In Pandas, you can rank data using the rank() method. You can specify the sorting order and
how to handle ties (e.g., assigning the average rank to tied values).

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],

'Age': [25, 30, 22, 30]}

df = pd.DataFrame(data)

# Rank by 'Age' column in descending order and assign average rank to tied values
df['Rank'] = df['Age'].rank(ascending=False, method='average')
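
To show how the method argument changes the treatment of ties, this sketch ranks the same ages in descending order with three different methods. Only 'average' appears in the example above; 'min' and 'dense' are shown for comparison.

```python
import pandas as pd

ages = pd.Series([25, 30, 22, 30], index=['Alice', 'Bob', 'Charlie', 'David'])

# Bob and David tie for first place
average_rank = ages.rank(ascending=False, method='average')
min_rank = ages.rank(ascending=False, method='min')
dense_rank = ages.rank(ascending=False, method='dense')

print(average_rank.tolist())   # [3.0, 1.5, 4.0, 1.5]
print(min_rank.tolist())       # [3.0, 1.0, 4.0, 1.0]
print(dense_rank.tolist())     # [2.0, 1.0, 3.0, 1.0]
```
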
SUMMARIZING AND COMPUTING DESCRIPTIVE STATISTICS

1. Summary Statistics:
NumPy provides functions to compute summary statistics directly on arrays.

import numpy as np

data = np.array([25, 30, 22, 35, 28])

mean = np.mean(data)
median = np.median(data)
std_dev = np.std(data)
variance = np.var(data)
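
For reference, a sketch of the values these calls produce on the same data. It also shows the ddof=1 option, which switches np.var() and np.std() from the default population formula to the sample formula; the variable names are illustrative.

```python
import numpy as np

data = np.array([25, 30, 22, 35, 28])

mean = np.mean(data)                     # 28.0
median = np.median(data)                 # 28.0

# Default (ddof=0): divide by n
population_var = np.var(data)            # 98 / 5 = 19.6
# ddof=1: divide by n - 1
sample_var = np.var(data, ddof=1)        # 98 / 4 = 24.5

population_std = np.std(data)            # square root of 19.6
sample_std = np.std(data, ddof=1)        # square root of 24.5
```
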

2. Percentiles and Quartiles:
You can compute specific percentiles and quartiles using the np.percentile() function.

percentile_25 = np.percentile(data, 25)

percentile_75 = np.percentile(data, 75)
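
A sketch of the quartiles for the five-value array used in the summary statistics example, assuming NumPy's default linear interpolation. The interquartile range is added as an illustration.

```python
import numpy as np

data = np.array([25, 30, 22, 35, 28])    # sorted: 22, 25, 28, 30, 35

# Several percentiles can be requested in one call
q1, q2, q3 = np.percentile(data, [25, 50, 75])
print(q1, q2, q3)        # 25.0 28.0 30.0

# Interquartile range: spread of the middle half of the data
iqr = q3 - q1            # 5.0
```
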

3. Correlation and Covariance:

You can compute correlation and covariance between arrays using np.corrcoef() and np.cov().

correlation_matrix = np.corrcoef(data1, data2)

covariance_matrix = np.cov(data1, data2)

CORRELATION AND COVARIANCE

In NumPy, you can compute correlation and covariance between arrays using the np.corrcoef()
and np.cov() functions, respectively. These functions are useful for analyzing relationships and
dependencies between variables. Here's how to use them:

Computing Correlation Coefficient (Correlation):

The correlation coefficient measures the strength and direction of a linear relationship between
two variables. It ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation),
with 0 indicating no linear correlation.

import numpy as np

# Create two arrays representing variables

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])
# Compute the correlation coefficient between x and y
correlation_matrix = np.corrcoef(x, y)

# The correlation coefficient is in the (0, 1) element of the matrix

correlation_coefficient = correlation_matrix[0, 1]
In this example, correlation_coefficient will contain the Pearson correlation coefficient between
x and y.

Computing Covariance:
Covariance measures the degree to which two variables change together. Positive values
indicate a positive relationship (both variables increase or decrease together), while negative
values indicate an inverse relationship (one variable increases as the other decreases).

import numpy as np

# Create two arrays representing variables

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])

# Compute the covariance between x and y

covariance_matrix = np.cov(x, y)

# The covariance is in the (0, 1) element of the matrix

covariance = covariance_matrix[0, 1]
In this example, covariance will contain the covariance between x and y.

Both np.corrcoef() and np.cov() can accept multiple arrays as input, allowing you to compute
correlations and covariances for multiple variables simultaneously. For example, if you have a
dataset with multiple columns, you can compute the correlation matrix or covariance matrix for
all pairs of variables.
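
A sketch with three variables follows. The third array z is an assumption added to show a negative relationship. By default both functions treat each row as one variable; rowvar=False tells them that variables are in columns instead.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])
z = np.array([5, 4, 3, 2, 1])     # decreases as x increases

# Each row is one variable
stacked = np.vstack([x, y, z])    # shape (3, 5)

corr = np.corrcoef(stacked)       # 3x3 correlation matrix
cov = np.cov(stacked)             # 3x3 covariance matrix (sample, ddof=1)

print(corr.shape)                 # (3, 3)

# Same data with variables in columns, as in a typical table
table = stacked.T                 # shape (5, 3)
corr_from_columns = np.corrcoef(table, rowvar=False)
```

Here corr[0, 1] is close to 1 (x and y rise together), corr[0, 2] is close to -1, and cov[0, 2] is -2.5.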
HANDLING MISSING DATA

Handling missing data in NumPy is an important aspect of data analysis and manipulation.
NumPy provides several ways to work with missing or undefined values, typically represented
as NaN (Not-a-Number). Here are some common techniques for handling missing data in
NumPy:

Using np.nan: NumPy represents missing data using np.nan. You can create arrays with missing
values like this:

import numpy as np

arr = np.array([1.0, 2.0, np.nan, 4.0])

Now, arr contains a missing value represented as np.nan.

Checking for Missing Data: You can check for missing values using the np.isnan() function. For
example:

np.isnan(arr) # Returns a boolean array indicating which elements are NaN.

Filtering Missing Data: To filter out missing values from an array, you can use boolean indexing.
For example:

arr[~np.isnan(arr)] # Returns an array without NaN values.

Replacing Missing Data: You can replace missing values with a specific value using
np.nan_to_num() or boolean indexing. For example:

arr[np.isnan(arr)] = 0 # Replace NaN with 0

Or, to replace NaN with the mean of the non-missing values:

mean = np.nanmean(arr)
arr[np.isnan(arr)] = mean

Ignoring Missing Data: Sometimes, you may want to perform operations while ignoring missing
values. You can use functions like np.nanmax(), np.nanmin(), np.nansum(), etc., which ignore
NaN values when computing the result.

Interpolation: If you have a time series or ordered data, you can use interpolation methods to
fill missing values. NumPy provides functions like np.interp() for this purpose.

Masked Arrays: NumPy also supports masked arrays (numpy.ma) that allow you to work with
missing data more explicitly by creating a mask that specifies which values are missing. This
can be useful for certain computations.

Handling Missing Data in Multidimensional Arrays: If you're working with multidimensional
arrays, you can apply the above techniques along a specific axis. np.isnan() itself is
element-wise and takes no axis parameter, so combine its result with a reduction such as
np.any(..., axis=0), or use the NaN-aware functions (e.g., np.nanmean()) with their axis parameter.
Keep in mind that the specific method you choose to handle missing data depends
on your data analysis goals and the context of your data. Some methods may be
more appropriate than others, depending on your use case.
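
The techniques above (NaN-aware functions, interpolation, masked arrays, and per-axis handling) can be sketched together on small arrays. All values and variable names here are illustrative.

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

# Ordinary reductions propagate NaN; the nan* variants skip it
plain_sum = np.sum(arr)          # nan
safe_sum = np.nansum(arr)        # 9.0
safe_mean = np.nanmean(arr)      # 3.0

# Linear interpolation over positions with np.interp
positions = np.arange(arr.size)
missing = np.isnan(arr)
filled = arr.copy()
filled[missing] = np.interp(positions[missing], positions[~missing], arr[~missing])
print(filled.tolist())           # [1.0, 2.0, 3.0, 4.0, 5.0]

# Masked array: NaN entries are masked and excluded from computations
masked = np.ma.masked_invalid(arr)
masked_mean = masked.mean()      # 3.0

# Two dimensions: reduce the boolean result of np.isnan along an axis
grid = np.array([[1.0, np.nan],
                 [3.0, 4.0]])
rows_with_nan = np.isnan(grid).any(axis=1)   # [ True False]
col_means = np.nanmean(grid, axis=0)         # [2. 4.]
```
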

HIERARCHICAL INDEXING
Hierarchical indexing, often referred to as "MultiIndexing", lets a single axis carry several
levels of labels. NumPy has no MultiIndex class; the feature comes from Pandas, which is built
on top of NumPy arrays. It is particularly useful when you want to represent higher-dimensional
data with more complex hierarchical structures in a two-dimensional table.

You can create a MultiIndex using the pd.MultiIndex class. Here's a basic example:

import numpy as np
import pandas as pd

# Create a MultiIndex with three levels from parallel arrays

arrays = [['A', 'A', 'B', 'B'], [1, 2, 1, 2], ['X', 'Y', 'X', 'Y']]
multi_index = pd.MultiIndex.from_arrays(arrays)

# Create a random data array

data = np.random.rand(4, 3)

# Create a DataFrame with MultiIndex (column names other than 'Value1' are assumed)

df = pd.DataFrame(data, index=multi_index, columns=['Value1', 'Value2', 'Value3'])

In this example, we've created a MultiIndex with three levels: 'A' and 'B' as the first level, 1 and
2 as the second level, and 'X' and 'Y' as the third level. Then, we've created a DataFrame with
this MultiIndex and some random data.

You can access data from this DataFrame using hierarchical indexing. For example:

# Accessing data using hierarchical indexing

value_A1_X = df.loc[('A', 1, 'X')]['Value1'] # Access Value1 for 'A', 1, 'X'

Some common operations with hierarchical indexing include:

Slicing: You can perform slices at each level of the index, allowing you to select specific subsets
of the data.

Stacking and Unstacking: You can stack or unstack levels to convert between a wide and long
format, which can be useful for different types of analyses.

Swapping Levels: You can swap levels to change the order of the levels in the index.

Grouping and Aggregating: You can group data based on levels of the index and perform
aggregation functions like mean, sum, etc.

Reordering Levels: You can change the order of levels in the index.

Resetting Index: You can reset the index to move the hierarchical index levels back to columns.
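
The operations listed above can be sketched with pandas as follows. The level names 'group' and 'number', the column names, and the data are hypothetical choices for this illustration.

```python
import numpy as np
import pandas as pd

# Two-level index; the level names are hypothetical
index = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B', 'B'], [1, 2, 1, 2]],
    names=['group', 'number'],
)
df = pd.DataFrame(
    np.arange(8.0).reshape(4, 2),   # rows: [0,1], [2,3], [4,5], [6,7]
    index=index,
    columns=['Value1', 'Value2'],
)

# Slicing: select by the first level, or by a value of the second level
group_a = df.loc['A']                      # rows (A,1) and (A,2)
number_one = df.xs(1, level='number')      # rows (A,1) and (B,1)

# Unstacking: move the 'number' level into the columns (long -> wide)
wide = df['Value1'].unstack('number')

# Swapping and reordering levels
swapped = df.swaplevel('group', 'number')
reordered = df.reorder_levels(['number', 'group'])

# Grouping and aggregating on an index level
group_means = df.groupby(level='group').mean()

# Resetting the index: levels become ordinary columns
flat = df.reset_index()

print(wide)
print(group_means)
```

Calling stack() on the wide result would move the 'number' labels back from the columns into the index.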

Hierarchical indexing is especially valuable when dealing with multi-dimensional
data, such as panel data or data with multiple categorical variables. It allows for
more expressive data organization and manipulation. This functionality comes from
the pd.MultiIndex class in the pandas library, which provides various methods for
creating and manipulating MultiIndex objects.
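
As a brief sketch of the creation methods, the same two-level index can be built in several ways. The level names 'letter' and 'number' and the sample values are assumptions made for this example.

```python
import pandas as pd

names = ['letter', 'number']   # hypothetical level names

# Every combination of the given level values
from_product = pd.MultiIndex.from_product([['A', 'B'], [1, 2]], names=names)

# One tuple per row
from_tuples = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1), ('B', 2)], names=names
)

# One array per level
from_arrays = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B', 'B'], [1, 2, 1, 2]], names=names
)

# All three describe the same index
same = from_product.equals(from_tuples) and from_tuples.equals(from_arrays)

# Attach the index to a Series and look up one entry
series = pd.Series([10, 20, 30, 40], index=from_product)
value_b1 = series.loc[('B', 1)]   # 30

print(same, value_b1)
```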
