Week - 5 Pandas essentials
1. Series:
Example:
import pandas as pd
s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
print(s)
Output:
a 1
b 2
c 3
d 4
dtype: int64
Pandas functions 1
2. DataFrame:
Example:
import pandas as pd
data = {'Column1': [1, 2, 3, 4], 'Column2': ['A', 'B', 'C', 'D']}
df = pd.DataFrame(data)
print(df)
Output:
Column1 Column2
0 1 A
1 2 B
2 3 C
3 4 D
In summary:
Correlation and Covariance are both measures used in statistics and data
analysis to describe the relationship between two variables. However, they differ
in their interpretation, scale, and how they measure this relationship:
1. Covariance:
Definition: Covariance measures the direction of the linear relationship
between two variables. It tells us whether two variables tend to increase or
decrease together.
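Interpretation:
If the covariance is positive, when one variable increases, the other tends
to increase as well.
If the covariance is negative, when one variable increases, the other tends
to decrease.
The magnitude of the covariance depends on the units of the two variables, so
on its own it does not indicate how strong the relationship is.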
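The sign interpretation above can be checked with a small sketch. The column names and values below are hypothetical, made up only for illustration; `Series.cov` and `DataFrame.cov` are the pandas calls assumed here, and both compute the sample covariance.

```python
import pandas as pd

# Hypothetical data, invented only for this illustration
df_cov = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score": [52, 58, 65, 70, 80],
    "hours_gaming": [9, 7, 6, 4, 2],
})

# Both variables rise together, so the covariance is positive
cov_pos = df_cov["hours_studied"].cov(df_cov["exam_score"])

# One rises while the other falls, so the covariance is negative
cov_neg = df_cov["hours_studied"].cov(df_cov["hours_gaming"])

print(cov_pos)
print(cov_neg)

# Pairwise covariance matrix for all numeric columns
print(df_cov.cov())
```

Only the sign is directly interpretable here; the size of each number depends on the units of the columns.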
2. Correlation:
Definition: Correlation measures both the strength and the direction of the
linear relationship between two variables, but it is normalized and unit-free,
making it easier to interpret and compare across different datasets.
Formula:
$$\mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
where $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y.
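As a quick check, correlation defined as covariance divided by the product of the two standard deviations can be compared against pandas' built-in `Series.corr` (Pearson by default). The numbers are hypothetical, and this is only a sketch of the relationship, not a replacement for the built-in method.

```python
import pandas as pd

# Hypothetical values, chosen only for illustration
x = pd.Series([1, 2, 3, 4, 5])
y = pd.Series([2, 4, 5, 4, 5])

# Covariance divided by the product of the standard deviations.
# Series.cov and Series.std both use the sample (n - 1) convention by default.
manual_corr = x.cov(y) / (x.std() * y.std())

# Built-in Pearson correlation
builtin_corr = x.corr(y)

print(manual_corr)
print(builtin_corr)
```

The two printed values should agree up to floating-point rounding.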
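Interpretation:
The correlation coefficient r always lies between -1 and 1.
r = 1 indicates a perfect positive linear relationship.
r = -1 indicates a perfect negative linear relationship.
r = 0 indicates no linear relationship.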
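Key Differences:
Feature     Covariance                               Correlation
Measures    Direction of the linear relationship     Direction and strength of the linear relationship
Units       Depends on the units of both variables   Unit-free (normalized)
Range       Unbounded                                -1 to 1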
In summary:
Correlation tells you both the direction and strength of the linear relationship in
a more interpretable and standardized form.
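To illustrate why the standardized form is easier to compare, the sketch below rescales one variable (meters to centimeters) and recomputes both measures. The height and weight values are hypothetical.

```python
import pandas as pd

# Hypothetical measurements, invented only for this illustration
height_m = pd.Series([1.60, 1.70, 1.75, 1.80, 1.90])
weight_kg = pd.Series([55.0, 65.0, 70.0, 72.0, 85.0])

# Same heights expressed in centimeters
height_cm = height_m * 100

# Covariance changes with the unit of measurement
cov_m = height_m.cov(weight_kg)
cov_cm = height_cm.cov(weight_kg)

# Correlation is unaffected by the change of unit
corr_m = height_m.corr(weight_kg)
corr_cm = height_cm.corr(weight_kg)

print(cov_m, cov_cm)
print(corr_m, corr_cm)
```

The covariance grows by a factor of 100 after the unit change, while the correlation stays the same up to rounding.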
In pandas, .loc and .iloc are used to select rows and columns from a DataFrame,
but they differ in how they index the data:
1. .loc:
Behavior:
It allows for label-based indexing, which means you can select rows and
columns based on their explicit labels (index names or column names).
It supports slicing and selecting specific rows and columns by their labels.
Unlike Python slicing, .loc includes the ending label when slicing ranges.
Example:
import pandas as pd
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data, index=['row1', 'row2', 'row3'])
# Using .loc to select by labels
df_loc = df.loc['row1', 'A']  # Selects the value at row 'row1', column 'A'
print(df_loc)
Output:
1
df.loc['row2']  # Selects all columns of row2
df.loc['row1':'row2', ['A', 'B']]  # Slices rows 'row1' through 'row2' (inclusive) and columns 'A' and 'B'
2. .iloc:
Behavior:
It allows for position-based indexing, which means you select rows and
columns based on their integer index positions, regardless of the actual
labels.
Like Python slicing, .iloc excludes the ending position when slicing
ranges.
Example:
# Using .iloc to select by positions
df_iloc = df.iloc[0, 0]  # Selects the value at the 0th row and 0th column (first row, first column)
print(df_iloc)
Output:
1
df.iloc[1]  # Selects all columns of the second row (index 1)
df.iloc[0:2, 0:2]  # Slices rows at positions 0 and 1 and columns at positions 0 and 1 (position 2 is excluded)
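The difference in slice endpoints is easy to miss, so here is a minimal sketch that rebuilds the same example DataFrame and compares the two: a `.loc` slice includes its ending label, while an `.iloc` slice excludes its ending position.

```python
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data, index=['row1', 'row2', 'row3'])

# Label slice: the end label 'row2' is included, so two rows come back
by_label = df.loc['row1':'row2']

# Position slice: the end position 1 is excluded, so one row comes back
by_position = df.iloc[0:1]

# To get the same two rows by position, the end must be 2
by_position_two = df.iloc[0:2]

print(len(by_label))
print(len(by_position))
print(by_label.equals(by_position_two))
```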
Key Differences:
Feature            .loc                                .iloc
Indexing method    Label-based (index/column names)    Position-based (integer positions)
Summary:
Use .loc when you want to select rows and columns by labels.
Use .iloc when you want to select rows and columns by position.
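The distinction matters most when the index labels are themselves integers, because a label and a position can then point at different rows. The sketch below uses a hypothetical DataFrame whose integer labels run in reverse order.

```python
import pandas as pd

# Hypothetical frame whose integer labels do not match the row positions
df_int = pd.DataFrame({'value': ['x', 'y', 'z']}, index=[2, 1, 0])

# .loc looks up the row whose label is 0, which is the last row here
value_by_label = df_int.loc[0, 'value']

# .iloc looks up the row at position 0, which is the first row
value_by_position = df_int.iloc[0, 0]

print(value_by_label)     # z
print(value_by_position)  # x
```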