Python for Data Analysis - Complete Notes
1. Introduction to Python for Data Analysis
Python is widely used for data analysis because of its simple syntax and rich ecosystem of
libraries. It supports data manipulation, visualization, and machine learning tasks.
2. Key Libraries for Data Analysis
- NumPy: For numerical operations
- Pandas: For data manipulation and analysis
- Matplotlib: For basic data visualization
- Seaborn: For statistical plots
- Plotly: For interactive visualizations
- Scikit-learn: For machine learning
- Statsmodels: For statistical modeling
3. NumPy Basics
NumPy provides n-dimensional array objects and vectorized functions for fast mathematical operations.
Example:
import numpy as np
arr = np.array([1, 2, 3])
print(arr.mean()) # Output: 2.0
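As a minimal sketch of what "efficiently" means here, the same small array can be transformed element-wise and reduced without writing a Python loop. The variable names are illustrative only.

```python
import numpy as np

arr = np.array([1, 2, 3])

# Element-wise arithmetic: the operation is applied to every item at once.
doubled = arr * 2
squared = arr ** 2

# Reductions collapse the array to a single value.
total = arr.sum()
average = arr.mean()

print(doubled)          # [2 4 6]
print(squared)          # [1 4 9]
print(total, average)   # 6 2.0
```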
4. Pandas Essentials
Pandas is used to handle tabular data using DataFrames and Series.
Example:
import pandas as pd
df = pd.read_csv('[Link]')
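print(df.head())      # method name lost in extraction; head() is assumed
print(df.describe())  # method name lost in extraction; describe() is assumed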
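The sketch below shows the DataFrame/Series relationship on a tiny table. The CSV content and column names are hypothetical, and the text is read from memory so that no file on disk is needed.

```python
import io

import pandas as pd

# Hypothetical CSV content, inlined so the example needs no file on disk.
csv_text = """category,sales
A,100
B,150
A,200
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)              # (3, 2)
first_rows = df.head(2)      # first two rows, still a DataFrame
sales = df["sales"]          # selecting one column gives a Series
print(type(sales).__name__)  # Series
```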
5. Data Cleaning with Pandas
- Handling missing values: df.dropna(), df.fillna()
- Filtering data: df[df['column'] > value]
- Renaming columns: df.rename(columns={'old': 'new'})
- Changing data types: df['col'] = df['col'].astype(int)
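The cleaning steps listed above can be sketched on a small, made-up table. The data and the fill defaults are assumptions chosen for illustration; the column names 'old' and 'col' follow the snippets in the list.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "old": ["a", "b", None, "d"],
    "col": [1.0, np.nan, 3.0, 4.0],
})

dropped = df.dropna()                             # keep only complete rows
filled = df.fillna({"old": "unknown", "col": 0})  # per-column defaults

filtered = filled[filled["col"] > 2]              # boolean-mask filter
renamed = filled.rename(columns={"old": "new"})

# astype(int) raises on NaN, so convert only after filling missing values.
renamed["col"] = renamed["col"].astype(int)
```

Order matters here: filling or dropping missing values first keeps the later type conversion from failing.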
6. Data Aggregation & Grouping
Grouping helps in aggregating data:
df.groupby('column')['sales'].sum()
df.pivot_table(index='category', values='sales', aggfunc='mean')
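A small worked example may help show what the two calls return. The table below, including the 'region' column, is hypothetical.

```python
import pandas as pd

# Hypothetical sales records.
df = pd.DataFrame({
    "category": ["A", "B", "A", "B"],
    "region": ["east", "east", "west", "west"],
    "sales": [100, 150, 200, 50],
})

# One aggregate per group: A -> 300, B -> 200.
totals = df.groupby("category")["sales"].sum()

# Several aggregates at once, one column per function.
summary = df.groupby("category")["sales"].agg(["sum", "mean", "count"])

# Pivot table: one row per category, one column per region.
pivot = df.pivot_table(index="category", columns="region",
                       values="sales", aggfunc="mean")
```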
7. Data Visualization with Matplotlib & Seaborn
Matplotlib:
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [4, 5, 6])
plt.show()
Seaborn:
import seaborn as sns
sns.barplot(x='category', y='sales', data=df)  # function name lost in extraction; barplot is assumed
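For scripts that run without a display (for example on a server), a line plot can be rendered to an image instead of shown in a window. This is a minimal sketch using Matplotlib's object-oriented interface; the labels and title are placeholders.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 6], marker="o")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Simple line plot")

# Render to an in-memory PNG; fig.savefig also accepts a file path.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()
```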
8. Handling Time Series Data
Pandas supports datetime operations:
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
df['value'].resample('M').mean()  # monthly mean; pandas 2.2+ prefers the alias 'ME'
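The following sketch runs the same three steps on made-up data. It uses the 'MS' (month start) alias, an assumption made here because that alias is accepted by both older and newer pandas releases; each monthly bin is then labeled with the first day of the month.

```python
import pandas as pd

# Hypothetical observations: two in January, two in February.
df = pd.DataFrame({
    "date": ["2024-01-10", "2024-01-20", "2024-02-05", "2024-02-25"],
    "value": [10, 20, 30, 50],
})

df["date"] = pd.to_datetime(df["date"])  # strings -> datetime64
df = df.set_index("date")                # resample needs a datetime index

# Mean per calendar month: January -> 15.0, February -> 40.0.
monthly = df["value"].resample("MS").mean()
```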
9. Basic Statistics with Python
You can compute basic statistics with Pandas or NumPy:
df['column'].mean(), .median(), .std(), .var(), .value_counts()
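One detail worth checking by hand is that Pandas and NumPy use different defaults for variance and standard deviation: Pandas divides by n - 1 (sample statistics), while NumPy divides by n (population statistics). A short sketch on made-up values:

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 4, 4, 6])

mean = s.mean()      # 4.0
median = s.median()  # 4.0

# Squared deviations from the mean sum to 8.
sample_var = s.var()                    # pandas default ddof=1: 8 / 3
population_var = np.var(s.to_numpy())   # NumPy default ddof=0: 8 / 4

counts = s.value_counts()  # how often each distinct value occurs
```

Passing ddof explicitly, for example s.var(ddof=0), makes the two libraries agree.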
10. Intro to Scikit-learn
Scikit-learn is used for ML modeling:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
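The snippet above assumes X_train, y_train, and X_test already exist. A self-contained sketch, using synthetic data that follows y = 2x + 1 exactly (an assumption made purely for illustration), could look like this:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic, noise-free data: y = 2x + 1.
X = np.arange(10, dtype=float).reshape(-1, 1)  # features must be 2-D
y = 2 * X.ravel() + 1

# Hold out 30% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

slope = model.coef_[0]        # close to 2
intercept = model.intercept_  # close to 1
```

Because the data is noise-free, the fitted line recovers the slope and intercept almost exactly; real data would not.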
11. Exploratory Data Analysis (EDA)
- Understand data shape and types
- Check missing values
- Use summary methods such as df.head(), df.info(), df.describe()
- Visualize with histograms, boxplots, correlation heatmaps
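A first EDA pass along these lines might be sketched as follows. The table and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with one missing price.
df = pd.DataFrame({
    "price": [10.0, 12.0, np.nan, 15.0, 11.0],
    "units": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "B", "A"],
})

shape = df.shape              # (rows, columns)
dtypes = df.dtypes            # data type of each column
missing = df.isnull().sum()   # missing values per column
summary = df.describe()       # count, mean, std, quartiles (numeric columns)

# Correlation matrix of the numeric columns; this is the input to a heatmap.
corr = df[["price", "units"]].corr()
```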
12. Interview Tips
- Be confident in Pandas and NumPy
- Know how to clean and filter data
- Practice basic visualizations
- Understand simple ML concepts like linear regression
- Be ready to write logic for real-world scenarios