
UNIT 1 Supervised learning

Applications of Machine Learning


Machine learning (ML) has a wide range of applications in computer science and
engineering, revolutionizing how systems are designed, optimized, and automated. Below
are some key applications across different domains:

1. Artificial Intelligence and Robotics


 Autonomous Systems: Self-driving cars, drones, and robots use ML for perception,
decision-making, and control.
 Natural Language Processing (NLP): ML powers language translation, chatbots,
speech recognition, and sentiment analysis.
 Computer Vision: Used in object detection, facial recognition, scene understanding,
and surveillance systems.

2. Software Engineering
 Code Prediction and Completion: Tools like GitHub Copilot use ML to assist
developers in writing code.
 Bug Detection and Testing: ML helps predict software bugs, perform automated
testing, and ensure code quality.
 Software Maintenance: ML models predict software aging and recommend
necessary updates.

3. Cybersecurity
 Intrusion Detection Systems (IDS): Detect anomalies in network traffic using ML-
based classification.
 Malware Detection: Analyze behavior patterns to identify malicious software.
 Spam and Phishing Detection: Classify emails/messages to filter out malicious
content.

4. Data Science and Analytics


 Predictive Analytics: Forecast trends, sales, or behaviors using regression and
classification models.
 Recommendation Systems: Used by e-commerce, streaming platforms (e.g.,
Amazon, Netflix) to suggest products or content.
 Customer Segmentation: Clustering techniques group customers based on behavior
or preferences.

5. Computer Networks
 Traffic Prediction and Routing: ML models predict congestion and optimize routing
in large networks.
 QoS Optimization: Enhance quality of service by learning optimal parameter
configurations.
 Fault Detection: Identify faults in network devices through pattern analysis.

6. Embedded Systems and IoT


 Smart Devices: ML powers smart assistants, thermostats, and wearable health
monitors.
 Anomaly Detection: In IoT devices, detect abnormal sensor readings or operational
behavior.

 Energy Management: Optimize energy consumption patterns in smart homes and
grids.

7. Cloud Computing and Edge Computing


 Resource Allocation: ML predicts workload demands and allocates computational
resources accordingly.
 Latency Reduction: Learn and adapt data routing strategies to reduce latency in edge
devices.

8. Bioinformatics and Health Informatics


 Medical Diagnosis: ML helps in diagnosing diseases from images (e.g., X-rays,
MRIs) or genetic data.
 Drug Discovery: Predict how different compounds will interact with targets.
 Wearable Monitoring: Real-time health monitoring and alerts using ML on
physiological signals.

9. Image and Signal Processing


 Image Enhancement: Noise removal, super-resolution, and filtering.
 Speech Processing: Voice recognition, speaker identification, and voice synthesis.

10. Smart Infrastructure and Automation


 Smart Grids: Predict energy demand and dynamically manage power supply.
 Traffic Management: Adaptive signal control and congestion prediction using real-
time data.
 Industrial Automation: ML-driven predictive maintenance, quality control, and
robotics.

Summary Table:
Domain | Application
AI & Robotics | Vision, decision-making, automation
Software Engineering | Code generation, testing, bug prediction
Cybersecurity | Threat detection, anomaly identification
Data Science | Forecasting, clustering, recommendations
Networks | Routing, traffic prediction, fault detection
IoT | Smart devices, energy saving, anomaly detection
Cloud/Edge | Load balancing, latency optimization
Healthcare | Diagnostics, treatment recommendation
Signal Processing | Voice/image recognition, enhancement
Automation | Smart city, industrial systems

📊 Table: ML Algorithms and Their Applications in Computer Science and Engineering

Application Area | ML Task Type | Specific ML Algorithms Used | Purpose / Use Case
Network Intrusion Detection | Classification | SVM, Decision Tree, Random Forest, Naive Bayes, k-NN | Detect malicious or anomalous network behavior
Traffic Prediction | Regression / Time Series Forecasting | Linear Regression, LSTM, ARIMA, Prophet | Forecast network congestion or traffic volume
Smart Routing (SDN) | Reinforcement Learning | Q-Learning, Deep Q-Networks (DQN), PPO | Learn optimal routing paths to reduce latency
Malware Detection | Classification | Logistic Regression, Gradient Boosting (XGBoost, LightGBM), CNNs | Classify network packets as benign or malicious
Network Anomaly Detection | Unsupervised Learning | K-Means, DBSCAN, Isolation Forest, Autoencoders | Detect abnormal patterns without labeled data
QoS/QoE Optimization | Reinforcement / Regression | Deep RL, Multi-Armed Bandits, Support Vector Regression | Optimize service quality and user experience
Wireless Network Handoff | Supervised Learning | Decision Tree, Random Forest, Neural Networks | Predict best handoff point in mobile networks
Software Bug Prediction | Classification | SVM, Random Forest, AdaBoost | Predict fault-prone software modules
Code Recommendation / Completion | Sequence Modeling / NLP | RNN, Transformers (e.g., BERT, GPT), LSTM | Suggest code snippets or auto-complete code
Automated Testing | Classification / Clustering | Decision Trees, Clustering (K-Means) | Classify test cases or cluster test behaviors
Image Recognition (Computer Vision) | Classification | CNNs (e.g., ResNet, VGG), YOLO, Faster R-CNN | Detect objects, faces, or patterns in images
Speech Recognition | Sequence-to-Sequence | RNNs, LSTM, Attention, Transformers | Convert spoken words to text
Natural Language Processing (NLP) | Classification / Generation | Naive Bayes, BERT, GPT, RoBERTa | Sentiment analysis, chatbots, summarization
Cloud Resource Allocation | Regression / RL | XGBoost, DQN, Bayesian Optimization | Predict load, allocate CPU/memory dynamically
IoT Anomaly Detection | Clustering / Classification | One-Class SVM, DBSCAN, Random Forest | Detect abnormal sensor behavior
Recommendation Systems | Collaborative Filtering / Matrix Factorization | ALS, SVD++, KNN, Neural Collaborative Filtering | Suggest movies, products, or services
Medical Diagnosis (CAD) | Classification / Detection | CNN, SVM, Logistic Regression | Classify diseases from CT, MRI, or X-ray images
Autonomous Vehicles | Computer Vision + RL | CNN, RNN, DQN, A3C | Detect lanes, signs; make driving decisions
Face Recognition | Classification / Embedding | CNN, Triplet Loss Networks, FaceNet | Identify or verify human faces
Data Clustering (Unlabeled Data) | Unsupervised Learning | K-Means, GMM, Spectral Clustering, Hierarchical Clustering | Group similar data points without labels

🔍 Legend / Notes:
 Classification: Predict categories or classes (e.g., spam vs. non-spam).
 Regression: Predict continuous values (e.g., network load).
 Unsupervised Learning: Discover hidden patterns without labeled data.
 Reinforcement Learning (RL): Learn optimal actions via rewards (e.g., in routing or
robotics).
 Sequence Models: Handle data with temporal or ordered structure (e.g., speech, text).

These course objectives outline a comprehensive approach to learning Machine Learning


(ML) and Deep Learning (DL), covering foundational concepts, algorithmic design, and
modern advancements. Below is an elaboration on each objective:

1. To learn the concept of how to learn patterns and concepts from data without being
explicitly programmed
This objective introduces the core philosophy of Machine Learning: enabling systems
to automatically learn and improve from experience (data) without relying on hard-coded
rules.
 Key Topics Covered:
o Difference between traditional programming (explicit instructions)
and ML (data-driven learning).
o Understanding training data, features, and generalization.
o Introduction to model training, inference, and evaluation.
o Role of loss functions, optimization (e.g., gradient descent), and
overfitting/underfitting.
o Examples: A spam classifier learning from email data, a recommendation
system adapting to user preferences.
2. To design and analyze various machine learning algorithms and techniques with a
modern outlook focusing on recent advances
This objective emphasizes algorithmic understanding and cutting-edge developments in
ML.
 Key Topics Covered:
o Classical ML Algorithms:
 Linear/Logistic Regression, Decision Trees, SVM, k-NN, Naïve
Bayes.
o Ensemble & Advanced Methods:
 Random Forests, Gradient Boosting (XGBoost, LightGBM).
o Modern Techniques:
 Explainable AI (XAI), Federated Learning, Reinforcement Learning
(RL), GANs, and Transformer-based models.
o Analysis:
 Time complexity, bias-variance tradeoff, scalability, and
interpretability.
o Tools:
 Python (scikit-learn, TensorFlow/PyTorch), AutoML, and cloud-based
ML services.
3. Explore supervised and unsupervised learning paradigms of machine learning
This objective differentiates between two fundamental learning approaches in ML.
 Supervised Learning (Labeled Data):
o Regression: Predicting continuous values (e.g., house prices).
o Classification: Categorizing data (e.g., spam detection).
o Algorithms: Linear Regression, SVM, Neural Networks.
 Unsupervised Learning (Unlabeled Data):
o Clustering: Grouping similar data (e.g., customer segmentation via k-Means).
o Dimensionality Reduction: PCA, t-SNE for visualization.
o Association Rules: Market basket analysis (e.g., Apriori algorithm).
 Semi-supervised & Self-supervised Learning:
o Combining labeled and unlabeled data (e.g., contrastive learning in DL).

4. To explore Deep Learning techniques and various feature extraction strategies


This objective delves into Deep Learning, a subset of ML that uses neural networks for
hierarchical feature learning.

 Key Topics Covered:
o Neural Network Basics:
 Perceptrons, activation functions (ReLU, Sigmoid), backpropagation.
o Deep Architectures:
 CNNs (for images), RNNs/LSTMs (for sequences), Transformers (for
NLP).
o Feature Extraction Strategies:
 Manual vs. automated feature engineering.
 Transfer Learning (e.g., using pre-trained models like ResNet, BERT).
 Embeddings (Word2Vec, GloVe) and attention mechanisms.
o Applications:
 Computer Vision (object detection), NLP (chatbots), and generative
models (DALL-E).
Connecting the Objectives
The course progresses from fundamentals to advanced topics:
1. Starts with how ML differs from traditional programming.
2. Expands to algorithm design and modern trends (e.g., federated learning).
3. Differentiates supervised vs. unsupervised learning.
4. Culminates in Deep Learning, emphasizing automated feature learning.
This structure ensures a holistic understanding of ML, equipping learners to tackle real-
world problems using state-of-the-art techniques.
What is Machine Learning?
Machine Learning (ML) is a branch of Artificial Intelligence (AI) that enables computers
to learn from data and improve their performance on a task without being explicitly
programmed. Instead of following rigid rules, ML models identify patterns in data and
make predictions or decisions based on those patterns.
Key Characteristics of ML:
 Data-Driven: Relies on large datasets for training.
 Adaptive: Improves over time with more data.
 Automated: Reduces manual intervention in decision-making.

Examples of Machine Learning


1. Spam Detection (Classification)
o How it works: An ML model analyzes email content (words, sender info) and
classifies emails as "spam" or "not spam."
o Example: Gmail uses ML to filter spam emails automatically.
2. Recommendation Systems (Collaborative Filtering)
o How it works: Netflix or Amazon suggests movies/products based on your
past behavior and similar users' preferences.
o Example: YouTube recommends videos based on your watch history.
3. Self-Driving Cars (Reinforcement Learning)
o How it works: An AI learns to drive by receiving rewards (for safe driving)
and penalties (for mistakes) in a simulated environment.
o Example: Tesla’s Autopilot uses deep learning for real-time decision-making.
4. Fraud Detection (Anomaly Detection)
o How it works: Banks use ML to detect unusual transactions (e.g., sudden large
withdrawals) and flag them as potential fraud.
o Example: PayPal uses ML to block suspicious transactions.

Types of Machine Learning

ML can be broadly classified into three main types, with some advanced extensions:
1. Supervised Learning (Learning from Labeled Data)
 The model is trained on input-output pairs (labeled data).
 Used for prediction & classification.
 Examples:
o Email Spam Filtering (Input: Email text → Output: "Spam" or "Not Spam").
o House Price Prediction (Input: Size, Location → Output: Price).
o Medical Diagnosis (Input: Patient symptoms → Output: Disease).
Algorithms:
 Linear Regression, Logistic Regression
 Decision Trees, Random Forest
 Support Vector Machines (SVM)
 Neural Networks

2. Unsupervised Learning (Finding Patterns in Unlabeled Data)


 The model discovers hidden structures in data without labels.
 Used for clustering & dimensionality reduction.
 Examples:
o Customer Segmentation (Grouping customers based on purchasing
behavior).
o Anomaly Detection (Finding unusual patterns in network traffic for
cybersecurity).
o Topic Modeling (Extracting themes from large text datasets).
Algorithms:
 K-Means Clustering
 Principal Component Analysis (PCA)
 Apriori (Market Basket Analysis)
 Autoencoders (Deep Learning)

3. Reinforcement Learning (Learning by Trial & Error)


 An agent learns by interacting with an environment and
receiving rewards/penalties.
 Used in robotics, gaming, and autonomous systems.
 Examples:
o AlphaGo (DeepMind’s AI) learned to play Go by competing against itself.
o Robotic Arm Training learns to pick objects efficiently through rewards.
o Self-Driving Cars optimize routes and avoid collisions.
Algorithms:
 Q-Learning
 Deep Q Networks (DQN)
 Policy Gradient Methods

Other Advanced Types


4. Semi-Supervised Learning (Mix of Labeled & Unlabeled Data)
 Uses a small amount of labeled data and a large amount of unlabeled data.
 Example:
o Google Photos uses some labeled faces to recognize unlabeled ones.
5. Self-Supervised Learning (Auto-Generated Labels)
 The model creates its own labels from the data.
 Example:

o GPT-3 predicts the next word in a sentence (no human labeling needed).

Summary Table of ML Types


Type | Description | Example
Supervised | Learns from labeled data | Spam detection, price prediction
Unsupervised | Finds hidden patterns | Customer segmentation
Reinforcement | Learns via rewards/penalties | Self-driving cars, game AI
Semi-Supervised | Uses some labeled + unlabeled data | Medical image analysis
Self-Supervised | Auto-generates labels | Language models (GPT)

Conclusion
Machine Learning enables computers to learn from data and make intelligent decisions.
Depending on the problem, we use:
 Supervised Learning (when we have labeled data).
 Unsupervised Learning (when we need to find hidden patterns).
 Reinforcement Learning (for decision-making in dynamic environments).
Each type has real-world applications, from recommendation systems to autonomous
robots.
Types of Supervised learning:
1. Basic Methods
2. Linear Models
3. Support Vector Machine, Non-Linear and Kernel Methods
4. Beyond binary Classification: Multiclass/Structured Output, Ranking.
Types of Unsupervised Learning:
1. Clustering: K-means/Kernel K-means
2. Dimensionality Reduction: PCA and kernel PCA
3. Matrix Factorization and Matrix Completion
4. Generative Models (mixture models and latent factor models)
Supervised Learning
Basic Methods- Elementary methods used in machine learning.
A) Distance-Based Methods in Supervised Learning
Distance-based methods are a fundamental class of supervised learning algorithms that
make predictions based on the similarity (or distance) between data points. These methods
assume that similar inputs yield similar outputs. They are widely used
for classification and regression tasks.

Key Distance-Based Methods


1. k-Nearest Neighbors (k-NN)
 Concept: Predicts the label of a new data point by looking at the k closest labeled
examples in the training set.
 Distance Metrics Used:
o Euclidean Distance (default): d(x, y) = √( Σ (xi − yi)² )
o Manhattan Distance: d(x, y) = Σ |xi − yi|
o Cosine Similarity (for text/data with high dimensions): sim(x, y) = (x · y) / (‖x‖ ‖y‖)
 How it Works:
o For a new data point, compute its distance to all training points.
o Select the k nearest neighbors.
o For classification, take a majority vote of their labels.
o For regression, take the average of their values.
 Example:
o Classification: Predicting if a tumor is malignant (k=3 neighbors vote).
o Regression: Predicting house prices based on similar nearby houses.
 Pros & Cons:
o ✅ Simple, no training phase (lazy learner).
o ❌ Computationally expensive for large datasets (needs to store all training
data).
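A minimal scikit-learn sketch of the voting procedure above; the tumor dataset is hypothetical and the feature values are purely illustrative:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: [tumor size (cm), patient age]
X_train = np.array([[1.2, 35], [1.5, 40], [3.8, 60], [4.1, 65], [2.0, 50], [3.5, 55]])
y_train = np.array(["benign", "benign", "malignant", "malignant", "benign", "malignant"])

# k = 3: the three closest training points (Euclidean distance) vote on the label
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(X_train, y_train)
print(knn.predict([[3.0, 58]]))  # majority vote of the 3 nearest neighbors

In practice the features would be scaled first, since k-NN is distance-sensitive.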
2. Radial Basis Function (RBF) Networks / Kernel Methods
 Concept: Uses distance-based similarity functions (kernels) to transform data into
a higher-dimensional space where it becomes linearly separable.
 Common Kernel:
o Gaussian (RBF) Kernel: K(x, x′) = exp( −‖x − x′‖² / (2σ²) )
 How it Works:
o Each training point acts as a "center" in the kernel.
o New points are classified based on their weighted distance to these centers.
 Example:
o Support Vector Machines (SVM) with RBF kernel for non-linear
classification.
 Pros & Cons:
o ✅ Works well for non-linear data.
o ❌ Choosing the right kernel & parameters is tricky.
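A short sketch of the RBF idea using scikit-learn's SVC on a toy XOR-like problem (the data points and the gamma/C values are illustrative assumptions):

import numpy as np
from sklearn.svm import SVC

# Toy non-linear (XOR-like) data: not separable by a straight line
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# The RBF kernel measures Gaussian similarity to the training points;
# gamma controls the width of that similarity function
clf = SVC(kernel="rbf", gamma=2.0, C=10.0)
clf.fit(X, y)
print(clf.predict([[0.9, 0.1], [0.9, 0.9]]))  # new points near (1,0) and (1,1)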
3. Case-Based Reasoning (CBR)
 Concept: Solves new problems by retrieving and adapting solutions from similar past
cases.
 How it Works:
o Stores past cases (training data).
o For a new problem, retrieves the most similar case and adapts its solution.
 Example:
o Medical diagnosis based on similar past patient records.

Numerical Example 2 for CBR

Step 1: Retrieve (Find Similar Cases)
 Compare the new patient (Fever = High, Cough = Yes, Fatigue = Yes) with the stored cases using a similarity/distance measure; Case 2 (Flu) comes out as the closest match.
Step 2: Reuse (Adapt Solution)
 Closest Match: Case 2 (Flu) is identical to the new case.
 Adaptation Rule (Domain Knowledge):
o If Fatigue=Yes and Fever=High, prioritize Flu over Common Cold.
 Proposed Diagnosis: Flu.

Step 3: Revise (Validate & Correct)


 Doctor Review:
o Notes that the patient recently traveled (not in query).
o Revises Diagnosis: COVID-19 (more likely due to travel history).

Step 4: Retain (Store New Case)


Updated Case Base:
Case ID | Fever | Cough | Fatigue | Travel History | Diagnosis
... | ... | ... | ... | ... | ...
5 | High | Yes | Yes | Yes | COVID-19 (newly added case)

Key Takeaways
1. Retrieve: Found similar cases using distance metrics.
2. Reuse: Proposed "Flu" based on exact match.
3. Revise: Adjusted to "COVID-19" after expert input.
4. Retain: Added the new case to improve future diagnoses.
Comparison with Weighted k-NN
 CBR: Uses domain knowledge (e.g., travel history) for adaptation.
 k-NN: Only uses mathematical similarity (no manual revision).
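A plain-Python sketch of the Retrieve step, matching a new symptom query against a small illustrative case base by counting attribute matches (the Reuse/Revise/Retain steps above would still involve domain knowledge):

# Illustrative case base: symptoms -> diagnosis
cases = [
    {"fever": "High", "cough": "Yes", "fatigue": "No",  "diagnosis": "Common Cold"},
    {"fever": "High", "cough": "Yes", "fatigue": "Yes", "diagnosis": "Flu"},
    {"fever": "Low",  "cough": "No",  "fatigue": "Yes", "diagnosis": "Anemia"},
]
query = {"fever": "High", "cough": "Yes", "fatigue": "Yes"}

def similarity(case, query):
    # Simple overlap measure: count matching attributes
    return sum(case[k] == query[k] for k in query)

# Retrieve the most similar stored case and reuse its diagnosis
best = max(cases, key=lambda c: similarity(c, query))
print(best["diagnosis"])  # "Flu" here; an expert may later revise and retain the case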

4. Locally Weighted Regression (LWR)
 Concept: Fits a local regression model around each query point, giving more weight
to nearby points.
 How it Works:
o For a new point, compute weights for all training points based on distance.
o Fit a weighted regression model (e.g., linear regression) to predict the output.
 Example:
o Predicting stock prices where recent data is more relevant than older data.
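A numpy sketch of locally weighted linear regression for one feature, assuming a Gaussian kernel over distances to the query point (the bandwidth tau and the data are illustrative):

import numpy as np

def lwr_predict(x_query, X, y, tau=1.0):
    # Gaussian weights: points near the query count more
    w = np.exp(-(X - x_query) ** 2 / (2 * tau ** 2))
    X_aug = np.column_stack([np.ones_like(X), X])   # add intercept column
    W = np.diag(w)
    # Weighted least squares: beta = (X'WX)^+ X'Wy
    beta = np.linalg.pinv(X_aug.T @ W @ X_aug) @ X_aug.T @ W @ y
    return np.array([1.0, x_query]) @ beta

# Illustrative 1-D data with a non-linear trend
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.5, 6.1, 9.8])
print(lwr_predict(3.5, X, y, tau=0.8))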

When to Use Distance-Based Methods?
✔ Small to Medium Datasets (k-NN is slow for big data).
✔ Non-Parametric Data (no strict assumptions about data distribution).
✔ Interpretability Matters (k-NN is easy to explain).
❌ Avoid When:
 Data is high-dimensional (curse of dimensionality).
 Computational efficiency is critical.

Summary Table
Method | Task | Key Idea | Example Use Case
k-NN | Classification/Regression | Majority vote of nearest neighbors | Spam detection, house price prediction
RBF Networks | Classification | Kernel-based similarity | Handwritten digit recognition (SVM-RBF)
Case-Based Reasoning | Problem-solving | Retrieves similar past cases | Medical diagnosis
Locally Weighted Regression | Regression | Weighted regression near query point | Stock price forecasting
Conclusion
Distance-based methods are simple yet powerful for supervised learning, relying on the idea
that "similar things behave similarly."
 k-NN is the most popular for quick prototyping.
 RBF Kernels help in non-linear problems.
 LWR & CBR are useful for specialized cases.
B) Nearest Neighbor-Based Methods in Supervised Learning

Nearest neighbor (NN) methods are instance-based (or lazy learning) algorithms that make
predictions by finding the most similar training examples (neighbors) to a new data point.
These methods rely on distance metrics to determine similarity and are widely used
for classification and regression tasks.

Key Nearest Neighbor-Based Methods


1. k-Nearest Neighbors (k-NN)
 Concept: Predicts the output of a new data point based on the k closest training
examples.
 How it Works:
1. Compute the distance (e.g., Euclidean, Manhattan) between the new point and
all training points.
2. Select the k nearest neighbors.
3. For classification, take a majority vote of their labels.
4. For regression, take the average of their values.
 Example:
o Classification: Diagnosing a disease based on similar patient records.
o Regression: Predicting house prices using nearby sales.
 Pros & Cons:
o ✅ No training phase (lazy learner).
o ❌ Computationally expensive for large datasets (stores all training data).

2. Radius Neighbors Classifier/Regressor


 Concept: Instead of fixing k, it considers all neighbors within a fixed radius (r).
 How it Works:
o For a new point, find all training points within distance r.
o Predict based on their labels/values (majority vote or average).
 Example:
o Detecting local anomalies in network traffic (if few neighbors exist within r).
 Pros & Cons:
o ✅ Adapts to density variations in data.
o ❌ Struggles if data has varying densities.
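A brief scikit-learn sketch of the fixed-radius idea; the 2-D points and the radius are illustrative, and outlier_label handles queries with no neighbors inside r:

import numpy as np
from sklearn.neighbors import RadiusNeighborsClassifier

X_train = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array([0, 0, 0, 1, 1])

# All training points within radius r = 1.0 vote on the label
clf = RadiusNeighborsClassifier(radius=1.0, outlier_label=-1)
clf.fit(X_train, y_train)
print(clf.predict([[0.1, 0.1], [9.0, 9.0]]))  # second query has no neighbors -> -1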

3. Weighted k-NN
 Concept: Assigns higher importance to closer neighbors (inverse distance
weighting).
 How it Works:
o Neighbors contribute to prediction based on 1/distance.
o Closer points have more influence than distant ones.
 Example:
o Recommending movies (closer user preferences matter more).
 Pros & Cons:
o ✅ More accurate than standard k-NN.
o ❌ Sensitive to noise in nearby points.
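In scikit-learn, inverse-distance weighting is a single parameter change (weights="distance"); a minimal sketch with illustrative ratings data:

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Illustrative data: a user-similarity feature -> movie rating
X_train = np.array([[0.1], [0.3], [0.5], [0.8], [0.9]])
y_train = np.array([2.0, 2.5, 3.5, 4.5, 5.0])

# Closer neighbors contribute with weight 1/distance instead of an equal vote
knn = KNeighborsRegressor(n_neighbors=3, weights="distance")
knn.fit(X_train, y_train)
print(knn.predict([[0.85]]))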

4. Locally Weighted Regression (LWR)


 Concept: Fits a local linear model around each query point, weighting nearby points
more.
 How it Works:

1. For a new point, compute weights for all training points (e.g., Gaussian
kernel).
2. Fit a weighted linear regression to predict the output.
 Example:
o Predicting stock prices where recent trends matter more.
 Pros & Cons:
o ✅ Flexible for non-linear data.
o ❌ Computationally intensive.

Distance Metrics Used in NN Methods


Metric | Formula | Use Case
Euclidean | √( Σ (xi − yi)² ) | General-purpose (continuous data)
Manhattan | Σ |xi − yi| | High-dimensional data
Cosine Similarity | (x · y) / (‖x‖ ‖y‖) | Text/image similarity
Minkowski | ( Σ |xi − yi|^p )^(1/p) | Customizable (p = 1: Manhattan, p = 2: Euclidean)
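A small numpy sketch computing each metric above for two illustrative vectors:

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.0, 4.0])

euclidean = np.sqrt(np.sum((x - y) ** 2))
manhattan = np.sum(np.abs(x - y))
cosine_sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
p = 3
minkowski = np.sum(np.abs(x - y) ** p) ** (1 / p)
print(euclidean, manhattan, cosine_sim, minkowski)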

C) Decision Tree-Based Methods in Supervised Learning


Decision tree-based methods are non-parametric, interpretable supervised learning
algorithms that model decisions by splitting data into subsets based on feature values. They
are used for classification and regression tasks and form the foundation for more advanced
ensemble methods (e.g., Random Forests, Gradient Boosting).

Key Decision Tree-Based Methods


1. Decision Trees (CART, ID3, C4.5)
 Concept: A tree-like model where:
o Internal nodes = Features (splitting conditions).
o Branches = Decision rules (e.g., Age < 30).
o Leaf nodes = Predictions (class labels or continuous values).
 How It Works:
1. Splitting Criteria:
 Classification: Gini Impurity or Information Gain (entropy).
 Regression: Variance Reduction (minimizing MSE).
2. Stopping Conditions:
 Max depth, min samples per leaf, or purity threshold.
 Example:
o Classification: Predicting loan default (e.g., splits on Income, Credit Score).
o Regression: Estimating house prices (splits on Square Footage, Location).
 Pros & Cons:
o ✅ Interpretable (visualizable as rules).
o ❌ Prone to overfitting (needs pruning/regularization).
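A short scikit-learn sketch tying the splitting criterion and stopping conditions together; the loan-style features and thresholds are illustrative:

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative loan data: [income (k$), credit score] -> default (1) / repay (0)
X = np.array([[25, 580], [40, 610], [60, 700], [80, 720], [30, 550], [90, 760]])
y = np.array([1, 1, 0, 0, 1, 0])

tree = DecisionTreeClassifier(
    criterion="gini",   # splitting criterion ("entropy" for information gain)
    max_depth=3,        # stopping condition
    min_samples_leaf=1,
)
tree.fit(X, y)
print(export_text(tree, feature_names=["income", "credit_score"]))  # the learned rules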

2. Random Forest
 Concept: An ensemble of decision trees trained on random subsets of data/features
(bagging).
 How It Works:
1. Build multiple trees on bootstrapped samples.
2. For prediction:

 Classification: Majority voting across trees.
 Regression: Averaging tree outputs.
 Example:
o Detecting diseases from medical records (improves accuracy over a single
tree).
 Pros & Cons:
o ✅ Reduces overfitting, handles noisy data.
o ❌ Less interpretable than single trees.
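A minimal bagging sketch with RandomForestClassifier; the synthetic data from make_classification stands in for a tabular dataset such as medical records:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each tree sees a bootstrap sample and a random subset of features per split
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
rf.fit(X_tr, y_tr)
print("Test accuracy:", rf.score(X_te, y_te))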

3. Gradient Boosted Decision Trees (GBDT)


 Concept: Sequentially trains trees to correct errors of previous trees (boosting).
 Algorithms:
o XGBoost, LightGBM, CatBoost (optimized for speed/accuracy).
 How It Works:
1. Fit a weak initial tree.
2. Compute residuals (errors), train next tree on residuals.
3. Combine predictions iteratively.
 Example:
o Ranking search results (used by Google, Amazon).
 Pros & Cons:
o ✅ High accuracy, handles imbalanced data.
o ❌ Requires careful hyperparameter tuning.
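A sketch of the boosting loop using scikit-learn's GradientBoostingClassifier (XGBoost and LightGBM implement the same idea with their own, faster APIs); data and hyperparameters are illustrative:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Each new tree is fit to the residual errors of the current ensemble
gbdt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
gbdt.fit(X_tr, y_tr)
print("Test accuracy:", gbdt.score(X_te, y_te))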

4. Extra Trees (Extremely Randomized Trees)


 Concept: Like Random Forest, but splits nodes using random thresholds (not
optimal ones).
 How It Works:
o Faster training (no search for best split).
o More bias but lower variance.
 Example:
o Real-time fraud detection (faster than Random Forest).
 Pros & Cons:
o ✅ Faster training, robust to noise.
o ❌ Slightly less accurate than Random Forest.
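For comparison, a sketch where the only change from the Random Forest example is the estimator class; the synthetic data is again illustrative:

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=2)

# Splits use random thresholds per candidate feature instead of searching for the best one
et = ExtraTreesClassifier(n_estimators=200, random_state=2)
et.fit(X, y)
print("Training accuracy:", et.score(X, y))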

When to Use Decision Tree-Based Methods?


✔ Structured/tabular data (not ideal for raw images/text).
✔ Interpretability matters (single trees).
✔ Non-linear relationships (no need for feature scaling).
❌ Avoid When:
 Data has high-dimensional sparse features (e.g., NLP).
 Extrapolation is needed (trees predict within training range).

Summary Table
Method | Task | Key Idea | Example Use Case
Decision Tree | Classification/Regression | Single tree with splits | Loan approval, medical diagnosis
Random Forest | Classification/Regression | Ensemble of decorrelated trees | Fraud detection, stock prediction
Gradient Boosting | Classification/Regression | Sequentially corrects errors | Search ranking, click-through rate prediction
Extra Trees | Classification/Regression | Random splits for speed | Real-time anomaly detection

Conclusion
Decision tree-based methods are versatile and powerful for supervised learning:
 Single trees are simple and interpretable.
 Random Forest improves accuracy via ensemble learning.
 Gradient Boosting achieves state-of-the-art results (used in competitions like
Kaggle).
For large datasets, use LightGBM/CatBoost (optimized for speed). For interpretability,
visualize trees with sklearn.tree.plot_tree.
D) Naive Bayes-Based Methods in Supervised Learning
Naive Bayes is a family of probabilistic classifiers based on Bayes' Theorem, with a
"naive" assumption of feature independence. Despite its simplicity, it performs well in text
classification, spam filtering, and medical diagnosis.

Key Naive Bayes Methods


1. Gaussian Naive Bayes
 Assumption: Features follow a normal (Gaussian) distribution.
 Use Case: Continuous data (e.g., sensor readings, medical measurements).
 Example:
o Predicting diabetes risk based on age, blood sugar, and BMI (continuous
features).
 Pros & Cons:
o ✅ Works well with small datasets.
o ❌ Assumes normal distribution (may fail for skewed data).

2. Multinomial Naive Bayes


 Assumption: Features represent discrete counts (e.g., word frequencies).
 Use Case: Text classification (e.g., spam detection, sentiment analysis).
 Example:
o Classifying emails as "spam" or "not spam" based on word counts.
 Pros & Cons:
o ✅ Handles high-dimensional text data efficiently.
o ❌ Ignores word order (bag-of-words limitation).

3. Bernoulli Naive Bayes


 Assumption: Features are binary (e.g., presence/absence of words).
 Use Case: Binary text classification (e.g., document categorization).
 Example:
o Detecting fake reviews (1 if a word appears, 0 otherwise).
 Pros & Cons:
o ✅ Simpler than Multinomial NB for binary data.
o ❌ Loses frequency information.

4. Complement Naive Bayes


 Extension of Multinomial NB: Adjusts for imbalanced datasets.

 Use Case: Text classification with skewed classes.
 Example:
o Identifying rare diseases from clinical reports.
 Pros & Cons:
o ✅ Better for imbalanced data than standard Multinomial NB.
o ❌ Still assumes feature independence.

How Naive Bayes Works


1. Bayes' Theorem: P(y | x1, …, xn) = P(x1, …, xn | y) · P(y) / P(x1, …, xn)
2. "Naive" Assumption:
o Features are conditionally independent given the class: P(x1, …, xn | y) = P(x1 | y) · P(x2 | y) · ⋯ · P(xn | y)
3. Prediction:
o For a new sample, compute the posterior for each class and pick the highest
probability.
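A tiny hand computation of the prediction step, assuming made-up class priors and word likelihoods for a spam/not-spam model:

# Assumed (illustrative) probabilities learned from training data
priors = {"spam": 0.4, "not spam": 0.6}
likelihood = {
    "spam":     {"free": 0.30, "cash": 0.20},
    "not spam": {"free": 0.02, "cash": 0.05},
}

# Posterior up to the common normalizer P(x), under the independence assumption:
# P(class | free, cash) ∝ P(class) * P(free | class) * P(cash | class)
scores = {c: priors[c] * likelihood[c]["free"] * likelihood[c]["cash"] for c in priors}
print(scores)                        # {'spam': 0.024, 'not spam': 0.0006}
print(max(scores, key=scores.get))   # spam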

When to Use Naive Bayes?


✔ Text classification (spam, sentiment analysis).
✔ Small datasets (works well even with limited data).
✔ Low computational resources (fast training/prediction).
❌ Avoid When:
 Features are correlated (violates independence assumption).
 Probabilistic interpretation is unnecessary (e.g., for complex non-linear data).

Comparison of Naive Bayes Variants


Method | Data Type | Example Use Case
Gaussian NB | Continuous features | Medical diagnosis
Multinomial NB | Discrete counts | Spam detection
Bernoulli NB | Binary features | Fake review detection
Complement NB | Imbalanced text data | Rare disease classification

Pros & Cons of Naive Bayes


✅ Advantages:
 Fast training and prediction.
 Works well with high-dimensional data (e.g., text).
 Requires minimal tuning.
❌ Limitations:
 Strong independence assumption (often unrealistic).
 Can be outperformed by more complex models (e.g., SVMs, neural nets).
Example in Scikit-Learn
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# Sample text data


X = ["free lottery", "urgent meeting", "win cash prize"]
y = ["spam", "not spam", "spam"]

# Convert text to word counts


vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(X)

# Train Naive Bayes


clf = MultinomialNB()
clf.fit(X_counts, y)

# Predict
print(clf.predict(vectorizer.transform(["free cash"]))) # Output: "spam"
Conclusion
Naive Bayes is a simple, efficient, and interpretable classifier, especially for:
 Text/data with categorical features.
 Scenarios where speed matters (e.g., real-time spam filtering).
For better accuracy with correlated features, consider Logistic Regression or Tree-Based
Models.

Here’s a tabular comparison of the four basic supervised learning methods:


Aspect | Distance-Based Methods (e.g., k-NN) | Nearest Neighbor Methods (e.g., Radius NN) | Decision Tree Methods (e.g., CART, Random Forest) | Naive Bayes Methods (e.g., Gaussian NB)
Core Idea | Predicts based on distance metrics (e.g., Euclidean) | Uses neighbors within a fixed distance/radius | Splits data into branches based on feature rules | Uses Bayes' Theorem with feature independence
Model Type | Instance-based (lazy learner) | Instance-based (lazy learner) | Rule-based (eager learner) | Probabilistic (eager learner)
Training Speed | Fast (no training, stores data) | Fast (no training, stores data) | Moderate (builds splits) | Very fast (computes probabilities)
Prediction Speed | Slow (scans entire dataset) | Slow (scans within radius) | Fast (tree traversal) | Very fast (probability lookup)
Interpretability | Moderate (depends on k) | Moderate (depends on radius) | High (visualizable rules) | Moderate (probabilistic rules)
Handles Non-Linearity | Yes (flexible boundaries) | Yes (local approximations) | Yes (non-linear splits) | No (linear decision boundaries)
Feature Scaling | Required (distance-sensitive) | Required (distance-sensitive) | Not required | Not required (except Gaussian NB)
Hyperparameters | k (number of neighbors), distance metric | Radius r, distance metric | max_depth, min_samples_split | Smoothing parameter (e.g., alpha)
Use Cases | Image recognition, recommendation systems | Anomaly detection, density-based clustering | Medical diagnosis, credit scoring | Text classification, spam filtering
Pros | Simple, no training phase | Adapts to local data density | Interpretable, handles mixed data types | Fast, works well with small data
Cons | Slow for large datasets, curse of dimensionality | Struggles with varying densities | Prone to overfitting (needs pruning) | Strong independence assumption

Key Takeaways
1. Distance/Neighbor Methods (k-NN, Radius NN):
o Best for small datasets where interpretability isn’t critical.
o Suffer from the curse of dimensionality.
2. Decision Trees:
o Ideal for structured data with non-linear relationships.
o Basis for ensemble methods (Random Forest, XGBoost).
3. Naive Bayes:
o Fast and efficient for text/categorical data.
o Struggles with feature correlations.
For high accuracy, ensemble methods (e.g., Random Forest) often outperform these basic
methods. For speed, Naive Bayes is unbeatable.

Linear Models: Linear models are a fundamental class of supervised learning algorithms that
assume a linear relationship between the input features (independent variables) and the output
(dependent variable). They are widely used for both regression (predicting continuous
values) and classification (predicting discrete labels) tasks.
Key Characteristics of Linear Models:
1. Linear Relationship: The model assumes the output is a weighted sum of input
features.
2. Simplicity & Interpretability: Easy to understand and interpret coefficients.
3. Efficiency: Fast training and prediction, suitable for large datasets.

4. Regularization: Can be extended with L1/L2 regularization to prevent overfitting.

A) Linear Regression Model


Linear Regression in Linear Models (with Examples & Explanation)
Linear regression is one of the simplest and most widely used supervised
learning algorithms in linear models. It predicts a continuous numerical output based on a
linear relationship between input features and the target variable.

1. Definition & Mathematical Formulation


Linear regression models the relationship between:
 Independent variables (features) → X1,X2,…,Xn
 Dependent variable (target) → y
The model equation is:
y=β0+β1X1+β2X2+⋯+βnXn+ϵ
where:
 y = predicted output
 β0 = intercept (bias term)
 β1,β2,…,βn = coefficients (feature weights)
 ϵ = error term (residuals, unexplained noise)
Goal of Linear Regression
Find the best-fit line that minimizes the sum of squared errors (SSE) between predicted
and actual values.

2. Types of Linear Regression


Type | Description | Example Use Case
Simple Linear Regression | One input feature (X) | Predicting house price based on size
Multiple Linear Regression | Multiple features (X1, X2, …, Xn) | Predicting salary based on experience, education, location
Polynomial Regression | Adds nonlinear terms (X², X³, …) | Modeling growth rate with curvature

3. Example Use Cases


Example 1: Predicting House Prices
Suppose we want to predict house prices based on size (sq. ft.).
 Independent variable (X) → House size
 Dependent variable (y) → Price
Model:
Price=β0+β1×Size
 If β0=50,000 and β1=200, then:
Price=50,000+200×Size
o A 1000 sq. ft. house → 50,000+200×1000=$250,000
Example 2: Predicting Salary Based on Experience
Now, consider predicting salary using years of experience and education level.
 Features (X) → Experience (X1), Education (X2)
 Target (y) → Salary
Model:
Salary=β0+β1×Experience+β2×Education

 If β0 = 30,000, β1 = 5,000, β2 = 10,000:
o A person with 3 years of experience and a master's degree (Education = 2):
30,000 + 5,000×3 + 10,000×2 = $65,000

4. How Linear Regression Works (Step-by-Step)


1. Collect Data: Gather input features (X) and target (y).
2. Initialize Coefficients: Start with random β0, β1, …, βn.
3. Compute Predictions: Use the linear equation to predict y.
4. Calculate Error: Find the difference (residual) between predicted and actual y.
5. Optimize (Gradient Descent / OLS): Adjust coefficients to minimize Mean
Squared Error (MSE).
6. Evaluate Model: Use metrics like R², MSE, RMSE.

5. Assumptions of Linear Regression


For reliable predictions, the model assumes:
✅ Linearity: Relationship between X and y is linear.
✅ No Multicollinearity: Features should not be highly correlated.
✅ Homoscedasticity: Residuals have constant variance.
✅ Normality of Residuals: Errors should be normally distributed.
✅ No Autocorrelation: Residuals should be independent (important in time series).

6. Implementing Linear Regression (Python Example)


from sklearn.linear_model import LinearRegression
import numpy as np

# Sample data: X = house sizes, y = prices


X = np.array([1000, 1500, 2000, 2500, 3000]).reshape(-1, 1)
y = np.array([250000, 350000, 450000, 550000, 650000])

# Train the model


model = LinearRegression()
model.fit(X, y)

# Predict a new house price


new_size = np.array([[1800]])
predicted_price = model.predict(new_size)
print(f"Predicted price for 1800 sq. ft: ${predicted_price[0]:,.2f}")

# Model coefficients
print(f"Intercept (β₀): {model.intercept_:.2f}")
print(f"Coefficient (β₁): {model.coef_[0]:.2f}")
Output:
Predicted price for 1800 sq. ft: $410,000.00
Intercept (β₀): 50000.00
Coefficient (β₁): 200.00

7. Advantages & Limitations
✅ Advantages
 Simple & interpretable.
 Fast training and prediction.
 Works well when relationships are linear.
❌ Limitations
 Poor performance on nonlinear data.
 Sensitive to outliers.
 Assumes no multicollinearity.

8. When to Use Linear Regression?


✔️You need a quick baseline model.
✔️The relationship between X and y is linear.
✔️Interpretability is important (e.g., finance, healthcare).

9. Extensions & Improvements


 Polynomial Regression: Captures nonlinear trends.
 Ridge/Lasso Regression: Prevents overfitting (see the sketch below).
 Robust Regression: Reduces outlier impact.
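A brief sketch of the Ridge/Lasso extension using the same toy house-price data as above; alpha is an illustrative regularization strength:

import numpy as np
from sklearn.linear_model import Ridge, Lasso

X = np.array([[1000], [1500], [2000], [2500], [3000]])
y = np.array([250000, 350000, 450000, 550000, 650000])

# alpha controls the penalty strength (L2 for Ridge, L1 for Lasso)
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)
print(ridge.coef_, lasso.coef_)  # regularized coefficients (slightly shrunk relative to plain OLS)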

Final Summary
Linear regression is a fundamental ML algorithm for predicting continuous values. It works
best when:
 The data follows a linear trend.
 Features are not highly correlated.
 You need a simple, explainable model.
B) Logistic Regression in Linear Models (with Examples & Explanation)
Logistic regression is a supervised learning algorithm used for binary
classification (though extendable to multi-class). Unlike linear regression, which predicts
continuous values, logistic regression predicts probabilities (between 0 and 1) using
a logistic (sigmoid) function.

1. Definition & Key Concepts


 Goal: Predict the probability that an instance belongs to a class (e.g., "spam" or "not
spam").
 Output: A probability (0 to 1), thresholded at 0.5 by default for class decision.
 Mathematical Formulation: P(y=1 | x) = σ(η) = 1 / (1 + e^(−η)), where η = β0 + β1X1 + ⋯ + βnXn
2. Why "Logistic" Regression?

 Despite the name, it's a classification (not regression) method.
 Uses log-odds (logit) for linear modeling: log( P(y=1) / (1 − P(y=1)) ) = β0 + β1X1 + ⋯ + βnXn
3. Example Use Cases


Example 1: Email Spam Detection
 Features (X): Word frequencies, sender reputation.
 Target (y): 1 (spam) or 0 (not spam).
 Model Output: Probability that an email is spam.
Prediction:
 If P(y=1)=0.8>0.5 → Classify as spam.
Example 2: Medical Diagnosis
 Features (X): Age, blood pressure, cholesterol.
 Target (y): 1 (has disease) or 0 (no disease).
 Model Output: Risk probability.
Prediction:
 If P(y=1)=0.3<0.5 → Classify as healthy.

4. How Logistic Regression Works


1. Input: Features (X) and binary labels (y).
2. Sigmoid Transformation: Converts the linear output η = β0 + β1X1 + ⋯ to a probability: P(y=1) = 1 / (1 + e^(−η)).
3. Optimization: Maximizes the log-likelihood (or minimizes log loss) using gradient descent.
4. Decision Boundary: Default threshold = 0.5 (adjustable for precision/recall trade-offs).

5. Types of Logistic Regression


Type | Description | Example
Binary Logistic Regression | Two classes (0/1) | Spam vs. not spam
Multinomial Logistic Regression | More than two classes (softmax) | Cat/Dog/Bird classification
Ordinal Logistic Regression | Ordered classes (e.g., rating 1–5) | Movie ratings

6. Assumptions
✅ Binary/multinomial target.
✅ Linear relationship between log-odds and features.
✅ No multicollinearity.
✅ Large sample size (for stable estimates).

7. Python Example (Scikit-learn)


from sklearn.linear_model import LogisticRegression

import numpy as np

# Sample data: X = exam scores, y = pass(1)/fail(0)


X = np.array([[50], [60], [70], [80], [90]])
y = np.array([0, 0, 1, 1, 1])

# Train the model


model = LogisticRegression()
model.fit(X, y)

# Predict probabilities
print("Probability of passing exam (Score=65):", model.predict_proba([[65]])[:, 1])

# Predict class (threshold=0.5)


print("Predicted class (Score=65):", model.predict([[65]]))
Output:
Probability of passing exam (Score=65): [0.32] # 32% chance
Predicted class (Score=65): [0] # Fails (since P < 0.5)

8. Advantages & Limitations


✅ Advantages
 Simple, interpretable (coefficients = feature importance).
 Efficient for linearly separable data.
 Outputs probabilities (useful for risk assessment).
❌ Limitations
 Poor performance on nonlinear problems.
 Requires feature scaling (for gradient descent).
 Sensitive to outliers.

9. When to Use Logistic Regression?


✔️Binary/multi-class classification.
✔️Interpretability matters (e.g., healthcare, finance).
✔️Features are roughly linear in log-odds.

10. Extensions & Improvements


 Regularization (L1/L2): Avoid overfitting (like Ridge/Lasso).
 Kernel Logistic Regression: Handles nonlinearity (rarely used).
 Imbalanced Data: Use class weights or resampling.

Key Takeaways
1. Logistic regression predicts probabilities for classification.
2. Uses sigmoid to map linear outputs to [0, 1].
3. Default threshold = 0.5 for class decision.
4. Interpret coefficients as log-odds impact.

C) Generalized Linear Models (GLMs) - Explained with Examples


Generalized Linear Models (GLMs) extend linear regression to support non-normal
distributions (e.g., binary, count, or exponential data). They provide a flexible framework for

modeling different types of dependent variables using link functions and exponential family
distributions.

1. Key Components of GLMs


GLMs consist of three main parts:
1. Random Component (Probability Distribution)
o Specifies the distribution of the response variable (e.g., Gaussian, Binomial,
Poisson).
o Example:
 Binary data → Bernoulli/Binomial
 Count data → Poisson
 Continuous data → Gaussian
2. Systematic Component (Linear Predictor)
o A linear combination of features:
η=β0+β1X1+β2X2+⋯+βpXp
3. Link Function
o Connects the mean of the response to the linear predictor:
g(μ)=η
o Common link functions:
 Identity link (for linear regression): μ=η
 Logit link (for logistic regression): log( μ / (1 − μ) ) = η
 Log link (for Poisson regression): log(μ)=η

2. Common GLMs with Examples


(1) Linear Regression (Gaussian GLM)
 Distribution: Normal (Gaussian)
 Link: Identity (μ=η)
 Use Case: Predicting continuous outcomes (e.g., house prices).
 Example:
Price = β0 + β1×Size + β2×Bedrooms, where the prediction is the mean (expected) price.
(2) Logistic Regression (Binomial GLM)
 Distribution: Binomial
 Link: Logit, log( p / (1 − p) ) = η, where p is the mean probability of the positive class.
 Use Case: Binary outcomes (e.g., spam detection, disease diagnosis).
(3) Poisson Regression (Poisson GLM)
 Distribution: Poisson
 Link: Log, log(μ) = η, where μ is the mean count (e.g., mean number of visits).
 Use Case: Count data (e.g., number of hospital visits or accidents).
(4) Gamma Regression (Gamma GLM)
 Distribution: Gamma
 Link: Log (or inverse), with μ as the mean of a positive, skewed response (e.g., mean claim amount).
 Use Case: Positive continuous data such as insurance claim amounts.
3. How GLMs Work (Step-by-Step)
1. Choose a distribution (based on the response variable).
2. Select a link function (connects mean response to predictors).
3. Estimate coefficients (via Maximum Likelihood Estimation).
4. Predict outcomes using the inverse link function.

4. Python Example (StatsModels)


import statsmodels.api as sm
import pandas as pd

# Example: Poisson Regression (Count Data)


data = pd.DataFrame({
'Accidents': [2, 5, 3, 8, 6], # Response (count)
'Drivers': [10, 20, 15, 30, 25], # Predictor
'Rainy': [0, 1, 0, 1, 1] # Binary predictor
})

# Fit Poisson GLM


glm = sm.GLM(
data['Accidents'],
sm.add_constant(data[['Drivers', 'Rainy']]),
family=sm.families.Poisson()
)
result = glm.fit()
print(result.summary())
Output Interpretation:
 Coefficients show how predictors affect the log of expected accidents.
 Rainy = 1 multiplies the expected accident count by e^β (the incidence rate ratio).

5. Advantages & Limitations
✅ Advantages
 Handles non-normal data (binary, counts, etc.).
 Flexible via link functions.
 Unifies many models under one framework.
❌ Limitations
 Requires correct distribution specification.
 Less efficient than specialized models (e.g., random forests for nonlinearity).

6. When to Use GLMs?


✔️Response variable follows an exponential family distribution.
✔️Need interpretable coefficients (e.g., healthcare, economics).
✔️Data is not highly nonlinear.

7. Extensions
 Generalized Additive Models (GAMs): Nonlinear extensions of GLMs.
 Mixed Effects GLMs: For clustered/hierarchical data.

Summary
GLMs generalize linear regression to binary, count, and skewed data using:
 Exponential family distributions (Gaussian, Binomial, Poisson).
 Link functions (identity, logit, log).

Here’s a tabular comparison of Linear Regression (LR), Logistic Regression (LogR), and Generalized Linear Models (GLM) based on key characteristics:

Feature | Linear Regression (LR) | Logistic Regression (LogR) | Generalized Linear Model (GLM)
Type of Problem | Regression (continuous output) | Classification (binary/multiclass) | Both (depends on distribution/link)
Response Variable | Continuous (Gaussian) | Binary (Bernoulli/Binomial) | Any exponential family (Gaussian, Binomial, Poisson, Gamma, etc.)
Link Function | Identity (μ = η) | Logit (log(μ/(1 − μ)) = η) | Flexible (identity, logit, log, inverse, etc.)
Output Interpretation | Direct predicted value (e.g., price) | Probability (0 to 1) | Depends on link (e.g., log-odds, log-counts)
Distribution Family | Normal (Gaussian) | Binomial | Exponential family (Gaussian, Binomial, Poisson, Gamma, etc.)
Example Use Case | Predicting house prices | Spam detection | Count data (Poisson), skewed data (Gamma)
Mathematical Form | y = β0 + β1X1 + ⋯ + ε | P(y=1) = 1 / (1 + e^(−η)) | g(μ) = η, where η = β0 + β1X1 + ⋯
Optimization | Least Squares (OLS) | Maximum Likelihood (MLE) | Maximum Likelihood (MLE)
Assumptions | Linearity, homoscedasticity, normality of residuals | Linearity in log-odds, no multicollinearity | Correct distribution, linearity on the link scale
Regularization | Ridge/Lasso | L1/L2 (LogisticRegression in sklearn) | Supported (e.g., GLM with elastic net)
Python Libraries | sklearn.linear_model.LinearRegression | sklearn.linear_model.LogisticRegression | statsmodels GLM, scikit-learn (limited)

Key Takeaways:
1. Linear Regression assumes Gaussian errors and predicts continuous values.
2. Logistic Regression is a special case of GLM for binary classification (logit link +
Binomial distribution).
3. GLMs generalize both:
o Can model counts (Poisson), binary (Logistic), continuous (Gaussian), etc.
o Use link functions to map linear predictors to the response scale.
Support Vector Machines, Nonlinearity and Kernel Methods
