Unit 1 Supervised Learning
2. Software Engineering
Code Prediction and Completion: Tools like GitHub Copilot use ML to assist
developers in writing code.
Bug Detection and Testing: ML helps predict software bugs, perform automated
testing, and ensure code quality.
Software Maintenance: ML models predict software aging and recommend
necessary updates.
3. Cybersecurity
Intrusion Detection Systems (IDS): Detect anomalies in network traffic using ML-
based classification.
Malware Detection: Analyze behavior patterns to identify malicious software.
Spam and Phishing Detection: Classify emails/messages to filter out malicious
content.
5. Computer Networks
Traffic Prediction and Routing: ML models predict congestion and optimize routing
in large networks.
QoS Optimization: Enhance quality of service by learning optimal parameter
configurations.
Fault Detection: Identify faults in network devices through pattern analysis.
Energy Management: Optimize energy consumption patterns in smart homes and
grids.
Summary Table:

Domain | Application
AI & Robotics | Vision, decision-making, automation
Software Engineering | Code generation, testing, bug prediction
Cybersecurity | Threat detection, anomaly identification
Data Science | Forecasting, clustering, recommendations
Networks | Routing, traffic prediction, fault detection
IoT | Smart devices, energy saving, anomaly detection
Cloud/Edge | Load balancing, latency optimization
Healthcare | Diagnostics, treatment recommendation
Signal Processing | Voice/image recognition, enhancement
Automation | Smart city, industrial systems
📊 Table: ML Algorithms and Their Applications in Computer Science and Engineering

Application Area | ML Task Type | Algorithms Used | Specific Purpose / Use Case
Network Intrusion Detection | Classification | SVM, Decision Tree, Random Forest, Naive Bayes, k-NN | Detect malicious or anomalous network behavior
Traffic Prediction | Regression / Time Series Forecasting | Linear Regression, LSTM, ARIMA, Prophet | Forecast network congestion or traffic volume
Smart Routing (SDN) | Reinforcement Learning | Q-Learning, Deep Q-Networks (DQN), PPO | Learn optimal routing paths to reduce latency
Malware Detection | Classification | Logistic Regression, Gradient Boosting (XGBoost, LightGBM), CNNs | Classify network packets as benign or malicious
Network Anomaly Detection | Unsupervised Learning | K-Means, DBSCAN, Isolation Forest, Autoencoders | Detect abnormal patterns without labeled data
QoS/QoE Optimization | Reinforcement / Regression | Deep RL, Multi-Armed Bandits, Support Vector Regression | Optimize service quality and user experience
Wireless Network Handoff | Supervised Learning | Decision Tree, Random Forest, Neural Networks | Predict best handoff point in mobile networks
Software Bug Prediction | Classification | SVM, Random Forest, AdaBoost | Predict fault-prone software modules
Code Recommendation / Completion | Sequence Modeling / NLP | RNN, Transformer (e.g., BERT, GPT), LSTM | Suggest code snippets or auto-complete code
Automated Testing | Classification / Clustering | Decision Trees, Clustering (K-Means) | Classify test cases or cluster test behaviors
Image Recognition (Computer Vision) | Classification | CNNs (e.g., ResNet, VGG), YOLO, Faster R-CNN | Detect objects, faces, or patterns in images
Speech Recognition | Sequence-to-Sequence | RNNs, LSTM, Attention, Transformers | Convert spoken words to text
Natural Language Processing (NLP) | Classification / Generation | Naive Bayes, BERT, GPT, RoBERTa | Sentiment analysis, chatbots, summarization
Cloud Resource Allocation | Regression / RL | XGBoost, DQN, Bayesian Optimization | Predict load, allocate CPU/memory dynamically
IoT Anomaly Detection | Clustering / Classification | One-Class SVM, DBSCAN, Random Forest | Detect abnormal sensor behavior
Recommendation Systems | Collaborative Filtering / Matrix Factorization | ALS, SVD++, KNN, Neural Collaborative Filtering | Suggest movies, products, or services
Medical Diagnosis (CAD) | Classification / Detection | CNN, SVM, Logistic Regression | Classify diseases from CT, MRI, or X-ray images
Autonomous Vehicles | Computer Vision + RL | CNN, RNN, DQN, A3C | Detect lanes, signs; make driving decisions
Face Recognition | Classification / Embedding | CNN, Triplet Loss Networks, FaceNet | Identify or verify human faces
Data Clustering (Unlabeled Data) | Unsupervised Learning | K-Means, GMM, Spectral Clustering, Hierarchical Clustering | Group similar data points without labels
🔍 Legend / Notes:
Classification: Predict categories or classes (e.g., spam vs. non-spam).
Regression: Predict continuous values (e.g., network load).
Unsupervised Learning: Discover hidden patterns without labeled data.
Reinforcement Learning (RL): Learn optimal actions via rewards (e.g., in routing or
robotics).
Sequence Models: Handle data with temporal or ordered structure (e.g., speech, text).
Course Objectives
1. To learn how patterns and concepts can be learned from data without being explicitly programmed
This objective introduces the core philosophy of Machine Learning: enabling systems
to automatically learn and improve from experience (data) without relying on hard-coded
rules.
Key Topics Covered:
o Difference between traditional programming (explicit instructions)
and ML (data-driven learning).
o Understanding training data, features, and generalization.
o Introduction to model training, inference, and evaluation.
o Role of loss functions, optimization (e.g., gradient descent), and
overfitting/underfitting.
o Examples: A spam classifier learning from email data, a recommendation
system adapting to user preferences.
2. To design and analyze various machine learning algorithms and techniques with a
modern outlook focusing on recent advances
This objective emphasizes algorithmic understanding and cutting-edge developments in
ML.
Key Topics Covered:
o Classical ML Algorithms:
Linear/Logistic Regression, Decision Trees, SVM, k-NN, Naïve
Bayes.
o Ensemble & Advanced Methods:
Random Forests, Gradient Boosting (XGBoost, LightGBM).
o Modern Techniques:
Explainable AI (XAI), Federated Learning, Reinforcement Learning
(RL), GANs, and Transformer-based models.
o Analysis:
Time complexity, bias-variance tradeoff, scalability, and
interpretability.
o Tools:
Python (scikit-learn, TensorFlow/PyTorch), AutoML, and cloud-based
ML services.
3. To explore supervised and unsupervised learning paradigms of machine learning
This objective differentiates between two fundamental learning approaches in ML.
Supervised Learning (Labeled Data):
o Regression: Predicting continuous values (e.g., house prices).
o Classification: Categorizing data (e.g., spam detection).
o Algorithms: Linear Regression, SVM, Neural Networks.
Unsupervised Learning (Unlabeled Data):
o Clustering: Grouping similar data (e.g., customer segmentation via k-Means).
o Dimensionality Reduction: PCA, t-SNE for visualization.
o Association Rules: Market basket analysis (e.g., Apriori algorithm).
Semi-supervised & Self-supervised Learning:
o Combining labeled and unlabeled data (e.g., contrastive learning in DL).
4. To understand deep learning architectures and automated feature extraction techniques
This objective covers neural networks, deep architectures, and strategies for learning features automatically.
Key Topics Covered:
o Neural Network Basics:
Perceptrons, activation functions (ReLU, Sigmoid), backpropagation.
o Deep Architectures:
CNNs (for images), RNNs/LSTMs (for sequences), Transformers (for
NLP).
o Feature Extraction Strategies:
Manual vs. automated feature engineering.
Transfer Learning (e.g., using pre-trained models like ResNet, BERT).
Embeddings (Word2Vec, GloVe) and attention mechanisms.
o Applications:
Computer Vision (object detection), NLP (chatbots), and generative
models (DALL-E).
Connecting the Objectives
The course progresses from fundamentals to advanced topics:
1. Starts with how ML differs from traditional programming.
2. Expands to algorithm design and modern trends (e.g., federated learning).
3. Differentiates supervised vs. unsupervised learning.
4. Culminates in Deep Learning, emphasizing automated feature learning.
This structure ensures a holistic understanding of ML, equipping learners to tackle real-
world problems using state-of-the-art techniques.
What is Machine Learning?
Machine Learning (ML) is a branch of Artificial Intelligence (AI) that enables computers
to learn from data and improve their performance on a task without being explicitly
programmed. Instead of following rigid rules, ML models identify patterns in data and
make predictions or decisions based on those patterns.
Key Characteristics of ML:
Data-Driven: Relies on large datasets for training.
Adaptive: Improves over time with more data.
Automated: Reduces manual intervention in decision-making.
ML can be broadly classified into three main types, with some advanced extensions:
1. Supervised Learning (Learning from Labeled Data)
The model is trained on input-output pairs (labeled data).
Used for prediction & classification.
Examples:
o Email Spam Filtering (Input: Email text → Output: "Spam" or "Not Spam").
o House Price Prediction (Input: Size, Location → Output: Price).
o Medical Diagnosis (Input: Patient symptoms → Output: Disease).
Algorithms:
Linear Regression, Logistic Regression
Decision Trees, Random Forest
Support Vector Machines (SVM)
Neural Networks
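As a minimal sketch of this labeled-data workflow, the example below trains a classifier on a few hand-made input-output pairs and predicts a label for a new input (the features, values, and labels are assumed purely for illustration):

from sklearn.tree import DecisionTreeClassifier

# Toy labeled data (assumed): [tumor size in cm, patient age] -> diagnosis
X_train = [[1.2, 34], [1.0, 41], [3.8, 62], [4.1, 58], [0.9, 29], [3.5, 66]]
y_train = ["benign", "benign", "malignant", "malignant", "benign", "malignant"]

# Train on the input-output pairs, then predict for an unseen patient
model = DecisionTreeClassifier().fit(X_train, y_train)
print(model.predict([[3.9, 60]]))  # expected: ['malignant'] on this toy data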
o Self-supervised learning example: GPT-3 predicts the next word in a sentence (no human labeling needed).
Conclusion
Machine Learning enables computers to learn from data and make intelligent decisions.
Depending on the problem, we use:
Supervised Learning (when we have labeled data).
Unsupervised Learning (when we need to find hidden patterns).
Reinforcement Learning (for decision-making in dynamic environments).
Each type has real-world applications, from recommendation systems to autonomous
robots.
Types of Supervised Learning:
1. Basic Methods
2. Linear Models
3. Support Vector Machine, Non-Linear and Kernel Methods
4. Beyond Binary Classification: Multiclass/Structured Output, Ranking.
Types of Unsupervised Learning:
1. Clustering: K-means/Kernel K-means
2. Dimensionality Reduction: PCA and kernel PCA
3. Matrix Factorization and Matrix Completion
4. Generative Models (mixture models and latent factor models)
Supervised Learning
Basic Methods: elementary methods used in machine learning.
A) Distance-Based Methods in Supervised Learning
Distance-based methods are a fundamental class of supervised learning algorithms that
make predictions based on the similarity (or distance) between data points. These methods
assume that similar inputs yield similar outputs. They are widely used
for classification and regression tasks.
1. k-Nearest Neighbors (k-NN)
Concept: Predicts the output for a new point from the labels of its k most similar training points.
Common Distance Metrics:
o Euclidean Distance: d(x, y) = √(Σᵢ (xᵢ − yᵢ)²)
o Manhattan Distance: d(x, y) = Σᵢ |xᵢ − yᵢ|
o Cosine Similarity (for text/data with high dimensions): cos(θ) = (x · y) / (‖x‖ ‖y‖)
How it Works:
o For a new data point, compute its distance to all training points.
o Select the k nearest neighbors.
o For classification, take a majority vote of their labels.
o For regression, take the average of their values.
Example:
o Classification: Predicting if a tumor is malignant (k=3 neighbors vote).
o Regression: Predicting house prices based on similar nearby houses.
Pros & Cons:
o ✅ Simple, no training phase (lazy learner).
o ❌ Computationally expensive for large datasets (needs to store all training
data).
Numerical:
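A minimal sketch of the steps above with k = 3 (all data values are assumed for illustration):

import numpy as np
from collections import Counter

# Toy training data (assumed): two features per point, two classes
X_train = np.array([[1, 1], [1, 2], [2, 1], [6, 5], [7, 7], [6, 6]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])
query = np.array([2, 2])

# Step 1: Euclidean distance from the query to every training point
distances = np.linalg.norm(X_train - query, axis=1)
# Step 2: indices of the k = 3 nearest neighbors
nearest = np.argsort(distances)[:3]
# Step 3: majority vote of their labels
print(Counter(y_train[nearest]).most_common(1)[0][0])  # -> A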
2. Radial Basis Function (RBF) Networks / Kernel Methods
Concept: Uses distance-based similarity functions (kernels) to transform data into
a higher-dimensional space where it becomes linearly separable.
Common Kernel:
o Gaussian (RBF) Kernel: K(x, x′) = exp(−‖x − x′‖² / (2σ²))
How it Works:
o Each training point acts as a "center" in the kernel.
o New points are classified based on their weighted distance to these centers.
Example:
o Support Vector Machines (SVM) with RBF kernel for non-linear
classification.
Pros & Cons:
o ✅ Works well for non-linear data.
o ❌ Choosing the right kernel & parameters is tricky.
Numerical
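A minimal scikit-learn sketch of an SVM with a Gaussian (RBF) kernel on data that is not linearly separable (the XOR-style points and the gamma/C values are assumed for illustration):

from sklearn.svm import SVC

# XOR-style data (assumed): no straight line separates the two classes
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# The RBF kernel implicitly maps points into a higher-dimensional space
clf = SVC(kernel="rbf", gamma=2.0, C=10.0).fit(X, y)
print(clf.predict([[0, 1], [1, 1]]))  # expected: [1 0]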
3. Case-Based Reasoning (CBR)
Concept: Solves new problems by retrieving and adapting solutions from similar past
cases.
How it Works:
o Stores past cases (training data).
o For a new problem, retrieves the most similar case and adapts its solution.
Example:
o Medical diagnosis based on similar past patient records.
Numerical Example 2 for CBR
Step 2: Reuse (Adapt Solution)
Closest Match: Case 2 (Flu) is identical to the new case.
Adaptation Rule (Domain Knowledge):
o If Fatigue=Yes and Fever=High, prioritize Flu over Common Cold.
Proposed Diagnosis: Flu.
Key Takeaways
1. Retrieve: Found similar cases using distance metrics.
2. Reuse: Proposed "Flu" based on exact match.
3. Revise: Adjusted to "COVID-19" after expert input.
4. Retain: Added the new case to improve future diagnoses.
Comparison with Weighted k-NN
CBR: Uses domain knowledge (e.g., travel history) for adaptation.
k-NN: Only uses mathematical similarity (no manual revision).
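A minimal sketch of the Retrieve-Reuse-Revise-Retain cycle described above; the case base, the Hamming-distance retrieval, and the travel-history adaptation rule are all assumed for illustration:

# Case base (assumed): symptom vectors [fever, cough, fatigue, travel] with diagnoses
case_base = [
    ([1, 1, 1, 0], "Flu"),
    ([0, 1, 0, 0], "Common Cold"),
    ([1, 1, 1, 1], "COVID-19"),
]

def retrieve(query):
    # Retrieve: the stored case with the smallest Hamming distance to the query
    return min(case_base, key=lambda case: sum(q != f for q, f in zip(query, case[0])))

def reuse(query, case):
    # Reuse/Revise: adapt the retrieved solution with domain knowledge (travel history)
    return "COVID-19" if query[3] == 1 else case[1]

new_case = [1, 1, 1, 1]
diagnosis = reuse(new_case, retrieve(new_case))
print(diagnosis)                          # -> COVID-19
case_base.append((new_case, diagnosis))   # Retain: store the solved case for future queries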
4. Locally Weighted Regression (LWR)
Concept: Fits a local regression model around each query point, giving more weight
to nearby points.
How it Works:
o For a new point, compute weights for all training points based on distance.
o Fit a weighted regression model (e.g., linear regression) to predict the output.
Example:
o Predicting stock prices where recent data is more relevant than older data.
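A minimal sketch of the idea at a single query point, using Gaussian weights and scikit-learn's sample_weight support (the data, the bandwidth tau, and the helper name lwr_predict are assumed for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy 1-D data with a non-linear trend (assumed)
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = np.sin(X).ravel()

def lwr_predict(x_query, tau=0.5):
    # Gaussian weights: nearby training points count more than distant ones
    w = np.exp(-((X.ravel() - x_query) ** 2) / (2 * tau ** 2))
    # Fit a locally weighted linear model and evaluate it at the query point
    local = LinearRegression().fit(X, y, sample_weight=w)
    return local.predict([[x_query]])[0]

print(lwr_predict(3.0))  # roughly sin(3.0) ≈ 0.14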
When to Use Distance-Based Methods?
✔ Small to Medium Datasets (k-NN is slow for big data).
✔ Non-Parametric Data (no strict assumptions about data distribution).
✔ Interpretability Matters (k-NN is easy to explain).
❌ Avoid When:
Data is high-dimensional (curse of dimensionality).
Computational efficiency is critical.
Summary Table

Method | Task | Key Idea | Example Use Case
k-NN | Classification/Regression | Majority vote of nearest neighbors | Spam detection, house price prediction
RBF Networks | Classification | Kernel-based similarity | Handwritten digit recognition (SVM-RBF)
Case-Based Reasoning | Problem-solving | Retrieves similar past cases | Medical diagnosis
Locally Weighted Regression | Regression | Weighted regression near query point | Stock price forecasting
Conclusion
Distance-based methods are simple yet powerful for supervised learning, relying on the idea
that "similar things behave similarly."
k-NN is the most popular for quick prototyping.
RBF Kernels help in non-linear problems.
LWR & CBR are useful for specialized cases.
B) Nearest Neighbor-Based Methods in Supervised Learning
Nearest neighbor (NN) methods are instance-based (or lazy learning) algorithms that make
predictions by finding the most similar training examples (neighbors) to a new data point.
These methods rely on distance metrics to determine similarity and are widely used
for classification and regression tasks.
3. Weighted k-NN
Concept: Assigns higher importance to closer neighbors (inverse distance
weighting).
How it Works:
o Neighbors contribute to prediction based on 1/distance.
o Closer points have more influence than distant ones.
Example:
o Recommending movies (closer user preferences matter more).
Pros & Cons:
o ✅ More accurate than standard k-NN.
o ❌ Sensitive to noise in nearby points.
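In scikit-learn, inverse-distance weighting is available directly through the weights parameter; a minimal sketch on assumed toy data:

from sklearn.neighbors import KNeighborsClassifier

# Toy 1-D data (assumed); weights="distance" makes closer neighbors count more
X_train = [[1], [2], [3], [10], [11], [12]]
y_train = [0, 0, 0, 1, 1, 1]

clf = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X_train, y_train)
print(clf.predict([[4]]))  # -> [0], dominated by the nearby class-0 points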
4. Locally Weighted Regression
How it Works:
1. For a new point, compute weights for all training points (e.g., Gaussian kernel).
2. Fit a weighted linear regression to predict the output.
Example:
o Predicting stock prices where recent trends matter more.
Pros & Cons:
o ✅ Flexible for non-linear data.
o ❌ Computationally intensive.
Minkowski Distance:
d(x, y) = (Σᵢ |xᵢ − yᵢ|^p)^(1/p)
Customizable (p = 1: Manhattan, p = 2: Euclidean)
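A quick check of this relationship using SciPy (the vectors are illustrative):

from scipy.spatial.distance import cityblock, euclidean, minkowski

x, y = [1, 2, 3], [4, 6, 8]
print(minkowski(x, y, p=1), cityblock(x, y))  # p = 1 gives the Manhattan distance (12)
print(minkowski(x, y, p=2), euclidean(x, y))  # p = 2 gives the Euclidean distance (~7.07)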
2. Random Forest
Concept: An ensemble of decision trees trained on random subsets of data/features
(bagging).
How It Works:
1. Build multiple trees on bootstrapped samples.
2. For prediction:
Classification: Majority voting across trees.
Regression: Averaging tree outputs.
Example:
o Detecting diseases from medical records (improves accuracy over a single
tree).
Pros & Cons:
o ✅ Reduces overfitting, handles noisy data.
o ❌ Less interpretable than single trees.
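A minimal scikit-learn sketch of the bagging idea (the dataset and parameter choices are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Many trees on bootstrapped samples; predictions are combined by majority vote
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("Test accuracy:", forest.score(X_te, y_te))  # typically around 0.95-0.97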
Summary Table

Method | Task | Key Idea | Example Use Case
Decision Tree | Classification/Regression | Single tree with splits | Loan approval, medical diagnosis
Random Forest | Classification/Regression | Ensemble of decorrelated trees | Fraud detection, stock prediction
Gradient Boosting | Classification/Regression | Sequentially corrects errors | Search ranking, click-through rate prediction
Extra Trees | Classification/Regression | Random splits for speed | Real-time anomaly detection
Conclusion
Decision tree-based methods are versatile and powerful for supervised learning:
Single trees are simple and interpretable.
Random Forest improves accuracy via ensemble learning.
Gradient Boosting achieves state-of-the-art results (used in competitions like
Kaggle).
For large datasets, use LightGBM/CatBoost (optimized for speed). For interpretability,
visualize trees with sklearn.tree.plot_tree.
D) Naive Bayes-Based Methods in Supervised Learning
Naive Bayes is a family of probabilistic classifiers based on Bayes' Theorem, with a
"naive" assumption of feature independence. Despite its simplicity, it performs well in text
classification, spam filtering, and medical diagnosis.
Complement Naive Bayes:
Use Case: Text classification with skewed classes.
Example:
o Identifying rare diseases from clinical reports.
Pros & Cons:
o ✅ Better for imbalanced data than standard Multinomial NB.
o ❌ Still assumes feature independence.
2. "Naive" Assumption:
o Features are conditionally independent given the class:
P(x₁, x₂, …, xₙ | y) = P(x₁ | y) · P(x₂ | y) · … · P(xₙ | y)
3. Prediction:
o For a new sample, compute the posterior for each class and pick the class with the highest probability:
ŷ = argmax_y P(y) · Πᵢ P(xᵢ | y)
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
# Illustrative training data (assumed): short spam vs. ham messages
X_train = ["win free cash now", "free prize claim", "meeting at noon", "project report attached"]
y_train = ["spam", "spam", "ham", "ham"]
vectorizer = CountVectorizer()
clf = MultinomialNB().fit(vectorizer.fit_transform(X_train), y_train)
# Predict
print(clf.predict(vectorizer.transform(["free cash"])))  # Output: ['spam']
Conclusion
Naive Bayes is a simple, efficient, and interpretable classifier, especially for:
Text/data with categorical features.
Scenarios where speed matters (e.g., real-time spam filtering).
For better accuracy with correlated features, consider Logistic Regression or Tree-Based
Models.
Comparison of Basic Methods

Aspect | k-NN | Radius Neighbors | Decision Trees | Naive Bayes
Interpretability | Depends on k | Depends on radius | Depends on rules | Probabilistic rules
Handles Non-Linearity | Yes (flexible boundaries) | Yes (local approximations) | Yes (non-linear splits) | No (linear decision boundaries)
Feature Scaling | Required (distance-sensitive) | Required (distance-sensitive) | Not required | Not required (except Gaussian NB)
Hyperparameters | k (number of neighbors), distance metric | Radius r, distance metric | max_depth, min_samples_split | Smoothing parameter (e.g., alpha)
Use Cases | Image recognition, recommendation systems | Anomaly detection, density-based clustering | Medical diagnosis, credit scoring | Text classification, spam filtering
Pros | Simple, no training phase | Adapts to local data density | Interpretable, handles mixed data types | Fast, works well with small data
Cons | Slow for large datasets, curse of dimensionality | Struggles with varying densities | Prone to overfitting (needs pruning) | Strong independence assumption
Key Takeaways
1. Distance/Neighbor Methods (k-NN, Radius NN):
o Best for small datasets where interpretability isn’t critical.
o Suffer from the curse of dimensionality.
2. Decision Trees:
o Ideal for structured data with non-linear relationships.
o Basis for ensemble methods (Random Forest, XGBoost).
3. Naive Bayes:
o Fast and efficient for text/categorical data.
o Struggles with feature correlations.
For high accuracy, ensemble methods (e.g., Random Forest) often outperform these basic
methods. For speed, Naive Bayes is unbeatable.
Linear Models: Linear models are a fundamental class of supervised learning algorithms that
assume a linear relationship between the input features (independent variables) and the output
(dependent variable). They are widely used for both regression (predicting continuous
values) and classification (predicting discrete labels) tasks.
Key Characteristics of Linear Models:
1. Linear Relationship: The model assumes the output is a weighted sum of the input features: ŷ = β₀ + β₁x₁ + … + βₙxₙ.
2. Simplicity & Interpretability: Easy to understand and interpret coefficients.
3. Efficiency: Fast training and prediction, suitable for large datasets.
4. Regularization: Can be extended with L1/L2 regularization to prevent overfitting.
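As a quick illustration of point 4, Ridge (L2) and Lasso (L1) are drop-in regularized variants of ordinary linear regression in scikit-learn; the simulated data and alpha values below are assumed for illustration:

import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Simulated data (assumed): y depends mainly on the first two of five features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

print(LinearRegression().fit(X, y).coef_.round(2))
print(Ridge(alpha=1.0).fit(X, y).coef_.round(2))  # L2: shrinks all coefficients slightly
print(Lasso(alpha=0.1).fit(X, y).coef_.round(2))  # L1: pushes irrelevant coefficients toward zero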
If β₀ = 30,000, β₁ = 5,000, and β₂ = 10,000:
o A person with 3 years of experience and a master's degree (Education = 2) earns:
30,000 + 5,000 × 3 + 10,000 × 2 = $65,000
# A complete, runnable version (the training data below is illustrative, chosen to reproduce the stated output)
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1000], [1500], [2000], [2500]])   # square footage
y = np.array([250000, 350000, 450000, 550000])   # price = 50,000 + 200 * sq. ft
model = LinearRegression().fit(X, y)

print(f"Predicted price for 1800 sq. ft: ${model.predict([[1800]])[0]:,.2f}")
# Model coefficients
print(f"Intercept (β₀): {model.intercept_:.2f}")
print(f"Coefficient (β₁): {model.coef_[0]:.2f}")

Output:
Predicted price for 1800 sq. ft: $410,000.00
Intercept (β₀): 50000.00
Coefficient (β₁): 200.00
7. Advantages & Limitations
✅ Advantages
Simple & interpretable.
Fast training and prediction.
Works well when relationships are linear.
❌ Limitations
Poor performance on nonlinear data.
Sensitive to outliers.
Assumes no multicollinearity.
Final Summary
Linear regression is a fundamental ML algorithm for predicting continuous values. It works
best when:
The data follows a linear trend.
Features are not highly correlated.
You need a simple, explainable model.
B) Logistic Regression in Linear Models (with Examples & Explanation)
Logistic regression is a supervised learning algorithm used for binary
classification (though extendable to multi-class). Unlike linear regression, which predicts
continuous values, logistic regression predicts probabilities (between 0 and 1) using
a logistic (sigmoid) function.
Despite the name, it’s a classification (not regression) method.
Uses log-odds (logit) for linear modeling: log(p / (1 − p)) = β₀ + β₁x₁ + … + βₙxₙ
Optimization: Maximizes log-likelihood (or minimizes log loss) using gradient descent.
3. Decision Boundary: Default threshold = 0.5 (adjustable for precision/recall trade-
offs).
6. Assumptions
✅ Binary/multinomial target.
✅ Linear relationship between log-odds and features.
✅ No multicollinearity.
✅ Large sample size (for stable estimates).
import numpy as np
from sklearn.linear_model import LogisticRegression
# Illustrative data (assumed): exam scores with pass (1) / fail (0) outcomes
X, y = np.array([[35], [45], [50], [60], [70], [80]]), np.array([0, 0, 0, 1, 1, 1])
model = LogisticRegression().fit(X, y)
# Predict probabilities
print("Probability of passing exam (Score=65):", model.predict_proba([[65]])[:, 1])
Key Takeaways
1. Logistic regression predicts probabilities for classification.
2. Uses sigmoid to map linear outputs to [0, 1].
3. Default threshold = 0.5 for class decision.
4. Interpret coefficients as log-odds impact.
C) Generalized Linear Models (GLMs): GLMs extend linear regression by modeling different types of dependent variables using link functions and exponential family distributions.
Examples:
o Logistic GLM (Binomial family, logit link): logit(p) = β₀ + β₁x₁ + …, where p is the mean probability of the outcome.
o Poisson GLM (Poisson family, log link): log(μ) = β₀ + β₁x₁ + …, where μ is the mean number of visits (a count outcome).
5. Advantages & Limitations
✅ Advantages
Handles non-normal data (binary, counts, etc.).
Flexible via link functions.
Unifies many models under one framework.
❌ Limitations
Requires correct distribution specification.
Less efficient than specialized models (e.g., random forests for nonlinearity).
7. Extensions
Generalized Additive Models (GAMs): Nonlinear extensions of GLMs.
Mixed Effects GLMs: For clustered/hierarchical data.
Summary
GLMs generalize linear regression to binary, count, and skewed data using:
Exponential family distributions (Gaussian, Binomial, Poisson).
Link functions (identity, logit, log).
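A minimal Poisson-GLM sketch with statsmodels, matching the count/log-link case above (the simulated data and coefficient values are assumed for illustration):

import numpy as np
import statsmodels.api as sm

# Simulated count data (assumed): expected visits grow exponentially with x (log link)
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=200)
visits = rng.poisson(np.exp(0.5 + 0.8 * x))

X = sm.add_constant(x)
model = sm.GLM(visits, X, family=sm.families.Poisson()).fit()
print(model.params)  # estimates should be roughly [0.5, 0.8]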
Comparison: Linear Regression vs. Logistic Regression vs. GLM

Aspect | Linear Regression | Logistic Regression | GLM
Mathematical Form | y = β₀ + β₁x₁ + … + ε | log(p / (1 − p)) = β₀ + β₁x₁ + … | g(E[y]) = β₀ + β₁x₁ + …
Optimization | Least Squares (OLS) | Maximum Likelihood (MLE) | Maximum Likelihood (MLE)
Assumptions | Linearity, homoscedasticity, normality of residuals | Linearity in log-odds, no multicollinearity | Correct distribution, linearity in link scale
Regularization | Ridge/Lasso | L1/L2 (LogisticRegression in sklearn) | Supported (e.g., GLM with elastic net)
Python Libraries | sklearn.linear_model.LinearRegression | sklearn.linear_model.LogisticRegression | statsmodels.GLM, scikit-learn (limited)
Key Takeaways:
1. Linear Regression assumes Gaussian errors and predicts continuous values.
2. Logistic Regression is a special case of GLM for binary classification (logit link +
Binomial distribution).
3. GLMs generalize both:
o Can model counts (Poisson), binary (Logistic), continuous (Gaussian), etc.
o Use link functions to map linear predictors to the response scale.
Support Vector Machines, Nonlinearity and Kernel Methods