ANN Notes Updated

The document explains the backpropagation algorithm used in multilayer perceptrons, detailing its steps including forward propagation, error calculation, backward propagation, and weight updates. It discusses the advantages and limitations of backpropagation, such as its ability to approximate functions and issues like vanishing gradients. Additionally, it covers concepts of generalization, cross-validation, and the structure of supervised learning.

Uploaded by

K Sneha

UNIT III

1. Describe the backpropagation algorithm used in multilayer perceptrons.
Ans: Backpropagation (Backward Propagation of Errors) is a supervised
learning algorithm used to train Multilayer Perceptrons (MLPs). It adjusts
the weights of the network by propagating the error backward from the
output layer to the input layer, minimizing the difference between predicted
and actual outputs using gradient descent.
Process of the Algorithm
1. Forward Propagation:
- The input data is passed through the network to compute the predicted
output.
- Each neuron applies an activation function (e.g., Sigmoid, ReLU) to its
weighted sum of inputs.

2. Error Calculation:
- The difference between the predicted output and the actual target is
computed using a loss function (e.g., Mean Squared Error).

3. Backward Propagation:
- The error is propagated backward through the network.
- The gradient of the loss with respect to each weight is computed using the
chain rule from calculus.

4. Weight Update:
- The weights are adjusted in the opposite direction of the gradient to
minimize the error.
- The learning rate (η) controls the step size of the weight updates.

Step-by-Step Explanation
Step 1: Forward Pass (Compute Output)
1. Input Layer: Pass input features to the first hidden layer.
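2. Hidden Layers: Each neuron calculates a weighted sum of its inputs and
applies an activation function f:
$z = \sum_i w_i x_i + b, \quad a = f(z)$

3. Output Layer: Compute the final prediction $\hat{y}$.

Step 2: Compute Loss (Error)
- For Mean Squared Error over N training samples:
$L = \frac{1}{N} \sum_{n=1}^{N} (y_n - \hat{y}_n)^2$

Step 3: Backward Pass (Update Weights)
- Compute $\partial L / \partial w$ for every weight with the chain rule, then
move each weight against its gradient:
$w \leftarrow w - \eta \, \frac{\partial L}{\partial w}$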

Step 4: Repeat Until Convergence

- Iterate over training data for multiple epochs until error is minimized.
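
The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name train_mlp, the toy dataset, the linear output neuron, and the values of lr, epochs and seed are all assumptions chosen for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, hidden=3, lr=0.1, epochs=2000, seed=0):
    """Train a one-hidden-layer MLP with batch gradient descent.

    Returns the list of MSE values, one per epoch (hypothetical helper).
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Step 1: forward pass (sigmoid hidden layer, linear output)
        a1 = sigmoid(X @ W1 + b1)
        y_hat = a1 @ W2 + b2
        # Step 2: mean squared error
        err = y_hat - y
        losses.append(float(np.mean(err ** 2)))
        # Step 3: backward pass via the chain rule
        d_out = 2.0 * err / len(X)                 # dL/dy_hat
        dW2 = a1.T @ d_out
        db2 = d_out.sum(axis=0)
        d_hid = (d_out @ W2.T) * a1 * (1.0 - a1)   # sigmoid'(z) = a(1 - a)
        dW1 = X.T @ d_hid
        db1 = d_hid.sum(axis=0)
        # Weight update: step against the gradient
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return losses

# Toy data (assumed): two inputs scaled to [0, 1], target = their sum
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [2.0]])
losses = train_mlp(X, y)
print(f"first epoch MSE = {losses[0]:.4f}, last epoch MSE = {losses[-1]:.4f}")
```

Inputs are kept in [0, 1] here because raw values such as 1500 sq. ft. would saturate the sigmoid; real features would normally be scaled first.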

Real-Time Example: Predicting House Prices

Problem:
- Input Features: Size (sq. ft.), Number of Bedrooms.
- Target: House Price ($).
Neural Network Structure:
- Input Layer: 2 neurons (Size, Bedrooms).
- Hidden Layer: 3 neurons (Sigmoid activation).
- Output Layer: 1 neuron (Predicted Price).
Training Steps:
1. Forward Pass:
- Input: [1500 sq. ft., 3 bedrooms] → Hidden Layer → Output → Predicted
Price = $300,000.
2. Error Calculation:
- Actual Price = $320,000 → Squared Error = (320,000 − 300,000)^2 = 400,000,000.
3. Backward Pass:
- Compute gradients and adjust weights to reduce error.
4. Next Iteration:
- New prediction → $310,000 → Closer to target!

Result:
- After multiple iterations, the network learns to predict prices accurately.

2. Explain the backpropagation algorithm in detail with diagram.

Ans: Backpropagation is a supervised learning algorithm used to train neural
networks by minimizing the error between predicted and actual outputs. It
works by propagating errors backward through the network and adjusting
weights using gradient descent.
Key Steps of Backpropagation
1. Forward Pass:
o Input is passed through the network.
o Each layer computes its output using weights, biases, and
activation functions.
o Final output is compared with the true label to compute loss (e.g.,
Mean Squared Error or Cross-Entropy Loss).
2. Backward Pass (Error Propagation):
o The gradient of the loss w.r.t. each weight is computed using
the chain rule.
o Errors are propagated backward from the output layer to hidden
layers.
o Weights and biases are updated to minimize loss.
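3. Weight Update:
o Each weight and bias moves a small step against its gradient:
$W \leftarrow W - \eta \, \partial L / \partial W$, where η is the learning rate.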

Backpropagation Diagram
Below is a step-by-step diagram of backpropagation in a simple neural network
with one hidden layer:
Neural Network Structure:
 Input Layer (X₁, X₂)
 Hidden Layer (H₁, H₂) (with weights W1, W2 and activation function,
e.g., Sigmoid)
 Output Layer (Ŷ) (with weights W3, W4 and loss computation)

X₁    X₂    (Input)
  \  /
H₁    H₂    (Hidden)
  \  /
   Ŷ        (Output)

Step-by-Step Explanation of Diagram:

1. Forward Pass (Blue Arrows):
o Inputs X1,X2 are fed into the hidden layer.
o Hidden layer computes weighted sum and applies activation:

2. Backward Pass (Red Arrows):

3. Weight Update:
3. Describe the differentiation and error minimization process in BP
Ans: Backpropagation minimizes error by computing gradients (partial
derivatives) of the loss function with respect to each weight and updating
them using gradient descent.

Step-by-Step Example (Single Neuron)

Neural Network Setup
- Input: $X = 2$
- Weight: $W = 0.5$ (initial)
- Bias: $b = 1$
- Activation: Sigmoid, $\sigma(z) = \frac{1}{1 + e^{-z}}$
- True Output: $Y = 1$
- Loss: Mean Squared Error, $L = \frac{1}{2}(Y - \hat{Y})^2$
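
With this setup, one forward pass, the chain-rule gradient, and one update can be worked through numerically. The sketch below uses only the values listed above; the learning rate eta = 0.1 is an assumption, since the notes do not give one.

```python
import math

X, W, b, Y = 2.0, 0.5, 1.0, 1.0
eta = 0.1  # assumed learning rate (not given in the notes)

# Forward pass
z = W * X + b                       # weighted sum: 0.5 * 2 + 1 = 2.0
y_hat = 1.0 / (1.0 + math.exp(-z))  # sigmoid(2) ~ 0.8808
loss = 0.5 * (Y - y_hat) ** 2       # ~ 0.0071

# Chain rule: dL/dW = dL/dy_hat * dy_hat/dz * dz/dW
dL_dyhat = -(Y - y_hat)             # ~ -0.1192
dyhat_dz = y_hat * (1.0 - y_hat)    # sigmoid derivative ~ 0.1050
dz_dW = X                           # 2.0
dL_dW = dL_dyhat * dyhat_dz * dz_dW # ~ -0.0250
dL_db = dL_dyhat * dyhat_dz         # ~ -0.0125 (dz/db = 1)

# Gradient descent update: move against the gradient
W_new = W - eta * dL_dW             # ~ 0.5025
b_new = b - eta * dL_db             # ~ 1.0013

print(f"y_hat={y_hat:.4f} loss={loss:.4f} dL/dW={dL_dW:.4f} W_new={W_new:.4f}")
```

The gradient is negative, so the update increases W, which pushes the prediction from about 0.88 toward the target of 1.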
4. Discuss virtues and limitations of backpropagation learning.
Ans: Virtues (Advantages) of Backpropagation Learning
1. Universal Approximation
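o With enough hidden neurons, an MLP can represent any continuous
function on a bounded domain to arbitrary accuracy (Universal
Approximation Theorem). Backpropagation is the practical way to search
for such weights, although the theorem does not guarantee it finds them.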
2. Efficient Gradient Computation
o Uses the chain rule to compute gradients systematically, avoiding
manual differentiation.
3. Works with Deep Networks
o Extendable to deep neural networks (DNNs) with multiple hidden
layers (basis of modern deep learning).
4. Handles Large Datasets
o Compatible with stochastic gradient descent (SGD), making it
scalable to big data.
5. Flexible with Activation Functions
o Works with ReLU, Sigmoid, Tanh, Softmax, etc., allowing
different learning behaviors.
6. Automated Feature Learning
o Unlike traditional ML, it automatically learns features from raw
data without manual engineering.

Limitations (Disadvantages) of Backpropagation Learning

1. Vanishing/Exploding Gradients
o In deep networks, gradients can become too small
(vanishing) or too large (exploding), slowing or destabilizing
training.
2. Local Minima & Saddle Points
o Gradient descent may get stuck in suboptimal solutions instead of
the global minimum.
3. Computationally Expensive
o Requires many iterations (epochs) and large compute power for
deep networks.
4. Sensitive to Initial Weights
o Poor weight initialization can lead to slow convergence or bad
performance.
5. Overfitting Risk
o Without regularization (e.g., dropout, L2), networks
may memorize training data instead of generalizing.
6. Requires Labeled Data
o In its standard form it needs a target for every input. Unsupervised uses
(e.g., autoencoders) get around this by using the input itself as the target.
7. Black Box Nature
o Hard to interpret why a neural network makes certain decisions
(explainability challenge).

Mitigation Strategies for Limitations

1. Vanishing Gradients → Use ReLU, BatchNorm, Residual Connections
(ResNet).
2. Local Minima → Use momentum (SGD with momentum, Adam).
3. Overfitting → Apply Dropout, L2 Regularization, Early Stopping.
4. Compute Cost → Use GPUs, Mini-batch SGD, Distributed Training.
5. Initialization Sensitivity → He/Xavier Initialization.
Backpropagation is powerful but not perfect—it drives modern deep learning
but requires careful tuning. Advances like BatchNorm, Residual Networks,
and Adaptive Optimizers help overcome its limitations.

5. Explain convergence and error handling in backpropagation.

Ans: Convergence in backpropagation refers to the process where a
neural network’s weights are iteratively adjusted to minimize the error (loss)
between predicted and actual outputs. The goal is to reach a stable state
where further training does not significantly reduce the error.
Key Factors Affecting Convergence
1. Learning Rate (η)
o Too high → Overshooting (loss oscillates).
o Too low → Slow convergence.
2. Weight Initialization
o Poor initialization (e.g., all zeros) → Stuck training.
3. Loss Function Shape
o Non-convex (many local minima) → May converge to suboptimal
solutions.
4. Optimization Algorithm
o SGD, Adam, RMSprop affect speed and stability.

Example: Simple Linear Regression with Backpropagation

Problem: Fit a line y = Wx + b to the data points (x=1, y=2) and (x=2, y=3).
Loss Curve:
│
│ ● Initial high loss
│ \
│ \
│ ● Converged (low loss)
│
└─────────> Iterations
Converged Solution:
 W≈1, b≈1 → Perfect fit (y=x+1).
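
The convergence to W≈1, b≈1 can be reproduced with plain gradient descent on the two data points. This is a minimal sketch; the helper name fit_line, the zero initialization, the learning rate and the epoch count are assumptions for illustration.

```python
def fit_line(xs, ys, lr=0.1, epochs=2000):
    """Fit y = W*x + b by batch gradient descent on mean squared error."""
    W, b = 0.0, 0.0
    n = len(xs)
    history = []
    for _ in range(epochs):
        errs = [W * x + b - y for x, y in zip(xs, ys)]
        history.append(sum(e * e for e in errs) / n)        # current MSE
        dW = (2.0 / n) * sum(e * x for e, x in zip(errs, xs))
        db = (2.0 / n) * sum(errs)
        W -= lr * dW
        b -= lr * db
    return W, b, history

W, b, history = fit_line([1.0, 2.0], [2.0, 3.0])
print(f"W={W:.4f}, b={b:.4f}, first loss={history[0]:.3f}, last loss={history[-1]:.2e}")
```

With a much larger learning rate (for this problem, above roughly 0.29) the same loop diverges instead of converging, which is the "too high η" case described under Key Factors.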

Types of Convergence
1. Global Convergence
o Reaches the best possible solution (rare in deep learning due to
non-convexity).
2. Local Convergence
o Stops improving but may not be the global minimum.
3. Divergence
o Loss increases (e.g., due to too high η).

Error handling in backpropagation refers to the techniques used to manage
and correct various types of errors that occur during neural network training.
These errors can stem from computational issues, poor learning dynamics, or
generalization problems.

1. Types of Errors in Backpropagation

Error Type | Description | Example
Vanishing Gradients | Gradients become extremely small, slowing/stopping learning in early layers. | Sigmoid/Tanh in deep networks.
Exploding Gradients | Gradients grow exponentially, causing unstable updates. | Recurrent Networks (RNNs).
Overfitting | Model memorizes training data but fails on test data. | High accuracy on train, low on test.
Local Minima | Optimization gets stuck in suboptimal solutions. | Non-convex loss landscapes.
Numerical Instability | Floating-point errors (e.g., NaN/∞). | Division by zero in softmax.
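
Two of the error types in the table have short, widely used remedies. The sketch below shows a numerically stable softmax (subtracting the largest logit before exponentiating) and gradient clipping by norm for exploding gradients. The function names and example values are assumptions for illustration, not part of the original notes.

```python
import numpy as np

def stable_softmax(logits):
    """Softmax that avoids overflow by shifting logits so the maximum is 0."""
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

def clip_by_norm(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm (direction unchanged)."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)
    return grad

# A naive np.exp(1000.0) overflows to inf; the shifted version stays finite.
probs = stable_softmax(np.array([1000.0, 1001.0, 1002.0]))
clipped = clip_by_norm(np.array([3.0, 4.0]), max_norm=1.0)
print(probs, clipped)
```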
6. Write a detailed note on generalization and cross-validation.
Ans:
Generalization refers to a model's ability to perform well on unseen data
(test/real-world data) after being trained on a limited dataset (training data).
A model that generalizes well avoids overfitting (memorizing training data)
and underfitting (failing to learn patterns).
Key Concepts
1. Overfitting
o High training accuracy but poor test accuracy.
o Occurs when the model is too complex (e.g., deep neural networks
with insufficient data).
o Example: A polynomial regression of degree 10 fitting noise in a
small dataset.
2. Underfitting
o Poor performance on both training and test data.
o Occurs when the model is too simple (e.g., linear regression on
non-linear data).
o Example: Using a straight line to fit a sine wave.
3. Bias-Variance Tradeoff
o High Bias (Underfitting): Oversimplified model.
o High Variance (Overfitting): Overly complex model.
o Goal: Find the "sweet spot" where the combined error from bias and
variance is lowest (reducing one usually increases the other).
Techniques to Improve Generalization
1. Regularization (L1/L2)
o Adds penalty terms (L1: sparsity, L2: small weights) to the loss
function to prevent overfitting.
o Example: model.add(Dense(64, kernel_regularizer=l2(0.01))) in
Keras.
2. Dropout
o Randomly deactivates neurons during training to prevent co-
adaptation.
o Example: model.add(Dropout(0.5)) after a dense layer.
3. Early Stopping
o Halts training when validation error stops improving to avoid
overfitting.
o Example: EarlyStopping(monitor='val_loss', patience=5) in Keras.
4. Data Augmentation
o Artificially expands training data with transformations (e.g.,
rotations, flips).
o Example: ImageDataGenerator(rotation_range=20) for images.
5. Batch Normalization
o Normalizes layer outputs to stabilize and accelerate training.
o Example: model.add(BatchNormalization()) after a conv layer.
6. Cross-Validation
o Splits data into folds to assess model robustness before final
training.
o Example: cross_val_score(model, X, y, cv=5) in scikit-learn.
7. Ensemble Methods
o Combines predictions from multiple models (e.g., Bagging,
Boosting).
o Example: RandomForestClassifier() for reduced variance.
8. Simpler Architectures
o Reduces model complexity (fewer layers/neurons) to match data
size.
o Example: Replace a 10-layer NN with a 3-layer NN for small
datasets.
9. Transfer Learning
o Uses pre-trained models (e.g., ResNet, BERT) as feature extractors.
o Example: VGG16(weights='imagenet', include_top=False).
10.Noise Injection
o Adds slight noise to inputs/weights to improve robustness.
o Example: GaussianNoise(0.1) in Keras input layers.

Cross-validation (CV) is a technique to evaluate how well a machine learning
model generalizes to unseen data. Instead of training on the entire dataset once,
the data is split into multiple subsets (folds). The model is trained on some folds
and tested on the remaining one, repeating the process to ensure reliable
performance estimates.
Example: Training a Robot to Recognize Objects
Scenario:
You’re training a robot to recognize apples vs. oranges using a dataset of 100
images (50 apples, 50 oranges).
Step 1: Holdout Validation (Basic Approach)
 Split: 70 images (train) + 30 images (test).
 Risk: If the test set has easy samples, accuracy may be misleading.
Step 2: 5-Fold Cross-Validation (Better Approach)
1. Divide the 100 images into 5 equal folds (20 images each).
2. Train the robot on 4 folds (80 images) and test on the remaining fold (20
images).
3. Repeat 5 times, each time with a different test fold.
4. Average the results to get a robust accuracy estimate.
Example Results:

Fold | Test Accuracy
1 | 92%
2 | 88%
3 | 90%
4 | 85%
5 | 93%

 Final Accuracy: (92+88+90+85+93)/5 = 89.6%

Why Use Cross-Validation:
- Reduces the risk of a misleading score from one lucky or unlucky train/test split.
- Provides a robust estimate of model performance.
- Helps in hyperparameter tuning (e.g., selecting the best learning rate).

Types of Cross-Validation
1. Holdout Validation
o Simplest method: Split data into train (70%) and test (30%).
o Limitation: Performance varies based on the split.
2. k-Fold Cross-Validation
o Divides data into k equal folds.
o Trains on k-1 folds, validates on the remaining fold.
o Repeats k times and averages results.
o Best for: Small datasets.
Example (5-Fold CV):
Fold 1: [Test] | [Train, Train, Train, Train]
Fold 2: [Train] | [Test, Train, Train, Train]
...
Fold 5: [Train, Train, Train, Train] | [Test]

3. Stratified k-Fold CV
o Ensures each fold has the same class distribution as the full
dataset.
o Best for: Imbalanced datasets (e.g., fraud detection).
4. Leave-One-Out CV (LOOCV)
o Extreme case of k-Fold where k = n (sample size).
o Pros: Nearly unbiased estimate (each model trains on n-1 samples).
o Cons: Computationally expensive (n models), and the estimate can have
high variance.
5. Time Series CV
o Preserves temporal order (critical for forecasting).
o Example:
Train: [t1, t2, t3] → Test: [t4]
Train: [t1, t2, t3, t4] → Test: [t5]
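
The index bookkeeping behind k-fold and time series CV can be sketched without any library. The two generator names below are hypothetical helpers written for this illustration. In practice the data is usually shuffled before k-fold splitting (but never for time series), and libraries such as scikit-learn provide ready-made splitters.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def expanding_window_splits(n_samples, min_train):
    """Yield time-ordered splits: train on everything before t, test on t."""
    for end in range(min_train, n_samples):
        yield list(range(end)), [end]

folds = list(k_fold_indices(100, 5))        # the 100-image example: 80 train / 20 test
ts_splits = list(expanding_window_splits(5, 3))
print(len(folds), [len(test) for _, test in folds])
print(ts_splits)
```

With indices 0..4 standing for t1..t5, the second generator reproduces the two time series splits listed above.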
7. Explain the structure and phases of supervised learning.
Ans: Supervised learning is a machine learning paradigm where a model
learns from labeled data (input-output pairs) to make predictions. It follows a
structured workflow with distinct phases:
1. Problem Definition
Goal: Define the task (classification, regression) and success metrics.
 Example:
o Task: Predict house prices (regression).
o Success Metric: Mean Absolute Error (MAE) < $10,000.
2. Data Collection
Goal: Gather high-quality labeled data.
 Sources: Databases, APIs, surveys, sensors.
 Example: Collect 10,000 house listings with features (size, location) and
prices.
3. Data Preprocessing
Goal: Clean and prepare data for modeling.
Key Steps:
1. Handling Missing Data:
o Drop rows or impute (mean/median).
2. Feature Engineering:
o Create new features (e.g., "price per sq. ft").
3. Encoding Categorical Variables:
o One-hot encode (e.g., "neighborhood").
4. Normalization/Scaling:
o Rescale numerical features (e.g., MinMaxScaler to map values to [0, 1],
or StandardScaler for zero mean and unit variance).
Example:
 Convert "location" (text) to numerical labels.
 Scale "square footage" to [0, 1].
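
Two of the preprocessing steps above, scaling to [0, 1] and one-hot encoding, can be written directly in NumPy. This is a sketch for illustration only; the function names and sample values are assumptions, and library classes such as scikit-learn's MinMaxScaler do the same job in practice.

```python
import numpy as np

def min_max_scale(values):
    """Map values linearly onto [0, 1]; a constant column maps to all zeros."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)

def one_hot(labels):
    """Return (encoded matrix, sorted category list) for a list of labels."""
    categories = sorted(set(labels))
    index = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(labels), len(categories)))
    for row, label in enumerate(labels):
        encoded[row, index[label]] = 1.0
    return encoded, categories

scaled = min_max_scale([1000, 1500, 2000])             # square footage
encoded, categories = one_hot(["north", "south", "north"])
print(scaled, categories, encoded, sep="\n")
```

The minimum and maximum should be computed on the training split only and then reused for validation and test data.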
4. Model Selection
 Goal: Choose an algorithm based on the problem type.

Task | Algorithms
Classification | Logistic Regression, Random Forest, SVM
Regression | Linear Regression, Decision Trees, XGBoost

 Example: Use Random Forest for house price prediction.

5. Training

Goal: Teach the model to map inputs to outputs.

 Process:

1. Split data into training (70%) and validation (30%) sets.

2. Feed training data to the model.

3. Adjust weights to minimize loss (e.g., MSE for regression).

Example:

 Train the model on 7,000 houses to learn price patterns.

6. Evaluation

Goal: Test model performance on unseen data.

Metrics:

 Classification: Accuracy, Precision, Recall, F1-Score.

 Regression: MAE, RMSE, R².

Example:

 Validate on 3,000 houses → MAE = $8,500 (meets goal!).
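
The three regression metrics listed above are short formulas. The sketch below computes them with NumPy on three made-up house prices; the function name and the numbers are assumptions for illustration.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                       # average absolute error
    rmse = np.sqrt(np.mean(err ** 2))                # penalizes large errors more
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                       # 1.0 means a perfect fit
    return mae, rmse, r2

mae, rmse, r2 = regression_metrics(
    [300_000, 320_000, 250_000],   # actual prices (made up)
    [310_000, 315_000, 240_000],   # predicted prices (made up)
)
print(f"MAE={mae:.2f} RMSE={rmse:.2f} R2={r2:.4f}")
```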

7. Hyperparameter Tuning
Goal: Optimize model settings for better performance.

 Methods:

o Grid Search: Test predefined hyperparameter combinations.

o Random Search: Sample randomly from ranges.

Example:

 Tune n_estimators and max_depth in Random Forest.

8. Deployment

Goal: Integrate the model into real-world systems.

 Steps:

1. Save the trained model (e.g., .pkl file).

2. Deploy as an API (e.g., Flask, FastAPI).

3. Monitor performance in production.

Example:

 Predict prices for new listings on a real estate website.

9. Monitoring & Maintenance

Goal: Ensure model adapts to changing data.

 Actions:

o Retrain with fresh data periodically.

o Detect drift (e.g., sudden price fluctuations).

Example:

 Update model every 6 months with new sales data.

Workflow:

Problem → Data → Preprocess → Select Model → Train → Evaluate → Tune → Deploy → Monitor

8. Write short notes on overfitting.

Ans: Overfitting occurs when a machine learning model memorizes the
training data (including noise and outliers) instead of learning general
patterns, leading to poor performance on new, unseen data.
Key Signs of Overfitting:
 High accuracy on training data, but low accuracy on test data.
 The model fails to generalize to real-world scenarios.

Example: A Robot Learning to Recognize Cats vs. Dogs

Scenario
 Task: Train a robot to classify images as "cat" or "dog."
 Training Data: 100 images (50 cats, 50 dogs).
 Test Data: 30 new images (unseen during training).


What Happens When the Robot Overfits?

1. Memorization Over Learning:
o The robot memorizes tiny details (e.g., a scratch on one cat’s ear, a
specific dog collar).
o Instead of learning general features (e.g., ear shape, fur texture), it
associates irrelevant details with labels.
2. Training vs. Test Performance:
o Training Accuracy: 99% (almost perfect).
o Test Accuracy: 60% (fails on new images).
3. Real-World Failure:
o If given a photo of a new cat with different lighting, the robot
misclassifies it because it relied on memorized patterns.
Why Does Overfitting Happen?
 Model Too Complex: A deep neural network with too many layers learns
noise.
 Small Dataset: Limited data makes memorization easier than
generalization.
 No Regularization: No techniques (e.g., dropout, L2) to prevent over-
reliance on training data.

How to Fix Overfitting in the Robot Example

1. Simplify the Model
o Use fewer layers in the neural network.
o Example: Switch from a 10-layer CNN to a 3-layer CNN.
2. Data Augmentation
o Artificially expand training data with rotations, flips, and
brightness changes.
o Example: Generate 500 augmented images from the original 100.
3. Regularization
o Dropout: Randomly ignore 20% of neurons during training to
prevent reliance on specific features.
o L2 Regularization: Penalize large weights in the model.
4. Cross-Validation
o Use 5-fold CV to ensure the robot performs well on all data splits.
5. Early Stopping
o Stop training when validation accuracy stops improving (keep the test
set for the final evaluation only).
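
The early stopping rule can be sketched as a patience counter over the per-epoch validation loss. The function name, the patience value and the loss values below are assumptions for illustration; frameworks such as Keras provide this as a callback.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs; best_epoch is where the weights should be restored from.
    """
    best = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70]   # made-up values
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=2)
print(stop_epoch, best_epoch)
```

Here validation loss bottoms out at epoch 2 and then rises, so with a patience of 2 training halts at epoch 4 and the epoch-2 weights are the ones to keep.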

Visualizing Overfitting
Training Accuracy: ●●●●●●●●●● (99%)
Test Accuracy: ●●●●●●○○○○ (60%)
 The gap between training and test performance indicates overfitting.

9. Explain the role of Hessian matrix in learning algorithms.

Ans: The Hessian matrix is a square matrix of second-order partial
derivatives of a scalar-valued function (typically the loss function in machine
learning). It plays a crucial role in optimization, curvature analysis, and
understanding the behavior of learning algorithms. Below is a structured
breakdown of its significance:

2. Key Roles in Learning Algorithms

(A) Optimization (Second-Order Methods)

(B) Analyzing Critical Points

 Eigenvalues of H reveal:
o Positive Definite: Local minimum (all eigenvalues > 0).
o Negative Definite: Local maximum (all eigenvalues < 0).
o Saddle Point: Mixed signs (common in high-dimensional spaces).
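
Both uses of the Hessian named so far can be illustrated with NumPy: classifying a critical point from the eigenvalue signs, and taking a second-order (Newton) step, w ← w − H⁻¹∇L. This is a sketch under stated assumptions: the function names, the tolerance and the small quadratic example are chosen for illustration, and Newton's method is offered as one common second-order method rather than the one the notes necessarily intended.

```python
import numpy as np

def classify_critical_point(hessian, tol=1e-8):
    """Classify a critical point from the eigenvalues of a symmetric Hessian."""
    eig = np.linalg.eigvalsh(hessian)
    if np.all(eig > tol):
        return "local minimum"
    if np.all(eig < -tol):
        return "local maximum"
    if np.any(eig > tol) and np.any(eig < -tol):
        return "saddle point"
    return "inconclusive"          # some eigenvalue is (numerically) zero

def newton_step(w, grad, hessian):
    """One Newton update: w - H^{-1} grad, solved without forming the inverse."""
    return w - np.linalg.solve(hessian, grad)

# f(x, y) = x^2 + y^2 and f(x, y) = x^2 - y^2 at the origin
bowl = classify_critical_point(np.array([[2.0, 0.0], [0.0, 2.0]]))
saddle = classify_critical_point(np.array([[2.0, 0.0], [0.0, -2.0]]))

# Quadratic loss L(w) = 0.5 w^T A w - c^T w has gradient A w - c and Hessian A
A = np.array([[3.0, 1.0], [1.0, 2.0]])
c = np.array([1.0, 1.0])
w0 = np.zeros(2)
w1 = newton_step(w0, A @ w0 - c, A)
print(bowl, saddle, w1)
```

For a quadratic loss a single Newton step lands exactly on the minimum, which is why curvature information can speed up convergence; the cost is forming and solving with H.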

(C) Regularization and Robustness

(D) Pruning and Compression

 Optimal Brain Surgeon: Uses Hessian to prune unimportant weights
without significant loss increase.

4. Challenges and Workarounds

Challenge | Solution
High computational cost | Approximate H (e.g., diagonal Hessian).
Non-convex loss landscapes | Focus on local curvature (e.g., saddle point escape).
Numerical instability | Use eigenvalue clipping or damping.

10.Describe the process of network pruning.

Ans: Network pruning is a technique to reduce the size of a neural
network by removing unnecessary weights, neurons, or layers without
significantly compromising performance. It improves efficiency, reduces
computational costs, and enables deployment on edge devices (e.g.,
smartphones, IoT).
Why Prune Neural Networks:
- Reduces model size (storage/memory).
- Speeds up inference (fewer FLOPs).
- Maintains accuracy (or keeps the accuracy drop small).
- Enables edge deployment (e.g., MobileNet, TinyML).

Step-by-Step Pruning Process

Step 1: Train a Baseline Model
 Train the original network to convergence.
 Example: Train a ResNet-50 on CIFAR-10 (baseline accuracy = 92%).
Step 2: Identify Pruning Candidates
 Criteria for pruning:
o Magnitude-based: Remove weights near zero (e.g., |weight| <
0.01).
o Gradient-based: Prune weights with low impact on loss (using
gradients/Hessian).
o Activation-based: Drop neurons with low average activation.
Step 3: Prune the Network
 Weight Pruning Example:
import torch  # `model` is assumed to be one layer (e.g., torch.nn.Linear); `threshold` is a float
mask = torch.abs(model.weight) > threshold # Keep weights above threshold
model.weight.data *= mask.float() # Zero out others
 Neuron Pruning Example:
Remove neurons with L2-norm below a threshold in a dense layer.

Step 4: Fine-Tune the Pruned Model

 Retrain the pruned network to recover lost accuracy.
 Example: Train for 10 more epochs with 50% fewer weights.
Step 5: Iterate (Optional)
 Repeat pruning + fine-tuning cycles for aggressive compression.
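
Steps 2 and 3 can also be sketched framework-free. The NumPy function below prunes by target sparsity rather than by a fixed threshold, which is one common way to choose the cutoff; the function name and the example weights are assumptions for illustration.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out roughly the `sparsity` fraction of weights with smallest |w|.

    Returns (pruned_weights, mask). Ties at the cutoff are pruned too, so the
    achieved sparsity can be slightly higher than requested.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)              # number of weights to remove
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    mask = np.abs(weights) > threshold
    return weights * mask, mask

weights = np.array([[0.5, -0.01, 0.2],
                    [0.003, -0.8, 0.05]])
pruned, mask = magnitude_prune(weights, sparsity=0.5)
print(pruned)
```

During fine-tuning (Step 4) the same mask is normally reapplied after every weight update so that pruned weights stay at zero. Iterative pruning (Step 5) repeats the call with a gradually increasing sparsity.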

4. Advanced Pruning Techniques

Technique | Description
Lottery Ticket Hypothesis | Finds sparse "winning tickets" that can train from scratch.
Iterative Pruning | Prunes gradually (e.g., 20% weights at a time) vs. one-shot.
Structured Pruning | Removes entire channels/filters (hardware-friendly).
Automated Pruning | Uses RL/NAS (Neural Architecture Search) to optimize pruning.
Types of Neural Network Pruning
Pruning techniques are categorized based on what is removed (weights,
neurons, layers) and how it is removed (structured vs. unstructured). Below are
the key types:
1. Based on Granularity
(A) Weight Pruning (Unstructured Pruning)
 What: Removes individual weights (sets them to zero).
 Pros: High compression rates.
 Cons: Requires sparse hardware support for speedup.
 Example:
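# PyTorch weight pruning (weight_tensor: a tensor without autograd tracking, e.g. layer.weight.data)
mask = torch.abs(weight_tensor) < threshold # True where the weight is small
weight_tensor[mask] = 0 # Zero out small weights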

(B) Neuron/Unit Pruning (Structured Pruning)

 What: Removes entire neurons or filters (e.g., in dense/conv layers).
 Pros: Hardware-friendly (works with standard libraries).
 Cons: Less fine-grained than weight pruning.
 Example:
o Drop neurons with low activation norms in a dense layer.

(C) Channel/Filter Pruning (Structured Pruning)

 What: Removes entire channels in convolutional layers.
 Pros: Reduces FLOPs significantly.
 Example:
o Delete a 3x3 filter in a Conv2D layer if its L1-norm is small.
(D) Layer Pruning
 What: Removes entire layers (e.g., residual blocks in ResNet).
 Pros: Drastically reduces model depth.
 Cons: Risky (may break architecture).

11.Describe the supervised learning approach with examples.

Ans: Supervised Learning is a machine learning approach where the model
is trained on a labeled dataset—each input has a corresponding correct
output. The system learns to predict the output from inputs by minimizing
the difference between predicted and actual values.
Working:
1. Training Phase:
o The algorithm is given input-output pairs (e.g., features and labels).
o It learns the pattern or mapping function from inputs to outputs.
2. Testing/Prediction Phase:
o The trained model is tested on new, unseen data to predict the
output.
Examples:
1. Email Spam Detection
 Input (Features): Email content, sender, subject line
 Output (Label): Spam or Not Spam
 Goal: Classify new emails as spam or not based on past data.
2. House Price Prediction
 Input (Features): Square footage, number of bedrooms, location
 Output (Label): Price of the house
 Goal: Predict the price of a house based on its features.

Types of Supervised Learning

1. Classification
 Definition: The task of predicting a discrete label or category.
 Goal: Assign input data to one of the predefined classes.
 Examples:
o Email Spam Detection → Spam or Not Spam
o Disease Prediction → Positive or Negative
o Image Recognition → Cat, Dog, or Human
 Algorithms Used:
o Logistic Regression
o Decision Tree
o Random Forest
o Support Vector Machine (SVM)
o k-Nearest Neighbors (KNN)

2. Regression
 Definition: The task of predicting a continuous numeric value.
 Goal: Estimate the relationship between variables.
 Examples:
o House Price Prediction → ₹25,00,000
o Stock Market Forecasting → ₹1,200.75
o Temperature Prediction → 29.5°C
 Algorithms Used:
o Linear Regression
o Ridge/Lasso Regression
o Decision Tree Regressor
o Support Vector Regressor (SVR)
UNIT IV
Self-Organizing Maps (SOM)

2 Marks Questions (12)

1. What is a Self-Organizing Map?
Ans: An unsupervised neural network that projects high-dimensional data
onto a low-dimensional (usually 2D) grid, preserving topological
relationships.
Example: Visualizing customer segments in marketing data on a 2D map.
2. Define a feature map.
Ans: A transformation of input features into a structured output space where
similar inputs activate nearby units.
Example: SOM transforming pixel colours of images into a 2D colour map.
3. Mention one property of SOM.
Ans: Preserves the topology of input data in the output space.
Example: Similar words clustering together in word embedding
visualizations.
4. What is the SOM algorithm?
Ans: Iterative process where neurons compete to represent input data, and
weights are updated based on a neighbourhood function.
Example: Training a SOM to classify handwritten digits without labels.
5. Define computer simulations in neural networks.
Ans: Using software to model neural network behaviour and test learning
algorithms.
Example: Simulating SOM training on iris dataset to study clustering.
6. What is learning vector quantization?
Ans: A supervised version of SOM where prototype vectors are adjusted to
minimize classification errors.
Example: Classifying medical data (e.g., diabetic vs. non-diabetic patients).
7. Mention a use of adaptive pattern classification.
Ans: Systems that learn and adapt to recognize patterns dynamically.
Example: Spam filters improving accuracy by learning from user-marked
emails.
8. What is a feature mapping model?
Ans: A model that reduces data dimensions while retaining structural
relationships.
Example: SOM mapping gene expression data for cancer subtype analysis.
9. Define clustering in SOM.
Ans: Grouping similar data points into neighbouring neurons on the map.
Example: SOM clustering countries based on GDP, population, and literacy
rates.
10.What is neighbourhood function in SOM?
Ans: A function (e.g., Gaussian) defining how nearby neurons are updated
during training.
Example: Updating not just the "winning neuron" but also its neighbours in a
3x3 grid.
11.What does unsupervised learning mean?
Ans: Training without labelled data, relying on patterns and similarities.
Example: SOM organizing a library of unlabelled images by visual
similarity.
12.What is dimensionality reduction?
Ans: Reducing the number of input variables while preserving key
information.
Example: SOM compressing 30D sensor data into a 2D map for fault
detection.
5 Marks Questions (6)
1. Explain the two basic feature mapping models.
Ans: (A) Competitive Learning Model
Definition & Mechanism
Competitive learning is an unsupervised neural network model where
neurons compete to respond to input patterns. The key principle is "winner-
takes-all", meaning only the Best Matching Unit (BMU) updates its weights.
Steps:
1. Input Presentation: A high-dimensional input vector is fed into the
network.
2. Similarity Calculation: Each neuron computes its distance (e.g.,
Euclidean) from the input.
3. Competition: The neuron with the smallest distance wins and becomes
the BMU.
4. Weight Update: Only the BMU adjusts its weights to better match the
input.
5. Iteration: The process repeats until weights stabilize.
Example: Handwritten Digit Recognition
 Input: Pixel values of handwritten digits (0-9).
 Competition: Neurons compete to represent digits.
 Result: Each neuron specializes in recognizing a specific digit.
Limitations
 Only the winning neuron learns, ignoring neighbors.
 May lead to "dead neurons" (neurons that never win).

(B) Self-Organizing Map (SOM) Model

Definition & Mechanism
SOM is an unsupervised neural network that projects high-dimensional data
into a low-dimensional (usually 2D) grid while preserving topological
relationships. Unlike competitive learning, SOM updates not just the BMU
but also its neighbors.
Steps:
1. Initialization: Assign random weights to neurons in a 2D grid.
2. Competition: For each input, find the BMU (closest neuron).
3. Cooperation: Identify neighboring neurons using a neighborhood
function (e.g., Gaussian).
4. Adaptation: Adjust weights of BMU and neighbors to resemble the input.
5. Decay: Reduce learning rate and neighborhood size over time.
Example: Customer Segmentation
 Input: Customer data (age, income, purchase history).
 SOM Output: A 2D map where similar customers cluster together.
 Business Use: Targeted marketing strategies based on clusters.
Advantages Over Competitive Learning
 Preserves topology (similar inputs stay close).
 Avoids dead neurons by updating neighbors.
2. Describe the architecture of a self-organizing map.
Ans: 1. Input Layer
 Receives high-dimensional data (e.g., feature vectors).
 Example: A dataset with features like temperature, humidity, pressure.
2. Competitive Layer (Output Grid)
 A 2D grid of neurons (e.g., 5×5, 10×10).
 Each neuron has a weight vector of the same dimension as input.
3. Neighborhood Function
 Determines how neighboring neurons are updated.
 Common functions:
o Gaussian: Smooth updates around BMU.
o Mexican Hat: Excites nearby neurons but inhibits distant ones.
4. Learning Rate (α)
 Controls how much weights are adjusted.
 Starts high (~0.8) and decays over time (e.g., α(t) = α₀ e^(-kt)).
5. Training Process
1. Initialization: Random weights.
2. Competition: BMU selection.
3. Cooperation: Neighborhood identification.
4. Adaptation: Weight updates.
5. Convergence: Weights stabilize.
Example: Image Color Clustering
 Input: RGB pixel values.
 Output: A 2D color map grouping similar shades.
3. Explain the SOM algorithm with an example.
Ans: Step-by-Step SOM Algorithm
1. Initialize Weights: Randomly assign weight vectors to neurons.
2. Select Input Vector: Pick a data sample (e.g., customer features).
3. Find BMU: Calculate distances (Euclidean) and select the closest neuron.
4. Determine Neighborhood: Use a Gaussian function to define nearby
neurons.
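5. Update Weights: Adjust the BMU and its neighbors using:
$w_j(t+1) = w_j(t) + \alpha(t) \, h_j(t) \, [x(t) - w_j(t)]$
where:
o $w_j(t)$ = weight vector of neuron j, $x(t)$ = current input vector
o $h_j(t)$ = neighborhood function (largest at the BMU, decaying with grid
distance from it)
o α(t) = learning rate
6. Repeat: Until convergence (no significant weight changes).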
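
The algorithm steps can be sketched in NumPy as follows. This is a minimal illustration, not a tuned implementation: the function names, the 5×5 grid, the exponential decay schedules for the learning rate and neighborhood width, and the two-cluster toy data are all assumptions.

```python
import numpy as np

def best_matching_unit(weights, x):
    """Grid position (row, col) of the neuron closest to x (Euclidean)."""
    dists = np.linalg.norm(weights - x, axis=2)
    return np.unravel_index(np.argmin(dists), dists.shape)

def train_som(data, grid=(5, 5), epochs=20, alpha0=0.8, sigma0=2.0, seed=0):
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.random((rows, cols, data.shape[1]))        # 1. initialize
    coords = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:           # 2. select input
            frac = step / total
            alpha = alpha0 * np.exp(-3.0 * frac)              # decaying learning rate
            sigma = sigma0 * np.exp(-3.0 * frac)              # shrinking neighborhood
            bmu = best_matching_unit(weights, x)              # 3. find BMU
            grid_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=2)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))         # 4. Gaussian neighborhood
            weights += alpha * h[..., None] * (x - weights)   # 5. update weights
            step += 1
    return weights

# Toy data (assumed): two clusters in 3-D, all values inside [0, 1]
rng = np.random.default_rng(1)
cluster_a = 0.1 + 0.05 * rng.random((20, 3))
cluster_b = 0.9 - 0.05 * rng.random((20, 3))
som = train_som(np.vstack([cluster_a, cluster_b]))
bmu_a = best_matching_unit(som, np.full(3, 0.1))
bmu_b = best_matching_unit(som, np.full(3, 0.9))
print(bmu_a, bmu_b)
```

After training, inputs from the two clusters should win at different grid positions, which is the clustering behaviour described in the example that follows.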
Example: Animal Classification
- Input: Features (size, diet, habitat).
- SOM Output:
o Cluster 1: Small herbivores (rabbits, squirrels).
o Cluster 2: Large carnivores (lions, tigers).
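A minimal NumPy sketch of these steps on made-up animal data is given below. The two features (size and diet, both scaled to [0, 1]), the synthetic samples, the decay constants, and the use of a fixed number of epochs instead of a convergence check are all assumptions made for illustration.

```python
import numpy as np

def train_som(data, rows=5, cols=5, epochs=20, alpha0=0.8, sigma0=2.0, seed=0):
    """Train a rows x cols SOM; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))       # 1. random weights
    # Grid coordinates (row, col) of every neuron, shape (rows, cols, 2).
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                indexing="ij"), axis=-1)
    n_steps = epochs * len(data)
    t = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:          # 2. pick an input
            frac = t / n_steps
            alpha = alpha0 * np.exp(-3.0 * frac)            # decaying learning rate
            sigma = sigma0 * np.exp(-3.0 * frac)            # shrinking neighborhood
            dist = np.linalg.norm(weights - x, axis=2)      # 3. find the BMU
            bmu = np.unravel_index(np.argmin(dist), dist.shape)
            grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=2)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))       # 4. Gaussian neighborhood
            weights += alpha * h[..., None] * (x - weights) # 5. update weights
            t += 1                                          # 6. repeat
    return weights

def bmu_of(weights, x):
    """Grid position of the neuron closest to x."""
    dist = np.linalg.norm(weights - x, axis=2)
    return np.unravel_index(np.argmin(dist), dist.shape)

# Synthetic data: columns are (size, diet), diet 0 = herbivore, 1 = carnivore.
rng = np.random.default_rng(1)
small_herbivores = rng.normal([0.1, 0.0], 0.03, size=(20, 2))
large_carnivores = rng.normal([0.9, 1.0], 0.03, size=(20, 2))
data = np.vstack([small_herbivores, large_carnivores])

weights = train_som(data)
print("BMU for a small herbivore:", bmu_of(weights, np.array([0.1, 0.0])))
print("BMU for a large carnivore:", bmu_of(weights, np.array([0.9, 1.0])))
```

The two kinds of animal should land on different neurons of the grid, which is what the clusters in the example refer to.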
4. Write short notes on properties of feature maps.
Ans: (1) Topology Preservation
- Similar inputs map to nearby neurons.
- Example: In a SOM for text data, "cat" and "dog" appear close.
(2) Dimensionality Reduction
- Projects high-D data into 2D/3D for visualization.
- Example: Reducing 30D financial data to a 2D risk map.
(3) Unsupervised Learning
- No labeled data required.
- Example: Clustering news articles by topic.
(4) Neighborhood Influence
- Nearby neurons update together, ensuring smooth transitions.
- Example: In weather data, "hot" and "warm" regions are adjacent.
(5) Convergence
- Weights stabilize after sufficient iterations.
- Example: SOM for fraud detection stops updating after learning patterns.
5. Discuss the process of learning vector quantization.
Ans: Key Differences from SOM
- Supervised (uses labeled data).
- No neighborhood function (only BMU updates).
Steps in LVQ
1. Initialize Prototypes: Assign random weight vectors per class.
2. Input Presentation: Feed labeled training data.
3. Find BMU: Select the closest prototype.
4. Update Rule:
o If BMU’s class = input class, move the prototype toward the input:
w(t+1) = w(t) + α(t) [x − w(t)]
o If BMU’s class ≠ input class, move the prototype away from the input:
w(t+1) = w(t) − α(t) [x − w(t)]
5. Iterate Until Convergence.

Example: Medical Diagnosis
- Input: Patient symptoms (fever, cough).
- Prototypes: "Flu" vs. "Cold" classifiers.
- Result: New patients are classified based on learned prototypes.
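The LVQ steps can be sketched in a few lines of NumPy, assuming the basic LVQ1 rule (attract the winning prototype when its class matches the input, repel it otherwise). The symptom scores are synthetic, and initializing each prototype from one training sample of its class (rather than fully at random) is a choice made here to keep the toy example stable.

```python
import numpy as np

def train_lvq1(X, y, prototypes, proto_labels, alpha0=0.3, epochs=30, seed=0):
    """LVQ1: only the closest prototype (BMU) is updated for each sample."""
    rng = np.random.default_rng(seed)
    W = prototypes.astype(float).copy()
    for epoch in range(epochs):
        alpha = alpha0 * (1.0 - epoch / epochs)        # linearly decaying rate
        for i in rng.permutation(len(X)):
            j = np.argmin(np.linalg.norm(W - X[i], axis=1))   # find the BMU
            if proto_labels[j] == y[i]:
                W[j] += alpha * (X[i] - W[j])          # same class: move closer
            else:
                W[j] -= alpha * (X[i] - W[j])          # wrong class: move away
    return W

def predict_lvq(W, proto_labels, X):
    """Label each row of X with the class of its nearest prototype."""
    dist = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    return proto_labels[np.argmin(dist, axis=1)]

# Synthetic patients: columns are (fever score, cough score) in [0, 1].
rng = np.random.default_rng(0)
flu = rng.normal([0.8, 0.6], 0.05, size=(25, 2))
cold = rng.normal([0.2, 0.5], 0.05, size=(25, 2))
X = np.vstack([flu, cold])
y = np.array(["flu"] * 25 + ["cold"] * 25)

proto_labels = np.array(["flu", "cold"])
init = np.vstack([flu[0], cold[0]])                    # one prototype per class
W = train_lvq1(X, y, init, proto_labels)
accuracy = np.mean(predict_lvq(W, proto_labels, X) == y)
print("Prototypes:\n", W)
print("Training accuracy:", accuracy)
```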

6. Describe adaptive pattern classification.

Ans: A system that dynamically updates its classification rules based on new
data.
Phases
1. Training: Learns initial patterns (e.g., SOM/LVQ).
2. Adaptation: Adjusts weights when new data arrives.
3. Classification: Assigns labels to new inputs.
Example: Spam Filter
- Initial Training: Learns from labeled emails (spam/ham).
- Adaptation: Updates when users mark new emails.
- Result: Improves accuracy over time.
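One simple way to picture the three phases is a nearest-prototype classifier whose class prototypes are nudged toward each newly labeled sample. The class name, the two email features (link ratio, capital-letter ratio), and the data below are hypothetical; this is a sketch of the idea, not a specific method from the notes.

```python
import numpy as np

class AdaptivePrototypeClassifier:
    """Nearest-prototype classifier that keeps adapting after initial training."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha          # how strongly new feedback moves a prototype
        self.prototypes = {}        # label -> prototype vector

    def train(self, X, y):
        # Training phase: one prototype per class = mean of its samples.
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            self.prototypes[label] = np.mean(rows, axis=0)

    def classify(self, x):
        # Classification phase: label of the closest prototype.
        return min(self.prototypes,
                   key=lambda lab: np.linalg.norm(x - self.prototypes[lab]))

    def adapt(self, x, label):
        # Adaptation phase: move the labeled class prototype toward x.
        p = self.prototypes[label]
        self.prototypes[label] = p + self.alpha * (x - p)

# Hypothetical features per email: (link ratio, capital-letter ratio).
X = np.array([[0.9, 0.8], [0.8, 0.9], [0.1, 0.1], [0.2, 0.1]])
y = ["spam", "spam", "ham", "ham"]

clf = AdaptivePrototypeClassifier(alpha=0.2)
clf.train(X, y)

new_style = np.array([0.45, 0.4])       # a new kind of spam the filter has not seen
before = clf.classify(new_style)
for _ in range(3):                      # users mark three such emails as spam
    clf.adapt(new_style, "spam")
after = clf.classify(new_style)
print("before feedback:", before, "| after feedback:", after)
```

Here the new style of email is first labeled ham and, after three user corrections, spam.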

10 Marks Questions (6)

1. Explain the working and applications of SOM in detail.
Ans: Working of Self-Organizing Maps (SOM)
Self-Organizing Maps (SOM), also known as Kohonen Networks, are a type
of artificial neural network that uses unsupervised learning to produce a
low-dimensional (typically 2D) representation of input space while
preserving the topological properties of the input data.
Key Components of SOM:
1. Input Layer: Receives high-dimensional data (e.g., feature vectors).
2. Output Layer (Competitive Layer): A 2D grid of neurons, each with a
weight vector of the same dimension as the input.
3. Neighborhood Function: Defines how neighboring neurons are updated
based on the Best Matching Unit (BMU).
4. Learning Rate: Controls the magnitude of weight updates.
Working Mechanism:
1. Initialization: Assign random weight vectors to each neuron in the 2D
grid.
2. Competition: For each input vector, the neuron whose weight vector is
closest (e.g., Euclidean distance) is declared the BMU.
3. Cooperation: The BMU and its neighboring neurons (determined by the
neighborhood function) adjust their weights to become more similar to
the input vector.
4. Adaptation: Weights are updated using the formula:
w(t+1) = w(t) + α(t) h(t) [x − w(t)]
where:
o w(t) = weight vector of a neuron, x = input vector
o α(t) = learning rate (decreases over time)
o h(t) = neighborhood function (e.g., Gaussian)
5. Iteration: Repeat until convergence (weights stabilize).
Applications of SOM:
1. Data Visualization: Reducing high-dimensional data to 2D for easier
interpretation (e.g., gene expression data).
2. Clustering: Grouping similar data points (e.g., customer segmentation in
marketing).
3. Image Processing: Color quantization, feature extraction.
4. Anomaly Detection: Identifying outliers in datasets (e.g., fraud
detection).
5. Speech Recognition: Mapping phonetic features to a 2D grid.

2. Discuss the SOM algorithm and its steps with example.

Ans: SOM Algorithm Steps:
1. Initialization:
o Assign random weight vectors to each neuron in the 2D grid.
o Example: For a 5×5 SOM with 3D input data, each neuron has a
random 3D weight vector.
2. Input Presentation:
o Feed an input vector X (e.g., a customer’s age, income, spending
score).
3. Competition (Finding BMU):
o Calculate the Euclidean distance between X and each neuron’s
weight vector.
o The neuron with the smallest distance is the BMU.

4. Cooperation (Neighborhood Function):
o Define a neighborhood around the BMU (e.g., Gaussian function).
o Neurons within this neighborhood will update their weights.
5. Adaptation (Weight Update):
o Adjust weights of BMU and neighbors using:
w(t+1) = w(t) + α(t) h(t) [X − w(t)]
o Example: If BMU is neuron (2,3), update it and its neighbors.
6. Iteration:
o Repeat for all input vectors over multiple epochs.
o Gradually reduce the learning rate α(t) and the neighborhood
size (the width of h(t)).
7. Convergence:
o Weights stabilize, and the map organizes itself.
Example: Customer Segmentation
- Input Data: Age, Income, Spending Score.
- SOM Output: A 2D map where similar customers cluster together (e.g.,
young high-spenders vs. older low-spenders).
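To make the competition, cooperation, and adaptation steps concrete, here is one training step worked through with small numbers. The 2×2 grid, the weight values, the scaled customer vector, and the settings α = 0.5 and σ = 1 are all made up for this illustration.

```python
import numpy as np

# A 2x2 grid of neurons, each with a 3D weight vector (age, income, spending),
# all scaled to [0, 1]. Values are arbitrary illustration data.
weights = np.array([[[0.2, 0.2, 0.2], [0.8, 0.8, 0.8]],
                    [[0.5, 0.1, 0.9], [0.1, 0.9, 0.5]]])
x = np.array([0.3, 0.2, 0.8])          # young, low income, high spending
alpha, sigma = 0.5, 1.0

# Competition: Euclidean distance from x to every neuron's weight vector.
dist = np.linalg.norm(weights - x, axis=2)
bmu = tuple(int(i) for i in np.unravel_index(np.argmin(dist), dist.shape))

# Cooperation: Gaussian neighborhood based on grid distance to the BMU.
rows, cols = np.indices(dist.shape)
grid_d2 = (rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2
h = np.exp(-grid_d2 / (2.0 * sigma ** 2))

# Adaptation: every neuron moves toward x, scaled by alpha * h.
new_weights = weights + alpha * h[..., None] * (x - weights)

print("Squared distances:\n", dist ** 2)
print("BMU:", bmu)
print("Neighborhood factors:\n", h)
print("Updated BMU weights:", new_weights[bmu])
```

The squared distances are 0.37, 0.61, 0.06, and 0.62, so the BMU is neuron (1, 0). Its neighborhood factor is 1, so it moves halfway toward the input, from (0.5, 0.1, 0.9) to (0.4, 0.15, 0.85); the other neurons move by smaller fractions.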
3. Explain how feature maps are formed in SOM.
Ans: Formation of Feature Maps:
1. Input Data Representation:
o High-dimensional data (e.g., 30D gene expression data) is fed into
the SOM.
2. Competitive Learning:
o Neurons compete to represent input patterns (winner-takes-all).
o BMU and its neighbors adjust weights to resemble input.
3. Topology Preservation:
o Similar inputs activate nearby neurons in the 2D grid.
o Example: In a SOM for animal features, "cat" and "dog" neurons
are close.
4. Dimensionality Reduction:
o High-D data is projected onto a 2D grid while preserving
relationships.
o Example: A 100D dataset is visualized as a 2D heatmap.
5. Convergence:
o After training, the map stabilizes, and clusters emerge.
Example: Image Color Mapping
- Input: RGB pixel values (3D).
- SOM Output: A 2D grid where similar colors are grouped together.
4. Describe learning vector quantization and its properties.
Ans: LVQ Overview:
LVQ is a supervised version of SOM used for classification. It adjusts
prototype vectors to minimize classification errors.
Steps in LVQ:
1. Initialize Prototypes: Assign random weight vectors for each class.
2. Input Presentation: Feed labeled training data.
3. Find BMU: Select the closest prototype.
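4. Update Rule:
o If BMU’s class = input class, move the prototype toward the input:
w(t+1) = w(t) + α(t) [x − w(t)]
o If BMU’s class ≠ input class, move the prototype away from the input:
w(t+1) = w(t) − α(t) [x − w(t)]
5. Iterate Until Convergence.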

Properties of LVQ:
1. Supervised Learning: Requires labeled data.
2. No Neighborhood Function: Only BMU updates.
3. Efficient for Classification: Works well with small datasets.
4. Interpretable Prototypes: Each prototype represents a class.
Example: Medical Diagnosis
- Input: Patient symptoms (fever, cough).
- Prototypes: "Flu" vs. "Cold" classifiers.
- Result: New patients are classified based on learned prototypes.
5. Compare supervised and unsupervised learning using SOM.
Ans:

| No. | Aspect | Unsupervised SOM | Supervised LVQ (SOM-based) |
|---|---|---|---|
| 1 | Learning Type | No labels needed | Requires labeled data |
| | Example | Grouping customers by purchase behavior | Classifying emails as spam/ham |
| 2 | Training Data | Raw input features only | Labeled examples with correct outputs |
| | Example | Analyzing gene expression patterns | Diagnosing diseases from labeled patient data |
| 3 | Objective | Discover hidden patterns | Minimize classification errors |
| | Example | Market segmentation | Handwritten digit recognition |
| 4 | Neuron Updates | BMU + neighbors update weights | Only BMU updates weights |
| | Example | Color quantization in images | Fraud detection in transactions |
| 5 | Neighborhood | Uses decreasing radius (Gaussian/Mexican hat) | No neighborhood function |
| | Example | Organizing document topics | Predicting customer churn |
| 6 | Output | Topological map of similar patterns | Direct class predictions |
| | Example | 2D visualization of high-dimensional data | Medical diagnosis (healthy/sick) |
| 7 | Evaluation | Quantization error, topographic accuracy | Precision, recall, accuracy |
| | Example | Measuring cluster quality in customer data | Testing spam filter performance |
| 8 | Human Role | Interpret clusters post-training | Provide labels before training |
| | Example | Analyzing new customer segments | Labeling training images for AI |
| 9 | Flexibility | Discovers unknown patterns | Limited to predefined classes |
| | Example | Finding novel patient subgroups | Classifying known plant species |
| 10 | Best For | Exploratory data analysis | Predictive modeling |
| | Example | Understanding complex datasets | Building decision systems |

Key Differences Illustrated:

1. In customer analytics:
o Unsupervised SOM might reveal 5 natural customer clusters
o Supervised LVQ would predict "will buy" vs "won't buy"
2. In medical research:
o Unsupervised SOM could discover new disease subtypes
o Supervised LVQ would diagnose known conditions
3. In image processing:
o Unsupervised SOM groups similar images by visual features
o Supervised LVQ classifies images into predefined categories
6. Describe the role of SOM in classification tasks.
Ans: 1. Feature Extraction:
o Reduces high-D data to 2D, making classification easier.
o Example: Extracting key features from images before
classification.
2. Cluster Identification:
o Groups similar data points, which can then be labeled.
o Example: Identifying tumor types in medical data.
3. Preprocessing for Supervised Models:
o SOM can be used as a preprocessing step before SVM or k-NN,
which may improve accuracy.
o Example: Reducing 100D data to 2D before applying k-NN.
4. Anomaly Detection:
o Outliers appear far from clusters in the SOM.
o Example: Detecting fraudulent transactions.
Example: Handwritten Digit Classification
- Input: Pixel values of digits.
- SOM Output: 2D map where similar digits cluster.
- Classification: Assign labels to clusters (e.g., "0", "1").
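The last step, assigning labels to clusters, is often done by majority vote: each neuron takes the most common label among the training samples that map to it. The sketch below assumes this voting scheme and uses a hand-written 2×2 weight grid standing in for an already trained SOM; the 2D "digit features" are made up.

```python
import numpy as np
from collections import Counter

def bmu_index(weights, x):
    """Flat index of the neuron whose weight vector is closest to x."""
    flat = weights.reshape(-1, weights.shape[-1])
    return int(np.argmin(np.linalg.norm(flat - x, axis=1)))

def label_neurons(weights, X, y):
    """Give each neuron the majority label of the samples mapped to it."""
    votes = {}
    for x, lab in zip(X, y):
        votes.setdefault(bmu_index(weights, x), Counter())[lab] += 1
    return {idx: counts.most_common(1)[0][0] for idx, counts in votes.items()}

def classify(weights, neuron_labels, x):
    """Label of x's BMU, or None if that neuron received no training samples."""
    return neuron_labels.get(bmu_index(weights, x))

# Stand-in for a trained 2x2 SOM over 2D features (illustration values only).
weights = np.array([[[0.1, 0.1], [0.2, 0.8]],
                    [[0.8, 0.2], [0.9, 0.9]]])

# A few labeled samples used only to name the clusters.
X = np.array([[0.10, 0.10], [0.15, 0.20], [0.90, 0.90], [0.85, 0.80]])
y = ["0", "0", "1", "1"]

neuron_labels = label_neurons(weights, X, y)
print("Neuron labels:", neuron_labels)
print("New sample A:", classify(weights, neuron_labels, np.array([0.12, 0.05])))
print("New sample B:", classify(weights, neuron_labels, np.array([0.95, 0.90])))
```

Neurons that no labeled sample maps to stay unlabeled, so in practice one needs a fallback (for example, the label of the nearest labeled neuron).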
