UNIT III
1. Describe the backpropagation algorithm used in multilayer
perceptrons.
Ans: Backpropagation (Backward Propagation of Errors) is a supervised
learning algorithm used to train Multilayer Perceptrons (MLPs). It adjusts
the weights of the network by propagating the error backward from the
output layer to the input layer, minimizing the difference between predicted
and actual outputs using gradient descent.
Process of the Algorithm
1. Forward Propagation:
- The input data is passed through the network to compute the predicted
output.
- Each neuron applies an activation function (e.g., Sigmoid, ReLU) to its
weighted sum of inputs.
2. Error Calculation:
- The difference between the predicted output and the actual target is
computed using a loss function (e.g., Mean Squared Error).
3. Backward Propagation:
- The error is propagated backward through the network.
- The gradient of the loss with respect to each weight is computed using the
chain rule from calculus.
4. Weight Update:
- The weights are adjusted in the opposite direction of the gradient to
minimize the error.
- The learning rate (η) controls the step size of the weight updates.
Step-by-Step Explanation
Step 1: Forward Pass (Compute Output)
1. Input Layer: Pass input features to the first hidden layer.
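2. Hidden Layers: Each neuron calculates a weighted sum of its inputs and applies its activation function: $z = \sum_i w_i x_i + b$, $a = f(z)$ (e.g., $f(z) = \sigma(z)$ for Sigmoid).
3. Output Layer: Compute the final prediction.
Step 2: Compute Loss (Error)
- For Mean Squared Error over n samples: $L = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$.
Step 3: Backward Pass (Update Weights)
- Compute $\frac{\partial L}{\partial w}$ for every weight using the chain rule, then update each weight with $w \leftarrow w - \eta \, \frac{\partial L}{\partial w}$.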
Step 4: Repeat Until Convergence
- Iterate over training data for multiple epochs until error is minimized.
Real-Time Example: Predicting House Prices
Problem:
- Input Features: Size (sq. ft.), Number of Bedrooms.
- Target: House Price ($).
Neural Network Structure:
- Input Layer: 2 neurons (Size, Bedrooms).
- Hidden Layer: 3 neurons (Sigmoid activation).
- Output Layer: 1 neuron (Predicted Price).
Training Steps:
1. Forward Pass:
- Input: [1500 sq. ft., 3 bedrooms] → Hidden Layer → Output → Predicted
Price = $300,000.
2. Error Calculation:
- Actual Price = $320,000 → Squared Error = (320,000 − 300,000)² = 400,000,000 (in dollars²).
3. Backward Pass:
- Compute gradients and adjust weights to reduce error.
4. Next Iteration:
- New prediction → $310,000 → Closer to target!
Result:
- After multiple iterations, the network learns to predict prices accurately.
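The four steps above can be sketched for the 2-3-1 house-price network in NumPy. This is a minimal illustration, not a reference implementation: it trains on the single example from the text, and the feature scaling (size in thousands of sq. ft., bedrooms divided by 10, price in units of $100,000), the half-squared-error loss, the linear output neuron, the weight initialization and the learning rate are all assumptions made for the sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed scaling: [size / 1000, bedrooms / 10]; price in units of $100,000
x = np.array([1.5, 0.3])    # 1500 sq. ft., 3 bedrooms
y = 3.2                     # $320,000

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.5, size=(3, 2))   # input -> hidden (3 sigmoid neurons)
b1 = np.zeros(3)
W2 = rng.normal(0.0, 0.5, size=3)        # hidden -> output (assumed linear neuron)
b2 = 0.0
eta = 0.1                                # assumed learning rate

losses = []
for epoch in range(200):
    # 1. Forward pass
    h = sigmoid(W1 @ x + b1)
    y_hat = W2 @ h + b2
    # 2. Error calculation (half squared error)
    losses.append(0.5 * (y - y_hat) ** 2)
    # 3. Backward pass: chain rule from the output back to the hidden layer
    delta_out = y_hat - y                        # dL/dy_hat
    delta_hidden = delta_out * W2 * h * (1 - h)  # dL/dz for each hidden neuron
    grad_W2, grad_b2 = delta_out * h, delta_out
    grad_W1, grad_b1 = np.outer(delta_hidden, x), delta_hidden
    # 4. Weight update: step against the gradient
    W2 -= eta * grad_W2
    b2 -= eta * grad_b2
    W1 -= eta * grad_W1
    b1 -= eta * grad_b1

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.2e}")
print(f"last prediction: ${y_hat * 100_000:,.0f}")
```

All gradients are computed before any weight changes, so the hidden-layer error uses the same output weights that produced the prediction.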
2. Explain the backpropagation algorithm in detail with diagram.
Ans: Backpropagation is a supervised learning algorithm used to train neural
networks by minimizing the error between predicted and actual outputs. It
works by propagating errors backward through the network and adjusting
weights using gradient descent.
Key Steps of Backpropagation
1. Forward Pass:
o Input is passed through the network.
o Each layer computes its output using weights, biases, and
activation functions.
o Final output is compared with the true label to compute loss (e.g.,
Mean Squared Error or Cross-Entropy Loss).
2. Backward Pass (Error Propagation):
o The gradient of the loss w.r.t. each weight is computed using
the chain rule.
o Errors are propagated backward from the output layer to hidden
layers.
o Weights and biases are updated to minimize loss.
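3. Weight Update:
o Each weight and bias moves against its gradient: $W \leftarrow W - \eta \, \frac{\partial L}{\partial W}$, where η is the learning rate.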
Backpropagation Diagram
Below is a step-by-step diagram of backpropagation in a simple neural network
with one hidden layer:
Neural Network Structure:
Input Layer (X₁, X₂)
Hidden Layer (H₁, H₂) (with weights W1, W2 and activation function,
e.g., Sigmoid)
Output Layer (Ŷ) (with weights W3, W4 and loss computation)
X₁ X₂ (Input)
\ /
H₁ H₂ (Hidden)
\ /
Ŷ (Output)
Step-by-Step Explanation of Diagram:
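1. Forward Pass:
o Inputs X₁, X₂ are fed into the hidden layer.
o Hidden layer computes weighted sum and applies activation: $H_j = \sigma(z_j)$ with $z_j = \sum_i w_{ij} X_i + b_j$, for $j = 1, 2$.
o The output layer combines the hidden activations (assuming a linear output unit): $\hat{Y} = W_3 H_1 + W_4 H_2 + b_o$, and the loss is computed, e.g., $L = \frac{1}{2}(Y - \hat{Y})^2$.
2. Backward Pass:
o Output error: $\delta_o = \frac{\partial L}{\partial \hat{Y}} = \hat{Y} - Y$, so $\frac{\partial L}{\partial W_3} = \delta_o H_1$ and $\frac{\partial L}{\partial W_4} = \delta_o H_2$.
o Hidden errors via the chain rule: $\delta_1 = \delta_o W_3 \, \sigma'(z_1)$ and $\delta_2 = \delta_o W_4 \, \sigma'(z_2)$, where $\sigma'(z) = \sigma(z)(1 - \sigma(z))$; then $\frac{\partial L}{\partial w_{ij}} = \delta_j X_i$.
3. Weight Update:
o Every weight and bias is updated as $W \leftarrow W - \eta \, \frac{\partial L}{\partial W}$.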
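A standard way to confirm that backward-pass equations like these are right is a numerical gradient check: compare the backpropagated gradients with central finite differences. The sketch below assumes a 2-2-1 network with sigmoid hidden units, a linear output unit and the loss L = ½(Y − Ŷ)²; the hidden weights are stored in a 2×2 matrix `W_h` (row j feeds H_j) and the output weights W₃, W₄ in the vector `w_o`. Parameter values and the input are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, x):
    W_h, b_h, w_o, b_o = params
    h = sigmoid(W_h @ x + b_h)        # hidden activations H1, H2
    return h, w_o @ h + b_o[0]        # linear output Y_hat

def loss(params, x, y):
    _, y_hat = forward(params, x)
    return 0.5 * (y - y_hat) ** 2

def backprop(params, x, y):
    W_h, b_h, w_o, b_o = params
    h, y_hat = forward(params, x)
    delta_o = y_hat - y                      # dL/dY_hat
    delta_h = delta_o * w_o * h * (1 - h)    # dL/dz_j for each hidden neuron
    return [np.outer(delta_h, x), delta_h, delta_o * h, np.array([delta_o])]

def numerical_grads(params, x, y, eps=1e-6):
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            loss_plus = loss(params, x, y)
            p[idx] = old - eps
            loss_minus = loss(params, x, y)
            p[idx] = old                     # restore the parameter
            g[idx] = (loss_plus - loss_minus) / (2 * eps)
        grads.append(g)
    return grads

rng = np.random.default_rng(1)
params = [rng.normal(size=(2, 2)), rng.normal(size=2), rng.normal(size=2), rng.normal(size=1)]
x, y = np.array([0.5, -1.0]), 1.0

analytic = backprop(params, x, y)
numeric = numerical_grads(params, x, y)
max_diff = max(np.max(np.abs(a - n)) for a, n in zip(analytic, numeric))
print(f"largest gradient mismatch: {max_diff:.1e}")
```

A mismatch far above the finite-difference error (roughly 1e-8 or smaller here) usually points to a sign or index mistake in the backward pass.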
3. Describe the differentiation and error minimization process in BP
Ans: Backpropagation minimizes error by computing gradients (partial
derivatives) of the loss function with respect to each weight and updating
them using gradient descent.
Step-by-Step Example (Single Neuron)
Neural Network Setup
Input: $X = 2$
Weight: $W = 0.5$ (initial)
Bias: $b = 1$
Activation: Sigmoid, $\sigma(z) = \frac{1}{1 + e^{-z}}$
True Output: $Y = 1$
Loss: Mean Squared Error, $L = \frac{1}{2}(Y - \hat{Y})^2$
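Carrying this setup through one forward pass, one application of the chain rule and one update gives concrete numbers. The learning rate η = 0.1 is an assumption (the setup does not specify one); the derivative chain is dL/dW = (dL/dŶ)(dŶ/dz)(dz/dW) with z = WX + b.

```python
import math

X, W, b, Y = 2.0, 0.5, 1.0, 1.0   # setup from the text
eta = 0.1                          # assumed learning rate

# Forward pass
z = W * X + b                      # 2.0
Y_hat = 1 / (1 + math.exp(-z))     # sigmoid(2) ≈ 0.8808
loss = 0.5 * (Y - Y_hat) ** 2      # ≈ 0.0071

# Chain rule: dL/dW = dL/dY_hat * dY_hat/dz * dz/dW
dL_dYhat = -(Y - Y_hat)            # ≈ -0.1192
dYhat_dz = Y_hat * (1 - Y_hat)     # sigmoid derivative ≈ 0.1050
dL_dz = dL_dYhat * dYhat_dz        # ≈ -0.0125
grad_W = dL_dz * X                 # ≈ -0.0250
grad_b = dL_dz                     # ≈ -0.0125

# Gradient-descent update (negative gradient, so W and b increase)
W_new = W - eta * grad_W           # ≈ 0.5025
b_new = b - eta * grad_b           # ≈ 1.0013
print(f"Y_hat={Y_hat:.4f}, loss={loss:.4f}, dL/dW={grad_W:.4f}, W: {W} -> {W_new:.4f}")
```

Because the prediction is below the target, both gradients are negative and the update pushes Ŷ upward, lowering the loss on the next pass.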
4. Discuss virtues and limitations of backpropagation learning.
Ans: Virtues (Advantages) of Backpropagation Learning
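1. Universal Approximation
o With enough hidden neurons, a neural network can approximate any continuous
function on a bounded input range (Universal Approximation Theorem); backpropagation
is the practical method for searching for such weights, although it is not guaranteed to find them.
2. Efficient Gradient Computation
o Applies the chain rule layer by layer and reuses intermediate results, so a single
backward pass yields the gradients of all weights at a cost comparable to the forward pass.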
3. Works with Deep Networks
o Extendable to deep neural networks (DNNs) with multiple hidden
layers (basis of modern deep learning).
4. Handles Large Datasets
o Compatible with stochastic gradient descent (SGD), making it
scalable to big data.
5. Flexible with Activation Functions
o Works with ReLU, Sigmoid, Tanh, Softmax, etc., allowing
different learning behaviors.
6. Automated Feature Learning
o Unlike traditional ML, it automatically learns features from raw
data without manual engineering.
Limitations (Disadvantages) of Backpropagation Learning
1. Vanishing/Exploding Gradients
o In deep networks, gradients can become too small
(vanishing) or too large (exploding), slowing or destabilizing
training.
2. Local Minima & Saddle Points
o Gradient descent may get stuck in suboptimal solutions instead of
the global minimum.
3. Computationally Expensive
o Requires many iterations (epochs) and large compute power for
deep networks.
4. Sensitive to Initial Weights
o Poor weight initialization can lead to slow convergence or bad
performance.
5. Overfitting Risk
o Without regularization (e.g., dropout, L2), networks
may memorize training data instead of generalizing.
6. Requires a Target Signal
o Needs target outputs and a differentiable loss; unsupervised models such as
autoencoders still use backpropagation but must supply their own target (e.g., reconstructing the input).
7. Black Box Nature
o Hard to interpret why a neural network makes certain decisions
(explainability challenge).
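The vanishing/exploding behaviour follows directly from the chain rule: the gradient reaching an early layer is a product of one factor per layer. The sketch below uses a deliberately simplified scalar model (one neuron per layer, every layer contributing the same weight × activation-slope factor), an assumption chosen for clarity rather than a model of a real network.

```python
def gradient_scale(depth, weight, slope):
    # Chain rule through `depth` layers: each contributes (weight * activation slope)
    return (weight * slope) ** depth

SIGMOID_MAX_SLOPE = 0.25   # sigma'(z) = sigma(z)(1 - sigma(z)) <= 0.25, reached at z = 0
RELU_SLOPE = 1.0           # ReLU derivative for positive inputs

for depth in (5, 10, 20):
    print(depth,
          gradient_scale(depth, 1.0, SIGMOID_MAX_SLOPE),   # shrinks toward zero (vanishing)
          gradient_scale(depth, 1.0, RELU_SLOPE),          # stays at 1
          gradient_scale(depth, 1.5, RELU_SLOPE))          # grows without bound (exploding)
```

Even in the sigmoid's best case the factor is at most 0.25 per layer, which is one reason ReLU and careful weight scaling are listed among the mitigations.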
Mitigation Strategies for Limitations
1. Vanishing Gradients → Use ReLU, BatchNorm, Residual Connections
(ResNet).
2. Local Minima → Use momentum (SGD with momentum, Adam).
3. Overfitting → Apply Dropout, L2 Regularization, Early Stopping.
4. Compute Cost → Use GPUs, Mini-batch SGD, Distributed Training.
5. Initialization Sensitivity → He/Xavier Initialization.
Backpropagation is powerful but not perfect—it drives modern deep learning
but requires careful tuning. Advances like BatchNorm, Residual Networks,
and Adaptive Optimizers help overcome its limitations.
5. Explain convergence and error handling in backpropagation.
ANS: Convergence in backpropagation refers to the process where a
neural network’s weights are iteratively adjusted to minimize the error (loss)
between predicted and actual outputs. The goal is to reach a stable state
where further training does not significantly reduce the error.
Key Factors Affecting Convergence
1. Learning Rate (η)
o Too high → Overshooting (loss oscillates).
o Too low → Slow convergence.
2. Weight Initialization
o Poor initialization (e.g., all zeros) → Stuck training: every hidden neuron receives the same gradient, so the neurons never learn different features.
3. Loss Function Shape
o Non-convex (many local minima) → May converge to suboptimal
solutions.
4. Optimization Algorithm
o SGD, Adam, RMSprop affect speed and stability.
Example: Simple Linear Regression with Backpropagation
Problem: Fit a line $y = Wx + b$ to the data points $(x=1, y=2)$ and $(x=2, y=3)$.
Loss Curve:
│
│ ● Initial high loss
│ \
│ \
│ ● Converged (low loss)
│
└─────────> Iterations
Converged Solution:
W≈1, b≈1 → Perfect fit (y=x+1).
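Gradient descent on these two points can be run directly, which also shows the learning-rate effects listed above. For the mean-squared-error loss used here, the Hessian with respect to (W, b) is [[5, 3], [3, 2]], whose largest eigenvalue is about 6.85, so plain gradient descent is stable only for η < 2/6.85 ≈ 0.29. The sketch uses η = 0.1 (converges) and η = 0.5 (diverges); the starting point W = b = 0 and the step counts are arbitrary choices.

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([2.0, 3.0])

def train(lr, steps):
    W, b = 0.0, 0.0
    history = []
    for _ in range(steps):
        err = W * x + b - y                      # prediction error
        history.append(np.mean(err ** 2))        # MSE loss
        W -= lr * 2 * np.mean(err * x)           # dL/dW
        b -= lr * 2 * np.mean(err)               # dL/db
    return W, b, history

W, b, hist = train(lr=0.1, steps=2000)
print(f"lr=0.1: W={W:.3f}, b={b:.3f}, loss {hist[0]:.2f} -> {hist[-1]:.2e}")

_, _, hist_bad = train(lr=0.5, steps=30)
print(f"lr=0.5: loss {hist_bad[0]:.2f} -> {hist_bad[-1]:.2e} (diverges)")
```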
Types of Convergence
1. Global Convergence
o Reaches the best possible solution (rare in deep learning due to
non-convexity).
2. Local Convergence
o Stops improving but may not be the global minimum.
3. Divergence
o Loss increases (e.g., due to too high η).
Error handling in backpropagation refers to the techniques used to manage
and correct various types of errors that occur during neural network training.
These errors can stem from computational issues, poor learning dynamics, or
generalization problems.
1. Types of Errors in Backpropagation
Error Type | Description | Example
Vanishing Gradients | Gradients become extremely small, slowing or stopping learning in early layers. | Sigmoid/Tanh in deep networks.
Exploding Gradients | Gradients grow exponentially, causing unstable updates. | Recurrent Networks (RNNs).
Overfitting | Model memorizes training data but fails on test data. | High accuracy on train, low on test.
Local Minima | Optimization gets stuck in suboptimal solutions. | Non-convex loss landscapes.
Numerical Instability | Floating-point errors (e.g., NaN/∞). | Overflow or division by zero in softmax.
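Two standard numerical safeguards fit in a few lines of NumPy: the max-subtraction trick for softmax (guards against overflow to NaN) and rescaling a gradient whose norm exceeds a threshold (a common remedy for exploding gradients). The function names and the threshold value are illustrative, not a specific library's API.

```python
import numpy as np

def naive_softmax(z):
    e = np.exp(z)
    return e / e.sum()

def stable_softmax(z):
    # Subtracting the max leaves the result unchanged mathematically
    # but keeps every exponent <= 0, so np.exp cannot overflow.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def clip_by_norm(grad, max_norm):
    # Rescale the gradient if its L2 norm exceeds max_norm (exploding-gradient guard)
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad

logits = np.array([1000.0, 1001.0])
with np.errstate(over="ignore", invalid="ignore"):
    print(naive_softmax(logits))   # [nan nan]: exp(1000) overflows to inf, and inf/inf is nan
print(stable_softmax(logits))      # ≈ [0.2689 0.7311]
print(clip_by_norm(np.array([30.0, 40.0]), max_norm=5.0))   # [3. 4.]
```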
6. Write a detailed note on generalization and cross-validation.
Ans:
Generalization refers to a model's ability to perform well on unseen data
(test/real-world data) after being trained on a limited dataset (training data).
A model that generalizes well avoids overfitting (memorizing training data)
and underfitting (failing to learn patterns).
Key Concepts
1. Overfitting
o High training accuracy but poor test accuracy.
o Occurs when the model is too complex (e.g., deep neural networks
with insufficient data).
o Example: A polynomial regression of degree 10 fitting noise in a
small dataset.
2. Underfitting
o Poor performance on both training and test data.
o Occurs when the model is too simple (e.g., linear regression on
non-linear data).
o Example: Using a straight line to fit a sine wave.
3. Bias-Variance Tradeoff
o High Bias (Underfitting): Oversimplified model.
o High Variance (Overfitting): Overly complex model.
o Goal: Find the "sweet spot" that minimizes total error; reducing bias
usually increases variance and vice versa, so the two are balanced rather than both driven to zero.
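The under/overfitting contrast can be reproduced with polynomial fits to a small noisy sine dataset. The data are synthetic (12 points, noise level 0.2, fixed seed) and the degrees 1, 3 and 10 are chosen to mirror the examples above. Training error can only fall as the degree rises; the test error is what shows whether the extra flexibility generalizes.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
# Synthetic small dataset: 12 noisy samples of a sine wave
x_train = np.linspace(0, 1, 12)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)
x_test = np.linspace(0.02, 0.98, 50)
y_test = np.sin(2 * np.pi * x_test)

def mse(p, x, y):
    return np.mean((p(x) - y) ** 2)

errors = {}
for degree in (1, 3, 10):
    p = Polynomial.fit(x_train, y_train, degree)   # least-squares polynomial fit
    errors[degree] = (mse(p, x_train, y_train), mse(p, x_test, y_test))
    print(f"degree {degree:2d}: train MSE {errors[degree][0]:.4f}, test MSE {errors[degree][1]:.4f}")
```

Degree 1 typically shows high error on both sets (underfitting), while degree 10 nearly interpolates the 12 training points and its test error exposes any noise it has memorized.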
Techniques to Improve Generalization
1. Regularization (L1/L2)
o Adds penalty terms (L1: sparsity, L2: small weights) to the loss
function to prevent overfitting.
o Example: model.add(Dense(64, kernel_regularizer=l2(0.01))) in
Keras.
2. Dropout
o Randomly deactivates neurons during training to prevent co-
adaptation.
o Example: model.add(Dropout(0.5)) after a dense layer.
3. Early Stopping
o Halts training when validation error stops improving to avoid
overfitting.
o Example: EarlyStopping(monitor='val_loss', patience=5) in Keras.
4. Data Augmentation
o Artificially expands training data with transformations (e.g.,
rotations, flips).
o Example: ImageDataGenerator(rotation_range=20) for images.
5. Batch Normalization
o Normalizes layer outputs to stabilize and accelerate training.
o Example: model.add(BatchNormalization()) after a conv layer.
6. Cross-Validation
o Splits data into folds to assess model robustness before final
training.
o Example: cross_val_score(model, X, y, cv=5) in scikit-learn.
7. Ensemble Methods
o Combines predictions from multiple models (e.g., Bagging,
Boosting).
o Example: RandomForestClassifier() for reduced variance.
8. Simpler Architectures
o Reduces model complexity (fewer layers/neurons) to match data
size.
o Example: Replace a 10-layer NN with a 3-layer NN for small
datasets.
9. Transfer Learning
o Uses pre-trained models (e.g., ResNet, BERT) as feature extractors.
o Example: VGG16(weights='imagenet', include_top=False).
10. Noise Injection
o Adds slight noise to inputs/weights to improve robustness.
o Example: GaussianNoise(0.1) in Keras input layers.
Cross-validation (CV) is a technique to evaluate how well a machine learning
model generalizes to unseen data. Instead of training on the entire dataset once,
the data is split into multiple subsets (folds). The model is trained on some folds
and tested on the remaining one, repeating the process to ensure reliable
performance estimates.
Example: Training a Robot to Recognize Objects
Scenario:
You’re training a robot to recognize apples vs. oranges using a dataset of 100
images (50 apples, 50 oranges).
Step 1: Holdout Validation (Basic Approach)
Split: 70 images (train) + 30 images (test).
Risk: If the test set has easy samples, accuracy may be misleading.
Step 2: 5-Fold Cross-Validation (Better Approach)
1. Divide the 100 images into 5 equal folds (20 images each).
2. Train the robot on 4 folds (80 images) and test on the remaining fold (20
images).
3. Repeat 5 times, each time with a different test fold.
4. Average the results to get a robust accuracy estimate.
Example Results:
Fold | Test Accuracy
1 | 92%
2 | 88%
3 | 90%
4 | 85%
5 | 93%
Final Accuracy: (92+88+90+85+93)/5 = 89.6%
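The 5-fold procedure can be sketched as an index generator: shuffle once, split into k nearly equal folds, and let each fold serve as the test set exactly once. Training and scoring a real model on each split is left out; the last lines only reproduce the averaging of the fold accuracies from the table. The helper name `k_fold_indices` is hypothetical (scikit-learn's `KFold` implements the same idea).

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    idx = np.random.default_rng(seed).permutation(n_samples)   # shuffle before splitting
    folds = np.array_split(idx, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

splits = list(k_fold_indices(100, 5))
print([(len(tr), len(te)) for tr, te in splits])   # five (80, 20) splits

# Averaging the per-fold accuracies from the table
fold_accuracies = [92, 88, 90, 85, 93]
print(np.mean(fold_accuracies))
```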
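Why Use Cross-Validation:
Keeps model selection on validation folds, so a separate held-out test set is never used during training or tuning (preprocessing must still be fit inside each training fold to avoid data leakage).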
Provides a robust estimate of model performance.
Helps in hyperparameter tuning (e.g., selecting the best learning rate).
Types of Cross-Validation
1. Holdout Validation
o Simplest method: Split data into train (70%) and test (30%).
o Limitation: Performance varies based on the split.
2. k-Fold Cross-Validation
o Divides data into k equal folds.
o Trains on k-1 folds, validates on the remaining fold.
o Repeats k times and averages results.
o Best for: Small datasets.
Example (5-Fold CV):
Fold 1: [Test, Train, Train, Train, Train]
Fold 2: [Train, Test, Train, Train, Train]
...
Fold 5: [Train, Train, Train, Train, Test]
3. Stratified k-Fold CV
o Ensures each fold has the same class distribution as the full
dataset.
o Best for: Imbalanced datasets (e.g., fraud detection).
4. Leave-One-Out CV (LOOCV)
o Extreme case of k-Fold where k = n (sample size).
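o Pros: Nearly unbiased estimate (each model trains on n − 1 samples).
o Cons: Computationally expensive (n separate fits), and the estimate can have high variance.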
5. Time Series CV
o Preserves temporal order (critical for forecasting).
o Example:
Train: [t1, t2, t3] → Test: [t4]
Train: [t1, t2, t3, t4] → Test: [t5]
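The expanding-window scheme can be written as a small generator: for each time step t after an initial training window, train on everything before t and test on t. The helper name and its `min_train` parameter are illustrative assumptions; indices are printed as t1, t2, ... to match the example.

```python
def time_series_splits(n_samples, min_train):
    """Expanding-window splits: train on all steps before t, test on step t."""
    for t in range(min_train, n_samples):
        yield list(range(t)), [t]

for train, test in time_series_splits(5, min_train=3):
    print(f"Train: {[f't{i + 1}' for i in train]} -> Test: {[f't{i + 1}' for i in test]}")
```

The test index is always later than every training index, so the model never sees the future it is asked to predict.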
7. Explain the structure and phases of supervised learning.
Ans: Supervised learning is a machine learning paradigm where a model
learns from labeled data (input-output pairs) to make predictions. It follows a
structured workflow with distinct phases:
1. Problem Definition
Goal: Define the task (classification, regression) and success metrics.
Example:
o Task: Predict house prices (regression).
o Success Metric: Mean Absolute Error (MAE) < $10,000.
2. Data Collection
Goal: Gather high-quality labeled data.
Sources: Databases, APIs, surveys, sensors.
Example: Collect 10,000 house listings with features (size, location) and
prices.
3. Data Preprocessing
Goal: Clean and prepare data for modeling.
Key Steps:
1. Handling Missing Data:
o Drop rows or impute (mean/median).
2. Feature Engineering:
o Create new features (e.g., "price per sq. ft").
3. Encoding Categorical Variables:
o One-hot encode (e.g., "neighborhood").
4. Normalization/Scaling:
o Scale numerical features (e.g., StandardScaler for zero mean and unit variance, MinMaxScaler for a [0, 1] range).
Example:
Convert "location" (text) to one-hot encoded numerical columns.
Scale "square footage" to [0, 1].
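The preprocessing steps above can be sketched with NumPy on a tiny hypothetical dataset (values invented for illustration): median imputation, one-hot encoding of a categorical column, and min-max scaling to [0, 1]. In a real pipeline the median, category list and min/max should be computed on the training split only and reused on validation/test data, to avoid leakage.

```python
import numpy as np

# Hypothetical mini dataset: square footage (one value missing) and neighborhood
sqft = np.array([1500.0, np.nan, 2100.0, 900.0])
neighborhood = ["north", "south", "north", "east"]

# 1. Impute the missing value with the median of the observed values
sqft = np.where(np.isnan(sqft), np.nanmedian(sqft), sqft)

# 2. One-hot encode the categorical column
categories = sorted(set(neighborhood))   # ['east', 'north', 'south']
one_hot = np.array([[c == cat for cat in categories] for c in neighborhood], dtype=float)

# 3. Min-max scale square footage to [0, 1]
sqft_scaled = (sqft - sqft.min()) / (sqft.max() - sqft.min())

X = np.column_stack([sqft_scaled, one_hot])
print(X)
```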
4. Model Selection
Goal: Choose an algorithm based on the problem type.
Task | Algorithms
Classification | Logistic Regression, Random Forest, SVM
Regression | Linear Regression, Decision Trees, XGBoost
Example: Use Random Forest for house price prediction.
5. Training
Goal: Teach the model to map inputs to outputs.
Process:
1. Split data into training (70%) and validation (30%) sets.
2. Feed training data to the model.
3. Adjust weights to minimize loss (e.g., MSE for regression).
Example:
Train the model on 7,000 houses to learn price patterns.
6. Evaluation
Goal: Test model performance on unseen data.
Metrics:
Classification: Accuracy, Precision, Recall, F1-Score.
Regression: MAE, RMSE, R².
Example:
Validate on 3,000 houses → MAE = $8,500 (meets goal!).
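The regression metrics listed above are short formulas. A minimal NumPy version, evaluated on four hypothetical validation houses (invented numbers), also checks the MAE goal from the problem definition; R² is computed as 1 − SS_res/SS_tot.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))                                            # Mean Absolute Error
    rmse = np.sqrt(np.mean(err ** 2))                                     # Root Mean Squared Error
    r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)    # coefficient of determination
    return mae, rmse, r2

# Hypothetical validation prices and predictions
y_true = [300_000, 250_000, 410_000, 180_000]
y_pred = [310_000, 245_000, 400_000, 185_000]
mae, rmse, r2 = regression_metrics(y_true, y_pred)
print(mae, round(rmse, 2), round(r2, 4))
print("meets MAE goal:", mae < 10_000)
```

RMSE is never smaller than MAE and penalizes large errors more heavily, which is why the two are usually reported together.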
7. Hyperparameter Tuning
Goal: Optimize model settings for better performance.
Methods:
o Grid Search: Test predefined hyperparameter combinations.
o Random Search: Sample randomly from ranges.
Example:
Tune n_estimators and max_depth in Random Forest.
8. Deployment
Goal: Integrate the model into real-world systems.
Steps:
1. Save the trained model (e.g., .pkl file).
2. Deploy as an API (e.g., Flask, FastAPI).
3. Monitor performance in production.
Example:
Predict prices for new listings on a real estate website.
9. Monitoring & Maintenance
Goal: Ensure model adapts to changing data.
Actions:
o Retrain with fresh data periodically.
o Detect drift (e.g., sudden price fluctuations).
Example:
Update model every 6 months with new sales data.
Workflow:
Problem → Data → Preprocess → Model Selection → Train → Evaluate → Tune → Deploy → Monitor
8. Write short notes on overfitting.
Ans: Overfitting occurs when a machine learning model memorizes the
training data (including noise and outliers) instead of learning general
patterns, leading to poor performance on new, unseen data.
Key Signs of Overfitting:
High accuracy on training data, but low accuracy on test data.
The model fails to generalize to real-world scenarios.
Example: A Robot Learning to Recognize Cats vs. Dogs
Scenario
Task: Train a robot to classify images as "cat" or "dog."
Training Data: 100 images (50 cats, 50 dogs).
Test Data: 30 new images (unseen during training).
What Happens When the Robot Overfits?
1. Memorization Over Learning:
o The robot memorizes tiny details (e.g., a scratch on one cat’s ear, a
specific dog collar).
o Instead of learning general features (e.g., ear shape, fur texture), it
associates irrelevant details with labels.
2. Training vs. Test Performance:
o Training Accuracy: 99% (almost perfect).
o Test Accuracy: 60% (fails on new images).
3. Real-World Failure:
o If given a photo of a new cat with different lighting, the robot
misclassifies it because it relied on memorized patterns.
Why Does Overfitting Happen?
Model Too Complex: A deep neural network with too many layers learns
noise.
Small Dataset: Limited data makes memorization easier than
generalization.
No Regularization: No techniques (e.g., dropout, L2) to prevent over-
reliance on training data.
How to Fix Overfitting in the Robot Example
1. Simplify the Model
o Use fewer layers in the neural network.
o Example: Switch from a 10-layer CNN to a 3-layer CNN.
2. Data Augmentation
o Artificially expand training data with rotations, flips, and
brightness changes.
o Example: Generate 500 augmented images from the original 100.
3. Regularization
o Dropout: Randomly ignore 20% of neurons during training to
prevent reliance on specific features.
o L2 Regularization: Penalize large weights in the model.
4. Cross-Validation
o Use 5-fold CV to check that the robot performs well on every data split (CV detects overfitting; it does not prevent it by itself).
5. Early Stopping
o Stop training when validation accuracy (not test accuracy) stops improving, so the test set stays unseen.
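Early stopping reduces to a small bookkeeping rule over the validation-loss history. The sketch below assumes the common patience-based rule (stop after `patience` consecutive epochs without a new best validation loss, then restore the weights from the best epoch), similar in spirit to Keras `EarlyStopping(monitor='val_loss', patience=...)`; the loss curve is hypothetical.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch): stop once val loss fails to improve for `patience` epochs."""
    best_loss, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch      # new best: reset the patience window
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch                 # no improvement for `patience` epochs
    return len(val_losses) - 1, best_epoch

# Hypothetical validation-loss curve: improves, then rises as the model overfits
val = [0.90, 0.70, 0.55, 0.48, 0.46, 0.47, 0.49, 0.52, 0.55, 0.60, 0.66]
print(early_stopping_epoch(val, patience=3))
```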
Visualizing Overfitting
Training Accuracy: ●●●●●●●●●● (100%)
Test Accuracy: ●●●○○○○○○○ (60%)
The gap between training and test performance indicates overfitting.
9. Explain the role of Hessian matrix in learning algorithms.
Ans: The Hessian matrix is a square matrix of second-order partial
derivatives of a scalar-valued function (typically the loss function in machine
learning). It plays a crucial role in optimization, curvature analysis, and
understanding the behavior of learning algorithms. Below is a structured
breakdown of its significance:
2. Key Roles in Learning Algorithms
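(A) Optimization (Second-Order Methods)
Newton's method uses the Hessian H to rescale the gradient step, $w \leftarrow w - H^{-1} \nabla L(w)$, which accounts for curvature and can converge in far fewer iterations near a minimum, but each step is expensive for large networks.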
(B) Analyzing Critical Points
At a critical point (where $\nabla L = 0$), the eigenvalues of H reveal:
o Positive Definite: Local minimum (all eigenvalues > 0).
o Negative Definite: Local maximum (all eigenvalues < 0).
o Saddle Point: Mixed signs (common in high-dimensional spaces).
(C) Regularization and Robustness
(D) Pruning and Compression
Optimal Brain Surgeon: Uses Hessian to prune unimportant weights
without significant loss increase.
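The critical-point test and a Newton step (w ← w − H⁻¹∇L) can be illustrated on a quadratic loss f(w) = ½wᵀAw − bᵀw, whose gradient is Aw − b and whose Hessian is the constant matrix A. For such a function a single Newton step lands exactly on the minimum; the matrices used are arbitrary examples.

```python
import numpy as np

def classify_critical_point(H):
    eig = np.linalg.eigvalsh(H)   # H is symmetric, so its eigenvalues are real
    if np.all(eig > 0):
        return "local minimum"
    if np.all(eig < 0):
        return "local maximum"
    if np.any(eig > 0) and np.any(eig < 0):
        return "saddle point"
    return "inconclusive (zero eigenvalue)"

# f(w) = 1/2 w^T A w - b^T w: gradient A w - b, Hessian A
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
w0 = np.array([5.0, -4.0])
w1 = w0 - np.linalg.solve(A, A @ w0 - b)   # one Newton step: w - H^{-1} grad
print(classify_critical_point(A), w1, np.allclose(A @ w1, b))

print(classify_critical_point(np.diag([2.0, -2.0])))   # f = x^2 - y^2 at the origin
```

`np.linalg.solve` is used instead of forming H⁻¹ explicitly, which is cheaper and numerically safer; for real networks even this is usually replaced by approximations.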
4. Challenges and Workarounds
Challenge | Solution
High computational cost | Approximate H (e.g., diagonal Hessian).
Non-convex loss landscapes | Focus on local curvature (e.g., saddle point escape).
Numerical instability | Use eigenvalue clipping or damping.
10. Describe the process of network pruning.
Ans: Network pruning is a technique to reduce the size of a neural
network by removing unnecessary weights, neurons, or layers without
significantly compromising performance. It improves efficiency, reduces
computational costs, and enables deployment on edge devices (e.g.,
smartphones, IoT).
Why Prune Neural Networks:
Reduces model size (storage/memory).
Speeds up inference (fewer FLOPs).
Maintains accuracy (or minimizes loss).
Enables edge deployment (e.g., MobileNet, TinyML).
Step-by-Step Pruning Process
Step 1: Train a Baseline Model
Train the original network to convergence.
Example: Train a ResNet-50 on CIFAR-10 (baseline accuracy = 92%).
Step 2: Identify Pruning Candidates
Criteria for pruning:
o Magnitude-based: Remove weights near zero (e.g., |weight| <
0.01).
o Gradient-based: Prune weights with low impact on loss (using
gradients/Hessian).
o Activation-based: Drop neurons with low average activation.
Step 3: Prune the Network
Weight Pruning Example:
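import torch
model, threshold = torch.nn.Linear(4, 3), 0.1   # example layer and threshold (hypothetical values)
mask = torch.abs(model.weight) > threshold      # Keep weights above threshold
model.weight.data *= mask.float()               # Zero out others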
Neuron Pruning Example:
Remove neurons with L2-norm below a threshold in a dense layer.
Step 4: Fine-Tune the Pruned Model
Retrain the pruned network to recover lost accuracy.
Example: Train for 10 more epochs with 50% fewer weights.
Step 5: Iterate (Optional)
Repeat pruning + fine-tuning cycles for aggressive compression.
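Steps 2 to 5 can be sketched for magnitude-based, unstructured pruning in NumPy: zero the smallest-magnitude fraction of weights and raise the target sparsity each round. The random matrix stands in for a trained layer, and fine-tuning is only indicated by a comment (`fine_tune` is a hypothetical placeholder). In PyTorch, `torch.nn.utils.prune.l1_unstructured` applies a comparable magnitude criterion.

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the fraction `sparsity` of entries with the smallest |w|."""
    k = int(round(sparsity * w.size))
    flat = w.ravel().copy()
    flat[np.argsort(np.abs(flat))[:k]] = 0.0   # already-zero weights sort first
    return flat.reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(10, 10))   # stand-in for a trained layer's weights

# Iterative schedule: prune a bit more each round
for target in (0.2, 0.4, 0.6):
    w = magnitude_prune(w, target)
    # fine_tune(model) would go here (hypothetical placeholder: retrain to recover accuracy)
    print(f"target {target:.0%}: {np.mean(w == 0):.0%} of weights are zero")
```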
4. Advanced Pruning Techniques
Technique | Description
Lottery Ticket Hypothesis | Finds sparse "winning tickets" (subnetworks) that, reset to their original initialization, can be trained in isolation to match the full network's accuracy.
Iterative Pruning | Prunes gradually (e.g., 20% of weights at a time) vs. one-shot.
Structured Pruning | Removes entire channels/filters (hardware-friendly).
Automated Pruning | Uses RL/NAS (Neural Architecture Search) to optimize pruning.
Types of Neural Network Pruning
Pruning techniques are categorized based on what is removed (weights,
neurons, layers) and how it is removed (structured vs. unstructured). Below are
the key types:
1. Based on Granularity
(A) Weight Pruning (Unstructured Pruning)
What: Removes individual weights (sets them to zero).
Pros: High compression rates.
Cons: Requires sparse hardware support for speedup.
Example:
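# PyTorch weight pruning
import torch
weight_tensor = torch.randn(3, 4)   # example weights (hypothetical)
threshold = 0.5                     # example threshold (hypothetical)
mask = torch.abs(weight_tensor) < threshold
weight_tensor[mask] = 0             # Zero out small weights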
(B) Neuron/Unit Pruning (Structured Pruning)
What: Removes entire neurons or filters (e.g., in dense/conv layers).
Pros: Hardware-friendly (works with standard libraries).
Cons: Less fine-grained than weight pruning.
Example:
o Drop neurons with low activation norms in a dense layer.
(C) Channel/Filter Pruning (Structured Pruning)
What: Removes entire channels in convolutional layers.
Pros: Reduces FLOPs significantly.
Example:
o Delete a 3x3 filter in a Conv2D layer if its L1-norm is small.
(D) Layer Pruning
What: Removes entire layers (e.g., residual blocks in ResNet).
Pros: Drastically reduces model depth.
Cons: Risky (may break architecture).
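Filter (channel) pruning by L1 norm, as in example (C), can be sketched on a NumPy weight array laid out as (out_channels, in_channels, kh, kw). The layer is hypothetical; removing output filters also requires removing the matching input channels from the next layer, as the last comment indicates.

```python
import numpy as np

def prune_filters(conv_w, n_keep):
    """Keep the n_keep output filters with the largest L1 norm.
    conv_w has shape (out_channels, in_channels, kh, kw)."""
    l1 = np.abs(conv_w).sum(axis=(1, 2, 3))    # one L1 norm per output filter
    keep = np.sort(np.argsort(l1)[-n_keep:])   # indices of surviving filters, in order
    return conv_w[keep], keep

# Hypothetical 4-filter 3x3 conv layer; filters 1 and 3 have tiny weights
conv_w = np.ones((4, 2, 3, 3))
conv_w[1] *= 0.01
conv_w[3] *= 0.02
pruned, keep = prune_filters(conv_w, n_keep=2)
print(pruned.shape, keep)
# The next layer's weights must drop the matching input channels, e.g. next_w[:, keep]
```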
11. Describe the supervised learning approach with examples.
Ans: Supervised Learning is a machine learning approach where the model
is trained on a labeled dataset—each input has a corresponding correct
output. The system learns to predict the output from inputs by minimizing
the difference between predicted and actual values.
Working:
1. Training Phase:
o The algorithm is given input-output pairs (e.g., features and labels).
o It learns the pattern or mapping function from inputs to outputs.
2. Testing/Prediction Phase:
o The trained model is tested on new, unseen data to predict the
output.
Examples:
1. Email Spam Detection
Input (Features): Email content, sender, subject line
Output (Label): Spam or Not Spam
Goal: Classify new emails as spam or not based on past data.
2. House Price Prediction
Input (Features): Square footage, number of bedrooms, location
Output (Label): Price of the house
Goal: Predict the price of a house based on its features.
Types of Supervised Learning
1. Classification
Definition: The task of predicting a discrete label or category.
Goal: Assign input data to one of the predefined classes.
Examples:
o Email Spam Detection → Spam or Not Spam
o Disease Prediction → Positive or Negative
o Image Recognition → Cat, Dog, or Human
Algorithms Used:
o Logistic Regression
o Decision Tree
o Random Forest
o Support Vector Machine (SVM)
o k-Nearest Neighbors (KNN)
2. Regression
Definition: The task of predicting a continuous numeric value.
Goal: Estimate the relationship between variables.
Examples:
o House Price Prediction → ₹25,00,000
o Stock Market Forecasting → ₹1,200.75
o Temperature Prediction → 29.5°C
Algorithms Used:
o Linear Regression
o Ridge/Lasso Regression
o Decision Tree Regressor
o Support Vector Regressor (SVR)
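Both task types can be illustrated in a few lines of NumPy: a k-nearest-neighbors classifier for a spam-style label and a least-squares line for a house-price-style target. All feature values and prices are invented for illustration, and the feature choices (link count, fraction of capital letters) are assumptions.

```python
import numpy as np

# Classification: k-nearest neighbors (majority vote among the k closest points)
def knn_predict(X_train, y_train, x, k=3):
    dist = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(dist)[:k]]
    return np.bincount(nearest).argmax()

# Hypothetical spam features: [number of links, fraction of capital letters]; 1 = spam
X_cls = np.array([[0, 0.05], [1, 0.10], [0, 0.02], [8, 0.60], [10, 0.45], [7, 0.50]])
y_cls = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_cls, y_cls, np.array([9, 0.55])))   # 1 (spam)

# Regression: least-squares line price = w * size + b
size = np.array([1000.0, 1500.0, 2000.0, 2500.0])              # hypothetical sq. ft.
price = np.array([200_000.0, 290_000.0, 410_000.0, 500_000.0])  # hypothetical prices
A = np.column_stack([size, np.ones_like(size)])
(w, b), *_ = np.linalg.lstsq(A, price, rcond=None)
print(round(w * 1800 + b))   # predicted price for 1800 sq. ft.
```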
UNIT IV
Self-Organizing Maps (SOM)
2 Marks Questions (12)
1. What is a Self-Organizing Map?
Ans: An unsupervised neural network that projects high-dimensional data
onto a low-dimensional (usually 2D) grid, preserving topological
relationships.
Example: Visualizing customer segments in marketing data on a 2D map.
2. Define a feature map.
Ans: A transformation of input features into a structured output space where
similar inputs activate nearby units.
Example: SOM transforming pixel colours of images into a 2D colour map.
3. Mention one property of SOM.
Ans: Preserves the topology of input data in the output space.
Example: Similar words clustering together in word embedding
visualizations.
4. What is the SOM algorithm?
Ans: Iterative process where neurons compete to represent input data, and
weights are updated based on a neighbourhood function.
Example: Training a SOM to cluster handwritten digits without labels.
5. Define computer simulations in neural networks.
Ans: Using software to model neural network behaviour and test learning
algorithms.
Example: Simulating SOM training on iris dataset to study clustering.
6. What is learning vector quantization?
Ans: A supervised version of SOM where prototype vectors are adjusted to
minimize classification errors.
Example: Classifying medical data (e.g., diabetic vs. non-diabetic patients).
7. Mention a use of adaptive pattern classification.
Ans: Systems that learn and adapt to recognize patterns dynamically.
Example: Spam filters improving accuracy by learning from user-marked
emails.
8. What is a feature mapping model?
Ans: A model that reduces data dimensions while retaining structural
relationships.
Example: SOM mapping gene expression data for cancer subtype analysis.
9. Define clustering in SOM.
Ans: Grouping similar data points into neighbouring neurons on the map.
Example: SOM clustering countries based on GDP, population, and literacy
rates.
10. What is neighbourhood function in SOM?
Ans: A function (e.g., Gaussian) defining how nearby neurons are updated
during training.
Example: Updating not just the "winning neuron" but also its neighbours in a
3x3 grid.
11. What does unsupervised learning mean?
Ans: Training without labelled data, relying on patterns and similarities.
Example: SOM organizing a library of unlabelled images by visual
similarity.
12. What is dimensionality reduction?
Ans: Reducing the number of input variables while preserving key
information.
Example: SOM compressing 30D sensor data into a 2D map for fault
detection.
5 Marks Questions (6)
1. Explain the two basic feature mapping models.
Ans: (A) Competitive Learning Model
Definition & Mechanism
Competitive learning is an unsupervised neural network model where
neurons compete to respond to input patterns. The key principle is "winner-
takes-all", meaning only the Best Matching Unit (BMU) updates its weights.
Steps:
1. Input Presentation: A high-dimensional input vector is fed into the
network.
2. Similarity Calculation: Each neuron computes its distance (e.g.,
Euclidean) from the input.
3. Competition: The neuron with the smallest distance wins and becomes
the BMU.
4. Weight Update: Only the BMU adjusts its weights to better match the
input.
5. Iteration: The process repeats until weights stabilize.
Example: Handwritten Digit Recognition
Input: Pixel values of handwritten digits (0-9).
Competition: Neurons compete to represent digits.
Result: Each neuron specializes in recognizing a specific digit.
Limitations
Only the winning neuron learns, ignoring neighbors.
May lead to "dead neurons" (neurons that never win).
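The winner-takes-all rule (only the BMU moves, w ← w + η(x − w)) and the dead-neuron problem can both be seen in a small NumPy sketch. The two input groups, the learning rate and the starting weights are hypothetical; the third neuron is deliberately initialized far from all data so that it never wins.

```python
import numpy as np

def competitive_learning(X, W, lr=0.1, epochs=100):
    """Winner-takes-all: only the BMU moves toward each input."""
    W = W.copy()
    wins = np.zeros(len(W), dtype=int)
    for _ in range(epochs):
        for x in X:
            bmu = np.argmin(np.linalg.norm(W - x, axis=1))   # competition: closest neuron
            W[bmu] += lr * (x - W[bmu])                      # update the winner only
            wins[bmu] += 1
    return W, wins

cluster = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [0.2, 0.2]])
X = np.vstack([cluster, cluster + 5.0])   # two hypothetical groups of inputs
W0 = np.array([[1.0, 1.0], [4.0, 4.0], [20.0, 20.0]])
W, wins = competitive_learning(X, W0)
print(np.round(W, 2))
print(wins)   # the third neuron never wins: a "dead neuron"
```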
(B) Self-Organizing Map (SOM) Model
Definition & Mechanism
SOM is an unsupervised neural network that projects high-dimensional data
into a low-dimensional (usually 2D) grid while preserving topological
relationships. Unlike competitive learning, SOM updates not just the BMU
but also its neighbors.
Steps:
1. Initialization: Assign random weights to neurons in a 2D grid.
2. Competition: For each input, find the BMU (closest neuron).
3. Cooperation: Identify neighboring neurons using a neighborhood
function (e.g., Gaussian).
4. Adaptation: Adjust weights of BMU and neighbors to resemble the input.
5. Decay: Reduce learning rate and neighborhood size over time.
Example: Customer Segmentation
Input: Customer data (age, income, purchase history).
SOM Output: A 2D map where similar customers cluster together.
Business Use: Targeted marketing strategies based on clusters.
Advantages Over Competitive Learning
Preserves topology (similar inputs stay close).
Reduces dead neurons: neighbors of the BMU are also pulled toward the data, so more neurons end up near some inputs.
2. Describe the architecture of a self-organizing map.
Ans: 1. Input Layer
Receives high-dimensional data (e.g., feature vectors).
Example: A dataset with features like temperature, humidity, pressure.
2. Competitive Layer (Output Grid)
A 2D grid of neurons (e.g., 5×5, 10×10).
Each neuron has a weight vector of the same dimension as input.
3. Neighborhood Function
Determines how neighboring neurons are updated.
Common functions:
o Gaussian: Smooth updates around BMU.
o Mexican Hat: Excites nearby neurons but inhibits distant ones.
4. Learning Rate (α)
Controls how much weights are adjusted.
Starts high (~0.8) and decays over time (e.g., α(t) = α₀ e^(-kt)).
5. Training Process
1. Initialization: Random weights.
2. Competition: BMU selection.
3. Cooperation: Neighborhood identification.
4. Adaptation: Weight updates.
5. Convergence: Weights stabilize.
Example: Image Color Clustering
Input: RGB pixel values.
Output: A 2D color map grouping similar shades.
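The neighborhood functions and learning-rate schedule described above can be written out directly. The Gaussian is the usual form; for the Mexican hat the sketch assumes the Ricker-wavelet form (1 − d²/σ²)·exp(−d²/2σ²), one common choice among several. α₀ = 0.8 follows the text; the decay constant k = 0.01 is an assumption.

```python
import math

def gaussian(d, sigma):
    # Update strength that decays smoothly with grid distance d from the BMU
    return math.exp(-d ** 2 / (2 * sigma ** 2))

def mexican_hat(d, sigma):
    # Ricker wavelet: positive (excitatory) near the BMU, negative (inhibitory) further out
    return (1 - d ** 2 / sigma ** 2) * math.exp(-d ** 2 / (2 * sigma ** 2))

def learning_rate(t, alpha0=0.8, k=0.01):
    # Exponential decay alpha(t) = alpha0 * exp(-k t)
    return alpha0 * math.exp(-k * t)

for d in (0, 1, 2, 3):
    print(d, round(gaussian(d, 1.0), 3), round(mexican_hat(d, 1.0), 3))
print(round(learning_rate(100), 4))
```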
3. Explain the SOM algorithm with an example.
Ans: Step-by-Step SOM Algorithm
1. Initialize Weights: Randomly assign weight vectors to neurons.
2. Select Input Vector: Pick a data sample (e.g., customer features).
3. Find BMU: Calculate distances (Euclidean) and select the closest neuron.
4. Determine Neighborhood: Use a Gaussian function to define nearby
neurons.
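5. Update Weights: Adjust BMU and neighbors using:
$w_i(t+1) = w_i(t) + \alpha(t)\, h_{c,i}(t)\, [x(t) - w_i(t)]$
where:
o $h_{c,i}(t)$ = neighborhood function h(t), evaluated for neuron i around the BMU c
o α(t) = learning rate
6. Repeat: Until convergence (no significant weight changes).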
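A compact NumPy version of these steps, using the usual Kohonen update w ← w + α(t)·h(t)·(x − w) with a Gaussian neighborhood on the grid. Everything numeric here is an assumption for illustration: a 5×5 grid, exponential decay of α and of the neighborhood width σ, 3000 training steps, and two synthetic groups of three scaled features standing in for the "small herbivores" and "large carnivores" clusters.

```python
import numpy as np

def train_som(data, rows=5, cols=5, n_iter=3000, lr0=0.5, sigma0=2.0, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.4, 0.6, size=(rows, cols, data.shape[1]))   # 1. initialize weights
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(n_iter):
        x = data[rng.integers(len(data))]                           # 2. select an input
        dist = np.linalg.norm(W - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dist), dist.shape)          # 3. find the BMU
        lr = lr0 * np.exp(-t / 1000)                                 # decaying alpha(t)
        sigma = sigma0 * np.exp(-t / 500)                            # shrinking neighborhood
        d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)            # squared grid distance
        h = np.exp(-d2 / (2 * sigma ** 2))                           # 4. Gaussian h(t)
        W += lr * h[..., None] * (x - W)                             # 5. update BMU + neighbors
    return W

def bmu_of(W, x):
    dist = np.linalg.norm(W - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)

rng = np.random.default_rng(1)
small_herbivores = rng.normal(0.1, 0.05, size=(50, 3))   # synthetic scaled features
large_carnivores = rng.normal(0.9, 0.05, size=(50, 3))
data = np.vstack([small_herbivores, large_carnivores])

W = train_som(data)
print(bmu_of(W, small_herbivores[0]), bmu_of(W, large_carnivores[0]))
qe = np.mean([np.linalg.norm(W[bmu_of(W, x)] - x) for x in data])   # quantization error
print(f"quantization error: {qe:.3f}")
```

The two groups end up with BMUs in different regions of the map, and the quantization error (mean distance from each input to its BMU) measures how closely the map fits the data.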
Example: Animal Classification
Input: Features (size, diet, habitat).
SOM Output:
o Cluster 1: Small herbivores (rabbits, squirrels).
o Cluster 2: Large carnivores (lions, tigers).
4. Write short notes on properties of feature maps.
Ans: (1) Topology Preservation
Similar inputs map to nearby neurons.
Example: In a SOM for text data, "cat" and "dog" appear close.
(2) Dimensionality Reduction
Projects high-D data into 2D/3D for visualization.
Example: Reducing 30D financial data to a 2D risk map.
(3) Unsupervised Learning
No labeled data required.
Example: Clustering news articles by topic.
(4) Neighborhood Influence
Nearby neurons update together, ensuring smooth transitions.
Example: In weather data, "hot" and "warm" regions are adjacent.
(5) Convergence
Weights stabilize after sufficient iterations.
Example: SOM for fraud detection stops updating after learning patterns.
5. Discuss the process of learning vector quantization.
Ans: Key Differences from SOM
Supervised (uses labeled data).
No neighborhood function (only BMU updates).
Steps in LVQ
1. Initialize Prototypes: Assign random weight vectors per class.
2. Input Presentation: Feed labeled training data.
3. Find BMU: Select the closest prototype.
4. Update Rule:
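o If BMU’s class = input class: $w \leftarrow w + \alpha (x - w)$ (move the prototype toward the input).
o Otherwise: $w \leftarrow w - \alpha (x - w)$ (move the prototype away from the input).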
5. Iterate Until Convergence.
Example: Medical Diagnosis
Input: Patient symptoms (fever, cough).
Prototypes: "Flu" vs. "Cold" classifiers.
Result: New patients are classified based on learned prototypes.
6. Describe adaptive pattern classification.
Ans: A system that dynamically updates its classification rules based on new
data.
Phases
1. Training: Learns initial patterns (e.g., SOM/LVQ).
2. Adaptation: Adjusts weights when new data arrives.
3. Classification: Assigns labels to new inputs.
Example: Spam Filter
Initial Training: Learns from labeled emails (spam/ham).
Adaptation: Updates when users mark new emails.
Result: Improves accuracy over time.
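One simple way to realize the three phases is a nearest-class-mean classifier whose class centroids are updated incrementally whenever a new labeled example arrives. This is a swapped-in technique chosen for brevity, not SOM or LVQ; the class name `AdaptiveNearestMean` and the two spam features (link count, exclamation-mark count) are hypothetical.

```python
class AdaptiveNearestMean:
    """Nearest-class-mean classifier whose centroids are updated online."""

    def __init__(self):
        self.means = {}   # label -> centroid (list of floats)
        self.counts = {}  # label -> number of examples seen for that label

    def update(self, x, label):
        """Phases 1 and 2: the same incremental-mean rule covers training and adaptation."""
        if label not in self.means:
            self.means[label] = list(x)
            self.counts[label] = 1
            return
        self.counts[label] += 1
        n, m = self.counts[label], self.means[label]
        for k, xk in enumerate(x):
            m[k] += (xk - m[k]) / n  # running mean: m_n = m_{n-1} + (x - m_{n-1}) / n

    def classify(self, x):
        """Phase 3: label of the nearest centroid (squared Euclidean distance)."""
        return min(self.means,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.means[c])))


clf = AdaptiveNearestMean()
# Initial training on hypothetical [link_count, exclamation_count] features
clf.update([5, 4], "spam")
clf.update([0, 0], "ham")
print(clf.classify([2.4, 2.0]))  # ham: closer to the ham centroid (0, 0)
# Adaptation: a user marks a new email as spam; the spam centroid moves to (4.0, 3.5)
clf.update([3, 3], "spam")
print(clf.classify([2.4, 2.0]))  # spam: the decision boundary has shifted
```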
10 Marks Questions (6)
1. Explain the working and applications of SOM in detail.
Ans: Working of Self-Organizing Maps (SOM)
Self-Organizing Maps (SOM), also known as Kohonen Networks, are a type
of artificial neural network that uses unsupervised learning to produce a
low-dimensional (typically 2D) representation of input space while
preserving the topological properties of the input data.
Key Components of SOM:
1. Input Layer: Receives high-dimensional data (e.g., feature vectors).
2. Output Layer (Competitive Layer): A 2D grid of neurons, each with a
weight vector of the same dimension as the input.
3. Neighborhood Function: Defines how neighboring neurons are updated
based on the Best Matching Unit (BMU).
4. Learning Rate: Controls the magnitude of weight updates.
Working Mechanism:
1. Initialization: Assign random weight vectors to each neuron in the 2D
grid.
2. Competition: For each input vector, the neuron whose weight vector is
closest (e.g., Euclidean distance) is declared the BMU.
3. Cooperation: The BMU and its neighboring neurons (determined by the
neighborhood function) adjust their weights to become more similar to
the input vector.
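4. Adaptation: Weights are updated using the formula:
$w(t+1) = w(t) + \alpha(t)\, h(t)\, [x(t) - w(t)]$
where:
o w(t), x(t) = the neuron's weight vector and the input vector at step t
o α(t) = learning rate (decreases over time)
o h(t) = neighborhood function (e.g., Gaussian)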
5. Iteration: Repeat until convergence (weights stabilize).
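The text says only that α(t) decreases and that h(t) is, for example, Gaussian. A common concrete choice, assumed here rather than taken from the source, is exponential decay for both α(t) and the neighborhood radius σ(t), with h(t) = exp(−d²/(2σ(t)²)) where d is the grid distance from a neuron to the BMU. The constants `alpha0`, `sigma0` and `tau` are arbitrary illustration values.

```python
import math


def learning_rate(t, alpha0=0.5, tau=1000.0):
    """alpha(t) = alpha0 * exp(-t / tau): shrinks as training proceeds."""
    return alpha0 * math.exp(-t / tau)


def neighborhood_radius(t, sigma0=3.0, tau=1000.0):
    """sigma(t) = sigma0 * exp(-t / tau): width of the neighborhood."""
    return sigma0 * math.exp(-t / tau)


def gaussian_h(bmu, unit, t):
    """h(t) = exp(-d^2 / (2 sigma(t)^2)), d = grid distance from unit to the BMU."""
    d2 = (bmu[0] - unit[0]) ** 2 + (bmu[1] - unit[1]) ** 2
    sigma = neighborhood_radius(t)
    return math.exp(-d2 / (2 * sigma ** 2))


def adapt(w, x, bmu, unit, t):
    """w(t+1) = w(t) + alpha(t) h(t) [x(t) - w(t)] for the neuron at `unit`."""
    a, h = learning_rate(t), gaussian_h(bmu, unit, t)
    return [wk + a * h * (xk - wk) for wk, xk in zip(w, x)]


# The same neighbor (one grid step from the BMU) is pulled much less late in training.
for t in (0, 1000, 3000):
    print(t, round(learning_rate(t) * gaussian_h((2, 2), (2, 3), t), 4))
```

The product α(t)·h(t) is the effective step size for each neuron, so shrinking both terms is what lets the map settle.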
Applications of SOM:
1. Data Visualization: Reducing high-dimensional data to 2D for easier
interpretation (e.g., gene expression data).
2. Clustering: Grouping similar data points (e.g., customer segmentation in
marketing).
3. Image Processing: Color quantization, feature extraction.
4. Anomaly Detection: Identifying outliers in datasets (e.g., fraud
detection).
5. Speech Recognition: Mapping phonetic features to a 2D grid.
2. Discuss the SOM algorithm and its steps with example.
Ans: SOM Algorithm Steps:
1. Initialization:
o Assign random weight vectors to each neuron in the 2D grid.
o Example: For a 5×5 SOM with 3D input data, each neuron has a
random 3D weight vector.
2. Input Presentation:
o Feed an input vector X (e.g., a customer’s age, income, spending score).
3. Competition (Finding BMU):
o Calculate the Euclidean distance between X and each neuron’s weight vector.
o The neuron with the smallest distance is the BMU.
4. Cooperation (Neighborhood Function):
o Define a neighborhood around the BMU (e.g., Gaussian function).
o Neurons within this neighborhood will update their weights.
5. Adaptation (Weight Update):
Adjust weights of BMU and neighbors using:
$w(t+1) = w(t) + \alpha(t)\, h(t)\, [x(t) - w(t)]$
o Example: If BMU is neuron (2,3), update it and its neighbors.
6. Iteration:
o Repeat for all input vectors over multiple epochs.
o Gradually reduce the learning rate α(t) and the neighborhood radius (the width of h(t)).
7. Convergence:
o Weights stabilize, and the map organizes itself.
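A single competition, cooperation and adaptation step on the 5×5 grid with 3-D inputs from the example can be sketched as below. The starting weights are contrived so that neuron (2,3) is guaranteed to be the BMU; a real run would start from random weights. The function name `som_step` and the values of `alpha` and `sigma` are illustrative assumptions.

```python
import math


def som_step(weights, x, alpha, sigma):
    """One competition + cooperation + adaptation step; updates weights in place."""
    rows, cols = len(weights), len(weights[0])
    # Competition: the neuron with the smallest Euclidean distance to x is the BMU
    bi, bj = min(((i, j) for i in range(rows) for j in range(cols)),
                 key=lambda u: sum((a - b) ** 2 for a, b in zip(x, weights[u[0]][u[1]])))
    for i in range(rows):
        for j in range(cols):
            # Cooperation: Gaussian neighborhood based on grid distance to the BMU
            h = math.exp(-((i - bi) ** 2 + (j - bj) ** 2) / (2 * sigma ** 2))
            # Adaptation: w <- w + alpha * h * (x - w)
            weights[i][j] = [w + alpha * h * (xk - w) for w, xk in zip(weights[i][j], x)]
    return bi, bj


# Hypothetical 5x5 grid of 3-D weights (e.g. scaled age, income, spending score):
# every neuron starts at 0 except (2, 3), so (2, 3) must win for this input.
grid = [[[0.0, 0.0, 0.0] for _ in range(5)] for _ in range(5)]
grid[2][3] = [0.5, 0.5, 0.5]
winner = som_step(grid, [0.6, 0.6, 0.6], alpha=0.5, sigma=1.0)
print(winner)  # (2, 3)
```

After the step, the BMU has moved halfway toward the input, its direct neighbors such as (2,4) have moved noticeably, and distant neurons such as (0,0) have barely changed.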
Example: Customer Segmentation
Input Data: Age, Income, Spending Score.
SOM Output: A 2D map where similar customers cluster together (e.g.,
young high-spenders vs. older low-spenders).
3. Explain how feature maps are formed in SOM.
Ans: Formation of Feature Maps:
1. Input Data Representation:
o High-dimensional data (e.g., 30D gene expression data) is fed into
the SOM.
2. Competitive Learning:
o Neurons compete to represent input patterns (winner-takes-all).
o BMU and its neighbors adjust weights to resemble input.
3. Topology Preservation:
o Similar inputs activate nearby neurons in the 2D grid.
o Example: In a SOM for animal features, "cat" and "dog" neurons
are close.
4. Dimensionality Reduction:
o High-D data is projected onto a 2D grid while preserving
relationships.
o Example: A 100D dataset is visualized as a 2D heatmap.
5. Convergence:
o After training, the map stabilizes, and clusters emerge.
Example: Image Color Mapping
Input: RGB pixel values (3D).
SOM Output: A 2D grid where similar colors are grouped together.
4. Describe learning vector quantization and its properties.
Ans: LVQ Overview:
LVQ is a supervised version of SOM used for classification. It adjusts
prototype vectors to minimize classification errors.
Steps in LVQ:
1. Initialize Prototypes: Assign random weight vectors for each class.
2. Input Presentation: Feed labeled training data.
3. Find BMU: Select the closest prototype.
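4. Update Rule (LVQ1):
o If BMU’s class = input class: $w(t+1) = w(t) + \alpha(t)\, [x(t) - w(t)]$ (move the prototype toward the input).
o Otherwise: $w(t+1) = w(t) - \alpha(t)\, [x(t) - w(t)]$ (move the prototype away from the input).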
5. Iterate Until Convergence.
Properties of LVQ:
1. Supervised Learning: Requires labeled data.
2. No Neighborhood Function: Only BMU updates.
3. Efficient for Classification: Works well with small datasets.
4. Interpretable Prototypes: Each prototype represents a class.
Example: Medical Diagnosis
Input: Patient symptoms (fever, cough).
Prototypes: "Flu" vs. "Cold" classifiers.
Result: New patients are classified based on learned prototypes.
5. Compare supervised and unsupervised learning using SOM.
Ans:
| # | Aspect | Unsupervised SOM | Supervised LVQ (SOM-based) |
|---|--------|------------------|----------------------------|
| 1 | Learning Type | No labels needed | Requires labeled data |
| | Example | Grouping customers by purchase behavior | Classifying emails as spam/ham |
| 2 | Training Data | Raw input features only | Labeled examples with correct outputs |
| | Example | Analyzing gene expression patterns | Diagnosing diseases from labeled patient data |
| 3 | Objective | Discover hidden patterns | Minimize classification errors |
| | Example | Market segmentation | Handwritten digit recognition |
| 4 | Neuron Updates | BMU + neighbors update weights | Only BMU updates weights |
| | Example | Color quantization in images | Fraud detection in transactions |
| 5 | Neighborhood Function | Uses decreasing radius (Gaussian/Mexican hat) | No neighborhood function |
| | Example | Organizing document topics | Predicting customer churn |
| 6 | Output | Topological map of similar patterns | Direct class predictions |
| | Example | 2D visualization of high-dimensional data | Medical diagnosis (healthy/sick) |
| 7 | Evaluation | Quantization error, topographic accuracy | Precision, recall, accuracy |
| | Example | Measuring cluster quality in customer data | Testing spam filter performance |
| 8 | Human Role | Interpret clusters post-training | Provide labels before training |
| | Example | Analyzing new customer segments | Labeling training images for AI |
| 9 | Flexibility | Discovers unknown patterns | Limited to predefined classes |
| | Example | Finding novel patient subgroups | Classifying known plant species |
| 10 | Best For | Exploratory data analysis | Predictive modeling |
| | Example | Understanding complex datasets | Building decision systems |
Key Differences Illustrated:
1. In customer analytics:
o Unsupervised SOM might reveal 5 natural customer clusters
o Supervised LVQ would predict "will buy" vs "won't buy"
2. In medical research:
o Unsupervised SOM could discover new disease subtypes
o Supervised LVQ would diagnose known conditions
3. In image processing:
o Unsupervised SOM groups similar images by visual features
o Supervised LVQ classifies images into predefined categories
6. Describe the role of SOM in classification tasks.
Ans: 1. Feature Extraction:
o Reduces high-D data to 2D, making classification easier.
o Example: Extracting key features from images before
classification.
2. Cluster Identification:
o Groups similar data points, which can then be labeled.
o Example: Identifying tumor types in medical data.
3. Preprocessing for Supervised Models:
o SOM can serve as a preprocessing step before SVM or k-NN, which can improve accuracy on some datasets.
o Example: Reducing 100D data to 2D before applying k-NN.
4. Anomaly Detection:
o Outliers appear far from clusters in the SOM.
o Example: Detecting fraudulent transactions.
Example: Handwritten Digit Classification
Input: Pixel values of digits.
SOM Output: 2D map where similar digits cluster.
Classification: Assign labels to clusters (e.g., "0", "1").
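Cluster identification and the digit example both rely on attaching labels to a trained map. The following minimal sketch assumes majority voting: each neuron takes the most common label among the labeled samples for which it is the BMU, and a new input receives the label of its BMU. The 1×2 "trained map" and the 2-D vectors standing in for pixel values are hypothetical toy data, and the function names are illustrative.

```python
from collections import Counter


def find_bmu(weights, x):
    """Grid coordinates of the neuron whose weight vector is closest to x."""
    units = [(i, j) for i in range(len(weights)) for j in range(len(weights[0]))]
    return min(units, key=lambda u: sum((a - b) ** 2 for a, b in zip(x, weights[u[0]][u[1]])))


def label_neurons(weights, X, y):
    """Give each neuron the majority label of the training samples it wins."""
    votes = {}
    for x, label in zip(X, y):
        votes.setdefault(find_bmu(weights, x), Counter())[label] += 1
    return {unit: counts.most_common(1)[0][0] for unit, counts in votes.items()}


def classify(weights, neuron_labels, x):
    """Label of x's BMU, or None if that neuron never won a labeled sample."""
    return neuron_labels.get(find_bmu(weights, x))


# Toy stand-in for a trained map: a 1 x 2 grid of 2-D weight vectors.
som = [[[0.0, 0.0], [1.0, 1.0]]]
X = [[0.1, 0.0], [0.0, 0.2], [0.9, 1.0]]
y = ["0", "0", "1"]
labels = label_neurons(som, X, y)
print(classify(som, labels, [0.95, 0.9]))  # 1
```

Neurons that never win a labeled sample stay unlabeled here; returning `None` for them is a design choice, and a nearest labeled neuron could be used instead.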