Introduction to Artificial Neural Networks
Dr Munawwar Iqbal (Assistant Professor)
Institute of Information Technology
Quaid-i-Azam University Islamabad, Pakistan
[email protected]
2. What is Machine Learning?
• Machine Learning (ML) is a branch of artificial intelligence (AI)
that enables computers to learn from data and improve their
performance on tasks without being explicitly programmed for
each specific task.
• Instead of writing fixed rules for a program, ML uses algorithms
that can automatically identify patterns in data and make
predictions or decisions based on that data.
3. How Does Machine Learning Work?
• Data Collection: Gather relevant data (e.g., images, text,
numbers).
• Training: Feed data to a machine learning model, which learns by
finding patterns or relationships.
• Evaluation: Test the model on new, unseen data to check how well
it performs.
• Prediction: Use the trained model to make decisions or
predictions on real-world data.
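The four steps above map directly onto a few lines of code. Below is a minimal sketch using scikit-learn; the built-in Iris dataset and the decision-tree model are illustrative choices only.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# 1. Data collection: a built-in labeled dataset stands in for real data.
X, y = load_iris(return_X_y=True)

# 2. Training: hold back some data as "unseen", fit the model on the rest.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# 3. Evaluation: check performance on the held-out test data.
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 4. Prediction: apply the trained model to a new sample.
print("Predicted class:", model.predict([[5.1, 3.5, 1.4, 0.2]]))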
4. Types of Machine Learning
• Supervised Learning: The model is trained on labeled data
(input-output pairs). Example: Predicting house prices based on
features like size, location.
• Unsupervised Learning: The model finds patterns in unlabeled
data. Example: Grouping customers by purchasing behavior
(clustering).
• Reinforcement Learning: The model learns by interacting with
an environment and receiving rewards or penalties. Example:
Training a robot to navigate a maze.
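In code, the first two types differ mainly in whether labels y are given to the model; reinforcement learning needs an interaction loop with an environment, so it is not shown. A minimal sketch with scikit-learn (the data and models are illustrative):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([1.1, 2.0, 3.2, 9.8, 11.1, 12.2])   # labels (outputs) for supervised learning

reg = LinearRegression().fit(X, y)                # supervised: learns from (X, y) pairs
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)  # unsupervised: X only
print(reg.predict([[4.0]]), clusters)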
5. Applications of Machine Learning
• Speech recognition (e.g., virtual assistants like Siri)
• Image recognition (e.g., detecting objects in photos)
• Recommendation systems (e.g., Netflix, Amazon)
• Fraud detection in banking
• Autonomous vehicles
6. Life Cycle of Machine Learning
1. Problem Definition
• Understand and define the problem clearly.
• Determine if ML is the right approach.
• Identify the goal (e.g., classification, regression, clustering).
2. Data Collection
• Gather the relevant data from various sources (databases, APIs,
sensors, web scraping).
• Data can be structured or unstructured.
7. Life Cycle of Machine Learning…
3. Data Preparation / Data Cleaning
• Handle missing values, duplicates, and outliers.
• Normalize, scale, or transform data if necessary.
• Convert categorical data into numerical formats (e.g., one-hot
encoding).
• Split data into training, validation, and testing sets (see the sketch at the end of this slide).
4. Exploratory Data Analysis (EDA)
• Analyze the data to understand patterns, correlations, and
distributions.
• Visualize the data using graphs and plots.
• Identify important features.
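Concretely, the cleaning and splitting steps in item 3 might look like the sketch below, using pandas and scikit-learn on a tiny made-up table (the column names are hypothetical):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "size_sqft": [1200.0, None, 2100.5, 1650.0],            # contains a missing value
    "city": ["Islamabad", "Lahore", "Islamabad", "Karachi"],
    "price": [150.0, 120.0, 260.0, 190.0],
})

df = df.drop_duplicates()                                    # remove duplicate rows
df["size_sqft"] = df["size_sqft"].fillna(df["size_sqft"].median())  # impute missing values
df = pd.get_dummies(df, columns=["city"])                    # one-hot encode categorical data

X, y = df.drop(columns="price"), df["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
scaler = StandardScaler().fit(X_train)                       # fit the scaler on training data only
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)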
8. Life Cycle of Machine Learning…
5. Feature Engineering
• Create new features from raw data to improve model performance.
• Select relevant features.
• Dimensionality reduction techniques (e.g., PCA) may be applied (see the sketch at the end of this slide).
6. Model Selection
• Choose appropriate ML algorithms based on the problem type
(e.g., decision trees, SVM, neural networks).
• Try different models to compare performance.
7. Model Training
• Train the model using the training dataset.
• Tune hyperparameters to optimize performance.
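A brief sketch of steps 5-7 with scikit-learn; the digits dataset, the PCA component count, and the SVM parameter grid are all illustrative assumptions:

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)                  # 64 pixel features per image
X_reduced = PCA(n_components=16).fit_transform(X)    # step 5: dimensionality reduction
print(X.shape, "->", X_reduced.shape)

# Steps 6-7: pick a candidate model and tune a hyperparameter via cross-validation.
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)
search.fit(X_reduced, y)
print("Best C:", search.best_params_)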
9. Life Cycle of Machine Learning…
8. Model Evaluation
• Evaluate the model on validation/testing data using metrics
(accuracy, precision, recall, F1-score, RMSE, etc.).
• Check for overfitting or underfitting.
9. Model Deployment
• Deploy the trained model into a production environment.
• Integrate the model into applications or systems.
10. Monitoring & Maintenance
• Continuously monitor the model's performance.
• Update the model with new data when necessary.
• Handle model drift or degradation over time.
10. Common Types of Machine Learning Tasks
Classification
• Goal: Assign input data to one of several predefined categories.
• Example: Email spam detection (spam vs. not spam), image
recognition (cat vs. dog).
Regression
• Goal: Predict a continuous output value based on input data.
• Example: Predicting house prices based on features like size
and location.
Clustering
• Goal: Group similar data points together without predefined
labels.
• Example: Customer segmentation based on purchasing
behavior.
11. Common Types of Machine Learning Tasks…
Anomaly Detection
• Goal: Identify unusual data points that don’t conform to expected
patterns.
• Example: Fraud detection in credit card transactions (see the sketch at the end of this slide).
Recommendation
• Goal: Suggest items to users based on their preferences and
behavior.
• Example: Movie or product recommendation systems.
Ranking
• Goal: Order items by relevance or importance.
• Example: Search engine results ranking.
Dimensionality Reduction
• Goal: Reduce the number of features while preserving important
information.
• Example: Visualizing high-dimensional data.
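As an illustration of the anomaly-detection task above, here is a minimal sketch using scikit-learn's IsolationForest on made-up data (one of several possible models):

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),      # the bulk of "normal" points
               [[8.0, 8.0]]])                        # one point far from the rest
labels = IsolationForest(random_state=0).fit_predict(X)
print(labels[-1])   # -1 flags an anomaly, 1 an inlier; the far point is typically -1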
12. Why are Tasks Important?
• They define what kind of data and labels you need.
• They guide the choice of algorithms and evaluation metrics.
• They shape the entire machine learning pipeline (data preparation,
model training, validation).
13. Categorical Data in Machine Learning
1. Binary (Dichotomous) Categories
• Gender: Male, Female
• Yes/No Questions: Yes, No
• Loan Approval: Approved, Rejected
2. Nominal Categories (no inherent order)
• Color: Red, Blue, Green
• Country: USA, Canada, Mexico
• Product Type: Smartphone, Laptop, Tablet
• City Names: New York, Paris, Tokyo
14. Categorical Data in Machine Learning…
3. Ordinal Categories (with a clear order or ranking)
• Education Level: High School, Bachelor’s, Master’s, PhD
• Customer Satisfaction: Very Unsatisfied, Unsatisfied, Neutral,
Satisfied, Very Satisfied
• Size: Small, Medium, Large, Extra Large
• Rating: 1 Star, 2 Stars, ..., 5 Stars
4. Time-Based Categories (when treated as categorical)
• Day of Week: Monday, Tuesday, ..., Sunday
• Month: January, February, ..., December
• Shift: Morning, Afternoon, Night
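A minimal pandas sketch of how these category types are usually encoded (the data is made up): nominal values get one-hot columns so no false order is introduced, while ordinal values can be mapped to ordered integers:

import pandas as pd

df = pd.DataFrame({
    "color": ["Red", "Blue", "Green"],        # nominal: no inherent order
    "size": ["Small", "Large", "Medium"],     # ordinal: has a ranking
})

df = pd.get_dummies(df, columns=["color"])    # one-hot encoding for nominal data
size_order = {"Small": 0, "Medium": 1, "Large": 2, "Extra Large": 3}
df["size"] = df["size"].map(size_order)       # integer encoding preserves the order
print(df)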
15. Continuous Data in Machine Learning
🔬 1. Temperature
• Use case: Predicting climate patterns, controlling HVAC (Heating,
Ventilation, and Air Conditioning) systems.
• Data type: Float (e.g., 22.4°C, 35.1°C).
🏃 2. Height and Weight
• Use case: Health risk prediction, biometric identification.
• Data type: Float (e.g., 175.5 cm, 72.3 kg).
💰 3. Income
• Use case: Credit scoring, market segmentation.
• Data type: Float (e.g., $45,323.75).
16. Continuous Data in Machine Learning…
📈 4. Stock Prices
• Use case: Time series forecasting, trading bots.
• Data type: Float (e.g., $132.56).
🏠 5. House Size or Price
• Use case: House price prediction.
• Data type: Square footage (e.g., 2100.5 sq ft), price in currency.
🚗 6. Speed or Acceleration
• Use case: Autonomous vehicles, motion detection.
• Data type: Float (e.g., 45.8 km/h, 3.2 m/s²).
17. Continuous Data in Machine Learning…
⏱️ 7. Time Duration
• Use case: Performance evaluation, latency prediction.
• Data type: Float (e.g., 2.3 seconds).
💓 8. Heart Rate / Blood Pressure
• Use case: Medical diagnostics, fitness tracking.
• Data type: e.g., 72.6 bpm, 120.5 mmHg.
🧠 9. Sensor Readings
• Use case: IoT systems, predictive maintenance.
• Data type: Voltage, humidity, pH level, etc.
18. Overfitting
• Overfitting in machine learning occurs when a model learns the
training data too well, including its noise and random
fluctuations, instead of just the underlying patterns.
• This leads to excellent performance on the training set but poor
generalization to new, unseen data.
Key Symptoms of Overfitting:
• High accuracy on training data, but
• Low accuracy on validation/test data
19. Overfitting…
Example:
If you're training a model to recognize cats in images, an overfitted
model might learn to associate specific lighting or background
features (unique to the training images) with cats, rather than learning
what cats actually look like.
Visual Representation:
• Training Loss ↓ continues to decrease
• Validation Loss ↓ at first, then ↑ (starts increasing)
Common Causes:
• Model too complex (too many parameters)
• Too little data
• Too many training epochs
• Lack of regularization
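These causes can be reproduced on purpose. The sketch below (scikit-learn; the data and polynomial degree are illustrative) fits an overly complex polynomial to a small noisy sample and shows the telltale train/validation gap:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(20, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=20)     # underlying pattern + noise

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)
model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression()).fit(X_tr, y_tr)
print("train R^2:", model.score(X_tr, y_tr))    # near 1.0: the model memorizes the noise
print("valid R^2:", model.score(X_val, y_val))  # much lower: poor generalization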
20. Overfitting…
1. Image Classification
Scenario: Training a deep neural network to classify cats vs. dogs
using a small dataset.
Overfitting Example: The model memorizes the exact fur
patterns or backgrounds in the training images. So, it performs
well on the training set but fails on new images where the
background is different or the pet is in a different pose.
Signs: Very high training accuracy (e.g., 99%) but much lower
validation/test accuracy (e.g., 70%).
21. Overfitting…
2. Spam Email Detection
Scenario: Building a classifier to detect spam emails based on
word frequency.
Overfitting Example: The model learns that emails with the
word “offer” are always spam because of the training set bias. In
reality, legitimate emails may also use that word.
Signs: Excellent performance on training emails, but lots of false
positives or false negatives in real-world usage.
22. Underfitting
• Underfitting in machine learning occurs when a model is too
simple to capture the underlying patterns in the data.
• It performs poorly on both the training data and unseen (test) data.
23. What Causes Underfitting?
Model is too simple
Example: Using linear regression on data that requires a
polynomial fit (see the sketch after this list).
Not enough features
The model lacks the necessary input data to make accurate
predictions.
Excessive regularization (Home Work)
Overly penalizing model complexity (e.g., high L1/L2
regularization) can constrain learning.
Insufficient training
Training is stopped too early or with too few iterations (especially
in deep learning).
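The first cause is easy to demonstrate. In this sketch (scikit-learn, made-up data), a straight line is fit to clearly quadratic data and scores poorly even on its own training set:

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.linspace(-3, 3, 30).reshape(-1, 1)
y = X.ravel() ** 2                         # the true pattern is quadratic
line = LinearRegression().fit(X, y)
print("train R^2:", line.score(X, y))      # close to 0: too simple to fit the curve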
24. Best Fit
• In machine learning, "best fit" generally refers to how well a
model captures the underlying patterns in the training data without
overfitting or underfitting.
• It’s most often used in the context of regression, but the concept
applies across all supervised learning models.
The best fit is a balance where the model:
• Minimizes error on the training data (low bias)
• Generalizes well to unseen data (low variance)
This involves finding a hypothesis or function that maps input
features to output targets as accurately as possible.
25. Evaluating "Best Fit" Metrics
For regression:
• Mean Squared Error (MSE)
• R² score
• Root Mean Squared Error (RMSE)
For classification:
• Accuracy
• Precision / Recall / F1-Score
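A quick sketch of computing these metrics with scikit-learn, on tiny made-up predictions:

import numpy as np
from sklearn.metrics import (mean_squared_error, r2_score, accuracy_score,
                             precision_score, recall_score, f1_score)

# Regression metrics: compare true and predicted continuous values.
y_true, y_pred = np.array([3.0, 5.0, 7.0]), np.array([2.5, 5.5, 6.0])
mse = mean_squared_error(y_true, y_pred)
print("MSE:", mse, "RMSE:", np.sqrt(mse), "R^2:", r2_score(y_true, y_pred))

# Classification metrics: compare true and predicted labels.
c_true, c_pred = [1, 0, 1, 1, 0], [1, 0, 0, 1, 1]
print("Accuracy:", accuracy_score(c_true, c_pred),
      "Precision:", precision_score(c_true, c_pred),
      "Recall:", recall_score(c_true, c_pred),
      "F1:", f1_score(c_true, c_pred))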
26. Underfitting vs Overfitting
Term          Description                                     Effect
Underfitting  Model too simple, cannot capture complexity     High bias
Overfitting   Model too complex, memorizes training data      High variance
Best Fit      Optimal complexity, balances bias and variance  Generalizes well
28. Underfitting vs Overfitting
🎯 Low Bias, Low Variance → Predictions tightly clustered at the
center (ideal).
➡️ High Bias, Low Variance → Predictions tightly clustered but far
from center (consistently wrong).
🔀 Low Bias, High Variance → Predictions spread out around the
center (inconsistently right/wrong).
❌ High Bias, High Variance → Predictions spread out and far from
center (worst case).
29. How to Manage the Trade-off
Situation                    Possible Fixes
High Bias (Underfitting)     Use a more complex model; add more features; reduce regularization
High Variance (Overfitting)  Use a simpler model; apply regularization (e.g., L1, L2); get more training data
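As a sketch of the high-variance fixes, the snippet below (scikit-learn; the dataset and alpha values are illustrative, and exact scores will vary) compares plain linear regression with L2 (Ridge) and L1 (Lasso) regularization when there are more features than training samples:

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=30, n_features=50, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=1.0, max_iter=10000)):
    model.fit(X_tr, y_tr)
    # The unregularized fit is typically near-perfect on train but weaker on test.
    print(type(model).__name__, "train:", round(model.score(X_tr, y_tr), 2),
          "test:", round(model.score(X_te, y_te), 2))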
34. Artificial Neural Networks (ANNs)
• Artificial Neural Networks (ANNs) are computing systems
inspired by the biological neural networks that make up animal
brains.
• They consist of interconnected layers of nodes (called neurons).
• Neurons process data by responding to inputs, transforming them
through weighted connections, and producing outputs.
• ANNs are used in machine learning to recognize patterns, classify
data, and solve complex problems by learning from examples.
35. History of Artificial Neural Networks (ANNs)
Early Foundations (1940s-1950s)
• 1943: Warren McCulloch and Walter Pitts published a pioneering
paper describing a simplified model of a neuron as a binary
threshold unit, laying the groundwork for neural networks.
• 1949: Donald Hebb proposed the Hebbian learning rule,
suggesting that connections between neurons strengthen when they
activate together ("cells that fire together wire together").
36. History of Artificial Neural Networks (ANNs)…
Perceptron and Early Neural Networks (1950s-1960s)
• 1958: Frank Rosenblatt invented the Perceptron, one of the first
algorithms for supervised learning in neural networks, which could
classify input patterns.
• However, in 1969, Marvin Minsky and Seymour Papert published
Perceptrons, highlighting the limitations of single-layer
perceptrons, particularly their inability to solve non-linear
problems like XOR. This led to a temporary decline in neural
network research (known as the "AI winter").
37. History of Artificial Neural Networks (ANNs)…
Revival and Backpropagation (1980s)
• In the 1980s, neural networks regained interest thanks to the
development of the backpropagation algorithm (popularized by
Rumelhart, Hinton, and Williams in 1986), which allowed training
of multi-layer networks and could solve more complex, non-linear
problems.
• This era saw the rise of multi-layer perceptrons (MLPs) and
feedforward neural networks.
38. History of Artificial Neural Networks (ANNs)…
Advances and New Architectures (1990s-2000s)
• Introduction of various architectures such as:
• Convolutional Neural Networks (CNNs) (early work by
Fukushima in the 1980s, but popularized later by Yann LeCun in the
1990s).
• Recurrent Neural Networks (RNNs) for sequence data.
• However, limited computing power and data constrained their
practical applications.
39. History of Artificial Neural Networks (ANNs)…
Deep Learning Revolution (2010s-Present)
• With the advent of powerful GPUs and large datasets, deep neural
networks became feasible.
• Breakthroughs include:
• AlexNet (2012): Won ImageNet competition, demonstrating the
power of deep CNNs.
• Development of architectures like LSTM (Long Short-Term
Memory) networks (1997 but widely applied later), Transformer
models, and more.
• Today, ANNs underpin much of modern AI, from image
recognition and language processing to autonomous systems.
40. Biological Neurons — The Inspiration
Structure:
• A biological neuron has dendrites (input receivers), a cell body
(soma), and an axon (output sender).
• It receives electrical signals through dendrites, processes them
in the soma, and sends signals down the axon to other neurons.
Function:
• Neurons communicate via synapses, where signals get
transmitted chemically.
• The strength of these synapses (synaptic weights) affects how
signals propagate.
41. Artificial Neurons — The Simplified Model
Inspired by biology but simplified for computation.
Basic components:
• Inputs: Analogous to dendrites receiving signals.
• Weights: Each input is multiplied by a weight, mimicking
synaptic strength.
• Summation: Weighted inputs are summed, similar to the soma
integrating signals.
• Activation function: This decides whether the neuron “fires,”
analogous to an action potential in biology.
Outputs from one artificial neuron can feed into others, forming a
network.
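Putting these components together, a single artificial neuron is only a few lines of NumPy; the weights, bias, and sigmoid activation below are arbitrary illustrative choices:

import numpy as np

def neuron(x, w, b):
    z = np.dot(w, x) + b              # summation: weighted inputs plus bias (the "soma")
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid activation: how strongly the neuron "fires"

x = np.array([0.5, -1.0, 2.0])        # inputs (like signals on dendrites)
w = np.array([0.8, 0.2, -0.4])        # weights (like synaptic strengths)
print(neuron(x, w, b=0.1))            # output, passed on to other neurons (the axon)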
42. Biological Neuron vs. Artificial Neuron
Biological Neuron               Artificial Neuron
Dendrites receive signals       Inputs to the neuron
Synapses with varying strength  Weights applied to inputs
Soma sums inputs and fires      Weighted sum + activation function
Axon sends output signal        Neuron output
43. Why Use an ANN Model?
• To mimic learning in the brain by adjusting weights (like synaptic
strengths).
• To solve complex tasks (vision, speech, decision-making) using
networks of these artificial neurons.
• The abstraction balances biological realism with computational
feasibility.
44. Homework
• Detailed concept of activation functions in ANNs, e.g., ReLU, Sigmoid, and Softmax.
• Submit handwritten homework at the start of class.