Introduction to Artificial Neural Networks
Dr Munawwar Iqbal (Assistant Professor)
Institute of Information Technology
Quaid-i-Azam University Islamabad, Pakistan
mmic@qau.edu.pk
What is Machine Learning?
• Machine Learning (ML) is a branch of artificial intelligence (AI)
that enables computers to learn from data and improve their
performance on tasks without being explicitly programmed for
each specific task.
• Instead of writing fixed rules for a program, ML uses algorithms
that can automatically identify patterns in data and make
predictions or decisions based on that data.
How Does Machine Learning Work?
• Data Collection: Gather relevant data (e.g., images, text,
numbers).
• Training: Feed data to a machine learning model, which learns by
finding patterns or relationships.
• Evaluation: Test the model on new, unseen data to check how well
it performs.
• Prediction: Use the trained model to make decisions or
predictions on real-world data.
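These four steps can be traced end to end in a few lines of code. Below is a minimal sketch using scikit-learn (an illustrative choice, not the only option), with the built-in Iris dataset standing in for the data-collection step.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Data collection: a built-in toy dataset stands in for real-world data.
X, y = load_iris(return_X_y=True)

# Hold out unseen data so evaluation is honest.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Training: the model learns patterns from the training split.
model = DecisionTreeClassifier().fit(X_train, y_train)

# Evaluation: measure performance on data the model has never seen.
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Prediction: apply the trained model to a new sample.
print("prediction:", model.predict([X_test[0]]))
```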
Types of Machine Learning
• Supervised Learning: The model is trained on labeled data
(input-output pairs). Example: Predicting house prices based on
features like size, location.
• Unsupervised Learning: The model finds patterns in unlabeled
data. Example: Grouping customers by purchasing behavior
(clustering).
• Reinforcement Learning: The model learns by interacting with
an environment and receiving rewards or penalties. Example:
Training a robot to navigate a maze.
Applications of Machine Learning
• Speech recognition (e.g., virtual assistants like Siri)
• Image recognition (e.g., detecting objects in photos)
• Recommendation systems (e.g., Netflix, Amazon)
• Fraud detection in banking
• Autonomous vehicles
Life Cycle of Machine Learning
1. Problem Definition
• Understand and define the problem clearly.
• Determine if ML is the right approach.
• Identify the goal (e.g., classification, regression, clustering).
2. Data Collection
• Gather the relevant data from various sources (databases, APIs,
sensors, web scraping).
• Data can be structured or unstructured.
Life Cycle of Machine Learning…
3. Data Preparation / Data Cleaning
• Handle missing values, duplicates, and outliers.
• Normalize, scale, or transform data if necessary.
• Convert categorical data into numerical formats (e.g., one-hot
encoding).
• Split data into training, validation, and testing sets.
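As a concrete illustration of these preparation steps, here is a minimal sketch with pandas and scikit-learn; the toy DataFrame, its column names, and the imputation choice are all illustrative assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical raw data with a missing value and a categorical column.
df = pd.DataFrame({
    "size_sqft": [1200, 1500, None, 2100],
    "city": ["Islamabad", "Lahore", "Islamabad", "Karachi"],
    "price": [150, 180, 160, 260],
})

df = df.drop_duplicates()                                            # drop duplicate rows
df["size_sqft"] = df["size_sqft"].fillna(df["size_sqft"].median())   # impute the missing value
df = pd.get_dummies(df, columns=["city"])                            # one-hot encode the category

X, y = df.drop(columns="price"), df["price"]
# Split off a test set (a validation set can be carved out the same way).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
```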
4. Exploratory Data Analysis (EDA)
• Analyze the data to understand patterns, correlations, and
distributions.
• Visualize the data using graphs and plots.
• Identify important features.
Life Cycle of Machine Learning…
5. Feature Engineering
• Create new features from raw data to improve model performance.
• Select relevant features.
• Dimensionality reduction techniques (e.g., PCA) may be applied.
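As a sketch of the PCA option just mentioned, the snippet below projects a four-feature dataset onto its two strongest directions of variance; the choice of the Iris data is an illustrative assumption.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)    # 4 original features per sample

pca = PCA(n_components=2)            # keep only the 2 strongest components
X_reduced = pca.fit_transform(X)     # shape becomes (150, 2)

# Fraction of the original variance the 2 components preserve.
print(pca.explained_variance_ratio_.sum())
```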
6. Model Selection
• Choose appropriate ML algorithms based on the problem type
(e.g., decision trees, SVM, neural networks).
• Try different models to compare performance.
7. Model Training
• Train the model using the training dataset.
• Tune hyperparameters to optimize performance.
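Hyperparameter tuning is often automated as a grid search scored by cross-validation on the training data. A minimal sketch follows; the decision-tree model and the candidate values in the grid are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate hyperparameter values to try (illustrative choices).
grid = {"max_depth": [2, 3, 5, None], "min_samples_leaf": [1, 5, 10]}

# 5-fold cross-validation scores every combination on the training data.
search = GridSearchCV(DecisionTreeClassifier(random_state=0), grid, cv=5)
search.fit(X, y)

print("best hyperparameters:", search.best_params_)
print("best cross-validation accuracy:", search.best_score_)
```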
Life Cycle of Machine Learning…
8. Model Evaluation
• Evaluate the model on validation/testing data using metrics
(accuracy, precision, recall, F1-score, RMSE, etc.).
• Check for overfitting or underfitting.
9. Model Deployment
• Deploy the trained model into a production environment.
• Integrate the model into applications or systems.
10. Monitoring & Maintenance
• Continuously monitor the model's performance.
• Update the model with new data when necessary.
• Handle model drift or degradation over time.
Common Types of Machine Learning Tasks
Classification
• Goal: Assign input data to one of several predefined categories.
• Example: Email spam detection (spam vs. not spam), image
recognition (cat vs. dog).
Regression
• Goal: Predict a continuous output value based on input data.
• Example: Predicting house prices based on features like size
and location.
Clustering
• Goal: Group similar data points together without predefined
labels.
• Example: Customer segmentation based on purchasing
behavior.
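Classification and regression follow the supervised recipe sketched earlier; clustering needs no labels at all. Below is a minimal k-means sketch in which synthetic numbers stand in for real customer data (the spend and visit figures are invented for illustration).

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic groups of customers: (annual spend, visits per month).
customers = np.vstack([
    rng.normal([200, 2], [30, 0.5], size=(50, 2)),   # low spenders
    rng.normal([900, 8], [80, 1.0], size=(50, 2)),   # high spenders
])

# k-means groups the points without ever seeing a label.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print("cluster centers:", kmeans.cluster_centers_)
print("first 5 assignments:", kmeans.labels_[:5])
```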
Common Types of Machine Learning Tasks…
Anomaly Detection
• Goal: Identify unusual data points that don’t conform to expected
patterns.
• Example: Fraud detection in credit card transactions.
Recommendation
• Goal: Suggest items to users based on their preferences and
behavior.
• Example: Movie or product recommendation systems.
Ranking
• Goal: Order items by relevance or importance.
• Example: Search engine results ranking.
Dimensionality Reduction
• Goal: Reduce the number of features while preserving important
information.
• Example: Visualizing high-dimensional data.
Why are Tasks Important?
• They define what kind of data and labels you need.
• They guide the choice of algorithms and evaluation metrics.
• They shape the entire machine learning pipeline (data preparation,
model training, validation).
Categorical Data in Machine Learning
1. Binary (Dichotomous) Categories
• Gender: Male, Female
• Yes/No Questions: Yes, No
• Loan Approval: Approved, Rejected
2. Nominal Categories (no inherent order)
• Color: Red, Blue, Green
• Country: USA, Canada, Mexico
• Product Type: Smartphone, Laptop, Tablet
• City Names: New York, Paris, Tokyo
Categorical Data in Machine Learning…
3. Ordinal Categories (with a clear order or ranking)
• Education Level: High School, Bachelor’s, Master’s, PhD
• Customer Satisfaction: Very Unsatisfied, Unsatisfied, Neutral,
Satisfied, Very Satisfied
• Size: Small, Medium, Large, Extra Large
• Rating: 1 Star, 2 Stars, ..., 5 Stars
4. Time-Based Categories (when treated as categorical)
• Day of Week: Monday, Tuesday, ..., Sunday
• Month: January, February, ..., December
• Shift: Morning, Afternoon, Night
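The distinction matters when encoding features: nominal categories get one-hot columns (no order implied), while ordinal categories can be mapped to integers that preserve their ranking. A minimal pandas sketch, with made-up values:

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["Red", "Blue", "Green"],      # nominal: no inherent order
    "size": ["Small", "Large", "Medium"],   # ordinal: has a clear order
})

# Nominal -> one-hot: an independent 0/1 column per category.
df = pd.get_dummies(df, columns=["color"])

# Ordinal -> integers that preserve the ranking.
size_order = {"Small": 0, "Medium": 1, "Large": 2, "Extra Large": 3}
df["size"] = df["size"].map(size_order)

print(df)
```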
Continuous Data in Machine Learning
🔬 1. Temperature
• Use case: Predicting climate patterns, controlling HVAC (Heating,
Ventilation, and Air Conditioning) systems.
• Data type: Float (e.g., 22.4°C, 35.1°C).
🏃 2. Height and Weight
• Use case: Health risk prediction, biometric identification.
• Data type: Float (e.g., 175.5 cm, 72.3 kg).
💰 3. Income
• Use case: Credit scoring, market segmentation.
• Data type: Float (e.g., $45,323.75).
Continuous Data in Machine Learning…
📈 4. Stock Prices
• Use case: Time series forecasting, trading bots.
• Data type: Float (e.g., $132.56).
🏠 5. House Size or Price
• Use case: House price prediction.
• Data type: Square footage (e.g., 2100.5 sq ft), price in currency.
🚗 6. Speed or Acceleration
• Use case: Autonomous vehicles, motion detection.
• Data type: Float (e.g., 45.8 km/h, 3.2 m/s²).
Continuous Data in Machine Learning…
⏱️ 7. Time Duration
• Use case: Performance evaluation, latency prediction.
• Data type: Float (e.g., 2.3 seconds).
💓 8. Heart Rate / Blood Pressure
• Use case: Medical diagnostics, fitness tracking.
• Data type: e.g., 72.6 bpm, 120.5 mmHg.
🧠 9. Sensor Readings
• Use case: IoT systems, predictive maintenance.
• Data type: Voltage, humidity, pH level, etc.
Overfitting
• Overfitting in machine learning occurs when a model learns the
training data too well, including its noise and random
fluctuations, instead of just the underlying patterns.
• This leads to excellent performance on the training set but poor
generalization to new, unseen data.
Key Symptoms of Overfitting:
• High accuracy on training data, but
• Low accuracy on validation/test data
Overfitting…
Example:
If you're training a model to recognize cats in images, an overfitted
model might learn to associate specific lighting or background
features (unique to the training images) with cats, rather than learning
what cats actually look like.
Visual Representation:
• Training Loss ↓ continues to decrease
• Validation Loss ↓ at first, then ↑ (starts increasing)
Common Causes:
• Model too complex (too many parameters)
• Too little data
• Too many training epochs
• Lack of regularization
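These causes can be reproduced on synthetic data. In the sketch below (illustrative numbers throughout), a degree-15 polynomial fitted to 20 noisy quadratic points typically drives training error far below test error, while the honest degree-2 fit keeps the two close.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 30)
y = x**2 + rng.normal(0, 0.1, 30)      # true pattern is quadratic, plus noise

x_train, y_train = x[:20], y[:20]
x_test, y_test = x[20:], y[20:]

for degree in (2, 15):                 # sensible model vs. overly complex model
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
```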
Overfitting…
1. Image Classification
Scenario: Training a deep neural network to classify cats vs. dogs
using a small dataset.
Overfitting Example: The model memorizes the exact fur
patterns or backgrounds in the training images. So, it performs
well on the training set but fails on new images where the
background is different or the pet is in a different pose.
Signs: Very high training accuracy (e.g., 99%) but much lower
validation/test accuracy (e.g., 70%).
Overfitting…
2. Spam Email Detection
Scenario: Building a classifier to detect spam emails based on
word frequency.
Overfitting Example: The model learns that emails with the
word “offer” are always spam because of the training set bias. In
reality, legitimate emails may also use that word.
Signs: Excellent performance on training emails, but lots of false
positives or false negatives in real-world usage.
Underfitting
• Underfitting in machine learning occurs when a model is too
simple to capture the underlying patterns in the data.
• It performs poorly on both the training data and unseen (test) data.
What Causes Underfitting?
Model is too simple
Example: Using linear regression on data that requires a
polynomial fit.
Not enough features
The model lacks the necessary input data to make accurate
predictions.
Excessive regularization (homework)
Overly penalizing model complexity (e.g., high L1/L2
regularization) can constrain learning.
Insufficient training
Training is stopped too early or with too few iterations (especially
in deep learning).
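The overfitting sketch from earlier can be flipped to show underfitting: a straight line (degree 1) on the same quadratic toy data leaves errors high on training and test sets alike.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 30)
y = x**2 + rng.normal(0, 0.1, 30)       # underlying pattern is quadratic

coeffs = np.polyfit(x[:20], y[:20], 1)  # a linear model is too simple here
train_err = np.mean((np.polyval(coeffs, x[:20]) - y[:20]) ** 2)
test_err = np.mean((np.polyval(coeffs, x[20:]) - y[20:]) ** 2)
print(f"train MSE {train_err:.4f}, test MSE {test_err:.4f}")  # both stay high
```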
Best Fit
• In machine learning, "best fit" generally refers to how well a
model captures the underlying patterns in the training data without
overfitting or underfitting.
• It’s most often used in the context of regression, but the concept
applies across all supervised learning models.
The best fit is a balance where the model:
• Minimizes error on the training data (low bias)
• Generalizes well to unseen data (low variance)
This involves finding a hypothesis or function that maps input
features to output targets as accurately as possible.
Evaluating "Best Fit" Metrics
For regression:
• Mean Squared Error (MSE)
• R² score
• Root Mean Squared Error (RMSE)
For classification:
• Accuracy
• Precision / Recall / F1-Score
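A minimal sketch computing these metrics with scikit-learn; the label and value arrays below are invented purely for illustration.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, mean_squared_error, r2_score)

# Regression: true vs. predicted continuous values (illustrative numbers).
y_true_r, y_pred_r = [3.0, 5.0, 7.5], [2.8, 5.4, 7.0]
mse = mean_squared_error(y_true_r, y_pred_r)
print("MSE:", mse, "RMSE:", np.sqrt(mse), "R2:", r2_score(y_true_r, y_pred_r))

# Classification: true vs. predicted labels (illustrative numbers).
y_true_c, y_pred_c = [1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1]
print("accuracy:", accuracy_score(y_true_c, y_pred_c))
print("precision:", precision_score(y_true_c, y_pred_c))
print("recall:", recall_score(y_true_c, y_pred_c))
print("F1:", f1_score(y_true_c, y_pred_c))
```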
Underfitting vs Overfitting
• Underfitting: model too simple, cannot capture the data's complexity → high bias.
• Overfitting: model too complex, memorizes the training data → high variance.
• Best fit: optimal complexity, balances bias and variance → generalizes well.
Underfitting vs Overfitting…
🎯 Low Bias, Low Variance → Predictions tightly clustered at the
center (ideal).
➡️ High Bias, Low Variance → Predictions tightly clustered but far
from center (consistently wrong).
🔀 Low Bias, High Variance → Predictions spread out around the
center (inconsistently right/wrong).
❌ High Bias, High Variance → Predictions spread out and far from
center (worst case).
How to Manage the Trade-off
• High Bias (Underfitting): use a more complex model, add more features, reduce regularization.
• High Variance (Overfitting): use a simpler model, apply regularization (e.g., L1, L2), get more training data.
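As a sketch of the regularization fix, ridge regression (L2) shrinks the coefficients of an over-flexible polynomial model, trading a little bias for much lower variance; the quadratic toy data and the alpha value are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 30).reshape(-1, 1)
y = x.ravel()**2 + rng.normal(0, 0.1, 30)

# Degree-15 features make plain linear regression prone to overfitting.
X = PolynomialFeatures(degree=15).fit_transform(x)
X_train, X_test, y_train, y_test = X[:20], X[20:], y[:20], y[20:]

for name, model in [("no regularization", LinearRegression()),
                    ("ridge (L2, alpha=1.0)", Ridge(alpha=1.0))]:
    model.fit(X_train, y_train)
    err = np.mean((model.predict(X_test) - y_test) ** 2)
    print(f"{name}: test MSE {err:.4f}")
```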
Confusion Matrix
• A confusion matrix summarizes a classifier's results by tabulating
predicted labels against true labels.
• For a binary classifier it has four cells: true positives (TP), false
positives (FP), false negatives (FN), and true negatives (TN).
• Accuracy, precision, and recall are all computed directly from these
counts.
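A minimal sketch using scikit-learn's confusion_matrix on invented labels; for binary labels {0, 1} the matrix is laid out as [[TN, FP], [FN, TP]].

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # illustrative true labels
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]   # illustrative predictions

# Rows are true classes, columns are predicted classes.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP:", tp, "FP:", fp, "FN:", fn, "TN:", tn)
print("precision:", tp / (tp + fp))   # 3 / 4
print("recall:", tp / (tp + fn))      # 3 / 4
```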
Topics discussed in this section:
• Overview of Neural Networks:
• Definition
• History
• Biological Neurons
• Models of Neurons
• Artificial & Biological Neural Networks
• Neural Network Models
Artificial Neural Networks (ANNs)
• Artificial Neural Networks (ANNs) are computing systems
inspired by the biological neural networks that make up animal
brains.
• They consist of interconnected layers of nodes (called neurons).
• Neurons process data by responding to inputs, transforming them
through weighted connections, and producing outputs.
• ANNs are used in machine learning to recognize patterns, classify
data, and solve complex problems by learning from examples.
History of Artificial Neural Networks (ANNs)
Early Foundations (1940s-1950s)
• 1943: Warren McCulloch and Walter Pitts published a pioneering
paper describing a simplified model of a neuron as a binary
threshold unit, laying the groundwork for neural networks.
• 1949: Donald Hebb proposed the Hebbian learning rule,
suggesting that connections between neurons strengthen when they
activate together ("cells that fire together wire together").
History of Artificial Neural Networks (ANNs)…
Perceptron and Early Neural Networks (1950s-1960s)
• 1958: Frank Rosenblatt invented the Perceptron, one of the first
algorithms for supervised learning in neural networks, which could
classify input patterns.
• However, in 1969, Marvin Minsky and Seymour Papert published
Perceptrons, highlighting the limitations of single-layer
perceptrons, particularly their inability to solve non-linear
problems like XOR. This led to a temporary decline in neural
network research (known as the "AI winter").
History of Artificial Neural Networks (ANNs)…
Revival and Backpropagation (1980s)
• In the 1980s, neural networks regained interest thanks to the
development of the backpropagation algorithm (popularized by
Rumelhart, Hinton, and Williams in 1986), which allowed training
of multi-layer networks and could solve more complex, non-linear
problems.
• This era saw the rise of multi-layer perceptrons (MLPs) and
feedforward neural networks.
History of Artificial Neural Networks (ANNs)…
Advances and New Architectures (1990s-2000s)
• Introduction of various architectures such as:
• Convolutional Neural Networks (CNNs) (early work by
Fukushima in 1980s, but popularized later by Yann LeCun in the
1990s).
• Recurrent Neural Networks (RNNs) for sequence data.
• However, limited computing power and data constrained their
practical applications.
History of Artificial Neural Networks (ANNs)…
Deep Learning Revolution (2010s-Present)
• With the advent of powerful GPUs and large datasets, deep neural
networks became feasible.
• Breakthroughs include:
• AlexNet (2012): Won the ImageNet competition, demonstrating the
power of deep CNNs.
• Development of architectures like LSTM (Long Short-Term
Memory) networks (1997 but widely applied later), Transformer
models, and more.
• Today, ANNs underpin much of modern AI, from image
recognition and language processing to autonomous systems.
Biological Neurons — The Inspiration
Structure:
• A biological neuron has dendrites (input receivers), a cell body
(soma), and an axon (output sender).
• It receives electrical signals through dendrites, processes them
in the soma, and sends signals down the axon to other neurons.
Function:
• Neurons communicate via synapses, where signals get
transmitted chemically.
• The strength of these synapses (synaptic weights) affects how
signals propagate.
Artificial Neurons — The Simplified Model
Inspired by biology but simplified for computation.
Basic components:
• Inputs: Analogous to dendrites receiving signals.
• Weights: Each input is multiplied by a weight, mimicking
synaptic strength.
• Summation: Weighted inputs are summed, similar to the soma
integrating signals.
• Activation function: This decides whether the neuron “fires,”
analogous to an action potential in biology.
Outputs from one artificial neuron can feed into others, forming a
network.
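A minimal NumPy sketch of one such neuron: a weighted sum of inputs plus a bias, passed through a sigmoid activation. The input values, weights, and bias below are invented for illustration.

```python
import numpy as np

def sigmoid(z):
    # Squashes the weighted sum into (0, 1): the neuron's "firing" strength.
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])    # inputs (the dendrites)
w = np.array([0.4, 0.7, -0.2])    # weights (synaptic strengths)
b = 0.1                           # bias term

z = np.dot(w, x) + b              # summation (the soma integrating signals)
output = sigmoid(z)               # activation function decides the output
print(output)                     # about 0.242 for these values
```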
Biological Neuron vs Artificial Neuron
• Dendrites receive signals → Inputs to the neuron
• Synapses with varying strength → Weights applied to inputs
• Soma sums inputs and fires → Weighted sum + activation function
• Axon sends output signal → Neuron output
Why Use an ANN Model?
• To mimic learning in the brain by adjusting weights (like synaptic
strengths).
• To solve complex tasks (vision, speech, decision-making) using
networks of these artificial neurons.
• The abstraction balances biological realism with computational
feasibility.
Homework
• Detailed concept of activation functions in ANNs, e.g., ReLU,
Sigmoid, and Softmax
• Submit handwritten homework at the start of class