Intro to Neural Network

Neural Network
A neural network is a type of machine learning algorithm modeled after the structure and function
of the human brain. A deep neural network (DNN) is a type of neural network that has multiple
layers, typically more than two or three. The extra layers allow the network to learn and represent
more complex and abstract features of the input data. DNNs are particularly useful for tasks such
as image and speech recognition, natural language processing, and decision making.
The layers of a DNN are made up of interconnected "neurons," which process and transmit
information. Each neuron takes in a set of inputs, performs a computation on them, and produces
an output. The computation is typically a simple mathematical operation, such as a dot product
followed by a non-linear function called an activation function. The outputs from one layer of
neurons are then fed as inputs to the next layer, and so on.
Training a DNN involves showing it a large dataset of labeled examples and adjusting the weights
of the connections between neurons so that the network can correctly classify new examples. This
is typically done using a variant of stochastic gradient descent, an optimization algorithm that
adjusts the weights incrementally based on the error the network makes on the training examples.
DNNs have had great success in recent years, achieving state-of-the-art results on a variety of tasks
such as image and speech recognition, natural language processing and decision making. With the
help of large amounts of data and powerful hardware, DNNs are able to learn rich, abstract
representations of the input data, allowing them to generalize well to new examples.
One of the main challenges in using DNNs is the need for a large amount of labeled training data
and computational resources. The training process can also be time-consuming and requires a high
level of expertise. Additionally, DNNs can be difficult to interpret, making it hard to understand
how they are making decisions.
Despite these challenges, DNNs have proven to be a powerful tool for solving a wide range of
problems and are an active area of research in the field of machine learning.

Forward Propagation
Forward propagation is the process of passing input data through a neural network to generate
output predictions. It is the first step in the process of training a neural network and also used in
the prediction phase.
In forward propagation, the input data is passed through the layers of the network in a sequential
manner, starting from the input layer and ending at the output layer. Each layer performs a
computation on the inputs it receives, and passes the result to the next layer.
The computation performed by each neuron in a layer typically consists of two steps: a dot product of the input vector with the neuron's weight vector (usually plus a bias term), and an activation function applied to the result. The dot product computes a weighted sum of the inputs, which represents the raw output of the neuron. The activation function introduces non-linearity into the network and produces the final output of the neuron.
In this way, the input data is transformed and propagated through the layers of the network, until
the final output predictions are generated. The output predictions are then compared with the true
labels to compute the error, which is used to update the weights of the network during the training
process.
It is worth noting that forward propagation is a deterministic process, meaning that given the same
input and the same weights, it will always produce the same output.
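As an illustration, the sketch below performs forward propagation through a small two-layer network using NumPy. The layer sizes, weights, and input values are made-up assumptions; each layer computes a weighted sum of its inputs plus a bias and then applies an activation function.

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)            # element-wise non-linearity

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Assumed toy dimensions: 3 inputs, 4 hidden neurons, 1 output.
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
    W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

    def forward(x):
        h = relu(W1 @ x + b1)                # hidden layer: dot product + activation
        y = sigmoid(W2 @ h + b2)             # output layer: final prediction
        return y

    print(forward(np.array([0.5, -1.2, 3.0])))

Running forward twice with the same input and the same weights returns the same value, which is exactly the determinism noted above.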

Backward Propagation
Backward propagation, also known as backpropagation, is the process of updating the weights of
a neural network in order to minimize the error between the network's predictions and the true
labels. It is the second step in the process of training a neural network, following the forward
propagation step.
The basic idea behind backward propagation is to use the error computed during the forward
propagation step to adjust the weights of the network in a way that will reduce the error on the
next forward pass. This is done using a technique called gradient descent, which is an optimization
algorithm that adjusts the weights incrementally based on the error the network makes on the
training examples.
The process of backward propagation involves computing the gradient of the error with respect to
the weights of the network. This gradient tells us how much the error changes as we adjust the
weights.
The calculation of the gradient starts from the output layer, where the error is computed directly.
Then it propagates backwards through the layers, computing the gradient for each layer using the
gradients of the next layer. This process is called backpropagation of errors.
The backpropagation algorithm is an efficient way to calculate the gradient of the error with respect
to the weights, which is done using the chain rule of calculus. This allows for efficient training of
deep neural networks, which have many layers.
It is worth noting that the optimizer used during backward propagation does not have to be plain gradient descent; variants such as Adam or Adagrad are common. They all share the same goal of reducing the error, but take different approaches to achieve it.
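As a minimal sketch of the backward pass, the example below uses a single sigmoid neuron with a mean squared error loss (the input, label, and learning rate are made-up assumptions). It computes the gradient of the error with respect to the weights via the chain rule and applies one plain gradient descent update.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.5, -1.2, 3.0])   # one training example
    t = 1.0                          # true label
    w = np.zeros(3)                  # weights
    b = 0.0                          # bias
    lr = 0.1                         # learning rate

    # Forward pass
    z = w @ x + b
    y = sigmoid(z)
    error = 0.5 * (y - t) ** 2

    # Backward pass: chain rule dE/dw = dE/dy * dy/dz * dz/dw
    dE_dy = (y - t)
    dy_dz = y * (1 - y)              # derivative of the sigmoid
    dE_dw = dE_dy * dy_dz * x
    dE_db = dE_dy * dy_dz

    # Gradient descent update: move opposite to the gradient
    w -= lr * dE_dw
    b -= lr * dE_db

In a multi-layer network the same chain-rule step is repeated layer by layer, reusing the gradients already computed for the layer above.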

Gradient Descent
Gradient descent is an optimization algorithm that is used to update the weights of a neural network
during the training process. The goal of the algorithm is to find the set of weights that minimizes
the error between the network's predictions and the true labels.
The basic idea behind gradient descent is to adjust the weights of the network in the direction that
reduces the error. This is done by computing the gradient of the error with respect to the weights
and adjusting the weights in the opposite direction of the gradient. The amount by which the
weights are adjusted is determined by the learning rate, a hyperparameter that controls the step
size.
There are different variants of the gradient descent algorithm, but the most common one is called
stochastic gradient descent (SGD). In SGD, the weights are updated after each training example.
The gradient is computed for the current training example and the weights are updated accordingly.
Another variant of gradient descent is called mini-batch gradient descent, where the weights are
updated after each mini-batch of training examples, rather than after each example. This is more
computationally efficient than SGD because it allows the use of vectorized operations, which are
faster than iterating over individual examples.
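In every variant the update rule is the same: the weights move a small step in the direction opposite to the gradient, scaled by the learning rate. The sketch below shows mini-batch gradient descent on a toy linear regression problem with NumPy; the data, batch size, learning rate, and number of epochs are arbitrary assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))            # 100 examples, 3 features
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w + rng.normal(scale=0.1, size=100)

    w = np.zeros(3)
    lr, batch_size = 0.1, 10

    for epoch in range(50):
        idx = rng.permutation(len(X))        # shuffle the examples each epoch
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            pred = X[batch] @ w
            grad = X[batch].T @ (pred - y[batch]) / batch_size   # gradient of the squared error
            w -= lr * grad                   # step opposite to the gradient

    print(w)   # should approach true_w

Setting batch_size to 1 turns this into stochastic gradient descent, while setting it to the full dataset size gives batch gradient descent.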

Loss Function
In a neural network, the loss function is used to measure the error between the network's
predictions and the true labels. The goal of the training process is to find the set of weights that
minimizes the loss function. There are several different types of loss functions that can be used in
a neural network, each with its own strengths and weaknesses. Some of the most commonly used
loss functions are:
Mean Squared Error (MSE): This is a common loss function for regression problems. It measures the average squared difference between the predicted output and the true output. It is also called the L2 loss.
Mean Absolute Error (MAE): This is also a common loss function for regression problems. It measures the average absolute difference between the predicted output and the true output. It is also called the L1 loss.
L2 loss is more sensitive to outliers than L1 loss: when there is a large difference between the predicted and actual values, squaring that already large error makes it even larger.
Root Mean Squared Error (RMSE): This is the square root of the MSE, which puts the error back in the same units as the target.
Binary Cross-Entropy (BCE): This is a common loss function for binary classification problems.
It measures the distance between the predicted probability of the positive class and the true binary
label.
Categorical Cross-Entropy (CCE): This is a common loss function for multi-class classification
problems. It measures the distance between the predicted probability distribution over all classes
and the true categorical label.
The choice of loss function depends on the specific problem that the network is being used to
solve. For example, mean squared error is a common choice for regression problems, while cross-
entropy loss is a common choice for classification problems. It is also worth noting that the choice
of loss function can have a significant impact on the performance of the network and the speed of
convergence during training.
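For concreteness, here is a sketch of these losses written directly with NumPy (a small epsilon is added inside the logarithms to avoid taking the log of zero; the function names are just illustrative):

    import numpy as np

    def mse(y_true, y_pred):
        return np.mean((y_true - y_pred) ** 2)          # L2 loss

    def mae(y_true, y_pred):
        return np.mean(np.abs(y_true - y_pred))         # L1 loss

    def rmse(y_true, y_pred):
        return np.sqrt(mse(y_true, y_pred))

    def binary_cross_entropy(y_true, p, eps=1e-12):
        p = np.clip(p, eps, 1 - eps)                    # predicted probability of the positive class
        return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

    def categorical_cross_entropy(y_true_onehot, p, eps=1e-12):
        # p holds a predicted probability distribution over the classes for each example
        return -np.mean(np.sum(y_true_onehot * np.log(p + eps), axis=1))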

Activation Function
An activation function is a non-linear function applied to the output of each neuron in a neural
network. It is used to introduce non-linearity into the network and is essential for the network to
be able to learn and represent complex patterns in the data. There are several different types of
activation functions that can be used in a neural network, each with their own strengths and
weaknesses. Some of the most commonly used activation functions are:
Sigmoid: The sigmoid function maps any real-valued number to a value between 0 and 1. It is
commonly used in the output layer of a binary classification network.
ReLU (Rectified Linear Unit): The ReLU function maps any negative value to 0 and any positive
value to itself. It is computationally efficient and often used as the default activation function in
deep neural networks.
Tanh (hyperbolic tangent): The tanh function maps any real-valued number to a value between -1
and 1. It is similar to the sigmoid function but its output is zero-centered and covers a wider range of values.
The choice of activation function depends on the specific problem that the network is being used
to solve. For example, the sigmoid activation function is commonly used in the output layer of a binary
classification network, while ReLU and its variants are commonly used in the hidden layers of a
deep neural network. It is also worth noting that the choice of activation function can have a
significant impact on the performance of the network and the speed of convergence during training.
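The three activation functions mentioned above are straightforward to write down; a short sketch with NumPy:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))     # output in (0, 1)

    def relu(z):
        return np.maximum(0.0, z)           # 0 for negative inputs, identity otherwise

    def tanh(z):
        return np.tanh(z)                   # output in (-1, 1)

    z = np.array([-2.0, 0.0, 2.0])
    print(sigmoid(z), relu(z), tanh(z))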

Overfitting Problem
Overfitting is a common problem in machine learning, and it occurs when a model is trained too
well on the training data and performs poorly on unseen data. In other words, the model has learned
the noise in the training data and it is not generalizing well.
In the case of neural networks, overfitting typically occurs when the network has too many parameters and is able to fit the training data almost perfectly, but fails to generalize to new data, because it has learned the noise in the training examples rather than the underlying pattern.
There are several ways to detect and prevent overfitting in a neural network, such as:
1) Using a smaller network: By reducing the number of parameters in the network, the model
is less likely to fit the noise in the training data.
2) Using regularization techniques: Techniques like dropout and early stopping help to
prevent overfitting by adding noise to the training process and by regularizing the model's
parameters.
3) Using a larger dataset: By increasing the amount of training data, the model is less likely
to overfit the data.
4) Using cross-validation: By evaluating the model's performance on a held-out validation
dataset, it is possible to detect overfitting and adjust the model accordingly.
5) Monitoring the learning curve: By monitoring the performance of the model on the training
and validation datasets during the training process, it is possible to detect overfitting and
adjust the model accordingly.
6) DropOut: Dropout is a technique for regularizing neural networks by randomly dropping
out (i.e., setting to zero) a certain percentage of the neurons during the training process.
The idea behind dropout is to prevent the neurons from co-adapting too much, which can
lead to overfitting. By randomly dropping out neurons during each forward pass, the
network is forced to learn multiple independent representations of the input data, which
can make it more robust to overfitting. During the training process, dropout is applied to
the neurons of the network by randomly setting a certain percentage of the neurons' output
to zero. The dropout rate is a hyperparameter that controls the percentage of neurons to drop
out. During the prediction phase, dropout is not applied to the network, and all neurons are
used to make predictions. Dropout is a simple yet powerful technique for regularizing
neural networks, and it has been shown to be effective in reducing overfitting and
improving the generalization performance of deep neural networks (a minimal sketch appears after this list).
7) DropConnect: DropConnect is a regularization technique that is similar to dropout. The
main difference is that instead of dropping out entire neurons, it drops out individual
connections between neurons. It helps to prevent overfitting by adding noise to the training
process and regularizing the model's parameters. It is a way of applying dropout to the
weights instead of the activations. By randomly dropping out connections, the network is
forced to learn multiple independent representations of the input data, which can make it
more robust to overfitting.
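As a minimal sketch of the dropout idea from item 6, the code below uses NumPy and "inverted" dropout, where the surviving activations are rescaled during training so that nothing needs to change at prediction time; the dropout rate of 0.5 is an arbitrary choice.

    import numpy as np

    rng = np.random.default_rng(0)

    def dropout(activations, rate=0.5, training=True):
        if not training:
            return activations                        # no dropout at prediction time
        mask = rng.random(activations.shape) >= rate  # keep each neuron with probability 1 - rate
        return activations * mask / (1.0 - rate)      # rescale so the expected value is unchanged

    h = np.array([0.2, 1.5, -0.3, 0.8])
    print(dropout(h, rate=0.5, training=True))   # roughly half the activations are zeroed
    print(dropout(h, training=False))            # unchanged during prediction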

Data Scaling
1) Normalization: Normalization is a technique that scales the values of the input data to a
specific range, typically between 0 and 1. This is done by subtracting the minimum value
of the data and dividing by the range (i.e., the difference between the maximum and
minimum values). This technique is useful when the input data has different scales, and it
is important to bring them to a common scale before training the model.
2) Standardization: Standardization is a technique that scales the values of the input data to
have zero mean and unit variance. This is done by subtracting the mean and dividing by
the standard deviation. This technique is useful when the input data has a Gaussian
(normal) distribution and it is important to center the data around zero before training the
model (a short sketch of both scaling methods appears after this list).
3) Batch Normalization: Batch normalization is a technique used to normalize the activations
of a neural network during the training process. It helps to stabilize the training process
and improve the generalization performance of the network by normalizing the inputs to
each neuron in a layer. It is typically applied to the inputs of each neuron before the
activation function is applied. It is applied in the forward pass, so the normalization is
different for each batch, hence the name "batch normalization". The normalization is done
by computing the mean and standard deviation of the activations for each batch of training
examples, and then using these statistics to normalize the activations.
4) Layer Normalization: Layer normalization is a technique similar to batch normalization, but instead of normalizing each activation across the examples in a batch, it normalizes the activations across all the features of a single example. This means that the mean and standard deviation are computed per example (i.e., across the feature dimensions rather than across the batch), so the normalization does not depend on the batch size.
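A sketch of the first two scaling methods with NumPy (batch and layer normalization are usually provided by the deep learning framework itself, so they are not reimplemented here):

    import numpy as np

    def normalize(X):
        # Min-max normalization: scale each feature (column) to the range [0, 1]
        return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

    def standardize(X):
        # Standardization: zero mean and unit variance per feature (column)
        return (X - X.mean(axis=0)) / X.std(axis=0)

    X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
    print(normalize(X))
    print(standardize(X))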

Learning Rate
The learning rate is a hyperparameter of a neural network that controls the step size of the gradient
descent optimization algorithm. It determines the amount by which the weights of the network are
updated during each iteration of the training process. A small learning rate implies that the
optimization algorithm will take small steps towards the optimal solution, which can lead to slow
convergence but with a smaller chance of overshooting the optimal solution. On the other hand, a
large learning rate implies that the optimization algorithm will take large steps towards the optimal
solution, which can lead to faster convergence but with a larger chance of overshooting the optimal
solution and getting stuck in a suboptimal solution.
The learning rate is a critical hyperparameter that can have a significant impact on the performance
of a neural network. Finding an appropriate learning rate is essential for the network to converge
to an optimal solution and avoid overfitting or underfitting.
There are different ways to set the learning rate, such as:
1) Manual tuning: This is the simplest approach, where the learning rate is set to a fixed value
and the network is trained until convergence.
2) Adaptive learning rate: This approach adjusts the learning rate during training based on
the performance of the network. For example, the learning rate can be decreased when the
performance of the network stops improving.
3) Learning rate schedule: This approach schedules the learning rate to change over time. For
example, the learning rate can be set to a high value at the beginning of training and
gradually decrease over time.
It is worth noting that finding an appropriate learning rate usually requires some trial and error and can be challenging. Techniques such as the learning rate finder and the learning rate range test can help identify a good value.
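As a small illustration of a learning rate schedule (the third approach above), the sketch below uses an exponential decay; the initial rate and the decay factor are arbitrary assumptions.

    # Exponential decay schedule: the learning rate shrinks by a fixed factor each epoch.
    initial_lr = 0.1     # assumed starting value
    decay = 0.95         # assumed decay factor per epoch

    for epoch in range(10):
        lr = initial_lr * (decay ** epoch)
        print(f"epoch {epoch}: learning rate = {lr:.4f}")
        # ... perform one epoch of training with this learning rate ...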

Weight Initialization
Weight initialization is the process of initializing the weights of a neural network before the
training process begins. The goal of weight initialization is to set the initial values of the weights
in a way that will make the training process more efficient and stable.
Random initialization: The weights are initialized randomly using a distribution such as a Gaussian or uniform distribution. This is a simple and widely used method, but the quality of the starting point depends heavily on the chosen distribution and its scale.
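A minimal sketch of random initialization with NumPy (the layer sizes and the scales below are arbitrary assumptions; in practice the scale is often chosen based on the number of inputs to the layer):

    import numpy as np

    rng = np.random.default_rng(0)
    n_inputs, n_neurons = 3, 4

    # Gaussian initialization with a small scale; biases usually start at zero.
    W = rng.normal(loc=0.0, scale=0.01, size=(n_neurons, n_inputs))
    b = np.zeros(n_neurons)

    # Uniform initialization is another common option.
    W_uniform = rng.uniform(low=-0.05, high=0.05, size=(n_neurons, n_inputs))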
Steps for training a Neural Network
1) Initialize the weights.
2) Pass the first observation through the network and perform forward propagation.
3) Compare the actual value with the predicted value and calculate the cost function.
4) Perform backward propagation and update the weights.
5) Repeat steps 2 to 4 for the remaining observations until the error is acceptably small.
When the whole dataset has been passed through the network once, it is called one epoch; training usually requires multiple epochs. A minimal sketch of this loop is shown below.
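Below is a sketch of these steps for a single sigmoid neuron trained with mean squared error; the toy data, learning rate, and number of epochs are made-up assumptions.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 3))                              # toy dataset: 20 examples, 3 features
    t = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)    # toy binary labels

    w = rng.normal(scale=0.01, size=3)           # step 1: initialize the weights
    b = 0.0
    lr = 0.5

    for epoch in range(100):                     # one pass over the data = one epoch
        for x, target in zip(X, t):
            y = sigmoid(w @ x + b)               # step 2: forward propagation
            error = 0.5 * (y - target) ** 2      # step 3: compute the cost
            grad_z = (y - target) * y * (1 - y)  # step 4: backward propagation
            w -= lr * grad_z * x                 # update the weights
            b -= lr * grad_z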
