Single Neuron Model
SCO 411
Dr. Elijah Maseno
1. In the first step, the input units are passed in, i.e. the data
enters the network with a weight attached to each connection
into the hidden layer. We can have any number of hidden
layers. Here the inputs x1, x2, x3, …, xn are passed in.
2. Each hidden layer consists of neurons. All the inputs are
connected to each neuron.
3. After the inputs are passed in, all of the computation is
performed in the hidden layer.
The computation performed in a hidden layer is done in two
steps, as follows:
• First, each input is multiplied by its weight. A weight is the
coefficient of each input variable; it reflects the strength of
that particular input. After the weighted inputs are summed,
a bias term is added. The bias is a constant that shifts the
result and helps the model fit the data as well as possible.
Z1 = W1*In1 + W2*In2 + W3*In3 + W4*In4 + W5*In5 + b
where W1, W2, W3, W4, W5 are the weights assigned to the
inputs In1, In2, In3, In4, In5, and b is the bias.
• In the second step, an activation function is applied to the
linear combination Z1. The activation function is a nonlinear
transformation applied to a neuron's output before it is sent
to the next layer of neurons. Its purpose is to introduce
nonlinearity into the model.
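The two steps above can be sketched in a few lines of Python. All the numbers here (inputs, weights, bias) are illustrative, and the sigmoid is just one possible choice of activation function:

```python
import math

# Illustrative inputs In1..In5 and weights W1..W5 for one neuron.
inputs  = [0.5, -1.2, 0.3, 0.8, -0.4]
weights = [0.9,  0.2, -0.5, 0.7,  0.1]
b = 0.05  # bias

# Step 1: weighted sum Z1 = W1*In1 + ... + W5*In5 + b
z1 = sum(w * x for w, x in zip(weights, inputs)) + b

# Step 2: apply a nonlinear activation (here: sigmoid).
a1 = 1.0 / (1.0 + math.exp(-z1))

print(z1, a1)
```

The value `a1` is what this neuron passes on to the next layer.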
4. The whole process described in point 3 is repeated in each
hidden layer. After the signal has passed through every hidden
layer, it reaches the last layer, i.e. the output layer, which
gives us the final output.
The process explained above is known as forward
propagation.
5. After getting the predictions from the output layer, the error is
calculated, i.e. the difference between the actual and the predicted
output.
If the error is large, steps are taken to minimize it; for this
purpose, back propagation is performed.
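As a small illustration of the error calculation in point 5, here is the mean squared error between some made-up actual and predicted outputs (the slide does not fix a particular loss, so MSE is an assumed but common choice):

```python
# Illustrative actual vs predicted outputs.
actual    = [1.0, 0.0, 1.0, 1.0]
predicted = [0.8, 0.2, 0.6, 0.9]

# Mean squared error: average of (actual - predicted)^2.
mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
print(mse)
```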
What is Back Propagation and How Does
it Work?
• In the network or model, each link is assigned a weight.
• A weight is simply a real number that controls the strength of
the signal between two neurons. If the network generates a "good
or desired" output, there is no need to adjust the weights.
• However, if the network generates a "poor or undesired"
output, i.e. an error, then the system updates the weights in
order to improve subsequent results.
Back Propagation with Gradient
Descent
Gradient Descent is one of the optimizers used to calculate
the new weights. Let's understand step by step how gradient
descent minimizes the cost function.
Picture the cost function as a curve: our aim is to minimize
the error so that Jmin, i.e. the global minimum, is reached.
• First, the weights and intercepts (biases) are initialized
randomly, forward propagation is run, and the error is
calculated after all the computation (as discussed above).
• Then the gradient is calculated, i.e. the derivative of the error
w.r.t. the current weights.
• Then the new weights are calculated using the update rule
w_new = w_old - a * (dE/dw_old)
where E is the error and a is the learning rate, a parameter also
known as the step size, which controls how large a step
backpropagation takes. It gives additional control over how fast we
move along the curve to reach the global minimum.
• This process of calculating the new weights, then the error with
the new weights, and then updating the weights again continues until
the global minimum is reached and the loss is minimized.
A point to note here is that the learning rate a in the
weight-update equation should be chosen wisely. The learning
rate is the step size taken towards the global minimum. It
should not be too small, or convergence will take a long time;
nor should it be too large, or the updates may overshoot and
never reach the global minimum at all. The learning rate is
therefore a hyperparameter that we have to choose based on
the model.
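The loop described above can be sketched on a toy one-parameter cost function. J(w) = (w - 3)^2 stands in for the cost curve, with its global minimum at w = 3; the initial weight and learning rate are illustrative:

```python
# Toy cost: J(w) = (w - 3)^2, global minimum at w = 3.
def grad(w):
    return 2 * (w - 3)  # dJ/dw

w = 0.0   # "randomly" initialized weight
a = 0.1   # learning rate (step size)

# Repeatedly apply the update rule w_new = w_old - a * dJ/dw.
for _ in range(100):
    w = w - a * grad(w)

print(w)
```

With a = 0.1 the iterates converge towards 3; with a much larger learning rate (try a = 1.1) they would oscillate and diverge, which is exactly the overshooting problem described above.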
Activation Functions
• Activation functions are attached to each neuron. They are
mathematical functions that determine whether a neuron should
be activated or not, based on whether the neuron's input is
relevant to the model's prediction. The purpose of the
activation function is to introduce nonlinearity into the model.
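Three commonly used activation functions (the slide does not name specific ones, so sigmoid, ReLU, and tanh are given here as standard examples) can be defined as:

```python
import math

def sigmoid(z):
    # Squashes any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    # Passes positive values through, clips negatives to 0.
    return max(0.0, z)

def tanh(z):
    # Squashes any real number into (-1, 1).
    return math.tanh(z)

print(sigmoid(0.0), relu(-2.0), relu(2.0), tanh(0.0))
```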