Gradient Descent for Beginners

The document explains the concept of Gradient Descent as an iterative optimization algorithm used in machine learning to minimize the cost function, which evaluates the performance of a model. It emphasizes the importance of small, user-guided iterations in both Agile software development and Gradient Descent, highlighting the role of learning rates and derivatives in finding the minimum value of a function. The goal is to achieve the lowest error in predictions by adjusting parameters effectively through the algorithm.

Gradient Descent

Prakash P
Background
• Agile is a well-known term in the software development process. The basic idea behind it is simple:
• build something quickly ➡️ get it out there ➡️ get some feedback ➡️ make changes depending upon the feedback ➡️ repeat the process.
• The goal is to get the product in front of users early and let their feedback guide you toward the best possible product with the least error.
• The steps taken for improvement need to be small and should constantly involve the user.
• In this way, an Agile software development process relies on rapid iterations.
• This idea, start with a solution as soon as possible, then measure and iterate as frequently as possible, is essentially Gradient Descent under the hood.
Objective
• The Gradient Descent algorithm is an iterative process that takes us to the minimum of a function.
• The formula below sums up the entire Gradient Descent algorithm in a single line:

θ_new = θ_old − α · (dJ/dθ)

where θ is a parameter, α is the learning rate and dJ/dθ is the derivative of the cost function with respect to θ.
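That one-line update can be sketched in code. This is a minimal illustration only: the quadratic cost J(θ) = θ² and all the numbers below are assumptions chosen for the example, not part of the original slides.

```python
# One gradient descent step: theta_new = theta_old - alpha * dJ/dtheta.
# Illustrative cost J(theta) = theta**2, whose derivative is 2*theta.
def gd_step(theta, alpha=0.1):
    gradient = 2 * theta          # dJ/dtheta at the current position
    return theta - alpha * gradient

theta = 5.0
for _ in range(50):               # repeat the one-line update
    theta = gd_step(theta)
print(theta)                      # approaches 0, the minimum of J
```

Each step shrinks θ toward the minimum; repeating the single-line update is the whole algorithm.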
A Machine Learning Model
• Consider a set of data points in a 2D space. Assume the data relates the height and weight of a group of students.
• We want to find a relationship between these quantities so that we can predict the weight of new students later.
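A linear model of the kind described can be sketched as follows; the slope and intercept here are made-up values for illustration, not fitted parameters.

```python
# Hypothetical linear model y = m*x + b relating height (cm) to weight (kg).
def predict_weight(height_cm, m=0.9, b=-90.0):
    # m and b are illustrative guesses; training would tune them.
    return m * height_cm + b

print(predict_weight(170.0))      # predicted weight for a 170 cm student
```

Training, described in the following slides, is the process of finding the m and b that fit the known students best.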
Predictions
• Given a known set of inputs and their corresponding outputs, a machine learning model tries to make predictions for a new set of inputs.
• The error is the difference between the predicted values and the actual values.
• This leads to the idea of a Cost function, or Loss function.

Cost Function
• A Cost function (Loss function) evaluates the performance of our machine learning algorithm.
• The Loss function computes the error for a single training example, while the Cost function is the average of the loss over all the training examples.
• A Cost function tells us how good our model is at making predictions for a given value of m and b.
• If there are a total of ‘N’ points in the dataset, we want to minimize the error over all N of them, so the Cost function is the mean squared error:

J(m, b) = (1/N) · Σᵢ (yᵢ − (m·xᵢ + b))²
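The cost can be computed directly. The heights and weights below are illustrative numbers, chosen so that the line y = 0.6·x − 40 fits them exactly.

```python
# Mean squared error cost for the line y = m*x + b over N data points.
def mse_cost(m, b, xs, ys):
    n = len(xs)
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / n

xs = [150.0, 160.0, 170.0, 180.0]    # heights (cm), illustrative
ys = [50.0, 56.0, 62.0, 68.0]        # weights (kg), exactly 0.6*x - 40

print(mse_cost(0.6, -40.0, xs, ys))  # near zero: a perfect fit
print(mse_cost(0.5, -40.0, xs, ys))  # much larger: a poor fit
```

A lower cost means a better choice of m and b, which is exactly what the next slides set out to minimize.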
Minimizing the Cost Function

• The goal of any machine learning algorithm is to minimize the Cost Function.
• A lower error between the actual and predicted values signifies that the algorithm has done a good job of learning.
• Since we want the lowest error, we want the ‘m’ and ‘b’ values that give the smallest possible error.
How do we actually minimize any
function?
• Suppose the cost function is of the form Y = X².
• In a Cartesian coordinate system, this is the equation of a parabola.
• To minimize the function, we need to find the value of X that produces the lowest value of Y (the red dot in the plot).
• It is quite easy to locate the minimum visually here since the graph is 2D, but that may not always be the case, especially in higher dimensions.
• For those cases, we need an algorithm to locate the minimum, and that algorithm is called Gradient Descent.
Gradient Descent
• Gradient Descent is one of the most popular optimization algorithms and by far the most common way to optimize neural networks.
• It is an iterative optimization algorithm used to find the minimum of a function.

Intuition:

• Consider that you are walking along the graph below, and you are currently at the ‘green’ dot.
• Your aim is to reach the minimum, i.e. the ‘red’ dot, but from your position you are unable to see it.
• Possible actions would be:
• You might go upward or downward.
• Once you decide which way to go, you might take a bigger step or a smaller step to reach your destination.
• The Gradient Descent algorithm helps us make these decisions efficiently and effectively with the use of derivatives.
The Minimum Value

• A derivative is a term from calculus, calculated as the slope of the graph at a particular point.
• The slope at the blue point is less steep than that at the green point, which means gradient descent takes much smaller steps toward the minimum from the blue point than from the green point.
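The point about step sizes can be checked numerically on Y = X²; the positions chosen for the ‘green’ and ‘blue’ points below are assumptions for illustration.

```python
# Slope of Y = X**2 at a point x is 2*x: steep far from the minimum,
# gentle close to it, so gradient steps shrink as we approach.
def slope(x):
    return 2 * x

green, blue = 4.0, 1.0                # illustrative positions on the curve
learning_rate = 0.1
print(learning_rate * slope(green))   # larger step taken far from the minimum
print(learning_rate * slope(blue))    # smaller step taken near the minimum
```

With the same learning rate, the step size is proportional to the slope, which is why the algorithm naturally slows down as it nears the bottom.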
Mathematical Interpretation of the Cost Function
• In the equation y = mX + b, ‘m’ and ‘b’ are its parameters. During the training process, there will be a small change in their values.
• Let that small change be denoted by δ.
• The values of the parameters will be updated as m = m − δm and b = b − δb respectively.
• Our aim is to find the values of m and b in y = mX + b for which the error is minimum, i.e. the values that minimize the cost function.
The Learning Rate
• The size of the steps taken to reach the minimum is called the Learning Rate.
• With larger steps (a higher learning rate) we can cover more ground, but we risk overshooting the minimum.
• With smaller steps (a lower learning rate), on the other hand, we will take a lot of time to reach the lowest point.
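This trade-off can be demonstrated on Y = X²; the three learning rates below are arbitrary illustrative choices.

```python
# How the learning rate affects gradient descent on Y = X**2.
def distance_after_descent(lr, steps=20, x=1.0):
    for _ in range(steps):
        x = x - lr * (2 * x)      # gradient step with rate lr
    return abs(x)                 # distance from the minimum at 0

print(distance_after_descent(0.01))  # too small: still far after 20 steps
print(distance_after_descent(0.4))   # reasonable: very close to 0
print(distance_after_descent(1.1))   # too large: overshoots and diverges
```

A rate that is too small wastes iterations, while one that is too large bounces past the minimum and can move further away with every step.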
Calculating Gradient Descent

m¹, b¹ = next position parameters; m⁰, b⁰ = current position parameters. The updates are:

m¹ = m⁰ − α · ∂J/∂m   and   b¹ = b⁰ − α · ∂J/∂b

where, for the mean squared error cost, the partial derivatives are

∂J/∂m = −(2/N) · Σᵢ xᵢ · (yᵢ − (m·xᵢ + b))   and   ∂J/∂b = −(2/N) · Σᵢ (yᵢ − (m·xᵢ + b))

The 2 in these equations isn’t that significant: it can be absorbed into the learning rate, which then simply becomes twice as big.
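Putting the update rules together, a full run can be sketched end to end. The data points are fabricated from y = 2x + 1 purely for illustration, and the learning rate and iteration count are arbitrary choices.

```python
# Gradient descent for y = m*x + b using the partial derivatives
#   dJ/dm = -(2/N) * sum(x_i * (y_i - (m*x_i + b)))
#   dJ/db = -(2/N) * sum(y_i - (m*x_i + b))
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]        # fabricated from y = 2*x + 1

m, b = 0.0, 0.0                  # start from arbitrary parameters
lr, n = 0.05, len(xs)
for _ in range(5000):
    errors = [y - (m * x + b) for x, y in zip(xs, ys)]
    dm = -(2 / n) * sum(x * e for x, e in zip(xs, errors))
    db = -(2 / n) * sum(errors)
    m, b = m - lr * dm, b - lr * db   # move against the gradient

print(m, b)                      # close to the true values 2 and 1
```

Each iteration computes the gradient at the current (m, b) and steps against it; after enough iterations the parameters recover the line the data was generated from.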
Conclusion
• Hence, to solve for the gradient, we iterate through our data points, using the current m and b values to compute the partial derivatives.
• This gradient tells us the slope of the cost function at our current position and the direction we should move in to update our parameters.
• The size of our update is controlled by the learning rate.
References
• https://machinelearningmastery.com/gradient-descent-for-machine-learning/
• https://towardsdatascience.com/understanding-the-mathematics-behind-gradient-descent-dde5dc9be06e
