
Gradient Descent

Gradient Descent is one of the most commonly used optimization algorithms for training machine learning models: it works by minimizing the error between the model's predicted results and the actual results. It is also the standard algorithm for training neural networks.

More precisely, gradient descent is an iterative optimization algorithm that finds a local minimum of a differentiable function, in this case the cost function of a machine learning or deep learning model. At each iteration it moves the parameters a small step in the direction opposite to the gradient, because the gradient points in the direction of steepest increase of the cost.
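
Written out, the basic update rule is: parameter = parameter - learning_rate * gradient_of_cost(parameter). Below is a minimal sketch in Python, using an assumed one-dimensional cost J(w) = (w - 3)^2; the cost, starting value, learning rate, and iteration count are illustrative assumptions, not taken from this article.

def gradient(w):
    # Derivative of the assumed cost J(w) = (w - 3)^2 with respect to w.
    return 2 * (w - 3)

w = 0.0               # assumed starting value for the parameter
learning_rate = 0.1   # assumed step size
for step in range(50):
    w = w - learning_rate * gradient(w)

print(w)   # converges toward the minimum at w = 3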

What is a Cost Function?
The cost function measures the difference, or error, between the actual values and the values the model predicts at its current parameter settings, and it summarizes that error as a single real number. Gradient descent works by repeatedly adjusting the parameters so that this number decreases.
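
For example, a common cost function for regression is the mean squared error (MSE). A small Python sketch follows; the data values are made up for illustration.

def mse(predictions, actuals):
    # Average of the squared differences, returned as a single real number.
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals)

print(mse([2.5, 0.0, 2.1], [3.0, -0.5, 2.0]))   # prints 0.17, a single real number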

Learning Rate:
The learning rate is the step size taken at each iteration on the way to the minimum, or lowest point, of the cost function. It is typically a small value, and it can be tuned or adjusted based on how the cost function behaves during training. A high learning rate produces larger steps and covers ground quickly, but it risks overshooting the minimum. A low learning rate takes small, precise steps, but it makes training slower.
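
To see the effect, the sketch below runs the toy cost from the earlier sketch, J(w) = (w - 3)^2, with three different learning rates; the specific values are assumptions for illustration.

def run(learning_rate, steps=20, w=0.0):
    for _ in range(steps):
        w = w - learning_rate * 2 * (w - 3)   # gradient of (w - 3)^2 is 2(w - 3)
    return w

print(run(0.01))   # small steps: still far from the minimum after 20 steps
print(run(0.4))    # larger steps: much closer to w = 3
print(run(1.1))    # too large: overshoots and moves away from the minimum
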
Types of Gradient Descent
Based on how much of the training data is used to compute each update, gradient descent can be divided into batch gradient descent, stochastic gradient descent, and mini-batch gradient descent. Let's understand these different types of gradient descent:

1. Batch Gradient Descent:


Batch gradient descent (BGD) computes the error for each point in the training set and updates the model only after all training examples have been evaluated. One full pass over the training set is known as a training epoch. In simple words, each update sums (or averages) the gradient over all examples, which makes the updates stable but expensive on large datasets.
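
A minimal sketch of batch gradient descent for a simple one-parameter regression y = w * x; the toy data, learning rate, and epoch count are assumptions for illustration.

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # the underlying relationship is y = 2x

w = 0.0
learning_rate = 0.05
for epoch in range(100):
    # One update per epoch: average the gradient over ALL training examples.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w = w - learning_rate * grad

print(w)   # approaches 2.0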

2. Stochastic Gradient Descent:


Stochastic gradient descent (SGD) is a type of gradient descent that processes one training example per iteration: within each epoch, it updates the parameters after every single example rather than after the whole dataset. Because it needs only one training example at a time, it is easy to hold in memory. However, it loses some computational efficiency compared with batch gradient descent, since the frequent updates mean the gradient is recomputed far more often. Those frequent, per-example updates also make the gradient noisy. This noise is not always a drawback: it can help the algorithm escape local minima and sometimes find the global minimum.
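
The same toy regression problem can be rewritten as a stochastic sketch, where the parameter is updated after every individual example; the shuffling and hyperparameter values are illustrative assumptions.

import random

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0
learning_rate = 0.02
for epoch in range(50):
    order = list(range(len(xs)))
    random.shuffle(order)          # visit the examples in a random order each epoch
    for i in order:
        grad = 2 * (w * xs[i] - ys[i]) * xs[i]   # gradient from ONE example
        w = w - learning_rate * grad

print(w)   # a noisier path, but still close to 2.0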

3. Mini-Batch Gradient Descent:


Mini-batch gradient descent is a combination of batch gradient descent and stochastic gradient descent. It divides the training dataset into small batches and then performs an update after each batch. Splitting the training data into smaller batches strikes a balance between the computational efficiency of batch gradient descent and the speed of stochastic gradient descent. Hence, we get a variant of gradient descent with high computational efficiency and a less noisy gradient.
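
Finally, a mini-batch sketch of the same toy problem; the batch size of 2 and the other values are assumptions for illustration.

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
batch_size = 2

w = 0.0
learning_rate = 0.05
for epoch in range(100):
    for start in range(0, len(xs), batch_size):
        xb = xs[start:start + batch_size]
        yb = ys[start:start + batch_size]
        # Average the gradient over just this mini-batch.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xb, yb)) / len(xb)
        w = w - learning_rate * grad

print(w)   # approaches 2.0, balancing cost per update and gradient noise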
