Gradient Descent

Gradient descent is a first-order optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point. If instead one takes steps proportional to the positive of the gradient, one approaches a local maximum of that function; the procedure is then known as gradient ascent. Gradient descent is also known as steepest descent, or the method of steepest descent. When known by the latter name, gradient descent should not be confused with the method of steepest descent for approximating integrals.

Gradient descent is based on the observation that if the multivariable function F(x) is defined and differentiable in a neighborhood of a point a, then F(x) decreases fastest if one goes from a in the direction of the negative gradient of F at a, −∇F(a). It follows that, if

b = a − γ∇F(a)

for γ small enough, then F(a) ≥ F(b). With this observation in mind, one starts with a guess x_0 for a local minimum of F, and considers the sequence x_0, x_1, x_2, … such that

x_{n+1} = x_n − γ_n ∇F(x_n),   n ≥ 0.

We have

F(x_0) ≥ F(x_1) ≥ F(x_2) ≥ ⋯,

so hopefully the sequence (x_n) converges to the desired local minimum. Note that the value of the step size γ is allowed to change at every iteration. With certain assumptions on the function F and particular choices of γ, convergence to a local minimum can be guaranteed.
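As a concrete illustration of the iteration x_{n+1} = x_n − γ_n ∇F(x_n), here is a minimal sketch in Python; the example function F(x) = (x − 3)^2, the fixed step size, and the stopping tolerance are illustrative assumptions rather than anything specified in the text.

```python
# Minimal sketch of gradient descent: repeatedly step against the gradient.
# F(x) = (x - 3)**2 and its gradient 2*(x - 3) are illustrative choices.

def gradient_descent(grad_F, x0, gamma=0.1, tol=1e-8, max_iter=1000):
    """Iterate x <- x - gamma * grad_F(x) until the update is below tol."""
    x = x0
    for _ in range(max_iter):
        step = gamma * grad_F(x)
        x = x - step
        if abs(step) < tol:      # updates have become negligible
            break
    return x

# F(x) = (x - 3)^2 has its minimum at x = 3.
minimum = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(minimum)   # prints a value very close to 3.0
```

Because this F is convex and the fixed step size is small enough, each iterate decreases F, mirroring the chain of inequalities above.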

When the function F is convex, all local minima are also global minima, so in this case gradient descent can converge to the global solution. To visualize the process, suppose F is defined on the plane and its graph has a bowl shape. The contour lines are the curves along which the value of F is constant, and the negative gradient at a point is orthogonal to the contour line passing through that point. Following the negative gradient step by step therefore leads to the bottom of the bowl, that is, to the point where the value of F is minimal.
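To make the bowl picture concrete, the short sketch below runs the same update on a convex quadratic of two variables; the particular function F(x, y) = x^2 + 2y^2, the starting point, and the step size are assumptions chosen only for illustration.

```python
# F(x, y) = x**2 + 2*y**2 is convex, so its only local minimum, at the
# origin, is also the global minimum. Gradient descent should reach it
# from any starting point, given a small enough step size.

def grad_F(x, y):
    return 2.0 * x, 4.0 * y    # gradient of F(x, y) = x^2 + 2y^2

x, y = 3.0, -2.0               # arbitrary starting point (assumption)
gamma = 0.1                    # fixed step size (assumption)

for _ in range(200):
    gx, gy = grad_F(x, y)
    x, y = x - gamma * gx, y - gamma * gy

print(x, y)   # both coordinates end up very close to 0
```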

Applications

Gradient descent is a popular algorithm for training a wide range of models in machine learning, including (linear) support vector machines, logistic regression and graphical models. It competes with the L-BFGS algorithm, which is also widely used. Stochastic gradient descent (SGD) has been used since at least 1960 for training linear regression models, originally under the name ADALINE. When combined with the backpropagation algorithm, it is the de facto standard algorithm for training (shallow) artificial neural networks. Another popular stochastic gradient descent algorithm is the least mean squares (LMS) adaptive filter.
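As a sketch of the stochastic variant mentioned above, the following snippet trains a linear regression model with single-sample SGD updates, in the spirit of ADALINE; the synthetic data, learning rate, and epoch count are assumptions made only to keep the example self-contained.

```python
import random

# Stochastic gradient descent for linear regression: update the parameters
# after each individual sample rather than after a full pass over the data.

random.seed(0)
# Synthetic data roughly following y = 2*x + 1 with a little noise (assumption).
data = [(i / 100.0, 2.0 * (i / 100.0) + 1.0 + random.uniform(-0.1, 0.1))
        for i in range(100)]

w, b = 0.0, 0.0    # model: y_hat = w * x + b
lr = 0.05          # learning rate (step size)

for epoch in range(100):
    random.shuffle(data)           # visit the samples in a random order
    for x, y in data:
        error = (w * x + b) - y
        # Gradient of the single-sample squared error (y_hat - y)^2.
        w -= lr * 2.0 * error * x
        b -= lr * 2.0 * error

print(w, b)   # approximately 2 and 1, the parameters used to generate the data
```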
