Gradient descent method 
2013.11.10 
Sanghyuk Chun
Much of the content is from
Large Scale Optimization Lectures 4 & 5 by Caramanis & Sanghavi
Convex Optimization Lecture 10 by Boyd & Vandenberghe
Convex Optimization textbook, Chapter 9, by Boyd & Vandenberghe
Contents 
• Introduction
• Example Code & Usage
• Convergence Conditions
• Methods & Examples
• Summary
Introduction 
Unconstrained minimization problems, Description, Pros and Cons
Unconstrained minimization problems
• Recall: constrained minimization problems
• From Lecture 1, the formulation of a general constrained convex optimization problem is as follows
• min f(x) s.t. x ∈ χ
• where f: χ → R is convex and smooth
• From Lecture 1, the formulation of an unconstrained optimization problem is as follows
• min f(x)
• where f: Rⁿ → R is convex and smooth
• In this problem, the necessary and sufficient condition for an optimal solution x_0 is
• ∇f(x) = 0 at x = x_0
Unconstrained minimization problems
• Minimize f(x)
• When f is differentiable and convex, a necessary and sufficient condition for a point x* to be optimal is ∇f(x*) = 0
• Minimizing f(x) is the same as finding a solution of ∇f(x*) = 0
• min f(x): analytically solving the optimality equation
• ∇f(x*) = 0: usually solved by an iterative algorithm
Description of Gradient Descent Method
• The idea relies on the fact that −∇f(x^{(k)}) is a descent direction
• x^{(k+1)} = x^{(k)} − η_k ∇f(x^{(k)}), with f(x^{(k+1)}) < f(x^{(k)})
• Δx^{(k)} is the step, or search direction
• η_k is the step size, or step length
• Too small an η_k causes slow convergence
• Too large an η_k can overshoot the minimum and diverge
Description of Gradient Descent Method
• Algorithm (Gradient Descent Method)
• given a starting point x ∈ dom f
• repeat
1. Δx := −∇f(x)
2. Line search: choose step size η via exact or backtracking line search
3. Update x := x + ηΔx
• until the stopping criterion is satisfied
• Stopping criterion is usually ‖∇f(x)‖₂ ≤ ε
• Very simple, but often very slow; rarely used in practice
• A minimal sketch of this loop appears below
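As a concrete illustration of the loop above, here is a minimal Python sketch of fixed-step gradient descent with the ‖∇f(x)‖₂ ≤ ε stopping criterion. The quadratic test function, step size, and tolerance are illustrative assumptions, not values from the slides.

```python
import numpy as np

def gradient_descent(grad, x0, eta=0.05, eps=1e-6, max_iter=10000):
    """Fixed-step gradient descent: x := x - eta * grad(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:   # stopping criterion: ||grad f(x)||_2 <= eps
            break
        x = x - eta * g                # x := x + eta * dx, with dx = -grad f(x)
    return x

# Example: f(x) = x1^2 + 10*x2^2, grad f(x) = (2*x1, 20*x2), minimum at the origin
grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
print(gradient_descent(grad, [5.0, 5.0]))   # ~ [0, 0]
```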
Pros and Cons
• Pros
• Can be applied in any dimension and space (even infinite-dimensional spaces)
• Easy to implement
• Cons
• Local optima problem
• Relatively slow close to the minimum
• For non-differentiable functions, gradient methods are ill-defined
Example Code & Usage 
Example Code, Usage, Questions 
Gradient Descent Example Code
• MATLAB machine-learning toolbox with a gradient descent example: http://mirlab.org/jang/matlab/toolbox/machineLearning/
Usage of Gradient Descent Method
• Linear Regression
• Minimize the loss function to choose the best hypothesis
• Example of a loss function:
Σ (data_predicted − data_observed)²
• Find the hypothesis (function) which minimizes the loss function, as in the sketch below
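To make the linear-regression usage concrete, here is a small Python sketch that fits a line by gradient descent on the squared loss above. The synthetic data, learning rate, and iteration count are assumptions for illustration.

```python
import numpy as np

# Synthetic data: y = 3x + 2 plus noise (illustrative values, not from the slides).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 100)
y = 3.0 * x + 2.0 + rng.normal(0.0, 0.5, 100)

# Hypothesis h(x) = w*x + b; loss = mean of (predicted - observed)^2.
w, b, eta = 0.0, 0.0, 0.01
for _ in range(5000):
    err = (w * x + b) - y            # residuals of the current hypothesis
    grad_w = 2.0 * np.mean(err * x)  # d(loss)/dw
    grad_b = 2.0 * np.mean(err)      # d(loss)/db
    w -= eta * grad_w
    b -= eta * grad_b

print(w, b)  # approaches the true parameters (3, 2)
```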
Usage of Gradient Descent Method
• Neural Networks
• Back propagation
• SVM (Support Vector Machine)
• Graphical models
• Least Mean Squares (LMS) filter
…and many other applications!
Questions
• Does the Gradient Descent Method always converge?
• If not, what is the condition for convergence?
• How can we make the Gradient Descent Method faster?
• What is a proper value for the step size η_k?
Convergence Conditions 
L-Lipschitz function, Strong Convexity, Condition number
L-Lipschitz function
• Definition
• A function f: Rⁿ → R is called L-Lipschitz if and only if ‖∇f(x) − ∇f(y)‖₂ ≤ L‖x − y‖₂, ∀x, y ∈ Rⁿ
• We denote this condition by f ∈ C_L, where C_L is the class of L-Lipschitz functions
• A quick numerical check of this definition is sketched below
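As a sanity check of the definition, this sketch estimates the Lipschitz constant of ∇f for a quadratic f(x) = ½xᵀAx, where ∇f(x) = Ax and the smallest valid L is the largest eigenvalue of A. The matrix and sample counts are illustrative assumptions.

```python
import numpy as np

# f(x) = 0.5 * x^T A x with symmetric positive definite A, so grad f(x) = A x.
A = np.array([[10.0, 2.0],
              [2.0, 1.0]])
L = np.linalg.eigvalsh(A).max()  # smallest valid Lipschitz constant of grad f

rng = np.random.default_rng(1)
ratios = []
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    num = np.linalg.norm(A @ x - A @ y)   # ||grad f(x) - grad f(y)||_2
    den = np.linalg.norm(x - y)           # ||x - y||_2
    ratios.append(num / den)

print(max(ratios) <= L + 1e-9)  # True: the ratio never exceeds L
```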
L-Lipschitz function
• Lemma 4.1
• If f ∈ C_L, then f(y) − f(x) − ⟨∇f(x), y − x⟩ ≤ (L/2)‖y − x‖₂²
• Theorem 4.2
• If f ∈ C_L and f* = min_x f(x) > −∞, then the gradient descent algorithm with a fixed step size satisfying η < 2/L will converge to a stationary point (see the demonstration below)
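The following sketch illustrates Theorem 4.2's step-size threshold on a one-dimensional quadratic, where ∇f is L-Lipschitz with L = 10 and the critical step size is 2/L = 0.2; the specific function and step sizes are assumptions for illustration.

```python
# For f(x) = (L/2) * x^2, grad f(x) = L * x is L-Lipschitz with L = 10.
# Fixed-step gradient descent contracts iff eta < 2/L = 0.2 (Theorem 4.2).
L = 10.0
grad = lambda x: L * x

for eta in (0.19, 0.21):  # just below and just above the 2/L threshold
    x = 1.0
    for _ in range(100):
        x = x - eta * grad(x)   # update factor is (1 - eta*L) each step
    print(f"eta = {eta}: x = {x:.3e}")
# eta = 0.19 converges toward 0; eta = 0.21 blows up, since |1 - eta*L| > 1
```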
Strong Convexity and implications
• Definition
• If there exists a constant m > 0 such that ∇²f ⪰ mI for ∀x ∈ S, then the function f(x) is strongly convex on S
Strong Convexity and implications
• Lemma 4.3
• If f is strongly convex on S, we have the following inequality:
• f(y) ≥ f(x) + ⟨∇f(x), y − x⟩ + (m/2)‖y − x‖₂² for ∀x, y ∈ S
• Proof (given as an image in the original slides)
• Minimizing both sides over y yields f* ≥ f(x) − (1/(2m))‖∇f(x)‖₂², which is useful as a stopping criterion (if you know m); a numerical check follows below
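A minimal check of the suboptimality bound f(x) − f* ≤ ‖∇f(x)‖₂²/(2m) implied by Lemma 4.3, using a quadratic whose strong-convexity constant m is its smallest Hessian eigenvalue. The matrix and sample points are illustrative assumptions.

```python
import numpy as np

# f(x) = 0.5 * x^T A x has Hessian A, strong convexity constant m = lambda_min(A),
# and optimal value f* = 0 at the origin.
A = np.array([[10.0, 2.0],
              [2.0, 1.0]])
m = np.linalg.eigvalsh(A).min()

f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

rng = np.random.default_rng(2)
for _ in range(5):
    x = rng.normal(size=2)
    bound = np.linalg.norm(grad(x))**2 / (2.0 * m)
    print(f(x) <= bound + 1e-12)  # True: f(x) - f* <= ||grad f(x)||^2 / (2m)
```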
Strong Convexity and implications
• Proof (continued; given as an image in the original slides)
Upper Bound of ∇²f(x)
• Lemma 4.3 implies that the sublevel sets contained in S are bounded, so in particular S is bounded. Therefore the maximum eigenvalue of ∇²f(x) is bounded above on S
• There exists a constant M such that ∇²f(x) ⪯ MI for ∀x ∈ S
• Lemma 4.4
• For any x, y ∈ S, if ∇²f(x) ⪯ MI for all x ∈ S, then f(y) ≤ f(x) + ⟨∇f(x), y − x⟩ + (M/2)‖y − x‖₂²
Condition Number
• From Lemmas 4.3 and 4.4 we have
mI ⪯ ∇²f(x) ⪯ MI for ∀x ∈ S, m > 0, M > 0
• The ratio k = M/m is thus an upper bound on the condition number of the matrix ∇²f(x)
• When the ratio is close to 1, we call the problem well-conditioned
• When the ratio is much larger than 1, we call it ill-conditioned
• When the ratio is exactly 1, this is the best case: a single step leads to the optimal solution (there is no wrong direction); the sketch below shows how a large ratio slows convergence
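To illustrate the effect of the condition number, this sketch runs fixed-step gradient descent on two quadratics, one well-conditioned (M/m = 1) and one ill-conditioned (M/m = 100); the functions and step sizes are assumed for illustration.

```python
import numpy as np

def steps_to_converge(hessian_diag, eta, tol=1e-8):
    """Count fixed-step gradient descent iterations on f(x) = 0.5 * sum(d_i * x_i^2)."""
    d = np.asarray(hessian_diag)
    x = np.ones_like(d)
    for k in range(1, 100000):
        x = x - eta * d * x              # gradient of f is d * x (elementwise)
        if np.linalg.norm(d * x) <= tol: # stop when the gradient is tiny
            return k
    return -1

# Well-conditioned: m = M = 1 (ratio 1). Ill-conditioned: m = 1, M = 100 (ratio 100).
print(steps_to_converge([1.0, 1.0], eta=1.0))     # 1 step: no wrong direction
print(steps_to_converge([1.0, 100.0], eta=0.019)) # hundreds of steps: eta limited by M
```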
Condition Number
• Theorem 4.5
• Gradient descent for a strongly convex function f with step size η = 1/M will converge as
• f(x^{(k)}) − f* ≤ cᵏ (f(x^{(0)}) − f*), where c ≤ 1 − m/M
• A rate of convergence of this form is known as linear convergence
• Since we usually do not know the value of M, we do a line search
• For exact line search, c = 1 − m/M
• For backtracking line search, c = 1 − min{2mα, 2βαm/M} < 1
Methods & Examples 
Exact Line Search, Backtracking Line Search, Coordinate Descent Method, Steepest Descent Method 
Exact Line Search
• The optimal line search method, in which η is chosen to minimize f along the ray x − η∇f(x), as in the sketch below
• Exact line search is used when the cost of the one-variable minimization problem is low compared to the cost of computing the search direction itself
• It is not very practical
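For a quadratic f(x) = ½xᵀAx, the exact line-search step has a closed form, η = gᵀg / (gᵀAg) with g = ∇f(x), which makes a compact illustration; the matrix and starting point below are assumed examples.

```python
import numpy as np

# Exact line search on f(x) = 0.5 * x^T A x: minimizing f(x - eta*g) over eta
# in closed form gives eta = (g^T g) / (g^T A g), where g = grad f(x) = A x.
A = np.array([[10.0, 2.0],
              [2.0, 1.0]])
x = np.array([5.0, 5.0])

for k in range(500):
    g = A @ x
    if np.linalg.norm(g) <= 1e-8:
        break
    eta = (g @ g) / (g @ A @ g)  # exact minimizer along the ray x - eta*g
    x = x - eta * g

print(k, x)  # f decreases by a constant factor per iteration (linear convergence)
```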
Exact Line Search
• Convergence Analysis
• (derivation given as an image in the original slides)
• f(x^{(k)}) − f* decreases by at least a constant factor in every iteration
• Converging to 0 geometrically fast (linear convergence)
Backtracking Line Search
• It depends on two constants α, β with 0 < α < 0.5, 0 < β < 1
• It starts with a unit step size and then reduces it by the factor β until the stopping condition
f(x − η∇f(x)) ≤ f(x) − αη‖∇f(x)‖₂²
• Since −∇f(x) is a descent direction and −‖∇f(x)‖₂² < 0, for small enough step size η we have
f(x − η∇f(x)) ≈ f(x) − η‖∇f(x)‖₂² < f(x) − αη‖∇f(x)‖₂²
• This shows that the backtracking line search eventually terminates
• α is typically chosen between 0.01 and 0.3
• β is often chosen to be between 0.1 and 0.8
• A minimal implementation is sketched below
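Here is a minimal sketch of backtracking line search implementing the stopping condition above; the values α = 0.3 and β = 0.8 sit inside the typical ranges quoted on this slide, and the test function is an assumption.

```python
import numpy as np

def backtracking(f, grad, x, alpha=0.3, beta=0.8):
    """Return a step size eta with f(x - eta*g) <= f(x) - alpha*eta*||g||^2."""
    g = grad(x)
    eta = 1.0                        # start with a unit step size
    while f(x - eta * g) > f(x) - alpha * eta * g.dot(g):
        eta *= beta                  # shrink by the factor beta and try again
    return eta

# Example on f(x) = x1^2 + 10*x2^2 (an assumed test function).
f = lambda x: x[0]**2 + 10.0 * x[1]**2
grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])

x = np.array([5.0, 5.0])
for _ in range(100):
    eta = backtracking(f, grad, x)
    x = x - eta * grad(x)
print(x)  # ~ [0, 0]
```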
Backtracking Line Search
• (illustration of the backtracking condition, given as a figure in the original slides)
Backtracking Line Search
• Convergence Analysis
• Claim: η ≤ 1/M always satisfies the stopping condition
• Proof (given as an image in the original slides)
Backtracking Line Search
• Proof (cont., given as an image in the original slides)
Line search types
• Slide from Optimization Lecture 10 by Boyd
Line search example
• Slide from Optimization Lecture 10 by Boyd
Coordinate Descent Method
• Coordinate descent belongs to the class of non-derivative methods used for minimizing differentiable functions
• Here, the cost is minimized in one coordinate direction in each iteration
Coordinate Descent Method
• Pros
• It is well suited for parallel computation
• Cons
• May not reach the minimum even for a convex function
Convergence of Coordinate Descent
• Lemma 5.4 (stated as an image in the original slides)
Coordinate Descent Method
• Methods of selecting the coordinate for the next iteration
• Cyclic Coordinate Descent
• Greedy Coordinate Descent
• (Uniform) Random Coordinate Descent
• A cyclic variant is sketched below
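A minimal sketch of cyclic coordinate descent: each iteration exactly minimizes the cost along one coordinate direction, which has a closed form for quadratics. The matrix, starting point, and iteration count are illustrative assumptions.

```python
import numpy as np

# Cyclic coordinate descent on f(x) = 0.5 * x^T A x: at each iteration,
# exactly minimize f along one coordinate direction e_i.
A = np.array([[10.0, 2.0],
              [2.0, 1.0]])
x = np.array([5.0, 5.0])
n = len(x)

for k in range(100):
    i = k % n                     # cyclic rule: 0, 1, 0, 1, ...
    g_i = A[i] @ x                # partial derivative of f along coordinate i
    x[i] -= g_i / A[i, i]         # exact minimizer along e_i: step = g_i / H_ii
print(x)  # ~ [0, 0]
```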
Steepest Descent Method
• The gradient descent method takes many iterations
• The Steepest Descent Method aims at choosing the best direction at each iteration
• Normalized steepest descent direction
• Δx_nsd = argmin{∇f(x)ᵀv : ‖v‖ = 1}
• Interpretation: for small v, f(x + v) ≈ f(x) + ∇f(x)ᵀv; the direction Δx_nsd is the unit-norm step with the most negative directional derivative
• Iteratively, the algorithm follows these steps
• Calculate the direction of descent Δx_nsd
• Calculate the step size, t
• x⁺ = x + tΔx_nsd
Steepest Descent for various norms
• The choice of norm used for the steepest descent direction can have a dramatic effect on the convergence rate
• ℓ₂ norm
• The steepest descent direction is as follows
• Δx_nsd = −∇f(x) / ‖∇f(x)‖₂
• ℓ₁ norm
• For ‖x‖₁ = Σᵢ |xᵢ|, a descent direction is as follows
• Δx_nsd = −sign(∂f(x)/∂x_{i*}) e_{i*}
• i* = argmaxᵢ |∂f/∂xᵢ|
• ℓ∞ norm
• For ‖x‖∞ = maxᵢ |xᵢ|, a descent direction is as follows
• Δx_nsd = −sign(∇f(x))
• These three directions are computed side by side in the sketch below
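The sketch below computes the three normalized steepest descent directions listed above for an assumed gradient vector, to make the formulas concrete.

```python
import numpy as np

g = np.array([3.0, -1.0, 0.5])  # an assumed gradient vector, grad f(x)

# l2 norm: move opposite the gradient, normalized.
dx_l2 = -g / np.linalg.norm(g)

# l1 norm: move along the single coordinate with the largest |partial derivative|.
i_star = np.argmax(np.abs(g))
dx_l1 = np.zeros_like(g)
dx_l1[i_star] = -np.sign(g[i_star])

# l-infinity norm: move opposite the sign of every gradient component.
dx_linf = -np.sign(g)

print(dx_l2)    # approximately [-0.937,  0.312, -0.156]
print(dx_l1)    # [-1.,  0.,  0.]
print(dx_linf)  # [-1.,  1., -1.]
```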
Steepest Descent for various norms
• (figure in the original slides: steepest descent steps under a quadratic norm and the ℓ₁-norm)
Steepest Descent for various norms
• Example (given as a figure in the original slides)
Steepest Descent Convergence Rate
• Fact: any norm can be bounded by ‖·‖₂, i.e., ∃ γ, γ̃ ∈ (0, 1] such that ‖x‖ ≥ γ‖x‖₂ and ‖x‖* ≥ γ̃‖x‖₂
• Theorem 5.5
• If f is strongly convex with constants m and M, and ‖·‖ has γ, γ̃ as above, then steepest descent with backtracking line search has linear convergence with rate
• c = 1 − 2mαγ̃² min{1, βγ/M}
• Proof: will be proved in Lecture 6
Summary 
Summary
• Unconstrained Convex Optimization Problem
• Gradient Descent Method
• Step size: trade-off between safety and speed
• Convergence Conditions
• L-Lipschitz Function
• Strong Convexity
• Condition Number
Summary
• Exact Line Search
• Backtracking Line Search
• Coordinate Descent Method
• Good for parallel computation, but does not always converge
• Steepest Descent Method
• The choice of norm is important
END OF DOCUMENT
