
L14 OptimizationSingleVariable

ED5340 - Data Science: Theory and Practise
L14 - Optimization

Ramanathan Muthuganapathy (https://ed.iitm.ac.in/~raman)

Course web page: https://ed.iitm.ac.in/~raman/datascience.html
Moodle page: Available at https://courses.iitm.ac.in/
Why optimization

• Fundamental to machine learning and deep learning
• Minimizing a cost function needs optimization (or a direct method, where one is available)
• Builds on basic differential calculus and linear algebra

Ramanathan Muthuganapathy, Department of Engineering Design, IIT Madras


Optimization

• Unconstrained (e.g. min $J(w)$, with $J(w) = w^2$, $J(w) = w^3$, $J(w) = w^2 + 54/w$)
• Constrained optimization (e.g. min $J(w)$, $w > 0$)


Unconstrained optimization

• Single variable (e.g. min $J(w)$, with $J(w) = w^2$, $J(w) = w^3$, $J(w) = w^2 + 54/w$)
• Multivariable (e.g. min $J(w_0, w_1) = (w_0 - 2)^2 + (w_1 - 2)^2$)


Optimality criteria - single variable

• Single variable (e.g. $J(w) = w^2$, $J(w) = w^3$, $J(w) = w^4$)
• min $J(w)$
• The value of $w$ for which the function $J(w)$ has the least (minimum) value
• Unimodal function
• Local minimum (in this case, this is also the global minimum)


Demo of various power functions
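
The demo itself is not part of the extracted slides. As a rough stand-in, the sketch below tabulates $w^2$, $w^3$ and $w^4$ at a few sample points; the helper name `power_table` and the sample points are assumptions made for illustration.

```python
def power_table(powers=(2, 3, 4), ws=(-2, -1, 0, 1, 2)):
    """Return {p: [w**p for each sample w]} for the given powers."""
    return {p: [w**p for w in ws] for p in powers}


# Even powers have their least value at w = 0; the odd power w^3 keeps
# decreasing to the left of 0, so w = 0 is not a minimum there.
for p, values in power_table().items():
    print(f"J(w) = w^{p}: {values}")
```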



Optimality criteria - single variable
$J(w) = w^2$

• $J(w) = w^2$
• $J'(w) = \frac{dJ(w)}{dw}$
• $J''(w) = \frac{d^2 J(w)}{dw^2} = \frac{d}{dw}\left(\frac{dJ}{dw}\right)$


Optimality criteria - single variable
$J'(w) = dJ(w)/dw$

• $J'(w) = dJ(w)/dw$ is the slope of the tangent at a point on the curve $J(w)$.
• Continuous curve
• At the minimum function value, $J'(w) = 0$.
• The corresponding $w$ is denoted $\bar{w}$.


Optimality criteria - single variable
$J''(w) = \frac{d^2 J(w)}{dw^2} = \frac{d}{dw}\left(\frac{dJ}{dw}\right)$

• $J''(w)$ is the rate of change of the slope / tangent at a point on the curve $J(w)$.
• $J''(w) > 0$ in the neighbourhood of the minimum point.
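
As a worked check of the two conditions, take the function $J(w) = w^2 + 54/w$ (for $w > 0$), which the deck lists earlier as an example:

```latex
\begin{align*}
J(w)   &= w^2 + \frac{54}{w}, \qquad w > 0 \\
J'(w)  &= 2w - \frac{54}{w^2} = 0 \;\Rightarrow\; w^3 = 27 \;\Rightarrow\; w = 3 \\
J''(w) &= 2 + \frac{108}{w^3}, \qquad J''(3) = 2 + \frac{108}{27} = 6 > 0
\end{align*}
```

So $w = 3$ satisfies both conditions and is a minimum, with $J(3) = 27$.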


Critical Points

• Minimum
• Maximum
• Inflection



Optimality criteria - Maximum
$J(w) = -w^2$

• $J(w) = -w^2$
• $J'(w) = 0$ at $w = \bar{w}$
• At $\bar{w}$, $J''(\bar{w}) < 0$


Optimality criteria - Inflection
$J(w) = w^3$

• $J(w) = w^3$
• $J'(w) = 0$ at $w = \bar{w}$
• At $\bar{w}$, $J''(\bar{w})$ is ?
• At $\bar{w}$, $J'''(\bar{w})$ is ?
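
Working the derivatives out for $J(w) = w^3$ at the critical point $w = 0$ answers the two questions:

```latex
\begin{align*}
J'(w)   &= 3w^2, & J'(0)   &= 0 \\
J''(w)  &= 6w,   & J''(0)  &= 0 \\
J'''(w) &= 6,    & J'''(0) &= 6 \neq 0
\end{align*}
```

The second derivative vanishes as well, and the first nonzero derivative is the third one, so $w = 0$ is neither a minimum nor a maximum.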


Optimality criteria ?
$J(w) = w^4$

• $J(w) = w^4$
• $J'(w) = 0$ at $w = \bar{w}$
• At $\bar{w}$, $J''(\bar{w})$ is ?
• At $\bar{w}$, $J'''(\bar{w})$ is ?
• At $\bar{w}$, $J''''(\bar{w})$ is ?
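
The same exercise for $J(w) = w^4$ at the critical point $w = 0$:

```latex
\begin{align*}
J'(w)    &= 4w^3,  & J'(0)    &= 0 \\
J''(w)   &= 12w^2, & J''(0)   &= 0 \\
J'''(w)  &= 24w,   & J'''(0)  &= 0 \\
J''''(w) &= 24,    & J''''(0) &= 24 > 0
\end{align*}
```

Here $J''(0) = 0$, so the second-derivative test alone is inconclusive, yet $w = 0$ is clearly a minimum of $w^4$; the sign of the first nonzero derivative (the fourth) is what settles it.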


Optimality criteria - Generalization
From Kalyanmoy Deb

• Suppose that at a point $\bar{w}$ the first derivative is zero, and let $n$ denote the order of the first nonzero higher-order derivative; then
• If $n$ is odd, $\bar{w}$ is an inflection point.
• If $n$ is even, $\bar{w}$ is a local optimum.
• (i) If that derivative is positive, $\bar{w}$ is a local minimum.
• (ii) If that derivative is negative, $\bar{w}$ is a local maximum.
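
The rule above can be sketched in code for polynomials, where every derivative is available exactly. This is a minimal illustration, not something from the slides: the coefficient-list representation and the names `poly_derivative`, `poly_eval` and `classify_critical_point` are assumptions.

```python
def poly_derivative(coeffs):
    """Derivative of c0 + c1*w + c2*w**2 + ..., given as [c0, c1, c2, ...]."""
    return [i * c for i, c in enumerate(coeffs)][1:]


def poly_eval(coeffs, w):
    """Evaluate the polynomial at w (an empty list is the zero polynomial)."""
    return sum(c * w**i for i, c in enumerate(coeffs))


def classify_critical_point(coeffs, w, tol=1e-12):
    """Apply the 'first nonzero higher-order derivative' rule at w."""
    d = poly_derivative(coeffs)          # first derivative
    if abs(poly_eval(d, w)) > tol:
        return "not a critical point"
    n = 1
    while d:
        d = poly_derivative(d)           # derivative of order n + 1
        n += 1
        value = poly_eval(d, w)
        if abs(value) > tol:             # first nonzero derivative found
            if n % 2 == 1:
                return "inflection point"
            return "local minimum" if value > 0 else "local maximum"
    return "undetermined (all derivatives vanish)"


print(classify_critical_point([0, 0, 1], 0.0))        # J(w) = w^2
print(classify_critical_point([0, 0, -1], 0.0))       # J(w) = -w^2
print(classify_critical_point([0, 0, 0, 1], 0.0))     # J(w) = w^3
print(classify_critical_point([0, 0, 0, 0, 1], 0.0))  # J(w) = w^4
```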


Optimality criteria - How to use

• Given a point on the curve, check whether it is one of the optimal points.
• The more pressing one - given a function $J(w)$, how to find the optimal points (in our case, mostly 'min')


Methods to find local minimum
Unimodal functions

• First, a crude approach to find bounds
• Then use a more sophisticated method to find the min
• In general, any method takes the following pattern:
• Identify the initial guess(es) and their function values
• Make appropriate changes to obtain the next values of the w's
• Continue the procedure until the termination criterion is reached.


Methods (iterative) to find local minimum
Unimodal functions

• Bracketing methods
  • Exhaustive search
  • Bounding phase
• Region elimination approaches
  • Interval halving
  • Fibonacci search
  • Golden section search
• Gradient-based ones
  • Newton-Raphson
  • Bisection
  • Secant


Bracketing - Exhaustive search method
Unimodal functions

• Let $n$ be the number of intermediate points.
• Step 1:
• $\Delta w = (b - a)/n$
• $w_1 = a$, $w_2 = w_1 + \Delta w$, $w_3 = w_2 + \Delta w$

[Figure: $J(w)$ plotted over $(a, b)$, with $w_1 = a$, $w_2$, $w_3$ and $b$ marked on the $w$ axis]


Bracketing - Exhaustive search method
Unimodal functions

• Step 2:
• If $J(w_1) \geq J(w_2) \leq J(w_3)$
• then the min lies in $(w_1, w_3)$
• Else
• $w_1 = w_2$, $w_2 = w_3$, $w_3 = w_2 + \Delta w$
• Go to Step 3

[Figure: $J(w)$ with $a$, $w_1$, $w_2$, $w_3$ and $b$ marked on the $w$ axis]


Bracketing - Exhaustive search method
Unimodal functions

• Step 3:
• If $w_3 \leq b$, then go to Step 2
• Otherwise, no min exists in $(a, b)$.
• The min could be one of the boundary points.

[Figure: $J(w)$ with $a$, $w_1$, $w_2$, $w_3$ and $b$ marked on the $w$ axis]
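
Steps 1 to 3 of the exhaustive search can be sketched as follows. This is one possible reading of the steps, not code from the course: the function name `exhaustive_search` is an assumption, and the points are computed as $a + i\,\Delta w$ instead of by repeated addition, which is equivalent but avoids accumulated floating-point drift. The test function $w^2 + 54/w$ is taken from the earlier slides; the interval $(1, 5)$ and $n = 8$ are illustrative choices.

```python
def exhaustive_search(J, a, b, n):
    """Bracket the minimum of a unimodal J on (a, b) using n equal steps.

    Returns (w1, w3) such that J(w1) >= J(w2) <= J(w3), or None when no
    such triple exists (the min may then be at a boundary point).
    """
    dw = (b - a) / n                      # Step 1: step size
    for i in range(n - 1):                # Step 3: keeps w3 <= b
        w1 = a + i * dw
        w2 = w1 + dw
        w3 = w2 + dw
        if J(w1) >= J(w2) <= J(w3):       # Step 2: bracketing test
            return (w1, w3)
    return None


def J(w):
    return w**2 + 54 / w


bracket = exhaustive_search(J, 1.0, 5.0, 8)
print(bracket)
```

The returned interval has width $2\Delta w$, so a larger `n` gives a tighter bracket at the cost of more function evaluations.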


Region elimination method
Overall idea

• For two points $w_1 \leq w_2$ in $(a, b)$, if $J(w_1) \geq J(w_2)$
• the region $(a, w_1)$ can be eliminated

[Figure: $J(w)$ with $a$, $w_1$, $w_2$ and $b$ marked on the $w$ axis, and a further point $w_3$]


Region elimination method
Steps

• Step 1
• Choose $a$, $b$, $\epsilon$; set $w_m = (a + b)/2$, $L = b - a$
• Compute $J(w_m)$

[Figure: $J(w)$ with $a$, $w_1$, $w_m$, $w_2$ and $b$ marked on the $w$ axis]


Region elimination method
Steps

• Step 2
• Set $w_1 = a + L/4$, $w_2 = b - L/4$
• Compute $J(w_1)$, $J(w_2)$

[Figure: $J(w)$ with $a$, $w_1$, $w_m$, $w_2$ and $b$ marked on the $w$ axis]


Region elimination method
Steps

• Step 3
• If $J(w_1) < J(w_m)$
• set $b = w_m$, $w_m = w_1$; go to Step 5
• Else
• go to Step 4

[Figure: $J(w)$ with $a$, $w_1$, $w_m$, $w_2$ and $b$ marked on the $w$ axis]


Region elimination method
Steps

• Step 4
• If $J(w_2) < J(w_m)$
• set $a = w_m$, $w_m = w_2$; go to Step 5
• Else
• set $a = w_1$, $b = w_2$; go to Step 5

[Figure: $J(w)$ with $a$, $w_1$, $w_m$, $w_2$ and $b$ marked on the $w$ axis, and the reduced interval $a$, $w_m$, $b$]


Region elimination method
Steps

• Step 5
• Calculate $L = b - a$
• If $|L| < \epsilon$
• Terminate
• Else
• go to Step 2

[Figure: $J(w)$ with $a$, $w_1$, $w_m$, $w_2$ and $b$ marked on the $w$ axis]
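
Steps 1 to 5 above can be put together as in the sketch below. The function name `interval_halving`, the `max_iter` safety cap and the starting interval $(0.5, 5)$ are assumptions added for illustration; the test function $w^2 + 54/w$ comes from the earlier slides.

```python
def interval_halving(J, a, b, eps=1e-3, max_iter=1000):
    """Shrink (a, b) around the minimum of a unimodal J until b - a < eps."""
    wm = (a + b) / 2                      # Step 1: midpoint and length
    L = b - a
    Jm = J(wm)
    for _ in range(max_iter):             # safety cap, not in the slides
        w1 = a + L / 4                    # Step 2: quarter points
        w2 = b - L / 4
        J1, J2 = J(w1), J(w2)
        if J1 < Jm:                       # Step 3: keep (a, wm)
            b, wm, Jm = wm, w1, J1
        elif J2 < Jm:                     # Step 4: keep (wm, b)
            a, wm, Jm = wm, w2, J2
        else:                             # Step 4, else: keep (w1, w2)
            a, b = w1, w2
        L = b - a                         # Step 5: termination check
        if abs(L) < eps:
            break
    return a, b


def J(w):
    return w**2 + 54 / w


lo, hi = interval_halving(J, 0.5, 5.0)
print(lo, hi)
```

Each pass halves the interval and reuses $J(w_m)$, so only two new function evaluations are needed per iteration.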


Gradient-based approaches

$J'(w) = dJ(w)/dw$

• Uses $J'(w) = dJ(w)/dw$ and other higher-order derivatives.
• Exact / numerical approach for the derivative
• Min: the point where $J'(w) \approx 0$
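
The slides do not say which numerical scheme is meant. One common choice, assumed here, is the central difference; the function names and step sizes `h` below are illustrative, not prescribed by the course.

```python
def first_derivative(J, w, h=1e-5):
    """Central-difference estimate of J'(w)."""
    return (J(w + h) - J(w - h)) / (2 * h)


def second_derivative(J, w, h=1e-4):
    """Central-difference estimate of J''(w)."""
    return (J(w + h) - 2 * J(w) + J(w - h)) / h**2


def J(w):
    return w**2 + 54 / w


# At the minimum w = 3 the slope is close to zero and the curvature positive.
print(first_derivative(J, 3.0))
print(second_derivative(J, 3.0))
```

The step size is a trade-off: too large and the truncation error grows, too small and floating-point round-off dominates.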


Newton-Raphson

$J'(w) = dJ(w)/dw$

• Uses $w^{(k)}$ to compute $w^{(k+1)}$
• Using Taylor's approximation,
• $w^{(k+1)} = w^{(k)} - \frac{J'(w^{(k)})}{J''(w^{(k)})}$
• $k = 1$ to $N$ (number of iterations)
• An initial guess $w^{(1)}$ is needed.

[Figure: curve plotted against $w$ with the initial guess $w^{(1)}$ marked]


Newton-Raphson
Steps

• Step 1: Choose $w^{(1)}$ and $\epsilon$, set $k = 1$. Compute $J'(w^{(1)})$.
• Step 2: Compute $J''(w^{(k)})$.
• Step 3: Calculate $w^{(k+1)} = w^{(k)} - \frac{J'(w^{(k)})}{J''(w^{(k)})}$. Compute $J'(w^{(k+1)})$.
• Step 4: If $|J'(w^{(k+1)})| < \epsilon$, terminate. Else set $k = k + 1$ and go to Step 2.
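
The four steps can be sketched as below, here with exact derivatives supplied by the caller. The function name `newton_raphson`, the `max_iter` cap and the initial guess $w^{(1)} = 1$ are assumptions for illustration; the derivatives are those of $J(w) = w^2 + 54/w$ from the earlier slides.

```python
def newton_raphson(dJ, d2J, w, eps=1e-6, max_iter=50):
    """Find w with |J'(w)| < eps, starting from the initial guess w."""
    g = dJ(w)                             # Step 1: J'(w^(1))
    for _ in range(max_iter):             # k = 1 .. N
        w = w - g / d2J(w)                # Steps 2-3: Newton update
        g = dJ(w)                         # J'(w^(k+1))
        if abs(g) < eps:                  # Step 4: termination test
            break
    return w


def dJ(w):
    return 2 * w - 54 / w**2              # J'(w) for J = w^2 + 54/w


def d2J(w):
    return 2 + 108 / w**3                 # J''(w)


w_min = newton_raphson(dJ, d2J, 1.0)
print(w_min)
```

The update divides by $J''(w^{(k)})$, so the sketch assumes the second derivative is nonzero along the way; it also only drives $J'$ to zero, so the sign of $J''$ at the result still has to be checked to confirm a minimum.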
