
Optimization in DS
Katholische Universität Eichstätt-Ingolstadt
Winter semester 2023/2024
Thomas Jahn

Cheat sheet
Notation
∇f(x) gradient, ∇²f(x) Hessian
A ⪰ B means A − B is positive semidefinite
A ≻ B means A − B is positive definite
x ≥ 0 means that all entries of the vector x are ≥ 0.

Special functions and sets


linear function f(x) = Ax with a matrix A
affine function f(x) = Ax + b with a matrix A and a vector b
quadratic function f(x) = x⊤Ax + b⊤x + r with a matrix A, a vector b, and a number r
µ-strongly convex function (just “convex” if µ = 0):
f(λx + (1 − λ)y) + (µ/2)λ(1 − λ)∥x − y∥₂² ≤ λf(x) + (1 − λ)f(y) for all x, y ∈ ℝᵈ, λ ∈ (0, 1)
f cont. diff.: ⇐⇒ f(x) − f(y) ≥ ∇f(y)⊤(x − y) + (µ/2)∥x − y∥₂² for all x, y ∈ ℝᵈ
f twice cont. diff.: ⇐⇒ ∇²f(x) ⪰ µI for all x ∈ ℝᵈ

L-smooth function:
∥∇f(x) − ∇f(y)∥₂ ≤ L∥x − y∥₂ for all x, y ∈ ℝᵈ
f cont. diff.: ⇐⇒ f(y) + ∇f(y)⊤(x − y) − (L/2)∥x − y∥₂² ≤ f(x) ≤ f(y) + ∇f(y)⊤(x − y) + (L/2)∥x − y∥₂² for all x, y ∈ ℝᵈ
f twice cont. diff.: ⇐⇒ −LI ⪯ ∇²f(x) ⪯ LI for all x ∈ ℝᵈ

convex set S ⊂ ℝᵈ: λx + (1 − λ)y ∈ S for all x, y ∈ S and all λ ∈ [0, 1].
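For a quadratic f(x) = x⊤Ax + b⊤x + r with symmetric A, the Hessian is the constant matrix 2A, so the µ and L of the two characterizations above can be read off its eigenvalues. A minimal numpy sketch (the matrix is made-up illustration data):

    import numpy as np

    # Quadratic f(x) = x^T A x + b^T x + r with symmetric A; its Hessian is 2A.
    A = np.array([[2.0, 0.5],
                  [0.5, 1.0]])  # made-up symmetric matrix
    H = 2 * A                   # constant Hessian of f

    eigvals = np.linalg.eigvalsh(H)  # eigenvalues of the symmetric Hessian
    mu, L = eigvals.min(), eigvals.max()

    # mu > 0          =>  f is mu-strongly convex (∇²f ⪰ µI everywhere)
    # |eigenvalues| ≤ L  =>  f is L-smooth (−LI ⪯ ∇²f ⪯ LI)
    print(f"mu = {mu:.3f}, L = {L:.3f}")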

Algorithms, rates of convergence

• x(k) → x∗ q-linearly ⇐⇒ limk→∞ ∥x(k+1) − x∗∥₂ / ∥x(k) − x∗∥₂ ∈ (0, 1)
• x(k) → x∗ q-superlinearly ⇐⇒ limk→∞ ∥x(k+1) − x∗∥₂ / ∥x(k) − x∗∥₂ = 0
• The convergence is q-quadratic ⇐⇒ limk→∞ ∥x(k+1) − x∗∥₂ / ∥x(k) − x∗∥₂² < ∞
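These limits can be inspected numerically by printing successive error ratios. A small sketch with a made-up sequence x(k) = 2^(−k), which converges q-linearly to x∗ = 0 with ratio 1/2:

    import numpy as np

    # Made-up example: x(k) = 2**(-k) converges to x* = 0 q-linearly.
    x_star = 0.0
    xs = [2.0 ** (-k) for k in range(10)]

    errs = [abs(x - x_star) for x in xs]
    for k in range(len(errs) - 1):
        lin_ratio = errs[k + 1] / errs[k]        # tends to 0.5, in (0,1): q-linear
        quad_ratio = errs[k + 1] / errs[k] ** 2  # bounded would indicate q-quadratic
        print(f"k={k}: linear ratio {lin_ratio:.3f}, quadratic ratio {quad_ratio:.3f}")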

Unconstrained optimization
min f(x) over x ∈ ℝᵈ

• 1st-order necessary condition: x∗ ∈ ℝᵈ is a local minimizer of cont. diff. f : ℝᵈ → ℝ =⇒ ∇f(x∗) = 0
• 2nd-order necessary condition: x∗ ∈ ℝᵈ is a local minimizer of twice cont. diff. f : ℝᵈ → ℝ =⇒ ∇²f(x∗) ⪰ 0
• 2nd-order sufficient condition: f : ℝᵈ → ℝ twice cont. diff., ∇f(x∗) = 0, ∇²f(x∗) ≻ 0 =⇒ x∗ is a strict local minimizer of f
• convex objective function: locally optimal solutions are globally optimal; the set of optimal solutions is convex but may be empty
• strongly convex, cont. diff. objective function: there exists a unique optimal solution

Line search methods


Setup:
• general form: x(k+1) = x(k) + αk p(k)
• search directions: gradient method p(k) = −∇f(x(k)), Newton method p(k) = −∇²f(x(k))⁻¹∇f(x(k))
• step sizes: constant αk = α; Armijo αk = max{ᾱρʲ : j ∈ ℕ₀, f(x(k) + ᾱρʲp(k)) ≤ f(x(k)) + c₁ᾱρʲ∇f(x(k))⊤p(k)} (see the sketch below)
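A minimal numpy sketch of the gradient method with Armijo backtracking following the setup above (the test problem and the parameter choices ᾱ = 1, ρ = 0.5, c₁ = 10⁻⁴ are made-up illustrations, not prescribed here):

    import numpy as np

    def gradient_method_armijo(f, grad, x0, alpha_bar=1.0, rho=0.5, c1=1e-4,
                               tol=1e-8, max_iter=1000):
        """Gradient method x(k+1) = x(k) + alpha_k p(k) with p(k) = -grad f(x(k))
        and Armijo backtracking for alpha_k."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) <= tol:  # stationary point reached
                break
            p = -g                        # steepest-descent direction
            alpha = alpha_bar
            # Backtrack: shrink alpha by rho until the Armijo condition holds.
            while f(x + alpha * p) > f(x) + c1 * alpha * g @ p:
                alpha *= rho
            x = x + alpha * p
        return x

    # Made-up test problem: strongly convex quadratic with minimizer (1, -2).
    f = lambda x: (x[0] - 1) ** 2 + 2 * (x[1] + 2) ** 2
    grad = lambda x: np.array([2 * (x[0] - 1), 4 * (x[1] + 2)])
    print(gradient_method_armijo(f, grad, x0=[0.0, 0.0]))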

Theorems:
• f cont. diff., gradient method with Armijo step size =⇒ every accumulation point of (x(k))k∈ℕ₀ is a stationary point of f
• f cont. diff., µ-strongly convex, L-smooth, gradient method with αk = 1/L =⇒ (x(k))k∈ℕ₀ and (f(x(k)))k∈ℕ₀ converge q-linearly
• f twice cont. diff., ∇f(x∗) = 0, ∇²f(x∗) invertible =⇒ for x(0) ≈ x∗, αk = 1, and Newton directions p(k), the convergence x(k) → x∗ is q-superlinear (q-quadratic when ∇²f is Lipschitz)
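A minimal sketch of the local Newton method with αk = 1 (the test problem is made up, and a robust implementation would safeguard against a singular Hessian):

    import numpy as np

    def newton_method(grad, hess, x0, tol=1e-10, max_iter=50):
        """Local Newton method: x(k+1) = x(k) - hess f(x(k))^{-1} grad f(x(k))."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) <= tol:
                break
            p = np.linalg.solve(hess(x), -g)  # Newton direction, no explicit inverse
            x = x + p                          # step size alpha_k = 1
        return x

    # Made-up problem: f(x, y) = exp(x) - x + (y + 2)^2, minimizer (0, -2),
    # with invertible Hessian at the minimizer, as the theorem assumes.
    grad = lambda x: np.array([np.exp(x[0]) - 1, 2 * (x[1] + 2)])
    hess = lambda x: np.array([[np.exp(x[0]), 0.0], [0.0, 2.0]])
    print(newton_method(grad, hess, x0=[1.0, 0.0]))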
Linear optimization min c⊤x over x ∈ ℝᵈ s.t. constraints of the form a⊤x ≤ b, a⊤x ≥ b, or a⊤x = b for given a, c ∈ ℝᵈ and b ∈ ℝ.

• c is called the cost vector.


• standard form: min c⊤ x s.t. Ax = b and x ≥ 0. We require b ≥ 0.
• Locally optimal solutions are global ones. (Linearity implies convexity.)
• Set of optimal solutions = a face of the feasible set (a polyhedral set), i.e., a vertex, an edge, . . . , or the whole
feasible set.
• If there are optimal solutions, there is a solution among the vertices of the feasible set.
• simplex algorithm: cleverly traverses the vertices of the feasible set (a sketch of one iteration follows after this list)
• reduced costs: solve AB⊤y = cB, then zN = cN − AN⊤y

• Optimality test: zN ≥ 0 ⇒ STOP


• Choose k ∈ N with zk < 0 and solve AB w = Ak . Unboundedness test: w ≤ 0 ⇒ STOP.
• Ratio test: compute xj/wj for j ∈ B with wj > 0; call the smallest value t and the corresponding index r.
• Update xB ← xB − tw, xk ← t, k enters B, r leaves B.
• Phase 1 problem: min v1 + . . . + vm over (x, v) s.t. Ax + v = b, x ≥ 0, v ≥ 0, start simplex algorithm with
feasible basis solution (0, b)
• dual problem: max b⊤ y over (y, z) s.t. A⊤ y + z = c, z ≥ 0
• dual variables y, z are computed in the simplex algorithm
• weak duality: b⊤y ≤ c⊤x, i.e., dual values are less than or equal to primal values
• no duality gap: If b⊤ y = c⊤ x, then x is optimal for the primal problem, (y, z) optimal for the dual problem
• strong duality: If one problem is solvable, then both of them are, and the optimal values coincide. If one
problem is unbounded, then the other is infeasible.
• Lagrangian: L(x, y) = c⊤ x + (b − Ax)⊤ y
• KKT conditions: Ax = b, x ≥ 0, A⊤ y + z = c, z ≥ 0, x⊤ z = 0
• KKT conditions have a solution if and only if the primal problem has an optimal solution if and only if the
dual problem has an optimal solution.
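A minimal numpy sketch of one revised-simplex iteration following the steps above, including the dual variables y and reduced costs zN (dense linear algebra, no anti-cycling rule; the data layout is an assumption for illustration):

    import numpy as np

    def simplex_iteration(A, b, c, basis):
        """One revised-simplex step for min c^T x s.t. Ax = b, x >= 0.
        basis is a list of m column indices with A_B invertible.
        Returns (status, basis, x_B); the basic solution is recomputed each call."""
        m, n = A.shape
        nonbasis = [j for j in range(n) if j not in basis]
        A_B, A_N = A[:, basis], A[:, nonbasis]
        x_B = np.linalg.solve(A_B, b)             # current basic feasible solution
        y = np.linalg.solve(A_B.T, c[basis])      # dual variables: A_B^T y = c_B
        z_N = c[nonbasis] - A_N.T @ y             # reduced costs
        if np.all(z_N >= 0):                      # optimality test
            return "optimal", basis, x_B
        k = nonbasis[int(np.argmin(z_N))]         # entering index with z_k < 0
        w = np.linalg.solve(A_B, A[:, k])         # solve A_B w = A_k
        if np.all(w <= 0):                        # unboundedness test
            return "unbounded", basis, x_B
        # Ratio test: t = min x_j / w_j over w_j > 0; position i_r leaves.
        ratios = [(x_B[i] / w[i], i) for i in range(m) if w[i] > 0]
        t, i_r = min(ratios)
        basis = basis.copy()
        basis[i_r] = k                            # k enters B, r leaves B
        return "continue", basis, None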

Nonlinear constrained optimization min f(x) over x ∈ ℝᵈ s.t. gj(x) ≤ 0, hi(x) = 0 with f, gj, hi continuously
differentiable; let C be the feasible set.
• tangent cone TC(x) = { y ∈ ℝᵈ : ∃ (x(k))k∈ℕ ⊂ C, (αk)k∈ℕ ⊂ (0, ∞) with x(k) → x, αk → 0, (x(k) − x)/αk → y }
• first-order necessary optimality condition: x∗ is a locally optimal solution ⇒ ∇f(x∗)⊤y ≥ 0 for all y ∈ TC(x∗)
• polar cone: K◦ = { y : y⊤x ≤ 0 for all x ∈ K }

• first-order necessary optimality condition again: x∗ is a locally optimal solution ⇒ −∇f(x∗) ∈ TC(x∗)◦
• linearized tangent cone F(x) = { y ∈ ℝᵈ : ∇hi(x)⊤y = 0 ∀ i, ∇gj(x)⊤y ≤ 0 ∀ j with gj(x) = 0 }
• always TC (x) ⊆ F (x) and F (x)◦ ⊆ TC (x)◦
• F(x) depends on the constraint functions, TC(x) only depends on the feasible set
• F(x)◦ = { Σj:gj(x)=0 µj∇gj(x) + Σi λi∇hi(x) : µj ≥ 0, λi ∈ ℝ }
• F (x)◦ = TC (x)◦ is true when constraint qualifications are satisfied
• LICQ is satisfied at x: all the vectors ∇gj(x) and ∇hi(x) are linearly independent (all i, only those j for which gj(x) = 0); a numerical check is sketched below
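LICQ can be checked numerically by stacking the gradients of all equality constraints and the active inequality constraints and comparing the matrix rank with the number of rows. A small sketch (the constraint functions are made-up illustrations):

    import numpy as np

    def licq_holds(x, grad_g_list, g_list, grad_h_list, tol=1e-9):
        """Check LICQ at x: gradients of all equality constraints and of the
        active inequality constraints (g_j(x) = 0) must be linearly independent."""
        rows = [gh(x) for gh in grad_h_list]                   # all grad h_i(x)
        rows += [gg(x) for gg, g in zip(grad_g_list, g_list)
                 if abs(g(x)) <= tol]                          # active grad g_j(x)
        if not rows:
            return True                                        # nothing to check
        M = np.vstack(rows)
        return np.linalg.matrix_rank(M) == len(rows)

    # Made-up example: g1(x) = x1^2 + x2^2 - 1 <= 0, h1(x) = x1 - x2 = 0,
    # checked at the feasible boundary point x = (1/sqrt(2), 1/sqrt(2)).
    g1 = lambda x: x[0] ** 2 + x[1] ** 2 - 1
    grad_g1 = lambda x: np.array([2 * x[0], 2 * x[1]])
    h1 = lambda x: x[0] - x[1]
    grad_h1 = lambda x: np.array([1.0, -1.0])
    x = np.array([1, 1]) / np.sqrt(2)
    print(licq_holds(x, [grad_g1], [g1], [grad_h1]))  # True: gradients independent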
