
Lecture 1: Introduction

DSA3102 Essential Data Analytics Tools: Convex Optimisation

Lam Xin Yee

[email protected]
S17-08-11

Past contributors of the material: Prof Tan Geok Choo, Prof Toh Kim Chuan, Prof Pang Chin How Jeffrey.
Pre-requisites

Pre-requisites: linear algebra, multivariate calculus, some programming knowledge (preferably MATLAB or Python).

Python will be used to present algorithms in the lectures most of the time, so it is a good idea to install it on your computer. Refer to https://www.python.org/ for an installation guide.
A handy interactive web tool for running Python: Jupyter Notebook (https://jupyter.org/)
For some algorithms, MATLAB code will be provided as supplementary material. Refer to https://nusit.nus.edu.sg/services/software_and_os/software/software-student/ for the MATLAB installation guide.
Optimization model
Optimization models express the goal of solving a problem in the "best" way in mathematical terms.

Examples:
Optimal time management
Optimal allocation of resources
Optimal design of manufacturing processes and instruments
Techniques in machine learning

Types of optimization models:
1 Linear objective function over linear constraints (MA3252 Linear and Network Optimization)
2 Nonlinear objective function over convex sets (this course)
3 Linear/nonlinear objective function over discrete sets (MA4254 Discrete Optimization)
4 ...
Simple examples

Example I. If K units of capital and L units of labor are used, a company can produce KL units of a product.
Capital can be purchased at $4 per unit and labor can be purchased at $1 per unit.
A total of $8000 is available to purchase capital and labor.
How can the firm maximize the quantity of the product manufactured?

Solution. Let K = units of capital purchased and L = units of labor purchased. The problem to solve is:

maximize KL
s.t. 4K + L ≤ 8000, K, L ≥ 0.
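As a quick numerical check, here is a minimal sketch using scipy.optimize (SLSQP is a local method, but for this problem it reaches the optimum K = 1000, L = 4000 from the starting point below):

    from scipy.optimize import minimize

    # Maximize K*L  <=>  minimize -K*L, subject to 4K + L <= 8000 and K, L >= 0.
    objective = lambda z: -(z[0] * z[1])                                # z = [K, L]
    budget = {"type": "ineq", "fun": lambda z: 8000 - 4 * z[0] - z[1]}  # ">= 0" form

    res = minimize(objective, x0=[1.0, 1.0], method="SLSQP",
                   constraints=[budget], bounds=[(0, None), (0, None)])
    print(res.x, -res.fun)   # approximately [1000, 4000] and 4,000,000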
Simple examples

Example II. It costs a company $c to produce a unit of a product. If the company charges $p per unit, and the customers demand D(p) units, what price should the company charge to maximize its profit?
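The slide leaves D(p) general; purely as an illustration, here is a sketch with an assumed linear demand model D(p) = a − bp (the symbols a and b are our assumption, not part of the example), solved symbolically with sympy:

    import sympy as sp

    p, c, a, b = sp.symbols("p c a b", positive=True)
    D = a - b * p                          # assumed linear demand model
    profit = (p - c) * D                   # (price - unit cost) * units demanded
    p_star = sp.solve(sp.diff(profit, p), p)[0]
    print(p_star)                          # (a + b*c)/(2*b)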
Simple examples
Example III. Two products A, B are produced on the same machine using
the same raw material, of which 200kg are available.
Product A uses 2kg and product B uses 3kg of the material per unit
produced.
The machine is available for 50 hours.
Product A requires 30 minutes and Product B requires 20 minutes of
machine time for each unit produced.
If the profit for one unit of product A and B are $150 and $300 respectively, and the manufacturing cost is 3x1^2 for x1 units of Product A and 5x2^2 for x2 units of Product B, determine how many units of A and B should be produced to maximize the total net profit.
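A sketch of how this exercise could be set up numerically. The formulation below (decision variables x1, x2; material constraint 2x1 + 3x2 ≤ 200; machine-time constraint 0.5x1 + x2/3 ≤ 50, converting minutes to hours) is our reading of the problem statement:

    from scipy.optimize import minimize

    # Net profit 150*x1 + 300*x2 - 3*x1^2 - 5*x2^2; maximize by minimizing the negative.
    neg_profit = lambda x: -(150 * x[0] + 300 * x[1] - 3 * x[0]**2 - 5 * x[1]**2)

    cons = [
        {"type": "ineq", "fun": lambda x: 200 - 2 * x[0] - 3 * x[1]},      # raw material (kg)
        {"type": "ineq", "fun": lambda x: 50 - 0.5 * x[0] - x[1] / 3.0},   # machine time (h)
    ]

    res = minimize(neg_profit, x0=[10.0, 10.0], method="SLSQP",
                   constraints=cons, bounds=[(0, None), (0, None)])
    print(res.x, -res.fun)   # production plan and the corresponding net profit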
Example: portfolio selection

Consider an investor who has a certain amount of money to be invested in a number of different securities (stocks, bonds, etc.) with random returns.
For each security i = 1, . . . , n, estimates of its expected return µi and variance σi^2 are given.
For any two securities i and j, their correlation coefficient ρij is also assumed to be known.
Let the proportion of the total funds invested in security i be xi. The vector x = [x1; . . . ; xn] is called a portfolio vector.

expected return of x = E[x] = x1 µ1 + · · · + xn µn = µ^T x
variance of x = Var[x] = Σ_{i,j} ρij σi σj xi xj = x^T Q x

where Qij = ρij σi σj and µ = [µ1; . . . ; µn].

Also, Σ_i xi = 1, xi ≥ 0 ∀ i = 1, . . . , n.
Example: portfolio selection (cont’d)

For a given target expected return R, a valid portfolio vector x is called efficient if it has the minimum variance among all portfolios that have expected return at least R.

Markowitz's efficient portfolio (also called mean-variance) optimization problem:

min_x  x^T Q x
s.t.   Σ_{i=1}^n xi = 1,
       µ^T x ≥ R,
       xi ≥ 0, i = 1, . . . , n.
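A minimal sketch of solving this model with scipy.optimize; the three-security data (µ, σ, ρ) below are made-up numbers for illustration, not course data:

    import numpy as np
    from scipy.optimize import minimize

    mu = np.array([0.08, 0.12, 0.15])            # assumed expected returns
    sigma = np.array([0.10, 0.20, 0.30])         # assumed standard deviations
    rho = np.array([[1.0, 0.3, 0.1],
                    [0.3, 1.0, 0.4],
                    [0.1, 0.4, 1.0]])            # assumed correlation coefficients
    Q = rho * np.outer(sigma, sigma)             # Q_ij = rho_ij * sigma_i * sigma_j
    R = 0.10                                     # target expected return

    cons = [{"type": "eq",   "fun": lambda x: x.sum() - 1},   # sum_i x_i = 1
            {"type": "ineq", "fun": lambda x: mu @ x - R}]    # mu^T x >= R

    res = minimize(lambda x: x @ Q @ x, np.full(3, 1 / 3), method="SLSQP",
                   constraints=cons, bounds=[(0, None)] * 3)
    print(res.x, res.fun)                        # efficient portfolio and its variance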
Example: linear regression
Application: to estimate a quantity of interest from several observed
variables
E.g. given floor area, location, built year, house type etc., predict the price of a house
Data: (a1, b1), ..., (am, bm)
Input: ai ∈ Rp
Output: bi ∈ R
For a linear model, assume:

bi = x̄1 ai1 + x̄2 ai2 + · · · + x̄p aip + α + εi,

where x̄ ∈ Rp and α ∈ R are the unknown coefficient vector and offset, and εi is stochastic noise that satisfies various assumptions (e.g. independent and identically distributed (i.i.d.), normally distributed).
We wish to learn the parameters x̄ and α from the data.
Example: linear regression (cont’d)

Figure: data points (ai, bi), a fitted line with y-intercept α = 0.5, and the errors εi.

This goal can be formulated as the optimization problem

min { (1/2) Σ_{i=1}^m (bi − ai^T x̄ − α)^2 | x̄ ∈ Rp, α ∈ R }
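This least-squares problem can be solved directly; a minimal sketch with NumPy on synthetic data (appending a column of ones to absorb the offset α):

    import numpy as np

    rng = np.random.default_rng(0)
    m, p = 50, 3
    A = rng.standard_normal((m, p))                   # rows are the inputs a_i
    x_true, alpha_true = np.array([1.0, -2.0, 0.5]), 0.5
    b = A @ x_true + alpha_true + 0.1 * rng.standard_normal(m)   # noisy outputs b_i

    A1 = np.hstack([A, np.ones((m, 1))])              # column of ones for the offset
    sol, *_ = np.linalg.lstsq(A1, b, rcond=None)      # minimizes ||A1 z - b||^2
    x_hat, alpha_hat = sol[:p], sol[p]
    print(x_hat, alpha_hat)                           # close to x_true and alpha_true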
Example: sensor network localization and molecular
conformation
Aim: find the positions of atoms in a molecule (typically a protein
molecule) given distances estimated from Nuclear Magnetic Resonance
spectroscopy.

Figure: A sensor network with 4 anchors (filled squares). An edge in the graph
means that the sensor-sensor or sensor-anchor pairs are within the radio range R.
Example: sensor network localization and molecular
conformation (cont’d)
[Unknown] Coordinates of n sensors: xj ∈ Rd, j = 1, . . . , n
[Known]
Coordinates of m anchors: ai ∈ Rd, i = 1, . . . , m
Pairwise distances of the sensors and anchors within the radio range R:
||ai − xj|| ≈ fij ∀ (i, j) ∈ M, M := {(i, j) : ||ai − xj|| ≤ R}
||xi − xj|| ≈ dij ∀ (i, j) ∈ N, N := {(i, j) : ||xi − xj|| ≤ R}
|| · || is the Euclidean norm (length) defined by
||y|| = √(y1^2 + y2^2 + · · · + yn^2), for a vector y ∈ Rn.
Because of noise, the distances fij and dij are not estimated exactly but only approximately.
The problem to solve is the following:

min_{x1,...,xn ∈ Rd}  Σ_{(i,j)∈N} (||xi − xj||^2 − dij^2)^2 + Σ_{(i,j)∈M} (||ai − xj||^2 − fij^2)^2.
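A minimal sketch of attacking this (nonconvex) problem with a general-purpose local solver on a tiny synthetic instance in d = 2; for simplicity we assume every pair is within radio range, so M and N contain all pairs:

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    d, n = 2, 5
    anchors = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)  # m = 4 anchors
    X_true = rng.uniform(0, 1, size=(n, d))                            # unknown sensors

    f_ij = np.linalg.norm(anchors[:, None] - X_true[None, :], axis=2)  # anchor-sensor dists
    d_ij = np.linalg.norm(X_true[:, None] - X_true[None, :], axis=2)   # sensor-sensor dists

    def objective(z):
        X = z.reshape(n, d)
        e_ss = (np.linalg.norm(X[:, None] - X[None, :], axis=2)**2 - d_ij**2)**2
        e_as = (np.linalg.norm(anchors[:, None] - X[None, :], axis=2)**2 - f_ij**2)**2
        return 0.5 * e_ss.sum() + e_as.sum()   # 0.5: each sensor pair is counted twice

    z0 = X_true.ravel() + 0.1 * rng.standard_normal(n * d)   # start near the truth
    res = minimize(objective, z0)
    print(res.fun)   # near zero when the configuration is recovered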
General nonlinear programming (NLP) problems

Minimize (or Maximize)  f(x)
Subject to  x ∈ S ⊆ Rn

The variable x = (x1, x2, · · · , xn)^T is a column vector in Rn.
The function f : Rn → R which we wish to minimize (or maximize) is known as the objective function.
S is known as the feasible set.
A point in the feasible set is called a feasible solution or a feasible point; otherwise, it is an infeasible solution or infeasible point.
Terminology and notation

Minimize (or Maximize)  f(x)
Subject to  x ∈ S ⊆ Rn

(a) For a minimization problem, a feasible solution x∗ for which f(x∗) ≤ f(x) for all feasible solutions x ∈ S is called an optimal solution to the NLP. We can write

x∗ = argmin_{x∈S} f(x)

(b) For a maximization problem, a feasible solution x∗ for which f(x∗) ≥ f(x) for all feasible solutions x ∈ S is called an optimal solution to the NLP. We can write

x∗ = argmax_{x∈S} f(x)

The value of f(x∗) is then called the optimal value.


Unboundedness

Minimize (or Maximize)  f(x)
Subject to  x ∈ S ⊆ Rn

(a) For a minimization problem, the objective value is said to be unbounded (the optimal value is −∞) if ∀ K, ∃ x ∈ S such that f(x) < K.
(b) For a maximization problem, the objective value is said to be unbounded (the optimal value is +∞) if ∀ K, ∃ x ∈ S such that f(x) > K.
In these cases, we say the NLP is unbounded.
Symmetry

The following optimization problems are equivalent:

Maximize f(x)                Minimize −f(x)
Subject to x ∈ S             Subject to x ∈ S

x∗ ∈ S is an optimal solution for the maximization problem with optimal objective value f(x∗) if and only if it is also an optimal solution for the minimization problem with optimal objective value −f(x∗).
Otherwise, both problems are infeasible or both problems have unbounded objective value.
We will mostly focus on the discussion of minimization problems.
Topics
An unconstrained nonlinear programme

Minimize (or Maximize)  f(x)
Subject to  x ∈ X

The objective function f : Rn → R is a nonlinear function of x.
The feasible set X is an open subset of Rn.

Definition 1.1 (Open set)
A subset S ⊆ Rn is open if for every x ∈ S there exists ε > 0 such that the open ball B(x; ε) ⊆ S. Here, the open ball centered at x having radius ε is defined by
B(x; ε) := {y ∈ Rn : ||y − x|| < ε}.

Examples of unconstrained NLPs: linear regression, the molecular conformation problem.
A constrained nonlinear programme

Minimize (or Maximize)  f(x)
Subject to  gi(x) = 0, i = 1, 2, 3, · · · , m,
            hj(x) ≤ 0, j = 1, 2, 3, · · · , p,

f : Rn → R is the objective function.
Each gi : Rn → R defines an equality constraint.
Each hj : Rn → R defines an inequality constraint.
Some of the functions f, gi, hj are nonlinear. In this course, we assume that f, gi, hj are continuous functions.
The feasible set
S := {x ∈ Rn | gi(x) = 0, i = 1, 2, ..., m, hj(x) ≤ 0, j = 1, 2, ..., p}
is a closed subset of Rn.
Example of a constrained NLP: the portfolio selection model.
Closed set
Remark. A set S is closed if its complement is open.
Example 1.2
Determine whether the following sets are open or closed.
1 {x ∈ R | a < x < b}
2 {x = [x1; . . . ; xn] ∈ Rn | ai ≤ xi ≤ bi, i = 1, 2, · · · , n}
3 Rn
4 ∅
By default, an NLP where the feasible set is Rn is classified as an unconstrained NLP.
Closed set
The following result is useful for showing that a set is closed.
Proposition 1
Let g : Rn → R be a continuous function. Then
(a) The set S = {x ∈ Rn | g(x) ≤ 0} is closed.
(b) The set S = {x ∈ Rn | g(x) ≥ 0} is closed.
(c) The set S = {x ∈ Rn | g(x) = 0} is closed.

Example 1.3
Verify that the set C = {[x; x^2] : x ∈ R} ⊆ R2 is closed.
Solution. Note that
C = {[x; x^2] : x ∈ R} = {[x1; x2] ∈ R2 | g(x) := x2 − x1^2 = 0}.
Since g is continuous, C is closed by Proposition 1(c).
Closed set

Remark. Intersections and finite unions of closed sets are closed.

Corollary 1.4
Suppose gi, hj : Rn → R are continuous. The set

S = {x ∈ Rn | gi(x) = 0, i = 1, 2, · · · , m; hj(x) ≤ 0, j = 1, 2, · · · , p}

is closed.
Note that S is the feasible set of the constrained NLP.

Example 1.5
The set {[x1; x2] ∈ R2 | x1^2 + x2^2 ≤ 3, x1 − 2 sin x2 ≥ 0} is closed.
Example 1.6
Change the following into the standard formulation:

max_{x1,x2}  x1 x2
s.t.  x1 ≥ 0, x2 ≥ 0,
      x1 + x2 = 24.

Is this an unconstrained or constrained NLP?


In the remaining section, we will learn
how to solve simple optimization models (in R2) using graphical methods,
the notion of local vs global minimizers, and
the special case where global optimizers are guaranteed to exist.
Example 1.7 (An NLP (in R2) with linear constraints but a nonlinear objective function.)
minimize f(x) = (x1 − 4)^2 + (x2 − 6)^2
subject to x1 ≤ 4
x2 ≤ 6
3x1 + 2x2 ≤ 18
x1, x2 ≥ 0.

• (x1 − 4)^2 + (x2 − 6)^2 = r^2 (r > 0) describes a circle with center (4, 6)^T and radius r.
• Graphically, we want to find the shortest distance of feasible points from the point (4, 6)^T.
• We see that the minimizer x∗ must occur on the boundary defined by the line 3x1 + 2x2 = 18.

Figure: A 2-variable minimization problem with the minimizer occurring on the boundary of the feasible region. Contour values are 0, 1, 2, 3, 4.
Example 1.7 (cont’d)

minimize f(x) = (x1 − 4)^2 + (x2 − 6)^2
subject to x1 ≤ 4
x2 ≤ 6
3x1 + 2x2 ≤ 18
x1, x2 ≥ 0.

We can reduce the 2-variable problem to a single-variable optimization problem.
The optimal solution occurs on the line 3x1 + 2x2 = 18.
Substituting x2 = 9 − (3/2)x1 into f(x), we obtain a function of x1:

f(x) = (x1 − 4)^2 + (3 − 1.5x1)^2 =: g(x1), 2 ≤ x1 ≤ 4.

Use 1-variable calculus to determine a global minimizer of g(x1):
g'(x1) = 2(x1 − 4) − 3(3 − 1.5x1) = (13/2)x1 − 17 = 0 yields x1 = 34/13.
The optimal solution of the above NLP is x∗ = [34/13; 66/13], with optimal value f(x∗) = 468/169.
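A quick symbolic check of this calculation (a sketch using sympy):

    import sympy as sp

    x1 = sp.symbols("x1")
    g = (x1 - 4)**2 + (3 - sp.Rational(3, 2) * x1)**2   # f restricted to 3x1 + 2x2 = 18
    x1_star = sp.solve(sp.diff(g, x1), x1)[0]           # solves g'(x1) = 0
    print(x1_star, sp.simplify(g.subs(x1, x1_star)))    # 34/13 and 468/169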
Example 1.7 (cont’d)

minimize f(x) = (x1 − 4)^2 + (x2 − 6)^2
subject to x1 ≤ 4
x2 ≤ 6
3x1 + 2x2 ≤ 18
x1, x2 ≥ 0.

Question. Can we always use the above technique to reduce the number of variables?

Note: No! In the above, we assumed that we know the optimal solution is on the boundary defined by the constraint 3x1 + 2x2 = 18. Such information is in general not known a priori.
Example 1.7 (cont’d)

minimize f(x) = (x1 − 4)^2 + (x2 − 6)^2
subject to x1 ≤ 4
x2 ≤ 6
3x1 + 2x2 ≤ 18
x1, x2 ≥ 0.

Programming practice: write Python code to generate the contour figure for this problem; a possible sketch follows.
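One possible sketch for this exercise using matplotlib (the grid range and shading choices below are ours):

    import numpy as np
    import matplotlib.pyplot as plt

    x1, x2 = np.meshgrid(np.linspace(0, 9, 400), np.linspace(0, 9, 400))
    f = (x1 - 4)**2 + (x2 - 6)**2

    # Feasible region: x1 <= 4, x2 <= 6, 3x1 + 2x2 <= 18, x1, x2 >= 0.
    feasible = (x1 <= 4) & (x2 <= 6) & (3 * x1 + 2 * x2 <= 18)

    plt.contour(x1, x2, f, levels=[0, 1, 2, 3, 4], colors="gray")
    plt.imshow(feasible, extent=(0, 9, 0, 9), origin="lower",
               cmap="Blues", alpha=0.3, aspect="auto")
    plt.plot(34 / 13, 66 / 13, "r*", markersize=12)   # minimizer on the boundary
    plt.xlabel("x1"); plt.ylabel("x2")
    plt.show()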


Example 1.8 (Same as Example 1.7 but with a different objective function.)
minimize f(x) = (x1 − 2)^2 + (x2 − 2)^2
subject to x1 ≤ 4
x2 ≤ 6
3x1 + 2x2 ≤ 18
x1, x2 ≥ 0.

• Minimizer x∗ = [2; 2] lies in the interior of the feasible region.

Figure: A 2-variable minimization problem with the minimizer occurring in the interior of the feasible region. Contour values are 0, 1, 2, 3, 4.
Figure: 3D plot of f(x) = (x1 − 2)^2 + (x2 − 2)^2.
Discussion of Examples 1.7 and 1.8

How do we know whether the minimizer will occur on the boundary or in the interior of the feasible region? Is it very important to know this piece of information before we can find the minimizer?
What necessary conditions must the minimizer satisfy?
How do we know a given point x∗ is a minimizer if we cannot visualize the graph of f(x)?

We will discuss these questions in Topic 2 and Topic 4.


Example 1.9 (A 2-variable NLP with a nonlinear objective function and a nonlinear constraint.)

minimize x1^2 − x2
subject to 2x1 − x2 ≤ 4
9x1^2 + 25x2^2 ≤ 225
x1, x2 ≥ 0.

The feasible region is bounded, but not a polygon.
The equation α(x1 − a)^2 + β(x2 − b)^2 = r^2 (α > 0, β > 0) describes an ellipse.

Figure: A 2-variable minimization problem with a non-polygonal feasible region.
Example 1.10 (Same as Example 1.9 but with a different feasible region.)

minimize x1^2 − x2
subject to 2x1 − x2 ≤ 4
9x1^2 + 25x2^2 ≥ 225
x1, x2 ≥ 0.

Its feasible region is unbounded and non-convex.
In this example, is the optimal objective value unbounded as well?

Can we have an optimization problem with an unbounded feasible region, but a finite optimal objective value?
How good is the graphical method?

Pros:
Good for intuition
Provides a rough solution

Cons:
Possible only for problems with 1 or 2 variables
Requires computing function values on a dense grid of points, which is a costly task in general
May need to rely on a graph-plotting toolbox (e.g. Python/MATLAB)

We need to devise other, algebraic methods to solve higher-dimensional problems!
Local vs global minimizers
First, we study the notion of local vs global minimizers.

Figure: plot of f(x) = sin(πx)^2 e^x − x^2 on [0, 3].

What is the minimizer of f over the entire region [0, 3]?
What is the minimizer of f over the small region [1.8, 2.3]?
Inner product/ Dot product

Definition 1.11 (Inner product/dot product)
The inner product of vectors x = [x1; x2; . . . ; xn] and y = [y1; y2; . . . ; yn] in Rn is defined as

⟨x, y⟩ = x^T y = Σ_{i=1}^n xi yi = x1 y1 + x2 y2 + · · · + xn yn.

Note that we also have ⟨x, y⟩ = ||x|| ||y|| cos(θ), where θ is the angle between x and y.
Euclidean norm

Definition 1.12 (Euclidean norm)
Suppose x = (x1, x2, · · · , xn)^T ∈ Rn. The Euclidean norm of x is defined as follows:

||x|| = √(x1^2 + x2^2 + · · · + xn^2).

Note that ||x||^2 = x^T x.

Properties
(a) x ∈ Rn =⇒ ||x|| ≥ 0;
(b) ||x|| = 0 ⇐⇒ x = 0.
x ∈ Rn, λ ∈ R =⇒ ||λx|| = |λ| ||x||.
Triangle inequality: x, y ∈ Rn =⇒ ||x + y|| ≤ ||x|| + ||y||.
Cauchy-Schwarz inequality: x, y ∈ Rn =⇒ |x^T y| ≤ ||x|| · ||y||. Equality holds if and only if x and y are parallel (i.e. x = λy or y = λx, for some λ ∈ R).
Euclidean norm

||x|| = √(x1^2 + x2^2 + · · · + xn^2).

Properties (cont'd):
(a) x^T y = ||x|| · ||y|| ⇐⇒ x = λy or y = λx, for some λ ≥ 0 (i.e. the maximum value of x^T y occurs whenever x and y are vectors in the same direction).
(b) x^T y = −||x|| · ||y|| ⇐⇒ x = λy or y = λx, for some λ ≤ 0 (i.e. the minimum value of x^T y occurs whenever x and y are vectors in the opposite direction).
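A small numerical illustration of these identities and inequalities with NumPy (the vectors are arbitrary):

    import numpy as np

    x, y = np.array([3.0, 4.0]), np.array([1.0, 2.0])

    print(x @ y, np.linalg.norm(x))                                   # inner product, norm
    print(abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y))        # Cauchy-Schwarz: True
    print(np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y))  # triangle: True

    theta = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
    print(theta)   # angle between x and y (radians), from <x, y> = ||x|| ||y|| cos(theta)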
Local minimizer and global minimizer

Definition 1.13 (Local minimizer and global minimizer)
Let S be a subset of Rn. Define B_ε(y) = {x ∈ Rn | ||x − y|| < ε} to be the open ball with center y and radius ε.
1 A point x∗ ∈ S is said to be a local minimizer of f(x) if there exists an ε > 0 such that f(x) ≥ f(x∗) for all x ∈ S ∩ B_ε(x∗).
If f(x) > f(x∗) for all x ∈ S ∩ B_ε(x∗) − {x∗}, then x∗ is said to be a strict local minimizer of f(x).
2 A point x∗ ∈ S is said to be a global minimizer of f(x) if f(x) ≥ f(x∗) for all x ∈ S.
If f(x) > f(x∗) ∀ x ∈ S − {x∗}, then x∗ is said to be a strict global minimizer of f(x).
3 Similarly, for a (strict) local or global maximizer, replace the inequality by f(x) ≤ f(x∗) or f(x) < f(x∗) as appropriate.

By definition, a global minimizer is a local minimizer. However, the converse is not true in general.
Example 1.14
Consider the following 1-dimensional problem:
minimize f(x) := sin(πx)^2 exp(x) − x^2
subject to 0 ≤ x ≤ 3

Figure: plot of f(x) = sin(πx)^2 e^x − x^2 on [0, 3].
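A sketch that locates the two minimizers numerically (a coarse grid search refined by scipy's bounded scalar minimizer; the interval [1.8, 2.3] matches the question on the earlier slide):

    import numpy as np
    from scipy.optimize import minimize_scalar

    f = lambda x: np.sin(np.pi * x)**2 * np.exp(x) - x**2

    for a, b in [(0.0, 3.0), (1.8, 2.3)]:
        grid = np.linspace(a, b, 1001)
        x0 = grid[np.argmin(f(grid))]                     # coarse global search
        res = minimize_scalar(f, bounds=(max(a, x0 - 0.1), min(b, x0 + 0.1)),
                              method="bounded")           # local refinement
        print(f"minimizer on [{a}, {b}]: x = {res.x:.4f}, f = {res.fun:.4f}")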
Example 1.15
Consider the following NLP in R2 .
minimize f (x) := x2
subject to 10 − (x1 − 3)(x1 − 1)^2 − x2 ≤ 0
0 ≤ x1 ≤ 4

1 x∗ = [4; 1] is a global minimizer.
2 x̂ = [1; 10] is a local minimizer. It is a feasible point and it gives a minimum objective value in the local vicinity of x̂ = [1; 10].
3 x̂ is a local minimizer but not a global minimizer.

Figure: the feasible region, with the global minimizer x∗ and the local minimizer x̂.
Continuous function on a closed and bounded set
When are we guaranteed to have a global maximizer/minimizer?

We study the special case of a continuous function on a closed and bounded set!

Definition 1.16 (Bounded)
Let S ⊆ Rn be a nonempty set. The set S is said to be bounded if there is a positive number M such that ||x|| ≤ M ∀ x ∈ S.

Note: A set S is bounded if and only if there is a positive number M̂ such that |xi| ≤ M̂ ∀ i = 1, 2, · · · , n, ∀ x ∈ S.

Example 1.17
(a) The intervals (a, b), [a, b), [a, b] and (a, b] are bounded in R.
(b) The set {x = (x1, x2, · · · , xn)^T ∈ Rn | ai ≤ xi ≤ bi, i = 1, 2, · · · , n} is bounded.
(c) The closed n-ball B̄(0, M) = {x ∈ Rn : ||x|| ≤ M} and the open n-ball B(0, M) = {x ∈ Rn : ||x|| < M} are bounded.
Continuous function on a compact set

A closed set may not be a bounded set, e.g. S = {x ∈ R | x ≥ 0}.
A bounded set may not be a closed set, e.g. S = {x ∈ R | 0 < x < 1}.

We define a set with both nice properties:

Definition 1.18 (Compact)
A set S in Rn is said to be compact if it is closed and bounded.

It turns out that we then have a guarantee on the existence of global optimizers!

Theorem 1.19 (Weierstrass Theorem)
A continuous function on a nonempty compact set S ⊂ Rn has a global maximum point and a global minimum point in S.
[Weierstrass Theorem]
A continuous function on a nonempty compact set S ⊂ Rn has a global maximum
point and a global minimum point in S.

Example 1.20

minimize f(x) := x1^2 − x2^2
subject to g(x) = x1^2 + x2^2 − 3 = 0.

The feasible set S = {x ∈ R2 | g(x) = x1^2 + x2^2 − 3 = 0} is closed and bounded. The function f is continuous. By the Weierstrass Theorem, f has a global minimum and a global maximum on S.
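A quick numerical confirmation: parametrizing the circle as x = (√3 cos θ, √3 sin θ) gives f = 3 cos 2θ, so the global minimum and maximum should be −3 and 3 (a sketch):

    import numpy as np

    theta = np.linspace(0, 2 * np.pi, 100001)
    x1, x2 = np.sqrt(3) * np.cos(theta), np.sqrt(3) * np.sin(theta)
    f = x1**2 - x2**2                     # equals 3*cos(2*theta) on the circle
    print(f.min(), f.max())               # approximately -3 and 3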
Example 1.21

minimize f(x) := x1^2 − x2^2
subject to g(x) = 1 − x1 − x2 = 0.

The function f is continuous, but the feasible set S = {x ∈ R2 | g(x) = 1 − x1 − x2 = 0} is closed but unbounded. Thus, the Weierstrass Theorem cannot be used directly to deduce whether there is a global minimum.

In fact, the problem has no global minimum. For any α ∈ R, the point [α; 1 − α] ∈ S, and

lim_{α→−∞} f([α; 1 − α]) = lim_{α→−∞} α^2 − (1 − α)^2 = lim_{α→−∞} (2α − 1) = −∞.
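A two-line numerical illustration of this unboundedness (on the line S, f([α; 1 − α]) = 2α − 1):

    f = lambda x1, x2: x1**2 - x2**2

    for alpha in [-10, -100, -1000]:
        print(alpha, f(alpha, 1 - alpha))   # prints 2*alpha - 1: -21, -201, -2001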
