CS303: Mathematical Foundations for AI

Introduction to Optimization
28 Jan 2025
Recap

• Recap (Linear Algebra)


▶ Vectors
▶ Matrices
▶ Eigen Decomposition
▶ SVD
▶ PCA
▶ Nonlinear Dimensionality Reduction
• Unconstrained vs Constrained Optimization
• Convex Optimization

1 / 28
References

• https://stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf
• https://web.stanford.edu/class/msande310/310trialtext.pdf
• Chapter 5 (Vector Calculus)
• https://indrag49.github.io/Numerical-Optimization/index.html
• https://slds-lmu.github.io/website_optimization/
• Essence of Calculus
• NPTEL Course

2 / 28
Unconstrained (Uni-variate and Discrete)

Maximizing profit
• How many units of goods to produce?
• Cost of producing x units is c(x)
• Profit from selling x units is p(x)
• x ∈ {0, 1, 2, . . .}
• Optimization problem to maximize profit

max_x  p(x) − c(x)

3 / 28
Unconstrained Optimization (Uni-variate and Continuous)

• Maximize profit
• Distance traveled is x, where x ∈ R

f(x) = 4x² − 10x

• How to maximize profit?

4 / 28
Unconstrained Optimization (Multi-variate)

• Maximize profit
• Distance traveled is x1 and time taken is x2, where x = [x1, x2]ᵀ ∈ R²

f(x) = 4x1² − 10x1 − x2³
     = [4  0] x² + [−10  0] x + [0  −1] x³   (powers applied elementwise)

• How to maximize profit?

5 / 28
Supervised Learning

Linear Regression as unconstrained optimization
• Size of house: x1
• Distance from city center: x2
• Price of house: y

  x1     x2    y
  1500   5     300,000
  2000   3     400,000
  1200   8     250,000
  1800   4     350,000
  2200   2     450,000
6 / 28
Supervised Learning

Linear Regression as unconstrained optimization (same data as above)
• Represent y = w1 x1 + w2 x2 + b
• Price as a linear function of x1 and x2
• "Best" linear function:

  (1/n) ∥y − Xw∥² = (1/n) ∑i (yi − w1 xi1 − w2 xi2 − b)²

  (in the vector form, b is absorbed into w via a constant feature)
• Minimize mean squared error
6 / 28
Supervised Learning

min_{w,b} (1/n) ∑i (yi − w1 xi1 − w2 xi2 − b)²
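This objective has a closed-form least-squares solution; a minimal numpy sketch on the house data from the table (numpy and its lstsq routine are an illustrative choice, not part of the slides):

```python
import numpy as np

# House data from the table: size (x1), distance from center (x2), price (y)
X = np.array([[1500, 5], [2000, 3], [1200, 8], [1800, 4], [2200, 2]], dtype=float)
y = np.array([300_000, 400_000, 250_000, 350_000, 450_000], dtype=float)

# Absorb the bias b by appending a constant-1 feature column
Xb = np.hstack([X, np.ones((X.shape[0], 1))])

# Least-squares solution of min ||y - Xb @ theta||^2, with theta = (w1, w2, b)
theta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
w, b = theta[:2], theta[2]
print("w =", w, "b =", b)
print("MSE =", np.mean((y - Xb @ theta) ** 2))
```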

6 / 28
Supervised Learning

Logistic Regression as unconstrained optimization
• Size of house: x1
• Distance from city center: x2
• Sold or not: y

  x1     x2    y
  1500   5     1
  2000   3     1
  1200   8     0
  1800   4     1
  2200   2     0
7 / 28
Supervised Learning

Logistic Regression as unconstrained optimization

• Mean Squared Error? (1/n) ∥y − Xw∥²
• Predict the probabilities instead
▶ pi: probability that yi = 1
▶ 1 − pi: probability that yi = 0
▶ Likelihood: ∏i∈N pi^{yi} (1 − pi)^{1−yi}
▶ Log-likelihood: ∑i∈N yi log(pi) + (1 − yi) log(1 − pi)
• Can pi = xi w + b, where w ∈ R², b ∈ R?
• No: probabilities must lie between 0 and 1, so instead

  pi(y = 1 | xi) = 1 / (1 + e^{−(xi w + b)})

7 / 28
Supervised Learning

Logistic Regression as unconstrained optimization

• Sigmoid Function

  pi(y = 1 | xi) = 1 / (1 + e^{−(xi w + b)})

▶ xi w + b = 0 ⇒ pi = 0.5
▶ xi w + b ≥ 0 ⇒ pi ≥ 0.5
▶ xi w + b < 0 ⇒ pi < 0.5

7 / 28
Supervised Learning

Maximize the log-likelihood

max_{w,b} (1/n) ∑i [ yi log(pi) + (1 − yi) log(1 − pi) ],   where pi = 1 / (1 + e^{−(xi w + b)})
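No closed form exists here, so one standard route is gradient ascent on the average log-likelihood; a minimal numpy sketch reusing the house data (the standardization and step size are illustrative assumptions):

```python
import numpy as np

X = np.array([[1500, 5], [2000, 3], [1200, 8], [1800, 4], [2200, 2]], dtype=float)
y = np.array([1, 1, 0, 1, 0], dtype=float)
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize so one step size suits both features

w, b = np.zeros(2), 0.0
lr = 0.1
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid probabilities p_i
    # Gradient of the average log-likelihood w.r.t. (w, b): sum of (y_i - p_i) x_i
    grad_w = X.T @ (y - p) / len(y)
    grad_b = np.mean(y - p)
    w += lr * grad_w                         # ascent: move along the gradient
    b += lr * grad_b

print("w =", w, "b =", b)
```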
7 / 28
Supervised Learning

Neural Networks as unconstrained optimization


• Same as regression or classification defined before
• Linear Case: ŷ = f ( x ) = xw + b
• Neural Network (non-linear): f ( x ) = σ(σ( xw1 + b1 )w2 + b2 )

8 / 28
Supervised Learning

Neural Network (non-linear): f(x) = σ(σ(x w1 + b1) w2 + b2)

• w1 ∈ R^{3×4}, with entries w1_ij (rows w1_11 … w1_14 through w1_31 … w1_34)
• b1 = [b1_1, b1_2, b1_3, b1_4]
• w2 = [w2_1, …, w2_4]ᵀ ∈ R^{4×1}, b2 a scalar
• σ: ReLU, sigmoid, tanh
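A minimal forward-pass sketch matching these shapes: 3 inputs, 4 hidden units, 1 output (numpy, random weights, and sigmoid as the concrete σ are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w1 = rng.normal(size=(3, 4))   # first-layer weights: 3 inputs -> 4 hidden units
b1 = rng.normal(size=4)        # first-layer biases
w2 = rng.normal(size=(4, 1))   # second-layer weights: 4 hidden units -> 1 output
b2 = 0.5                       # second-layer bias (scalar)

x = np.array([[1.0, 2.0, 3.0]])                   # one input row vector
y_hat = sigmoid(sigmoid(x @ w1 + b1) @ w2 + b2)   # f(x) = sigma(sigma(x w1 + b1) w2 + b2)
print(y_hat)
```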

8 / 28
Unsupervised Learning

Low Rank Approximations as unconstrained optimization

min_{U,V} ∥A − UVᵀ∥
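With the Frobenius norm this problem is solved exactly by the truncated SVD (the Eckart-Young theorem); a minimal numpy sketch, with the matrix size and rank k = 2 chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 5))

k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation of A

print("rank:", np.linalg.matrix_rank(A_k))    # 2
print("error:", np.linalg.norm(A - A_k))      # Frobenius-norm residual
```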

9 / 28
Unconstrained Optimization

min f ( x )
x

• x could be continuous or discrete, univariate or multi-variate


• f (·) could be linear or non-linear

10 / 28
Constrained Optimization
Knapsack Problem

max_x vᵀx   s.t.   wᵀx ≤ W

• n items
• Value of item i: vi, with v = [v1, …, vn]ᵀ
• Weight of item i: wi, with w = [w1, …, wn]ᵀ
• Maximize total value under the weight budget W
• xi ∈ {0, 1} indicates inclusion of item i, with x = [x1, …, xn]ᵀ
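The integrality of x makes this combinatorial rather than a linear program; the standard dynamic-programming solution for integer weights, as a minimal sketch (the item data is made up for illustration):

```python
def knapsack(values, weights, W):
    """0/1 knapsack: max value subject to total weight <= W (integer weights)."""
    # best[c] = best value achievable with capacity c
    best = [0] * (W + 1)
    for v, wt in zip(values, weights):
        # iterate capacity downward so each item is used at most once
        for c in range(W, wt - 1, -1):
            best[c] = max(best[c], best[c - wt] + v)
    return best[W]

print(knapsack(values=[60, 100, 120], weights=[10, 20, 30], W=50))  # 220
```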
11 / 28
Max-Flow Problem

[Figure: flow network with edges s→a (capacity 10), s→b (5), a→b (5), a→t (15), b→t (10)]

Given a network G = (V, E, s, t, c)
• V: set of vertices
• E: set of edges
• s: source
• t: sink (target)
• c: capacity of every edge
• Flow on every edge: f_sa ≤ 10, f_sb ≤ 5, f_ab ≤ 5, f_at ≤ 15, f_bt ≤ 10
• At node a, inflow = outflow: f_sa = f_ab + f_at
• At node b, inflow = outflow: f_sb + f_ab = f_bt
12 / 28
Max-Flow Problem

max_f ( f_sa + f_sb )

Subject to:

0 ≤ f_sa ≤ 10,  0 ≤ f_sb ≤ 5,  0 ≤ f_at ≤ 15,  0 ≤ f_bt ≤ 10,  0 ≤ f_ab ≤ 5

f_sa = f_at + f_ab

f_sb + f_ab = f_bt

12 / 28
Max-Flow Problem

max_x ∑_{j∈V} x_sj

Subject to:

∑_{j: (j,i)∈E} x_ji = ∑_{j: (i,j)∈E} x_ij,  ∀i ∈ V \ {s, t}

0 ≤ x_ij ≤ c_ij,  ∀(i, j) ∈ E
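A sketch of the small instance above solved as a linear program with scipy.optimize.linprog (scipy is an assumption; linprog minimizes, so the objective is negated):

```python
import numpy as np
from scipy.optimize import linprog

# Variables: f = [f_sa, f_sb, f_at, f_bt, f_ab]
c = np.array([-1, -1, 0, 0, 0])           # maximize f_sa + f_sb  ->  minimize -(f_sa + f_sb)
A_eq = np.array([[1, 0, -1, 0, -1],       # conservation at a: f_sa = f_at + f_ab
                 [0, 1, 0, -1, 1]])       # conservation at b: f_sb + f_ab = f_bt
b_eq = np.zeros(2)
bounds = [(0, 10), (0, 5), (0, 15), (0, 10), (0, 5)]  # capacity constraints

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("max flow:", -res.fun, "flows:", res.x)
```

For this network the optimum is 15: the two edges out of s (capacities 10 and 5) form the bottleneck cut.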

12 / 28
Supervised Learning

Regularization as constrained optimization

13 / 28
Supervised Learning
Regularization as constrained optimization
• f(x) = w1 x¹⁸ + w2 x¹⁷ + … + b
• f(x) = 0·x¹⁸ + 0·x¹⁷ + … + w1 x³ + w2 x² + w3 x + b (sparser)

14 / 28
Supervised Learning

Lasso regression constrains the L1 norm of the coefficients. The regularized optimization problem is:

min_w ∑_{i=1}^m (yi − xi w)²   subject to   ∥w∥1 ≤ λ

where:
• ∥w∥1 = ∑_{j=1}^n |wj| is the L1 norm of the weight vector w,
• λ is the regularization parameter.
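A sketch of this constrained form using cvxpy on synthetic data (cvxpy, the data, and the budget λ = 5 are all illustrative assumptions; libraries such as scikit-learn solve the equivalent penalized form instead):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
w_true = np.array([3.0, -2.0] + [0.0] * 8)       # sparse ground truth
y = X @ w_true + 0.1 * rng.normal(size=50)

w = cp.Variable(10)
lam = 5.0                                         # illustrative budget on ||w||_1
prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ w)),
                  [cp.norm1(w) <= lam])
prob.solve()
print(np.round(w.value, 2))                       # most coordinates pushed near zero
```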

15 / 28
Supervised Learning

Support Vector Machine as constrained optimization

16 / 28
Supervised Learning

Support Vector Machine as constrained optimization


• Classification while maximizing margin
• margin: distance from the decision boundary to the closest point

min_{w,b} (1/2) ∥w∥²

s.t.  yi (wᵀxi + b) ≥ 1,  ∀i = 1, 2, …, n
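Since the objective is quadratic and the constraints linear (a QP, as a later slide notes), here is a hedged cvxpy sketch on toy separable data (cvxpy and the data are assumptions, not from the slides):

```python
import cvxpy as cp
import numpy as np

# Toy linearly separable data with labels y in {-1, +1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()
constraints = [cp.multiply(y, X @ w + b) >= 1]    # y_i (w^T x_i + b) >= 1
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), constraints)
prob.solve()
print("w =", w.value, "b =", b.value)
```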

17 / 28
Supervised Learning

Margin

17 / 28
Unsupervised Learning
K-Means clustering as constrained optimization

18 / 28
Unsupervised Learning

K-Means Clustering as constrained optimization


• Given k number of clusters
• Identify cluster centers c1 , . . . , ck
• Assign every point to a unique cluster center
• Minimize the distance between the points and cluster center

18 / 28
Unsupervised Learning
The objective function is:

min_{C,S} ∑_{i=1}^n ∑_{k=1}^K s_ik ∥xi − ck∥²

where:
• C = {c1, c2, …, cK} are the cluster centers,
• S = {s_ik}, where s_ik = 1 if point xi is assigned to cluster k, and s_ik = 0 otherwise,
• Cluster Assignment: each data point must be assigned to exactly one cluster:

∑_{k=1}^K s_ik = 1,  ∀i = 1, 2, …, n

Putting it together:

min_{C,S} ∑_{i=1}^n ∑_{k=1}^K s_ik ∥xi − ck∥²   subject to   ∑_{k=1}^K s_ik = 1,  s_ik ∈ {0, 1}
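A standard way to attack this is Lloyd's algorithm, alternating the assignment variables S (centers fixed) and the centers C (assignments fixed); a minimal numpy sketch on synthetic data (the data, k = 2, and the iteration count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])

K = 2
centers = X[rng.choice(len(X), K, replace=False)]   # random initial centers
for _ in range(20):
    # Assignment step: nearest center for each point (this sets s_ik)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Update step: each center = mean of its assigned points
    # (assumes no cluster goes empty, which holds for this data)
    centers = np.array([X[labels == k].mean(axis=0) for k in range(K)])

print(centers)
```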

18 / 28
Constrained Optimization

min f (x)
x
s.t. gi (x) ≤ 0, i = 1, 2, . . . , m
h j (x) = 0, j = 1, 2, . . . , p
x∈S

• f (x) objective function to minimize (or maximize),


• x ∈ S decision variables,
• gi (x) ≤ 0 inequality constraints,
• h j (x) = 0 equality constraints,

19 / 28
Linear Programming

• Special case where f , g and h are linear in x


• Decision variables are non-negative, real numbers
max_x  cᵀx
s.t.   Ax ≤ b
       x ≥ 0
E.g. Max flow problem

20 / 28
Quadratic Programming

• Special case where f is quadratic, and g and h are linear in x


• Decision variables are non-negative, real numbers
min_x  (1/2) xᵀQx + cᵀx
s.t.   Ax ≥ b
       x ≥ 0
E.g. SVM

21 / 28
Convexity

Convexity is good to have for optimization!


What is Convexity?

22 / 28
Convex Sets

A set C is convex if
• for all x1 , x2 in C
• the line segment joining x1 and x2 must be within C
▶ θx1 + (1 − θ ) x2 ∈ C, for any θ ∈ [0, 1]

22 / 28
Convex Sets

22 / 28
Convex Sets

[Figure: three example sets: convex, non-convex, non-convex]

22 / 28
Half-Spaces
• Hyperplane S = {x | wᵀx = b} is convex
▶ x1, x2 ∈ S, θ ∈ [0, 1]
▶ Take a point x = θx1 + (1 − θ)x2
▶ wᵀx = θwᵀx1 + (1 − θ)wᵀx2 = θb + (1 − θ)b = b
• Half-space S = {x | aᵀx ≤ b}, a ≠ 0, is convex
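The same two-line argument works for the half-space, with the inequality preserved because θ and 1 − θ are nonnegative (a step the slide leaves implicit):
▶ aᵀx = θaᵀx1 + (1 − θ)aᵀx2 ≤ θb + (1 − θ)b = b
▶ so x = θx1 + (1 − θ)x2 ∈ S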

23 / 28
Norm Balls

S = {x | ∥x − xc∥ ≤ r}

• x1 and x2 ∈ S, θ ∈ [0, 1]
• x = θx1 + (1 − θ)x2
• ∥x − xc∥ = ∥θx1 + (1 − θ)x2 − xc∥
    = ∥θ(x1 − xc) + (1 − θ)(x2 − xc)∥
    ≤ θ∥x1 − xc∥ + (1 − θ)∥x2 − xc∥ ≤ r

24 / 28
Polyhedra

Solution set of finite number of linear equalities or inequalities

S = {x | ajᵀx ≤ bj, j = 1, …, m;  ciᵀx = di, i = 1, …, p}

• Intersection of finitely many convex sets is convex


• Polyhedron is intersection of finitely many half-spaces
• What about union of convex sets?

25 / 28
Convex Sets

Are the following sets convex?


• A slab, i.e., a set of the form {x ∈ Rn | α ≤ aᵀx ≤ β}
• The set of points closer to a given point than to a given set, i.e.,

{x | ∥x − x0∥2 ≤ ∥x − y∥2 for all y ∈ S},  S ⊆ Rn

• The set of points closer to one set than to another, i.e.,

{x | dist(x, T) ≤ dist(x, S)},  S, T ⊆ Rn

25 / 28
Convex Functions

26 / 28
Convex Functions

A function f : Rn → R is called convex if its domain dom( f ) is a convex set and for
all x, y ∈ dom( f ) and λ ∈ [0, 1], the following inequality holds:

f (λx + (1 − λ)y) ≤ λ f (x) + (1 − λ) f (y).

For twice-differentiable f : R → R, convexity is equivalent to f″(x) ≥ 0.
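A quick worked check of the definition (not on the original slide): for f(x) = x²,

λx² + (1 − λ)y² − (λx + (1 − λ)y)² = λ(1 − λ)(x − y)² ≥ 0,

so the inequality holds for all x, y and λ ∈ [0, 1]; consistently, f″(x) = 2 ≥ 0.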

26 / 28
Convex Functions

Which of the following are convex functions?


• Exponential e^{αx}, for any α ∈ R
• Logarithm log x
• Mean Squared Error?
• Log Loss?

26 / 28
Convex Optimization

A convex optimization problem is an optimization problem of the form:

min f (x)
x
s.t. gi (x) ≤ 0, i = 1, . . . , m,
h j (x) = 0, j = 1, . . . , p,

• f : Rn → R is the convex objective function,


• gi : Rn → R are convex functions defining the inequality constraints,
• h j : Rn → R are affine functions defining the equality constraints,
• The feasible set S = {x ∈ Rn | gi (x) ≤ 0, h j (x) = 0} is a convex set.

27 / 28
Categorization

28 / 28
