
18-660: Numerical Methods for Engineering Design and Optimization

Xin Li
Department of ECE
Carnegie Mellon University
Pittsburgh, PA 15213

Slide 1
Overview
- Classification
- Support vector machine
- Regularization

Slide 2
Classification
- Predict a categorical output (i.e., two or more classes) from input
  attributes (i.e., features)

- Example: two-class classification

  [Figure: Class A and Class B point clouds in the feature plane
  (x1, x2), separated by a linear decision boundary]

  f(X) = W^T X + C,   f(X) ≥ 0 → Class A,   f(X) < 0 → Class B

Slide 3
Classification
- Classification vs. regression

  Regression:     input attributes → prediction of a real-valued output
  Classification: input attributes → prediction of a categorical output

Slide 4
Classification Examples
- Identify hand-written digits from US zip codes

  [Figure: sample images of hand-written digits]
Bishop, Pattern recognition and machine learning, 2007

Slide 5
Classification Examples
- Identify geometrical structure from oil flow data

  [Figure: scatter plot of oil flow data; blue: geometrical structure 1,
  green: geometrical structure 2, red: geometrical structure 3]

Bishop, Pattern recognition and machine learning, 2007

Slide 6
Support Vector Machine (SVM)
- The support vector machine (SVM) is a popular algorithm used for many
  classification problems
- Key idea: maximize the classification margin, which makes the
  classifier less sensitive to noise

- Two-class linear support vector machine

  [Figure: Class A and Class B in the feature plane (x1, x2), separated
  by a linear decision boundary with the margin marked on either side]

  f(X) = W^T X + C,   f(X) ≥ 0 → Class A,   f(X) < 0 → Class B

  Determine W and C with maximum margin

Slide 7
Margin Calculation
- To maximize the margin, we must first express the margin as a
  function of W and C

  f(X) = W^T X + C,   f(X) ≥ 0 → Class A,   f(X) < 0 → Class B

  Decision boundary:  W^T X + C = 0
  Plus plane:         W^T X + C = 1
  Minus plane:        W^T X + C = −1

  (The right-hand sides can be normalized to ±1 by rescaling W and C)

  [Figure: the decision boundary between Class A and Class B, flanked
  by the parallel plus and minus planes; the margin is the gap between
  them, and the samples lying on these planes are the support vectors]

Slide 8
Margin Calculation
- W is perpendicular to the plus/minus planes

  Plus plane:   W^T X + C = 1
  Minus plane:  W^T X + C = −1

  Take any two points A and B on the plus plane:

  W^T A + C = 1
  W^T B + C = 1   ⟹   W^T (A − B) = 0

  W is perpendicular to (A − B), and hence to the plane itself

  [Figure: points A and B on the plus plane W^T X + C = 1, with the
  normal vector W perpendicular to the difference vector A − B]

Slide 9
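
A quick numeric check of this perpendicularity argument, as a minimal
Python/NumPy sketch (the vector W, offset C, and the points A and B are
made-up values for illustration, not from the slides):

```python
import numpy as np

W, C = np.array([1.0, 2.0]), 0.0

# Two arbitrary points on the plus plane W^T X + C = 1, i.e. x1 + 2*x2 = 1
A = np.array([1.0, 0.0])
B = np.array([0.0, 0.5])

print(W @ A + C, W @ B + C)  # 1.0 1.0 -> both points lie on the plus plane
print(W @ (A - B))           # 0.0    -> W is perpendicular to (A - B)
```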
Margin Calculation
- The margin equals the distance between a point X_m on the minus
  plane and its projection X_p on the plus plane

  X_p = X_m + λW        Margin = ||X_p − X_m||_2 = λ||W||_2

  Find λ to determine the margin

  [Figure: X_m on the minus plane W^T X + C = −1 and X_p on the plus
  plane W^T X + C = 1, connected by the vector λW along the normal W]

Slide 10
Margin Calculation

  X_p = X_m + λW
  W^T X_p + C = 1       ⟹   W^T (X_p − X_m) = λ W^T W = 2
  W^T X_m + C = −1

  [Figure: same construction as on Slide 10]

Slide 11
Margin Calculation

  λ W^T W = 2   ⟹   λ = 2 / (W^T W)

  Margin = λ||W||_2 = λ·√(W^T W) = 2 / √(W^T W)

  Maximizing the margin implies minimizing ||W||_2

  [Figure: same construction as on Slide 10]

Slide 12
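
As a sanity check on the formula Margin = 2 / √(W^T W), a minimal NumPy
sketch (the weight vector anticipates the worked example on the slides
that follow):

```python
import numpy as np

W = np.array([0.5, 0.5])        # weight vector from the worked example below
margin = 2.0 / np.sqrt(W @ W)   # Margin = 2 / sqrt(W^T W) = 2 / ||W||_2
print(margin)                   # 2.828... = 2*sqrt(2)
```

This agrees with the geometry of the two-sample example ahead: the
support vectors (1, 1) and (−1, −1) are exactly 2·√2 apart.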
Mathematical Formulation
- Start from a set of training samples

  (X_i, y_i)    (i = 1, 2, …, N)

  X_i: input feature vector of the i-th sample
  y_i: output label of the i-th sample
       Class A → y_i = 1
       Class B → y_i = −1

  Class A:  W^T X_i + C ≥ 1     (y_i = 1)
  Class B:  W^T X_i + C ≤ −1    (y_i = −1)

  Both cases combine into the single constraint

  y_i · (W^T X_i + C) ≥ 1

Slide 13
Mathematical Formulation
- Formulate a convex optimization problem

  max    2 / √(W^T W)                  Maximize the margin
  W,C
  s.t.   y_i · (W^T X_i + C) ≥ 1       All samples are in the right class
         (i = 1, 2, …, N)

  Equivalently:

  min    W^T W                         Convex quadratic cost function
  W,C
  s.t.   y_i · (W^T X_i + C) ≥ 1       Linear constraints
         (i = 1, 2, …, N)

  (Convex optimization)

Slide 14
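
Since this is a convex quadratic program, any convex solver can handle
it. A minimal sketch using cvxpy (an assumed third-party dependency,
not part of the slides; the toy data X and y are made up):

```python
import cvxpy as cp
import numpy as np

# Toy linearly separable data: rows of X are samples X_i, labels y_i in {+1, -1}
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

W = cp.Variable(2)  # normal vector of the separating hyperplane
C = cp.Variable()   # offset

# min W^T W  s.t.  y_i * (W^T X_i + C) >= 1 for all i
constraints = [cp.multiply(y, X @ W + C) >= 1]
problem = cp.Problem(cp.Minimize(cp.sum_squares(W)), constraints)
problem.solve()

print("W =", W.value, " C =", C.value)          # expect roughly [0.5 0.5], 0
print("margin =", 2 / np.linalg.norm(W.value))  # 2 / ||W||_2
```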
A Simple SVM Example
- Two training samples
  - Class A: x1 = 1, x2 = 1, and y = 1
  - Class B: x1 = −1, x2 = −1, and y = −1

  [Figure: the two samples in the (x1, x2) plane, on opposite sides of
  the origin]

  f(X) = w1·x1 + w2·x2 + C,   f(X) ≥ 0 → Class A,   f(X) < 0 → Class B

  Solve for w1, w2, and C to determine the classifier

Slide 15
A Simple SVM Example
- Two training samples
  - Class A: x1 = 1, x2 = 1, and y = 1
  - Class B: x1 = −1, x2 = −1, and y = −1

  min    W^T W
  W,C
  s.t.   y_i · (W^T X_i + C) ≥ 1    (i = 1, 2, …, N)

  Substituting the two samples:

  min    w1² + w2²
  W,C
  s.t.   1 · (w1 + w2 + C) ≥ 1
         −1 · (−w1 − w2 + C) ≥ 1

Slide 16
A Simple SVM Example
  min    w1² + w2²
  W,C
  s.t.   1 · (w1 + w2 + C) ≥ 1      ⟹   w1 + w2 ≥ 1 − C
         −1 · (−w1 − w2 + C) ≥ 1    ⟹   w1 + w2 ≥ 1 + C

  Together the constraints require w1 + w2 ≥ 1 + |C|, so the cost is
  smallest with C = 0 and w1 + w2 ≥ 1; minimizing w1² + w2² on the
  boundary w1 + w2 = 1 gives

  w1 = w2 = 0.5,   C = 0

  [Figure: feasible half-plane in the (w1, w2) plane; the point of the
  boundary w1 + w2 = 1 closest to the origin is (0.5, 0.5)]

Slide 17
A Simple SVM Example
- Two training samples
  - Class A: x1 = 1, x2 = 1, and y = 1
  - Class B: x1 = −1, x2 = −1, and y = −1

  w1 = w2 = 0.5,   C = 0

  [Figure: the decision boundary 0.5·x1 + 0.5·x2 = 0 separating the
  two samples with maximum margin]

  f(X) = 0.5·x1 + 0.5·x2,   f(X) ≥ 0 → Class A,   f(X) < 0 → Class B

Slide 18
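
A short NumPy check (a sketch, not from the slides) that the solution
w1 = w2 = 0.5, C = 0 satisfies the constraints and classifies both
training samples correctly:

```python
import numpy as np

w, c = np.array([0.5, 0.5]), 0.0
X = np.array([[1.0, 1.0], [-1.0, -1.0]])  # the two training samples
y = np.array([1.0, -1.0])

f = X @ w + c           # f(X) = 0.5*x1 + 0.5*x2
print(f)                # [ 1. -1.] -> both constraints active (support vectors)
print(y * f >= 1)       # [ True  True] -> feasible
print(np.sign(f) == y)  # [ True  True] -> correctly classified
```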
Support Vector Machine with Noise
- In practice, training samples may contain noise or may not be
  linearly separable

  min    W^T W
  W,C                                  (No feasible solution)
  s.t.   y_i · (W^T X_i + C) ≥ 1
         (i = 1, 2, …, N)

  [Figure: overlapping Class A and Class B samples in the feature
  plane (x1, x2); no linear boundary separates them perfectly]

  Introduce a slack variable ξ_i, the error of the i-th training
  sample, for each constraint:

  min    Σ_i ξ_i + λ·W^T W        (λ determined by cross validation)
  W,C,ξ
  s.t.   y_i · (W^T X_i + C) ≥ 1 − ξ_i
         ξ_i ≥ 0    (i = 1, 2, …, N)

Slide 19
Support Vector Machine with Noise
- Can be solved by convex programming
  - Cost: sum of two convex functions, one linear (Σ_i ξ_i) and one
    quadratic (λ·W^T W)
  - Constraints: linear and hence convex

  min    Σ_i ξ_i + λ·W^T W
  W,C,ξ
  s.t.   y_i · (W^T X_i + C) ≥ 1 − ξ_i
         ξ_i ≥ 0    (i = 1, 2, …, N)

  (Convex optimization)

Slide 20
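
The soft-margin program maps onto a convex solver in the same way. A
minimal cvxpy sketch (again an assumed dependency; the data and the
value of λ are illustrative, and λ would normally be picked by cross
validation):

```python
import cvxpy as cp
import numpy as np

# Toy overlapping data: one Class B point coincides with a Class A point,
# so the hard-margin constraints are infeasible and slack is required
X = np.array([[1.0, 1.0], [2.0, 1.5], [-1.0, -1.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
lam = 0.1  # regularization weight (cross-validated in practice)

N, d = X.shape
W = cp.Variable(d)
C = cp.Variable()
xi = cp.Variable(N)  # xi_i = error of the i-th training sample

objective = cp.Minimize(cp.sum(xi) + lam * cp.sum_squares(W))
constraints = [cp.multiply(y, X @ W + C) >= 1 - xi, xi >= 0]
cp.Problem(objective, constraints).solve()

print("W =", W.value, " C =", C.value)
print("slacks =", xi.value)  # nonzero slack flags the misclassified point
```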
Regularization
- Regression vs. classification: both add a regularization term to the
  cost function

  Regression:

  min    ||A·α − B||_2² + λ·||α||_2²
   α

  Support vector machine:

  min    Σ_i ξ_i + λ·W^T W
  W,C,ξ
  s.t.   y_i · (W^T X_i + C) ≥ 1 − ξ_i
         ξ_i ≥ 0    (i = 1, 2, …, N)

- Other regularization forms can also be used for the support vector
  machine

Slide 21
Regularization
- L1-norm regularization is used to find a sparse solution for W

  L2 regularization:

  min    Σ_i ξ_i + λ·W^T W
  W,C,ξ
  s.t.   y_i · (W^T X_i + C) ≥ 1 − ξ_i
         ξ_i ≥ 0    (i = 1, 2, …, N)

  L1-norm regularization:

  min    Σ_i ξ_i + λ·||W||_1
  W,C,ξ
  s.t.   y_i · (W^T X_i + C) ≥ 1 − ξ_i
         ξ_i ≥ 0    (i = 1, 2, …, N)

- Important for feature selection

Slide 22
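
A minimal sketch of the L1-regularized variant in cvxpy (assumed
dependency; the data are illustrative, with a second feature that is
pure noise so that its weight should be driven to zero):

```python
import cvxpy as cp
import numpy as np

# Toy data: only feature x1 carries class information; x2 is pure noise
rng = np.random.default_rng(0)
x1 = np.concatenate([rng.normal(2, 0.5, 20), rng.normal(-2, 0.5, 20)])
X = np.column_stack([x1, rng.normal(0, 1, 40)])
y = np.concatenate([np.ones(20), -np.ones(20)])
lam = 1.0

N, d = X.shape
W, C, xi = cp.Variable(d), cp.Variable(), cp.Variable(N)

# min sum(xi) + lam * ||W||_1  -- the L1 norm promotes a sparse W
objective = cp.Minimize(cp.sum(xi) + lam * cp.norm(W, 1))
constraints = [cp.multiply(y, X @ W + C) >= 1 - xi, xi >= 0]
cp.Problem(objective, constraints).solve()

print("W =", W.value)  # the weight on the noise feature x2 is (near) zero
```

The zero (or near-zero) entries of W identify the unimportant features,
which is exactly the feature-selection picture on the next slide.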
Regularization
- Feature selection

  f(X) = W^T X + C,   f(X) ≥ 0 → Class A,   f(X) < 0 → Class B

  With a sparse solution, e.g.

  W^T X = [0  0  ×  0  ×] · [x1  x2  x3  x4  x5]^T

  only the features multiplied by nonzero weights (here x3 and x5)
  remain as the important features

Slide 23
Summary
- Classification
- Support vector machine
- Regularization

Slide 24
