Machine Learning-Kernel Methods

This document summarizes a lecture on support vector machines (SVMs). It discusses three key topics: 1) The separable case, where training examples are linearly separable. The SVM aims to maximize the margin between classes by minimizing a regularization penalty. 2) The non-separable case, where examples are not perfectly separable. The optimization problem is modified to allow for some misclassifications by adding penalty terms. 3) A comparison of SVMs to logistic regression, noting they both aim to minimize a regularized empirical loss function, though SVMs focus on large-margin separation while logistic regression models class probabilities.


Machine learning: lecture 7

Tommi S. Jaakkola
MIT CSAIL
[email protected]

Topics

• Support vector machines
  – separable case, formulation, margin
  – non-separable case, penalties, and logistic regression
  – dual solution, kernels
  – examples, properties

Support vector machine (SVM)

• When the training examples are linearly separable we can maximize a
  geometric notion of margin (distance to the boundary) by minimizing the
  regularization penalty

      \|w_1\|^2/2 = \sum_{i=1}^d w_i^2/2

  subject to the classification constraints

      y_i [w_0 + x_i^T w_1] - 1 \ge 0   for i = 1, \ldots, n.

• The solution is defined only on the basis of a subset of examples, the
  "support vectors".

SVM: separable case

• We minimize \|w_1\|^2/2 = \sum_{i=1}^d w_i^2/2 subject to

      y_i [w_0 + x_i^T w_1] - 1 \ge 0,   i = 1, \ldots, n

  [Figure: separable x's and o's, the decision boundary f(x; \hat{w}) =
  \hat{w}_0 + x^T \hat{w}_1, the normal direction \hat{w}_1, the closest
  examples x^+ and x^-, and margin = 1/\|\hat{w}_1\| (depicted as
  |x^+ - x^-|/2).]

• The resulting margin and the "slope" \|\hat{w}_1\| are inversely related.
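
As a quick numerical illustration of the last bullet (not part of the lecture), here is a minimal numpy sketch on made-up toy data with a hand-picked separator: when the tightest classification constraint y_i [w_0 + x_i^T w_1] - 1 \ge 0 is active, the distance from the closest point to the boundary is exactly 1/\|w_1\|, so a smaller "slope" \|w_1\| means a larger margin.

```python
# Minimal sketch; toy data and the candidate separator are assumptions.
import numpy as np

X = np.array([[1.0, 1.0], [2.0, 3.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([+1, +1, -1, -1])

w0, w1 = 0.0, np.array([0.5, 0.5])           # candidate f(x; w) = w0 + x^T w1

# Classification constraints: y_i [w0 + x_i^T w1] - 1 >= 0 for all i,
# with equality for the closest points.
slack = y * (w0 + X @ w1) - 1
assert np.all(slack >= -1e-12)

# Geometric distance of each point to the boundary f(x; w) = 0.
dist = np.abs(w0 + X @ w1) / np.linalg.norm(w1)
print(dist.min(), 1.0 / np.linalg.norm(w1))  # both ~1.414: margin = 1/||w1||
```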

SVM: non-separable case

• When the examples are not linearly separable we can modify the optimization
  problem slightly to add a penalty for violating the classification
  constraints: we minimize

      \|w_1\|^2/2 + C \sum_{i=1}^n \xi_i

  subject to the relaxed classification constraints

      y_i [w_0 + x_i^T w_1] - 1 + \xi_i \ge 0   for i = 1, \ldots, n.

  Here \xi_i \ge 0 are called "slack" variables.

  [Figure: a non-separable mix of x's and o's around the boundary defined by
  \hat{w}_1.]

SVM: non-separable case cont'd

• We can also write the SVM optimization problem more compactly as

      C \sum_{i=1}^n (1 - y_i [w_0 + x_i^T w_1])^+ + \|w_1\|^2/2

  where (z)^+ = z if z \ge 0 and zero otherwise (i.e., returns the positive
  part). Each positive-part term plays the role of the slack \xi_i.
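
A minimal sketch (the toy data, weights and C below are assumptions) of how the two formulations on these slides line up: for fixed w the smallest feasible slack is \xi_i = (1 - y_i [w_0 + x_i^T w_1])^+, so the slack-based objective and the compact hinge form coincide.

```python
# Minimal sketch; data, weights and C are made-up, not an SVM solution.
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, -1.0], [0.5, -0.5]])
y = np.array([+1, +1, -1, -1])
w0, w1, C = 0.1, np.array([0.6, 0.4]), 1.0

f = w0 + X @ w1
xi = np.maximum(0.0, 1.0 - y * f)            # smallest feasible slacks

slack_form = 0.5 * w1 @ w1 + C * xi.sum()
hinge_form = C * np.maximum(0.0, 1.0 - y * f).sum() + 0.5 * w1 @ w1
print(np.isclose(slack_form, hinge_form))    # True
```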


• The compact hinge-loss form above is equivalent to regularized empirical
  loss minimization

      \frac{1}{n} \sum_{i=1}^n (1 - y_i [w_0 + x_i^T w_1])^+ + \lambda \|w_1\|^2/2

  where \lambda = 1/(nC) is the regularization parameter.

SVM vs logistic regression

• When viewed from the point of view of regularized empirical loss
  minimization, SVM and logistic regression appear quite similar:

  SVM:       \frac{1}{n} \sum_{i=1}^n (1 - y_i [w_0 + x_i^T w_1])^+ + \lambda \|w_1\|^2/2

  Logistic:  \frac{1}{n} \sum_{i=1}^n -\log g(y_i [w_0 + x_i^T w_1]) + \lambda \|w_1\|^2/2

  where g(z) = (1 + \exp(-z))^{-1} is the logistic function, so that
  -\log g(y_i [w_0 + x_i^T w_1]) = -\log P(y_i | x_i, w).

  (Note that we have transformed the problem of maximizing the penalized
  log-likelihood into minimizing the negative penalized log-likelihood.)
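
A minimal sketch (the toy data, weights and \lambda are assumptions) that simply evaluates the two regularized empirical losses above side by side; the only difference is the per-example loss applied to z_i = y_i [w_0 + x_i^T w_1].

```python
# Minimal sketch; inputs are made-up, only the two objectives come from the slide.
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, -1.0], [0.5, -0.5]])
y = np.array([+1, +1, -1, -1])
w0, w1, lam = 0.0, np.array([0.6, 0.4]), 0.1

z = y * (w0 + X @ w1)                        # z_i = y_i [w0 + x_i^T w1]
reg = lam * 0.5 * w1 @ w1

svm_obj = np.mean(np.maximum(0.0, 1.0 - z)) + reg   # hinge loss
log_obj = np.mean(np.log1p(np.exp(-z))) + reg       # -log g(z) = log(1 + e^{-z})
print(svm_obj, log_obj)
```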

SVM vs logistic regression cont'd

• The difference comes from how we penalize "errors". Both methods minimize

      \frac{1}{n} \sum_{i=1}^n \text{Loss}(y_i [w_0 + x_i^T w_1]) + \lambda \|w_1\|^2/2

  with z = y_i [w_0 + x_i^T w_1] as the argument of the loss.

• SVM:   \text{Loss}(z) = (1 - z)^+

• Regularized logistic regression:   \text{Loss}(z) = \log(1 + \exp(-z))

  [Figure: the SVM loss and the LR loss plotted as functions of z over roughly
  -4 \le z \le 4.]
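
To reproduce the plotted comparison as numbers rather than a figure, here is a minimal sketch tabulating the two losses over a grid of z values (the grid itself is an arbitrary choice):

```python
# Minimal sketch: hinge loss (1 - z)^+ vs logistic loss log(1 + exp(-z)).
import numpy as np

for z in np.linspace(-4, 4, 9):
    hinge = max(0.0, 1.0 - z)
    logistic = np.log1p(np.exp(-z))
    print(f"z = {z:+.1f}   hinge = {hinge:.3f}   logistic = {logistic:.3f}")
```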

SVM: solution, Lagrange multipliers

• Back to the separable case: how do we solve

      \min \|w_1\|^2/2   subject to
      y_i [w_0 + x_i^T w_1] - 1 \ge 0,   i = 1, \ldots, n ?

• Let's start by representing the constraints as losses

      \max_{\alpha \ge 0} \alpha (1 - y_i [w_0 + x_i^T w_1]) =
          \begin{cases} 0, & y_i [w_0 + x_i^T w_1] - 1 \ge 0 \\ \infty, & \text{otherwise} \end{cases}

  and rewrite the minimization problem in terms of these:

      \min_{w} \Big[ \|w_1\|^2/2 + \sum_{i=1}^n \max_{\alpha_i \ge 0} \alpha_i (1 - y_i [w_0 + x_i^T w_1]) \Big]

      = \min_{w} \max_{\{\alpha_i \ge 0\}} \Big[ \|w_1\|^2/2 + \sum_{i=1}^n \alpha_i (1 - y_i [w_0 + x_i^T w_1]) \Big]
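
A minimal numeric illustration (the numbers are assumptions) of the inner maximization used above: when the constraint holds, \max_{\alpha \ge 0} \alpha (1 - y_i [w_0 + x_i^T w_1]) is attained at \alpha = 0 and equals 0; when it is violated, the term can be made arbitrarily large, which is what enforces the constraint.

```python
# Minimal sketch: sweep alpha over increasing values instead of taking a true sup.
import numpy as np

def inner_term(violation, alphas=np.array([0.0, 1.0, 10.0, 1e3, 1e6])):
    # violation = 1 - y_i [w0 + x_i^T w1]
    return np.max(alphas * violation)

print(inner_term(violation=-0.5))   # constraint satisfied -> 0 (at alpha = 0)
print(inner_term(violation=+0.5))   # constraint violated  -> grows without bound
```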


SVM solution cont'd

• We can then swap 'max' and 'min':

      \min_{w} \max_{\{\alpha_i \ge 0\}} \Big[ \|w_1\|^2/2 + \sum_{i=1}^n \alpha_i (1 - y_i [w_0 + x_i^T w_1]) \Big]

      \stackrel{?}{=} \max_{\{\alpha_i \ge 0\}} \min_{w} \Big[ \|w_1\|^2/2 + \sum_{i=1}^n \alpha_i (1 - y_i [w_0 + x_i^T w_1]) \Big]

  where we call the bracketed expression J(w; \alpha). As a result we have to
  be able to minimize J(w; \alpha) with respect to the parameters w for any
  fixed setting of the Lagrange multipliers \alpha_i \ge 0.

SVM solution cont'd

• We can find the optimal \hat{w} as a function of \{\alpha_i\} by setting the
  derivatives to zero:

      \frac{\partial}{\partial w_1} J(w; \alpha) = w_1 - \sum_{i=1}^n \alpha_i y_i x_i = 0

      \frac{\partial}{\partial w_0} J(w; \alpha) = -\sum_{i=1}^n \alpha_i y_i = 0

• We can then substitute this solution back into the objective and get (after
  some algebra):

      \max_{\alpha_i \ge 0,\ \sum_i \alpha_i y_i = 0} \Big[ \|\hat{w}_1\|^2/2 + \sum_{i=1}^n \alpha_i (1 - y_i [\hat{w}_0 + x_i^T \hat{w}_1]) \Big]

      = \max_{\alpha_i \ge 0,\ \sum_i \alpha_i y_i = 0} \Big[ \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n y_i y_j \alpha_i \alpha_j (x_i^T x_j) \Big]
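
A minimal sketch (random data, an arbitrary w_0, and a randomly chosen feasible \alpha are assumptions) checking the "after some algebra" step numerically: once w_1 = \sum_i \alpha_i y_i x_i and \sum_i \alpha_i y_i = 0 are imposed, J(w; \alpha) equals the dual expression and no longer depends on w_0.

```python
# Minimal sketch verifying the substitution step on random data.
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 3
X = rng.normal(size=(n, d))
y = np.array([+1, +1, +1, -1, -1, -1])

alpha = rng.uniform(size=n)
alpha -= y * (alpha @ y) / n        # enforce sum_i alpha_i y_i = 0 (since y_i^2 = 1)
# (the projection may leave some alpha_i negative; the identity below only
#  needs the equality constraint, so that is fine for a numeric check)

w1 = (alpha * y) @ X                # stationarity: w1 = sum_i alpha_i y_i x_i
w0 = 0.37                           # arbitrary; drops out since alpha @ y = 0

J = 0.5 * w1 @ w1 + np.sum(alpha * (1.0 - y * (w0 + X @ w1)))
Q = (y[:, None] * X) @ (y[:, None] * X).T        # Q_ij = y_i y_j x_i^T x_j
dual = alpha.sum() - 0.5 * alpha @ Q @ alpha
print(np.isclose(J, dual))          # True
```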

SVM solution: summary

• We can find the optimal setting of the Lagrange multipliers \alpha_i by
  maximizing

      \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n y_i y_j \alpha_i \alpha_j (x_i^T x_j)

  subject to \alpha_i \ge 0 and \sum_i \alpha_i y_i = 0. Only the \alpha_i's
  corresponding to "support vectors" will be non-zero.

• We can make predictions on any new example x according to the sign of the
  discriminant function

      \hat{w}_0 + x^T \hat{w}_1 = \hat{w}_0 + x^T \Big( \sum_{i=1}^n \hat{\alpha}_i y_i x_i \Big) = \hat{w}_0 + \sum_{i \in SV} \hat{\alpha}_i y_i (x^T x_i)
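
The slides do not prescribe a particular optimizer, so here is a minimal end-to-end sketch using scipy's generic SLSQP solver (an assumption, not the lecture's method) on made-up separable data: maximize the dual, read off \hat{w}_1 = \sum_i \hat{\alpha}_i y_i x_i, recover \hat{w}_0 from an active constraint y_i [\hat{w}_0 + x_i^T \hat{w}_1] = 1 at a support vector (a standard step not spelled out on the slide), and predict with the sign of the discriminant.

```python
# Minimal sketch; the data are toy and the solver choice is an assumption.
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [2.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)
Q = (y[:, None] * X) @ (y[:, None] * X).T        # Q_ij = y_i y_j x_i^T x_j

def neg_dual(alpha):                             # minimize the negative dual
    return -(alpha.sum() - 0.5 * alpha @ Q @ alpha)

res = minimize(neg_dual, x0=np.zeros(n), method="SLSQP",
               bounds=[(0.0, None)] * n,                              # alpha_i >= 0
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])  # sum_i alpha_i y_i = 0
alpha = res.x

sv = alpha > 1e-6                                # support vectors
w1 = (alpha * y) @ X
w0 = y[sv][0] - X[sv][0] @ w1                    # active constraint at a support vector

x_new = np.array([1.0, 0.5])
f = w0 + np.sum(alpha[sv] * y[sv] * (X[sv] @ x_new))   # sum over support vectors
print(np.sign(f), f, w0 + x_new @ w1)            # the two discriminant forms agree
```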

Non-linear classifier

• So far our classifier can make only linear separations.

• As with linear regression and logistic regression models, we can easily
  obtain a non-linear classifier by first mapping our examples x = [x_1\; x_2]
  into longer feature vectors

      \phi(x) = [\, x_1^2 \;\; x_2^2 \;\; \sqrt{2}\, x_1 x_2 \;\; \sqrt{2}\, x_1 \;\; \sqrt{2}\, x_2 \;\; 1 \,]

  and then applying the linear classifier to the new feature vectors \phi(x).

  [Figure: a linear separator in the feature \phi-space corresponds to a
  non-linear separator in the original x-space.]
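
A minimal sketch (the input point is an assumption) of the feature map above; the SVM, or any linear classifier, is then applied to \phi(x) instead of x.

```python
# Minimal sketch of the quadratic feature map from the slide.
import numpy as np

def phi(x):
    """Map x = [x1, x2] to [x1^2, x2^2, sqrt(2) x1 x2, sqrt(2) x1, sqrt(2) x2, 1]."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([x1 ** 2, x2 ** 2, s * x1 * x2, s * x1, s * x2, 1.0])

print(phi(np.array([0.5, -1.0])))   # a 6-dimensional feature vector
```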

Feature mapping and kernels

• Let's look at the previous example in a bit more detail:

      x \to \phi(x) = [\, x_1^2 \;\; x_2^2 \;\; \sqrt{2}\, x_1 x_2 \;\; \sqrt{2}\, x_1 \;\; \sqrt{2}\, x_2 \;\; 1 \,]

• The SVM classifier deals only with inner products of examples (or feature
  vectors). In this example,

      \phi(x)^T \phi(x') = x_1^2 x_1'^2 + x_2^2 x_2'^2 + 2 x_1 x_2 x_1' x_2' + 2 x_1 x_1' + 2 x_2 x_2' + 1
                         = (1 + x_1 x_1' + x_2 x_2')^2
                         = (1 + (x^T x'))^2

  so the inner products can be evaluated without ever explicitly constructing
  the feature vectors \phi(x)!

• K(x, x') = (1 + (x^T x'))^2 is a kernel function (an inner product in the
  feature space).

Examples of kernel functions

• Linear kernel

      K(x, x') = (x^T x')

• Polynomial kernel

      K(x, x') = (1 + (x^T x'))^p

  where p = 2, 3, \ldots. To get the feature vectors we concatenate all terms
  up to pth order polynomial terms of the components of x (weighted
  appropriately).

• Radial basis kernel

      K(x, x') = \exp\big(-\tfrac{1}{2} \|x - x'\|^2\big)

  In this case the feature space is an infinite-dimensional function space
  (use of the kernel results in a non-parametric classifier).
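
A minimal sketch (random 2-D inputs are assumptions) checking the identity \phi(x)^T \phi(x') = (1 + x^T x')^2 numerically, together with the three kernels listed above written as plain functions.

```python
# Minimal sketch: the kernel trick checked numerically for the quadratic map.
import numpy as np

def phi(x):                              # quadratic feature map from the slides
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([x1 ** 2, x2 ** 2, s * x1 * x2, s * x1, s * x2, 1.0])

def k_linear(x, xp):
    return x @ xp

def k_poly(x, xp, p=2):
    return (1.0 + x @ xp) ** p

def k_rbf(x, xp):
    return np.exp(-0.5 * np.sum((x - xp) ** 2))

rng = np.random.default_rng(1)
x, xp = rng.normal(size=2), rng.normal(size=2)
print(np.isclose(phi(x) @ phi(xp), k_poly(x, xp, p=2)))   # True: no explicit phi needed
```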


SVM examples

  [Figure: decision boundaries on the same 2-D dataset for SVMs with a linear
  kernel and with 2nd, 4th, and 8th order polynomial kernels; axes span
  roughly -1.5 to 2.]
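
The slides do not say how these plots were produced; as one possible way to reproduce boundaries of this kind, here is a minimal sketch using scikit-learn's SVC (an assumption, not the lecture's code) with polynomial kernels of increasing degree on made-up 2-D data. Setting gamma=1 and coef0=1 makes scikit-learn's polynomial kernel match (1 + x^T x')^p as defined on the kernel slide.

```python
# Minimal sketch; the dataset and hyperparameters are arbitrary choices.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(-1.5, 2.0, size=(40, 2))                  # toy 2-D inputs
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)    # non-linear labels

for degree in [1, 2, 4, 8]:
    clf = SVC(kernel="poly", degree=degree, gamma=1.0, coef0=1.0, C=10.0)
    clf.fit(X, y)
    print(degree, "training accuracy:", clf.score(X, y))
```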
