
Support Vector Machines

These notes are based on Mohri, Rostamizadeh and Talwalkar (2012).

Some Convex Optimization. Consider
\[
\min_x f(x) \quad \text{subject to} \quad g_i(x) \le 0, \quad i = 1, \dots, m.
\]

Define the Lagrangian
\[
L = f(x) + \sum_j \alpha_j g_j(x).
\]

The dual function is defined by
\[
F(\alpha) = \inf_x L.
\]

A central result in convex optimization is that the original problem can be solved by maximizing $F$ subject to $\alpha_i \ge 0$ and the complementary slackness conditions $\alpha_i g_i(x) = 0$.
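For a concrete (illustrative) example, take $f(x) = x^2$ with the single constraint $g(x) = 1 - x \le 0$. Then $L = x^2 + \alpha(1 - x)$, which is minimized over $x$ at $x = \alpha/2$, so $F(\alpha) = \alpha - \alpha^2/4$. Maximizing over $\alpha \ge 0$ gives $\alpha = 2$ and hence $x = 1$, which is indeed the constrained minimizer, and $\alpha\, g(x) = 0$ as required.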

Hyperplanes and SVM’s. Suppose we have data $(X_1, Y_1), \dots, (X_n, Y_n)$, with $Y_i \in \{-1, +1\}$, that can be separated by a hyperplane. Let $b + w^T x = 0$ be such a hyperplane, so that $Y_i(b + X_i^T w) > 0$ for all $i$. Any re-scaled version of the hyperplane is the same classifier, so re-scale the hyperplane so that
\[
\min_i |b + w^T X_i| = 1.
\]

If $x_0$ is any point then, using some simple algebra, we find that its distance to the hyperplane is
\[
\frac{|b + w^T x_0|}{\|w\|}.
\]
We call the distance to the closest point the margin $\rho$. Since $\min_i |b + w^T X_i| = 1$, we see that
\[
\rho = \min_i \frac{|w^T X_i + b|}{\|w\|} = \frac{1}{\|w\|}.
\]

The support vector machine (SVM) is the hyperplane that maximizes the margin. But maximizing $1/\|w\|$ is the same as minimizing $\|w\|$, which is the same as minimizing $(1/2)\|w\|^2$. So finding the SVM corresponds to:
\[
\min_{w,b} \; \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad Y_i(w^T X_i + b) \ge 1, \quad i = 1, \dots, n.
\]
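For concreteness, here is a minimal sketch of this quadratic program using the cvxpy modeling library (the choice of cvxpy and the function name are ours, not prescribed by the notes). X is an n-by-d array of features, Y an array of ±1 labels, and the data are assumed separable; otherwise the problem is infeasible.

import numpy as np
import cvxpy as cp

def hard_margin_svm(X, Y):
    # Solve min_{w,b} (1/2)||w||^2  subject to  Y_i (w^T X_i + b) >= 1.
    n, d = X.shape
    w = cp.Variable(d)
    b = cp.Variable()
    constraints = [cp.multiply(Y, X @ w + b) >= 1]
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), constraints).solve()
    return w.value, b.value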

The Lagrangian for this problem is
\[
L = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[ Y_i(w^T X_i + b) - 1 \right]
\]
where $\alpha_i \ge 0$ and $\alpha_i [Y_i(w^T X_i + b) - 1] = 0$. If we set $\nabla_w L = 0$ and $\nabla_b L = 0$ we get the two equations
\[
w = \sum_i \alpha_i Y_i X_i, \qquad 0 = \sum_i \alpha_i Y_i.
\]
If we insert $w = \sum_i \alpha_i Y_i X_i$ into $L$ and use the fact that $\sum_i \alpha_i Y_i = 0$, we get
\[
L = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j Y_i Y_j (X_i^T X_j).
\]

This leads to the dual optimization
\[
\max_\alpha \; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j Y_i Y_j (X_i^T X_j)
\]
subject to $\alpha_i \ge 0$ and $\sum_i \alpha_i Y_i = 0$; at the solution, the complementary slackness conditions $\alpha_i [Y_i(w^T X_i + b) - 1] = 0$ hold. Note two important facts: (i) this is a quadratic program so it can be solved quickly and (ii) we don't need the $X_i$'s, we only need the inner products $X_i^T X_j$.
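A sketch of this dual, again using cvxpy (an assumption on our part; svm_dual is an illustrative name). Note that the data enter only through the Gram matrix G with G[i, j] = X_i^T X_j.

import numpy as np
import cvxpy as cp

def svm_dual(G, Y):
    # max_a  sum_i a_i - (1/2) sum_{i,j} a_i a_j Y_i Y_j G_ij   s.t.  a >= 0,  sum_i a_i Y_i = 0.
    n = len(Y)
    a = cp.Variable(n)
    Q = np.outer(Y, Y) * G   # Q_ij = Y_i Y_j X_i^T X_j; positive semidefinite since G is a Gram matrix
    cp.Problem(cp.Maximize(cp.sum(a) - 0.5 * cp.quad_form(a, Q)),
               [a >= 0, Y @ a == 0]).solve()
    return a.value           # the alpha_i; the nonzero entries mark the support vectors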

Consider the condition $\alpha_i [Y_i(w^T X_i + b) - 1] = 0$. If $\alpha_i > 0$ then $Y_i(w^T X_i + b) = 1$, which implies that this point lies on the boundary of the margin. Such a point is called a support vector. On the other hand, if $Y_i(w^T X_i + b) > 1$ then $\alpha_i = 0$. Since $w = \sum_i \alpha_i Y_i X_i$, this means that the hyperplane only depends on the support vectors.

If $(X_i, Y_i)$ is a support vector then $w^T X_i + b = Y_i$. Since $w = \sum_j \alpha_j Y_j X_j$, we see that
\[
b = Y_i - \sum_j \alpha_j Y_j X_j^T X_i.
\]
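Continuing the hypothetical sketch above, $b$ can be recovered from any support vector once the dual solution is available (alpha and G are as used by svm_dual; the tolerance is an arbitrary choice):

import numpy as np

def intercept_from_dual(alpha, G, Y, tol=1e-6):
    # Pick any support vector i (alpha_i > 0) and use  b = Y_i - sum_j alpha_j Y_j X_j^T X_i.
    i = int(np.argmax(alpha > tol))
    return Y[i] - np.sum(alpha * Y * G[:, i])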

Multiply the expression for $b$ by $\alpha_i Y_i$ and sum over $i$ to get
\[
\sum_i \alpha_i Y_i b = \sum_i \alpha_i Y_i^2 - \sum_{i,j} \alpha_i \alpha_j Y_i Y_j (X_i^T X_j).
\]
Since $Y_i^2 = 1$, $w = \sum_i \alpha_i Y_i X_i$ and $\sum_i \alpha_i Y_i = 0$, this implies that
\[
0 = \sum_i \alpha_i - \|w\|^2.
\]

The margin $\rho$ is $1/\|w\|$ so that
\[
\rho^2 = \frac{1}{\|w\|^2} = \frac{1}{\sum_i \alpha_i} = \frac{1}{\|\alpha\|_1}.
\]
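This identity gives a convenient numerical sanity check on a dual solution such as the hypothetical one sketched earlier; the two quantities below should agree.

import numpy as np

def margin_check(alpha, X, Y):
    # rho = 1/||w|| should equal 1/sqrt(sum_i alpha_i) when alpha solves the dual.
    w = (alpha * Y) @ X
    return 1.0 / np.linalg.norm(w), 1.0 / np.sqrt(alpha.sum())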

The Non-separable Case. Usually, the data are not linearly separable, so we can't assume that $Y_i(w^T X_i + b) \ge 1$. We introduce slack variables $\xi_i \ge 0$ and instead require
\[
Y_i(w^T X_i + b) \ge 1 - \xi_i.
\]
This allows points to be incorrectly classified. But it also allows points to be correctly classified yet lie inside the margin. We change the optimization problem to
\[
\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + C \sum_i \xi_i
\]

subject to $Y_i(w^T X_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$. The constant $C \ge 0$ controls the amount of slack that is allowed.
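In the same (assumed) cvxpy setup as before, a minimal sketch of this soft-margin primal with explicit slack variables might look like this; soft_margin_svm is an illustrative name.

import cvxpy as cp

def soft_margin_svm(X, Y, C=1.0):
    # min (1/2)||w||^2 + C * sum_i xi_i   s.t.  Y_i (w^T X_i + b) >= 1 - xi_i,  xi_i >= 0.
    n, d = X.shape
    w, b, xi = cp.Variable(d), cp.Variable(), cp.Variable(n)
    objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
    constraints = [cp.multiply(Y, X @ w + b) >= 1 - xi, xi >= 0]
    cp.Problem(objective, constraints).solve()
    return w.value, b.value, xi.value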

The Lagrangian is
\[
L = \frac{1}{2}\|w\|^2 + C \sum_i \xi_i - \sum_i \alpha_i \left[ Y_i(w^T X_i + b) - 1 + \xi_i \right] - \sum_i \beta_i \xi_i.
\]

Setting the derivatives to 0, and using complementary slackness, leads to the conditions
\[
w = \sum_i \alpha_i Y_i X_i, \qquad 0 = \sum_i \alpha_i Y_i, \qquad C = \alpha_i + \beta_i,
\]
\[
\alpha_i = 0 \ \text{ or } \ Y_i(w^T X_i + b) = 1 - \xi_i, \qquad \beta_i = 0 \ \text{ or } \ \xi_i = 0.
\]
When $\alpha_i > 0$ we call $X_i$ a support vector. If $\alpha_i \ne 0$ then
\[
Y_i(w^T X_i + b) = 1 - \xi_i.
\]
If $\xi_i = 0$ then $X_i$ lies on the marginal hyperplane. If $\xi_i \ne 0$ then $\beta_i = 0$, which implies $\alpha_i = C$. In summary, support vectors lie on the marginal hyperplane or have $\alpha_i = C$.

The dual problem has a simple form:
\[
\max_\alpha \; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j Y_i Y_j X_i^T X_j
\]
subject to $0 \le \alpha_i \le C$ and $\sum_i \alpha_i Y_i = 0$. Again, it is a quadratic program and only involves inner products of the $X_i$.
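Relative to the hard-margin dual sketched earlier, the only change is the box constraint on $\alpha$; a hedged sketch (same caveats as before about cvxpy and the function name):

import numpy as np
import cvxpy as cp

def soft_margin_dual(G, Y, C=1.0):
    # Same objective as the hard-margin dual; only the box constraint 0 <= a_i <= C is new.
    n = len(Y)
    a = cp.Variable(n)
    Q = np.outer(Y, Y) * G
    cp.Problem(cp.Maximize(cp.sum(a) - 0.5 * cp.quad_form(a, Q)),
               [a >= 0, a <= C, Y @ a == 0]).solve()
    return a.value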

Since the VC dimension of hyperplane classifiers in $\mathbb{R}^d$ is $d + 1$, we know that, with probability at least $1 - \delta$,
\[
R(h) \le \hat{R}(h) + \sqrt{\frac{2(d+1)\log\big(en/(d+1)\big)}{n}} + \sqrt{\frac{\log(1/\delta)}{2n}}. \tag{1}
\]

But this bound does not use the structure of SVM’s. For this, we turn to margin theory.

Margins. Recall that the margin is
\[
\rho = \min_i \frac{Y_i(w^T X_i + b)}{\|w\|}.
\]
We can improve the VC bound using the margin.

Theorem 1 Suppose that the sample space is contained in $\{x : \|x\| \le r\}$. Let $H$ be the set of hyperplanes satisfying $\|w\| \le \Lambda$ and $\min_i |w^T x_i| = 1$. Then $VC(H) \le r^2 \Lambda^2$.

Proof. Suppose that $\{x_1, \dots, x_d\}$ can be shattered. Then for every $y \in \{-1, +1\}^d$ there exists $w$ such that $1 \le y_i (w^T x_i)$ for all $i$. Sum over $i$ to get
\[
d \le w^T \sum_i y_i x_i \le \|w\| \, \Big\| \sum_i y_i x_i \Big\| \le \Lambda \, \Big\| \sum_i y_i x_i \Big\|.
\]
This holds for all choices of $y_i$, so it holds if the $Y_i$ are drawn uniformly over $\{-1, +1\}$. Thus $E[Y_i Y_j] = E[Y_i]E[Y_j] = 0$ for $i \ne j$ and $E[Y_i Y_i] = 1$. So
\[
d \le \Lambda \, E\Big\| \sum_{i=1}^d Y_i x_i \Big\| \le \Lambda \sqrt{E\Big\| \sum_i Y_i x_i \Big\|^2} = \Lambda \sqrt{\sum_{i,j} E[Y_i Y_j] \, x_i^T x_j} = \Lambda \sqrt{\sum_i x_i^T x_i} \le \Lambda \sqrt{d r^2} = \Lambda r \sqrt{d},
\]
so that $d \le r^2 \Lambda^2$. $\square$

If the data are separable, the hyperplane satisfies $\|w\| = 1/\rho$, so that $\Lambda^2 = 1/\rho^2$ and hence $d \le r^2/\rho^2$. Plugging this into (1) we get
\[
R(h) \le \hat{R}(h) + \sqrt{\frac{2 r^2 \log(e n \rho^2 / r^2)}{n \rho^2}} + \sqrt{\frac{\log(1/\delta)}{2n}} \tag{2}
\]
which is dimension independent.

Nonparametric SVM’s. We can get a nonparametric SVM using RKHS’s by replacing $x$ with a feature map $\Phi(x)$. Recall that $\Phi(x_1)^T \Phi(x_2) = K(x_1, x_2)$. So we get a nonparametric SVM by solving
\[
\max_\alpha \; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j Y_i Y_j K(X_i, X_j)
\]
subject to $0 \le \alpha_i \le C$ and $\sum_i \alpha_i Y_i = 0$. The classifier is
\[
h(x) = \mathrm{sign}\Big( \sum_i \alpha_i Y_i K(X_i, x) + b \Big).
\]
This is a nonlinear (nonparametric) classifier.
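A sketch of how this classifier might be assembled from the pieces above, assuming alpha and b come from solving the kernelized dual (for example the hypothetical soft_margin_dual with G = rbf_kernel(X, X)); the Gaussian (RBF) kernel and the function names are our illustrative choices.

import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # One common kernel choice: K(a, b) = exp(-gamma * ||a - b||^2).
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def kernel_svm_predict(x_new, X, Y, alpha, b, gamma=1.0):
    # h(x) = sign( sum_i alpha_i Y_i K(X_i, x) + b ).
    k = rbf_kernel(X, x_new[None, :], gamma)[:, 0]
    return np.sign(np.sum(alpha * Y * k) + b)

Off-the-shelf implementations (for example sklearn.svm.SVC) solve this same kernelized dual and expose the result through a fit/predict interface.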
