Kernel Machines
INTRODUCTION TO
Machine Learning
2nd Edition
ETHEM ALPAYDIN
© The MIT Press, 2010
[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e
CHAPTER 13:
Kernel Machines
Outline
Introduction
Optimal Separating Hyperplane
The Non-separable Case: Soft Margin Hyperplane
ν-SVM
Kernel Trick
Vectorial Kernels
Defining Kernels
Introduction
Kernel machines are maximum margin methods that
allow the model to be written as a sum of the influences
of a subset of the training instances.
These influences are given by application-specific
similarity kernels, and we discuss “kernelized”
classification, regression, ranking, outlier detection and
dimensionality reduction, and how to choose and use
kernels.
We now discuss a different approach for linear
classification and regression.
We should not be surprised to have so many different
methods even for the simple case of a linear model.
Each learning algorithm has a different inductive bias,
makes different assumptions, and defines a different
objective function and thus may find a different linear
model.
The model that we will discuss in this chapter, called the
support vector machine (SVM), and later generalized under
the name kernel machine, has been popular in recent years
for a number of reasons:
1. It is a discriminant-based method and uses Vapnik's
principle to never solve a more complex problem as a first
step before the actual problem (Vapnik 1995).
2. After training, the parameter of the linear model, the
weight vector, can be written down in terms of a subset of
the training instances, the so-called support vectors.
3. The output is written as a sum of the influences of the
support vectors, and these are given by kernel functions
that are application-specific measures of similarity
between data instances.
4. Typically in most learning algorithms, data points are
represented as vectors, and either the dot product (as in the
multilayer perceptron) or the Euclidean distance (as in
radial basis function networks) is used.
A kernel function allows us to go beyond that. For
example, G1 and G2 may be two graphs and K(G1, G2)
may correspond to the number of shared paths, which
we can calculate without needing to represent G1 or G2
explicitly as vectors (a toy sketch follows this list).
5. Kernel-based algorithms are formulated as convex
optimization problems, and there is a single optimum
that we can solve for analytically.
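As a toy illustration of point 4 (assuming a much simpler similarity than the path-counting kernel mentioned above), the sketch below compares two graphs given only as edge sets, and counts shared edges without ever building a vector representation:

# Toy structured-object similarity: two graphs given as edge sets.
# A real graph kernel would count shared paths or walks; counting shared
# edges is only meant to show that no vector representation is needed.

def edge_set(edges):
    # Normalize undirected edges so (a, b) and (b, a) compare equal.
    return {tuple(sorted(e)) for e in edges}

def toy_graph_kernel(g1, g2):
    return len(edge_set(g1) & edge_set(g2))

G1 = [("a", "b"), ("b", "c"), ("c", "d")]
G2 = [("b", "a"), ("c", "b"), ("d", "e")]
print(toy_graph_kernel(G1, G2))   # 2 shared edges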
Kernel Machines
Discriminant-based: No need to estimate densities first
Define the discriminant in terms of support vectors
The use of kernel functions, application-specific
measures of similarity
No need to represent instances as vectors
Convex optimization problems with a unique solution
Optimal Separating Hyperplane
X = {x^t, r^t}  where  r^t = +1 if x^t ∈ C_1  and  r^t = −1 if x^t ∈ C_2

Find w and w_0 such that
  w^T x^t + w_0 ≥ +1  for r^t = +1
  w^T x^t + w_0 ≤ −1  for r^t = −1
which can be rewritten as
  r^t (w^T x^t + w_0) ≥ +1

The optimal separating hyperplane solves
  min ½ ‖w‖²  subject to  r^t (w^T x^t + w_0) ≥ +1, ∀t
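A minimal sketch of this optimization, assuming scikit-learn and a small linearly separable toy set; a very large C approximates the hard-margin problem, and the check below confirms that every training instance satisfies r^t (w^T x^t + w_0) ≥ 1:

import numpy as np
from sklearn.svm import SVC

# Linearly separable toy data with labels r in {-1, +1}.
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])
r = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin (optimal separating hyperplane) problem.
svm = SVC(kernel="linear", C=1e6).fit(X, r)
w, w0 = svm.coef_[0], svm.intercept_[0]

# Every training instance should satisfy r^t (w^T x^t + w_0) >= 1 (up to tolerance).
margins = r * (X @ w + w0)
print(margins)
print(margins.min() >= 1 - 1e-6)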
Margin
[Figure: For a two-class problem where the instances of the classes are shown by plus signs and dots, the thick line is the boundary and the dashed lines define the margins on either side. Circled instances are the support vectors.]
min ½ ‖w‖²  subject to  r^t (w^T x^t + w_0) ≥ +1, ∀t

Introducing Lagrange multipliers α^t ≥ 0, the primal is

  L_p = ½ ‖w‖² − Σ_t α^t [ r^t (w^T x^t + w_0) − 1 ]
      = ½ ‖w‖² − Σ_t α^t r^t (w^T x^t + w_0) + Σ_t α^t

Setting the derivatives to zero:
  ∂L_p/∂w = 0   ⇒  w = Σ_t α^t r^t x^t
  ∂L_p/∂w_0 = 0 ⇒  Σ_t α^t r^t = 0
Plugging these back in, the dual to be maximized is

  L_d = ½ w^T w − w^T Σ_t α^t r^t x^t − w_0 Σ_t α^t r^t + Σ_t α^t
      = −½ w^T w + Σ_t α^t
      = −½ Σ_t Σ_s α^t α^s r^t r^s (x^t)^T x^s + Σ_t α^t

subject to  Σ_t α^t r^t = 0  and  α^t ≥ 0, ∀t

Most α^t are 0 and only a small number have α^t > 0; they are the support vectors.
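A sketch of the relation w = Σ_t α^t r^t x^t, assuming scikit-learn: a fitted SVC stores α^t r^t for the support vectors in dual_coef_, so the weight vector can be reconstructed from the support vectors alone, and most training instances play no role in the solution:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
r = np.array([-1] * 20 + [1] * 20)

svm = SVC(kernel="linear", C=1.0).fit(X, r)

# dual_coef_ holds alpha^t * r^t for the support vectors only;
# all other alpha^t are exactly zero and those instances can be discarded.
alpha_times_r = svm.dual_coef_[0]
w_from_dual = alpha_times_r @ svm.support_vectors_

print(np.allclose(w_from_dual, svm.coef_[0]))   # True: w = sum_t alpha^t r^t x^t
print(len(svm.support_vectors_), "support vectors out of", len(X))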
Soft Margin Hyperplane
Not linearly separable: introduce slack variables ξ^t ≥ 0 and require
  r^t (w^T x^t + w_0) ≥ 1 − ξ^t

Soft error:  Σ_t ξ^t

New primal is
  L_p = ½ ‖w‖² + C Σ_t ξ^t − Σ_t α^t [ r^t (w^T x^t + w_0) − 1 + ξ^t ] − Σ_t μ^t ξ^t
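A sketch (scikit-learn assumed, with toy overlapping classes) of how the penalty factor C trades margin width against the soft error, reading each slack off as ξ^t = max(0, 1 − r^t (w^T x^t + w_0)):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Overlapping classes: not linearly separable.
X = np.vstack([rng.normal(0, 1.5, (50, 2)), rng.normal(2, 1.5, (50, 2))])
r = np.array([-1] * 50 + [1] * 50)

for C in (0.1, 100.0):
    svm = SVC(kernel="linear", C=C).fit(X, r)
    w, w0 = svm.coef_[0], svm.intercept_[0]
    xi = np.maximum(0.0, 1.0 - r * (X @ w + w0))   # slack variables xi^t
    # Larger C penalizes slack more: smaller soft error, narrower margin (larger ||w||).
    print(f"C={C}: soft error={xi.sum():.2f}, ||w||={np.linalg.norm(w):.2f}")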
Hinge Loss
  L_hinge(y^t, r^t) = 0            if y^t r^t ≥ 1
                      1 − y^t r^t  otherwise
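A minimal numpy sketch of this loss, where y^t is the model output and r^t ∈ {−1, +1}:

import numpy as np

def hinge_loss(y, r):
    # 0 where y^t r^t >= 1 (correct side, outside the margin), 1 - y^t r^t otherwise.
    return np.maximum(0.0, 1.0 - y * r)

y = np.array([2.0, 0.5, -0.3, -1.5])   # model outputs
r = np.array([1, 1, 1, -1])            # true labels
print(hinge_loss(y, r))                # [0.  0.5 1.3 0. ]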
ν-SVM

  min ½ ‖w‖² − νρ + (1/N) Σ_t ξ^t

subject to
  r^t (w^T x^t + w_0) ≥ ρ − ξ^t,  ξ^t ≥ 0,  ρ ≥ 0

The dual is

  L_d = −½ Σ_t Σ_s α^t α^s r^t r^s (x^t)^T x^s

subject to
  Σ_t α^t r^t = 0,  0 ≤ α^t ≤ 1/N,  Σ_t α^t ≥ ν

Here ρ replaces the fixed margin of 1, and ν ∈ (0, 1] is an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors.
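A sketch assuming scikit-learn, whose NuSVC implements this formulation; the toy data and the ν values are illustrative:

import numpy as np
from sklearn.svm import NuSVC

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1.2, (100, 2)), rng.normal(2.5, 1.2, (100, 2))])
r = np.array([-1] * 100 + [1] * 100)

for nu in (0.05, 0.3):
    svm = NuSVC(nu=nu, kernel="linear").fit(X, r)
    frac_sv = len(svm.support_) / len(X)
    # nu lower-bounds the fraction of support vectors (and upper-bounds margin errors).
    print(f"nu={nu}: fraction of support vectors = {frac_sv:.2f}")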
Kernel Trick
Preprocess input x by basis functions
  z = φ(x),   g(z) = w^T z,   g(x) = w^T φ(x)

The SVM solution:
  w = Σ_t α^t r^t z^t = Σ_t α^t r^t φ(x^t)
  g(x) = w^T φ(x) = Σ_t α^t r^t φ(x^t)^T φ(x)
  g(x) = Σ_t α^t r^t K(x^t, x)
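A sketch (scikit-learn assumed) checking that a kernelized SVM's output really is g(x) = Σ_t α^t r^t K(x^t, x) + w_0, with the sum running only over the support vectors:

import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(2, 1, (40, 2))])
r = np.array([-1] * 40 + [1] * 40)

gamma = 0.5
svm = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, r)

x_new = np.array([[1.0, 1.0]])
# K(x^t, x) for every support vector x^t, then the weighted sum with alpha^t r^t.
K = rbf_kernel(svm.support_vectors_, x_new, gamma=gamma)       # shape (n_SV, 1)
g_manual = svm.dual_coef_[0] @ K[:, 0] + svm.intercept_[0]

print(np.isclose(g_manual, svm.decision_function(x_new)[0]))   # True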
•The kernel trick is a simple idea: nonlinearly separable data are projected into a higher-dimensional space in which they become easier to classify, that is, they can be separated linearly by a hyperplane.
•Mathematically, this is achieved through the Lagrangian (dual) formulation with Lagrange multipliers: the data appear only through dot products, which can be replaced by kernel evaluations.
•Kernel: intuitively, a kernel corresponds to mapping the data from a low-dimensional space into a higher-dimensional space in which a decision surface that is curved in the original space becomes a flat (linear) one. The kernel itself is the function K(x, y) = φ(x)^T φ(y) that returns dot products in that higher-dimensional space without computing the mapping φ explicitly.
Advantages of Support Vector Machine
Vectorial Kernels
Polynomials of degree q:
  K(x^t, x) = (x^T x^t + 1)^q

For example, when q = 2 and x, y are two-dimensional:
  K(x, y) = (x^T y + 1)²
          = (x_1 y_1 + x_2 y_2 + 1)²
          = 1 + 2 x_1 y_1 + 2 x_2 y_2 + 2 x_1 x_2 y_1 y_2 + x_1² y_1² + x_2² y_2²

which corresponds to the basis functions
  φ(x) = [1, √2 x_1, √2 x_2, √2 x_1 x_2, x_1², x_2²]^T
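A short numpy check of this identity for q = 2 in two dimensions; the kernel value equals the dot product of the expanded basis vectors, which never need to be formed in practice:

import numpy as np

def poly2_kernel(x, y):
    return (x @ y + 1.0) ** 2

def phi(x):
    # Explicit basis functions for the degree-2 polynomial kernel in 2-D.
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     np.sqrt(2) * x1 * x2,
                     x1 ** 2, x2 ** 2])

x = np.array([0.7, -1.2])
y = np.array([2.0, 0.4])
print(np.isclose(poly2_kernel(x, y), phi(x) @ phi(y)))   # True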
Vectorial Kernels
Radial-basis functions:
  K(x^t, x) = exp( − ‖x^t − x‖² / (2 s²) )
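A numpy sketch of the Gaussian kernel matrix over a data set, computed from squared Euclidean distances; the spread s is an assumed hyperparameter:

import numpy as np

def gaussian_kernel_matrix(X, s=1.0):
    # Squared Euclidean distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * s ** 2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = gaussian_kernel_matrix(X, s=1.0)
print(np.round(K, 3))   # symmetric, with ones on the diagonal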
Defining kernels
Kernel “engineering”
Defining good measures of similarity
String kernels, graph kernels, image kernels, ...
Empirical kernel map: define a set of templates m_i and a score function s(x, m_i), let
  f(x^t) = [ s(x^t, m_1), s(x^t, m_2), ..., s(x^t, m_M) ]
and define
  K(x, x^t) = f(x)^T f(x^t)
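A sketch of an empirical kernel map; the templates m_i and the Gaussian score function used here are illustrative choices, not prescribed by the method:

import numpy as np

def score(x, m, s=1.0):
    # Example score function: Gaussian similarity to a template.
    return np.exp(-np.sum((x - m) ** 2) / (2.0 * s ** 2))

def empirical_map(x, templates):
    # f(x) = [s(x, m_1), ..., s(x, m_M)]
    return np.array([score(x, m) for m in templates])

templates = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([3.0, 0.0])]
xa, xb = np.array([0.5, 0.5]), np.array([2.5, 0.2])

# K(x, x') = f(x)^T f(x') is a valid kernel by construction.
print(empirical_map(xa, templates) @ empirical_map(xb, templates))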
Multiple Kernel Learning
Fixed kernel combinations:
  K(x, y) = c K_1(x, y),  c > 0
  K(x, y) = K_1(x, y) + K_2(x, y)
  K(x, y) = K_1(x, y) · K_2(x, y)

Adaptive kernel combination with learned weights η_i:
  L_d = Σ_t α^t − ½ Σ_t Σ_s α^t α^s r^t r^s Σ_i η_i K_i(x^t, x^s)
  g(x) = Σ_t α^t r^t Σ_i η_i K_i(x^t, x)
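A sketch (scikit-learn assumed) of a fixed combination: two Gram matrices are summed and passed to an SVM as a precomputed kernel; the chosen kernels and their equal weighting are illustrative, and nothing is learned about the combination itself:

import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
r = np.array([-1] * 30 + [1] * 30)

# Fixed combination K = K1 + K2 (a sum of valid kernels is a valid kernel).
K = rbf_kernel(X, X, gamma=0.5) + polynomial_kernel(X, X, degree=2)

svm = SVC(kernel="precomputed", C=1.0).fit(K, r)
print(svm.score(K, r))   # training accuracy using the combined kernel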
Multiclass Kernel Machines
1-vs-all
Pairwise separation
Error-Correcting Output Codes
Single multiclass optimization
  min ½ Σ_{i=1}^{K} ‖w_i‖² + C Σ_i Σ_t ξ_i^t

subject to
  w_{z^t}^T x^t + w_{z^t 0} ≥ w_i^T x^t + w_{i0} + 2 − ξ_i^t,  ∀i ≠ z^t,  ξ_i^t ≥ 0

where z^t denotes the class of x^t.
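A sketch of the first two approaches, assuming scikit-learn: OneVsRestClassifier trains one machine per class (1-vs-all), while SVC by itself uses pairwise (one-vs-one) separation internally; the data set and kernel settings are illustrative:

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # 3 classes

# 1-vs-all: one binary SVM per class, predict the class with the largest output.
ova = OneVsRestClassifier(SVC(kernel="rbf", gamma="scale", C=1.0)).fit(X, y)
print(len(ova.estimators_), "binary machines, training accuracy:", ova.score(X, y))

# Pairwise (1-vs-1) separation: SVC's built-in multiclass strategy.
ovo = SVC(kernel="rbf", gamma="scale", C=1.0).fit(X, y)
print("pairwise training accuracy:", ovo.score(X, y))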
SVM for Regression
Use a linear model (possibly kernelized):
  f(x) = w^T x + w_0

Use the ε-sensitive error function:
  e_ε(r^t, f(x^t)) = 0                     if |r^t − f(x^t)| < ε
                     |r^t − f(x^t)| − ε    otherwise

  min ½ ‖w‖² + C Σ_t (ξ_+^t + ξ_−^t)

subject to
  r^t − (w^T x^t + w_0) ≤ ε + ξ_+^t
  (w^T x^t + w_0) − r^t ≤ ε + ξ_−^t
  ξ_+^t, ξ_−^t ≥ 0
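A sketch of the ε-sensitive error in numpy together with a fit using scikit-learn's SVR, which implements this formulation; the toy data and the values of C and ε are illustrative:

import numpy as np
from sklearn.svm import SVR

def eps_insensitive_error(r, f_x, eps=0.1):
    # 0 inside the epsilon tube, |r - f(x)| - eps outside it.
    return np.maximum(0.0, np.abs(r - f_x) - eps)

rng = np.random.default_rng(5)
X = np.sort(rng.uniform(0, 5, (60, 1)), axis=0)
r = np.sin(X[:, 0]) + rng.normal(0, 0.1, 60)

svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, r)
pred = svr.predict(X)
print("mean eps-insensitive error:", eps_insensitive_error(r, pred, eps=0.1).mean())
print(len(svr.support_), "support vectors out of", len(X))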
Kernel Regression
[Figure: kernel regression fits using a polynomial kernel and a Gaussian kernel.]
One-Class Kernel Machines
Consider a sphere with center a and radius R
  min R² + C Σ_t ξ^t

subject to
  ‖x^t − a‖² ≤ R² + ξ^t,  ξ^t ≥ 0

The dual is

  L_d = Σ_t α^t (x^t)^T x^t − Σ_t Σ_s α^t α^s (x^t)^T x^s

subject to
  0 ≤ α^t ≤ C,  Σ_t α^t = 1
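A sketch assuming scikit-learn; its OneClassSVM implements the related hyperplane formulation of Schölkopf et al. rather than the sphere above (with a Gaussian kernel the two coincide), and ν roughly sets the fraction of training points treated as outliers:

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(6)
X_inliers = rng.normal(0, 1, (95, 2))
X_outliers = rng.uniform(-6, 6, (5, 2))
X = np.vstack([X_inliers, X_outliers])

# nu roughly controls the fraction of training points treated as outliers.
oc = OneClassSVM(kernel="rbf", gamma=0.2, nu=0.05).fit(X)
labels = oc.predict(X)            # +1 inside the learned region, -1 outside
print("flagged as outliers:", np.sum(labels == -1))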
Kernel Dimensionality Reduction
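Kernel PCA performs PCA in the space induced by the kernel (with a linear kernel it reduces to ordinary PCA). A sketch assuming scikit-learn's KernelPCA; the data set, kernel, and gamma are illustrative:

from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Concentric circles: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Project onto the leading components of an RBF-kernel PCA.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=5.0)
Z = kpca.fit_transform(X)
print(Z.shape)   # (200, 2) projection in the kernel-induced feature space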