
Kernel Method

1 / 24
Input and Feature Space

For mining and analysis, it is important to find a suitable data representation.


For example, for complex data such as text, sequences, images, and so on,
we must typically extract or construct a set of attributes or features, so that we
can represent the data instances as multivariate vectors.
Given a data instance x (e.g., a sequence), we need to find a mapping φ, so
that φ(x) is the vector representation of x.
Even when the input data is a numeric data matrix, a nonlinear mapping φ may
be used to discover nonlinear relationships.
The term input space refers to the data space for the input data x, and feature
space refers to the space of mapped vectors φ(x).

2 / 24
Sequence-based Features
Consider a dataset of DNA sequences over the alphabet Σ = {A, C, G, T }.
One simple feature space represents each sequence in terms of the
probability distribution over symbols in Σ. That is, given a sequence x with
length |x| = m, the mapping into feature space is given as

φ(x) = (P(A), P(C), P(G), P(T))

where P(s) = ns/m is the probability of observing symbol s ∈ Σ, and ns is the
number of times s appears in sequence x.
For example, if x = ACAGCAGTA, with m = |x| = 9, since A occurs four
times, C and G occur twice, and T occurs once, we have

φ(x) = (4/9, 2/9, 2/9, 1/9) = (0.44, 0.22, 0.22, 0.11)

We can compute larger feature spaces by considering, for example, the
probability distribution over all substrings or words of size up to k over the
alphabet Σ.
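
To make this concrete, here is a minimal Python sketch of the symbol-frequency map (the function name symbol_freq_map and the use of collections.Counter are our own choices, not part of the slides):

```python
from collections import Counter

def symbol_freq_map(x, alphabet="ACGT"):
    """Map a sequence x to its symbol probability distribution over the alphabet."""
    counts = Counter(x)
    m = len(x)
    return [counts[s] / m for s in alphabet]

# Example from the slide: x = ACAGCAGTA
print(symbol_freq_map("ACAGCAGTA"))  # [0.444..., 0.222..., 0.222..., 0.111...]
```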

3 / 24
Nonlinear Features

Consider the mapping φ that takes as input a vector x = (x1, x2)T ∈ R² and
maps it to a “quadratic” feature space via the nonlinear mapping

φ(x) = (x1², x2², √2 x1x2)T ∈ R³

For example, the point x = (5.9, 3)T is mapped to the vector

φ(x) = (5.9², 3², √2 · 5.9 · 3)T = (34.81, 9, 25.03)T

We can then apply well-known linear analysis methods in the feature space.
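
A small Python sketch of this quadratic map (the helper name quadratic_map is ours), reproducing the example point above:

```python
import math

def quadratic_map(x1, x2):
    """Nonlinear map to the quadratic feature space (x1^2, x2^2, sqrt(2)*x1*x2)."""
    return (x1 ** 2, x2 ** 2, math.sqrt(2) * x1 * x2)

print(quadratic_map(5.9, 3))  # (34.81, 9, 25.03...)
```

With this map, φ(x)T φ(y) = (xT y)², which is the homogeneous quadratic kernel discussed later in these slides.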

4 / 24
Kernel Method
Let I denote the input space, which can comprise any arbitrary set of objects, and let
D = {xi | i = 1, . . . , n} ⊂ I be a dataset comprising n objects in the input space. Let φ : I → F
be a mapping from the input space I to the feature space F.
Kernel methods avoid explicitly transforming each point x in the input space into the
mapped point φ(x) in the feature space. Instead, the input objects are represented via
their pairwise similarity values comprising the n × n kernel matrix, defined as

K = | K(x1, x1)  K(x1, x2)  · · ·  K(x1, xn) |
    | K(x2, x1)  K(x2, x2)  · · ·  K(x2, xn) |
    |    ...        ...      ...      ...    |
    | K(xn, x1)  K(xn, x2)  · · ·  K(xn, xn) |

K : I × I → R is a kernel function on any two points in input space, which should
satisfy the condition

K(xi, xj) = φ(xi)T φ(xj)

Intuitively, we need to be able to compute the value of the dot product using the
original input representation x, without having recourse to the mapping φ(x).
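
As an illustration, a generic kernel-matrix builder might look like the following sketch (NumPy assumed; kernel_matrix is our own helper, not from the slides):

```python
import numpy as np

def kernel_matrix(X, kernel):
    """Build the n x n kernel matrix K with K[i, j] = kernel(X[i], X[j])."""
    n = len(X)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j])
    return K

# Example: linear kernel on a tiny 2-D dataset
X = np.array([[5.9, 3.0], [6.9, 3.1]])
print(kernel_matrix(X, lambda x, y: x @ y))
```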
5 / 24
Linear Kernel

Let φ(x) = x be the identity mapping. This leads to the linear kernel, which is
simply the dot product between two input vectors:

φ(x)T φ(y) = xT y = K (x, y)


For example, if x1 = (5.9, 3)T and x2 = (6.9, 3.1)T, then we have

K(x1, x2) = x1T x2 = 5.9 × 6.9 + 3 × 3.1 = 40.71 + 9.3 = 50.01

[Figure: the five 2-D points x1, . . . , x5 plotted in the input space (axes X1 and X2), alongside their linear kernel matrix:]

K      x1       x2       x3       x4       x5
x1   43.81    50.01    47.64    36.74    42.00
x2   50.01    57.22    54.53    41.66    48.22
x3   47.64    54.53    51.97    39.64    45.98
x4   36.74    41.66    39.64    31.40    34.64
x5   42.00    48.22    45.98    34.64    40.84
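
A minimal NumPy check of the linear kernel (only x1 and x2 are given numerically in the text, so the sketch reproduces just the corresponding entries of the table):

```python
import numpy as np

# Linear kernel matrix: K = X X^T, where the rows of X are the input points.
# The 2x2 block below reproduces the entries 43.81, 50.01, and 57.22 of the table.
X = np.array([[5.9, 3.0],
              [6.9, 3.1]])
K = X @ X.T
print(K)
```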

6 / 24
Kernel Trick

Many data mining methods can be kernelized; that is, instead of mapping the
input points into feature space, the data can be represented via the n × n
kernel matrix K, and all relevant analysis can be performed over K.
This is done via the kernel trick, that is, by showing that the analysis task requires
only dot products φ(xi)T φ(xj) in feature space, which can be replaced by the
corresponding kernel value K(xi, xj) = φ(xi)T φ(xj) that can be computed efficiently
in input space.
Once the kernel matrix has been computed, we no longer even need the input
points xi , as all operations involving only dot products in the feature space can
be performed over the n × n kernel matrix K.

7 / 24
Kernel Matrix

A function K is called a positive semidefinite kernel if and only if it is symmetric:

K(xi, xj) = K(xj, xi)

and the corresponding kernel matrix K for any subset D ⊂ I is positive
semidefinite, that is,

aT Ka ≥ 0, for all vectors a ∈ Rn


which implies that

Σ_{i=1}^n Σ_{j=1}^n ai aj K(xi, xj) ≥ 0, for all ai ∈ R, i ∈ [1, n]
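
Numerically, positive semidefiniteness of a computed kernel matrix can be checked via its eigenvalues; a small sketch (the tolerance is our choice):

```python
import numpy as np

def is_psd(K, tol=1e-10):
    """Check that a symmetric kernel matrix is positive semidefinite."""
    if not np.allclose(K, K.T):
        return False
    eigvals = np.linalg.eigvalsh(K)   # eigenvalues of a symmetric matrix
    return bool(np.all(eigvals >= -tol))

X = np.array([[5.9, 3.0], [6.9, 3.1], [6.6, 2.9]])
print(is_psd(X @ X.T))  # True: a linear kernel matrix is always PSD
```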

8 / 24
Dot Products and Positive Semi-definite Kernels
Positive Semidefinite Kernel
If K (xi , xj ) represents the dot product φ(xi )T φ(xj ) in some feature space, then K is a
positive semidefinite kernel.

First, K is symmetric since the dot product is symmetric, which also implies that the
kernel matrix K is symmetric.

Second, K is positive semidefinite because

aT Ka = Σ_{i=1}^n Σ_{j=1}^n ai aj K(xi, xj)
      = Σ_{i=1}^n Σ_{j=1}^n ai aj φ(xi)T φ(xj)
      = ( Σ_{i=1}^n ai φ(xi) )T ( Σ_{j=1}^n aj φ(xj) )
      = ‖ Σ_{i=1}^n ai φ(xi) ‖² ≥ 0
9 / 24
Empirical Kernel Map
We now show that if we are given a positive semidefinite kernel
K : I × I → R, then it corresponds to a dot product in some feature space F .
Define the map φ as follows:
φ(x) = (K(x1, x), K(x2, x), . . . , K(xn, x))T ∈ Rn

The empirical kernel map is defined as

φ(x) = K^{−1/2} · (K(x1, x), K(x2, x), . . . , K(xn, x))T ∈ Rn

so that the dot product yields

φ(xi)T φ(xj) = (K^{−1/2} Ki)T (K^{−1/2} Kj)
             = KiT (K^{−1/2} K^{−1/2}) Kj
             = KiT K^{−1} Kj

where Ki is the ith column of K.
Over all pairs of mapped points, we have

{ KiT K^{−1} Kj }_{i,j=1}^n = K K^{−1} K = K
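
A sketch of the empirical kernel map via an eigendecomposition of K (the pseudo-inverse square root guarding against zero eigenvalues is our own safeguard):

```python
import numpy as np

def empirical_kernel_map(K):
    """Return the n x n matrix whose i-th row is the empirical map of x_i."""
    lam, U = np.linalg.eigh(K)                    # K = U diag(lam) U^T
    inv_sqrt = np.where(lam > 1e-12, 1.0 / np.sqrt(lam), 0.0)
    K_inv_sqrt = U @ np.diag(inv_sqrt) @ U.T      # K^{-1/2} (pseudo-inverse form)
    return (K_inv_sqrt @ K).T                     # row i is K^{-1/2} K_i

X = np.array([[5.9, 3.0], [6.9, 3.1], [6.6, 2.9]])
K = X @ X.T
Phi = empirical_kernel_map(K)
print(np.allclose(Phi @ Phi.T, K))   # the mapped dot products reproduce K
```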
10 / 24
Data-specific Mercer Kernel Map

The Mercer kernel map also corresponds to a dot product in feature space.
Since K is a symmetric positive semidefinite matrix, it has real and non-negative
eigenvalues. It can be decomposed as follows:

K = UΛUT

where U is the orthonormal matrix of eigenvectors ui = (ui1, ui2, . . . , uin)T ∈ Rn
(for i = 1, . . . , n), and Λ is the diagonal matrix of eigenvalues, with both arranged in
non-increasing order of the eigenvalues λ1 ≥ λ2 ≥ . . . ≥ λn ≥ 0.
The Mercer map φ is given as

φ(xi) = √Λ Ui

where Ui is the ith row of U.
The kernel value is simply the dot product between scaled rows of U:

φ(xi)T φ(xj) = (√Λ Ui)T (√Λ Uj) = UiT Λ Uj
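
A sketch of the data-specific Mercer map (clipping tiny negative eigenvalues produced by round-off is our own safeguard):

```python
import numpy as np

def mercer_map(K):
    """Return the n x n matrix whose i-th row is sqrt(Lambda) * U_i."""
    lam, U = np.linalg.eigh(K)            # eigenvalues in ascending order
    lam, U = lam[::-1], U[:, ::-1]        # reorder to non-increasing eigenvalues
    lam = np.clip(lam, 0.0, None)         # guard against round-off negatives
    return U * np.sqrt(lam)               # row i is sqrt(Lambda) U_i

X = np.array([[5.9, 3.0], [6.9, 3.1], [6.6, 2.9]])
K = X @ X.T
Phi = mercer_map(K)
print(np.allclose(Phi @ Phi.T, K))        # the mapped dot products reproduce K
```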

11 / 24
Polynomial Kernel
Polynomial kernels are of two types: homogeneous or inhomogeneous.
Let x, y ∈ Rd. The (inhomogeneous) polynomial kernel is defined as

Kq(x, y) = φ(x)T φ(y) = (c + xT y)^q

where q is the degree of the polynomial, and c ≥ 0 is some constant. When c = 0 we
obtain the homogeneous kernel, comprising only degree-q terms. When c > 0, the
feature space is spanned by all products of at most q attributes.
This can be seen from the binomial expansion

Kq(x, y) = (c + xT y)^q = Σ_{k=0}^{q} (q choose k) c^{q−k} (xT y)^k

The most typical cases are the linear (with q = 1) and quadratic (with q = 2) kernels,
given as

K1(x, y) = c + xT y
K2(x, y) = (c + xT y)²
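
A hedged NumPy sketch of the polynomial kernel (the helper name is ours):

```python
import numpy as np

def poly_kernel(x, y, q=2, c=1.0):
    """Polynomial kernel (c + x^T y)^q; c = 0 gives the homogeneous kernel."""
    return (c + np.dot(x, y)) ** q

x = np.array([5.9, 3.0])
y = np.array([6.9, 3.1])
print(poly_kernel(x, y, q=2, c=0.0))   # homogeneous quadratic: (x^T y)^2 = 50.01^2
print(poly_kernel(x, y, q=2, c=1.0))   # inhomogeneous quadratic
```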

12 / 24
Gaussian Kernel

The Gaussian kernel, also called the Gaussian radial basis function (RBF)
kernel, is defined as
K(x, y) = exp{ −‖x − y‖² / (2σ²) }

where σ > 0 is the spread parameter that plays the same role as the standard
deviation in a normal density function.
Note that K (x, x) = 1, and further that the kernel value is inversely related to
the distance between the two points x and y.
A feature space for the Gaussian kernel has infinite dimensionality.
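
A minimal sketch of the Gaussian kernel (the helper name and default σ are ours):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

x = np.array([5.9, 3.0])
y = np.array([6.9, 3.1])
print(gaussian_kernel(x, x))              # 1.0: a point is maximally similar to itself
print(gaussian_kernel(x, y, sigma=1.0))   # decays with squared distance
```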

13 / 24
Basic Kernel Operations in Feature Space
Several basic data analysis tasks can be performed solely via kernels, without
instantiating φ(x).
Norm of a Point: We can compute the norm of a point φ(x) in feature space
as follows:

‖φ(x)‖² = φ(x)T φ(x) = K(x, x)

which implies that ‖φ(x)‖ = √K(x, x).
Distance between Points: The distance between φ(xi) and φ(xj) is

‖φ(xi) − φ(xj)‖² = ‖φ(xi)‖² + ‖φ(xj)‖² − 2 φ(xi)T φ(xj)
                 = K(xi, xi) + K(xj, xj) − 2 K(xi, xj)

which implies that

‖φ(xi) − φ(xj)‖ = √( K(xi, xi) + K(xj, xj) − 2 K(xi, xj) )
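
Both quantities can be read directly off the kernel matrix; a small sketch (the helper names are ours):

```python
import numpy as np

def kernel_norm(K, i):
    """Norm of phi(x_i) in feature space, from the kernel matrix alone."""
    return np.sqrt(K[i, i])

def kernel_distance(K, i, j):
    """Distance between phi(x_i) and phi(x_j) in feature space."""
    return np.sqrt(K[i, i] + K[j, j] - 2.0 * K[i, j])

X = np.array([[5.9, 3.0], [6.9, 3.1]])
K = X @ X.T                                   # linear kernel matrix
print(kernel_distance(K, 0, 1))               # equals the Euclidean distance here
print(np.linalg.norm(X[0] - X[1]))            # same value, computed directly
```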

14 / 24
Basic Kernel Operations in Feature Space
Kernel Value as Similarity: We can rearrange the terms in

‖φ(xi) − φ(xj)‖² = K(xi, xi) + K(xj, xj) − 2 K(xi, xj)

to obtain

(1/2) ( ‖φ(xi)‖² + ‖φ(xj)‖² − ‖φ(xi) − φ(xj)‖² ) = K(xi, xj) = φ(xi)T φ(xj)

The greater the distance ‖φ(xi) − φ(xj)‖ between the two points in feature
space, the smaller the kernel value, that is, the smaller the similarity.
Mean in Feature Space: The mean of the points in feature space is given as
µφ = (1/n) Σ_{i=1}^n φ(xi). Since the mapped points are not available explicitly,
we cannot compute the mean directly. However, the squared norm of the mean is

‖µφ‖² = µφT µφ = (1/n²) Σ_{i=1}^n Σ_{j=1}^n K(xi, xj)    (1)

The squared norm of the mean in feature space is simply the average of the
values in the kernel matrix K.
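
A one-line check of Equation (1) for the linear kernel (the example points are ours):

```python
import numpy as np

def mean_sq_norm(K):
    """Squared norm of the feature-space mean: the average of all kernel values."""
    return K.mean()

X = np.array([[5.9, 3.0], [6.9, 3.1], [6.6, 2.9]])
K = X @ X.T
print(mean_sq_norm(K))                       # via the kernel matrix
print(np.sum(X.mean(axis=0) ** 2))           # direct check for the linear kernel
```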
15 / 24
Basic Kernel Operations in Feature Space
Total Variance in Feature Space: The total variance in feature space is obtained by
taking the average squared deviation of points from the mean in feature space:

σφ² = (1/n) Σ_{i=1}^n ‖φ(xi) − µφ‖² = (1/n) Σ_{i=1}^n K(xi, xi) − (1/n²) Σ_{i=1}^n Σ_{j=1}^n K(xi, xj)

Centering in Feature Space: We can center each point in feature space by subtracting
the mean from it, as follows:

φ̂(xi) = φ(xi) − µφ

The kernel between centered points is given as

K̂(xi, xj) = φ̂(xi)T φ̂(xj)
           = K(xi, xj) − (1/n) Σ_{k=1}^n K(xi, xk) − (1/n) Σ_{k=1}^n K(xj, xk) + (1/n²) Σ_{a=1}^n Σ_{b=1}^n K(xa, xb)

More compactly, in matrix form:

K̂ = (I − (1/n) 1_{n×n}) K (I − (1/n) 1_{n×n})

where 1_{n×n} is the n × n matrix of ones.
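
A sketch of both operations on a kernel matrix (the helper names are ours):

```python
import numpy as np

def total_variance(K):
    """Total variance in feature space, from the kernel matrix alone."""
    n = K.shape[0]
    return np.trace(K) / n - K.mean()

def center_kernel(K):
    """Centered kernel matrix: (I - 1/n) K (I - 1/n)."""
    n = K.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n
    return C @ K @ C

X = np.array([[5.9, 3.0], [6.9, 3.1], [6.6, 2.9]])
K = X @ X.T
print(total_variance(K))
# For the linear kernel, centering K equals the Gram matrix of the centered data:
print(np.allclose(center_kernel(K), (X - X.mean(axis=0)) @ (X - X.mean(axis=0)).T))
```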
16 / 24
Basic Kernel Operations in Feature Space

Normalizing in Feature Space: The dot product between normalized points
in feature space corresponds to the cosine of the angle between them:

φn(xi)T φn(xj) = φ(xi)T φ(xj) / ( ‖φ(xi)‖ · ‖φ(xj)‖ ) = cos θ

If the mapped points are both centered and normalized, then a dot product
corresponds to the correlation between the two points in feature space.
The normalized kernel matrix, Kn, can be computed using only the kernel
function K, as

Kn(xi, xj) = φ(xi)T φ(xj) / ( ‖φ(xi)‖ · ‖φ(xj)‖ ) = K(xi, xj) / √( K(xi, xi) · K(xj, xj) )

Kn has all diagonal elements as 1.
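
A sketch of kernel normalization (the helper name is ours):

```python
import numpy as np

def normalize_kernel(K):
    """Normalized kernel: K_n[i, j] = K[i, j] / sqrt(K[i, i] * K[j, j])."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

X = np.array([[5.9, 3.0], [6.9, 3.1], [6.6, 2.9]])
Kn = normalize_kernel(X @ X.T)
print(np.diag(Kn))    # all ones
print(Kn[0, 1])       # cosine of the angle between x1 and x2 (linear kernel)
```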

17 / 24
Spectrum Kernel for Strings
Given alphabet Σ, the l-spectrum feature map is the mapping φ : Σ* → R^{|Σ|^l} from the
set of strings over Σ to the |Σ|^l-dimensional space representing the number of
occurrences of all possible substrings of length l, defined as

φ(x) = (· · · , #(α), · · ·)T over all α ∈ Σ^l

where #(α) is the number of occurrences of the l-length string α in x.


The (full) spectrum map considers all lengths from l = 0 to l = ∞, leading to an
infinite-dimensional feature map φ : Σ* → R^∞:

φ(x) = (· · · , #(α), · · ·)T over all α ∈ Σ*

where #(α) is the number of occurrences of the string α in x.


The (l-)spectrum kernel between two strings xi , xj is simply the dot product between
their (l-)spectrum maps:

K (xi , xj ) = φ(xi )T φ(xj )

The (full) spectrum kernel can be computed efficiently via suffix trees in O(n + m) time
for two strings of length n and m.
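
A straightforward sketch of the l-spectrum kernel by direct substring counting (the suffix-tree speedup mentioned above is not implemented here; helper names are ours):

```python
from collections import Counter

def spectrum_features(x, l):
    """Count all length-l substrings of x."""
    return Counter(x[i:i + l] for i in range(len(x) - l + 1))

def spectrum_kernel(x, y, l):
    """l-spectrum kernel: dot product of the substring-count feature maps."""
    fx, fy = spectrum_features(x, l), spectrum_features(y, l)
    return sum(cnt * fy[a] for a, cnt in fx.items())

print(spectrum_kernel("ACAGCAGTA", "ACAGCAGTA", 3))  # self-similarity of the example sequence
print(spectrum_kernel("ACAGCAGTA", "GCAGTACAG", 3))
```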
18 / 24
Diffusion Kernels on Graph Nodes
Let S be some symmetric similarity matrix between nodes of a graph
G = (V, E). For instance, S can be the (weighted) adjacency matrix A or the
Laplacian matrix L = ∆ − A (or its negation), where ∆ is the degree matrix for
an undirected graph G, defined as ∆(i, i) = di and ∆(i, j) = 0 for all i ≠ j, and
di is the degree of node i.
Power Kernels: Summing up the product of the base similarities over all
l-length paths between two nodes, we obtain the l-length similarity matrix S^(l),
which is simply the lth power of S, that is,

S^(l) = S^l

Even path lengths lead to positive semidefinite kernels, but odd path lengths
are not guaranteed to do so, unless the base matrix S is itself a positive
semidefinite matrix.
The power kernel K can be obtained via the eigen-decomposition of S:

K = S^l = (U Λ UT)^l = U Λ^l UT
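
A sketch of the power kernel via the eigen-decomposition of a small example graph (the triangle graph is our own example):

```python
import numpy as np

def power_kernel(S, l):
    """l-length path kernel S^l computed via the eigen-decomposition of S."""
    lam, U = np.linalg.eigh(S)            # S symmetric: S = U diag(lam) U^T
    return U @ np.diag(lam ** l) @ U.T

S = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])              # adjacency matrix of a triangle
print(power_kernel(S, 2))                 # counts 2-step walks between nodes
print(np.allclose(power_kernel(S, 2), S @ S))
```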


19 / 24
Exponential Diffusion Kernel
The exponential diffusion kernel obtains a new kernel between the nodes of a graph
by considering paths of all possible lengths, but damping the contribution of longer paths:

K = Σ_{l=0}^∞ (1/l!) β^l S^l
  = I + βS + (1/2!) β² S² + (1/3!) β³ S³ + · · ·
  = exp{βS}
where β is a damping factor, and exp{βS} is the matrix exponential. The series on the
right hand side above converges for all β ≥ 0.
Substituting S = U Λ UT, the kernel can be computed as

K = I + βS + (1/2!) β² S² + · · · = U · diag( exp{βλ1}, exp{βλ2}, . . . , exp{βλn} ) · UT

where λi is an eigenvalue of S.
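
A sketch of the exponential diffusion kernel computed through the eigen-decomposition (the example matrix is ours):

```python
import numpy as np

def exp_diffusion_kernel(S, beta):
    """Exponential diffusion kernel exp(beta * S), via the eigen-decomposition of S."""
    lam, U = np.linalg.eigh(S)
    return U @ np.diag(np.exp(beta * lam)) @ U.T

S = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])              # adjacency matrix of a triangle
print(exp_diffusion_kernel(S, beta=0.2))
```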
20 / 24
Von Neumann Diffusion Kernel

The von Neumann diffusion kernel is defined as

K = Σ_{l=0}^∞ β^l S^l

where β ≥ 0. Expanding and rearranging the terms, we obtain

K = (I − βS)^{−1}

The kernel is guaranteed to be positive semidefinite if |β| < 1/ρ(S), where
ρ(S) = max_i {|λi|} is called the spectral radius of S, defined as the largest
eigenvalue of S in absolute value.
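
A sketch of the von Neumann kernel with the spectral-radius check (the example matrix and the assertion are ours):

```python
import numpy as np

def von_neumann_kernel(S, beta):
    """Von Neumann diffusion kernel (I - beta*S)^(-1); requires |beta| < 1/rho(S)."""
    n = S.shape[0]
    rho = np.max(np.abs(np.linalg.eigvalsh(S)))   # spectral radius of symmetric S
    assert abs(beta) < 1.0 / rho, "beta too large: kernel may not be PSD"
    return np.linalg.inv(np.eye(n) - beta * S)

S = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
print(von_neumann_kernel(S, beta=0.2))
```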

21 / 24
Graph Diffusion Kernel: Example

[Figure: an undirected example graph over the five nodes v1, v2, v3, v4, v5.]
Adjacency and degree matrices are given as

    | 0 0 1 1 0 |        | 2 0 0 0 0 |
    | 0 0 1 0 1 |        | 0 2 0 0 0 |
A = | 1 1 0 1 0 |    ∆ = | 0 0 3 0 0 |
    | 1 0 1 0 1 |        | 0 0 0 3 0 |
    | 0 1 0 1 0 |        | 0 0 0 0 2 |

22 / 24
Graph Diffusion Kernel: Example
Let the base similarity matrix S be the negated Laplacian matrix S = −L = A − ∆:

    | −2   0   1   1   0 |
    |  0  −2   1   0   1 |
S = |  1   1  −3   1   0 |
    |  1   0   1  −3   1 |
    |  0   1   0   1  −2 |

The eigenvalues of S are as follows:

λ1 = 0,  λ2 = −1.38,  λ3 = −2.38,  λ4 = −3.62,  λ5 = −4.62

and the eigenvectors of S are the columns of

         u1       u2       u3       u4       u5
    |  0.45   −0.63    0.00    0.63    0.00 |
    |  0.45    0.51   −0.60    0.20   −0.37 |
U = |  0.45   −0.20   −0.37   −0.51    0.60 |
    |  0.45   −0.20    0.37   −0.51   −0.60 |
    |  0.45    0.51    0.60    0.20    0.37 |

23 / 24
Graph Diffusion Kernel: Example
Assuming β = 0.2, the exponential diffusion kernel matrix is given as

K = exp{0.2 S} = U · diag( exp{0.2λ1}, exp{0.2λ2}, . . . , exp{0.2λ5} ) · UT

    | 0.70  0.01  0.14  0.14  0.01 |
    | 0.01  0.70  0.13  0.03  0.14 |
  = | 0.14  0.13  0.59  0.13  0.03 |
    | 0.14  0.03  0.13  0.59  0.13 |
    | 0.01  0.14  0.03  0.13  0.70 |
Assuming β = 0.2, the von Neumann kernel is given as K = U (I − 0.2Λ)^{−1} UT:

    | 0.75  0.02  0.11  0.11  0.02 |
    | 0.02  0.74  0.10  0.03  0.11 |
K = | 0.11  0.10  0.66  0.10  0.03 |
    | 0.11  0.03  0.10  0.66  0.10 |
    | 0.02  0.11  0.03  0.10  0.74 |
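
The example numbers above can be reproduced from the adjacency matrix with a short NumPy sketch (values agree up to rounding):

```python
import numpy as np

A = np.array([[0, 0, 1, 1, 0],
              [0, 0, 1, 0, 1],
              [1, 1, 0, 1, 0],
              [1, 0, 1, 0, 1],
              [0, 1, 0, 1, 0]], dtype=float)
S = A - np.diag(A.sum(axis=1))            # negated Laplacian: S = -L = A - Delta
lam, U = np.linalg.eigh(S)                # eigenvalues 0, -1.38, -2.38, -3.62, -4.62
beta = 0.2

K_exp = U @ np.diag(np.exp(beta * lam)) @ U.T          # exponential diffusion kernel
K_vn = U @ np.diag(1.0 / (1.0 - beta * lam)) @ U.T     # von Neumann kernel
print(np.round(K_exp, 2))
print(np.round(K_vn, 2))
```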

24 / 24
