
IPVS – Institute for Parallel and Distributed Systems

Analytic Computing

8 Support Vector Machines

Prof. Dr. Steffen Staab


https://www.ipvs.uni-stuttgart.de/departments/ac/
• Based on slides by
  • Thomas Gottron, U. Koblenz-Landau, https://west.uni-koblenz.de/de/studying/courses/ws1718/machine-learning-and-data-mining-1
  • Andrew Zisserman, http://www.robots.ox.ac.uk/~az/lectures/ml/lect2.pdf
1 Perceptron Algorithm
Binary classification

• Given training data $\{(x_i, y_i)\}_{i=1}^{N}$ with $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$,
• learn a classifier $\hat{f}$ such that

$$\hat{f}(x_i) \;\begin{cases} > 0, & \text{if } y_i = +1 \\ < 0, & \text{if } y_i = -1 \end{cases}$$

• Correct classification: $\hat{f}(x_i)\, y_i > 0$
Linear separability

(Figures: examples of linear separability, Zisserman 2015)
Linear classifiers

(Related to, but different from the chapter "From linear regression to classification" in 4-LogisticRegression.)

• A linear classifier has the form

$$\hat{f}(x) = w^T x + b$$

• In 2D the discriminant is a line
• $w$ is the normal to the line, and $b$ is the bias
• $w$ is known as the weight vector

(Zisserman 2015)
Linear classifiers

• Let's assume $x_i = (1, x_{i,1}, \ldots, x_{i,d})$, i.e. $x_0 = 1$ (as in linear regression)

• Then we can write

$$\hat{f}(x) = w^T x$$

(Zisserman 2015)
Linear classifiers

• A linear classifier has the form

$$\hat{f}(x) = w^T x$$

• In 3D the discriminant is a plane
• In $n$ dimensions the discriminant is a hyperplane
• Only $w$ (which now includes $b$) is needed to classify new data

(Zisserman 2015)
The perceptron classifier

• Given training data $\{(x_i, y_i)\}_{i=1}^{N}$ with $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$,
• how do we find a weight vector $w$, i.e. the separating hyperplane, such that the two categories are separated on the dataset?

• Perceptron algorithm (a code sketch follows below):
  1. Initialize $w = 0$
  2. While there is an $i$ such that $\hat{f}(x_i)\, y_i < 0$ do
     • $w := w - \alpha\, x_i\, \mathrm{sign}(\hat{f}(x_i)) = w + \alpha\, x_i\, y_i$  (the rewriting does not hold for $\mathrm{sign}(\ldots) = 0$)
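A minimal NumPy sketch of this algorithm (assuming the bias is handled by a constant feature $x_0 = 1$ as above; `X`, `y`, `alpha` are illustrative names, and the misclassification test uses $\le 0$ so that the all-zero initialization makes progress):

```python
import numpy as np

def perceptron(X, y, alpha=1.0, max_epochs=1000):
    """Perceptron: X has shape (N, d+1) with a leading 1-column for the bias, y holds labels in {-1, +1}."""
    w = np.zeros(X.shape[1])                  # 1. initialize w = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i) <= 0:          # misclassified (or exactly on the boundary)
                w += alpha * y_i * x_i        # 2. w := w + alpha * y_i * x_i
                mistakes += 1
        if mistakes == 0:                     # converged: every point classified correctly
            return w
    return w
```
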
Example for perceptron algorithm in 2D

1. Initialize $w = 0$
2. While there is an $i$ such that $\hat{f}(x_i)\, y_i < 0$ do
   • $w := w - \alpha\, x_i\, \mathrm{sign}(\hat{f}(x_i))$

At convergence: $w = \sum_{i=1}^{N} \alpha_i x_i$   (Zisserman 2015)

Example

• If the data is linearly separable, then the algorithm will converge.
• Convergence can be slow …
• The separating line ends up close to the training data.

(Zisserman 2015)
2 Support Vectors
What is the best 𝒘?

• Idea: the maximum margin solution is most stable under perturbations of the inputs

(Zisserman 2015)
Support Vector Machine

$$\hat{f}(x) = \sum_i \alpha_i\, y_i\, x_i^T x + b$$

(Zisserman 2015)
SVM Optimization Problem (1)

• Distance $\delta(x_i, h)$ of data point $x_i$ from hyperplane $h$:

$$\delta(x_i, h) = \frac{|w^T x_i + b|}{\|w\|}$$
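As a quick numerical check of this formula (a tiny sketch with made-up values for `w`, `b`, `x`):

```python
import numpy as np

w = np.array([3.0, 4.0])   # normal vector of the hyperplane
b = -5.0                   # bias
x = np.array([2.0, 1.0])   # a data point

delta = abs(w @ x + b) / np.linalg.norm(w)   # |w^T x + b| / ||w||
print(delta)               # 1.0, since |6 + 4 - 5| / 5 = 1
```
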
SVM Optimization Problem (1)

• In general, the length of the vector $w$ does not matter
• Fix the scale of $w$ such that the support vectors $x_s$ satisfy $y_s (w^T x_s + b) = 1$
• Then the positive and negative support vectors have distance

$$\delta(x_s, h) = \frac{1}{\|w\|}$$

from the hyperplane – which we want to maximize
• Standard formulation: minimize $\frac{\|w\|}{2}$ (the inverse margin)
Support Vector Machine

(Zisserman 2015)

SVM Optimization Problem (2)

$$\max_{w} \frac{2}{\|w\|}$$

subject to

$$y_i (w^T x_i + b) \ge 1, \quad \text{for } i = 1 \ldots N$$

Or equivalently

$$\min_{w} \|w\|^2$$

subject to the same constraints.

This is a quadratic optimization problem subject to linear constraints, and it has a unique minimum.
Compare the two optimization criteria for classification of linearly separable data: classification by linear regression vs. classification by SVM

• Linear classification using regression: the decision line is the average between regression lines; all data points are considered.
• Linear classification using SVM: the decision line maximizes the margins between support vectors; far-away data points are irrelevant.
3 Soft Margin and
Hinge Loss
Re-visiting linear separability

• Points can be linearly separated, but only with a very narrow margin

(Zisserman 2015)
Re-visiting linear separability

• Points can be linearly separated, but only with a very narrow margin
• Possibly the large-margin solution is better, even though one constraint is violated
• Trade-off between the margin and the number of mistakes on the training data

(Zisserman 2015)
Introduce „slack“ variables

(Zisserman 2015)
Soft margin solution
Revised optimization problem
$$\min_{w \in \mathbb{R}^d,\; \xi_i \in \mathbb{R}^+} \|w\|^2 + C \sum_{i=1}^{N} \xi_i$$

subject to

$$y_i (w^T x_i + b) \ge 1 - \xi_i, \quad \text{for } i = 1 \ldots N$$

• Every constraint can be satisfied if $\xi_i$ is sufficiently large
• $C$ is a regularization parameter:
  - small $C$ allows constraints to be easily ignored ⟹ large margin
  - large $C$ makes constraints hard to ignore ⟹ narrow margin
  - $C = \infty$ enforces all constraints ⟹ hard margin
• Still a quadratic optimization problem with a unique minimum
• One hyperparameter $C$ (see the sketch below)
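To see the effect of $C$ empirically, one can train soft-margin linear SVMs for several values of $C$ (a sketch using scikit-learn, assuming it is available; the toy data set is made up for illustration):

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2D data: two classes, with one +1 point pushed toward the -1 class
X = np.array([[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3], [1.5, 1.5]])
y = np.array([-1, -1, -1, +1, +1, +1, +1])

for C in [0.01, 1.0, 1e6]:          # small C -> wide margin, huge C -> (almost) hard margin
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w, b = clf.coef_[0], clf.intercept_[0]
    margin = 2 / np.linalg.norm(w)  # geometric margin 2 / ||w||
    print(f"C={C:g}: margin={margin:.3f}, #support vectors={len(clf.support_)}")
```
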
Loss function

• Given the constraints:

$$y_i (w^T x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0$$

• We can rewrite $\xi_i$ as:

$$\xi_i = \max(0,\, 1 - y_i \hat{f}(x_i))$$

• Hence, we can solve the unconstrained optimization problem over $w$:

$$\min_{w \in \mathbb{R}^d} \frac{1}{C}\|w\|^2 + \sum_{i=1}^{N} \max(0,\, 1 - y_i \hat{f}(x_i))$$

where the first term acts as the regularization and the sum is the loss function.
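A small NumPy sketch of this unconstrained objective, following the scaling used on this slide where the $1/C$ factor sits on the regularizer (all variable names are illustrative):

```python
import numpy as np

def svm_objective(w, b, X, y, C):
    """Soft-margin SVM objective: (1/C) * ||w||^2 + sum_i max(0, 1 - y_i * (w^T x_i + b))."""
    scores = X @ w + b                          # f(x_i) for all i
    hinge = np.maximum(0.0, 1.0 - y * scores)   # per-example hinge loss
    return (1.0 / C) * (w @ w) + hinge.sum()
```
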
Loss function
$$\min_{w \in \mathbb{R}^d} \frac{1}{C}\|w\|^2 + \sum_{i=1}^{N} \max(0,\, 1 - y_i \hat{f}(x_i))$$

(Zisserman 2015)

Hinge loss

(Zisserman 2015)
4 Gradient descent over
convex function
Gradient descent/ascent

https://en.wikipedia.org/wiki/Gradient_descent#/media/File:Gradient_descent.svg
Climb down a hill / climb up a hill

Given a differentiable function $f(\boldsymbol{x})$ describing the height of a hill at position $\boldsymbol{x} = (x_1, \ldots, x_n)$:

How do we climb up/down fastest?

Go in the direction in which

$$\frac{df(\boldsymbol{x})}{d\boldsymbol{x}} = \nabla_{\boldsymbol{x}} f(\boldsymbol{x})$$

is maximal/minimal.

In general, this challenge can be difficult.
Gradient Descent (- but without posts)

https://goo.gl/images/JKN6zm
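A generic gradient descent loop, as a minimal sketch (the quadratic example function and step size are made up for illustration):

```python
import numpy as np

def gradient_descent(grad_f, x0, eta=0.1, steps=100):
    """Iterate x := x - eta * grad_f(x) to walk downhill on f."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - eta * grad_f(x)
    return x

# Example: f(x) = (x1 - 3)^2 + (x2 + 1)^2, whose gradient is 2 * (x - [3, -1])
x_min = gradient_descent(lambda x: 2 * (x - np.array([3.0, -1.0])), x0=[0.0, 0.0])
print(x_min)   # close to [3, -1]
```
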
Optimization continued

(Zisserman 2015)

Questions
• Does this cost function have a unique solution?
• Do we find it using gradient descent?
• Does the solution we find using gradient descent depend on the starting point?

To the rescue:
• If the cost function is convex, then a locally optimal point is globally optimal (provided the optimization is over constraints that form a convex set – given in our case).
Convex functions

(Zisserman 2015)

Convex function examples

• A non-negative sum of convex functions is convex (Zisserman 2015)

Applied to hinge loss and regularization

• Both the hinge loss and the regularizer $\|w\|^2$ are convex, so their non-negative sum is convex.
Gradient descent algorithm for SVM
To minimize a cost function $\mathcal{C}(w)$, use the iterative update

$$w_{t+1} := w_t - \eta_t \nabla_w \mathcal{C}(w_t)$$

where $\eta$ is the learning rate.

Let's rewrite the minimization problem as an average, with $\lambda = \frac{2}{NC}$:

$$\mathcal{C}(w) = \frac{1}{NC}\|w\|^2 + \frac{1}{N}\sum_{i=1}^{N} \max(0,\, 1 - y_i \hat{f}(x_i)) = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{\lambda}{2}\|w\|^2 + \max(0,\, 1 - y_i \hat{f}(x_i))\right)$$

and $\hat{f}(x_i) = w^T x_i + b$.
Sub-gradient for hinge loss

$$\mathcal{L}(x_i, y_i; w) = \max(0,\, 1 - y_i \hat{f}(x_i)), \qquad \hat{f}(x_i) = w^T x_i + b$$

A sub-gradient (used in the update on the next slide) is $\nabla_w \mathcal{L}(x_i, y_i; w) = -y_i x_i$ if $y_i \hat{f}(x_i) < 1$, and $0$ otherwise.

(Zisserman 2015)
Sub-gradient descent algorithm for SVM
$$\mathcal{C}(w) = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{\lambda}{2}\|w\|^2 + \mathcal{L}(x_i, y_i; w)\right)$$

The iterative update is

$$w_{t+1} := w_t - \eta\, \nabla_w \mathcal{C}(w_t) = w_t - \eta\, \frac{1}{N}\sum_{i=1}^{N}\left(\lambda w_t + \nabla_w \mathcal{L}(x_i, y_i; w)\right)$$

Then each iteration $t$ involves cycling through the training data with the updates:

$$w_{t+1} := \begin{cases} w_t - \eta\,(\lambda w_t - y_i x_i), & \text{if } y_i \hat{f}(x_i) < 1 \\ w_t - \eta\, \lambda w_t, & \text{otherwise} \end{cases}$$

Typical learning rate in Pegasos: $\eta_t = \frac{1}{\lambda t}$
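A compact sketch of this Pegasos-style update loop (the bias is folded into `w` via a constant feature, and the data names are illustrative):

```python
import numpy as np

def pegasos(X, y, lam=0.01, epochs=10):
    """Pegasos-style sub-gradient descent for the linear SVM.
    X: (N, d) array with a constant 1-column appended for the bias, y: labels in {-1, +1}."""
    N, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in np.random.permutation(N):        # cycle through the training data
            t += 1
            eta = 1.0 / (lam * t)                 # learning rate eta_t = 1 / (lambda * t)
            if y[i] * (w @ X[i]) < 1:             # hinge loss is active
                w = w - eta * (lam * w - y[i] * X[i])
            else:                                 # only the regularizer contributes
                w = w - eta * lam * w
    return w
```
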
Questions 4: Gradient descent


5 The dual problem
Primal vs dual problem
• The SVM is a linear classifier: $\hat{f}(x) = w^T x + b$
• The primal problem: an optimization problem over $w$:

$$\min_{w \in \mathbb{R}^d} \frac{1}{C}\|w\|^2 + \sum_{i=1}^{N} \max(0,\, 1 - y_i \hat{f}(x_i))$$

• The dual problem: getting rid of $w$ via a slightly different representation of $\hat{f}(x)$ leads to

$$\hat{f}(x) = \sum_{i=1}^{N} \alpha_i\, y_i\, x_i^T x + b$$

and a new optimization problem with the same solution, but several advantages. Let us show this on the following slides …
Revisit Optimization Problem for Hard Margin Case
• Minimize the quadratic form

$$\frac{\|w\|^2}{2} = \frac{w^T w}{2}$$

• with constraints

$$y_i (w^T x_i + b) \ge 1 \quad \forall i$$

• The constraints will reach a value of exactly 1 for at least one instance.
• Include the hard constraints in the loss function:

$$\mathcal{L}(w, b, \alpha) = \frac{\|w\|^2}{2} - \sum_{i=1}^{N} \alpha_i \left( y_i (w^T x_i + b) - 1 \right)$$

• Violated constraints "punish" the objective function.
Excursion: Lagrange Multiplier

• We want to maximize a function $f(x)$ under the constraint $g(x) = a$

Solution with a Lagrange multiplier

• Optimize the Lagrangian

$$f(x) - \lambda\,(g(x) - a)$$

instead!

A nicely visual explanation of Lagrange optimization:
https://www.svm-tutorial.com/2016/09/duality-lagrange-multipliers/
Algorithm for optimization with a Lagrange multiplier

1. Write down the Lagrangian $f(x) - \lambda\,(g(x) - a)$
2. Take the derivative of the Lagrangian w.r.t. $x$ and set it to 0, to find an estimate of $x$ that depends on $\lambda$
3. Plug your estimate of $x$ into the Lagrangian, take the derivative w.r.t. $\lambda$, and set it to 0, to find the optimal value of the Lagrange multiplier $\lambda$
4. Plug the Lagrange multiplier into your estimate for $x$
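A tiny worked example of these four steps (the function and constraint are made up for illustration): maximize $f(x, y) = xy$ subject to $g(x, y) = x + y = 4$.

1. Lagrangian: $\Lambda(x, y, \lambda) = xy - \lambda\,(x + y - 4)$
2. $\partial_x \Lambda = y - \lambda = 0$ and $\partial_y \Lambda = x - \lambda = 0$ give $x = y = \lambda$
3. Plugging in: $\Lambda(\lambda, \lambda, \lambda) = \lambda^2 - \lambda\,(2\lambda - 4) = -\lambda^2 + 4\lambda$; setting its derivative $-2\lambda + 4$ to 0 gives $\lambda = 2$
4. Hence $x = y = 2$, and the constrained maximum is $f(2, 2) = 4$
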
Revisit Optimization Problem for Hard Margin Case
• Minimize the quadratic form

$$\frac{\|w\|^2}{2} = \frac{w^T w}{2}$$

• with constraints

$$y_i (w^T x_i + b) \ge 1 \quad \forall i$$

• The constraints will reach a value of exactly 1 for at least one instance.
• Include the hard constraints in the loss function:

$$\mathcal{L}(w, b, \alpha) = \frac{\|w\|^2}{2} - \sum_{i=1}^{N} \alpha_i \left( y_i (w^T x_i + b) - 1 \right)$$

where the $\alpha_i$ are the Lagrange multipliers.
• Violated constraints "punish" the objective function.
Lagrangian primal problem
• The Lagrangian primal problem is:

$$\min_{w, b} \max_{\alpha} \mathcal{L}(w, b, \alpha)$$

subject to $\forall i: \alpha_i \ge 0$
Finding the optimum

• The loss is a function of $w$, $b$, and $\alpha$:

$$\mathcal{L}(w, b, \alpha) = \frac{\|w\|^2}{2} - \sum_{i=1}^{N} \alpha_i \left( y_i (w^T x_i + b) - 1 \right)$$

• Find the optimum using derivatives:

$$\frac{\partial}{\partial b}\mathcal{L}(w, b, \alpha) = 0 \;\Longrightarrow\; 0 = \sum_{i=1}^{N} \alpha_i y_i$$

$$\frac{\partial}{\partial w_j}\mathcal{L}(w, b, \alpha) = 0 \;\Longrightarrow\; w_j = \sum_{i=1}^{N} \alpha_i y_i x_{i,j} \;\Longrightarrow\; w = \sum_{i=1}^{N} \alpha_i y_i x_i$$

→ $w$ is a linear combination of the data instances!
Substitution into ℒ(𝑤, 𝑏, 𝛼)
$$\mathcal{L}(w, b, \alpha)\Big|_{w = \sum_j \alpha_j y_j x_j} = \frac{\|w\|^2}{2} - \sum_{i=1}^{N} \alpha_i \left( y_i (w^T x_i + b) - 1 \right) =$$

$$= \frac{1}{2}\left(\sum_{j=1}^{N} \alpha_j y_j x_j\right)^{T} \left(\sum_{k=1}^{N} \alpha_k y_k x_k\right) - \sum_{i=1}^{N} \alpha_i \left( y_i \left(\sum_{j=1}^{N} \alpha_j y_j x_j^T x_i + b\right) - 1 \right) =$$

$$= \frac{1}{2}\sum_{j,k} \alpha_j \alpha_k y_j y_k x_j^T x_k - \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j - b \underbrace{\sum_{i=1}^{N} \alpha_i y_i}_{=0} + \sum_{i=1}^{N} \alpha_i =$$

$$= \mathcal{L}(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{j,k} \alpha_j \alpha_k y_j y_k x_j^T x_k$$
Wolfe dual problem

$$\max_{\alpha} \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{j,k} \alpha_j \alpha_k y_j y_k x_j^T x_k$$

subject to $\forall i: \alpha_i \ge 0$, and $0 = \sum_{i=1}^{N} \alpha_i y_i$

• This problem is solvable with quadratic programming, because it fulfills the Karush-Kuhn-Tucker conditions on the $\alpha_i$, which handle the inequality constraints (≥ 1) in the Lagrange optimization (not given here!).
• It gives us the classification function

$$\hat{f}(x) = \sum_{j=1}^{N} \alpha_j\, y_j\, x_j^T x + b$$

where $\alpha_i$ is positive if $x_i$ is a support vector.
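For small, linearly separable toy datasets, this dual can be solved directly with a general-purpose constrained optimizer. The sketch below uses SciPy's SLSQP method (an assumption of convenience; practical implementations use dedicated QP or SMO-style solvers such as the one in LIBSVM), and all variable names are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def svm_dual_fit(X, y):
    """Solve the hard-margin Wolfe dual:
    max_a sum_i a_i - 0.5 * sum_{j,k} a_j a_k y_j y_k x_j^T x_k,
    subject to a_i >= 0 and sum_i a_i y_i = 0 (only sensible for separable data)."""
    N = X.shape[0]
    Q = (y[:, None] * X) @ (y[:, None] * X).T            # Q[j, k] = y_j y_k x_j^T x_k

    def neg_dual(a):                                     # negate: scipy minimizes
        return 0.5 * a @ Q @ a - a.sum()

    res = minimize(neg_dual, x0=np.zeros(N), method="SLSQP",
                   bounds=[(0.0, None)] * N,             # a_i >= 0
                   constraints=[{"type": "eq", "fun": lambda a: a @ y}])  # sum_i a_i y_i = 0
    a = res.x
    w = ((a * y)[:, None] * X).sum(axis=0)               # w = sum_i a_i y_i x_i
    sv = a > 1e-6                                        # support vectors have a_i > 0
    b = np.mean(y[sv] - X[sv] @ w)                       # from y_s (w^T x_s + b) = 1
    return w, b, a
```
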
Non-separable case (similar to before)
• Introduce (positive) "slack variables" $\xi_i$ to allow deviations from the minimum distance:

$$y_i (w^T x_i + b) \ge 1 - \xi_i$$

• Include a penalizing term in the objective function:

$$C \sum_{i=1}^{N} \xi_i$$

• Transform to the Lagrangian
  • with additional Lagrange multipliers for the slack variables, which are constrained to positive values …
Summary: Primal and dual formulations

• Primal version of the classifier:

$$\hat{f}(x) = w^T x + b$$

• Dual version of the classifier:

$$\hat{f}(x) = \sum_{i=1}^{N} \alpha_i\, y_i\, x_i^T x + b$$

The dual-form classifier seems to work like a kNN classifier: it requires the training data points $x_i$. However, many of the $\alpha_i$ are zero; the ones that are non-zero define the support vectors $x_i$.
Summary: Primal and dual formulations
• The Lagrangian primal problem is:

$$\min_{w, b} \max_{\alpha} \mathcal{L}(w, b, \alpha)$$

subject to $\forall i: \alpha_i \ge 0$

• The Lagrangian dual problem is:

$$\max_{\alpha} \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{j,k} \alpha_j \alpha_k y_j y_k x_j^T x_k$$

subject to $\forall i: \alpha_i \ge 0$, and $0 = \sum_{i=1}^{N} \alpha_i y_i$
6 Kernelization Tricks
in SVMs
Non-linear Case

• Not all classes can be separated via a hyperplane
• Essential:
  • The dual representation uses only inner products of data instances:

$$\hat{f}(x) = \sum_{i=1}^{N} \alpha_i\, y_i\, x_i^T x + b$$

  • $x_i$: i-th training instance
  • $\alpha_i$: weight for the i-th training instance

• The same holds for the Lagrangian …


Feature engineering using $\phi(x)$  (cf. the lecture on regression, chapter "beyond linear input")

• Classifier: given $x_i \in \mathbb{R}^d$, $\phi: \mathbb{R}^d \to \mathbb{R}^D$, $w \in \mathbb{R}^D$,

$$\hat{f}(x) = w^T \phi(x) + b$$

• Learning:

$$\min_{w \in \mathbb{R}^D} \frac{1}{C}\|w\|^2 + \sum_{i=1}^{N} \max(0,\, 1 - y_i \hat{f}(x_i))$$
Example 1: From 1-dim to 2-dim

Example 2: From 2-dim to 3-dim

(Zisserman 2015)

Feature engineering using $\phi(x)$

• Classifier: given $x_i \in \mathbb{R}^d$, $\phi: \mathbb{R}^d \to \mathbb{R}^D$, $w \in \mathbb{R}^D$,

$$\hat{f}(x) = w^T \phi(x) + b$$

• Learning:

$$\min_{w \in \mathbb{R}^D} \frac{1}{C}\|w\|^2 + \sum_{i=1}^{N} \max(0,\, 1 - y_i \hat{f}(x_i))$$

• $\phi(x)$ maps to a high-dimensional space $\mathbb{R}^D$ where the data is separable
• If $D \gg d$, then there are many more parameters of $w$ to learn (see the feature-map sketch below)
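To make the upcoming kernel trick concrete: the classic map $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ from 2D to 3D satisfies $\phi(x)^T\phi(z) = (x^T z)^2$, so the inner product in the transformed space never has to be computed explicitly (an illustrative example, not necessarily the exact map used on the slides):

```python
import numpy as np

def phi(x):
    """Explicit feature map R^2 -> R^3: phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(phi(x) @ phi(z))      # inner product in the transformed space: 1.0
print((x @ z) ** 2)         # same value computed directly in R^2 ("kernel trick")
```
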
Dual classifier in transformed feature space
Classifier:

$$\hat{f}(x) = \sum_{i=1}^{N} \alpha_i\, y_i\, x_i^T x + b \;\Longrightarrow\; \hat{f}(x) = \sum_{i=1}^{N} \alpha_i\, y_i\, \phi(x_i)^T \phi(x) + b$$

Learning:

$$\max_{\alpha} \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{j,k} \alpha_j \alpha_k y_j y_k x_j^T x_k \;\Longrightarrow\; \max_{\alpha} \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{j,k} \alpha_j \alpha_k y_j y_k\, \phi(x_j)^T \phi(x_k)$$

subject to $\forall i: \alpha_i \ge 0$, and $0 = \sum_{i=1}^{N} \alpha_i y_i$

Observation: $\phi(x)$ only occurs in pairs $\phi(x_j)^T \phi(x_i)$ ⟹ kernels: $k(x_j, x_i) = \phi(x_j)^T \phi(x_i)$
Dual classifier using kernels
Classifier:

$$\hat{f}(x) = \sum_{i=1}^{N} \alpha_i\, y_i\, \phi(x_i)^T \phi(x) + b \;\Longrightarrow\; \hat{f}(x) = \sum_{i=1}^{N} \alpha_i\, y_i\, k(x_i, x) + b$$

Learning:

$$\max_{\alpha} \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{j,k} \alpha_j \alpha_k y_j y_k\, \phi(x_j)^T \phi(x_k) \;\Longrightarrow\; \max_{\alpha} \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{j,k} \alpha_j \alpha_k y_j y_k\, k(x_j, x_k)$$

subject to $\forall i: \alpha_i \ge 0$, and $0 = \sum_{i=1}^{N} \alpha_i y_i$
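A direct transcription of this kernelized classifier (a sketch; `alpha`, `support_X`, `support_y`, `b` would come from solving the dual, and the names are illustrative):

```python
import numpy as np

def decision_function(x, support_X, support_y, alpha, b, kernel):
    """Kernelized dual classifier: f(x) = sum_i alpha_i * y_i * k(x_i, x) + b,
    where the sum effectively runs over the support vectors (alpha_i > 0)."""
    return sum(a_i * y_i * kernel(x_i, x)
               for a_i, y_i, x_i in zip(alpha, support_y, support_X)) + b

def linear_kernel(u, v):
    return u @ v    # with this kernel the classifier reduces to the primal w^T x + b

# predicted label: np.sign(decision_function(x_new, support_X, support_y, alpha, b, linear_kernel))
```
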
Example kernels
• Linear kernels: $k(x, x') = x^T x'$
• Polynomial kernels: $k(x, x') = (1 + x^T x')^d$, for any $d > 0$
  • contains all polynomial terms up to degree $d$
• Gaussian kernels: $k(x, x') = e^{-\frac{\|x - x'\|^2}{2\sigma^2}}$, for $\sigma > 0$
  • infinite-dimensional feature space
  • also called radial basis function (RBF) kernel
  • often works quite well!
• Graph kernels: random walk
• String kernels: …
• Build your own kernel for your own problem! (A few of these kernels are sketched below.)
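A few of these kernels written out (a minimal NumPy sketch; the degree and bandwidth values are arbitrary illustrations):

```python
import numpy as np

def linear_kernel(x, z):
    return x @ z                                   # k(x, x') = x^T x'

def polynomial_kernel(x, z, d=3):
    return (1.0 + x @ z) ** d                      # k(x, x') = (1 + x^T x')^d

def rbf_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))   # Gaussian / RBF kernel

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(linear_kernel(x, z), polynomial_kernel(x, z), rbf_kernel(x, z))
```
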
Summary on kernels

• "Instead of inventing funny non-linear features, we may directly invent funny kernels" (Toussaint 2019)
• Inventing a kernel is intuitive:
  • $k(x, x')$ expresses how correlated $y$ and $y'$ should be
  • it is a measure of similarity; it compares $x$ and $x'$.
• Specifying how 'comparable' $x$ and $x'$ are is often more intuitive than defining "features that might work".
Background reading and more

• Smooth reading about SVMs: Alexandre Kowalczyk, Support Vector Machines Succinctly. Syncfusion. Free ebook:
  https://www.syncfusion.com/ebooks/support_vector_machines_succinctly
  • Also covers the most efficient algorithms for finding support vectors (it is neither of the two presented here!)
Questions 6: Kernel


7 Transductive
Classification
Transductive learning characteristics

Characteristics:
• Training data AND test data are known at learning time
• Learning happens specifically for the given test cases

Use cases:
• news recommender
• spam classifier
• document reorganization

Thorsten Joachims: Transductive Inference for Text Classification using Support Vector Machines. ICML 1999: 200-209
Maximum margin hyperplane

Training data $\ldots, (\vec{x}_i, y_i), \ldots$
Test data $\ldots, \vec{x}_j^{*}, \ldots$

Loss function:

$$\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i + C^{*} \sum_{j=1}^{k} \xi_j^{*}$$

subject to:

$$\forall_{i=1}^{n}: y_i (\vec{w} \cdot \vec{x}_i + b) \ge 1 - \xi_i, \qquad \forall_{j=1}^{k}: y_j^{*} (\vec{w} \cdot \vec{x}_j^{*} + b) \ge 1 - \xi_j^{*}$$
$$\forall_{i=1}^{n}: \xi_i > 0, \qquad \forall_{j=1}^{k}: \xi_j^{*} > 0$$

Naive, intractable approach:
• for every hyperplane:
  • classify the $\vec{x}_j^{*}$
  • compute the loss
Reuters data set experiments (3299 test documents)

Reuters data set experiments (17 training documents)

IPVS

Thank you!

Steffen Staab

E-mail: [email protected]
Phone: +49 (0) 711 685-To be defined
Web: www.ipvs.uni-stuttgart.de/departments/ac/

Universität Stuttgart
Analytic Computing, IPVS
Universitätsstraße 32, 70569 Stuttgart
