g(Z) = a_0 + a_1 z_1 + a_2 z_2 + a_3 z_3 + a_4 z_4 + a_5 z_5
is a linear discriminant function in the φ(X) space.
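The feature map itself is not spelled out in this excerpt. As an illustration, here is a minimal Python sketch assuming the quadratic map φ(x) = (x1, x2, x1², x1·x2, x2²) of a 2-D input, which yields exactly five z-components; the coefficients a are chosen so that a nonlinear boundary (a circle) becomes linear in the φ(X) space:

```python
import numpy as np

def phi(x):
    # Hypothetical quadratic feature map for a 2-D input x = (x1, x2):
    # z = (x1, x2, x1^2, x1*x2, x2^2), giving the five z's above.
    x1, x2 = x
    return np.array([x1, x2, x1**2, x1 * x2, x2**2])

def g(x, a):
    # Linear discriminant in the phi(X) space: a0 + a1 z1 + ... + a5 z5.
    return a[0] + a[1:] @ phi(x)

# Example: the circle x1^2 + x2^2 = 1 becomes the *linear* surface
# -1 + z3 + z5 = 0 in the mapped space.
a = np.array([-1.0, 0.0, 0.0, 1.0, 0.0, 1.0])
print(g(np.array([0.5, 0.5]), a))   # -0.5: inside the circle
print(g(np.array([2.0, 0.0]), a))   #  3.0: outside the circle
```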
W^T X_i + b > 0, ∀i s.t. y_i = +1
W^T X_i + b < 0, ∀i s.t. y_i = −1
(Note both inequalities are strict)
• W^T X + b = 0 – A separating hyperplane.
• Infinitely many separating hyperplanes exist.
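As a quick illustration (the toy data and candidate weights here are chosen just for the example), a few lines of Python checking the strict separation conditions above for a given (W, b):

```python
import numpy as np

def separates(W, b, X, y):
    # Check W^T X_i + b > 0 for y_i = +1 and W^T X_i + b < 0 for
    # y_i = -1, written compactly as y_i (W^T X_i + b) > 0 for all i.
    return np.all(y * (X @ W + b) > 0)

# Toy data: two points per class, separable by many hyperplanes.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([+1, +1, -1, -1])
print(separates(np.array([1.0, 1.0]), 0.0, X, y))   # True
print(separates(np.array([1.0, -2.0]), 0.0, X, y))  # False
```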
A good separating hyperplane
W^T X_i + b > 0, ∀i s.t. y_i = +1
W^T X_i + b < 0, ∀i s.t. y_i = −1
• Since the training set is finite, ∃ε > 0 s.t.
W^T X_i + b ≥ ε, ∀i s.t. y_i = +1
W^T X_i + b ≤ −ε, ∀i s.t. y_i = −1
• Rescaling W and b by 1/ε, we can take ε = 1:
W^T X_i + b ≥ +1 if y_i = +1
W^T X_i + b ≤ −1 if y_i = −1
or, equivalently,
y_i (W^T X_i + b) ≥ 1, ∀i.
(Recall that y_i ∈ {+1, −1})
• Then there are no training patterns between the two
parallel hyperplanes W^T X + b = +1 and W^T X + b = −1.
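To make the rescaling step concrete, here is a small sketch (reusing the toy data from the previous snippet) that normalizes any separating (W, b) so that min_i y_i (W^T X_i + b) = 1:

```python
import numpy as np

# Rescale a separating (W, b) so that the epsilon above becomes 1.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([+1, +1, -1, -1])
W, b = np.array([1.0, 1.0]), 0.0

eps = np.min(y * (X @ W + b))       # finite training set => eps > 0
W1, b1 = W / eps, b / eps
print(np.min(y * (X @ W1 + b1)))    # 1.0: y_i (W^T X_i + b) >= 1, all i
```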
Optimal hyperplane
• The distance between these two hyperplanes is 2/||W||.
This is called the margin of the separating hyperplane.
• Hence the distance between the hyperplane and the
closest pattern is 1/||W||.
• Intuitively, the larger the margin, the better the chance of
correctly classifying new patterns.
• Optimal Hyperplane – the separating hyperplane with
maximum margin.
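As a numerical sketch of the optimal hyperplane, one can approximate the hard-margin problem with scikit-learn's SVC using a very large C (an assumption of this example, not part of the lecture) and read off the margin 2/||W||:

```python
import numpy as np
from sklearn.svm import SVC

# Same toy separable data as above.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([+1, +1, -1, -1])

# A very large C approximates the hard-margin optimal hyperplane.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
W, b = clf.coef_[0], clf.intercept_[0]

margin = 2.0 / np.linalg.norm(W)   # distance between W^T X + b = +1 and -1
print("W =", W, "b =", b, "margin =", margin)
# All patterns satisfy y_i (W^T X_i + b) >= 1 (up to numerical tolerance).
print(y * (X @ W + b))
```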
minimize f(x)
subject to a_j^T x + b_j ≤ 0, j = 1, . . . , r
where f : ℜ^m → ℜ is a continuously differentiable convex
function, and a_j ∈ ℜ^m, b_j ∈ ℜ, j = 1, · · · , r.
• A point, x ∈ ℜ^m, is called a feasible point (for this
problem) if a_j^T x + b_j ≤ 0, j = 1, · · · , r.
• This is known as the primal problem.
• Here the optimization variables are x ∈ ℜ^m.
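A minimal numerical sketch of such a primal problem, using scipy.optimize.minimize with the inequality constraints rewritten in SciPy's g(x) ≥ 0 convention; the quadratic objective and the particular a_j, b_j are illustrative choices, not from the lecture:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative primal: minimize the convex f(x) = ||x - c||^2 subject to
# a_j^T x + b_j <= 0 (here: x1 + x2 - 1 <= 0 and -x1 <= 0).
c = np.array([2.0, 0.0])
A = np.array([[1.0, 1.0], [-1.0, 0.0]])
b = np.array([-1.0, 0.0])

f = lambda x: np.sum((x - c) ** 2)
# SciPy expects constraints as g(x) >= 0, so negate a_j^T x + b_j.
cons = [{"type": "ineq", "fun": lambda x, a=a, bj=bj: -(a @ x + bj)}
        for a, bj in zip(A, b)]

res = minimize(f, x0=np.zeros(2), constraints=cons)
print(res.x, f(res.x))   # approximately (1.5, -0.5), optimal value 0.5
```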
Here, x ∈ ℜ^m and µ ∈ ℜ^r.
• Define the dual function, q : ℜ^r → [−∞, ∞), by
q(µ) = inf_x L(x, µ),
the infimum of the Lagrangian L(x, µ) over x for fixed µ.
maximize q(µ)
subject to µ_j ≥ 0, j = 1, . . . , r
• This is also a constrained optimization problem.
• Here the optimization is over ℜ^r and µ ∈ ℜ^r are the
optimization variables.
• There is a close connection between the primal and dual
problems: weak duality gives q(µ) ≤ f(x) for every feasible x
and every µ ≥ 0, and for a feasible convex primal with affine
constraints the two optimal values coincide (strong duality).
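Continuing the toy primal above, a sketch that forms q(µ) in closed form (for a quadratic f the inner minimization over x is solved by setting ∇_x L = 0, giving x*(µ) = c − A^T µ/2) and maximizes it over µ ≥ 0; the dual optimum matches the primal value, illustrating strong duality:

```python
import numpy as np
from scipy.optimize import minimize

# Dual of the toy primal: L(x, mu) = ||x - c||^2 + mu^T (A x + b).
c = np.array([2.0, 0.0])
A = np.array([[1.0, 1.0], [-1.0, 0.0]])
b = np.array([-1.0, 0.0])

def q(mu):
    x_star = c - A.T @ mu / 2.0          # minimizer of L over x
    return np.sum((x_star - c) ** 2) + mu @ (A @ x_star + b)

# Maximize q(mu) over mu >= 0 (i.e. minimize -q with nonneg. bounds).
res = minimize(lambda mu: -q(mu), x0=np.ones(2),
               bounds=[(0, None), (0, None)])
print(res.x, q(res.x))   # mu* ~ (1, 0); q* = 0.5 = the primal value
```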
i ∈ S ⇒ y_i (X_i^T W* + b*) = 1
(where S = {i : µ_i* > 0}). This implies X_i is closest to the
separating hyperplane.
• {X_i | i ∈ S} are called Support vectors. We have
W* = Σ_i µ_i* y_i X_i = Σ_{i∈S} µ_i* y_i X_i
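A short sketch recovering the support vectors and the expansion W* = Σ_{i∈S} µ_i* y_i X_i numerically with scikit-learn (whose dual_coef_ stores µ_i* y_i for the support vectors only); the data is the same toy set as before:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([+1, +1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)

S = clf.support_                                  # indices i in S
W_from_sv = clf.dual_coef_ @ clf.support_vectors_  # sum over i in S

print("support vector indices:", S)
print(np.allclose(W_from_sv, clf.coef_))           # True
# Support vectors sit on the margin: y_i (X_i^T W* + b*) = 1.
print(y[S] * (X[S] @ clf.coef_[0] + clf.intercept_[0]))
```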