Lab 2 SVM

1 Least Squares Support Vector Classifier (LS-SVC)
1.1 Mathematical Foundations
The Least Squares Support Vector Classifier (LS-SVC) seeks to determine a linear separator by
minimizing a regularized squared error cost. Its primal form is expressed as:
$$\min_{w,b,\xi}\ \frac{1}{2}\left(\|w\|^{2} + C\sum_{i=1}^{n}\xi_i^{2}\right)$$

subject to:

$$y_i\left(w^{T}x_i + b\right) = 1 - \xi_i,\quad i = 1,\dots,n$$

where:

• $w \in \mathbb{R}^{d}$ is the weight vector,
• $b \in \mathbb{R}$ is the bias,
• $C > 0$ is a penalty parameter controlling the trade-off between the margin and the error.
The Lagrangian of this problem is:

$$L(w,b,\xi,\lambda) = \frac{1}{2}\left(\|w\|^{2} + C\sum_{i=1}^{n}\xi_i^{2}\right) - \sum_{i=1}^{n}\lambda_i\left[y_i\left(w^{T}x_i + b\right) - 1 + \xi_i\right]$$
Setting the partial derivatives of $L$ to zero gives the stationarity conditions:

• For $w$:

$$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{n}\lambda_i y_i x_i = 0 \;\Rightarrow\; w = \sum_{i=1}^{n}\lambda_i y_i x_i$$

• For $b$:

$$\frac{\partial L}{\partial b} = -\sum_{i=1}^{n}\lambda_i y_i = 0 \;\Rightarrow\; \sum_{i=1}^{n}\lambda_i y_i = 0$$

• For $\xi_i$:

$$\frac{\partial L}{\partial \xi_i} = C\xi_i - \lambda_i = 0 \;\Rightarrow\; \xi_i = \frac{\lambda_i}{C}$$
Substituting these conditions back into the Lagrangian and simplifying:

$$L(w,b,\xi,\lambda) = \frac{1}{2}\|w\|^{2} - \sum_{i=1}^{n}\lambda_i y_i w^{T}x_i - b\sum_{i=1}^{n}\lambda_i y_i + \sum_{i=1}^{n}\left(\frac{C}{2}\xi_i^{2} - \lambda_i\xi_i\right) + \sum_{i=1}^{n}\lambda_i$$

$$L(\lambda) = -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\lambda_i\lambda_j y_i y_j x_i^{T}x_j + \sum_{i=1}^{n}\lambda_i - \frac{1}{2C}\sum_{i=1}^{n}\lambda_i^{2}$$

we obtain the dual optimization problem:

$$\max_{\lambda}\ \sum_{i=1}^{n}\lambda_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\lambda_i\lambda_j y_i y_j x_i^{T}x_j - \frac{1}{2C}\sum_{i=1}^{n}\lambda_i^{2}$$

subject to:

$$\sum_{i=1}^{n}\lambda_i y_i = 0$$
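Since the stationarity conditions and the equality constraints are linear in $\lambda$ and $b$, the dual above can also be solved directly as a linear (KKT) system rather than with an iterative QP solver. The following is a minimal NumPy sketch under that assumption; the function name fit_ls_svc_dual is illustrative and this is not the cvxopt-based class used in Section 1.2.

import numpy as np

def fit_ls_svc_dual(X, y, C=1.0):
    """Solve the LS-SVC dual via its KKT linear system (assumed equality-constrained form).

    The conditions w = sum_i lambda_i y_i x_i and xi_i = lambda_i / C, together with
    y_i (w^T x_i + b) = 1 - xi_i and sum_i lambda_i y_i = 0, form one linear system
    in (b, lambda).
    """
    n = X.shape[0]
    Omega = (y[:, None] * y[None, :]) * (X @ X.T)   # Omega_ij = y_i y_j x_i^T x_j
    K = np.zeros((n + 1, n + 1))
    K[0, 1:] = y
    K[1:, 0] = y
    K[1:, 1:] = Omega + np.eye(n) / C
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(K, rhs)
    b, lam = sol[0], sol[1:]
    w = (lam * y) @ X                                # recover the primal weights
    return w, b, lam

# Usage on a toy problem (labels must be in {-1, +1}):
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y = np.concatenate([-np.ones(20), np.ones(20)])
w, b, lam = fit_ls_svc_dual(X, y, C=1.0)
print(np.mean(np.sign(X @ w + b) == y))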
1.2 Implementation
Primal Form Implementation
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted
from sklearn.utils.multiclass import unique_labels
from sklearn.metrics import pairwise_kernels
from cvxopt import matrix, solvers
from sklearn.preprocessing import StandardScaler
# Ensure y is in {-1, 1}
y = y.astype(float).ravel()
assert set(y) == {-1.0, 1.0}, "Labels must be -1 or 1"

# Linear term of the QP over the stacked variable z = [w, b, xi]:
# a penalty of C on each slack entry
q = np.zeros(n_features + 1 + n_samples)
q[-n_samples:] = self.C

# Solve the QP (the equality constraints Aeq z = beq are passed via keywords)
solvers.options['show_progress'] = False
sol = solvers.qp(matrix(P), matrix(q), A=matrix(Aeq), b=matrix(beq))

# Unpack the solution into weights, bias and slacks
sol_x = np.array(sol['x']).flatten()
self.coef_ = sol_x[:n_features]
self.intercept_ = sol_x[n_features]
self.xi_ = sol_x[-n_samples:]
return self
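For comparison, here is a minimal self-contained sketch of the primal solution that assumes the equality-constrained objective of Section 1.1: eliminating the slacks via $\xi_i = 1 - y_i(w^T x_i + b)$ turns the primal into an unconstrained regularized least-squares problem with a closed-form solution. The helper fit_ls_svc_primal below is illustrative, not the LS_SVC_Primal class above.

import numpy as np

def fit_ls_svc_primal(X, y, C=1.0):
    """Closed-form LS-SVC primal (assumed equality-constrained formulation).

    Eliminating xi_i = 1 - y_i (w^T x_i + b) gives
    min_{w,b} (1/2)||w||^2 + (C/2) sum_i (1 - y_i (w^T x_i + b))^2,
    whose normal equations are solved directly below.
    """
    n, d = X.shape
    Z = np.hstack([X, np.ones((n, 1))])   # augmented design matrix [X, 1]
    D = np.eye(d + 1)
    D[-1, -1] = 0.0                       # do not regularize the bias
    beta = np.linalg.solve(D + C * Z.T @ Z, C * Z.T @ y)
    return beta[:-1], beta[-1]            # (w, b)

# Usage (labels in {-1, +1}):
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y = np.concatenate([-np.ones(20), np.ones(20)])
w, b = fit_ls_svc_primal(X, y, C=1.0)
print(np.mean(np.sign(X @ w + b) == y))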
Visualization
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
# Shift points away from the boundary, but keep some points close to the other class
# For most points, shift as before
X[:, 0] = np.where(y > 0, X[:, 0] + 1, X[:, 0] - 1)

# Pick a few random points in the positive class and move them slightly left (closer to the negative class)
np.random.seed(0)
close_pos = np.random.choice(pos_idx, 4, replace=False)
X[close_pos, 0] -= 1.5  # move closer to the boundary

# Pick a few random points in the negative class and move them slightly right (closer to the positive class)
close_neg = np.random.choice(neg_idx, 4, replace=False)
X[close_neg, 0] += 1.5  # move closer to the boundary
LS_SVC_Primal(C=1)
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
y_pred = model.predict(X)
accuracy = np.mean(y_pred == y)
print(f"Accuracy: {accuracy}")
Accuracy: 0.56
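The mesh grid (xx, yy) used by the predict call above is not shown in the extract. A minimal sketch of how the decision boundary could be drawn, assuming a fitted model and the 2-D arrays X, y from the previous cell, is:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# Dense grid covering the data range (xx, yy feed the predict call shown above)
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300),
                     np.linspace(y_min, y_max, 300))

# Predicted label for every grid point, reshaped back to the grid
Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# Filled class regions plus the training points
cmap = ListedColormap(["#FFAAAA", "#AAAAFF"])
plt.contourf(xx, yy, Z, alpha=0.4, cmap=cmap)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap, edgecolors="k")
plt.title("LS-SVC decision boundary")
plt.show()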
# Train Dual LS-SVM
model = LS_SVC_Dual(C=1)
model.fit(X, y)
LS_SVC_Dual(C=1)
y_pred = model.predict(X)
accuracy = np.mean(y_pred == y)
print(f"Accuracy: {accuracy}")
Accuracy: 0.96
1.3 Questions
1. How does the regularization parameter C influence LS-SVC?
The parameter C balances the emphasis between maximizing the margin and
reducing classification errors. A high value of C strongly penalizes
misclassifications, resulting in a tighter margin and a tendency to overfit the
training data. On the other hand, a lower C value permits a larger margin but might
increase the risk of underfitting.
2 P-SVC

2.1 Mathematical Foundations

The P-SVC primal problem regularizes the bias term $b$ together with the weights:

$$\min_{w,b,\xi}\ \frac{1}{2}\left(\|w\|^{2} + b^{2} + C\sum_{i=1}^{n}\xi_i^{2}\right)$$

subject to the same equality constraints $y_i\left(w^{T}x_i + b\right) = 1 - \xi_i$, $i = 1,\dots,n$, with the same notation as in Section 1. The Lagrangian is:

$$L(w,b,\xi,\lambda) = \frac{1}{2}\left(\|w\|^{2} + b^{2} + C\sum_{i=1}^{n}\xi_i^{2}\right) - \sum_{i=1}^{n}\lambda_i\left[y_i\left(w^{T}x_i + b\right) - 1 + \xi_i\right]$$
Stationarity Conditions
To find the saddle point, we compute the partial derivatives and set them to zero:
• Derivative w.r.t. $w$:

$$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{n}\lambda_i y_i x_i = 0 \;\Rightarrow\; w = \sum_{i=1}^{n}\lambda_i y_i x_i$$

• Derivative w.r.t. $b$:

$$\frac{\partial L}{\partial b} = b - \sum_{i=1}^{n}\lambda_i y_i = 0 \;\Rightarrow\; b = \sum_{i=1}^{n}\lambda_i y_i$$

• Derivative w.r.t. $\xi_i$:

$$\frac{\partial L}{\partial \xi_i} = C\xi_i - \lambda_i = 0 \;\Rightarrow\; \xi_i = \frac{\lambda_i}{C}$$
Dual Problem
Substitute the expressions for $w$, $b$, and $\xi_i$ into the Lagrangian to obtain the dual formulation. After simplification:

$$L(\lambda) = -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\lambda_i\lambda_j y_i y_j x_i^{T}x_j + \sum_{i=1}^{n}\lambda_i - \frac{1}{2C}\sum_{i=1}^{n}\lambda_i^{2} - \frac{1}{2}\left(\sum_{i=1}^{n}\lambda_i y_i\right)^{2}$$

so the dual problem becomes:

$$\max_{\lambda}\ \sum_{i=1}^{n}\lambda_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\lambda_i\lambda_j y_i y_j\left(x_i^{T}x_j + 1\right) - \frac{1}{2C}\sum_{i=1}^{n}\lambda_i^{2}$$

subject to:

$$\lambda \in \mathbb{R}^{n}$$
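Because the P-SVC dual is an unconstrained concave quadratic in $\lambda$, its maximizer can be obtained by setting the gradient to zero, which gives a single linear system. The sketch below follows that route with NumPy; the function fit_p_svc_dual is illustrative and is not the P_SVC_Dual class used in Section 2.2.

import numpy as np

def fit_p_svc_dual(X, y, C=10.0):
    """Maximize the unconstrained P-SVC dual by solving its normal equations.

    Gradient of the dual:  1 - (Ktilde + I/C) lambda = 0,
    with Ktilde_ij = y_i y_j (x_i^T x_j + 1).
    """
    n = X.shape[0]
    Ktilde = (y[:, None] * y[None, :]) * (X @ X.T + 1.0)
    lam = np.linalg.solve(Ktilde + np.eye(n) / C, np.ones(n))
    w = (lam * y) @ X      # stationarity: w = sum_i lambda_i y_i x_i
    b = np.dot(lam, y)     # stationarity: b = sum_i lambda_i y_i
    return w, b, lam

# Usage (labels in {-1, +1}):
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (25, 2)), rng.normal(2, 1, (25, 2))])
y = np.concatenate([-np.ones(25), np.ones(25)])
w, b, lam = fit_p_svc_dual(X, y, C=10.0)
print(np.mean(np.sign(X @ w + b) == y))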
2.2 Implementation
Primal Form Implementation
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted
from sklearn.utils.multiclass import unique_labels
import numpy as np
# Solve
sol = solvers.qp(P, q, A, b)
return self
P_SVC_Primal(C=10.0)
model = P_SVC_Dual(C=10.0)
model.fit(X, y)
P_SVC_Dual(C=10.0)
y_pred = model.predict(X)
accuracy = np.mean(y_pred == y)
print(f"Accuracy: {accuracy}")
Accuracy: 1.0
2.3 Questions
1. How does P-SVC differ from LS-SVC in terms of the slack variables ξ i?
Both P-SVC and LS-SVC include the slack variables $\xi_i$ squared in the objective function, resulting in a quadratic penalty for misclassification. However, a key difference is that P-SVC includes the bias term $b$ in the regularization term (the objective minimizes $\|w\|^{2} + b^{2} + C\sum_i \xi_i^{2}$), whereas LS-SVC typically does not penalize $b$. This impacts the optimization formulation and can influence the geometry of the decision boundary.
2. How does the parameter C influence the solution of P-SVC?

The parameter C controls the trade-off between margin size and classification error. A larger value of C places more emphasis on minimizing the training error, producing a narrower margin and a tighter fit to the training data; in the standard hinge-loss SVC this usually leaves fewer points as support vectors. In contrast, a smaller C allows more margin violations, producing a wider margin and typically more support vectors, which may improve generalization on noisy data.
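This trade-off can be checked empirically. The snippet below is a small illustration with scikit-learn's standard hinge-loss SVC (not the squared-slack P-SVC implemented in this lab), which shows the same qualitative behaviour of the support-vector count as C varies.

import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Two noisy, partially overlapping classes
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

# Count support vectors of a linear soft-margin SVC as C grows
for C in [0.01, 0.1, 1.0, 10.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:>6}: support vectors = {clf.n_support_.sum()}, "
          f"train accuracy = {clf.score(X, y):.3f}")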
3 ν-SVC

3.1 Mathematical Foundations

The ν-SVC replaces the penalty parameter $C$ with a parameter $\nu \in (0,1]$ and introduces a margin variable $\rho$. Its primal form is:

$$\min_{w,b,\xi,\rho}\ \frac{1}{2}\|w\|^{2} - \nu\rho + \frac{1}{n}\sum_{i=1}^{n}\xi_i$$

subject to:

$$y_i\left(w^{T}x_i + b\right) \ge \rho - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,n$$

where:

• $\rho \ge 0$ is the margin variable,
• $\xi_i \ge 0$ are the slack variables,
• $\nu$ controls the fraction of margin errors and of support vectors.
Dual Formulation
The dual form of $ \nu $-SVC can be derived using the method of Lagrange multipliers. The
Lagrangian is:
$$L(w,b,\xi,\rho,\lambda,\beta) = \frac{1}{2}\|w\|^{2} - \nu\rho + \frac{1}{n}\sum_{i=1}^{n}\xi_i - \sum_{i=1}^{n}\lambda_i\left[y_i\left(w^{T}x_i + b\right) - \rho + \xi_i\right] - \sum_{i=1}^{n}\beta_i\xi_i$$
Taking the derivatives with respect to $ w $, $ b $, $ \xi_i $, and $ \rho $, and setting them to
zero, we get:
$$w = \sum_{i=1}^{n}\lambda_i y_i x_i$$

$$\sum_{i=1}^{n}\lambda_i y_i = 0$$

$$\lambda_i + \beta_i = \frac{1}{n}$$

$$\sum_{i=1}^{n}\lambda_i = \nu$$
Substituting these back into the Lagrangian, we obtain the dual problem:
$$\max_{\lambda}\ -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\lambda_i\lambda_j y_i y_j x_i^{T}x_j$$

subject to:

$$\sum_{i=1}^{n}\lambda_i y_i = 0$$

$$0 \le \lambda_i \le \frac{1}{n},\quad i = 1,\dots,n$$

$$\sum_{i=1}^{n}\lambda_i = \nu$$
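This dual is a standard QP with box and equality constraints, so it maps directly onto a generic QP solver. Below is a minimal cvxopt sketch of that mapping (the small diagonal term reg is an assumption borrowed from the lab's classes to keep the Gram matrix well conditioned); the function fit_nu_svc_dual is illustrative and is not the Nu_SVC_Dual class from Section 3.2. Recovering $b$ and $\rho$ from the margin support vectors is omitted.

import numpy as np
from cvxopt import matrix, solvers

def fit_nu_svc_dual(X, y, nu=0.5, reg=1e-6):
    """Solve the nu-SVC dual QP:
        min (1/2) lam^T P lam  s.t.  0 <= lam_i <= 1/n,
                                     sum_i lam_i y_i = 0,  sum_i lam_i = nu.
    """
    n = X.shape[0]
    P = (y[:, None] * y[None, :]) * (X @ X.T) + reg * np.eye(n)
    q = np.zeros(n)
    # Box constraints 0 <= lam <= 1/n written as G lam <= h
    G = np.vstack([-np.eye(n), np.eye(n)])
    h = np.concatenate([np.zeros(n), np.ones(n) / n])
    # Equality constraints: sum_i lam_i y_i = 0 and sum_i lam_i = nu
    A = np.vstack([y.astype(float), np.ones(n)])
    b = np.array([0.0, nu])
    solvers.options['show_progress'] = False
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h), matrix(A), matrix(b))
    lam = np.array(sol['x']).ravel()
    w = (lam * y) @ X   # stationarity: w = sum_i lam_i y_i x_i
    return w, lam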
3.2 Implementation
Primal Form Implementation
class Nu_SVC_Primal(BaseEstimator, ClassifierMixin):
def __init__(self, nu=0.5, reg=1e-6):
self.nu = nu
self.reg = reg # Regularization term
if sol['status'] != 'optimal':
raise ValueError("Optimization failed")
self.coef_ = np.array(sol['x'][:n_features])
self.intercept_ = sol['x'][n_features]
self.xi_ = np.array(sol['x'][-n_samples-1:-1])
self.rho_ = sol['x'][-1]
return self
Parameters:
-----------
X : array-like of shape (n_samples, n_features)
Training data.
Returns:
--------
self : object
Fitted estimator.
"""
X, y = check_X_y(X, y)
n_samples, n_features = X.shape
self.classes_ = unique_labels(y)
y = 2 * y - 1 # Convert labels to -1 and 1
if sol['status'] != 'optimal':
raise ValueError("Optimization failed")
return self
Parameters:
-----------
X : array-like of shape (n_samples, n_features)
Test data.
Returns:
--------
y_pred : array-like of shape (n_samples,)
Predicted class labels.
"""
check_is_fitted(self)
X = check_array(X)
Nu_SVC_Primal(reg=1000000.0)
y_pred = model.predict(X)
accuracy = np.mean(y_pred == y)
print(f"Accuracy: {accuracy}")
Accuracy: 0.4952
y_pred = model.predict(X)
accuracy = np.mean(y_pred == y)
print(f"Accuracy: {accuracy}")
Accuracy: 0.4
3.3 Questions
1. What is the role of the parameter ν in ν-SVC?

The parameter ν controls the balance between margin size and classification error. It sets an upper bound on the fraction of margin errors (misclassifications) and a lower bound on the fraction of support vectors. A smaller ν therefore tolerates fewer margin errors and yields fewer support vectors, while a larger ν allows more margin errors and results in a wider margin and more support vectors.
2. How does ν control the trade-off between margin size and classification error?

ν directly affects both the margin and the number of support vectors. A larger ν permits a wider margin with more margin errors, which may increase training error but improve generalization. Conversely, a smaller ν tightens the margin and tolerates fewer violations, potentially lowering training error but risking overfitting.
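One quick way to observe this empirically is with scikit-learn's NuSVC, which solves the same ν-parameterized problem; as ν grows, the fraction of support vectors grows with it (ν is a lower bound on that fraction).

import numpy as np
from sklearn.svm import NuSVC
from sklearn.datasets import make_blobs

# Overlapping two-class data so that margin errors are possible
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

# The fraction of support vectors is lower-bounded by nu
for nu in [0.05, 0.1, 0.25, 0.5, 0.75]:
    clf = NuSVC(nu=nu, kernel="linear").fit(X, y)
    frac_sv = clf.n_support_.sum() / len(X)
    print(f"nu={nu:>4}: fraction of support vectors = {frac_sv:.2f}, "
          f"train accuracy = {clf.score(X, y):.3f}")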
Explanation:
• Classification Accuracy: All methods achieve high accuracy, but ν-SVC and P-SVC often generalize better.
• Number of Support Vectors: ν-SVC typically has fewer support vectors because it explicitly controls this number.
• Robustness to Outliers: P-SVC and ν-SVC are more robust due to their regularization.
4.2 Discussion
Strengths and Weaknesses:
• LS-SVC:
– Strengths: Simple to implement and works well on linearly separable data.
• Imbalanced Data: ν-SVC is suitable as it allows control over the fraction of support vectors.