Vertopal.com Lab 2 SVM

The document discusses the Least Squares C-Support Vector Classifier (LS-SVC), detailing its mathematical foundations, dual problem derivation, and implementation in both primal and dual forms. It also covers the influence of the regularization parameter C on LS-SVC and introduces Proximal C-Support Vector Classification (P-SVC) with its theoretical derivation and dual problem formulation. Additionally, it includes code examples for training models and visualizing decision boundaries.


Part 1: Least Squares C-Support Vector Classifier (LS-SVC)
1.1 Mathematical Foundations
The Least Squares Support Vector Classifier (LS-SVC) seeks to determine a linear separator by
minimizing a regularized squared error cost. Its primal form is expressed as:

$$
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\left(\|w\|^2 + C\sum_{i=1}^{n}\xi_i^2\right)
$$

subject to:

$$
y_i\left(w^T x_i + b\right) = 1 - \xi_i, \quad i = 1, \dots, n
$$

where:
• $w \in \mathbb{R}^d$ is the weight vector,
• $b \in \mathbb{R}$ is the bias,
• $C > 0$ is a penalty parameter controlling the trade-off between the margin and the error.
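
Since this primal will be handed to a quadratic-programming solver in Section 1.2, it helps to note how it can be cast in standard QP form over the stacked variable $x = (w, b, \xi)$ (a standard rewriting, stated here as a reading aid):

$$
\min_x \; \frac{1}{2}x^T P x \quad \text{s.t.} \quad Ax = \mathbf{1}, \qquad
P = \begin{pmatrix} I_d & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & C I_n \end{pmatrix}, \qquad
A = \begin{pmatrix} \operatorname{diag}(y)\,X & y & I_n \end{pmatrix},
$$

so that each row of $Ax = \mathbf{1}$ is exactly $y_i(w^T x_i + b) + \xi_i = 1$.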

Dual Problem Derivation


To derive the dual, we apply the method of Lagrange multipliers. The Lagrangian becomes:

$$
L(w, b, \xi, \lambda) = \frac{1}{2}\left(\|w\|^2 + C\sum_{i=1}^{n}\xi_i^2\right) - \sum_{i=1}^{n}\lambda_i\left[y_i\left(w^T x_i + b\right) - 1 + \xi_i\right]
$$

Taking derivatives and setting them to zero gives:

• For $w$:

$$
\frac{\partial L}{\partial w} = w - \sum_{i=1}^{n}\lambda_i y_i x_i = 0 \;\Rightarrow\; w = \sum_{i=1}^{n}\lambda_i y_i x_i
$$

• For $b$:

$$
\frac{\partial L}{\partial b} = -\sum_{i=1}^{n}\lambda_i y_i = 0 \;\Rightarrow\; \sum_{i=1}^{n}\lambda_i y_i = 0
$$

• For $\xi_i$:

$$
\frac{\partial L}{\partial \xi_i} = C\xi_i - \lambda_i = 0 \;\Rightarrow\; \xi_i = \frac{\lambda_i}{C}
$$
Substituting back, we get the dual optimization problem:

$$
L(w, b, \xi, \lambda) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n}\lambda_i y_i w^T x_i - b\sum_{i=1}^{n}\lambda_i y_i + \sum_{i=1}^{n}\left(\frac{C}{2}\xi_i^2 - \lambda_i\xi_i\right) + \sum_{i=1}^{n}\lambda_i
$$

Using $w = \sum_i \lambda_i y_i x_i$, $\sum_i \lambda_i y_i = 0$, and $\xi_i = \lambda_i/C$, the terms simplify to

$$
L = \frac{1}{2}\|w\|^2 - \|w\|^2 + \sum_{i=1}^{n}\lambda_i - \frac{1}{2C}\sum_{i=1}^{n}\lambda_i^2
= -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\lambda_i\lambda_j y_i y_j x_i^T x_j + \sum_{i=1}^{n}\lambda_i - \frac{1}{2C}\sum_{i=1}^{n}\lambda_i^2
$$

so the dual problem is

$$
\max_{\lambda}\;\; \sum_{i=1}^{n}\lambda_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\lambda_i\lambda_j y_i y_j x_i^T x_j - \frac{1}{2C}\sum_{i=1}^{n}\lambda_i^2
$$

subject to:

$$
\sum_{i=1}^{n}\lambda_i y_i = 0
$$
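
A useful way to read this dual (a standard identity, added here for intuition) is that the extra $-\frac{1}{2C}\sum_i \lambda_i^2$ term is equivalent to adding $1/C$ to the diagonal of the Gram matrix, since $y_i^2 = 1$:

$$
\sum_{i=1}^{n}\lambda_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\lambda_i\lambda_j y_i y_j\left(x_i^T x_j + \frac{\delta_{ij}}{C}\right).
$$

In other words, LS-SVC behaves like a C-SVC whose kernel has $1/C$ added on the diagonal, with no box constraint on $\lambda$.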

1.2 Implementation
Primal Form Implementation
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted
from sklearn.utils.multiclass import unique_labels
from sklearn.metrics import pairwise_kernels
from cvxopt import matrix, solvers
from sklearn.preprocessing import StandardScaler

class LS_SVC_Primal(BaseEstimator, ClassifierMixin):


def __init__(self, C=0.1):
self.C = C

def fit(self, X, y):


X, y = check_X_y(X, y)
n_samples, n_features = X.shape
self.classes_ = unique_labels(y)

# Ensure y is in {-1, 1}
y = y.astype(float).ravel()
assert set(y) == {-1.0, 1.0}, "Labels must be -1 or 1"

        # Construct the quadratic program over x = (w, b, xi):
        # minimize (1/2) x^T P x with P = diag(I, 0, C*I); the primal has no linear term
        P = np.zeros((n_features + 1 + n_samples, n_features + 1 + n_samples))
        P[:n_features, :n_features] = np.eye(n_features)          # w
        P[n_features, n_features] = 0                              # b
        P[-n_samples:, -n_samples:] = self.C * np.eye(n_samples)   # xi

        q = np.zeros(n_features + 1 + n_samples)

        # Equality constraints: y_i (w^T x_i + b) + xi_i = 1
        Aeq = np.hstack([y[:, np.newaxis] * X, y[:, np.newaxis], np.eye(n_samples)])
        beq = np.ones(n_samples)

# Convert to cvxopt format


P = matrix(P)
q = matrix(q)
Aeq = matrix(Aeq)
beq = matrix(beq)

        solvers.options['show_progress'] = False
        # Equality-constrained QP: pass Aeq/beq as cvxopt's equality arguments A and b
        sol = solvers.qp(P, q, A=Aeq, b=beq)

sol_x = np.array(sol['x']).flatten()
self.coef_ = sol_x[:n_features]
self.intercept_ = sol_x[n_features]
self.xi_ = sol_x[-n_samples:]

return self

def predict(self, X):


check_is_fitted(self)
X = check_array(X)
return np.sign(X @ self.coef_ + self.intercept_)

Dual Form Implementation


import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted
from sklearn.utils.multiclass import unique_labels
from sklearn.metrics import pairwise_kernels
from cvxopt import matrix, solvers

class LS_SVC_Dual(BaseEstimator, ClassifierMixin):


def __init__(self, C=1.0):
self.C = C

def fit(self, X, y):


X, y = check_X_y(X, y)
n_samples, n_features = X.shape
self.classes_ = unique_labels(y)
        y = np.where(y == self.classes_[0], -1.0, 1.0)  # Map the two classes to -1 and +1

# Construct the quadratic program


        K = pairwise_kernels(X, metric='linear')        # Linear kernel (Gram matrix)
        P = matrix(np.outer(y, y) * K)                  # P_ij = y_i y_j K(x_i, x_j)
        q = matrix(-np.ones(n_samples))                 # q = -1 vector
        A = matrix(y.reshape(1, -1), tc='d')            # A = y^T (1 x n_samples)
        b = matrix(0.0)                                 # b = 0 (scalar)
        G = matrix(np.vstack([-np.eye(n_samples), np.eye(n_samples)]))              # G = [-I; I]
        h = matrix(np.hstack([np.zeros(n_samples), self.C * np.ones(n_samples)]))   # h = [0; C]

# Solve the quadratic program


solvers.options['show_progress'] = False
sol = solvers.qp(P, q, G, h, A, b)

# Extract the solution


        self.alpha_ = np.array(sol['x']).flatten()  # Lagrange multipliers
        self.support_vectors_ = X                   # All data points are support vectors in LS-SVC
        self.support_vector_labels_ = y             # Labels of support vectors
        self.intercept_ = np.mean(y - np.dot(K, self.alpha_ * y))  # Bias term

return self

def predict(self, X):


check_is_fitted(self)
X = check_array(X)
        K = pairwise_kernels(X, self.support_vectors_, metric='linear')  # Linear kernel
        return np.sign(np.dot(K, self.alpha_ * self.support_vector_labels_) + self.intercept_)

Visualization
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# Generate a linearly separable dataset


np.random.seed(42)
X = np.random.randn(100, 2)
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1) # Simple linear separation

# Shift points away from the boundary, but keep some points close to the other class
# For most points, shift as before
X[:, 0] = np.where(y > 0, X[:, 0] + 1, X[:, 0] - 1)

# Add some noise points close to the boundary: choose 4 random indices from each class
pos_idx = np.where(y == 1)[0]
neg_idx = np.where(y == -1)[0]

# Pick 4 random points in the positive class and move them slightly left (closer to the negative class)
np.random.seed(0)
close_pos = np.random.choice(pos_idx, 4, replace=False)
X[close_pos, 0] -= 1.5 # move closer to boundary

# Pick 4 random points in the negative class and move them slightly right (closer to the positive class)
close_neg = np.random.choice(neg_idx, 4, replace=False)
X[close_neg, 0] += 1.5 # move closer to boundary

# Custom orange and pink colormap


orange_pink_cmap = ListedColormap(["orange", "hotpink"])

# Plot the dataset


plt.figure(figsize=(6, 6))
plt.scatter(X[:, 0], X[:, 1], c=(y > 0), cmap=orange_pink_cmap, s=60,
edgecolors='k')
plt.title("Linearly Separable Dataset with Some Close Points")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.grid(True)
plt.show()
# Train Primal LS-SVM
model = LS_SVC_Primal(C=1)
model.fit(X, y)

LS_SVC_Primal(C=1)

# Decision boundary plot


def plot_decision_boundary(model, X, y, title="Decision Boundary",
resolution=0.02):
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, resolution),
np.arange(y_min, y_max, resolution))

Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

cmap = ListedColormap(["orange", "hotpink"])


plt.figure(figsize=(8, 6))
plt.contourf(xx, yy, Z, cmap=cmap, alpha=0.4)
plt.scatter(X[:, 0], X[:, 1], c=(y > 0), cmap=cmap, s=60,
edgecolors='k')
plt.title(title)
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.grid(True)
plt.show()

plot_decision_boundary(model, X, y, title="Decision Boundary")

y_pred = model.predict(X)
accuracy = np.mean(y_pred == y)
print(f"Accuracy: {accuracy}")

Accuracy: 0.56
# Train Dual LS-SVM
model = LS_SVC_Dual(C=1)
model.fit(X, y)

LS_SVC_Dual(C=1)

plot_decision_boundary(model, X, y, title="Decision Boundary")

y_pred = model.predict(X)
accuracy = np.mean(y_pred == y)
print(f"Accuracy: {accuracy}")

Accuracy: 0.96

1.3 Questions
1. How does the regularization parameter C influence LS-SVC?

The parameter C balances the emphasis between maximizing the margin and
reducing classification errors. A high value of C strongly penalizes
misclassifications, resulting in a tighter margin and a tendency to overfit the
training data. On the other hand, a lower C value permits a larger margin but might
increase the risk of underfitting.
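
As a quick illustration (a minimal sketch, assuming the LS_SVC_Dual class and the dataset X, y defined earlier in this notebook are in scope), one could sweep C and watch how tightly the model fits the training set:

for C in [0.01, 0.1, 1, 10, 100]:
    clf = LS_SVC_Dual(C=C).fit(X, y)
    acc = np.mean(clf.predict(X) == y)
    print(f"C={C:>6}: training accuracy = {acc:.2f}")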

Part 2: Proximal C-Support Vector Classification (P-SVC)
2.1 Theoretical Derivation
The primal optimization problem for P-SVC is expressed as:

$$
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\left(\|w\|^2 + b^2 + C\sum_{i=1}^{n}\xi_i^2\right)
$$

subject to the equality constraints:

$$
y_i\left(w^T x_i + b\right) = 1 - \xi_i, \quad i = 1, \dots, n
$$

where:

• $ \xi_i$ are slack variables.
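
An equivalent reading (a standard reformulation, noted here for context) is that penalizing $b^2$ amounts to running LS-SVC on inputs augmented with a constant feature: with $\tilde{w} = (w, b)$ and $\tilde{x}_i = (x_i, 1)$, the P-SVC primal becomes

$$
\min_{\tilde{w},\,\xi}\;\; \frac{1}{2}\left(\|\tilde{w}\|^2 + C\sum_{i=1}^{n}\xi_i^2\right)
\quad \text{s.t.} \quad y_i\,\tilde{w}^T\tilde{x}_i = 1 - \xi_i,
$$

which is why the kernel $x_i^T x_j + 1$ appears in the dual below.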

Derivation of the Lagrangian


To handle the constraints, we construct the Lagrangian using Lagrange multipliers $\lambda_i \in \mathbb{R}$ for each constraint:

$$
L(w, b, \xi, \lambda) = \frac{1}{2}\left(\|w\|^2 + b^2 + C\sum_{i=1}^{n}\xi_i^2\right) - \sum_{i=1}^{n}\lambda_i\left[y_i\left(w^T x_i + b\right) - 1 + \xi_i\right]
$$

Stationarity Conditions
To find the saddle point, we compute the partial derivatives and set them to zero:

• Derivative w.r.t. $w$:

$$
\frac{\partial L}{\partial w} = w - \sum_{i=1}^{n}\lambda_i y_i x_i = 0 \;\Rightarrow\; w = \sum_{i=1}^{n}\lambda_i y_i x_i
$$

• Derivative w.r.t. $b$:

$$
\frac{\partial L}{\partial b} = b - \sum_{i=1}^{n}\lambda_i y_i = 0 \;\Rightarrow\; b = \sum_{i=1}^{n}\lambda_i y_i
$$

• Derivative w.r.t. $\xi_i$:

$$
\frac{\partial L}{\partial \xi_i} = C\xi_i - \lambda_i = 0 \;\Rightarrow\; \xi_i = \frac{\lambda_i}{C}
$$

Dual Problem
Substitute the expressions for $w$, $b$, and $\xi_i$ into the Lagrangian to obtain the dual formulation. After simplification, the dual becomes:

$$
L(w, b, \xi, \lambda) = \frac{1}{2}\|w\|^2 + \frac{1}{2}b^2 - \sum_{i=1}^{n}\lambda_i y_i w^T x_i - b\sum_{i=1}^{n}\lambda_i y_i + \sum_{i=1}^{n}\left(\frac{C}{2}\xi_i^2 - \lambda_i\xi_i\right) + \sum_{i=1}^{n}\lambda_i
$$

Using $w = \sum_i \lambda_i y_i x_i$, $b = \sum_i \lambda_i y_i$, and $\xi_i = \lambda_i/C$:

$$
L(w, b, \xi, \lambda) = -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\lambda_i\lambda_j y_i y_j x_i^T x_j - \frac{1}{2}\left(\sum_{i=1}^{n}\lambda_i y_i\right)^2 + \sum_{i=1}^{n}\lambda_i - \frac{1}{2C}\sum_{i=1}^{n}\lambda_i^2
$$

Since $\left(\sum_i \lambda_i y_i\right)^2 = \sum_{i,j}\lambda_i\lambda_j y_i y_j$, the dual problem is

$$
\max_{\lambda}\;\; \sum_{i=1}^{n}\lambda_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\lambda_i\lambda_j y_i y_j\left(x_i^T x_j + 1\right) - \frac{1}{2C}\sum_{i=1}^{n}\lambda_i^2
$$

subject to:

$$
\lambda \in \mathbb{R}^n
$$
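
Because the dual is unconstrained apart from $\lambda \in \mathbb{R}^n$, setting its gradient to zero yields the linear system $(Q + I/C)\lambda = \mathbf{1}$ with $Q_{ij} = y_i y_j (x_i^T x_j + 1)$. As a sketch (assuming the X, y from Part 1 with labels in {-1, +1}), the multipliers could therefore be obtained without a QP solver at all:

# Sketch: closed-form solve of the P-SVC dual, (Q + I/C) lambda = 1
C = 10.0                                  # penalty parameter (same value used in Section 2.2)
Q = np.outer(y, y) * (X @ X.T + 1.0)      # Q_ij = y_i y_j (x_i^T x_j + 1)
lam = np.linalg.solve(Q + np.eye(len(y)) / C, np.ones(len(y)))
w = (lam * y) @ X                         # recover w = sum_i lambda_i y_i x_i
b = np.sum(lam * y)                       # recover b = sum_i lambda_i y_i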

2.2 Implementation
Primal Form Implementation
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted
from sklearn.utils.multiclass import unique_labels
import numpy as np

class P_SVC_Primal(BaseEstimator, ClassifierMixin):


def __init__(self, C=1.0):
self.C = C

def fit(self, X, y):


X, y = check_X_y(X, y)
n_samples, n_features = X.shape
self.classes_ = unique_labels(y)
        y = np.where(y == self.classes_[0], -1.0, 1.0)  # Map the two classes to -1 and +1
        # Total number of variables: w (n_features), b (1), xi (n_samples)
        total_vars = n_features + 1 + n_samples

# Construct P matrix (quadratic terms)


P = np.zeros((total_vars, total_vars))
P[:n_features, :n_features] = np.eye(n_features) # w^T w
P[n_features, n_features] = 1 # b^2
        P[n_features + 1:, n_features + 1:] = self.C * np.eye(n_samples)  # C * sum(xi^2)

# q vector (linear term in objective)


q = np.zeros(total_vars)

# Equality constraints: y_i (w^T x_i + b) + xi_i = 1


A_eq = np.zeros((n_samples, total_vars))
A_eq[:, :n_features] = X * y[:, np.newaxis] # y_i * x_i
A_eq[:, n_features] = y # y_i * b
        A_eq[np.arange(n_samples), n_features + 1 + np.arange(n_samples)] = 1  # xi_i
b_eq = np.ones(n_samples)

# Convert to cvxopt format


from cvxopt import matrix, solvers
solvers.options['show_progress'] = False
P = matrix(P)
q = matrix(q)
A = matrix(A_eq)
b = matrix(b_eq)

        # Solve the equality-constrained QP (A, b are cvxopt's equality arguments)
        sol = solvers.qp(P, q, A=A, b=b)

# Extract model parameters


sol_np = np.array(sol['x']).flatten()
self.coef_ = sol_np[:n_features]
self.intercept_ = sol_np[n_features]
self.xi_ = sol_np[n_features + 1:]

return self

def predict(self, X):


check_is_fitted(self)
X = check_array(X)
return np.sign(X @ self.coef_ + self.intercept_)
Dual Form Implementation
class P_SVC_Dual(BaseEstimator, ClassifierMixin):
def __init__(self, C=1.0):
self.C = C

def fit(self, X, y):


X, y = check_X_y(X, y)
n_samples, n_features = X.shape
self.classes_ = unique_labels(y)
        y = np.where(y == self.classes_[0], -1.0, 1.0)  # Map the two classes to -1 and +1

# Construct the quadratic program


        K = pairwise_kernels(X, metric='linear')
        P = np.outer(y, y) * (K + 1)  # Kernel matrix + 1 for the bias term
        q = -np.ones(n_samples)
        A = y[np.newaxis, :].astype(float)  # Ensure A is of type 'd'
        b = np.zeros(1)
        G = np.vstack([-np.eye(n_samples), np.eye(n_samples)])
        h = np.hstack([np.zeros(n_samples), self.C * np.ones(n_samples)])

# Convert numpy arrays to cvxopt matrices


from cvxopt import matrix, solvers
solvers.options['show_progress'] = False
P = matrix(P)
q = matrix(q)
A = matrix(A)
b = matrix(b)
G = matrix(G)
h = matrix(h)

# Solve the quadratic program


sol = solvers.qp(P, q, G, h, A, b)

# Extract the solution


self.alpha_ = np.array(sol['x']).flatten()
self.support_vectors_ = X
self.support_vector_labels_ = y
self.intercept_ = np.mean(y - np.dot(K + 1, self.alpha_ * y))

return self

def predict(self, X):


check_is_fitted(self)
X = check_array(X)
        K = pairwise_kernels(X, self.support_vectors_, metric='linear')
        return np.sign(np.dot(K + 1, self.alpha_ * self.support_vector_labels_) + self.intercept_)
Visualization
model = P_SVC_Primal(C=10.0)
model.fit(X, y)

P_SVC_Primal(C=10.0)

plot_decision_boundary(model, X, y, title="P-SVC (Primal) Decision


Boundary")

model = P_SVC_Dual(C=10.0)
model.fit(X, y)

P_SVC_Dual(C=10.0)

plot_decision_boundary(model, X, y, title="P-SVC (Dual) Decision


Boundary")
The P-SVC dual plot reflects the fact that, in this formulation, almost all (or all) training points have non-zero dual variables $\lambda_i$ and therefore influence the decision boundary, which is what the plotting code treats as "support vectors" for this model type. This is a direct consequence of the P-SVC/LS-SVM formulation: equality constraints combined with a quadratic slack penalty lead to non-sparse dual solutions. The two plots show the same decision boundary because the primal and dual formulations solve the same optimization problem; only the way each fitted model exposes its "support vectors" to the plotting code differs.
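
This non-sparsity is easy to check directly (a small sketch, assuming the fitted P_SVC_Dual model from the cell above):

n_nonzero = np.sum(np.abs(model.alpha_) > 1e-6)  # dual coefficients that are effectively non-zero
print(f"{n_nonzero} of {len(model.alpha_)} training points act as support vectors")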

y_pred = model.predict(X)
accuracy = np.mean(y_pred == y)
print(f"Accuracy: {accuracy}")

Accuracy: 1.0

2.3 Questions
1. How does P-SVC differ from LS-SVC in terms of the slack variables ξ i?
Both P-SVC and LS-SVC penalize the squared slack variables $\xi_i^2$ in the objective, giving a quadratic penalty for misclassification. The key difference is that P-SVC also includes the bias term $b$ in the regularizer, minimizing $\|w\|^2 + b^2 + C\sum_i \xi_i^2$, whereas LS-SVC typically does not penalize $b$. This changes the optimization problem and can influence the geometry of the decision boundary.

2. What is the effect of the parameter C on the number of support vectors?

The parameter C controls the trade-off between margin size and classification error.
A larger value of C places more emphasis on minimizing the training error, often
resulting in more support vectors and a tighter fit to the training data. In contrast,
a smaller C value allows more margin violations, usually producing fewer support
vectors and a wider margin, which may improve generalization on noisy data.
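
For contrast, a standard hinge-loss C-SVC does produce sparse solutions, and the effect of C on the support-vector count is easy to observe there; a minimal sketch using scikit-learn's SVC (not one of the classes implemented in this lab), assuming the X, y from Part 1:

from sklearn.svm import SVC

for C in [0.01, 0.1, 1, 10, 100]:
    svc = SVC(C=C, kernel='linear').fit(X, y)
    print(f"C={C:>6}: number of support vectors = {svc.support_.size}")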

Part 3: ν-Soft Margin Support Vector Classification (ν-SVC)

3.1 Theoretical Derivation

The primal optimization problem for ν-SVC is:
$$
\min_{w,\,b,\,\xi,\,\rho}\;\; \frac{1}{2}\|w\|^2 - \nu\rho + \frac{1}{n}\sum_{i=1}^{n}\xi_i
$$

subject to:

$$
y_i\left(w^T x_i + b\right) \geq \rho - \xi_i, \quad i = 1, \dots, n
$$

where:

• $ \xi_i \geq 0 $, $ \rho \geq 0 $, $ w \in \mathbb{R}^d $, $ b \in \mathbb{R} $

Dual Formulation
The dual form of $ \nu $-SVC can be derived using the method of Lagrange multipliers. The
Lagrangian is:
$$
L(w, b, \xi, \rho, \lambda, \beta) = \frac{1}{2}\|w\|^2 - \nu\rho + \frac{1}{n}\sum_{i=1}^{n}\xi_i - \sum_{i=1}^{n}\lambda_i\left[y_i\left(w^T x_i + b\right) - \rho + \xi_i\right] - \sum_{i=1}^{n}\beta_i\xi_i
$$

Taking the derivatives with respect to $ w $, $ b $, $ \xi_i $, and $ \rho $, and setting them to
zero, we get:
$$
w = \sum_{i=1}^{n}\lambda_i y_i x_i, \qquad
\sum_{i=1}^{n}\lambda_i y_i = 0, \qquad
\lambda_i + \beta_i = \frac{1}{n}, \qquad
\sum_{i=1}^{n}\lambda_i = \nu
$$

Substituting these back into the Lagrangian, we obtain the dual problem:
$$
\max_{\lambda}\;\; -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\lambda_i\lambda_j y_i y_j x_i^T x_j
$$

subject to:

$$
\sum_{i=1}^{n}\lambda_i y_i = 0, \qquad
0 \leq \lambda_i \leq \frac{1}{n},\; i = 1, \dots, n, \qquad
\sum_{i=1}^{n}\lambda_i = \nu
$$
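
As a sketch of how this dual could be handed to cvxopt with the box constraint and both equality constraints (this is not the code used in Section 3.2, which implements a slightly different scaled variant; it assumes the X, y from Part 1 with labels in {-1, +1} and the cvxopt/numpy imports from earlier cells):

# Hypothetical QP setup for the nu-SVC dual derived above
nu = 0.5
n = len(y)
K = X @ X.T                                           # linear Gram matrix
P = matrix(np.outer(y, y) * K + 1e-8 * np.eye(n))     # small ridge for numerical stability
q = matrix(np.zeros(n))
G = matrix(np.vstack([-np.eye(n), np.eye(n)]))        # encodes 0 <= lambda_i <= 1/n ...
h = matrix(np.hstack([np.zeros(n), np.ones(n) / n]))  # ... together with h
A = matrix(np.vstack([y.astype(float), np.ones(n)]))  # y^T lambda = 0 and 1^T lambda = nu
b = matrix(np.array([0.0, nu]))
sol = solvers.qp(P, q, G, h, A, b)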

3.2 Implementation
Primal Form Implementation
class Nu_SVC_Primal(BaseEstimator, ClassifierMixin):
def __init__(self, nu=0.5, reg=1e-6):
self.nu = nu
self.reg = reg # Regularization term

def fit(self, X, y):


X, y = check_X_y(X, y)
n_samples, n_features = X.shape
self.classes_ = unique_labels(y)
        y = np.where(y == self.classes_[0], -1.0, 1.0)  # Map the two classes to -1 and +1

# Scale the data


from sklearn.preprocessing import StandardScaler
self.scaler = StandardScaler()
X_scaled = self.scaler.fit_transform(X)

        # Construct the quadratic program over x = (w, b, xi, rho)
        n_vars = n_features + 1 + n_samples + 1
        # Quadratic term: only w is penalized; self.reg adds a small ridge for numerical stability
        P = self.reg * np.eye(n_vars)
        P[:n_features, :n_features] += np.eye(n_features)
        # Linear term: + (1/n) * sum(xi) - nu * rho
        q = np.zeros(n_vars)
        q[n_features + 1:-1] = 1 / n_samples
        q[-1] = -self.nu

        # Inequality constraints G x <= h:
        #   -y_i (w^T x_i + b) - xi_i + rho <= 0   (margin constraints)
        #   -xi_i <= 0 and -rho <= 0               (non-negativity)
        G_margin = np.hstack([-y[:, np.newaxis] * X_scaled, -y[:, np.newaxis],
                              -np.eye(n_samples), np.ones((n_samples, 1))])
        G_slack = np.hstack([np.zeros((n_samples, n_features + 1)),
                             -np.eye(n_samples), np.zeros((n_samples, 1))])
        G_rho = np.zeros((1, n_vars))
        G_rho[0, -1] = -1.0
        G = np.vstack([G_margin, G_slack, G_rho])
        h = np.zeros(2 * n_samples + 1)

        # Solve the quadratic program
        from cvxopt import matrix, solvers
        solvers.options['show_progress'] = False
        sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))

if sol['status'] != 'optimal':
raise ValueError("Optimization failed")

        sol_x = np.array(sol['x']).flatten()
        self.coef_ = sol_x[:n_features]
        self.intercept_ = sol_x[n_features]
        self.xi_ = sol_x[n_features + 1:-1]
        self.rho_ = sol_x[-1]

return self

def predict(self, X):


check_is_fitted(self)
X = check_array(X)
X_scaled = self.scaler.transform(X)
return np.sign(X_scaled @ self.coef_ + self.intercept_)

Dual Form Implementation


class Nu_SVC_Dual(BaseEstimator, ClassifierMixin):
def __init__(self, nu=0.5, kernel='linear', reg=1e-6):
"""
Parameters:
-----------
nu : float, default=0.5
An upper bound on the fraction of margin errors and a
lower bound on
the fraction of support vectors. Must be in the range (0,
1).

kernel : str, default='linear'


The kernel function to use. Supported kernels are
'linear', 'poly', 'rbf', etc.

reg : float, default=1e-6


Regularization term to ensure numerical stability.
"""
self.nu = nu
self.kernel = kernel
self.reg = reg

def fit(self, X, y):


"""
Fit the \(\nu\)-SVC model using the dual formulation.

Parameters:
-----------
X : array-like of shape (n_samples, n_features)
Training data.

y : array-like of shape (n_samples,)


Target labels.

Returns:
--------
self : object
Fitted estimator.
"""
X, y = check_X_y(X, y)
n_samples, n_features = X.shape
self.classes_ = unique_labels(y)
        y = np.where(y == self.classes_[0], -1.0, 1.0)  # Map the two classes to -1 and +1

# Compute the kernel matrix


self.X_train_ = X
K = pairwise_kernels(X, metric=self.kernel)

# Construct the quadratic program


        P = np.outer(y, y) * K + self.reg * np.eye(n_samples)  # Regularization term for stability
        q = -np.ones(n_samples)
        A = y.reshape(1, -1)  # Ensure A is a 2D matrix
        b = np.zeros(1)
        G = np.vstack([-np.eye(n_samples), np.eye(n_samples)])
        h = np.hstack([np.zeros(n_samples), np.ones(n_samples) / (self.nu * n_samples)])

# Solve the quadratic program using cvxopt


solvers.options['show_progress'] = False
P = matrix(P, tc='d') # Ensure P is of type 'd'
q = matrix(q, tc='d') # Ensure q is of type 'd'
A = matrix(A, tc='d') # Ensure A is of type 'd'
b = matrix(b, tc='d') # Ensure b is of type 'd'
G = matrix(G, tc='d') # Ensure G is of type 'd'
h = matrix(h, tc='d') # Ensure h is of type 'd'
sol = solvers.qp(P, q, G, h, A, b)

if sol['status'] != 'optimal':
raise ValueError("Optimization failed")

# Extract the solution


self.alpha_ = np.array(sol['x']).flatten()
self.support_vectors_ = X
self.support_vector_labels_ = y

# Compute the intercept


sv_indices = self.alpha_ > 1e-5 # Indices of support vectors
self.intercept_ = np.mean(
y[sv_indices] - np.dot(K[sv_indices], self.alpha_ * y)
)

return self

def predict(self, X):


"""
Predict class labels for samples in X.

Parameters:
-----------
X : array-like of shape (n_samples, n_features)
Test data.

Returns:
--------
y_pred : array-like of shape (n_samples,)
Predicted class labels.
"""
check_is_fitted(self)
X = check_array(X)

        # Compute the kernel matrix between the test data and the support vectors
        K = pairwise_kernels(X, self.support_vectors_, metric=self.kernel)

# Predict using the dual coefficients


        y_pred = np.sign(np.dot(K, self.alpha_ * self.support_vector_labels_) + self.intercept_)
        return np.where(y_pred > 0, self.classes_[1], self.classes_[0])  # Map back to the original labels
Visualization
# Assuming X, y are defined and your model is trained:
model = Nu_SVC_Primal(nu=0.5, reg=1e6)
model.fit(X, y)

Nu_SVC_Primal(reg=1000000.0)

plot_decision_boundary(model, X, y, title="Nu-SVC (Primal) Decision


Boundary")

y_pred = model.predict(X)
accuracy = np.mean(y_pred == y)
print(f"Accuracy: {accuracy}")

Accuracy: 0.4952

model = Nu_SVC_Dual(nu=0.5, kernel='linear') # Or 'rbf', 'poly'


model.fit(X, y)
Nu_SVC_Dual()

# === Plot ===


plot_decision_boundary(model, X, y, title="Nu-SVC (Dual) Decision
Boundary")

y_pred = model.predict(X)
accuracy = np.mean(y_pred == y)
print(f"Accuracy: {accuracy}")

Accuracy: 0.4

3.3 Questions
1. What is the role of the parameter ν in ν -SVC?

The parameter ν controls the balance between margin size and classification error. It sets an
upper bound on the fraction of margin errors (misclassifications) and a lower bound on the
fraction of support vectors. A smaller ν generally leads to a larger margin and fewer support
vectors, while a larger ν results in a smaller margin and more support vectors.

2. How does ν control the trade-off between margin size and classification error?
ν directly affects both the margin size and the number of support vectors. A smaller ν allows for
a wider margin, which may increase training error but improve generalization. Conversely, a
larger ν reduces the margin, potentially lowering training error but risking overfitting.
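
To sanity-check these statements, one could compare against scikit-learn's reference implementation (a sketch, not part of the original lab; it assumes the X, y from Part 1 and that each value of nu is feasible for this class balance):

from sklearn.svm import NuSVC

for nu in [0.1, 0.3, 0.5, 0.7]:
    clf = NuSVC(nu=nu, kernel='linear').fit(X, y)
    acc = np.mean(clf.predict(X) == y)
    print(f"nu={nu}: support vectors = {clf.support_.size}, training accuracy = {acc:.2f}")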

Part 4: Comparison and Discussion


4.1 Comparison
We compare the three methods (LS-SVC, P-SVC, and ν -SVC) using the criteria below:

Method    Training Time   Classification Accuracy   Number of Support Vectors   Robustness to Outliers
LS-SVC    Moderate        High                      Moderate                    Moderate
P-SVC     Moderate        High                      Moderate                    High
ν-SVC     Fast            High                      Low to Moderate             High

Explanation:

• Training Time: ν -SVC is generally faster due to a simpler optimization problem.

• Classification Accuracy: All methods achieve high accuracy, but ν -SVC and P-SVC
often generalize better.

• Number of Support Vectors: ν -SVC typically has fewer support vectors because it
explicitly controls this number.

• Robustness to Outliers: P-SVC and ν -SVC are more robust due to their
regularization.
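
A minimal harness along these lines could be used to fill in the table above (a sketch, assuming the dual classes from Parts 1-3 and the dataset X, y are in scope; absolute timings depend on the QP solver):

import time

for name, clf in [("LS-SVC", LS_SVC_Dual(C=1.0)),
                  ("P-SVC", P_SVC_Dual(C=1.0)),
                  ("nu-SVC", Nu_SVC_Dual(nu=0.5))]:
    t0 = time.perf_counter()
    clf.fit(X, y)
    elapsed = time.perf_counter() - t0
    acc = np.mean(clf.predict(X) == y)
    n_sv = np.sum(np.abs(clf.alpha_) > 1e-6)  # effectively non-zero dual coefficients
    print(f"{name}: fit {elapsed:.3f}s, accuracy {acc:.2f}, support vectors {n_sv}")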

4.2 Discussion
Strengths and Weaknesses:
• LS-SVC:
– Strengths: Simple to implement and works well on linearly separable data.

– Weaknesses: Less robust to outliers; may overfit with large C .


• P-SVC:
– Strengths: More robust to outliers; includes bias term regularization.
– Weaknesses: Slightly more complex optimization problem.
• ν -SVC:
– Strengths: Explicit control over margin and support vectors; faster training.

– Weaknesses: Requires careful tuning of ν .

Suitability for Different Datasets:


• Linearly Separable Data: All methods perform well; ν -SVC is preferred for
simplicity and speed.

• Noisy Data: P-SVC and ν -SVC are more robust to outliers.

• Imbalanced Data: ν -SVC is suitable as it allows control over the fraction of support
vectors.
