Regularization in Machine Learning
Introduction
Regularization is a technique used in machine learning to prevent overfitting by adding a penalty on the
model's complexity to the training objective. This improves the model's generalization performance on
unseen data.
Types of Regularization
L1 Regularization (Lasso)
L1 regularization adds the sum of the absolute values of the coefficients as a penalty to the loss function.
This encourages sparsity by shrinking some weights to exactly zero, which can effectively perform feature
selection.
Loss = Original Loss + λ Σ_{j=1}^{n} |w_j|    (1)
Properties of L1 Regularization:
• Promotes sparsity in the model parameters, leading to a sparse solution where some weights are exactly
zero.
• Performs feature selection, as irrelevant features will have zero coefficients.
• Useful in high-dimensional datasets with many irrelevant or redundant features.
• The penalty term is not differentiable at zero, making the objective slightly harder to optimize than
with L2 regularization.
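A minimal sketch of this sparsity effect, assuming scikit-learn (its alpha parameter plays the role of λ,
and the synthetic data are purely illustrative):

import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: 10 features, of which only the first 3 are informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_w = np.array([3.0, -2.0, 1.5] + [0.0] * 7)
y = X @ true_w + 0.1 * rng.normal(size=100)

# alpha corresponds to λ in equation (1).
lasso = Lasso(alpha=0.1)
lasso.fit(X, y)

# The coefficients of the irrelevant features are typically shrunk to exactly zero.
print(lasso.coef_)

Inspecting lasso.coef_ shows which features survive the penalty, which is the feature-selection behaviour
described above.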
L2 Regularization (Ridge)
L2 regularization adds the sum of the squared values of the coefficients as a penalty to the loss function. It
encourages smaller coefficients by penalizing large weights, without driving them to exactly zero.
Loss = Original Loss + λ Σ_{j=1}^{n} w_j^2    (2)
Properties of L2 Regularization:
• Shrinks weights towards zero but does not eliminate them (no sparsity).
• Does not perform feature selection, since weights are never exactly zero.
• Can result in complex models in high-dimensional settings.
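For contrast, a sketch of ridge regression on the same kind of data (again assuming scikit-learn, with
alpha playing the role of λ in equation (2)):

import numpy as np
from sklearn.linear_model import Ridge

# Same setup as the Lasso sketch: only the first 3 features are informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_w = np.array([3.0, -2.0, 1.5] + [0.0] * 7)
y = X @ true_w + 0.1 * rng.normal(size=100)

ridge = Ridge(alpha=1.0)
ridge.fit(X, y)

# All 10 coefficients are shrunk towards zero, but none is exactly zero.
print(ridge.coef_)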
Tuning Regularization
The strength of regularization is controlled by a hyperparameter λ, which determines the trade-off between
model complexity and fit to the data. Higher λ leads to stronger regularization (simpler models), while lower
λ allows for more complex models.
Hyperparameter tuning is typically done with cross-validation: candidate values of λ are evaluated on
held-out folds, and the value with the best validation performance is selected, as in the sketch below.
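A minimal sketch of this selection procedure, assuming scikit-learn's LassoCV (the grid of candidate
values is arbitrary):

import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)

# Evaluate a grid of candidate λ (alpha) values with 5-fold cross-validation.
model = LassoCV(alphas=np.logspace(-4, 1, 50), cv=5)
model.fit(X, y)

# The λ with the best average validation performance across folds.
print("selected lambda:", model.alpha_)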
Conclusion
Regularization is essential for building machine learning models that generalize well. Ridge (L2) and Lasso
(L1) regularization are widely used techniques:
• L1 Regularization (Lasso): Promotes sparsity, useful for feature selection.
• L2 Regularization (Ridge): Shrinks weights evenly, handles multicollinearity.
Elastic Net offers a compromise between L1 and L2 regularization, making it useful for datasets with
many correlated features.
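A sketch of Elastic Net, again assuming scikit-learn (l1_ratio=0.5 weights the two penalties equally and
is an arbitrary choice here):

import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)

# l1_ratio interpolates between pure L2 (0.0) and pure L1 (1.0) penalties.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)
enet.fit(X, y)
print(enet.coef_)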