Overfitting Regression
Overfitting
Kriti Srivastava
[Figure: accuracy on training vs. testing data as model complexity increases]
Underfitting and Overfitting
Underfitting: when the model is too simple, both training and test errors are large
Overfitting due to Insufficient Examples
A lack of data points in a region makes it difficult to correctly predict the class labels
of that region
Notes on Overfitting
• Overfitting results in models that are more complex
than necessary
Triple Trade-Off
• There is a trade-off between three factors:
– Complexity of the hypothesis class H, c(H),
– Training set size, N,
– Generalization error, E, on new data
• As N increases, E decreases
• As c(H) increases, E first decreases and then increases
• As c(H) increases, the training error decreases for some time
and then stays constant (frequently at 0)
Notes on Overfitting
• Overfitting happens when a model captures
idiosyncrasies of the data rather than generalities.
– Often caused by too many parameters relative to the
amount of training data.
– E.g. an order-N polynomial can pass exactly through any N+1 data
points
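This can be checked numerically (an illustrative sketch, not from the slides): a degree-5 polynomial fitted to 6 arbitrary points reproduces every point exactly, no matter how noisy the targets are.

```python
import numpy as np

# Sketch: a degree-N polynomial interpolates any N+1 points with
# distinct x values. Here N = 5, so 6 points. Data are synthetic.
rng = np.random.default_rng(0)
x = np.arange(6.0)            # 6 distinct x values
y = rng.normal(size=6)        # arbitrary (noisy) targets

coeffs = np.polyfit(x, y, deg=5)   # fit an order-5 polynomial
fitted = np.polyval(coeffs, x)

# The fit passes through every training point: zero training error,
# which is exactly the overfitting scenario the slide describes.
print(np.allclose(fitted, y))      # → True
```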
Dealing with Overfitting
• Use more data
• Use a tuning (validation) set
• Regularization
• Be a Bayesian
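The tuning-set idea can be sketched as follows (hypothetical data and candidate degrees, not from the slides): hold out part of the training data and pick the model complexity that minimizes error on the held-out part.

```python
import numpy as np

# Sketch of model selection with a tuning (validation) set:
# choose the polynomial degree with the lowest held-out error.
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=40)

idx = rng.permutation(40)          # random split: 30 train, 10 tune
tr, va = idx[:30], idx[30:]

def tune_err(deg):
    # fit on the training part, evaluate on the tuning part
    c = np.polyfit(x[tr], y[tr], deg)
    return np.mean((np.polyval(c, x[va]) - y[va]) ** 2)

best = min(range(1, 10), key=tune_err)
print(best)   # the degree favored by the tuning set
```

High degrees drive the training error toward zero, but the tuning error exposes them, which is why the selected degree stays moderate.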
Regularization
• In a linear regression model, overfitting is
often characterized by large weights.
Penalize large weights in Linear Regression
• Introduce a penalty term in the loss function.
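One common way to write such a penalized loss (assumed notation; the slides do not give the formula) is the sum of squared errors plus a term that grows with the size of the weights:

```latex
E(\mathbf{w}) \;=\; \sum_{i=1}^{N} \bigl(y_i - \mathbf{w}^{\top}\mathbf{x}_i\bigr)^2 \;+\; \lambda \,\lVert \mathbf{w} \rVert_2^2
```

Here λ ≥ 0 controls the strength of the penalty; λ = 0 recovers ordinary least squares, and larger λ shrinks the weights toward zero.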
Regularized Regression
1. L2-Regularization (Ridge Regression)
2. L1-Regularization (Lasso)
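A minimal sketch of the L2 (ridge) case, using the standard closed form w = (XᵀX + λI)⁻¹Xᵀy (a well-known result, not derived on these slides); the data here are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=20)

def ridge(X, y, lam):
    """Ridge solution: solve (X^T X + lam*I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_ols   = ridge(X, y, 0.0)    # lam = 0 gives ordinary least squares
w_ridge = ridge(X, y, 10.0)   # lam > 0 penalizes large weights

# The penalty shrinks the weight vector, as the slide on large
# weights suggests it should.
print(np.linalg.norm(w_ridge) < np.linalg.norm(w_ols))   # → True
```

L1 regularization replaces the squared norm with the sum of absolute weights, which has no closed form but tends to drive some weights exactly to zero.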