
Machine Learning

Samatrix Consulting Pvt Ltd


Resampling Methods
Resampling Methods
• Resampling methods involve drawing samples from a training set repeatedly.
• We then refit the model of interest on each sample. In this process, we obtain
additional information about the fitted model.
• This information would not be available if we fit the model only once using
the original training sample.
• Resampling methods thus involve fitting a machine learning method
multiple times using different subsets of the training data.
• Hence resampling methods may be computationally expensive.
• However, thanks to recent advances in computing power, resampling
methods are no longer prohibitively expensive.
Resampling Methods
• The most commonly used resampling methods are cross-validation and the
bootstrap.
• For example, we can use cross-validation to estimate the error associated
with a machine learning method, either to evaluate its performance or to
select its level of flexibility.
• The process of evaluating the performance of a model is called model
assessment, whereas the process of selecting the level of flexibility of a
model is called model selection.
• We use bootstrap methods to measure the accuracy of a given machine
learning method.
Resampling Methods
• In the previous chapters, we discussed how the test error rate and
the training error rate can differ from each other.
• If we do not have a very large test data set with which to estimate the
test error rate, we can use different techniques to estimate it
using the available training data.
• One class of methods estimates the test error rate by
holding out a subset of the training observations from the fitting process
and then applying the machine learning method to the held-out
observations.
Validation Set Approach
• The validation set approach is used to estimate the test error rate.
• In this approach, we randomly divide the available set of observations
into two parts: a training set and a validation set (or hold-out set).
• We use the training set to fit the model and then use the fitted model
to predict the responses for the observations in the validation set.
• We use the validation set error rate to estimate the test error rate, as in
the sketch below.
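A minimal sketch of the validation set approach in Python with scikit-learn. The simulated data X, y and the 50/50 split ratio are illustrative assumptions, not part of the original example.

```python
# Validation set approach: fit on the training set, estimate test MSE on the
# validation set. X, y are simulated for illustration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))              # simulated predictor
y = 2.0 * X[:, 0] + rng.normal(size=100)   # simulated response

# Randomly split the observations into a training set and a validation set
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, random_state=0)

model = LinearRegression().fit(X_train, y_train)            # fit on training set
val_mse = mean_squared_error(y_val, model.predict(X_val))   # validation error rate
print(f"Validation set estimate of test MSE: {val_mse:.3f}")
```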
Validation Set Approach
• Figure 1 shows a schematic display of the validation set
approach.
• We randomly split a set of 𝑛 observations into a training set and
a validation set.
• The training set is shown in blue and contains observations
7, 22, and 13, among others.
• The validation set is shown in beige and contains observation 91,
among others.
• We fit a machine learning model on the training set and use the
validation set to evaluate its performance.
Validation Set Approach
• Even though the validation set approach is conceptually simple and
easy to implement, it suffers from two drawbacks:
• The variance of the estimate of the test error rate can be high, because the
estimate depends on exactly how the observations are split between the
training set and the validation set (illustrated in the sketch below).
• In the validation approach we use only a subset of the observations to fit the
model. Machine learning models tend to perform worse when trained on fewer
observations, so the validation set error rate may overestimate the
test error rate for the model that has been fit on the entire data set.
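A minimal sketch of the first drawback, reusing the simulated X, y from the previous sketch (an assumption): repeating the random split with different seeds yields noticeably different error estimates.

```python
# The validation estimate of the test MSE varies with the random split.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

for seed in range(5):
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.5, random_state=seed)
    fit = LinearRegression().fit(X_tr, y_tr)
    mse = mean_squared_error(y_va, fit.predict(X_va))
    print(f"split {seed}: validation MSE = {mse:.3f}")  # differs split to split
```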
Leave-one-out Cross-Validation
• Leave-One-Out Cross-Validation (LOOCV) is similar to the validation set
approach, but it tries to address the drawbacks of that approach.
• Like the validation set approach, LOOCV also splits the data set
into two parts.
• Instead of creating two subsets of comparable size, however, we use only one
observation $(x_1, y_1)$ for the validation set and the remaining $(n - 1)$
observations $\{(x_2, y_2), \ldots, (x_n, y_n)\}$ for the training set.
• We fit a machine learning model on the $(n - 1)$ training observations,
make the prediction $\hat{y}_1$ for the excluded observation $x_1$, and calculate
$MSE_1 = (y_1 - \hat{y}_1)^2$.
• However, $MSE_1$ is highly variable because it is based on a single observation.
Leave-one-out Cross-Validation
• We now repeat the procedure by selecting $(x_2, y_2)$ as the validation
data, training the machine learning model on the $(n - 1)$ observations
$\{(x_1, y_1), (x_3, y_3), \ldots, (x_n, y_n)\}$, and
computing $MSE_2 = (y_2 - \hat{y}_2)^2$.
• If we repeat this approach $n$ times, we get $n$
squared errors $MSE_1, \ldots, MSE_n$.
• The LOOCV estimate for the test MSE is the average of these $n$ test
error estimates:

$$CV_{(n)} = \frac{1}{n} \sum_{i=1}^{n} MSE_i$$
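A minimal sketch of the LOOCV computation, again assuming the simulated X, y from the earlier sketch: the model is fit n times, each time holding out a single observation, and the n squared errors are averaged.

```python
# LOOCV: fit n times, each time leaving one observation out.
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LinearRegression

squared_errors = []
for train_idx, val_idx in LeaveOneOut().split(X):
    fit = LinearRegression().fit(X[train_idx], y[train_idx])
    y_hat = fit.predict(X[val_idx])                          # held-out prediction
    squared_errors.append((y[val_idx][0] - y_hat[0]) ** 2)   # MSE_i

cv_n = np.mean(squared_errors)   # CV_(n): average of the n squared errors
print(f"LOOCV estimate of test MSE: {cv_n:.3f}")
```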
Leave-one-out Cross-Validation
• LOOCV has two major advantages over the validation set approach.
• Firstly, LOOCV has far less bias than the validation set
approach.
• In the case of LOOCV, we repeatedly fit the machine learning model
on $n - 1$ training observations (almost as many as are in the
entire data set), whereas in the validation set approach, we
split the observations into two sets of comparable size.
• Hence the LOOCV approach does not overestimate the test error rate
as much as the validation set approach tends to.
Leave-one-out Cross-Validation
• Secondly, due to randomness in the training/validation set splits, the
validation set approach will yield different results when applied
repeatedly.
• In contrast, performing LOOCV multiple times always yields the
same result, because there is no randomness in the
training/validation set splits.
• On the other hand, LOOCV is computationally expensive because the
model has to be fit 𝑛 times.
• If 𝑛 is large and each individual model is slow to fit, this can be very
time consuming.
Leave-one-out Cross-Validation
• LOOCV is a general method. We can use it for any kind of predictive
modeling.
• For example, we can also use it with logistic regression or linear
discriminant analysis, as in the sketch below.
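A minimal sketch of LOOCV with logistic regression. The simulated predictors Xc and binary labels yc are illustrative assumptions; for classification, the LOOCV estimate is the fraction of held-out observations that are misclassified.

```python
# LOOCV for a classifier: cross_val_score with cv=LeaveOneOut() fits the
# model n times and scores each held-out observation (0/1 accuracy).
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
Xc = rng.normal(size=(60, 2))                                      # simulated predictors
yc = (Xc[:, 0] + Xc[:, 1] + rng.normal(size=60) > 0).astype(int)   # simulated labels

scores = cross_val_score(LogisticRegression(), Xc, yc, cv=LeaveOneOut())
print(f"LOOCV error rate: {1 - scores.mean():.3f}")
```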
k-fold Cross Validation
• 𝑘-fold cross-validation is an alternative to LOOCV. In this method, we
randomly divide the data set into 𝑘 groups, or folds, of approximately
equal size.
• We treat the first fold as a validation set and fit a machine learning
method on the remaining 𝑘 − 1 folds.
• We calculate the mean squared error, $MSE_1$, on the observations in
the held-out fold.
• We repeat the procedure 𝑘 times, each time treating a different fold of
observations as the validation set.
k-fold Cross Validation
• Thus, we get 𝑘 estimates of the test error,
$MSE_1, MSE_2, \ldots, MSE_k$. Finally, we
compute the 𝑘-fold CV estimate by
averaging these values:

$$CV_{(k)} = \frac{1}{k} \sum_{i=1}^{k} MSE_i$$
• Figure 3 shows a schematic of the 𝑘-fold cross-validation approach;
a code sketch follows below.
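A minimal sketch of 𝑘-fold cross-validation with 𝑘 = 5, again assuming the simulated X, y from the earlier sketch.

```python
# 5-fold CV: fit the model k times, averaging the k held-out-fold MSEs.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_mses = []
for train_idx, val_idx in kf.split(X):
    fit = LinearRegression().fit(X[train_idx], y[train_idx])
    fold_mses.append(
        mean_squared_error(y[val_idx], fit.predict(X[val_idx])))  # MSE_i

cv_k = np.mean(fold_mses)   # CV_(k): average over the k folds
print(f"5-fold CV estimate of test MSE: {cv_k:.3f}")
```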
k-fold Cross Validation
• We can see that LOOCV is a special case of 𝑘-fold CV in which 𝑘 = 𝑛.
• In practice, we typically perform 𝑘-fold CV using 𝑘 = 5 or 𝑘 = 10.
• A natural question is what advantage there is in using 𝑘 = 5 or 𝑘 = 10
instead of 𝑘 = 𝑛.
• One of the major advantages is computational.
• The LOOCV requires fitting the learning method 𝑛 times, which makes
LOOCV computationally expensive when compared to the 𝑘-fold CV
using 𝑘 = 5 or 𝑘 = 10.
• For example, if we take 𝑘 = 10, the 10-fold CV requires fitting the
learning method only 10 times which may be much more feasible.
The Bootstrap
• The bootstrap is a widely applicable and very powerful statistical tool.
• It is used to quantify the uncertainty associated with a given
machine learning method.
• For example, we can use the bootstrap to estimate the standard
errors of the coefficients from a linear regression fit.
• The bootstrap is not especially useful for a linear regression fit, because
standard statistical software such as R and Python automatically provides
the standard errors.
• However, for a wide range of machine learning methods, the standard errors
are difficult to obtain, and we can use the bootstrap for such methods.
The Bootstrap
• The bootstrap estimates quantities about a population by averaging
estimates obtained from repeated data samples.
• Each sample is constructed by drawing observations from the original
data sample one at a time and returning them to the sample after they
have been chosen.
• This allows a given observation to be included in a given sample more
than once.
• This approach to sampling is known as sampling with replacement, as in
the sketch below.
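A minimal sketch of sampling with replacement in NumPy; the toy values in Z are illustrative assumptions.

```python
# Sampling with replacement: draw n observations from Z, returning each
# observation to the pool after it is chosen.
import numpy as np

rng = np.random.default_rng(0)
Z = np.array([4.3, 2.1, 5.3])                      # toy data set, n = 3
Z_star = rng.choice(Z, size=len(Z), replace=True)  # one bootstrap data set Z*1
print(Z_star)   # an observation may appear twice, another not at all
```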
The Bootstrap
• In this example, we have a simple
data set which we can call 𝑍 and it
contains only 𝑛 = 3 observations.
• From this dataset, we randomly select
𝑛 observations to produce a
bootstrap dataset $Z^{*1}$.
• We perform this sampling with
replacement, due to which the same
observation can appear in the
bootstrap dataset more than once.
• In the current example, the $Z^{*1}$
dataset contains the first observation
once and the third observation twice,
but the second observation is not
present at all.
The Bootstrap
• We can use $Z^{*1}$ to calculate
the bootstrap estimate for $\alpha$,
which we can call $\hat{\alpha}^{*1}$.
• We repeat this procedure $B$ times,
for some large value of $B$.
• Thus, we produce $B$ different
bootstrap data sets,
$Z^{*1}, Z^{*2}, \ldots, Z^{*B}$.
• We also produce the corresponding $B$
estimates of $\alpha$:
$\hat{\alpha}^{*1}, \hat{\alpha}^{*2}, \ldots, \hat{\alpha}^{*B}$.
• We can now compute the standard
error of these bootstrap estimates,
as in the sketch below.
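A minimal sketch of the bootstrap standard error. The document does not define the estimator α̂, so the sample mean is used here as an illustrative stand-in; the simulated data and B = 1000 are also assumptions.

```python
# Bootstrap standard error: generate B bootstrap data sets Z*b, compute the
# estimate alpha-hat*b on each, then take the standard deviation.
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(loc=5.0, scale=2.0, size=100)   # simulated original sample
B = 1000                                       # number of bootstrap data sets

alpha_hats = np.array([
    rng.choice(Z, size=len(Z), replace=True).mean()  # alpha-hat*b on Z*b
    for _ in range(B)
])
se_boot = alpha_hats.std(ddof=1)   # bootstrap estimate of SE(alpha-hat)
print(f"Bootstrap SE: {se_boot:.4f}")
```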
Subset Selection
• There are two reasons why we may not be satisfied with the least squares
estimates.
• The first is prediction accuracy: the least squares estimates often have low
bias but large variance. Prediction accuracy can sometimes be improved by
shrinking some coefficients, or setting them to zero. By doing so we sacrifice a
little bit of bias to reduce the variance of the predicted values, and hence
may improve the overall prediction accuracy.
• The second reason is interpretation. With a large number of predictors, we
often would like to determine a smaller subset that exhibits the strongest
effects. In order to get the "big picture", we are willing to sacrifice some of
the small details.
Shrinkage Methods
• By retaining a subset of the predictors and discarding the rest, subset
selection produces a model that is interpretable and has possibly
lower prediction error than the full model.
• However, because it is a discrete process—variables are either
retained or discarded—it often exhibits high variance, and so doesn't
reduce the prediction error of the full model.
• Shrinkage methods are more continuous, and don't suffer as much
from high variability; a ridge regression sketch follows below.
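A minimal sketch of a shrinkage method. The slides discuss shrinkage in general; ridge regression is chosen here purely as one illustrative example, and the data and penalty value are assumptions.

```python
# Ridge regression shrinks coefficients continuously toward zero, rather
# than discretely retaining or discarding variables as subset selection does.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
Xs = rng.normal(size=(50, 10))                        # simulated predictors
ys = Xs[:, 0] + 0.5 * Xs[:, 1] + rng.normal(size=50)  # only two true signals

ols = LinearRegression().fit(Xs, ys)
ridge = Ridge(alpha=10.0).fit(Xs, ys)   # alpha controls the amount of shrinkage

print("OLS coefficient norm:  ", np.linalg.norm(ols.coef_))
print("Ridge coefficient norm:", np.linalg.norm(ridge.coef_))  # smaller
```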
Thanks
Samatrix Consulting Pvt Ltd
