Different Types of Cross-Validations in Machine Learning and Their Explanations
Machine learning and proper training go hand-in-hand. You can't simply fit a model on a set of training data and declare, 'Yes, this will work.' To ensure that the model learns the patterns in the data rather than the noise, you need to use cross-validation techniques. These are statistical methods used to estimate the performance of machine learning models.
This article will introduce you to the different types of cross-validation techniques, supported with detailed explanations and code.
Types of cross-validation
1. K-fold cross-validation
2. Hold-out cross-validation
3. Stratified k-fold cross-validation
4. Leave-p-out cross-validation
5. Leave-one-out cross-validation
6. Monte Carlo (shuffle-split)
7. Time series (rolling cross-validation)
K-fold cross-validation
In this technique, the whole dataset is partitioned into k parts of equal size, and each partition is called a fold. It's known as k-fold since there are k parts, where k can be any integer: 3, 4, 5, etc.
One fold is used for validation and the other k-1 folds are used for training the model. The procedure is repeated k times so that every fold serves exactly once as the validation set, with the remaining folds forming the training set.
Image source: sqlrelease.com
The image above shows 5 folds and hence, 5 iterations. In each iteration, one fold is the test/validation set and the other k-1 folds (4 folds) form the train set. To get the final accuracy, take the average of the accuracies the k models obtain on their respective validation folds.
This validation technique is not considered suitable for imbalanced datasets, because the randomly created folds may not preserve the proper ratio of each class's data, so the model will not get trained properly.
Here's an example of how to perform k-fold cross-validation using Python.
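A minimal sketch of k-fold cross-validation using scikit-learn; the Iris dataset and the logistic regression model here are only placeholders for illustration.

Code:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Placeholder data and model
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5 folds: each fold is used exactly once as the validation set
kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=kf)

print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())

Running this prints the accuracy obtained on each of the 5 validation folds along with their mean, which is the final cross-validated estimate.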
Holdout cross-validation
Also called a train-test split, holdout cross-validation partitions the entire dataset randomly into a training set and a validation set. A common rule of thumb is to use roughly 70% of the whole dataset as the training set and the remaining 30% as the validation set. Since the dataset is split into only two sets, the model is built just once on the training set and executes faster.
Image source: datavedas.com
In the image above, the dataset is split into a training set and a test set. You can train the model on the training set and test it on the test set. However, if you want to tune your hyperparameters or select the best model, you can carve out a separate validation set like the one below.
Image source: datavedas.com
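A minimal sketch of a holdout split using scikit-learn's train_test_split; the dataset and model are again placeholder choices.

Code:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# 70% of the data is used for training, 30% is held out for validation
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Hold-out accuracy:", model.score(X_val, y_val))

Because the model is fit only once, this is the fastest technique covered here, but the estimate depends on which samples happen to fall into the holdout set.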
Stratified k-fold cross-validation
As seen above, plain k-fold validation can't be relied on for imbalanced datasets because the data is assigned to the k folds at random, so a fold may not reflect the class proportions of the complete dataset. Not so with stratified k-fold, which is an enhanced version of the k-fold cross-validation technique. Although it too splits the dataset into k equal folds, each fold has the same ratio of instances of the target variable as the complete dataset. This makes it work well for imbalanced datasets, but not for time-series data.
Image source: dataaspirant.com
In the example above, the original dataset contains far fewer females than males, so the distribution of the target variable is imbalanced. In the stratified k-fold cross-validation technique, this ratio of instances of the target variable is maintained in all the folds.
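The only change from plain k-fold is the splitter object; here is a minimal sketch using scikit-learn's StratifiedKFold with the same placeholder dataset and model.

Code:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Each fold keeps the same class proportions as the full dataset
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=skf)

print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())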
Leave-p-out cross-validation
This is an exhaustive cross-validation technique: if a dataset has n samples, p samples are used as the validation set and the remaining n-p samples are used as the training set. The process is repeated for every possible choice of p samples, so each subset of p samples is used as a validation set exactly once.
The technique produces good results but has a very high computation time, which makes it unfeasible for all but small datasets. It's also not considered ideal for an imbalanced dataset: if the training set ends up holding samples of mostly one class, the model will not be able to generalize properly and will become biased towards that class.
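A minimal sketch using scikit-learn's LeavePOut on a tiny made-up dataset; because the number of splits grows combinatorially with the dataset size, only four samples are used here for illustration.

Code:

import numpy as np
from sklearn.model_selection import LeavePOut

# A tiny illustrative dataset of 4 samples
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([0, 0, 1, 1])

# p = 2: every possible pair of samples serves once as the validation set
lpo = LeavePOut(p=2)
print("Number of splits:", lpo.get_n_splits(X))

for train_idx, val_idx in lpo.split(X):
    print("Train indices:", train_idx, "Validation indices:", val_idx)

With 4 samples and p = 2 there are 6 possible splits; on a dataset of realistic size this count explodes, which is why the method is rarely practical.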
Leave-one-out cross-validation
In this technique, only 1 sample is used as the validation set and the remaining n-1 samples are used as the training set. Think of it as a special case of the leave-p-out cross-validation technique with p = 1.
To understand this better, consider this example:
There are 1000 instances in your dataset. In each iteration, 1 instance will be used for the validation set and the remaining 999 instances will be
used as the training set. The process repeats itself until every instance from the dataset is used as a validation sample.
The leave-one-out cross-validation method is computationally expensive and shouldn't be used with very large datasets. The good news is that the technique is very simple and requires no configuration. It also provides an almost unbiased estimate of your model's performance.
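A minimal sketch using scikit-learn's LeaveOneOut; with the placeholder Iris dataset the model is fit 150 times, once per held-out sample.

Code:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# One fit per sample: each sample serves once as the validation set
loo = LeaveOneOut()
scores = cross_val_score(model, X, y, cv=loo)

# Each score is 0 or 1 (the single held-out sample is either classified
# correctly or not), so the mean is the leave-one-out accuracy
print("Number of fits:", len(scores))
print("LOOCV accuracy:", scores.mean())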
Monte Carlo cross-validation
Also known as shuffle-split cross-validation and repeated random subsampling cross-validation, the Monte Carlo technique involves splitting the whole data into training data and test data. The split can be 70-30%, 60-40%, or any ratio you prefer. In each iteration, the data is reshuffled and re-split at random, so the samples that land in the training and test sets differ from one iteration to the next.
The next step is to fit the model on that iteration's training set and calculate the accuracy of the fitted model on the test set. Repeat these iterations many times - 100, 400, 500, or even more - and take the average of all the test errors to conclude how well your model performs.
For a 100-iteration run, the model training will look like this:
Image source: medium.com
You can see that in each iteration, the random split into a training set and a test set is different. The test errors from all iterations are then averaged.
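A minimal sketch using scikit-learn's ShuffleSplit, which implements this repeated random subsampling; here 100 iterations with a 70-30 split on the usual placeholder dataset and model.

Code:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 100 random 70-30 splits; the data is reshuffled before every split
ss = ShuffleSplit(n_splits=100, test_size=0.3, random_state=42)
scores = cross_val_score(model, X, y, cv=ss)

print("Mean accuracy over 100 random splits:", scores.mean())
print("Standard deviation:", scores.std())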
Time series (rolling cross-validation / forward chaining method)
Before going into the details of the rolling cross-validation technique, it’s important to understand what time-series data is.
Time series is the type of data collected at different points in time. This kind of data allows one to understand what factors influence certain
variables from period to period. Some examples of time series data are weather records, economic indicators, etc.
In the case of time series datasets, cross-validation is not that trivial. You can't choose data instances randomly and assign them to the test set or the train set. Hence, this technique is used to perform cross-validation on time series data, with time as the important factor.
Since the order of data is very important for time series-related problems, the dataset is split into training and validation sets according to time.
Therefore, it’s also called the forward chaining method or rolling cross-validation.
To begin:
Start the training with a small subset of data. Perform forecasting for the later data points and check their accuracy. The forecasted data points
are then included as part of the next training dataset and the next data points are forecasted. The process goes on.
The image below shows the method.
Image source: medium.com
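A minimal sketch using scikit-learn's TimeSeriesSplit on a small made-up series; each split trains on an expanding window of earlier observations and validates on the points that immediately follow it.

Code:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# A small illustrative series of 12 time-ordered observations
X = np.arange(12).reshape(-1, 1)
y = np.arange(12)

tscv = TimeSeriesSplit(n_splits=4)
for fold, (train_idx, val_idx) in enumerate(tscv.split(X), start=1):
    # The training window always ends before the validation window begins
    print(f"Fold {fold} - train: {train_idx}, validation: {val_idx}")

Note that the validation indices always come after the training indices, so the model never sees future data during training.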
Try your hand at these code snippets and play around with them to get the hang of how cross-validation is done using these seven techniques.
Happy coding!