Random Forest
Each internal node denotes a test on an attribute, each branch denotes the outcome of a
test, and each leaf node holds a class label. The topmost node in the tree is the root
node.
Terminology
• Root Node: It represents the entire population or
sample. This further gets divided into two or more
homogeneous sets.
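The anatomy above can be seen directly in code. This is a minimal sketch (the iris dataset and a depth limit are assumptions for illustration): it trains a small decision tree and prints its structure, where the first line is the test at the root node, indented lines are branches, and "class:" lines are the leaf nodes.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Small tree so the printed structure stays readable.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Root node test first, then branches, then leaf class labels.
print(export_text(tree, feature_names=load_iris().feature_names))
```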
• Random Forest is an ensemble of multiple decision trees, where each tree is built using a random
subset of the features and random sampling of data points (with bootstrapping).
• While all trees are built from the same underlying dataset, the randomness involved in
their construction can lead to different splits at each level. For example:
• One tree might split first on Variable 1 because it provides the best impurity reduction for that
specific subset of the data.
• Another tree might split first on Variable 2 because it provides the best impurity reduction for
that subset of the data.
• Even though both features may have high importance, the decision to split on one or the other
depends on the data samples and the random feature subset chosen during tree construction.
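This can be checked empirically. The sketch below (iris dataset and seed are assumptions for illustration) fits a small forest and lists which feature index each tree tests at its root node; because every tree sees a different bootstrap sample and feature subset, the root splits typically differ across trees.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y)

# tree_.feature[0] is the index of the feature tested at the root node.
root_features = [est.tree_.feature[0] for est in forest.estimators_]
print(root_features)
```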
How does Random Forest work?
2. Why Different First Splits?
• Random Selection of Features: When building each tree, Random Forest randomly selects
a subset of features for each split. This means that Variable 1 might be more useful in one
tree, while Variable 2 could be the best split in another.
• Impurity Reduction at Each Node: The first split is chosen based on the feature that
maximizes impurity reduction (e.g., Gini index or entropy). Even if Variable 2 has the highest
importance across the entire forest, Variable 1 might be the best split for that specific
tree’s data, leading to the observed difference.
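The distinction between forest-wide importance and a single tree's first split can be illustrated as follows (iris dataset and seed are assumptions): the importances are aggregated over all trees, while the root split shown is specific to one tree and need not be the most important feature overall.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importance, aggregated over all trees in the forest.
print(forest.feature_importances_)

# Root split of one individual tree; may differ from the top-ranked
# feature above, because this tree saw its own bootstrap sample and
# random feature subsets.
print(forest.estimators_[0].tree_.feature[0])
```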
Step 1: Create Multiple Decision Trees
• For each tree, a random subset of the training data is chosen (with replacement).
• At each node in the tree, only a random subset of features is considered for splitting.
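Step 1 can be sketched with NumPy alone (sizes and the sqrt feature rule are assumptions for illustration): draw a bootstrap sample with replacement, then pick a random feature subset as a tree would at one node.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_features = 150, 4

# Sampling WITH replacement: some rows appear several times, others not at all.
row_idx = rng.integers(0, n_samples, size=n_samples)

# Only a random subset of features is considered at each node; a common
# choice for classification is sqrt(n_features).
k = max(1, int(np.sqrt(n_features)))
feature_idx = rng.choice(n_features, size=k, replace=False)

print(len(set(row_idx.tolist())), "unique rows out of", n_samples)
print("candidate features at this node:", feature_idx)
```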
Understanding the code of Random Forest

RandomForestClassifier(
    random_state=42,
    n_jobs=-1,
    max_depth=5,
    n_estimators=100,
    oob_score=True)

• random_state=42: 42 is just a random number; any integer can be used here.
The important thing is that using the same number guarantees the same split of
the data when training.
• n_jobs specifies the number of CPU cores to use for parallel processing. If
n_jobs=-1, it means use all available cores in the system to speed up training
and prediction. This is useful for large datasets, where training multiple
trees in parallel saves time.
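A usage sketch of this configuration (the iris dataset is an assumption for illustration): fit the classifier and read the out-of-bag accuracy that oob_score=True makes available.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(
    random_state=42,   # fixed seed for reproducible results
    n_jobs=-1,         # use all available CPU cores
    max_depth=5,       # limit the depth of each tree
    n_estimators=100,  # number of trees in the forest
    oob_score=True,    # estimate accuracy on out-of-bag samples
)
clf.fit(X, y)

# Accuracy estimated on samples each tree did NOT see during training.
print(clf.oob_score_)
```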