Introduction to Random Forest
& R Packages for RF
Shuma Ishigami
2/2/2018
Agenda
• Random Forest Algorithm
– Decision Tree
– Bootstrapping
– Random Forest
• R packages for RF
– Sample codes
– Comparison
What is the Random Forest Algorithm ?
Random Forest =
[Something “Randomized”] + [a “Forest” consisting of trees] =
[Randomly chosen samples +
Randomly selected feature variables] +
[Many Decision Trees]
Random Forest
To put it simply, a random forest is
a set of many Decision Trees, where each tree uses
a randomly drawn bootstrap sample and
randomly selected predictors
Decision Tree
• A supervised* learning algorithm for both
classification and regression
• Has “nodes” and “branches” (like a tree)
• A DT is a set of simple rules that split the data into subgroups
Notes: * By “supervised” learning we mean that we give the computer training
data containing feature variables together with the “answers”, so that it can
learn a rule from these training examples.
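As a concrete illustration only (the rpart package is not one of the RF packages compared later), a tree like the ones sketched on the next slides can be grown in R; the toy data below is invented for this example.

library(rpart)

# Toy training data (made up for this illustration): the class follows two simple rules
set.seed(1)
train <- data.frame(x1 = runif(200), x2 = runif(200))
train$class <- factor(ifelse(train$x1 > 0.5 & train$x2 > 0.5, "A", "B"))

# Grow a classification tree; printing it lists the learned split rules (nodes and branches)
dt <- rpart(class ~ x1 + x2, data = train, method = "class")
print(dt)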
Simple rule ?
This rule involves only one feature variable, 𝑋1.
Rule: Is 𝑋1 > a ?
[Figure: two 𝑋1 vs 𝑋2 scatter plots; the left panel is split by a vertical line at 𝑋1 = a, the right panel by lines at 𝑋1 = b and 𝑋2 = c]
This rule involves two feature variables, 𝑋1 and 𝑋2.
Rule: Is 𝑋1 > b AND 𝑋2 > c ?
[Figure: scatter plot of training samples from two categories in the 𝑋1-𝑋2 plane]
Example: two categories and two feature variables (𝑋1 and 𝑋2)
[Figure: the scatter plot split by the lines 𝑋1 = a and 𝑋2 = b]
Try to divide the data into the two categories by simple rules
[Figure: decision tree with a root split “𝑋1 > a ?” and a second split “𝑋2 > b ?”, each with Yes/No branches leading to class leaves]
The set of rules on the previous slide can be represented in tree form
[Figure: the partitioned 𝑋1-𝑋2 plane with each region labelled by its predicted class]
Now our trained DT categorizes any new input falling in the lower-left area as the class of that region
[Figure: a new input point placed in the partitioned plane]
Let's try a new input whose true class we already know.
The DT classifies the new input correctly. Good job!
Issue with Decision Trees
• Sensitive to noise
• A few noisy points in the training data can greatly reduce the predictive
capability of the tree
[Figure: the training scatter plot with two noisy points added]
Two noisy points in the training data
[Figure: the much more finely partitioned plane produced once the tree fits the two noisy points]
Adding only two noisy points results in a very complex tree
[Figure: a new input point falls in a region distorted by the noisy points]
Because of those few noisy points, the prediction for a new input goes wrong.
Bootstrap sampling
• Bootstrap sampling method
– Randomly draw samples with replacement from the original data
– Sampling with replacement means that every time we draw a sample from
the data, we put it back, so we may draw the same sample more than once
– Ex. Assume we have original data {A,B,C,D,E} and draw from it 3 times.
First we pick B from {A,B,C,D,E}, second D from {A,B,C,D,E}, and third
B again from {A,B,C,D,E}. In contrast to sampling without replacement,
sampling with replacement can produce duplicates (see the code sketch below).
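A quick check of this in R (the seed and the exact draws are arbitrary):

original <- c("A", "B", "C", "D", "E")
set.seed(2)
sample(original, size = 3, replace = TRUE)  # with replacement: the same element may be drawn more than once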
Bagging (Bootstrap AGGregatING)
• Train many decision trees, each on its own bootstrap sample
• Then let each tree independently classify a new input and decide the
predicted class by majority vote (sketched in code below)
• The essential idea is that any given noisy point appears in only some of
the bootstrap samples, so the trees it distorts are outvoted by the rest and
its effect on the ensemble is small
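A minimal bagging sketch, reusing the toy train data frame from the earlier rpart example; the number of trees and the new input point are arbitrary choices for illustration.

library(rpart)

# Grow 25 trees, each on its own bootstrap sample of the training data
bagged_trees <- lapply(1:25, function(b) {
  idx <- sample(nrow(train), replace = TRUE)   # bootstrap sample (drawn with replacement)
  rpart(class ~ x1 + x2, data = train[idx, ], method = "class")
})

# Each tree votes on a new input; the majority class is the bagged prediction
new_input <- data.frame(x1 = 0.7, x2 = 0.8)
votes <- sapply(bagged_trees, function(tree)
  as.character(predict(tree, newdata = new_input, type = "class")))
names(which.max(table(votes)))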
[Figure: a bootstrap sample drawn from the original dataset]
Randomly sample from the original dataset. This bootstrap sample luckily contains none of the noisy points.
[Figure: the decision-tree partition grown on this bootstrap sample]
Let's grow a decision tree on this bootstrap sample.
[Figure: a second bootstrap sample and its decision-tree partition]
Another bootstrap sample and DT. This time one of the noisy points is included.
[Figure: a third bootstrap sample and its decision-tree partition]
Third sample and DT
[Figure: a fourth bootstrap sample and its decision-tree partition]
Fourth sample and DT
[Figure: the decision boundaries of the 4 DTs overlaid on one plot]
The 4 DTs overlapped
[Figure: a new input point shown against the four overlaid decision boundaries]
Determine the class of the new input by letting the trees vote. Our forest (4 trees) now categorizes the new input correctly.
Why Random Forest?
• [Bootstrap samples] + [Many Decision Trees] alone (i.e., plain bagging)
– Every tree uses the same set of feature variables. This leads to high
correlation among the trees, so the improvement in prediction
capability is limited
• Random Forest
– To reduce the correlation among the decision trees, RF assigns a
randomly chosen subset of the feature variables to each tree, and each
tree categorizes the training data using only that subset
– So in RF, the decision trees use different bootstrap samples AND
different subsets of the feature variables (see the randomForest sketch below)
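In the randomForest package the random feature subset is actually re-drawn at every split rather than fixed once per tree; its size is controlled by the mtry argument. A minimal run on the built-in iris data (iris is used here only as a stand-in dataset):

library(randomForest)

set.seed(3)
# ntree: number of trees; mtry: number of randomly chosen candidate features per split
rf <- randomForest(Species ~ ., data = iris, ntree = 100, mtry = 2)
print(rf)  # shows the confusion matrix and the OOB error estimate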
[Figure: a tree whose bootstrap sample is split using only 𝑋1]
The algorithm assigns 𝑋1 to this tree as its feature variable, so the tree tries to categorize its sample using 𝑋1 alone.
[Figure: the 2nd tree, also restricted to 𝑋1]
2nd tree
[Figure: the 3rd tree, restricted to 𝑋2]
This time, 𝑋2 is chosen as the feature to consider.
[Figure: the 4th tree, restricted to 𝑋2]
4th tree
Classification in Random Forest
Let each tree in the forest predict the class of the new input, and then take a vote.
[Figure: a new input point whose class is to be decided between the two candidate categories]
[Figures: the four trees classify the new input one at a time and each casts its vote; the first two trees split on 𝑋1, the last two on 𝑋2]
[Figure: the vote tally for the new input, 1 vote vs 3 votes]
Taking a vote on the class of the new input, the majority (3 of the 4 trees) favors one class.
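With the randomForest package, the votes behind a prediction can be inspected directly (continuing the rf object fitted on iris in the earlier sketch):

predict(rf, newdata = iris[1, ], type = "response")  # majority-vote class
predict(rf, newdata = iris[1, ], type = "vote")      # fraction of trees voting for each class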
Out-Of-Bag(OOB) Error
• A built-in form of cross validation in RF
• A sample is called an Out-Of-Bag (OOB) sample for a tree
if it was NOT drawn into the bootstrap sample that grew that tree
• For each sample, we can collect the set of trees for which it is OOB
and predict its class using only those trees
• The OOB error is defined as the average over all samples of
(# of wrong predictions) / (# of trees for which the sample is OOB)
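The randomForest package computes this OOB estimate automatically while the forest grows (again continuing the rf object from the iris sketch):

print(rf)                      # reports the "OOB estimate of error rate"
rf$err.rate[rf$ntree, "OOB"]   # OOB error rate after the last tree is grown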
[Figure: one tree's data; dark-colored points = the bootstrap samples used to grow the tree, light-colored points = the Out-Of-Bag samples]
The highlighted OOB sample still receives a predicted class from this tree, even though the tree was not grown with it.
R Packages for Random Forest
• randomForest
• party
• partykit
• randomForestSRC
• ranger
• Rborist
• grf
“randomForest”: Sample Codes
randomForest(x = X, y = Y,
na.action = na.fail,
ntree = 100)
X is a data frame of feature variables and Y is the response vector (a factor for classification)
“party”: Sample Codes
cforest(formula = Y ~ X,
data = Data,
controls = cforest_unbiased(ntree = 10))
Data is a data frame containing Y and X
“partykit”: Sample Codes
cforest(formula = Y ~ X,
data = Data,
ntree = 100)
Data is a data frame containing Y and X
“randomForestSRC”: Sample Codes
rfsrc(formula = Y ~ X,
data = as.data.frame(Data),
na.action = "na.impute",
ntree = 100)
Data is a data frame containing Y and X
“ranger”: Sample Codes
ranger(formula = Y ~ X,
data = as.data.frame(Data),
num.trees = 100)
Data is a data frame containing Y and X
“Rborist”: Sample Codes
Rborist(x = X,
y = Y,
nTree = 100)
X is a data frame (or matrix) of feature variables and Y is the response vector
“grf”: Sample Codes
custom_forest(X = X, Y = Y,
num.trees = 100)
X and Y are data frames or matrices
Attributes
a. Can factor variables be used as feature variables?
b. Can numerical variables be used as feature variables?
c. Can the package handle missing values in feature variables?
d. Can a factor variable be used as the target variable?
e. Can a numerical variable be used as the target variable?
f. Computation time
g. Parallel processing
Comparison Table
a. Factor feature variables:
randomForest: Yes, but the number of levels must be less than 53; errors with NA.
party: Yes.
partykit: Yes; number of levels must be less than 31.
randomForestSRC: Yes.
ranger: Yes; errors with NA.
Rborist: Yes; errors with NA.
grf: No; cannot handle factor-type feature variables.

b. Numerical feature variables:
Yes for all seven packages; randomForest, ranger, and Rborist error with NA.

c. Missing values in feature variables:
randomForest: has a function for imputing NA.
party: No.
partykit: has an option for how to handle NA.
randomForestSRC: has a function for imputing NA.
ranger: No. Rborist: No. grf: No.

d. Factor target variable: Yes for all seven packages.

e. Numerical target variable: Yes for all packages except grf (No).

f. Computation time:
randomForest: 3.96 sec; party: 331.79 sec; partykit: did not finish in a reasonable time;
randomForestSRC: 8.44 sec; ranger: 5.07 sec; Rborist: 2.79 sec; grf: NA.

g. Parallel processing:
randomForest: via external packages.
party: No.
partykit: via mclapply.
randomForestSRC: via OpenMP.
ranger: the number of threads can be set.
Rborist: uses all cores by default.
grf: the number of threads can be set.

Notes: For the time comparison, I generated data with a binary factor target variable and 10 numerical feature variables, without missing values. The sample size is 100,000, and I measured the time to grow a forest with 10 trees, with default settings for the other options. The reported times are the average of three runs with different random-number seeds.
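As an illustration of rows f and g, the thread count in ranger is set directly in the call and system.time() gives a rough timing; the values below are arbitrary and use iris only as a stand-in dataset.

library(ranger)

# Time the training of 10 trees while requesting 4 threads
system.time(
  ranger(Species ~ ., data = iris, num.trees = 10, num.threads = 4)
)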