
Decision Tree

Dr. Rajiv Kumar


IIM Kashipur

Note: Content used in this PPT has been compiled from various sources.
Why decision trees?

• Decision trees enable analysts to analyze all possible consequences of a
decision and to quantify the values of the outcomes and the probabilities of
achieving them.

• Decision trees are very intuitive and easy to explain.

• Decision trees require relatively little effort from users for data
preparation. Missing values do not prevent splitting the data to build the
trees, and decision trees are not sensitive to the presence of outliers.
Application areas of decision trees

• Increase capacity vs. outsourcing to fulfil demand.

• Purchase cars for the company car fleet or take them on lease.

• When to launch a new product?

• Which celebrity to invite to endorse your product?


What is a decision tree?

A decision tree is a tree-like structure in which an internal node (decision
node) represents a test on an attribute, and each outcome of the test is
represented by a branch. Each leaf / terminal node represents a class label.
A path from the root to a leaf represents a classification rule.

[Figure: example decision tree. Gender is the root node; its Female and Male branches lead to the internal (decision) nodes Age and Income at depth 1; the splits Age <= 30 / > 30 and Income <= 50,000 / > 50,000 end in Yes / No leaf nodes.]
Advantages of a decision tree

• Does not require domain knowledge / expertise.
• Is easy to comprehend.
• The classification steps of a decision tree are simple and fast.
• Works with both numerical and categorical data.
• Able to handle both continuous and discrete attributes.
• Scales to big data.
• Requires very little data preparation (it handles NAs, and no
normalization is needed).
Decision Tree Representation in R

Representation using ‘party’ Package


The party package contains many functions, but the core function is ctree(). It
follows the concept of recursive partitioning and embeds tree-structured models
into conditional inference procedures.

ctree(formula, data, controls = ctree_control(), ...)

where the formula argument gives a symbolic description of the model to be fit
using the "~" symbol; the data argument is the data frame that contains the
variables in the model; the optional controls argument takes an object of class
TreeControl, obtained with ctree_control(); and the dots "..." stand for other
optional arguments.
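
A minimal sketch of a ctree() call, assuming the built-in iris data set and an
illustrative object name (ct_iris); controls is included only to demonstrate
ctree_control():

library(party)

# fit a conditional inference tree predicting Species from all other columns,
# requiring a 99% criterion before a split is made
ct_iris <- ctree(Species ~ ., data = iris,
                 controls = ctree_control(mincriterion = 0.99))

print(ct_iris)   # text summary of the fitted tree
plot(ct_iris)    # graphical view of the tree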
Decision Tree Representation in R

Representation using “rpart” Package


The core function of the package is rpart(), which fits a model to the given
data. The basic syntax of the rpart() function is as follows:

rpart(formula, data, method = "anova" / "class" / "poisson" / "exp", ...)

where the formula argument gives a symbolic description of the model to be fit
using the "~" symbol; the data argument is the data frame that contains the
variables in the model; the optional method argument selects the method with
which the model is fitted; and the dots "..." stand for other optional
arguments.
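
A minimal sketch of an rpart() call, assuming the built-in iris data set and an
illustrative object name (fit_iris); method = "class" is used because the
response is a categorical factor:

library(rpart)

# classification tree for Species using all four measurement columns
fit_iris <- rpart(Species ~ ., data = iris, method = "class")

print(fit_iris)    # text view of the splits
printcp(fit_iris)  # complexity-parameter table (useful later for pruning)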
ID3 Algorithm

The ID3 algorithm is one of the most widely used basic decision tree
algorithms. Ross Quinlan developed the algorithm in 1983. The basic idea of ID3
is to construct the tree by a top-down, greedy search: construction starts at
the root of the tree and moves downwards, and at every node a greedy test of
each attribute is performed. The ID3 algorithm does not require any
backtracking while creating the tree.

R provides the package "data.tree" for implementing the ID3 algorithm. The
"data.tree" package creates a tree from hierarchical data and provides many
methods for traversing the tree in different orders. After the tree is
converted into a data frame, operations such as printing and aggregation can be
applied to it. Because of this, applications in areas such as machine learning
and financial data analysis use this package.
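
A minimal sketch of the data.tree package, assuming an illustrative toy
hierarchy; this shows only the tree-building mechanics, not a full ID3
implementation:

library(data.tree)

tree <- Node$new("Outlook")              # root node
sunny <- tree$AddChild("Sunny")          # internal node reached by a branch
sunny$AddChild("Play = No")              # leaf
overcast <- tree$AddChild("Overcast")
overcast$AddChild("Play = Yes")          # leaf

print(tree)                              # pretty-prints the hierarchy
ToDataFrameTree(tree, "level")           # flatten the tree into a data frame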
Measuring Features

Entropy—Measures Homogeneity
Entropy measures the impurity of a collection of samples containing positive
and negative labels. A dataset is pure if it contains only a single class;
otherwise, the dataset is impure. In simple words, entropy measures the
homogeneity of the dataset, and it is the quantity from which the information
gain of an attribute is calculated. The ID3 algorithm uses entropy to measure
the homogeneity of a sample: the entropy is zero if the sample is completely
homogeneous, and it is one if the sample is equally divided (50%–50%). For a
sample S with a proportion p+ of positive and p- of negative examples,

Entropy(S) = −p+ log2(p+) − p- log2(p-)

Information Gain—Measures the Expected Reduction in Entropy

The expected reduction in entropy obtained by splitting a decision tree node on
a given attribute is called the information gain. Let Gain(S, A) be the
information gain of attribute A on sample S. Then the information gain is
defined by the following formula:

Gain(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)

where the sum runs over the values v of attribute A and S_v is the subset of S
for which A takes the value v.
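
A minimal sketch in R of the two quantities defined above, assuming
illustrative function names (entropy, info_gain) and the built-in mtcars data
with "am" treated as the class label:

# Entropy(S) = -sum over classes of p_i * log2(p_i)
entropy <- function(target) {
  p <- prop.table(table(target))   # class proportions
  p <- p[p > 0]                    # drop empty classes so log2(0) never appears
  -sum(p * log2(p))
}

# Gain(S, A) = Entropy(S) - sum over values v of (|S_v| / |S|) * Entropy(S_v)
info_gain <- function(data, attribute, target) {
  s_entropy <- entropy(data[[target]])
  parts <- split(data, data[[attribute]])          # partition S by the values of A
  weighted <- sum(sapply(parts, function(s_v) {
    (nrow(s_v) / nrow(data)) * entropy(s_v[[target]])
  }))
  s_entropy - weighted
}

info_gain(mtcars, attribute = "cyl", target = "am")   # gain of splitting on cyl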
Inductive Bias In Decision Tree Learning

• An inductive bias is the set of assumptions that a learner uses, together
with the training data, to predict outputs for inputs it has not seen. It is
also called learning bias, and its purpose is to let an algorithm learn from
training examples, which define the relationship between input and output, and
generalize to new cases. Each algorithm has a different inductive bias.

• The inductive bias of ID3 decision tree learning is a preference for shorter
trees: when ID3 (or any other decision tree learner) builds a tree, shorter
trees are preferred over larger ones. In addition, trees that place
high-information-gain attributes close to the root are preferred over those
that do not; this preference is the inductive bias.
Issues in Decision Tree Learning

• Overfitting is one of the major issues in decision tree learning. The
decision tree grows each branch deeply enough to classify the training
instances perfectly. However, when the training data is small or noisy, the
overfitting problem occurs. In simple words, the decision tree classifies the
training data perfectly, but it does not perform well on unseen real-world
instances. This happens because of noise in the training data, or because the
number of training instances is too small to be representative.
Issues in Decision Tree Learning

Avoiding Overfitting the Data

• To avoid overfitting, stop growing the tree early, before it reaches the
point where it perfectly classifies the training data.

• Another method uses a separate set of examples that is not used for
training. For this, the training and validation set approach can be used: the
data is commonly split into 2/3 for training and 1/3 for validation (a split
is sketched in R after this list). This method works even if the training set
is misled by random errors, because the validation set is unlikely to exhibit
the same random fluctuations.

• The next method for avoiding overfitting is to use a statistical test to
estimate whether expanding a node of the tree is likely to improve performance
beyond the training set; the node is expanded only if it is.
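
A minimal sketch of the 2/3 training / 1/3 validation split mentioned above,
assuming the built-in iris data and illustrative object names:

library(rpart)

set.seed(42)                                       # make the random split reproducible
n <- nrow(iris)
train_idx <- sample(seq_len(n), size = round(2/3 * n))

train_set <- iris[train_idx, ]                     # used to grow the tree
validation_set <- iris[-train_idx, ]               # used only to check generalization

fit <- rpart(Species ~ ., data = train_set, method = "class")

# accuracy on data the tree has never seen; a large gap to the training
# accuracy is a sign of overfitting
mean(predict(fit, validation_set, type = "class") == validation_set$Species)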
Issues in Decision Tree Learning

Reduced Error Pruning: Pruning, or reduced error pruning, is another method for resolving
the overfitting problem. The simple idea of pruning is to remove subtrees from the tree. The
reduced error pruning algorithm goes through the entire tree and removes nodes, together with
the subtrees below them, whose removal has no negative effect on the accuracy of the decision
tree; each removed subtree is turned into a leaf node labelled with the most common class.
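
A minimal sketch of pruning in R. Note that rpart implements cost-complexity
pruning via prune() rather than reduced error pruning, but it illustrates the
same idea of cutting back subtrees that do not improve accuracy; the data set
(iris) and object names are illustrative:

library(rpart)

# deliberately overgrow a tree so that pruning has something to remove
fit <- rpart(Species ~ ., data = iris, method = "class",
             control = rpart.control(cp = 0, minsplit = 2))

printcp(fit)                                        # cross-validated error per subtree size
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned <- prune(fit, cp = best_cp)                  # keep the subtree with lowest CV error
plot(pruned); text(pruned)                          # inspect the pruned tree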

Rule Post-Pruning
• Rule post-pruning is one of the most successful methods for resolving the overfitting
problem and gives high-accuracy hypotheses. This method prunes the tree and thereby reduces
overfitting. The steps of the rule post-pruning method are as follows (a sketch of viewing a
tree as rules in R follows these steps):
• Infer the decision tree from the training set, growing the tree until the training data is fitted
as well as possible; this allows overfitting to happen.
• Convert the learned tree into an equivalent set of rules by creating one rule for each path
from the root node to a leaf node.
• Prune each rule by removing any precondition whose removal improves the rule's estimated
accuracy.
• Finally, sort the pruned rules by their estimated accuracy and apply them in this order when
classifying subsequent instances.
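
A minimal sketch of the "one rule per root-to-leaf path" view in R, using the
rpart.plot package (an extra package not mentioned in the slides); the fitted
tree and data set are illustrative:

library(rpart)
library(rpart.plot)

fit <- rpart(Species ~ ., data = iris, method = "class")
rpart.rules(fit)   # prints each leaf as an IF ... THEN rule with its predicted class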
Decision Tree in R (1 of 2)

library(party)                            # ctree() comes from the party package
ct <- ctree(speed ~ dist, data = cars)    # model speed as a function of dist (built-in cars data)
plot(ct)                                  # draw the fitted conditional inference tree

Output: [plot of the fitted tree]
Decision Tree in R (2 of 2)

Classify a Case

library(rpart)
library(party)    # the readingSkills data set used below comes from the party package

# new case whose nativeSpeaker label we want to predict
nativeSpeaker_find <- data.frame(age = 11, shoeSize = 30.63692, score = 55.721149)

# fit a classification tree on the readingSkills data
model <- rpart(nativeSpeaker ~ age + shoeSize + score, data = readingSkills)

# classify the new case with the fitted tree
prediction <- predict(model, newdata = nativeSpeaker_find, type = "class")
print(prediction)

Output: [predicted nativeSpeaker class printed by print(prediction)]
