Department of CSE
COURSE NAME: DATA WAREHOUSE AND MINING
COURSE CODE: 22DSB3202
Topic: Classification by Decision Tree Induction
Session – 14
AIM OF THE SESSION
To develop an ability to understand decision trees and the different types of classification.
INSTRUCTIONAL OBJECTIVES
This Session is designed to:
1. Demonstrate basic learning methods
2. Describe best attribute selection
3. Explain the procedure for computing entropy and information gain
4. Describe the concepts of overfitting and tree pruning
LEARNING OUTCOMES
At the end of this session, you should be able to:
1. Define the different methods of classification
2. Describe the algorithm for decision tree induction
3. Summarize the concept of decision tree induction
SESSION INTRODUCTION
Rule Induction:
Rule induction is a data mining process of
deducing if-then rules from a data set. These
symbolic decision rules explain an inherent
relationship between the attributes and class
labels in the data set. Many real-life
experiences are based on intuitive rule
induction. For example, we can proclaim a rule
that states “if it is 8 a.m. on a weekday, then
highway traffic will be heavy” and “if it is 8 p.m.
on a Sunday, then the traffic will be light."
SESSION INTRODUCTION
INTRODUCTION OF DECISION TREE INDUCTION
A decision tree is a supervised learning method used in data mining for classification and regression tasks. It is a tree structure that supports decision making. The decision tree builds classification or regression models in the form of a tree: it separates a data set into smaller and smaller subsets while the tree is incrementally developed. The final tree consists of decision nodes and leaf nodes. A decision node has at least two branches, while a leaf node represents a classification or decision; no further splitting is performed on leaf nodes. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.
SESSION DESCRIPTION
Decision trees are among the most popular and widely used tools for classification and prediction. A decision tree is a
flowchart-like tree structure, where
each internal node denotes a test on
an attribute, each branch represents
an outcome of the test, and each leaf
node (terminal node) holds a class
label.
SESSION DESCRIPTION
Short note on Decision Tree:-
•A decision tree, also known as a prediction tree, uses a tree structure to represent sequences of decisions and their consequences.
•Considering the input X = (X1, X2, …, Xn), the aim is to predict a response or output variable Y.
•Each element in the set (X1, X2, …, Xn) is known as an input variable. The prediction is achieved by building a decision tree consisting of test points and branches.
•At each test point, a particular branch is selected and the tree is traversed downwards.
•Ultimately, a final point is reached, and the prediction can be made.
•In a decision tree, the test points involve testing specific input variables (or attributes), and the branches represent the decisions being made.
•Because of their flexibility and easy visualization, decision trees are commonly deployed in data mining applications for classification.
•In a decision tree, the input values can be categorical or continuous.
•A decision tree establishes a structure of test points (known as nodes) and branches that represents the decisions being made.
•A leaf node is a node with no further branches. Leaf nodes return class labels or, in some cases, probability scores.
•It is possible to convert a decision tree into a set of decision rules.
•There are two types of decision trees: classification trees and regression trees.
SESSION DESCRIPTION
•Classification trees are generally applied to output variables that are categorical and often binary in nature, for example yes or no, sale or no sale, and so on.
•Regression trees, in contrast, are applied to output variables that are numeric or continuous, for example the predicted price of a consumer good.
•Decision trees can be applied in a wide variety of situations. They are easy to represent visually, and the corresponding decision rules are straightforward.
•Also, because the result is a sequence of logical if-then statements, no underlying assumption is made about a linear or nonlinear relationship between the input variables and the response variable.
SESSION DESCRIPTION
An Illustrative Example (1/2)
Day Outlook Temperature Humidity Wind Play Tennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
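To make the example concrete, the following minimal Python sketch (not part of the original slides, and assuming pandas and scikit-learn are available) loads the Play Tennis table above and fits a decision tree; the one-hot encoding step is only a workaround because scikit-learn trees expect numeric inputs.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "Outlook":     ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
                    "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"],
    "Temperature": ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool",
                    "Mild", "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"],
    "Humidity":    ["High", "High", "High", "High", "Normal", "Normal", "Normal",
                    "High", "Normal", "Normal", "Normal", "High", "Normal", "High"],
    "Wind":        ["Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
                    "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"],
    "PlayTennis":  ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                    "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
})

# One-hot encode the categorical attributes, since scikit-learn trees need numeric features.
X = pd.get_dummies(data.drop(columns="PlayTennis"))
y = data["PlayTennis"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))

The printed splits can be compared with the tree built by hand later in this session.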
SESSION DESCRIPTION (Cont..)
Overfitting in Decision Trees
Consider adding the noisy training example
<Sunny, Hot, Normal, Strong, PlayTennis = No>
What effect would it have on the tree learned earlier?
SESSION DESCRIPTION (Cont..)
The problem of overfitting can be reduced by selecting the best attribute for each split. For that we need the concepts of entropy and information gain.
• Information gain is the expected reduction in entropy caused by partitioning the examples on an attribute.
• The higher the information gain, the more effective the attribute is in classifying the training data.
• Expected reduction in entropy knowing A:
Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|Sv| / |S|) · Entropy(Sv)
where Values(A) is the set of possible values of A, and Sv is the subset of S for which A has value v.
SESSION DESCRIPTION (Cont..)
Concept of Entropy
If a point represents a gas molecule, which system has more entropy, and how do we measure it?
A more ordered, more organized system (less probable) has lower entropy; a less ordered, disorganized system (more probable) has higher entropy.
The following figure shows three possibilities for partitioning tuples based on the splitting criterion, each with examples:
• A is discrete-valued
• A is continuous-valued
• A is discrete-valued and a binary tree must be produced
ENTROPY AND INFORMATION THEORY
• Entropy specifies the average length (in bits) of the message needed to transmit the outcome of a random variable; this depends on the probability distribution.
• An optimal-length code assigns −log2 p bits to a message of probability p, so the most probable messages get the shortest codes.
• Example: 8-sided [unbalanced] die
Outcome:      1     2     3     4     5     6     7     8
Probability:  4/16  4/16  2/16  2/16  1/16  1/16  1/16  1/16
Code length:  2 bits 2 bits 3 bits 3 bits 4 bits 4 bits 4 bits 4 bits
E = (1/4 · log2 4) · 2 + (1/8 · log2 8) · 2 + (1/16 · log2 16) · 4 = 1 + 3/4 + 1 = 2.75 bits
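As a quick check of the die example, the short Python sketch below (not in the original slides) computes the expected code length as the sum of p · log2(1/p) over the eight outcomes.

from math import log2

probs = [4/16, 4/16, 2/16, 2/16, 1/16, 1/16, 1/16, 1/16]
expected_bits = sum(p * log2(1 / p) for p in probs)   # entropy = expected optimal code length
print(expected_bits)                                  # 2.75 bits, matching E above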
INFORMATION GAIN AS ENTROPY REDUCTION
• Information gain is the expected reduction in entropy caused by partitioning the examples on an attribute.
• The higher the information gain, the more effective the attribute is in classifying the training data.
• Expected reduction in entropy knowing A:
Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|Sv| / |S|) · Entropy(Sv)
where Values(A) is the set of possible values of A, and Sv is the subset of S for which A has value v.
EXAMPLE: EXPECTED INFORMATION GAIN
• Let
• Values(Wind) = {Weak, Strong}
• S = [9+, 5−]
• SWeak = [6+, 2−]
• SStrong = [3+, 3−]
• Information gain due to knowing Wind:
Gain(S, Wind) = Entropy(S) − 8/14 · Entropy(SWeak) − 6/14 · Entropy(SStrong)
= 0.94 − 8/14 · 0.811 − 6/14 · 1.00
= 0.048
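These numbers can be verified with a short Python sketch (not in the original slides) that computes entropy from positive/negative counts:

from math import log2

def entropy(pos, neg):
    # Entropy of a two-class sample given its positive/negative counts.
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:                      # treat 0 * log2(0) as 0
            p = count / total
            result -= p * log2(p)
    return result

gain_wind = entropy(9, 5) - 8/14 * entropy(6, 2) - 6/14 * entropy(3, 3)
print(round(gain_wind, 3))             # 0.048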
WHICH ATTRIBUTE IS THE BEST CLASSIFIER?
EXAMPLE
FIRST STEP: WHICH ATTRIBUTE TO TEST AT THE ROOT?
• Which attribute should be tested at the root?
• Gain(S, Outlook) = 0.246
• Gain(S, Humidity) = 0.151
• Gain(S, Wind) = 0.048
• Gain(S, Temperature) = 0.029
• Outlook provides the best prediction for the target
• Let's grow the tree:
• add to the tree a successor for each possible value of Outlook
• partition the training samples according to the value of Outlook
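The four root-node gains listed above can be recomputed directly from the Play Tennis table; the pure-Python sketch below (not part of the original slides) does exactly that.

from math import log2
from collections import Counter

# Rows follow the Play Tennis table: (Outlook, Temperature, Humidity, Wind, PlayTennis).
rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),          ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),      ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),       ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "High", "Strong", "No"),
]
attributes = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(labels):
    counts = Counter(labels)
    return -sum(c / len(labels) * log2(c / len(labels)) for c in counts.values())

def gain(rows, col):
    labels = [r[-1] for r in rows]
    remainder = 0.0
    for value in set(r[col] for r in rows):
        subset = [r[-1] for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

for i, name in enumerate(attributes):
    print(name, round(gain(rows, i), 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Wind 0.048 (small rounding differences from the slide values)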
AFTER FIRST STEP
SECOND STEP
Working on Outlook=Sunny node:
Gain(SSunny, Humidity) = 0.970 − 3/5 · 0.0 − 2/5 · 0.0 = 0.970
Gain(SSunny, Wind) = 0.970 − 2/5 · 1.0 − 3/5 · 0.918 = 0.019
Gain(SSunny, Temp.) = 0.970 − 2/5 · 0.0 − 2/5 · 1.0 − 1/5 · 0.0 = 0.570
Humidity provides the best prediction for the target
Let's grow the tree:
add to the tree a successor for each possible value of Humidity
partition the training samples according to the value of Humidity
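These second-step values can be checked with a quick Python sketch (not in the original slides), using the class counts of the five Sunny days read off the Play Tennis table:

from math import log2

def entropy(pos, neg):
    # Entropy of a two-class sample from its positive/negative counts (0 * log2(0) taken as 0).
    return -sum(c / (pos + neg) * log2(c / (pos + neg)) for c in (pos, neg) if c)

e_sunny = entropy(2, 3)                                                # ≈ 0.970
gain_humidity = e_sunny - 3/5 * entropy(0, 3) - 2/5 * entropy(2, 0)    # ≈ 0.970
gain_wind     = e_sunny - 2/5 * entropy(1, 1) - 3/5 * entropy(1, 2)    # ≈ 0.020 (slide: 0.019)
gain_temp     = e_sunny - 2/5 * entropy(0, 2) - 2/5 * entropy(1, 1) - 1/5 * entropy(1, 0)  # ≈ 0.570
print(round(gain_humidity, 3), round(gain_wind, 3), round(gain_temp, 3))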
SECOND AND THIRD STEPS
Calculate Entropy
ACTIVITIES / CASE STUDIES / IMPORTANT FACTS RELATED TO THE SESSION
CASE STUDY
Induction of a decision tree using information gain.
The training set, D, consists of class-labeled tuples randomly selected from the AllElectronics customer database.
The class-label attribute, buys_computer, has two distinct values (namely, {yes, no}); therefore, there are two distinct classes (i.e., m = 2). Let class C1 correspond to yes and class C2 correspond to no.
There are nine tuples of class yes and five tuples of class no.
EXAMPLES (Cont.)
A (root) node N is created for the tuples in D.
To find the splitting criterion for these tuples, we must compute the
information gain of each attribute.
First, we compute the expected information (entropy) needed to classify a tuple in D. With nine tuples of class yes and five of class no, Info(D) = −(9/14) · log2(9/14) − (5/14) · log2(5/14) = 0.940 bits.
Next, we need to compute the expected information requirement for each attribute. Let's start with the attribute age.
The expected information needed to classify a tuple in D if the tuples are partitioned according to age, Info_age(D), is the weighted sum of the entropies of the age partitions.
Hence, the gain in information from such a partitioning is Gain(age) = Info(D) − Info_age(D).
Similarly, we can compute Gain(income) = 0.029 bits, Gain(student) = 0.151 bits, and Gain(credit_rating) = 0.048 bits.
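A compact Python check of these figures (not part of the original slides): the overall class counts (9 yes, 5 no) come from the previous slide, while the per-age counts used below ([2, 3], [4, 0], [3, 2]) are assumed from the AllElectronics example in Han & Kamber (listed in the references).

from math import log2

def info(counts):
    # Expected information (entropy) of a class distribution given as counts.
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

info_D = info([9, 5])                                    # ≈ 0.940 bits
info_age = sum(sum(part) / 14 * info(part)
               for part in ([2, 3], [4, 0], [3, 2]))     # ≈ 0.694 bits (assumed age partition)
print(round(info_D - info_age, 3))                       # Gain(age) ≈ 0.246 bits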
SUMMARY
Because age has the highest information gain
among the attributes, it is selected as the
splitting attribute.
Node N is labeled with age, and branches are
grown for each of the attribute’s values.
The tuples falling into the partition for age = middle_aged all belong to the same class.
Note: The attribute age has the highest
information gain and therefore becomes the
splitting attribute at the root node of the
decision tree. Branches are grown for each
outcome of age. The tuples are shown
partitioned accordingly.
Tree Pruning
Algorithm for Decision Tree Induction
Inductive inference with decision trees
Decision tree learning is one of the most widely used and practical methods of inductive inference.
Features:
• A method for approximating discrete-valued functions (including Boolean functions)
• Learned functions are represented as decision trees (or as if-then-else rules)
• An expressive hypothesis space, including disjunction
• Robust to noisy data
SELF-ASSESSMENT QUESTIONS
1. A _________ is a decision support tool that uses a tree-like graph or model of
decisions and their possible consequences, including chance event outcomes,
resource costs, and utility.
a) Decision tree
b) Graphs
c) Trees
d) Neural Networks
2. Which of the following are decision tree nodes?
a) Decision nodes
b) End nodes
c) Chance nodes
d) All of the mentioned
TERMINAL QUESTIONS
1. Describe decision tree induction
2. List the advantages and disadvantages of decision trees
3. Analyze and illustrate the decision tree algorithm with an example
4. Summarize the concepts of entropy, information gain, and gain ratio
REFERENCES FOR FURTHER LEARNING OF THE
SESSION
Reference Books:
• Han J & Kamber M, “Data Mining: Concepts and Techniques”, Third Edition, Elsevier, 2011.
• https://www.upgrad.com/blog/data-mining-techniques/
• https://www.javatpoint.com/data-mining-techniques
• https://www.datasciencecentral.com/profiles/blogs/the-7-most-important-data-mining-techniques
• https://onix-Classifications.com/blog/8-data-mining-techniques-you-must-learn-to-succeed-in-business
• https://www.infogix.com/top-5-data-mining-techniques/
Sites and Web links:
1. https://www.geeksforgeeks.org/data-mining/
2. https://www.javatpoint.com/data-mining
3. https://www.springboard.com/blog/data-science/data-mining/
4. https://onlinecourses.nptel.ac.in/noc21_cs06/preview
5. https://www.codingninjas.com/codestudio/library/rule-based-classification-in-data-mining
THANK YOU