
Department of CSE
COURSE NAME: DATA WAREHOUSE AND MINING
COURSE CODE: 22DSB3202
Topic: Classification by Decision Tree Induction
Session – 14
AIM OF THE SESSION

To develop an ability to understand decision trees and the different types of classification.

INSTRUCTIONAL OBJECTIVES

This session is designed to:
1. Demonstrate basic learning methods
2. Describe how the best splitting attribute is selected
3. Explain the procedure for computing entropy and information gain
4. Describe the concepts of overfitting and tree pruning

LEARNING OUTCOMES

At the end of this session, you should be able to:


1. Define different methods of classification
2. Describe the algorithm for decision tree induction
3. Summarize the concept of decision tree induction
SESSION INTRODUCTION

Rule Induction:
Rule induction is a data mining process of deriving if-then rules from a data set. These symbolic decision rules explain an inherent relationship between the attributes and the class labels in the data set. Many real-life experiences are based on intuitive rule induction. For example, we can state rules such as "if it is 8 a.m. on a weekday, then highway traffic will be heavy" and "if it is 8 p.m. on a Sunday, then the traffic will be light."
SESSION INTRODUCTION

INTRODUCTION OF DECISION TREE INDUCTION


A decision tree is a supervised learning method used in data mining for both classification and regression. It is a tree structure that supports decision-making. The decision tree builds classification or regression models in the form of a tree: it separates a data set into smaller and smaller subsets while the tree is incrementally developed. The final tree consists of decision nodes and leaf nodes. A decision node has two or more branches, while a leaf node represents a classification or decision; leaf nodes cannot be split further. The topmost decision node, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.
SESSION DESCRIPTION

Decision trees are among the most popular and effective tools for classification and prediction. A decision tree is a flowchart-like tree structure in which each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label.
SESSION DESCRIPTION
Short note on decision trees:
• A decision tree, also known as a prediction tree, uses a tree structure to represent sequences of decisions and their consequences.
• Given an input X = (X1, X2, …, Xn), the aim is to predict a response or output variable Y. Each element of (X1, X2, …, Xn) is called an input variable.
• The prediction is achieved by building a decision tree consisting of test points and branches.
• At each test point, a particular branch is selected and the tree is traversed downward.
• Eventually a final point (leaf) is reached, at which the prediction can be made.
• In a decision tree, every test point tests a specific input variable (attribute), and the possible outcomes of the test are represented by the branches.
• Because of their flexibility and easy visualization, decision trees are widely deployed in data mining applications for classification.
• The input values may be categorical or continuous.
• The decision tree establishes a structure of test points (nodes) and branches that represents the decisions being made.
• A leaf node is a node with no further branches. Leaf nodes return class labels, or in some cases probability scores.
• A decision tree can be converted into an equivalent set of decision rules (a minimal sketch follows this list).
• There are two types of decision trees: classification trees and regression trees.
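To make the tree-to-rules idea concrete, here is a minimal, hypothetical Python sketch (not from the original slides): a small hand-built classification tree for the well-known Play Tennis example used later in this session, written as nested if/else tests, with the equivalent decision rules shown as comments. The attribute names and tree shape are assumptions for illustration.

# A small decision tree for the Play Tennis data, written as nested tests.
# Each root-to-leaf path corresponds to one if-then decision rule.
def play_tennis(outlook: str, humidity: str, wind: str) -> str:
    if outlook == "Sunny":
        return "No" if humidity == "High" else "Yes"
    elif outlook == "Overcast":
        return "Yes"
    else:  # outlook == "Rain"
        return "No" if wind == "Strong" else "Yes"

# The same tree as a set of decision rules:
#   IF Outlook = Sunny    AND Humidity = High   THEN PlayTennis = No
#   IF Outlook = Sunny    AND Humidity = Normal THEN PlayTennis = Yes
#   IF Outlook = Overcast                       THEN PlayTennis = Yes
#   IF Outlook = Rain     AND Wind = Strong     THEN PlayTennis = No
#   IF Outlook = Rain     AND Wind = Weak       THEN PlayTennis = Yes

print(play_tennis("Sunny", "High", "Weak"))   # "No" — agrees with example D1 in the Play Tennis table below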
SESSION DESCRIPTION

• Classification trees are generally applied to output variables that are categorical, and often binary, in nature, for example yes or no, sale or no sale, and so on.
• Regression trees, in contrast, are applied to output variables that are numeric or continuous, for example the predicted price of a consumer good.
• Decision trees can be applied in a wide variety of situations. They are easy to represent visually, and correspondingly straightforward to explain.
• Also, because the result is a sequence of logical if-then statements, there is no underlying assumption of a linear or nonlinear relationship between the input variables and the response variable.
SESSION DESCRIPTION

An Illustrative Example (1/2)


Day Outlook Temperature Humidity Wind Play Tennis

D1 Sunny Hot High Weak No


D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
SESSION DESCRIPTION (Cont..)

Overfitting in Decision Trees

Consider adding a noisy training example:

<Sunny, Hot, Normal, Strong, PlayTennis = No>
What effect does this have on the earlier tree?
SESSION DESCRIPTION (Cont..)

The problem of overfitting can be addressed by selecting the best attribute to split on.
• For that we need the concepts of entropy and information gain.
• Information gain is the expected reduction in entropy caused by partitioning the examples on an attribute.
• The higher the information gain, the more effective the attribute is in classifying the training data.
• Expected reduction in entropy from knowing A:

Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|Sv| / |S|) · Entropy(Sv)

where Values(A) is the set of possible values for A, and Sv is the subset of S for which A has value v.
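As a concrete illustration of this formula, here is a minimal Python sketch (not from the original slides; the function names entropy and information_gain are chosen for illustration) that computes entropy and information gain for a labelled sample:

import math
from collections import Counter

def entropy(labels):
    # Entropy (in bits) of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    # Gain(S, A) = Entropy(S) - sum over v of (|Sv|/|S|) * Entropy(Sv)
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    expected = sum(len(sv) / n * entropy(sv) for sv in partitions.values())
    return entropy(labels) - expected

# Sanity checks:
print(round(entropy(["Yes"] * 9 + ["No"] * 5), 3))   # 0.94 for the S = [9+, 5-] sample used later
rows   = [["Sunny"], ["Sunny"], ["Overcast"], ["Rain"]]
labels = ["No", "No", "Yes", "Yes"]
print(information_gain(rows, labels, 0))              # 1.0: this toy attribute separates the classes perfectly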


SESSION DESCRIPTION (Cont..)

Concept of Entropy
If each point represents a gas molecule, which system has more entropy, and how do we measure it?
[Figure: two systems of gas molecules. The more ordered, organized arrangement is less probable and has lower entropy; the less ordered, disordered arrangement is more probable and has higher entropy.]

The following figure shows three possibilities for partitioning tuples based on the splitting criterion, each with examples:
• A is discrete-valued
• A is continuous-valued
• A is discrete-valued and a binary tree must be produced

ENTROPY AND INFORMATION THEORY

• Entropy specifies the average length (in bits) of the message needed to transmit the outcome of a random variable. This depends on the probability distribution.

• An optimal-length code assigns −log2 p bits to a message with probability p, so the most probable messages get the shortest codes.
• Example: 8-sided (unbalanced) die

  Face:         1      2      3      4      5      6      7      8
  Probability:  4/16   4/16   2/16   2/16   1/16   1/16   1/16   1/16
  Code length:  2 bits 2 bits 3 bits 3 bits 4 bits 4 bits 4 bits 4 bits

  E = (1/4 · log2 4) × 2 + (1/8 · log2 8) × 2 + (1/16 · log2 16) × 4 = 1 + 3/4 + 1 = 2.75 bits
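A quick check of this arithmetic in Python (an illustrative sketch, not part of the original slides): the expected code length equals the entropy of the distribution.

import math

# Probabilities of the 8 faces of the unbalanced die.
probs = [4/16, 4/16, 2/16, 2/16, 1/16, 1/16, 1/16, 1/16]

# The optimal code length for an outcome with probability p is -log2(p) bits.
expected_length = sum(p * -math.log2(p) for p in probs)
print(expected_length)   # 2.75 bits, the entropy of the distribution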


INFORMATION GAIN AS ENTROPY REDUCTION

• Information gain is the expected reduction in entropy caused by partitioning the examples on an attribute.
• The higher the information gain, the more effective the attribute is in classifying the training data.
• Expected reduction in entropy from knowing A:

Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|Sv| / |S|) · Entropy(Sv)

where Values(A) is the set of possible values for A, and Sv is the subset of S for which A has value v.


EXAMPLE: EXPECTED INFORMATION GAIN

• Let
  • Values(Wind) = {Weak, Strong}
  • S = [9+, 5−]
  • SWeak = [6+, 2−]
  • SStrong = [3+, 3−]
• Information gain due to knowing Wind:
  Gain(S, Wind) = Entropy(S) − (8/14) · Entropy(SWeak) − (6/14) · Entropy(SStrong)
                = 0.94 − (8/14) × 0.811 − (6/14) × 1.00
                = 0.048
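The same arithmetic, checked in Python directly from the class counts (an illustrative sketch; the helper entropy2 is hypothetical, not from the slides):

import math

def entropy2(pos, neg):
    # Entropy (in bits) of a two-class sample given its positive/negative counts.
    total = pos + neg
    return -sum(c / total * math.log2(c / total) for c in (pos, neg) if c)

gain_wind = entropy2(9, 5) - (8 / 14) * entropy2(6, 2) - (6 / 14) * entropy2(3, 3)
print(round(gain_wind, 3))   # 0.048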


WHICH ATTRIBUTE IS THE BEST CLASSIFIER?

EXAMPLE


FIRST STEP: WHICH ATTRIBUTE TO TEST AT THE ROOT?

• Which attribute should be tested at the root?

• Gain(S, Outlook) = 0.246
• Gain(S, Humidity) = 0.151
• Gain(S, Wind) = 0.048
• Gain(S, Temperature) = 0.029
• Outlook provides the best prediction for the target (a sketch recomputing these gains follows this list).
• Let's grow the tree:
  • add to the tree a successor for each possible value of Outlook
  • partition the training samples according to the value of Outlook
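To connect these numbers back to the Play Tennis table, here is an illustrative Python sketch (not from the original slides) that recomputes the information gain of every attribute from the 14 training examples and confirms that Outlook scores highest.

import math
from collections import Counter

# Play Tennis training data: (Outlook, Temperature, Humidity, Wind, PlayTennis), rows D1–D14.
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),          ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),      ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),       ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(labels):
    # Entropy (in bits) of a list of class labels.
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(data, attr_index):
    # Information gain of the attribute at attr_index, with the class label in the last column.
    labels = [row[-1] for row in data]
    subsets = {}
    for row in data:
        subsets.setdefault(row[attr_index], []).append(row[-1])
    expected = sum(len(s) / len(data) * entropy(s) for s in subsets.values())
    return entropy(labels) - expected

for i, name in enumerate(ATTRS):
    print(f"Gain(S, {name}) = {gain(DATA, i):.3f}")
# Prints Outlook 0.246, Temperature 0.029, Humidity 0.151, Wind 0.048 — Outlook wins.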


AFTER FIRST STEP


SECOND STEP

• Working on the Outlook = Sunny node:

  Gain(SSunny, Humidity) = 0.970 − (3/5) · 0.0 − (2/5) · 0.0 = 0.970
  Gain(SSunny, Wind) = 0.970 − (2/5) · 1.0 − (3/5) · 0.918 = 0.019
  Gain(SSunny, Temp.) = 0.970 − (2/5) · 0.0 − (2/5) · 1.0 − (1/5) · 0.0 = 0.570

• Humidity provides the best prediction for the target (a sketch rechecking these gains follows this list).

• Let's grow the tree:
  • add to the tree a successor for each possible value of Humidity
  • partition the training samples according to the value of Humidity
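A brief Python check of these three gains from the class counts of the five Sunny-day examples (illustrative sketch; entropy2 is a hypothetical helper, and the printed values match the slide up to rounding):

import math

def entropy2(pos, neg):
    # Entropy (in bits) of a two-class sample given its positive/negative counts.
    total = pos + neg
    return -sum(c / total * math.log2(c / total) for c in (pos, neg) if c)

e_sunny = entropy2(2, 3)   # SSunny = [2+, 3-], about 0.97
print(round(e_sunny - (3/5) * entropy2(0, 3) - (2/5) * entropy2(2, 0), 2))                           # Humidity: 0.97
print(round(e_sunny - (2/5) * entropy2(1, 1) - (3/5) * entropy2(1, 2), 2))                           # Wind: 0.02
print(round(e_sunny - (2/5) * entropy2(0, 2) - (2/5) * entropy2(1, 1) - (1/5) * entropy2(1, 0), 2))  # Temperature: 0.57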

SECOND AND THIRD STEPS
Calculate Entropy
ACTIVITIES/ CASE STUDY/ IMPORTANT FACTS RELATED TO THE SESSION
CASE STUDY
Induction of a decision tree using information gain.
The training set, D, consists of class-labeled tuples randomly selected from the AllElectronics customer database.
The class label attribute, buys_computer, has two distinct values (namely, {yes, no}); therefore, there are two distinct classes (i.e., m = 2). Let class C1 correspond to yes and class C2 correspond to no.
There are nine tuples of class yes and five tuples of class no.
EXAMPLES (Cont.)
A (root) node N is created for the tuples in D.
To find the splitting criterion for these tuples, we must compute the information gain of each attribute.

First, we compute the expected information (entropy) needed to classify a tuple in D.

Next, we compute the expected information requirement for each attribute. Let's start with the attribute age: we need the expected information required to classify a tuple in D if the tuples are partitioned according to age.

Hence, the gain in information from such a partitioning can then be computed (the formulas are sketched below).
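The worked equations on the original slide were images and are missing from this text, so here is a hedged reconstruction in LaTeX of the standard formulas (as in Han & Kamber, cited in the references). The numeric value of Info(D) follows from the nine yes / five no split stated above; the age-specific terms depend on the data table omitted here.

\begin{align*}
\mathrm{Info}(D) &= -\sum_{i=1}^{m} p_i \log_2 p_i
  = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940 \text{ bits}\\
% Expected information still required after partitioning D on age;
% the subsets D_j come from the data table omitted here.
\mathrm{Info}_{\mathrm{age}}(D) &= \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j)\\
\mathrm{Gain}(\mathrm{age}) &= \mathrm{Info}(D) - \mathrm{Info}_{\mathrm{age}}(D)
\end{align*}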

Similarly, we can compute Gain(income) = 0.029 bits, Gain(student) = 0.151 bits, and Gain(credit_rating) = 0.048 bits.
SUMMARY

Because age has the highest information gain


among the attributes, it is selected as the
splitting attribute.
Node N is labeled with age, and branches are
grown for each of the attribute’s values.
The tuples falling into the partition for age = middle_aged all belong to the same class.
Note: The attribute age has the highest
information gain and therefore becomes the
splitting attribute at the root node of the
decision tree. Branches are grown for each
outcome of age. The tuples are shown
partitioned accordingly.
Tree Pruning
Algorithm for Decision Tree Induction
Inductive inference with decision trees

Decision trees are one of the most widely used and practical methods of inductive inference.
Features:
• A method for approximating discrete-valued functions (including Boolean functions)
• Learned functions are represented as decision trees (or if-then-else rules)
• An expressive hypothesis space, including disjunction
• Robust to noisy data
SELF-ASSESSMENT QUESTIONS

1. A _________ is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
a) Decision tree
b) Graphs
c) Trees
d) Neural networks

2. Which of the following are decision tree nodes?
a) Decision nodes
b) End nodes
c) Chance nodes
d) All of the mentioned
TERMINAL QUESTIONS

1. Describe decision tree induction.

2. List the advantages and disadvantages of decision trees.

3. Analyze and illustrate the decision tree algorithm with an example.

4. Summarize the concepts of entropy, information gain, and gain ratio.


REFERENCES FOR FURTHER LEARNING OF THE SESSION
Reference Books:
• Han J & Kamber M, “Data Mining: Concepts and Techniques”, Third Edition, Elsevier, 2011.
• https://www.upgrad.com/blog/data-mining-techniques/
• https://www.javatpoint.com/data-mining-techniques
• https://www.datasciencecentral.com/profiles/blogs/the-7-most-important-data-mining-techniques
• https://onix-Classifications.com/blog/8-data-mining-techniques-you-must-learn-to-succeed-in-business
• https://www.infogix.com/top-5-data-mining-techniques/
Sites and Web links:
1. https://www.geeksforgeeks.org/data-mining/
2. https://www.javatpoint.com/data-mining
3. https://www.springboard.com/blog/data-science/data-mining/
4. https://onlinecourses.nptel.ac.in/noc21_cs06/preview
5. https://www.codingninjas.com/codestudio/library/rule-based-classification-in-data-mining
THANK YOU

Team – Course Name
