Data Mining Practical 8
(VI SEMESTER)
Date:
Student Name:
Student Enrollment No:
EXPERIMENT NO: 8
THEORY:
Imagine you only ever do four things at the weekend: go shopping, watch a movie, play tennis or just stay
in. What you do depends on three things: the weather (windy, rainy or sunny), how much money you
have (rich or poor), and whether your parents are visiting. You say to yourself: if my parents are
visiting, we'll go to the cinema. If they're not visiting and it's sunny, then I'll play tennis, but if it's windy
and I'm rich, then I'll go shopping. If they're not visiting, it's windy and I'm poor, then I will go to the
cinema. If they're not visiting and it's rainy, then I'll stay in.
To remember all this, you draw a flowchart which will enable you to read off your decision. We call such
diagrams decision trees. A suitable decision tree for the weekend decision choices would be as follows:
Figure 1: A decision tree for the weekend decision choices.
We can see why such diagrams are called trees, because, while they are admittedly upside down, they
start from a root and have branches leading to leaves (the tips of the graph at the bottom). Note that the
leaves are always decisions, and a particular decision might be at the end of multiple branches (for
example, we could choose to go to the cinema for two different reasons).
Armed with our decision tree, on Saturday morning, when we wake up, all we need to do is check (a) the
weather, (b) how much money we have, and (c) whether our parents' car is parked in the drive. The
decision tree will then enable us to make our decision. Suppose, for example, that the parents haven't
turned up and the sun is shining. Then this path through our decision tree will tell us what to do:
Figure 2: The path through the decision tree when the parents are not visiting and it is sunny.
and hence we run off to play tennis because our decision tree told us to. Note that the decision tree covers
all eventualities. That is, there are no values that the weather, the parents turning up or the money
situation could take which aren't catered for in the decision tree. Note that, in this practical, we will be
looking at how to automatically generate decision trees from examples, not at how to turn thought
processes into decision trees.
There is a link between decision tree representations and logical representations, which can be exploited
to make it easier to understand (read) learned decision trees. If we think about it, every decision tree is
actually a disjunction of implications (if ... then statements), and the implications are Horn clauses: a
conjunction of literals implying a single literal. In the above tree, we can see this by reading from the root
node to each leaf node:
If the parents are not visiting and it is windy and you're poor, then go to the cinema
Or
If the parents are not visiting and it is rainy, then stay in.
Of course, this is just a re-statement of the original mental decision making process we described.
Remember, however, that we will be programming an agent to learn decision trees from examples, so this
kind of situation will not occur, as we will start with only example situations. It will therefore be important
for us to be able to read the decision tree the agent suggests.
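As a quick illustration, the weekend tree can be written out directly as if ... then rules in code. The sketch below is only illustrative; the attribute names and values are taken from the example above.

# The weekend decision tree hand-written as if/then rules (Python, illustrative only).
# weather is one of "sunny", "windy", "rainy"; money is "rich" or "poor".
def weekend_decision(parents_visiting, weather, money):
    if parents_visiting:
        return "cinema"
    if weather == "sunny":
        return "play tennis"
    if weather == "windy":
        return "shopping" if money == "rich" else "cinema"
    if weather == "rainy":
        return "stay in"
    raise ValueError("unexpected attribute value")

# Saturday morning: the parents have not turned up and the sun is shining.
print(weekend_decision(False, "sunny", "poor"))   # -> play tennis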
ID3 Algorithm
Start from the root node of the decision tree, testing the attribute specified by this node, then moving down
the tree branch according to the attribute's value in the given example. This process is then repeated at the
sub-tree level.
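As a minimal sketch, assuming a learned tree is stored as nested dictionaries (an internal node maps the tested attribute to a dictionary of value-to-subtree branches, and a leaf is simply a class label), the walk looks like this; the function name and the sample tree are illustrative only.

def classify(tree, instance):
    # Walk the tree from the root, following the branch that matches the
    # instance's value for the attribute tested at the current node.
    while isinstance(tree, dict):
        attribute = next(iter(tree))                  # attribute tested at this node
        tree = tree[attribute][instance[attribute]]   # follow the matching branch
    return tree                                       # reached a leaf: the decision

# Illustrative tree; it anticipates the play-baseball example developed later.
example_tree = {"Outlook": {"overcast": "yes",
                            "sunny": {"Humidity": {"high": "no", "normal": "yes"}},
                            "rain": {"Wind": {"weak": "yes", "strong": "no"}}}}
print(classify(example_tree, {"Outlook": "sunny", "Humidity": "normal"}))   # -> yes

ID3 is best suited to problems with the following characteristics: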
1. Instances are represented as attribute-value pairs. For example, the attribute 'Temperature' may take
the values 'hot', 'mild' or 'cool'. Extending attribute-value pairs to continuous-valued (numeric)
data is also of interest.
2. The target function has discrete output values. ID3 easily deals with instances that are assigned
a Boolean decision, such as 'true' and 'false', or 'p (positive)' and 'n (negative)'. Although it is possible
to extend the target to real-valued outputs, that issue is not covered in this experiment.
3. The training data may contain errors. These can be dealt with using pruning techniques, which we will
not cover here.
Three widely used decision tree learning algorithms are ID3, ASSISTANT and C4.5. We will cover ID3
in this experiment.
Decision tree learning is attractive for three reasons (Paul Utgoff & Carla Brodley, 1990):
1. A decision tree is a good generalization for unobserved instances, provided the instances are described
in terms of features that are correlated with the target concept.
2. The methods are computationally efficient, with cost proportional to the number of observed training
instances.
3. The resulting decision tree provides a representation of the concept that appeals to humans, because
it renders the classification process self-evident.
ID3 is a non-incremental algorithm, meaning it derives its classes from a fixed set of training instances. An
incremental algorithm, by contrast, revises the current concept definition, if necessary, as new samples arrive.
The classes created by ID3 are inductive: given a small set of training instances, the specific classes created by
ID3 are expected to work for all future instances. The distribution of the unknowns must be the same as
that of the test cases. Inductive classes cannot be proven to work in every case, since they may have to classify
an infinite number of instances. Note that ID3 (or any inductive algorithm) may misclassify data.
Data Description
The sample data used by ID3 has certain requirements, which are:
Attribute-value description - the same attributes must describe each example and have a fixed
number of values.
Predefined classes - an example's class must already be defined in advance, that is, the classes are not learned
by ID3.
Discrete classes - classes must be sharply delineated. Continuous classes broken up into vague
categories such as a metal being "hard, quite hard, flexible, soft, quite soft" are suspect.
Sufficient examples - since inductive generalization is used (i.e. the result is not provable), there must be
enough examples to distinguish valid patterns from chance occurrences.
Attribute Selection
How does ID3 decide which attribute is the best? A statistical property, called information gain, is
used. Gain measures how well a given attribute separates training examples into targeted classes.
The attribute with the highest information gain (i.e. the one most useful for classification) is
selected. In order to define gain, we first borrow an idea from information theory, called entropy.
Entropy measures the amount of information in an attribute.
Given a sample set S containing examples from c classes:
Entropy(S) = sum over i = 1 to c of -pi * log2(pi)
Where:
pi is the proportion of S belonging to class i.
The sum is over the c classes.
log2 is log base 2.
Note that S is not an attribute but the entire sample set.
Example 1
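For instance, taking a collection S of 14 examples with 9 YES and 5 NO labels (the same split used in Example 2 below), the definition gives:
Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940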
Notice entropy is 0 if all members of S belong to the same class (the data is perfectly classified).
The range of entropy is 0 ("perfectly classified") to 1 ("totally random").
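The same calculation is easy to carry out in code. The helper below is only a sketch; the function name and interface are assumptions, not part of the handout.

import math
from collections import Counter

def entropy(labels):
    # Entropy(S) = sum over the classes of -p_i * log2(p_i).
    total = len(labels)
    return sum(-(count / total) * math.log2(count / total)
               for count in Counter(labels).values())

print(entropy(["yes"] * 14))                         # 0.0  (perfectly classified)
print(entropy(["yes"] * 7 + ["no"] * 7))             # 1.0  (totally random, two classes)
print(round(entropy(["yes"] * 9 + ["no"] * 5), 3))   # 0.94 (the 14-example set of Example 2)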
Gain(S, A), the information gain of an example set S on an attribute A, is defined as:
Gain(S, A) = Entropy(S) - sum over each value v of A of (|Sv| / |S|) * Entropy(Sv)
Where:
v ranges over the possible values of attribute A.
Sv is the subset of S for which attribute A has value v.
|Sv| and |S| are the number of elements in Sv and S respectively.
Example 2
Suppose S is a set of 14 examples in which one of the attributes is wind speed. The values of
Wind can be Weak or Strong. Of these 14 examples, 9 are classified YES and 5 are classified NO. For
attribute Wind, suppose there are 8 occurrences of Wind = Weak and 6 occurrences of Wind =
Strong. For Wind = Weak, 6 of the examples are YES and 2 are NO. For Wind = Strong, 3 are
YES and 3 are NO. Therefore:
Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
Entropy(Sweak) = -(6/8) log2(6/8) - (2/8) log2(2/8) = 0.811
Entropy(Sstrong) = -(3/6) log2(3/6) - (3/6) log2(3/6) = 1.000
Gain(S, Wind) = Entropy(S) - (8/14) Entropy(Sweak) - (6/14) Entropy(Sstrong)
             = 0.940 - (8/14)(0.811) - (6/14)(1.000) = 0.048
For each attribute, the gain is calculated and the highest gain is used in the decision node.
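Continuing the sketch, information gain can be computed on top of the entropy helper above; representing examples as dictionaries is an assumption made only for illustration.

def information_gain(examples, attribute, target="class"):
    # Gain(S, A) = Entropy(S) - sum over each value v of A of (|Sv| / |S|) * Entropy(Sv).
    total_entropy = entropy([e[target] for e in examples])
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return total_entropy - remainder

# The 14 examples of Example 2: 8 Weak (6 YES / 2 NO) and 6 Strong (3 YES / 3 NO).
wind_examples = ([{"Wind": "weak", "class": "yes"}] * 6 +
                 [{"Wind": "weak", "class": "no"}] * 2 +
                 [{"Wind": "strong", "class": "yes"}] * 3 +
                 [{"Wind": "strong", "class": "no"}] * 3)
print(round(information_gain(wind_examples, "Wind"), 3))   # -> 0.048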
Suppose we want ID3 to decide whether the weather is amenable to playing baseball. Over the course of 2
weeks, data is collected to help ID3 build a decision tree (see table 1).
The target classification is "should we play baseball?" which can be yes or no.
The weather attributes are outlook, temperature, humidity, and wind speed. They can have the following
values:
Table 1: Weather data collected over two weeks.
We need to find which attribute will be the root node in our decision tree. The gain is calculated for all
four attributes:
Outlook attribute has the highest gain, therefore it is used as the decision attribute in the root node.
Since Outlook has three possible values, the root node has three branches (sunny, overcast, rain). The
next question is "what attribute should be tested at the Sunny branch node?" Since we've used Outlook
at the root, we only need to decide on the remaining three attributes: Humidity, Temperature, or Wind.
Ssunny = {D1, D2, D8, D9, D11} = 5 examples from table 1 with outlook = sunny
Gain(Ssunny, Humidity) = 0.970
Gain(Ssunny, Temperature) = 0.570
Gain(Ssunny, Wind) = 0.019
Humidity has the highest gain; therefore, it is used as the decision node. This process goes on until all
data is classified perfectly or we run out of attributes.
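Putting the pieces together, the loop described above can be sketched as a recursive function that builds the nested-dictionary trees used by classify() earlier. This is an illustrative outline built on the entropy and information_gain helpers, not a complete implementation (it does no pruning and assumes every attribute value seen at a node appears in its examples).

from collections import Counter

def id3(examples, attributes, target="class"):
    # Build a decision tree as nested dictionaries.
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:            # all data classified perfectly
        return labels[0]
    if not attributes:                   # ran out of attributes: return the majority class
        return Counter(labels).most_common(1)[0][0]
    # Choose the attribute with the highest information gain as this decision node.
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    tree = {best: {}}
    remaining = [a for a in attributes if a != best]
    for value in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == value]
        tree[best][value] = id3(subset, remaining, target)
    return tree

Called on the 14 weather examples with attributes Outlook, Temperature, Humidity and Wind, such a function would select Outlook at the root and Humidity under the Sunny branch, exactly as computed above.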
ID3 has been incorporated in a number of commercial rule-induction packages. Some specific
applications include medical diagnosis, credit risk assessment of loan applications, classification of
equipment malfunctions by their cause, classification of soybean diseases, and web search classification.
EXERCISE:
Consider the customer database described below where an application for a credit card is either approved
or rejected. Construct a decision tree (with Approved as the decision variable) using the entropy measure.
EVALUATION:
Observation & Implementation: 4
Timely completion: 2
Viva: 4
Total: 10