Unit-3
Decision Tree Learning
Prepared By: Deepti Singh
Decision Tree
• It is a popular supervised learning algorithm used for
classification and regression tasks.
• It is used to model and predict outcomes based on input
data.
• It is a tree-like structure where
- each internal node tests an attribute,
- each branch corresponds to an attribute value,
- and each leaf node represents a final decision or prediction.
Cont…
• Working:
1. Splitting: The dataset splits into subsets based on
feature values.
2. Decision Rule: At each node, a decision rule is
applied to split the data.
3. Predictions: The process continues until leaf nodes
are reached, which provide the final predictions.
* Decision trees are easy to interpret and visualize, making them
valuable tools for various applications.
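To make the splitting/prediction workflow above concrete, here is a minimal sketch using scikit-learn; the tiny feature matrix and class labels are invented purely for illustration.

```python
# A minimal sketch of the decision-tree workflow above, using scikit-learn.
# The toy dataset (features and labels) is hypothetical, for illustration only.
from sklearn.tree import DecisionTreeClassifier, export_text

# Each row: [outlook (0=sunny, 1=overcast, 2=rainy), windy (0/1)]
X = [[0, 0], [0, 1], [1, 0], [2, 0], [2, 1], [1, 1]]
y = [0, 0, 1, 1, 0, 1]           # 0 = "don't play", 1 = "play"

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3)
tree.fit(X, y)                   # splitting: decision rules are chosen at each node

print(export_text(tree, feature_names=["outlook", "windy"]))
print(tree.predict([[1, 0]]))    # prediction: follow the branches to a leaf
```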
Inductive Bias
• Inductive bias refers to the set of assumptions which
a learning algorithm makes to generalize from
specific training data to unseen data.
• It guides the learning process and helps the
algorithm to make predictions.
Decision Tree Learning Algorithm
• It performs a greedy, top-down search through the
space of possible decision trees.
• Two basic algorithms are :
- Iterative Dichotomiser 3 (ID3) algorithm
- C4.5 algorithm
Cont…
1. Attribute Selection Measures:
• It is a heuristic for selecting the splitting criterion that best
separates a given dataset of class-labelled training tuples into
individual classes.
• It is also known as a splitting rule, because it determines how the
tuples at a given node are to be split.
• The attribute having the best score for the measure is chosen as the
splitting attribute for the given tuples.
• The three popular attribute selection measures are:-
- Information Gain
- Gain Ratio
- Gini Index
Attribute Selection Measures
1.1 Information Gain
• In decision tree learning, the main selection measure that is used is
information gain.
• The algorithm always tries to maximize information gain, which measures
how well a given attribute separates the training examples according to their
target classification.
• When the DT is constructed, the attribute with the highest information gain is
tested first.
• This attribute becomes the root node, and the split is made based on the values of the
root node.
• Entropy is the quantity that controls the split in the data.
• It measures the homogeneity (impurity) of a set of examples.
Cont…
• The formula is:
  Entropy(S) = − Σ_i p_i log2(p_i)
• p_i = proportion of instances in S that belong to class i.
• The entropy is 0 if all members of S belong to the same class.
• It is 1 if there are equal numbers of positive and negative examples.
• Entropy lies between 0 and 1 if there are unequal numbers of positive and
negative examples.
• Information gain quantifies the reduction in entropy after splitting the
dataset on a feature:
  Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)
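As a concrete illustration, here is a small, self-contained sketch (plain Python; the toy labels and split are hypothetical) that computes entropy and the information gain of one candidate split:

```python
# Minimal sketch: entropy and information gain for a candidate split.
# The toy labels and the split below are invented for illustration only.
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum(p_i * log2(p_i)) over the classes present in S."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(labels, partitions):
    """Gain = Entropy(S) - weighted sum of entropies of the partitions of S."""
    total = len(labels)
    remainder = sum((len(p) / total) * entropy(p) for p in partitions)
    return entropy(labels) - remainder

labels = ["yes", "yes", "no", "no", "yes", "no"]      # whole node S
split  = [["yes", "yes", "yes"], ["no", "no", "no"]]  # S split on some attribute
print(entropy(labels))                                # 1.0 (balanced classes)
print(information_gain(labels, split))                # 1.0 (perfect split)
```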
Attribute Selection Measures
1.2 Gain Ratio
• Information gain is biased towards tests with many outcomes.
• It prefers to select attributes having a large number of values.
• Ex: Consider an attribute that acts as a unique identifier, such as a
product id.
- A split on product id would result in a large number of partitions, each one
containing just one tuple.
- The information gain obtained by partitioning on this attribute is
maximal, yet such a partitioning is clearly useless for classification.
• The C4.5 algorithm, a successor of ID3, uses an extension of information gain
known as gain ratio, which attempts to overcome this bias.
Cont…
• It applies a kind of normalization to information gain using split
information:
  SplitInfo_A(D) = − Σ_{j=1..v} (|D_j| / |D|) · log2(|D_j| / |D|)
• Gain Ratio:
  GainRatio(A) = Gain(A) / SplitInfo_A(D)
• The attribute with the maximum gain ratio is selected as the splitting
attribute.
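Continuing the sketch above, a hedged illustration of split information and gain ratio (reusing the hypothetical `entropy`/`information_gain` helpers defined earlier):

```python
# Sketch of split information and gain ratio, reusing the hypothetical
# entropy()/information_gain() helpers from the previous example.
from math import log2

def split_info(partitions):
    """SplitInfo = -sum((|Dj|/|D|) * log2(|Dj|/|D|)) over the partitions."""
    total = sum(len(p) for p in partitions)
    return -sum((len(p) / total) * log2(len(p) / total) for p in partitions)

def gain_ratio(labels, partitions):
    """GainRatio = information gain normalized by split information."""
    si = split_info(partitions)
    return information_gain(labels, partitions) / si if si > 0 else 0.0

labels = ["yes", "yes", "no", "no", "yes", "no"]
split  = [["yes", "yes", "yes"], ["no", "no", "no"]]
print(gain_ratio(labels, split))   # 1.0 / 1.0 = 1.0 for this balanced split
```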
Attribute Selection Measures
1.3 Gini Index
• The Gini Index is used in Classification and Regression Trees (CART). This
index measures the impurity of a set of training tuples. Mathematically,
  Gini(D) = 1 − Σ_i p_i²
where p_i = the probability that a tuple in D belongs to class C_i.
• The Gini index considers a binary split for each attribute. Consider the case
where A is a discrete-valued attribute having v distinct values,
{a1, a2, …, av}, occurring in a training set.
• To determine the best binary split on A, we examine all the possible
subsets that can be formed using known values of A.
Cont…
• Ex: if income has three possible values namely: {low, medium , high}, then
the possible sets are:
- {low, medium, high}, {low, medium}, {low, high}, {medium, high}, {low},
{medium}, {high}, {}.
• We exclude the full set and the empty set, because they do not represent a
binary split. Therefore, there are (2^v − 2) possible ways to form two partitions of
the data based on a binary split on A.
• When considering a binary split, we compute a weighted sum of the
impurity of each resulting partition. Ex: if a binary split on A partitions D into
D1 and D2, the Gini index of D given that partitioning is:
  Gini_A(D) = (|D1| / |D|) · Gini(D1) + (|D2| / |D|) · Gini(D2)
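A minimal sketch of the Gini computation for a binary split (the toy class labels are hypothetical):

```python
# Sketch: Gini index of a node and the weighted Gini of a binary split.
# The toy class labels are invented for illustration only.
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum(p_i^2) over the classes present in D."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def gini_split(d1, d2):
    """Weighted Gini of a binary split of D into D1 and D2."""
    total = len(d1) + len(d2)
    return (len(d1) / total) * gini(d1) + (len(d2) / total) * gini(d2)

d = ["yes", "yes", "no", "no", "yes", "no"]
d1, d2 = ["yes", "yes", "yes"], ["no", "no", "no"]
print(gini(d))            # 0.5 for a balanced two-class node
print(gini_split(d1, d2)) # 0.0 for a pure binary split
```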
ID3 Algorithm
Steps to Create a Decision Tree using the ID3
Algorithm:
• Step 1: Data Preprocessing:
Clean and preprocess the data. Handle missing values and convert categorical variables into
numerical representations if needed.
• Step 2: Selecting the Root Node:
Calculate the entropy of the target variable (class labels) based on the dataset. The formula
for entropy is:
  Entropy(S) = − Σ_i p_i log2(p_i)
• Step 3: Calculating Information Gain:
For each attribute in the dataset, calculate the information gain when the dataset is split on
that attribute. The formula for information gain is:
  Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)
Cont…
• Step 4: Selecting the Best Attribute:
Choose the attribute with the highest information gain as the decision node for the tree.
• Step 5: Splitting the Dataset:
Split the dataset based on the values of the selected attribute.
• Step 6: Repeat the Process:
Recursively repeat steps 2 to 5 for each subset until a stopping criterion is met (e.g., the tree
depth reaches a maximum limit or all instances in a subset belong to the same class).
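Putting Steps 2–6 together, the following is a hedged sketch of a recursive ID3 learner over categorical attributes; the helper names and the tiny dataset at the bottom are my own hypothetical illustrations, not the lecture's exact example.

```python
# A compact sketch of ID3 over categorical attributes (Steps 2-6 above).
# Helper names and the tiny example dataset are hypothetical illustrations.
from collections import Counter
from math import log2

def entropy(rows, target):
    counts = Counter(r[target] for r in rows)
    total = len(rows)
    return -sum((n / total) * log2(n / total) for n in counts.values())

def information_gain(rows, attr, target):
    total = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += (len(subset) / total) * entropy(subset, target)
    return entropy(rows, target) - remainder

def id3(rows, attributes, target):
    classes = [r[target] for r in rows]
    # Stopping criteria: pure node, or no attributes left -> majority-class leaf.
    if len(set(classes)) == 1 or not attributes:
        return Counter(classes).most_common(1)[0][0]
    # Step 4: pick the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    tree = {best: {}}
    # Step 5: split the dataset on the chosen attribute, then recurse (Step 6).
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, remaining, target)
    return tree

# Hypothetical toy dataset in the spirit of the weather example.
data = [
    {"Weather": "Sunny", "Windy": "No", "Play": "No"},
    {"Weather": "Sunny", "Windy": "Yes", "Play": "No"},
    {"Weather": "Overcast", "Windy": "No", "Play": "Yes"},
    {"Weather": "Rainy", "Windy": "No", "Play": "Yes"},
    {"Weather": "Rainy", "Windy": "Yes", "Play": "No"},
]
print(id3(data, ["Weather", "Windy"], "Play"))
```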
Solved Example
Step 1: Data Preprocessing: The dataset
does not require any preprocessing, as it
is already in a suitable format.
Step 2: Calculating Entropy:
To calculate entropy, we first determine
the proportion of positive and negative
instances in the dataset:
Step 3: Calculating Information Gain:
We calculate the information gain for
each attribute (Weather, Temperature,
Humidity, Windy) and choose the
attribute with the highest information
gain as the root node.
Cont…
• Step 4: Selecting the Best Attribute:
The “Weather” attribute has the highest
information gain, so we select it as the
root node for our decision tree.
• Step 5: Splitting the Dataset:
We split the dataset based on the values
of the “Weather” attribute into three
subsets (Sunny, Overcast, Rainy).
• Step 6: Repeat the Process:
Since the subsets produced by the
“Weather” attribute require no further
splitting, we stop and label each leaf
node with the majority class in that
subset.
Radial Basis Functions (RBFs)
• One approach to function approximation that is closely related to distance-weighted
regression and also to artificial neural networks is learning with radial basis functions.
• In this approach, the learned hypothesis is a function of the form:
  f̂(x) = w0 + Σ_{u=1..k} wu · Ku(d(xu, x))    … (eqn. 1)
• where each xu is an instance from X and where the kernel function Ku(d(xu, x)) is defined so
that it decreases as the distance d(xu, x) increases. Here k is a user-provided constant that
specifies the number of kernel functions to be included.
• Even though f̂(x) is a global approximation to f(x), the contribution from each of the
Ku(d(xu, x)) terms is localized to a region near the point xu. It is common to choose each
function Ku(d(xu, x)) to be a Gaussian function centered at the point xu with some variance
σu², i.e. Ku(d(xu, x)) = exp( − d²(xu, x) / (2σu²) ).
Cont…
• We will restrict our discussion here to this common Gaussian kernel function. As shown by
Hartman et al. (1990), the functional form of Equation (1) can approximate any function with
arbitrarily small error, provided a sufficiently large number k of such Gaussian kernels and
provided the width of each kernel can be separately specified.
• The function given by Equation (1) can be viewed as describing a two-layer network where
the first layer of units computes the values of the various Ku(d(xu, x)) and where the second
layer computes a linear combination of these first-layer unit values.
* Each hidden unit produces an activation
determined by a Gaussian function centered at
some instance xu. Therefore, its activation will be
close to zero unless the input x is near xu. The
output unit produces a linear combination of the
hidden unit activations. Although the network
shown here has just one output, multiple output
units can also be included.
Cont.
Summary:
• Radial basis function networks provide a global approximation to the target function,
represented by a linear combination of many local kernel functions. The value for any given
kernel function is non-negligible only when the input x falls into the region defined by its
particular center and width. Thus, the network can be viewed as a smooth linear
combination of many local approximations to the target function.
• One key advantage of RBF networks is that they can be trained much more efficiently than
feed-forward networks trained with BACKPROPAGATION. This follows from the fact that the
input layer and the output layer of an RBF are trained separately.
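To illustrate the two-stage training mentioned above, here is a hedged NumPy sketch of an RBF network: the Gaussian centers are fixed first (here simply sampled from the training inputs, an assumption on my part), then the output-layer weights are fitted by linear least squares. All names and data below are hypothetical.

```python
# Sketch of a two-layer RBF network trained in two separate stages:
# (1) fix the Gaussian centers, (2) fit the output weights by least squares.
# Sampling centers from the data and the toy target are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))          # toy 1-D inputs
y = np.sin(X[:, 0])                            # toy target function f(x)

k, sigma = 10, 0.7                             # number of kernels, kernel width
centers = X[rng.choice(len(X), size=k, replace=False)]   # stage 1: pick centers

def design_matrix(X, centers, sigma):
    """Column u holds Ku(d(xu, x)) = exp(-||x - xu||^2 / (2*sigma^2)); plus a bias column."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2 * sigma ** 2))
    return np.hstack([np.ones((len(X), 1)), phi])   # implements w0 + sum_u wu * Ku(...)

Phi = design_matrix(X, centers, sigma)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)    # stage 2: linear output layer

y_hat = design_matrix(X, centers, sigma) @ w
print("mean squared error:", np.mean((y_hat - y) ** 2))
```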
Case Based Reasoning
• Instance-based methods such as k-NEAREST NEIGHBOR and locally weighted regression share
three key properties.
• First, they are lazy learning methods in that they defer the decision of how to generalize
beyond the training data until a new query instance is observed.
• Second, they classify new query instances by analyzing similar instances while ignoring
instances that are very different from the query.
• Third, they represent instances as real-valued points in an n-dimensional Euclidean space.
• In CBR, instances are typically represented using richer symbolic descriptions, and the
methods used to retrieve similar instances are correspondingly more elaborate.
Cont.
• CBR has been applied to problems such as
- conceptual design of mechanical devices based on a stored library of
previous designs,
- reasoning about new legal cases based on previous rulings, and solving
planning and scheduling problems by reusing and combining portions of
previous solutions to similar problems.
• Case-based reasoning consists of a cycle, as shown below:
1. Retrieve: Given a new case, retrieve similar cases from the case base.
2. Reuse: Adapt the retrieved cases to fit the new case.
3. Revise: Evaluate the solution and revise it based on how well it works.
4. Retain: Decide whether to retain this new case in the case base.
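A minimal sketch of this retrieve–reuse–revise–retain cycle in Python; the case structure, similarity measure, and adaptation rule are simplified assumptions, not a prescribed CBR implementation.

```python
# Toy sketch of the 4-step CBR cycle; the case structure, similarity measure,
# and adaptation rule are simplified assumptions for illustration only.
case_base = [
    {"problem": {"load": 5, "span": 10}, "solution": "beam-A"},
    {"problem": {"load": 9, "span": 12}, "solution": "beam-B"},
]

def similarity(p, q):
    # Simple negative distance over shared numeric features.
    return -sum(abs(p[k] - q[k]) for k in p)

def retrieve(problem):
    return max(case_base, key=lambda c: similarity(problem, c["problem"]))

def reuse(case, problem):
    # Adapt the retrieved solution to the new problem (trivial adaptation here).
    return {"problem": problem, "solution": case["solution"]}

def revise(candidate, works_well):
    if not works_well:
        candidate["solution"] += "-revised"
    return candidate

def retain(case, is_novel):
    if is_novel:
        case_base.append(case)

new_problem = {"load": 6, "span": 11}
candidate = reuse(retrieve(new_problem), new_problem)   # Retrieve + Reuse
candidate = revise(candidate, works_well=True)          # Revise
retain(candidate, is_novel=True)                        # Retain
print(candidate)
```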