MCA3 (DS) Unit 4 ML

The document provides an overview of supervised and unsupervised learning algorithms, focusing on decision tree algorithms. It describes how decision trees work by splitting the data into nodes and branches based on attribute values to classify or predict target variables. The key algorithms discussed are ID3, C4.5, and CART. It explains that decision trees use entropy and information gain to determine the optimal attribute to split on at each node, with the goal of creating homogenous leaf nodes. Examples are also given to illustrate how a decision tree is constructed from sample training data.

Supervised and Unsupervised Learning
Overview
• Introduction
• Decision Tree Representation
• Appropriate problems for Decision tree
• Learning Algorithm
• Hypothesis Space Search
• Inductive Bias in Decision Tree Learning
• Issues in Decision Tree Learning
• Locally Weighted Regression
• Radial Basis Functions
• Case Based Reasoning
Introduction
• The Decision Tree algorithm belongs to the family of supervised learning
algorithms. Unlike many other supervised learning algorithms, the decision tree
algorithm can be used for solving both regression and classification
problems.
• The goal of using a Decision Tree is to create a training model that can
be used to predict the class or value of the target variable by learning simple
decision rules inferred from prior data (training data).
• In Decision Trees, to predict a class label for a record we start from
the root of the tree. We compare the value of the root attribute with the
record's attribute. On the basis of the comparison, we follow the branch
corresponding to that value and jump to the next node.
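As a small, hedged illustration of this traversal (not code from the slides), the Python sketch below represents a tree as nested dictionaries and walks from the root to a leaf; the attribute names and tree structure are assumptions matching the PlayTennis example shown later.

# A minimal sketch (assumed structure): a decision tree as nested dicts
# of the form {attribute: {value: subtree_or_leaf}}.
play_tennis_tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

def predict(tree, record):
    """Walk from the root, following the branch that matches the record's attribute value."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))                   # attribute tested at this node
        tree = tree[attribute][record[attribute]]      # follow the matching branch
    return tree                                        # a leaf holds the class label

print(predict(play_tennis_tree, {"Outlook": "Sunny", "Humidity": "High", "Wind": "Strong"}))  # -> No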
Decision Tree Representation

• Decision trees classify instances by sorting them down the tree from the root to some leaf node
• A node
– Specifies some attribute of an instance to be tested
• A branch
– Corresponds to one of the possible values for an attribute
Different decision tree algorithms

ID3 → (Iterative Dichotomiser 3)

C4.5 → (successor of ID3)

CART → (Classification And Regression Tree)

CHAID → (Chi-square Automatic Interaction Detection; performs multi-level splits when computing classification trees)

MARS → (Multivariate Adaptive Regression Splines)

Decision Tree Representation (cont.)

A Decision Tree for the concept PlayTennis:

Outlook
├─ Sunny → Humidity
│    ├─ High → No
│    └─ Normal → Yes
├─ Overcast → Yes
└─ Rain → Wind
     ├─ Strong → No
     └─ Weak → Yes
Decision Tree Representation (cont.)
• Each path corresponds to a conjunction of attribute tests. For example, if the
instance is (Outlook=Sunny, Temperature=Hot, Humidity=High, Wind=Strong), then the
path (Outlook=Sunny ∧ Humidity=High) is matched, so the target value would be No,
as shown in the tree.
• A decision tree represents a disjunction of conjunctions of constraints on the
attribute values of instances. For example, the three positive paths can be
represented as (Outlook=Sunny ∧ Humidity=Normal) ∨ (Outlook=Overcast) ∨
(Outlook=Rain ∧ Wind=Weak), as shown in the tree.
What is the merit of tree representation?
Decision Tree Representation (cont.)
• Appropriate Problems for Decision Tree Learning
– Instances are represented by attribute-value pairs
– The target function has discrete output values
– Disjunctive descriptions may be required
– The training data may contain errors
• Both errors in classification of the training examples and errors in
the attribute values
– The training data may contain missing attribute values
– Suitable for classification
Learning Algorithm

• Main question
– Which attribute should be tested at the root of the (sub)tree?
• Greedy search using some statistical measure

• Information gain
– A quantitative measure of the worth of an attribute
– How well a given attribute separates the training examples according to
their target classification
– Information gain measures the expected reduction in entropy
Learning Algorithm (cont.)

[Figure: an alternative decision tree for PlayTennis with Temperature at the root (branches cool, mild, hot), with further tests on Outlook, Windy and Humidity; one leaf is left undetermined (?).]
What is entropy
• In decision tree machine learning, entropy is a measure used to quantify the impurity or
disorder within a set of data.

• It's a concept borrowed from information theory and is particularly useful in decision tree
algorithms, such as ID3, C4.5, and CART, for determining the best attribute to split the
data at each node.

• Entropy helps in deciding the order of attributes in the nodes of the tree during the
construction phase.

• The goal is to create splits that result in nodes containing data points that are as
homogeneous as possible with respect to the target variable.
Learning Algorithm (cont.)
• Entropy
– characterizes the (im)purity of an arbitrary collection of examples

For example
• The information required to classify the PlayTennis training examples (9 positive, 5 negative)
= -(9/14)log2(9/14) - (5/14)log2(5/14) = 0.940
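The same number can be reproduced with a few lines of Python; this is only an illustrative check of the arithmetic above, not code from the slides.

import math

def entropy(counts):
    """Entropy of a collection given the number of examples in each class."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(round(entropy([9, 5]), 3))  # 0.94 for the 9 positive / 5 negative examples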
Learning Algorithm (cont.)

The formula for entropy in the context of decision trees is often expressed as:

Entropy(S) = - Σ pi log2(pi)

where pi is the proportion of examples in S that belong to class i. For a two-class (positive/negative) problem this becomes Entropy(S) = -p+ log2(p+) - p- log2(p-).
Learning Algorithm (cont.)

• When entropy is high, it indicates high disorder or uncertainty in the dataset.
Conversely, when entropy is low, it suggests the data is more ordered or homogeneous.
• During the construction of a decision tree, the attribute with the lowest
entropy (or highest information gain, which is the reduction in entropy)
is chosen as the splitting criterion, aiming to partition the data into
subsets that are as pure as possible in terms of the target variable.
• Lower entropy after a split indicates that the resulting subsets are more
homogeneous, making decisions or predictions more accurate within
each subset.
Learning Algorithm (cont.)

• Information gain and entropy

Gain(S, A) = Entropy(S) - Σ over v ∈ Values(A) of (|Sv| / |S|) · Entropy(Sv)

✓ Values(A): the set of all possible values for attribute A

✓ Sv: the subset of S for which attribute A has value v

– First term: the entropy of the original collection S

– Second term: the expected value of the entropy after S is partitioned using attribute A
• Gain(S, A)
– The expected reduction in entropy caused by knowing the value of attribute A
– The information provided about the target function value, given the value of some other
attribute A
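A hedged Python sketch of Gain(S, A) following the definition above; the representation of examples as dictionaries and the default target name "PlayTennis" are assumptions chosen to match the training table shown later.

from collections import Counter
import math

def entropy_of(labels):
    """Entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute, target="PlayTennis"):
    """Gain(S, A) = Entropy(S) - sum over v in Values(A) of (|Sv|/|S|) * Entropy(Sv)."""
    total_entropy = entropy_of([row[target] for row in examples])
    remainder = 0.0
    for value in {row[attribute] for row in examples}:
        subset = [row[target] for row in examples if row[attribute] == value]
        remainder += (len(subset) / len(examples)) * entropy_of(subset)
    return total_entropy - remainder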
Learning Algorithm (cont.)
• ID3 (Examples, Target_attribute, Attributes)
– Create a Root node for the tree
– If all Examples are positive, return the single node tree Root, with
label= +
– If all Examples are negative, return the single node tree Root, with
label= −
– If Attributes is empty, return the single-node tree Root, with label =
most common value of Target_attribute in Examples
– Otherwise, begin: choose the attribute A that best classifies Examples (the one
with the highest information gain), make it the decision attribute for Root, and
recursively repeat the process on the subsets of Examples for each value of A
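A compact recursive sketch of ID3 in the spirit of the pseudocode above, reusing the information_gain function from the earlier sketch; the handling of attributes and labels is a simplified assumption, not the exact text of the algorithm's continuation.

from collections import Counter

def id3(examples, attributes, target="PlayTennis"):
    labels = [row[target] for row in examples]
    if len(set(labels)) == 1:                  # all examples positive, or all negative
        return labels[0]
    if not attributes:                         # no attributes left to test
        return Counter(labels).most_common(1)[0][0]
    # Otherwise: choose the attribute with the highest information gain as the root test
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    tree = {best: {}}
    for value in {row[best] for row in examples}:
        subset = [row for row in examples if row[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, remaining, target)
    return tree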
An Illustrative Example

Day Outlook Temperature Humidity Wind Play Tennis


D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
Training examples for the target concept PlayTennis
• Selecting the root node
– The information gain values for all four attributes
• Gain(S, Outlook) = 0.246 → selected as root attribute
• Gain(S, Humidity)= 0.151
• Gain(S, Wind)= 0.048
• Gain(S, Temperature)= 0.029
• Adding a subtree
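For reference, these gain values can be reproduced by running the information_gain sketch from the earlier slide over the 14 training examples; encoding each row of the table as a dictionary is an assumption about how the data would be loaded.

columns = ["Outlook", "Temperature", "Humidity", "Wind", "PlayTennis"]
rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),           # D1
    ("Sunny", "Hot", "High", "Strong", "No"),         # D2
    ("Overcast", "Hot", "High", "Weak", "Yes"),       # D3
    ("Rain", "Mild", "High", "Weak", "Yes"),          # D4
    ("Rain", "Cool", "Normal", "Weak", "Yes"),        # D5
    ("Rain", "Cool", "Normal", "Strong", "No"),       # D6
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),  # D7
    ("Sunny", "Mild", "High", "Weak", "No"),          # D8
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),       # D9
    ("Rain", "Mild", "Normal", "Weak", "Yes"),        # D10
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),     # D11
    ("Overcast", "Mild", "High", "Strong", "Yes"),    # D12
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),     # D13
    ("Rain", "Mild", "High", "Strong", "No"),         # D14
]
data = [dict(zip(columns, row)) for row in rows]

for attribute in ["Outlook", "Humidity", "Wind", "Temperature"]:
    print(attribute, round(information_gain(data, attribute), 3))
# Prints approximately 0.247, 0.152, 0.048 and 0.029, matching the values above up to rounding.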
Hypothesis Space Search

Hypothesis space search refers to the process of exploring and evaluating different
hypotheses or models in machine learning to find the most suitable one for a given problem.
It involves searching through a space of possible models or configurations to identify
the one that best fits the data and optimizes a defined objective (like accuracy,
error minimization, etc.).
Inductive Bias in Decision Tree Learning
•Approximate bias of ID3: Shorter trees are preferred over longer trees.
Trees that place high information gain attributes close to the root are
preferred over those that do not.
•Occam's Razor: prefer the simplest hypothesis that fits the data.
•Preference or search bias: preference for certain hypotheses over others
with no hard restriction on the hypotheses that can be enumerated. ID3
demonstrates a preference bias.
•Restriction or language bias: a categorical restriction on the set of
hypotheses considered. The candidate elimination algorithm demonstrates a
restriction bias.
•In general, it is better to have a preference bias than a restriction bias.
However, some learning systems have both biases.
Issues in Decision Tree Learning (cont.)

What is Overfitting?
Overfitting is a common problem that needs to be handled while training
a decision tree model. Overfitting occurs when a model fits too
closely to the training data and may become less accurate when
encountering new data or predicting future outcomes. In an overfit
condition, a model memorizes the noise of the training data and fails to
capture essential patterns.

Issues in Decision Tree Learning (cont.)

• Avoiding overfitting
– How can we avoid overfitting? (see the sketch after this list)
• Stop growing before the tree reaches the point where it perfectly classifies the
training data
• Grow the full tree, then post-prune
– How to select the best tree?
• Measure performance statistically over the training data
• Measure performance over a separate validation data set
• MDL: minimize the complexity for encoding the training examples and
the decision trees
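As one hedged, concrete way to apply these ideas with a standard library (the slides do not prescribe a particular tool), scikit-learn's DecisionTreeClassifier supports both early stopping (max_depth, min_samples_leaf) and cost-complexity post-pruning (ccp_alpha), with a separate validation set used to pick the better tree.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Early stopping: stop growing before the tree perfectly classifies the training data.
shallow = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5).fit(X_train, y_train)

# Post-pruning: grow a full tree, then prune it back with cost-complexity pruning.
pruned = DecisionTreeClassifier(ccp_alpha=0.01).fit(X_train, y_train)

# Select the better tree by measuring performance on a separate validation set.
print(shallow.score(X_val, y_val), pruned.score(X_val, y_val))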
Issues in Decision Tree Learning
Decision tree learning is a powerful and popular machine learning technique, but it's not
without its challenges and limitations. Here are some key issues associated with decision
tree learning:

1. Overfitting: Decision trees can easily overfit the training data, especially when they grow
to be very deep and complex. This results in the model learning noise or specific patterns
that are unique to the training set but do not generalize well to unseen data.

2. High Variance: Small changes in the training data can lead to significantly different trees.
This high variance can make decision trees unstable and sensitive to variations in the
dataset.

3. Feature Importance and Correlation: Decision trees can struggle with identifying and
using correlated features effectively. Redundant or highly correlated features might affect
the importance assigned to individual features or cause biased splits in the tree.
Issues in Decision Tree Learning
4. Bias in Attribute Selection Heuristics: The choice of attribute selection heuristics
(e.g., information gain, Gini impurity) can introduce biases towards certain types of
attributes or certain types of splits, impacting the final tree structure and
performance.

5. Handling Missing Values: While some decision tree algorithms handle missing
values well by making assumptions about the missing data, others might struggle or
require additional preprocessing steps.

Addressing these issues often involves using ensemble methods like Random
Forests, Gradient Boosting, or implementing techniques like cross-validation,
pruning, or feature engineering to improve decision tree models' performance and
robustness.
Locally Weighted Linear Regression

Locally Weighted Regression (LWR) is a non-parametric regression technique used for
making predictions based on locally weighted linear regression. It differs from global
regression methods (like ordinary linear regression) by giving more weight to data points
in the neighborhood of the query point when making predictions.
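A minimal sketch of locally weighted linear regression for one-dimensional inputs, assuming a Gaussian kernel for the weights and NumPy for the weighted least-squares solve; the bandwidth tau and the toy data are illustrative assumptions, not values from the slides.

import numpy as np

def lwr_predict(x_query, X, y, tau=0.5):
    """Predict y at x_query by fitting a linear model weighted toward nearby points."""
    Xb = np.column_stack([np.ones(len(X)), X])             # add a bias column
    xq = np.array([1.0, x_query])
    w = np.exp(-(X - x_query) ** 2 / (2 * tau ** 2))       # Gaussian weights: nearby points count more
    W = np.diag(w)
    theta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y   # weighted least squares
    return xq @ theta

# Toy usage: noisy samples of a sine curve.
rng = np.random.default_rng(0)
X = np.linspace(0, 6, 60)
y = np.sin(X) + 0.1 * rng.standard_normal(60)
print(lwr_predict(3.0, X, y))  # close to sin(3.0) ≈ 0.141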
Radial Bases Functions:
Radial Basis Functions (RBFs) are mathematical functions whose
value depends on the distance between the input and a center. They
are commonly used in various fields including machine learning,
interpolation, approximation, and signal processing.
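A brief sketch of a Gaussian radial basis function and its use as a similarity-based feature map; the centers, the width gamma and the sample input are illustrative assumptions.

import numpy as np

def gaussian_rbf(x, center, gamma=1.0):
    """The value depends only on the distance between the input and the center."""
    return np.exp(-gamma * np.sum((x - center) ** 2))

# Map a 2-D input onto features given by its similarity to a few fixed centers.
centers = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
x = np.array([0.9, 1.1])
features = np.array([gaussian_rbf(x, c) for c in centers])
print(features)  # largest value for the nearest center, [1.0, 1.0]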
Case Based Reasoning
Case-Based Reasoning (CBR) is an AI reasoning paradigm that solves
new problems by retrieving and reusing solutions from similar past
cases. It operates on the idea that similar problems tend to have
similar solutions, and it mimics human problem-solving by leveraging
past experiences.
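A toy sketch of the retrieve-and-reuse cycle: past cases are stored as feature vectors with their solutions, and the solution of the most similar case is reused for a new problem. The case base and the Euclidean similarity measure are assumptions for illustration only.

import math

# Past cases: (feature vector describing the problem, stored solution)
case_base = [
    ((2.0, 1.0), "solution A"),
    ((8.0, 3.0), "solution B"),
    ((5.0, 5.0), "solution C"),
]

def solve(new_problem):
    """Retrieve the most similar past case and reuse its solution."""
    nearest = min(case_base, key=lambda case: math.dist(case[0], new_problem))
    return nearest[1]

print(solve((7.5, 2.5)))  # reuses "solution B" from the closest past case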
THANK YOU
