Machine Learning
Dr. Shazzad Hosain
Department of EECS
North South University
[email protected]
What is Machine Learning?
[Diagram: TRAINING DATA is fed to a learning algorithm, which produces a trained machine; given a query, the trained machine returns an answer. Data mining is a similar concept.]
For which tasks?
Classification (binary/categorical target)
Regression and time series prediction
(continuous targets)
Clustering (targets unknown)
Rule discovery
For which applications?
[Chart: number of training examples vs. number of inputs for typical applications]
Customer knowledge
Quality control
Market analysis
Text categorization
System diagnosis
OCR
Machine vision
HWR (handwriting recognition)
Bioinformatics
Banking / Telecom / Retail
Identify:
Prospective customers
Dissatisfied customers
Good customers
Bad payers
Obtain:
More effective
advertising
Less credit risk
Less fraud
Decreased churn rate
Biomedical / Biometrics
Medicine:
Screening
Diagnosis and prognosis
Drug discovery
Security:
Face recognition
Signature / fingerprint / iris
verification
DNA fingerprinting
Computer / Internet
Computer interfaces:
Troubleshooting wizards
Handwriting and speech
Brain waves
Internet
Hit ranking
Spam filtering
Text categorization
Text translation
Recommendation
ML in a Nutshell
Tens of thousands of machine learning algorithms
Hundreds of new ones every year
Every machine learning algorithm has three
components:
Representation
Evaluation
Optimization
Representation
Decision trees
Sets of rules / Logic programs
Instances
Graphical models (Bayes/Markov nets)
Neural networks
Support vector machines
Model ensembles
Etc.
Evaluation
Accuracy
Precision and recall
Squared error
Likelihood
Posterior probability
Cost / Utility
Margin
Entropy
K-L divergence
Etc.
Optimization
Combinatorial optimization
E.g.: Greedy search
Convex optimization
E.g.: Gradient descent
Constrained optimization
E.g.: Linear programming
Types of Learning
Supervised (inductive) learning
Training data includes desired outputs
Unsupervised learning
Training data does not include desired outputs
Semi-supervised learning
Training data includes a few desired outputs
Reinforcement learning
Rewards from sequence of actions
Supervised Learning
Learning Through Examples
Supervised Learning
When a set of targets of interest is provided by an
external teacher
we say that the learning is Supervised
The targets are usually in the form of an input-output
mapping that the network should learn
Learning From Examples
[Example: input-output pairs such as 1→1, 2→4, 3→9, 4→16, 5→25, 6→36; the learner must infer the underlying mapping (here, squaring).]
What We’ll Cover
Supervised learning
Decision tree induction
Neural networks
Rule induction
Instance-based learning
Bayesian learning
Support vector machines
Model ensembles
Learning theory
Classification: Decision Trees
if X > 5 then blue
else if Y > 3 then blue
else if X > 2 then green
else blue
[Figure: the corresponding axis-parallel decision regions in the X-Y plane, with splits at X = 2, X = 5 and Y = 3]
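As a concrete illustration (not from the slides), the rule above can be written directly as a small Python function; the point coordinates and colour labels are just those from the figure.

def classify_point(x, y):
    # A literal rendering of the decision rule above: axis-parallel tests on X and Y.
    if x > 5:
        return "blue"
    elif y > 3:
        return "blue"
    elif x > 2:
        return "green"
    else:
        return "blue"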
Classification: Neural Nets
Can select more
complex regions
Can be more accurate
Also can overfit the
data – find patterns in
random noise
Decision Tree Learning
Learning Through Examples
Learning decision trees
Problem: decide whether to wait for a table at a restaurant,
based on the following attributes:
1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
3. Fri/Sat: is today Friday or Saturday?
4. Hungry: are we hungry?
5. Patrons: number of people in the restaurant (None, Some, Full)
6. Price: price range ($, $$, $$$)
7. Raining: is it raining outside?
8. Reservation: have we made a reservation?
9. Type: kind of restaurant (French, Italian, Thai, Burger)
10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)
Attribute-based representations
Examples described by attribute values (Boolean, discrete, continuous)
E.g., situations where I will/won't wait for a table:
Classification of examples is positive (T) or negative (F)
Decision tree
Choosing an attribute
Idea: a good attribute splits the examples into subsets that
are (ideally) "all positive" or "all negative"
Patrons? is a better choice
Choosing the Best Attribute
The key problem is choosing which attribute to split a
given set of examples.
Some possibilities are:
Random: Select any attribute at random
Least-Values: Choose the attribute with the smallest number
of possible values (fewer branches)
Most-Values: Choose the attribute with the largest number of
possible values (smaller subsets)
Max-Gain: Choose the attribute that has the largest expected
information gain, i.e. select attribute that will result in the
smallest expected size of the subtrees rooted at its children.
The ID3 algorithm uses the Max-Gain method of
selecting the best attribute.
ID3 (Iterative Dichotomiser 3) Algorithm
Top-down, greedy search through space of
possible decision trees
Remember, decision trees represent hypotheses, so
this is a search through hypothesis space.
What is top-down?
How to start tree?
What attribute should represent the root?
As you proceed down tree, choose attribute for each
successive node.
No backtracking:
So, algorithm proceeds from top to bottom
Question?
How do you determine which attribute best
classifies data?
Answer: Entropy!
Information gain:
Statistical quantity measuring how well an
attribute classifies the data.
Calculate the information gain for each attribute.
Choose attribute with greatest information gain.
Information Theory Background
If there are n equally probable possible messages, then the
probability p of each is 1/n
Information conveyed by a message is -log(p) = log(n)
Eg, if there are 16 messages, then log(16) = 4 and we need 4
bits to identify/send each message.
In general, if we are given a probability distribution
P = (p1, p2, .., pn)
the information conveyed by distribution (aka Entropy of P) is:
H(P) = -(p1*log(p1) + p2*log(p2) + .. + pn*log(pn))
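As a small illustration (not from the slides), the entropy of a distribution can be computed in a few lines of Python; the function name is my own.

import math

def entropy_of_distribution(probabilities):
    # H(P) = -(p1*log2(p1) + ... + pn*log2(pn)), treating 0*log2(0) as 0
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# 16 equally probable messages carry log2(16) = 4 bits each:
print(entropy_of_distribution([1/16] * 16))   # 4.0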
Information Gain
Information gain is our metric for how well one attribute Ai
classifies the training data.
Calculate the entropy for all training examples
positive and negative cases
p+ = #pos/Tot p- = #neg/Tot
H(S) = -p+log2(p+) - p-log2(p-)
Determine which single attribute best classifies the training
examples using information gain.
For each attribute find:
Gain(S, Ai) = H(S) - Σ v ∈ Values(Ai) P(Ai = v) * H(Sv)
where H(S) is the entropy of the whole set S and H(Sv) is the
entropy of the subset Sv of examples having value v for attribute Ai
Use attribute with greatest information gain as a root
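A sketch of the gain computation in Python, reusing entropy_of_distribution() from the previous snippet; representing examples as dicts of attribute -> value (including the target) is my own choice, not something prescribed by the slides.

from collections import Counter

def entropy(labels):
    # H(S) over a list of class labels, e.g. ['+', '+', '-']
    total = len(labels)
    return entropy_of_distribution([c / total for c in Counter(labels).values()])

def information_gain(examples, attribute, target):
    # Gain(S, A) = H(S) - sum over v in Values(A) of P(A = v) * H(Sv)
    labels = [ex[target] for ex in examples]
    gain = entropy(labels)
    for value in set(ex[attribute] for ex in examples):
        subset = [ex[target] for ex in examples if ex[attribute] == value]
        gain -= (len(subset) / len(examples)) * entropy(subset)
    return gain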
Example: PlayTennis
Four attributes used for classification:
Outlook = {Sunny,Overcast,Rain}
Temperature = {Hot, Mild, Cool}
Humidity = {High, Normal}
Wind = {Weak, Strong}
One predicted (target) attribute (binary)
PlayTennis = {Yes, No}
Given 14 Training examples
9 positive
5 negative
Training Examples
[Table: the 14 training examples (also called minterms, cases, objects, or test cases); 9 of the 14 cases are positive.]
Step 1: Calculate entropy for all cases:
NPos = 9 NNeg = 5 NTot = 14
H(S) = -(9/14)*log2(9/14) - (5/14)*log2(5/14) = 0.940 (the entropy of the full set)
Step 2: Loop over all attributes, calculate gain:
Attribute = Outlook
Loop over values of Outlook
Outlook = Sunny
NPos = 2 NNeg = 3 NTot = 5
H(Sunny) = -(2/5)*log2(2/5) - (3/5)*log2(3/5) = 0.971
Outlook = Overcast
NPos = 4 NNeg = 0 NTot = 4
H(Overcast) = -(4/4)*log2(4/4) - (0/4)*log2(0/4) = 0.00 (taking 0*log2(0) = 0)
Outlook = Rain
NPos = 3 NNeg = 2 NTot = 5
H(Rain) = -(3/5)*log2(3/5) - (2/5)*log2(2/5) = 0.971
Calculate Information Gain for attribute Outlook
Gain(S, Outlook) = H(S) - NSunny/NTot*H(Sunny)
- NOver/NTot*H(Overcast)
- NRain/NTot*H(Rain)
Gain(S, Outlook) = 0.940 - (5/14)*0.971 - (4/14)*0 - (5/14)*0.971
Gain(S, Outlook) = 0.246
Attribute = Temperature
(Repeat process looping over {Hot, Mild, Cool})
Gain(S, Temperature) = 0.029
Attribute = Humidity
(Repeat process looping over {High, Normal})
Gain(S, Humidity) = 0.151
Attribute = Wind
(Repeat process looping over {Weak, Strong})
Gain(S, Wind) = 0.048
Find attribute with greatest information gain:
Gain(S,Outlook) = 0.246, Gain(S,Temperature) = 0.029
Gain(S,Humidity) = 0.151, Gain(S,Wind) = 0.048
Outlook is root node of tree
Iterate the algorithm to find the attributes that best classify the
training examples under each value of the root node
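As a quick numerical check of the Outlook calculation above, the counts given in the slides can be fed through the entropy() helper from the earlier snippet (the '+' / '-' labels are just placeholders):

h_s     = entropy(['+'] * 9 + ['-'] * 5)   # ~0.940
h_sunny = entropy(['+'] * 2 + ['-'] * 3)   # ~0.971
h_over  = entropy(['+'] * 4)               # 0.0
h_rain  = entropy(['+'] * 3 + ['-'] * 2)   # ~0.971
gain_outlook = h_s - (5/14)*h_sunny - (4/14)*h_over - (5/14)*h_rain
print(gain_outlook)   # ~0.25, consistent with Gain(S, Outlook) = 0.246 above
                      # (small differences come from rounding the intermediate values)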
Example continued
Take three subsets:
Outlook = Sunny (NTot = 5)
Outlook = Overcast (NTot = 4)
Outlook = Rain (NTot = 5)
For each subset, repeat the above calculation looping over all
attributes other than Outlook
For example:
Outlook = Sunny (NPos = 2, NNeg=3, NTot = 5) H=0.971
Temp = Hot (NPos = 0, NNeg=2, NTot = 2) H = 0.0
Temp = Mild (NPos = 1, NNeg=1, NTot = 2) H = 1.0
Temp = Cool (NPos = 1, NNeg=0, NTot = 1) H = 0.0
Gain(SSunny, Temperature) = 0.971 - (2/5)*0 - (2/5)*1 - (1/5)*0
Gain(SSunny, Temperature) = 0.571
Similarly:
Gain(SSunny, Humidity) = 0.971
Gain(SSunny, Wind) = 0.020
Humidity classifies the Outlook=Sunny
instances best and is placed as the node under
the Sunny branch.
Repeat this process for Outlook = Overcast & Rain
End up with tree:
Important:
Attributes are excluded from consideration if
they appear higher in the tree
Process continues for each new leaf node
until:
Every attribute has already been included
along path through the tree
or
Training examples associated with this leaf
all have same target attribute value.
Note: In this example data were perfect.
No contradictions
Branches led to unambiguous Yes, No decisions
If there are contradictions take the majority vote
This handles noisy data.
Another note:
Attributes are eliminated when they are assigned to a
node and never reconsidered.
e.g., you would not go back and reconsider Outlook under
Humidity
ID3 uses all of the training data at once
In contrast to Candidate-Elimination
So it can handle noisy data.
The ID3 algorithm is used to build a decision tree, given a set of non-categorical attributes
C1, C2, .., Cn, the categorical attribute C, and a training set T of records.
function ID3 (R: a set of non-categorical attributes,
              C: the categorical attribute,
              S: a training set) returns a decision tree;
begin
    If S is empty, return a single node with value Failure;
    If every example in S has the same value for the categorical
        attribute, return a single node with that value;
    If R is empty, then return a single node with the most
        frequent of the values of the categorical attribute found in
        the examples of S; [note: there will be errors, i.e., improperly
        classified records];
    Let D be the attribute with the largest Gain(D,S) among R's attributes;
    Let {dj | j=1,2, .., m} be the values of attribute D;
    Let {Sj | j=1,2, .., m} be the subsets of S consisting
        respectively of records with value dj for attribute D;
    Return a tree with root labeled D and arcs labeled
        d1, d2, .., dm going respectively to the trees
        ID3(R-{D},C,S1), ID3(R-{D},C,S2), .., ID3(R-{D},C,Sm);
end ID3;
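A runnable Python sketch of the ID3 pseudocode above, reusing the entropy() and information_gain() helpers from the earlier snippets; the dict-based tree representation (a {"label": ...} dict for leaves, and {"attribute", "branches", "majority"} for internal nodes) is my own choice rather than anything prescribed by the slides.

from collections import Counter

def id3(examples, attributes, target):
    if not examples:                                     # S is empty
        return {"label": "Failure"}
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:                            # every example has the same class
        return {"label": labels[0]}
    majority = Counter(labels).most_common(1)[0][0]
    if not attributes:                                   # R is empty: use the most frequent class
        return {"label": majority}
    # D = attribute with the largest Gain(D, S) among R's attributes
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    branches = {}
    for value in set(ex[best] for ex in examples):       # one subtree per value dj of D
        subset = [ex for ex in examples if ex[best] == value]
        branches[value] = id3(subset, [a for a in attributes if a != best], target)
    return {"attribute": best, "branches": branches, "majority": majority}

On the 14 PlayTennis examples, a call such as id3(examples, ["Outlook", "Temperature", "Humidity", "Wind"], "PlayTennis") would place Outlook at the root, as computed earlier.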
Entropy
Decision Tree Learning
Does Entropy Make Sense?
If an event conveys information, that means it’s a
surprise.
If an event always occurs, P(Ai)=1, then it carries no
information. -log2(1) = 0
If an event rarely occurs (e.g. P(Ai)=0.001), it
carries a lot of info. -log2(0.001) = 9.97
The less likely (more uncertain) the event, the more
information it carries, since, for 0 ≤ P(Ai) ≤ 1,
-log2(P(Ai)) increases as P(Ai) goes from 1 to 0.
(Note: ignore events with P(Ai)=0 since they never occur.)
What about entropy?
Is it a good measure of the information carried by an
ensemble of events?
If the events are equally probable, the entropy is maximum.
1) For N events, each occurring with probability 1/N:
H = -N * (1/N) * log2(1/N) = -log2(1/N) = log2(N)
This is the maximum value.
(e.g., for N = 256 (ASCII characters), log2(256) = 8, the
number of bits needed per character.
Base-2 logs measure information in bits.)
This is a good thing since an ensemble of equally probable
events is as uncertain as it gets.
(Remember, information corresponds to surprise - uncertainty.)
[Figure: entropy as a function of the class proportions; entropy is largest when the classes are equally likely. Boolean functions with the same number of ones and zeros have the largest entropy.]
2) H is a continuous function of the probabilities.
That is always a good thing.
3) If you sub-group events into compound events, the
entropy calculated for these compound groups is the same.
That is good since the uncertainty is the same.
It is a remarkable fact that the equation for entropy
shown above (up to a multiplicative constant) is the
only function which satisfies these three conditions.
The choice of base-2 logs corresponds to choosing the unit of
information (bits).
Another remarkable thing:
This is the same definition of entropy used in statistical
mechanics for the measure of disorder.
Corresponds to macroscopic thermodynamic quantity of
Second Law of Thermodynamics.
The concept of a quantitative measure for information
content plays an important role in many areas:
For example,
Data communications (channel capacity)
Data compression (limits on error-free encoding)
Entropy in a message corresponds to minimum number of
bits needed to encode that message.
In our case, for a set of training data, the entropy measures
the number of bits needed to encode classification for an
instance.
Use probabilities found from entire set of training data.
Prob(Class=Pos) = Num. of positive cases / Total cases
Prob(Class=Neg) = Num. of negative cases / Total cases
Hypothesis Space
Decision Tree Learning
Hypothesis Space
The tree itself forms the hypothesis
Disjunction (ORs) of conjunctions (ANDs)
Each path from root to leaf forms a conjunction
of constraints on attributes
Separate branches are disjunctions
Example from PlayTennis decision tree:
(Outlook=Sunny ∧ Humidity=Normal)
∨ (Outlook=Overcast)
∨ (Outlook=Rain ∧ Wind=Weak)
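For illustration, this hypothesis can be written directly as a Python predicate; the attribute names follow the PlayTennis example, everything else is my own phrasing.

def play_tennis(outlook, humidity, wind):
    # (Outlook=Sunny AND Humidity=Normal) OR (Outlook=Overcast) OR (Outlook=Rain AND Wind=Weak)
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))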
Expressiveness
Decision trees can express any function of the input attributes.
E.g., for Boolean functions, truth table row → path to leaf:
Trivially, there is a consistent decision tree for any training set, with one path to a
leaf for each example (unless f is nondeterministic in x), but it probably won't
generalize to new examples
Prefer to find more compact decision trees
Hypothesis spaces
How many distinct decision trees with n Boolean attributes?
= number of Boolean functions
= number of distinct truth tables with 2^n rows = 2^(2^n)
E.g., with 6 Boolean attributes, there are 2^64 =
18,446,744,073,709,551,616 trees
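The count above can be checked in one line of Python:

print(2 ** (2 ** 6))   # 18446744073709551616 distinct Boolean functions of 6 attributes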
Aim: find a small tree consistent with the training examples
Idea: (recursively) choose "most significant" attribute as root of
(sub)tree
Extensions of Decision Tree Learning
Extensions of Decision Tree Learning
Noisy data and Overfitting
Cross-Validation for Experimental Validation of
Performance
Pruning Decision Trees
Real-valued data
Using gain ratios
Generation of rules
Setting Parameters
Incremental learning
Noisy data and Overfitting
Many kinds of "noise" that could occur in the examples:
Two examples have same attribute/value pairs, but different classifications
Some values of attributes are incorrect because of:
Errors in the data acquisition process
Errors in the preprocessing phase
The classification is wrong (e.g., + instead of -) because of some error
Some attributes are irrelevant to the decision-making process,
e.g., color of a die is irrelevant to its outcome.
Irrelevant attributes can result in overfitting the training data.
Noisy data and Overfitting
Black dots are positive examples, the others negative
The two lines represent two hypotheses
The thick line is a complex hypothesis that correctly
classifies all of the data
The thin line is a simple hypothesis that incorrectly
classifies some of the data
The simple hypothesis makes some errors
but reasonably closely represents the trend in
the data
The complex hypothesis fits every point but does not
represent the overall trend of the data at all
Fixing the overfitting / overlearning
problem
By cross-validation
By pruning lower nodes in the decision tree
Cross Validation: An Evaluation Methodology
Standard methodology: cross validation
1. Collect a large set of examples (all with correct classifications!).
2. Randomly divide collection into two disjoint sets: training and
test.
3. Apply learning algorithm to training set giving hypothesis H
4. Measure performance of H w.r.t. test set
Important: keep the training and test sets disjoint!
The goal of learning is not to minimize the training error but
the error on the test/cross-validation set: this is a way to detect and fix overfitting
To study the efficiency and robustness of an algorithm,
repeat steps 2-4 for different training sets and sizes of
training sets.
If you improve your algorithm, start again with step 1 to
avoid evolving the algorithm to work well on just this
collection.
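A minimal sketch of steps 2-4 in Python, reusing the id3() sketch from earlier; classify(), accuracy() and the random split are my own helpers, not part of the slides.

import random

def classify(node, example):
    # Walk from the root to a leaf, following the branch for each attribute value.
    while "label" not in node:
        branch = node["branches"].get(example[node["attribute"]])
        if branch is None:                      # value not seen during training
            return node["majority"]
        node = branch
    return node["label"]

def accuracy(tree, data, target):
    return sum(classify(tree, ex) == ex[target] for ex in data) / len(data)

def evaluate(examples, attributes, target, train_fraction=0.7, seed=0):
    # Step 2: random split into disjoint training and test sets
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    train, test = shuffled[:cut], shuffled[cut:]
    # Steps 3-4: learn hypothesis H on the training set, measure it on the test set
    tree = id3(train, attributes, target)
    return accuracy(tree, test, target)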
Pruning Decision Trees
Pre-pruning: Stop growing the tree before it is
fully grown
Post-pruning: Trim the fully grown
tree from the bottom
Reduced Error Pruning
Rule post pruning
Reduced Error Pruning
Partitioning data in tree induction
Reduced Error Pruning
A post-pruning, cross-validation approach.
Partition the training data into "grow" and "validation" sets.
Build a complete tree from the “grow” data.
Until accuracy on validation set decreases do:
For each non-leaf node, n, in the tree do:
Temporarily prune the subtree below n and replace it with a
leaf labeled with the current majority class at that node.
Measure and record the accuracy of the pruned tree on the validation set.
Permanently prune the node that results in the greatest increase in accuracy
on the validation set.
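A sketch of the procedure above on the dict-based trees produced by the id3() sketch, reusing accuracy() from the cross-validation snippet; the grow/validation split itself is assumed to be done by the caller, and the helper names are my own.

def internal_nodes(node):
    # Collect every non-leaf node in the tree.
    if "label" in node:
        return []
    nodes = [node]
    for child in node["branches"].values():
        nodes.extend(internal_nodes(child))
    return nodes

def reduced_error_prune(tree, validation, target):
    while True:
        best_node, best_acc = None, accuracy(tree, validation, target)
        for node in internal_nodes(tree):
            saved = dict(node)
            node.clear()
            node["label"] = saved["majority"]    # temporarily replace subtree by a majority-class leaf
            acc = accuracy(tree, validation, target)
            node.clear()
            node.update(saved)                   # restore the subtree
            if acc >= best_acc:                  # keep pruning while validation accuracy does not decrease
                best_node, best_acc = node, acc
        if best_node is None:
            return tree
        majority = best_node["majority"]
        best_node.clear()
        best_node["label"] = majority            # prune the best node permanently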
Ockham’s Razor
Principle proposed by William of
Ockham in the fourteenth century:
“Pluralitas non est ponenda sine necessitate”
(plurality should not be posited without necessity).
Of two theories providing
similarly good predictions, prefer
the simplest one.
Shave off unnecessary parameters
of your models.
Real-valued data
Select a set of thresholds defining intervals;
each interval becomes a discrete value of the attribute
We can use some simple heuristics
always divide into quartiles
We can use domain knowledge
divide age into infant (0-2), toddler (3 - 5), and school aged (5-8)
or treat this as another learning problem
try a range of ways to discretize the continuous variable
Find out which yield “better results” with respect to some metric.
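A small sketch of the quartile heuristic in Python; the function names and bin labels are my own.

import statistics

def quartile_thresholds(values):
    # The three cut points (Q1, Q2, Q3) that divide the values into quartiles (Python 3.8+).
    return statistics.quantiles(values, n=4)

def discretize(value, thresholds, labels=("low", "mid-low", "mid-high", "high")):
    # Map a real value to the discrete label of the interval it falls into.
    for threshold, label in zip(thresholds, labels):
        if value <= threshold:
            return label
    return labels[-1]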
Performance Evaluation
Decision Tree Learning
Metrics for Performance Evaluation
Focus on the predictive capability of a model
Rather than how long it takes to classify or build models, scalability, etc.
Confusion Matrix:
                     PREDICTED CLASS
                     Class=Yes   Class=No
ACTUAL   Class=Yes   TP          FN
CLASS    Class=No    FP          TN

TP (true positive): predicted to be in YES, and is actually in it
FP (false positive): predicted to be in YES, but is not actually in it
TN (true negative): predicted not to be in YES, and is not actually in it
FN (false negative): predicted not to be in YES, but is actually in it
Metrics for Performance Evaluation: Accuracy
Most widely-used metric:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Limitation of Accuracy: the class imbalance problem
Consider a 2-class problem
Number of Class 0 examples = 9990
Number of Class 1 examples = 10
If model predicts everything to be class 0, accuracy
is 9990/10000 = 99.9 %
Accuracy is misleading because model does not detect
any class 1 example
Classifier Evaluation Metrics:
Accuracy, Error Rate, Sensitivity and Specificity
A\P       Yes   No    Total
Yes       TP    FN    P
No        FP    TN    N
Total     P'    N'    All

Sensitivity: True Positive recognition rate
Sensitivity = TP/P
Specificity: True Negative recognition rate
Specificity = TN/N
Classifier Accuracy, or recognition rate: percentage of test set
tuples that are correctly classified
Accuracy = (TP + TN)/All
Error rate: 1 – accuracy, or
Error rate = (FP + FN)/All
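The four quantities above, computed directly from confusion-matrix counts in Python; the function name is my own.

def basic_metrics(tp, fn, fp, tn):
    p, n = tp + fn, fp + tn                 # actual positives and negatives
    total = p + n                           # "All"
    return {
        "sensitivity": tp / p,              # TP / P
        "specificity": tn / n,              # TN / N
        "accuracy": (tp + tn) / total,      # (TP + TN) / All
        "error_rate": (fp + fn) / total,    # 1 - accuracy
    }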
Classifier Evaluation Metrics:
Precision and Recall, and F-measures
Precision: exactness – what % of tuples that the classifier
labeled as positive are actually positive
precision = TP / (TP + FP)
Recall: completeness – what % of positive tuples did the
classifier label as positive?
recall = TP / (TP + FN)
A perfect score is 1.0
F-measure (F1 score or F-score)
harmonic mean of precision and recall:
F = 2 * precision * recall / (precision + recall)
Precision is biased towards TP & FP
Recall is biased towards TP & FN
F-measure is biased towards all except TN
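The same definitions written as a small Python function; the function name is my own.

def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)                            # exactness
    recall = tp / (tp + fn)                               # completeness
    f1 = 2 * precision * recall / (precision + recall)    # harmonic mean
    return precision, recall, f1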
Classifier Evaluation Metrics:
Matthews correlation coefficient (MCC)
MCC takes into account true and false positives and negatives.
Generally regarded as a balanced measure which can be used
even if the classes are of very different sizes.
It returns a value between −1 and +1.
1 represents a perfect prediction
0 no better than random prediction
−1 indicates total disagreement between prediction and observation
Classifier Evaluation Metrics:
Matthews correlation coefficient (MCC)
N = TN + TP + FN + FP
S = (TP + FN) / N
P = (TP + FP) / N
MCC = (TP/N - S*P) / sqrt(P*S*(1-S)*(1-P))

Equivalently, in terms of the counts:
MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
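The count-based form of MCC in Python; returning 0 when the denominator is 0 is a common convention, not something stated in the slides.

import math

def mcc(tp, tn, fp, fn):
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0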
Summary
Decision Tree Learning
A greedy search approach
At each step, make the decision that gives the
greatest improvement in whatever you are
trying to optimize.
Do not backtrack (unless you hit a dead end)
This type of search is unlikely to find a
globally optimal solution, but it generally works
well.
Types of problems decision tree learning is
good for:
Instances represented by attribute-value pairs
For the algorithm in the book, attributes take on a small number
of discrete values
Robust to imperfect training data
classification errors
errors in attribute values
missing attribute values
Can be extended to real-valued attributes
(numerical data)
Target function has discrete output values
The algorithm in the book assumes Boolean functions
Can be extended to multiple output values
Example Use
Equipment diagnosis
Medical diagnosis
Credit card risk analysis
Robot movement
Pattern Recognition
face recognition
hexapod walking gaits
How well does it work?
Many case studies have shown that decision trees are at
least as accurate as human experts.
A study for diagnosing breast cancer:
humans correctly classified the examples 65% of the time;
the decision tree classified 72% correctly.
British Petroleum designed a decision tree for gas-oil separation
for offshore oil platforms.
It replaced an earlier rule-based expert system.
Cessna designed an airplane flight controller using 90,000
examples and 20 attributes per example.
Summary of DT Learning
Inducing decision trees is one of the most widely used learning
methods in practice
Can out-perform human experts in many problems
Strengths include
Fast
simple to implement
can convert result to a set of easily interpretable rules
empirically valid in many commercial products
handles noisy data
Weaknesses include:
"Univariate" splits/partitioning using only one attribute at a time so limits
types of possible trees
large decision trees may be hard to understand
requires fixed-length feature vectors
References
Chapter 18 of “Artificial Intelligence: A Modern Approach” by Stuart Russell and Peter Norvig.
Chapter 10 of “AI Illuminated” by Ben Coppin.