
Chapter 4 Machine Learning

COMP 472 Artificial Intelligence

Russell & Norvig – Section 18.1 & 18.2


2 Supervised Learning Algorithms

´ Linear Regression
´ Logistic Regression
´ Naïve Bayes Classifier
´ Decision Tree
´ Random Forest
3 Decision Tree

´ One of the simplest, yet most successful, forms of learning algorithm


´ The best-known algorithm is ID3 (Quinlan, 1986) and its successor C4.5
´ Look for features that are good indicators of the result, and place these features (as questions) in the nodes of the tree
´ Split the examples so that those with different values for the chosen feature end up in different subsets
´ Repeat the same process with another feature

*ID3 = Iterative Dichotomiser 3


4 ID3* / C4.5 Algorithm

´ Top-down construction of the decision tree


´ Recursive selection of the "best feature" to use at the current node in the tree
´ Once the feature is selected for the current node, generate child nodes, one for each possible value of the selected feature
´ Partition the examples using the possible values of this feature, and assign these subsets of examples to the appropriate child node
´ Repeat for each child node until all examples associated with a node are classified (see the sketch below)
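A minimal Python sketch of this top-down recursion (not from the slides), assuming each example is a dict mapping feature names to values and `target` names the output column; `information_gain` is the selection criterion introduced on slides 17–21 and sketched there, and all helper names here are illustrative:

```python
from collections import Counter

def id3(examples, features, target):
    """Top-down, recursive ID3-style construction of a decision tree,
    represented as nested dicts: {feature: {value: subtree_or_leaf}}."""
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:          # all examples share one class: leaf
        return labels[0]
    if not features:                   # no features left: majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    # pick the feature with the highest information gain
    # (information_gain is sketched after slide 21)
    best = max(features, key=lambda f: information_gain(examples, f, target))
    tree = {best: {}}
    for value in set(ex[best] for ex in examples):
        subset = [ex for ex in examples if ex[best] == value]
        rest = [f for f in features if f != best]
        tree[best][value] = id3(subset, rest, target)
    return tree
```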
5 Example 1
Information on last year’s students to determine if a student will get an ‘A’ this year

Features (X)                                                 Output f(X)

Student       'A' last year?  Black hair?  Works hard?  Drinks?  'A' this year?
X1: Richard   Yes             Yes          No           Yes      No
X2: Alan      Yes             Yes          Yes          No       Yes
X3: Alison    No              No           Yes          No       No
X4: Jeff      No              Yes          No           Yes      No
X5: Gail      Yes             No           Yes          Yes      Yes
X6: Simon     No              Yes          Yes          Yes      No
6 Example 1
Decision tree learned from the data above (the training table is repeated on the slide):

'A' last year?
  no  -> Output = No
  yes -> Works hard?
           yes -> Output = Yes
           no  -> Output = No
7 Example 2 The Restaurant
´ Goal: learn whether one should wait for a table
´ Attributes
´ Alternate: another suitable restaurant nearby
´ Bar: comfortable bar for waiting
´ Fri/Sat: true on Fridays and Saturdays
´ Hungry: whether one is hungry
´ Patrons: how many people are present (none, some, full)
´ Price: price range ($, $$, $$$)
´ Raining: raining outside
´ Reservation: reservation made
´ Type: kind of restaurant (French, Italian, Thai, Burger)
´ WaitEstimate: estimated wait by host (0-10 mins, 10-30, 30-60, >60)
8 Example 2 The Restaurant
´ Training Data
9 A First Decision Tree
Is it the best decision tree we can build?
10 Ockham's Razor Principle
It is vain to do more than can be done with less… Entities should
not be multiplied beyond necessity. [Ockham, 1324]

´ In other words… always favor the simplest answer that correctly fits
the training data
´ i.e. the smallest tree on average
´ This type of assumption is called inductive bias
´ inductive bias = making a choice beyond what the training
instances contain
11 Finding the Best Tree
[Figure: the search space of decision trees, from the empty tree down to a complete tree]

´ Finding the best tree can be seen as searching the space of all possible decision trees
´ Inductive bias: prefer shorter trees on average
´ How?
  ´ search the space of all decision trees
  ´ always pick the next attribute to split the data on based on its "discriminating power" (information gain)
  ´ in effect, a steepest-ascent hill-climbing search where the heuristic is information gain
12 Which Tree is the Best?

[Figure: two trees over features F1–F7 with the same leaves: a deep, chain-like tree (F1 → F2 → … → F7, one class leaf at each level) versus a shallow, balanced tree (F1 at the root, F2 and F3 below it, F4–F7 above eight class leaves)]
13 Choosing the Next Feature

´ The key problem is choosing which feature to split a given set of examples on
´ ID3 uses Maximum Information Gain:
  ´ Choose the attribute that has the largest information gain
  ´ i.e., the attribute that will result in the smallest expected size of the subtrees rooted at its children
  ´ based on information theory
14 Intuitively
´ Patrons:
  ´ If value is Some… all outputs = Yes
  ´ If value is None… all outputs = No
  ´ If value is Full… we need more tests
´ Type:
  ´ If value is French… we need more tests
  ´ If value is Italian… we need more tests
  ´ If value is Thai… we need more tests
  ´ If value is Burger… we need more tests
´ …
´ So Patrons may lead to a shorter tree…
15 Next Feature

´ For only data where patron = Full


´ hungry
´ If value is Yes… we need more tests
´ If value is No… all output= No
´ type:
´ If value is French… all output= No
´ If value is Italian… all output= No
´ If value is Thai… we need more tests
´ If value is Burger… we need more tests
´…
´ So hungry is more discriminating (only 1 new branch)…
16 Next Feature

´ 4 tests instead of 9
´ 11 branches instead of 21
17 Choosing the Next Attribute

´ The key problem is choosing which feature to split a given set of examples on
´ Most-used strategy: information theory

Entropy (or information content):

    H(X) = -\sum_{x_i \in X} p(x_i) \log_2 p(x_i)

Example: entropy of a fair coin toss, with 2 possible outcomes each of probability 1/2:

    H(fair coin toss) = H(1/2, 1/2) = -(1/2 \log_2 1/2 + 1/2 \log_2 1/2) = 1 bit
18 Entropy

´ Let X be a discrete random variable (RV) with n possible outcomes x_i


´ Entropy (or information content):

    H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)

  ´ measures the amount of information in a RV
  ´ average uncertainty of a RV
  ´ the average length of the message needed to transmit an outcome x_i of that variable
  ´ measured in bits
´ for only 2 outcomes x_1 and x_2: 0 ≤ H(X) ≤ 1
19 Why -p(x)log2 (p(x))
20 Example: The Coin Flip
´ Fair coin:
    H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i) = -(1/2 \log_2 1/2 + 1/2 \log_2 1/2) = 1 bit

´ Rigged coin (99% heads, 1% tails):
    H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i) = -(99/100 \log_2 99/100 + 1/100 \log_2 1/100) ≈ 0.08 bits

fair coin -> high entropy
rigged coin -> low entropy

[Figure: entropy plotted as a function of P(head), peaking at 1 bit when P(head) = 1/2]
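As a sanity check, a small Python sketch (not from the slides) that reproduces the two entropy values above:

```python
import math

def entropy(probs):
    """H(X) = -sum_i p(x_i) * log2 p(x_i), using the convention 0 * log2(0) = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))      # fair coin   -> 1.0 bit
print(entropy([0.99, 0.01]))    # rigged coin -> ~0.08 bits
```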
21 Choosing the Best Feature

´The "discriminating power" of an attribute A given a data set S


´ Let Values(A) = the set of values that attribute A can take
´ Let Sv = the set of examples in the data set which have value
v for attribute A (for each value v from Values(A) )

information gain (or


entropy reduction)

gain(S, A) = H(S) - H(S | A)


Sv
= H(S) - å x H(Sv )
v Î values(A) S
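A minimal Python sketch of this criterion (illustrative, not from the slides), building on the `entropy` helper sketched after slide 20; examples are assumed to be dicts mapping attribute names to values, with `target` naming the output column. This is the `information_gain` helper that the ID3 sketch after slide 4 relies on:

```python
from collections import Counter

def entropy_of(examples, target):
    """Entropy of the class distribution in a set of examples."""
    counts = Counter(ex[target] for ex in examples)
    total = sum(counts.values())
    return entropy([c / total for c in counts.values()])

def information_gain(examples, attribute, target):
    """gain(S, A) = H(S) - sum_v (|S_v| / |S|) * H(S_v)."""
    remainder = 0.0
    for value in set(ex[attribute] for ex in examples):
        subset = [ex for ex in examples if ex[attribute] == value]
        remainder += len(subset) / len(examples) * entropy_of(subset, target)
    return entropy_of(examples, target) - remainder
```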
22 Some Intuition
Size   Color  Shape   Output
Big    Red    Circle  +
Small  Red    Circle  +
Small  Red    Square  -
Big    Blue   Circle  -

´ Size is the least discriminating attribute (i.e. smallest information gain)
´ Shape and Color are the most discriminating attributes (i.e. highest information gain)
23 A Small Example 1
Size   Color  Shape   Output
Big    Red    Circle  +
Small  Red    Circle  +
Small  Red    Square  -
Big    Blue   Circle  -

Values(Color) = {red, blue}
    n_red: 2+ 1-        n_blue: 0+ 1-

gain(S, Color) = H(S) - \sum_{v \in Values(Color)} \frac{|S_v|}{|S|} H(S_v)

H(S) = -(2/4 \log_2 2/4 + 2/4 \log_2 2/4) = 1

for each v of Values(Color):
    H(S | Color = red)  = H(2/3, 1/3) = -(2/3 \log_2 2/3 + 1/3 \log_2 1/3) = 0.918
    H(S | Color = blue) = H(0, 1) = -(1 \log_2 1) = 0

H(S | Color) = 3/4 (0.918) + 1/4 (0) = 0.6885

gain(Color) = H(S) - H(S | Color) = 1 - 0.6885 = 0.3115


24 A Small Example 2
Size   Color  Shape   Output
Big    Red    Circle  +
Small  Red    Circle  +
Small  Red    Square  -
Big    Blue   Circle  -

Values(Shape) = {circle, square}
    n_circle: 2+ 1-     n_square: 0+ 1-

Note: by definition, \log_2 0 = -∞, but 0 \log_2 0 is taken to be 0

H(S) = -(2/4 \log_2 2/4 + 2/4 \log_2 2/4) = 1

H(S | Shape) = 3/4 (0.918) + 1/4 (0) = 0.6885

gain(Shape) = H(S) - H(S | Shape) = 1 - 0.6885 = 0.3115
25 A Small Example 3
Size   Color  Shape   Output
Big    Red    Circle  +
Small  Red    Circle  +
Small  Red    Square  -
Big    Blue   Circle  -

Values(Size) = {big, small}
    n_big: 1+ 1-        n_small: 1+ 1-

H(S) = -(2/4 \log_2 2/4 + 2/4 \log_2 2/4) = 1

H(S | Size) = 1/2 (1) + 1/2 (1) = 1

gain(Size) = H(S) - H(S | Size) = 1 - 1 = 0
26 A Small Example 4
Size   Color  Shape   Output
Big    Red    Circle  +
Small  Red    Circle  +
Small  Red    Square  -
Big    Blue   Circle  -

gain(Shape) = 0.3115
gain(Color) = 0.3115
gain(Size)  = 0

´ So first separate according to either Color or Shape (root of the tree)
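For reference, a small Python check of these numbers (not from the slides), reusing the `information_gain` helper sketched after slide 21:

```python
toy = [
    {"Size": "Big",   "Color": "Red",  "Shape": "Circle", "Output": "+"},
    {"Size": "Small", "Color": "Red",  "Shape": "Circle", "Output": "+"},
    {"Size": "Small", "Color": "Red",  "Shape": "Square", "Output": "-"},
    {"Size": "Big",   "Color": "Blue", "Shape": "Circle", "Output": "-"},
]

for attr in ("Shape", "Color", "Size"):
    print(attr, round(information_gain(toy, attr, "Output"), 4))
# Shape 0.3113, Color 0.3113, Size 0.0
# (the slides' 0.3115 comes from rounding H(2/3, 1/3) to 0.918 before subtracting)
```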
27 A Small Example 4
Size   Color  Shape   Output
Big    Red    Circle  +
Small  Red    Circle  +
Small  Red    Square  -
Big    Blue   Circle  -

Tree so far: Color at the root
    red  -> subset S2 (split next on Size? or Shape?)
    blue -> Output = -

H(S2) = -(2/3 \log_2 2/3 + 1/3 \log_2 1/3) = 0.918

for each v of Values(Size):
    H(S2 | Size = big)   = H(1/1, 0/1) = 0
    H(S2 | Size = small) = H(1/2, 1/2) = 1
for each v of Values(Shape):
    H(S2 | Shape = circle) = H(2/2, 0/2) = 0
    H(S2 | Shape = square) = H(0/1, 1/1) = 0

H(S2 | Size)  = 1/3 (0) + 2/3 (1) = 2/3
H(S2 | Shape) = 2/3 (0) + 1/3 (0) = 0

gain(Size)  = H(S2) - H(S2 | Size)  = 0.918 - 0.667 = 0.251
gain(Shape) = H(S2) - H(S2 | Shape) = 0.918 - 0     = 0.918

⇒ Shape has the higher gain, so S2 is split on Shape next.
28 Back to the Restaurant
´ Training data:
29 The Restaurant Example
gain(alt) = ...    gain(bar) = ...    gain(fri) = ...    gain(hun) = ...

gain(pat) = 1 - [ 2/12 \cdot H(0/2, 2/2) + 4/12 \cdot H(0/4, 4/4) + 6/12 \cdot H(2/6, 4/6) ]
          = 1 - [ 2/12 \cdot 0 + 4/12 \cdot 0 + 6/12 \cdot 0.918 ] ≈ 0.541 bits

gain(price) = ...    gain(rain) = ...    gain(res) = ...

gain(type) = 1 - [ 2/12 \cdot H(1/2, 1/2) + 2/12 \cdot H(1/2, 1/2) + 4/12 \cdot H(2/4, 2/4) + 4/12 \cdot H(2/4, 2/4) ] = 0 bits

gain(est) = ...

´ Attribute pat (Patrons) has the highest gain, so the root of the tree should be the attribute Patrons
´ do recursively for the subtrees
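A quick numerical check of these two gains in Python (not from the slides), using the per-value class counts of the restaurant training data and the `entropy` helper sketched after slide 20:

```python
def gain_from_counts(total, groups):
    """gain = H(parent) - sum_v (|S_v| / total) * H(S_v), where the parent
    entropy is 1 bit (6 'wait' vs. 6 'do not wait' examples) and each group
    is a (positives, negatives) count pair for one attribute value."""
    remainder = sum((p + n) / total * entropy([p / (p + n), n / (p + n)])
                    for p, n in groups)
    return 1 - remainder

# Patrons: None = (0+, 2-), Some = (4+, 0-), Full = (2+, 4-)
print(gain_from_counts(12, [(0, 2), (4, 0), (2, 4)]))            # ~0.541
# Type: French = (1+, 1-), Italian = (1+, 1-), Thai = (2+, 2-), Burger = (2+, 2-)
print(gain_from_counts(12, [(1, 1), (1, 1), (2, 2), (2, 2)]))    # 0.0
```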
30 Decision Boundaries
[Figure: a 2-D feature space (Feature 1 vs. Feature 2) with unlabeled data points]

31 Decision Boundaries
[Figure: a first split "Feature 2 > t1" draws a boundary at t1 on the Feature 2 axis; the tree so far tests Feature 2 > t1, with one branch still unresolved (??)]

32 Decision Boundaries
[Figure: a second split "Feature 1 > t2" adds a boundary at t2 on the Feature 1 axis; the tree tests Feature 2 > t1, then Feature 1 > t2, with one region still unresolved (??)]

33 Decision Boundaries
[Figure: a third split "Feature 2 > t3" adds a boundary at t3; the final tree tests Feature 2 > t1, Feature 1 > t2, and Feature 2 > t3, carving the feature space into axis-aligned rectangles]
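To make the geometry concrete, here is a hypothetical classifier in the same spirit (the thresholds, tree shape, and class names are invented for illustration and are not taken from the figures): each root-to-leaf path of threshold tests selects one axis-aligned rectangle of the plane.

```python
# Illustrative thresholds; the slides do not give values for t1, t2, t3
T1, T2, T3 = 3.0, 5.0, 7.0

def classify(feature_1, feature_2):
    """Nested threshold tests partition the (feature_1, feature_2) plane
    into axis-aligned rectangular regions, one per leaf."""
    if feature_2 > T1:
        if feature_1 > T2:
            return "class A"   # region: feature_2 > t1 and feature_1 > t2
        return "class B"       # region: feature_2 > t1 and feature_1 <= t2
    if feature_1 > T3:
        return "class C"       # region: feature_2 <= t1 and feature_1 > t3
    return "class D"           # region: feature_2 <= t1 and feature_1 <= t3
```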
34 Supervised Learning Algorithms

´ Linear Regression
´ Logistic Regression
´ Naïve Bayes Classifier
´ Decision Tree
´ Random Forest
35 Random Forest

´ A Random Forest builds multiple decision trees (the "forest") and combines their predictions to get a more accurate and stable result (see the sketch below).
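For orientation, a minimal scikit-learn sketch of this idea (the library choice and the 0/1 encoding of the categorical columns are mine, not the slides'); the rows are the heart-disease table from the next slides:

```python
from sklearn.ensemble import RandomForestClassifier

# Features: [blood flow normal?, blocked arteries?, chest pain?, weight]
X = [[1, 1, 1, 195],
     [0, 0, 0, 130],
     [1, 0, 1, 218],
     [0, 1, 1, 180]]
y = [1, 0, 0, 1]   # heart disease: yes = 1, no = 0

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
print(forest.predict([[0, 0, 1, 185]]))   # majority vote across the 100 trees
```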
36 Why Random Forest

´ More accuracy
  ´ reduces the variance of the predictions by combining the results of multiple decision trees built on different samples of the data set
´ Example: use the following data set to create a Random Forest that predicts whether a person has heart disease or not.
37 Creating a Random Forest

´ Example: use the following data set to create a Random Forest that predicts whether a person has heart disease or not.

Blood Flow  Blocked Arteries  Chest Pain  Weight  Heart Disease
Normal      Yes               Yes         195     Yes
Abnormal    No                No          130     No
Normal      No                Yes         218     No
Abnormal    Yes               Yes         180     Yes
38 Creating a Random Forest

´ Step 1: Create a Bootstrapped Data Set
  Bootstrapping is an estimation method used to make predictions on a data set by re-sampling it (drawing rows with replacement, so some rows repeat), as sketched below.

Blood Flow  Blocked Arteries  Chest Pain  Weight  Heart Disease
Normal      Yes               Yes         195     Yes
Abnormal    No                No          130     No
Abnormal    Yes               Yes         180     Yes
Abnormal    Yes               Yes         180     Yes
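A minimal Python sketch of this re-sampling step (illustrative, not the slides' exact procedure):

```python
import random

def bootstrap_sample(rows):
    """Draw len(rows) rows with replacement: some rows repeat, others are left out."""
    return random.choices(rows, k=len(rows))

data = [
    ("Normal",   "Yes", "Yes", 195, "Yes"),
    ("Abnormal", "No",  "No",  130, "No"),
    ("Normal",   "No",  "Yes", 218, "No"),
    ("Abnormal", "Yes", "Yes", 180, "Yes"),
]
print(bootstrap_sample(data))
```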
39 Creating a Random Forest

´ Step 2: Creating a Decision Tree
  ´ Build a decision tree using the bootstrapped data set
  ´ Begin at the root node and choose the best attribute to split the data set on
  ´ Repeat the same process for each of the upcoming branch nodes

[Figure: a decision tree whose root node tests Blocked Arteries]
40 Creating a Random Forest

´ Step 3: Go Back to Step 1 and Repeat
  ´ Each decision tree predicts the output class based on the respective predictor variables used in that tree.
  ´ Go back to Step 1, create a new bootstrapped data set, and build a decision tree by considering only a subset of variables at each step.
  ´ This iteration is performed hundreds of times, creating multiple decision trees.
41 Creating a Random Forest

´ Step 4: Predicting the Outcome of a New Data Point
  ´ To predict whether a new patient has heart disease, run the new data down each of the decision trees.
  ´ After running the data down all the trees in the Random Forest, check which class got the majority of the votes (see the sketch below).

A new patient's record:

Blood Flow  Blocked Arteries  Chest Pain  Weight  Heart Disease
Abnormal    No                Yes         185     ?

Vote tally shown on the slide: Heart Disease Yes = 1, No = 0  ->  Output: Yes
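A short Python sketch of the voting step (illustrative; `trees` and `predict_tree` are hypothetical stand-ins for the forest built in Steps 1–3):

```python
from collections import Counter

def forest_predict(trees, new_row):
    """Run the new record down every tree and return the class with the most votes."""
    # predict_tree(tree, row) is a hypothetical helper that classifies `row` with one tree
    votes = Counter(predict_tree(tree, new_row) for tree in trees)
    return votes.most_common(1)[0][0]
```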
42 Creating a Random Forest

´ Step 5: Evaluate the Model
  ´ In a real-world problem, about 1/3 of the original data set is not included in any given bootstrapped data set.
  ´ The samples that do not appear in the bootstrapped data set are known as the Out-Of-Bag (OOB) data set.
  ´ We can measure the accuracy of a Random Forest by the proportion of OOB samples that are correctly classified.

Bootstrapped data set:
Blood Flow  Blocked Arteries  Chest Pain  Weight  Heart Disease
Normal      Yes               Yes         195     Yes
Abnormal    No                No          130     No
Abnormal    Yes               Yes         180     Yes
Abnormal    Yes               Yes         180     Yes

OOB data set (testing data set):
Blood Flow  Blocked Arteries  Chest Pain  Weight  Heart Disease
Normal      No                Yes         218     No


43 The End
