Chapter 4A Tutorial Questions and Solutions
1. Bayesian classifiers can predict class membership probabilities, in other words, __________
a. The probability that a given tuple belongs to a particular class.
b. The probability that a given tuple does not belong to a particular class.
c. None of the above.
(Entropy measures the impurity of a node; as we go down the decision tree, entropy decreases.)
5. How do you choose the right node while constructing a decision tree?
a. An attribute having high entropy
b. An attribute having high entropy and information gain
c. An attribute having the lowest information gain.
d. An attribute having the highest information gain.
(We first select the attribute with the highest information gain.)
6. In a naive Bayes algorithm, when an attribute value in the testing record has no example in the training set, the entire posterior probability will be zero.
a. True
b. False
c. None of the above.
(If an attribute value never occurs with a given class in the training dataset, its conditional probability is estimated as zero, which makes the whole posterior product zero. This is the zero-probability problem of the naive Bayes algorithm; the usual remedy is the Laplacian correction, sketched below.)
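As a hedged illustration of that remedy, here is a minimal Python sketch of the Laplacian (add-one) correction; the function name and the toy data are my own, not part of the tutorial:

```python
from collections import Counter

def laplace_prob(value, observed, domain_size, alpha=1.0):
    """Estimate P(attribute = value | class) with the Laplacian correction.

    observed:    attribute values seen in training tuples of one class
    domain_size: number of distinct values the attribute can take
    alpha = 1 gives the classic add-one Laplace estimator.
    """
    counts = Counter(observed)
    return (counts[value] + alpha) / (len(observed) + alpha * domain_size)

# Without the correction an unseen value contributes probability 0 and
# zeroes out the whole posterior product; with it the estimate stays positive.
print(laplace_prob("overcast", ["sunny", "rain", "sunny"], domain_size=3))  # ~0.167
```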
For questions 10 and 11, show ALL calculations and workings. You will be required to submit your answers next week.
Decision Tree Classification
10) For the following medical diagnosis data, create a decision tree using the ID3 method.
Class counts: Strep Throat = 3, Allergy = 3, Cold = 4 (total = 10 tuples)
$$\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$$

$$\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \mathrm{Info}(D_j)$$
Attribute Selection: Information Gain
The expected information needed to classify a tuple in D:
$$\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2(p_i) = -\frac{3}{10}\log_2\frac{3}{10} - \frac{3}{10}\log_2\frac{3}{10} - \frac{4}{10}\log_2\frac{4}{10} = 1.571 \text{ bits}$$
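The two formulas above are easy to check in code. Below is a minimal Python sketch (the helper names info and info_attr are my own) that reproduces the 1.571-bit hand calculation:

```python
from math import log2

def info(counts):
    """Info(D): expected information (entropy) for a list of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def info_attr(partitions):
    """Info_A(D): weighted entropy after splitting D on attribute A.

    partitions: one list of class counts per attribute value (D_1 ... D_v).
    """
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * info(p) for p in partitions)

print(round(info([3, 3, 4]), 3))  # 1.571, matching the hand calculation above
```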
(i) Sore Throat

Sore Throat (ST) | Strep throat | Allergy | Cold
Yes | 2 | 1 | 2
No | 1 | 2 | 2

(Each branch, ST = yes and ST = no, covers 5 of the 10 samples.)
$$\mathrm{Info}_{ST}(D) = \frac{5}{10}\times\left(-\frac{2}{5}\log_2\frac{2}{5} - \frac{1}{5}\log_2\frac{1}{5} - \frac{2}{5}\log_2\frac{2}{5}\right) + \frac{5}{10}\times\left(-\frac{1}{5}\log_2\frac{1}{5} - \frac{2}{5}\log_2\frac{2}{5} - \frac{2}{5}\log_2\frac{2}{5}\right) = 1.522 \text{ bits}$$
(ii) Fever
Fever (F) | Strep throat | Allergy | Cold
Yes | 1 | 0 | 4
No | 2 | 3 | 0
$$\mathrm{Info}_{F}(D) = \frac{5}{10}\times\left(-\frac{1}{5}\log_2\frac{1}{5} - \frac{4}{5}\log_2\frac{4}{5}\right) + \frac{5}{10}\times\left(-\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5}\right) = 0.5 \times 0.722 + 0.5 \times 0.971 = 0.85 \text{ bits}$$
(iii) Swollen Glands
Swollen Glands (SG) | Strep throat | Allergy | Cold
Yes | 3 | 0 | 0
No | 0 | 3 | 4
$$\mathrm{Info}_{SG}(D) = \frac{3}{10}\times\left(-\frac{3}{3}\log_2\frac{3}{3}\right) + \frac{7}{10}\times\left(-\frac{3}{7}\log_2\frac{3}{7} - \frac{4}{7}\log_2\frac{4}{7}\right) = 0.3 \times 0 + 0.7 \times 0.985 = 0.690 \text{ bits}$$
(iv) Congestion
Congestion (C) | Strep throat | Allergy | Cold
Yes | 1 | 3 | 4
No | 2 | 0 | 0
$$\mathrm{Info}_{C}(D) = \frac{8}{10}\times\left(-\frac{1}{8}\log_2\frac{1}{8} - \frac{3}{8}\log_2\frac{3}{8} - \frac{4}{8}\log_2\frac{4}{8}\right) + \frac{2}{10}\times\left(-\frac{2}{2}\log_2\frac{2}{2}\right) = 0.8 \times 1.405 + 0 = 1.124 \text{ bits}$$
(v) Headache
Headache (H) | Strep throat | Allergy | Cold
Yes | 1 | 2 | 2
No | 2 | 1 | 2
$$\mathrm{Info}_{H}(D) = \frac{5}{10}\times\left(-\frac{1}{5}\log_2\frac{1}{5} - \frac{2}{5}\log_2\frac{2}{5} - \frac{2}{5}\log_2\frac{2}{5}\right) + \frac{5}{10}\times\left(-\frac{2}{5}\log_2\frac{2}{5} - \frac{1}{5}\log_2\frac{1}{5} - \frac{2}{5}\log_2\frac{2}{5}\right) = 1.522 \text{ bits}$$
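Before picking the root, it is worth double-checking the arithmetic. The following self-contained Python sketch recomputes Info_A(D) and the information gain Gain(A) = Info(D) − Info_A(D) for all five attributes, using the class counts from the tables above:

```python
from math import log2

def info(counts):
    """Info(D): expected information (entropy) for a list of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# Class counts (Strep Throat, Allergy, Cold) per attribute value,
# taken from the tables above.
splits = {
    "Sore Throat":    [[2, 1, 2], [1, 2, 2]],
    "Fever":          [[1, 0, 4], [2, 3, 0]],
    "Swollen Glands": [[3, 0, 0], [0, 3, 4]],
    "Congestion":     [[1, 3, 4], [2, 0, 0]],
    "Headache":       [[1, 2, 2], [2, 1, 2]],
}

info_d = info([3, 3, 4])  # 1.571 bits
for attr, parts in splits.items():
    info_a = sum(sum(p) / 10 * info(p) for p in parts)
    print(f"Gain({attr}) = {info_d - info_a:.3f} bits")
# Swollen Glands has the highest gain (~0.881), so it is chosen as the root.
```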
Swollen Glands gives the smallest Info_A(D) (0.690 bits) and therefore the highest information gain: Gain(SG) = 1.571 − 0.690 = 0.881 bits. It is therefore selected as the root node.

Resulting decision tree:

Swollen Glands
 ├─ Yes → Strep Throat
 └─ No → Fever
     ├─ Yes → Cold
     └─ No → Allergy
Naive Bayes Classification
11) For the given test record, determine, using the Naive Bayes Classifier, whether the weather is suitable for playing golf: X = (outlook = sunny, temperature = mild, humidity = normal, wind = true).

Class counts: Yes = 9, No = 5 (total = 14 tuples)
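A hedged sketch of the naive Bayes computation for this question: the priors 9/14 and 5/14 come from the counts above, but the conditional probabilities below ASSUME the classic 14-day "play golf" training table, which is not reproduced in this handout; substitute the counts from the actual table if they differ:

```python
def nb_score(prior, cond_probs):
    """Unnormalised naive Bayes posterior: P(C) times the product of P(x_i | C)."""
    score = prior
    for p in cond_probs:
        score *= p
    return score

# Priors from the counts above: P(yes) = 9/14, P(no) = 5/14.
# Conditional probabilities for (sunny, mild, normal, wind=true) given each
# class are ASSUMED from the classic golf dataset; verify before submitting.
p_yes = nb_score(9 / 14, [2 / 9, 4 / 9, 6 / 9, 3 / 9])
p_no = nb_score(5 / 14, [3 / 5, 2 / 5, 1 / 5, 3 / 5])

print(f"P(yes | X) proportional to {p_yes:.5f}")  # ~0.01411
print(f"P(no  | X) proportional to {p_no:.5f}")   # ~0.01029
print("Suitable for golf" if p_yes > p_no else "Not suitable")
```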