C4.5 Decision Tree Step-by-Step Calculations
Step 1: Entropy Calculation
The dataset contains 6 'Pass' results and 4 'Fail' results.
Total examples: 10
Entropy formula:
H(S) = -(p_+ log2(p_+)) - (p_- log2(p_-))
p_+ = 6/10 = 0.6 (Pass), p_- = 4/10 = 0.4 (Fail)
H(S) = -(0.6 log2(0.6)) - (0.4 log2(0.4))
H(S) = 0.442 + 0.529 = 0.971
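This value can be checked numerically. Below is a minimal Python sketch (the entropy helper is illustrative, not part of the original worked example) that computes base-2 entropy from raw class counts:

```python
import math

def entropy(counts):
    """Base-2 (Shannon) entropy of a class distribution given as raw counts."""
    total = sum(counts)
    # Skip zero counts: the limit of p*log2(p) as p -> 0 is 0.
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(round(entropy([6, 4]), 3))  # 0.971
```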
Step 2: Information Gain Calculations
1. Assessment (Good, Average, Poor)
- Good: 6 examples (5 Pass, 1 Fail)
- Average: 3 examples (1 Pass, 2 Fail)
- Poor: 1 example (0 Pass, 1 Fail)
Entropy for 'Good' subset:
H(Good) = -(5/6 log2(5/6)) - (1/6 log2(1/6)) = 0.650
Entropy for 'Average' subset:
H(Average) = -(1/3 log2(1/3)) - (2/3 log2(2/3)) = 0.918
Entropy for 'Poor' subset:
H(Poor) = 0 (since all are Fail)
Weighted Entropy for 'Assessment':
H(Assessment) = (6/10) * 0.650 + (3/10) * 0.918 + (1/10) * 0
H(Assessment) = 0.665
Information Gain for 'Assessment':
IG(Assessment) = 0.971 - 0.665 = 0.306
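The weighted entropy and gain can be verified with a small helper built on the entropy function above; it is a hypothetical sketch, with each subset passed as a (Pass, Fail) count pair, and it is reused for the remaining attributes below:

```python
def information_gain(parent_counts, subsets):
    """IG = H(parent) - sum over subsets of |subset|/|parent| * H(subset)."""
    total = sum(parent_counts)
    weighted = sum(sum(s) / total * entropy(s) for s in subsets)
    return entropy(parent_counts) - weighted

# Assessment: Good (5 Pass, 1 Fail), Average (1, 2), Poor (0, 1)
print(round(information_gain([6, 4], [(5, 1), (1, 2), (0, 1)]), 3))
# 0.305 -- the 0.306 above comes from subtracting rounded intermediates
```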
2. Assignment (Yes, No)
- Yes: 6 examples (5 Pass, 1 Fail)
- No: 4 examples (1 Pass, 3 Fail)
Entropy for 'Yes' subset:
H(Yes) = -(5/6 log2(5/6)) - (1/6 log2(1/6)) = 0.650
Entropy for 'No' subset:
H(No) = -(1/4 log2(1/4)) - (3/4 log2(3/4)) = 0.811
Weighted Entropy for 'Assignment':
H(Assignment) = (6/10) * 0.650 + (4/10) * 0.811
H(Assignment) = 0.714
Information Gain for 'Assignment':
IG(Assignment) = 0.971 - 0.714 = 0.257
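The same hypothetical helper confirms this split:

```python
# Assignment: Yes (5 Pass, 1 Fail), No (1 Pass, 3 Fail)
print(round(information_gain([6, 4], [(5, 1), (1, 3)]), 3))
# 0.256 -- the 0.257 above reflects rounded intermediates
```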
3. Project (Yes, No)
- Yes: 5 examples (4 Pass, 1 Fail)
- No: 5 examples (2 Pass, 3 Fail)
Entropy for 'Yes' subset:
H(Yes) = -(4/5 log2(4/5)) - (1/5 log2(1/5)) = 0.722
Entropy for 'No' subset:
H(No) = -(2/5 log2(2/5)) - (3/5 log2(3/5)) = 0.971
Weighted Entropy for 'Project':
H(Project) = (5/10) * 0.722 + (5/10) * 0.971
H(Project) = 0.846
Information Gain for 'Project':
IG(Project) = 0.971 - 0.846 = 0.125
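And for 'Project':

```python
# Project: Yes (4 Pass, 1 Fail), No (2 Pass, 3 Fail)
print(round(information_gain([6, 4], [(4, 1), (2, 3)]), 3))  # 0.125
```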
4. Seminar (Good, Poor, Fair)
- Good: 4 examples (4 Pass, 0 Fail)
- Poor: 3 examples (1 Pass, 2 Fail)
- Fair: 3 examples (1 Pass, 2 Fail)
Entropy for 'Good' subset:
H(Good) = 0 (since all are Pass)
Entropy for 'Poor' subset:
H(Poor) = -(1/3 log2(1/3)) - (2/3 log2(2/3)) = 0.918
Entropy for 'Fair' subset:
H(Fair) = -(1/3 log2(1/3)) - (2/3 log2(2/3)) = 0.918
Weighted Entropy for 'Seminar':
H(Seminar) = (4/10) * 0 + (3/10) * 0.918 + (3/10) * 0.918
H(Seminar) = 0.551
Information Gain for 'Seminar':
IG(Seminar) = 0.971 - 0.551 = 0.420
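And for 'Seminar':

```python
# Seminar: Good (4 Pass, 0 Fail), Poor (1, 2), Fair (1, 2)
print(round(information_gain([6, 4], [(4, 0), (1, 2), (1, 2)]), 3))  # 0.42
```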
Step 3: Choose the Best Attribute
The attribute with the highest information gain is 'Seminar', with IG = 0.420, so 'Seminar' is
chosen as the root node.
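Putting the four results together, the root choice can be reproduced with the same sketch:

```python
# Gain for each attribute, computed at full precision:
gains = {
    "Assessment": information_gain([6, 4], [(5, 1), (1, 2), (0, 1)]),
    "Assignment": information_gain([6, 4], [(5, 1), (1, 3)]),
    "Project":    information_gain([6, 4], [(4, 1), (2, 3)]),
    "Seminar":    information_gain([6, 4], [(4, 0), (1, 2), (1, 2)]),
}
print(max(gains, key=gains.get))  # Seminar
```

Because every example in the 'Good' branch is a Pass, that branch becomes a leaf immediately; the 'Poor' and 'Fair' branches are split further by repeating the same entropy and gain calculation on their three-example subsets.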