Lab_Manual2 (2)
Practical: 6
Aim: Write a program to demonstrate the working of the decision tree-based
ID3 algorithm.
The ID3 algorithm is a popular algorithm used to create a Decision Tree by selecting the
attribute that maximizes the information gain at each node. This program will:
1. Build a simple dataset.
2. Implement the ID3 algorithm to construct the decision tree.
3. Print the tree structure.
Explanation:
1. Entropy Calculation:
○ Measures the uncertainty in the dataset.
2. Information Gain:
○ Measures the reduction in entropy after splitting the dataset on an attribute.
3. ID3 Algorithm:
○ Recursively selects the best attribute to split the data and builds the tree.
The ID3 (Iterative Dichotomiser 3) algorithm is a popular method used to build Decision Trees.
It splits the data based on the attribute that provides the highest Information Gain at each
step, and continues until the data is fully classified or no further splits can be made.
Steps:
1. Dataset:
The program uses the classic PlayTennis dataset: 14 examples described by the attributes Outlook, Temperature, Humidity, and Wind, with the label PlayTennis (Yes/No).
Enrollment No: 2203031050150
Name: Yash Chudgar
Faculty of Engineering & Technology
Subject-Name: Machine Learning Laboratory
Subject-Code: 303105354
B.Tech – 3rd Year – 6th Sem
2. Entropy:
Entropy is a measure of uncertainty or impurity in the dataset. It tells us how mixed or pure
a dataset is with respect to its labels. The formula for entropy is:

H(S) = −Σᵢ pᵢ · log₂(pᵢ)

Where:
● pᵢ is the probability of label i.
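As a quick numeric check of this formula, the sketch below computes the entropy of a label list (the helper name entropy_of is illustrative, not part of the lab code). With 9 "Yes" and 5 "No" labels, as in the full PlayTennis dataset, the entropy works out to about 0.940 bits:

```python
import math

def entropy_of(labels):
    # H(S) = -sum over distinct labels v of p_v * log2(p_v)
    total = len(labels)
    return -sum((labels.count(v) / total) * math.log2(labels.count(v) / total)
                for v in set(labels))

labels = ['Yes'] * 9 + ['No'] * 5
print(round(entropy_of(labels), 3))  # ≈ 0.94
```

A perfectly pure set (all one label) has entropy 0, and a 50/50 binary split has the maximum entropy of 1 bit.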
3. Information Gain:
Information Gain is the reduction in entropy after a dataset is split on an attribute. It is given by:

Gain(S, A) = H(S) − Σᵥ (|Sᵥ| / |S|) · H(Sᵥ)

Where:
● Sᵥ is the subset of S for which attribute A takes the value v.
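As a worked example of this formula (the helper H is illustrative, not part of the lab code), consider splitting the 14 PlayTennis examples on Wind: the Weak branch holds 6 Yes / 2 No and the Strong branch 3 Yes / 3 No, giving a gain of roughly 0.048 bits:

```python
import math

def H(pos, neg):
    # Binary entropy from class counts; contributes 0 when a class is empty
    total = pos + neg
    ent = 0.0
    for c in (pos, neg):
        if c:
            p = c / total
            ent -= p * math.log2(p)
    return ent

# Gain(S, Wind) = H(S) - (8/14) * H(Weak subset) - (6/14) * H(Strong subset)
gain = H(9, 5) - (8/14) * H(6, 2) - (6/14) * H(3, 3)
print(round(gain, 3))  # ≈ 0.048
```

ID3 computes this gain for every attribute and splits on the largest; on this dataset Outlook has the highest gain, which is why it becomes the root of the tree.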
Summary of Steps:
1. Calculate Entropy to measure uncertainty.
2. Calculate Information Gain for each attribute to find the best split.
3. Split the Dataset on the attribute with the highest gain.
4. Build the Tree recursively until all data is classified or no further splits can be made.
Code:
import math
import pandas as pd
def entropy(data):
    # Entropy of the label (last) column: H = -sum(p_i * log2(p_i))
    labels = data.iloc[:, -1]
    ent = 0.0
    for count in labels.value_counts():
        p = count / len(labels)
        ent -= p * math.log2(p)
    return ent

def information_gain(data, attribute):
    # Entropy before the split minus the weighted entropy of each subset
    total = len(data)
    weighted = sum((len(subset) / total) * entropy(subset)
                   for _, subset in data.groupby(attribute))
    return entropy(data) - weighted
def best_attribute_to_split(data):
    attributes = data.columns[:-1]
    gains = {attribute: information_gain(data, attribute) for attribute in attributes}
    return max(gains, key=gains.get)
def id3(data):
    labels = data.iloc[:, -1]
    # Base case: every remaining example has the same label
    if len(labels.unique()) == 1:
        return labels.iloc[0]
    # Base case: no attributes left to split on -> majority label
    if data.shape[1] == 1:
        return labels.mode()[0]
    best_attr = best_attribute_to_split(data)
    tree = {best_attr: {}}
    values = data[best_attr].unique()
    for value in values:
        subset = data[data[best_attr] == value]
        if entropy(subset) == 0:
            tree[best_attr][value] = subset.iloc[0, -1]
        else:
            tree[best_attr][value] = id3(subset.drop(columns=[best_attr]))
    return tree

def print_tree(tree, indent=""):
    # Walk the nested {attribute: {value: subtree}} dict, one level per indent
    if not isinstance(tree, dict):
        print(indent + "-> " + str(tree))
        return
    for attr, branches in tree.items():
        for value, subtree in branches.items():
            print(indent + attr + " = " + str(value))
            print_tree(subtree, indent + "    ")

# Classic PlayTennis dataset; the attribute rows are the standard values
# that correspond to the PlayTennis labels below
data = {
    'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Rain', 'Overcast', 'Sunny', 'Sunny', 'Rain', 'Sunny', 'Overcast', 'Overcast', 'Rain'],
    'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild'],
    'Humidity': ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'High'],
    'Wind': ['Weak', 'Weak', 'Weak', 'Weak', 'Weak', 'Strong', 'Strong', 'Weak', 'Weak', 'Weak', 'Strong', 'Strong', 'Weak', 'Strong'],
    'PlayTennis': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
}
df = pd.DataFrame(data)
decision_tree = id3(df)
print("Decision Tree:")
print_tree(decision_tree)
Output: