
Faculty of Engineering & Technology
Subject-Name: Machine Learning Laboratory
Subject-Code: 303105354
B.Tech – 3rd Year – 6th Sem
Name: Yash Chudgar
Enrollment No: 2203031050150

Practical: 6
Aim: Write a program to demonstrate the working of the decision tree-based
ID3 algorithm.
The ID3 algorithm is a popular algorithm used to create a Decision Tree by selecting the
attribute that maximizes the information gain at each node. This program will:
1. Build a simple dataset.
2. Implement the ID3 algorithm to construct the decision tree.
3. Print the tree structure.
Explanation:
1. Entropy Calculation:
○ Measures the uncertainty in the dataset.
2. Information Gain:
○ Measures the reduction in entropy after splitting the dataset on an attribute.
3. ID3 Algorithm:
○ Recursively selects the best attribute to split the data and builds the tree.
The ID3 (Iterative Dichotomiser 3) algorithm is a popular method used to build Decision Trees.
It splits the data based on the attribute that provides the highest Information Gain at each
step, and continues until the data is fully classified or no further splits can be made.

Steps:
1. Dataset:

Outlook   Temperature  Humidity  Windy  PlayTennis
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rain      Mild         High      False  Yes
Rain      Cool         Normal    False  Yes
Rain      Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rain      Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rain      Mild         High      True   No

2. Entropy:
Entropy is a measure of uncertainty or impurity in the dataset. It tells us how mixed or pure
a dataset is with respect to its labels. The formula for entropy is:

Entropy(S) = -Σ p_i log2(p_i)

Where:
● p_i is the probability of label i.
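For example, the PlayTennis column in the dataset above has 9 Yes and 5 No labels, so its
entropy is about 0.940. A quick hand check in Python:

import math

# 9 "Yes" and 5 "No" labels out of 14 rows
p_yes, p_no = 9 / 14, 5 / 14
print(-(p_yes * math.log2(p_yes) + p_no * math.log2(p_no)))  # ≈ 0.940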

3. Information Gain:
Information Gain is the reduction in entropy after a dataset is split on an attribute. It is given
by:

Gain(S, A) = Entropy(S) - Σ (|S_v| / |S|) * Entropy(S_v)

where the sum runs over each value v of attribute A, and S_v is the subset of S for which A
takes the value v.
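For example, splitting the dataset above on Outlook gives Sunny (2 Yes / 3 No), Overcast
(4 Yes / 0 No), and Rain (3 Yes / 2 No). A quick hand check, using a throwaway helper h for
two-class entropy:

import math

def h(p, n):
    # Two-class entropy of a subset with p positive and n negative examples
    total = p + n
    return -sum(x / total * math.log2(x / total) for x in (p, n) if x > 0)

gain_outlook = h(9, 5) - (5/14 * h(2, 3) + 4/14 * h(4, 0) + 5/14 * h(3, 2))
print(round(gain_outlook, 3))  # 0.247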


4. Choosing the Best Attribute to Split:


The attribute with the highest information gain is chosen as the best attribute to split the
data.
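For this dataset, running information_gain from the Code section below over every attribute
confirms that Outlook scores highest and so becomes the root (a quick check, assuming df,
entropy, and information_gain are already defined as in the Code section):

for attr in df.columns[:-1]:
    print(attr, round(information_gain(df, attr), 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Windy 0.048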

5. Building the Decision Tree (ID3 Algorithm):


The ID3 algorithm builds the decision tree recursively:
1. Select the best attribute to split the data.
2. Create a node for the attribute.
3. Split the data into subsets based on the values of the attribute.
4. Repeat the process for each subset.

6. Displaying the Decision Tree:


The tree is printed in a readable format using a recursive function.

Summary of Steps:
1. Calculate Entropy to measure uncertainty.
2. Calculate Information Gain for each attribute to find the best split.
3. Split the Dataset on the attribute with the highest gain.
4. Build the Tree recursively until all data is classified or no further splits can be made.

Code:
import math
import pandas as pd

def entropy(data):
    # Entropy of the label column (last column): -sum(p_i * log2(p_i))
    labels = data.iloc[:, -1]
    label_counts = labels.value_counts()
    probabilities = label_counts / len(labels)
    return -sum(probabilities * probabilities.apply(lambda x: math.log2(x) if x > 0 else 0))

def information_gain(data, split_attribute):
    # Gain = entropy before the split minus the weighted entropy of the subsets after it
    total_entropy = entropy(data)
    values = data[split_attribute].unique()
    weighted_entropy = 0
    for value in values:
        subset = data[data[split_attribute] == value]
        weighted_entropy += (len(subset) / len(data)) * entropy(subset)
    return total_entropy - weighted_entropy

def best_attribute_to_split(data):
    # Score every attribute (all columns except the label) and keep the best one
    attributes = data.columns[:-1]
    gains = {attribute: information_gain(data, attribute) for attribute in attributes}
    return max(gains, key=gains.get)

def id3(data, tree=None):
    # Build the tree recursively as nested dicts: {attribute: {value: subtree or label}}
    best_attr = best_attribute_to_split(data)
    if tree is None:
        tree = {}
    tree[best_attr] = {}
    for value in data[best_attr].unique():
        subset = data[data[best_attr] == value]
        if entropy(subset) == 0:
            # Pure subset: store its class label as a leaf
            tree[best_attr][value] = subset.iloc[0, -1]
        else:
            # Mixed subset: recurse on the remaining attributes
            tree[best_attr][value] = id3(subset.drop(columns=[best_attr]))
    return tree

def print_tree(tree, indent=""):
    # Recursively print the nested-dict tree, indenting one level per depth
    if not isinstance(tree, dict):
        print(indent + "Label:", tree)
        return
    for attribute, sub_tree in tree.items():
        print(indent + attribute)
        for value, branch in sub_tree.items():
            print(indent + f" {value}:")
            print_tree(branch, indent + " ")

data = {
    'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Rain', 'Overcast',
                'Sunny', 'Sunny', 'Rain', 'Sunny', 'Overcast', 'Overcast', 'Rain'],
    'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool',
                    'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild'],
    'Humidity': ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal',
                 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'High'],
    'Windy': [False, True, False, False, False, True, True,
              False, False, False, True, True, False, True],
    'PlayTennis': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
                   'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
}

df = pd.DataFrame(data)
decision_tree = id3(df)
print("Decision Tree:")
print_tree(decision_tree)
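Once the tree is built, it can also be used to classify new samples. The helper below is an
illustrative sketch (classify is not part of the lab script); it walks the nested-dict tree
returned by id3:

def classify(tree, sample):
    # Leaf reached: the node itself is the class label
    if not isinstance(tree, dict):
        return tree
    attribute = next(iter(tree))                   # attribute tested at this node
    branch = tree[attribute].get(sample[attribute])
    if branch is None:
        return None                                # attribute value unseen in training
    return classify(branch, sample)

sample = {'Outlook': 'Sunny', 'Temperature': 'Cool', 'Humidity': 'High', 'Windy': True}
print(classify(decision_tree, sample))             # expected: No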
Output:
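(The output screenshot is not reproduced here. Tracing the code on this dataset by hand, the
script is expected to print the classic play-tennis tree, with Outlook at the root and Humidity
and Windy as the second-level splits:)

Decision Tree:
Outlook
 Sunny:
 Humidity
  High:
  Label: No
  Normal:
  Label: Yes
 Overcast:
 Label: Yes
 Rain:
 Windy
  False:
  Label: Yes
  True:
  Label: No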
