
Decision tree classifier

Dishant Kumar Yadav 2021BCS0136

Implementation:

General Terms: Let us first discuss a few statistical concepts used in this post.

Entropy: The entropy of a dataset is a measure of the impurity of the dataset. Entropy can also be
thought of as a measure of uncertainty, and we should try to minimize it. The goal of
machine learning models is to reduce uncertainty, or entropy, as far as possible.
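For reference, if p_i is the proportion of samples belonging to class i, the entropy of a dataset S is

H(S) = -\sum_{i} p_i \log_2 p_i

A pure node (all samples from one class) has entropy 0, while a 50/50 split of two classes has entropy 1.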

Information Gain: Information gain is a measure of how much information a feature gives us
about the classes. The decision tree algorithm always tries to maximize information gain.
A feature that perfectly partitions the data gives the maximum information, so the feature with the
highest information gain is used for the first split.
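In symbols, the information gain of splitting a dataset S on a feature A is the entropy of S minus the weighted average entropy of the subsets S_v produced by each value v of A:

IG(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} H(S_v)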

Import Libraries:


We are going to import NumPy and the pandas library.

# Import the required libraries


import pandas as pd
import numpy as np

from google.colab import files

uploaded = files.upload()

diabetes11.csv (text/csv) - 7491 bytes, last modified: 17/1/2024
Saving diabetes11.csv to diabetes11.csv

import shutil

# Move the uploaded file 'diabetes11.csv' into the /content directory


shutil.move('diabetes11.csv', '/content/diabetes11.csv')
'/content/diabetes11.csv'

import os

# List files in the /content directory


os.listdir('/content')

['.config',
'diabetes (1).csv',
'diabetes.csv',
'diabetes11.csv',
'sample_data']

import pandas as pd

# Read the CSV file into a DataFrame


df = pd.read_csv('/content/diabetes11.csv')

# Display the first few rows of the DataFrame


df.head()
index  Glucose  BloodPressure  diabetes
0      148      72             1
1      85       66             0
2      183      64             1
3      89       66             0
4      137      40             1


# Define the calculate entropy function
def calculate_entropy(df_label):
    # Unique class labels and how often each one occurs
    classes, class_counts = np.unique(df_label, return_counts=True)
    # Entropy: -sum(p_i * log2(p_i)) over all classes
    entropy_value = np.sum([(-class_counts[i] / np.sum(class_counts)) *
                            np.log2(class_counts[i] / np.sum(class_counts))
                            for i in range(len(classes))])
    return entropy_value
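As a quick check (a minimal sketch, assuming df has been loaded as above), the entropy of the 'diabetes' label column can be computed directly; the value depends on the class balance in the uploaded file:

# Entropy of the label column for the whole dataset
calculate_entropy(df['diabetes'])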
# Define the calculate information gain function
def calculate_information_gain(dataset, feature, label):
    # Calculate the dataset entropy
    dataset_entropy = calculate_entropy(dataset[label])
    values, feat_counts = np.unique(dataset[feature], return_counts=True)

    # Calculate the weighted feature entropy: for each value of the feature,
    # take the entropy of the matching rows, weighted by how often that value occurs
    weighted_feature_entropy = np.sum([(feat_counts[i] / np.sum(feat_counts)) *
                                       calculate_entropy(dataset.where(dataset[feature]
                                       == values[i]).dropna()[label])
                                       for i in range(len(values))])
    feature_info_gain = dataset_entropy - weighted_feature_entropy
    return feature_info_gain
# Set the features and label
features = df.columns[:-1]
label = 'diabetes'
parent=None
features

Index(['Glucose', 'BloodPressure'], dtype='object')
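Before building the tree, it can help to look at the information gain of each feature on its own (a small sketch using the functions defined above; the exact numbers depend on the uploaded data):

# Information gain of each feature with respect to the 'diabetes' label
for feature in features:
    print(feature, calculate_information_gain(df, feature, label))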

import numpy as np

def create_decision_tree(dataset, df, features, label, parent=None):

    # Class counts over the full dataframe and the unique classes in the current subset
    datum = np.unique(df[label], return_counts=True)
    unique_data = np.unique(dataset[label])

    if len(unique_data) <= 1:
        # Pure subset: return the single remaining class as a leaf
        return unique_data[0]

    elif len(dataset) == 0:
        # Empty subset: fall back to the most common label in the full dataframe
        return datum[0][np.argmax(datum[1])]

    elif len(features) == 0:
        # No features left to split on: return the parent node's majority class
        return parent

    else:
        # Majority class of the current node, passed down as the fallback for child nodes
        parent = unique_data[np.argmax(datum[1])]

        # Call the calculate_information_gain function for every remaining feature
        item_values = [calculate_information_gain(dataset, feature, label) for feature in features]

        # Split on the feature with the highest information gain
        optimum_feature = features[np.argmax(item_values)]
