
Class Assignment on Decision Trees

Name: Ansari Mohammed Shanouf Valijan

Class: B.E. Computer Engineering, Semester - VII
UID: 2021300004
Batch: Monday

Aim:
To implement decision trees for regression analysis on a healthcare dataset.

Dataset Description:
The Body Mass Index Detection dataset was used to construct the decision tree.
(https://www.kaggle.com/datasets/sayanroy058/body-mass-index-detection)

The idea was to predict a person's BMI from their age, weight, bio-impedance and
gender. The dataset has about 741 records.

Implementation:
The following is a step-by-step implementation of the task:
Link to Notebook -> DecisionTreeRegression

Importing the necessary libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeRegressor, plot_tree
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.preprocessing import LabelEncoder
import seaborn as sns

Importing the dataset

df = pd.read_csv('/content/Body Mass Index.csv')

Dropping irrelevant columns and encoding the categorical columns

df = df.drop(columns=['BmiClass'])
label_encoder = LabelEncoder()

df['Gender_encoded'] = label_encoder.fit_transform(df['Gender'])
df = df.drop(columns=['Gender'])
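
As a small illustration of the encoding step, the sketch below shows how LabelEncoder assigns integer codes. The toy Gender values are assumptions, not rows from the actual dataset; the point is only that the codes follow the sorted order of the labels and can be read back from classes_.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the dataset's Gender column (values are assumed).
toy = pd.DataFrame({"Gender": ["Male", "Female", "Female", "Male"]})

encoder = LabelEncoder()
toy["Gender_encoded"] = encoder.fit_transform(toy["Gender"])

# classes_ holds the original labels in sorted order;
# the position of a label in that array is its integer code.
mapping = {label: int(code) for code, label in enumerate(encoder.classes_)}
print(mapping)
print(toy)
```

Keeping the mapping around makes it possible to translate a split such as "Gender_encoded <= 0.5" in the fitted tree back into the original labels.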

Visualizing the various features of the dataset to better understand it

numeric_columns = df.select_dtypes(include=['float64', 'int64']).columns

for col in numeric_columns:
    plt.figure(figsize=(8, 4))
    sns.histplot(df[col], kde=True, bins=30)
    plt.title(f'Distribution of {col}')
    plt.show()

categorical_columns = df.select_dtypes(include=['object']).columns

for col in categorical_columns:
    plt.figure(figsize=(8, 4))
    sns.countplot(data=df, x=col)
    plt.title(f'Count of {col}')
    plt.show()

Viewing the correlation among different features present in the dataset

corr_matrix = df.corr()

plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Matrix')
plt.show()

The plot above shows that BMI depends strongly on weight, which is expected.
Height shows a correlation almost half as strong as that of weight and is still an
important factor to consider. Age has the weakest positive correlation with BMI.
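
The same reading can be taken from the correlation matrix numerically by ranking the features on the absolute value of their correlation with the target. The sketch below uses synthetic data, not the Kaggle dataset: the column names, the units, and the independence of height and weight are all assumptions, and BMI is generated from its standard definition (weight in kilograms divided by the square of height in metres). The signs and magnitudes on the real data may therefore differ.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in data; column names and units are assumptions.
toy = pd.DataFrame({
    "Age": rng.integers(18, 70, size=n),
    "Height": rng.normal(1.70, 0.10, size=n),                  # metres
    "Weight": rng.normal(70.0, 15.0, size=n).clip(40, None),   # kilograms
})
toy["Bmi"] = toy["Weight"] / toy["Height"] ** 2

# Correlation of every feature with the target, strongest first
# (ranked by absolute value so that negative correlations are not hidden).
ranked = (
    toy.corr()["Bmi"]
    .drop("Bmi")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(ranked)
```

Ranking on the absolute value matters because a strongly negative correlation is as informative for a tree as a strongly positive one.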

Viewing pair-wise plots

sns.pairplot(df, hue='Bmi')
plt.show()

In the plots above, darker hues (purple) depict higher BMI values. As can be
observed, higher values of almost every feature point towards a higher BMI.
An exception is the Bio Impedance vs. Height plot, where high BMI values
appear scattered.

Splitting the processed and analysed dataset into train and test sets

X = df.drop(columns='Bmi')
y = df['Bmi']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

Defining the decision tree regressor model and training it (parameters were chosen after
experimenting with different configurations and choosing the ones that avoided overfitting)

regressor = DecisionTreeRegressor(
    max_depth=25,
    min_samples_split=40,
    min_samples_leaf=15,
    max_features='sqrt',
    random_state=10
)
regressor.fit(X_train, y_train)
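
The report does not show how the configurations were compared. One common way to run such an experiment is a cross-validated grid search with GridSearchCV, which is already imported above. The sketch below is an assumption about how it could be done, not the author's procedure: the data is synthetic and the grid values are illustrative.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumed features: age, height in m, weight in kg).
rng = np.random.default_rng(0)
n = 300
height = rng.normal(1.70, 0.10, size=n)
weight = rng.normal(70.0, 15.0, size=n).clip(40, None)
age = rng.integers(18, 70, size=n)
X = np.column_stack([age, height, weight])
y = weight / height ** 2 + rng.normal(0.0, 1.0, size=n)  # noisy BMI-like target

# Illustrative grid; the values are not taken from the report.
param_grid = {
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 5, 15],
}

search = GridSearchCV(
    DecisionTreeRegressor(random_state=10),
    param_grid,
    cv=5,            # 5-fold cross-validation on the training data
    scoring="r2",    # same metric the report uses for comparison
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Because every candidate is scored on held-out folds, the selected parameters are less likely to be the ones that merely memorise the training rows.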

Evaluating the model

y_pred = regressor.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print(f"Mean Absolute Error (MAE): {mae}")

print(f"Mean Squared Error (MSE): {mse}")
print(f"Root Mean Squared Error (RMSE): {rmse}")
print(f"R-squared (R^2): {r2}")

The following performance metrics were obtained on the training dataset:

Mean Absolute Error (MAE): 1.85
Mean Squared Error (MSE): 10.16
Root Mean Squared Error (RMSE): 3.19
R-squared (R^2): 0.89

The following performance metrics were obtained on the test dataset:

Mean Absolute Error (MAE): 2.1160518106723467
Mean Squared Error (MSE): 10.597756621559329
Root Mean Squared Error (RMSE): 3.255419576883958
R-squared (R^2): 0.8517373327150053
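
The evaluation code above computes the metrics for the test split only. A small helper, sketched here under the assumption that the same four metrics are wanted for both splits, avoids duplicating that code. The helper name regression_report and the synthetic data are hypothetical; the tree parameters are the ones used in this report.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def regression_report(model, X, y):
    """Return MAE, MSE, RMSE and R^2 of a fitted model on (X, y)."""
    pred = model.predict(X)
    mse = float(mean_squared_error(y, pred))
    return {
        "MAE": float(mean_absolute_error(y, pred)),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "R2": float(r2_score(y, pred)),
    }


# Synthetic stand-in data (assumed features: age, height in m, weight in kg).
rng = np.random.default_rng(0)
n = 500
height = rng.normal(1.70, 0.10, size=n)
weight = rng.normal(70.0, 15.0, size=n).clip(40, None)
age = rng.integers(18, 70, size=n)
X = np.column_stack([age, height, weight])
y = weight / height ** 2 + rng.normal(0.0, 1.0, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = DecisionTreeRegressor(
    max_depth=25, min_samples_split=40, min_samples_leaf=15,
    max_features="sqrt", random_state=10,
).fit(X_train, y_train)

train_report = regression_report(model, X_train, y_train)
test_report = regression_report(model, X_test, y_test)
for name, report in [("train", train_report), ("test", test_report)]:
    print(name, {k: round(v, 3) for k, v in report.items()})
```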

Plotting the decision tree as hypothesized

plt.figure(figsize=(20, 10))
plot_tree(
    regressor,
    feature_names=list(X.columns),  # a plain list is accepted across scikit-learn versions
    filled=True,
    rounded=True,
)
plt.title('Decision Tree Visualization')
plt.show()

The decision tree that was hypothesized for the regression task is as follows:
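
When a plotted figure is hard to read or unavailable, scikit-learn's export_text gives a plain-text rendering of the same split structure. The sketch below is only an illustration on synthetic data with assumed feature names and a deliberately shallow tree; it does not reproduce the tree from this report.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic stand-in data (assumed features: age, height in m, weight in kg).
rng = np.random.default_rng(0)
n = 300
height = rng.normal(1.70, 0.10, size=n)
weight = rng.normal(70.0, 15.0, size=n).clip(40, None)
age = rng.integers(18, 70, size=n)
X = np.column_stack([age, height, weight])
y = weight / height ** 2

feature_names = ["Age", "Height", "Weight"]  # assumed names

# A shallow tree keeps the printed rules short.
tree = DecisionTreeRegressor(max_depth=2, random_state=10).fit(X, y)

# Each line is either a split condition or a leaf with its predicted value.
tree_text = export_text(tree, feature_names=feature_names)
print(tree_text)
```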

Conclusion:
By implementing the assigned task, I was able to brush up on the basic concepts associated
with building a decision tree. I was able to build, train and test the tree in Python and
arrived at the following inferences:
• For the assigned regression task, the analysis showed, as expected, a heavy dependence
on weight and height as features for predicting an individual's body mass index.
• The initially trained model had a test R-squared value of 0.98, which was identified as
overfitting. The rectified model had a test R-squared value of around 0.8517,
while the R-squared value on the training data was approximately 0.89.
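
A usual way to check for overfitting is to compare the R-squared value on the training split with the value on the test split: an unconstrained tree typically fits the training rows almost perfectly, and the gap to the test score shows how much of that fit generalises. The sketch below illustrates the comparison on synthetic data; it is an assumption about the diagnostic, not a reproduction of the experiment in this report. The constrained model reuses the parameters given above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data with a noisy BMI-like target (all values assumed).
rng = np.random.default_rng(0)
n = 500
height = rng.normal(1.70, 0.10, size=n)
weight = rng.normal(70.0, 15.0, size=n).clip(40, None)
age = rng.integers(18, 70, size=n)
X = np.column_stack([age, height, weight])
y = weight / height ** 2 + rng.normal(0.0, 2.0, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    # No limits: the tree can grow until every training row has its own leaf.
    "unconstrained": DecisionTreeRegressor(random_state=10),
    # Parameters taken from the report.
    "constrained": DecisionTreeRegressor(
        max_depth=25, min_samples_split=40, min_samples_leaf=15,
        max_features="sqrt", random_state=10,
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = {
        "train": model.score(X_train, y_train),  # R^2 on training data
        "test": model.score(X_test, y_test),     # R^2 on held-out data
    }
    gap = scores[name]["train"] - scores[name]["test"]
    print(f"{name}: train={scores[name]['train']:.3f} "
          f"test={scores[name]['test']:.3f} gap={gap:.3f}")
```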
