DMBI
PROBLEM STATEMENT:
"Predicting Diabetes Onset: Utilizing a dataset containing various health
indicators, including pregnancies, glucose levels, blood pressure, skin thickness,
insulin levels, BMI, pedigree function, and age, we aim to develop predictive
models that can effectively identify individuals at risk of developing diabetes. By
employing machine learning algorithms such as Bagging Classifier and Gradient
Boosting Classifier, trained on a labeled dataset, we seek to build robust models
capable of accurately classifying individuals into diabetic and non-diabetic groups.
The ultimate goal is to develop a predictive tool that can assist healthcare
professionals in early detection and intervention for individuals predisposed to
diabetes, thereby improving health outcomes and reducing the burden of the
disease.
Data Pre-processing:
Data Splitting:
Both bagging and boosting require splitting the data into training and testing sets.
The training set is used to build the ensemble models.
The testing set is used to evaluate the final predictions from the ensemble.
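The split described above can be sketched as follows. This is a minimal illustration, not the report's original code: the column names and the synthetic stand-in data are assumptions (the real dataset would be loaded from a file instead).

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative stand-in for the diabetes dataset (column names assumed).
df = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137, 116, 78, 115, 197, 125],
    "BMI":     [33.6, 26.6, 23.3, 28.1, 43.1, 25.6, 31.0, 35.3, 30.5, 32.0],
    "Age":     [50, 31, 32, 21, 33, 30, 26, 29, 53, 54],
    "Outcome": [1, 0, 1, 0, 1, 0, 1, 0, 1, 1],
})
X = df.drop(columns=["Outcome"])  # health indicators
y = df["Outcome"]                 # 1 = diabetic, 0 = non-diabetic

# Hold out 20% of the rows for testing; stratify to keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(len(X_train), len(X_test))  # 8 2
```

Both ensemble models are then fit only on `X_train`/`y_train` and scored on the held-out `X_test`/`y_test`.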
Ensemble Methods:
After pre-processing, all null values have been removed from the data.
The models are then trained on the training set:
E) THE ALGORITHMS WE ARE IMPLEMENTING HERE ARE:
1) Bagging Classifier with Decision Trees:
The Bagging Classifier is used with Decision Trees as base estimators. This
ensemble method combines multiple decision-tree models, each trained on a
different bootstrap sample of the training data (drawn with replacement). The
final prediction is typically determined by averaging the individual trees'
predictions (for regression) or by majority vote (for classification).
It's implemented using BaggingClassifier from the sklearn.ensemble module.
These lines create a Bagging Classifier with a Decision Tree base estimator, fit it to
the training data, and then evaluate its performance on both training and testing
data using the evaluate function.
2) Building and Evaluating Gradient Boosting Classifier:
Similarly, these lines create a Gradient Boosting Classifier, fit it to the training
data, and evaluate its performance on both training and testing data using the
evaluate function.
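Those lines can be sketched in the same way (again with synthetic stand-in data and `score` in place of the report's `evaluate` function):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unlike bagging, boosting builds trees sequentially: each new tree is
# fit to the errors (gradients) of the ensemble built so far.
gboost = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, random_state=0
)
gboost.fit(X_train, y_train)
print("train accuracy:", gboost.score(X_train, y_train))
print("test accuracy:", gboost.score(X_test, y_test))
```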
3) Visualizing Results:
scores_df.plot(kind='barh', figsize=(15, 8))
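The plotting line assumes a DataFrame `scores_df` holding the two models' scores. A self-contained sketch of how such a frame might be built and plotted is below; the metric names and values are placeholders, not the report's actual results.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# One row per model, one column per metric (placeholder values).
scores_df = pd.DataFrame(
    {"Train Accuracy": [0.98, 0.93], "Test Accuracy": [0.76, 0.78]},
    index=["Bagging Classifier", "Gradient Boosting Classifier"],
)
ax = scores_df.plot(kind="barh", figsize=(15, 8))
ax.set_xlabel("Score")
plt.savefig("model_scores.png")
```

The horizontal bar chart makes it easy to compare the two models' train and test scores side by side.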
F) IDENTIFYING WHICH OF THE TWO MODELS PERFORMS BEST:
All of the model performance measures, including the confusion matrix,
accuracy score, and classification report, are computed within the evaluate
function for both the training and testing sets. The evaluate function
calculates these measures for each model (Bagging Classifier and Gradient
Boosting Classifier) and prints them out for analysis.
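The evaluate function itself is not reproduced here; a minimal sketch of what it might look like follows (the name, signature, and usage are assumptions, not the report's original code):

```python
from sklearn.metrics import (
    accuracy_score, classification_report, confusion_matrix
)

def evaluate(model, X, y, label="test"):
    """Print confusion matrix, accuracy, and classification report
    for `model` on the given split."""
    pred = model.predict(X)
    print(f"--- {label} set ---")
    print("Confusion matrix:\n", confusion_matrix(y, pred))
    print("Accuracy:", accuracy_score(y, pred))
    print(classification_report(y, pred))

# Example usage on synthetic stand-in data:
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
evaluate(clf, X, y, label="training")
```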
Bagging Classifier: