
Dataset

The document outlines a project on diabetes prediction using a dataset from the Pima Indians Diabetes Database. It discusses the importance of diabetes, the objectives of the project, data cleaning, exploration, feature engineering, and predictive modeling techniques employed. The aim is to predict diabetes presence based on various medical measurements and to identify key features indicative of the disease.

Uploaded by

jeylan2045

1

ARSI UNIVERSITY
COLLEGE OF BUSINESS AND ECONOMICS
DEPARTMENT OF MANAGEMENT INFORMATION SYSTEM
PROJECT TITLE: DATA SCIENCE ON DIABETES PREDICTION
DATASET
Presented by: Group A student
2
INTRODUCTION

• According to the WHO, diabetes is a chronic disease that occurs either when the
pancreas does not produce enough insulin or when the body cannot
effectively use the insulin it produces.
• Insulin is a hormone that regulates blood sugar.
• Hyperglycemia, or raised blood sugar, is a common effect of uncontrolled
diabetes and over time leads to serious damage to many of the body's
systems, especially the nerves and blood vessels.
3
INTRODUCTION (CONT.)

• Diabetes is a health condition that affects how your body turns food into energy.
• Most of the food you eat is broken down into sugar (also called glucose) and
released into your bloodstream.
• When your blood sugar goes up, it signals your pancreas to release insulin.
• Without ongoing, careful management, diabetes can lead to a buildup of sugars in
the blood, which can increase the risk of dangerous complications, including stroke
and heart disease.
• For this reason, I decided to predict diabetes using machine learning in Python.
4
Problem Statement / Business Understanding

• The purpose of the diabetes dataset is to predict diagnostically whether or not a patient has diabetes, based on certain
diagnostic measurements included in the dataset.
• Several constraints were placed on the selection of these instances from a larger database. In particular, all
patients here are females at least 21 years old of Pima Indian heritage.
• A further goal is to understand the impact of Pregnancies, Glucose, Blood Pressure, Skin Thickness, Insulin, BMI and Diabetes
Pedigree Function on diabetes, based on the available data.
• Using regression analysis, we model the relationship between the dependent variable (diabetes) and
the independent variables (pregnancies, glucose, age, BMI, ...).
5
Objectives

• To predict whether a person has diabetes or not
• To experiment with different classification methods to see which yields
the highest accuracy
• To classify whether someone has diabetes or not from the given features
• To determine which features are the most indicative of diabetes
6
Data mining

• Dataset: the data come originally from the Pima Indians
Diabetes Database, which is available on Kaggle.
• It consists of several medical predictor variables and one target variable.
• The objective of the dataset is to predict whether the patient has diabetes or not.
• In other words, the dataset consists of several independent variables and one dependent
variable.
7
Data mining (CONT.)

• Independent variables include the number of pregnancies the patient has
had, their BMI, insulin level, and age.
• In this project, I used the Pima Indians Diabetes Database from Kaggle.
• This dataset is originally from the National Institute of Diabetes and
Digestive and Kidney Diseases.
8
Data Cleaning

• This part covers cleaning and preparing the data.
• Here we fix inconsistencies within the data, handle missing values,
and deal with collinearity among the features.
• We observed that the dataset has no explicitly missing values; however, features
such as Glucose, BloodPressure, Insulin, and SkinThickness contain 0 values, which is
not possible.
• We saw in df.head() that some features contain 0. This does not make sense here
and indicates a missing value, so we replace these 0 values with null (NaN).
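The zero-to-missing replacement described on this slide could be sketched as follows. This is only an illustration: the tiny DataFrame holds made-up values rather than rows from the real dataset, the column names Pregnancies and Outcome are assumed to match the Kaggle file, and the median fill at the end is one possible follow-up that the slides do not state.

```python
import numpy as np
import pandas as pd

# Tiny illustrative sample; the values are made up, not taken from the dataset.
df = pd.DataFrame({
    "Pregnancies": [6, 1, 0, 3],
    "Glucose": [148, 0, 183, 89],
    "BloodPressure": [72, 66, 0, 66],
    "SkinThickness": [35, 29, 0, 23],
    "Insulin": [0, 0, 0, 94],
    "Outcome": [1, 0, 1, 0],
})

# Features named on the slide where a 0 is not a plausible measurement.
# Pregnancies and Outcome are left alone because 0 is a valid value there.
zero_as_missing = ["Glucose", "BloodPressure", "SkinThickness", "Insulin"]

cleaned = df.copy()
cleaned[zero_as_missing] = cleaned[zero_as_missing].replace(0, np.nan)

# Count the missing values per column after the replacement.
missing_counts = cleaned[zero_as_missing].isna().sum()
print(missing_counts)

# Assumption (not from the slides): fill each gap with the column median.
imputed = cleaned.fillna(cleaned[zero_as_missing].median())
```

Restricting the replacement to a named list of columns keeps legitimate zeros, such as zero pregnancies or a negative outcome, from being turned into missing values.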
9
Data Exploration

• This stage is all about building a model that best solves your problem.
• This stage always begins with a process called data splitting, where you split
your entire data set into two portions.
• One portion is for training the model (the training data set) and the other is for testing the
performance of the model (the testing data set).
• This is followed by building the model using the training data set and finally
evaluating the model using the test data set.
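The split into a training part and a testing part can be illustrated with a short standard-library sketch. The function name split_train_test, the 20% test fraction, the seed, and the placeholder records are all assumptions made for illustration; the slides do not state which split ratio or tool was used. In practice, scikit-learn's train_test_split from sklearn.model_selection is the usual utility for this step.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows and split them into a training part and a testing part."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder records of the form (glucose, outcome); values are illustrative only.
records = [(100 + i, i % 2) for i in range(10)]
train_rows, test_rows = split_train_test(records, test_fraction=0.2)
print(len(train_rows), len(test_rows))
```

Shuffling before the cut matters because a file sorted by outcome or by age would otherwise put systematically different patients into the two parts.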
10
Feature Engineering

• Under feature engineering, we use feature selection and select the training data and test
data.
• Now it is time to add important features to the dataset and discover some effective
features before fitting the data to machine learning models.
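One simple way to look for effective features is to rank them by the strength of their correlation with the target. The sketch below assumes that criterion, a target column named Outcome, and made-up sample values; the slides do not say which selection method was actually applied.

```python
import pandas as pd

# Made-up sample values for illustration; not rows from the real dataset.
df = pd.DataFrame({
    "Glucose": [85, 90, 120, 150, 160, 95],
    "BMI": [22.0, 30.0, 28.0, 35.0, 33.0, 24.0],
    "Age": [25, 40, 31, 45, 29, 52],
    "Outcome": [0, 0, 1, 1, 1, 0],
})

# Pearson correlation of every feature column with the target column.
features = df.drop(columns="Outcome")
correlations = features.corrwith(df["Outcome"])

# Rank by absolute value, since a strong negative correlation is also informative.
ranking = correlations.abs().sort_values(ascending=False)
print(ranking)
```

Correlation only captures linear, one-feature-at-a-time relationships, so a ranking like this is a starting point for selection rather than a final answer.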
11
Predictive modeling

• In our proposed predictive model, we preprocess the raw data and apply
different feature engineering techniques to get better results.
• An algorithm is used for feature selection, as it provides an unbiased separation of
important and unimportant features in an information system.
• Training on the data after feature engineering plays a significant role in supervised
learning.
• We have used highly correlated variables for better outcomes.
• Input data here refers to the test data used for prediction and for the confusion matrix.
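To make the confusion matrix concrete, the following sketch counts the four outcomes of a binary prediction and derives accuracy from them. The helper names confusion_counts and accuracy and the label lists are hypothetical; the predictions are invented and are not results from the project.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for labels coded as 0 and 1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = Counter(zip(y_true, y_pred))  # keys are (actual, predicted)
    return {
        "tp": pairs[(1, 1)],  # diabetic, predicted diabetic
        "tn": pairs[(0, 0)],  # healthy, predicted healthy
        "fp": pairs[(0, 1)],  # healthy, predicted diabetic
        "fn": pairs[(1, 0)],  # diabetic, predicted healthy
    }


def accuracy(counts):
    """Share of all predictions that were correct."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total


# Invented test labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
print(counts, accuracy(counts))
```

Looking at the four counts separately is useful in a medical setting because a false negative (a missed diabetic patient) and a false positive usually carry different costs, which a single accuracy figure hides.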
12
Data Visualization

• We visualize our data in a Python notebook such as Jupyter,
using interactive libraries and plotting different graphs.
13

THANK YOU FOR YOUR ATTENTION!
