Classification of Diabetes Using Deep Learning


Conference Paper · July 2020


DOI: 10.1109/ICCSP48568.2020.9182293



International Conference on Communication and Signal Processing, July 28 - 30, 2020, India



Santosh Kumar, Bharat Bhusan, Debabrata Singh and Dilip kumar Choubey

Santosh Kumar and Debabrata Singh are with the CSIT Dept., ITER, SOA University, Bhubaneswar, Odisha, India.
Bharat Bhusan is with the CSE Dept., HMRITM, GGSIP University, New Delhi, India.
Dilip kumar Choubey is with the School of CSE, VIT University, Vellore, India.

Abstract—Deep Learning (DL) is a research area that has flourished significantly in recent years and has shown remarkable potential for artificial intelligence in medical applications. We have implemented a DL algorithm for diabetes classification. This paper applies Multi-Layer Feed Forward Neural Networks (MLFNN) to diabetes classification on the Pima Indian Diabetes dataset. Furthermore, various activation functions, learning algorithms, and techniques for handling missing values are considered to improve the classification accuracy on the diabetes dataset. Finally, the experimental outcomes are compared with two machine learning algorithms, Naive Bayes and Random Forest. The classification accuracy achieved by the MLFNN (84.17%) is the best among the classifiers compared.

Index Terms—Machine Learning, Deep Learning, MLFNN, Diabetes dataset

I. INTRODUCTION

Detecting a medical anomaly is usually considered a complex task that falls within the domain of medical experts and physicians. Classification of diabetes is one such case, and it usually requires many physicians with a wide range of experience in the respective domains. A high amount of glucose in the blood affects the major functioning organs of the human body, which leads to kidney damage and heart stroke [1]. In general, diabetes patients are classified based on a pathological test. Classifying whether a person is diabetic or non-diabetic is therefore a very complex task that needs high-level skills and expert knowledge.

Deep learning (DL), a sub-domain of Artificial Intelligence (AI), has been an active area of research in this sector. DL can save time for physicians and provide balanced, repeatable results by automating the manual processes they use. Additionally, medicine produces a significant amount of data, and the combination of this data with a small sample of pathological cases makes deep learning techniques essential for diagnosing and classifying disease [2].

This paper explores the use of DL for the classification of one medical anomaly: diabetes. We have developed a prediction technique based on a Multi-Layer Feed Forward Neural Network (MLFNN) for the classification of diabetes, using the Pima Indian Diabetes (PID) dataset. One of the constraints of the PID dataset is its missing values, and we have experimented with various techniques for handling them. Furthermore, the activation function plays a significant role in a neural network. We examine two different kinds of activation functions and report their efficiency through a comparative analysis.

The rest of the paper is organized as follows. Section II describes the basics and background of deep learning techniques. Section III presents the literature survey, while data pre-processing methods and the experimental setup are explained in Sections IV and V, respectively. Section VI presents the results, and the paper ends with concluding remarks in Section VII.

II. BACKGROUND

DL uses a feature hierarchy in which each layer trains on a distinct set of features provided to it as the output of the previous layer. The deeper layers can recognize sophisticated features in the data because they aggregate and recombine the features they receive from the preceding layer. This makes DL algorithms suitable for handling large datasets. DL algorithms can also discover patterns and structure in datasets that are neither labeled nor structured. That makes DL algorithms potent tools in today's world, as the majority of available databases are either unstructured or unlabeled.

A. MLFNN

The MLFNN is a fully connected network consisting of an input layer, one or more hidden layers, and an output layer [3]. The general MLFNN network is illustrated in Fig. 1. The internal layers are called hidden layers because they are hidden from the outside world. They receive input from internal processing units, process it, and send the output to the internal processing units in the next hidden layer. A weight coefficient characterizes the connection between two neurons and reflects the importance of that connection in the neural network. The activation function determines the output of the neurons [4].
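The layer-by-layer computation described for the MLFNN can be illustrated with a small forward pass. This is a minimal sketch under stated assumptions, not the authors' implementation: the layer widths, the random weight range, the sample input, the ELU hidden activation, and the sigmoid output neuron are all illustrative choices, and the helper names (`init_layer`, `dense`, `forward`) are hypothetical.

```python
import math
import random

def elu(x, alpha=1.0):
    """ELU, used here only as an example hidden-layer activation."""
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)

def sigmoid(x):
    """Squash a real value into (0, 1) for a binary output neuron (assumed)."""
    return 1.0 / (1.0 + math.exp(-x))

def init_layer(n_in, n_out, rng):
    """Random weights (one row per output neuron) and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(inputs, weights, biases, activation):
    """One fully connected layer: weighted sum plus bias, then activation."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs, layers):
    """Pass the input through every hidden layer, then the output layer."""
    *hidden, output = layers
    for weights, biases in hidden:
        inputs = dense(inputs, weights, biases, elu)
    weights, biases = output
    return dense(inputs, weights, biases, sigmoid)

rng = random.Random(0)
sizes = [8, 4, 4, 1]  # illustrative widths, not the architecture used in the paper
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
sample = [0.1, 0.5, 0.3, 0.2, 0.0, 0.4, 0.6, 0.7]  # made-up normalized feature values
prediction = forward(sample, layers)
print(prediction)
```

Each weight row belongs to one neuron, so the importance of a connection is simply the size of the corresponding coefficient in that row.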
978-1-7281-4988-2/20/$31.00 ©2020 IEEE

Authorized licensed use limited to: Siksha O Anusandhan University. Downloaded on September 25, 2021 at 11:06:56 UTC from IEEE Xplore. Restrictions apply.

B. Backpropagation Algorithm
In this subsection, we go through the concept of backpropagation. When artificial neural networks were first implemented, they were very slow to train and did not produce effective results. Backpropagation was introduced to address this problem. The backpropagation algorithm comprises two phases: the propagation phase and the weight update phase. When a neural network receives an input, the weights in the hidden layers are generally initialized with random values. The neural network produces some output based on these weights. This is called the propagation phase. The output is then compared with the expected output using a loss function, and the error is calculated. This error is propagated backward, and the weights are updated in such a way that the error is reduced in the next round. This updating of the weights to reduce the error is called the weight update phase. These rounds are repeated until the error rate reaches a minimum.

[Fig. 1. MLFNN Architecture]

C. Activation Functions

Activation functions play a vital role in the design of a neural network. To understand the activation function, we have to understand what an artificial neuron does. It evaluates the weighted sum of all the inputs it receives, adds a bias, and then decides whether it should fire or not. This value can range from -∞ to +∞. Activation functions bound the value produced by a neuron and decide whether the outgoing connection is fired or not. Below, we introduce two kinds of activation unit: the exponential linear unit (ELU) and the scaled exponential linear unit (SELU).

ELU - The Exponential Linear Unit [4] with 0 < α is

$$ f(x) = \begin{cases} \alpha(\exp(x) - 1) & \text{for } x < 0 \\ x & \text{for } x \ge 0 \end{cases} \tag{1} $$

The hyperparameter α controls the value to which ELU saturates for negative inputs. ELU makes the learning process faster in deep neural networks compared with other activation units. ELU does not suffer from the dying gradient problem, since it takes negative values that push the mean activation closer to zero, as batch normalization does, but at a lower computational cost.

SELU - The SELU activation function [2] is specified by:

$$ f(x) = \lambda \begin{cases} \alpha(\exp(x) - 1) & \text{for } x < 0 \\ x & \text{for } x \ge 0 \end{cases} \tag{2} $$

SELU makes it possible to construct a mapping whose properties lead to Self-Normalizing Neural Networks (SNNs). In SNNs, the activations of the neurons automatically converge toward zero mean and unit variance, and this normalization propagates through many layers even in the presence of noise and perturbations. In this way, SNNs allow deep neural networks with many layers to be trained, make it possible to employ strong regularization, and make learning highly robust. For activations that are not close to unit variance, the variance has an upper and a lower bound, which makes vanishing and exploding gradients impossible.

III. LITERATURE SURVEY

Significant research is under way on developing machine learning algorithms for medical applications. The studies surveyed here use either the PID dataset or some other medical database. The author in [6] used a Parkinson's dataset for the classification of Parkinson's disease (PD) using an MLFNN with a backpropagation algorithm. The performance metrics used in that paper are sensitivity, specificity, and accuracy. They achieved 83.3% sensitivity, 63.3% specificity, and 80% accuracy in the diagnosis and detection of PD using an MLFNN.

The authors in [7] used the Cleveland database from the UCI repository to build a heart disease diagnosis system that classifies two heart conditions (Normal, Abnormal). They proposed two classifiers, an MLP and a Support Vector Machine (SVM), on a dataset consisting of thirteen medical factors for diagnosing heart disease. The MLP classifier achieved 98% accuracy on the collected database, whereas the SVM attained an accuracy of 96%.

The authors in [8] used the OASIS dataset to develop an Alzheimer's detection system from magnetic resonance images. The classification system they proposed was based on two components: wavelet entropy (WE) and a multi-layer perceptron (MLP). Their approach gave 92.40% accuracy, 92.14% sensitivity, and 92.47% specificity.

The paper [9] used the General Regression Neural Network (GRNN) on the PID dataset for the classification of diabetes. The proposed model has four layers: one input layer, two hidden layers, and one output layer. The reported accuracy of this model for the training and testing phases is 82.99% and 80.21%, respectively.

The author in [10] proposed a Probability Neural Network (PNN) for diabetes prediction. This model consists of one input layer with 8 neurons, one hidden layer, and an output layer with 2 neurons used to predict whether a person has diabetes or not. The maximum recorded accuracy of this model is 81.49% and 89.56% for the training and testing phases, respectively.

The author in [11] presented Naive Bayes, a Radial Basis Function (RBF) artificial neural network, and J48 for the diagnosis of diabetes. The achieved accuracy is 76.95%, 76.5%, and 74.34% for the Naive Bayes, J48, and RBF classifiers, respectively.

The author in [12] presented a novel method for diabetes diagnosis that works in two phases. In the first phase, several machine learning techniques are applied to the UCI PID dataset and a localized diabetes dataset, while PCA and PSO are used

subsequently for feature selection in the second phase. Results show the superiority of the proposed model over traditional classification techniques.

TABLE I
DESCRIPTION OF PID DATASET

Features                        Values
Number of times pregnant        Numerical values
Plasma glucose concentration    Numerical values
Diastolic blood pressure        Numerical values (in mmHg)
Triceps skin fold thickness     Numerical values (in mm)
2-hour serum insulin            Numerical values (in µU/ml)
Body mass index                 Numerical values (in kg/m²)
Diabetes pedigree function      Numerical values
Age                             Numerical values (in years)

TABLE II
MISSING VALUES IN PID DATASET

No.  Features                        Missing values
1    Number of times pregnant        -
2    Plasma glucose concentration    5
3    Diastolic blood pressure        35
4    Triceps skin fold thickness     227
5    2-hour serum insulin            374
6    Body mass index                 11
7    Diabetes pedigree function      1
8    Age                             63

In this paper, we have experimented on the PID dataset taken from the UCI Machine Learning Repository [13]. Eight relevant features of the dataset are considered, and the total number of diabetes and non-diabetes cases is 268 and 500, respectively. The features of the dataset have numerical values, as shown in Table I. There are many missing values in the dataset, as shown in Table II.

IV. DATA PRE-PROCESSING

Training a neural network can be made more effective by applying pre-processing techniques to the network inputs before feeding them into the system. In the PID dataset, we first checked whether there are any correlated features. Correlation can be defined as a measure of how strongly one input feature depends on another. Removing correlated features can increase the learning speed of an algorithm and can also reduce the bias in the neural network. A random forest can show a decline in performance if there is any correlation bias [14].

Normalization prepares the data for the training process. In this process, the data is scaled to a specific range for every input feature to reduce the bias in the neural network. Furthermore, it speeds up training because every feature starts the training process on the same scale. It is very effective when two input features differ greatly in scale. On the PID dataset, we have used MinMaxScaler for normalization, configured as shown in (3).

MinMaxScaler(feature_range=(0, 1), copy=True)    (3)

It transforms all the features by scaling them into a given range. The transformation is given by equations (4) and (5), where (min, max) is the target feature range:

$$ X_{std} = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \tag{4} $$

$$ X_{scaled} = X_{std} \cdot (\max - \min) + \min \tag{5} $$

The PID dataset suffers from many missing values, which can affect the classification accuracy of the neural network model. To handle the missing data, we have used three different techniques. In the first, we removed all rows that have any missing attribute. In the second, we replaced all missing values with zero. In the third, we replaced each missing value with the mean of the other values of that attribute.

V. EXPERIMENTAL SETUP AND RESULTS

In this section, we discuss the network architecture and the design considerations of the MLFNN used in the experiments on diabetes classification. First, we present the network architecture. Second, we discuss the evaluation procedure.

A. Network Architecture

We have considered a feed-forward neural network for PID diagnosis. Figure 2 illustrates the proposed architecture for diabetes diagnosis. The architecture of the proposed model is described by its depth and width. The depth of the network is the number of hidden layers in the network, while the width corresponds to the total number of neurons in the layers. A feed-forward neural network with a single hidden layer is able to represent any operation in terms of learning, but the problem with such a network is its poor learning rate and correctness. In contrast, a deep learning model can reduce the number of neurons per layer and the corresponding generalization error. Another important design consideration is the number of connections between neurons. The efficiency of the network depends on the number of connections, the number of parameters, and the number of neurons. If the number of connections decreases, the number of parameters decreases as well, along with the computation time of the network. The design of a network and its architectural issues are fully problem-dependent. The final architecture is generally chosen after several experiments, by recording performance in a trial-and-error manner. In our MLFNN architecture, we have considered 10 neurons in the input layer, three hidden layers with 60 neurons each, and one output layer with a single neuron.

[Fig. 2. MLFNN Architecture for Diabetes]

TABLE III
COMPARATIVE ANALYSIS OF VARIOUS CLASSIFIERS

                        NB       RF       MLFNN
Training Accuracy (%)   77.56    97.42    83.46
Testing Accuracy (%)    70.08    73.25    81.73

[Fig. 3. Change in the loss per epoch after imputing the missing values with the mean. The figure shows one example out of 200 randomly selected test sets.]

For the evaluation of the MLFNN model, we have used the repeated holdout validation technique, which is also known as Monte Carlo Cross-Validation. We split the PID dataset into 90% training and 10% testing data. We then repeated this experiment 200 times with different seeds and computed the average performance. In this way, the model is evaluated on a randomly selected test set every time. This gives a better idea of the model's stability and of how well it can perform on random test data.
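The pre-processing and evaluation steps of Sections IV and V can be put together in a short sketch: mean imputation, min-max scaling following equations (4) and (5), and repeated holdout with a 90/10 split. It is an illustration under assumptions rather than the experimental code. The toy data is made up, the run uses 20 repeats instead of 200 to stay quick, the helper names are hypothetical, and a nearest-centroid classifier stands in for the MLFNN purely to keep the example self-contained.

```python
import random
from statistics import mean

def impute_mean(rows, missing=None):
    """Replace missing entries in each column with the mean of the observed values."""
    columns = list(zip(*rows))
    col_means = [mean(v for v in col if v is not missing) for col in columns]
    return [[m if v is missing else v for v, m in zip(row, col_means)] for row in rows]

def min_max_scale(rows, lo=0.0, hi=1.0):
    """Scale each column to [lo, hi] following equations (4) and (5)."""
    columns = list(zip(*rows))
    mins = [min(c) for c in columns]
    maxs = [max(c) for c in columns]
    scaled = []
    for row in rows:
        new_row = []
        for v, mn, mx in zip(row, mins, maxs):
            x_std = (v - mn) / (mx - mn) if mx > mn else 0.0  # equation (4)
            new_row.append(x_std * (hi - lo) + lo)            # equation (5)
        scaled.append(new_row)
    return scaled

def fit_centroids(rows, labels):
    """Stand-in classifier (assumption): store the mean feature vector of each class."""
    return {c: [mean(col) for col in zip(*[r for r, l in zip(rows, labels) if l == c])]
            for c in set(labels)}

def predict_centroid(model, row):
    """Assign the class whose centroid is closest in squared distance."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, model[c])))

def repeated_holdout(rows, labels, fit, predict, repeats=200, test_fraction=0.1):
    """Average test accuracy over `repeats` random train/test splits, one seed each."""
    n_test = max(1, round(len(rows) * test_fraction))
    accuracies = []
    for seed in range(repeats):
        order = list(range(len(rows)))
        random.Random(seed).shuffle(order)
        test_idx, train_idx = order[:n_test], order[n_test:]
        model = fit([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        hits = sum(predict(model, rows[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / n_test)
    return mean(accuracies)

# Made-up toy data: two features, with every fifth value of the second one missing.
raw = [[float(i), None if i % 5 == 0 else float(2 * i)] for i in range(20)]
labels = [0 if i < 10 else 1 for i in range(20)]
clean = min_max_scale(impute_mean(raw))
score = repeated_holdout(clean, labels, fit_centroids, predict_centroid, repeats=20)
print(round(score, 3))
```

Swapping `impute_mean` for a row filter or a zero fill gives the other two missing-value strategies, which is how the three techniques can be compared under the same evaluation loop.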

VI. RESULTS AND DISCUSSIONS


This section reports several experiments on the PID dataset. First, we develop the classification model based on the MLFNN and compare its performance with that of other machine learning classifiers. Next, we compare the results obtained with several techniques for handling missing values. Finally, we analyze the classification accuracy of the MLFNN model with different activation functions.
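Since the last comparison in this section concerns ELU and SELU, it may help to restate equations (1) and (2) as code. The following is a minimal sketch: the default α = 1 for ELU and the SELU constants are assumptions taken from the self-normalizing networks literature, as this paper does not state the values used in its experiments.

```python
import math

# SELU constants as published for self-normalizing networks (assumed, not from this paper).
SELU_ALPHA = 1.6732632423543772
SELU_LAMBDA = 1.0507009873554805

def elu(x, alpha=1.0):
    """Equation (1): identity for x >= 0, alpha * (exp(x) - 1) for x < 0."""
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)

def selu(x, alpha=SELU_ALPHA, lam=SELU_LAMBDA):
    """Equation (2): the ELU branch structure scaled by lambda."""
    return lam * elu(x, alpha)

# Print both activations for a few inputs on either side of zero.
for x in (-2.0, -0.5, 0.0, 1.5):
    print(x, round(elu(x), 4), round(selu(x), 4))
```

For large negative inputs ELU saturates toward -α, while SELU saturates toward -λα, which is the only difference between the two negative branches.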

A. Comparison with machine learning classifiers


In this section, we evaluate the performance of the Naive Bayes (NB) and Random Forest (RF) classifiers against our proposed MLFNN model for the classification of diabetes. We first discuss the classifiers used for the evaluation and then compare their results with those of the proposed MLFNN model.

We achieved a training accuracy of 83.46% and a testing accuracy of 81.73% on the PID dataset for the classification of diabetes. This result was obtained by imputing the missing values with the mean. The Naive Bayes (NB) classifier is a very straightforward and robust algorithm for classification tasks [15]. For the NB classifier, we achieved a training accuracy of 77.56% and a testing accuracy of 70.08%. Table III reports the results of the NB classifier.

The Random Forest (RF) algorithm is a type of supervised classification algorithm [16]. RF creates a forest with many trees. There is a direct relationship between the number of trees in the forest and the results it can obtain: in general, the higher the number of trees, the more accurate the result. Further advantages are that the RF classifier can handle missing values and can be modeled for categorical values. The RF algorithm has two stages: the first stage creates the random forest, and the second makes predictions with the random forest classifier built in the first stage. For the RF classifier, we achieved a training accuracy of 97.42% and a testing accuracy of 73.25%. Table III reports the results of the RF classifier.

Table III shows the superiority of the MLFNN model in terms of classification accuracy compared with the Naive Bayes and Random Forest classifiers. The Random Forest classifier achieved the highest accuracy (97.42%) during training, but its classification accuracy declined to 73.25% in the testing phase. Naive Bayes, on the other hand, shows steady performance and achieves 70.08% accuracy in the testing phase.

B. Handling Missing Data

In this experiment, we have used the mean value approach to handle the missing data: the missing data of an attribute is replaced by the mean value of that attribute. Table IV shows the classification accuracy of all three techniques. Among the three approaches, removing samples (84.17%) is superior to the other techniques. The reason behind this performance is that replacing values sometimes makes it hard for the neural network to appropriately adjust the impact of certain attributes that usually play a vital role in solving the problem.

TABLE IV
PERFORMANCE OF VARIOUS APPROACHES TO HANDLE MISSING VALUES

                        Remove samples    Replace (mean)    Replace (zero)
Training Accuracy (%)   85.32             83.46             84.28
Testing Accuracy (%)    84.17             81.73             80.38

In Fig. 3 it can be seen that the loss on the testing data becomes stable after 100 epochs and remains almost the same even after 500 epochs. Fig. 4 shows that the loss on the testing set with the SELU activation function becomes almost stable after 200 epochs; the corresponding accuracies are reported in Table V.

[Fig. 4. Change in the loss per epoch with the SELU activation function. The figure shows one example out of 200 randomly selected test sets.]

TABLE V
PERFORMANCE COMPARISON WITH VARIOUS ACTIVATION FUNCTIONS

                        ELU      SELU
Training Accuracy (%)   87.54    90.21
Testing Accuracy (%)    86.44    84.34

VII. CONCLUSION

This paper has focused on implementing deep learning algorithms in the medical domain. We have built deep learning models for the classification of diabetes. In the experiments, various techniques for handling missing values were evaluated, and the remove-samples approach outperformed replacing the missing values with zero or with the mean in terms of accuracy. Moreover, activation functions, which play a vital role in solving a problem with a neural network, were considered in the form of ELU and SELU. We conducted a comparative study of these high-performance activation units, in which ELU proved to be the better choice for the PID dataset.

REFERENCES

[1] K. Mehdi, S. Eftekhari, and J. Parvizian, "Diagnosing diabetes type II using a soft intelligent binary classification model," Review of Bioinformatics and Biometrics, vol. 1, no. 1, pp. 9-23, Dec. 2012.
[2] K. Günter, T. Unterthiner, A. Mayr, and S. Hochreiter, "Self-normalizing neural networks," in Advances in Neural Information Processing Systems, pp. 971-980, 2017.
[3] S. Daniel, V. Kvasnicka, and J. Pospichal, "Introduction to multi-layer feed-forward neural networks," Chemometrics and Intelligent Laboratory Systems, vol. 39, no. 1, pp. 43-62, 1997.
[4] C. Djork-Arné, T. Unterthiner, and S. Hochreiter, "Fast and accurate deep network learning by exponential linear units (ELUs)," arXiv preprint arXiv:1511.07289, 2015.
[5] K. Kamer and T. Yildirim, "Medical diagnosis on Pima Indian diabetes using general regression neural networks," Proceedings of the International Conference on Artificial Neural Networks and Neural Information Processing (ICANN/ICONIP), vol. 181, 2003.
[6] O. R. Funke, et al., "Application of neural networks in early detection and diagnosis of Parkinson's disease," International Conference on Cyber and IT Service Management (CITSM), IEEE, 2014.
[7] H. Tabreer T., M. H. Jasim, and I. A. Hashim, "Heart disease diagnosis system based on multi-layer perceptron neural network and support vector machine," Int J Curr Eng Technol 77, no. 55, pp. 2277-4106, 2017.
[8] W. Shui-Hua, et al., "Single slice based detection for Alzheimer's disease via wavelet entropy and multilayer perceptron trained by biogeography-based optimization," Multimedia Tools and Applications 77, no. 9, pp. 10393-10417, 2018.
[9] D. Harris, C. J. C. Burges, L. Kaufman, A. J. Smola, and V. Vapnik, "Support vector regression machines," in Advances in Neural Information Processing Systems, pp. 155-161, 1997.
[10] K. Kamer and T. Yildirim, "Medical diagnosis on Pima Indian diabetes using general regression neural networks," Proceedings of the International Conference on Artificial Neural Networks and Neural Information Processing (ICANN/ICONIP), vol. 181, 2003.
[11] S. Zahed and A. Jafarian, "A new artificial neural networks approach for diagnosing diabetes disease type II," International Journal of Advanced Computer Science and Applications 7, vol. 6, pp. 89-94, 2016.
[12] C. D. Kumar, et al., "Performance evaluation of classification methods with PCA and PSO for diabetes," Network Modeling Analysis in Health Informatics and Bioinformatics 9, no. 1, p. 5, 2020.
[13] Pima Indian dataset, https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes, 2017.
[14] T. Laura and T. Lengauer, "Classification with correlated features: unreliability of feature ranking and solutions," Bioinformatics 27, no. 14, pp. 1986-1994, 2011.
[15] M. Kevin P., "Naive Bayes classifiers," University of British Columbia 18, p. 60, 2006.
[16] P. Mahesh, "Random forest classifier for remote sensing classification," International Journal of Remote Sensing 26, no. 1, pp. 217-222, 2005.
