Classification of Diabetes Using Deep Learning

Santosh Kumar, Debabrata Singh, Bharat Bhusan, and Dilip Kumar Choubey

Santosh Kumar and Debabrata Singh are with the CSIT Dept., ITER, SOA University, Bhubaneswar, Odisha, India (email: [email protected], [email protected]). Bharat Bhusan is with the CSE Dept., HMRITM, GGSIP University, New Delhi, India (email: [email protected]). Dilip Kumar Choubey is with the School of CSE, VIT University, Vellore, India (email: [email protected]).
Abstract—Deep Learning (DL) is a research area that has flourished significantly in recent years and has shown remarkable potential for artificial intelligence in the field of medical applications. We have implemented a DL algorithm for diabetes classification. This paper applies a Multi-Layer Feed-Forward Neural Network (MLFNN) to the classification of diabetes on the Pima Indian Diabetes dataset. Furthermore, various activation functions, learning algorithms, and techniques for handling missing values are considered to enhance the classification accuracy on the diabetes dataset. Finally, the outcomes of the experiments are compared with two machine learning algorithms, Naïve Bayes and Random Forest. The classification accuracy achieved by the MLFNN (84.17%) is the best among all the classifiers considered.

Index Terms—Machine Learning, Deep Learning, MLFNN, Diabetes dataset

I. INTRODUCTION

Detecting a medical anomaly is usually considered a complex task that falls within the domain of medical experts and physicians. The classification of diabetes is one such pathological case, and it usually requires physicians with a wide range of experience in the respective domains. A high amount of glucose in the blood affects the major functioning organs of the human body and can lead to kidney damage and heart stroke [1]. In general, diabetes patients are classified based on pathological tests. Thus, classifying whether a person is diabetic or non-diabetic is a very complex task that requires high-level skills and expert knowledge.

Deep learning (DL), a sub-domain of Artificial Intelligence (AI), has been an active area of research in this sector. DL can save time for physicians and provide balanced, repeatable results by automating the manual processes they use. Additionally, the large amount of data available in medicine, combined with the small number of recorded pathological cases, makes deep learning techniques essential for diagnosing and classifying disease [2].

This paper explores the use of DL for the classification of a medical anomaly: diabetes. We have developed a Multi-Layer Feed-Forward Neural Network (MLFNN) based prediction technique for the classification of diabetes, using the Pima Indian Diabetes (PID) dataset. One of the constraints of the PID dataset is missing values, and we have experimented with various techniques for handling them. Furthermore, since the activation function plays a significant role in a neural network, we have examined two different kinds of activation functions and report their efficiency through a comparative analysis.

The rest of the paper is organized as follows: the basics and background of deep learning techniques are described in Section II. Section III presents the literature survey, while data pre-processing methods and the experimental setup are explained in Sections IV and V, respectively. Section VI illustrates the results, and the paper ends with concluding remarks in Section VII.

II. BACKGROUND

DL uses a feature hierarchy in which each layer trains on a distinct set of features provided to it as the output of the previous layer. The deeper layers can recognize sophisticated features in the data, as they aggregate and recombine the features they receive from the preceding layer. This makes DL algorithms suitable for handling large datasets. DL algorithms can also discover patterns and structures within datasets that are neither categorized nor structured, which makes them potent tools in today's world, where the majority of available databases are either unstructured or unlabeled.

A. MLFNN

The MLFNN is a fully connected network consisting of an input layer, one or more hidden layer(s), and an output layer [3]. The general MLFNN network is illustrated in Fig. 1. The internal layers are called hidden layers because they are hidden from the outside world. They receive input from the preceding processing units, process it, and send the output to the processing units in the next hidden layer. A weight coefficient characterizes the connection between two neurons, and this weight reflects the importance of the connection in the neural network. The activation function determines the output of the neurons [4].
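To make this concrete, here is a minimal NumPy sketch (ours, not the authors' code) of a single fully connected layer: each neuron computes a weighted sum of the previous layer's outputs, adds a bias, and applies an activation function. The layer sizes are illustrative only.

import numpy as np

def dense_layer(x, W, b, activation):
    # Weighted sum over the incoming connections, plus bias, then activation.
    return activation(W @ x + b)

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))   # weight coefficients between 8 inputs and 4 neurons
b = np.zeros(4)               # one bias per hidden neuron
x = rng.normal(size=8)        # a single (already normalized) input vector

h = dense_layer(x, W, b, relu)
print(h)                      # outputs of the 4 hidden neurons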
B. Backpropagation Algorithm

In this subsection, we go through the concept of backpropagation. When artificial neural networks were first implemented, they were very slow to train and did not produce effective results. To sort out this problem, the concept of backpropagation was introduced. The backpropagation algorithm comprises two phases: a propagation phase and a weight update phase. When a neural network receives an input, it is propagated forward through the network to produce an output; the error between that output and the desired target is then propagated backward, and the weights are updated to reduce it.
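Illustrating the two phases (our sketch, not the authors' implementation), the following NumPy fragment trains a tiny one-hidden-layer network under a squared-error loss, using the ELU activation discussed later in this paper.

import numpy as np

def elu(z, alpha=1.0):
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

def elu_grad(z, alpha=1.0):
    return np.where(z > 0, 1.0, alpha * np.exp(z))

rng = np.random.default_rng(1)
x = rng.normal(size=(8, 1))   # one input sample with 8 features
y = np.array([[1.0]])         # desired target

W1, b1 = 0.1 * rng.normal(size=(4, 8)), np.zeros((4, 1))
W2, b2 = 0.1 * rng.normal(size=(1, 4)), np.zeros((1, 1))
lr = 0.05

for _ in range(200):
    # Phase 1: forward propagation of the input to an output.
    z1 = W1 @ x + b1
    h = elu(z1)
    y_hat = W2 @ h + b2                  # linear output unit for simplicity

    # Phase 2: propagate the error backward and update the weights.
    d_out = y_hat - y                    # gradient of 0.5 * (y_hat - y)**2
    dW2 = d_out @ h.T
    d_h = (W2.T @ d_out) * elu_grad(z1)
    dW1 = d_h @ x.T
    W2 -= lr * dW2; b2 -= lr * d_out
    W1 -= lr * dW1; b1 -= lr * d_h

print(y_hat.item())                      # converges toward the target 1.0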
A closely related development for training deep networks is the self-normalizing neural network (SNN) [2], in which neuron activations converge toward zero mean and unit variance. This property propagates through many layers even in the presence of noise and perturbation. In this way, SNNs allow training deep neural networks with many layers by employing this powerful regularization technique, making the learning highly robust. Even when some activation is not close to zero, its variance has an upper and lower bound, which makes vanishing and exploding gradient problems impossible.
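For reference, the SELU activation that induces this self-normalizing behaviour can be written as follows (our sketch; the constants lambda ~ 1.0507 and alpha ~ 1.6733 are those derived in [2]).

import numpy as np

def selu(z, lam=1.0507, alpha=1.6733):
    # Scaled ELU: lam * z for z > 0, lam * alpha * (exp(z) - 1) otherwise.
    return lam * np.where(z > 0, z, alpha * (np.exp(z) - 1.0))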
…subsequently for feature selection in the second phase. Results show the superiority of the proposed model over the traditional classification techniques.

TABLE I
DESCRIPTION OF PID DATASET

Features                     | Values
Number of times pregnant     | Numerical values
Plasma glucose concentration | Numerical values
Diastolic blood pressure     | Numerical values (in mmHg)
Triceps skin fold thickness  | Numerical values (in mm)
2-hour serum insulin         | Numerical values (in μU/ml)
Body mass index              | Numerical values (in kg/m^2)
Diabetes pedigree function   | Numerical values
Age                          | Numerical values (in years)
TABLE II
MISSING VALUES IN PID DATASET

No. | Features                     | Missing values
1   | Number of times pregnant     | -
2   | Plasma glucose concentration | 5
3   | Diastolic blood pressure     | 35
4   | Triceps skin fold thickness  | 227
5   | 2-hour serum insulin         | 374
6   | Body mass index              | 11
7   | Diabetes pedigree function   | 1
8   | Age                          | 63
In this paper, we have experimented on the PID dataset taken from the UCI Machine Learning repository [13]. From the dataset, eight relevant features are considered; the total numbers of diabetes and non-diabetes cases are 268 and 500, respectively. The features of the dataset have numerical values, as shown in Table I. There are a lot of missing values in the dataset, as illustrated in Table II.
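As a minimal sketch of how this dataset can be loaded (ours, not the authors' code): the column names below are labels we assign to the eight features of Table I plus the class label, since the UCI file itself has no header row.

import pandas as pd

# Our labels for the eight features of Table I plus the class label.
columns = ["pregnancies", "glucose", "blood_pressure", "skin_thickness",
           "insulin", "bmi", "pedigree", "age", "outcome"]

# Assumes a local copy of the PID data from the UCI repository [13].
df = pd.read_csv("pima-indians-diabetes.csv", header=None, names=columns)
print(df["outcome"].value_counts())   # expected: 500 non-diabetic, 268 diabetic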
IV. DATA PRE-PROCESSING

The process of training a neural network can be made more effective by applying some pre-processing techniques to the network inputs before feeding them into the system. In the PID dataset, we first checked whether there are any correlated features. Correlation can be defined as a measure of how strongly one input feature depends on another. By removing correlated features, we can increase the learning speed of an algorithm and also reduce the bias in the neural network; a random forest, for instance, can show a decline in performance if there is any correlation bias [14].

Normalization processes the data to make it appropriate for training. In this process, the data of every input feature is scaled into some specific range to reduce the bias in the neural network. Furthermore, it speeds up training by starting the training process for each feature on the same scale, which is very effective when two input features differ greatly in scale. On the PID dataset, we have used MinMaxScaler for normalization; equation (3) shows its configuration. It transforms all the features by scaling them into a given range, according to equations (4) and (5).

MinMaxScaler(feature_range = (0, 1), copy = True)    (3)

X_std = (X - X.min) / (X.max - X.min)    (4)

X_scaled = X_std * (max - min) + min    (5)
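As an illustration of equations (3)-(5), this sketch (ours) applies scikit-learn's MinMaxScaler to the feature matrix, reusing the hypothetical df from the loading sketch above.

from sklearn.preprocessing import MinMaxScaler

X = df.drop(columns="outcome").to_numpy()
y = df["outcome"].to_numpy()

# Equations (4) and (5) with feature_range = (0, 1).
scaler = MinMaxScaler(feature_range=(0, 1), copy=True)
X_scaled = scaler.fit_transform(X)
print(X_scaled.min(axis=0), X_scaled.max(axis=0))   # every feature now in [0, 1]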
The PID dataset suffers from a lot of missing values, which can impact the classification accuracy of the neural network model. To handle the missing data, we have used three different techniques: in the first, we removed all the rows that have some missing attribute; in the second, we replaced all the missing values with zero; and in the third, we replaced them with the mean of the other values of the attribute.
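The three techniques can be sketched with pandas as follows (our illustration; it assumes, as is common for the PID data, that a zero in the affected columns of Table II encodes a missing measurement).

import numpy as np

# Columns of Table II where a zero is taken to encode a missing measurement.
missing_cols = ["glucose", "blood_pressure", "skin_thickness", "insulin", "bmi"]
df[missing_cols] = df[missing_cols].replace(0, np.nan)

df_removed = df.dropna()                          # technique 1: remove samples
df_zero = df.fillna(0)                            # technique 2: replace with zero
df_mean = df.fillna(df.mean(numeric_only=True))   # technique 3: replace with mean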
V. EXPERIMENTAL SETUP AND RESULTS

In this section, we discuss the network architecture and the design considerations of the MLFNN used in the experiments for the classification of diabetes. First, we present the network architecture; second, we discuss the evaluation metrics that are used.
A. Network Architecture

We have considered a feed-forward neural network for PID diagnosis. Figure 2 illustrates the proposed architecture for diabetes diagnosis. The architecture of the proposed model is described in terms of depth and width: the depth of the network is the number of hidden layers, while the width corresponds to the number of neurons in each layer. A single-layer feed-forward neural network is able to learn arbitrary mappings, but such a network suffers from a poor learning rate and poor accuracy. In contrast, a deep model can reduce the number of neurons per layer and the corresponding generalization error. Another important design consideration is the number of connections between neurons. The efficiency of the network depends upon the number of connections, the number of parameters, and the number of neurons: if the number of connections decreases, the number of parameters decreases as well, along with the computation time of the network. Such design and architectural issues are fully problem-dependent, and the final architecture is generally settled after several experiments, with performance recorded in a trial-and-error manner. In our MLFNN architecture, we have considered 10 neurons in the input layer, three hidden layers with 60 neurons each, and an output layer with a single neuron.

For the MLFNN model evaluation, we have used the repeated holdout validation technique, also known as Monte Carlo cross-validation. We split the PID dataset into 90% training and 10% testing data, repeated this experiment 200 times with different random seeds, and then computed the average accuracy over the runs.
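The following Keras sketch (our reconstruction, not the authors' code) builds a network matching this description (three hidden layers of 60 neurons and a single sigmoid output) and runs the repeated 90/10 holdout. The ELU activation, Adam optimizer, and epoch count are assumptions; X_scaled and y are the arrays from the sketches above.

import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow import keras

def build_mlfnn(n_features):
    model = keras.Sequential([
        keras.layers.Input(shape=(n_features,)),
        keras.layers.Dense(60, activation="elu"),
        keras.layers.Dense(60, activation="elu"),
        keras.layers.Dense(60, activation="elu"),
        keras.layers.Dense(1, activation="sigmoid"),  # diabetic / non-diabetic
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Monte Carlo cross-validation: repeated 90/10 holdout splits.
accuracies = []
for seed in range(200):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_scaled, y, test_size=0.1, random_state=seed)
    model = build_mlfnn(X_tr.shape[1])
    model.fit(X_tr, y_tr, epochs=100, verbose=0)
    accuracies.append(model.evaluate(X_te, y_te, verbose=0)[1])

print(f"mean test accuracy: {np.mean(accuracies):.4f}")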
Fig. 2. MLFNN Architecture for Diabetes
TABLE III
COMPARATIVE ANALYSIS OF VARIOUS CLASSIFIERS

                      | NB    | RF    | MLFNN
Training accuracy (%) | 77.56 | 97.42 | 83.46
Testing accuracy (%)  | 70.08 | 73.25 | 81.73

Fig. 3. Change in the loss per epoch after imputing the missing values with the mean; one example out of the 200 randomly selected test sets.
…trees in the forest and the results it can get: the higher the number of trees, the more accurate the result. The third advantage is that the RF classifier can handle missing values, and the last is that it can be modeled for categorical values. There are two stages in the RF algorithm: the first stage is the creation of the random forest, and the second is prediction using the random forest classifier built in the first stage. With the RF classifier, we achieved a training accuracy of 97.42% and a testing accuracy of 73.25%. Table III shows the classification report of the RF classifier, and it also shows the superiority of the MLFNN model in terms of classification accuracy compared with the Naïve Bayes and Random Forest classifiers. The Random Forest classifier achieved the highest accuracy (97.42%) during training, but its classification accuracy declined to 73.25% in the testing phase. Naïve Bayes, on the other hand, shows steady performance and achieves 70.08% accuracy in the testing phase.
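For reference, the two baselines can be reproduced in outline with scikit-learn (our sketch; the hyperparameters, such as the number of trees, are assumptions not given in this excerpt).

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X_tr, X_te, y_tr, y_te = train_test_split(X_scaled, y, test_size=0.1,
                                          random_state=0)

for clf in (GaussianNB(), RandomForestClassifier(n_estimators=100)):
    clf.fit(X_tr, y_tr)
    print(type(clf).__name__,
          f"train={clf.score(X_tr, y_tr):.4f}",
          f"test={clf.score(X_te, y_te):.4f}")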
B. Handling Missing Data

In this experiment, we have used the mean-value approach to handle the missing data, replacing the missing data of an attribute with the mean value. Table IV shows the classification accuracy of all three techniques. Among the three approaches, removing samples (84.17%) is superior to the other techniques. The reason behind this is that replacing values sometimes makes it hard for the neural network to appropriately adjust the impact of certain attributes that play a vital role in solving the problem.

In Fig. 3 it can be seen that the loss on the testing data becomes stable after 100 epochs and remains almost the same even after 500 epochs. Fig. 4 shows that the loss on the testing set for the SELU activation function becomes almost stable after 200 epochs, as reported in Table V.
VII. CONCLUSION

This paper has focused on applying deep learning algorithms in the medical domain. We have built deep learning models for the classification of diabetes. In the experiments, various missing-value handling techniques were evaluated, and the remove-samples approach outperformed replacing the missing instances with zero or with the mean in terms of accuracy. Moreover, several activation functions, such as ELU and SELU, were considered, which play a vital role in solving a problem in a neural network. Furthermore, we have conducted a comparative study of high-performance activation function units, in which ELU proved to be the better choice for the PID dataset.

REFERENCES

[1] M. Khashei, S. Eftekhari, and J. Parvizian, "Diagnosing diabetes type II using a soft intelligent binary classification model," Review of Bioinformatics and Biometrics, vol. 1, no. 1, pp. 9-23, Dec. 2012.
[2] G. Klambauer, T. Unterthiner, A. Mayr, and S. Hochreiter, "Self-normalizing neural networks," in Advances in Neural Information Processing Systems, pp. 971-980, 2017.
[3] D. Svozil, V. Kvasnicka, and J. Pospichal, "Introduction to multi-layer feed-forward neural networks," Chemometrics and Intelligent Laboratory Systems, vol. 39, no. 1, pp. 43-62, 1997.
[4] D.-A. Clevert, T. Unterthiner, and S. Hochreiter, "Fast and accurate deep network learning by exponential linear units (ELUs)," arXiv preprint arXiv:1511.07289, 2015.
[5] K. Kayaer and T. Yildirim, "Medical diagnosis on Pima Indian diabetes using general regression neural networks," in Proceedings of the International Conference on Artificial Neural Networks and Neural Information Processing (ICANN/ICONIP), vol. 181, 2003.
[6] R. F. Olanrewaju, et al., "Application of neural networks in early detection and diagnosis of Parkinson's disease," in International Conference on Cyber and IT Service Management (CITSM), IEEE, 2014.
[7] T. T. Hasan, M. H. Jasim, and I. A. Hashim, "Heart disease diagnosis system based on multi-layer perceptron neural network and support vector machine," International Journal of Current Engineering and Technology, vol. 77, no. 55, pp. 2277-4106, 2017.
[8] S.-H. Wang, et al., "Single slice based detection for Alzheimer's disease via wavelet entropy and multilayer perceptron trained by biogeography-based optimization," Multimedia Tools and Applications, vol. 77, no. 9, pp. 10393-10417, 2018.
[9] H. Drucker, C. J. C. Burges, L. Kaufman, A. J. Smola, and V. Vapnik, "Support vector regression machines," in Advances in Neural Information Processing Systems, pp. 155-161, 1997.
[10] K. Kayaer and T. Yildirim, "Medical diagnosis on Pima Indian diabetes using general regression neural networks," in Proceedings of the International Conference on Artificial Neural Networks and Neural Information Processing (ICANN/ICONIP), vol. 181, 2003.
[11] Z. Soltani and A. Jafarian, "A new artificial neural networks approach for diagnosing diabetes disease type II," International Journal of Advanced Computer Science and Applications, vol. 7, no. 6, pp. 89-94, 2016.
[12] D. K. Choubey, et al., "Performance evaluation of classification methods with PCA and PSO for diabetes," Network Modeling Analysis in Health Informatics and Bioinformatics, vol. 9, no. 1, p. 5, 2020.
[13] Pima Indian diabetes dataset, https://archive.ics.uci.edu/ml/machine-learningdatabases/pima-indians-diabetes, 2017.
[14] L. Toloşi and T. Lengauer, "Classification with correlated features: unreliability of feature ranking and solutions," Bioinformatics, vol. 27, no. 14, pp. 1986-1994, 2011.
[15] K. P. Murphy, "Naive Bayes classifiers," University of British Columbia, vol. 18, p. 60, 2006.
[16] M. Pal, "Random forest classifier for remote sensing classification," International Journal of Remote Sensing, vol. 26, no. 1, pp. 217-222, 2005.