
IEEE Access 2021 - Diabetic Retinopathy Diagnosis From Fundus Images Using Stacked Generalization of Deep Models


Received July 14, 2021, accepted July 27, 2021, date of publication July 28, 2021, date of current version August 9, 2021.


Digital Object Identifier 10.1109/ACCESS.2021.3101142

Diabetic Retinopathy Diagnosis From Fundus Images Using Stacked Generalization of Deep Models
HARSHIT KAUSHIK 1 , DILBAG SINGH 2, (Member, IEEE), MANJIT KAUR 2 , (Member, IEEE),
HAMMAM ALSHAZLY 3 , ATEF ZAGUIA 4, AND HABIB HAMAM 5 , (Senior Member, IEEE)
1 School of Computing and Information Technology, Manipal University Jaipur, Jaipur, Rajasthan 303007, India
2 School of Engineering and Applied Sciences, Bennett University, Greater Noida 201310, India
3 Department of Computer Science, Faculty of Computers and Information, South Valley University, Qena 83523, Egypt
4 Department of Computer Science, College of Computers and Information Technology, Taif University, Taif 21944, Saudi Arabia
5 Faculty of Engineering, Moncton University, Moncton, NB E1A3E9, Canada

Corresponding author: Manjit Kaur ([email protected])


This work was supported by the Taif University Researchers Supporting Project, Taif University, Taif, Saudi Arabia, under Grant
TURSP-2020/114.

ABSTRACT Diabetic retinopathy (DR) is a diabetes complication that affects the eye and can cause damage
from mild vision problems to complete blindness. It has been observed that the eye fundus images show
various kinds of color aberrations and irrelevant illuminations, which degrade the diagnostic analysis and
may hinder the results. In this research, we present a methodology to eliminate these unnecessary reflectance
properties of the images using a novel image processing schema and a stacked deep learning technique for
the diagnosis. For the luminosity normalization of the image, the gray world color constancy algorithm is
implemented which does image desaturation and improves the overall image quality. The effectiveness of the
proposed image enhancement technique is evaluated based on the peak signal to noise ratio (PSNR) and mean
squared error (MSE) of the normalized image. To develop a deep learning based computer-aided diagnostic
system, we present a novel methodology of stacked generalization of convolutional neural networks (CNN).
Three custom CNN model weights are fed on the top of a single meta-learner classifier, which combines
the most optimum weights of the three sub-neural networks to obtain superior metrics of evaluation and
robust prediction results. The proposed stacked model reports an overall test accuracy of 97.92% (binary
classification) and 87.45% (multi-class classification). Extensive experimental results in terms of accuracy,
F-measure, sensitivity, specificity, recall and precision reveal that the proposed methodology of illumination
normalization greatly facilitates the deep learning model and yields better results than various state-of-the-art
techniques.

INDEX TERMS Convolutional neural networks, diabetic retinopathy, early diagnosis, fundus images, gray
world algorithm, ensemble learning.

The associate editor coordinating the review of this manuscript and approving it for publication was Amin Zehtabian.

I. INTRODUCTION
Diabetic retinopathy (DR) is a medical condition caused by damage to the blood vessels of the light-sensitive tissue at the back of the eye (retina), which can eventually cause complete blindness and various other eye problems depending on the severity of the disease. Though treatment is available, it is estimated that numerous people go blind every day because of this disease [1]. It is observed that 40%-45% of diabetic patients are likely to have DR in their life, but due to lack of knowledge and delayed diagnosis, the condition escalates quickly [2].
The Early Treatment DR Study Research Group (ETDRS) has shown that if DR is correctly diagnosed on time, it may reduce the chances of vision loss by 50% [3]. The prevalence of DR is highest, at 25.04%, among people in the age bracket of 61-80 [4]. Until now, retinal images have been manually assessed by ophthalmologists and clinicians for predicting DR after the eye fundoscopic exam and to analyze signs such as cotton wool spots, retinal swellings,

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
108276 VOLUME 9, 2021
H. Kaushik et al.: DR Diagnosis From Fundus Images Using Stacked Generalization of Deep Models

and hemorrhages [2]. However, it is usually observed that during the image acquisition process, the fundus images show various kinds of irrelevant illuminations, non-uniform light distribution, and blurred or darkened candidate regions, which subsequently affect the diagnostic process and result in biased predictions [5]. To detect DR, it is essential to obtain results with high precision irrespective of any bias to avoid a wrong judgment that may lead to a serious problem or, in some cases, even permanent blindness. During the fundoscopic test, if the obtained image is highly saturated, it becomes difficult to carry out a proper visual assessment even for a trained ophthalmologist or clinician, and hence the presence of non-uniform illuminations can impede correct predictions [6]. Therefore, luminosity normalization becomes a significant pre-processing aspect for a diverse set of retinal images. Some previous works have considered normalizing the luminosity of the retinal images using various statistical, mathematical, and particularly HSV color space based models to desaturate the image [5]–[7].
The previous diagnostic studies of DR can be classified into two types: 1) automatic detection of the disease (binary), and 2) classification of different stages of the disease (multi-class). In our study, the focus is to automate the diagnostic process and to combine the luminosity normalization pre-processing pipeline with an advanced artificial intelligence technique. Until now, various image processing techniques have been presented to detect DR by considering the definitive candidate regions such as cotton wool spots, exudate, hemorrhages, and blood vessels, as reported in [8], [9]. These methods rely on manual feature extraction, but since most retinal images depict non-uniform features, generalizing one feature set to all images may give inappropriate diagnostic results when a large database is considered.
Various architectures of the multi-layered perceptron, convolutional neural networks (CNN), and machine learning algorithms have also been implemented for automatic disease detection [10]–[12]. However, none of these studies has addressed the problem of non-uniform reflectance and over-saturation of a fundus image surface for developing an unbiased DR diagnostic tool. Therefore, to alleviate this issue we present a novel color constancy technique to reduce irrelevant reflectance in fundus images. For the feature extraction of the pre-processed images, a stacked generalization of deep CNNs is developed, which can also be considered a superior cross-validation technique for neural network models.
The main contributions of this paper can be summarized as follows:
• We solve the non-ideal illumination and color degradation problems by using the gray world color constancy schema to desaturate the retinal images. This will enable ophthalmologists to use the color of the images as a reliable cue for recognizing DR signs and avoid the various distortions related to light distribution and color, which may hinder the diagnostic results.
• The scaling factor is an important part of color correction techniques such as gray world; therefore, the color channel with the minimum mean is used as the reference to calculate the gray world illuminant.
• To automate the diagnostic process and to make predictions using the desaturated images, a stacked generalization of three custom CNNs is developed, which is fed into a single meta-learner to extract the most optimum weights from the sub-networks to achieve better performance. This method differs from a usual voting classifier because the evaluation metrics (e.g. accuracy and mean squared error) are not averaged or voted; rather, the meta-learner model gets multiple prediction probabilities as input, which are combined to generate better features and thus achieve accurate results.
• We consider the Exponential Linear Unit (ELU) activation function for each sub-model due to its fast convergence and more accurate results.
• To monitor the generalization error and avoid conditions such as overfitting and bias-variance trade-off during training, techniques such as exponential learning rate decay and early stopping are also applied to give an overall regularization effect on the proposed model.
• Extensive experiments and comparisons between the proposed model and existing works in the diagnosis of DR have been carried out to validate our model and findings.

The rest of the paper is organized as follows. Section II discusses the literature review of DR diagnosis. In Section III we state our motivations. Image normalization and the stacked generalization of deep CNNs model are discussed in Section IV. Section V presents the experimental setup. Section VI presents the quantitative analysis. The discussion is presented in Section VII. Finally, our conclusions and possible future work are presented in Section VIII.

II. LITERATURE REVIEW
Different techniques have been presented by researchers to deal with retinal image normalization, balancing luminosity distribution, contrast normalization, and computer-aided diagnostic systems, which have proved to be of great importance in the field of retinal imaging. The literature survey of this study covers two major categories of DR works to ensure that an overall view is given for better understanding. The works of each category were evaluated based on different performance metrics and design attributes based on the data pattern and proposed experimental design.
The first category comprises works that focused solely on an image processing based methodology for DR detection. Zhou et al. [5] presented a luminosity adjustment technique in which a luminance matrix is obtained by the gamma correction of the value channel in HSV color space to improve the quality of individual RGB channels. For improving the contrast of images, the contrast limited adaptive histogram equalization (CLAHE) technique was used that


involves a kernel based iterative process to normalize the histogram of image pixels to avoid congestion of the pixels in a particular range, thus improving the image quality. In [7], the authors proposed a histogram equalization-based image processing technique for fundus image enhancement and developed a CNN model for classification. They used a small dataset of 400 images and achieved a sensitivity of 96.67% and specificity of 93.33%. Bhaskar and Kumar [13] proposed a technique to normalize the contrast and luminosity of the fundus images by assuming that all the neighborhood pixels are independent and identical to each other. In [14], a retinal enhancement technique based on the Speeded up Adaptive Contrast Enhancement (SUACE) algorithm integrated with the Tyler-coy algorithm was proposed. The SUACE algorithm uses a gray-scale image obtained by Principal Component Analysis (PCA), which was then fed into the Tyler-coy algorithm to remove the discontinuities of blood vessels for better prediction results. In 2015, [15] presented an illumination correction technique using a low-pass filter and Gaussian filter. By using the low-pass filter, the background of the image is normalized and then superimposed with the results of the Gaussian filter, thus removing any sort of foreground noise that existed earlier. Singh et al. [16] used the usual histogram equalization technique for low-radiance images to clip away the pixel values based on a threshold, which was calculated by taking the average median value of the image to enhance the normalization results. They used the structural similarity index measure and the Euclidean distance to validate their prediction results. Although numerous methodologies have been proposed for image contrast enhancement, none of them focused on image desaturation for developing a DR system.
The second category covers the various deep/machine learning methodologies that have been presented for early DR detection. Most of the previous work focused on the development of traditional machine learning and ensemble deep learning techniques. Recently, Zhou et al. [10] proposed a multiple instance learning technique, which is a weakly supervised technique to detect DR in fundus images. Initial image processing steps such as resizing and Gaussian smoothing were implemented before feature extraction. Their detection model was divided into two parts. First, they created a bag of image patches for detecting lesions. Second, a pre-trained Alexnet [17] model was utilized for automatic feature extraction. The model achieved an AUC score of 92.5%. In [18], an ensemble approach using deep transfer learning models to detect DR was proposed. The authors used models including ResNet [19], Densenet [20], Inception [21], and Xception [22] for extracting features and performed extensive hyperparameter tuning to achieve better results. They gave per-class metrics where the highest AUC of the imbalanced class was 97%, but they did not consider any image pre-processing technique to normalize the images. The authors used a dataset of images that contains spatial noise and distortions such as blurring and darkened corners, which require a more advanced technique to get reliable results.
A stacking technique of machine learning algorithms was presented in [1] to prepare a DR screening tool. Lesions and microaneurysms are extracted and then classified using an ensemble classifier. The model's performance was evaluated using accuracy, sensitivity, and specificity, and achieved 90%, 91%, and 90%, respectively. In 2017, another improved ensemble technique was presented by Somasundaram and Alli [23]. A machine learning bagging ensemble classifier (ML-BEC) was considered for the prediction of DR. They implemented the t-distributed stochastic neighbor embedding (t-SNE) algorithm to separate the images into similar and dissimilar pairs. Saleh et al. [24] presented an ensemble technique for DR risk assessment, which justifies the presence or absence of the disease. They prepared a dominance-based rough set balanced rule ensemble (DRSA-BRE) and compared their work with the random forest classifier. The best sensitivity score achieved was near 80%. Similarly, various DR detection methods have been presented in this field [25]–[27]. However, none of these solve the problem of non-uniform illuminations, which can play a major role in the detection of proliferative and non-proliferative DR.
Table 1 summarizes the most relevant work from the two major categories for DR detection along with the performance evaluation metrics used and their limitations. From the research literature presented in Table 1, we can infer that most of the techniques focused on retinal contrast enhancement and machine/deep learning models for classification without addressing the non-uniform reflectance of fundus data during image acquisition. Therefore, to alleviate these issues we developed a pipeline for image illumination normalization and a novel feature extraction model for early DR detection.

III. MOTIVATION
Most of the proposed approaches used machine learning, deep learning, and image processing techniques to extract candidate features such as lesions, hemorrhages, exudates, and cotton-wool spots, but they ignored the variance in scene illumination and light degradation, which affects the performance and may result in biased prediction results. In our proposed method, we have used a dataset that has multi-sourced images. Therefore, various types of noise and distortions are encountered in the images. To overcome such issues, we aim to explore the research area of combining artificial intelligence and image processing to develop a complete illumination-proof diagnostic tool for DR. The methodology that has been applied is discussed in the following sections.

IV. METHODOLOGY
Figure 1 demonstrates the different stages of the proposed methodology in the form of a model pipeline. After the data acquisition, the image luminosity is normalized by the color constancy based gray world algorithm. The image processing pipeline is shown in the figure, in which the illuminant K' from the images is used to normalize the image. The data is split into training and test sets for the stacking convolutional


TABLE 1. A summary of the related works presented in this study.

model. Three different sub-models of CNNs are fed into a single meta-learner classifier for feature extraction. The fusion strategy of the stacking model is based on the weighted majority from each of the sub-models for generating better features for classification. A data augmentation technique is also applied to improve the diversity of images in the dataset. Finally, the meta-learner classifier produces the diagnostic result as healthy (No DR) or unhealthy (DR).

A. LUMINOSITY NORMALIZATION USING GRAY WORLD ALGORITHM
We use a color constancy algorithm for image normalization as a pre-processing step. In our experiments, we dealt with a dataset that contains images from multiple sources having color variations, varying illuminations, and a non-uniform light distribution, which resulted in a large amount of heterogeneity among images. In the case of retinal images, heterogeneity among images can cause a major difference in appearance. For example, some part of the image gets highlighted near the center while the boundaries get blurred, and these non-uniformities can seriously affect the diagnostic results. Therefore, it is necessary to propose a color calibration methodology for these images [28]. We have implemented the color constancy algorithm to remove the unnecessary surface reflectance and to make the color of the image invariant to such illuminations and other color-related aberrations. Generally, there are four common color constancy algorithms: Gray World, Shades of Gray, General Gray World, and Max-RGB. In this paper, we consider the Gray World algorithm.
The gray world algorithm assumes that the average surface reflectance of the image is achromatic, and therefore corrections can be made by taking the average pixel values and scaling them by a scaling factor, which is computationally inexpensive to calculate. Gray world is also a statistical algorithm, which uses less computation power. According to [29], most of the existing algorithms are based on assumptions. For instance, the Max-RGB color constancy algorithm assumes the presence of a white patch in the image to calculate the illuminant, whereas the gray world algorithm uses the average reflectance and encourages a data-driven approach in color constancy [30].
As explained above, it is assumed in the gray world algorithm that the (R,G,B) color channels have linear values, which means that the average reflection in standard light is gray. But it is not what we see in real life. The algorithm is based on the hypothesis that the average of each channel (R,G,B) in an image I is always equal, i.e., gray [31]. However, the average is not constant and is either greater or less than the gray value. This deviation from the original gray value gives us the illumination change. The illuminant of the image is then estimated in the RGB mode and used to normalize each channel of the image, transforming the image as if it were taken under a canonical light source.
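The gray world normalization described in this subsection, with every channel scaled toward the smallest channel mean as stated in the contributions, can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' implementation: the function names `channel_means` and `gray_world_min`, the nested-list image layout, and the handling of an all-zero channel are assumptions made for the example.

```python
def channel_means(image):
    """Return the mean value of the R, G and B channels.

    `image` is assumed to be a list of rows, each row a list of
    (r, g, b) tuples; this layout is chosen only for illustration.
    """
    totals = [0.0, 0.0, 0.0]
    count = 0
    for row in image:
        for pixel in row:
            for c in range(3):
                totals[c] += pixel[c]
            count += 1
    if count == 0:
        raise ValueError("image has no pixels")
    return [t / count for t in totals]


def gray_world_min(image):
    """Scale each channel so that its mean equals the smallest channel mean."""
    means = channel_means(image)
    reference = min(means)  # channel with the minimum mean is the reference
    if reference == 0:
        # Assumption: leave the image unchanged rather than zero it out.
        gains = [1.0, 1.0, 1.0]
    else:
        gains = [reference / m for m in means]  # one scaling factor per channel
    return [
        [tuple(pixel[c] * gains[c] for c in range(3)) for pixel in row]
        for row in image
    ]
```

Because the reference is the minimum mean, every scaling factor is at most 1, so no pixel value can exceed its original range. With array libraries such as NumPy the same computation would normally be vectorized rather than looped.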


FIGURE 1. A diagrammatic flow of the proposed methodology and the training process.

TABLE 2. Data augmentation parameters considered for retinal image generation.

The stepwise algorithm of this popular luminosity normalization schema is explained below.
STEP 1 (Pixel Level Normalization): Initially, to get the color of the light source, pixel-level normalization is carried out by calculating the average pixel value of each sensor channel. Consider an image as:

$$ I = [R_{wh}, G_{wh}, B_{wh}] \tag{1} $$

$R_{wh}$, $G_{wh}$, and $B_{wh}$ represent the sensor channels, whereas $w$ and $h$ denote the image width and height, respectively. The mean pixel value can be calculated as:

$$ \mu_j = \frac{1}{j} \sum_{j} I_j \tag{2} $$

Here, $j = R, G, B$; that is, $\mu_j$ is the average pixel value of sensor channel $j$.
STEP 2 (Gray World Illuminant Calculation): In the gray world color correction, one of the color channels is selected as a reference to calculate the illuminant, but the intensity of the resultant normalized image degrades and may hinder the diagnostic results. Therefore, in the proposed method, a compressed color channel technique of the gray world algorithm is used in which the color channel with the minimum average magnitude is selected, as proposed in [32]. The scaling factors obtained based on this magnitude can be expressed as:

$$ \beta_r = \frac{\bar{X}_{\min}}{\bar{R}_{wh}} \tag{3} $$

$$ \beta_g = \frac{\bar{X}_{\min}}{\bar{G}_{wh}} \tag{4} $$

$$ \beta_b = \frac{\bar{X}_{\min}}{\bar{B}_{wh}} \tag{5} $$

Thus, the resultant illuminant belongs to each sensor channel of the image. For example, if the image is $I$, then the component of the illuminant is $e_c$, where $c \in [R, G, B]$.
STEP 3 (Scaling Individual Image Channel): The normalized image is obtained by multiplying each individual color channel by its scaling factor:

$$ R' = R_{wh} \times \beta_r \tag{6} $$

$$ G' = G_{wh} \times \beta_g \tag{7} $$

$$ B' = B_{wh} \times \beta_b \tag{8} $$

$R'$, $G'$, and $B'$ represent the normalized channels of the resultant color normalized image.
Figure 2 shows the normalized images obtained after applying the gray world color constancy algorithm.

B. ARTIFICIAL DATA GENERATION
To add more diversity to the dataset, a data augmentation technique is used in which artificial data is generated from the


FIGURE 2. Results of image normalization using colour constancy algorithm. The first row shows three original images, whereas the second row shows
their corresponding colour normalized images after applying the gray world algorithm. The yellow arrow points out that features visible in original
image such as blood vessels, macula, haemorrhages are not affected after luminosity normalization.

pre-existing images. The data is generated for each mini batch in an iterative process during model training. The applied augmentation steps include horizontal flip, width shift, height shift, fill mode, and zoom range. Table 2 illustrates the data augmentation parameters. Figure 3 shows nine generated images in an iterative process during model training with a rotation angle set to 74 degrees.

C. CONVOLUTIONAL NEURAL NETWORK
The CNN models are based on the principle of layer-wise abstraction for feature learning. The complexity as well as the number of features increase with the model depth. CNNs follow a feed-forward architecture analogous to an artificial neural network, but they generalize much better on computer vision-related problems. They are commonly known as ConvNets and usually consist of an input layer, hidden layers, and an output layer. The hidden layers have some activation functions, fully connected layers, and pooling layers. The initial layers of a CNN model tend to learn low-level features such as edges, color, and shapes, whereas the deeper layers focus on learning high-level features.
Typically, CNN models are a stack of alternating convolutions with various sizes of filters, pooling, and fully connected layers. The difference between a fully connected layer and a convolution layer is that the convolution layer is partially connected and receives inputs from a sub-area of the previous layer, whereas in a fully connected layer all the previous neurons are connected to the next neurons for feature transmission [33]. A kernel, commonly known as the filter, is a sliding window over the image, which is an array of numbers, where these numbers are the weights that are updated continuously. The area over which it slides is called the receptive field. In our model, we apply 3 × 3 filters with a depth of 3 since we have colored images of size 96 × 96 × 3. The filter convolution over an image results in an element-wise multiplication with pixel values represented as:

$$ L[m, n] = (f \ast h)[m, n] = \sum_{j} \sum_{k} h[j, k] \, f[m - j, n - k] \tag{9} $$

Here, the input image is $f$, the kernel is denoted by $h$, and the indices of rows and columns are represented by $m$ and $n$, respectively. After completion of the first convolutional layer, a feature map is generated, which is the input for the second layer. We consider max pooling as the pooling layer, which downsamples the resulting feature maps and increases the receptive field of the filters [33]. To introduce non-linearity into the feature maps, activation functions are applied. In our case, we utilize the Exponential Linear Unit (ELU) activation function, which is defined as:

$$ \mathrm{ELU}(x) = \begin{cases} x, & \text{if } x > 0 \\ \alpha (e^{x} - 1), & \text{if } x \le 0 \end{cases} \tag{10} $$

Although it is computationally expensive, it converges faster than other activation functions. According to [34], the extra parameter $\alpha$ controls the saturation point for negative values. This is what differentiates it from the commonly used Rectified Linear Unit (ReLU) activation function. In a neural


FIGURE 3. An illustration of data augmentation in retinal images.

network, forward and backward propagation are among the most important factors in determining the convergence performance of a CNN. The forward propagation has two steps. The first is the calculation of $Z$, which is determined as:

$$ Z^{[l]} = W^{[l]} \times A^{[l-1]} + B^{[l]} \tag{11} $$

Here, $W$ is a tensor that holds the filter and $B$ is the bias term. The second step applies the activation function as follows:

$$ A^{[l]} = K^{[l]}(Z^{[l]}) \tag{12} $$

Here, $K$ denotes the activation function. This process is followed by a backward propagation in which partial derivatives are calculated to update $W$ and $B$, improving gradient descent convergence and reducing the error. The partial derivative can be calculated as:

$$ dX^{[l]} = \frac{\partial L}{\partial X^{[l]}} \tag{13} $$

Here, $X$ denotes $A^{[l]}$, $W^{[l]}$, and $B^{[l]}$, which are the activation, weight, and bias, respectively. The weights are updated as:

$$ W = W_i - \eta \frac{\partial L}{\partial W} \tag{14} $$

Here, $\eta$ is the learning rate and $W_i$ is the initial weight. For the CNN model, the images are resized to 96 × 96 × 3 before being fed into the neural network.

D. STACKED GENERALIZATION OF CNNs
The methodology of stacking multiple sub-models into a single meta-learner classifier, which combines the prediction probabilities to reduce the generalization error by deducing the individual biases of the sub-models, is called stacked generalization [35]. It differs from the usual model averaging methodology in terms of the fusion strategy because the classification results are not averaged; instead, the final output is decided by the weighted majority of the sub-models. In the case of deep learning, multiple CNNs with different architectures are merged before giving the final output. Different ensemble techniques have been applied in similar work [1], [24]; however, these were based on machine learning algorithms, such as ensemble models of K-nearest neighbor, Naïve Bayes, and decision tree. In contrast, we have used CNNs for stacked generalization.
In order to reduce the bias in machine learning, crude cross-validation techniques such as 10-fold cross-validation and leave-one-out cross-validation are usually applied. However, if those techniques are applied to CNNs, the complexity and computational time of the model increase tremendously, unlike for most machine learning algorithms. A CNN deals with millions of parameters during the forward and backward propagation [24]. Thus, different fusion-based ensemble methods are presented by researchers every year to combine multiple predictions in the most optimized way. There are various fusion strategies for ensemble models, including model averaging, where the predictions from several independently trained models are combined, as adopted in [36]. In our case, we use the weighted majority based fusion technique for the stacked model.
Consider an ensemble of $M$ independent classifiers $D_1, \ldots, D_M$, with individual accuracies $p_1, p_2, \ldots, p_M$. Each classifier $D_i$ produces a $c$-dimensional vector $[d_{i,1}, \ldots, d_{i,c}]^T \in \{0, 1\}^c$, $i = 1, \ldots, M$, where $d_{i,j} = 1$ if $D_i$ labels $x$ in $\omega_j$,

and 0 otherwise. The majority vote will result in an ensemble TABLE 3. Label wise depiction of dataset division used for training and
testing of the stacked CNN and other transfer learning models.
decision for class ωk if
M
X M
X
c
di,k = maxj=1 di,j (15)
i=1 i=1

When introducing weights or coefficients of importance $b_i$, $i = 1, 2, \ldots, M$, Eq. (15) is rewritten as: choose class label $\omega_k$ if
$$\sum_{i=1}^{M} b_i d_{i,k} = \max_{j=1}^{c} \sum_{i=1}^{M} b_i d_{i,j} \tag{16}$$
The outputs are combined by the maximum weighted majority as shown in Eq. (16) [37]. This improves the overall performance because the models that perform well individually contribute more to the final metrics than weaker models. A stacked generalization is a multi-level learning model because at each level it aims to select the most appropriate bias for minimizing the overall generalization error.
Figure 4 shows the stacked model of CNNs prepared for our experiments. Three different CNN architectures are prepared to be fed into the meta-learner classifier. Therefore, three copies of the input data are made, one for each network. After concatenation, a 3-element vector of prediction probabilities from the three sub-models is created, which can be seen in the concatenate_7 layer of the stacked model. A sigmoid function is then applied to produce the 2-class label, either 0 (No DR) or 1 (DR).

V. PERFORMANCE ANALYSIS
A. DATASET DESCRIPTION
The dataset used in our experiment was acquired from a Kaggle competition and is a benchmark dataset for DR diagnosis provided by EyePACS [38]. EyePACS is a web-based system designed to remotely help patients deal with DR issues without the need of a doctor. It is a platform where clinicians could collaborate and share their work, which could be further used for research purposes. EyePACS shared their data with Google and Kaggle to host a competition for tackling DR, where people could contribute to opening further research areas by using their open-source retinal database. The dataset is highly imbalanced, with the number of healthy images overshadowing the number of severe and advanced stage DR images.
Figure 5 shows sample retinal images obtained from the EyePACS dataset. Since we are detecting DR and not classifying its stages, we used a balanced subset of the data, and the images were divided into two folders of healthy and unhealthy retinal images. Figure 6 describes the features of the retinal fundus images used in our experiments. We used 2471 images and divided them as 20% for validation and 80% for training. All images are colored and kept in the original .JPEG format. Since images were not acquired by the same camera and were taken in different lighting conditions with visible variance in illumination, color combination, and camera angles, an image normalization technique was implemented before feature extraction.

B. MODEL BUILDING
The goal of our experiment is to make an accurate validation tool for doctors to detect DR by avoiding unnecessary reflectance properties of fundus images and making light a reliable factor other than unnecessary distortions. A stacked generalization of three different CNNs was prepared and fed into a meta-learner as shown in Figure 4. Out of 2471 images, 495 images are kept for testing the model and 1976 images are used for training. During the stacking process, it is important to train the meta-learner on a dataset separate from the data on which the individual sub-networks are trained, to avoid overfitting and bias in the results. That is why each of the sub-models is trained on the same training set, whereas the results of the meta-learner were tested on a separate test set.
Table 3 gives the class-wise distribution of the dataset used for our experiments. Image processing and normalization are applied using libraries such as OpenCV, NumPy, and PIL, and the Keras and TensorFlow libraries are used for model development.
The different CNNs used as sub-models are composed of many successive convolution layers, pooling layers, and batch normalization layers. The architectures of the three sub-models differ in the number of layers and in the combination of pooling and batch normalization. The reason for stacking different architectures is to achieve different results for better generalization. However, hyperparameters such as learning rate, batch size, and epochs are identical for each sub-model. The final fully connected layers apply a sigmoid function to give the binary diagnosis. Table 4 illustrates the layer-wise ConvNet configuration of the first CNN, which is part of the stacking model.
The models are trained for 18 epochs with a batch size of 16. Accuracy was considered as the initial metric of evaluation. Stochastic gradient descent (SGD) was used as the model optimizer. To prevent the model from overfitting, hyperparameter tuning was also applied, but the problem of overfitting persisted. To address this issue, two important techniques were utilized. First, we applied a learning rate decay. The initial learning rate is set to 0.00009 and a rate decay equivalent to learning rate/100 is applied. Therefore, a gradual decrease in the learning rate with a factor of 100 improved the gradient descent convergence. Second, two regularization

VOLUME 9, 2021 108283


H. Kaushik et al.: DR Diagnosis From Fundus Images Using Stacked Generalization of Deep Models

FIGURE 4. An illustration of the stacked CNNs concatenated on top of the meta-learner classifier.
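The data flow in Figure 4 (three sub-model probabilities concatenated into one vector, then passed through a sigmoid to give a binary label) can be sketched in plain Python. This is only an illustration under stated assumptions: the logistic form of the meta-learner, the 0.5 decision threshold, the function name `stack_predict`, and all numeric values are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def stack_predict(sub_model_probs, meta_weights, meta_bias, threshold=0.5):
    """Combine sub-model DR probabilities with a logistic meta-learner.

    sub_model_probs: one predicted DR probability per sub-model.
    meta_weights, meta_bias: parameters the meta-learner would learn
    (hypothetical values are used in the example below).
    """
    # Concatenate the sub-model outputs into a single feature vector.
    features = list(sub_model_probs)
    # Weighted sum followed by a sigmoid gives the stacked probability.
    z = meta_bias + sum(w * p for w, p in zip(meta_weights, features))
    prob = sigmoid(z)
    label = 1 if prob >= threshold else 0  # 1 = DR, 0 = No DR
    return prob, label


# Hypothetical outputs of three CNN sub-models for one fundus image.
prob, label = stack_predict([0.9, 0.8, 0.4],
                            meta_weights=[2.0, 2.0, 2.0],
                            meta_bias=-3.0)
print(round(prob, 4), label)
```

In a Keras implementation the concatenation and the sigmoid unit would be layers of the stacked model; the sketch only mirrors the arithmetic.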


FIGURE 5. Sample fundus images from the EyePACS dataset.

FIGURE 6. DR dataset for fundus image diagnosis. Figure 6 (1) shows the healthy retinal image having no signs of any type of lesions and hemorrhage. Figure 6 (2) shows the unhealthy retinal image having mild stage DR because it has some lesion presence. Figure 6 (3) shows the presence of yellowish irregular edges known as the hard exudates in the unhealthy retinal image. Figure 6 (4) shows the unhealthy retinal image having severe stage of DR due to prominent presence of cotton wool spots, which are caused by the accumulation of axoplasmic material in the retina.

techniques are also implemented, which are discussed in the next subsection.

C. REGULARIZATION BY CALL-BACK FUNCTIONS
Call-back is a set of functions that are applied to induce a regularization effect, to generalize the deep learning model and stabilize the estimates in order to combat overfitting. Usually, regularization techniques increase the bias and reduce the variance of the model [39]. First, we applied a strategy known as early stopping, in which training is stopped prematurely when the validation loss tends to increase, resulting in a steep increase in the loss curve and a decrease in model performance; this gives an optimal stopping point. For early stopping, the hyper-parameter patience was set to 2. In deep learning models, it is commonly seen that the validation loss vs. training loss graph gets stuck at an inflection point, where the loss does not decrease and no improvement in performance is detected. That inflection point looks like a plateau in the graph. Thus, the second function we used for regularization reduces the learning rate by a factor of 100 when such a plateau is reached during the training process.

D. MODIFIED DEEP TRANSFER LEARNING MODEL
To fairly compare our findings, we implemented two different deep transfer learning models, which are ResNet50 [19] and VGG-16 [40]. Transfer learning models can be used in different ways to transfer the learned features from pretrained models [41]. However, the most prominent work follows two scenarios, which are known as feature extraction and fine-tuning. In our experiments, we utilize the pretrained models as feature extractors.
We used the same dataset and divided it in the same ratio as for the stacked generalization CNN model. Data augmentation was also applied to improve the diversity of the data. Both ResNet50 and VGG-16 models were fine-tuned, as this is necessary to improve the performance. In our case we did layer-wise fine-tuning similar to [42], [43], as it is more effective and less time consuming. We added a fully connected layer head to ResNet50 and VGG-16, which consists of a pooling layer, a fully connected layer, and a final layer with a sigmoid function to give the binary output. The weights of both VGG-16 and ResNet50 were frozen so that only the fully connected layers were adjusted. Similarly, we trained the networks for 18 epochs with a batch size of 16.

VI. QUANTITATIVE ANALYSIS
Three major checkpoints are cleared in our experiments. First, solving the multi-sourced dataset problem of normalizing non-uniform luminosity by desaturating images using their statistical features such as mean pixel values and an optimum scaling factor. Second, developing an automated detection system for the normalized fundus images using an advanced artificial intelligence technique known as stacked generalization of CNNs, which uses the principle of weighted majority of sub-models. Third, to support our experimental results with proof, various comparisons are drawn with the benchmark deep transfer learning models.

A. LUMINOSITY NORMALIZATION ANALYSIS
The results of the proposed image illumination normalization technique are shown in Figure 2. It is visible that the images are color calibrated using the gray world algorithm. After applying this algorithm, the images are desaturated using the illuminant, which is taken as the minimum magnitude color channel. This aids the transformation into a uniformly luminous image and thus removes the unnecessary reflectance. The saturation loss of these images helped to reduce unnecessary hindrances like noise, non-uniform light distribution, and non-ideal illuminations. Therefore, these images are reliable and could be used for


TABLE 4. Layerwise configuration of a single CNN architecture which is fed into the meta-learner classifier.

further analysis. In medical imaging, degradation of image features is a major issue while implementing normalization techniques. The yellow arrows in Figure 2 clearly show the presence of features such as blood vessels, hemorrhages, and the retinal macula, which play a major role in decision-making for DR diagnosis.
To support our arguments and provide more concrete proof for the proposed luminosity normalization technique, we have calculated the peak signal-to-noise ratio (PSNR) and mean squared error (MSE) of the transformed image. PSNR can be defined as the ratio between the maximum possible power of a signal and the power of corrupting noise that affects the quality of its representation [44]. PSNR is measured in decibels (dB) and in most cases, a higher value of PSNR indicates that the enhanced or reconstructed image is of superior quality. On the other hand, MSE tells us about the difference in the images by computing the average of the squared errors between two images. The lower the value of MSE, the better the image enhancement technique. Mathematically, PSNR and MSE can be defined as follows:
$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{(2^{n} - 1)^{2}}{\mathrm{MSE}}\right) \tag{17}$$
$$\mathrm{MSE} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left(k(i, j) - l(i, j)\right)^{2} \tag{18}$$
Here, M and N define the number of rows and columns in the image, respectively. k(i, j) represents the original reference image and l(i, j) represents the luminosity normalised image. n is the number of bits per pixel, so that $2^{n} - 1$ is the maximum possible value of a pixel in the image. Figure 7 shows the difference between statistical values of PSNR and MSE between normal gray image and gray world normalised image.

B. STACKED CNN MODEL ANALYSIS
The images used by our model are available in the .JPEG format. Since it is a lossy compression, the images lose a significant amount of information, which makes feature extraction imprecise and difficult, as explained in [45]. Therefore, a superior feature extraction technique known as stacked generalization of CNNs has been implemented. Since it is a binary classification task, high values of accuracy and other evaluation metrics are expected. To prepare a robust model, hyperparameter tuning is performed, which shows the potential fluctuation and improvement in accuracy and loss. The experimental results reveal that the proposed stacked CNN model achieves an accuracy of 97.92% on the training set with a training loss of 0.066. On the test set the model achieves an accuracy of 97.77% with a test loss of 0.078. Table 5 shows the evaluation metrics obtained after using various activation functions with and without data augmentation. It can be clearly observed that the ELU activation function works best with data augmentation and gave better results in terms of accuracy, sensitivity, and specificity.
Table 6 depicts the performance metrics such as train loss, test loss, train accuracy, test accuracy, and area under curve (AUC) values in comparison with the other deep transfer learning models. Experimental results reveal that the proposed model achieves an AUC value of 0.9979. AUC is an important performance measure that supports the model's reliability over other solutions.
Table 7 shows a report containing precision, recall, and F-measure scores of all the competing models. These are important metrics in the evaluation of a computer-aided diagnostic system. The precision score depicts the exactness and tells how often the predicted value is correct, whereas the F1-measure, which is the harmonic mean of recall and precision, reveals the test accuracy. From Table 7, we observe that the proposed stacked CNN model outperforms all other competitive models.
Figure 8 shows the confusion matrix for each model, which summarizes the predicted results and the type of errors compactly over the test set. Although VGG-16 has fewer false negatives (FN) than the stacked CNN model, it obtains a greater number of false

FIGURE 7. Statistical comparative analysis between normal gray image and gray world normalized image based on
PSNR and MSE values.
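The PSNR and MSE measures of Eqs. (17) and (18) can be computed with a few lines of Python. The sketch below assumes 8-bit images (so the peak value $2^{n} - 1$ is 255) stored as nested lists; the function names and the tiny 2x2 example images are illustrative assumptions, not data from the paper.

```python
import math


def mse(original, processed):
    """Mean squared error between two equally sized grayscale images."""
    rows, cols = len(original), len(original[0])
    total = sum(
        (original[i][j] - processed[i][j]) ** 2
        for i in range(rows)
        for j in range(cols)
    )
    return total / (rows * cols)


def psnr(original, processed, bits=8):
    """Peak signal-to-noise ratio in dB; bits is the assumed bit depth n."""
    err = mse(original, processed)
    if err == 0:
        return math.inf  # identical images: no noise at all
    peak_squared = (2 ** bits - 1) ** 2
    return 10 * math.log10(peak_squared / err)


# Hypothetical 2x2 reference image k and normalized image l.
k = [[52, 55], [61, 59]]
l = [[54, 55], [60, 57]]
print(mse(k, l), round(psnr(k, l), 3))
```

A higher PSNR together with a lower MSE indicates that the processed image stays closer to the reference, which is how the two measures are read in Figure 7.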

TABLE 5. Performance of different activation functions in the proposed model with data augmentation.

TABLE 6. An evaluation metric report of the proposed models.
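Table 6 reports AUC values. As a reminder of what this number measures, the sketch below computes the area under the ROC curve through its rank interpretation: the probability that a randomly chosen positive image receives a higher score than a randomly chosen negative one. The pairwise formulation, the function name `auc`, and the example labels and scores are assumptions made for illustration; the paper does not state how its AUC was computed.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for DR, 0 for No DR.
    scores: predicted DR probabilities for the same images.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and model scores.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example_auc)
```

The quadratic loop is fine for a small test set; library routines use a sorted, linear-time equivalent.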

TABLE 7. Comparative analysis between the proposed model and other deep transfer learning models on similar dataset.

positives (FP) and a smaller number of true negatives (TN). However, the proposed stacking ensemble model has a greater number of TN and zero FP, which reveals its accuracy for both healthy (No DR) and unhealthy (Having DR) images. From Figure 8 (d) we observe that our proposed model has only 11 FN, which means that only 11 out of 495 patients are falsely predicted as not having DR. Since we are dealing with a real-life problem in the medical domain, reducing false negatives as well as achieving considerably higher values for true positives is important. Because the medical domain is a field of precision, it is important to consider evaluation metrics that deal directly with correct and incorrect predictions. So, we have considered sensitivity and specificity, which are discussed in detail below.

C. SENSITIVITY AND SPECIFICITY ANALYSIS
Sensitivity and specificity play a crucial role in the medical domain. Higher values of sensitivity and specificity indicate the reliability of a diagnostic model. Sensitivity is the ability of the model to successfully predict the actual positive value [46], which in our case means correctly predicting the


FIGURE 8. Confusion matrices of the proposed model and other compared models. (a) CNN, (b) VGG-16, (c) ResNet50, and (d) stacked CNN model.

unhealthy fundus image of a patient as having DR. Mathematically, it can be measured in terms of percentage as:
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \times 100 \tag{19}$$
Specificity, on the other hand, shows how accurate the model is in detecting those people who do not have DR. In other words, it correctly predicts the healthy fundus image. Achieving high values of specificity may also have a business impact, as it can save an ophthalmologist the time of carrying out further tests if an earlier report is correctly predicted negative. Mathematically, specificity can be measured in terms of percentage as:
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \times 100 \tag{20}$$
The sensitivity, specificity, and accuracy of our proposed models are plotted in Figure 9, where we can see that our proposed stacking ensemble CNN model achieves higher scores than other competitive models. However, the sensitivity of VGG-16 is slightly higher than that of the other models, but due to more false positives, its specificity is low.

FIGURE 9. Bar plots for evaluation metrics of the proposed stacked CNNs model with VGG-16, CNN, and ResNet50.

D. PERFORMANCE DURING TEST PHASE
We closely monitored the performance of the proposed stacked CNN model and the other competing deep transfer learning models during the training and testing stages. Monitoring at the testing stage is very important for seeing the performance on unseen data, validating the generalizability, and cross-checking how well the model has learned during training. All the models are trained for the same number of epochs on the same dataset to minimize possible redundancies and discrepancies. Figure 10 shows the convergence of the loss curve during the testing phase. It is visible that the proposed model outperforms all other models until the end of all the iterations, as its loss curve converges to the lowest value of 0.078.
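The sensitivity and specificity of Eqs. (19) and (20), together with the accuracy, precision, and F-measure used in the comparisons above, all follow from the four confusion-matrix counts. The sketch below is a minimal illustration; the function name `diagnostic_metrics` and the counts in the example are hypothetical and are not the values behind Figure 8.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Percent-valued metrics from confusion-matrix counts.

    tp/fn: DR images predicted as DR / as No DR.
    tn/fp: healthy images predicted as No DR / as DR.
    Assumes every denominator is non-zero.
    """
    sensitivity = 100.0 * tp / (tp + fn)        # recall, Eq. (19)
    specificity = 100.0 * tn / (tn + fp)        # Eq. (20)
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    precision = 100.0 * tp / (tp + fp)
    # F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f_measure": f_measure,
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=90, fn=10, tn=95, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```

Reading the example: 10 missed DR cases out of 100 lower the sensitivity, while 5 false alarms out of 100 healthy images lower the specificity, so the two measures isolate the two kinds of error.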


TABLE 8. Performance analysis of related works on binary fundus dataset.

TABLE 9. Performance analysis of related works on multiclass fundus dataset.

FIGURE 10. Test stage performance analysis of the proposed stacked CNN model with competitive transfer learning models.

TABLE 10. Class distribution of the Kaggle multi-class dataset [51].


VII. DISCUSSION
The contribution of our experiments includes the use of a publicly available EyePACS dataset from Kaggle. The image data is multi-sourced, with discrepancies due to various reasons such as different cameras and lighting conditions. Therefore, image normalization is very important. The images are pre-processed for luminosity normalization using the gray world color constancy algorithm to enhance the candidate regions by reducing the unnecessary lighting and reflectance. To confirm and support the results of our normalization step, we analyzed the enhanced images based on PSNR and MSE measures. The PSNR value was improved as shown in Figure 7, which demonstrates the importance and effect of our color correction schema.
Researchers have presented similar work related to color constancy and retinal image enhancement using various techniques, as described in the literature review. However, an automated tool using these techniques has not been presented. Most of the algorithms focused on extracting features such as cotton wool spots, exudates, lesion presence, and hemorrhage detection for disease diagnosis, but did not discuss luminosity normalization as a pre-processing step. The diagnostic decision-making stage was handled by stacked generalization of CNNs, which proved to be better than other competitive models including VGG-16, ResNet50 and CNNs. Comparisons are also drawn between the proposed model and other models in terms of accuracy, sensitivity, specificity, precision, recall, and F-measure.
There are two main motivations behind developing an automated validation tool that could remove the non-ideal illuminations from retinal fundus images using deep learning. The first was to reduce the human effort in extracting manual features for diagnosis and let the power of artificial intelligence and image processing techniques extract and enhance features automatically. The second was the adaptability of deep learning models to solve a variety of problems and the availability of optimization methodologies such as various regularization techniques for better performance. Our major focus was also to reduce the number of false negatives, and the experimental results on unseen test data showed only 2.2% false negatives, which supports the reliability of our model. Our method is also economically viable to implement, as it does not require expensive equipment/gadgets with high graphical processing unit (GPU) power. According to [47], sensitivity values greater than 60% in detecting DR prove to be cost-effective. Since our model was trained with a dataset having a lot of variance, this also indicates the high adaptability and robustness of our model in performing accurately with fundus images having non-ideal illuminations.
Table 8 compares the performance of the proposed model on binary classification with previous work conducted on DR detection using similar multi-sourced datasets. To verify the results, our proposed model has been tested on several binary and multi-class datasets. It can be observed that [50] obtained an accuracy of 97.30%; however, the sensitivity of their model is lower than that of our proposed stacked model. Therefore, our model is superior in detecting true positives accurately.
Table 9 compares the performance of our model with previous studies for multi-class classification on the Kaggle dataset [51]. The dataset has five stages of DR: healthy, mild, moderate, severe, and advanced, as summarized in Table 10. The proposed model achieved the highest sensitivity and specificity values and outperformed all other models with a final test accuracy of 87.45%. This accuracy score is inferior to the binary classification results due to the imbalanced data in the given dataset. Table 11 presents the performance of the proposed model on various binary and multi-class datasets in terms of accuracy and precision. Considering all metrics of Tables 8, 9, and 10 together, it can be concluded that the proposed model outperforms state-of-the-art models and is successful in both binary and multi-class classification of DR.
Figure 11 depicts the sensitivity and specificity of the proposed models compared with the different machine/deep learning methods carried out in the literature [2], [23].


TABLE 11. Performance evaluation of the proposed model on the publicly available fundus datasets.

FIGURE 11. Sensitivity and specificity based graphical analysis of the proposed model with the work of [2], [23].

FIGURE 12. ROC curve of the proposed stacked CNN model with AUC value of 0.99.

Figure 12 shows the ROC curve of the proposed model for the binary classification task, where it obtains an AUC value of 0.99. The results provided in Table 8 and Figure 12 demonstrate the potential of the proposed stacking deep learning technique. Our model is able to outperform the conventional methods for diagnosis. Finally, our stacked generalization of CNNs achieves an accuracy of 97.92% on the train set and 97.77% on the test set, a sensitivity of 96.86%, and a specificity of 100% in binary classification. The proposed model also outperforms ResNet50 in terms of accuracy and F-measure. For multi-class classification, the model achieves train and test accuracies of 96.45% and 96.30%, respectively, as reported in Table 9.

VIII. CONCLUSION AND FUTURE WORK
We proposed to solve the problem of non-ideal illuminations in the retinal fundus images using the gray world algorithm and to develop an automated DR prediction system. A stack generalization-based ensemble model is prepared using three different CNNs. The performance of image normalization is measured using statistical metrics such as the PSNR and MSE of the original and enhanced images. The stacked ensemble model is an advanced technique of stacking different neural networks whose combined results are produced based on a fusion strategy that combines the best weights of the individual neural networks. Machine learning models are extensively utilized to classify and detect DR in fundus images. However, these techniques require suitable pre-processing and feature extraction methods to improve the results, especially when the images are from different sources. DR images are generally taken from different cameras under different lighting conditions, and to mitigate these effects we adopted an efficient color constancy technique. Extensive experiments are conducted to evaluate the performance of the proposed model in binary as well as multi-class DR classification tasks. Considering the obtained results using various evaluation metrics, we validate our model, which outperforms state-of-the-art models in binary and multi-class classification tasks.
For future work, we plan to diversify and increase the images in the dataset to improve the feature extraction capabilities. Metaheuristic techniques can be used for hyperparameter optimization to achieve more competitive results. The patient's family medical history, daily diet, and nutrition intake can be included in the dataset to provide insightful information for the disease.

REFERENCES
[1] B. Antal and A. Hajdu, ‘‘An ensemble-based system for automatic screening of diabetic retinopathy,’’ Knowl.-Based Syst., vol. 60, pp. 20–27, Apr. 2014.
[2] R. Gargeya and T. Leng, ‘‘Automated identification of diabetic retinopathy using deep learning,’’ Ophthalmology, vol. 124, no. 7, pp. 962–969, 2017.
[3] Early Treatment Diabetic Retinopathy Study Research Group, ‘‘Early photocoagulation for diabetic retinopathy: ETDRS report number 9,’’ Ophthalmology, vol. 98, no. 5, pp. 766–785, 1991. [Online]. Available: https://www.sciencedirect.com/science/article/abs/pii/S0161642013380117
[4] S. Gadkari, Q. Maskati, and B. Nayak, ‘‘Prevalence of diabetic retinopathy in India: The all India ophthalmological society diabetic retinopathy eye screening study 2014,’’ Indian J. Ophthalmol., vol. 64, no. 1, p. 38, 2016.
[5] M. Zhou, K. Jin, S. Wang, J. Ye, and D. Qian, ‘‘Color retinal image enhancement based on luminosity and contrast adjustment,’’ IEEE Trans. Biomed. Eng., vol. 65, no. 3, pp. 521–527, Mar. 2018.
[6] W. A. Mustafa, H. Yazid, and M. M. M. A. Kader, ‘‘Luminosity correction using statistical features on retinal images,’’ J. Biomimetics, Biomater. Biomed. Eng., vol. 37, pp. 74–84, Jun. 2018.
[7] O. Deperlioglu and U. Kose, ‘‘Diagnosis of diabetic retinopathy by using image processing and convolutional neural network,’’ in Proc. 2nd Int. Symp. Multidisciplinary Stud. Innov. Technol. (ISMSIT), Oct. 2018, pp. 1–5.


[8] J. Wang, Y. Bai, and B. Xia, ‘‘Simultaneous diagnosis of severity and [29] D. Liu, ‘‘Comparison analysis of color constancy algorithms,’’ Dept.
features of diabetic retinopathy in fundus photography using deep learn- Eng. Sustain. Develop., University of Gävle, Gävle, Sweden, Tech.
ing,’’ IEEE J. Biomed. Health Informat., vol. 24, no. 12, pp. 3397–3407, Rep. S-801 76, 2013.
Dec. 2020. [30] V. Agarwal, B. R. Abidi, A. Koschan, and M. A. Abidi, ‘‘An overview
[9] V. M. G. S. Gupta, S. Gupta, and P. Sengar, ‘‘Extraction of blood veins from of color constancy algorithms,’’ J. Pattern Recognit. Res., vol. 1, no. 1,
the fundus image to detect diabetic retinopathy,’’ in Proc. IEEE 1st Int. pp. 42–54, 2006.
Conf. Power Electron., Intell. Control Energy Syst. (ICPEICES), Jul. 2016, [31] G. Chen and X. Zhang, ‘‘A method to improve robustness of the gray
pp. 1–3. world algorithm,’’ in Proc. 4th Int. Conf. Comput., Mechatronics, Control
[10] L. Zhou, Y. Zhao, J. Yang, Q. Yu, and X. Xu, ‘‘Deep multiple instance Electron. Eng., 2015, pp. 243–248.
learning for automatic detection of diabetic retinopathy in retinal images,’’ [32] N. M. Kwok, D. Wang, X. Jia, S. Y. Chen, G. Fang, and Q. P. Ha,
IET Image Process., vol. 12, no. 4, pp. 563–571, Apr. 2018. ‘‘Gray world based color correction and intensity preservation for image
enhancement,’’ in Proc. 4th Int. Congr. Image Signal Process., Oct. 2011,
[11] M. Masood, T. Nazir, M. Nawaz, A. Mehmood, J. Rashid, H.-Y. Kwon,
pp. 994–998.
T. Mahmood, and A. Hussain, ‘‘A novel deep learning method for recog-
[33] S. Indolia, A. K. Goswami, S. P. Mishra, and P. Asopa, ‘‘Conceptual under-
nition and classification of brain tumors from MRI images,’’ Diagnostics,
standing of convolutional neural network—A deep learning approach,’’
vol. 11, no. 5, p. 744, Apr. 2021.
Procedia Comput. Sci., vol. 132, pp. 679–688, Jan. 2018.
[12] G. T. Zago, R. V. Andreão, B. Dorizzi, and E. O. T. Salles, ‘‘Diabetic [34] C. Nwankpa, W. Ijomah, A. Gachagan, and S. Marshall, ‘‘Activa-
retinopathy detection using red lesion localization and convolutional neural tion functions: Comparison of trends in practice and research for
networks,’’ Comput. Biol. Med., vol. 116, Jan. 2020, Art. no. 103537. deep learning,’’ 2018, arXiv:1811.03378. [Online]. Available: http://arxiv.
[13] K. U. Bhaskar and E. P. Kumar, ‘‘Extraction of hard exudates using org/abs/1811.03378
functional link artificial neural networks,’’ in Proc. IEEE Int. Advance [35] D. H. Wolpert, ‘‘Stacked generalization,’’ Neural Netw., vol. 5, no. 2,
Comput. Conf. (IACC), Jun. 2015, pp. 420–424. pp. 241–259, 1992.
[14] A. M. R. R. Bandara and P. W. G. R. M. P. B. Giragama, ‘‘A retinal image [36] H. Alshazly, C. Linse, E. Barth, and T. Martinetz, ‘‘Ensembles of deep
enhancement technique for blood vessel segmentation algorithm,’’ in Proc. learning models and transfer learning for ear recognition,’’ Sensors, vol. 19,
IEEE Int. Conf. Ind. Inf. Syst. (ICIIS), Dec. 2017, pp. 1–5. no. 19, p. 4139, Sep. 2019.
[15] W. A. Mustafa, H. Yazid, and S. B. Yaacob, ‘‘Illumination correction of [37] L. I. Kuncheva, Combining Pattern Classifiers: Methods Algorithms.
retinal images using superimpose low pass and Gaussian filtering,’’ in Hoboken, NJ, USA: Wiley, 2014.
Proc. 2nd Int. Conf. Biomed. Eng. (ICoBE), Mar. 2015, pp. 1–4. [38] V. Gulshan, L. Peng, M. Coram, M. C. Stumpe, D. Wu, A. Narayanaswamy,
[16] N. Singh, L. Kaur, and K. Singh, ‘‘Histogram equalization techniques S. Venugopalan, K. Widner, T. Madams, J. Cuadros, and R. Kim, ‘‘Devel-
for enhancement of low radiance retinal images for early detection of opment and validation of a deep learning algorithm for detection of diabetic
diabetic retinopathy,’’ Eng. Sci. Technol., Int. J., vol. 22, no. 3, pp. 736–745, retinopathy in retinal fundus photographs,’’ J. Amer. Med. Assoc., vol. 316,
Jun. 2019. no. 22, pp. 2402–2410, 2016.
[17] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ‘‘ImageNet classification [39] D. Singh, V. Kumar, V. Yadav, and M. Kaur, ‘‘Deep neural network-
with deep convolutional neural networks,’’ in Proc. 25th Int. Conf. Neural based screening model for COVID-19-Infected patients using chest X-ray
Inf. Process. Syst., 2012, pp. 1097–1105. images,’’ Int. J. Pattern Recognit. Artif. Intell., vol. 35, no. 3, Mar. 2021,
Art. no. 2151004.
[18] S. Qummar, F. G. Khan, S. Shah, A. Khan, S. Shamshirband, Z. U. Rehman,
[40] D. Singh, V. Kumar, and M. Kaur, ‘‘Densely connected convolutional
I. A. Khan, and W. Jadoon, ‘‘A deep learning ensemble approach for
networks-based COVID-19 screening model,’’ Appl. Intell., vol. 51, no. 5,
diabetic retinopathy detection,’’ IEEE Access, vol. 7, pp. 150530–150539,
pp. 3044–3051, 2021, doi: 10.1007/s10489-020-02149-6.
2019.
[19] N. Gianchandani, A. Jaiswal, D. Singh, V. Kumar, and M. Kaur, ‘‘Rapid COVID-19 diagnosis using ensemble deep transfer learning models from chest radiographic images,’’ J. Ambient Intell. Hum. Comput., pp. 1–13, 2020, doi: 10.1007/s12652-020-02669-6.
[20] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, ‘‘Densely connected convolutional networks,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 4700–4708.
[21] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, ‘‘Rethinking the inception architecture for computer vision,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2818–2826.
[22] F. Chollet, ‘‘Xception: Deep learning with depthwise separable convolutions,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 1251–1258.
[23] S. Somasundaram and P. Alli, ‘‘A machine learning ensemble classifier for early prediction of diabetic retinopathy,’’ J. Med. Syst., vol. 41, no. 12, pp. 1–12, Dec. 2017.
[24] E. Saleh, J. Błaszczyński, A. Moreno, A. Valls, P. Romero-Aroca, S. de la Riva-Fernández, and R. Słowiński, ‘‘Learning ensemble classifiers for diabetic retinopathy assessment,’’ Artif. Intell. Med., vol. 85, pp. 50–63, Apr. 2018.
[25] K. Oh, H. M. Kang, D. Leem, H. Lee, K. Y. Seo, and S. Yoon, ‘‘Early detection of diabetic retinopathy based on deep learning and ultra-wide-field fundus images,’’ Sci. Rep., vol. 11, no. 1, pp. 1–9, Dec. 2021.
[26] H. Khalid, R. Schwartz, L. Nicholson, J. Huemer, M. H. El-Bradey, D. A. Sim, P. J. Patel, K. Balaskas, R. D. Hamilton, P. A. Keane, and R. Rajendram, ‘‘Widefield optical coherence tomography angiography for early detection and objective evaluation of proliferative diabetic retinopathy,’’ Brit. J. Ophthalmol., vol. 105, no. 1, pp. 118–123, Jan. 2021.
[27] T. Nazir, A. Irtaza, J. Rashid, M. Nawaz, and T. Mehmood, ‘‘Diabetic retinopathy lesions detection using faster-RCNN from retinal images,’’ in Proc. 1st Int. Conf. Smart Syst. Emerg. Technol. (SMARTTECH), Nov. 2020, pp. 38–42.
[28] K. A. Goatman, A. D. Whitwam, A. Manivannan, J. A. Olson, and P. F. Sharp, ‘‘Colour normalisation of retinal images,’’ in Proc. Med. Image Understand. Anal., 2003, pp. 49–52.
[41] H. Alshazly, C. Linse, E. Barth, and T. Martinetz, ‘‘Handcrafted versus CNN features for ear recognition,’’ Symmetry, vol. 11, no. 12, p. 1493, Dec. 2019.
[42] X. Li, T. Pang, B. Xiong, W. Liu, P. Liang, and T. Wang, ‘‘Convolutional neural networks based transfer learning for diabetic retinopathy fundus image classification,’’ in Proc. 10th Int. Congr. Image Signal Process., Biomed. Eng. Informat. (CISP-BMEI), Oct. 2017, pp. 1–11.
[43] H. Alshazly, C. Linse, E. Barth, and T. Martinetz, ‘‘Deep convolutional neural networks for unconstrained ear recognition,’’ IEEE Access, vol. 8, pp. 170295–170310, 2020.
[44] N. M. W. A. Mustafa, H. Yazid, M. Jaafar, M. Zainal, and A. S. Abdul-Nasir, ‘‘A review of image quality assessment (IQA): SNR, GCF, AD, NAE, PSNR, ME,’’ J. Adv. Res. Comput. Appl., vol. 7, no. 1, pp. 1–7, 2017.
[45] T. Nazir, A. Irtaza, Z. Shabbir, A. Javed, U. Akram, and M. T. Mahmood, ‘‘Diabetic retinopathy detection through novel tetragonal local octa patterns and extreme learning machines,’’ Artif. Intell. Med., vol. 99, Aug. 2019, Art. no. 101695.
[46] R. Trevethan, ‘‘Sensitivity, specificity, and predictive values: Foundations, pliabilities, and pitfalls in research and practice,’’ Frontiers Public Health, vol. 5, p. 307, Nov. 2017.
[47] J. C. Javitt, J. K. Canner, R. G. Frank, D. M. Steinwachs, and A. Sommer, ‘‘Detecting and treating retinopathy in patients with type I diabetes mellitus: A health policy model,’’ Ophthalmology, vol. 97, no. 4, pp. 483–495, 1990.
[48] M. M. Fraz, W. Jahangir, S. Zahid, M. M. Hamayun, and S. A. Barman, ‘‘Multiscale segmentation of exudates in retinal images using contextual cues and ensemble classification,’’ Biomed. Signal Process. Control, vol. 35, pp. 50–62, May 2017.
[49] D. J. Hemanth, O. Deperlioglu, and U. Kose, ‘‘An enhanced diabetic retinopathy detection and classification approach using deep convolutional neural network,’’ Neural Comput. Appl., vol. 32, no. 3, pp. 707–721, Feb. 2020.
[50] T. R. Gadekallu, N. Khare, S. Bhattacharya, S. Singh, P. K. R. Maddikunta, and G. Srivastava, ‘‘Deep neural networks to predict diabetic retinopathy,’’ J. Ambient Intell. Humanized Comput., pp. 1–14, Apr. 2020, doi: 10.1007/s12652-020-01963-7.


[51] Diabetic Retinopathy Detection. Accessed: Jun. 15, 2021. [Online]. Available: https://www.kaggle.com/c/diabetic-retinopathy-detection
[52] A. Rakhlin, ‘‘Diabetic retinopathy detection through integration of deep learning classification framework,’’ BioRxiv, Jan. 2018, Art. no. 225508, doi: 10.1101/225508.
[53] C. Lam, D. Yi, M. Guo, and T. Lindsey, ‘‘Automated detection of diabetic retinopathy using deep learning,’’ AMIA Summits Transl. Sci., vol. 2018, no. 1, p. 147, 2018.
[54] M. Chetoui and M. A. Akhloufi, ‘‘Explainable end-to-end deep learning for diabetic retinopathy detection across multiple datasets,’’ J. Med. Imag., vol. 7, no. 4, Aug. 2020, Art. no. 044503.
[55] T. Kauppi, V. Kalesnykiene, J.-K. Kamarainen, L. Lensu, I. Sorri, A. Raninen, R. Voutilainen, H. Uusitalo, H. Kälviäinen, and J. Pietilä, ‘‘The DIARETDB1 diabetic retinopathy database and evaluation protocol,’’ in Proc. Brit. Mach. Vis. Conf., vol. 1, 2007, pp. 1–10.
[56] E. Decencière, X. Zhang, G. Cazuguel, B. Lay, B. Cochener, C. Trone, P. Gain, R. Ordonez, P. Massin, A. Erginay, and B. Charton, ‘‘Feedback on a publicly distributed image database: The Messidor database,’’ Image Anal. Stereology, vol. 33, no. 3, pp. 231–234, 2014.
[57] T. Li, Y. Gao, K. Wang, S. Guo, H. Liu, and H. Kang, ‘‘Diagnostic assessment of deep learning algorithms for diabetic retinopathy screening,’’ Inf. Sci., vol. 501, pp. 511–522, Oct. 2019.
[58] P. Porwal, S. Pachade, R. Kamble, M. Kokare, G. Deshmukh, V. Sahasrabuddhe, and F. Meriaudeau, ‘‘Indian diabetic retinopathy image dataset (IDRiD): A database for diabetic retinopathy screening research,’’ Data, vol. 3, no. 3, p. 25, 2018.
[59] A. D. Hoover, V. Kouznetsova, and M. Goldbaum, ‘‘Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response,’’ IEEE Trans. Med. Imag., vol. 19, no. 3, pp. 203–210, Mar. 2000.
[60] E. Decencière, G. Cazuguel, X. Zhang, G. Thibault, J. C. Klein, F. Meyer, B. Marcotegui, G. Quellec, M. Lamard, R. Danno, and D. Elie, ‘‘TeleOphta: Machine learning and image processing methods for teleophthalmology,’’ IRBM, vol. 34, no. 2, pp. 196–203, Apr. 2013.

HARSHIT KAUSHIK is currently pursuing the Bachelor of Technology degree in computers and communication engineering at Manipal University Jaipur. His research interests include pattern recognition, medical image processing, machine learning, computer vision, data mining, and smart systems.

DILBAG SINGH (Member, IEEE) received the M.Tech. degree from the Computer Science and Engineering Department, Guru Nanak Dev University, India, in 2012, and the Ph.D. degree in computer science and engineering from Thapar University, India, in 2019. He is currently working as an Assistant Professor with Bennett University, Greater Noida, India. He is the author or coauthor of more than 32 articles in SCI/SCIE indexed journals, including refereed IEEE/ACM/Springer/Elsevier journals. He also holds three patents and has authored three books and two book chapters. His research interests include computer vision, medical image processing, machine learning, deep learning, information security, and meta-heuristic techniques. He is also acting as a Lead Guest Editor of Mathematical Problems in Engineering (Hindawi) (SCI and Scopus Indexed), an Executive Guest Editor of Current Medical Imaging (Bentham Science) (SCIE and Scopus Indexed), and an Associate Editor of Open Transportation Journal (Scopus). He is a reviewer for more than 51 well-reputed journals from publishers such as IEEE, Elsevier, Springer, SPIE, and Taylor & Francis.

MANJIT KAUR (Member, IEEE) received the Master of Engineering degree in information technology from Panjab University, Chandigarh, Punjab, in 2011, and the Ph.D. degree in image processing from the Thapar Institute of Engineering and Technology, Patiala, Punjab, India, in 2019. She is currently working as an Assistant Professor with the School of Engineering and Applied Sciences, Bennett University, Greater Noida, India. She has published more than 27 SCI/SCIE indexed articles so far. Her research interests include wireless sensor networks, digital image processing, and meta-heuristic techniques.

HAMMAM ALSHAZLY received the B.Sc. degree in computer science from South Valley University, Egypt, in 2006, the M.Sc. degree in computer science from the University of Mumbai, India, through a scholarship from the Indian Council for Cultural Relations (ICCR), in 2014, and the Ph.D. degree in computer science from South Valley University, in 2018. From February 2019 to January 2021, he was a Postdoctoral Researcher with the Institute for Neuro- and Bioinformatics, University of Lübeck, Germany. He is currently working as an Assistant Professor with the Department of Computer Science, Faculty of Computers and Information, South Valley University. He has published articles in conferences and peer-reviewed journals, and works as a reviewer for several journals. His research interests include deep learning, biometrics, computer vision, machine learning, and artificial intelligence. He was awarded the Partnership and Ownership (ParOwn) Initiative, in 2010, for a period of 6 months at Monash University, Australia. During his Ph.D. degree, he was awarded a Fulbright Scholarship for a period of 10 months to complete part of his research work at the University of Kansas, USA.

ATEF ZAGUIA is currently with the Department of Computer Science, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia. He has published more than 37 SCI/SCIE indexed articles so far. His research interests include wireless sensor networks, digital image processing, and meta-heuristic techniques.

HABIB HAMAM (Senior Member, IEEE) received the B.Eng. and M.Sc. degrees in information processing from the Technical University of Munich, Germany, in 1988 and 1992, respectively, the Ph.D. degree in physics and applications in telecommunications from Université de Rennes I conjointly with France Telecom Graduate School, France, in 1995, and the Postdoctoral Diploma degree in accreditation to supervise research in signal processing and telecommunications from Université de Rennes I, in 2004. From 2006 to 2016, he was a Canada Research Chair holder in ‘‘Optics in Information and Communication Technologies.’’ He is currently a Full Professor with the Department of Electrical Engineering, Université de Moncton. His research interests include optical telecommunications, wireless communications, diffraction, fiber components, RFID, information processing, data protection, COVID-19, and deep learning. He is an OSA Senior Member. He is a Registered Professional Engineer in New Brunswick. He is the Editor-in-Chief of CIT-Review and an Associate Editor of the IEEE Canadian Review.

