IEEE Access 2021 - Diabetic Retinopathy Diagnosis From Fundus Images Using Stacked Generalization of Deep Models
ABSTRACT Diabetic retinopathy (DR) is a diabetes complication that affects the eye and can cause damage ranging from mild vision problems to complete blindness. It has been observed that eye fundus images often show various kinds of color aberrations and irrelevant illumination, which degrade the diagnostic analysis and may hinder the results. In this research, we present a methodology that eliminates these unnecessary reflectance properties of the images using a novel image processing schema, together with a stacked deep learning technique for the diagnosis. For luminosity normalization of the image, the gray world color constancy algorithm is implemented, which desaturates the image and improves the overall image quality. The effectiveness of the proposed image enhancement technique is evaluated based on the peak signal-to-noise ratio (PSNR) and mean squared error (MSE) of the normalized image. To develop a deep learning based computer-aided diagnostic system, we present a novel methodology of stacked generalization of convolutional neural networks (CNNs). The outputs of three custom CNN sub-models are fed into a single meta-learner classifier, which combines the most optimal weights of the three sub-networks to obtain superior evaluation metrics and robust prediction results. The proposed stacked model reports an overall test accuracy of 97.92% (binary classification) and 87.45% (multi-class classification). Extensive experimental results in terms of accuracy, F-measure, sensitivity, specificity, recall, and precision reveal that the proposed methodology of illumination normalization greatly facilitates the deep learning model and yields better results than various state-of-the-art techniques.
INDEX TERMS Convolutional neural networks, diabetic retinopathy, early diagnosis, fundus images, gray
world algorithm, ensemble learning.
and hemorrhages [2]. However, it is usually observed that during the image acquisition process, the fundus images show various kinds of irrelevant illumination, non-uniform light distribution, and blurred or darkened candidate regions, which subsequently affect the diagnostic process and result in biased predictions [5]. To detect DR, it is essential to obtain results with high precision, free of any bias, to avoid a wrong judgment that may lead to a serious problem or, in some cases, even permanent blindness. During the fundoscopic test, if the obtained image is highly saturated, it becomes difficult to carry out a proper visual assessment even for a trained ophthalmologist or clinician; hence, the presence of non-uniform illumination can impede correct predictions [6]. Therefore, luminosity normalization becomes a significant pre-processing step for a diverse set of retinal images. Some of the previous works have considered normalizing the luminosity of retinal images using various statistical, mathematical, and particularly HSV color space based models to desaturate the image [5]–[7].

The previous diagnostic studies of DR can be classified into two types: 1) automatic detection of the disease (binary), and 2) classification of different stages of the disease (multi-class). In our study, the focus is to automate the diagnostic process and to combine the luminosity normalization pre-processing pipeline with an advanced artificial intelligence technique. Till now, various image processing techniques have been presented to detect DR by considering the definitive candidate regions such as cotton wool spots, exudates, hemorrhages, and blood vessels, as reported in [8], [9]. These methods rely on manual feature extraction; however, since most retinal images depict non-uniform features, generalizing the feature set for all images may give inappropriate diagnostic results when a large database is considered.

Various architectures of multi-layered perceptrons, convolutional neural networks (CNNs), and machine learning algorithms have also been implemented for automatic disease detection [10]–[12]. However, none of these studies has addressed the problem of non-uniform reflectance and over-saturation of the fundus image surface for developing an unbiased DR diagnostic tool. Therefore, to alleviate this issue, we present a novel color constancy technique to reduce irrelevant reflectance in fundus images and, for the feature extraction of the pre-processed images, a stacked generalization of deep CNNs is developed, which can also be considered a superior cross-validation technique for neural network models.

The main contributions of this paper can be summarized as follows:
• We solve the non-ideal illumination and color degradation problems by using the gray world color constancy schema to desaturate the retinal images. This enables ophthalmologists to use the color of the images as a reliable cue for recognizing DR signs and avoids the various distortions related to light distribution and color, which may hinder the diagnostic results.
• The scaling factor is an important step in a color correction technique such as gray world; therefore, the color channel with the minimum mean is considered as a reference to calculate the gray world illuminant.
• To automate the diagnostic process and to make predictions using the desaturated images, a stacked generalization of three custom CNNs is developed, which is fed into a single meta-learner to extract the most optimal weights from the sub-networks and achieve better performance. This method differs from a usual voting classifier because the evaluation metrics (e.g., accuracy and mean squared error) are not averaged or voted on; rather, the meta-learner model receives multiple prediction probabilities as input, which are combined to generate better features and thus achieve accurate results.
• We adopt the Exponential Linear Unit (ELU) activation function for each sub-model due to its fast convergence and more accurate results.
• To monitor the generalization error and avoid conditions such as overfitting and a poor bias-variance trade-off during training, techniques such as exponential learning rate decay and early stopping are applied to give an overall regularization effect on the proposed model.
• Extensive experiments and comparisons between the proposed model and existing works on the diagnosis of DR have been drawn to validate our model and findings.

The rest of the paper is organized as follows. Section II discusses the literature review of DR diagnosis. In Section III we state our motivations. Image normalization and the stacked generalization of deep CNNs are discussed in Section IV. Section V presents the experimental setup. Section VI presents the quantitative analysis. The discussion is presented in Section VII. Finally, our conclusions and possible future work are presented in Section VIII.

II. LITERATURE REVIEW
Different techniques have been presented by researchers to deal with retinal image normalization, balancing luminosity distribution, contrast normalization, and computer-aided diagnostic systems, which have proved to be of great importance in the field of retinal imaging. The literature survey of this study covers two major categories of DR works to ensure that an overall view is given for better understanding. The works of each category were evaluated based on different performance metrics and design attributes according to the data pattern and proposed experimental design.

The first category comprises works that focused solely on an image processing based methodology for DR detection. Zhou et al. [5] presented a luminosity adjustment technique in which a luminance matrix is obtained by the gamma correction of the value channel in HSV color space to improve the quality of the individual RGB channels. For improving the contrast of the images, the contrast limited adaptive histogram equalization (CLAHE) technique was used that
involves a kernel based iterative process to normalize the histogram of image pixels and avoid congestion of the pixels in a particular range, thus improving the image quality. In [7], the authors proposed a histogram equalization-based image processing technique for fundus image enhancement and developed a CNN model for classification. They used a small dataset of 400 images and achieved a sensitivity of 96.67% and a specificity of 93.33%. Bhaskar and Kumar [13] proposed a technique to normalize the contrast and luminosity of the fundus images by assuming that all the neighborhood pixels are independent and identical to each other. In [14], a retinal enhancement technique based on the Speeded Up Adaptive Contrast Enhancement (SUACE) algorithm integrated with the Tyler Coye algorithm was proposed. The SUACE algorithm uses a gray-scale image obtained by principal component analysis (PCA), which was then fed into the Tyler Coye algorithm to remove the discontinuities of blood vessels for better prediction results. In 2015, [15] presented an illumination correction technique using a low-pass filter and a Gaussian filter. Using the low-pass filter, the background of the image is normalized and then superimposed with the results of the Gaussian filter, thus removing any sort of foreground noise that existed earlier. Singh et al. [16] used the usual histogram equalization technique for low-radiance images to clip away the pixel values based on a threshold, which was calculated by taking the average median value of the image, to enhance the normalization results. They used the structural similarity index measure and the Euclidean distance to validate their prediction results. Although numerous techniques proposed methodologies for image contrast enhancement, none of them focused on image desaturation for developing a DR system.
The second category covers the various deep/machine learning methodologies that have been presented for early DR detection. Most of the previous work focused on the development of traditional machine learning and ensemble deep learning techniques. Recently, Zhou et al. [10] proposed a multiple instance learning technique, a weakly supervised technique, to detect DR in fundus images. Initial image processing steps such as resizing and Gaussian smoothing were implemented before feature extraction. Their detection model was divided into two parts. First, they created a bag of image patches for detecting lesions. Second, a pre-trained AlexNet [17] model was utilized for automatic feature extraction. The model achieved an AUC score of 92.5%. In [18], an ensemble approach using deep transfer learning models to detect DR was proposed. The models, including ResNet [19], DenseNet [20], Inception [21], and Xception [22], were used for extracting features, and extensive hyperparameter tuning was performed to achieve better results. The authors reported per-class metrics, where the highest AUC of the imbalanced class was 97%, but they did not consider any image pre-processing technique to normalize the images. They used a dataset of images that contains spatial noise and distortions, such as blurring and darkened corners, which require a more advanced technique to obtain reliable results.

A stacking technique of machine learning algorithms was presented in [1] to prepare a DR screening tool. Lesions and microaneurysms are extracted and then classified using an ensemble classifier. The model's performance was evaluated using accuracy, sensitivity, and specificity, achieving 90%, 91%, and 90%, respectively. In 2017, another improved ensemble technique was presented by Somasundaram and Alli [23]: a machine learning bagging ensemble classifier (ML-BEC) was considered for the prediction of DR. They implemented the t-distributed stochastic neighbor embedding (t-SNE) algorithm to separate the images into similar and dissimilar pairs. Saleh et al. [24] presented an ensemble technique for DR risk assessment, which justifies the presence or absence of the disease. They prepared a dominance-based rough set balanced rule ensemble (DRSA-BRE) and compared their work with the random forest classifier. The best sensitivity score achieved was near 80%. Similarly, various other DR detection methods have been presented in this field [25]–[27]. However, none of these solve the problem of non-uniform illumination, which can play a major role in the detection of proliferative and non-proliferative DR.

Table 1 summarizes the most relevant works from the two major categories of DR detection along with the performance evaluation metrics used and their limitations. From the research literature presented in Table 1, we can infer that most of the techniques focused on retinal contrast enhancement and machine/deep learning models for classification without addressing the non-uniform reflectance of fundus data during image acquisition. Therefore, to alleviate these issues, we developed a pipeline for image illumination normalization and a novel feature extraction model for early DR detection.

III. MOTIVATION
Most of the proposed approaches focus on machine learning, deep learning, and image processing techniques to extract candidate features such as lesions, hemorrhages, exudates, and cotton-wool spots, but they ignore the variance in scene illumination and light degradation, which affects the performance and may result in biased predictions. In our proposed method, we have used a dataset that has multi-sourced images; therefore, various types of noise and distortions are encountered in the images. To overcome such issues, we aim to explore the research area of combining artificial intelligence and image processing to develop a completely illumination-proof diagnostic tool for DR. The methodology that has been applied is discussed in the following sections.

IV. METHODOLOGY
Figure 1 demonstrates the different stages of the proposed methodology in the form of a model pipeline. After the data acquisition, the image luminosity is normalized by the color constancy based gray world algorithm. The image processing pipeline is shown in the figure, in which the illuminant K′ estimated from the images is used to normalize the image. The data is split into training and test sets for the stacking convolutional
model. Three different CNN sub-models are fed into a single meta-learner classifier for feature extraction. The fusion strategy of the stacking model is based on the weighted majority from each of the sub-models to generate better features for classification. A data augmentation technique is also applied to improve the diversity of images in the dataset. Finally, the meta-learner classifier produces the diagnostic result as healthy (No DR) or unhealthy (DR).

A. LUMINOSITY NORMALIZATION USING GRAY WORLD ALGORITHM
We use a color constancy algorithm for image normalization as a pre-processing step. In our experiments, we dealt with a dataset that contains images from multiple sources having color variations, varying illuminations, and a non-uniform light distribution, which resulted in a large amount of heterogeneity among the images. In the case of retinal images, heterogeneity among images can cause a major difference in appearance; for example, some part of the image gets highlighted near the center while the boundaries get blurred, and these non-uniformities can seriously affect the diagnostic results. Therefore, it is necessary to propose a color calibration methodology for these images [28]. We have implemented the color constancy algorithm to remove the unnecessary surface reflectance and to make the color of the image invariant to such illuminations and other color-related aberrations. Generally, there are four common color constancy algorithms: Gray World, Shades of Gray, General Gray World, and Max-RGB. In this paper, we have considered the Gray World algorithm.

The gray world algorithm assumes that the average surface reflectance of the image is achromatic; therefore, corrections can be made by taking the average pixel values and scaling them by a scaling factor, which is computationally inexpensive to calculate. Gray world is also a statistical algorithm, which uses less computation power. According to [29], most of the existing algorithms are based on assumptions. For instance, the Max-RGB color constancy algorithm assumes the presence of a white patch in the image to calculate the illuminant, whereas the gray world algorithm uses the average reflectance and encourages a data-driven approach in color constancy [30].

As explained above, the gray world algorithm assumes that the (R, G, B) color channels have linear values, which means that the average reflection under standard light is gray; this, however, is not what we see in real life. It is based on the hypothesis that the average of each channel (R, G, B) in an image I is always equal, i.e., gray [31]. In practice, the average is not constant and is either greater or less than the gray value. This deviation from the original gray value gives us the illumination change. The illuminant of the image is then estimated in RGB mode and used to normalize each channel of the image to transform the image under a canonical light source.
FIGURE 1. A diagrammatic flow of the proposed methodology and the training process.
The stepwise algorithm of this popular luminosity normalization schema is explained below.

STEP 1 (Pixel-Level Normalization): Initially, to get the color of the light source, pixel-level normalization is carried out by calculating the average pixel value of each sensor channel. Consider an image as:

I = [R_{wh}, G_{wh}, B_{wh}]   (1)

R_{wh}, G_{wh}, and B_{wh} represent the sensor channels, whereas w and h depict the image width and height, respectively. The mean pixel value of channel j can be calculated as:

\mu_j = \frac{1}{wh} \sum_{x=1}^{w} \sum_{y=1}^{h} I_j(x, y)   (2)

Here, j = R, G, B.

STEP 2 (Gray World Illuminant Calculation): In gray world color correction, one of the color channels is selected as a reference to calculate the illuminant, but the intensity of the resultant normalized image may degrade and hinder the diagnostic results. Therefore, in the proposed method, a compressed color channel technique of the gray world algorithm is used in which the color channel with the minimum average magnitude is selected, as proposed in [32]. Denoting by \bar{X}_{min} the minimum of the channel means, the scaling factors obtained based on this magnitude can be expressed as:

\beta_r = \frac{\bar{X}_{min}}{\bar{R}_{wh}}   (3)

\beta_g = \frac{\bar{X}_{min}}{\bar{G}_{wh}}   (4)

\beta_b = \frac{\bar{X}_{min}}{\bar{B}_{wh}}   (5)

Thus, the resultant illuminant belongs to each sensor channel of the image. For example, if the image is I, then the component of the illuminant is e_c, where c ∈ {R, G, B}.

STEP 3 (Scaling Individual Image Channels): The normalized image is obtained by scaling each individual color channel, multiplying it by the corresponding scaling factor:

R' = R_{wh} \times \beta_r   (6)

G' = G_{wh} \times \beta_g   (7)

B' = B_{wh} \times \beta_b   (8)

R', G', and B' represent the normalized channels of the resultant color normalised image. Figure 2 shows the normalised images obtained after applying the gray world color constancy algorithm.
FIGURE 2. Results of image normalization using colour constancy algorithm. The first row shows three original images, whereas the second row shows
their corresponding colour normalized images after applying the gray world algorithm. The yellow arrow points out that features visible in original
image such as blood vessels, macula, haemorrhages are not affected after luminosity normalization.
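A minimal NumPy sketch of Steps 1–3 (Eqs. (2)–(8)) could look as follows; the function name and the 8-bit uint8 input are our assumptions, not details from the paper:

```python
import numpy as np

def gray_world_normalize(image: np.ndarray) -> np.ndarray:
    """Gray world luminosity normalization using the minimum-mean
    channel as reference, following Eqs. (2)-(8)."""
    img = image.astype(np.float64)
    # STEP 1: mean pixel value of each sensor channel (Eq. (2))
    mu = img.reshape(-1, 3).mean(axis=0)   # [mu_R, mu_G, mu_B]
    # STEP 2: scaling factors relative to the minimum channel mean (Eqs. (3)-(5))
    betas = mu.min() / mu                  # [beta_r, beta_g, beta_b]
    # STEP 3: scale each channel individually (Eqs. (6)-(8))
    normalized = img * betas               # broadcasts over the H x W x 3 array
    return np.clip(normalized, 0, 255).astype(np.uint8)
```

The channel with the smallest mean keeps a factor of 1, while the other channels are scaled down toward it, which mirrors the compressed-channel variant described above.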
B. ARTIFICIAL DATA GENERATION
To add more diversity to the dataset, a data augmentation technique is used in which artificial data is generated from the pre-existing images. The data is generated for each mini-batch in an iterative process during model training. The applied augmentation steps include horizontal flip, width shift, height shift, fill mode, and zoom range. Table 2 illustrates the data augmentation parameters. Figure 3 shows nine images generated in an iterative process during model training with the rotation angle set to 74 degrees.

TABLE 2. Data augmentation parameters considered for retinal image generation.
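A sketch of such an on-the-fly augmentation pipeline in Keras might read as follows; the numeric shift and zoom values are illustrative placeholders (the paper's exact settings are in Table 2), and only the 74-degree rotation is taken from the text:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative values only; the paper's exact parameters are listed in Table 2.
augmenter = ImageDataGenerator(
    horizontal_flip=True,    # horizontal flip
    width_shift_range=0.1,   # width shift
    height_shift_range=0.1,  # height shift
    zoom_range=0.2,          # zoom range
    rotation_range=74,       # rotation angle used for the Figure 3 samples
    fill_mode="nearest",     # fill mode for pixels created by the transforms
)

# Batches are augmented on the fly during training, e.g.:
# model.fit(augmenter.flow(x_train, y_train, batch_size=32), epochs=...)
```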
C. CONVOLUTIONAL NEURAL NETWORK
CNN models are based on the principle of layer-wise abstraction for feature learning. The complexity as well as the number of features increase with the model depth. CNNs follow a feed-forward architecture analogous to an artificial neural network, but they generalize much better for computer vision-related problems. They are commonly known as ConvNets and usually consist of an input layer, hidden layers, and an output layer. The hidden layers comprise activation functions, fully connected layers, and pooling layers. The top layers of a CNN model tend to learn low-level features such as edges, colors, and shapes, whereas the deeper layers focus on learning high-level features.

Typically, CNN models are a stack of alternating convolutions with various sizes of filters, pooling layers, and fully connected layers. The difference between a fully connected layer and a convolution layer is that the convolution layer is partially connected and receives inputs from a sub-area of the previous layer, whereas in a fully connected layer all the previous neurons are connected to the next neurons for feature transmission [33]. A kernel, commonly known as the filter, is a sliding window over the image: an array of numbers, where these numbers are the weights that are updated continuously. The area over which it slides is called the receptive field. In our model, we apply 3 × 3 filters with a depth of 3, since we have colored images of size 96 × 96 × 3. The convolution of a filter over an image results in an element-wise multiplication with the pixel values, represented as:

L[m, n] = (f * h)[m, n] = \sum_{j} \sum_{k} h[j, k] \, f[m - j, n - k]   (9)

Here, the input image is f, the kernel is denoted by h, and the indices of rows and columns are represented by m and n, respectively. After completion of the first convolutional layer, a feature map is generated, which is the input for the second layer. We use max pooling as the pooling layer, which downsamples the resulting feature maps and increases the receptive field of the filters [33]. To induce non-linearity in the feature maps, activation functions are applied. In our case, we utilize the Exponential Linear Unit (ELU) activation function, which is defined as:

\mathrm{ELU}(x) = \begin{cases} x, & \text{if } x > 0 \\ \alpha (e^{x} - 1), & \text{if } x \le 0 \end{cases}   (10)

Although it is computationally more expensive, it converges faster compared to other activation functions. According to [34], the presence of the extra parameter α controls the saturation point for negative values, and this saturation behavior is what differentiates ELU from the commonly used Rectified Linear Unit (ReLU) activation function.
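As a quick illustration of Eq. (10), ELU can be implemented in a few lines; this is a minimal NumPy sketch, and α = 1.0 is the common default that we assume here, since the paper's value is not stated in this excerpt:

```python
import numpy as np

def elu(x: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Exponential Linear Unit, Eq. (10): identity for x > 0,
    alpha * (exp(x) - 1) for x <= 0."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

print(elu(np.array([-2.0, -0.5, 0.0, 1.5])))  # -> [-0.865, -0.393, 0.0, 1.5]
```

Unlike ReLU, the negative inputs saturate smoothly toward −α instead of being cut off at zero.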
In a neural network, forward and backward propagation are among the most important factors in determining the convergence performance of a CNN. The forward propagation has two steps. First, the calculation of Z, which is determined as:

Z^{[l]} = W^{[l]} A^{[l-1]} + B^{[l]}   (11)

Here, W is the weight tensor containing the filter and B is the bias term. The second step is applying the activation function, as follows:

A^{[l]} = K^{[l]}(Z^{[l]})   (12)

Here, K denotes the activation function. This process is followed by backward propagation, in which partial derivatives are calculated to update W and B, improving the gradient descent convergence and reducing the error. The partial derivative can be calculated as:

dX^{[l]} = \frac{\partial \mathcal{L}}{\partial X^{[l]}}   (13)

Here, X denotes A^{[l]}, W^{[l]}, or B^{[l]}, which are the activation, weight, and bias, respectively. The weights are updated as:

W = W_i - \eta \frac{\partial \mathcal{L}}{\partial W}   (14)

Here, η is the learning rate and W_i is the initial weight. For the CNN model, the images are resized to 96 × 96 × 3 before being fed into the neural network.
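A compact NumPy sketch of Eqs. (11)–(14) for a single fully connected layer might read as follows; the layer shapes and the squared-error loss are our illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
A_prev = rng.standard_normal((8, 4))     # activations from layer l-1 (features x batch)
W = rng.standard_normal((3, 8)) * 0.1    # weight tensor W[l]
B = np.zeros((3, 1))                     # bias term B[l]
target = np.zeros((3, 4))                # assumed targets for a squared-error loss
eta = 0.01                               # learning rate

Z = W @ A_prev + B                       # Eq. (11): Z[l] = W[l] A[l-1] + B[l]
A = np.where(Z > 0, Z, np.exp(Z) - 1.0)  # Eq. (12) with K = ELU (alpha = 1)

# Backward pass: dL/dZ through the loss and the ELU derivative
dZ = (A - target) * np.where(Z > 0, 1.0, np.exp(Z))
dW = dZ @ A_prev.T / A_prev.shape[1]     # Eq. (13): dW = dL/dW, averaged over the batch
W = W - eta * dW                         # Eq. (14): gradient descent update
```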
D. STACKED GENERALIZATION OF CNNs
The methodology of stacking multiple sub-models into a single meta-learner classifier, combining their prediction probabilities to reduce the generalization error by deducing the individual biases of the sub-models, is called stacked generalization [35]. It differs from the usual model averaging methodology in terms of the fusion strategy, because the classification results are not averaged; instead, the final output is decided by the weighted majority of the sub-models. In the case of deep learning, multiple CNNs with different architectures are merged before giving the final output. Different ensemble techniques have been applied in similar works [1], [24]; however, these were machine learning based techniques, such as ensembles of K-nearest neighbor, naïve Bayes, and decision tree classifiers. In contrast, we use CNNs for stacked generalization.

To reduce bias in machine learning, crude cross-validation techniques such as 10-fold cross-validation and leave-one-out cross-validation are usually applied. However, if those techniques are applied to CNNs, the complexity and computational time of the model increase tremendously, unlike for most machine learning algorithms; a CNN deals with millions of parameters during the forward and backward propagation [24]. Thus, different fusion-based ensemble methods have been presented by researchers every year to combine multiple predictions in the most optimized way. There are various fusion strategies for ensemble models, including model averaging, where the predictions from several independently trained models are combined, as adopted in [36]. In our case, we use the weighted majority based fusion technique for the stacked model.

Consider an ensemble of M independent classifiers D_1, …, D_M with individual accuracies p_1, p_2, …, p_M. Each classifier D_i produces a c-dimensional vector [d_{i,1}, …, d_{i,c}]^T ∈ {0, 1}^c, i = 1, …, M, where d_{i,j} = 1 if D_i labels x in ω_j, and 0 otherwise. The majority vote will result in an ensemble decision for class ω_k if

\sum_{i=1}^{M} d_{i,k} = \max_{j=1}^{c} \sum_{i=1}^{M} d_{i,j}   (15)

TABLE 3. Label-wise depiction of the dataset division used for training and testing of the stacked CNN and other transfer learning models.
FIGURE 4. An illustration of the stacked CNNs concatenated on top of the meta-learner classifier.
TABLE 4. Layerwise configuration of a single CNN architecture which is fed into the meta-learner classifier.
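A minimal Keras sketch of this stacked generalization could look as follows; the sub-model builder, layer sizes, and the dense meta-learner are illustrative assumptions on our part, with the paper's exact layer-wise configuration given in Table 4:

```python
from tensorflow.keras import Input, Model, layers

def build_sub_cnn(name: str) -> Model:
    """One custom CNN sub-model (illustrative; see Table 4 for the paper's layers)."""
    inp = Input(shape=(96, 96, 3))
    x = layers.Conv2D(32, (3, 3), activation="elu")(inp)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, (3, 3), activation="elu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    out = layers.Dense(2, activation="softmax")(x)
    return Model(inp, out, name=name)

sub_models = [build_sub_cnn(f"cnn_{i}") for i in range(3)]

# Stacked generalization: concatenate the sub-models' prediction
# probabilities and feed them to a single meta-learner classifier.
stack_input = Input(shape=(96, 96, 3))
merged = layers.concatenate([m(stack_input) for m in sub_models])
meta = layers.Dense(8, activation="elu")(merged)
stack_output = layers.Dense(2, activation="softmax")(meta)
stacked_model = Model(stack_input, stack_output, name="stacked_cnn")
stacked_model.compile(optimizer="adam", loss="categorical_crossentropy")
```

In a typical stacking setup the sub-models are trained first and optionally frozen before the meta-learner is fit, so that the meta-learner learns only how to combine their prediction probabilities.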
further analysis. In medical imaging, degradation of image features is a major issue when implementing normalization techniques. The yellow arrows in Figure 2 clearly show the presence of features such as blood vessels, hemorrhages, and the retinal macula, which play a major role in decision-making for DR diagnosis.

To support our arguments and provide more concrete proof for the proposed luminosity normalization technique, we have calculated the peak signal-to-noise ratio (PSNR) and mean squared error (MSE) of the transformed image. PSNR can be defined as the ratio between the maximum possible power of a signal and the power of the corrupting noise that affects the quality of its representation [44]. PSNR is measured in decibels (dB) and, in most cases, a higher value of PSNR indicates that the enhanced or reconstructed image is of superior quality. MSE, on the other hand, quantifies the difference between the images by computing the average of the squared errors between two images; the lower the value of MSE, the better the image enhancement technique. Mathematically, PSNR and MSE can be defined as follows:

\mathrm{PSNR} = 10 \log_{10} \left( \frac{(2^n - 1)^2}{\mathrm{MSE}} \right)   (17)

\mathrm{MSE} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \big( k(i, j) - l(i, j) \big)^2   (18)

Here, M and N define the number of rows and columns in the image, respectively; k(i, j) represents the original reference image and l(i, j) the luminosity normalised image; n stands for the number of bits per pixel, so that 2^n − 1 is the maximum pixel value. Figure 7 shows the difference between the statistical values of PSNR and MSE for the normal gray image and the gray world normalised image.
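The two measures of Eqs. (17) and (18) can be computed with a few lines of NumPy; a sketch assuming 8-bit images (n = 8):

```python
import numpy as np

def mse(reference: np.ndarray, normalized: np.ndarray) -> float:
    """Eq. (18): mean of the squared pixel differences."""
    diff = reference.astype(np.float64) - normalized.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(reference: np.ndarray, normalized: np.ndarray, bits: int = 8) -> float:
    """Eq. (17): peak signal-to-noise ratio in dB (undefined for identical images)."""
    peak = (2 ** bits - 1) ** 2
    return float(10.0 * np.log10(peak / mse(reference, normalized)))
```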
B. STACKED CNN MODEL ANALYSIS
The images in our dataset are available in the .JPEG format. Since this is a lossy compression, the images lose a significant amount of information, which makes the feature extraction imprecise and difficult, as explained in [45]. Therefore, a superior feature extraction technique, the stacked generalization of CNNs, has been implemented. Since it is a binary classification task, high values of accuracy and the other evaluation metrics are expected. To prepare a robust model, hyperparameter tuning is performed, which shows the potential fluctuation and improvement in accuracy and loss. The experimental results reveal that the proposed stacked CNN model achieves an accuracy of 97.92% on the training set with a training loss of 0.066. On the test set, the model achieves an accuracy of 97.77% with a test loss of 0.078. Table 5 shows the evaluation metrics obtained using various activation functions with and without data augmentation. It can be clearly observed that the ELU activation function works best with data augmentation and gives better results in terms of accuracy, sensitivity, and specificity.

Table 6 depicts the performance metrics, such as train loss, test loss, train accuracy, test accuracy, and area under the curve (AUC) values, in comparison with the other deep transfer learning models. The experimental results reveal that the proposed model achieves an AUC value of 0.9979. AUC is an important performance measure that proves the model's reliability over other solutions.

Table 7 shows a report containing the precision, recall, and F-measure scores of all the competing models. These are important metrics in the evaluation of a computer-aided diagnostic system. The precision score depicts the exactness and tells how often a predicted value is correct, whereas the F1-measure, the harmonic mean of recall and precision, reveals the test accuracy. From Table 7, we observe that the proposed stacked CNN model outperforms all other competitive models.

Figure 8 shows the confusion matrix for each model, which compactly summarizes the predicted results and the types of errors over the test set. Although VGG-16 has a lower number of false negatives (FN) compared to the stacked CNN model, it obtains a greater number of false positives (FP) and a lower number of true negatives (TN).
FIGURE 7. Statistical comparative analysis between normal gray image and gray world normalized image based on PSNR and MSE values.
TABLE 5. Performance of different activation functions in the proposed model with data augmentation.
TABLE 7. Comparative analysis between the proposed model and other deep transfer learning models on a similar dataset.

However, the proposed stacking ensemble model has a greater number of TN and zero FP, which reveals its accuracy for both healthy (No DR) and unhealthy (Having DR) images. From Figure 8(d) we observe that our proposed model has only 11 FN, which means that only 11 out of 495 patients are falsely predicted as not having DR. Since we are dealing with a real-life problem in the medical domain, reducing false negatives as well as achieving considerably higher values for true positives is important. The medical domain is a field of precision; it is important to consider evaluation metrics that directly deal with the correct and incorrect predictions. We have therefore considered sensitivity and specificity, which are discussed in detail below.

C. SENSITIVITY AND SPECIFICITY ANALYSIS
Sensitivity and specificity play a crucial role in the medical domain; higher values of sensitivity and specificity prove the reliability of a diagnostic model. Sensitivity is the ability of the model to successfully predict the actual positive value [46], which in our case means to correctly predict the
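From a binary confusion matrix, the two scores are computed as sensitivity = TP/(TP + FN) and specificity = TN/(TN + FP); a small sketch follows, where the TP and TN counts are placeholders we invented only so the example runs (they are not the paper's reported matrix, although the 11 FN and 0 FP match the figures quoted above):

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple:
    """Sensitivity (recall of the DR class) and specificity (recall of No DR)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Placeholder TP/TN counts; only fn=11 and fp=0 come from the text above.
sens, spec = sensitivity_specificity(tp=340, fn=11, tn=144, fp=0)
```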
FIGURE 8. Confusion matrices of the proposed model and other compared models. (a) CNN, (b) VGG-16, (c) ResNet50, and (d) stacked CNN model.
TABLE 11. Performance evaluation of the proposed model on the publicly available fundus datasets.
Figure 12 shows the ROC curve of the proposed model for the binary classification task, where it obtains an AUC value of 0.99. The results provided in Table 8 and Figure 12 prove the potential of the proposed stacking deep learning technique. Our model is able to outperform the conventional methods for diagnosis. Finally, our stacked generalization of CNNs achieves an accuracy of 97.92% on the train set and 97.77% on the test set, a sensitivity of 96.86%, and a specificity of 100% in binary classification. The proposed model also outperforms ResNet50 in terms of accuracy and F-measure. For multi-class classification, the model achieves train and test accuracies of 96.45% and 96.30%, respectively, as reported in Table 9.

VIII. CONCLUSION AND FUTURE WORK
We proposed to solve the problem of non-ideal illumination in retinal fundus images using the gray world algorithm and to develop an automated DR prediction system. A stack

REFERENCES
[1] B. Antal and A. Hajdu, "An ensemble-based system for automatic screening of diabetic retinopathy," Knowl.-Based Syst., vol. 60, pp. 20–27, Apr. 2014.
[2] R. Gargeya and T. Leng, "Automated identification of diabetic retinopathy using deep learning," Ophthalmology, vol. 124, no. 7, pp. 962–969, 2017.
[3] Early Treatment Diabetic Retinopathy Study Research Group, "Early photocoagulation for diabetic retinopathy: ETDRS report number 9," Ophthalmology, vol. 98, no. 5, pp. 766–785, 1991. [Online]. Available: https://www.sciencedirect.com/science/article/abs/pii/S0161642013380117
[4] S. Gadkari, Q. Maskati, and B. Nayak, "Prevalence of diabetic retinopathy in India: The all India ophthalmological society diabetic retinopathy eye screening study 2014," Indian J. Ophthalmol., vol. 64, no. 1, p. 38, 2016.
[5] M. Zhou, K. Jin, S. Wang, J. Ye, and D. Qian, "Color retinal image enhancement based on luminosity and contrast adjustment," IEEE Trans. Biomed. Eng., vol. 65, no. 3, pp. 521–527, Mar. 2018.
[6] W. A. Mustafa, H. Yazid, and M. M. M. A. Kader, "Luminosity correction using statistical features on retinal images," J. Biomimetics, Biomater. Biomed. Eng., vol. 37, pp. 74–84, Jun. 2018.
[7] O. Deperlioglu and U. Kose, "Diagnosis of diabetic retinopathy by using image processing and convolutional neural network," in Proc. 2nd Int. Symp. Multidisciplinary Stud. Innov. Technol. (ISMSIT), Oct. 2018, pp. 1–5.
[8] J. Wang, Y. Bai, and B. Xia, "Simultaneous diagnosis of severity and features of diabetic retinopathy in fundus photography using deep learning," IEEE J. Biomed. Health Informat., vol. 24, no. 12, pp. 3397–3407, Dec. 2020.
[9] V. M. G. S. Gupta, S. Gupta, and P. Sengar, "Extraction of blood veins from the fundus image to detect diabetic retinopathy," in Proc. IEEE 1st Int. Conf. Power Electron., Intell. Control Energy Syst. (ICPEICES), Jul. 2016, pp. 1–3.
[10] L. Zhou, Y. Zhao, J. Yang, Q. Yu, and X. Xu, "Deep multiple instance learning for automatic detection of diabetic retinopathy in retinal images," IET Image Process., vol. 12, no. 4, pp. 563–571, Apr. 2018.
[11] M. Masood, T. Nazir, M. Nawaz, A. Mehmood, J. Rashid, H.-Y. Kwon, T. Mahmood, and A. Hussain, "A novel deep learning method for recognition and classification of brain tumors from MRI images," Diagnostics, vol. 11, no. 5, p. 744, Apr. 2021.
[12] G. T. Zago, R. V. Andreão, B. Dorizzi, and E. O. T. Salles, "Diabetic retinopathy detection using red lesion localization and convolutional neural networks," Comput. Biol. Med., vol. 116, Jan. 2020, Art. no. 103537.
[13] K. U. Bhaskar and E. P. Kumar, "Extraction of hard exudates using functional link artificial neural networks," in Proc. IEEE Int. Advance Comput. Conf. (IACC), Jun. 2015, pp. 420–424.
[14] A. M. R. R. Bandara and P. W. G. R. M. P. B. Giragama, "A retinal image enhancement technique for blood vessel segmentation algorithm," in Proc. IEEE Int. Conf. Ind. Inf. Syst. (ICIIS), Dec. 2017, pp. 1–5.
[15] W. A. Mustafa, H. Yazid, and S. B. Yaacob, "Illumination correction of retinal images using superimpose low pass and Gaussian filtering," in Proc. 2nd Int. Conf. Biomed. Eng. (ICoBE), Mar. 2015, pp. 1–4.
[16] N. Singh, L. Kaur, and K. Singh, "Histogram equalization techniques for enhancement of low radiance retinal images for early detection of diabetic retinopathy," Eng. Sci. Technol., Int. J., vol. 22, no. 3, pp. 736–745, Jun. 2019.
[17] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. 25th Int. Conf. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[18] S. Qummar, F. G. Khan, S. Shah, A. Khan, S. Shamshirband, Z. U. Rehman, I. A. Khan, and W. Jadoon, "A deep learning ensemble approach for diabetic retinopathy detection," IEEE Access, vol. 7, pp. 150530–150539, 2019.
[19] N. Gianchandani, A. Jaiswal, D. Singh, V. Kumar, and M. Kaur, "Rapid COVID-19 diagnosis using ensemble deep transfer learning models from chest radiographic images," J. Ambient Intell. Hum. Comput., pp. 1–13, 2020, doi: 10.1007/s12652-020-02669-6.
[20] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 4700–4708.
[21] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2818–2826.
[22] F. Chollet, "Xception: Deep learning with depthwise separable convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 1251–1258.
[23] S. Somasundaram and P. Alli, "A machine learning ensemble classifier for early prediction of diabetic retinopathy," J. Med. Syst., vol. 41, no. 12, pp. 1–12, Dec. 2017.
[24] E. Saleh, J. Błaszczyński, A. Moreno, A. Valls, P. Romero-Aroca, S. de la Riva-Fernández, and R. Słowiński, "Learning ensemble classifiers for diabetic retinopathy assessment," Artif. Intell. Med., vol. 85, pp. 50–63, Apr. 2018.
[25] K. Oh, H. M. Kang, D. Leem, H. Lee, K. Y. Seo, and S. Yoon, "Early detection of diabetic retinopathy based on deep learning and ultra-wide-field fundus images," Sci. Rep., vol. 11, no. 1, pp. 1–9, Dec. 2021.
[26] H. Khalid, R. Schwartz, L. Nicholson, J. Huemer, M. H. El-Bradey, D. A. Sim, P. J. Patel, K. Balaskas, R. D. Hamilton, P. A. Keane, and R. Rajendram, "Widefield optical coherence tomography angiography for early detection and objective evaluation of proliferative diabetic retinopathy," Brit. J. Ophthalmol., vol. 105, no. 1, pp. 118–123, Jan. 2021.
[27] T. Nazir, A. Irtaza, J. Rashid, M. Nawaz, and T. Mehmood, "Diabetic retinopathy lesions detection using faster-RCNN from retinal images," in Proc. 1st Int. Conf. Smart Syst. Emerg. Technol. (SMARTTECH), Nov. 2020, pp. 38–42.
[28] K. A. Goatman, A. D. Whitwam, A. Manivannan, J. A. Olson, and P. F. Sharp, "Colour normalisation of retinal images," in Proc. Med. Image Understand. Anal., 2003, pp. 49–52.
[29] D. Liu, "Comparison analysis of color constancy algorithms," Dept. Eng. Sustain. Develop., University of Gävle, Gävle, Sweden, Tech. Rep. S-801 76, 2013.
[30] V. Agarwal, B. R. Abidi, A. Koschan, and M. A. Abidi, "An overview of color constancy algorithms," J. Pattern Recognit. Res., vol. 1, no. 1, pp. 42–54, 2006.
[31] G. Chen and X. Zhang, "A method to improve robustness of the gray world algorithm," in Proc. 4th Int. Conf. Comput., Mechatronics, Control Electron. Eng., 2015, pp. 243–248.
[32] N. M. Kwok, D. Wang, X. Jia, S. Y. Chen, G. Fang, and Q. P. Ha, "Gray world based color correction and intensity preservation for image enhancement," in Proc. 4th Int. Congr. Image Signal Process., Oct. 2011, pp. 994–998.
[33] S. Indolia, A. K. Goswami, S. P. Mishra, and P. Asopa, "Conceptual understanding of convolutional neural network—A deep learning approach," Procedia Comput. Sci., vol. 132, pp. 679–688, Jan. 2018.
[34] C. Nwankpa, W. Ijomah, A. Gachagan, and S. Marshall, "Activation functions: Comparison of trends in practice and research for deep learning," 2018, arXiv:1811.03378. [Online]. Available: http://arxiv.org/abs/1811.03378
[35] D. H. Wolpert, "Stacked generalization," Neural Netw., vol. 5, no. 2, pp. 241–259, 1992.
[36] H. Alshazly, C. Linse, E. Barth, and T. Martinetz, "Ensembles of deep learning models and transfer learning for ear recognition," Sensors, vol. 19, no. 19, p. 4139, Sep. 2019.
[37] L. I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms. Hoboken, NJ, USA: Wiley, 2014.
[38] V. Gulshan, L. Peng, M. Coram, M. C. Stumpe, D. Wu, A. Narayanaswamy, S. Venugopalan, K. Widner, T. Madams, J. Cuadros, and R. Kim, "Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs," J. Amer. Med. Assoc., vol. 316, no. 22, pp. 2402–2410, 2016.
[39] D. Singh, V. Kumar, V. Yadav, and M. Kaur, "Deep neural network-based screening model for COVID-19-infected patients using chest X-ray images," Int. J. Pattern Recognit. Artif. Intell., vol. 35, no. 3, Mar. 2021, Art. no. 2151004.
[40] D. Singh, V. Kumar, and M. Kaur, "Densely connected convolutional networks-based COVID-19 screening model," Appl. Intell., vol. 51, no. 5, pp. 3044–3051, 2021, doi: 10.1007/s10489-020-02149-6.
[41] H. Alshazly, C. Linse, E. Barth, and T. Martinetz, "Handcrafted versus CNN features for ear recognition," Symmetry, vol. 11, no. 12, p. 1493, Dec. 2019.
[42] X. Li, T. Pang, B. Xiong, W. Liu, P. Liang, and T. Wang, "Convolutional neural networks based transfer learning for diabetic retinopathy fundus image classification," in Proc. 10th Int. Congr. Image Signal Process., Biomed. Eng. Informat. (CISP-BMEI), Oct. 2017, pp. 1–11.
[43] H. Alshazly, C. Linse, E. Barth, and T. Martinetz, "Deep convolutional neural networks for unconstrained ear recognition," IEEE Access, vol. 8, pp. 170295–170310, 2020.
[44] N. M. W. A. Mustafa, H. Yazid, M. Jaafar, M. Zainal, and A. S. Abdul-Nasir, "A review of image quality assessment (IQA): SNR, GCF, AD, NAE, PSNR, ME," J. Adv. Res. Comput. Appl., vol. 7, no. 1, pp. 1–7, 2017.
[45] T. Nazir, A. Irtaza, Z. Shabbir, A. Javed, U. Akram, and M. T. Mahmood, "Diabetic retinopathy detection through novel tetragonal local octa patterns and extreme learning machines," Artif. Intell. Med., vol. 99, Aug. 2019, Art. no. 101695.
[46] R. Trevethan, "Sensitivity, specificity, and predictive values: Foundations, pliabilities, and pitfalls in research and practice," Frontiers Public Health, vol. 5, p. 307, Nov. 2017.
[47] J. C. Javitt, J. K. Canner, R. G. Frank, D. M. Steinwachs, and A. Sommer, "Detecting and treating retinopathy in patients with type I diabetes mellitus: A health policy model," Ophthalmology, vol. 97, no. 4, pp. 483–495, 1990.
[48] M. M. Fraz, W. Jahangir, S. Zahid, M. M. Hamayun, and S. A. Barman, "Multiscale segmentation of exudates in retinal images using contextual cues and ensemble classification," Biomed. Signal Process. Control, vol. 35, pp. 50–62, May 2017.
[49] D. J. Hemanth, O. Deperlioglu, and U. Kose, "An enhanced diabetic retinopathy detection and classification approach using deep convolutional neural network," Neural Comput. Appl., vol. 32, no. 3, pp. 707–721, Feb. 2020.
[50] T. R. Gadekallu, N. Khare, S. Bhattacharya, S. Singh, P. K. R. Maddikunta, and G. Srivastava, "Deep neural networks to predict diabetic retinopathy," J. Ambient Intell. Humanized Comput., pp. 1–14, Apr. 2020, doi: 10.1007/s12652-020-01963-7.
[51] Diabetic Retinopathy Detection. Accessed: Jun. 15, 2021. [Online]. Available: https://www.kaggle.com/c/diabetic-retinopathy-detection
[52] A. Rakhlin, "Diabetic retinopathy detection through integration of deep learning classification framework," BioRxiv, Jan. 2018, Art. no. 225508, doi: 10.1101/225508.
[53] C. Lam, D. Yi, M. Guo, and T. Lindsey, "Automated detection of diabetic retinopathy using deep learning," AMIA Summits Transl. Sci., vol. 2018, no. 1, p. 147, 2018.
[54] M. Chetoui and M. A. Akhloufi, "Explainable end-to-end deep learning for diabetic retinopathy detection across multiple datasets," J. Med. Imag., vol. 7, no. 4, Aug. 2020, Art. no. 044503.
[55] T. Kauppi, V. Kalesnykiene, J.-K. Kamarainen, L. Lensu, I. Sorri, A. Raninen, R. Voutilainen, H. Uusitalo, H. Kälviäinen, and J. Pietilä, "The DIARETDB1 diabetic retinopathy database and evaluation protocol," in Proc. Brit. Mach. Vis. Conf., vol. 1, 2007, pp. 1–10.
[56] E. Decencière, X. Zhang, G. Cazuguel, B. Lay, B. Cochener, C. Trone, P. Gain, R. Ordonez, P. Massin, A. Erginay, and B. Charton, "Feedback on a publicly distributed image database: The Messidor database," Image Anal. Stereology, vol. 33, no. 3, pp. 231–234, 2014.
[57] T. Li, Y. Gao, K. Wang, S. Guo, H. Liu, and H. Kang, "Diagnostic assessment of deep learning algorithms for diabetic retinopathy screening," Inf. Sci., vol. 501, pp. 511–522, Oct. 2019.
[58] P. Porwal, S. Pachade, R. Kamble, M. Kokare, G. Deshmukh, V. Sahasrabuddhe, and F. Meriaudeau, "Indian diabetic retinopathy image dataset (IDRiD): A database for diabetic retinopathy screening research," Data, vol. 3, no. 3, p. 25, 2018.
[59] A. D. Hoover, V. Kouznetsova, and M. Goldbaum, "Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response," IEEE Trans. Med. Imag., vol. 19, no. 3, pp. 203–210, Mar. 2000.
[60] E. Decencière, G. Cazuguel, X. Zhang, G. Thibault, J. C. Klein, F. Meyer, B. Marcotegui, G. Quellec, M. Lamard, R. Danno, and D. Elie, "TeleOphta: Machine learning and image processing methods for teleophthalmology," IRBM, vol. 34, no. 2, pp. 196–203, Apr. 2013.

MANJIT KAUR (Member, IEEE) received the Master of Engineering degree in information technology from Panjab University, Chandigarh, Punjab, in 2011, and the Ph.D. degree in image processing from the Thapar Institute of Engineering and Technology, Patiala, Punjab, India, in 2019. She is currently working as an Assistant Professor with the School of Engineering and Applied Sciences, Bennett University, Greater Noida, India. She has published more than 27 SCI/SCIE indexed articles so far. Her research interests include wireless sensor networks, digital image processing, and meta-heuristic techniques.

HAMMAM ALSHAZLY received the B.Sc. degree in computer science from South Valley University, Egypt, in 2006, the M.Sc. degree in computer science from the University of Mumbai, India, through a scholarship from the Indian Council for Cultural Relations (ICCR), in 2014, and the Ph.D. degree in computer science from South Valley University, in 2018. From February 2019 to January 2021, he was a Postdoctoral Researcher with the Institute for Neuro- and Bioinformatics, University of Lübeck, Germany. He is currently working as an Assistant Professor with the Department of Computer Science, Faculty of Computers and Information, South Valley University. He has published articles in conferences and peer-reviewed journals, and works as a reviewer for several journals. His research interests include deep learning, biometrics, computer vision, machine learning, and artificial intelligence. He was awarded the Partnership and Ownership (ParOwn) Initiative, in 2010, for a period of 6 months at Monash University, Australia. During his Ph.D. degree, he was awarded a Fulbright Scholarship for a period of 10 months to complete part of his research work at the University of Kansas, USA.