A Survey of Deep Learning Models in Medical Therapeutic Areas
Alberto Nogales,1 Álvaro García-Tejedor,1 Diana Monge,2 Juan Serrano Vara1 and
Cristina Antón2
1 CEIEC, Research Institute, Universidad Francisco de Vitoria, Ctra. M-515 Pozuelo-Majadahonda km 1800, 28223 Pozuelo de Alarcón, Spain
[email protected], [email protected], [email protected]
2 Faculty of Medicine, Research Institute, Universidad Francisco de Vitoria, Ctra. M-515 Pozuelo-Majadahonda km 1800, 28223 Pozuelo de Alarcón, Spain
[email protected], [email protected]
1 Introduction
In the medical domain, the areas where DL techniques have been most used are related to image diagnosis [11] and the analysis and classification of biomedical and clinical data [12], [13]. However, DL models have also been used to develop tools that help segment the population according to risk levels and adapt healthcare to each defined profile, allowing patients' needs to be anticipated. They have also been used for other purposes, such as developing public health, environmental and labor plans, including educational programs that can help prevent diseases; making predictions via disease probability and prognosis studies; evaluating quality management services and programs; optimizing teleservices; and strengthening self-care and permitting decision making based on real data [14], [15].
This paper sets out a systematic review of the articles published in the medical field in which DL techniques have been applied. To do this, a methodology was first defined to semi-automatically obtain the relevant articles, eliminating those that were not pertinent to the scope of this study or whose impact on the scientific community was lower. This methodology was based on a search of the best-known scientific sources, as well as on applying inclusion and exclusion quality criteria from the fields of medicine and computer science. After filtering the initial material, the contributions of the 126 selected articles were statistically analysed and 64 were described. The analysis revealed in which medical fields most studies have been carried out and which DL models are the most used. Although there are other reviews of Deep Learning in medicine, such as [16], [17] and [18], the present one aims to serve as a reference source for physicians to learn which use cases have been solved in their field, while for computer scientists its potential lies in finding under-exploited niches. For that purpose, a deep statistical and graphical analysis is provided alongside a set of citations.
This article is organised in the following sections: Section 2 summarises the DL
models used today and introduces the data types from the field of medicine. Section 3
details the methodology used to obtain the selected articles. Section 4 contains the in-depth study of the finally selected articles, setting out their theoretical foundations, contributions and applications. Finally, Section 5 presents the conclusions obtained in this research study.
2 State of the art
Starting in 2012, after the creation of AlexNet [19], the number of studies about deep learning published in medical bibliographic databases has progressively increased. Figure 2 shows the number of annual deep learning publications on PubMed, which has practically doubled every year since 2015, except for 2017. If we bear in mind the total number of publications in the 2000-2020 period, we can see that two-thirds of them are from 2019 and 2020.
Fig. 2. Distribution by publication year of the deep learning articles indexed in PubMed from 2000 to 2020 (n=2817).
DL techniques are based on multiple models and architectures, although the effec-
tiveness of all of them is directly related to the nature and quality of the data with which
they will be trained. This section describes the data types that are commonly used in
medicine, as well as the architectures and models that best adapt to them.
In the medical field, the data types found may be structured data, images, texts, time series, Electronic Health Records (EHRs) and graphs.
Structured data is defined [20] as: “any set of data values conforming to a common
schema or type”, basically data arranged in tables, such as databases or CSV or Excel
files. They follow a row and column structure, the latter with a header. Columns define
the characteristics of the individuals and rows, the values taken by the individuals for
the characteristics in question. Images are obtained from medical tests like x-rays, scans
and retinal fundus images. Texts include all written information used to monitor pa-
tients, such as their medical records and reports. Time series include electrocardiograms (ECGs) and electroencephalograms (EEGs); here, the information is a set of repeated observations of a single unit or individual at regular intervals over a large number of observations [21]. EHRs are a specific data structure in the medical field that includes full patient information in diverse formats, including images and text. Finally,
graphs can be a special way of modelling medical information, for example, the con-
nections (edges) between different brain zones (nodes). In conclusion, depending on
the nature of the data, one DL model or another will be most effective, as detailed in
the following classification.
Convolutional neural networks (CNNs): Due to their structure and operating method, they can identify specific characteristics (for example, a tumor) in a delocalized way, meaning independently of their position in the image. The different capacities of these networks can be controlled by varying their depth and breadth. They also make strong and mostly correct assumptions about the nature of images (namely, stationarity of statistics and locality of pixel dependencies) [22]. As the CNN is the most developed architecture in Deep Learning, modifications such as 3D-CNNs or Graph CNNs can be found [23], [24]. These models are therefore being used as an aid to medical diagnosis in fields like radiology, for tasks such as lesion classification, image segmentation or detection of abnormalities in medical tests.
Recurrent neural networks (RNNs): They are defined in [25] as a network that can
process a sequence of arbitrary length by recursively applying a transition function to
its internal hidden state vector h_t obtained from the input sequence. The use of RNNs has become widespread, primarily due to their great utility for processing time-series data. The main feature of RNNs is that the output of all or some of their neurons is in turn connected to the inputs of neurons in the same or a previous layer, letting the network retain knowledge of previous states (memory) and thus equipping it with a sort of sense of time. Figure 4 shows an example of an RNN where the neurons are interconnected. As the main capability that differentiates these models from others is saving previous states, they have mainly been used on medical tests whose information can only be understood by analyzing temporal values, such as biomedical signals. Applications of RNNs can thus be found in areas like cardiology or neurology, where tests such as the electrocardiogram or electroencephalogram are used [26], [27].
There are several types of recurrent networks, the most widely used being Long Short-Term Memory networks (LSTM). LSTM networks arose in response to the problem referred to as long-term dependencies. According to [28], LSTMs can learn to bridge time intervals of over 1000 steps even when there are noisy, incompressible input sequences, without loss of short-time-lag capabilities.
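A minimal sketch of a single LSTM step may clarify how the gates manage the cell state that bridges long time lags. The scalar weights below are arbitrary illustrative values, not a trained model, and the 1-D state is a simplification for readability:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """One step of a scalar LSTM cell. W holds the weights of the forget,
    input, candidate and output gates; the gates decide what to erase
    from, write to and read from the cell state c, which is what lets the
    network bridge long time intervals."""
    f = sigmoid(W["f_x"] * x + W["f_h"] * h_prev + W["f_b"])   # forget gate
    i = sigmoid(W["i_x"] * x + W["i_h"] * h_prev + W["i_b"])   # input gate
    g = math.tanh(W["g_x"] * x + W["g_h"] * h_prev + W["g_b"]) # candidate
    o = sigmoid(W["o_x"] * x + W["o_h"] * h_prev + W["o_b"])   # output gate
    c = f * c_prev + i * g          # updated cell state (the "memory")
    h = o * math.tanh(c)            # hidden state passed to the next step
    return h, c

# Run the cell over a short sequence with arbitrary illustrative weights.
W = {k: 0.5 for k in ("f_x", "f_h", "f_b", "i_x", "i_h", "i_b",
                      "g_x", "g_h", "g_b", "o_x", "o_h", "o_b")}
h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, W)
```

Because the cell state is updated additively (f * c_prev + i * g) rather than rewritten at every step, gradients can flow across many steps without vanishing as quickly as in a plain RNN.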
The second learning type (unsupervised learning) uses non-labelled input data, that
is, there is no a priori knowledge and the results to be obtained from processing the
input data are unknown [29]. These neural networks can learn to organize information
without providing an error calculation to evaluate the possible solution.
Deep autoencoders (AUD) are included within this group. This model, defined in [30], is a special type of feedforward neural network where the input and the output are the same, and it is composed of two chained blocks. The first one, the encoder, reduces the size of the input data until the features that univocally characterise it are condensed into a small piece of data (the code). The second one, the decoder, upsamples that piece of data until the input data is reconstructed. Figure 5 shows the main feature of the autoencoder: the input and output layers are both the same size, as the output should replicate the input, while the hidden layers are smaller, since the input patterns
are progressively coded and decoded throughout the process. Their capability to extract the fundamental features of the input has led them to be used mainly to reduce data dimensionality, but also to reduce noise in input data (such as images). They are often used for data (image and signal) reconstruction, denoising or augmentation [31], [32], [33]. These tasks mostly belong to the computer science field but are useful in medicine. Applications in the medical field include segmentation, detection and classification in images that are difficult to manage due to their size or that need to be improved in terms of resolution [34], [35], [36].
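The encoder-decoder bottleneck described above can be sketched in a few lines. The layer sizes are arbitrary, and the weights are random and untrained (so the reconstruction is not yet faithful); in a real autoencoder they would be learned so that the output approximates the input:

```python
import random

random.seed(0)

def linear(vec, weights):
    """Apply one fully connected layer: out[j] = sum_i vec[i] * weights[i][j]."""
    return [sum(v * w[j] for v, w in zip(vec, weights))
            for j in range(len(weights[0]))]

# A 6-D input pattern is squeezed through a 2-D bottleneck (the code)
# and expanded back to 6-D. The weights here are illustrative only.
enc_w = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(6)]  # 6 -> 2
dec_w = [[random.uniform(-1, 1) for _ in range(6)] for _ in range(2)]  # 2 -> 6

x = [0.1, 0.9, 0.4, 0.4, 0.7, 0.2]   # input pattern
code = linear(x, enc_w)               # encoder: the compressed features
x_hat = linear(code, dec_w)           # decoder: the reconstruction
```

The 2-D code is the dimensionality-reduced representation that the text above describes; training minimises the reconstruction error between x and x_hat.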
In addition to the two learning types described above, there are also architectures
implemented through mixed learning types (supervised and unsupervised) called semi-
supervised. Generative adversarial networks (GAN) would fit into this class.
A GAN is an architecture composed of two neural networks, a generator and a discriminator (or classifier), that compete with each other in an adversarial training process [37]. Together, they can learn to imitate any data distribution. The generative network is in charge of generating instances that belong to the data distribution (a specific data structure, such as images) realistic enough to deceive the second network, whose job is to discern between real and generated data structures. The discriminator estimates the probability that the generated data belongs to the data distribution (authentic) or not (fake). As the discriminator classifies the generated data as fake, the generator learns to generate instances closer to the data distribution. By following this process, both models improve their performance. In cases of scarce data, GANs can be used to generate synthetic instances of different classes. They are also applied in data reconstruction, such as signal denoising or image reconstruction, for example, cleaning up artifacts in electroencephalographic tests [38]. They have also been used in dataset manipulations such as image super-resolution (obtaining more detailed radiographs), segmentation (resonance images where different elements are tagged) or the creation of new synthetic instances (in cases where the training dataset is not large enough) [39], [40], [41].
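The adversarial process described above is commonly formalised, following the original GAN formulation [37], as a two-player minimax game in which the discriminator D maximises its classification accuracy while the generator G tries to fool it:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] +
  \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

Here D(x) is the discriminator's estimated probability that x is authentic, and G(z) is an instance generated from noise z; the generator improves by minimising the second term, i.e. by making its outputs harder to flag as fake.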
3 Methodology
To define the search strategy, we used Medical Subject Headings (MeSH) terms, a terminological vocabulary for science articles. In our case, the MeSH headings were deep
learning and medicine. A complete list of terms under these headings can be found in
Appendix A. We also added open terms from the medical and computational sciences
fields that were not mutually exclusive. The terms used in medicine were: clinical de-
cision making, image analysis, image processing, medicine, health care and health. The
terms in computational sciences were: machine learning, artificial intelligence, bioin-
formatics, feature learning, feature representation, supervised learning, unsupervised
learning, neural networks, deep neural networks, convolutional neural networks, con-
volution, deep autoencoders, autoencoder, deep belief networks, generative adversarial
networks, recurrent neural networks and LSTM. The titles and abstracts of the papers were searched and extracted up to 15 September 2020 at the latest. The search was conducted in the following search engines: Science Direct, PubMed, Europe PubMed Central, Web of Science (WOS) and EBSCO Discovery.
The following variables were analysed: the data used in the selected publications; the nature of the data used in the study; the use of a replicator or booster (obtaining several datasets from the original dataset by resampling on the sample space) on the initial dataset; the DL models implemented in these publications; the comparison between the model used and the data type; the therapeutic area of the field of medicine in which the research will be applied; the purpose of the model in medicine; whether or not there was validation with external databases; and a description of how the development would be applied to clinical practice.
4 Results
A total of 7640 papers were obtained from the queries made (Scopus 997, ScienceDirect 105, XML from Astrophysics Data System 345, XML from Europe PMC 1489, WOS 501, Medline 1799, Cinahl and PsycArticles 411, Central Citation Export 125, Embase 1868). After eliminating duplicates, 3493 papers remained, which we used to start the first screening by title and abstract. In this phase, 1570 papers were ruled out for not meeting the criterion of “being a deep learning study developed in medicine,” and 1923 were passed on to the second screening. Of the 1923 papers in the second screening, we could not obtain the full texts of 239, so 1684 were analysed, of which 126 studies met the inclusion criteria for review.
Of the articles discarded in the second screening, 647 were excluded because the journal in which they were published was below the first quartile (Q1) in impact ranking at the moment the search was done.
Of the 911 remaining, 516 were ruled out for not meeting these criteria: division of data for training and validation, description of the model to be replicated, and comparison with other baseline models. A further 177 were discarded for using a non-representative sample of the study population, 32 for not specifying the expansion model for the initial dataset, and 186 for belonging to specific scientific areas (121 biotechnology and 65 medical engineering).
At the end of our selection process, the number of papers considered most relevant
was 126. Table 1 summarises the main causes for the exclusion of papers in the second
screening.
Table 1. Reasons for exclusion of articles in the second screening.
The distribution by publication year of the 7640 papers obtained corroborated the rising trend, especially since 2014, as can be seen in Figure 7. After that year, the number of publications is observed to double from one year to the next, except in the last two years. This could be a consequence of stricter evaluation criteria due to the large number of people working on Deep Learning. It is worth mentioning that, in the graph, the number of 2020 publications only runs through September.
Fig. 7. Distribution by publication year of the papers obtained without duplicates (n=3493).
Table 2. Studies included in the review (excerpt).

Therapeutic area | Purpose | Model | Data type | Dataset | Metric
[…]ology | Binary classification of posteroanterior chest x-rays | CNN | Computed tomography (CT) | Three datasets: 224,316, 112,120 and 15,783 | 92%
[…]ology | Automatically evaluate the quality of multicenter structural brain MRI images | CNN | MRI images | 1064 brain images of autism patients and healthy controls; MRI data from 110 multiple sclerosis patients | AUC
Ophthalmology | Image quality in the context of diabetic retinopathy | CNN | Fundus images | 7000 colour fundus images | Accuracy
Otorhinolaryngology | Automated polysomnography scoring | CNN+LSTM | Electroencephalography, electrooculography and electromyography data | 42,560 hours of PSG data from 5213 patients | F1-score
Endocrinology | Automatic diagnosis and severity-classification model for acromegaly | CNN | Facial photographs | 2148 photographs at different severity levels | 90.7
Ophthalmology | Diagnosis of Age-related Macular Degeneration | CNN | AREDS (Age-Related Eye Disease Study) images | 130,000 fundus images | 94.9 / 98.3
Ophthalmology | Predict age and sex from retinal fundus images | CNN | Fundus images | 219,302 images from normal participants without hypertension, diabetes mellitus (DM) or any smoking history | AUC
[…]ology | Abnormality detection in chest radiographs | CNN | Radiographs | 112,120 frontal-view chest radiographs from 30,805 patients and 17,202 frontal-view chest radiographs with a binary normal-vs-abnormal class label | AUC 0.95
[…]ology | Classify white blood cells | CNN | Leukocyte images | 5,000 images from a local hospital | 94.2 / 95.1 / 94.4
Cardiology | Predict stroke patient mortality | MLP | 11 variables | 15,099 stroke patients with primary International Classification of Diseases diagnostic codes | AUC
General | Predict patients' hospital mortality | RNN | Public electronic health record database; fifteen physiological measurements | 32,604 unique ICU admissions | Sensitivity

AMD age-related Macular Degeneration, CAD Computer Aided Diagnosis, CNN Convolutional Neural Network, MRI Magnetic Resonance Images, PE…, CT Computed Tomography, OCT Optical Coherence Tomography, D dimensions, AUC Area Under the Curve, MSE Mean Squared Error, RMSE Root Mean Squared Error, … Coefficient
4.3 Statistics and analysis of the studies included
At the end of the screening process, we had obtained 126 papers. At this point, we
verified the rising trend of journals with deep learning papers for medicine. Figure 8 is
a bar chart showing the distribution of the 126 papers by year of publication, where one
can observe the increasing trend in the number of publications in recent years. From
2016 to 2018, this number more than tripled. This fits with the historical process: although the term Deep Learning was coined by Hinton with his seminal work in 2006 [10], the big milestone is considered to be AlexNet for image recognition in 2012 [19]. The smaller number of papers published in the last two years corroborates what is shown in Fig. 7.
One of the main determinants when deciding which deep learning model to use is the nature of the data it will be working with. Figure 9 is a pie chart depicting the distribution of the data types, showing that the majority of models, 90.5% (114 papers), work with images; only a small percentage, 4.8% (6), work with time series, for example [44, 96, 102], or with structured data, 3.2% (4), like [56, 101, 108]. Only one paper, [95], works with two data types (radiographs along with their medical descriptions), and only one other uses graphs. This information is consistent with data published by the National Institutes of Health (NIH), where cancer funding is in the top positions. Considering that most tests in cancer diagnosis are related to medical imaging, a wide availability of this type of data is to be expected. For example, the number of imaging tests in the US has greatly increased in recent years [110].
It is worth mentioning that 44.8% (60) of the models used boosters to expand the sample size that they started with [45-53, 57, 62, 79, 81, 87-92, 94-98, 102, 107], as shown in Figure 10.
Fig. 10. Distribution of models by whether or not they used boosters to expand the sample size
(n=126).
The data type used is directly related to the type of model developed. As can be seen in the pie chart in Figure 11, the most used network, by a large margin, is the CNN, in 75.4% of the cases (95 papers). This makes sense, as CNNs normally work with images. Then there is a set of papers that use Autoencoders, 15.6% (20). The remaining architectures are mostly used in equal proportions: MLP and RNN, 4.1% (5 times each), and GAN, 0.8% (1). [55, 62, 65, 76] use Autoencoders in the area of neurology, for the particular cases of schizophrenia and autism, or for lung and breast cancer. In the case of MLP, a particular marginal-space Deep Learning model is used for diagnosis, stratification and treatment planning for patients with an implanted aortic valve. Regarding RNNs [44, 53, 109], vital signs are used alongside other patient information. Finally, GAN is only used once. Thus, one can hypothesise that, in the near future, CNN models will be part of diagnosis systems. A front is also open in the study of Autoencoders, mostly used for image segmentation. In the case of RNNs, the difficulty of obtaining this type of medical data is an obstacle to their evolution.
Fig. 11. Distribution of the models by deep learning architecture type (n=126).
To support the conclusion obtained in the last step, Figure 12 shows a bubble diagram where the X-axis represents the model types and the Y-axis the data types. The largest bubble, 69.84% (88 papers), represents the CNN models developed from images. Then, there are 11.89% (15 papers) using Autoencoders with images. The use of CNN with time series, MLP with images, MLP with structured data, RNN with structured data and RNN with time series has two cases each. There is only one paper each where CNN is used with text and images [95], CNN with graphs, MLP with time series, and RNN or GAN with images. Other combinations did not arise in this survey. These results support what was concluded in the previous paragraph. It is also remarkable that additional data sources, such as text, seem to provide extra information to guide the training stages of Deep Learning models. It should also be highlighted that Natural Language Processing is nowadays a hot topic in Deep Learning, showing some of its greatest improvements.
Fig. 12. Representation of the relationship between the data type and the model architecture used
(n=126).
The therapeutic areas in which most papers are published are oncology (32.5%, 41), followed by cardiology (11.9%, 15) and neuropsychiatry (11.1%, 14), Figure 13. Standing out in oncology is computer-aided diagnosis (CAD), which helps physicians classify disease models (histology) and facilitates image diagnosis of tumors, including mammography, computed tomography (CT) and magnetic resonance. The development of Deep Learning for medical imaging can be seen in the wide range of areas where it has been applied: ophthalmology, pneumology or dermatology. There is also notable research on diseases like malaria, a very common infectious disease in developing countries.
As we can see in Table 2, breast cancer screening and diagnosis support is one of the main objectives [60, 65, 66, 77, 81, 82, 85, 88, 90], followed by the development of CAD for lung cancer [76, 87, 94].
In cardiology, the majority of the papers are about diagnosis support using images from different tools like ultrasound [45, 56], magnetic resonance [79] and myocardial or cerebral arterial perfusion [61, 74]. [44] uses electronic health records to predict possible future heart failure onset via a time series.
In neuropsychiatry, the aim of many studies is diagnosis [50, 55, 62, 63, 83, 100], but we can also find studies that predict disease evolution [72, 108] or allow patients to write through their eye movements [96].
Ophthalmology is another therapeutic area where deep learning has produced many models based on ocular fundus images to detect retinal diseases like age-related macular degeneration [48, 63], hemorrhages [64], microaneurysms [49], diabetic retinopathy [59, 77, 101], rhegmatogenous retinal detachment [97] and glaucoma [50].
Other therapeutic areas found in this review are radio-imaging (8.7%), ophthalmology (7.9%) and traumatology (4.8%); for more details see Fig. 13 and Table 2.
As seen, diagnosis-based models were the objective of 82.5% (104) of the studies, while 3.2% (4) centred on monitoring treatment, and the rest, 14.3% (18), have miscellaneous objectives, such as disease classification, robotic surgery and prediction of prognosis, Figure 14. As medicine is a field where wrong decisions could have irreversible consequences, most of the work with DL methods is applied to diagnosis. The controversy is about letting machines decide, so their role nowadays is to aid physicians in making better and faster decisions, and this can be done mainly at the diagnostic stage.
Most of the models covered in this survey have an accuracy of 80% or more, so it can be concluded that the performance of the different models across areas and use cases is quite good. Only particular cases (anesthesia using patients' biosignals, autism detection with MRI, and knee osteoarthritis detection from radiographs) have an accuracy under 80%. Most of the papers used Accuracy and AUC as metrics to measure the performance of the models.
Metrics can be grouped according to some characteristics. Accuracy is the simplest one and counts the correct predictions, unlike error metrics such as MSE or log-loss. In the particular case of medicine, it is very useful to account for false positives and negatives: Sensitivity, Specificity and the F-score (or F-measure). This is related to what is highlighted in [111] about the economic impact and risk of diagnosing a healthy patient as sick. Related to this, there are graphical representations such as ROC curves (summarised by the AUC) and Precision-Recall curves. When using images, the Hammoude distance, position error, mean corner distance error and Dice coefficient are used. Finally, the concordance correlation coefficient measures the agreement between two variables.
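The confusion-matrix metrics mentioned above can all be derived from four counts. A minimal sketch, where the screening numbers are hypothetical values chosen for the example:

```python
def confusion_metrics(tp, fp, tn, fn):
    """Derive the metrics discussed above from the four confusion-matrix
    counts: true/false positives and true/false negatives."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall: sick patients correctly flagged
    specificity = tn / (tn + fp)   # healthy patients correctly cleared
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, f1

# Hypothetical screening result: 80 true positives, 10 false positives,
# 95 true negatives, 15 false negatives.
acc, sens, spec, f1 = confusion_metrics(80, 10, 95, 15)
```

Note how accuracy alone hides the 15 false negatives (the risky case [111] warns about), while sensitivity exposes them directly.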
Only 25.4% (31) of the 126 papers were validated with databases different from the initial dataset, and only two studies (1.6%) detailed their application to clinical practice, Figure 15. The studies that validated their results on databases different from the initial dataset were [51], [57], [74], [82], [86], [92], [94], [98-107]. Only two, [82, 101], describe the application of the model in current clinical practice.
Fig. 15. Representation of the percentage of models validated with databases different from the initial dataset (n=126), and distribution of the articles depending on whether their application to common clinical practice is described (n=126).
5 Conclusions
The purpose of our study was to conduct a comprehensive review of the state of the art in the application of deep learning techniques in the field of medicine. A methodology was defined for selecting a series of papers that could be considered representative. This methodology started with a search in different sources of scientific knowledge, obtaining 7640 initial papers. This number was progressively refined by eliminating duplicates and articles not in this field, as well as by applying other exclusion criteria defined by computer scientists and physicians. At the end of the process, 126 papers were selected, briefly summarised and analysed from a qualitative and quantitative perspective.
The most straightforward conclusion that can be drawn is that deep learning techniques
are widespread in the oncology discipline. Given that here the most used data for diag-
nosis are images, and that convolutional models are directly related to the treatment of
these images, it is logical that most deep learning applications found during the review
use this type of architecture. The next relevant areas are cardiology, ophthalmology,
and neuropsychiatry, where images also play a prominent role in diagnosis.
One of the main limitations of the study was the need to discard papers published in JCR impact quartiles below Q1. This was because the large volume of references to be included did not permit a correct description of all the studies. This is why the objectives of the research team that conducted the study include writing a second paper that would complete this one and determine whether or not the quality of the published studies differs depending on the quartile in which they are published. On the other hand, because this is a review in which various disciplines converge (computer science and all medical specialties), and despite the careful methodological process, there may be published studies to which we did not have access. We also found no information in the publications about the models used by companies such as Google, Intel, Microsoft, Philips and Siemens, probably due to the confidentiality of the data and the patents on the models.
However, it is worth noting that two types of neural applications are significantly absent or underrepresented in the results obtained from this study. The prediction and diagnosis of a patient's medical evolution, mortality risk, or the emergence of diseases through the analysis of discrete/continuous signals (historical vital signs, EEG/ECG data, etc.) have not been widely used in successful scenarios. Preventive medicine focused on the early detection of potentially dangerous situations will use these analysis techniques to produce real-time alarms associated with previously analysed patterns during normal-life situations. NLP (Natural Language Processing) using NMT (Neural Machine Translation) models is also poorly represented in the medical domain, compared with the relevance that the processing of human communications is having within artificial intelligence and applied linguistics: speech-to-text conversion, translation, summarisation, disambiguation, and understanding and generation of Natural Language. It is foreseeable that in the coming years, applications related to human language, whether written or spoken, will colonise the medical domain. The large amount of this type of data that is still unprocessed (medical records among them), and the possibility of using it in combination with other data (numbers and images), will favor the
development of multimodal neural applications and will facilitate medical tasks not
directly related to the diagnosis.
We have also compared our work with other reviews of deep learning in medicine published over the past five years. These documents were obtained from MEDLINE and, after screening, 72 of them were considered. Their conclusions roughly correspond to the areas and applications highlighted in this systematic review. The largest difference found is that none of the publications follow the methodological expectations of the Cochrane reviews. They commonly lacked a definition of inclusion criteria specifying the characteristics that must be detailed in papers describing the implementation of deep learning models in medicine. From the point of view of computer science, it is worth mentioning that data types were not considered, unlike in the present paper.
None of the articles included in our review was conducted in Spain, which may be because current clinical data protection laws make it difficult to implement DL models, as well as because of the lack of a common structure for electronic medical records between different healthcare centers. We should also mention how the data from medical records are recorded and structured: the majority of reports are written in free text, with no encoded data that would permit a suitable extraction of variables, or enough detail to develop deep learning models that could predict the risk or progression of diseases based on patients' characteristics, combining this with sociodemographic population data.
To conclude, a high number of studies published in Q1 journals did not meet the defined quality criteria. Furthermore, the process used to replicate the sample was not always detailed, and we found it quite surprising that the sizes of the initial datasets could be so small, considering that the basis of AI is big data. The lack of information in the papers about the validation of the developed models with external databases, and the absence of descriptions of how the results could be used in routine clinical practice, should be emphasized [112]. It may be necessary to reach a consensus on quality criteria for studies and papers about deep learning in medicine.
Acknowledgments
We would like to thank Jaime Pérez Palomera, Borja García Lamas, Ignacio Moll
and Pedro Chazarra from CEIEC and Marina Diaz Fernández from the Universidad
Francisco de Vitoria Library for their support during the course of this research.
References
1. Djedouboum, A. C., Ari, A., Adamou, A., Gueroui, A. M., Mohamadou, A., & Aliouat, Z.
(2018). Big Data Collection in Large-Scale Wireless Sensor Networks. Sensors, 18(12),
4474.
2. Sánchez-Mendiola, M., Martinez-Franco, A. I., García-Nocett, D. F., & Cervantes-Pérez, F.
(2018). Informática Biomédica: Big Data y Analítica en las Ciencias de la Salud. Third
Edition. Elsevier.
3. Normandeau, K. (2013). Beyond volume, variety and velocity is the issue of big data verac-
ity. Inside big data.
4. Russell, S. J., & Norvig, P. (2016). Artificial intelligence: a modern approach. Malaysia;
Pearson Education Limited.
5. Samuel, A. L. (1988). Some studies in machine learning using the game of checkers. II re-
cent progress. In Computer Games I (pp. 366-400). Springer, New York, NY.
6. Domingos, P. M. (2012). A few useful things to know about machine learning. Communi-
cations of the ACM, 55(10), 78-87.
7. Müller, B., Reinhardt, J., & Strickland, M. T. (2012). Neural networks: an introduction.
Springer Science & Business Media.
8. Hecht-Nielsen, R. (1988). Neurocomputing: picking the human brain. IEEE spec-
trum, 25(3), 36-41.
9. Salman, S., & Liu, X. (2019). Overfitting mechanism and avoidance in deep neural net-
works. arXiv preprint arXiv:1901.06566.
10. LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
11. Litjens, G., Kooi, T., Bejnordi, B. E., Setio, A. A. A., Ciompi, F., Ghafoorian, M., &
Sánchez, C. I. (2017). A survey on deep learning in medical image analysis. Medical image
analysis, 42, 60-88.
12. Pham, T. T. (2018). Applying Machine Learning for Automated Classification of Biomedi-
cal Data in Subject-Independent Settings. Springer.
13. Zhou, S. K., Greenspan, H., & Shen, D. (Eds.). (2017). Deep learning for medical image
analysis. Academic Press.
14. Jiang, F., Jiang, Y., Zhi, H., Dong, Y., Li, H., Ma, S., & Wang, Y. (2017). Artificial intelli-
gence in healthcare: past, present and future. Stroke and vascular neurology, 2(4), 230-243.
15. Miller, D. D., & Brown, E. W. (2018). Artificial intelligence in medical practice: the ques-
tion to the answer? The American journal of medicine, 131(2), 129-133.
16. Jang, H. J., & Cho, K. O. (2019). Applications of deep learning for the analysis of medical
data. Archives of pharmacal research, 1-13.
17. Bakator, M., & Radosav, D. (2018). Deep learning and medical diagnosis: A review of lit-
erature. Multimodal Technologies and Interaction, 2(3), 47.
18. Lundervold, A. S., & Lundervold, A. (2019). An overview of deep learning in medical im-
aging focusing on MRI. Zeitschrift für Medizinische Physik, 29(2), 102-127.
19. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep
convolutional neural networks. In Advances in neural information processing systems (pp.
1097-1105).
20. Arasu, A., & Garcia-Molina, H. (2003). Extracting structured data from web pages. In Pro-
ceedings of the 2003 ACM SIGMOD international conference on Management of data (pp.
337-348). ACM.
21. Velicer, W. F., & Molenaar, P. C. (2012). Time series analysis for psychological re-
search. Handbook of Psychology, Second Edition, 2.
22. LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel,
L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural compu-
tation, 1(4), 541-551.
23. Ji, S., Xu, W., Yang, M., & Yu, K. (2012). 3D convolutional neural networks for human
action recognition. IEEE transactions on pattern analysis and machine intelligence, 35(1),
221-231.
24. Kipf, T. N. & Welling, M. (2016). Semi-Supervised Classification with Graph Convolu-
tional Networks. 5th International Conference on Learning Representations.
25. Elman, J. (1990). Finding Structure in Time. Cognitive Science, 14, 179-211.
26. Saadatnejad, S., Oveisi, M., & Hashemi, M. (2019). LSTM-based ECG classification for
continuous monitoring on personal wearable devices. IEEE journal of biomedical and health
informatics, 24(2), 515-523.
27. León, J., Escobar, J. J., Ortiz, A., Ortega, J., González, J., Martín-Smith, P., & Damas, M.
(2020). Deep learning for EEG-based Motor Imagery classification: Accuracy-cost trade-
off. Plos one, 15(6), e0234178.
28. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computa-
tion, 9(8), 1735-1780.
29. Sathya, R., & Abraham, A. (2013). Comparison of supervised and unsupervised learning
algorithms for pattern classification. International Journal of Advanced Research in Artifi-
cial Intelligence, 2(2), 34-38.
30. Ballard, D. H. (1987, July). Modular Learning in Neural Networks. In AAAI (pp. 279-284).
31. Saravanan, S., & Juliet, S. (2020). Deep Medical Image Reconstruction with Autoencoders
using Deep Boltzmann Machine Training. EAI Endorsed Transactions on Pervasive Health
and Technology, 166360.
32. Gondara, L. (2016, December). Medical image denoising using convolutional denoising au-
toencoders. In 2016 IEEE 16th International Conference on Data Mining Workshops
(ICDMW) (pp. 241-246). IEEE.
33. Pesteie, M., Abolmaesumi, P., & Rohling, R. N. (2019). Adaptive augmentation of medical
data using independently conditional variational auto-encoders. IEEE Transactions on Med-
ical Imaging, 38(12), 2807-2820.
34. Evan, M. Y., Iglesias, J. E., Dalca, A. V., & Sabuncu, M. R. (2020, January). An Auto-
Encoder Strategy for Adaptive Image Segmentation. In Medical Imaging with Deep Learn-
ing.
35. Uzunova, H., Schultz, S., Handels, H., & Ehrhardt, J. (2019). Unsupervised pathology de-
tection in medical images using conditional variational autoencoders. International journal
of computer assisted radiology and surgery, 14(3), 451-461.
36. Chen, M., Shi, X., Zhang, Y., Wu, D., & Guizani, M. (2017). Deep features learning for
medical image analysis with convolutional autoencoder neural network. IEEE Transactions
on Big Data.
37. Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Cour-
ville, A.C., & Bengio, Y. (2014). Generative Adversarial Nets. NIPS.
38. Luo, T. J., Fan, Y., Chen, L., Guo, G., & Zhou, C. (2020). EEG Signal Reconstruction Using
a Generative Adversarial Network With Wasserstein Distance and Temporal-Spatial-Fre-
quency Loss. Frontiers in Neuroinformatics, 14.
39. Gupta, R., Sharma, A., & Kumar, A. (2020). Super-Resolution using GANs for Medical
Imaging. Procedia Computer Science, 173, 28-35.
40. Zhang, C., Song, Y., Liu, S., Lill, S., Wang, C., Tang, Z., ... & Cai, W. (2018, December).
MS-GAN: GAN-based semantic segmentation of multiple sclerosis lesions in brain
magnetic resonance imaging. In 2018 Digital Image Computing: Techniques and Applica-
tions (DICTA) (pp. 1-8). IEEE.
41. Sandfort, V., Yan, K., Pickhardt, P. J., & Summers, R. M. (2019). Data augmentation using
generative adversarial networks (CycleGAN) to improve generalizability in CT segmenta-
tion tasks. Scientific reports, 9(1), 1-9.
42. Suthaharan, S. (2016). Machine learning models and algorithms for big data classifica-
tion. Integr. Ser. Inf. Syst, 36, 1-12.
43. Moher, D., Liberati, A., Tetzlaff, J., & Altman, D. G. (2009). Preferred reporting items for
systematic reviews and meta-analyses: the PRISMA statement. Annals of internal medi-
cine, 151(4), 264-269.
44. Choi, E., Schuetz, A., Stewart, W. F. & Sun, J. (2017). Using recurrent neural network mod-
els for early detection of heart failure onset. JAMIA, 24, 361-370.
45. Carneiro, G., Nascimento, J. C. & Freitas, A. (2012). The Segmentation of the Left Ventricle
of the Heart From Ultrasound Data Using Deep Learning Architectures and Derivative-
Based Search Methods. IEEE Trans. Image Processing, 21, 968-982.
46. Chaudhari, A.S., Fang, Z., Kogan, F., Wood, J., Stevens, K.J., Gibbons, E.K., Lee, J.H.,
Gold, G., & Hargreaves, B.A. (2018). Super-resolution musculoskeletal MRI using deep
learning. Magnetic resonance in medicine, 80 5, 2139-2154.
47. Saltz, J.H., Gupta, R., Hou, L., Kurç, T.M., Singh, P., Nguyen, V.M., Samaras, D., Shroyer,
K.R., Zhao, T., Batiste, R., Arnam, J.S., Shmulevich, I., Rao, A.U., Lazar, A.J., Sharma, A.,
& Thorsson, V. (2018). Spatial Organization and Molecular Correlation of Tumor-Infiltrat-
ing Lymphocytes Using Deep Learning on Pathology Images. Cell reports, 23 1, 181-
193.e7.
48. Apostolopoulos, S., Ciller, C., Zanet, S.D., Wolf, S., & Sznitman, R. (2016). RetiNet: Au-
tomatic AMD identification in OCT volumetric data. CoRR, abs/1610.03628.
49. Lam, C., Yu, C.Y., Huang, L., & Rubin, D.Y. (2018). Retinal Lesion Detection With Deep
Learning Using Image Patches. Investigative ophthalmology & visual science.
50. Choi, H., Ha, S., Im, H.J., Paek, S.H., & Lee, D.S. (2017). Refining diagnosis of Parkinson's
disease with deep learning-based interpretation of dopamine transporter imaging. Neu-
roImage: Clinical.
51. Rajaraman, S., Antani, S.K., Poostchi, M., Silamut, K., Hossain, M.A., Maude, R.J., Jaeger,
S., & Thoma, G.R. (2018). Pre-trained convolutional neural networks as feature extractors
toward improved malaria parasite detection in thin blood smear images. PeerJ.
52. Nielsen, A.B., Hansen, M.B., Tietze, A., & Mouridsen, K. (2018). Prediction of Tissue Out-
come and Assessment of Treatment Effect in Acute Ischemic Stroke Using Deep Learn-
ing. Stroke, 49 6, 1394-1401.
53. Lee, H., Ryu, H., Chung, E., & Jung, C. (2017). Prediction of Bispectral Index during Tar-
get-controlled Infusion of Propofol and Remifentanil: A Deep Learning Approach. Anesthe-
siology, 128 3, 492-501.
54. Ning, Z., Luo, J., Li, Y.J., Han, S., Feng, Q., Xu, Y., Chen, W., Chen, T., & Zhang, Y.
(2018). Pattern Classification for Gastrointestinal Stromal Tumors by Integration of Radi-
omics and Deep Convolutional Features. IEEE Journal of Biomedical and Health Informat-
ics, 23, 1181-1191.
55. Zeng, L., Wang, H., Hu, P., Yang, B., Pu, W., Shen, H., Chen, X., Liu, Z., Yin, H., Tan, Q.,
Wang, K., & Hu, D. (2018). Multi-Site Diagnostic Classification of Schizophrenia Using
Discriminant Deep Learning with Functional Connectivity MRI. EBioMedicine.
56. Ghesu, F.C., Georgescu, B., Zheng, Y., Hornegger, J., & Comaniciu, D. (2015). Marginal
Space Deep Learning: Efficient Architecture for Detection in Volumetric Image
Data. MICCAI.
57. Anthimopoulos, M., Christodoulidis, S., Ebner, L., Christe, A., & Mougiakakou, S.G.
(2016). Lung Pattern Classification for Interstitial Lung Diseases Using a Deep Convolu-
tional Neural Network. IEEE Transactions on Medical Imaging, 35, 1207-1216.
58. Yasaka, K., Akai, H., Kunimatsu, A., Abe, O., & Kiryu, S. (2017). Liver Fibrosis: Deep
Convolutional Neural Network for Staging by Using Gadoxetic Acid-enhanced Hepatobili-
ary Phase MR Images. Radiology, 287 1, 146-155.
59. Leibig, C., Allken, V., Ayhan, M.S., Berens, P., & Wahl, S. (2017). Leveraging uncertainty
information from deep neural networks for disease detection. Scientific Reports.
60. Kooi, T., Litjens, G.J., Ginneken, B.V., Gubern-Mérida, A., Sánchez, C.I., Mann, R.M.,
Heeten, A.D., & Karssemeijer, N. (2017). Large scale deep learning for computer aided de-
tection of mammographic lesions. Medical image analysis, 35, 303-312.
61. Kim, E., Kim, H., Han, K., Kang, B.J., Sohn, Y., Woo, O.H., & Lee, C.W. (2018). Applying
Data-driven Imaging Biomarker in Mammography for Breast Cancer Screening: Prelimi-
nary Study. Scientific Reports.
62. Heinsfeld, A.S., Franco, A.R., Craddock, R.C., Buchweitz, A., & Meneguzzi, F. (2018).
Identification of autism spectrum disorder using deep learning and the ABIDE dataset. Neu-
roImage: Clinical.
63. Suk, H., Lee, S., & Shen, D. (2014). Hierarchical feature representation and multimodal
fusion with deep learning for AD/MCI diagnosis. NeuroImage, 101, 569-582.
64. Grinsven, M.J., Ginneken, B.V., Hoyng, C.B., Theelen, T., & Sánchez, C.I. (2016). Fast
Convolutional Neural Network Training Using Selective Data Sampling: Application to
Haemorrhage Detection in Color Fundus Images. IEEE Transactions on Medical Imaging,
35, 1273-1284.
65. Wang, J., Yang, X., Cai, H., Tan, W., Jin, C., & Li, L. (2016). Discrimination of Breast
Cancer with Microcalcifications on Mammography by Deep Learning. Scientific reports.
66. Kooi, T., Ginneken, B.V., Karssemeijer, N., & Heeten, A.D. (2017). Discriminating solitary
cysts from soft tissue lesions in mammography using a pretrained deep convolutional neural
network. Medical physics, 44 3, 1017-1027.
67. Fu, H., Cheng, J., Xu, Y., Zhang, C., Wong, D.W., Liu, J., & Cao, X. (2018). Disc-Aware
Ensemble Network for Glaucoma Screening From Fundus Image. IEEE Transactions on
Medical Imaging, 37, 2493-2501.
68. Yu, C., Yang, S., Kim, W., Jung, J., Chung, K., Lee, S.W., & Oh, B. (2018). Acral melanoma
detection using a convolutional neural network for dermoscopy images. PloS one.
69. Wang, J., Ding, H., Bidgoli, F.A., Zhou, B., Iribarren, C., Molloi, S.Y., & Baldi, P. (2017).
Detecting Cardiovascular Disease from Mammograms With Deep Learning. IEEE Transac-
tions on Medical Imaging, 36, 1172-1181.
70. Iakovidis, D.K., Georgakopoulos, S.V., Vasilakakis, M., Koulaouzidis, A., & Plagianakos,
V.P. (2018). Detecting and Locating Gastrointestinal Anomalies Using Deep Learning and
Iterative Cluster Unification. IEEE Transactions on Medical Imaging, 37, 2196-2210.
71. Han, S.S., Park, G.H., Lim, W., Kim, M.S., Na, J.I., Park, I., & Chang, S.E. (2018). Deep
neural networks show an equivalent and often superior performance to dermatologists in
onychomycosis diagnosis: Automatic construction of onychomycosis datasets by region-
based convolutional deep neural network. PloS one.
72. Burgh, H.K., Schmidt, R., Westeneng, H.J., Reus, M.A., Berg, L.H., & Heuvel, M.P. (2017).
Deep learning predictions of survival based on MRI in amyotrophic lateral sclerosis. Neu-
roImage: Clinical.
73. Lee, C.S., Baughman, D.M., & Lee, A.Y. (2016). Deep learning is effective for the classifi-
cation of OCT images of normal versus Age-related Macular Degeneration. Ophthalmology.
Retina, 1 4, 322-327.
74. Betancur, J., Commandeur, F., Motlagh, M.E., Sharir, T., Einstein, A.J., Bokhari, S.I., Fish,
M., Ruddy, T.D., Kaufmann, P.A., Sinusas, A.J., Miller, E.J., Bateman, T.M., Dorbala, S.,
Carli, M.F., Germano, G., Otaki, Y., Tamarappoo, B., Dey, D., Berman, D.S., & Slomka,
P.J. (2018). Deep Learning for Prediction of Obstructive Disease From Fast Myocardial
Perfusion SPECT: A Multicenter Study. JACC. Cardiovascular imaging, 11 11, 1654-1663.
75. Quellec, G., Charrière, K., Boudi, Y., Cochener, B. & Lamard, M. (2017). Deep image min-
ing for diabetic retinopathy screening. Medical Image Analysis, 39, 178-193.
76. Cheng, J., Ni, D., Chou, Y., Qin, J., Tiu, C.P., Chang, Y., Huang, C., Shen, D., & Chen, C.
(2016). Computer-Aided Diagnosis with Deep Learning Architecture: Applications to
Breast Lesions in US Images and Pulmonary Nodules in CT Scans. Scientific reports.
77. Song, Y., Zhang, Y., Yan, X., Liu, H., Zhou, M., Hu, B., & Yang, G. (2018). Computer-
aided diagnosis of prostate cancer using a deep convolutional neural network from multipar-
ametric MRI. Journal of magnetic resonance imaging: JMRI, 48 6, 1570-1577.
78. Li, S., Wei, J., Chan, H., Helvie, M.A., Roubidoux, M.A., Lu, Y., Zhou, C., Hadjiiski, L.M.,
& Samala, R.K. (2018). Computer-aided assessment of breast density: comparison of super-
vised deep learning and feature-based statistical learning. Physics in medicine and biology,
63 2, 025005.
79. Ngo, T.A., Lu, Z., & Carneiro, G. (2017). Combining deep learning and level set for the
automated segmentation of the left ventricle of the heart from cardiac cine magnetic reso-
nance. Medical image analysis, 35, 159-171.
80. Zhang, J., Xia, Y., Wu, Q., & Xie, Y. (2017). Classification of Medical Images and Illustra-
tions in the Biomedical Literature Using Synergic Deep Learning. ArXiv, abs/1706.09092.
81. Araújo, T., Aresta, G., Castro, E.V., Rouco, J., Aguiar, P.D., Eloy, C., Polónia, A., & Cam-
pilho, A. (2017). Classification of breast cancer histology images using Convolutional Neu-
ral Networks. PloS one.
82. Han, Z., Wei, B., Zheng, Y., Yin, Y., Li, K., & Li, S. (2017). Breast Cancer Multi-classifi-
cation from Histopathological Images with Structured Deep Learning Model. Scientific Re-
ports.
83. Zhao, Y., Dong, Q., Zhang, S., Zhang, W., Chen, H., Jiang, X., Guo, L., Hu, X., Han, J., &
Liu, T. (2018). Automatic Recognition of fMRI-Derived Functional Networks Using 3-D
Convolutional Neural Networks. IEEE Transactions on Biomedical Engineering, 65, 1975-
1984.
84. Tiulpin, A., Thevenot, J., Rahtu, E., Lehenkari, P., & Saarakkala, S. (2018). Automatic Knee
Osteoarthritis Diagnosis from Plain Radiographs: A Deep Learning-Based Approach. Sci-
entific Reports.
85. Lee, J., & Nishikawa, R.M. (2018). Automated mammographic breast density estimation
using a fully convolutional network. Medical physics, 45 3, 1178-1190.
86. Esses, S.J., Lu, X., Zhao, T., Shanbhogue, K.P., Dane, B., Bruno, M., & Chandarana, H.
(2018). Automated image quality evaluation of T2 -weighted liver MRI utilizing deep learn-
ing architecture. Journal of magnetic resonance imaging: JMRI, 47 3, 723-728.
87. Serj, M.F., Lavi, B., Hoff, G., & Valls, D.P. (2018). A Deep Convolutional Neural Network
for Lung Cancer Diagnostic. CoRR, abs/1804.08170.
88. Yap, M.H., Pons, G., Martí, J., Ganau, S., Sentís, M., Zwiggelaar, R., Davison, A.K., &
Martí, R. (2018). Automated Breast Ultrasound Lesions Detection Using Convolutional
Neural Networks. IEEE Journal of Biomedical and Health Informatics, 22, 1218-1226.
89. Du, X., Kurmann, T., Chang, P., Allan, M., Ourselin, S., Sznitman, R., Kelly, J.D., &
Stoyanov, D. (2018). Articulated Multi-Instrument 2-D Pose Estimation Using Fully Con-
volutional Networks. IEEE Transactions on Medical Imaging.
90. Kim, K.H., Choi, S.H., & Park, S. (2017). Improving Arterial Spin Labeling by Using Deep
Learning. Radiology, 287 2, 658-666.
91. Song, Y., Zhang, L., Chen, S., Ni, D., Lei, B.Y., & Wang, T. (2015). Accurate Segmentation
of Cervical Cytoplasm and Nuclei Based on Multiscale Convolutional Network and Graph
Partitioning. IEEE Transactions on Biomedical Engineering, 62, 2421-2433.
92. Zhang, Z., Liang, X., Dong, X., Xie, Y., & Cao, G. (2018). A Sparse-View CT Reconstruc-
tion Method Based on Combination of DenseNet and Deconvolution. IEEE Transactions on
Medical Imaging, 37, 1407-1417.
93. Xue, Y., Zhang, R., Deng, Y., Chen, K., & Jiang, T. (2017). A preliminary examination of
the diagnostic value of deep learning in hip osteoarthritis. PloS one.
94. Khosravan, N., Celik, H., Turkbey, B., Jones, E., Wood, B.J., & Bagci, U. (2019). A collab-
orative computer aided diagnosis (C-CAD) system with eye-tracking, sparse attentional
model, and deep learning. Medical image analysis, 51, 101-115.
95. Shin, H. C., Lu, L., Kim, L., Seff, A., Yao, J., & Summers, R. M. (2016). Interleaved
Text/Image Deep Mining on a Large-Scale Radiology Database for Automated Image Inter-
pretation. Journal of Machine Learning Research, 17, 1-31.
96. Liu, M., Wu, W., Gu, Z., Yu, Z., Qi, F. & Li, Y. (2018). Deep learning based on Batch
Normalization for P300 signal detection. Neurocomputing, 275, 288-297.
97. Ohsugi, H., Tabuchi, H., Enno, H., & Ishitobi, N. (2017). Accuracy of deep learning, a
machine learning technology, using ultra-wide-field fundus ophthalmoscopy for detecting
rhegmatogenous retinal detachment. Scientific Reports, 7, 9425.
98. Chen, C. M., Huang, Y. S., Fang, P. W., Liang, C. W., & Chang, R. F. (2020). A computer‐
aided diagnosis system for differentiation and delineation of malignant regions on whole‐
slide prostate histopathology image using spatial statistics and multidimensional Dense-
Net. Medical Physics, 47(3), 1021-1033
99. Jang, R., Kim, N., Jang, M., Lee, K. H., Lee, S. M., Lee, K. H., ... & Seo, J. B. (2020).
Assessment of the Robustness of Convolutional Neural Networks in Labeling Noise by Us-
ing Chest X-Ray Images From Multiple Centers. JMIR Medical Informatics, 8(8), e18089.
100. Sujit, S. J., Coronado, I., Kamali, A., Narayana, P. A., & Gabr, R. E. (2019). Automated
image quality evaluation of structural brain MRI using an ensemble of deep learning net-
works. Journal of Magnetic Resonance Imaging, 50(4), 1260-1267.
101. Saha, S. K., Fernando, B., Cuadros, J., Xiao, D., & Kanagasingam, Y. (2018). Automated
quality assessment of colour fundus images for diabetic retinopathy screening in telemedi-
cine. Journal of digital imaging, 31(6), 869-878.
102. Zhang, L., Fabbri, D., Upender, R., & Kent, D. (2019). Automated sleep stage scoring of the
Sleep Heart Health Study using deep neural networks. Sleep, 42(11), zsz159.
103. Kong, Y., Kong, X., He, C., Liu, C., Wang, L., Su, L., ... & Cheng, R. (2020). Constructing
an automatic diagnosis and severity-classification model for acromegaly using facial photo-
graphs by deep learning. Journal of hematology & oncology, 13(1), 1-4.
104. Das, A., Rad, P., Choo, K. K. R., Nouhi, B., Lish, J., & Martel, J. (2019). Distributed ma-
chine learning cloud teleophthalmology IoT for predicting AMD disease progression. Fu-
ture Generation Computer Systems, 93, 486-498
105. Kim, Y. D., Noh, K. J., Byun, S. J., Lee, S., Kim, T., Sunwoo, L., & Park, S. J. (2020).
Effects of Hypertension, Diabetes, and Smoking on Age and Sex Prediction from Retinal
Fundus Images. Scientific reports, 10(1), 1-14.
106. Pan, I., Agarwal, S., & Merck, D. (2019). Generalizable inter-institutional classification of
abnormal chest radiographs using efficient convolutional neural networks. Journal of digital
imaging, 32(5), 888-896
107. Zhang, C., Wu, S., Lu, Z., Shen, Y., Wang, J., Huang, P., ... & Xue, J. (2020). Hybrid Ad-
versarial‐Discriminative Network for Leukocyte Classification in Leukemia. Medical Phy-
sics.
108. Cheon, S., Kim, J., & Lim, J. (2019). The use of deep learning to predict stroke patient
mortality. International journal of environmental research and public health, 16(11), 1876.
109. Yu, R., Zheng, Y., Zhang, R., Jiang, Y., & Poon, C. C. (2019). Using a multi-task recurrent
neural network with attention mechanisms to predict hospital mortality of patients. IEEE
Journal of Biomedical and Health Informatics, 24(2), 486-492.
110. Smith-Bindman, R., Kwan, M. L., Marlow, E. C., Theis, M. K., Bolch, W., Cheng, S. Y., &
Pole, J. D. (2019). Trends in use of medical imaging in US health care systems and in On-
tario, Canada, 2000-2016. Jama, 322(9), 843-856.
111. Lee, Y., Kwon, J. M., Lee, Y., Park, H., Cho, H., & Park, J. (2018). Deep learning in the
medical domain: predicting cardiac arrest using deep learning. Acute and critical
care, 33(3), 117.
112. Ravì, D., Wong, C., Deligianni, F., Berthelot, M., Andreu-Perez, J., Lo, B., & Yang, G. Z.
(2016). Deep learning for health informatics. IEEE journal of biomedical and health infor-
matics, 21(1), 4-21.
MeSH “Deep Learning” includes the following terms: Learning, Deep/ Hierarchical
Learning/ Learning, Hierarchical.
MeSH “Medicine” includes the following terms: Specialties, Medical/ Medical
Specialties/ Specialty, Medical/ Addiction Medicine/ Adoles-
cent Medicine/ Aerospace Medicine/ Allergy and Immunology/ Anesthesiology/ Bari-
atric Medicine/ Behavioral Medicine/ Clinical Medicine/ Evidence-Based Medicine/
Precision Medicine/ Community Medicine/ Dermatology/ Disaster Medicine/ Emer-
gency Medicine/ Pediatric Emergency Medicine/ Forensic Medicine/ General Practice/
Family Practice/ Genetics, Medical/ Geography, Medical/ Geriatrics/ Global Health/
Hospital Medicine/ Integrative Medicine/ Internal Medicine (Cardiology, Endocrinol-
ogy, Gastroenterology, Hematology, Infectious Disease Medicine, Medical Oncology,
Nephrology, Pulmonary Medicine, Rheumatology, Sleep Medicine Specialty)/ Mili-
tary Medicine/ Molecular Medicine/ Naval Medicine/ Neurology/ Neuropathology/
Neurotology/ Osteopathic Medicine/ Palliative Medicine/ Pathology (Forensic Pathol-
ogy, Neuropathology, Pathology, Clinical, Pathology, Molecular, Pathology, Surgical,
Telepathology)/ Pediatrics (Neonatology, Pediatric Emergency Medicine, Perinatol-
ogy, Perioperative Medicine)/ Physical and Rehabilitation Medicine/ Rehabilitation/
Psychiatry (Adolescent Psychiatry, Biological Psychiatry, Child Psychiatry, Commu-
nity Psychiatry, Forensic Psychiatry, Geriatric Psychiatry, Military Psychiatry, Neuro-
psychiatry)/ Public Health (Epidemiology, Preventive Medicine)/ Radiology (Nu-
clear Medicine, Radiation Oncology, Radiology, Interventional)/ Regenerative Medi-
cine/ Reproductive Medicine (Andrology, Gynecology)/ Social Medicine/ Specialties,
Surgical (Colorectal Surgery, General Surgery, Gynecology, Neurosurgery, Obstetrics,
Ophthalmology, Orthognathic Surgery, Orthopedics, Otolaryngology, Surgery, Plastic,
Surgical Oncology, Thoracic Surgery)/ Traumatology/ Urology/ Sports Medicine/ Tel-
emedicine/ Theranostic Nanomedicine/ Travel Medicine/ Tropical Medicine/ Vacci-
nology/ Venereology/ Wilderness Medicine.
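As a minimal sketch of how these two MeSH vocabularies could be combined into a boolean MEDLINE/PubMed search string, the snippet below builds the query programmatically. The term lists are illustrative subsets of the full lists above, and `[MeSH Terms]` is PubMed's field tag for MeSH headings; the exact query used in this review may differ.

```python
# Illustrative subsets of the MeSH headings listed above (not the full lists).
DEEP_LEARNING_TERMS = ["Deep Learning", "Hierarchical Learning"]
MEDICINE_TERMS = ["Medicine", "Specialties, Medical", "Clinical Medicine"]

def mesh_clause(terms):
    """OR together a list of MeSH headings, tagging each with PubMed's
    [MeSH Terms] search field."""
    return "(" + " OR ".join(f'"{t}"[MeSH Terms]' for t in terms) + ")"

# Articles must match at least one heading from EACH vocabulary.
query = mesh_clause(DEEP_LEARNING_TERMS) + " AND " + mesh_clause(MEDICINE_TERMS)
print(query)
```

The resulting string can be pasted directly into the PubMed search box or passed to the NCBI E-utilities `esearch` endpoint.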
[1] Zhao, W., Yang, J., Sun, Y., Li, C., Wu, W., Jin, L., ... & Hua, Y. (2018). 3D deep learning
from CT scans predicts tumor invasiveness of subcentimeter pulmonary adenocarcinomas. Can-
cer research, 78(24), 6881-6889.
[2] Zhao, X., Wang, X., Xia, W., Li, Q., Zhou, L., Li, Q., ... & Wang, W. (2020). A cross-modal
3D deep learning for accurate lymph node metastasis prediction in clinical stage T1 lung adeno-
carcinoma. Lung Cancer.
[3] Shen, Y., Li, X., Liang, X., Xu, H., Li, C., Yu, Y., & Qiu, B. (2020). A deep‐learning‐based
approach for adenoid hypertrophy diagnosis. Medical Physics.
[4] Hoseini, F., Shahbahrami, A., & Bayat, P. (2018). An efficient implementation of deep con-
volutional neural networks for MRI segmentation. Journal of digital imaging, 31(5), 738-747.
[5] Pranav, R., Park, A., Irvin, J., Chute, C., Bereket, M., Domenico, M., ... & Patel, B. N. (2020).
AppendiXNet: Deep Learning for Diagnosis of Appendicitis from A Small Dataset of CT Exams
Using Video Pretraining. Scientific Reports (Nature Publisher Group), 10(1).
[6] Yıldırım, Ö., Pławiak, P., Tan, R. S., & Acharya, U. R. (2018). Arrhythmia detection using
deep convolutional neural network with long duration ECG signals. Computers in biology and
medicine, 102, 411-420.
[7] Schwyzer, M., Martini, K., Benz, D. C., Burger, I. A., Ferraro, D. A., Kudura, K., & Messerli,
M. (2020). Artificial intelligence for detecting small FDG-positive lung nodules in digital
PET/CT: impact of image reconstructions on diagnostic performance. European Radiol-
ogy, 30(4), 2031-2040.
[8] Vakanski, A., Xian, M., & Freer, P. E. (2020). Attention-Enriched Deep Learning Model for
Breast Tumor Segmentation in Ultrasound Images. Ultrasound in Medicine & Biology, 46(10),
2819-2833.
[10] Candemir, S., White, R. D., Demirer, M., Gupta, V., Bigelow, M. T., Prevedello, L. M., &
Erdal, B. S. (2020). Automated coronary artery atherosclerosis detection and weakly supervised
localization on coronary CT angiography with a deep 3-dimensional convolutional neural net-
work. Computerized Medical Imaging and Graphics, 101721.
[11] Stoean, C., Stoean, R., Atencia, M., Abdar, M., Velázquez-Pérez, L., Khosravi, A., & Joya,
G. (2020). Automated Detection of Presymptomatic Conditions in Spinocerebellar Ataxia Type
2 Using Monte Carlo Dropout and Deep Neural Network Techniques with
[12] van den Heuvel, T. L., Petros, H., Santini, S., de Korte, C. L., & van Ginneken, B. (2019).
Automated fetal head detection and circumference estimation from free-hand ultrasound sweeps
using deep learning in resource-limited countries. Ultrasound in medicine & biology, 45(3), 773-
785.
[13] Zhuge, Y., Ning, H., Mathen, P., Cheng, J. Y., Krauze, A. V., Camphausen, K., & Miller,
R. W. (2020). Automated glioma grading on conventional MRI images using deep convolutional
neural networks. Medical Physics.
[14] Ma, X., Wei, J., Zhou, C., Helvie, M. A., Chan, H. P., Hadjiiski, L. M., & Lu, Y. (2019).
Automated pectoral muscle identification on MLO‐view mammograms: Comparison of deep
neural network to conventional computer vision. Medical physics, 46(5), 2103-2114.
[15] Zabihollahy, F., Schieda, N., Krishna Jeyaraj, S., & Ukwatta, E. (2019). Automated seg-
mentation of prostate zonal anatomy on T2‐weighted (T2W) and apparent diffusion coefficient
(ADC) map MR images using U‐Nets. Medical physics, 46(7), 3078-3090.
[16] Ferreira, P. F., Martin, R. R., Scott, A. D., Khalique, Z., Yang, G., Nielles‐Vallespin, S., ...
& Firmin, D. N. (2020). Automating in vivo cardiac diffusion tensor postprocessing with deep
learning–based segmentation. Magnetic Resonance in Medicine.
[17] Zhong, T., Huang, X., Tang, F., Liang, S., Deng, X., & Zhang, Y. (2019). Boosting‐based
cascaded convolutional neural networks for the segmentation of CT organs‐at‐risk in nasopha-
ryngeal carcinoma. Medical physics, 46(12), 5602-5611.
[18] Byra, M., Galperin, M., Ojeda‐Fournier, H., Olson, L., O'Boyle, M., Comstock, C., & Andre,
M. (2019). Breast mass classification in sonography with transfer learning using a deep convo-
lutional neural network and color conversion. Medical physics, 46(2), 746-755.
[19] Agrawal, V., Udupa, J., Tong, Y., & Torigian, D. (2020). BRR‐Net: A tandem architectural
CNN–RNN for automatic body region localization in CT images. Medical Physics.
[20] Bi, X., Li, S., Xiao, B., Li, Y., Wang, G., & Ma, X. (2020). Computer aided Alzheimer's disease diagnosis by an unsupervised deep learning technology. Neurocomputing, 392, 296-304.
[21] Song, Y., Zhang, Y. D., Yan, X., Liu, H., Zhou, M., Hu, B., & Yang, G. (2018). Computer-aided diagnosis of prostate cancer using a deep convolutional neural network from multiparametric MRI. Journal of Magnetic Resonance Imaging, 48(6), 1570-1577.
[22] Stember, J. N., Chang, P., Stember, D. M., Liu, M., Grinband, J., Filippi, C. G., & Jambawalikar, S. (2019). Convolutional neural networks for the detection and measurement of cerebral aneurysms on magnetic resonance angiography. Journal of Digital Imaging, 32(5), 808-815.
[23] Ko, H., Chung, H., Kang, W. S., Kim, K. W., Shin, Y., Kang, S. J., ... & Lee, J. (2020). COVID-19 pneumonia diagnosis using a simple 2D deep learning framework with a single chest CT image: model development and validation. Journal of Medical Internet Research, 22(6), e19569.
[24] Kulkarni, P. M., Robinson, E. J., Pradhan, J. S., Gartrell-Corrado, R. D., Rohr, B. R., Trager, M. H., & Rizk, E. M. (2020). Deep learning based on standard H&E images of primary melanoma tumors identifies patients at risk for visceral recurrence and death. Clinical Cancer Research, 26(5), 1126-1134.
[25] Bychkov, D., Linder, N., Turkki, R., Nordling, S., Kovanen, P. E., Verrill, C., ... & Lundin, J. (2018). Deep learning based tissue analysis predicts outcome in colorectal cancer. Scientific Reports, 8(1), 1-11.
[26] Antico, M., Sasazawa, F., Dunnhofer, M., Camps, S. M., Jaiprakash, A. T., Pandey, A. K., & Fontanarosa, D. (2020). Deep learning-based femoral cartilage automatic segmentation in ultrasound imaging for guidance in robotic knee arthroscopy. Ultrasound in Medicine & Biology, 46(2), 422-435.
[27] Scannell, C. M., Veta, M., Villa, A. D., Sammut, E. C., Lee, J., Breeuwer, M., & Chiribiri, A. (2020). Deep-learning-based preprocessing for quantitative myocardial perfusion MRI. Journal of Magnetic Resonance Imaging, 51(6), 1689-1696.
[28] Yuan, Y., Qin, W., Ibragimov, B., Zhang, G., Han, B., Meng, M. Q. H., & Xing, L. (2019). Densely connected neural network with unbalanced discriminant and category sensitive constraints for polyp recognition. IEEE Transactions on Automation Science and Engineering, 17(2), 574-583.
[29] Papandrianos, N., Papageorgiou, E., Anagnostis, A., & Papageorgiou, K. (2020). Efficient bone metastasis diagnosis in bone scintigraphy using a fast convolutional neural network architecture. Diagnostics, 10(8), 532.
[30] Park, H., Lee, H. J., Kim, H. G., Ro, Y. M., Shin, D., Lee, S. R., ... & Kong, M. (2019). Endometrium segmentation on transvaginal ultrasound image using key-point discriminator. Medical Physics, 46(9), 3974-3984.
[31] Estrada, S., Lu, R., Conjeti, S., Orozco-Ruiz, X., Panos-Willuhn, J., Breteler, M. M., & Reuter, M. (2020). FatSegNet: A fully automated deep learning pipeline for adipose tissue segmentation on abdominal Dixon MRI. Magnetic Resonance in Medicine, 83(4), 1471-1483.
[32] Wang, J., Chen, X., Lu, H., Zhang, L., Pan, J., Bao, Y., ... & Qian, D. (2020). Feature-shared adaptive-boost deep learning for invasiveness classification of pulmonary subsolid nodules in CT images. Medical Physics, 47(4), 1738-1749.
[33] Berhane, H., Scott, M., Elbaz, M., Jarvis, K., McCarthy, P., Carr, J., ... & Rigsby, C. K. (2020). Fully automated 3D aortic segmentation of 4D flow MRI for hemodynamic analysis using deep learning. Magnetic Resonance in Medicine.
[34] Ornek, A. H., Ceylan, M., & Ervural, S. (2019). Health status detection of neonates using infrared thermography and deep convolutional neural networks. Infrared Physics & Technology, 103, 103044.
[35] Porter, E., Fuentes, P., Siddiqui, Z., Thompson, A., Levitin, R., Solis, D., ... & Guerrero, T. (2020). Hippocampus segmentation on non-contrast CT using deep learning. Medical Physics.
[36] Kudva, V., Prasad, K., & Guruvare, S. (2019). Hybrid transfer learning for classification of uterine cervix images for cervical cancer screening. Journal of Digital Imaging, 1-13.
[37] Meisel, C., & Bailey, K. A. (2019). Identifying signal-dependent information about the preictal state: A comparison across ECoG, EEG and EKG using deep learning. EBioMedicine, 45, 422-431.
[38] Verburg, E., Wolterink, J. M., de Waard, S. N., Išgum, I., van Gils, C. H., Veldhuis, W. B., & Gilhuijs, K. G. (2019). Knowledge-based and deep learning-based automated chest wall segmentation in magnetic resonance images of extremely dense breasts. Medical Physics, 46(10), 4405-4416.
[39] Hussein, S., Kandel, P., Bolan, C. W., Wallace, M. B., & Bagci, U. (2019). Lung and pancreatic tumor characterization in the deep learning era: novel supervised and unsupervised learning approaches. IEEE Transactions on Medical Imaging, 38(8), 1777-1787.
[40] Wang, Q., Shen, F., Shen, L., Huang, J., & Sheng, W. (2019). Lung nodule detection in CT images using a raw patch-based convolutional neural network. Journal of Digital Imaging, 32(6), 971-979.
[41] Park, B., Park, H., Lee, S. M., Seo, J. B., & Kim, N. (2019). Lung segmentation on HRCT and volumetric CT for diffuse interstitial lung disease using deep convolutional neural networks. Journal of Digital Imaging, 32(6), 1019-1026.
[42] Mutasa, S., Chang, P. D., Ruzal-Shapiro, C., & Ayyala, R. (2018). MABAL: a novel deep-learning architecture for machine-assisted bone age labeling. Journal of Digital Imaging, 31(4), 513-519.
[43] Apiparakoon, T., Rakratchatakul, N., Chantadisai, M., Vutrapongwatana, U., Kingpetch, K., Sirisalipoch, S., ... & Chuangsuwanich, E. (2020). MaligNet: Semisupervised learning for bone lesion instance segmentation using bone scintigraphy. IEEE Access, 8, 27047-27066.
[44] Nie, D., Lu, J., Zhang, H., Adeli, E., Wang, J., Yu, Z., ... & Shen, D. (2019). Multi-channel 3D deep feature learning for survival time prediction of brain tumor patients using multi-modal neuroimages. Scientific Reports, 9(1), 1-14.
[45] Kats, L., Vered, M., Blumer, S., & Kats, E. (2020). Neural network detection and segmentation of mental foramen in panoramic imaging. Journal of Clinical Pediatric Dentistry, 44(3), 168-173.
[46] Lai, Y. H., Chen, W. N., Hsu, T. C., Lin, C., Tsao, Y., & Wu, S. (2020). Overall survival prediction of non-small cell lung cancer by integrating microarray and clinical data with deep learning. Scientific Reports, 10(1).
[47] Fu, Y., Lei, Y., Wang, T., Tian, S., Patel, P., Jani, A. B., ... & Yang, X. (2020). Pelvic multi-organ segmentation on cone-beam CT for prostate adaptive radiotherapy. Medical Physics.
[48] Nobashi, T., Zacharias, C., Ellis, J. K., Ferri, V., Koran, M. E., Franc, B. L., ... & Davidzon, G. A. (2020). Performance comparison of individual and ensemble CNN models for the classification of brain 18F-FDG-PET scans. Journal of Digital Imaging, 33(2), 447-455.
[49] Dreizin, D., Zhou, Y., Zhang, Y., Tirada, N., & Yuille, A. L. (2020). Performance of a deep learning algorithm for automated segmentation and quantification of traumatic pelvic hematomas on CT. Journal of Digital Imaging, 33(1), 243-251.
[50] Yeh, H. Y., Chao, C. T., Lai, Y. P., & Chen, H. W. (2020). Predicting the associations between meridians and Chinese traditional medicine using a cost-sensitive graph convolutional neural network. International Journal of Environmental Research and Public Health, 17(3), 740.
[51] Yuan, Y., Qin, W., Buyyounouski, M., Ibragimov, B., Hancock, S., Han, B., & Xing, L. (2019). Prostate cancer classification with multiparametric MRI transfer learning model. Medical Physics, 46(2), 756-765.
[52] Chibuta, S., & Acar, A. C. (2020). Real-time malaria parasite screening in thick blood smears for low-resource setting. Journal of Digital Imaging, 1-13.
[53] Østvik, A., Smistad, E., Aase, S. A., Haugen, B. O., & Lovstakken, L. (2019). Real-time standard view classification in transthoracic echocardiography using convolutional neural networks. Ultrasound in Medicine & Biology, 45(2), 374-384.
[54] Zeiser, F. A., da Costa, C. A., Zonta, T., Marques, N. M., Roehe, A. V., Moreno, M., & da Rosa Righi, R. (2020). Segmentation of masses on mammograms using data augmentation and deep learning. Journal of Digital Imaging, 1-11.
[55] Cheon, S., Kim, J., & Lim, J. (2019). The use of deep learning to predict stroke patient mortality. International Journal of Environmental Research and Public Health, 16(11), 1876.
[56] Barragán-Montero, A. M., Nguyen, D., Lu, W., Lin, M. H., Norouzi-Kandalan, R., Geets, X., ... & Jiang, S. (2019). Three-dimensional dose prediction for lung IMRT patients with deep neural networks: robust learning from heterogeneous beam configurations. Medical Physics, 46(8), 3679-3691.
[57] Zhou, C., Fan, H., & Li, Z. (2019). TongueNet: Accurate localization and segmentation for tongue images using deep neural networks. IEEE Access, 7, 148779-148789.
[58] Yu, R., Zheng, Y., Zhang, R., Jiang, Y., & Poon, C. C. (2019). Using a multi-task recurrent neural network with attention mechanisms to predict hospital mortality of patients. IEEE Journal of Biomedical and Health Informatics, 24(2), 486-492.
[59] Bohara, G., Sadeghnejad Barkousaraie, A., Jiang, S., & Nguyen, D. (2020). Using deep learning to predict beam-tunable Pareto optimal dose distribution for intensity-modulated radiation therapy. Medical Physics.
[60] Zhou, J., Luo, L. Y., Dou, Q., Chen, H., Chen, C., Li, G. J., ... & Heng, P. A. (2019). Weakly supervised 3D deep learning for breast cancer classification and localization of the lesions in MR images. Journal of Magnetic Resonance Imaging, 50(4), 1144-1151.
[61] Narayana, P. A., Coronado, I., Sujit, S. J., Wolinsky, J. S., Lublin, F. D., & Gabr, R. E. (2020). Deep-learning-based neural tissue segmentation of MRI in multiple sclerosis: Effect of training set size. Journal of Magnetic Resonance Imaging, 51(5), 1487-1496.