Review
Analysis of Colorectal and Gastric Cancer Classification:
A Mathematical Insight Utilizing Traditional Machine
Learning Classifiers
Hari Mohan Rai * and Joon Yoo *
Abstract: Cancer remains a formidable global health challenge, claiming millions of lives annually.
Timely and accurate cancer diagnosis is imperative. While numerous reviews have explored cancer
classification using machine learning and deep learning techniques, scant literature focuses on tra-
ditional ML methods. In this manuscript, we undertake a comprehensive review of colorectal and
gastric cancer detection specifically employing traditional ML classifiers. This review emphasizes the
mathematical underpinnings of cancer detection, encompassing preprocessing techniques, feature ex-
traction, machine learning classifiers, and performance assessment metrics. We provide mathematical
formulations for these key components. Our analysis is limited to peer-reviewed articles published
between 2017 and 2023, exclusively considering medical imaging datasets. Benchmark and publicly
available imaging datasets for colorectal and gastric cancers are presented. This review synthesizes
findings from 20 articles on colorectal cancer and 16 on gastric cancer, culminating in a total of
36 research articles. A significant focus is placed on mathematical formulations for commonly used
preprocessing techniques, features, ML classifiers, and assessment metrics. Crucially, we introduce
our optimized methodology for the detection of both colorectal and gastric cancers. Our performance
metrics analysis reveals a wide range of results: the best-reported accuracy reaches 100% for both
cancer types, while the lowest reported sensitivity is 43.1%, observed for gastric cancer.
Keywords: traditional machine learning; cancer detection; colorectal cancer; gastric cancer; mathematical formulation; preprocessing; feature extraction

MSC: 68T07
Blood cancers such as leukemia typically do not involve solid tumor formation but rather tend to involve the proliferation of abnormal blood cells that circulate
within the body and may not form solid masses as seen in other types of cancer. Cancer
arises from genetic anomalies that disrupt the regulation of cellular proliferation. These
genetic anomalies compromise the natural control mechanisms that prevent excessive cell
proliferation. The body has inherent mechanisms designed to remove cells that possess
damaged DNA, but, in certain cases, these fail, allowing abnormal cells to thrive and
potentially develop into tumors, disrupting regular bodily functions; these defenses can
diminish with age or due to various factors [6].
Each instance of cancer exhibits a distinct genetic modification that evolves as the
tumor grows. Tumors often showcase a diversity of genetic mutations across various cells
existing within the same cluster. Genetic abnormalities primarily affect three types of genes:
DNA repair genes, proto-oncogenes, and tumor suppressor genes. Proto-oncogenes are
typically involved in healthy cell division and proliferation. The transformation of these
genes into oncogenes, brought on by specific alterations or increased activity, fuels uncon-
trolled cell growth and plays a role in cancer development. Meanwhile, tumor suppressor
genes meticulously manage cellular division while imposing restraints on unbridled and
unregulated cellular proliferation, and mutations in these genes disable their inhibitory
function, increasing the risk of cancer. DNA repair genes are instrumental in rectifying
DNA damage, and mutations in these genes can lead to the accumulation of further genetic
abnormalities, making cells more prone to developing cancer. Metastasis is the movement
of cancer cells from the initial site to new parts of the body. It includes cell detachment, local tissue
invasion, blood or lymph system entry, and growth in distant tissues [7,8]. Understanding
the genetic and cellular mechanisms underlying cancer development and metastasis is
crucial for improving diagnostics, developing effective treatments, and advancing cancer
research. Researchers can work toward better strategies for prevention, early detection,
and targeted therapies by unraveling the intricacies of cancer at the molecular level. The
early diagnosis of cancer developments across different body areas requires accurate and
automated computerized techniques. While numerous researchers have made significant
strides in cancer detection, there remains substantial scope for improvement in this field. In
this manuscript, we have scrutinized colorectal and gastric cancers employing conventional
ML techniques solely based on medical imaging datasets. Medical images offer finer and
more specific details compared to other medical data sources.
Literature Review
This section provides an evaluative comparison of the most recent review articles
available, analyzing current review articles dedicated to the utilization of machine learning
and deep learning classifiers for cancer detection across diverse types. The objective is to
summarize the positive aspects and limitations of these review articles, as per the review
presented, on various cancer types. The papers selected for analysis include those that cover
more than two cancer types, are peer-reviewed, and were published between 2019 and 2023.
This present study extends our prior works [9,10] by providing an extensive review that
now encompasses seven distinct cancer types. Levine et al. (2019) [9] focused on cutaneous,
mammary, pulmonary, and various other malignant conditions, emphasizing radiological
practices and diagnostic workflows. The study detailed the construction and deployment of
a convolutional neural network for medical image analysis. However, limitations included a
relative underemphasis on malignancy detection, sparse literature sources, and examination
of a limited set of performance parameters. Huang et al. (2020) [10] explored prostatic,
mammary, gastric, colorectal, solid, and non-solid malignancies. The study presented a
comparative analysis of artificial intelligence algorithms and human pathologists in terms
of prognostic and diagnostic performance across various cancer classifications. However,
limitations included a lack of literature for each malignancy category, the absence of
consideration for machine learning and deep learning classifiers, and a lack of an in-depth
literature review. Saba (2020) [11] examined mammary, encephalic, pulmonary, hepatic,
cutaneous, and leukemic cancers, offering concise explanations of benchmark datasets.
Figure 1. PRISMA flow diagram for the literature selection process. (Records identified through database searching: n = 571; additional records identified through other sources: n = 0; studies included in the scoping review: n = 36.)
Figure 2. Parameters governing the inclusion and exclusion of research articles in the selection process.
2.1.2. Exclusion Criteria

The exclusion criteria, a pivotal aspect of the research review process for cancer detection, served as a strategic filter to ensure the selection of high-quality, pertinent articles. Omitting conference papers and book chapters was a deliberate choice to uphold a superior standard, guided by the in-depth scrutiny and comprehensive nature typically associated with peer-reviewed journal articles. Additionally, the requirement for digital object identifiers (DOIs) within the selected studies aimed to guarantee the reliability and accessibility of the articles, facilitating easy citation, retrieval, and verification processes. The temporal boundary set the scope within a specific timeframe, excluding research published before 2017 or after 2023, with the intention of focusing on the most recent advancements within the field of cancer detection. Language limitations were incorporated, allowing only English publications to ensure a consistent understanding and analysis. Moreover, the exclusion of deep learning classifiers in favor of traditional machine learning methods aligned with the specific objective of assessing the performance and effectiveness of the latter in cancer detection. By narrowing the focus exclusively to colorectal and gastric cancers, the exclusion criteria aimed to ensure a concentrated and comprehensive analysis across these specific high-mortality cancer types. This approach facilitated a deeper understanding of the efficacy of traditional machine learning methods in the context of different cancer types.

To illuminate the research hotspots, we have detailed the quantity of literature references pertaining annually to each cancer category (colorectal and gastric), along with the cumulative total, visually represented in Figure 3. This visual aid is designed to aid readers in identifying pertinent literature related to these specific cancer categories, fostering a more nuanced analysis within the specified years.
Table 1. Benchmark and public medical imaging datasets for colorectal and gastric cancer with
download links.
2.3. Preprocessing
In cancer detection, preprocessing is essential to prepare data for analysis and clas-
sification. It refines diverse data types, like medical images and genetic and clinical data,
addressing noise and inconsistencies. Medical image preprocessing includes noise reduc-
tion, enhancement, normalization, and format standardization. Augmentation enhances
data diversity. Quality preprocessed data improves cancer detection model performance.
Common tasks include noise reduction, data cleaning, transformation, normalization, and
standardization. Preprocessing optimizes data for analysis, contributing to effective cancer
diagnosis. Key preprocessing techniques are summarized in Table 2.
Table 2. Mathematical formulations and descriptions of common preprocessing techniques.

| Preprocessing Technique | Formula | Description |
|---|---|---|
| Image Filtering | I_filtered(A, B) = ∑_{x=−N}^{N} ∑_{y=−N}^{N} I(A − x, B − y) · K(x, y) | I_filtered(A, B) is the filtered pixel at location (A, B); I(A − x, B − y) is the pixel value at location (A − x, B − y) in the original image; K(x, y) is the value of the convolution kernel at (x, y); the summation is performed over a window of size (2N + 1) × (2N + 1) centered at (A, B). |
| Image Denoising | I_denoised = argmin(E(I_denoised) + R(I_denoised)) | I_denoised represents the denoised image; E(I_denoised) is the data fidelity term, which measures how well the denoised image matches the noisy input image; R(I_denoised) is the regularization term, which imposes a prior on the structure of the denoised image [21]. |
| Gaussian Filtering | Filtered_value = (1/(2πσ²)) · e^{−(x² + y²)/(2σ²)} | Filtered_value is the resulting value after applying Gaussian filtering; x and y are the spatial coordinates; σ is the standard deviation, controlling the amount of smoothing or blurring. |
| Contrast Enhancement of Images (CEI) | Pixel_OP = ((Pixel_IP − Min_IP)/(Max_IP − Min_IP)) · (Max_OP − Min_OP) + Min_OP | Pixel_OP is the enhanced pixel value, derived from Pixel_IP in the input image; Min_IP and Max_IP are the minimum and maximum pixel values in the input image; Min_OP and Max_OP represent the desired minimum and maximum pixel values in the output image [22]. |
| Linear Transformation | T(v) = Av | T is the transformation operator, v is the input vector, and A is a matrix defining the transformation. |
| Contrast Limited Adaptive Histogram Equalization (CLAHE) | O(A, B) = T(I(A, B)) | O(A, B) is the enhanced output pixel at (A, B), produced by a contrast-enhancing transformation function T(·) based on pixel intensity using the cumulative distribution function (CDF). |
| Discrete Cosine Transform (DCT) | X[m] = ∑_{k=0}^{N−1} x[k] · cos(π(2k + 1)m/(2N)) | X[m] represents the DCT coefficient at frequency index m; x[k] is the input signal; N is the number of samples in the signal; the summation is performed over all samples in the signal. |
| Wavelet Transform (WT) | W(x, y) = ∑_{a=0}^{N−1} ∑_{b=0}^{M−1} I(a, b) · ψ_{x,y}(a, b) | W(x, y) is the DWT coefficient, I(a, b) is the pixel value at (a, b), and ψ_{x,y}(a, b) is the 2D wavelet function. |
| RGB to Gray Conversion (RGBG) | Gray_value = (0.2989 · Red_value) + (0.5870 · Green_value) + (0.1140 · Blue_value) | Gray_value is the converted gray value from the RGB channels (Red_value, Green_value, Blue_value); the coefficients 0.2989, 0.5870, and 0.1140 are weights assigned to the R, G, and B channels, respectively [23]. |
| Cropping (ROI) | I_cropped = I[y : y + h, x : x + w] | The cropped image I_cropped is obtained by cropping the input image I at coordinates (x, y) with width w and height h. |
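To make these formulations concrete, the following minimal NumPy sketch implements four of the rows above (RGBG conversion, contrast enhancement, the Gaussian kernel, and the filtering sum). The function names and default values are ours for illustration, not drawn from any reviewed study, and a non-constant grayscale input image is assumed.

import numpy as np

def rgb_to_gray(rgb):
    # Weighted sum of the R, G, B channels (RGBG row of Table 2).
    return 0.2989 * rgb[..., 0] + 0.5870 * rgb[..., 1] + 0.1140 * rgb[..., 2]

def contrast_stretch(img, out_min=0.0, out_max=255.0):
    # Linear contrast enhancement (CEI row); assumes img.max() > img.min().
    in_min, in_max = img.min(), img.max()
    return (img - in_min) / (in_max - in_min) * (out_max - out_min) + out_min

def gaussian_kernel(n, sigma):
    # (2N+1) x (2N+1) Gaussian kernel, normalized so its entries sum to 1.
    ax = np.arange(-n, n + 1)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def filter_image(img, kernel):
    # Direct implementation of the convolution sum in the Image Filtering row.
    n = kernel.shape[0] // 2
    padded = np.pad(img, n, mode="reflect")
    out = np.empty_like(img, dtype=float)
    for a in range(img.shape[0]):
        for b in range(img.shape[1]):
            window = padded[a:a + 2 * n + 1, b:b + 2 * n + 1]
            out[a, b] = np.sum(window * kernel[::-1, ::-1])  # flipped kernel
    return out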
Energy = ∑_{i=1}^{G_max} [h_i]²    (3)
are some important GLCM features, along with their mathematical formulas as provided
in Equations (5)–(10).
Here, (x, y) pairs typically refer to the intensity values of adjacent or neighboring pixels.
Sum of Squares Variance (SSV): SSV quantifies the variance in gray levels within the texture.

SSV = ∑_{x,y} (x − µ)² ∗ GLCM(x, y)    (5)
Inverse Difference Moment (IDM): IDM measures the local homogeneity and is higher for textures with similar gray levels.

IDM = ∑_{x,y} GLCM(x, y) / (1 + (x − y)²)    (6)
Correlation (Corr): Correlation quantifies the linear dependency between pixel values
in the texture. It spans from −1 to 1, with 1 signifying flawless positive correlation.
Inverse Difference (ID): ID measures the local homogeneity and is higher for textures with similar gray levels at different positions.

ID = ∑_{x,y} GLCM(x, y) / (1 + |x − y|)    (10)
Short Run Emphasis (SRE): SRE emphasizes the occurrence of shorter runs in the texture.

SRE = ∑_{x,y} C(x, y) / x²    (11)
Here, ( x, y) are gray levels, and C ( x, y) is the co-occurrence matrix value reflecting the
frequency of each gray-level combination.
Long Run Emphasis (LRE): LRE assesses the presence of extended runs marked by
higher gray-level values [28].
Run Length Nonuniformity (RLN): RLN evaluates the irregularity in the lengths of runs.

RLN = ∑_{x,y} C(x, y) / y²    (14)
Run Percentage (RP): RP measures the overall proportion of runs relative to the image size.

RP = ∑_{x,y} C(x, y) / N²    (15)
Run Entropy (RE): RE calculates the entropy of run lengths and gray levels.
Low Gray-Level Run Emphasis (LGRE): LGRE accentuates shorter runs with lower
gray-level values.
LGRE = ∑_{x,y} C(x, y) / y²,  for y ≤ (N + 1)/2    (17)
High Gray-Level Run Emphasis (HGRE): HGRE highlights longer runs with higher
gray-level values.
HGRE = ∑_{x,y} C(x, y) ∗ y²,  for y > (N + 1)/2    (18)
Short Run Low Gray-Level Emphasis (SRLGLE): SRLGLE highlights shorter runs
that contain lower gray-level values.
SRLGLE = ∑_{x,y} C(x, y) / (x² ∗ y²),  for x, y ≤ (N + 1)/2    (19)
Short Run High Gray-Level Emphasis (SRHGLE): SRHGLE highlights shorter runs
that contain higher gray-level values.
SRHGLE = ∑_{x,y} C(x, y) ∗ y² / x²,  for x ≤ (N + 1)/2, y > (N + 1)/2    (20)
Long Run Low Gray-Level Emphasis (LRLGLE): LRLGLE emphasizes longer runs
featuring lower gray-level values.
LRLGLE = ∑_{x,y} C(x, y) ∗ x² / y²,  for x > (N + 1)/2, y ≤ (N + 1)/2    (21)

Long Run High Gray-Level Emphasis (LRHGLE): LRHGLE emphasizes longer runs containing higher gray-level values.

LRHGLE = ∑_{x,y} C(x, y) ∗ x² ∗ y²,  for x, y > (N + 1)/2    (22)
Coarseness (NGTD): Quantifies the granularity of the texture.

Coars = ∑_{x=1}^{N_g} C(x, y) / (∆x)²    (23)
Ng refers to the highest achievable discrete intensity level within the image.
Contrast (NGTD): Quantifies the contrast or sharpness in the texture.

Contrast_NGTD = ∑_{x=1}^{N_g} ∑_{y=1}^{N_g} C(x, y) ∗ |x − y|    (24)
Complexity (NGTD): Reflects the amount of information contained in the texture.

Complexity = ∑_{x=1}^{N_g} ∑_{y=1}^{N_g} P(x, y) / (1 + |x − y|²)    (26)
These features provide a detailed analysis of texture patterns in images, making them
valuable for various applications, including image classification, quality control, and texture
discrimination in fields such as geology, material science, and medical imaging.
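As an illustration, the following sketch computes a normalized gray-level co-occurrence matrix and a few of the features defined above. The quantization to eight gray levels, the single (1, 0) pixel offset, and all function names are our simplifying assumptions; a non-constant grayscale image is assumed.

import numpy as np

def glcm(img, levels=8, dx=1, dy=0):
    # Gray-level co-occurrence matrix for one pixel offset, normalized to sum to 1.
    q = (img.astype(float) / img.max() * (levels - 1)).astype(int)  # quantize
    m = np.zeros((levels, levels))
    h, w = q.shape
    for a in range(h - dy):
        for b in range(w - dx):
            m[q[a, b], q[a + dy, b + dx]] += 1
    return m / m.sum()

def glcm_features(m):
    x, y = np.indices(m.shape)
    mu = np.sum(x * m)                              # mean gray level
    return {
        "energy": np.sum(m**2),                     # Eq. (3)-style energy
        "ssv": np.sum((x - mu)**2 * m),             # Eq. (5)
        "idm": np.sum(m / (1 + (x - y)**2)),        # Eq. (6)
        "id": np.sum(m / (1 + np.abs(x - y))),      # Eq. (10)
        "contrast": np.sum(m * np.abs(x - y)),      # Eq. (24)-style contrast
    }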
The distance between a test point and the training points is typically measured with the Minkowski distance, given in Equation (28):

dist(x, z) = ( ∑_{α=1}^{d} |x_α − z_α|^p )^{1/p}    (28)

The “1-NN Convergence Proof” states that, as the dataset grows infinitely large, the 1-Nearest Neighbor (1-NN) classifier’s error will not be more than twice the error of the Bayes optimal classifier, which represents the best possible classification performance. This also holds for k-NN with larger values of k. It highlights the ability of the K-Nearest Neighbors algorithm to approach optimal performance with increasing data [29]. As n approaches infinity, Z_NN converges to Z_t, and the probability of different labels for Z_t when returning Z_NN’s label is described in Equation (29) [30].

ε_NN = P(y*|Z_t)(1 − P(y*|Z_NN)) + P(y*|Z_NN)(1 − P(y*|Z_t)) ≤ (1 − P(y*|Z_NN)) + (1 − P(y*|Z_t)) = 2(1 − P(y*|Z_t)) = 2ε_BO    (29)

Here, BO is the Bayes optimal classifier. If the test point and its nearest neighbor are indistinguishable, misclassification occurs if they have different labels. This probability is outlined in Equation (30) and Figure 4 [29,31].

(1 − p(s|x)) p(s|x) + p(s|x)(1 − p(s|x)) = 2p(s|x)(1 − p(s|x))    (30)

Figure 4. Probabilistic analysis of misclassification for identical test point and nearest neighbor scenario.

Equation (30) represents the misclassification probability when the test point and its nearest neighbor have differing labels.
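To make the rule concrete, a minimal NumPy sketch of the k-NN decision (the Minkowski distance of Equation (28) followed by a majority vote) could look as follows; the default k = 3 and the function names are our choices for the example.

import numpy as np

def minkowski(x, z, p=2):
    # Equation (28): Minkowski distance between two feature vectors.
    return np.sum(np.abs(x - z)**p)**(1.0 / p)

def knn_predict(X_train, y_train, x, k=3, p=2):
    # Label a test point by majority vote over its k nearest training points.
    d = np.array([minkowski(xi, x, p) for xi in X_train])
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]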
2.5.2. Multilayered Perceptron (MLP)

In contrast to static kernels, neural network units have adaptable internal parameters for an adjustable structure. A perceptron, inspired by biological neurons, comprises three components: (i) weighted edges for individual multiplications, (ii) a summation unit for calculating the sum, and (iii) an activation unit applying a non-linear function [32–34]. The single-layer unit function involves a linear combination passed through a non-linear activation, represented by Equation (31) and Figure 5 [33,34].

y^(1)(x) = f( w_0^(1) + ∑_{j=1}^{N} w_j^(1) x_j )    (31)

Figure 5. Contrasts (a) biological neurons, showcasing intricate neural architecture, with (b) artificial perceptrons in neural networks, depicting simplified representations and emphasizing structural differences.

In a single-layer neural network unit, y^(1)(x) is the output, w_0^(1) is the bias, and ∑_{j=1}^{N} w_j^(1) x_j is the weighted sum of inputs. In general, we compute U_1 units as feature transformations in learning models, as described in Equation (32) [33,34].

model(x, w) = w_0 + y_1(x) w_1 + · · · + y_{U_1}(x) w_{U_1}    (32)

The input vector x can be denoted as represented in Equation (33) [33,34].

x = [1, x_1, . . . , x_N]^T    (33)

The vector representation comprises input values x_1 to x_N and an additional element of 1. Internal parameters of single-layer units include bias w_{0,j}^(1) and weights w_{1,j}^(1) through w_{N,j}^(1). These parameters form the jth column of a matrix W^(1) with dimensions (N + 1) × U_1, as demonstrated in Equation (34) below [34]:

W^(1) = [ w_{0,1}^(1)  w_{0,2}^(1)  · · ·  w_{0,U_1}^(1)
          w_{1,1}^(1)  w_{1,2}^(1)  · · ·  w_{1,U_1}^(1)
          ⋮            ⋮                   ⋮
          w_{N,1}^(1)  w_{N,2}^(1)  · · ·  w_{N,U_1}^(1) ]    (34)

Notably, the matrix–vector product W_1^T x encompasses all linear combinations within our U_1 units, as given in Equation (35) [33]:

(W_1^T x)_j = w_{0,j}^(1) + ∑_{n=1}^{N} w_{n,j}^(1) x_n,   j = 1, . . . , U_1    (35)

We extend the activation function f to handle a general d × 1 vector v in Equation (36) [34]:

f(v) = [ f(v_1), . . . , f(v_d) ]^T    (36)

In Equation (37), f(W_1^T x) is a U_1 × 1 vector containing all U_1 single-layer units [33,34]:

f(W_1^T x)_j = f( w_{0,j}^(1) + ∑_{n=1}^{N} w_{n,j}^(1) x_n ),   j = 1, . . . , U_1    (37)

The mathematical expression for an L-layer unit in a general multilayer perceptron, built recursively from single-layer units, is given by Equation (38) [33,34]:

y^(L)(x) = f( w_0^(L) + ∑_{i=1}^{U_{L−1}} w_i^(L) f_i^(L−1)(x) )    (38)
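As a worked illustration of Equations (33)–(38), the following minimal NumPy sketch computes the forward pass of a multilayer perceptron. The tanh activation, the random initialization, and the layer sizes are our assumptions for the example.

import numpy as np

def layer(W, x, f=np.tanh):
    # One layer of Eq. (37): prepend the bias input 1, multiply by W^T, apply f.
    x = np.concatenate(([1.0], x))        # x = [1, x_1, ..., x_N]^T, Eq. (33)
    return f(W.T @ x)                     # (N+1) x U matrix W, output length U

def mlp(weights, x, f=np.tanh):
    # L-layer recursion of Eq. (38): feed each layer's output into the next.
    for W in weights:
        x = layer(W, x, f)
    return x

# Example: 4 inputs -> 3 hidden units -> 1 output, random parameters.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(5, 3)), rng.normal(size=(4, 1))]
print(mlp(weights, rng.normal(size=4)))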
2.5.3. Support Vector Machine (SVM)

SVMs, employed for regression and classification tasks, stand out in supervised machine learning for their precision with complex datasets. Particularly effective in binary classification, SVMs aim to discover an optimal hyperplane, maximizing the boundary between classes. Serving as a linear classifier, SVMs build on the perceptron introduced by Rosenblatt in 1958 [35–37]. Unlike perceptrons, SVMs identify the hyperplane (H) with the maximum separation margin, defined in Equation (39).

h(x) = sign( w^T x + b )    (39)

The SVM classifies in {+1, −1}, emphasizing the key concept of finding a hyperplane with maximum margin σ. Figure 6 illustrates this importance, with the margin expressed in Equation (40) [35]:

σ = min_{(x_j, y_j) ∈ D} |w^T x_j + b| / |w|    (40)

Figure 6. Separating Hyperplanes and Maximum Margin Hyperplane in Support Vector Machines.

Scaling invariance enables flexible adjustment of u and δ. Smart value selection ensures min_{x ∈ D} |u^T x + δ| = 1, introduced as an equality constraint in the objective per Equation (43) [37]:

max_{u,δ} 1/|u| = min_{u,δ} |u| = min_{u,δ} |u|² = min_{u,δ} u^T u    (43)

Utilizing the fact that f(z) = z² is monotonically increasing for z ≥ 0 and |u| ≥ 0, minimizing |u| is equivalent to minimizing |u|² = u^T u. This reformulates the optimization problem in Equation (44), and a structural diagram of a multi-SVM has been visualized in Figure 7.

min_{u,δ} u^T u   subject to   ∀i, y_i(u^T x_i + δ) ≥ 0,   min_i |u^T x_i + δ| = 1    (44)

Figure 7. Structural diagram of the multi-class support vector machine (SVM).
2.5.4. Bayes and Naive Bayes (NB) Classifier

The Bayes classifier, an ideal algorithm, assigns class labels based on class probabilities given observed features and prior knowledge. It predicts the class with the highest estimated probability, often used as a benchmark but requiring complete knowledge of underlying probability distributions. To estimate P(y|x̄) for the Bayes classifier, the common approach is maximum likelihood estimation (MLE), especially for the discrete variable y, as outlined in Equation (45) [37]:

P(y|x̄) = ∑_{i=1}^{n} I(x_i = x̄ ∧ y_i = y) / ∑_{i=1}^{n} I(x_i = x̄)    (45)

Naive Bayes addresses MLE’s limitations with sparse data by assuming feature independence. It estimates P(y) and P(x̄|y) instead of P(y|x̄) using Bayes’ rule (Equation (46)) [37]:

P(y|x̄) = P(x̄|y) P(y) / P(x̄)    (46)
Generative learning estimates P(y) and P( x |y), with P(y) resembling tallying occur-
rences for discrete binary values (Equation (47)).
P(y = c) = ( ∑_{i=1}^{n} I(y_i = c) ) / n = π_c    (47)
To simplify estimation, the Naive Bayes (NB) assumption is introduced, a key element
of the NB classifier. It assumes feature independence given the class label, formalized in
Equation (48) for P( x |y).
P(x̄|y) = ∏_{α=1}^{d} P(x_α | y)    (48)
Here, xα is the value of feature α, assuming feature values, given class label y, are
entirely independent. Despite potential complex relationships, NB classifiers are effec-
tive. The Bayes classifier, defined in Equation (49), further simplifies to (50) due to P( x )
independence from y, and using logarithmic property, it can be expressed as (51).
h(x̄) = argmax_y ∏_{α=1}^{d} P(x_α|y) P(y)    (50)

h(x̄) = argmax_y ∑_{α=1}^{d} log(P(x_α|y)) + log(P(y))    (51)
Estimating log(P(x_α|y)) is straightforward for one dimension. P(y) remains unaffected and is calculated independently. In Gaussian NB, where features are continuous (x_α ∈ R), P(x_α|y) follows a Gaussian distribution (Equation (52)). This assumes each feature x_α follows a class-conditional Gaussian distribution with mean µ_αc and variance σ²_αc (Equations (53) and (54)), using these parameter estimates in the Gaussian NB classifier for each class [37].

P(x_α | y = c) = (1/√(2πσ²_αc)) exp( −(x_α − µ_αc)² / (2σ²_αc) )    (52)

µ_αc = (1/n_c) ∑_{k=1}^{n} I(y_k = c) x_{kα}    (53)

σ²_αc = (1/n_c) ∑_{k=1}^{n} I(y_k = c) (x_{kα} − µ_αc)²    (54)
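A minimal sketch of the Gaussian NB classifier of Equations (47) and (51)–(54), assuming a feature matrix X (rows are samples) and integer class labels y; the small variance floor is our addition for numerical stability.

import numpy as np

def gnb_fit(X, y):
    # Per-class priors (Eq. 47) and per-feature Gaussian parameters (Eqs. 53-54).
    classes = np.unique(y)
    pri = {c: np.mean(y == c) for c in classes}
    mu = {c: X[y == c].mean(axis=0) for c in classes}
    var = {c: X[y == c].var(axis=0) + 1e-9 for c in classes}
    return classes, pri, mu, var

def gnb_predict(model, x):
    classes, pri, mu, var = model
    def log_post(c):
        # Sum of log class-conditional Gaussians (Eq. 52) plus log prior, Eq. (51).
        ll = -0.5 * np.sum(np.log(2 * np.pi * var[c]) + (x - mu[c])**2 / var[c])
        return ll + np.log(pri[c])
    return max(classes, key=log_post)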
Now, by taking the logarithm of the likelihood product, we obtain Equation (57):

log( ∏_{k=1}^{m} P(y_k | x_k, w) ) = −∑_{k=1}^{m} log( 1 + e^{−y_k w^T x_k} )    (57)

To find the MLE for w, we aim to minimize the function provided in Equation (58):

w_MLE = argmax_w −∑_{k=1}^{m} log( 1 + e^{−y_k w^T x_k} ) = argmin_w ∑_{k=1}^{m} log( 1 + e^{−y_k w^T x_k} )    (58)

Minimizing the function in Equation (58) is our goal, achieved through gradient descent on the negative log likelihood in Equation (59):

L(w) = ∑_{k=1}^{m} log( 1 + e^{−y_k w^T x_k} )    (59)

w_MAP = argmin_w ∑_{k=1}^{m} log( 1 + e^{−y_k w^T x_k} ) + λ w^T w    (61)
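A minimal sketch of gradient descent on the loss of Equation (59), with the optional λw^Tw ridge term of Equation (61); labels are assumed to be in {−1, +1}, and the learning-rate and epoch defaults are ours.

import numpy as np

def logistic_fit(X, y, lam=0.0, lr=0.1, epochs=500):
    # Gradient descent on the (regularized) negative log likelihood, Eqs. (59), (61).
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        s = y * (X @ w)
        # dL/dw = -sum_k y_k x_k / (1 + e^{y_k w^T x_k}) + 2 * lam * w
        grad = -(X * (y / (1 + np.exp(s)))[:, None]).sum(axis=0) + 2 * lam * w
        w -= lr * grad / len(y)
    return w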
I(D) = ∑_{m=1}^{k} q_m (1 − q_m)    (62)

G_T(D) = (|D_L| / |D|) G_T(D_L) + (|D_R| / |D|) G_T(D_R)    (63)
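As an illustration of Equations (62) and (63), the following sketch computes the Gini-style impurity and scans all features and thresholds for the split that minimizes the weighted impurity. The exhaustive scan and all names are ours, chosen only for clarity.

import numpy as np

def gini(y):
    # Eq. (62): I(D) = sum_m q_m (1 - q_m) over the class fractions q_m.
    _, counts = np.unique(y, return_counts=True)
    q = counts / counts.sum()
    return np.sum(q * (1 - q))

def best_split(X, y, impurity=gini):
    # Weighted impurity of a split, Eq. (63), minimized over features/thresholds.
    best = (None, None, np.inf)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f])[:-1]:
            left = X[:, f] <= t
            g = (left.mean() * impurity(y[left]) +
                 (1 - left.mean()) * impurity(y[~left]))
            if g < best[2]:
                best = (f, t, g)
    return best   # feature index, threshold, weighted impurity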
Figure 8. Binary Decision Tree with Sole Storage of Class Labels.

Entropy in Decision Trees: Entropy in decision trees measures disorder using class fractions. Minimizing entropy aligns with a uniform distribution, promoting randomness. KL-Divergence KL(p||q) gauges the closeness of p to a uniform distribution q, as in Equation (64).

KL(p||q) = ∑_n p_n log(p_n / q_n) > 0 ← KL-Divergence
         = ∑_n p_n log(p_n) − p_n log(q_n),  where q_n = 1/c
         = ∑_n p_n log(p_n) + p_n log(c)
         = ∑_n p_n log(p_n) + log(c),  where log(c) is constant and ∑_n p_n = 1
max_p KL(p||q) = max_p ∑_n p_n log(p_n) = min_p −∑_n p_n log(p_n) = min_p H(s) ← Entropy    (64)

ID3 Algorithm: The ID3 algorithm stops tree-building when all labels are the same or no more attributes can split further. If all share the same label, a leaf with that label is created. If no more splitting attributes exist, a leaf with the most frequent label is generated (Equation (65)) [39].

ID3(S): if ∃ ȳ s.t. ∀(x, y) ∈ S, y = ȳ, return leaf with label ȳ; if ∃ x̄ s.t. ∀(x, y) ∈ S, x = x̄, return leaf with label mode(y : (x, y) ∈ S)    (65)

CART (Classification and Regression Trees): CART (classification and regression trees) is suitable for continuous labels (y_i ∈ R), using the squared loss function (Equation (66)). It efficiently finds the best split (attribute and threshold) by minimizing the average squared difference from the average label ȳ_S [37].

L(S) = (1/|S|) ∑_{(x,y)∈S} (y − ȳ_S)² ← average squared difference from average label    (66)

where ȳ_S = (1/|S|) ∑_{(x,y)∈S} y ← average label

2.5.7. Ensemble Classifier (EC)

Ensemble classifiers represent a sophisticated class of machine learning techniques aimed at enhancing the precision and resilience of predictive models. Their fundamental principle is to combine the predictions of multiple individual models rather than rely on a single learner.
E[(f_k(x) − y)²] = E[(f_k(x) − f̄(x))²] + E[(f̄(x) − ȳ(x))²] + E[(ȳ(x) − y)²]    (67)
      Error              Variance              Bias              Noise
In Equation (67), we decompose the error into four components: “Error”, “Variance”,
“Bias”, and “Noise”. Our primary objective is to minimize the “Variance” term, which is
expressed as Equation (68):
E[(f_k(x) − f̄(x))²]    (68)
      Variance
Here, each prediction serves as an input to the loss function. The function ℓ(H) can be expressed by Equation (71).

ℓ(H) = ∑_{i=1}^{n} ℓ(H(x_i)) = ℓ(H(x_1), . . . , H(x_n))    (71)
This approximation enables the utilization of boosting as long as there exists a method,
denoted as A, capable of solving Equation (72).
h_{t+1} = argmin_{h∈H} ∑_{i=1}^{n} (∂ℓ/∂[H(x_i)]) h(x_i),   where r_i = ∂ℓ/∂[H(x_i)]    (72)
GBRT minimizes the loss by iteratively adding weak learners to the ensemble. Pseudo-
code is in Algorithm 2 [41].
The gradient function ri needed to find the optimal weak learner is computed using
Equation (75).
r_i = ∂L/∂H(x_i) = −y_i e^{−y_i H(x_i)}    (75)
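A minimal sketch of this boosting loop, assuming a caller-supplied fit_regressor(X, targets) that returns a fitted prediction function; the step size alpha and the round count are our illustrative choices, not values from the reviewed studies.

import numpy as np

def gbrt(X, y, fit_regressor, T=100, alpha=0.1):
    # Gradient boosting with exponential loss: each round fits a regressor
    # to the negative gradient, whose components are r_i of Eq. (75).
    Hx = np.zeros(len(y))
    ensemble = []
    for _ in range(T):
        r = -y * np.exp(-y * Hx)          # Eq. (75)
        h = fit_regressor(X, -r)          # weak learner approximates -r_i
        Hx += alpha * h(X)
        ensemble.append(h)
    return lambda Xq: alpha * sum(h(Xq) for h in ensemble)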
−∑_{i:h(x_i)y_i=1} w_i e^{−α} + ∑_{i:h(x_i)y_i≠1} w_i e^{α} = 0,   where ε = ∑_{i:h(x_i)y_i=−1} w_i    (80)
For further simplification, with ε representing the sum over misclassified examples, as
given in Equation (81):
−(1 − ε) e^{−α} + ε e^{α} = 0    (81)
Solving for α, as shown in Equation (82):
e^{2α} = (1 − ε)/ε    (82)

α = (1/2) ln( (1 − ε)/ε )    (83)
The optimal step size α, derived from the closed-form solution in (83), facilitates
rapid convergence in AdaBoost. After each step Ht+1 = Ht + αh, recalculating and re-
normalizing all weights is crucial for the algorithm’s progression. The pseudo-code for
AdaBoost Ensemble classifier is presented in Algorithm 3 [37,41].
The surviving fragment of Algorithm 3 reads: at each boosting step, update the ensemble and re-normalize the sample weights,

H_{t+1} = H_t + αh
∀i : w_i ← w_i e^{−α h(x_i) y_i} / (2√(ε(1 − ε)))

else return H_t; end; return H; end.
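Putting Equations (80)–(83) together, a compact AdaBoost sketch could read as follows, assuming labels in {−1, +1} and a caller-supplied fit_weak(X, y, w) returning a weighted weak learner; this is our rendering of the standard scheme, not the paper's verbatim Algorithm 3.

import numpy as np

def adaboost(X, y, fit_weak, T=50):
    n = len(y)
    w = np.full(n, 1.0 / n)
    H = []
    for _ in range(T):
        h = fit_weak(X, y, w)
        pred = h(X)
        eps = w[pred != y].sum()                 # weighted error, Eqs. (80)-(81)
        if eps >= 0.5:                           # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - eps) / eps)    # optimal step size, Eq. (83)
        w = w * np.exp(-alpha * y * pred)
        w /= w.sum()                             # renormalize, cf. 2*sqrt(eps*(1-eps))
        H.append((alpha, h))
    return lambda Xq: np.sign(sum(a * h(Xq) for a, h in H))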
Figure 9. Confusion Matrix for Multiclass Classification Evaluation.
Error Rate (ER): The complement of accuracy equates to the error rate. It quantifies the proportion of instances that the model incorrectly classifies. A lower error rate suggests a more accurate model, and it is especially useful when you want to know how often the model makes incorrect predictions.

%ER = ( (FP + FN) / Total Samples ) × 100 = 100 − (%ACC)
Specificity (% Spe): True negative rate, commonly known as specificity, is a metric
that evaluates a model’s accuracy in correctly identifying true negative cases. This is crucial
in minimizing false alarms.
Specificity (%Sp) = True Negative Rate (%TNR) = ( TN / Total Negative ) × 100
Sensitivity (% Sen): This metric, also termed recall or the true positive rate (TPR),
gauges the model’s capability to accurately identify true positive values, which correspond
to cases of cancer, among the total positive cases within a dataset [42].
Sensitivity (%Sen) = Recall (%Re) = True Positive Rate (%TPR) = ( TP / Total Positive ) × 100
Precision (% Pr): Precision, also recognized as positive predictive value (PP), denotes
the ability to accurately predict positive values among the true positive predictions. A high
precision score signifies that the model effectively reduces false positive errors.
Precision (%Pr) = Positive Predictivity (%PP) = ( TP / (TP + FP) ) × 100
F1 Score (% F1): An equitable metric that amalgamates positive predictive value and
recall forms the F1 score [44]. It is particularly valuable when you require a singular metric
that contemplates both incorrect positive predictions and missed positive predictions.
F1-score (%F1) = ( 2 × TP / (2 × TP + FP + FN) ) × 100 = ( (2 × PP × TPR) / (PP + TPR) ) × 100
Area Under the Curve (AUC): The AUC assesses the classifier’s capacity to differen-
tiate between affirmative and negative occurrences. It gauges the general efficacy of the
model concerning receiver operating characteristic (ROC) graphs. A superior AUC score
signifies enhanced differentiation capability.
Negative Predictive Value (% NPV): It measures the classifier’s capability to accu-
rately predict negative instances among all instances classified as negative. A high NPV
suggests that the classifier is effective at identifying non-cancer cases when it predicts them
as such, reducing the likelihood of unnecessary treatments.
Negative Predictive Value (%NPV) = ( TN / (TN + FN) ) × 100
False Positive Rate (%FPR): This quantifies how often the classifier falsely identifies
a negative instance as positive. It provides insights into the model’s propensity for false
positive errors. In cancer detection, a high FPR can lead to unnecessary distress and
treatments for individuals who do not have cancer.
False Positive Rate (%FPR) = ( FP / Total Negative ) × 100
False Negative Rate (%FNR): It determines the classifier’s tendency to falsely identify
a positive instance as negative. It reveals the model’s performance regarding false negative
errors, which is critical in cancer detection to avoid missing real cases. High FNR can lead
to undiagnosed cancer cases and potentially delayed treatments.
False Negative Rate (%FNR) = ( FN / Total Positive ) × 100
Matthews Correlation Coefficient (MCC): The Matthews correlation coefficient (MCC)
represents a pivotal metric utilized for evaluating the effectiveness of binary (two class) predic-
tions, prominently beneficial when dealing with scenarios where classes are asymmetrically
distributed in their volume and representation within the dataset. The formula to calculate
MCC is:
MCC = [ (TN × TP) − (FN × FP) ] / √( (TP + FP)(TP + FN)(TN + FP)(TN + FN) )
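For reference, the sketch below translates the formulas of this subsection into code for the binary case; the confusion-matrix counts in the usage line are hypothetical inputs chosen only for illustration.

import numpy as np

def binary_metrics(tp, tn, fp, fn):
    # Direct translations of the formulas above; all values in percent except MCC.
    acc = 100 * (tp + tn) / (tp + tn + fp + fn)
    sen = 100 * tp / (tp + fn)                   # sensitivity / recall / TPR
    spe = 100 * tn / (tn + fp)                   # specificity / TNR
    pre = 100 * tp / (tp + fp)                   # precision / PPV
    npv = 100 * tn / (tn + fn)
    f1  = 2 * pre * sen / (pre + sen)
    mcc = ((tp * tn - fp * fn) /
           np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return dict(ACC=acc, ER=100 - acc, Sen=sen, Spe=spe,
                Pre=pre, NPV=npv, F1=f1, MCC=mcc)

print(binary_metrics(tp=90, tn=85, fp=15, fn=10))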
3. Review Analysis
In this section, we present a thorough and extensive analysis of cancer detection
utilizing conventional machine learning models applied to medical imaging datasets. Our
study is focused exclusively on the detection of two specific types of cancer: colorectal
and stomach cancer. For each of these cancer types, we have meticulously compiled a
comprehensive review table that encompasses the relevant literature published during
the period spanning 2017 to 2023. This table encompasses a range of crucial review
parameters, including the year of publication, the datasets utilized, preprocessing methods,
feature extraction techniques, machine learning classifiers employed, the number of images
involved, the imaging modality, and various performance metrics. In total, our review
encompasses 36 research articles that have harnessed medical imaging datasets to detect
these specific types of cancer. Our primary emphasis lies in scrutinizing the utilization of
traditional machine learning methodologies in the context of cancer detection using image
datasets. We have conducted this analysis based on the meticulously assembled review
tables. Subsequent subsections provide in-depth and comprehensive reviews for both
colorectal and stomach cancer. Within our analysis, we examine the application of machine
learning approaches to cancer prediction. Our overarching goal
is to furnish valuable insights into the efficacy and constraints of conventional machine
learning models when applied to the realm of cancer detection using medical imaging
datasets. Through a meticulous examination and comparative analysis of results derived
from various studies, our objective is to make a meaningful contribution to the evolution of
cancer detection methodologies and to offer guidance for future research endeavors in this
critical domain.
RF, and KNN. These approaches successfully detect colorectal cancer using modalities
such as endocytoscopy, histopathological images, and clinical data. The studies employed
varying quantities of images, patients, or slices, ranging from 54 to 100,000. The “KCRC-16”
datasets are prominently featured in these analyses.
In a comparative analysis of colorectal cancer detection studies, (Talukder et al., 2022) [45]
stood out with an impressive accuracy of 100%. Their approach included preprocessing
steps like resizing, BGR2RGB conversion, and normalization. Deep learning models such as
DenseNet169, MobileNet, VGG19, VGG16, and DenseNet201 were employed. Performance
assessment was conducted using a combination of voting, XGB, EC, MLP, LGB, RF, SVM,
LR, and hybrid techniques on a dataset comprising 2800 H&E images from the LC25000
dataset. Their best model achieved a flawless 100% accuracy. In contrast, (Ying et al., 2022) [46]
achieved the lowest accuracy of 76.0% in colorectal cancer detection. Their approach involved
manual region of interest (ROI) selection and various preprocessing techniques. They leveraged
multiple features, including FOS, shape, GLCM, GLSZM, GLRLM, NGTDM, GLDM, LoG,
and WT. Classification was carried out using the MLR technique on a dataset consisting of
276 CECT images from a private dataset. Their least-performing model achieved an accuracy
of 76.00%. Moreover, their study exhibited a sensitivity of 65.00%, specificity of 80.00%, and
precision of 54.00%, indicating relatively suboptimal performance in accurately identifying
colorectal cancer cases.
(Khazaee Fadafen and Rezaee 2023) [47] conducted a remarkable colorectal cancer
detection study by utilizing a substantial dataset (the highest number of images among all)
comprising a total of 100,000 medical images sourced from the H&E NCT-CRC-HE-100K
dataset. Their preprocessing methodology encompassed the conversion of RGB images
to the HSV color space and the utilization of the lightness space. For classification, they
harnessed the dResNet architecture in conjunction with DSVM, which resulted in an out-
standing accuracy rate of 99.76%. (Jansen-winkeln et al., 2021) [48] conducted a study with
the smallest dataset of all, comprising only 54 medical images. Their preprocessing ap-
proach included smoothing and normalization. For classification purposes, they employed
a combination of MLP, SVM, and RF techniques. This approach yielded commendable
results with an accuracy of 94.00%, sensitivity at 86.00%, and specificity reaching 95.00%.
Notably, their analysis identified MLP as the most effective model in their study.
Within the corpus of 20 studies dedicated to the realm of colorectal cancer detection,
researchers have deployed an array of diverse preprocessing strategies encompassing
endocytoscopy, cropping, IPP, stain normalization, CEI, smoothing, normalization, filtering,
THN, DRR, augmentation, UM-SN, resizing, BGR2RGB, normalization, scaling, labeling,
RGBG, VTI, HOG, RGB to HSV, lightness space, edge preserving, and linear transforma-
tion. These sophisticated methodologies collectively served as the linchpin for optimizing
machine learning-based colorectal cancer detection, ushering in a new era of precision
and accuracy. However, it is notable that, within this comprehensive assessment, a small
group of studies chose to forgo the utilization of any specific preprocessing techniques.
This cluster includes the works of
(Bora et al., 2021) [49], (Fan et al., 2021) [50], and (Lo et al., 2023) [51]. Remarkably, these
studies defied conventional wisdom by attaining commendable accuracies that spanned
the spectrum from 94.00% to an impressive 99.44%. Such outcomes suggest that, in cases
where the dataset is inherently pristine and impeccably aligned with the demands of the
classification task, the impact of preprocessing techniques on the classifier’s performance
might indeed exhibit a marginal influence.
In the comprehensive analysis of the research studies under scrutiny, it is noteworthy
that only the works of (Grosu et al., 2021) [52] and (Ying et al., 2022) [46] registered
accuracy figures falling below the 90% threshold, specifically at 84.7% and 76%, respectively.
This observation underscores the intriguing possibility that traditional machine learning
models can indeed yield highly accurate cancer detection performance, provided they are
meticulously optimized.
Table 3. Summary of reviewed studies on colorectal cancer detection using traditional machine learning classifiers.

| Year | References | Preprocessing | Features | Techniques | Dataset | Samples | Train/Test | Modality | Metrics (%) |
|---|---|---|---|---|---|---|---|---|---|
| 2017 | [53] | Endocytoscopy | Texture, nuclei | SVM | Private | 5843 | 5643/200 | ENI | Acc 94.1; Sen 89.4; Spe 98.9; Pre 98.8; NPV 90.1 |
| 2019 | [54] | IPP | CSQ, Color histogram | WSVMCS | Private | 180 | 108/72 | H&E | Acc 96.0 |
| 2019 | [55] | Cropping | Biophysical characteristic, WLD | NB, MLP | OMIS data | 316 | 237/79 | OMIS | Acc 92.6; Sen 96.3; Spe 88.9 |
| 2021 | [56] | Filtering | HOS, FOS, GLCM, Gabor, WPT, LBP | ANN, RSVM | KCRC-16 | 5000 | 4550/450 | H&E | Acc 95.3 |
| 2021 | [57] | IPP, Augmentation | VGG-16 | MLP | KCRC-16 | 5000 | 4825/175 | H&E | Acc 99.0; Sen 96.0; Spe 99.0; Pre 96.0; NPV 99.0; F1 96.0 |
| 2021 | [50] | --- | AlexNet | EC, SVM, AlexNet | LC25000 | 10,000 | 4-fold cross-validation | H&E | Acc 99.4 |
| 2021 | [58] | THN, DRR | BmzP | NN | MALDI MSI | 559 | Leave-One-Out cross-validation | H&E | Acc 98.0; Sen 98.2; Spe 98.6 |
| 2021 | [52] | Filtering | Filters, Texture, GLHS, Shape | RF | Private | 287 | 169/77 | CT | Acc 84.7 *; Sen 82.0; Spe 85.0; AUC 91.0 |
| 2021 | [49] | --- | GFD, NSCT, Shape | MLP, LSSVM | Private | 734 | five-fold cross-validation | NBI, WLI | Acc 95.7; Sen 95.3; Spe 95.0; Pre 93.2; F1 90.5 |
| 2021 | [48] | Normalization, smoothing | Spatial Information | MLP, SVM, RF | Private | 54 | Leave-One-Out cross-validation | HSI | Acc 94.0; Sen 86.0; Spe 95.0 |
| 2022 | [59] | VTI method | Haralick, VTF | RF | Private | 63 | cross-validation | CT | Acc 92.2; Sen 88.4; Spe 96.0; AUC 96.2 |
| 2022 | [60] | RGBG | GLCM | ANN, RF, KNN | KCRC-16 | 5000 | 4500/500 | H&E | Acc 98.7; Sen 98.6; Spe 99.0; Pre 98.9 |
| 2022 | [45] | Resize, BGR2RGB, Normalization | Deep Features | EC, Hybrid, LR, LGB, MLP, RF, SVM, XGB, Voting | LC25000 | 2800 | 10-fold cross-validation | H&E | Acc 100.0 |
| 2022 | [46] | ROI | FOS, GLCM, GLDM, GLRLM, GLSZM, LoG, NGTDM, Shape, WT | MLR | Private | 276 | 194/82 | CECT | Acc 76.0; Sen 65.0; Spe 80.0; Pre 54.0; NPV 86.0 |
| 2022 | [61] | UM-SN | HIM, GLCM, Statistical | LDA, MLP, RF, SVM, XGB, LGB | LC25000 | 1000 | 900/100 | H&E | Acc 99.3; Sen 99.5; Pre 99.5; F1 99.5 |
| 2022 | [26] | --- | Color Spaces, Haralick | ANN, DT, KNN, QDA, SVM | KCRC-16 | 5000 | 3504/1496 | H&E | Acc 97.3; Sen 97.3; Spe 99.6; Pre 97.4 |
| 2023 | [62] | Filtering, linear transformation, normalization | Color characteristic, DBCM, SMOTE | CatBoost, DT, GNB, KNN, RF | NCT-CRC-HE-7K | 12,042 | 8429/3613 | H&E | Acc 90.7; Sen 97.6; Spe 97.4; Pre 90.6; Rec 90.5; F1 90.5 |
| 2023 | [51] | --- | Clinical, FEViT | SEKNN | Private | 1729 | tenfold cross-validation | ENI | Acc 94.0; Sen 74.0; Spe 98.0; AUC 93.0 |
| 2023 | [47] | RGB to HSV, lightness space | dResNet | DSVM | KCRC-16 | 5000 | 4000/1000 | H&E | Acc 98.8 |
| 2023 | [47] | RGB to HSV, lightness space | dResNet | DSVM | NCT-CRC-HE-100K | 100,000 | 80,003/19,997 | H&E | Acc 99.8 |
| 2023 | [63] | HOG, RGBG, Resizing | Morphological | SVM | Private | 540 | 420/120 | ENI | Acc 97.5 |

* Not given in the paper, calculated from the result table; bold font signifies the best model in the ‘Techniques’ column. Abbreviations: BGR2RGB, Blue-Green-Red to Red-Green-Blue; BmzP, Binning of m/z Points; CatBoost, Categorical Boosting; CECT, Contrast-Enhanced CT; CSQ, Color Space Quantization; DBCM, Differential Box Count Method; DSVM, Deep Support Vector Machine; dResNet, Dilated ResNet; DRR, Dynamic Range Reduction; ENI, Endomicroscopy Images; FEViT, Feature Ensemble Vision Transformer; FOS, First-Order Statistics; GFD, Generic Fourier Descriptor; GNB, Gaussian Naive Bayes; GLDM, Gray-Level Dependence Matrix; GLHS, Gray Level Histogram Statistics; GLSZM, Gray Level Size Zone Matrix; HOG, Histogram of Oriented Gradients; HOS, Higher-Order Statistics; HIM, Hu Invariant Moments; HSI, Hyperspectral Imaging; HSV, Hue-Saturation-Value; LBP, Local Binary Pattern; LDA, Linear Discriminant Analysis; LGB, Light Gradient Boosting; LoG, Laplacian of Gaussian; LSSVM, Least Square Support Vector Machine; MLR, Multivariate Logistic Regression; NGTDM, Neighboring Gray Tone Difference Matrix; NSCT, Non-Subsampled Contourlet Transform; OMIS, Optomagnetic Imaging Spectroscopy; QDA, Quadratic Discriminant Analysis; SEKNN, Subspace Ensemble K-Nearest Neighbor; THN, TopHat and Normalization; UM-SN, Unsharp Masking and Stain Normalization; VTF, Vector Texture Features; VTI, Vector Texture Images; WLD, Wavelength Difference; WLI, White Light Imaging; WPT, Wavelet Packet Transform; WSVMCS, Wavelet Kernel SVM with Color Histogram; XGB, Extreme Gradient Boosting.
The analysis of colorectal cancer detection using traditional machine learning techniques reveals a notable disparity in model performance across various crucial metrics, showcasing substantial discrepancies between the models with the highest and lowest values, as shown in Figure 10. The most proficient model achieved an extraordinary accuracy of 100.0%, whereas the least effective model achieved an accuracy of 76.0%, resulting in a substantial difference of 24.0%. When considering sensitivity, the top-performing model reached an impressive 99.5%, whereas the lowest-performing model registered a mere 65.0%, leading to a remarkable disparity of 34.5%. Similarly, concerning specificity, the superior model attained 99.6%, while the inferior model managed only 80.0%, resulting in a significant difference of 19.6%. In terms of precision, the best model demonstrated 99.5%, while the worst model exhibited a precision of only 54.0%, resulting in a substantial difference of 45.5%. When examining the F1-score, the model with the highest performance achieved 99.5%, whereas the least proficient model attained a score of 63.2%, yielding a notable difference of 36.3%. Lastly, in the case of the area under the curve (AUC), the top model achieved a score of 96.2%, while the bottom model scored 76.0%, marking a significant difference of 20.2%. These conspicuous differences underscore the pivotal role of choosing appropriate machine learning techniques and feature sets in the effectiveness of colorectal cancer detection. Effective cancer detection has far-reaching implications, influencing not only patient outcomes but also the operational efficiency of healthcare systems and the allocation of valuable medical resources.

Figure 10. Metrics comparison for the prediction of colorectal cancer.
detection study with a large dataset of 245,196 medical images. They used various prepro-
cessing techniques, including ROI selection, cropping, filtering, rotation, and disruption.
The study extracted features such as color histograms, LBP, and GLCM. For classifica-
tion, they applied RF and LSVM classifiers, achieving an accuracy of 85.99%. RF was the
best-performing model in their analysis. On the other hand, (Naser and Zeki 2021) [67]
conducted a stomach cancer detection study with a smaller dataset of only 30 medical
images. They applied DIFQ-based preprocessing techniques, and their study used FCM for
classification and achieved an accuracy of 85.00%. Table 4 provides an overview of different
machine learning-based techniques for stomach (gastric) cancer detection, encompassing
16 reviewed studies. Notably, three of these studies specifically, namely, (Korkmaz and
Esmeray 2018) [68], (Nayyar et al., 2021) [69], and (Hu et al., 2022a) [70], opted not to
employ any preprocessing techniques. Surprisingly, they achieved noteworthy accuracies
of 87.77%, 99.8%, and 85.24%, respectively. This demonstrates the potential for effective
stomach cancer detection even in the absence of preprocessing methods. However, it is
essential to highlight that a significant portion of the studies examined in the table chose
to implement various preprocessing techniques, including CEI, filtering, resizing, Fourier
transform, cropping, ROI selection, rotation, disruption, binarization, augmentation, and
RSA. These preprocessing steps underscore their pivotal role in enhancing the performance
of machine learning models for stomach cancer detection.
Out of the 16 studies focused on gastric cancer detection, 50% of them (8 studies)
achieved an accuracy rate of over 90%, indicating highly accurate results. However, the
other 50% of the studies received less than 90% accuracy. This discrepancy in performance
might be attributed to the utilization of private datasets in these studies. Private datasets
may not undergo the same level of processing or standardization as publicly available
datasets, potentially leading to variations in data quality and affecting the performance of
the machine learning models.
Table 4. Summary of reviewed studies on gastric cancer detection using traditional machine learning classifiers.

| Year | References | Preprocessing | Features | Techniques | Dataset | Samples | Train/Test | Modality | Metrics (%) |
|---|---|---|---|---|---|---|---|---|---|
| 2018 | [71] | Fourier transform | BRISK, SURF, MSER | DT, DA | Private | 180 | 90/90 | H&E | Acc 86.7 |
| 2018 | [72] | Resizing | LBP, HOG | ANN, RF | Private | 180 | 90/90 | H&E | Acc 100.0 |
| 2021 | [65] | Filtering, ROI | LoG, WT, GLDM, GLRLM | GBM, DT, RF, LR, SVM | Private | 159 | Leave-One-Out cross-validation | CT | Acc 71.2; Sen 43.1; Spe 87.1; Pre 65.8 |
| 2022 | [77] | Augmentation, resizing, filtering | InceptionNet, VGGNet | SVM, RF, KNN | HKD | 10,662 (47,398 augmented) | 37,788/9610 | Endoscopy | Acc 98.0; Sen 100; Pre 100; F1 100; MCC 97.8 |
| 2022 | [70] | --- | GLCM, LBP, HOG, histogram, luminance, color histogram | NSVM, LSVM, LR, NB, RF, ANN, KNN | GasHisSDB | 245,196 | 196,157/49,039 | H&E | Acc 85.2; Sen 84.9 #; Pre 84.6 #; Spe 84.9 #; F1 84.8 # |
| 2022 | [64] | Binarization, CEI, filtering, resizing | VGG19, Alexnet | Bagged Tree, Coarse Tree, CSVM, CKNN, DT, Fine Tree, KNN, NB | Private | 2590 | 10-fold cross-validation | EUS | Acc 99.8; Sen 99.8; Pre 99.8; F1 99.8; AUC 100 |
| 2022 | [66] | Cropping, disruption, filtering, ROI, rotation | Color histogram, GLCM, LBP | LSVM, RF | GasHisSDB | 245,196 | 196,157/49,039 | H&E | Acc 85.9; Sen 86.2 #; Spe 86.2 #; Pre 85.7 #; F1 85.9 # |
| 2023 | [78] | Augmentation, CEI | MobileNet-V2 | Bayesian, CSVM, LSVM, QSVM, Softmax | KV2D | 4854 | 10-fold cross-validation | Endoscopy | Acc 96.4; Pre 97.6; Sen 93.0; F1 95.2 |
| 2023 | [79] | RSA | RSF | PLS-DA, LOO, SVM | Private | 450 | Leave-One-Out cross-validation | H&E | Acc 94.8; Sen 91.0; Spe 100; AUC 95.8 |
# Calculated by averaging the normal and abnormal classes; techniques in bold represent the best model.
Abbreviations: BRISK, Binary Robust Invariant Scalable Keypoints; CKNN, Cosine K-Nearest Neighbor; CSVM,
Cubic SVM; DA, Discriminant Analysis; DIFQ, Dividing an Image into Four Quarters; FCM, Fuzzy C-Means;
GGF, Global Graph Features; HOG, Histogram of Oriented Gradients; HTSS, Hybrid Tumor Segmentation; KMC,
K-Means Clustering; LOO, Leave-One-Out; LSVM, Linear Support Vector Machine; MSER, Maximally Stable
Extremal Regions; NSVM, Non-Linear Support Vector Machine; OAT, Otsu Adaptive Thresholding; PLS-DA,
Partial Least-Squares Discriminant Analysis; QSVM, Quadratic SVM; RSA, Raman Spectral Analysis; RSF,
Raman Spectral Feature; SMI, Seven Moments Invariants; SURF, Speeded Up Robust Features; TSS, Tumor
Scattered Signal.
The analysis of gastric cancer detection reveals substantial variations in model perfor-
mance across key metrics, with significant differences observed between the highest and
lowest values as shown in Figure 11. Accuracy (Acc) showcased a noteworthy contrast,
with the best-performing model achieving a flawless 100.00% and the least effective model
scoring 71.20%. This substantial 28.80% difference underscores the pivotal role of model
selection in achieving accurate gastric cancer detection. Sensitivity (Sen) displayed a con-
siderable gap, with the top model achieving a perfect 100.00%, while the lowest model only
reached 43.10%. This marked difference of 56.90% emphasizes the necessity of sensitive
detection techniques in identifying gastric cancer. Similarly, specificity (Spe) followed suit,
with the highest model reaching 100.00% and the lowest model achieving 68.10%. The
substantial 31.90% difference highlights the importance of correctly identifying non-cancer
cases in diagnostic accuracy. Precision (Pre) also exhibited a significant disparity, with
the best model achieving 100.00%, and the least effective model achieving 65.80%. The
difference of 34.20% underscores the significance of precise identification of gastric cancer
cases. It is noteworthy that the negative predictive value (NPV) remained constant at
50.00% for both the highest and lowest models, signifying that neither model excelled in
ruling out non-cancer cases. However, since NPV is only used in a single article, its impact
on the overall analysis may be limited.
Additionally, the F1-score showed a substantial difference, with the top model achiev-
ing a perfect 100.00%, while the lowest model reached 84.80%. The 15.20% difference
emphasizes the balance between precision and sensitivity in gastric cancer detection. Lastly,
in terms of the area under the curve (AUC), the best model achieved a near-perfect 100.00%,
while the lowest model attained a still impressive 95.80%. The modest 4.20% difference
indicates that both models performed well in distinguishing between gastric cancer and
non-cancer cases. It is also worth noting that the AUC metric was utilized in only three
articles, and the differences in AUC were relatively modest. Therefore, the impact of
AUC on the overall analysis may be less generalized. These findings underscore the
critical role of model choice and feature selection in the effective detection of gastric
cancer. Accurate and sensitive diagnostic tools are crucial for improving patient
outcomes and optimizing healthcare resources. While NPV and AUC may have a limited
impact in this context due to their restricted usage, the other metrics highlight the
significance of selecting appropriate models for reliable gastric cancer detection.
Figure 11. Metrics comparison (Acc, Sen, Spe, Pre, NPV, F1 score, AUC) for the prediction of gastric cancer.
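For reference, every threshold-based metric compared above is derived from the four confusion-matrix counts; the short Python sketch below restates the standard definitions. The example counts are purely illustrative, and AUC is excluded because it is computed from the full ROC curve rather than a single operating point.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)   # accuracy
    sen = tp / (tp + fn)                    # sensitivity (recall)
    spe = tn / (tn + fp)                    # specificity
    pre = tp / (tp + fp)                    # precision (PPV)
    npv = tn / (tn + fn)                    # negative predictive value
    f1 = 2 * pre * sen / (pre + sen)        # harmonic mean of Pre and Sen
    return {"Acc": acc, "Sen": sen, "Spe": spe,
            "Pre": pre, "NPV": npv, "F1": f1}

print(confusion_metrics(tp=85, fp=15, tn=90, fn=10))  # illustrative counts only
```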
is a testament to the adaptability and robustness of our approach, as it allows for the
incorporation of diverse data sources to enrich the depth and scope of our analysis. At the
crux of our methodology lies the preprocessing phase, an instrumental step that sets the
stage for the rigorous examination of input images. Within this phase, we meticulously
execute four pivotal steps: Image Enhancement, Pixel Enhancement, RGB-to-Gray Con-
version, and Image Segmentation. These sequential operations are not arbitrary but have
been thoughtfully selected and implemented to systematically prepare the input images.
Their collective objective is to optimize the images, ensuring they are in a suitable form
for efficient feature extraction and subsequent in-depth analysis. Feature engineering is
where our approach is most distinctive. Instead of relying on a single type of feature, we
merge two categories: deep learning-based features, often referred to as "deep features",
and a varied set of handcrafted features, including Discrete Wavelet Transform (DWT),
Gray Level Co-occurrence Matrix (GLCM), Local Binary Pattern (LBP), texture, and Gray
Level Size Zone Matrix (GLSZM) descriptors. This fusion is a deliberate effort to enhance
the robustness and comprehensiveness of the analysis: it ensures that the model captures
both the intricate, high-level representations learned by deep networks and handcrafted
features tailored to specific aspects of tumor characteristics. Incorporating these different
feature types makes the model versatile, able to identify patterns in the data that may not
be discernible with any single feature type. Through this approach, we aim to enhance the
model’s ability to interpret and understand the complex information contained within
medical images. This, in turn, contributes to the accuracy and efficiency of colorectal cancer
detection. Furthermore, it enables our model to adapt and excel in different scenarios and
datasets, making it a powerful tool for healthcare professionals and researchers working in
the field of cancer detection.
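A minimal sketch of this fusion strategy is given below: it concatenates GLCM, LBP, and DWT descriptors with a precomputed deep-feature vector. It assumes scikit-image (>= 0.19) and PyWavelets, uses illustrative parameter choices, and omits GLSZM, which typically requires a dedicated radiomics library such as pyradiomics.

```python
import numpy as np
import pywt                                   # PyWavelets, for the DWT features
from skimage.feature import graycomatrix, graycoprops, local_binary_pattern

def handcrafted_features(gray):
    """GLCM, LBP, and DWT descriptors for one grayscale uint8 image."""
    # GLCM texture properties
    glcm = graycomatrix(gray, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    glcm_feats = [graycoprops(glcm, p)[0, 0]
                  for p in ("contrast", "homogeneity", "energy", "correlation")]
    # LBP histogram (uniform patterns take values 0..9 for P=8)
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    # One-level Haar DWT sub-band energies
    cA, (cH, cV, cD) = pywt.dwt2(gray.astype(float), "haar")
    dwt_feats = [np.mean(np.abs(c)) for c in (cA, cH, cV, cD)]
    return np.concatenate([glcm_feats, lbp_hist, dwt_feats])

def fuse(deep_vec, gray):
    """Concatenate CNN 'deep features' with the handcrafted descriptors."""
    return np.concatenate([deep_vec, handcrafted_features(gray)])
```

In practice, the deep-feature vector would come from the penultimate layer of a pretrained CNN, and the fused vector would then pass through the feature selection and optimization stage before classification.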
Figure 12. Proposed architectural flow diagram for the detection of colorectal cancer using traditional machine learning models from an imaging database.
Moving forward in the workflow, we encounter the crucial stages of feature selection and
optimization. This process serves a dual role: it reduces feature redundancy while
improving overall model performance by focusing on the most distinctive attributes.
Our model evaluation process is underpinned by a rigorous data-partitioning strategy,
effectively splitting the dataset into training and testing subsets. The training dataset
undergoes additional scrutiny through a k-fold cross-validation approach, fortifying the
model’s training and facilitating a robust performance assessment. This approach not
only guards against overfitting but also assesses the model’s adaptability to various data
scenarios. The test dataset becomes the arena for predicting colorectal cancer, with the
cubic support vector machine (SVM) taking the lead in this classification task. The SVM is a
formidable presence among traditional machine learning classifiers, known for its prowess
in handling high-dimensional data and executing binary classification tasks, making it
ideally suited for the intricacies of cancer detection. In summary, our proposed model
architecture harmoniously integrates advanced image preprocessing techniques, innovative
feature-engineering methodologies, and the proven machinery of a traditional machine
learning classifier. This synthesis yields an efficient and accurate framework for colorectal
cancer detection. Pending further validation and testing on diverse datasets, this approach
has the potential to revolutionize early cancer detection and diagnosis, potentially leading
to improved patient outcomes and a transformation in healthcare effectiveness.
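As a sketch of this evaluation protocol, under the assumption that the "cubic SVM" corresponds to a degree-3 polynomial kernel, the example below pairs scikit-learn's SVC with a stratified train/test split and 10-fold cross-validation on the training portion; the synthetic data merely stands in for the fused feature matrix and labels.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for the fused (deep + handcrafted) feature matrix and binary labels.
X, y = make_classification(n_samples=500, n_features=50, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# "Cubic SVM": an SVC with a degree-3 polynomial kernel, after standardization.
model = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3, C=1.0))

# k-fold cross-validation on the training split guards against overfitting.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X_train, y_train, cv=cv)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

model.fit(X_train, y_train)              # final fit on the full training split
print(f"Held-out test accuracy: {model.score(X_test, y_test):.3f}")
```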
Figure 13. Proposed architectural flow diagram for the detection of stomach cancer using traditional machine learning models from an imaging dataset.
to avoid overfitting. Future research should focus on refining model selection strategies to
enhance the robustness of cancer detection techniques and improve diagnostic accuracy.
5. Conclusions
In this manuscript, a thorough review and analysis of colorectal and gastric cancer de-
tection using traditional machine learning techniques are presented. We have meticulously
scrutinized 36 research papers published between 2017 and 2023, specifically focusing on
the domain of medical imaging datasets for detecting these types of cancers. Mathematical
formulations elucidating frequently employed preprocessing techniques, feature extraction
methods, traditional machine learning classifiers, and assessment metrics are provided.
These formulations offer valuable guidance to researchers when selecting the most suitable
techniques for their cancer detection studies. To conduct this analysis, a range of criteria
such as publication year, preprocessing methods, dataset particulars, image quantities,
modality, techniques, best models, and metrics (%) were considered. An extensive array
of metrics was employed to evaluate model performance comprehensively. Notably, the
study delves into the highest and lowest metric values and their disparities, highlighting
opportunities for enhancement. Remarkably, the highest reported value for every metric
reached 100%, while the lowest value, a sensitivity of 43.10%, was recorded for gastric
cancer detection. This underscores the potential of traditional ML classifiers, while
indicating areas for further refinement. Drawing from these insights, we present a proposed
(optimized) methodology for both colorectal and gastric cancer detection, aiding in the
selection of an optimized approach for future cancer detection research. The manuscript
concludes by delineating key findings and challenges that offer valuable directions for
future research endeavors.
In our future research endeavors, we plan to implement the proposed optimized
methodology for the detection of colorectal and gastric cancer within the specified exper-
imental framework. This proactive approach aligns with our commitment to enhancing
the effectiveness of cancer detection methodologies. Furthermore, we will conscientiously
incorporate and address the challenges and limitations identified in this study, ensuring a
comprehensive and iterative improvement in our investigative efforts.
Author Contributions: Original Draft Preparation: H.M.R.; Review and Editing: H.M.R.; Visualiza-
tion: H.M.R.; Supervision: J.Y.; Project Administration: J.Y.; Funding Acquisition: J.Y. All authors
have read and agreed to the published version of the manuscript.
Funding: This work was supported by the National Research Foundation of Korea (NRF) Grant
funded by the Korea government (MSIT) (NRF-2021R1F1A1063640).
Data Availability Statement: Data sharing is not applicable to this article as no datasets were
generated or analyzed during the current study.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Faguet, G.B. A brief history of cancer: Age-old milestones underlying our current knowledge database. Int. J. Cancer 2014,
136, 2022–2036. [CrossRef] [PubMed]
2. Afrash, M.R.; Shafiee, M.; Kazemi-Arpanahi, H. Establishing machine learning models to predict the early risk of gastric cancer
based on lifestyle factors. BMC Gastroenterol. 2023, 23, 6. [CrossRef] [PubMed]
3. Kumar, Y.; Gupta, S.; Singla, R.; Hu, Y.-C. A systematic review of artificial intelligence techniques in cancer prediction and
diagnosis. Arch. Comput. Methods Eng. 2021, 29, 2043–2070. [CrossRef] [PubMed]
4. Nguon, L.S.; Seo, K.; Lim, J.-H.; Song, T.-J.; Cho, S.-H.; Park, J.-S.; Park, S. Deep learning-based differentiation between mucinous
cystic neoplasm and serous cystic neoplasm in the pancreas using endoscopic ultrasonography. Diagnostics 2021, 11, 1052.
[CrossRef] [PubMed]
5. Kim, S.H.; Hong, S.J. Current status of image-enhanced endoscopy for early identification of esophageal neoplasms. Clin. Endosc.
2021, 54, 464–476. [CrossRef] [PubMed]
6. NCI. What Is Cancer?—NCI. National Cancer Institute. Available online: https://www.cancer.gov/about-cancer/understanding/
what-is-cancer (accessed on 9 June 2023).
7. Zhi, J.; Sun, J.; Wang, Z.; Ding, W. Support vector machine classifier for prediction of the metastasis of colorectal cancer. Int. J.
Mol. Med. 2018, 41, 1419–1426. [CrossRef] [PubMed]
8. Zhou, H.; Dong, D.; Chen, B.; Fang, M.; Cheng, Y.; Gan, Y.; Zhang, R.; Zhang, L.; Zang, Y.; Liu, Z.; et al. Diagnosis of Distant
Metastasis of Lung Cancer: Based on Clinical and Radiomic Features. Transl. Oncol. 2017, 11, 31–36. [CrossRef] [PubMed]
9. Levine, A.B.; Schlosser, C.; Grewal, J.; Coope, R.; Jones, S.J.; Yip, S. Rise of the Machines: Advances in Deep Learning for Cancer
Diagnosis. Trends Cancer 2019, 5, 157–169. [CrossRef] [PubMed]
10. Huang, S.; Yang, J.; Fong, S.; Zhao, Q. Artificial intelligence in cancer diagnosis and prognosis: Opportunities and challenges.
Cancer Lett. 2019, 471, 61–71. [CrossRef]
11. Saba, T. Recent advancement in cancer detection using machine learning: Systematic survey of decades, comparisons and
challenges. J. Infect. Public Health 2020, 13, 1274–1289. [CrossRef]
12. Shah, B.; Alsadoon, A.; Prasad, P.; Al-Naymat, G.; Beg, A. DPV: A taxonomy for utilizing deep learning as a prediction technique
for various types of cancers detection. Multimed. Tools Appl. 2021, 80, 21339–21361. [CrossRef]
13. Majumder, A.; Sen, D. Artificial intelligence in cancer diagnostics and therapy: Current perspectives. Indian J. Cancer 2021,
58, 481–492. [CrossRef] [PubMed]
14. Bin Tufail, A.; Ma, Y.-K.; Kaabar, M.K.A.; Martínez, F.; Junejo, A.R.; Ullah, I.; Khan, R. Deep Learning in Cancer Diagnosis and
Prognosis Prediction: A Minireview on Challenges, Recent Trends, and Future Directions. Comput. Math. Methods Med. 2021,
2021, 9025470. [CrossRef] [PubMed]
15. Kumar, G.; Alqahtani, H. Deep Learning-Based Cancer Detection-Recent Developments, Trend and Challenges. Comput. Model.
Eng. Sci. 2022, 130, 1271–1307. [CrossRef]
16. Painuli, D.; Bhardwaj, S.; Köse, U. Recent advancement in cancer diagnosis using machine learning and deep learning techniques:
A comprehensive review. Comput. Biol. Med. 2022, 146, 105580. [CrossRef] [PubMed]
17. Rai, H.M. Cancer detection and segmentation using machine learning and deep learning techniques: A review. Multimed. Tools
Appl. 2023, 1–35. [CrossRef]
18. Maurya, S.; Tiwari, S.; Mothukuri, M.C.; Tangeda, C.M.; Nandigam, R.N.S.; Addagiri, D.C. A review on recent developments in
cancer detection using Machine Learning and Deep Learning models. Biomed. Signal Process. Control. 2023, 80, 104398. [CrossRef]
19. Mokoatle, M.; Marivate, V.; Mapiye, D.; Bornman, R.; Hayes, V.M. A review and comparative study of cancer detection using
machine learning: SBERT and SimCSE application. BMC Bioinform. 2023, 24, 112. [CrossRef]
20. Rai, H.M.; Yoo, J. A comprehensive analysis of recent advancements in cancer detection using machine learning and deep learning
models for improved diagnostics. J. Cancer Res. Clin. Oncol. 2023, 149, 14365–14408. [CrossRef]
21. Ullah, A.; Chen, W.; Khan, M.A. A new variational approach for restoring images with multiplicative noise. Comput. Math. Appl.
2016, 71, 2034–2050. [CrossRef]
22. Azmi, K.Z.M.; Ghani, A.S.A.; Yusof, Z.M.; Ibrahim, Z. Natural-based underwater image color enhancement through fusion of
swarm-intelligence algorithm. Appl. Soft Comput. 2019, 85, 105810. [CrossRef]
23. Alruwaili, M.; Gupta, L. A statistical adaptive algorithm for dust image enhancement and restoration. In Proceedings of
the 2015 IEEE International Conference on Electro/Information Technology (EIT), Dekalb, IL, USA, 21–23 May 2015; IEEE:
Piscataway, NJ, USA, 2015; pp. 286–289.
24. Cai, J.-H.; He, Y.; Zhong, X.-L.; Lei, H.; Wang, F.; Luo, G.-H.; Zhao, H.; Liu, J.-C. Magnetic Resonance Texture Analysis in
Alzheimer’s disease. Acad. Radiol. 2020, 27, 1774–1783. [CrossRef]
25. Chandrasekhara, S.P.R.; Kabadi, M.G.; Srivinay, S. Wearable IoT based diagnosis of prostate cancer using GLCM-multiclass SVM
and SIFT-multiclass SVM feature extraction strategies. Int. J. Pervasive Comput. Commun. 2021. ahead-of-print. [CrossRef]
26. Alqudah, A.M.; Alqudah, A. Improving machine learning recognition of colorectal cancer using 3D GLCM applied to different
color spaces. Multimed. Tools Appl. 2022, 81, 10839–10860. [CrossRef]
27. Vallabhaneni, R.B.; Rajesh, V. Brain tumour detection using mean shift clustering and GLCM features with edge adaptive total
variation denoising technique. Alex. Eng. J. 2018, 57, 2387–2392. [CrossRef]
28. Rego, C.H.Q.; França-Silva, F.; Gomes-Junior, F.G.; de Moraes, M.H.D.; de Medeiros, A.D.; da Silva, C.B. Using Multispectral
Imaging for Detecting Seed-Borne Fungi in Cowpea. Agriculture 2020, 10, 361. [CrossRef]
29. Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27. [CrossRef]
30. Callen, J.L.; Segal, D. An Analytical and Empirical Measure of the Degree of Conditional Conservatism. J. Account. Audit. Financ.
2013, 28, 215–242. [CrossRef]
31. Weinberger, K. Lecture 2: K-Nearest Neighbors. Available online: https://www.cs.cornell.edu/courses/cs4780/2017sp/lectures/
lecturenote02_kNN.html (accessed on 12 November 2023).
32. Weinberger, K. Lecture 3: The Perceptron. Available online: https://www.cs.cornell.edu/courses/cs4780/2017sp/lectures/
lecturenote03.html (accessed on 12 November 2023).
33. Watt, J.; Borhani, R.; Katsaggelos, A.K. Machine Learning Refined; Cambridge University Press (CUP): Cambridge, UK, 2020;
ISBN 9781107123526.
34. Watt, R.B.J. 13.1 Multi-Layer Perceptrons (MLPs). Available online: https://kenndanielso.github.io/mlrefined/blog_posts/13
_Multilayer_perceptrons/13_1_Multi_layer_perceptrons.html (accessed on 12 November 2023).
35. Weinberger, K. Lecture 9: SVM. Available online: https://www.cs.cornell.edu/courses/cs4780/2017sp/lectures/lecturenote09.
html (accessed on 13 November 2023).
36. Balas, V.E.; Mastorakis, N.E.; Popescu, M.-C.; Balas, V.E. Multilayer Perceptron and Neural Networks. 2009. Available online:
https://www.researchgate.net/publication/228340819 (accessed on 18 September 2023).
37. Murphy, K.P. Machine Learning: A Probabilistic Perspective; MIT Press: Cambridge, MA, USA, 2012.
38. Islam, U.; Al-Atawi, A.; Alwageed, H.S.; Ahsan, M.; Awwad, F.A.; Abonazel, M.R. Real-Time Detection Schemes for Memory DoS
(M-DoS) Attacks on Cloud Computing Applications. IEEE Access 2023, 11, 74641–74656. [CrossRef]
39. Houshmand, M.; Hosseini-Khayat, S.; Wilde, M.M. Minimal-Memory, Noncatastrophic, Polynomial-Depth Quantum Convolu-
tional Encoders. IEEE Trans. Inf. Theory 2012, 59, 1198–1210. [CrossRef]
40. Bagging. Available online: https://www.cs.cornell.edu/courses/cs4780/2017sp/lectures/lecturenote18.html (accessed on
13 November 2023).
41. Boosting. Available online: https://www.cs.cornell.edu/courses/cs4780/2017sp/lectures/lecturenote19.html (accessed on
13 November 2023).
42. Dewangan, S.; Rao, R.S.; Mishra, A.; Gupta, M. Code Smell Detection Using Ensemble Machine Learning Algorithms. Appl. Sci.
2022, 12, 10321. [CrossRef]
43. Tharwat, A. Classification assessment methods. Appl. Comput. Inform. 2018, 17, 168–192. [CrossRef]
44. Leem, S.; Oh, J.; So, D.; Moon, J. Towards Data-Driven Decision-Making in the Korean Film Industry: An XAI Model for Box
Office Analysis Using Dimension Reduction, Clustering, and Classification. Entropy 2023, 25, 571. [CrossRef] [PubMed]
45. Talukder, A.; Islam, M.; Uddin, A.; Akhter, A.; Hasan, K.F.; Moni, M.A. Machine learning-based lung and colon cancer detection
using deep feature extraction and ensemble learning. Expert Syst. Appl. 2022, 205, 117695. [CrossRef]
46. Ying, M.; Pan, J.; Lu, G.; Zhou, S.; Fu, J.; Wang, Q.; Wang, L.; Hu, B.; Wei, Y.; Shen, J. Development and validation of a radiomics-
based nomogram for the preoperative prediction of microsatellite instability in colorectal cancer. BMC Cancer 2022, 22, 524.
[CrossRef]
47. Fadafen, M.K.; Rezaee, K. Ensemble-based multi-tissue classification approach of colorectal cancer histology images using a novel
hybrid deep learning framework. Sci. Rep. 2023, 13, 8823. [CrossRef]
48. Jansen-Winkeln, B.; Barberio, M.; Chalopin, C.; Schierle, K.; Diana, M.; Köhler, H.; Gockel, I.; Maktabi, M. Feedforward artificial
neural network-based colorectal cancer detection using hyperspectral imaging: A step towards automatic optical biopsy. Cancers
2021, 13, 967. [CrossRef]
49. Bora, K.; Bhuyan, M.K.; Kasugai, K.; Mallik, S.; Zhao, Z. Computational learning of features for automated colonic polyp
classification. Sci. Rep. 2021, 11, 4347. [CrossRef]
50. Fan, J.; Lee, J.; Lee, Y. A Transfer learning architecture based on a support vector machine for histopathology image classification.
Appl. Sci. 2021, 11, 6380. [CrossRef]
51. Lo, C.-M.; Yang, Y.-W.; Lin, J.-K.; Lin, T.-C.; Chen, W.-S.; Yang, S.-H.; Chang, S.-C.; Wang, H.-S.; Lan, Y.-T.; Lin, H.-H.; et al.
Modeling the survival of colorectal cancer patients based on colonoscopic features in a feature ensemble vision transformer.
Comput. Med. Imaging Graph. 2023, 107, 102242. [CrossRef] [PubMed]
52. Grosu, S.; Wesp, P.; Graser, A.; Maurus, S.; Schulz, C.; Knösel, T.; Cyran, C.C.; Ricke, J.; Ingrisch, M.; Kazmierczak, P.M. Machine
learning–based differentiation of benign and premalignant colorectal polyps detected with CT colonography in an asymptomatic
screening population: A proof-of-concept study. Radiology 2021, 299, 326–335. [CrossRef]
53. Takeda, K.; Kudo, S.-E.; Mori, Y.; Misawa, M.; Kudo, T.; Wakamura, K.; Katagiri, A.; Baba, T.; Hidaka, E.; Ishida, F.; et al. Accuracy
of diagnosing invasive colorectal cancer using computer-aided endocytoscopy. Endoscopy 2017, 49, 798–802. [CrossRef] [PubMed]
54. Yang, K.; Zhou, B.; Yi, F.; Chen, Y.; Chen, Y. Colorectal Cancer Diagnostic Algorithm Based on Sub-Patch Weight Color Histogram
in Combination of Improved Least Squares Support Vector Machine for Pathological Image. J. Med. Syst. 2019, 43, 306. [CrossRef]
[PubMed]
55. Dragicevic, A.; Matija, L.; Krivokapic, Z.; Dimitrijevic, I.; Baros, M.; Koruga, D. Classification of Healthy and Cancer States of
Colon Epithelial Tissues Using Opto-magnetic Imaging Spectroscopy. J. Med. Biol. Eng. 2018, 39, 367–380. [CrossRef]
56. Trivizakis, E.; Ioannidis, G.S.; Souglakos, I.; Karantanas, A.H.; Tzardi, M.; Marias, K. A neural pathomics framework for classifying
colorectal cancer histopathology images based on wavelet multi-scale texture analysis. Sci. Rep. 2021, 11, 15546. [CrossRef]
57. Damkliang, K.; Wongsirichot, T.; Thongsuksai, P. Tissue classification for colorectal cancer utilizing techniques of deep learning
and machine learning. Biomed. Eng. Appl. Basis Commun. 2021, 33, 2150022. [CrossRef]
58. Mittal, P.; Condina, M.R.; Klingler-Hoffmann, M.; Kaur, G.; Oehler, M.K.; Sieber, O.M.; Palmieri, M.; Kommoss, S.; Brucker, S.;
McDonnell, M.D.; et al. Cancer tissue classification using supervised machine learning applied to MALDI mass spectrometry
imaging. Cancers 2021, 13, 5388. [CrossRef]
59. Cao, W.; Pomeroy, M.J.; Liang, Z.; Abbasi, A.F.; Pickhardt, P.J.; Lu, H. Vector textures derived from higher order derivative
domains for classification of colorectal polyps. Vis. Comput. Ind. Biomed. Art 2022, 5, 16. [CrossRef]
60. Deif, M.A.; Attar, H.; Amer, A.; Issa, H.; Khosravi, M.R.; Solyman, A.A.A. A New Feature Selection Method Based on Hybrid
Approach for Colorectal Cancer Histology Classification. Wirel. Commun. Mob. Comput. 2022, 2022, 7614264. [CrossRef]
61. Chehade, A.H.; Abdallah, N.; Marion, J.-M.; Oueidat, M.; Chauvet, P. Lung and colon cancer classification using medical imaging:
A feature engineering approach. Phys. Eng. Sci. Med. 2022, 45, 729–746. [CrossRef]
62. Tripathi, A.; Misra, A.; Kumar, K.; Chaurasia, B.K. Optimized Machine Learning for Classifying Colorectal Tissues. SN Comput.
Sci. 2023, 4, 461. [CrossRef]
63. Kara, O.C.; Venkatayogi, N.; Ikoma, N.; Alambeigi, F. A Reliable and Sensitive Framework for Simultaneous Type and Stage
Detection of Colorectal Cancer Polyps. Ann. Biomed. Eng. 2023, 51, 1499–1512. [CrossRef] [PubMed]
64. Ayyaz, M.S.; Lali, M.I.U.; Hussain, M.; Rauf, H.T.; Alouffi, B.; Alyami, H.; Wasti, S. Hybrid deep learning model for endoscopic
lesion detection and classification using endoscopy videos. Diagnostics 2021, 12, 43. [CrossRef] [PubMed]
65. Mirniaharikandehei, S.; Heidari, M.; Danala, G.; Lakshmivarahan, S.; Zheng, B. Applying a random projection algorithm to
optimize machine learning model for predicting peritoneal metastasis in gastric cancer patients using CT images. Comput.
Methods Programs Biomed. 2021, 200, 105937. [CrossRef]
66. Hu, W.; Li, C.; Li, X.; Rahaman, M.; Ma, J.; Zhang, Y.; Chen, H.; Liu, W.; Sun, C.; Yao, Y.; et al. GasHisSDB: A new gastric
histopathology image dataset for computer aided diagnosis of gastric cancer. Comput. Biol. Med. 2022, 142, 105207. [CrossRef]
[PubMed]
67. Naser, E.F.; Zeki, S.M. Using Fuzzy Clustering to Detect the Tumor Area in Stomach Medical Images. Baghdad Sci. J. 2021, 18, 1294.
[CrossRef]
68. Korkmaz, S.A.; Esmeray, F. A New Application Based on GPLVM, LMNN, and NCA for Early Detection of the Stomach Cancer.
Appl. Artif. Intell. 2018, 32, 541–557. [CrossRef]
69. Nayyar, Z.; Khan, M.A.; Alhussein, M.; Nazir, M.; Aurangzeb, K.; Nam, Y.; Kadry, S.; Haider, S.I. Gastric tract disease recognition
using optimized deep learning features. Comput. Mater. Contin. 2021, 68, 2041–2056. [CrossRef]
70. Hu, W.; Chen, H.; Liu, W.; Li, X.; Sun, H.; Huang, X.; Grzegorzek, M.; Li, C. A comparative study of gastric histopathology
sub-size image classification: From linear regression to visual transformer. Front. Med. 2022, 9, 1072109. [CrossRef]
71. Korkmaz, S.A. Recognition of the Gastric Molecular Image Based on Decision Tree and Discriminant Analysis Classifiers by using
Discrete Fourier Transform and Features. Appl. Artif. Intell. 2018, 32, 629–643. [CrossRef]
72. Korkmaz, S.A.; Binol, H. Classification of molecular structure images by using ANN, RF, LBP, HOG, and size reduction methods
for early stomach cancer detection. J. Mol. Struct. 2018, 1156, 255–263. [CrossRef]
73. Kanesaka, T.; Lee, T.-C.; Uedo, N.; Lin, K.-P.; Chen, H.-Z.; Lee, J.-Y.; Wang, H.-P.; Chang, H.-T. Computer-aided diagnosis for
identifying and delineating early gastric cancers in magnifying narrow-band imaging. Gastrointest. Endosc. 2018, 87, 1339–1344.
[CrossRef] [PubMed]
74. Feng, Q.-X.; Liu, C.; Qi, L.; Sun, S.-W.; Song, Y.; Yang, G.; Zhang, Y.-D.; Liu, X.-S. An Intelligent Clinical Decision Support System
for Preoperative Prediction of Lymph Node Metastasis in Gastric Cancer. J. Am. Coll. Radiol. 2019, 16, 952–960. [CrossRef]
75. Korkmaz, S.A. Classification of histopathological gastric images using a new method. Neural Comput. Appl. 2021, 33, 12007–12022.
[CrossRef]
76. Dai, H.; Bian, Y.; Wang, L.; Yang, J. Support Vector Machine-Based Backprojection Algorithm for Detection of Gastric Cancer
Lesions with Abdominal Endoscope Using Magnetic Resonance Imaging Images. Sci. Program. 2021, 2021, 9964203. [CrossRef]
77. Haile, M.B.; Salau, A.; Enyew, B.; Belay, A.J. Detection and classification of gastrointestinal disease using convolutional neural
network and SVM. Cogent Eng. 2022, 9, 2084878. [CrossRef]
78. Noor, M.N.; Nazir, M.; Khan, S.A.; Song, O.-Y.; Ashraf, I. Efficient Gastrointestinal Disease Classification Using Pretrained Deep
Convolutional Neural Network. Electronics 2023, 12, 1557. [CrossRef]
79. Yin, F.; Zhang, X.; Fan, A.; Liu, X.; Xu, J.; Ma, X.; Yang, L.; Su, H.; Xie, H.; Wang, X.; et al. A novel detection technology for early
gastric cancer based on Raman spectroscopy. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2023, 292, 122422. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.