FDGTH 05 1187578
1. Introduction

Ovarian cancer can arise from epithelial cells, granulosa cells, or germ cells, with epithelial histology accounting for more than 90% of all ovarian cancer. Epithelial ovarian cancer (EOC) (1) is a widespread gynecologic malignancy in industrialized and developing countries (2), with approximately 230,000 new cases and nearly 140,000 deaths per year (3). In 2020, the United States was expected to see 21,750 new cases and 13,940 deaths (4), while Europe experienced 29,000 deaths (5). According to the International Federation of Gynecology and Obstetrics (FIGO), only 30% of advanced-stage cancer patients live for nearly 5 years after the initial diagnosis (6, 7). Only 19% of ovarian cancer patients are diagnosed at an early stage due to the absence of robust and minimally invasive methods for early detection (8). Hence, advanced approaches for the early screening of ovarian cancer are necessary for proper medication and timely treatment. Regarding the genetic basis of cancer malignancy, microarray technology (9) has recently been one of the most widely used tools to evaluate the functions of genes in related patients. MicroRNAs (miRNAs) are short (18–25 nucleotides in length) non-coding RNAs that have emerged as important translational gene regulators in cancer cells (6). The screening models currently available are insufficient, and accurate non-invasive molecular biomarkers are urgently needed. Many studies have looked at the expression profiles of miRNAs in tissue and serum samples from ovarian cancer patients to identify appropriate biomarkers (10). Even though miRNAs are still insufficient for clinical applications in many studies, owing to the lack of large-scale validation and inconsistencies among diagnostic devices (11–13), they could support a new screening strategy that can differentiate cancerous from non-cancerous women. In addition, the comprehensive characteristics of circulating miRNAs enable us to produce optimal diagnostic models for ovarian cancer (11–14).

1.1. Related works

MicroRNA molecules can act as an important tool for the detection of ovarian cancer. Chung et al. (15) reported let-7b, miR-26a, miR-132, and miR-145 as potential biomarkers in ovarian cancer patients. In the study of Yuan et al. (16), hsa-miR-6784-5p, hsa-miR-6800-5p, and hsa-miR-5100 indicated an ovarian cancer-associated signature. Jeon et al. (17) reported that serum and tissue miR-1290 was significantly elevated in patients with epithelial ovarian cancer compared with patients with benign ovarian neoplasm. Chen et al. (18) reported a total of 19 miRNAs, identified by random forest models, that were important in cancer diagnosis. In that study, the top five miRNAs with the highest frequency were chosen as the biomarker candidates for cancer screening, among which hsa-miR-3184-5p achieved a high rank. Yaghoobi et al. (19) proposed a method called EBST that identified 11 serum miRNAs as potential biomarkers associated with ovarian cancer; among this miRNA set, hsa-miR-1228-5p and hsa-miR-6784-5p were also reported. Zhang et al. (20) reported a four-miRNA model that showed very strong performance, with AUCs > 0.95, in biliary tract, bladder, colorectal, esophageal, gastric, glioma, liver, ovarian, pancreatic, and prostate cancers. That study provides proof-of-concept data demonstrating that the four-miRNA (hsa-miR-5100, hsa-miR-1343-3p, hsa-miR-1290, and hsa-miR-4787-3p) model has the potential to be developed into a simple, inexpensive, and non-invasive blood test for the early detection of multiple cancers with high accuracy. Using statistical approaches, Hamidi et al. (21) identified 10 miRNAs that were differentially regulated in serum samples from ovarian cancer patients compared with non-cancer samples in the publicly available data set GSE106817: hsa-miR-5100, hsa-miR-6800-5p, hsa-miR-1233-5p, hsa-miR-4532, hsa-miR-4783-3p, hsa-miR-4787-3p, hsa-miR-1228-5p, hsa-miR-1290, hsa-miR-3184-5p, and hsa-miR-320b. However, the approach of the previous study (21) did not take into account the non-linear structure of big data; therefore, in this paper, we implement a machine-learning variable selection approach called Boruta to address this problem. We will show that the new method uncovers miRNAs that were not recognized by the traditional methods.

1.2. Novel contributions

It is important to note that the choice of feature selection (FS) method should be tailored to the specific characteristics of the data set and research question at hand. Gene expression data are the representation of non-linear interactions among genes (22). Through computational analysis of these data, we expect to gain knowledge of gene functions and disease mechanisms. Traditional statistical methods tend to identify only linear patterns, while non-linear patterns of relationships remain hidden. As reported in many studies (23–29), Boruta offers advantages in terms of feature selection accuracy, stability, and classification performance across different domains such as protein subcellular localization and credit risk assessment; however, it has rarely been applied before, especially to microarray data sets of ovarian cancer. This choice is supported by studies on the stability of Boruta (30–32) as a machine-learning method that can more accurately discover new miRNAs that remained hidden from statistical methods. Therefore, this work attempts an innovation in two important respects: the identification of new miRNAs based on complex non-linear structures and the comparison of the new results with the previous ones, which will be described in the results and discussion section.

2. Materials and methods

To identify a robust circulating miRNA biomarker, we searched the Gene Expression Omnibus (GEO) database with specific keywords, namely, (“ovarian neoplasms” [MeSH Terms] OR ovarian cancer [All Fields]) AND “Homo sapiens” [porgn] AND “MicroRNAs” [MeSH Terms] OR miRNA [All Fields]. Then, three data sets with larger sample sizes that used the same platform (3D-Gene Human miRNA V21_1.0.0), namely GSE106817, GSE113486, and GSE113740, were included (385 ovarian cancer patients and 3,026 non-cancer controls in total) for further analysis. The GSE106817 has 320 ovarian cancer patients with an average age of 52 years and 2,759 non-cancer controls that were used as the internal discovery data set, and the GSE113486 has 40 ovarian cancer patients and 52 non-cancer controls. The GSE113740 has 25 ovarian cancer patients, and 215 non-cancer
controls were used for independent validation data sets. This study was approved by the ethics committee of Tabriz University of Medical Sciences (no.: IR.TBZMED.REC.1400.006).

2.1. Study design and data set

We used the GSE106817, GSE113486, and GSE113740 data sets from the GEO database, which is available at https://www.ncbi.nlm.nih.gov/geo/. The GSE106817 data set, which started on 13 November 2017 in Kanagawa, Japan, comprises serum miRNA profiles of 4,046 women specimens and consists of 333 ovarian cancer patients, 2,759 non-cancer controls, and 976 patients with other types of cancer. The ovarian cancer patients in the GSE106817 data set had a mean age of 57 (±12) years, with 25% stage I, 10% stage II, 55% serous, 19% clear cell, and 13% endometrioid histology (33). Three microarray data sets totaling 6,835 unique participants, including 728 ovarian cancer patients and 3,892 non-cancer controls, were included in the current analysis, all derived from studies originating from a Japanese nationwide research project, “Development and Diagnostic Technology for Detection of miRNA in Body Fluids,” that is designed to characterize serum miRNAs in over 5,000 participants across several types of cancer using a standardized microarray platform. Supplementary Figure S1 shows the four main phases of this analysis: data pre-processing, identification of significant features or predictors, model building with classifier algorithms, and performance evaluation.

2.1.1. Participants and serum samples
The serum sample collection has been previously described in the original publications (33–35). Briefly, serum samples were collected from cancer patients who were referred or admitted to the National Cancer Center Hospital (NCCH) and stored at 4°C for 1 week before being stored at −20°C until further use. Cancer patients who were treated with preoperative chemotherapy and radiotherapy before serum collection were excluded. The serum samples for non-cancer controls who had no history of cancer and no hospitalization during the previous 3 months were collected along with routine blood tests from outpatient departments of three sources: NCCH, National Center for Geriatrics and Gerontology (NCGG) Biobank, and Yokohama Minoru Clinic (YMC). Sera collected from NCCH were stored in the same way as the sera from cancer patients, while those from NCGG and YMC were stored at −80°C until use. The original studies were approved by the NCCH Institutional Review Board, the Ethics and Conflict of Interest Committee of the NCGG, and the Research Ethics Committee of Medical Corporation Shintokai YMC. Written informed consent was obtained from each participant.

2.1.2. MiRNA microarray expression analysis

The details of the microarray analysis were described in the original publications (33–35). Briefly, total RNA was extracted from 300 µl of serum, labeled with the 3D-Gene® miRNA labeling kit, and hybridized to the 3D-Gene® Human miRNA Oligo Chip (Toray Industries, Kanagawa, Japan), which is designed to investigate 2,588 miRNA sequences registered in miRBase release 21 (http://www.mirbase.org/, accessed on 10 January 2022). The following low-quality samples were excluded: those with a coefficient of variation of negative control probes of >0.15 and those with >10 flagged probes identified by the 3D-Gene® Scanner as “uneven spot images.” A miRNA was considered present when its signal intensity was greater than the mean plus two times the standard deviation of the negative control signals, computed after removing the top and bottom 5% of the ranked negative control intensities. Background subtraction was performed by subtracting the mean of the negative control signals (after removing the top and bottom 5% as ranked by signal intensities) from the miRNA signal.

2.2. Machine learning

In cancer prediction models, statistical and machine-learning algorithms have been widely used, providing more accurate prognoses and lower per-patient costs. The high dimensionality of gene expression profiles is a crucial issue when building cancer-predictive models (36). Therefore, we used a machine-learning algorithm based on the random forest classifier, which is implemented in the Boruta package in R (37). In many studies involving miRNA expression data, Boruta has been used to identify important features (38); this could help in the development of biomarkers for cancer diagnosis and prognosis. In this study, we used these techniques to characterize miRNAs with biomarker potential that may be useful in the diagnosis and/or prognosis of this disease, potentially assisting public health (39).

2.3. Data cleaning and feature selection

We cleaned and normalized the data using the min-max normalization method (40). Since gene expression data sets contain too many features that are irrelevant for classification, feature selection was essential. Feature selection techniques can be used in data pre-processing to perform successful data reduction, which is beneficial for finding accurate data models (41). Feature selection techniques reduce over-fitting, reduce model complexity and thereby ease understanding, and allow models to be trained more quickly.

2.3.1. Boruta
Boruta is a wrapper-based feature selection algorithm that uses a random forest algorithm to iteratively delete the statistically irrelevant features. Boruta searches for all features that are either strongly or weakly relevant to the output variable (27). The Boruta algorithm selects features as follows:

(a) It adds randomness to the data set by making shuffled copies of all features (termed shadow features).
(b) Next, Boruta uses the data set for training a random forest classifier and uses a feature ranking measure (mean decrease accuracy, MDA) to estimate the relationship with each feature (higher mean value).
(c) It determines whether a real feature has a higher rank than the best of its shadow features on each iteration (in our analysis, 100) and excludes features that are considered extremely insignificant.
(d) The Boruta algorithm comes to a halt when all features have been confirmed or rejected.

This ultimately results in a subset of features that is ideal. Since this approach reduces the error of the random forest model, it classifies all features as either highly significant or unrelated (32, 42, 43). Boruta is used in such a way that the features selected are mostly correlated with the prediction variable.

In the process of identifying whether a feature is important or not, some features may be marked by Boruta as “Tentative.” Tentative attributes are decided as confirmed or rejected by comparing the median Z score of the attribute with the median Z score of the best shadow attribute.

2.4. Model building and potential miRNAs signature identification

We split the data using the CARET package into two parts: two-thirds of the data were used for model development or training, while the remaining one-third of the data were used to evaluate or validate the model.

2.4.1. Handling of imbalanced classes
In most cases, prediction algorithms train to predict the majority class (i.e., non-cancer), resulting in incorrect sensitivities and specificities (44). Instead, fixing the imbalance in the outcomes (i.e., lower cancer rates) in the training data usually leads to the creation of a better prediction model and a better trade-off between sensitivity and specificity (45). Oversampling the minority class and under-sampling the majority class are the most effective strategies for overcoming imbalanced outcomes (46). To balance the training sample in this article, we used SMOTE random oversampling (47).

2.4.2.1. Logistic regression
Logistic regression (LR) is used to investigate the relationship between multiple independent variables and a single binary (two-category) dependent variable. In cancer microarray data, which is a form of data set in which the outcome (cancer) is determined by the combined effect of many features (genes), logistic regression has a variety of uses. Logistic regression does not assume a linear relationship between the dependent and independent variables; instead, it relies on the binomial probability principle, which states that there are only two possible outcomes (50). The fit of a logistic regression model will be evaluated using the area under the curve (AUC) (51).

2.4.2.2. Decision trees
Decision trees (DTs) are a type of supervised machine learning that can be used to find attributes and extract patterns in big databases that are important for predictive modeling (46). The interpretability of the rendered model is a feature of decision tree modeling that distinguishes it from other techniques of pattern recognition. Decision trees are the most straightforward algorithm for producing a visual representation of the relationship between independent and dependent variables (52). DTs are easy to build, train, interpret, and explain. However, the variance of decision trees can, in some instances, be reduced by using random forests, which combine the outcomes of randomly generated decision trees to produce a stronger model.

2.4.2.3. Random forest
Random forest (RF) is a supervised ensemble learning algorithm that provides a unique combination of prediction accuracy and model interpretability among general machine-learning techniques (39). RFs are an instance of ensemble learning, in which a complex model is developed by combining numerous simple decision tree algorithms, resulting in lower variance than single decision trees. Random forest is a meta-classification approach that fits a number of sub-classifiers (DTs) on various subsets of a data set, and the averages from the decision trees are used to improve the accuracy of classification. The advantage of RFs is that they decrease over-fitting, thus improving accuracy. Random forests can be used to rate the importance of variables in a regression or classification problem (53).
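The shadow-feature comparison described in Section 2.3.1 builds on exactly this kind of importance ranking. The following is a minimal sketch in Python, not the Boruta implementation itself: the importance score (absolute gap between class means) is a deliberately simple stand-in for the random-forest mean decrease accuracy, and the names `importance`, `shadow_hits`, `informative`, and `noise` are hypothetical and chosen only for illustration.

```python
import random
import statistics


def importance(column, labels):
    """Stand-in importance score: absolute gap between the two class means.

    Assumption: Boruta itself ranks features by random-forest mean decrease
    accuracy; this simpler score only keeps the sketch self-contained.
    """
    cancer = [v for v, y in zip(column, labels) if y == 1]
    control = [v for v, y in zip(column, labels) if y == 0]
    return abs(statistics.mean(cancer) - statistics.mean(control))


def shadow_hits(features, labels, n_iter=100, seed=0):
    """Count how often each real feature outranks the best shadow feature."""
    rng = random.Random(seed)
    real_scores = {name: importance(col, labels) for name, col in features.items()}
    hits = {name: 0 for name in features}
    for _ in range(n_iter):
        # Shadow features: shuffled copies that keep each feature's value
        # distribution but break any link with the class labels.
        shadow_scores = []
        for col in features.values():
            shadow = list(col)
            rng.shuffle(shadow)
            shadow_scores.append(importance(shadow, labels))
        best_shadow = max(shadow_scores)
        for name, score in real_scores.items():
            if score > best_shadow:
                hits[name] += 1
    return hits


# Toy data (hypothetical): 20 controls (0) and 20 cancer samples (1).
labels = [0] * 20 + [1] * 20
features = {
    "informative": [0.1 * i for i in range(20)] + [5 + 0.1 * i for i in range(20)],
    "noise": [(7 * i) % 10 for i in range(40)],
}
hits = shadow_hits(features, labels)
print(hits)
```

A feature that rarely beats the best shadow is a candidate for rejection, while one that beats it in most iterations is a candidate for confirmation; the actual algorithm makes this decision with a statistical test on the hit counts rather than by inspection.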
scalable machine-learning system for tree boosting. The most significant component of the success of XGBoost is its scalability across all scenarios. XGB scalability is due to a number of major systems and algorithmic enhancements; parallel and distributed computing speeds up learning, allowing for more rapid model exploration. XGB also allows data scientists to process data by utilizing out-of-core processing (53).

2.5. Evaluation criteria

The validation technique is widely used to avoid over-fitting and to check the validity of the models. We evaluated our outcomes employing two external data sets, as shown in Supplementary Figure S1. The metrics utilized to assess the results of the classification models are expressed below:

$$\text{Accuracy: } \mathrm{ACC} = \frac{TP + TN}{TP + FP + TN + FN},$$

$$\text{Sensitivity: } \mathrm{SEN} = \frac{TP}{TP + FN},$$

$$\text{Specificity: } \mathrm{SPC} = \frac{TN}{TN + FP},$$

$$\text{Kappa: } \kappa = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)},$$

where:
1. TP (true positive) is the number of people who suffer from “cancer” among those who were diagnosed with “cancer.”
2. FP (false positive) is the number of people who are “non-cancerous” but were diagnosed as “cancerous.”
3. FN (false negative) is the number of people wrongly found to be “non-cancerous.”
4. TN (true negative) is the number of “non-cancerous” people who were correctly identified.
5. Pr(a) represents the observed agreement, and Pr(e) represents the chance agreement.

We tested classifier reliability using Kappa values, which reflect the agreement between observed and expected values (58); positive predictive value (PPV) and negative predictive value (NPV) were also obtained (59). The one-sided DeLong’s test was used to calculate the power for the ROC curves, which was done using the R package “pROC” (60).

3. Result

The data have 2,568 variables. In the initial variable selection stage by Boruta, 199 variables were selected in 29 min. The training set included 2,156 samples, while the testing set included 923 samples. The training set consisted of 1,932 non-cancerous samples and 224 cancerous samples. After balancing the training data, the non-cancerous and cancerous samples became 1,121 and 1,035, respectively. The data set with reduced features was classified using LR (statistical), DT and RF (tree-based), ANN, and XGB (machine learning) classifiers. After finding the more important features (in our study over 80%), as shown in Supplementary Table S1, we identified 10 potential miRNAs from the GSE106817 data set, namely hsa-miR-1290, hsa-miR-1233-5p, hsa-miR-1914-5p, hsa-miR-1469, hsa-miR-4675, hsa-miR-1228-5p, hsa-miR-3184-5p, hsa-miR-6784-5p, hsa-miR-6800-5p, and hsa-miR-5100, which were defined as the candidate miRNAs for ovarian cancer diagnosis. In Supplementary Table S2, we report the t-test results comparing cancer and non-cancerous samples; all of these miRNAs had significant P-values. Using the 10 selected miRNAs, the final machine-learning models with optimal hyperparameters are presented in Table 1.

3.1. Internal validation data set

As noted in the previous section, we found 10 miRNAs: hsa-miR-1290, hsa-miR-1233-5p, hsa-miR-1914-5p, hsa-miR-1469, hsa-miR-4675, hsa-miR-1228-5p, hsa-miR-3184-5p, hsa-miR-6784-5p, hsa-miR-6800-5p, and hsa-miR-5100. We fitted models on each miRNA separately to obtain its individual power to discriminate between cancer and non-cancerous samples. The AUC of each of these miRNAs is listed in Supplementary Table S1A. We observe that in the internal validation, all miRNAs have high AUC (minimum AUC: 86.0%; maximum AUC: 96.8%). The performance measures for the LR, DT, RF, ANN, and XGB models are shown in Supplementary Table S3A, which lists the accuracy, sensitivity, specificity, NPV, PPV, and Kappa for each model in the classification and prediction of ovarian cancer. Four models (LR, RF, ANN, and XGB) obtained an AUC of 99.9%; however, DT obtained 98% AUC. In detail, RF has the highest values of accuracy (99.13), specificity (99.51), PPV (95.83), and Kappa (95.35), and LR has high sensitivity (98.96) and NPV (99.88). Figure 1A illustrates the ROC curve for the proposed models of the 10 candidate miRNAs that are shown in Supplementary Table S1A. All models except DT have over 99.9% AUC. Figure 1B shows the individual AUCs of the 10 miRNAs in the internal data set: hsa-miR-5100 (93.7%), hsa-miR-6800-5p (97%), hsa-miR-6784-5p (94.2%), hsa-miR-3184-5p (94.2%), hsa-miR-1228-5p (95.6%), hsa-miR-4675 (95.4%), hsa-miR-1469 (96.7%), hsa-miR-1914-5p (96%), hsa-miR-1233-5p (97.7%), and hsa-miR-1290 (95.4%). In Supplementary Figure S2, we used boxplots to display the expression levels of these 10 candidate miRNAs in the cancer and non-cancer groups. In the boxplots, it is clear that four of the miRNAs (hsa-miR-1233-5p, hsa-miR-1914-5p, hsa-miR-4675, and hsa-miR-5100) have higher expression levels, with various cut-offs, for cancerous samples, and on average, four of them (hsa-miR-1228-5p, hsa-miR-3184-5p, hsa-miR-6784-5p, and hsa-miR-6800-5p) have lower expression levels for cancerous samples. We used heatmap plots by implementing the “heatmaply” package to underpin the potential relationships between features and the hierarchical clustering
TABLE 1 Hyperparameters and predictive power of models for ovarian cancer classification.

| Classifier | Hyperparameters | AUC^a (%) | Accuracy (%) | Sensitivity (%) | Specificity (%) | Negative predictive value (NPV) % | Positive predictive value (PPV) % |
|---|---|---|---|---|---|---|---|
| Logistic regression | Parameters^b | 99.77 | 100.0 | 100 | 100.0 | 100.0 | 100.0 |
| Decision trees | Cp = 0.01014493^c | 98.30 | 91.30 | 97.41 | 97.10 | 88.10 | 94.0 |
| Random forest | Mtry = 2^d | 100.0 | 96.74 | 99.55 | 100.0 | 94.55 | 100.0 |
| Artificial neural network | Size = 3^e and decay = 0.1^f | 99.93 | 100.0 | 98.84 | 98.74 | 100.0 | 100.0 |
| XGBoosting | nrounds = 50, max_depth = 2, eta = 0.4^g, gamma = 0^h, colsample_bytree^i = 0.6, min_child_weight^j = 1, and subsample = 0.75^k | 99.99 | 98.91 | 99.28 | 100.0 | 100.0 | 98.11 |
^a The area under the receiver operating characteristic curve (maximum) was used to select the optimal model.
^b The formula for logistic regression for the prediction of ovarian cancer is

$$p = \left(1 + e^{-\left(\begin{aligned} &10.463 - 18.25\,(\text{hsa-miR-5100}) - 29.63\,(\text{hsa-miR-6800-5p}) - 9.30\,(\text{hsa-miR-6784-5p}) - 7.38\,(\text{hsa-miR-3184-5p}) + 2.702\,(\text{hsa-miR-1228-5p}) \\ &+ 11.33\,(\text{hsa-miR-4675}) - 8.19\,(\text{hsa-miR-1469}) + 0\,(\text{hsa-miR-1914-5p}) + 5.70\,(\text{hsa-miR-1233-5p}) + 9.08\,(\text{hsa-miR-1290}) \end{aligned}\right)}\right)^{-1}$$
^c The complexity parameter (cp) is used to control the size of the decision tree and to select the optimal tree size. If the cost of adding an additional variable to the decision tree from the current node is above the value of the cp, then tree building does not continue.
^d mtry is the number of variables available for splitting at each tree node. In the random forests literature, this is referred to as the mtry parameter.
^e Size is the number of units in a hidden layer.
^f Decay is the regularization parameter used to avoid over-fitting.
^g max_depth is used to control over-fitting, as higher depth will allow the model to learn relations very specific to a particular sample.
^h gamma: a node is split only when the resulting split gives a positive reduction in the loss function. Gamma specifies the minimum loss reduction required to make a split, which makes the algorithm conservative. The values can vary depending on the loss function and should be tuned.
^i colsample_bytree denotes the fraction of columns to be randomly sampled for each tree.
^j min_child_weight is used to control over-fitting. Higher values prevent a model from learning relations that might be highly specific to the particular sample selected for a tree. Too high values can lead to under-fitting; hence, it should be tuned using CV.
^k subsample: lower values make the algorithm more conservative and prevent over-fitting, but too small values might lead to under-fitting.
analysis using the selected features to recognize different samples in the internal discovery data sets. Supplementary Figure S3 shows a promising result of the hierarchical clustering analysis (heatmap) using the 10 identified miRNAs to differentiate between cancerous and non-cancerous samples in GSE106817. The selected microRNAs are differently expressed in the non-cancer and cancerous classes. This is well illustrated by drawing the heatmap (Supplementary Figure S3).
FIGURE 1
(A) ROC curve for the proposed models in GSE106817. (B) ROC curve of each selected miRNA in GSE106817.
FIGURE 2
Targeted pathway clusters/heatmap presenting the top 10 Kyoto Encyclopedia of Genes and Genomes pathways regulated by the miRNAs (P < 0.005;
DIANA/miRPath v.4).
FIGURE 3
Network of interactions between selected miRNAs with coding genes and long non-coding RNAs. Yellow colored genes represent LNC-RNAs and green
colored genes represent transcription factors.
3.2. External validation data sets

specificity, and PPV of 100. Supplementary Figure S4A shows that all models for GSE113486 yielded 100% AUC except DT and RF. Supplementary Figure S4B illustrates how the biomarkers perform individually in GSE113486: hsa-miR-5100 (95.8%), hsa-miR-6800-5p (99.7%), hsa-miR-6784-5p (97.5%), hsa-miR-3184-5p (93.9%), hsa-miR-1228-5p (99.8%), hsa-miR-4675 (99.1%), hsa-miR-1469 (100%), hsa-miR-1914-5p (99.8%), hsa-miR-1233-5p (99.4%), and hsa-miR-1290 (96.4%). Boxplots show that six of the miRNAs (hsa-miR-1290, hsa-miR-1233-5p, hsa-miR-1914-5p, hsa-miR-1469, hsa-miR-4675, and hsa-miR-5100) are upregulated in ovarian cancer samples in GSE113486 (Supplementary Figure S5). In GSE113740, the second external validation data set, the AUC of LR, RF, ANN, and XGB is over 94% (Supplementary Figure S6A). We also obtained the individual AUCs of these 10 miRNAs in this external data set: hsa-miR-5100 (90.6%), hsa-miR-6800-5p (89.7%), hsa-miR-6784-5p (74.4%), hsa-miR-3184-5p (74.4%), hsa-miR-1228-5p (85.2%), hsa-miR-4675 (79.7%), hsa-miR-1469 (84.4%), hsa-miR-1914-5p (81.5%), hsa-miR-1233-5p (86.5%), and hsa-miR-1290 (91%), as shown in Supplementary Figure S6B. Supplementary Table S3C shows that RF and XGB have the highest Kappa values (72.96 and 71.96) as well as the highest AUC and accuracy (97.2, 93.75), while ANN has a sensitivity and NPV of 100. Boxplots

TABLE 2 Summary of the role of selected miRNAs in cancer.

| miRNA | Cancer type | Reference |
|---|---|---|
| hsa-miR-1233-5p | Renal cell carcinoma | Dias et al. (72) |
| hsa-miR-1914-5p | Colorectal | Liu et al. (73) |
| hsa-miR-1914-5p | Epithelial ovarian | Chong et al. (74) |
| hsa-miR-1469 | Pancreatic | Shams et al. (75) |
| hsa-miR-1469 | Laryngeal | Ma et al. (76) |
| hsa-miR-1469 | Colon | Gungormez et al. (77) |
| hsa-miR-4675 | Breast | Lai et al. (78) |
| hsa-miR-4675 | Various types | Chen and Dhahbi (18) |
| hsa-miR-3184-5p | Breast | Rajarajan et al. (79) |
| hsa-miR-3184-5p | Ovarian | Alshamrani (80) |
| hsa-miR-3184-5p | Various types | Chen and Dhahbi (18) |
| hsa-miR-6800-5p | Epithelial ovarian | Tuncer et al. (81) |
| hsa-miR-5100 | Various types | Chen and Dhahbi (18) |
| hsa-miR-5100 | Epithelial ovarian | Tuncer et al. (81) |
| hsa-miR-5100 | Pancreatic | Chijiiwa et al. (82); Shams et al. (75) |
| hsa-miR-5100 | Esophageal | Song et al. (83) |
| hsa-miR-1228-5p | Breast | Peña-Chilet et al. (84) |
| hsa-miR-1228-5p | Various types | Hu et al. (85) |
| hsa-miR-1228-5p | Breast | Cilek et al. (86) |
| hsa-miR-1228-5p | Hepatocellular | Morishita et al. (87) |
| hsa-miR-1228-5p | Epithelial ovarian | Chen et al. (88) |
| hsa-miR-1228-5p | Pancreatic | Wang et al. (89) |
| hsa-miR-6784-5p | Hepatocellular | Morishita et al. (87) |
| hsa-miR-6784-5p | Various types | Alshamrani (80) |
| hsa-miR-6784-5p | Esophageal | Song et al. (83) |
FIGURE 4
Predicted pathways of the effect of selected miRNAs in ovarian cancer.
three groups and their analysis results by the miRPath v.4 tool are shown in Figure 2. As shown in Figure 2A, among the six common genes between the present and previous work, four genes are involved in at least one known cancer pathway (axon guidance). Among those four genes, hsa-miR-5100 and hsa-miR-1290 are involved in several well-known and important pathways in cancer. Figure 2B shows that among the four specific genes identified by the Boruta technique, three genes are involved in at least two well-known pathways in cancer, among which hsa-miR-4675 is involved in several pathways. However, in Figure 2C, among the four specific genes identified in the previous work of Hamidi et al. (21), only the hsa-miR-320b gene is involved in several important cancer pathways. It should be noted that there are six common pathways between Groups A and B, while there are four common pathways between Groups A and C. This means that there is more correlation between the genes of Groups A and B than between those of Groups A and C. This interpretation shows the biological superiority of the Boruta technique over the previous work. A comparison between the pathways of Groups B and C also provides interesting results. Eight pathways are common between the two groups, which are proteoglycans in cancer, ErbB signaling, colorectal cancer, hepatocellular cancer, pathways in cancer, pancreatic cancer, axon guidance, and Hippo signaling. The axon guidance pathway is common among all three groups. Many axon guidance molecules regulate cell migration and apoptosis in normal and tumorigenic tissues (63). Supplementary Table S4 shows the target genes of the selected microRNAs and the associated KEGG pathways from the genes union method, which indicates the significance of the relationship between the microRNAs and the corresponding pathways under the specified threshold values. Figure 3 shows the network of miRNAs and identified target genes. In this figure, transcription factors and LNC-RNAs have also been added based on some studies. References for these interactions are described in Supplementary Table S4.

In Supplementary Table S5, we selected only seven pathways, because only the pathways that had a very high correlation with the miRNAs were retained (P-value < 0.002). The top seven pathways identified, based on P-value, were those associated with fatty acid biosynthesis, prion diseases, axon guidance, glioma, ErbB signaling pathway, proteoglycans in cancer, and endometrial cancer. All signaling pathways related to miRNAs were
FIGURE 5
Venn diagram of common miRNAs among three different studies.
used from known pathways, and in general, they play an important role investigated that miRNA-1290 in the epithelial ovarian cancer
in all types of cancer. According to the KEGG database, some of the group was significantly overexpressed in serum exosomes and
published articles confirm the role of some of the selected miRNAs tissues as compared with the benign ovarian neoplasm group. Ying
in cancer directly. A number of these documents are summarized in et al. (90) expressed that microarray data analysis showed that
Table 2. Figure 4 shows the predicted pathways of the effect of some hsa-miR-1290 was differentially expressed between COC1 (DDP-
of the selected microRNAs that have been taken from the https:// sensitive) and COC1/DDP (DDP-resistant) tumor cell lines. Chen
targetexplorer.ingenuity.com/index.htm. et al. (18) showed that only five balanced miRNAs were
Figure 5 presents the common miRNAs between two related determined to be important in cancer diagnosis: hsa-miR-663a,
studies (18, 21) and miRNAs that were obtained in our study. hsa-miR-6802-5p, hsa-miR-6784-5p, hsa-miR-3184-5p, and hsa-
There is some evidence in the literature for the biomarkers miR-8073. Furthermore, Chen et al. (18) found that hsa-miR-3184-
included in our study. Hamidi et al. (21) showed that hsa-miR- 5p can act as an early biomarker of bladder cancer and as a key
5100, hsa-miR-1233-5p, hsa-miR-4532, hsa-miR-1290, has-miR- regulator of breast cancer. Also, hsa-miR-6784-5p has been
3184-5p, and hsa-miR-320b could potentially be employed as reported to be a sensitive serum biomarker for ovarian cancer
important biomarkers in ovarian cancer. Jeon et al. (17) diagnosis and a key regulator for breast cancer.
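The overlap among the three studies can be illustrated with a short Python sketch. It is only an illustration built from miRNA names transcribed from the text of this article: the 10 miRNAs reported in our Conclusion, the 10 miRNAs attributed to Hamidi et al. (21) in the Introduction, and the five miRNAs attributed to Chen et al. (18) above. The variable names and the helper `venn_regions` are hypothetical, and the resulting regions may differ from Figure 5 if the figure was drawn from different lists.

```python
# miRNA lists transcribed from the text of this article (not from raw data).
this_study = {
    "hsa-miR-5100", "hsa-miR-6800-5p", "hsa-miR-6784-5p", "hsa-miR-3184-5p",
    "hsa-miR-1228-5p", "hsa-miR-4675", "hsa-miR-1469", "hsa-miR-1914-5p",
    "hsa-miR-1233-5p", "hsa-miR-1290",
}
hamidi_2021 = {  # reference (21), as listed in the Introduction
    "hsa-miR-5100", "hsa-miR-6800-5p", "hsa-miR-1233-5p", "hsa-miR-4532",
    "hsa-miR-4783-3p", "hsa-miR-4787-3p", "hsa-miR-1228-5p", "hsa-miR-1290",
    "hsa-miR-3184-5p", "hsa-miR-320b",
}
chen_2022 = {  # reference (18)
    "hsa-miR-663a", "hsa-miR-6802-5p", "hsa-miR-6784-5p",
    "hsa-miR-3184-5p", "hsa-miR-8073",
}


def venn_regions(a, b, c):
    """Return the seven disjoint regions of a three-set Venn diagram."""
    return {
        "a_only": a - b - c,
        "b_only": b - a - c,
        "c_only": c - a - b,
        "a_b_only": (a & b) - c,
        "a_c_only": (a & c) - b,
        "b_c_only": (b & c) - a,
        "a_b_c": a & b & c,
    }


# a = this study, b = Hamidi et al. (21), c = Chen et al. (18)
regions = venn_regions(this_study, hamidi_2021, chen_2022)
for name, members in regions.items():
    print(f"{name}: {sorted(members)}")
```

With these transcribed lists, the only miRNA shared by all three sets is hsa-miR-3184-5p, and three miRNAs (hsa-miR-4675, hsa-miR-1469, and hsa-miR-1914-5p) appear only in our list.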
Finally, we note that although there are fundamental differences between microarray and RNA-Seq methods for obtaining gene expression data, the data matrices obtained from both methods take the same form after the necessary pre-processing. Therefore, our method is also applicable to RNA-Seq data.

5. Strengths and limitations

This study has several strengths. Firstly, to identify and select the relevant and important miRNAs for ovarian cancer diagnosis, we used Boruta, a robust and novel random forest-based feature selection method from machine learning that has known roles in dimension reduction and variable selection. Secondly, we used logistic regression and four of the most widely used machine-learning methods to predict and classify ovarian cancer. Thirdly, we selected three GEO data sets, ensured that they were from a similar platform, and used them in the evaluation stages. The first limitation of this study is that the biomarkers obtained for ovarian cancer were not compared with those of the other common types of cancer in females. Secondly, the results of this study may be appropriate only for a specific race or area because of the main data set.

6. Conclusion

Our study aimed to investigate reliable classification biomarkers in ovarian cancer. After utilizing Boruta to identify the important biomarkers, we found 10 miRNAs that have high reliability in evaluating output from each classification model. The miRNAs hsa-miR-5100, hsa-miR-6800-5p, hsa-miR-6784-5p, hsa-miR-3184-5p, hsa-miR-1228-5p, hsa-miR-4675, hsa-miR-1469, hsa-miR-1914-5p, hsa-miR-1233-5p, and hsa-miR-1290 had significant differential expression in all models, especially in the two data sets studied (GSE106817, GSE113486). Except for decision trees, all the proposed models performed fairly well in terms of detection accuracy for ovarian cancer in the validation data sets. LR, RF, ANN, and XGB had an AUC of over 99% in the GSE106817 and GSE113486 data sets and of over 94% in GSE113740. Even though this study presented some additional biomarkers for possible consideration in future research, the analyses in these data sets do not support the immediate clinical use of these biomarkers without more rigorous testing in large case-control and cohort studies.

Ethics statement

The studies involving human participants were reviewed and approved by the ethics committee of Tabriz University of Medical Sciences (IR.TBZMED.REC.1400.006). The patients/participants provided their written informed consent to participate in this study.

Author contributions

RA, NG, and FH contributed to the conception and design of the study. RA, NG, and FH performed the statistical analysis. FH wrote the first draft of the manuscript. HY, EB and FH wrote the biological discussion section. RA, FH, PS and JM wrote sections of the manuscript. All authors contributed to the manuscript revision and read and approved the submitted version.

Acknowledgments

The authors would like to thank all those who spent their valuable time participating in this research project, and we are also immensely grateful to the “anonymous” reviewers.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
1. Lheureux S, Gourley C, Vergote I, Oza AM. Epithelial ovarian cancer. Lancet. (2019) 393(10177):1240–53. doi: 10.1016/S0140-6736(18)32552-2
2. Reid BM, Permuth JB, Sellers TA. Epidemiology of ovarian cancer: a review. Cancer Biol Med. (2017) 14(1):9. doi: 10.20892/j.issn.2095-3941.2016.0084
3. Cabasag CJ, Fagan PJ, Ferlay J, Vignat J, Laversanne M, Liu L, et al. Ovarian cancer today and tomorrow: a global assessment by world region and human development index using GLOBOCAN 2020. Int J Cancer. (2022) 151(9):1535–41. doi: 10.1002/ijc.34002
4. Miller KD, Siegel RL, Lin CC, Mariotto AB, Kramer JL, Rowland JH, et al. Cancer treatment and survivorship statistics, 2016. CA Cancer J Clin. (2016) 66(4):271–89. doi: 10.3322/caac.21349
5. Carioli G, Bertuccio P, Boffetta P, Levi F, La Vecchia C, Negri E, et al. European cancer mortality predictions for the year 2020 with a focus on prostate cancer. Ann Oncol. (2020) 31(5):650–8. doi: 10.1016/j.annonc.2020.02.009
6. Iorio MV, Visone R, Di Leva G, Donati V, Petrocca F, Casalini P, et al. MicroRNA signatures in human ovarian cancer. Cancer Res. (2007) 67(18):8699–707. doi: 10.1158/0008-5472.CAN-07-1936
7. Du Bois A, Reuss A, Pujade-Lauraine E, Harter P, Ray-Coquard I, Pfisterer J. Role of surgical outcome as prognostic factor in advanced epithelial ovarian cancer: a combined exploratory analysis of 3 prospectively randomized phase 3 multicenter trials: by the arbeitsgemeinschaft gynaekologische onkologie studiengruppe ovarialkarzinom (AGO-OVAR) and the groupe d’Investigateurs nationaux pour les etudes des cancers de l’Ovaire (GINECO). Cancer. (2009) 115(6):1234–44. doi: 10.1002/cncr.24149
8. Zheng H, Zhang L, Zhao Y, Yang D, Song F, Wen Y, et al. Plasma miRNAs as diagnostic and prognostic biomarkers for ovarian cancer. PLoS One. (2013) 8(11):e77853. doi: 10.1371/journal.pone.0077853
9. Bartels CL, Tsongalis GJ. MicroRNAs: novel biomarkers for human cancer. Clin Chem. (2009) 55(4):623–31. doi: 10.1373/clinchem.2008.112805
10. Flavin R, Smyth P, Barrett C, Russell S, Wen H, Wei J, et al. miR-29b expression is associated with disease-free survival in patients with ovarian serous carcinoma. Int J Gynecologic Cancer. (2009) 19(4):641–7. doi: 10.1111/IGC.0b013e3181a48cf9
11. Schwarzenbach H, Nishida N, Calin GA, Pantel K. Clinical relevance of circulating cell-free microRNAs in cancer. Nat Rev Clin Oncol. (2014) 11(3):145–56. doi: 10.1038/nrclinonc.2014.5
12. Yokoi A, Yoshioka Y, Hirakawa A, Yamamoto Y, Ishikawa M, Ikeda S-i, et al. A combination of circulating miRNAs for the early detection of ovarian cancer. Oncotarget. (2017) 8(52):89811. doi: 10.18632/oncotarget.20688
13. Matsuzaki J, Ochiya T. Circulating microRNAs and extracellular vesicles as potential cancer biomarkers: a systematic review. Int J Clin Oncol. (2017) 22:413–20. doi: 10.1007/s10147-017-1104-3
14. Hamidi F, Gilani N, Belaghi RA, Sarbakhsh P, Edgünlü T, Santaguida P. Exploration of potential miRNA biomarkers and prediction for ovarian cancer using artificial intelligence. Front Genet. (2021) 12:724785. doi: 10.3389/fgene.2021.724785
15. Chung Y-W, Bae H-S, Song J-Y, Lee JK, Lee NW, Kim T, et al. Detection of microRNA as novel biomarkers of epithelial ovarian cancer from the serum of ovarian cancer patient. Int J Gynecologic Cancer. (2013) 23(4):673–9. doi: 10.1097/IGC.0b013e31828c166d
16. Yuan F, Li Z, Chen L, Zeng T, Zhang Y-H, Ding S, et al. Identifying the signatures and rules of circulating extracellular microRNA for distinguishing cancer subtypes. Front Genet. (2021) 12:651610. doi: 10.3389/fgene.2021.651610
17. Jeon H, Seo SM, Kim TW, Ryu J, Kong H, Jang SH, et al. Circulating exosomal miR-1290 for diagnosis of epithelial ovarian cancer. Curr Issues Mol Biol. (2022) 44(1):288–300. doi: 10.3390/cimb44010021
18. Chen JW, Dhahbi J. Identification of four serum miRNAs as potential markers to screen for thirteen cancer types. PLoS One. (2022) 17(6):e0269554. doi: 10.1371/journal.pone.0269554
19. Yaghoobi H, Babaei E, Hussen BM, Emami A. EBST: an evolutionary multi-objective optimization based tool for discovering potential biomarkers in ovarian cancer. IEEE/ACM Trans Comput Biol Bioinform. (2020) 18(6):2384–93. doi: 10.1109/TCBB.2020.2993150
20. Zhang A, Hu H. A novel blood-based microRNA diagnostic model with high accuracy for multi-cancer early detection. Cancers (Basel). (2022) 14(6):1450. doi: 10.3390/cancers14061450
21. Hamidi F, Gilani N, Belaghi RA, Sarbakhsh P, Edgünlü T, Santaguida P. Exploration of potential miRNA biomarkers and prediction for ovarian cancer using artificial intelligence. Front Genet. (2021) 12:2079. doi: 10.3389/fgene.2021.724785
22. Tripathi YM, Chatla SB, Chang Y-CI, Huang L-S, Shieh GS. A nonlinear correlation measure with applications to gene expression data. PLoS One. (2022) 17(6):e0270270. doi: 10.1371/journal.pone.0270270
23. Nithya B, Ilango V. Evaluation of machine learning based optimized feature selection approaches and classification methods for cervical cancer prediction. SN Appl Sci. (2019) 1:1–16. doi: 10.1007/s42452-019-0645-7
24. Chen R-C, Dewi C, Huang S-W, Caraka RE. Selecting critical features for data classification based on machine learning methods. J Big Data. (2020) 7(1):52. doi: 10.1186/s40537-020-00327-4
25. Ali NM, Aziz N, Besar R. Comparison of microarray breast cancer classification using support vector machine and logistic regression with LASSO and boruta feature selection. Indones J Electr Eng Comput Sci. (2020) 20(2):712–9. doi: 10.11591/ijeecs.v20.i2.pp712-719
26. Fortino V, Kinaret P, Fyhrquist N, Alenius H, Greco D. A robust and accurate method for feature selection and prioritization from multi-class OMICs data. PLoS One. (2014) 9(9):e107801. doi: 10.1371/journal.pone.0107801
27. Kursa MB, Rudnicki WR. Feature selection with the boruta package. J Stat Softw. (2010) 36:1–13. doi: 10.18637/jss.v036.i11
28. Chen K-H, Wang K-J, Tsai M-L, Wang K-M, Adrian AM, Cheng W-C, et al. Gene selection for cancer identification: a decision tree model empowered by particle swarm optimization algorithm. BMC Bioinform. (2014) 15(1):1–10. doi: 10.1186/1471-2105-15-1
29. Trivedi SK. A study on credit scoring modeling with different feature selection and machine learning approaches. Technol Soc. (2020) 63:101413. doi: 10.1016/j.techsoc.2020.101413
30. Degenhardt F, Seifert S, Szymczak S. Evaluation of variable selection methods for random forests and omics data sets. Brief Bioinform. (2019) 20(2):492–503. doi: 10.1093/bib/bbx124
31. Acharjee A, Larkman J, Xu Y, Cardoso VR, Gkoutos GV. A random forest based biomarker discovery and power analysis framework for diagnostics research. BMC Med Genomics. (2020) 13(1):1–14. doi: 10.1186/s12920-020-00826-6
32. Kursa MB. Robustness of random forest-based gene selection methods. BMC Bioinform. (2014) 15:1–8. doi: 10.1186/1471-2105-15-8
33. Yokoi A, Matsuzaki J, Yamamoto Y, Yoneoka Y, Takahashi K, Shimizu H, et al. Integrated extracellular microRNA profiling for ovarian cancer screening. Nat Commun. (2018) 9(1):4319. doi: 10.1038/s41467-018-06434-4
34. Usuba W, Urabe F, Yamamoto Y, Matsuzaki J, Sasaki H, Ichikawa M, et al. Circulating miRNA panels for specific and early detection in bladder cancer. Cancer Sci. (2019) 110(1):408–19. doi: 10.1111/cas.13856
35. Yamamoto Y, Kondo S, Matsuzaki J, Esaki M, Okusaka T, Shimada K, et al. Highly sensitive circulating microRNA panel for accurate detection of hepatocellular carcinoma in patients with liver disease. Hepatol Commun. (2020) 4(2):284–97. doi: 10.1002/hep4.1451
36. Wiemken TL, Kelley RR. Machine learning in epidemiology and health outcomes research. Annu Rev Public Health. (2019) 41:21–36. doi: 10.1146/annurev-publhealth-040119-094437
37. Speiser JL, Miller ME, Tooze J, Ip E. A comparison of random forest variable selection methods for classification prediction modeling. Expert Syst Appl. (2019) 134:93–101. doi: 10.1016/j.eswa.2019.05.028
38. Lekchnov EA, Amelina EV, Bryzgunova OE, Zaporozhchenko IA, Konoshenko MY, Yarmoschuk SV, et al. Searching for the novel specific predictors of prostate cancer in urine: the analysis of 84 miRNA expression. Int J Mol Sci. (2018) 19(12):4088. doi: 10.3390/ijms19124088
39. Hastie T, Tibshirani R, Friedman JH. The elements of statistical learning: Data mining, inference, and prediction. Springer (2009).
40. Huang J, Li Y-F, Xie M. An empirical analysis of data preprocessing for machine learning-based software cost estimation. Inf Softw Technol. (2015) 67:108–27. doi: 10.1016/j.infsof.2015.07.004
41. Jović A, Brkić K, Bogunović N, editor. A review of feature selection methods with applications. 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO); IEEE (2015).
42. Stuart EA, Azur M, Frangakis C, Leaf P. Multiple imputation with large data sets: a case study of the children’s mental health initiative. Am J Epidemiol. (2009) 169(9):1133–9. doi: 10.1093/aje/kwp026
43. Azimi I, Pahikkala T, Rahmani AM, Niela-Vilén H, Axelin A, Liljeberg P. Missing data resilient decision-making for healthcare IoT through personalization: a case study on maternal health. Future Gener Comput Syst. (2019) 96:297–308. doi: 10.1016/j.future.2019.02.015
44. Johnson JM, Khoshgoftaar TM. Survey on deep learning with class imbalance. J Big Data. (2019) 6(1):1–54. doi: 10.1186/s40537-018-0162-3
45. Sun Y, Wong AK, Kamel MS. Classification of imbalanced data: a review. Int J Pattern Recognit Artif Intell. (2009) 23(04):687–719. doi: 10.1142/S0218001409007326
46. Fotouhi S, Asadi S, Kattan MW. A comprehensive data level analysis for cancer diagnosis on imbalanced data. J Biomed Inform. (2019) 90:103089. doi: 10.1016/j.jbi.2018.12.003
47. Shanab AA, Khoshgoftaar TM, Wald R, Napolitano A, editor. Impact of noise and data sampling on stability of feature ranking techniques for biological datasets. 2012 IEEE 13th International Conference on Information Reuse & Integration (IRI); IEEE (2012).
48. Alpaydin E. Introduction to machine learning. MIT press (2020).
49. James G, Witten D, Hastie T, Tibshirani R. An introduction to statistical learning. Springer (2013).
50. Stoltzfus JC. Logistic regression: a brief primer. Acad Emerg Med. (2011) 18(10):1099–104. doi: 10.1111/j.1553-2712.2011.01185.x
51. Abdulqader QM. Applying the binary logistic regression analysis on the medical data. Sci J Univ Zakho. (2017) 5(4):330–4. doi: 10.25271/2017.5.4.388
52. Maimon OZ, Rokach L. Data mining with decision trees: Theory and applications. World Scientific (2014).
53. Qi Y. Random forest for bioinformatics. Ensemble machine learning: Methods and applications. Springer (2012). 307–23.
54. DeGregory K, Kuiper P, DeSilvio T, Pleuss J, Miller R, Roginski J, et al. A review of machine learning in obesity. Obes Rev. (2018) 19(5):668–85. doi: 10.1111/obr.12667
55. Klassen V, Safin A, Maltsev A, Andrianov N, Morozov S, Vladzymyrskyy A. AI-based screening of pulmonary tuberculosis: diagnostic accuracy. J Ehealth Technol Appl. (2018) 16(1):28–32.
56. Sherriff A, Ott J, Team AS. Artificial neural networks as statistical tools in epidemiological studies: analysis of risk factors for early infant wheeze. Paediatr Perinat Epidemiol. (2004) 18(6):456–63. doi: 10.1111/j.1365-3016.2004.00592.x
57. Chen T, He T, Benesty M, Khotilovich V, Tang Y, Cho H, et al. Xgboost: extreme gradient boosting. R package version 04-2. Journal of eHealth Technology and Application. (2015) 16(1):1–4.
58. Kang C, Huo Y, Xin L, Tian B, Yu B. Feature selection and tumor classification for microarray data using relaxed Lasso and generalized multi-class support vector machine. J Theor Biol. (2019) 463:77–91. doi: 10.1016/j.jtbi.2018.12.010
59. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med. (2015) 162(1):55–63. doi: 10.7326/M14-0697
60. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez J-C, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinform. (2011) 12(1):1–8. doi: 10.1186/1471-2105-12-77
61. Pal MK, Jaiswar SP, Dwivedi VN, Tripathi AK, Dwivedi A, Sankhwar P. MicroRNA: a new and promising potential biomarker for diagnosis and prognosis of ovarian cancer. Cancer Biol Med. (2015) 12(4):328. doi: 10.7497/j.issn.2095-3941.2015.0024
62. Zhang B, Cai FF, Zhong XY. An overview of biomarkers for the ovarian cancer diagnosis. Eur J Obstet Gynecol Reprod Biol. (2011) 158(2):119–23. doi: 10.1016/j.ejogrb.2011.04.023
63. Chedotal A, Kerjan G, Moreau-Fauvarque C. The brain within the tumor: new roles for axon guidance molecules in cancers. Cell Death Differ. (2005) 12(8):1044–56. doi: 10.1038/sj.cdd.4401707
64. Zhang WC, Chin TM, Yang H, Nga ME, Lunny DP, Lim EKH, et al. Tumour-initiating cell-specific miR-1246 and miR-1290 expression converge to promote non-small cell lung cancer progression. Nat Commun. (2016) 7(1):11702. doi: 10.1038/ncomms11702
65. Imaoka H, Toiyama Y, Fujikawa H, Hiro J, Saigusa S, Tanaka K, et al. Circulating microRNA-1290 as a novel diagnostic and prognostic biomarker in human colorectal cancer. Ann Oncol. (2016) 27(10):1879–86. doi: 10.1093/annonc/mdw279
66. Ye L, Jiang T, Shao H, Zhong L, Wang Z, Liu Y, et al. miR-1290 is a biomarker in DNA-mismatch-repair-deficient colon cancer and promotes resistance to 5-fluorouracil by directly targeting hMSH2. Mol Ther Nucl Acids. (2017) 7:453–64. doi: 10.1016/j.omtn.2017.05.006
67. Wang Q, Wang G, Niu L, Zhao S, Li J, Zhang Z, et al. Exosomal MiR-1290 promotes angiogenesis of hepatocellular carcinoma via targeting SMEK1. J Oncol. (2021) 2021:6617700. doi: 10.1155/2021/6617700
68. Nakashima H, Yoshida R, Hirosue A, Kawahara K, Sakata J, Arita H, et al. Circulating miRNA-1290 as a potential biomarker for response to chemoradiotherapy and prognosis of patients with advanced oral squamous cell carcinoma: a single-center retrospective study. Tumor Biol. (2019) 41(3):1010428319826853. doi: 10.1177/1010428319826853
69. Wei J, Yang L, Wu Y-n, Xu J. Serum miR-1290 and miR-1246 as potential diagnostic biomarkers of human pancreatic cancer. J Cancer. (2020) 11(6):1325. doi: 10.7150/jca.38048
70. Kobayashi M, Sawada K, Nakamura K, Yoshimura A, Miyamoto M, Shimizu A, et al. Exosomal miR-1290 is a potential biomarker of high-grade serous ovarian carcinoma and can discriminate patients from those with malignancies of other histological types. J Ovarian Res. (2018) 11:1–10. doi: 10.1186/s13048-018-0458-0
71. Li Y, Yao L, Liu F, Hong J, Chen L, Zhang B, et al. Characterization of microRNA expression in serous ovarian carcinoma. Int J Mol Med. (2014) 34(2):491–8. doi: 10.3892/ijmm.2014.1813
72. Dias F, Teixeira AL, Ferreira M, Adem B, Bastos N, Vieira J, et al. Plasmatic miR-210, miR-221 and miR-1233 profile: potential liquid biopsies candidates for renal cell carcinoma. Oncotarget. (2017) 8(61):103315. doi: 10.18632/oncotarget.21733
73. Liu S, Qu D, Li W, He C, Li S, Wu G, et al. Mir-647 and miR-1914 promote cancer progression equivalently by downregulating nuclear factor IX in colorectal cancer. Mol Med Rep. (2017) 16(6):8189–99. doi: 10.3892/mmr.2017.7675
74. Chong GO, Jeon H-S, Han HS, Son JW, Lee YH, Hong DG, et al. Differential microRNA expression profiles in primary and recurrent epithelial ovarian cancer. Anticancer Res. (2015) 35(5):2611–7. doi: 10.3892/ijmm.2014.1813
75. Shams R, Saberi S, Zali M, Sadeghi A, Ghafouri-Fard S, Aghdaei HA. Identification of potential microRNA panels for pancreatic cancer diagnosis using microarray datasets and bioinformatics methods. Sci Rep. (2020) 10(1):7559. doi: 10.1038/s41598-020-64569-1
76. Ma C-H, Zhang Y-X, Tang L-H, Yang X-J, Cui W-M, Han C-C, et al. MicroRNA-1469, a p53-responsive microRNA promotes genistein induced apoptosis by targeting Mcl1 in human laryngeal cancer cells. Biomed Pharmacother. (2018) 106:665–71. doi: 10.1016/j.biopha.2018.07.005
77. Gungormez C, Gumushan Aktas H, Dilsiz N, Borazan E. Novel miRNAs as potential biomarkers in stage II colon cancer: microarray analysis. Mol Biol Rep. (2019) 46:4175–83. doi: 10.1007/s11033-019-04868-7
78. Lai J, Wang H, Pan Z, Su F. A novel six-microRNA-based model to improve prognosis prediction of breast cancer. Aging (Albany NY). (2019) 11(2):649. doi: 10.18632/aging.101767
79. Rajarajan D, Selvarajan S, Charan Raja MR, Kar Mahapatra S, Kasiappan R. Genome-wide analysis reveals miR-3184-5p and miR-181c-3p as a critical regulator for adipocytes-associated breast cancer. J Cell Physiol. (2019) 234(10):17959–74. doi: 10.1002/jcp.28428
80. Alshamrani AA. Roles of microRNAs in ovarian cancer tumorigenesis: two decades later, what have we learned? Front Oncol. (2020) 10:1084. doi: 10.3389/fonc.2020.01084
81. Tuncer SB, Erdogan OS, Erciyas SK, Saral MA, Celik B, Odemis DA, et al. miRNA expression profile changes in the peripheral blood of monozygotic discordant twins for epithelial ovarian carcinoma: potential new biomarkers for early diagnosis and prognosis of ovarian carcinoma. J Ovarian Res. (2020) 13(1):1–15. doi: 10.1186/s13048-020-00706-8
82. Chijiiwa Y, Moriyama T, Ohuchida K, Nabae T, Ohtsuka T, Miyasaka Y, et al. Overexpression of microRNA-5100 decreases the aggressive phenotype of pancreatic cancer cells by targeting PODXL. Int J Oncol. (2016) 48(4):1688–700. doi: 10.3892/ijo.2016.3389
83. Song Y, Zhu S, Zhang N, Cheng L. Blood circulating miRNA pairs as a robust signature for early detection of esophageal cancer. Front Oncol. (2021) 11:723779. doi: 10.3389/fonc.2021.723779
84. Peña-Chilet M, Martínez MT, Pérez-Fidalgo JA, Peiró-Chova L, Oltra SS, Tormo E, et al. MicroRNA profile in very young women with breast cancer. BMC Cancer. (2014) 14(1):1–14. doi: 10.1186/1471-2407-14-529
85. Hu J, Wang Z, Liao BY, Yu L, Gao X, Lu S, et al. Human miR-1228 as a stable endogenous control for the quantification of circulating microRNAs in cancer patients. Int J Cancer. (2014) 135(5):1187–94. doi: 10.1002/ijc.28757
86. Ruggles DR, Freyman RL, Oxenham AJ. Influence of musical training on understanding voiced and whispered speech in noise. PLoS One. (2014) 9(1):e86980. doi: 10.1371/journal.pone.0086980
87. Morishita A, Iwama H, Fujihara S, Sakamoto T, Fujita K, Tani J, et al. MicroRNA profiles in various hepatocellular carcinoma cell lines. Oncol Lett. (2016) 12(3):1687–92. doi: 10.3892/ol.2016.4853
88. Chen X, Paranjape T, Stahlhut C, McVeigh T, Keane F, Nallur S, et al. Targeted resequencing of the microRNAome and 3′ UTRome reveals functional germline DNA variants with altered prevalence in epithelial ovarian cancer. Oncogene. (2015) 34(16):2125–37. doi: 10.1038/onc.2014.117
89. Wang J, Raimondo M, Guha S, Chen J, Diao L, Dong X, et al. Circulating microRNAs in pancreatic juice as candidate biomarkers of pancreatic cancer. J Cancer. (2014) 5(8):696. doi: 10.7150/jca.10094
90. Ying H, Xu H, Lv J, Ying T, Yang Q. MicroRNA signatures of platinum-resistance in ovarian cancer. Eur J Gynaecol Oncol. (2015) 36(1):16–20. doi: 10.12892/ejgo2511.2015