
Cancer Type Prediction and Classification Based on RNA-sequencing Data

Yi-Hsin Hsu, Dong Si*, Computer and Software Systems Department, University of Washington Bothell

Yi-Hsin Hsu and Dong Si are with the Department of Computer and Software Systems, University of Washington Bothell, Bothell, WA 98011 USA (e-mail: [email protected]; corresponding author: [email protected]).

U.S. Government work not protected by U.S. copyright

Abstract— Pan-cancer analysis has been a significant research topic in the past few years. Owing to advances in sequencing technologies, researchers possess more resources and knowledge to identify the key factors that could trigger cancer. Furthermore, since The Cancer Genome Atlas (TCGA) project launched, using machine learning (ML) techniques to analyze TCGA data has been recognized as a useful approach in this line of research. Therefore, this study uses RNA-sequencing data from TCGA and focuses on classifying thirty-three types of cancer patients. Five ML algorithms, including decision tree (DT), k-nearest neighbor (kNN), linear support vector machine (linear SVM), polynomial support vector machine (poly SVM), and artificial neural network (ANN), are compared on accuracy, training time, precision, recall, and F1-score. The results show that linear SVM, with a 95.8% accuracy rate, is the best classifier in this study. Several critical data pre-processing experiments are also presented to clarify and improve the performance of the built model.

I. INTRODUCTION

The causes of tumor formation have been studied by scientists for years. Since cancer has many subtypes, it is challenging for researchers to identify its causes and treatments. To characterize and identify different types of cancer, The Cancer Genome Atlas (TCGA) project emerged. Its goal is to demonstrate that gathering and analyzing large data sets can advance cancer research [1]. The TCGA project has so far generated comprehensive genomic maps of 33 types of cancer. As TCGA provides an easily accessible platform of genomic data [2], many scientists have started to use TCGA data to discover genomic patterns in cancer [3][4].

To understand the commonalities and differences across cancer types, TCGA later launched the Pan-Cancer analysis project [5]. Recently, many studies [6][7] have used machine learning (ML) algorithms to analyze TCGA pan-cancer data sets and demonstrated their effectiveness in discovering cancer causes. As such, this study focuses on using ML to build a reliable classification model that can recognize 33 types of cancer patients. Five ML algorithms, i.e., decision tree (DT), k-nearest neighbor (kNN), linear support vector machine (linear SVM), polynomial support vector machine (poly SVM), and artificial neural network (ANN), were tested and compared in this study.

The rest of this paper is organized as follows: Section two describes the procedures and methods. Section three presents experiments and results. Section four discusses and summarizes the findings of this study.

II. METHOD

The workflow of the experiment is shown in Fig. 1.

[Figure 1 omitted: Raw Data → Pre-processing → Baseline Measurement → Experiments Design → Model Training → Evaluation of Models.]

Figure 1. The overview of the workflow.

A. Select Data

The raw data set of this study came from the TCGA Pan-Cancer analysis project [5] and was downloaded from synapse.org [8]; it included 10,471 cancer patient samples across 33 cancer types. The attributes were a list of 20,531 genes; each gene had a measurement of its RNA expression level. All measurements in the raw data set were taken with the Illumina HiSeq system [9].

B. Class Labeling and TCGA Barcode

Since the goal of this project is to classify 33 cancer types, additional class labels are needed before building the classification models. The labels were generated from the TCGA barcodes. Each sample in the data set had a unique barcode containing information such as the cancer type and sample source site. By mapping each barcode to the corresponding cancer type, we were able to label each sample with the correct cancer class.

TCGA barcodes also provided sample type information. The raw data set included primary solid tumor samples, secondary tumor samples, and other types of samples from cancer patients. Because this study focuses on classifying cancer patients rather than cancer sample types, we retained all types of samples in the study.

C. Data Pre-processing

To build a classification model with a reliable prediction rate, a data pre-processing step is needed. First, the raw data were checked to ensure there were no missing or duplicated values. Second, any row or column with all zero values was removed from the data set; 212 attributes with all zero values were found and removed. After these two steps, the attributes were reduced from 20,531 to 20,319 while the total number of samples remained the same. Next, several data pre-processing methods were considered for model training preparation.

Feature Selection: When observing the raw data, there were several issues which might affect the training results. First,

the number of attributes was relatively large compared with the number of samples, so a feature selection or feature reduction method would be useful [10][11]. Therefore, tree classifier and variance threshold [12] methods were adopted to select features. The tree classifier method selects a subset of features based on their importance scores. The variance threshold method filters out features with variances below a threshold, so the filtered data set contains high-variance attributes only. Both feature selection methods were used in later experiments to see which method provided a better result.

Imbalanced Classes: As mentioned above, the second issue was that the 33 classes were imbalanced (see Fig. 2). Breast invasive carcinoma (BRCA) had 1,218 samples while cholangiocarcinoma (CHOL) held only 45 samples. The difference between the largest class (BRCA) and the smallest class (CHOL) was 1,173 cases.

[Figure 2 bar chart omitted: number of samples for each of the 33 cancer types (THYM, KIRC, LUAD, LGG, MESO, LUSC, LAML, STAD, KIRP, GBM, KICH, ACC, PRAD, SKCM, BLCA, COAD, OV, UCS, SARC, CHOL, BRCA, HNSC, UCEC, PCPG, READ, UVM, DLBC, THCA, LIHC, CESC, ESCA, PAAD, TGCT).]

Figure 2. Each cancer type along with the number of samples.

To fix the imbalance problem, two methods were utilized. The first was under-sampling the 32 larger cancer classes to the same size as the smallest class. Since CHOL was the smallest class, we randomly selected 45 samples from each of the other 32 cancer classes and pooled them together to generate a 1,485-sample data set. In other words, this method selected 1,485 unique samples, and all classes were balanced. However, this relatively small data set might impact the accuracy rate of a classifier. To create a balanced data set of a decent size, another method was considered. Based on the raw data, the average number of samples per class was 317, so choosing 300 samples per class to create a 9,900-sample data set was reasonable. Classes with more than 300 samples were under-sampled by randomly selecting 300 unique samples each; classes with fewer than 300 samples were over-sampled by randomly drawing from their sample pools to reach 300 samples each. Finally, both the 45-samples-per-class data set and the 300-samples-per-class data set were tested in later experiments.

Normalization: The means and standard deviations of the attributes were widely spread in the raw data. Therefore, two normalization methods were adopted. The min-max scaling (1) is a common feature scaling method; it scales each feature separately into a given range:

x' = (x − min(x)) / (max(x) − min(x)) · (new max(x) − new min(x)) + new min(x)    (1)

The other approach was the standardization scaling (2). This method standardizes features by removing the mean and scaling each feature to unit variance:

x' = (x − x̄) / σ    (2)

III. EXPERIMENTS AND RESULTS

A. Baseline Measurement

First, without applying any pre-processing method, a baseline measurement was conducted to evaluate the original outcomes of the five ML algorithms. The measurement results serve as a reference for the performances of later experiments that use pre-processing procedures.

Baseline Measurement Data Set and Methods: The baseline data set had the 212 all-zero attributes removed and was then normalized with the standardization method. In other words, the over-sampling, under-sampling, and feature selection methods were not applied in the baseline measurement phase. Next, the data set was split into an 80% training set and a 20% test set. After training was done, the accuracy scores were calculated. The accuracy score is the proportion of samples in the test set which are correctly classified. In addition, the average precisions, recalls, F1-scores, and training times were also calculated to compare the performances. The baseline training results are presented in Table I.

TABLE I. BASELINE MEASUREMENT RESULTS

Testing Variables | Accuracy Score | Training Time | Ave. Precision | Ave. Recall | Ave. F1
DT                | 0.86014        | 23m 42s 121ms | 0.86           | 0.86        | 0.86
kNN               | 0.89212        | 30s 751ms     | 0.90           | 0.89        | 0.89
Linear SVM        | 0.94988        | ~4hr          | 0.95           | 0.95        | 0.95
Poly SVM          | 0.76754        | 52m 52s 518ms | 0.86           | 0.77        | 0.77
ANN               | 0.94797        | 18m 43s 312ms | 0.95           | 0.95        | 0.95

Baseline Measurement Results: As displayed in Table I, the linear SVM had the highest accuracy score but the longest training time among the five models. The long training time indicated that the data set was relatively big, so a feature selection method was needed. In addition, looking at the precision, recall, and F1-score for each class, we found that classes with relatively few samples had relatively low precision, recall, and F1-scores compared to the other classes. Therefore, a balanced data set was needed to improve the performance of a classifier.

In addition, when comparing all criteria, ANN was the best classifier among the five models. In this experiment, two hidden layers, one with 850 neurons and the other with 800 neurons, were used to train the ANN model. The activation function used was the rectified linear unit (ReLU).
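The ANN configuration described above (two hidden layers of 850 and 800 neurons, ReLU activation) can be sketched as follows. The paper does not name its toolkit, so scikit-learn's MLPClassifier, the adam solver, the iteration budget, and the synthetic stand-in data are all assumptions for illustration, not the authors' implementation.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the RNA-seq matrix (samples x genes); the real data
# set has 10,471 samples and 20,319 retained gene attributes.
X, y = make_classification(n_samples=600, n_features=50, n_informative=30,
                           n_classes=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Standardization scaling, as in the baseline measurement.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Two hidden layers (850 and 800 neurons) with ReLU, matching the paper;
# solver and max_iter are assumptions chosen to keep the sketch fast.
ann = MLPClassifier(hidden_layer_sizes=(850, 800), activation="relu",
                    solver="adam", max_iter=50, random_state=0)
ann.fit(X_train, y_train)
print(round(ann.score(X_test, y_test), 3))
```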

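The baseline measurement loop itself (fit each of the five algorithms, record accuracy, training time, and macro-averaged precision/recall/F1 on a held-out 20% split) can be sketched as below. The library, the synthetic data, and the default hyperparameters are assumptions; the ANN is omitted here for brevity.

```python
import time
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=40, n_informative=20,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "kNN": KNeighborsClassifier(),
    "linear SVM": SVC(kernel="linear"),
    "poly SVM": SVC(kernel="poly"),
}

results = {}
for name, model in models.items():
    start = time.perf_counter()          # training time, as in Table I
    model.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    y_pred = model.predict(X_test)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="macro", zero_division=0)
    results[name] = (accuracy_score(y_test, y_pred), elapsed, prec, rec, f1)

for name, (acc, t, prec, rec, f1) in results.items():
    print(f"{name}: acc={acc:.3f} time={t:.2f}s P={prec:.2f} R={rec:.2f} F1={f1:.2f}")
```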
5375

Authorized licensed use limited to: Cornell University Library. Downloaded on September 04,2020 at 05:52:45 UTC from IEEE Xplore. Restrictions apply.
In conclusion, the baseline measurement results not only gave us a basic understanding of the performance of each algorithm, but also revealed that the long training time and imbalanced class problems should be fixed.
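Taken together, the pre-processing pipeline of Section II-C (dropping all-zero attributes, resampling every class to 300 samples, standardization scaling) can be sketched as follows. The toy expression matrix, the library choice, and the zero-variance threshold are assumptions for illustration.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample

rng = np.random.default_rng(0)
# Toy expression matrix: 3 classes with 20, 350, and 500 samples and 100
# genes, 10 of which are all-zero (mimicking the 212 all-zero attributes).
parts, labels = [], []
for cls, n in enumerate([20, 350, 500]):
    block = rng.gamma(2.0, 2.0, size=(n, 100))
    block[:, :10] = 0.0
    parts.append(block)
    labels.append(np.full(n, cls))
X, y = np.vstack(parts), np.concatenate(labels)

# 1) Drop zero-variance (all-zero) attributes.
X = VarianceThreshold(threshold=0.0).fit_transform(X)

# 2) Resample every class to 300 samples: under-sample larger classes
#    without replacement, over-sample smaller ones with replacement.
X_bal, y_bal = [], []
for cls in np.unique(y):
    Xc = X[y == cls]
    Xc = resample(Xc, replace=len(Xc) < 300, n_samples=300, random_state=0)
    X_bal.append(Xc)
    y_bal.append(np.full(300, cls))
X_bal, y_bal = np.vstack(X_bal), np.concatenate(y_bal)

# 3) Standardization scaling: x' = (x - mean) / std per attribute.
X_bal = StandardScaler().fit_transform(X_bal)
print(X_bal.shape)  # → (900, 90): 3 classes x 300 samples, 10 genes dropped
```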
B. Experiments Design

As discussed above, several pre-processing methods were selected for the following experiments. The tested pre-processing methods and ML algorithms were as follows:

 Feature Selection: Tree Classifier and Variance Threshold
 Under-sampling and Over-sampling: 45 unique samples per class (balanced data set), 300 repeated samples per class (balanced data set), and 10,471 original samples (imbalanced data set)
 Normalization: Min-max and Standardization
 ML Algorithms: DT, kNN, linear SVM, poly SVM, and ANN

C. Model Training

There is a total of 60 testing scenarios based on the testing variables listed above. Considering the complexity of testing all scenarios, 21 important experiments were selected and conducted to measure their performances. Among these 21 experiments, there were five DT and five poly SVM experiments, four kNN and four linear SVM experiments, and three ANN experiments.

The whole data set was split into an 80% training set and a 20% testing set. All 21 experiments were analyzed and compared by their accuracy scores, training time, precisions, recalls, and F1-scores to identify the best model and best testing scenario.

D. Results

Among the 21 experiments, each testing variable was compared side-by-side (see Fig. 3). As seen, min-max scaling and standardization scaling were very close in their median accuracy scores. However, since min-max scaling had a smaller interquartile range, its performance could be considered more stable. Similarly, tree classifier and variance threshold were close in their median accuracies, but comparing the interquartile ranges, the tree classifier was more stable.

Figure 3. Four box plots (A, B, C, and D) demonstrate how each testing variable impacts the accuracy score among the 21 experiments.

On the other hand, comparing the balanced data sets with the imbalanced data set, the results differed considerably. The data set with 45 samples per class had the lowest accuracy score on average compared to the other two data sets. The reason might be the small total sample size. Conversely, the data set with 300 samples per class had the highest accuracy score on average, which can be linked to its decent total sample size and balanced classes. Lastly, the five ML algorithms were compared on their accuracy scores. Linear SVM was still the best classifier among the five algorithms in terms of accuracy rate, the same result as in the baseline measurement. Linear SVM also demonstrated the highest F1-score on average compared to the other algorithms (see Fig. 4).

As noted, in the baseline measurement, the problem of linear SVM was its long training time. However, in the later 21 experiments, the median training time was much shorter; it was even better than ANN and DT (see Fig. 4). In total, considering accuracy score, F1-score, and training time, linear SVM was the best algorithm among all. Besides, poly SVM performed much better in the later 21 experiments than in the baseline experiment in both accuracy score and training time. DT performed better than kNN in accuracy and F1-score, but it took longer to train. On the other hand, although kNN had the lowest accuracy score on average, it took the shortest time to train.

Figure 4. A: box plot of F1-scores for each algorithm; B: box plot of training time for each algorithm.

E. Validation

Among the 21 models, the best model from each algorithm was selected and run with 5-fold cross validation. Both the accuracy score and the validation score were used to determine the top-performing model (see Table II).

TABLE II. THE PERFORMANCE OF THE BEST MODEL FROM EACH ALGORITHM

Model                  | DT      | kNN     | Linear SVM | Poly SVM | ANN
Accuracy score         | 0.92222 | 0.86313 | 0.95808    | 0.94545  | 0.91515
Cross validation score | 0.92444 | 0.87455 | 0.94980    | 0.94030  | 0.91394
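The validation step described above, 5-fold cross validation of a selected model, can be sketched as below. Scikit-learn, stratified folds, and the synthetic data are assumptions; the linear SVM stands in for the paper's best model.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=30, n_informative=15,
                           n_classes=4, random_state=0)

# 5-fold cross validation of one candidate model; each fold returns an
# accuracy score, and the mean is compared against the held-out accuracy.
model = SVC(kernel="linear")
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print(round(scores.mean(), 3), round(scores.std(), 3))
```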

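The ROC/AUC comparison used for the best models (Fig. 5) requires per-class scores in the multi-class setting. A hedged one-vs-rest sketch, with scikit-learn, Platt-scaled SVM probabilities, and synthetic data as stand-ins for the paper's setup:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=30, n_informative=15,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Probability estimates enable per-class ROC curves; the one-vs-rest
# macro-averaged AUC summarizes them in a single number.
model = SVC(kernel="linear", probability=True, random_state=0)
model.fit(X_train, y_train)
proba = model.predict_proba(X_test)
macro_auc = roc_auc_score(y_test, proba, multi_class="ovr", average="macro")
print(round(macro_auc, 3))
```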
The 5-fold cross validation results confirmed that linear SVM was the top classifier among the 5 models. Its accuracy score was 0.95808 and its cross validation score was 0.94980. The pre-processing methods of this model were variance threshold and standardization scaling, and the data set with 300 samples per class was used to train it. In addition, linear SVM also demonstrated the largest area under the curve (AUC) in the receiver operating characteristic (ROC) curve (see Fig. 5).

Figure 5. The ROC curve along with the AUC for the best model of each algorithm.

IV. CONCLUSION AND DISCUSSION

A. Conclusion

Comparing the beginning baseline and the later 21 experiments, the results suggest that linear SVM was the best classifier among the 5 algorithms in terms of accuracy score, F1-score, training time, and AUC. Since the original data set is big and imbalanced, it is impractical to run ML algorithms on it directly; this study adopted several effective data pre-processing approaches to improve the performance of a classifier. By using ML to predict cancer types based on gene expression levels, this study provides further information for understanding how to use RNA-sequencing data to build a reliable classification model.

B. Discussion

Comparing the baseline and the later 21 experiments, we confirmed that normalization, feature selection, and balanced classes are all key factors that greatly impact the performance of models. However, identifying which pre-processing methods are useful before model training remains an open question: based on the experiment results, there was no significant evidence suggesting which normalization or feature selection method was better. These questions need further clarification. Although the data set with 300 samples per class was the best sampling data set in the experiments, it contained a large number of repeated samples. Therefore, new data are needed to further validate the model. In this study, linear SVM was the best algorithm for training this sequencing data set, a finding consistent with previous studies. However, it is important to note that if the data set included more cancer types, whether linear SVM would still be the best algorithm remains to be seen.

C. Future Work

This study presents a high-accuracy classification model. However, there are still many open questions that need to be addressed. First, we would like to see if these pre-processing methods are also applicable to other types of genomic data, or even clinical data. Second, as genomic data normally have a relatively large number of features, or genes, using other methods to reduce features might be useful. Lastly, since imbalanced data is a very common issue in biomedical data, different strategies need to be applied to cope with this problem and improve the performance of a classifier.

ACKNOWLEDGMENT

This work was supported by the Graduate Research Award from the Computing and Software Systems division of University of Washington Bothell and the startup fund 74-0525.

REFERENCES

[1] K.A. Hoadley, C. Yau, and D.M. Wolf, "Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin," Cell, vol. 158, pp. 929–944, 2014.
[2] K. Tomczak, P. Czerwinska, and M. Wiznerowicz, "The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge," Contemporary Oncology, vol. 19 (1A), pp. A68–A77, 2015.
[3] T.Q. Gan, Z.C. Xie, and R.X. Tang, "Clinical value of miR-145-5p in NSCLC and potential molecular mechanism exploration: A retrospective study based on GEO, qRT-PCR, and TCGA data," Tumor Biology, vol. 39, pp. 1–23, 2017.
[4] Y. Guo, Q. Sheng, and J. Li, "Large Scale Comparison of Gene Expression Levels by Microarrays and RNAseq Using TCGA Data," PLoS One, vol. 8, pp. 1–10, 2013.
[5] Cancer Genome Atlas Research Network, J.N. Weinstein, E.A. Collisson, and G.B. Mills, "The Cancer Genome Atlas Pan-Cancer analysis project," Nature Genetics, vol. 45, pp. 1113–1120, 2013.
[6] A.G. Telonis, R. Magee, and P. Loher, "Knowledge about the presence or absence of miRNA isoforms (isomiRs) can successfully discriminate amongst 32 TCGA cancer types," Nucleic Acids Research, vol. 45, pp. 2973–2985, 2017.
[7] K. Kourou, T.P. Exarchos, and K.P. Exarchos, "Machine learning applications in cancer prognosis and prediction," Computational and Structural Biotechnology Journal, vol. 13, pp. 8–17, 2015.
[8] L. Omberg, K. Ellrott, and Y. Yuan, "Enabling transparent and collaborative computational analysis of 12 tumor types within The Cancer Genome Atlas," Nature Genetics, vol. 45, pp. 1121–1126, 2013.
[9] A.E. Minoche, J.C. Dohm, and H. Himmelbauer, "Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems," Genome Biology, vol. 12, R112, 2011.
[10] B. Zhang, X. He, and F. Ouyang, "Radiomic machine-learning classifiers for prognostic biomarkers of advanced nasopharyngeal carcinoma," Cancer Letters, vol. 403, pp. 21–27, 2017.
[11] C. Parmar, P. Grossmann, and J. Bussink, "Machine Learning methods for Quantitative Radiomic Biomarkers," Scientific Reports, vol. 5, 13087, 2015.
[12] Y. Saeys, I. Inza, and P. Larranaga, "A review of feature selection techniques in bioinformatics," Bioinformatics, vol. 23, pp. 2507–2517, 2007.

