2023-GenomicaFuncional y Biocomputacion-Day1

Uploaded by maitelarzabaleso

DAY 1

Genómica Funcional y
Biocomputación-2023/2024 (Functional Genomics and Biocomputation)

Day 1: Introduction to transcriptomics. RNA-seq. Practicals (Galaxy and Bioconductor).

Day 2: Discussion of the RNA-seq practicals (Galaxy). A real case analysis explained in depth. Practicals (Galaxy).

Day 3: Introduction to proteomics and proteogenomics. Practicals (Galaxy).

Day 4: Introduction to networks and RNA interaction networks.

In Molecular Biology we focus on the central dogma (Crick, 1957).
-Omics Universe

Genomics, Transcriptomics, Proteomics

-Add your favourite “epi” stuff

-Omics combos
• Genomics + Transcriptomics: Gene Regulatory Networks
• Genomics + Epigenomics (Genome 3D, ATAC-seq, etc.) + Proteomics: Protein-DNA regulatory networks
• Transcriptomics + Proteomics: Proteogenomics
• Proteomics + Proteomics: Protein Interaction Networks
The Roots

Roots are in the sequences

“Sequences, the simple order of individual units in biological polymers, are at the heart of bioinformatics, and the search for relationships among them and the reconstruction of their histories has arguably proved the most informative of biological inquiries.”
Russ F. Doolittle (January 10, 1931 – October 11, 2019)
The founder: Margaret Dayhoff
Illustration by Gloria Fuentes (@glogliiita)
• Trained in math and quantum chemistry.
• Associate director of the National Biomedical Research Foundation (NBRF).
• Wrote seminal FORTRAN programs to derive amino acid sequences from partial overlaps of fragmented amino acid sequences (from months to minutes!).

Developments
• Created the Atlas of Protein Sequence and Structure (an 8 women/1 man team).
• Her work on organelles was essential for the scientific community to accept Lynn Margulis’ endosymbiotic theory.
• Worked with Carl Sagan to model planetary atmospheres.
• Realized the potential of computer applications to nucleic acid and gene sequences.

“The mother and father of Bioinformatics” (D. Lipman)

Modified from Aaron Quinlan
https://github.com/quinlan-lab/applied-computational-genomics
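To make the fragment-overlap idea concrete, here is a minimal greedy sketch. It illustrates overlap-based merging under simple assumptions (exact suffix/prefix matches of at least `min_len` residues, no sequencing errors); it is not a reconstruction of Dayhoff's original FORTRAN programs, and the peptide fragments and the `greedy_assemble` helper are hypothetical.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(fragments: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of fragments with the longest overlap."""
    # Drop exact duplicates and fragments fully contained in another fragment.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: return the remaining pieces
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


print(greedy_assemble(["MKTAYIA", "AYIAKQR", "KQRQISF"]))  # ['MKTAYIAKQRQISF']
```

Greedy merging is the simplest overlap strategy; it can mis-assemble repeated motifs, which is one reason later assemblers moved to graph-based approaches.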
Transcriptomics

Total RNA
• Coding RNA (4 % of total): pre-mRNA (hnRNA) -> mRNA
• Functional RNA (96 % of total): pre-rRNA -> rRNA, pre-tRNA -> tRNA, miRNA, siRNA, snRNA, snoRNA
(Figure legend: classes found in all organisms vs. in eukaryotes only)

Transcriptomics

The complete set of RNA transcripts produced from the genome (under different conditions, at a particular place and time).
Technologies: microarrays, RNA-seq.

• Comparison of gene expression under different conditions (before/after treatment, during development, cancer vs normal cells, ...)
• Transcriptome assembly: discovery of novel genes, non-coding RNAs and novel splicing variants of known genes
• An alternative to genome sequencing and assembly for a species with an unknown genome, when we are interested only in expressed genes
Transcriptomics - Techniques

• mRNA-seq
• Exome capture
• Targeted
• Small RNA (miRNAs, pRNAs, sncRNA)
• Total RNA
• Ribosome profiling
• Single-cell RNA-seq
Transcriptomics - Applications

Discovery:
• Transcripts
• Isoforms
• Splice junctions
• Fusion genes

Differential expression:
• Gene-level expression changes
• Relative isoform abundance
• Splicing patterns

Variant calling
From Microarray to RNAseq

That is, from data-poor to data-intensive.

A reminder: Microarray

Microarray (more limited):
• Requires prior knowledge
• Higher throughput
• Analysis is more user-friendly

RNA-seq (getting cheaper):
• Comprehensive view
• Best dynamic range
• Isoform discovery
• Can detect SNVs
From Microarray to RNAseq
• The dynamic range concept

Huge dynamic range of mRNA abundance (some mRNAs have only a few copies per cell, while the most abundant ones have >10,000 copies per cell).

When log-transformed RNA-seq and microarray data are plotted against each other, the points are not uniformly distributed around the trend line.

The dynamic range of RNA-seq depends on sequencing depth (microarrays have an approximately fixed dynamic range).

Zhao et al. (2014), PLOS One
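The depth dependence can be sketched with a deliberately simple model: assume each transcript's expected read count is depth × (its copies / total copies), and call a transcript detectable when that expectation is at least one read. The copy numbers below are hypothetical, chosen only to span the 1 to 10,000 copies-per-cell range quoted above; real detection also depends on transcript length, mappability and biological noise.

```python
import math


def expected_reads(copies: float, total_copies: float, depth: int) -> float:
    """Expected reads for one transcript if reads are drawn in proportion to copy number."""
    return depth * copies / total_copies


def detectable(copies_per_cell, depth, min_expected=1.0):
    total = sum(copies_per_cell)
    return [c for c in copies_per_cell if expected_reads(c, total, depth) >= min_expected]


def detectable_fraction(copies_per_cell, depth, min_expected=1.0):
    return len(detectable(copies_per_cell, depth, min_expected)) / len(copies_per_cell)


def log10_dynamic_range(copies_per_cell, depth, min_expected=1.0):
    """Orders of magnitude spanned by the transcripts that pass the detection threshold."""
    hits = detectable(copies_per_cell, depth, min_expected)
    return math.log10(max(hits) / min(hits)) if hits else 0.0


# Hypothetical transcripts at 1, 10, 100, 1,000 and 10,000 copies per cell.
copies = [1, 10, 100, 1000, 10000]
for depth in (1_000, 10_000, 1_000_000):
    print(depth, detectable_fraction(copies, depth), log10_dynamic_range(copies, depth))
```

A microarray, by contrast, is bounded by background at the low end and probe saturation at the high end regardless of how the sample is processed, which is the "approximately fixed" range mentioned above.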

RNAseq = mRNAs

Experimental Design -> Sequencing Design -> Quality control -> Alignment & quantification -> Differential Expression -> Functional profiling
Experimental Design

RNA integrity number (RIN)

sRNA: size select by PAGE or kit -> ligate RNA adapter -> convert to cDNA -> construct library -> sequence
mRNA: PolyA select -> fragment -> convert to cDNA -> construct library -> sequence

Ideally, samples with a high RIN (≥ 8) are used in RNA sequencing experiments.
New transcript discovery
Quantitation
Experimental Design

Multiplexing

modified from Malone JH, Oliver B (2011) BMC Biol.
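Multiplexing pools several libraries in one run, each tagged with its own index (barcode), and reads are later sorted back to their samples. A minimal demultiplexing sketch, assuming one index read per fragment and a tolerance of at most one mismatch (a common but not universal setting); the barcode sequences and sample names are hypothetical, and vendor demultiplexing software performs checks this sketch omits.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_seq, barcodes, max_mismatches=1):
    """Return the single sample whose barcode is within max_mismatches, else None."""
    hits = [name for name, bc in barcodes.items()
            if len(bc) == len(index_seq) and hamming(bc, index_seq) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None  # no match or ambiguous match


def demultiplex(reads, barcodes, max_mismatches=1):
    """reads: iterable of (read_id, index_sequence) pairs."""
    bins = {name: [] for name in barcodes}
    bins["undetermined"] = []
    for read_id, index_seq in reads:
        sample = assign_sample(index_seq, barcodes, max_mismatches)
        bins[sample if sample is not None else "undetermined"].append(read_id)
    return bins


barcodes = {"ctrl": "ACGTAC", "treat": "TGCATG"}  # hypothetical 6-nt indices
reads = [("r1", "ACGTAC"), ("r2", "ACGTAA"), ("r3", "TGCATG"), ("r4", "GGGGGG")]
print(demultiplex(reads, barcodes))
```

Barcodes are usually designed to differ at several positions, so that a single sequencing error in the index cannot turn one sample's barcode into another's.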

Experimental Design: considerations

Cheaper
Improves mapping of repeats and across exon-exon junctions
Improves accuracy for low-expressed genes
Experimental Design
Considerations: adequate read depth to detect low-expression genes

http://info.l7informatics.com/blog/ngs-101-biology-is-a-big-data-problem
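A rough way to reason about "adequate depth" is to treat the reads from one gene as Poisson-distributed with mean depth × fraction, where fraction is the share of the library coming from that gene. The sketch below, assuming that model, estimates the probability of seeing at least k reads and scans for the smallest depth (in steps of 1 million reads) that reaches a target probability. The Poisson model ignores biological overdispersion between replicates, so it gives an optimistic lower bound rather than a design rule; k=10 and target=0.95 are illustrative choices.

```python
import math


def prob_at_least(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def min_depth(fraction, k=10, target=0.95, step=1_000_000, max_depth=500_000_000):
    """Smallest depth (a multiple of `step`) with P(at least k reads) >= target, else None."""
    depth = step
    while depth <= max_depth:
        if prob_at_least(k, fraction * depth) >= target:
            return depth
        depth += step
    return None


# Hypothetical low-expression gene contributing 1 read per million.
for depth in (2_000_000, 10_000_000, 20_000_000):
    print(depth, round(prob_at_least(10, 1e-6 * depth), 3))
print("depth for 95% chance of >= 10 reads:", min_depth(1e-6))
```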
Experimental Design
More sequencing or more replication?

• High- and medium-expression genes: more replicates
• Low-expression genes: more replicates and more depth

(2014 Feb 1;30(3):301-4. doi: 10.1093/bioinformatics/btt688)
Experimental Design

A protein nanopore is set in an electrically resistant polymer membrane. An ionic current is passed through the nanopore by setting a voltage across this membrane. If an analyte passes through the pore or near its aperture, this event creates a characteristic disruption in current (as shown in the diagram below). Measurement of that current makes it possible to identify the molecule in question.

Long reads: many thousands of bases

https://nanoporetech.com/how-it-works
Sequencing on my laptop (MinION, Nanopore)
Sequencing Design
Distribution of RNA species in NGS samples after poly-A enrichment (mRNA) and rRNA depletion (whole-transcriptome) sequencing

More depth is needed with total RNA to achieve the same sensitivity in measuring DE.

Enriched mRNA gives

Duplicates (PCR or optical)

Index swapping:

Sequencing errors
File Quality control
Quality control: FastQC

https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Quality control: MultiQC

https://multiqc.info/examples/rna-seq/multiqc_report.html#general_stats
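FastQC's per-base quality plot summarises Phred scores at each read position. A minimal sketch of that calculation, assuming a plain-text FASTQ with four lines per record and Phred+33 encoding (the Sanger/Illumina 1.8+ convention); it is not FastQC's implementation, which also bins positions and reports quantiles.

```python
def read_fastq(text: str):
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        yield header[1:], seq, qual


def per_position_mean_quality(records, offset=33):
    """Mean Phred score at each read position (Phred+33 encoding assumed)."""
    sums, counts = [], []
    for _, _, qual in records:
        for pos, char in enumerate(qual):
            if pos == len(sums):  # first read reaching this position
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(char) - offset
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


fastq = "@r1\nACGT\n+\nIIII\n@r2\nACGA\n+\nII##\n"
print(per_position_mean_quality(read_fastq(fastq)))  # 'I' = Q40, '#' = Q2
```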
Quality control

• Useful to diagnose problems with the library preparation
• Contamination can produce peaks
• rRNA also produces peaks if a total RNA approach is used
Quality control: FastQ Screen

Compares a subsample of your library against different genomes: what is the proportion of reads coming from each reference?

This checks whether the NGS reads come from the intended genome. Samples are often contaminated (e.g. human reads end up in mouse cDNA).

Example: mouse DNA-seq experiment with 10% human contamination.

https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/
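The FastQ Screen idea can be sketched as a summary over per-read hits: after aligning a subsample of reads to each candidate genome, count how many reads hit only one genome, several genomes, or none. The input dictionary (read ID to the set of genomes it aligned to) is a hypothetical, already-computed structure; FastQ Screen itself runs the aligner and produces a richer report.

```python
from collections import Counter


def screen_summary(hits_per_read, genomes):
    """Percent of reads hitting each genome uniquely or together with other genomes."""
    n = len(hits_per_read)
    unique, multiple = Counter(), Counter()
    no_hit = 0
    for hits in hits_per_read.values():
        if not hits:
            no_hit += 1
            continue
        for genome in hits:
            if len(hits) == 1:
                unique[genome] += 1
            else:
                multiple[genome] += 1
    per_genome = {g: (100 * unique[g] / n, 100 * multiple[g] / n) for g in genomes}
    return per_genome, 100 * no_hit / n


# Hypothetical result of aligning 5 reads from a mouse library to mouse and human.
hits = {"r1": {"mouse"}, "r2": {"mouse"}, "r3": {"human"},
        "r4": {"mouse", "human"}, "r5": set()}
print(screen_summary(hits, ["mouse", "human"]))
```

Reads that hit both genomes (conserved sequence) are reported separately because they say nothing about contamination on their own.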
Quality control: % of reads mapped to a reference
Normally 80-90% of reads will map (the rest do not, for a variety of reasons).
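The mapping rate can be computed from the SAM FLAG field: bit 0x4 marks an unmapped read, while 0x100 (secondary) and 0x800 (supplementary) mark extra alignments of a read that should not be counted twice. A minimal sketch, assuming uncompressed SAM text with tab-separated fields and a hypothetical toy alignment; in practice `samtools flagstat` reports these numbers directly.

```python
def mapping_rate(sam_lines):
    """Fraction of primary records that are mapped (SAM FLAG bit 0x4 unset)."""
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header lines
        flag = int(line.split("\t")[1])
        if flag & 0x100 or flag & 0x800:
            continue  # secondary/supplementary: extra alignments of an already-counted read
        total += 1
        if not (flag & 0x4):
            mapped += 1
    return mapped / total if total else 0.0


sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr1\t500\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t256\tchr2\t900\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t0\tchr2\t300\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
print(mapping_rate(sam))  # 3 of 4 primary records mapped
```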
Quality control: PCR duplicates

High duplication levels can indicate a problem with PCR amplification.
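Sequence-level duplication, in the spirit of FastQC's duplication module, can be estimated as the fraction of reads whose exact sequence has already been seen. A minimal sketch, assuming reads are compared as whole exact strings (FastQC applies its own sampling and truncation rules, and alignment-based tools such as Picard MarkDuplicates work on mapped positions instead); the reads are hypothetical. In RNA-seq, highly expressed genes produce duplicates naturally, so a high value is a prompt to investigate rather than proof of a PCR problem.

```python
from collections import Counter


def duplication_stats(sequences, top=3):
    """Fraction of reads that repeat an already-seen sequence, plus the most repeated ones."""
    counts = Counter(sequences)
    total = sum(counts.values())
    duplicate_fraction = 1 - len(counts) / total if total else 0.0
    return duplicate_fraction, counts.most_common(top)


reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "TTGACCAA", "GGCATGCA"]
print(duplication_stats(reads))
```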

DAY 2
Analyses Overview

HISAT2 (alignment) -> featureCounts (counting)
Alignment and quantification
Reference-based, splice-aware alignment

• Reads are mapped directly to a reference genome (allowing splice junctions)
• Requires large amounts of memory and CPU time (usually run on clusters)
• For discovery of isoforms: FULL alignment
• Tools: STAR, HISAT2, TopHat2

Alignment and quantification
Pseudoalignment

• Maps to a reference transcriptome
• Does not identify the positions of the reads in the transcripts, only their potential transcripts of origin
• Extremely fast
• Tools: Salmon, Kallisto

From Bray et al. Near-optimal probabilistic RNA-seq quantification, Nature Biotechnology, 2016
Pseudoalignment: Kallisto, Salmon…

Overview of kallisto. The input consists of a reference transcriptome and reads from an RNA-seq experiment.

(a) An example of a read (in black) and three overlapping transcripts with exonic regions as shown.

(b) An index is constructed by creating the transcriptome de Bruijn Graph (T-DBG) where nodes (v1, v2, v3, … ) are k-mers, each transcript corresponds to a colored path as shown and the path cover of the transcriptome induces a k-compatibility class for each k-mer.

(c) Conceptually, the k-mers of a read are hashed (black nodes) to find the k-compatibility class of a read.

(d) Skipping (black dashed lines) uses the information stored in the T-DBG to skip k-mers that are redundant because they have the same k-compatibility class.

(e) The k-compatibility class of the read is determined by taking the intersection of the k-compatibility classes of its constituent k-mers.

From Bray et al. Near-optimal probabilistic RNA-seq quantification, Nature Biotechnology, 2016
Alignment and quantification

https://bioinfo.iric.ca/understanding-how-kallisto-works/

Alignment and quantification: building the T-DBG graph and the kallisto index
• All transcript sequences are decomposed into k-mers (here k=5) to construct the colored de Bruijn graph.
• The idea is that each different transcript will lead to a different path in the graph.
(Figure: example k-mers, such as ACGTG, shared between transcripts and forming different colored paths.)

https://bioinfo.iric.ca/understanding-how-kallisto-works/
Alignment and quantification

Reads are decomposed into k-mers (k=5 here too) and the pre-built index is used to determine the k-compatibility class of each k-mer.

For read 1, the intersection of all the k-compatibility classes of its k-mers suggests that it might come from transcript 1 or transcript 2.

https://bioinfo.iric.ca/understanding-how-kallisto-works/
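The k-compatibility logic can be sketched without the graph: map every k-mer of every transcript to the set of transcripts containing it, then intersect those sets over a read's k-mers. This minimal sketch assumes forward-strand sequences only and skips read k-mers absent from the index; it omits kallisto's de Bruijn graph, k-mer skipping and EM quantification. The toy transcripts are hypothetical, built so that read 1 is compatible with transcripts 1 and 2, mirroring the example above.

```python
from collections import Counter


def kmers(seq: str, k: int):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(transcripts, k):
    """Map each k-mer to the set of transcripts containing it (its k-compatibility class)."""
    index = {}
    for name, seq in transcripts.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(name)
    return index


def compatibility_class(read, index, k):
    """Intersect the k-compatibility classes of the read's k-mers found in the index."""
    result = None
    for km in kmers(read, k):
        if km in index:
            result = set(index[km]) if result is None else result & index[km]
    return frozenset(result or ())


transcripts = {"t1": "ACGTGATGAC", "t2": "ACGTGATGTC", "t3": "GGGACGTGAT"}  # toy data
index = build_index(transcripts, k=5)
reads = {"read1": "CGTGATG", "read2": "TGATGAC", "read3": "GGACGTG"}
classes = {name: compatibility_class(seq, index, 5) for name, seq in reads.items()}
print(classes)
# Reads sharing a class are tallied together; quantification then works on these counts.
print(Counter(classes.values()))
```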
Summarization

Read counts for a gene can be higher because:
• there are more transcript copies (higher expression)
• the transcripts are longer
Summarisation/Counting

Reads to discard:
• Non-uniquely mapped reads
• Reads whose positions overlap several genes
• Poor-quality alignments
• For paired-end data, pairs where only one read matches the gene
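The discard rules above can be expressed as a small counting function. This sketch assumes each fragment has already been annotated with the set of genes its mate(s) overlap, the number of alignment locations (`n_hits`, as in the NH tag) and a mapping quality; the `Fragment` type, the `min_mapq=10` cut-off and the toy data are hypothetical, and the code is not featureCounts' implementation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Fragment:
    genes_mate1: FrozenSet[str]            # genes overlapped by read 1
    genes_mate2: Optional[FrozenSet[str]]  # genes overlapped by read 2; None if single-end
    n_hits: int                            # number of alignment locations (NH)
    mapq: int                              # mapping quality


def count_reads(fragments, min_mapq=10):
    counts, discarded = Counter(), Counter()
    for f in fragments:
        if f.n_hits > 1:
            discarded["not unique"] += 1
        elif f.mapq < min_mapq:
            discarded["poor quality"] += 1
        elif f.genes_mate2 is not None and f.genes_mate1 != f.genes_mate2:
            discarded["mates disagree"] += 1  # e.g. only one read matches the gene
        elif len(f.genes_mate1) > 1:
            discarded["overlaps many genes"] += 1
        elif not f.genes_mate1:
            discarded["no gene"] += 1
        else:
            counts[next(iter(f.genes_mate1))] += 1
    return counts, discarded


fs = frozenset
toy = [
    Fragment(fs({"A"}), None, 1, 60),
    Fragment(fs({"A"}), None, 3, 60),
    Fragment(fs({"A", "B"}), None, 1, 60),
    Fragment(fs({"B"}), None, 1, 2),
    Fragment(fs({"B"}), fs({"B"}), 1, 60),
    Fragment(fs({"B"}), fs(), 1, 60),
]
print(count_reads(toy))
```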
Normalisation
Normalisation/Scaling
Normalisation/RPKM
• Gene-length-based normalisation
• Library size (number of reads)
• Highly expressed genes or highly DE genes can distort values!!!
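RPKM divides a gene's read count by its length in kilobases and by the library size in millions of reads, i.e. RPKM = count × 10^9 / (gene length in bp × total mapped reads). A minimal sketch, assuming the library size is taken as the sum of the counted reads and using hypothetical counts and lengths; the second call shows the warning above in action: adding one very highly expressed gene lowers the RPKM of the others even though their counts did not change.

```python
def rpkm(counts, lengths_bp):
    """RPKM = count * 1e9 / (gene length in bp * total reads); library size = sum of counts here."""
    total = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total) for g in counts}


lengths = {"g1": 2000, "g2": 1000, "g3": 1000}  # hypothetical gene lengths (bp)
print(rpkm({"g1": 500, "g2": 1500}, lengths))
# Same g1 and g2 counts, plus one very highly expressed gene: their RPKM values drop.
print(rpkm({"g1": 500, "g2": 1500, "g3": 8000}, lengths))
```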
Normalisation/geometric scaling
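Geometric scaling can be illustrated with a DESeq-style median-of-ratios size factor, assuming that is the method this slide refers to: each sample's counts are divided by the per-gene geometric mean across samples, and the sample's size factor is the median of those ratios. A minimal sketch with hypothetical counts (genes with a zero in any sample are skipped, since their geometric mean would be zero). In the example one gene is strongly changed, yet the median keeps the factors at the expected 2:1 ratio.

```python
import math
from statistics import median


def size_factors(count_matrix):
    """Median-of-ratios size factors; count_matrix[g][j] = count of gene g in sample j."""
    n_samples = len(count_matrix[0])
    usable = [row for row in count_matrix if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has non-zero counts in every sample")
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Sample 2 was sequenced twice as deeply; the last gene is strongly up in sample 2.
counts = [[10, 20], [100, 200], [50, 100], [5, 500]]
sf = size_factors(counts)
print(sf, sf[1] / sf[0])
normalized = [[c / s for c, s in zip(row, sf)] for row in counts]
print(normalized)
```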
Normalisation/trimmed mean of M values (TMM): weighted trimmed mean of the log expression ratios
“highly expressed genes and those that have a large variation of expression are excluded”

(Figure: distribution of M values (liver to kidney) for technical replicates, skewed to negative, i.e. higher expression in kidney.)
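A simplified TMM can be written in a few lines: compute M (log2 ratio of proportions) and A (average log2 proportion) per gene, trim 30% of the M values and 5% of the A values from each tail, and average the remaining M values. This sketch, assuming those default trim levels, uses an unweighted mean, whereas edgeR's TMM uses precision weights and further conventions, so its numbers will not match edgeR exactly. In the hypothetical example one gene absorbs most of the sample's reads; the trimmed mean ignores it and rescales the sample so that its effective library size matches the reference.

```python
import math


def tmm_factor(sample, reference, m_trim=0.30, a_trim=0.05):
    """Unweighted trimmed mean of M values (simplified TMM) of `sample` vs `reference`."""
    n_s, n_r = sum(sample), sum(reference)
    m_vals, a_vals = [], []
    for y_s, y_r in zip(sample, reference):
        if y_s == 0 or y_r == 0:
            continue  # M and A are undefined for zero counts
        p_s, p_r = y_s / n_s, y_r / n_r
        m_vals.append(math.log2(p_s / p_r))        # M: log ratio
        a_vals.append(0.5 * math.log2(p_s * p_r))  # A: average log expression
    n = len(m_vals)
    by_m = sorted(range(n), key=lambda i: m_vals[i])
    by_a = sorted(range(n), key=lambda i: a_vals[i])
    cut_m, cut_a = int(n * m_trim), int(n * a_trim)
    keep = set(by_m[cut_m:n - cut_m]) & set(by_a[cut_a:n - cut_a])
    if not keep:
        raise ValueError("nothing left after trimming")
    return 2 ** (sum(m_vals[i] for i in keep) / len(keep))


reference = [100] * 20
sample = [100] * 19 + [100_000]  # one gene takes most of the sample's reads
f = tmm_factor(sample, reference)
print(f, "effective library sizes:", sum(sample) * f, sum(reference))
```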

Differential Expression
Modelling Differential Expression
8 replicates, 2 conditions

Dimensionality reduction: PCA

More replicates
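A PCA of the samples can be sketched in pure Python: centre each gene, build the sample-by-sample Gram matrix and take its leading eigenvector by power iteration, which gives the PC1 scores and the share of variance PC1 explains. This assumes the input is already log-transformed (e.g. log2(count + 1)) and computes only PC1; in practice one would use a library routine such as DESeq2's `plotPCA` or a numerical package. The toy data are hypothetical: two groups of two samples.

```python
import math


def pca_pc1(samples, iterations=500):
    """PC1 scores and explained-variance share; samples[i] is sample i's expression vector."""
    n, p = len(samples), len(samples[0])
    means = [sum(s[g] for s in samples) / n for g in range(p)]
    x = [[s[g] - means[g] for g in range(p)] for s in samples]  # centre each gene
    gram = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)] for i in range(n)]
    # Start vector: the all-ones vector lies in the null space of the centred Gram matrix.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):  # power iteration for the top eigenvector
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0:
            return [0.0] * n, 0.0
        v = [c / norm for c in w]
    eigenvalue = sum(v[i] * gram[i][j] * v[j] for i in range(n) for j in range(n))
    total = sum(gram[i][i] for i in range(n))
    scores = [c * math.sqrt(max(eigenvalue, 0.0)) for c in v]
    return scores, eigenvalue / total


# Hypothetical log2 expression of 3 genes in 2 control (A) and 2 treated (B) samples.
data = [[1.0, 1.0, 1.0], [1.1, 0.9, 1.0], [5.0, 5.0, 5.0], [5.1, 4.9, 5.0]]
scores, share = pca_pc1(data)
print([round(s, 2) for s in scores], round(share, 3))
```

With well-separated conditions, replicates of the same condition should cluster on PC1; a replicate that lands with the other group is worth investigating before testing for DE.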
Differential Expression: which pipeline?

Aligner/Modeler/DE tester
(e.g. ThCuNo)

The DE tool is the one making the difference.

Differential Expression

If we want most of our calls to validate: high-precision pipeline.

If we want as many genes as possible, or for enrichment studies: high-recall pipeline.
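The precision/recall trade-off between pipelines can be made concrete by comparing each pipeline's DE calls with a set of validated genes. In this sketch the validated set and both call sets are hypothetical.

```python
def precision_recall(called, validated):
    """Precision = share of calls that validate; recall = share of validated genes recovered."""
    called, validated = set(called), set(validated)
    true_pos = len(called & validated)
    precision = true_pos / len(called) if called else 0.0
    recall = true_pos / len(validated) if validated else 0.0
    return precision, recall


validated = {"g1", "g2", "g3", "g4"}           # hypothetical qPCR-confirmed DE genes
strict = {"g1", "g2"}                           # high-precision pipeline
lenient = {"g1", "g2", "g3", "g4", "g5", "g6"}  # high-recall pipeline
print("strict:", precision_recall(strict, validated))
print("lenient:", precision_recall(lenient, validated))
```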
Differential Expression

The multiple testing issue

The hype…

• Testing for DE at EACH gene is ONE experiment
• Testing across THOUSANDS of genes requires correction for multiple comparisons
• Bonferroni vs False Discovery Rate (FDR)
• FDR control aims to keep the expected proportion of false positives among the genes called DE below a threshold (usually 5%).
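Both corrections are short enough to write out. Bonferroni multiplies each p-value by the number of tests m (capped at 1); Benjamini-Hochberg sorts the p-values, scales the i-th smallest by m/i, and enforces monotonicity from the largest down. A minimal sketch of the standard BH step-up adjustment (the same procedure as R's `p.adjust(method = "BH")`), with hypothetical p-values:

```python
def bonferroni(pvals):
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """BH step-up adjusted p-values (q-values), returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-gene p-values
print("Bonferroni:", bonferroni(p))
print("BH:", benjamini_hochberg(p))
```

Bonferroni controls the chance of even one false positive, so it becomes very strict with thousands of genes; BH tolerates a controlled fraction of false discoveries and keeps more genes.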
Differential Expression

(Figure: FDR-corrected value (q-value) vs logFC.)
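Reading a q-value vs logFC plot usually comes down to two thresholds. This sketch, assuming q < 0.05 and |log2FC| ≥ 1 as cut-offs (common choices, not ones prescribed here), labels each gene as up, down or not significant and returns the -log10(q) value typically used as the plot's y-axis; gene names and values are hypothetical.

```python
import math


def volcano_calls(genes, q_cut=0.05, lfc_cut=1.0):
    """genes: name -> (log2 fold change, q-value). Returns name -> (label, -log10 q)."""
    calls = {}
    for name, (lfc, q) in genes.items():
        if q < q_cut and lfc >= lfc_cut:
            label = "up"
        elif q < q_cut and lfc <= -lfc_cut:
            label = "down"
        else:
            label = "not significant"
        calls[name] = (label, -math.log10(q) if q > 0 else math.inf)
    return calls


genes = {"g1": (2.5, 0.001), "g2": (-1.5, 0.01), "g3": (3.0, 0.2), "g4": (0.3, 1e-6)}
for name, (label, y) in volcano_calls(genes).items():
    print(name, label, round(y, 2))
```

Note g4: a tiny q-value with a small fold change is statistically convincing but may be biologically minor, which is why both axes are considered.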
Functional analyses
Put data in Biological Context
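One common way to put a DE list in biological context is an over-representation test: given N genes in the universe, K of them annotated to a term, and n DE genes of which k carry the annotation, the hypergeometric upper tail P(X ≥ k) asks whether the overlap is larger than expected by chance. A minimal sketch using `math.comb` (Python 3.8+), with hypothetical numbers; when many terms are tested, the resulting p-values need the same multiple-testing correction as in differential expression.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N universe genes, K annotated, n DE genes drawn)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


# Hypothetical: 20,000 genes, 200 in a pathway, 500 DE genes, 20 of them in the pathway.
N, K, n, k = 20_000, 200, 500, 20
print("expected overlap:", n * K / N)
print("P(overlap >= 20):", hypergeom_sf(k, N, K, n))
```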
https://usegalaxy.eu/

TRANSCRIPTOMICS

https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/ref-based/tutorial.html

Time estimation: 8 hours

Repeat the mapping with HISAT2 instead of STAR.

Quality control: FastQC
MultiQC: to aggregate data
