2023-GenomicaFuncional y Biocomputacion-Day1
Functional Genomics and Biocomputing 2023/2024
Genomics, Transcriptomics, Proteomics
-Omics combos
• Transcriptomics + Gene Regulatory Networks
• Genomics, Epigenomics + Genome 3D, ATACseq, etc.; Proteomics: Protein-DNA regulatory Networks
• Transcriptomics + Proteomics: Proteogenomics
• Proteomics + Protein Interactions Networks
Russ F. Doolittle: January 10, 1931 – October 11, 2019
Eukaryotes only
Pre-mRNA (hnRNA), Pre-rRNA, Pre-tRNA, miRNA, siRNA, snRNA, snoRNA
Transcriptome: the complete set of RNA transcripts produced from the genome under given conditions, at a particular place and time.
Microarrays, RNA-seq
• mRNA-seq
• Exome capture
• Targeted
• Small RNA (miRNAs, pRNAs, sncRNA)
• Total RNA
• Ribosome profiling
• Single Cell RNA-Seq
Transcriptomics: Applications
Variant calling
From Microarray to RNAseq
Microarray:
• Requires prior knowledge
• Higher throughput
• Analysis is more user-friendly
RNA-seq:
• Comprehensive view
• Best dynamic range
• Isoform discovery
• Can detect SNVs
From Microarray to RNAseq
• The Dynamic range concept
Experimental Design
Sequencing Design
Quality control
Differential Expression
Functional profiling
Experimental Design
RNA integrity number (RIN)
Library preparation:
• sRNA: size select by PAGE or kit, ligate RNA adapter, convert to cDNA, construct library, sequence
• mRNA: PolyA select, fragment, convert to cDNA, construct library, sequence
Multiplexing
http://info.l7informatics.com/blog/ngs-101-biology-is-a-big-data-problem
More sequences or more replication?
For low-expression genes (LowE): more replicates and more depth
Bioinformatics. 2014 Feb 1;30(3):301-4. doi: 10.1093/bioinformatics/btt688.
https://nanoporetech.com/how-it-works
Sequencing on my laptop (MinION, Nanopore)
Sequencing Design
Distribution of RNA species in NGS samples after poly-A enrichment (mRNA) and rRNA depletion (whole transcriptome) sequencing
http://www.exiqon.com/whole-transcriptome-ngs
AVOID CONFOUNDING VARIABLES!
The NOISE PROBLEM
• Sampling bias
• Process
• Index swapping
• Sequencing errors
File Quality control
FastQC
https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
MultiQC
https://multiqc.info/examples/rna-seq/multiqc_report.html#general_stats
FastQ Screen
https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/
% of reads mapped to a reference: normally 80-90% will map (the rest do not, for a variety of reasons)
PCR duplicates
HISAT2
featureCounts
Analysis Overview
Alignment and quantification
Reference, splice-aware
• Maps to a reference transcriptome
• Extremely fast
https://bioinfo.iric.ca/understanding-how-kallisto-works/
Building the T-DBG graph (transcriptome de Bruijn graph) and the kallisto index.
• All transcript sequences are decomposed into k-mers (here k=5) to construct the colored de Bruijn graph.
• The idea is that each different transcript will lead to a different path in the graph.
[Figure: colored de Bruijn graph built from the transcript k-mers, with one path per transcript]
https://bioinfo.iric.ca/understanding-how-kallisto-works/
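The index-building idea can be sketched in a few lines of Python. This is only a toy illustration, not kallisto's actual implementation: the two transcripts, the function names `kmers` and `build_index`, and the plain-dictionary representation of the graph are all assumptions made for the example.

```python
from collections import defaultdict

def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_index(transcripts, k):
    """Toy colored de Bruijn graph.

    colors: k-mer -> set of transcripts ("colors") that contain it
    edges:  k-mer -> set of k-mers that follow it in some transcript
    """
    colors = defaultdict(set)
    edges = defaultdict(set)
    for name, seq in transcripts.items():
        path = kmers(seq, k)
        for km in path:
            colors[km].add(name)
        for a, b in zip(path, path[1:]):
            edges[a].add(b)
    return dict(colors), dict(edges)

# Hypothetical transcripts that differ only in their last base.
transcripts = {
    "t1": "ACGTGATGA",
    "t2": "ACGTGATGT",
}
colors, edges = build_index(transcripts, k=5)

# Shared k-mers carry both colors; the paths split at the last k-mer.
print(sorted(colors["ACGTG"]))
print(sorted(edges["TGATG"]))
```

Each transcript traces its own path through the graph, and the two paths only diverge where the sequences differ.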
Reads are decomposed into k-mers (k=5 here too) and the pre-built index is used to determine the k-compatibility class of each k-mer.
https://bioinfo.iric.ca/understanding-how-kallisto-works/
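A minimal sketch of this step, assuming a toy index that maps each k-mer to the set of transcripts containing it. The function names and example sequences are hypothetical. Skipping k-mers that are absent from the index is a simplification chosen for the example, not a description of kallisto's behaviour.

```python
def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_kmer_index(transcripts, k):
    """Map each k-mer to its k-compatibility class (set of transcripts)."""
    index = {}
    for name, seq in transcripts.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(name)
    return index

def read_compatibility(read, index, k):
    """Intersect the k-compatibility classes of the read's k-mers.

    k-mers missing from the index are skipped (a simplification).
    Returns an empty set when no k-mer of the read is indexed.
    """
    compatible = None
    for km in kmers(read, k):
        if km not in index:
            continue
        compatible = index[km] if compatible is None else compatible & index[km]
    return set() if compatible is None else set(compatible)

# Hypothetical transcripts and reads.
transcripts = {"t1": "ACGTGATGA", "t2": "ACGTGATGT"}
index = build_kmer_index(transcripts, k=5)

print(sorted(read_compatibility("CGTGATG", index, k=5)))  # read from the shared region
print(sorted(read_compatibility("GTGATGA", index, k=5)))  # read covering the t1-specific end
```

A read is assigned to the transcripts that are compatible with all of its k-mers, without computing a base-by-base alignment.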
Summarization
To discard:
• Non-unique mapping
• Positions that overlap with many genes
• Poor-quality alignment
• If paired-end, only one read matches the gene
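The discard rules above can be written as a small filter. This is a sketch under stated assumptions: the `Alignment` record, its field names and the `min_mapq=10` threshold are hypothetical and do not correspond to the options of any particular counting tool.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Alignment:
    read_id: str
    num_hits: int           # number of genomic locations the read maps to
    genes: tuple            # genes overlapped by this alignment
    mapq: int               # mapping quality
    paired: bool = False
    mate_genes: tuple = ()  # genes overlapped by the mate (paired-end only)

def countable_gene(aln, min_mapq=10):
    """Return the gene this alignment counts for, or None to discard it."""
    if aln.num_hits != 1:        # non-unique mapping
        return None
    if len(aln.genes) != 1:      # overlaps many genes (or none at all)
        return None
    if aln.mapq < min_mapq:      # poor-quality alignment
        return None
    if aln.paired and aln.genes[0] not in aln.mate_genes:
        return None              # only one read of the pair matches the gene
    return aln.genes[0]

def count_reads(alignments, min_mapq=10):
    """Summarise alignments into per-gene counts."""
    counts = Counter()
    for aln in alignments:
        gene = countable_gene(aln, min_mapq)
        if gene is not None:
            counts[gene] += 1
    return counts

# Hypothetical alignments: only r1 and r6 survive the filters.
alignments = [
    Alignment("r1", 1, ("geneA",), 60),
    Alignment("r2", 3, ("geneA",), 60),
    Alignment("r3", 1, ("geneA", "geneB"), 60),
    Alignment("r4", 1, ("geneA",), 2),
    Alignment("r5", 1, ("geneB",), 60, paired=True, mate_genes=("geneC",)),
    Alignment("r6", 1, ("geneB",), 60, paired=True, mate_genes=("geneB",)),
]
print(count_reads(alignments))
```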
Summarisation/Counting
Normalisation
Normalisation/Scaling
Normalisation/RPKM
• Gene length based normalisation
• Library size, number of reads
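As a worked example, RPKM (reads per kilobase of transcript per million mapped reads) divides each gene's count by both the gene length and the library size. The gene names, counts and lengths below are made up for illustration, and taking the library size as the sum of the supplied counts is an assumption of this sketch.

```python
def rpkm(counts, lengths_bp):
    """RPKM = count * 1e9 / (gene length in bp * total mapped reads).

    counts:     gene -> read count
    lengths_bp: gene -> gene length in base pairs
    The library size is taken as the sum of the supplied counts.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no mapped reads")
    return {
        gene: count * 1e9 / (lengths_bp[gene] * total)
        for gene, count in counts.items()
    }

# Hypothetical data: same count, but geneB is twice as long as geneA.
counts = {"geneA": 500, "geneB": 500}
lengths_bp = {"geneA": 1000, "geneB": 2000}
values = rpkm(counts, lengths_bp)
print(values)
```

With equal counts, the gene that is twice as long ends up with half the RPKM, which is the purpose of the length correction.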
Normalisation/trimmed mean of M values (TMM)
Weighted trimmed mean of the log expression ratios (trimmed mean of M values, TMM)
“highly expressed genes and those that have a large variation of expression are excluded”
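A simplified sketch of the TMM idea for one sample against one reference sample. The trimming fractions (30% of M values, 5% of A values), the inverse-variance weights and the function name `tmm_factor` are assumptions of this example. Production implementations such as edgeR add further details (reference selection, rescaling of the factors across samples) that are left out here.

```python
import math

def tmm_factor(sample, ref, m_trim=0.30, a_trim=0.05):
    """Simplified TMM scaling factor of `sample` relative to `ref`.

    sample, ref: per-gene read counts in the same gene order.
    M = log ratio of relative expression, A = mean log relative expression.
    Genes with extreme M or A are trimmed, then M is averaged with weights.
    """
    n_s, n_r = sum(sample), sum(ref)
    rows = []
    for y_s, y_r in zip(sample, ref):
        if y_s == 0 or y_r == 0:
            continue  # log ratio undefined
        m = math.log2((y_s / n_s) / (y_r / n_r))
        a = 0.5 * math.log2((y_s / n_s) * (y_r / n_r))
        var = (n_s - y_s) / (n_s * y_s) + (n_r - y_r) / (n_r * y_r)
        if var > 0:
            rows.append((m, a, 1.0 / var))
    n = len(rows)

    def kept(column, frac):
        # Indices that remain after trimming `frac` from each tail.
        order = sorted(range(n), key=lambda i: rows[i][column])
        cut = int(n * frac)
        return set(order[cut:n - cut])

    keep = kept(0, m_trim) & kept(1, a_trim)
    if not keep:
        return 1.0
    weighted_m = sum(rows[i][2] * rows[i][0] for i in keep)
    total_w = sum(rows[i][2] for i in keep)
    return 2 ** (weighted_m / total_w)

# Hypothetical counts: one highly expressed gene inflates the sample library.
ref = [100] * 10
sample = [100] * 9 + [1100]
print(tmm_factor(sample, ref))
```

In this toy case the single highly expressed gene is trimmed away, so the factor reflects only the nine genes whose counts did not change.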
More replicates
Differential Expression
Which pipeline?
Aligner/Modeler/DEtester (e.g. ThCuNo)
The hype…
Differential Expression
LogFC
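A minimal sketch of a log2 fold change between two conditions, assuming already-normalised expression values. The pseudocount of 1 is a common convention for avoiding division by zero and is an assumption here, as are the function name and the example values.

```python
import math

def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 of the ratio of mean expression, treated over control.

    treated, control: normalised expression values across replicates.
    The pseudocount avoids taking the log of zero.
    """
    mean_t = sum(treated) / len(treated)
    mean_c = sum(control) / len(control)
    return math.log2((mean_t + pseudocount) / (mean_c + pseudocount))

# Hypothetical replicates for one gene.
print(log2_fold_change([7, 7], [3, 3]))   # higher in treated: positive LogFC
print(log2_fold_change([3], [15]))        # lower in treated: negative LogFC
```

A LogFC of 1 corresponds to a doubling and a LogFC of -1 to a halving, so up- and down-regulation are symmetric around zero.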
Functional analyses
Put data in Biological Context
https://usegalaxy.eu/
TRANSCRIPTOMICS
https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/ref-based/tutorial.html
https://usegalaxy.eu/
Quality control: FastQC
MultiQC: to aggregate data