BLAST and Sequence Alignment
RDNA202
Cassie
Biotechnology and Genomics
Blastp
• p for protein
• So Protein - Protein
tBlastn
• t for translated
• n for nucleotide
• Different Parameters:
1. General Parameters:
• E-value
• Word size
2. Scoring Parameters
3. Filter and Masking
BLAST algorithm
1. General Parameters:
• E-value
• Gives indication of statistical significance of a given pairwise alignment
• The lower the E-value – the more significant the hit
• If a sequence alignment has an E-value of 0.05, about 0.05 hits with that score
are expected by chance alone: roughly a 5 in 100 (1 in 20) chance that the match is random
• E-value greater than 1 – indicates that the alignment likely occurred by chance
• Word size
• The length of the seed that initiates an alignment
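The E-value behaviour described above follows from Karlin-Altschul statistics: for an alignment score S, the expected number of chance hits is E = K * m * n * exp(-lambda * S), where m is the query length and n the database length. The sketch below is only an illustration; the function names are hypothetical and the default values of K and lambda are placeholders, not parameters of any real scoring matrix.

```python
import math

def expected_hits(score, query_len, db_len, k=0.1, lam=0.3):
    """Karlin-Altschul E-value: E = K * m * n * exp(-lambda * S).

    k and lam are placeholder values chosen for illustration only.
    """
    return k * query_len * db_len * math.exp(-lam * score)

def chance_probability(e_value):
    """Probability of at least one chance hit: P = 1 - exp(-E)."""
    return 1.0 - math.exp(-e_value)

# Higher scores give lower E-values, i.e. more significant hits.
for s in (40, 60, 80):
    print(s, expected_hits(s, query_len=300, db_len=1_000_000))

# For small E-values, P is close to E (E = 0.05 gives P just under 5%).
print(chance_probability(0.05))
```

Because E scales with the database length, the same alignment can look less significant when searched against a larger database.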
BLAST algorithm
2. Scoring Parameters
3. Filter and Masking:
• Mask the query while producing the seeds used to scan the database, but not
during extensions
BLAST Results
• The topmost hit = the best match to the query sequence
Why is BLAST popular?
1. The flexibility of the search algorithm
2. Reliable statistical reports
3. Continual software development
4. The speed attained by the heuristic search methods
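The heuristic behind that speed is seed-and-extend: instead of aligning the query in full against every database sequence, BLAST first looks for short word matches (the word size parameter) and only extends around those seeds. The sketch below shows just an exact-match seeding step for nucleotide words. The function names are hypothetical, and real BLAST adds neighbourhood words for proteins, extension, and scoring on top of this.

```python
from collections import defaultdict

def index_words(sequence, word_size):
    """Map every word of length word_size to its start positions."""
    index = defaultdict(list)
    for i in range(len(sequence) - word_size + 1):
        index[sequence[i:i + word_size]].append(i)
    return index

def find_seeds(query, subject, word_size):
    """Return (query_pos, subject_pos) pairs where an exact word matches."""
    subject_index = index_words(subject, word_size)
    seeds = []
    for q_pos in range(len(query) - word_size + 1):
        word = query[q_pos:q_pos + word_size]
        for s_pos in subject_index.get(word, []):
            seeds.append((q_pos, s_pos))
    return seeds

query = "AATGATCCGATT"
subject = "ATTGATTCGATTCTA"

# A larger word size yields fewer seeds: faster, but less sensitive.
for w in (3, 4, 6):
    print(w, len(find_seeds(query, subject, w)))
```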
Sequence Alignment
Alignment algorithms
• Sequence alignment – most essential step in comparing biological sequences
• Identifies regions of similarity between sequences
• Two commonly used sequence alignment algorithms:
1. Global alignment
• Compares two sequences – by aligning the entire length of the sequences
• Best suited to sequences of similar length that are related along their whole length
2. Local alignment
◦ Does not align the entire sequence lengths
◦ Aligns regions with the highest density of matches
◦ Useful in identifying short conserved regions in nucleotide or protein sequences
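The difference between the two approaches can be sketched with one dynamic-programming routine. Global alignment (Needleman-Wunsch style) scores the sequences end to end, while local alignment (Smith-Waterman style) resets negative scores to zero and reports the best-scoring sub-region. The scoring values (match +1, mismatch -1, gap -2) and the example sequences are assumptions chosen for illustration.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Best alignment score by dynamic programming.

    local=False: global alignment, both sequences aligned end to end.
    local=True:  local alignment, best-scoring pair of sub-regions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for gaps at the sequence ends.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(score[i - 1][j - 1] + pair,
                       score[i - 1][j] + gap,
                       score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)      # never carry a negative score
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]

long_seq = "TTTTTTTTGATCCGATAAAAAAAA"   # conserved core with unrelated flanks
short_seq = "GATCCGAT"

print("global:", alignment_score(long_seq, short_seq))
print("local: ", alignment_score(long_seq, short_seq, local=True))
```

The global score is dragged down by the gaps needed to cover the flanking regions, while the local score reflects only the short conserved core.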
Sequence Alignment
• Process of comparing two (pairwise alignment) or more (multiple sequence alignment)
DNA or protein sequences
• Used to:
• Identify conserved sequence regions
• Construct phylogenetic trees
• Put simply: we must compare the same gene or region in every organism in our
comparison
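Once sequences are aligned, conserved regions can be read off column by column. A small sketch, assuming a toy multiple alignment (the rows below are made up for illustration) in which every row has the same length and "-" marks a gap:

```python
def conserved_columns(alignment):
    """Return indices of columns where all sequences share one residue."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for col in range(length):
        residues = {row[col] for row in alignment}
        # A column is conserved if it holds a single residue and no gap.
        if len(residues) == 1 and "-" not in residues:
            conserved.append(col)
    return conserved

toy_alignment = [
    "AATGATCCGATT",
    "AATGATCCGAGT",
    "ATTGATTCGAGT",
]
cols = conserved_columns(toy_alignment)
print(cols)

# Show conserved residues and mask variable columns with "."
print("".join(toy_alignment[0][c] if c in cols else "."
              for c in range(len(toy_alignment[0]))))
```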
Orthologs
• Orthologs
• Genes related by vertical descent from a common ancestor
Paralogs
• Paralogs
• Genes that have evolved within the same species by gene duplication
events
• Code for proteins with similar – but not necessarily identical – functions
Orthologs vs Paralogs
Feature Orthologs Paralogs
• Similarity
• Any 2 sequences can be compared and similarity calculated (% nt or aa identity)
BUT
• This is meaningless unless they are homologous
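A quick way to see why raw similarity is not enough: two unrelated random DNA sequences still agree at about a quarter of their positions, simply because there are only four bases. The sketch below uses randomly generated sequences (not real data) and a hypothetical helper, percent_identity.

```python
import random

def percent_identity(a, b):
    """Percentage of positions carrying the same character (equal lengths)."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

rng = random.Random(0)  # fixed seed so the run is repeatable
seq1 = "".join(rng.choice("ACGT") for _ in range(1000))
seq2 = "".join(rng.choice("ACGT") for _ in range(1000))

# Unrelated sequences: identity is expected to be near 25% by chance alone.
print(percent_identity(seq1, seq2))
```

A similarity figure only carries biological meaning when the compared sequences share ancestry.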
Alignments – Positional Homology
• How do you compare these sequences? Which are most similar?
AATGATCCGATT
ATGATCCGATT
AATGATTCTTCT
ATTGATTCGATTCTA
Align them
• Analyse
• The quality of the analysis depends on the quality of the alignment
AATGATCCGATT
AATGATCCGAGT
AATGATTC--GTCAT
ATTGATTCGAGTCTA
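The effect of aligning can be checked on the first two unaligned sequences from the earlier slide, AATGATCCGATT and ATGATCCGATT, which differ only by one missing base at the start. A minimal sketch; the helper name matching_columns is hypothetical, and the gap is placed by hand rather than found by an alignment algorithm.

```python
def matching_columns(a, b):
    """Count columns where both rows carry the same base (gaps never match)."""
    return sum(x == y and x != "-" for x, y in zip(a, b))

seq_a = "AATGATCCGATT"
seq_b = "ATGATCCGATT"   # same sequence with the first base missing

# Unaligned: position 1 is compared with position 1, 2 with 2, and so on.
unaligned = matching_columns(seq_a, seq_b)

# Aligned: a single gap at the start restores positional homology.
aligned = matching_columns(seq_a, "-" + seq_b)

print(unaligned, "of", len(seq_b), "positions match before aligning")
print(aligned, "of", len(seq_a), "columns match after aligning")
```

One misplaced or missing gap shifts every downstream position, which is why the quality of the alignment limits the quality of any analysis built on it.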
Importance of Sequence Alignments
• BLAST finds matches