BIF401 Midterm Short Notes
BACKGROUND
EXPERIMENTS IN BIOLOGY
These instruments include:
1. Next Generation Sequencers (NGS) for whole genome sequencing
2. High Resolution Mass Spectrometry for whole proteome profiling
3. Nuclear Magnetic Resonance Spectroscopy for structural studies
DIGITALIZATION OF BIOLOGY
The data produced is stored on computer disks. The data may include text, numbers, symbols or
images.
MOTIVATION
Bioinformatics is an interdisciplinary field: it deals with digital biological information on
humans, plants, animals and microorganisms.
It demands very low-cost infrastructure and hardly any lab equipment.
SCOPE OF BIOINFORMATICS
Bioinformatics primarily deals with digitalized biological information
ACTIVITIES IN BIOTECHNOLOGY
Developing algorithms, writing software, statistical evaluation of data, etc.
Informatics and Bio-science are the umbrella terms given to a set of allied disciplines which
make up the field.
Quote
Biology easily has 500 years of exciting problems to work on.
Donald Knuth, Professor, Stanford University
The “tree of life” consists of:
➢ Bacteria
➢ Archaea
➢ Eucarya
APPLICATIONS OF BIOINFORMATICS
GENOMICS
Bioinformatics can help in assembling
DNA sequencing data.
Gene Finding
Genome Assembly
Variation in Genomes
Transcription Data
Databases
EVOLUTIONARY STUDIES
Evolutionary relationships
Evolutionary distances
Phylogenetics
Tree of life
PROTEOMICS
APPLICATIONS OF BIOINFORMATICS
• Genomics
• Transcriptomics
• Proteomics
• Metabolomics
• Structural Proteomics
Prepared by Waqas Ejaz ( Youtube Channel : VU SUPREME TALEEM)
• Drug Design
• Systems Biology
• Personalized Medicine
FRONTIER IN GENOMICS
To sequence whole genomes using Next Generation Sequencing (NGS).
FRONTIER IN TRANSCRIPTOMICS
To identify what is still unknown or under discussion.
Identification of hitherto (until now) unknown types and roles of RNA.
FRONTIER IN PROTEOMICS
• Identification of low abundance proteins in patient tissue samples
• Large scale protein identification
• Identification of known as well as novel post-translational modifications
Central Dogma
The transfer of information from DNA to RNA and then to protein is termed the “Central Dogma”.
DNA passes its information to the cell via mRNA. The amino acids are assembled in the order given by
the coded information to form a protein, and proteins build the cell.
DNA
The DNA molecule is a double helix. Each strand is a chain of nucleotides, and each nucleotide is
composed of a sugar, a phosphate group and a base. The bases on the two strands pair with each
other through hydrogen bonds.
RNA
• RNA is made up of four bases
• These are Adenine (A), Cytosine (C), Uracil (U), and Guanine (G)
• RNA molecule may contain thousands of A, C, U and G.
• A pairs with U only and C pairs with G
Normally the nucleotides are the same in DNA and RNA, with one exception: RNA has U (Uracil)
where DNA has T (Thymine).
DNA is double stranded while RNA is single stranded. They also differ in sugar
composition.
RNA has ribose sugar and DNA has deoxyribose sugar.
Adenine and Guanine are collectively called purines, while Cytosine, Uracil and Thymine are called
pyrimidines.
TRANSCRIPTION
Transcription is the process by which the information in a strand of DNA is copied into a new
molecule of messenger RNA (mRNA).
NUCLEOTIDES
DNA and RNA molecules are each composed of four smaller molecules, which are called
nucleotides.
AMINO ACIDS
In all, 20 different amino acids can be coded by such codons during translation. The amino acids
polymerize to form proteins
An amino acid contains nitrogen, hydrogen and oxygen atoms, two carbon atoms, and a
variable group R.
When polymerization takes place, water is released.
Polymerization of Amino Acids by Peptide Bond
Amino acids are joined to each other by peptide bonds. The resulting chain folds into a 3D
form, which is the protein structure.
SOLUTIONS DATABASES
For DNA and RNA, the public database is GenBank (by NIH).
For proteins, the public database is UniProt (by the UniProt Consortium).
Both GenBank and UniProt are online databases. Their DNA, RNA and protein
sequences are available online to the public and to researchers.
USING GENBANK
GenBank can be searched by:
• Sequence
• ID
• Name
• Species etc.
Other information includes:
• Locus
• Accession number
• Authors
• Journal etc
USING UNIPROT
UniProt is a public database used to search for protein sequences.
On the home page there is a box named “Swiss-Prot”, which contains human-curated protein
information.
Global alignment - maximizes the number of matches between the query and source
sequences along the entire length of both the sequences.
Local alignment - gives the highest scoring local match between the query and source
sequences.
Optimal alignment - one that exhibits the most correspondences between the query and the
source sequences. It is the alignment with the highest score.
Pairwise Sequence Alignment is used to identify regions of similarity that may indicate
functional, structural and/or evolutionary relationships between two biological sequences
(protein or nucleic acid)
DOT PLOTS
• Dot plots employ a dot matrix representation for pairwise alignment and comparison
• The two sequences are written along the top and the left side of the dot matrix grid
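A dot matrix can be sketched in a few lines. This is only an illustration: the function name `dot_plot` and the example sequences are made up for this note, and a `*` marks every cell where the top and left residues are identical.

```python
def dot_plot(seq_top, seq_left):
    """Return the dot matrix as a list of strings, one row per residue of seq_left."""
    rows = []
    for left_residue in seq_left:
        # Put a dot ('*') wherever the residue on the left equals the residue on top.
        rows.append("".join("*" if left_residue == top_residue else "."
                            for top_residue in seq_top))
    return rows


# Usage: print the grid with the two sequences on top and on the left.
top, left = "GATTACA", "GATCA"
print("  " + top)
for residue, row in zip(left, dot_plot(top, left)):
    print(residue + " " + row)
```

Runs of `*` along a diagonal correspond to stretches where the two sequences match.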
IDENTITY VS SIMILARITY
There are two concepts for sequence analysis
1. Identity
2. Similarity
Sequence Identity
Identity is the number of nucleotides or amino acids that match exactly when two
biological sequences are aligned.
Formula for Identity:
Identity (%) = (No. of matches / length of the shorter sequence) × 100
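The identity formula can be applied to two already-aligned sequences as follows. This is a minimal sketch: the function name `percent_identity` and the use of `-` as the gap character are assumptions made for the example.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two aligned sequences of equal length ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Count positions where both sequences carry the same residue (gaps never count).
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    # Length of the shorter sequence, measured without gap characters.
    shorter = min(len(aligned_a.replace("-", "")), len(aligned_b.replace("-", "")))
    if shorter == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100.0 * matches / shorter


print(percent_identity("GATTACA", "GACTATA"))  # 5 matches out of 7 positions
```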
Similarity is a measure of how alike two different sequences are, calculated from their
alignment.
Sequences vary due to insertions, deletions and substitutions
1. Global Alignment
In global alignment we compare both sequences completely, from end to end.
Global alignment creates an end-to-end matching between two sequences
2. Local Alignment
In local alignment we compare a portion of one sequence with a portion of the other.
Local alignment focuses on the highly matching sub regions within the sequences
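A local alignment score can be computed with the Smith-Waterman recurrence, which these notes name later as the local alignment algorithm. The sketch below is illustrative only: the scores (+1 match, -1 mismatch, -1 gap) are assumed values, and the function returns just the best local score, not the aligned region.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score between a and b (scoring values are assumptions)."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, so a poor region cannot drag down
            # a good sub-region that starts later.
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best


print(smith_waterman_score("TTTGATTACATTT", "CCGATTACACC"))
```

Only the shared sub-region contributes to the score. The differing ends of the two sequences are ignored rather than penalized.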
The removal and addition of amino acids in proteins, or of nucleotides in DNA and RNA, are represented
in an alignment by gaps. Such events are called indels.
CONCLUSION
In alignment, indels are handled with gaps and mutations with substitution penalties. The
penalty depends upon the substitution.
To find matches between the nucleotides or amino acids of two sequences we use the dot plot method.
But a dot plot cannot capture the insertions, deletions and gaps in the sequences.
Modification in Dot Plot
We represent matching nucleotides with +1, while gaps, substitutions, insertions and
mutations can be represented as -1 in the dot plot.
Need of algorithm
Comparing two sequences position by position takes time and is computationally
expensive. That is why we need an algorithm.
If we compare two sequences of length n, it takes n^2 comparisons, so the order is O(n^2).
Conclusion
All alignments run along diagonals in the dot plot matrix. The total score is calculated along
each diagonal, and after calculation the best one is selected.
BACKTRACKING ALIGNMENTS
To find an optimal alignment in the Needleman-Wunsch algorithm we use the traceback method.
After the matrix calculations are complete, we apply traceback to find the optimal alignment.
Traceback starts from the bottom-right cell (which holds the final score) and proceeds toward the top-left.
A dot plot helps us find the matching residues of two sequences, while Needleman-Wunsch helps
us find the global alignment.
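The matrix fill and traceback of Needleman-Wunsch can be sketched as below. The scoring values (+1 match, -1 mismatch, -1 gap) and the function name are assumptions for illustration. When several optimal alignments exist, this sketch returns only one of them.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # First row and column: aligning a prefix against nothing costs one gap per residue.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Traceback: start at the bottom-right cell and walk back to the top-left.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


final_score, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(final_score)
print(top)
print(bottom)
```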
If the sequences have end regions that do not match anything in the other sequence,
we still prefer a global alignment over a local one, but in a form that does not penalize leading or
trailing gaps.
MOVING FROM GLOBAL TO LOCAL ALIGNMENT
DNA has coding and noncoding regions.
Coding regions are called “EXONS”. They are expressed as protein and remain more conserved because of their
role in making functional proteins.
Noncoding regions of DNA are called “INTRONS”. They are more likely to accumulate
mutations than coding regions.
This means a high degree of alignment can be found between two exons.
REPEATED ALIGNMENTS
By making some change in strategy of traceback we can find the repeated sequences.
Two modifications
• Use a score threshold ‘T’ above which the matches will be considered.
• This will help avoid low scoring local alignments
• Traceback should find multiple aligned regions by multiple traceback steps
SCORING MATRICES
How to build scoring matrices?
• We analyze the observed frequency with which each amino acid / nucleotide is
substituted by another one in similar proteins/genes
PAM MATRICES
PAM means “Point Accepted Mutation”
A point accepted mutation is the substitution of one amino acid in a sequence with another such that
protein function remains conserved.
PAM UNIT
A PAM unit is the evolutionary time over which 1% of amino acids undergo accepted mutations.
Important: If two sequences diverge by 100 PAM units, it does not imply that
they are totally different in each position
Then, PAM1=pii=
PAM ‘n’ = (PAM1)^n
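The relation PAM ‘n’ = (PAM1)^n is a matrix power. The sketch below uses a made-up two-letter alphabet with a 1% change per PAM unit, purely to show the arithmetic. A real PAM1 matrix is 20 × 20 and is estimated from observed substitutions, and practical PAM scoring matrices are further converted to log-odds scores.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    size = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(m, n):
    """Raise matrix m to the n-th power by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


# Hypothetical two-residue "PAM1": each residue stays the same with probability 0.99.
toy_pam1 = [[0.99, 0.01],
            [0.01, 0.99]]
toy_pam100 = mat_pow(toy_pam1, 100)
print(toy_pam100[0][0])  # probability that a residue is unchanged after 100 PAM units
```

In this toy case more than half of the positions are still unchanged after 100 PAM units, which illustrates why 100 PAM does not mean the sequences differ at every position: repeated and reverse mutations hit the same sites.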
BLOSUM MATRICES
BLOSUM matrices can be used to align protein sequences. BLOSUM matrices were first proposed in
1992 by Henikoff et al.
BLOSUM stands for BLOcks SUbstitution Matrix. The blocks are aligned regions without any gaps, although they
contain mismatches.
Typically used matrices: BLOSUM62 or PAM120. In PAMx, a larger x detects more divergent
sequences.
BLOSUM62 and PAM120 are considered optimal for use in protein sequence
alignment.
For BLOSUMn, a lower n detects more divergent sequences.
MSA vs Pairwise
For pairwise alignments, we used Dynamic Programming
For MSA, Dynamic Programming may get very expensive, so we use Progressive Alignment
• Pairwise alignment was either local or global
• Multiple alignments are mostly global
APPLICATION OF MSA
Predict secondary and tertiary structures of new protein sequences
Evaluate evolutionary order of species or “Phylogeny”
METHODOLOGY
Pairwise alignment is the alignment of two sequences
MSA can be performed by repeated application of pairwise alignment
MSA can help align multiple sequences. Progressive alignment can help perform MSA. Need to
remove sequences with >80% similarity.
PROGRESSIVE ALIGNMENT FOR MSA
• Progressive alignments are used in aligning multiple sequences
• Iterative approaches can help refine results from progressive alignments
SHORTCOMING OF THIS APPROACH
➢ Dependence upon initial alignments
➢ If sequences are dissimilar, errors in alignment are propagated
➢ Solution: Begin by using an initial alignment, and refine it repeatedly
CLUSTALW
Developed by European Molecular Biology Laboratory & European Bioinformatics Institute
Performs alignment in:
• slow/accurate
• fast/approximate
SCOPE
create multiple alignments,
optimize existing alignments,
profile analysis &
create phylogenetic trees
Using CLUSTALW
• CLUSTALW can use multiple file formats including: EMBL/SwissProt, Pearson (Fasta) etc
INTRODUCTION TO BLAST-I
National Center for Biotechnology Information (NCBI) – USA
BLAST developed in 1990
“Basic Local Alignment Search Tool”
Searches databases for query protein and nucleotide sequences
Also searches translational products etc.
Online availability www.blast.ncbi.nlm.nih.gov/Blast.cgi
INTRODUCTION TO BLAST-II
Smith Waterman can align complete sequences. BLAST does it in an approximate way. Hence,
BLAST is faster BUT does not ensure optimal alignment.
OUTPUT OF BLAST
Results are shown in HTML, plain text, and XML formats
BLAST ALGORITHM
BLAST can search sequence databases and identify unknown sequences by comparing them to
the known sequences. This can help identify the parent organism, function and evolutionary
history. BLAST performs quick alignments on sequences.
SUMMARY OF BLAST
• BLAST performs quick alignments on biological sequences
Step 1: Obtain a query sequence
For known sequences: Use NCBI, UCSC etc..
For unknown sequences: Use NGS or Mass Spectrometry
INTRODUCTION TO FASTA
To compare two sequences we use pairwise sequence alignment, and to compare many
sequences we use multiple sequence alignment. For pairwise alignment, we use the
Smith-Waterman algorithm for local alignment and the Needleman-Wunsch algorithm for
global alignment.
Both local and global alignment are dynamic programming approaches.
Comparing many sequences takes time, so we use BLAST, which is an
approximate local alignment search tool. BLAST compares a large number of sequences
quickly.
FASTA took a similar approach. Developed in 1988, it performs fast alignment and searches databases
for query protein and nucleotide sequences. It was later improved upon in BLAST.
Online: http://www.ebi.ac.uk/Tools/sss/fasta/
Classical global and local alignment algorithms are time consuming. FASTA achieves alignment
by using short lengths of exact matches.
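The idea of seeding an alignment with short exact matches can be sketched as follows. This is not the FASTA program itself: the function name, the word length `k` and the example sequences are assumptions. It only counts, per diagonal, how many identical words of length k the query and the subject share.

```python
from collections import defaultdict


def ktuple_diagonal_hits(query, subject, k=2):
    """Count exact k-letter word matches on each diagonal (subject pos - query pos)."""
    # Index every word of length k in the query by its start position.
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    # Look up every word of the subject; each hit falls on diagonal j - i.
    diagonals = defaultdict(int)
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            diagonals[j - i] += 1
    return dict(diagonals)


print(ktuple_diagonal_hits("GATTACA", "TTGATTACA", k=3))
```

A diagonal with many hits marks a region of identity worth extending into a full alignment, which is far cheaper than filling a complete dynamic programming matrix.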
USES OF FASTA
FASTA relies on aligning subsequences of absolute identity. Input to FASTA search can be in
FASTA, EMBL, GenBank, PIR, NBRF, PHYLIP or UniProt formats
OUTPUT OF FASTA
Results are output in a visual format along with functional prediction. A table lists the
sequence hits found along with their scores.
FASTA ALGORITHM
STEP1: Local regions of identity are found
STEP4: Create a gapped alignment in a narrow segment and then perform Smith-Waterman
alignment
SUMMARY OF FASTA
FASTA can quickly search sequence databases when given a query sequence.
Multiple types of FASTA exist which assist in aligning DNA/RNA/Protein sequences
http://fasta.bioch.virginia.edu/fasta_docs/fasta35.shtml
EXPASY
Expasy provides access to a variety of online databases and tools
It is developed by the Swiss Institute of Bioinformatics (SIB). Resources for proteomics,
genomics, phylogeny, systems biology, population genetics,
transcriptomics etc. can be searched.
http://www.expasy.org/
Protein sequences from various species and organisms can be found in UniProt.
SwissProt is the manually annotated version of the UniProt Database.
We use next generation sequencing and whole genome sequencing to obtain the genetic
information.
For protein sequencing we use Mass Spectrometry and Edman Degradation.
STORAGE:
Sequence information is stored digitally
Databases are designed to store sequence data
SHARING AND ACCESS:
Sequence databases are shared via online websites
Access to several such websites is free
USAGE OF DATA:
Sequence data can be used to obtain:
Similarity of sequences
GENBANK
• All known nucleotide sequences can be obtained from GenBank
• Thousands of genomes are available online on GenBank, for free and are searchable
• GenBank: Public database of nucleotide sequences for over 200,000 organisms
• Growing exponentially
Data in GenBank
• Entries can be a contiguous stretch of DNA or RNA sequences and their annotations
• Entries are updated every two months
• Developed and maintained by the National Center for Biotechnology Information (NCBI)
http://www.ncbi.nlm.nih.gov/genbank/
ENSEMBL
Ensembl is a genome search engine used to search the genomes of recorded
species.
http://asia.ensembl.org/index.html
Evolution of Sequences
DNA acts as the cellular memory unit, and proteins are the translated product of the information coded
in DNA. Evolution is very important for survival in different types of environments.
Method of Change
DNA gets modified by:
Mutation & Substitution
Insertion
Deletion
The evolutionary events and their combinations impart relationships between sequences. These
relationships are explored in Phylogenetics. Several algorithms exist for finding such relationships.
In the above figure, point A stands for the ancestor. With the passage of time, evolution occurred
and the genome sequences of the organisms changed.
Root node is the ancestor of all other nodes. The direction of evolution is from ancestor to the
terminal nodes.
Conclusion
Phylogenetics specifies evolutionary relationship with the help of trees. Trees can be rooted or
unrooted. Rooted trees can show temporal evolutionary direction.
UPGMA
UPGMA (Unweighted Pair Group Method with Arithmetic Mean) is a simple agglomerative
(bottom-up) hierarchical clustering method. The method is generally attributed to Sokal and
Michener.
In this method, the two sequences with the shortest evolutionary distance between them are
considered first. These sequences are taken to be the last to diverge and are represented by the most
recent internal node.
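One UPGMA merge step can be sketched as below. The distance values are invented for illustration, and the function and variable names are assumptions. The closest pair is merged, and the distance from the new cluster to every other cluster is the arithmetic mean weighted by cluster size.

```python
def upgma_merge_step(dist, sizes):
    """Merge the closest pair of clusters; return (new_name, new_dist, new_sizes).

    dist maps frozenset({x, y}) to the distance between clusters x and y.
    sizes maps each cluster name to the number of sequences it contains.
    """
    pair = min(dist, key=dist.get)          # closest pair of clusters
    x, y = sorted(pair)
    merged = "(" + x + "," + y + ")"
    new_dist = {}
    for other in sizes:
        if other in pair:
            continue
        dx = dist[frozenset({x, other})]
        dy = dist[frozenset({y, other})]
        # Arithmetic mean, weighted so that every sequence counts equally.
        new_dist[frozenset({merged, other})] = (
            (sizes[x] * dx + sizes[y] * dy) / (sizes[x] + sizes[y]))
    for key, d in dist.items():
        if not (key & pair):                 # distances not involving x or y are kept
            new_dist[key] = d
    new_sizes = {c: s for c, s in sizes.items() if c not in pair}
    new_sizes[merged] = sizes[x] + sizes[y]
    return merged, new_dist, new_sizes


# Hypothetical distances between four sequences A, B, C and D.
example_dist = {
    frozenset({"A", "D"}): 2, frozenset({"A", "B"}): 8, frozenset({"B", "D"}): 10,
    frozenset({"A", "C"}): 6, frozenset({"C", "D"}): 8, frozenset({"B", "C"}): 4,
}
example_sizes = {"A": 1, "B": 1, "C": 1, "D": 1}
name, example_dist2, example_sizes2 = upgma_merge_step(example_dist, example_sizes)
print(name, example_dist2[frozenset({name, "B"})], example_dist2[frozenset({name, "C"})])
```

Repeating the step until a single cluster remains gives the full tree.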
Introduction to UPGMA
UPGMA: Unweighted Pair – Group Method using arithmetic Averages
Calculating distance between two clusters:
A – D becomes a new cluster, let's say V. We have to modify the distance matrix. What are the
distances between:
TYPES OF RNA
There are many types of RNA according to their functions, such as:
Messenger RNA (mRNA)
Transfer RNA (tRNA)
Ribosomal RNA (rRNA)
Micro RNAs (miRNA)
Small Interfering RNA (siRNA)
MESSENGER RNA
1. mRNA makes up only 5-10% of the RNA present in a cell.
2. The 5’ end of messenger RNA is capped with 7-methylguanosine triphosphate, which helps the
ribosomes identify the mRNA.
3. The 3’ end of the mRNA carries a poly-A tail (around 30-200 adenylate residues), which helps shield
it against 3’ exonucleases.
We can calculate the overall energy of an RNA structure by summing up the energies released
during the process of folding.
Calculating energies of structures
• The stabilizing energy associated with stacking base pairs in a double-stranded region (-ve)
• The destabilizing influence of unpaired regions (+ve)
5 nucleotides formed H-bonds. This bond formation released energy (-12.0 kcal/mol).
The RNA molecule took up a secondary (2°) structure and hence became more stable.
The preferred structure of RNA is the 2° structure, which has many structural patterns such as helices, loops,
bulges and junctions.
The first 2° RNA structure is the helix. Unlike the DNA helix, the RNA helix is formed when the RNA
folds onto itself.
The loop of the hairpin must be at least four bases long to avoid steric hindrance with base-pairing
in the stem part of the structure.
Note: The hairpin reverses the chemical direction of the RNA molecule.
Bulges are formed when a double-stranded region cannot form base pairs perfectly. Bulges can
be asymmetric, with a varying number of unpaired bases on one side of the loop. Bulge loops are
commonly found in helical segments of cellular RNAs and are used to measure the helical twist of
RNA in solution (Tang and Draper 1990).
Interior loops are formed by an asymmetric number of unpaired bases on each side of the
loop.
RNA can fold to form helices, bulge loops and interior loops.
Junctions include two or more double-stranded regions converging to form a closed structure.
The unpaired bases appear as a bulge
The unpaired nucleotides of the 2° structure interact with other unpaired nucleotides and form a third
level of structure called the tertiary (3°) structure. For example, the 4 unpaired nucleotides in a hairpin loop can do this.
In the 3° structure, unpaired bases can become paired through unusual folds called pseudoknots. Bases
that do not pair remain available for pairing.
Another method to measure RNA structure is Atomic Force Microscopy. In this
technique, a laser connected to a Si3N4 piezoelectric probe scans an RNA sample. It works well in air and
in liquid environments.
The third method for measuring RNA structure is Nuclear Magnetic Resonance Imaging. In this
method, hydrogen atoms in RNA resonate upon placement in a high magnetic field. It works well
without crystallizing the RNA.
Measuring RNA Structures:
Nuclear Magnetic Resonance Imaging
• Principle: Hydrogen atoms in RNA resonate upon placement in a high magnetic
field
• Works well without crystallizing RNA
STORAGE OF STRUCTURES
Reported structures are stored in online databases. Examples include RNA Bricks and RMDB.
RNA Bricks is a database of RNA 3D structure motifs and their contacts, both with themselves
and with proteins. Stanford University’s RNA Mapping Database (RMDB) is an archive that contains
results of diverse structural mapping experiments performed on ribonucleic acids.
In long RNA molecules, the gaps between complementary nucleotides become the bulges and loops of
the structure.
a. Zuker’s Algorithm
• Compute the stabilizing energies (-ve values)
• Compute the destabilizing energies (+ve values)
• Compute sum of +ve and –ve energies
Energy-based methods involve evaluating the free energy of structures. To predict the optimal
secondary (2°) structure from a primary (1°) RNA sequence we use Zuker’s Algorithm.
It computes the energies of all possible 2° structures, generates combinations of the computed 2°
structures, and selects the one with the lowest energy.
a. Nussinov An Overview
The Nussinov-Jacobson (NJ) Algorithm is a Dynamic Programming (DP) strategy to predict optimal
RNA secondary (2°) structures, proposed in 1980. It computes the 2° structure with the most nucleotide pairings.
HOW IT WORKS
➢ Create a matrix with RNA sequences on top and right
➢ Set diagonal & lower tri-diagonal to zero
➢ Start filling each empty position in matrix by choosing the maximum of 4 scores
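The matrix fill described above can be sketched as follows. This is a minimal illustration under stated assumptions: only Watson-Crick pairs (A-U, G-C) are allowed, no minimum hairpin loop length is enforced, and the function returns the maximum number of pairs rather than the structure itself.

```python
def nussinov_max_pairs(seq):
    """Maximum number of nested base pairs in an RNA sequence (A-U and G-C only)."""
    allowed = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # The diagonal and everything below it stay at zero.
    score = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Choices 1 and 2: leave position i or position j unpaired.
            best = max(score[i + 1][j], score[i][j - 1])
            # Choice 3: pair i with j (only if the two bases are complementary).
            if (seq[i], seq[j]) in allowed:
                best = max(best, score[i + 1][j - 1] + 1)
            # Choice 4: split the region into two independent sub-structures.
            for k in range(i + 1, j):
                best = max(best, score[i][k] + score[k + 1][j])
            score[i][j] = best
    return score[0][n - 1]


print(nussinov_max_pairs("GGGAAAUCC"))
```

The value in the top-right cell is the score of the whole sequence. A traceback from that cell would recover which bases pair.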
Nussinov EXAMPLE
The main points to be focused in N-J Algorithm are:
➢ Scoring Matrix
➢ Matrix Initialization
WEB RESOURCES-I
The mfold web server is one of the oldest web servers in computational molecular biology.
mfold is an upgraded implementation of Zuker’s algorithm.
mfold is computationally expensive and can predict secondary (2°) structures only for primary (1°)
sequences shorter than 8000 nucleotides.
Conclusion:
• Given primary (1°) RNA structures, secondary (2°) RNA structures can be predicted
• Several algorithms exist for predicting 2° structures
• These include Zuker, Martinez and NJ etc.
Online tools
In molecular genetics, an open reading frame (ORF) is the part of a reading frame that has the
potential to code for a protein or peptide. An ORF is a continuous stretch of codons that does not contain
a stop codon (usually UAA, UAG or UGA).
Six reading frames exist in any DNA sequence (three on each strand). The longest ORF is marked, and the
first stop codon marks the end of the protein.
Both the forward and the reverse sequences are considered. They may contain many ORFs, and selection
is based upon the longest protein sequence.
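A six-frame ORF search can be sketched as below. The assumptions are labeled here and in the code: the input is DNA, so the stop codons appear as TAA, TAG and TGA; an ORF is taken to start at ATG and end at the first in-frame stop codon; all names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA forms of UAA, UAG, UGA


def reverse_complement(dna):
    """Return the reverse strand read 5' to 3'."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(dna))


def orfs_in_frame(seq, frame):
    """All ORFs (ATG ... first in-frame stop) in one reading frame (0, 1 or 2)."""
    orfs = []
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":   # assumed start codon
            start = i
        elif start is not None and codon in STOP_CODONS:
            orfs.append(seq[start:i + 3])      # the first stop ends the ORF
            start = None
    return orfs


def longest_orf(dna):
    """Longest ORF over the three forward and three reverse reading frames."""
    candidates = []
    for strand in (dna, reverse_complement(dna)):
        for frame in range(3):
            candidates.extend(orfs_in_frame(strand, frame))
    return max(candidates, key=len, default="")


print(longest_orf("CCATGAAATGATT"))
```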
SEQUENCING PROTEINS
Edman degradation, developed by Pehr Edman, is a method of sequencing amino acids in a
peptide.
DRAWBACKS
It is restricted to chains of up to 60 residues.
It is a very time-consuming process: 40-50 amino acids per day.
Where:
F is the force applied to the ion
m is the mass of the particle
a is the acceleration
Q is the electric charge,
E is the electric field
v × B is the cross product of the ion's velocity and the magnetic flux density.
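The symbols listed above are those of Newton's second law and the Lorentz force law. Assuming these are the equations the list refers to, they can be written as:

```latex
\[
F = m\,a, \qquad F = Q\,(E + v \times B)
\quad\Longrightarrow\quad
\frac{m}{Q}\,a = E + v \times B
\]
```

The combined form shows that the motion of an ion in given fields depends only on its mass-to-charge ratio m/Q, which is the quantity a mass spectrometer measures.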
Components of MS
Sample Injection
Ionization Source
Mass Analyzer
Ion Detector
Spectra search using computational tools
CONCLUSION
Charged proteins can be set into motion within a magnetic field. Their deflections correspond
accurately to their molecular mass, so measuring the deflections gives the protein’s mass.
Bottom up proteomics measures the peptide masses produced after protein enzymatic digestion.
Top down proteomics measures the intact proteins followed by peptides after fragmentation
1. BOTTOM UP PROTEOMICS
In this methodology the protein complex is treated with site-specific enzymes which cleave (chop)
it at particular amino acid residues, and the resultant peptides are measured for their masses. One peptide is
selected at a time for processing, and when all are processed a protein search engine is used
to find matches.
2. TOP DOWN PROTEOMICS
In this methodology proteins are ionized and measured for their masses, and one protein is mass
selected at a time for fragmentation. The resultant peptide fragments are then measured for mass.
We can say that bottom up proteomics deals with peptides while top down proteomics can handle
the whole protein.
PROTOCOL
1. A sample containing a mixture of proteins from cells and tissues is obtained.
2. An enzyme such as trypsin is used to cleave the proteins.
3. The enzyme cleaves the protein at specific amino acid sites.
4. Several peptides are formed when a protein is cleaved.
5. The number of peptides depends upon the number of sites where the enzyme cleaves the protein. For
example, trypsin cleaves the protein at lysine (K).
6. Each peptide is measured for its mass (MS1).
7. One peptide is selected at a time.
8. A different enzyme is used to cleave the protein at different sites.
9. Each resulting peptide is measured for its mass (MS3).
10. This process keeps going until all possible peptides have been formed and searched.
11. Each peptide is searched in protein sequence databases for matches.
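The cleavage step of the protocol can be simulated in silico. The sketch below makes simplifying assumptions: trypsin is taken to cut after every lysine (K) and also after every arginine (R), while missed cleavages and the proline exception are ignored. The function name and the example sequence are made up.

```python
def trypsin_digest(protein):
    """Split a protein sequence into peptides, cutting after every K or R."""
    peptides = []
    current = ""
    for residue in protein:
        current += residue
        if residue in "KR":          # assumed cleavage sites
            peptides.append(current)
            current = ""
    if current:                      # the C-terminal peptide has no cleavage site after it
        peptides.append(current)
    return peptides


print(trypsin_digest("AKGRCDE"))
```

A protein with k cleavage sites inside the chain gives k + 1 peptides, which is why the number of peptides depends on the number of sites.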
Conclusion
• Bottom up proteomics measures the mass of peptides
• Peptides result after enzymatic digestion of precursor proteins
• Peptides are searched in protein databases
Shotgun proteomics digests the whole protein mixture first and then compares the peptides with the database.
Peptide mass fingerprinting involves protein separation followed by analysis of a single protein’s
peptides.
For example:
MALDI
In this technique one proton is added to the protein or peptide, so the molecular weight increases
by one and the mass spectrometer reports the molecule at +1.
ESI
ESI adds many protons to the protein or peptide, and the molecular weight is increased by the number of
protons added. In ESI it is difficult to find the molecule at +1.
MS data from MALDI ionization is easier to handle, as the product ion masses are mostly at
“1+mass”. ESI is more difficult to use, as it does not easily give the +1 charged ion.
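The effect of added protons on the reported value can be worked out numerically. This sketch assumes positive-mode ionization in which each charge comes from one added proton (about 1.007276 Da). The function names and the example mass are illustrative.

```python
PROTON_MASS = 1.007276  # mass of one proton in daltons


def mass_to_charge(neutral_mass, charge):
    """m/z observed for a molecule carrying `charge` added protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def neutral_mass_from_mz(mz, charge):
    """Recover the neutral mass from an observed m/z and a known charge."""
    return charge * (mz - PROTON_MASS)


# A hypothetical 1000 Da peptide: singly charged (MALDI-like) and multiply charged (ESI-like).
for z in (1, 2, 3):
    print(z, mass_to_charge(1000.0, z))
```

With a single proton the peak sits at about mass + 1. With several protons the same molecule appears at several lower m/z values, so the charge must be worked out before the mass can be read off.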
MS1 & INTACT PROTEIN MASS
When we ionize a protein, it can be deflected by a magnetic field in proportion to its mass, and
the mass of the protein can be measured by mass spectrometry.
The mass/charge ratio helps us calculate the mass of the protein. “Mass Select” can help select a specific MS1 mass for
further analysis.
MS1 gives the intact masses of the peptides.
SCORING INTACT PROTEIN MASS
Conclusions:
• Experimental mass reported from MS1 is matched with theoretical mass of proteins in the database
• Score is awarded on the basis of the closeness between experimental and theoretical mass
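Scoring by closeness of mass can be sketched as below. The scoring rule (a linear score from 1 at an exact match down to 0 at the tolerance limit), the tolerance and the candidate masses are all hypothetical. Real search engines use their own scoring schemes.

```python
def rank_by_mass(experimental_mass, candidates, tolerance=1.0):
    """Rank candidate proteins by how close their theoretical mass is.

    candidates maps a protein name to its theoretical mass in daltons.
    Returns (name, score) pairs, best first; candidates outside the tolerance are dropped.
    """
    scored = []
    for name, theoretical_mass in candidates.items():
        error = abs(experimental_mass - theoretical_mass)
        if error <= tolerance:
            # Hypothetical score: 1.0 for an exact match, 0.0 at the tolerance limit.
            scored.append((name, 1.0 - error / tolerance))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up database entries for illustration only.
database = {"P1": 1000.2, "P2": 999.9, "P3": 1005.0}
print(rank_by_mass(1000.0, database))
```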
PROTEIN FRAGMENTATION TECHNIQUES
We compare the experimental mass with the theoretical database mass of each protein, and on the basis of
closeness we rank or score it.
If several proteins have the same score, selection is done using another technique, protein
fragmentation. We fragment the protein or peptide and ionize it, which lets us measure the
fragment masses in the same way as their precursor.
There are different techniques for protein fragmentation.
Electron Capture Dissociation (ECD)
Electron Transfer Dissociation (ETD)
Collision Induced Dissociation (CID)
FRAGMENT MASS
The fragment masses produced by MS2 depend upon the technique, because each technique
splits the protein or peptide at a different location.
Experimental mass reported from MS2 is matched with theoretical peptides of candidate proteins
(from DB). Score is awarded on the basis of the closeness between experimental and theoretical
masses.