2. Protein Structure Prediction
DRUG DESIGN
Dr. M Indira
Associate Professor
Department of Biotechnology
Vignan University
SYLLABUS
Protein structure prediction;
Introduction to comparative modeling;
Sequence alignment;
Constructing and evaluating a comparative model;
Predicting protein structures by 'threading';
Molecular docking - AUTODOCK/EASYMODELLER and HEX;
Structure based de novo ligand design;
Drug discovery;
Chemoinformatics; QSAR.
PROTEIN STRUCTURE PREDICTION
INTRODUCTION
• Proteins are an important class of biological macromolecules; they are polymers of amino acids.
• Biochemists distinguish several levels of structural organization in proteins:
– Primary structure
– Secondary structure
– Tertiary structure
– Quaternary structure
HOMOLOGY MODELLING
INTRODUCTION:
Homology modeling, also known as comparative modeling of proteins, is a technique for constructing an atomic-resolution model of a "target" protein of unknown structure from:
1. Its amino acid sequence, and
2. An experimental 3D structure of a related homologous protein (the "template").
[Figure: sequence alignment and structural model]
As long as the length of the two sequences and the percentage of identical residues fall in the region marked as "safe", the two sequences are practically guaranteed to adopt a similar structure.
HISTORY
The first homology modelling studies were done with wire and plastic models of bonds and atoms as early as the 1960s. The models were constructed by taking the coordinates of a known protein structure and modifying by hand those amino acids that did not match the structure.
In 1969 David Phillips, Browne and co-workers published the first paper on homology modelling. They modelled α-lactalbumin based on the structure of hen egg-white lysozyme. The sequence identity between these two proteins was 39%.
STEPS OF HOMOLOGY MODELLING
1. Template recognition and initial alignment
2. Alignment correction
3. Backbone generation
4. Loop modeling
5. Side chain modeling
6. Model optimization
7. Model validation
[Flowchart boxes: protein sequence; database searches; sequence alignment; good homologue?; secondary structure prediction; improve alignment using secondary structure prediction; homology modelling; minimisation; check model; three-dimensional structure]
1. Template recognition and initial alignment
Template recognition and selection involves searching the PDB for homologous proteins with determined structures.
The search can be performed with simple sequence alignment programs such as BLAST or FASTA, because when the target sequence and a possible template fall in the safe zone, the percentage identity is high enough to be detected by these programs.
When two sequences are difficult to align directly, a third homologous sequence can help. For example, aligning the sequence LTLTLTLT with YAYAYAYAY is nearly impossible; only a third sequence, TYTYTYTYT, that aligns easily to both of them can resolve the alignment.
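The intermediate-sequence example can be checked with a small sketch. The function name `best_ungapped_matches` is hypothetical, and the scoring is a deliberate simplification: it only counts identical residues over every ungapped offset, whereas programs such as BLAST and FASTA use substitution matrices and gaps.

```python
def best_ungapped_matches(a: str, b: str) -> int:
    """Largest number of identical residues over all ungapped offsets of b against a."""
    best = 0
    # offset is the shift of b relative to a; negative means b starts before a
    for offset in range(-(len(b) - 1), len(a)):
        matches = 0
        for i, residue in enumerate(b):
            j = i + offset
            if 0 <= j < len(a) and a[j] == residue:
                matches += 1
        best = max(best, matches)
    return best


target = "LTLTLTLT"
template = "YAYAYAYAY"
intermediate = "TYTYTYTYT"

# The two sequences share no residue type, so no offset gives a match.
print(best_ungapped_matches(target, template))
# The intermediate sequence shares T with the target and Y with the template.
print(best_ungapped_matches(target, intermediate))
print(best_ungapped_matches(template, intermediate))
```

The direct comparison finds no identical positions, while the intermediate sequence matches each of the other two at several positions, which is what allows it to bridge the alignment.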
4. Loop Modelling
After the sequence alignment, there are often regions created by insertions and deletions that lead to gaps in the alignment. These gaps are modeled by loop modeling, which is less accurate. Currently, two main techniques are used to approach the problem:
• The database searching method - this involves finding loops from known protein structures and superimposing them onto the two stem regions (main chains mostly) of the target protein. Some specialized programs like FREAD and CODA can be used.
• The ab initio method - this generates many random loops and searches for one that has reasonably low energy and φ and ψ angles in the allowable regions of the Ramachandran plot.
The red loop is modeled with the green
residues as anchor residues. The insertion
of 2 residues results in a longer loop.
5. Side-Chain Modeling
This is important in evaluating protein–ligand interactions at active sites
and protein–protein interactions at the contact interface.
A side chain can be built by searching every possible conformation for
every torsion angle of the side chain to select the one that has the lowest
interaction energy with neighboring atoms.
A rotamer library can also be used, which has all the favorable side chain
torsion angles extracted from known protein crystal structures.
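The rotamer-library idea can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the three-entry χ1 library, the simplified geometry (one side-chain atom on a circle around the Cα–Cβ axis), the soft-sphere penalty, and the names `gamma_position`, `repulsion` and `best_rotamer`. Real libraries hold many rotamers per residue type, and real programs use full force-field or statistical energies.

```python
import math

# Assumed toy library: the three staggered chi1 values, in degrees.
ROTAMER_CHI1 = (-60.0, 60.0, 180.0)


def gamma_position(chi1_deg, radius=1.4, height=0.5):
    """Toy placement of one side-chain atom on a circle around the CA-CB axis (z)."""
    angle = math.radians(chi1_deg)
    return (radius * math.cos(angle), radius * math.sin(angle), height)


def repulsion(position, neighbours, contact=3.0):
    """Soft-sphere penalty that grows when an atom is closer than `contact` angstroms."""
    energy = 0.0
    for neighbour in neighbours:
        distance = math.dist(position, neighbour)
        if distance < contact:
            energy += (contact - distance) ** 2
    return energy


def best_rotamer(neighbours, library=ROTAMER_CHI1):
    """Return the library chi1 with the lowest interaction energy with the neighbours."""
    return min(library, key=lambda chi: repulsion(gamma_position(chi), neighbours))


# Made-up neighbouring atoms that crowd the +60 and 180 degree positions.
neighbours = [(1.5, 2.5, 0.5), (-3.0, 0.0, 0.5)]
print(best_rotamer(neighbours))
```

Searching a short library instead of every possible torsion angle keeps the search small, at the cost of missing conformations that are not in the library.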
6. Model Optimization
Model optimization applies an energy minimization procedure to the entire model, adjusting the relative positions of the atoms so that the overall conformation of the molecule has the lowest possible potential energy. The goal is to relieve steric collisions without altering the overall structure.
Optimization can also be done by molecular dynamics simulation, which moves the atoms toward a global minimum by applying various simulation conditions (heating, cooling, including water molecules) and thus has a better chance of finding the true structure.
Energy = Stretching Energy + Bending Energy + Torsion Energy + Non-Bonded Interaction Energy
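As a minimal sketch of how minimization relieves a steric collision, the code below applies steepest descent to a single pair of atoms. It covers only a non-bonded term, modelled here with a Lennard-Jones potential in reduced units (ε = σ = 1); the potential choice, the step size and the function names are assumptions for illustration, not the procedure of any particular modelling package.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Non-bonded pair energy: strongly repulsive at short range, weakly attractive beyond."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative of the Lennard-Jones energy with respect to the distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def steepest_descent(r, step=0.001, max_iter=5000, tol=1e-8):
    """Move the distance downhill along the gradient until the slope is nearly zero."""
    for _ in range(max_iter):
        gradient = lj_gradient(r)
        if abs(gradient) < tol:
            break
        r -= step * gradient
    return r


clash_distance = 0.95  # two atoms placed too close together (reduced units)
relaxed = steepest_descent(clash_distance)
print(lennard_jones(clash_distance), lennard_jones(relaxed))
print(relaxed, 2 ** (1 / 6))  # the analytic minimum lies at r = 2^(1/6) * sigma
```

Steepest descent only walks downhill to the nearest minimum, which is why the slide contrasts it with molecular dynamics when a global minimum is sought.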
7. Model Validation
Every homology model contains errors. Two main reasons are:
1. The percentage sequence identity between template and target. If it is greater than 90%, the accuracy of the model can be compared to that of crystallographically determined structures; if it is less than 30%, large errors occur.
2. The number of errors in the template.
The final model has to be evaluated by checking the φ–ψ angles, chirality, bond lengths, close contacts and other stereochemical properties.
Modeling programs include Modeller, SWISS-MODEL, Schrodinger and 3D-JIGSAW.
A successful model depends on template selection, the algorithm used and the validation of the model.
Advantages
• It can find the location of the alpha carbons of key residues inside the folded protein.
• It can help to guide mutagenesis experiments, or to hypothesize structure-function relationships.
• The positions of conserved regions of the protein surface can help identify putative active sites, binding pockets and ligands.
Disadvantages
[Figure: tools used in the homology modelling workflow]
– ClustalW2: http://www.ebi.ac.uk/Tools/msa/clustalw2/
– CPHmodel, EsyPred3D
– SWISS-MODEL: http://swissmodel.expasy.org/
– Swiss-PdbViewer 4.1, YASARA, Accelrys DS Viewer 5.0
– PDB
Verification of Model
– Verify3D
– ErratPlot
– Ramachandran Plot
RAMACHANDRAN PLOT
In a polypeptide, the main-chain N–Cα and Cα–C bonds are relatively free to rotate. These rotations are represented by the torsion angles phi (φ) and psi (ψ), respectively.
A Ramachandran plot (or [φ,ψ] plot), originally developed in 1963 by G. N. Ramachandran, C. Ramakrishnan and V. Sasisekharan, is a way to visualize the backbone dihedral angles ψ against φ of the amino acid residues in a protein structure.
A Ramachandran plot can be used in two ways:
• To show in theory which values, or conformations, of the ψ and φ angles are possible for an amino acid residue in a protein.
• To show the empirical distribution of data points observed in a single structure (as used for structure validation) or in a database of many structures.
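As a rough sketch of how the plotted angles are obtained, the function below computes a dihedral angle from four atom positions. φ is the dihedral C(i-1)–N–Cα–C and ψ is the dihedral N–Cα–C–N(i+1). The function and helper names are illustrative, and the coordinates in the usage lines are made-up test points, not atoms from a real structure.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond (cis = 0, trans = 180)."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up points: the last atom is rotated a quarter turn about the central bond.
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, -1.0, 1.0)))
```

Applying the function to every residue of a model gives one (φ, ψ) point per residue, which can then be compared with the allowed regions of the plot.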