Tree building

Tree building: distance and character based methods
U.DEEPALAKSHMI
I M.Sc
MICROBIOLOGY
18PY04

Phylogenetic tree:
A phylogenetic tree, otherwise known as a phylogeny, is a diagram
that represents the lines of evolutionary descent of different species,
organisms or genes from a common ancestor. The tree consists of
nodes, branches and clades.

Tree building is a method of phylogenetic analysis. There are two types:
Distance based method
Character based method

Distance based method:
All possible pairs of sequences are aligned to
determine which pairs are the most similar or closely related.
These alignments provide a measure of the genetic distance
between the sequences. These distance measurements are then
used to predict the evolutionary relationship.
These methods are:
UPGMA (unweighted pair group method with arithmetic mean).
NJ (neighbor joining).
FM (Fitch-Margoliash).
Minimum evolution (ME).
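
As a rough illustration of how aligned pairs are turned into distance measurements, the sketch below computes the proportion of differing sites (the p-distance) for two aligned DNA sequences and then applies the Jukes-Cantor correction for multiple substitutions. The function names, the choice of the Jukes-Cantor model and the treatment of "-" as a gap are assumptions made for this example; the slides do not prescribe a particular distance measure.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip columns where either sequence has a gap (assumed to be '-').
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)

def jc69_distance(p):
    """Jukes-Cantor corrected distance (expected substitutions per site)."""
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_matrix(alignment):
    """Corrected distance for every pair in an alignment {name: sequence}."""
    names = sorted(alignment)
    return {(a, b): jc69_distance(p_distance(alignment[a], alignment[b]))
            for i, a in enumerate(names) for b in names[i + 1:]}
```

The resulting dictionary of pairwise distances is the kind of input that the distance methods listed above start from.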

Character based method:
There are two types:
1. Maximum parsimony (MP).
2. Maximum likelihood (ML).

Distance based method:
UPGMA (unweighted pair group method with arithmetic mean) is the
simplest method of tree construction. It is a cluster analysis method
derived from the clustering algorithms proposed by Sokal and Sneath (1973).
It was originally developed for constructing taxonomic phenograms, and it
can also be used to construct phylogenetic trees.
UPGMA employs a sequential clustering algorithm.
From among all the OTUs (operational taxonomic units), we first identify
the two OTUs that are most similar to each other and then treat these as
a new single OTU. The step is repeated until only one OTU remains.
This method is the least accurate but is widely used.

STEPS:
1. The method begins with the construction of a distance matrix (dij).
2. The two taxa i and j that have the smallest distance are clustered
together to form a new OTU.
3. The branch lengths for taxa i and j are each taken to be half of the
distance between them (dij/2).
4. The distance from the new OTU (ij) to every other taxon k is the average
of the distances from i and j to k, that is (dik + djk)/2 when i and j are
single taxa.
5. Steps 2 to 4 are repeated on the reduced matrix until all taxa are
clustered. The height of each new node is half of the average distance
between the two OTUs it joins.
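
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `upgma`, the input format (a dictionary keyed by pairs of taxon names) and the nested-tuple tree output are assumptions made for the example. When a cluster already contains several taxa, the average is weighted by cluster size, which reduces to (dik + djk)/2 for two single taxa.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster taxa by UPGMA.

    names: list of taxon labels.
    dist:  {(a, b): distance} with one entry per unordered pair.
    Returns (tree, height): the tree as nested tuples and the height of
    every node (half the average distance between the OTUs it joins).
    """
    size = {n: 1 for n in names}          # number of taxa in each OTU
    height = {n: 0.0 for n in names}      # leaves sit at height 0
    d = {frozenset(k): v for k, v in dist.items()}

    while len(size) > 1:
        # Step 2: find the pair of OTUs with the smallest distance.
        i, j = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        new = (i, j)
        # Step 3: the new node lies at half the distance between i and j.
        height[new] = d[frozenset((i, j))] / 2
        # Step 4: size-weighted average distance to every other OTU k.
        for k in size:
            if k not in (i, j):
                d[frozenset((new, k))] = (
                    size[i] * d[frozenset((i, k))]
                    + size[j] * d[frozenset((j, k))]
                ) / (size[i] + size[j])
        size[new] = size[i] + size[j]
        del size[i], size[j]

    root = next(iter(size))
    return root, height
```

The length of the branch from a node to one of its children is the difference between their heights, which is why every leaf ends up at the same distance from the root.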

NEIGHBOR JOINING METHOD (NJ):
It is one of the simplest distance methods. It
starts from a star-shaped tree and repeatedly joins
the pair of sequences (neighbors) whose joining gives
the smallest total branch length, replacing the pair
with a new node until the tree is fully resolved.
This method was developed in 1987 by Saitou
and Nei. It produces unrooted trees.
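
A single joining step of NJ can be sketched as follows. The sketch uses the commonly quoted Q criterion, Q(i, j) = (n - 2) dij - ri - rj, where ri is the sum of the distances from i to all other taxa. The function name `nj_step`, the dict-of-dicts matrix format and the five-taxon example matrix are assumptions made for illustration.

```python
from itertools import combinations

def nj_step(names, d):
    """Perform one neighbor-joining step on a symmetric matrix d[i][j].

    Returns the joined pair, the branch lengths from each member of the
    pair to the new node, and the distances from the new node to the
    remaining taxa. Requires at least three taxa.
    """
    n = len(names)
    # r[i]: sum of distances from taxon i to every other taxon.
    r = {i: sum(d[i][k] for k in names if k != i) for i in names}
    # Choose the pair that minimizes the Q criterion.
    i, j = min(combinations(names, 2),
               key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
    # Branch lengths from i and j to the new internal node.
    len_i = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
    len_j = d[i][j] - len_i
    # Distances from the new node to each remaining taxon.
    new_d = {k: (d[i][k] + d[j][k] - d[i][j]) / 2
             for k in names if k not in (i, j)}
    return (i, j), (len_i, len_j), new_d

# Hypothetical example matrix for five taxa.
names = ["a", "b", "c", "d", "e"]
rows = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
d = {x: dict(zip(names, row)) for x, row in zip(names, rows)}
pair, branch_lengths, new_distances = nj_step(names, d)
```

In this example matrix d and e have the smallest raw distance (3), yet the Q criterion joins a and b first, because it also accounts for how far each taxon is from all the others. Repeating the step on the reduced matrix until three nodes remain gives the full unrooted tree.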

Advantage:
It is fast and therefore suited to large datasets
and to bootstrap analysis.
It permits correction for multiple
substitutions.
It can use empirical substitution scoring
methods.
Disadvantage:
It produces only a single tree.
It does not consider intermediate ancestors.
Sequence information is reduced to pairwise
distances.

FITCH-MARGOLIASH METHOD:
The Fitch-Margoliash (FM) method is a common
pairwise clustering algorithm. It was developed
by Fitch and Margoliash (1967), who showed that
different sets of internal branch lengths could
be obtained by considering alternate trees in
which one or more branches are moved to
different parts of the tree.

Advantage:
It tests more than one tree.
It is a fast method.
It can use empirical substitution
scoring methods.
Disadvantage:
It requires a longer time than the NJ
method.
It does not consider intermediate
ancestors.
Long evolutionary distances will be
underestimated.

MINIMUM EVOLUTION:
First proposed by Kidd and Sgaramella-Zonta
in 1971. The minimum evolution tree is
the tree that minimizes L, the total length of the tree.
In an unrooted metric tree for n
sequences there are (2n-3) branches, each with
length ei, and L is the sum of these branch lengths.
This method is similar to the parsimony
method.
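
The criterion itself is simple to express in code. The sketch below assumes the branch lengths ei of each candidate tree have already been estimated from the distance matrix (that estimation is not shown), and the names `tree_length` and `minimum_evolution_tree` are hypothetical.

```python
def tree_length(branch_lengths, n_taxa):
    """Total length L of an unrooted tree: the sum of its branch lengths."""
    expected = 2 * n_taxa - 3  # branches in a fully resolved unrooted tree
    if len(branch_lengths) != expected:
        raise ValueError(f"expected {expected} branches, got {len(branch_lengths)}")
    return sum(branch_lengths)

def minimum_evolution_tree(candidates, n_taxa):
    """Return the name of the candidate tree with the smallest L.

    candidates: {tree name: list of estimated branch lengths}.
    """
    return min(candidates, key=lambda name: tree_length(candidates[name], n_taxa))
```

For four sequences each candidate has 2(4) - 3 = 5 branches, and the candidate whose five lengths sum to the smallest value is kept.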

Advantages:
Easy to perform.
Quick calculation.
Disadvantage:
The sequences themselves are not considered, so
information is lost.
Not applicable to distantly diverged
sequences.

CHARACTER METHOD:
Maximum parsimony method (MP)
In the MP method, a multiple sequence alignment (MSA) is
produced in order to predict which sequence positions are
likely to correspond. These positions appear in vertical
columns in the MSA.
For each aligned position, the phylogenetic trees that
require the smallest number of evolutionary changes to
produce the observed sequence changes are identified.
Finally, those trees that produce the smallest
number of changes overall, for all sequence positions, are identified.
This method attempts to reconstruct the mutational events
leading to the currently observed sequences.

The point of maximum parsimony is
that, although sequences could be placed at
any position on the tree, the number of steps
required to convert one sequence into
another changes each time branches are
moved.
Thus the most parsimonious tree is the tree
whose topology requires the fewest total
mutations.
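
Counting the changes a given topology requires can be sketched with Fitch's small-parsimony algorithm. The tree representation (nested tuples of leaf names), the function names and the toy alignment mentioned below are assumptions made for this example; the slides do not name a specific counting algorithm.

```python
def fitch_site(tree, states):
    """Minimum number of changes for one alignment column on a binary tree.

    tree:   a leaf name (str) or a pair (left subtree, right subtree).
    states: {leaf name: character observed at this column}.
    Returns (possible states at this node, changes needed below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change.
        return shared, left_cost + right_cost
    # The children disagree: one change is needed on this node's branches.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, alignment):
    """Total changes over all columns of an alignment {name: sequence}."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[col] for name, seq in alignment.items()})[1]
        for col in range(n_sites)
    )
```

For a toy alignment in which A and B both read "AG" while C and D both read "TC", the topology (("A", "B"), ("C", "D")) needs fewer changes than (("A", "C"), ("B", "D")), so it is the more parsimonious of the two. A full MP search repeats this scoring over many candidate topologies.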

Advantages:
Reconstructs ancestral nodes.
It can perform better than
distance based methods.
It provides numerous “most
parsimonious trees”.
Disadvantage:
Only the topology is determined,
not the branch lengths.
Slower than matrix methods.
Sensitive to the order in which sequences
are added to the tree.

MAXIMUM LIKELIHOOD:
ML methods depend upon first
obtaining a reliable sequence alignment and then
examining the changes in each column of the
alignment.
The likelihood of finding the actual
sequence changes at each column of the aligned
sequences is calculated.
The probabilities for each aligned position
are then multiplied to give the likelihood of each
tree.
The tree that gives the maximum
likelihood value is the most probable tree.

This method was introduced by Edwards and Cavalli-Sforza
(1964) for phylogenetic analysis.
The main features:
A substitution model is chosen for the sequence
data.
The likelihood of observing the data under the
substitution model is obtained for each topology
evaluated.
The topology that gives the highest likelihood is
chosen as the best tree.
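
The calculation can be illustrated with a small sketch that scores a fixed tree under the Jukes-Cantor (JC69) substitution model using Felsenstein's pruning recursion. The model choice, the tree format (a leaf name, or a tuple of (child, branch length) pairs), the function names and the assumption of gap-free DNA sequences are all choices made for this example. Log-likelihoods are summed over columns, which is equivalent to multiplying the per-column probabilities.

```python
import math

BASES = "ACGT"

def jc69_prob(x, y, t):
    """Probability that base x is replaced by base y along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e

def partial(node, site):
    """Likelihood of the tips below `node` for each possible base at `node`."""
    if isinstance(node, str):
        # Leaf: the observed base has probability 1, all others 0.
        return {b: 1.0 if b == site[node] else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, t in node:  # internal node: (child, branch length) pairs
        below = partial(child, site)
        for b in BASES:
            result[b] *= sum(jc69_prob(b, c, t) * below[c] for c in BASES)
    return result

def log_likelihood(tree, alignment):
    """Log-likelihood of a gap-free alignment {name: sequence} on a fixed tree."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for col in range(n_sites):
        site = {name: seq[col] for name, seq in alignment.items()}
        root = partial(tree, site)
        # Average over the four equally frequent bases at the root.
        total += math.log(sum(0.25 * root[b] for b in BASES))
    return total
```

Evaluating `log_likelihood` for each candidate topology and keeping the one with the highest value mirrors the selection step described above. Real ML programs also optimize the branch lengths and model parameters for every topology, which this sketch leaves out.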

The ML approach to phylogeny is implemented in DNAML
in the PHYLIP package and in a modified version of DNAML
called fastDNAml (optimizing the tree by ML at each step).
The ML method of inference is available for both protein and
nucleic acid sequences.
The following programs are available:
1. DNAML (only DNA data; in the PHYLIP package)
2. fastDNAml (only DNA data; a faster algorithm
applied to DNAML)
3. ProtML (both DNA and protein data)
4. Puzzle (both DNA and protein data). This program
is much faster than ProtML.

Advantages:
Uses all the sequence information.
Reconstructs ancestral nodes.
Generates branch lengths.
It performs better than
distance methods.
Disadvantage:
It is very slow.
It needs a long time to construct a tree.

| NJ | MP | ML |
| --- | --- | --- |
| Employs the distances between pairs of sequences. | Employs a subset of the alignment positions of the sequences. | Employs all the data. |
| Minimizes the distance between the closest neighbors. | Minimizes the total distance. | Maximizes the likelihood of the data given certain values for the parameters. |
| Very fast | Slow | Very slow |
| A good choice to construct an initial tree or to choose between many candidate trees. | A good choice for fewer than 30 sequences and when homoplasy is rare. | A good choice for small data sets and for validating the trees constructed by other methods. |
