Tree building: distance-based and character-based methods
U.DEEPALAKSHMI
I M.Sc
MICROBIOLOGY
18PY04
Phylogenetic tree:
A phylogenetic tree, otherwise known as a phylogeny, is a diagram that represents the lines of evolutionary descent of different species, organisms or genes from a common ancestor. A tree consists of nodes, branches and clades.
Tree building is a method of phylogenetic analysis. There are two types:
Distance-based methods
Character-based methods
Distance based method:
All possible pairs of sequences are aligned to determine which pairs are the most similar or closely related. These alignments provide a measure of the genetic distance between the sequences. These distance measurements are then used to predict the evolutionary relationships.
These methods are:
UPGMA.
Neighbor joining (NJ).
Fitch-Margoliash (FM).
Minimum evolution (ME).
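As a rough illustration of how pairwise alignments yield a distance measure, the sketch below computes the proportion of differing sites (the p-distance) for every pair of sequences. It assumes the sequences are already aligned to the same length; the sequence names and data are made up for the example, and the p-distance is only one of several possible distance measures.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of aligned, gap-free positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where either sequence has a gap character.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no gap-free positions to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(alignment):
    """Build a symmetric distance matrix (dict of dicts) for all sequence pairs."""
    names = list(alignment)
    dist = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        dist[a][b] = dist[b][a] = p_distance(alignment[a], alignment[b])
    return dist


# Hypothetical aligned sequences, used only for illustration.
alignment = {
    "seq1": "ACGTACGT",
    "seq2": "ACGTACGA",
    "seq3": "ACGAACTA",
    "seq4": "TCGAACTA",
}
matrix = distance_matrix(alignment)
for name in alignment:
    print(name, [matrix[name][other] for other in alignment])
```

The resulting matrix is the kind of input that the distance methods listed above work from.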
Character based method:
There are two types:
1. Maximum parsimony (MP).
2. Maximum likelihood (ML).
Distance based method:
UPGMA (unweighted pair group method with arithmetic mean) is the simplest method of tree construction. It is a cluster analysis method derived from the clustering algorithms proposed by Sokal and Sneath (1973). It was originally developed for constructing taxonomic phenograms, and it can also be used to construct phylogenetic trees.
UPGMA employs a sequential clustering algorithm. We first identify, from among all the OTUs (operational taxonomic units), the two OTUs that are most similar to each other and then treat these as a new single OTU.
This method is the least accurate of the distance methods but is widely used.
STEPS:
1. The method begins with the construction of a distance matrix (dij).
2. The two taxa i and j that have the smallest distance are clustered together and form a new OTU.
3. The branch lengths for taxa i and j are each taken to be half of the distance between them (dij/2).
4. The distance from the new OTU to each remaining OTU k is the average of the distances from i and j to k, (dik + djk)/2.
5. These steps are repeated until all OTUs are clustered. At each step, the distance between two clusters is taken to be the average distance between their member OTUs.
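The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `upgma`, the cluster labels and the four-taxon distance values are assumptions made for the example. The average is weighted by cluster size, which reduces to (dik + djk)/2 when i and j are single taxa.

```python
def upgma(names, distances):
    """Cluster OTUs by UPGMA.

    distances maps a pair of names, e.g. ("A", "B"), to their distance.
    Returns a list of merges: (cluster_i, cluster_j, node_height).
    """
    size = {name: 1 for name in names}  # number of taxa in each cluster
    d = {frozenset(pair): value for pair, value in distances.items()}
    merges = []
    while len(size) > 1:
        # Step 2: find the pair of clusters with the smallest distance.
        closest = min(d, key=d.get)
        i, j = sorted(closest)
        dij = d.pop(closest)
        new = f"({i},{j})"
        # Step 3: the new node sits at half the distance between i and j.
        merges.append((i, j, dij / 2))
        # Step 4: average distance from the new cluster to every other one.
        for k in size:
            if k in (i, j):
                continue
            dik = d.pop(frozenset((i, k)))
            djk = d.pop(frozenset((j, k)))
            d[frozenset((new, k))] = (size[i] * dik + size[j] * djk) / (size[i] + size[j])
        size[new] = size.pop(i) + size.pop(j)
    return merges


# Hypothetical distances between four taxa, used only for illustration.
names = ["A", "B", "C", "D"]
distances = {
    ("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
    ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10,
}
merges = upgma(names, distances)
for i, j, height in merges:
    print(f"join {i} and {j} at height {height}")
```

In this example A and B are joined first, then C is added, and D joins last, giving a rooted tree.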
NEIGHBOR JOINING METHOD (NJ):
It is one of the simplest distance methods. It begins by joining the pair of sequences that are the closest neighbors, that is, the pair whose joining gives the smallest total branch length, and then repeats this step until all the remaining sequences have been added to the tree. This method was developed in 1987 by Saitou and Nei. It produces unrooted trees.
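A single joining step of NJ can be sketched as follows. The sketch uses the standard selection criterion Q(i,j) = (n-2)dij - (sum of row i) - (sum of row j) and picks the pair with the smallest Q. The function name `nj_first_join` and the five-taxon matrix are assumptions chosen for illustration, and a full implementation would repeat the step on a reduced matrix.

```python
def nj_first_join(names, matrix):
    """Pick the first pair of neighbors to join and their branch lengths.

    matrix is a symmetric list of lists of pairwise distances (n > 2 taxa).
    Returns (name_i, name_j, q_value, branch_i, branch_j).
    """
    n = len(names)
    row_sums = [sum(row) for row in matrix]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q criterion: small values mark pairs that are close to each
            # other and far from everything else.
            q = (n - 2) * matrix[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    q, i, j = best
    # Branch lengths from i and j to the new internal node.
    branch_i = matrix[i][j] / 2 + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    branch_j = matrix[i][j] - branch_i
    return names[i], names[j], q, branch_i, branch_j


# Hypothetical distance matrix for five taxa, used only for illustration.
names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
joined = nj_first_join(names, matrix)
print(joined)
```

Here d and e have the smallest raw distance (3), yet a and b are joined first because their Q value is lowest. This is the difference between NJ and simple nearest-pair clustering.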
Advantages:
This method is fast and therefore suitable for large datasets and for bootstrap analysis.
It permits correction for multiple substitutions.
It can use empirical substitution scoring methods.
Disadvantages:
It produces only a single tree.
It does not consider intermediate ancestors.
Sequence information is reduced, because the sequences are converted to distances.
FITCH-MARGOLIASH METHOD:
The Fitch-Margoliash (FM) method is a common pairwise clustering algorithm. It was developed by Fitch and Margoliash (1967), who showed that different sets of internal branch lengths could be obtained by considering alternative trees in which one or more branches were moved to different parts of the tree.
Advantages:
It tests more than one tree.
It is a relatively fast method.
It can use empirical substitution scoring methods.
Disadvantages:
It requires more time than the NJ method.
It does not consider intermediate ancestors.
Long evolutionary distances will be underestimated.
MINIMUM EVOLUTION:
This method was first proposed by Kidd and Sgaramella-Zonta in 1971. The minimum evolution tree is the tree that minimizes L, the total length of the tree.
An unrooted metric tree for n sequences has (2n-3) branches, each with length ei, and L is the sum of these branch lengths: L = e1 + e2 + ... + e(2n-3).
This method is similar in principle to the parsimony method.
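The selection rule can be illustrated with a short sketch. It assumes the branch lengths of each candidate tree have already been estimated from the distance matrix (that step is not shown), and the topologies and branch lengths below are invented for the example.

```python
def tree_length(branch_lengths, n_sequences):
    """Total length L of an unrooted tree: the sum of its 2n-3 branch lengths."""
    expected = 2 * n_sequences - 3
    if len(branch_lengths) != expected:
        raise ValueError(f"expected {expected} branches, got {len(branch_lengths)}")
    return sum(branch_lengths)


# Hypothetical candidate trees for n = 4 sequences, each with 2*4-3 = 5
# branch lengths that are assumed to have been estimated beforehand.
candidates = {
    "((A,B),(C,D))": [1.0, 1.5, 2.0, 1.0, 0.5],
    "((A,C),(B,D))": [2.0, 2.5, 2.0, 1.5, 0.25],
    "((A,D),(B,C))": [1.5, 2.0, 2.5, 1.0, 0.5],
}
lengths = {tree: tree_length(e, 4) for tree, e in candidates.items()}
# The minimum evolution tree is the candidate with the smallest L.
me_tree = min(lengths, key=lengths.get)
print(me_tree, lengths[me_tree])
```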
Advantages:
Easy to perform.
Quick calculation.
Disadvantages:
The sequences themselves are not considered, so information is lost.
Not applicable to distantly diverged sequences.
CHARACTER METHOD:
Maximum parsimony method (MP)
In the MP method, a multiple sequence alignment (MSA) is produced in order to predict which sequence positions are likely to correspond. These positions appear in vertical columns in the MSA.
For each aligned position, the phylogenetic trees that require the smallest number of evolutionary changes to produce the observed sequence changes are identified.
Finally, the trees that produce the smallest number of changes overall, for all sequence positions, are identified.
This method attempts to reconstruct the mutational events leading to the currently observed sequences.
The point of maximum parsimony is that, although sequences could be placed at any position on the tree, the number of steps required to convert one sequence into another changes each time branches are moved.
Thus the most parsimonious tree is the tree whose topology requires the fewest total mutations.
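Counting the changes that a given topology requires can be sketched with Fitch's algorithm, a standard way of scoring one tree under parsimony. The example below is a minimal sketch: the four short sequences are invented, trees are written as nested tuples, and only the three possible groupings of four taxa are compared.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, number of changes) for one column.

    tree is a taxon name (leaf) or a tuple of two subtrees.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no change is needed here.
        return common, left_cost + right_cost
    # The children disagree: one change is required at this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total number of changes the tree requires over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for col in range(n_sites):
        states = {name: seq[col] for name, seq in alignment.items()}
        total += fitch_site(tree, states)[1]
    return total


# Hypothetical aligned sequences, used only for illustration.
alignment = {"A": "AAG", "B": "AAA", "C": "GGA", "D": "GGG"}
trees = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}
scores = {label: parsimony_score(tree, alignment) for label, tree in trees.items()}
best_tree = min(scores, key=scores.get)
print(scores, best_tree)
```

For this data the grouping of A with B needs 4 changes, against 6 and 5 for the other two topologies, so it is the most parsimonious tree.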
Advantages:
It reconstructs ancestral nodes.
It can perform better than distance-based methods.
It provides numerous "most parsimonious trees".
Disadvantages:
Branch lengths cannot be determined, only the topology.
It is slower than matrix (distance) methods.
It is sensitive to the order in which sequences are added to the tree.
MAXIMUM LIKELIHOOD:
ML methods depend upon first obtaining a reliable sequence alignment and then examining the changes in each column of the alignment.
The likelihood of finding the actual sequence changes at each column of the alignment is calculated.
The probabilities for each aligned position are then multiplied to give the likelihood for each tree.
The tree that gives the maximum likelihood value is the most probable tree.
This method was introduced by Edwards and Cavalli-Sforza (1964) for phylogenetic analysis.
The main features:
A substitution model is chosen for the sequence data.
The likelihood of observing the data under the substitution model is obtained for each topology evaluated.
The topology that gives the highest likelihood is chosen as the best tree.
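The features above can be sketched for a toy case. The example below is only an illustration under stated assumptions: it uses the Jukes-Cantor (JC69) substitution model, gives every branch the same fixed length t instead of optimizing branch lengths, and scores three hand-written topologies for four invented sequences. Per-column likelihoods are multiplied by summing their logarithms.

```python
import math

BASES = "ACGT"


def jc69(t):
    """JC69 probabilities of (same base, one specific different base) after t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e


def partial(tree, column, t):
    """Likelihood of the data below a node, for each possible base at the node."""
    if isinstance(tree, str):
        return {b: 1.0 if b == column[tree] else 0.0 for b in BASES}
    same, diff = jc69(t)
    result = {b: 1.0 for b in BASES}
    for child in tree:
        child_l = partial(child, column, t)
        for b in BASES:
            # Sum over every base x the child could have, given base b here.
            result[b] *= sum((same if b == x else diff) * child_l[x] for x in BASES)
    return result


def log_likelihood(tree, alignment, t=0.1):
    """Log-likelihood of a tree: the sum of per-column log-likelihoods."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in alignment.items()}
        root = partial(tree, column, t)
        # Each base is equally likely at the root under JC69.
        total += math.log(sum(0.25 * root[b] for b in BASES))
    return total


# Hypothetical aligned sequences and candidate topologies, for illustration.
alignment = {"A": "AAG", "B": "AAA", "C": "GGA", "D": "GGG"}
trees = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}
log_l = {label: log_likelihood(tree, alignment) for label, tree in trees.items()}
best_tree = max(log_l, key=log_l.get)
print(log_l, best_tree)
```

Real ML programs also optimize the branch lengths and model parameters for each topology, which is the main reason the method is slow.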
The ML approach to phylogeny is implemented in DNAML in the PHYLIP package and in a modified version of DNAML called fastDNAml (optimizing the tree by ML at each step).
ML inference is available for both protein and nucleic acid sequences.
The following programs are available:
1. DNAML (DNA data only; in the PHYLIP package)
2. fastDNAml (DNA data only; a faster algorithm applied to DNAML)
3. ProtML (both DNA and protein data)
4. PUZZLE (both DNA and protein data). This program is much faster than ProtML.
Advantages:
It uses all the sequence information.
It reconstructs ancestral nodes.
It generates branch lengths.
It often performs better than distance methods.
Disadvantages:
It is very slow.
It needs a long time to construct a tree.
Comparison of NJ, MP and ML:
Feature | NJ | MP | ML
Data used | Employs the distances between pairs of sequences. | Employs a subset of the alignment positions of the sequences. | Employs all the data.
Criterion | Minimizes the distance between nearest neighbors. | Minimizes the total distance (number of changes). | Maximizes the likelihood of the tree, given certain values for the parameters.
Speed | Very fast | Slow | Very slow
Best use | A good choice to construct an initial tree or to choose between many candidate trees. | A good choice for fewer than 30 sequences and when homoplasy is rare. | A good choice for small data sets and for validating trees constructed by other methods.