Tree Traversal
ABSTRACT
The Minimum Evolution (ME) approach to phylogeny estimation has been shown to be
statistically consistent when it is used in conjunction with ordinary least-squares (OLS)
fitting of a metric to a tree structure. The traditional approach to using ME has been to
start with the Neighbor Joining (NJ) topology for a given matrix and then do a topological
search from that starting point. The first stage requires $O(n^3)$ time, where $n$ is the number of
taxa, while the current implementations of the second are in $O(pn^3)$ or more, where $p$ is the
number of swaps performed by the program. In this paper, we examine a greedy approach
to minimum evolution which produces a starting topology in $O(n^2)$ time. Moreover, we
provide an algorithm that searches for the best topology using nearest neighbor interchanges
(NNIs), where the cost of doing $p$ NNIs is $O(n^2 + pn)$, i.e., $O(n^2)$ in practice because $p$
is always much smaller than n. The Greedy Minimum Evolution (GME) algorithm, when
used in combination with NNIs, produces trees which are fairly close to NJ trees in terms
of topological accuracy. We also examine ME under a balanced weighting scheme, where
sibling subtrees have equal weight, as opposed to the standard “unweighted” OLS, where
all taxa have the same weight so that the weight of a subtree is equal to the number of its
taxa. The balanced minimum evolution scheme (BME) runs slower than the OLS version,
requiring $O(n^2 \times \mathrm{diam}(T))$ operations to build the starting tree and $O(pn \times \mathrm{diam}(T))$ to
perform the NNIs, where $\mathrm{diam}(T)$ is the topological diameter of the output tree. In the
usual Yule-Harding distribution on phylogenetic trees, the diameter expectation is in $\log(n)$,
so our algorithms are in practice faster than NJ. Moreover, this BME scheme yields a very
significant improvement over NJ and other distance-based algorithms, especially with large
trees, in terms of topological accuracy.
Key words: phylogenetic inference, distance methods, minimum evolution, topological accuracy,
computational speed.
1 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Cen-
ter Drive, Bethesda, MD, 20892.
2 Département Informatique Fondamentale et Applications, LIRMM, 161 rue Ada, 34392 Montpellier, France.
688 DESPER AND GASCUEL
INTRODUCTION
Minimum evolution was proposed by several authors (Kidd and Sgaramella-Zonta, 1971; Saitou
and Nei, 1987; Rzhetsky and Nei, 1993; Swofford et al., 1996) as a basic principle for phylogenetic
inference. Given the matrix of pairwise evolutionary distances between the taxa being studied, this
principle involves first estimating the length of any given topology and then selecting the topology with
the shortest length. Minimum evolution is thus conceptually close to character-based parsimony and complies
with Occam’s principle of scientific inference, which essentially maintains that simpler explanations are
preferable to more complicated ones and that ad hoc explanations should be avoided.
Numerous variants of the minimum evolution principle exist, depending on how the branch lengths
are estimated and how the tree length is calculated from these branch lengths. Several definitions of tree
length have been proposed, differing from one another by the treatment of negative branch lengths. The
most common solution (Saitou and Nei, 1987; Rzhetsky and Nei, 1993) simply defines the tree length
as the sum of all branch lengths, regardless of whether they are positive or negative. Branch lengths
are usually estimated within the least-squares framework. If all distance estimates can be assumed to be
independent and to have the same variance, we use the ordinary least-squares (OLS) framework. The
weighted least-squares framework corresponds to the case where distance estimates are independent but
(possibly) with different variances, while the generalized least-squares approach does not impose any
restriction and is able to benefit from the covariances of the distance estimates. It is well known that
distance estimates obtained from sequences do not have the same variance, because the largest distances
are much more variable than the shortest ones (Fitch and Margoliash, 1967) and are mutually depen-
dent when they share a common history (or path) in the true phylogeny (Nei and Jin, 1989). Therefore,
to estimate branch lengths from evolutionary distances, using generalized least-squares is theoretically
superior to using weighted least-squares, which is in turn more appropriate than ordinary least-squares
(Bulmer, 1991).
The minimum evolution principle has been shown to be statistically consistent when combined with
ordinary least-squares (Rzhetsky and Nei, 1993; Denis and Gascuel, 2002). This important property implies
that the more accurate the distance estimates, as induced by the use of long sequences when a correct
sequence evolution model is chosen, the higher the probability of recovering the true phylogeny. However,
ordinary least-squares poorly fits the features of evolutionary distance data, as explained above. Thus,
it is tempting to combine the minimum-evolution principle with a more reliable estimation of branch
lengths, using weighted least-squares or generalized least-squares. However, we recently demonstrated
that such a combination is not always statistically consistent and, therefore, could represent a dead end
towards obtaining better phylogenetic inference methods, especially in the case of generalized least-squares
(Gascuel et al., 2001).
This paper further investigates the minimum evolution principle, but with a more optimistic perspective.
First, we demonstrate that its usage in combination with ordinary least-squares, even when not fully optimal
in terms of topological accuracy, has the great advantage of leading to very fast algorithms, much faster
than the NJ algorithm (Saitou and Nei, 1987) and fast enough to build very large trees as
envisaged in biodiversity studies. Second, we show that a new version of this principle, first introduced
by Pauplin (2000) to simplify tree length computation, is more appropriate than the OLS version. In this
new version, sibling subtrees have equal weight, as opposed to the standard unweighted OLS, where all
taxa have the same weight and thus the weight of a subtree is equal to the number of its taxa. This
new version can be seen as weighted, just as WPGMA is the weighted version of UPGMA (Sneath and
Sokal, 1973), but we will prefer the term “balanced” to avoid confusion with weighted least-squares.
In addition to the aforementioned fast OLS minimum evolution algorithms, we also present algorithms
to deal with this new balanced version that are also faster than NJ, though not as fast as their OLS
counterparts. Furthermore, the balanced algorithms produce output trees with better topological accuracy
than those from NJ, BIONJ (Gascuel, 1997a) and WEIGHBOR (Bruno et al., 2000). The rest of this paper
is organized as follows: we first provide the notation and definitions, then describe the algorithms for the
OLS version of the minimum evolution principle, explain how these algorithms are modified to deal with the
balanced version, provide simulation results to illustrate the gain in topological accuracy and run times, and
conclude with a brief discussion. The appendix provides the details of the algorithms and some mathematical
proofs.
ALGORITHMS FOR MINIMUM EVOLUTION 689
A tree is made of nodes (or vertices) and of edges (or branches). Among the nodes, we distinguish the
internal (or ancestral) nodes and the leaves (or taxa). The leaves are denoted as $i$, $j$, or $k$ and the internal
nodes as $u$, $v$, or $w$, while an edge $e$ is defined by a pair of nodes and a length $l(e)$. We shall be considering
various length assignments of the same underlying shape. In this case, we shall use the word “topology,”
while “tree” will be reserved for an instance of a topology with given edge lengths associated. We use the
letter $\mathcal{T}$ to refer to a topology and $T$ to refer to a tree. A tree is also made of subtrees (or clades), typically
denoted as $A$, $B$, $C$, or $D$. For the sake of simplicity, we shall use the same notation for the subtrees and
for the sets of taxa they contain. Accordingly, $T$ also represents the set of taxa being studied, and $n$ is the
number of these taxa. Moreover, we shall use lowercase letters, e.g., $a$, $b$, $c$, or $d$, to represent the subtree
roots. If $A$ and $B$ are two disjoint subtrees, with roots $a$ and $b$ respectively, we say that $A$ and $B$ are
distant-$k$ subtrees if there are $k$ edges in the path from $a$ to $b$.
The matrix $\Delta$ is the matrix of pairwise evolutionary distance estimates, with $\Delta_{ij}$ being the distance
between taxa $i$ and $j$. Let $A$ and $B$ be two nonintersecting subtrees from a tree $T$. We define the average
distance between $A$ and $B$ as
$$\Delta_{A|B} = \frac{1}{|A||B|} \sum_{i \in A,\, j \in B} \Delta_{ij}. \tag{1}$$
This average can also be defined recursively:
• If $A$ and $B$ are singleton sets, i.e., $A = \{a\}$ and $B = \{b\}$, then $\Delta_{A|B} = \Delta_{ab}$.
• Else, without loss of generality let $B = B_1 \cup B_2$ as shown in Fig. 1; we then have
$$\Delta_{A|B} = \frac{|B_1|}{|B|} \Delta_{A|B_1} + \frac{|B_2|}{|B|} \Delta_{A|B_2}. \tag{2}$$
It is easily seen that Equations (1) and (2) are equivalent. Equation (2) follows the notion that the weight
of a subtree is proportional to the number of its taxa. So every taxon has the same weight, and the same
holds for the distances as shown by Equation (1). Thus, this average is said to be unweighted (Sneath and
Sokal, 1973). It must be noted that the unweighted average distance between subtrees does not depend on
their topologies, but only on the taxa they contain.
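The equivalence of Equations (1) and (2) can be checked numerically. The following is a minimal Python sketch under stated assumptions: subtrees are represented as nested pairs of leaf labels, the distance matrix is a made-up symmetric example, and the function names (`leaves`, `avg_direct`, `avg_recursive`) are hypothetical helpers introduced only for illustration.

```python
from itertools import product

def leaves(subtree):
    """Taxa of a subtree given as a leaf label or a (left, right) pair."""
    if not isinstance(subtree, tuple):
        return [subtree]
    left, right = subtree
    return leaves(left) + leaves(right)

def avg_direct(dist, a, b):
    """Equation (1): plain mean of dist[i][j] over i in A and j in B."""
    la, lb = leaves(a), leaves(b)
    return sum(dist[i][j] for i, j in product(la, lb)) / (len(la) * len(lb))

def avg_recursive(dist, a, b):
    """Equation (2): split an internal side, weighting each half by its taxon count."""
    if not isinstance(a, tuple) and not isinstance(b, tuple):
        return dist[a][b]
    if not isinstance(b, tuple):
        a, b = b, a  # dist is symmetric, so split whichever side is internal
    b1, b2 = b
    n1, n2 = len(leaves(b1)), len(leaves(b2))
    return (n1 * avg_recursive(dist, a, b1) + n2 * avg_recursive(dist, a, b2)) / (n1 + n2)

# Made-up symmetric distance matrix on taxa 0..4 (illustration only).
D = [[0, 3, 5, 8, 9],
     [3, 0, 6, 7, 9],
     [5, 6, 0, 4, 6],
     [8, 7, 4, 0, 2],
     [9, 9, 6, 2, 0]]
A = (0, 1)
B = (2, (3, 4))          # one topology for the taxa {2, 3, 4}
B_ALT = ((2, 3), 4)      # another topology for the same taxa

print(avg_direct(D, A, B), avg_recursive(D, A, B), avg_recursive(D, A, B_ALT))
```

In this toy case the three printed values agree up to rounding, which illustrates that the unweighted average depends only on the taxa contained in the subtrees and not on how they are arranged.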
The distance $\Delta^T$ is the distance induced by the tree $T$; i.e., $\Delta^T_{ij}$ is equal to the length of the path
connecting $i$ to $j$ in $T$, for every taxon pair $(i, j)$. Given a topology $\mathcal{T}$ and a distance matrix $\Delta$, the OLS
branch length estimation produces the tree $T$ with topology $\mathcal{T}$ minimizing the sum of squares:
$$\sum_{i,j \in T} \left(\Delta^T_{ij} - \Delta_{ij}\right)^2.$$
Vach (1989), Rzhetsky and Nei (1993), and others showed analytical formulae for the proper OLS edge
length estimation, as functions of the average distances. Suppose $e$ is an internal edge of $T$, with the four
subtrees $A$, $B$, $C$, and $D$ defined as depicted in Fig. 2(a). Then, the OLS length estimate of $e$ is equal to
$$l(e) = \frac{1}{2}\left[\lambda(\Delta_{A|C} + \Delta_{B|D}) + (1 - \lambda)(\Delta_{A|D} + \Delta_{B|C}) - (\Delta_{A|B} + \Delta_{C|D})\right], \tag{3}$$
where
$$\lambda = \frac{|A||D| + |B||C|}{(|A| + |B|)(|C| + |D|)}.$$
Suppose $e$ is an external branch, with $i$, $A$, and $B$ as represented in Fig. 2(b). Then we have
$$l(e) = \frac{1}{2}(\Delta_{A|i} + \Delta_{B|i} - \Delta_{A|B}). \tag{4}$$
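As a sanity check on Equations (3) and (4), the sketch below evaluates them on a small additive example where the true edge lengths are known. It is an illustration only: the function names are hypothetical, and the four-leaf tree (pendant lengths 1, 2, 3, 4 and internal length 5) is a made-up test case.

```python
def ols_lambda(n_a, n_b, n_c, n_d):
    """Weight lambda from Equation (3), computed from the four subtree sizes."""
    return (n_a * n_d + n_b * n_c) / ((n_a + n_b) * (n_c + n_d))

def ols_internal_length(lam, d_ac, d_bd, d_ad, d_bc, d_ab, d_cd):
    """Equation (3): OLS length of an internal edge separating A, B from C, D."""
    return 0.5 * (lam * (d_ac + d_bd) + (1 - lam) * (d_ad + d_bc) - (d_ab + d_cd))

def ols_external_length(d_ai, d_bi, d_ab):
    """Equation (4): OLS length of the external edge leading to leaf i."""
    return 0.5 * (d_ai + d_bi - d_ab)

# Made-up additive tree ((a:1, b:2):5, (c:3, d:4)); path lengths give the distances.
d_ab, d_cd = 3.0, 7.0
d_ac, d_ad, d_bc, d_bd = 9.0, 10.0, 10.0, 11.0

lam = ols_lambda(1, 1, 1, 1)
internal = ols_internal_length(lam, d_ac, d_bd, d_ad, d_bc, d_ab, d_cd)

# External edge to leaf a: the two other subtrees are {b} and {c, d}.
avg_cd_a = (d_ac + d_ad) / 2      # average distance between {c, d} and a
avg_b_cd = (d_bc + d_bd) / 2      # average distance between {b} and {c, d}
external_a = ols_external_length(avg_cd_a, d_ab, avg_b_cd)

print(internal, external_a)
```

Because the input distances are exactly tree-like, the formulas recover the lengths used to build the example (5 for the internal edge and 1 for the edge to leaf a).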
Equations (3) and (4) demonstrate an important property of OLS edge length estimation: the length
estimate of any given edge does not depend on the topology of the “corner” subtrees, i.e., $A$, $B$, $C$, and
$D$ in Equation (3) and $A$ and $B$ in Equation (4), but only on the taxa contained in these subtrees.
Following Saitou and Nei (1987) and Rzhetsky and Nei (1993), we define the tree length $l(T)$ of $T$ to
be the sum of the edge lengths of $T$. The OLS minimum evolution tree is then that tree with topology $\mathcal{T}$
minimizing $l(T)$, where $T$ has the OLS edge length estimates for $\mathcal{T}$, and $\mathcal{T}$ ranges over all possible tree
topologies for the taxa being studied.
Now, suppose that we are interested in the length of the tree $T$ shown in Fig. 2(a), depending on the
configuration of the corner subtrees. We then have (proof in Appendix 1)
$$l(T) = \frac{1}{2}\left[\lambda(\Delta_{A|C} + \Delta_{B|D}) + (1 - \lambda)(\Delta_{A|D} + \Delta_{B|C}) + \Delta_{A|B} + \Delta_{C|D}\right] + l(A) + l(B) + l(C) + l(D) - \Delta_{a|A} - \Delta_{b|B} - \Delta_{c|C} - \Delta_{d|D}, \tag{5}$$
where $\lambda$ is defined as in Equation (3). The advantage of Equation (5) is that the lengths $l(A)$, $l(B)$, $l(C)$,
and $l(D)$ of the corner subtrees, as well as the average root/leaf distances, $\Delta_{a|A}$, $\Delta_{b|B}$, $\Delta_{c|C}$, and $\Delta_{d|D}$,
do not depend on the configuration of $A$, $B$, $C$, and $D$ around $e$. Exchanging $B$ and $C$ or $B$ and $D$ might
change the length of the five edges shown in Fig. 2(a) and then the length of $T$, but not the lengths of
$A$, $B$, $C$, and $D$. This simply comes from the fact that the edge $e$ is within the corner subtrees associated
with any of the edges of $A$, $B$, $C$, and $D$. As we shall see in the next section, this property is of great help
in designing fast OLS tree-swapping algorithms.
FIG. 2. Corner subtrees used to estimate the length of e: (a) for e an internal edge, (b) for e an external edge.
Let us now turn our attention toward the balanced version of minimum evolution, as defined by Pauplin
(2000). The tree length definition is the same. Formulae for edge length estimates are identical to Equations
(3) and (4), with $\lambda$ replaced by $1/2$ and using a different definition of the average distance between subtrees
that depends on the topology under consideration. Letting $\mathcal{T}$ be this topology, the balanced average distance
between two nonintersecting subtrees $A$ and $B$ is then recursively defined by the following:
• If $A$ and $B$ are singleton sets, i.e., $A = \{a\}$ and $B = \{b\}$, then $\Delta^{\mathcal{T}}_{A|B} = \Delta_{ab}$;
• Else, without loss of generality let $B = B_1 \cup B_2$ as shown in Fig. 1; we then have
$$\Delta^{\mathcal{T}}_{A|B} = \frac{1}{2}\left(\Delta^{\mathcal{T}}_{A|B_1} + \Delta^{\mathcal{T}}_{A|B_2}\right). \tag{6}$$
The change from Equation (2) is that the sibling subtrees $B_1$ and $B_2$ now have equal weight, regardless
of the number of taxa they contain. Thus, taxa do not have the same influence depending on whether they
belong to a large clade or are isolated, which can be seen as consistent in the phylogenetic inference context
(Sneath and Sokal, 1973). Moreover, a comparison of variants of the NJ algorithm showed by computer
simulation (Gascuel, 2000) that this “balanced” approach is more appropriate than the unweighted one
for reconstructing phylogenies with evolutionary distances estimated from sequences. Finally, the balanced
minimum evolution principle can be shown to be statistically consistent using a proof (to be published
elsewhere) along the lines of Rzhetsky and Nei’s (1993). Therefore, it was tempting to test the performance
of this balanced version of the minimum evolution principle.
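To make the contrast with the unweighted average concrete, here is a minimal Python sketch of the recursion in Equation (6). It assumes subtrees are nested pairs of leaf labels and uses a made-up symmetric distance matrix; the name `balanced_avg` is a hypothetical helper, not taken from any published implementation.

```python
def balanced_avg(dist, a, b):
    """Equation (6): sibling subtrees get weight 1/2 each, whatever their sizes."""
    if isinstance(a, tuple):
        a1, a2 = a
        return 0.5 * (balanced_avg(dist, a1, b) + balanced_avg(dist, a2, b))
    if isinstance(b, tuple):
        b1, b2 = b
        return 0.5 * (balanced_avg(dist, a, b1) + balanced_avg(dist, a, b2))
    return dist[a][b]  # both sides are single taxa

# Made-up symmetric distance matrix on taxa 0..4 (illustration only).
D = [[0, 3, 5, 8, 9],
     [3, 0, 6, 7, 9],
     [5, 6, 0, 4, 6],
     [8, 7, 4, 0, 2],
     [9, 9, 6, 2, 0]]

# Same taxa {2, 3, 4}, two different topologies.
first = balanced_avg(D, 0, (2, (3, 4)))    # taxon 2 has weight 1/2, taxa 3 and 4 have 1/4 each
second = balanced_avg(D, 0, ((2, 3), 4))   # taxon 4 has weight 1/2, taxa 2 and 3 have 1/4 each
print(first, second)
```

Unlike the unweighted average, the two calls give different values (6.75 and 7.75 here), because the weight of each taxon is determined by its position in the subtree.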
Unfortunately, this new version does not have all of the good properties of the OLS version: the edge
length estimates given by Equations (3) and (4) now depend on the topology of the corner subtrees, simply
because the balanced average distances between these subtrees depend on their topologies. As we shall
see, this makes the algorithms more complicated and more expensive in computing time than with OLS.
However, the same tree length formula, Equation (5), holds with $\Delta$ being replaced by $\Delta^{\mathcal{T}}$ and $\lambda$ by $1/2$,
and, fortunately, we still have the good property that tree lengths $l(A)$, $l(B)$, $l(C)$, and $l(D)$, as well as
average root/leaf distances $\Delta^{\mathcal{T}}_{a|A}$, $\Delta^{\mathcal{T}}_{b|B}$, $\Delta^{\mathcal{T}}_{c|C}$, and $\Delta^{\mathcal{T}}_{d|D}$, remain unchanged when $B$ and $C$ or $B$ and $D$
are swapped. Edge lengths within the corner subtrees may change when performing a swap, but their
(balanced) sums remain identical (proof in Appendix 2).
This section presents two algorithms for phylogenetic inference. The first constructs an initial tree by
the stepwise addition of taxa to a growing tree, while the second improves this tree by performing local
rearrangements (or swapping) of subtrees. Both follow a greedy approach and tend, at each step, to minimize
the OLS version of the minimum evolution criterion. This approach does not guarantee that the
global optimum will be reached, but only a local optimum. However, this kind of approach has proven to
be effective in many optimization problems (Cormen et al., 2000, 329–356), and we shall see that further
optimizing the minimum evolution criterion would not yield significant improvement in terms of topological
accuracy. Moreover, such a combination of heuristic and optimization algorithms is used in numerous
phylogenetic reconstruction methods, for example those from the PHYLIP package (Felsenstein, 1989).
Consider the tree $T$ of Fig. 3, where $k$ is inserted between subtrees $C$ and $A \cup B$, and assume that we
have the length $l(T) = L$ of this new tree. Consider now the tree $T'$ of Fig. 3, which is obtained from $T$
by exchanging $k$ and $A$. Using Equation (5) and our above remarks we have
$$l(T') = L + \frac{1}{2}\left[(\lambda - \lambda')(\Delta_{k|A} + \Delta_{B|C}) + (\lambda' - 1)(\Delta_{A|B} + \Delta_{k|C}) + (1 - \lambda)(\Delta_{A|C} + \Delta_{k|B})\right], \tag{7}$$
where
$$\lambda = \frac{|A| + |B||C|}{(|A| + |B|)(|C| + 1)},$$
and
$$\lambda' = \frac{|A| + |B||C|}{(|A| + |C|)(|B| + 1)}.$$
In other words, the length of $T'$ can be computed from the length of $T$. For this computation to be done
in $O(1)$ (i.e., constant) time, it is sufficient to have previously computed
1. all average distances $\Delta_{k|S}$ between $k$ and any subtree $S$ from $T_{k-1}$;
2. all average distances between subtrees of $T_{k-1}$ separated by two edges; for example, $A$ and $B$ in Fig. 3;
3. the number of leaves of every subtree.
Suppose we now consider the tree $T''$ formed by moving the insertion of $k$ to the edge $e$, where $e$ is a sibling
edge to the insertion point of $T'$. The length of $T''$ is computed by Equation (7) as $l(T'') = L + f(e)$,
where $f(e)$ depends on the computations for both $T'$ and $T''$. We continue, searching every edge $e$ of
$T_{k-1}$ by recursively moving from one edge to its neighboring edges, and we obtain the cost $c(e)$ that
corresponds to the length of the tree $T_{k-1}$ plus $k$ inserted on $e$. Moreover, $c(e)$ can be written as $L + f(e)$.
Because we only seek to determine the best insertion point, we need not calculate the actual value of $L$,
as it is sufficient to minimize $f(e)$ with $f = 0$ for the first insertion edge considered.
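The constant-time update of Equation (7) can be checked against a direct evaluation of the bracketed part of Equation (5) before and after exchanging $k$ and $A$. The sketch below is illustrative only: the function names, the subtree sizes, and the average distances are all made-up assumptions, and the terms of Equation (5) outside the bracket are omitted because they do not change under the exchange.

```python
def ols_lambda(n_a, n_b, n_c, n_d):
    """Weight lambda of Equation (3) for corner subtrees A, B | C, D."""
    return (n_a * n_d + n_b * n_c) / ((n_a + n_b) * (n_c + n_d))

def bracket(lam, d_ac, d_bd, d_ad, d_bc, d_ab, d_cd):
    """Bracketed part of Equation (5), the only part affected by the exchange."""
    return 0.5 * (lam * (d_ac + d_bd) + (1 - lam) * (d_ad + d_bc) + d_ab + d_cd)

def exchange_delta(n_a, n_b, n_c, d):
    """Equation (7): l(T') - L when k and A are exchanged (d holds average distances)."""
    lam = ols_lambda(n_a, n_b, n_c, 1)        # T:  A, B | C, k
    lam_prime = ols_lambda(1, n_b, n_c, n_a)  # T': k, B | C, A
    return 0.5 * ((lam - lam_prime) * (d["kA"] + d["BC"])
                  + (lam_prime - 1) * (d["AB"] + d["kC"])
                  + (1 - lam) * (d["AC"] + d["kB"]))

# Made-up subtree sizes and average distances (illustration only).
n_a, n_b, n_c = 3, 2, 4
d = {"kA": 4.0, "BC": 7.0, "AB": 3.0, "kC": 6.5, "AC": 8.0, "kB": 5.0}

before = bracket(ols_lambda(n_a, n_b, n_c, 1),
                 d["AC"], d["kB"], d["kA"], d["BC"], d["AB"], d["kC"])
after = bracket(ols_lambda(1, n_b, n_c, n_a),
                d["kC"], d["AB"], d["kA"], d["BC"], d["kB"], d["AC"])

print(after - before, exchange_delta(n_a, n_b, n_c, d))
```

The two printed numbers coincide up to rounding, which is the property that lets the search move from one candidate edge to a neighboring one without recomputing the whole tree length.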
The algorithm can be summarized as follows:
• For $k = 3$, initialize the matrix of average distances between distant-2 subtrees and the array counting
the number of taxa per subtree. Form $T_3$ with leaf set $\{1, 2, 3\}$.
• For $k = 4$ to $n$,
1. compute all $\Delta_{k|S}$ average distances;
2. starting from an initial edge $e_0$ of $T_{k-1}$, set $f(e_0) = 0$ and recursively search each edge $e$ to obtain
$f(e)$ from Equation (7);
3. select the best edge by minimizing $f$, insert $k$ on that edge to form $T_k$, and update the average
distance between every pair of distant-2 subtrees as well as the number of taxa per subtree;
• Return $T_n$.
To achieve Step 1, we recursively apply Equation (2), which requires $O(k)$ computing time (see Appendix 3).
Step 2 is also done in $O(k)$ time, as explained above. Finally, to update the average distance
between any pair $A$, $B$ of distant-2 subtrees, if $k$ is inserted in the subtree $A$, we use
$$\Delta_{\{k\} \cup A|B} = \frac{1}{1 + |A|} \Delta_{k|B} + \frac{|A|}{1 + |A|} \Delta_{A|B}. \tag{8}$$
Step 3 is also done in $O(k)$ time, because there are $O(k)$ pairs of distant-2 subtrees, and all the quantities
on the right-hand side of Equation (8) have already been computed. So we build $T_k$ from $T_{k-1}$ in $O(k)$
computational time, and thus the entire computational cost of the construction of $T$, as we sum over $k$,
is $O(n^2)$. This is much faster than NJ-like algorithms, which require $O(n^3)$ operations, and the FITCH
(Felsenstein, 1997) program, which requires $O(n^4)$ operations. As we shall see, this allows trees with
thousands of taxa to be constructed in a few minutes. This algorithm is called GME (Greedy Minimum
Evolution) and additional details are described in Appendix 3.
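The update of Equation (8) is a one-line weighted mean. The sketch below, with a hypothetical function name and a made-up distance matrix, compares it with a direct recomputation of the unweighted average after taxon $k$ joins subtree $A$.

```python
def update_unweighted_avg(avg_ab, avg_kb, size_a):
    """Equation (8): average between {k} U A and B from already known quantities."""
    return (avg_kb + size_a * avg_ab) / (1 + size_a)

# Made-up symmetric distance matrix on taxa 0..4 (illustration only).
D = [[0, 3, 5, 8, 9],
     [3, 0, 6, 7, 9],
     [5, 6, 0, 4, 6],
     [8, 7, 4, 0, 2],
     [9, 9, 6, 2, 0]]

def mean_distance(dist, xs, ys):
    """Plain mean over all pairs, as in Equation (1)."""
    return sum(dist[i][j] for i in xs for j in ys) / (len(xs) * len(ys))

A, B, k = [0, 1], [3, 4], 2
updated = update_unweighted_avg(mean_distance(D, A, B), mean_distance(D, [k], B), len(A))
direct = mean_distance(D, A + [k], B)
print(updated, direct)
```

Both values agree up to rounding. Each update costs constant time, which is why updating the $O(k)$ pairs of distant-2 subtrees stays within $O(k)$ per inserted taxon.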
The swap has to be performed when $L(T) - L(T') > 0$, and the best among all possible swaps (two
per internal edge) corresponds to the largest difference between $L(T)$ and $L(T')$. Moreover, assuming
that the average distances between the corner subtrees have already been computed, $L(T) - L(T')$ can be
obtained in $O(1)$ time via Equation (9). Instead of computing the average distances between the corner
subtrees (which change when swaps are performed), we compute the average distances between every pair
of nonintersecting subtrees. This takes place before evaluating the swaps and requires $O(n^2)$ time, using
an algorithm that is described in Appendix 4. The whole algorithm can be summarized as follows:
Step 1 requires $O(n^2)$ time, Step 2 requires $O(n)$ time and Step 3 also requires $O(n)$ time because the
total number of subtrees is $O(n)$. Thus, the total complexity of the algorithm is $O(n^2 + pn)$, where $p$ is
the number of swaps performed. In practice, $p$ is much smaller than $n$, as we shall see in Section 4, so this
algorithm has a practical time complexity of $O(n^2)$. It is very fast, able to improve trees with thousands of
taxa, and we call it FASTNNI (fast nearest neighbor interchanges). More details are given in Appendix 4.
Rzhetsky and Nei (1993) describe a procedure that requires $O(n^2)$ time to compute every branch length. In
one NNI, five branch lengths are changed, so evaluating a single swap is in $O(n^2)$, searching for the best
swap in $O(n^3)$, and their whole procedure in $O(pn^3)$. This can be improved using Bryant and Waddell’s
(1998) results, but the implementation in the PAUP environment of these ideas is still in progress (David
Bryant, personal communication). In any case, our $O(n^2)$ complexity is optimal since it is identical to the
data size.
Neither FASTNNI nor GME needs to explicitly compute the length of the whole tree until the final
topology is reached. To obtain the length of the final tree, we use Equations (3) and (4), which requires
$O(n)$ time as the average distances between corner subtrees have already been computed during the
execution of FASTNNI.
The balanced averaging scheme lends itself both to an insertion-based approach and to tree swapping
from an initial topology, and the algorithms are essentially the same as with OLS. The main difference is
that updating can no longer be achieved using a fast method as expressed by Equation (8), because the
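balanced average distance between $A \cup \{k\}$ and $B$ now depends on the position of $k$ within $A$.
$$L(T') = L + \frac{1}{4}\left[(\Delta^{\mathcal{T}}_{A|C} + \Delta^{\mathcal{T}}_{k|B}) - (\Delta^{\mathcal{T}}_{A|B} + \Delta^{\mathcal{T}}_{k|C})\right]. \tag{10}$$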
Step 1 is identical and provides all $\Delta^{\mathcal{T}}_{k|S}$ distances by recursively applying Equation (6). The main difference
is the updating performed in Step 3. Equation (10) requires only the balanced average distances between
distant-2 subtrees, but to iteratively update these distances, we use (and update) a data structure that
contains all distances between every pair of nonintersecting subtrees (as for FASTNNI).
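Equation (10) can be recovered from Equation (5) by setting $\lambda = 1/2$. The derivation below is a sketch under two assumptions: as in Fig. 3, $T$ has corner subtrees $A$, $B$ on one side and $C$, $k$ on the other while $T'$ has $k$, $B$ and $C$, $A$; and the terms of Equation (5) outside the bracket cancel because they are unchanged by the exchange.

```latex
\begin{align*}
l(T') - l(T)
 &= \tfrac{1}{2}\Big[\tfrac{1}{2}\big(\Delta^{\mathcal{T}}_{k|C} + \Delta^{\mathcal{T}}_{A|B}
      + \Delta^{\mathcal{T}}_{k|A} + \Delta^{\mathcal{T}}_{B|C}\big)
      + \Delta^{\mathcal{T}}_{k|B} + \Delta^{\mathcal{T}}_{A|C}\Big] \\
 &\quad - \tfrac{1}{2}\Big[\tfrac{1}{2}\big(\Delta^{\mathcal{T}}_{A|C} + \Delta^{\mathcal{T}}_{k|B}
      + \Delta^{\mathcal{T}}_{k|A} + \Delta^{\mathcal{T}}_{B|C}\big)
      + \Delta^{\mathcal{T}}_{A|B} + \Delta^{\mathcal{T}}_{k|C}\Big] \\
 &= \tfrac{1}{4}\Big[\big(\Delta^{\mathcal{T}}_{A|C} + \Delta^{\mathcal{T}}_{k|B}\big)
      - \big(\Delta^{\mathcal{T}}_{A|B} + \Delta^{\mathcal{T}}_{k|C}\big)\Big].
\end{align*}
```

The terms $\Delta^{\mathcal{T}}_{k|A}$ and $\Delta^{\mathcal{T}}_{B|C}$ cancel, so only the four averages appearing in Equation (10) are needed to evaluate a candidate insertion edge.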
When $k$ is inserted into $T_{k-1}$, we must calculate the average $\Delta^{T_k}_{X|Y \cup \{k\}}$ for any subtree $Y$ of $T_{k-1}$ such
that $Y \cup \{k\}$ is a subtree of $T_k$, and any subtree $X$ disjoint from $Y \cup \{k\}$. We can enumerate all such pairs by
considering their respective roots. Let $x$ be the root of $X$ and $y$ the root of $Y$. Regardless of the position
of $k$, any node of $T_k$ could serve as the root $x$ of $X$. Then, considering a fixed $x$, any node in the path
from $x$ to $k$ could serve as the root $y$. Thus there are $O(k \times \mathrm{diam}(T_k))$ such pairs, where diam is the tree
diameter, i.e., the maximum number of edges between two leaves.
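Since the bound depends on the topological diameter, it may help to see how that quantity differs between tree shapes. The following Python sketch is an illustration under assumptions: an unrooted tree is written as a nested tuple whose top level has three children, and `height_and_diameter` is a hypothetical helper name.

```python
def height_and_diameter(tree):
    """Return (height in edges, maximum number of edges between two leaves)."""
    if not isinstance(tree, tuple):
        return 0, 0  # a single leaf
    depths, best = [], 0
    for child in tree:
        h, d = height_and_diameter(child)
        depths.append(h + 1)          # one extra edge to reach this node
        best = max(best, d)           # longest leaf-to-leaf path inside a child
    depths.sort(reverse=True)
    if len(depths) >= 2:
        best = max(best, depths[0] + depths[1])  # longest path through this node
    return depths[0], best

chain = (1, 2, (3, (4, (5, 6))))      # caterpillar-like tree on 6 taxa
bushy = ((1, 2), (3, 4), (5, 6))      # more balanced tree on the same taxa

print(height_and_diameter(chain)[1], height_and_diameter(bushy)[1])
```

For these two six-taxon examples the chain has diameter 5 and the more balanced tree has diameter 4; the gap widens as the number of taxa grows, which is what separates the worst case from the typical case in the bound above.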
Given such a pair, $X$, $Y$, let us consider how we may quickly calculate $\Delta^{T_k}_{X|Y \cup \{k\}}$ from known quantities.
Consider the situation as depicted in Fig. 4. Suppose $k$ is inserted by creating a new node $w$ which pushes
the subtree $Y_1$ farther away from $B$. Suppose there are $(l - 1)$ edges in the path from $w$ to $y$ and the
subtrees branching off this path are $Y_2, Y_3, \ldots, Y_l$, in order from $w$ to $y$. Then
$$\Delta^{T_k}_{X|Y \cup \{k\}} = 2^{-l}\left(\Delta^{T_k}_{k|X} + \Delta^{T_{k-1}}_{X|Y_1}\right) + \sum_{i=2}^{l} 2^{-(l+1-i)} \Delta^{T_{k-1}}_{X|Y_i}.$$
Thus
$$\Delta^{T_k}_{X|Y \cup \{k\}} = \Delta^{T_{k-1}}_{X|Y} + 2^{-l}\left(\Delta^{T_k}_{k|X} - \Delta^{T_{k-1}}_{X|Y_1}\right). \tag{11}$$
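Equation (11) can be verified on a small example by comparing the incremental update with a full recomputation of the balanced average. The sketch below assumes nested-pair subtrees, single-taxon subtrees $Y_1$, $Y_2$, $Y_3$ (so $l = 3$), and a made-up distance matrix; `balanced_avg` and `updated_balanced_avg` are hypothetical names.

```python
def balanced_avg(dist, a, b):
    """Balanced average of Equation (6) between two nested-pair subtrees."""
    if isinstance(a, tuple):
        return 0.5 * (balanced_avg(dist, a[0], b) + balanced_avg(dist, a[1], b))
    if isinstance(b, tuple):
        return 0.5 * (balanced_avg(dist, a, b[0]) + balanced_avg(dist, a, b[1]))
    return dist[a][b]

def updated_balanced_avg(old_avg_xy, avg_kx, avg_xy1, l):
    """Equation (11): new average between X and Y U {k} from stored quantities."""
    return old_avg_xy + 2.0 ** (-l) * (avg_kx - avg_xy1)

# Made-up symmetric distance matrix on taxa 0..4 (illustration only).
D = [[0, 3, 5, 8, 9],
     [3, 0, 6, 7, 9],
     [5, 6, 0, 4, 6],
     [8, 7, 4, 0, 2],
     [9, 9, 6, 2, 0]]

X, Y1, Y2, Y3, k = 0, 1, 2, 3, 4
y_before = (Y3, (Y2, Y1))            # Y in the tree before inserting k
y_after = (Y3, (Y2, (Y1, k)))        # k becomes the sibling of Y1; l = 3

incremental = updated_balanced_avg(balanced_avg(D, X, y_before),
                                   balanced_avg(D, k, X),
                                   balanced_avg(D, X, Y1),
                                   l=3)
recomputed = balanced_avg(D, X, y_after)
print(incremental, recomputed)
```

Both values equal 6.75 in this toy case. The incremental form touches only three stored averages per pair, so the cost of the updating step is governed by the number of pairs $(X, Y)$ rather than by the sizes of the subtrees.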
The upper bound on the number of pairs is worst when $T_k$ is a chain, with $k$ inserted at one end. In
this case, the diameter is proportional to $k$ and both the number of distances to update and the bound
are proportional to $k^2$. However, the diameter is usually much lower. Assuming, as usual, a Yule-Harding
speciation process (Yule, 1925; Harding, 1971), the expected diameter is $O(\log(k))$ (Erdös et al., 1999),
which implies an average complexity of the updating step in $O(k \log(k))$. Other (e.g., uniform) distributions
on phylogenetic trees are discussed by Aldous (2001), and by McKenzie and Steel (2000), with expected
diameter at most $O(\sqrt{k})$.
Therefore, the time complexity of the whole insertion algorithm is $O(n^3)$ in the worst case and
$O(n^2 \log(n))$ (or $O(n^2 \sqrt{n})$) in practice. This is still less than NJ and allows trees with thousands of
taxa to be constructed within a reasonable amount of time. This algorithm is called BME (Balanced
Minimum Evolution) and additional details are given in Appendix 5.
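$$l(e) = \Delta^{\mathcal{T}}_{A \cup B|C \cup D} - \frac{1}{2}\left(\Delta^{\mathcal{T}}_{A|B} + \Delta^{\mathcal{T}}_{C|D}\right)$$
and
$$l(e) = \frac{1}{2}\left(\Delta^{\mathcal{T}}_{i|A} + \Delta^{\mathcal{T}}_{i|B} - \Delta^{\mathcal{T}}_{A|B}\right),$$
for the internal and external branches, respectively (see Fig. 2 for the notation).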
4. RESULTS
4.1. Protocol
We used simulations based on random trees with parameter values chosen so as to cover the features of
most real data sets, as revealed by the compilation of the phylogenies published in the journal Systematic
Biology during the last few years. This approach induces much smaller contrasts between the tested
methods than those based on model trees (e.g., Gascuel, 1997a; Bruno, Socci, and Halpern, 2000). Indeed,
model trees are generally used to emphasize a given property of the studied methods, for example, their
performance when the molecular clock is strongly violated. Thus, model trees are often extreme and their
use tends to produce strong and possibly misleading differences between the tested methods. On the other
hand, random trees allow comparisons with a large variety of tree shapes and evolutionary rates and provide
a synthetic and more realistic view of the average performances.
We used 24- and 96-taxon trees, and 2,000 trees per size. For each of these trees, a true phylogeny, denoted
as $T$, was first generated using the stochastic speciation process described by Kuhner and Felsenstein
(1994), which corresponds to the usual Yule-Harding distribution on trees (Yule, 1925; Harding, 1971).
Using this generating process makes $T$ ultrametric (or molecular clock–like). This hypothesis does not
hold in most biological data sets, so we created a deviation from the molecular clock, using a method
similar to that of Guindon and Gascuel (2002). Every branch length of $T$ was multiplied by $1.0 + \mu X$,
where $X$ followed the standard exponential distribution ($P(X > \eta) = e^{-\eta}$) and $\mu$ was a tuning factor to
adjust the deviation from molecular clock; $\mu$ was set to 0.8 with 24 taxa and to 0.6 with 96 taxa. The
average ratio between the mutation rate in the fastest evolving lineage and the rate in the slowest evolving
lineage was then equal to about 2.0 with both tree sizes. With 24 taxa, the smallest value (among 2,000)
of this ratio was equal to about 1.2 and the largest to 5.0 (1.0 corresponds to the strict molecular clock),
while the standard deviation was approximately 0.5. With 96 taxa, the extreme values became 1.3 and 3.5,
while the standard deviation was 0.33.
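The branch-length perturbation described above can be sketched in a few lines. This is only an illustration of the multiplication by $1.0 + \mu X$: the function name, the seed, and the example branch lengths are assumptions, and the sketch does not reproduce the tree generation or rescaling steps of the protocol.

```python
import random

def perturb_branch_lengths(lengths, mu, rng):
    """Multiply each branch length by 1 + mu * X, with X drawn from Exp(1)."""
    return [b * (1.0 + mu * rng.expovariate(1.0)) for b in lengths]

rng = random.Random(0)                       # fixed seed so the sketch is repeatable
clock_like = [0.03, 0.03, 0.05, 0.02, 0.07]  # made-up branch lengths
perturbed = perturb_branch_lengths(clock_like, mu=0.8, rng=rng)
print(perturbed)
```

Every multiplier is at least 1 and has expectation $1 + \mu$, so larger values of $\mu$ stretch branches more unevenly and move the tree further from the molecular clock.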
These (2 × 2,000) trees were then rescaled to obtain “slow,” “moderate” and “fast” evolutionary rates.
With 24 taxa, the branch length expectation was set to 0.03, 0.06, and 0.15 mutations per site, for the slow,
moderate, and fast conditions, respectively. With 96 taxa, we had 0.02, 0.04, and 0.10 mutations per site,
respectively. For both tree sizes, the average maximum pairwise divergence was about 0.2, 0.4, and 1.0
substitutions per site, with a standard deviation of about 0.07, 0.14, and 0.35 for 24 taxa, and of about 0.04,
0.08, and 0.20 for 96 taxa. These values are in good accordance with real data sets. The maximum pairwise
divergence is rarely above 1.0 due to the fact that multiple alignment from highly divergent sequences is
simply impossible. Moreover, with such a distance value, any correction formula, e.g., Kimura’s (1980),
becomes very doubtful, due to our ignorance of the real substitution process and to the fact that the larger
the distance the higher the gap between estimates obtained from different formulae. The moderate condition
(∼0.4) corresponds to the most favorable practical setting, while in the slow condition (from ∼0.1 to ∼0.3)
the phylogenetic signal is only slightly perturbed by multiple substitutions, but it can be too low, with
some short branches not being supported by any substitution.
SeqGen (Rambaut and Grassly, 1997) was used to generate the sequences. For each tree $T$ (among 3 ×
2 × 2,000), these sequences were obtained by simulating an evolving process along $T$ according to the
Kimura (1980) two-parameter model with a transition/transversion ratio of 2.0. The sequence length was
set to 500 sites. Finally, DNADIST from the PHYLIP package (Felsenstein, 1989) was used to compute
the pairwise distance matrices, assuming the Kimura model with known transition/transversion ratio. The
data files are available on our web page ([Link]
Every inferred tree, denoted as $\hat{T}$, was compared to the true phylogeny $T$ (i.e., that used to generate the
sequences and then the distance matrix) with a topological distance equivalent to Robinson and Foulds’
(1981). This distance is defined by the proportion of internal branches (or bipartitions) which are found
in one tree and not in the other one. This distance varies between 0.0 (both topologies are identical) and
1.0 (they do not share any internal branch). The results were averaged over the 2,000 test sets for each
tree size and evolutionary rate. Finally, to compare the various methods to NJ, we measured the relative
error reductions $(P_M - P_{NJ})/P_{NJ}$, where $M$ is any tested method different from NJ and $P_X$ is the average
topological distance between $\hat{T}$ and $T$ when using method $X$.
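The two measures used here can be sketched as follows. This is a hedged illustration, not the evaluation code used for the experiments: trees are assumed to be nested tuples of taxon labels, the normalization (bipartitions found in exactly one tree, divided by the total number of non-trivial bipartitions in both trees) is one reading of the definition above, and all function names are hypothetical.

```python
def collect_clades(tree, out):
    """Append the leaf set below every internal node to `out`; return this node's leaf set."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    s = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.append(s)
    return s

def bipartitions(tree):
    """Non-trivial bipartitions, each stored as the side without a reference taxon."""
    found = []
    taxa = collect_clades(tree, found)
    ref = min(taxa)
    splits = set()
    for s in found:
        side = taxa - s if ref in s else s
        if 1 < len(side) < len(taxa) - 1:    # skip external branches and the root
            splits.add(side)
    return splits

def topological_distance(t1, t2):
    """Proportion of internal branches found in one tree and not in the other."""
    b1, b2 = bipartitions(t1), bipartitions(t2)
    total = len(b1) + len(b2)
    return len(b1 ^ b2) / total if total else 0.0

def relative_error_reduction(p_m, p_nj):
    """(P_M - P_NJ) / P_NJ; negative values mean method M is better than NJ."""
    return (p_m - p_nj) / p_nj

true_tree = ((1, 2), (3, (4, 5)))
inferred = ((1, 3), (2, (4, 5)))     # shares only the {4, 5} bipartition
print(topological_distance(true_tree, inferred))
```

In this five-taxon example each tree has two internal branches and one is shared, so the distance is 0.5; identical topologies give 0.0 and trees with no shared internal branch give 1.0.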
other programs as input for FASTNNI and BNNI. All of GME, BME, FASTNNI, and BNNI are available
at our web page ([Link] and via ftp at [Link]
We also measured how far from the true phylogeny one gets with NNIs. This served as a measure of the
limitation of each of the minimum evolution frameworks, as well as a performance index for evaluating
our algorithms. Ordinarily, the (OLS or balanced) minimum evolution criterion will not, in fact, observe a
minimum value at the true phylogeny. So, starting from the true phylogeny and running FASTNNI or BNNI,
we end up with a tree with a significant proportion of false branches. When this proportion is high, the
corresponding criterion can be seen as poor regarding topological accuracy. Thus this proportion represents
the best possible topological accuracy that can be achieved by optimizing the considered criterion, since
we would not expect any algorithm optimizing this criterion in the whole tree space to find a tree closer
to the true phylogeny than the tree that is obtained by “optimizing” the true phylogeny itself.
Results are displayed in Table 1 and Table 2.
The performances of the basic algorithms are strongly correlated with the number of computations that
they perform. Both $O(n^2)$ algorithms are clearly worse than NJ. BME (in $O(n^2 \log(n))$) is still worse than
NJ, but becomes very close with 96 taxa, which indicates the strength of the balanced minimum evolution
framework. BIONJ, in $O(n^3)$ like NJ and having identical computing times, is slightly better than NJ in
all conditions, while WEIGHBOR, also in $O(n^3)$ but requiring complex numerical calculations, is better
than BIONJ. Finally, FITCH, which is in $O(n^4)$, is the best with 24 taxa, but was simply impossible to
evaluate with 96 taxa (see the computing times below).
After FASTNNI, we observe that the output topology does not depend much on the input topology.
Even the poor HGT topology becomes close to NJ, which indicates the strength of NNIs. However,
Table 1 (24 taxa)

                 Initial tree      After FASTNNI     After BNNI
Slow rate
  True Tree                        .109   -1.6%      .104   -6.2%
  FITCH          .109   -1.9%      .113    2.0%      .107   -3.4%
  WEIGHBOR       .109   -1.8%      .112    1.7%      .107   -3.0%
  BIONJ          .111   -0.3%      .113    2.0%      .107   -3.6%
  NJ             .111    0%        .113    2.0%      .107   -3.5%
  BME            .118    7.1%      .113    1.9%      .107   -2.8%
  GME            .122   10%        .113    2.1%      .107   -3.4%
  HGT/FP         .334  202%        .112    1.1%      .107   -2.9%
Moderate rate
  True Tree                        .092    3.7%      .083   -5.8%
  FITCH          .085   -4.9%      .094    6.0%      .085   -4.0%
  WEIGHBOR       .085   -4.3%      .094    6.2%      .085   -4.0%
  BIONJ          .087   -2.0%      .094    6.5%      .085   -4.2%
  NJ             .088    0%        .094    6.6%      .085   -4.0%
  BME            .100   13%        .094    6.3%      .084   -4.9%
  GME            .107   21%        .095    7.1%      .084   -4.8%
  HGT/FP         .326  268%        .095    7.5%      .088   -0.2%
Fast rate
  True Tree                        .088    6.5%      .076   -8.3%
  FITCH          .076   -7.8%      .090    8.8%      .077   -7.0%
  WEIGHBOR       .077   -6.8%      .089    8.2%      .077   -6.9%
  BIONJ          .079   -3.6%      .090    9.0%      .077   -6.6%
  NJ             .082    0%        .090    9.1%      .077   -6.9%
  BME            .098   19%        .090    9.1%      .076   -7.1%
  GME            .105   28%        .090    9.8%      .076   -7.6%
  HGT/FP         .329  300%        .090    9.8%      .083    0.8%

Note: The first number indicates the average topological distance between the inferred tree and the true phylogeny.
The second number (percentage) provides the relative difference in topological distance between the method
considered and NJ; the more negative this value, the better the method was relative to NJ.
Table 2 (96 taxa)

                 Initial tree      After FASTNNI     After BNNI
Slow rate
  True Tree                        .172   -5.6%      .167   -8.8%
  WEIGHBOR       .178   -2.5%      .181   -0.7%      .173   -5.2%
  BIONJ          .180   -0.9%      .182   -0.3%      .173   -5.1%
  NJ             .183    0%        .182   -0.2%      .173   -5.2%
  BME            .186    1.9%      .181   -0.6%      .173   -5.3%
  GME            .199    8.8%      .183    0.3%      .173   -5.3%
  HGT/FP         .512  185%        .185    1.5%      .175   -4.3%
Moderate rate
  True Tree                        .132   -3.0%      .115  -15.4%
  WEIGHBOR       .129   -5.4%      .137    0.5%      .118  -13.0%
  BIONJ          .134   -1.9%      .138    1.3%      .118  -13.0%
  NJ             .136    0%        .139    1.8%      .119  -12.9%
  BME            .137    1.0%      .138    1.1%      .118  -13.2%
  GME            .158   16%        .140    2.7%      .118  -13.2%
  HGT/FP         .480  253%        .143    5.2%      .123   -9.3%
Fast rate
  True Tree                        .115    0.6%      .088  -23.4%
  WEIGHBOR       .103  -10%        .119    3.8%      .091  -21.0%
  BIONJ          .112   -2.5%      .121    5.1%      .090  -21.7%
  NJ             .115    0%        .121    5.5%      .090  -21.3%
  BME            .117    1.8%      .120    4.4%      .090  -21.4%
  GME            .144   25%        .122    6.3%      .091  -21.1%
  HGT/FP         .465  306%        .126    9.4%      .098  -14.7%

Note: See note to Table 1.
except for the true phylogeny in some cases, this output is worse than NJ. This confirms our previous
results (Gascuel, 2000), which indicated that the OLS version of minimum evolution is reasonable but
not excellent for phylogenetic inference. But this phenomenon is much more visible with 24 taxa than
with 96, so we expect the very fast $O(n^2)$ GME + FASTNNI combination to be equivalent to NJ with
large $n$.
After BNNI, all initial topologies become better than NJ. Moreover, they converge to each other
independently of the starting point (results not shown), which demonstrates, among other things, that it would
be useless to jumble the taxa ordering in GME or BME, as is done optionally in PHYLIP programs to
improve the fitness. With 24 taxa, the performance is equivalent to that of FITCH, while with 96 taxa the
results are far better than those of WEIGHBOR, especially in the Fast condition where BNNI improves
NJ by 21%, against 10% for WEIGHBOR. These results are somewhat unexpected, since BNNI has a
low $O(n^2 + pn \log(n))$ average time complexity. Moreover, as explained above, we do not expect very
high contrast between any inference method and NJ, due to the fact that our data sets represent a large
variety of realistic trees and conditions, but do not contain extreme trees, notably concerning the maximum
pairwise divergence. Preliminary experiments indicate that maximum parsimony is close to BNNI in the
Slow and Moderate conditions, but worse in the Fast condition, the contrast between these methods being
in the same range as those reported in Tables 1 and 2. The combination of BNNI with BME or GME can
thus be seen as remarkably efficient and accurate. Moreover, regarding the true phylogeny after BNNI, it
appears that little gain could be expected by further optimizing this criterion, since our simple optimization
approach is already close to these optimal values.
Table 3 summarizes the average computational times (in hours:minutes:seconds) required by the various programs
to build phylogenetic trees. The leftmost two columns were averaged over two thousand 24- and 96-taxon
trees, the third over ten 1,000-taxon trees, and the final column over four 4,000-taxon trees. Stars indicate
entries where the algorithm was deemed too slow to be worth running on that test.
FITCH takes approximately 25 minutes to build a 96-taxon tree, which made the simulations impossible
(12,000 trees were considered) and would make it impractical to apply to data sets of this size, because
real studies often include bootstrapping, which requires the construction of a large number of trees (usually
1,000). In practice, if one wishes to find a tree minimizing the weighted least-squares criterion, it is faster
(approximately 1 minute for 96 taxa) to use PAUP (Swofford, 1996) to perform a weighted least-squares NNI search
from a tree created via some other fast method, i.e., to combine weighted least-squares with a tree building
strategy analogous to ours.
For similar reasons, we did not test the running time of WEIGHBOR for 1,000- or 4,000-taxon trees.
WEIGHBOR’s running time increased more than 60-fold when moving from 24-taxon trees to 96-taxon
trees; thus, we judged it infeasible to run WEIGHBOR on even one 1,000-taxon tree.
The fastest programs in Table 3 were the GME, GME + FASTNNI, and HGT/FP algorithms. The HGT/FP
algorithm was the only one which was able to maintain the fast speed of the GME + FASTNNI
combination, but it did so at a serious cost in terms of topological accuracy. The BME and BME + BNNI
combinations lagged behind their OLS-based counterparts, but were still significantly faster than NJ and
BIONJ. Of particular interest is the GME + BNNI combination, which was not only markedly faster
than NJ, but also produced superior topologies. Unfortunately, computational constraints made thorough
testing of algorithm performance at the 1,000- and 4,000-taxon levels difficult to achieve, so we cannot
claim statistical significance for the relative speeds for the larger data sets. However, we suspect that
implementation refinements such as those used in PAUP’s NJ could make our algorithms much faster
still.
Table 4 contains the number of NNIs performed by each of the three combinations which appear in
Table 3. Not surprisingly, the largest number of NNIs was consistently required when the initial topology
was made to minimize the OLS ME criterion, but NNIs were chosen to minimize the balanced ME criterion
(i.e., GME + BNNI). This table shows the overall superiority of the BME tree over the GME tree, when
combined with BNNI. In all of the cases considered, the average number of NNIs considered for each
value of $n$ was considerably less than $n$ itself.
5. DISCUSSION
We have presented a new greedy implementation of minimum evolution tree topology searching that
is considerably faster than most distance algorithms currently in use. The most popular fast
algorithm at present is Neighbor-Joining, an $O(n^3)$ algorithm. Our greedy ordinary least-squares minimum evolution
tree construction algorithm (GME) runs in $O(n^2)$ time, the size of the input matrix. Although the GME tree is
not quite as accurate as the NJ tree, it is a good starting point for nearest neighbor interchanges (NNIs).
The combination of GME and FASTNNI, which performs NNIs according to the ordinary least-squares
criterion, also in $O(n^2)$ time, has a topological accuracy very close to that of NJ, especially with large
numbers of taxa.
However, the balanced minimum evolution framework appears much more appropriate for phylogenetic
inference than the ordinary least-squares version. This is likely due to the fact that it gives less weight to the
topologically long distances, i.e., those containing numerous edges, while the ordinary least-squares method
puts the same confidence on each distance, regardless of its length. Even when the usual and topological
lengths are different, they are strongly correlated. The balanced minimum evolution framework is thus
conceptually close to weighted least-squares (Fitch and Margoliash, 1967), which is more appropriate than
ordinary least-squares for evolutionary distances estimated from sequences. A preliminary look at PAUP’s
weighted least-squares topology-searching ability leads us to hypothesize that minimizing the weighted
least-squares criterion or minimizing the balanced minimum evolution criterion would lead to output trees of
roughly the same quality. Studying the formal relationship between weighted least-squares and balanced
minimum evolution is an important direction for further research.
The balanced NNI algorithm (BNNI) achieves outstanding performance, superior to those of NJ, BIONJ,
and WEIGHBOR. BNNI is an $O(n^2 + np\,\mathrm{diam}(T))$ algorithm, where $\mathrm{diam}(T)$ is the diameter of the inferred
tree, and $p$ the number of swaps performed. With $p\,\mathrm{diam}(T) = O(n)$ for most data sets, the combination
of GME and BNNI effectively gives us an $O(n^2)$ algorithm with high topological accuracy.
APPENDIX
Lemma 1.1. If $T$ is the OLS tree for its topology and for the matrix $\Delta$, and $u$ is a node in $T$ which
separates the three subtrees $X$, $Y$, and $Z$, we then have $\Delta_{X|Y} = \Delta^T_{X|Y}$. (And by symmetry this identity also
holds for $X|Z$ and $Y|Z$.)
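where
\[
\lambda = \frac{|B||C| + |A||D|}{(|A| + |B|)(|C| + |D|)}.
\]
It is clear that
\[
\Delta_{A|B} = \Delta^T_{A|B} = \Delta_{a|A} + l(v,a) + l(v,b) + \Delta_{b|B},
\]
and similarly
\[
\Delta_{C|D} = \Delta^T_{C|D} = \Delta_{c|C} + l(w,c) + l(w,d) + \Delta_{d|D}.
\]
Thus,
\[
l(v,a) + l(v,b) + l(w,c) + l(w,d) = \Delta_{A|B} + \Delta_{C|D} - (\Delta_{a|A} + \Delta_{b|B} + \Delta_{c|C} + \Delta_{d|D}). \tag{14}
\]
We substitute the right-hand side of Equation (14) and the OLS value for $l(v,w)$ from Equation (3) into
Equation (13) to achieve the desired Equation (5).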
A2. Constant subtree lengths under tree swapping in the balanced scheme
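First, we consider Equation (3) for internal edge length estimation when using balanced weights. Since
$\lambda = 1/2$, the equation simplifies to:
\[
l^T(e) = \frac{1}{2}\left( \frac{1}{2}\left(\Delta^T_{A|C} + \Delta^T_{B|D} + \Delta^T_{A|D} + \Delta^T_{B|C}\right) - \left(\Delta^T_{A|B} + \Delta^T_{C|D}\right) \right)
       = \Delta^T_{A \cup B|C \cup D} - \frac{1}{2}\left(\Delta^T_{A|B} + \Delta^T_{C|D}\right). \tag{15}
\]
The balanced edge length formula for external edges is essentially the same: Equation (4) simplifies to
\[
l^T(e) = \Delta^T_{i|A \cup B} - \frac{1}{2}\Delta^T_{A|B}. \tag{16}
\]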
Now consider a topology $T$ with three subtrees $A$, $B$, and $C$, which meet at the vertex $v$. Let $a$, $b$,
and $c$ be the roots of these three subtrees, with $b_1$ and $b_2$ the children of $b$ and $c_1$, $c_2$ the children of $c$,
determining leaf subsets $B_1$, $B_2$, $C_1$, and $C_2$, respectively. By Equation (15),
\[
l(v,b) = \Delta^T_{A \cup C|B} - \frac{1}{2}\left(\Delta^T_{B_1|B_2} + \Delta^T_{A|C}\right)
       = \frac{1}{2}\left(\Delta^T_{A|B} + \Delta^T_{B|C} - \Delta^T_{B_1|B_2} - \Delta^T_{A|C}\right).
\]
Similarly,
\[
l(v,c) = \frac{1}{2}\left(\Delta^T_{A|C} + \Delta^T_{B|C} - \Delta^T_{C_1|C_2} - \Delta^T_{A|B}\right),
\]
and thus
\[
l(v,b) + l(v,c) = \Delta^T_{B|C} - \frac{1}{2}\left(\Delta^T_{B_1|B_2} + \Delta^T_{C_1|C_2}\right). \tag{17}
\]
In particular, the right-hand side of Equation (17) is completely independent of the internal structure of $A$.
Thus, if we perform a tree swap internal to $A$, the sum $l(v,b) + l(v,c)$ will remain constant. Analogous
arguments will show the same result if either $b$ or $c$ is a leaf. This indicates that neither the length of a
subtree (here $B \cup C$) nor its average root/leaves distance (here, from $v$ to any leaves of $B \cup C$) is changed
when a swap is performed within a disjoint subtree (here, $A$).
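The invariance stated by Equation (17) can be checked numerically. The sketch below is an illustration only, not the authors' implementation: the nested-tuple tree encoding, the function names `bal`, `edge_length`, and `pair_sum`, and the toy matrix `dist` are all assumptions made for this example. It computes $l(v,b)$ and $l(v,c)$ separately from Equation (15) for two different arrangements of $A$ and compares the sums.

```python
def bal(x, y, dist):
    """Balanced average distance between two disjoint subtrees.

    A subtree is a taxon index (int) or a pair (left, right) of subtrees.
    Sibling subtrees receive equal weight, whatever their sizes.
    """
    if isinstance(x, tuple):
        return 0.5 * (bal(x[0], y, dist) + bal(x[1], y, dist))
    if isinstance(y, tuple):
        return 0.5 * (bal(x, y[0], dist) + bal(x, y[1], dist))
    return dist[x][y]


def edge_length(s, o1, o2, dist):
    """Equation (15) for the edge joining v to the root of s = (s1, s2).

    o1 and o2 are the two other subtrees meeting at v.
    """
    s1, s2 = s
    across = 0.5 * (bal(o1, s, dist) + bal(o2, s, dist))  # balanced average of O1 u O2 versus S
    return across - 0.5 * (bal(s1, s2, dist) + bal(o1, o2, dist))


def pair_sum(A, B, C, dist):
    """l(v, b) + l(v, c) for three subtrees A, B, C meeting at v."""
    return edge_length(B, A, C, dist) + edge_length(C, A, B, dist)


# Toy symmetric dissimilarity matrix on 8 taxa (illustrative values only).
n = 8
dist = [[0 if i == j else abs(i - j) + (i * j) % 3 for j in range(n)]
        for i in range(n)]

B, C = (4, 5), (6, 7)
A_before = (0, (1, (2, 3)))
A_after = (0, (2, (1, 3)))   # a swap internal to A

sum_before = pair_sum(A_before, B, C, dist)
sum_after = pair_sum(A_after, B, C, dist)
rhs_17 = bal(B, C, dist) - 0.5 * (bal(4, 5, dist) + bal(6, 7, dist))
print(sum_before, sum_after, rhs_17)
```

In this toy case the individual length $l(v,b)$ does change when $A$ is rearranged; only the sum is preserved, as the cancellation of the $\Delta^T_{A|B}$ and $\Delta^T_{A|C}$ terms predicts.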
• We root $T_{k-1}$ at any taxon $r$ and let $d$ be its unique direct descendant.
• Let DFS-POST (Tarjan, 1983, 14–19) be the depth-first post-order of the vertices of $T_{k-1}$.
• Let DFS-PRE (Tarjan, 1983, 14–19) be the depth-first preorder of the vertices of $T_{k-1}$.
• For any nonleaf node $v$ (or $w$), let $v_1$ and $v_2$ (or $w_1$ and $w_2$, respectively) denote its two children.
• For any node $v$ of $T_{k-1}$, if $v$ is a leaf, let $L(v) = \{v\}$, otherwise let $L(v)$ be the set of all nodes of $T_{k-1}$
which are descendants of $v$, including $v$ itself. Let $U(v)$ be the complement of $L(v)$ among the nodes of
$T_{k-1}$.
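The two traversal orders used in these definitions can be sketched as follows. This is a minimal illustration rather than the authors' code: the `children` dictionary, the node labels, and the function names are hypothetical, and `leaf_counts` assumes that the sizes needed by the averaging formulas count the taxa (leaves) below a node.

```python
def dfs_orders(children, root):
    """Return (preorder, postorder) lists of the nodes reachable from root."""
    pre, post = [], []
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            post.append(node)          # all descendants already emitted
            continue
        pre.append(node)               # emitted before any descendant
        stack.append((node, True))
        # Push children in reverse so that the first child is visited first.
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return pre, post


def leaf_counts(children, root):
    """Number of leaves below each node, filled in post-order."""
    _, post = dfs_orders(children, root)
    size = {}
    for v in post:
        kids = children.get(v, [])
        size[v] = 1 if not kids else sum(size[c] for c in kids)
    return size


# Toy tree rooted at taxon "r", whose unique direct descendant is "d".
children = {"r": ["d"], "d": ["u", "c"], "u": ["a", "b"]}
pre, post = dfs_orders(children, "r")
sizes = leaf_counts(children, "r")
```

Processing nodes in post-order guarantees that both children of a node are handled before the node itself, and preorder guarantees the reverse, which is what lets each averaging formula reuse values that are already stored.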
(b) Set $\Delta_{k|U(d)} = \Delta_{kr}$. Loop over $w$ in DFS-PRE $- \{r, d\}$. Let $s$ be the sibling of $w$ and $p$ be the
parent of $s$ and $w$. Compute
This achieves the computation of all $\Delta_{k|S}$ average distances. All other steps of the algorithm are described
in Section 2.1.
iii. ELSE
\[
\Delta_{L(v)|L(w)} = \frac{|L(w_1)|\,\Delta_{L(v)|L(w_1)} + |L(w_2)|\,\Delta_{L(v)|L(w_2)}}{|L(w)|}.
\]
(b) To calculate all $\Delta_{L(v)|U(d)}$ distances, loop $v$ over DFS-POST $- \{r, d\}$. IF $v \in [n]$, then $\Delta_{L(v)|U(d)} = \Delta_{vr}$, ELSE
\[
\Delta_{L(v)|U(d)} = \frac{|L(v_1)|\,\Delta_{L(v_1)|U(d)} + |L(v_2)|\,\Delta_{L(v_2)|U(d)}}{|L(v)|}.
\]
(c) We now calculate all distances of the form $\Delta_{L(v)|U(w)}$, where $w$ is an ancestor of $v$. Loop $w$ over
DFS-PRE $- \{r, d\}$. Let $s$ be the sibling of $w$ and $p$ be the parent of $s$ and $w$. Loop over $v$ from
$L(w)$ in any order. For each $v$, set
It is easily seen that every formula above uses already computed terms, and thus requires $O(1)$ time,
owing to the ordering of the computations by the DFS-POST and DFS-PRE orders. Each pair of vertices
$v, w$ is the subject of exactly one $\Delta_{L(v)|L(w)}$, $\Delta_{L(v)|U(w)}$, or $\Delta_{L(w)|U(v)}$ calculation; thus the computational
time is $O(n^2)$, and we can store all of these average distances unambiguously in a matrix.
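The size-weighted recursion for $\Delta_{L(v)|L(w)}$ can be illustrated with a short sketch. It is not the authors' implementation: the nested-tuple encoding of subtrees, the names `leaves` and `avg_dist`, and the toy matrix are assumptions, and the sizes are taken to count taxa (leaves). For brevity the sketch recomputes values recursively instead of storing them in a matrix, so it does not reproduce the $O(1)$-per-pair bookkeeping described above.

```python
def leaves(t):
    """Taxa below a node: a taxon index (int) or a pair of subtrees."""
    if isinstance(t, int):
        return [t]
    return leaves(t[0]) + leaves(t[1])


def avg_dist(v, w, dist):
    """Unweighted (OLS) average distance between disjoint subtrees v and w."""
    if isinstance(w, int):
        if isinstance(v, int):
            return dist[v][w]
        return avg_dist(w, v, dist)    # symmetric: split the non-leaf side
    w1, w2 = w
    n1, n2 = len(leaves(w1)), len(leaves(w2))
    # Each child contributes in proportion to its number of taxa.
    return (n1 * avg_dist(v, w1, dist) + n2 * avg_dist(v, w2, dist)) / (n1 + n2)


# Toy symmetric dissimilarity matrix on 5 taxa (illustrative values only).
n = 5
dist = [[0 if i == j else abs(i - j) + (i * j) % 3 for j in range(n)]
        for i in range(n)]

v, w = (0, 1), ((2, 3), 4)
recursive = avg_dist(v, w, dist)
brute = sum(dist[i][j] for i in leaves(v) for j in leaves(w)) / (
    len(leaves(v)) * len(leaves(w)))
```

Because every taxon carries the same weight, the recursion agrees with the plain mean over all taxon pairs, which is the property that distinguishes the OLS averages from the balanced ones.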
2. Create the heap of possible swaps. Loop $e$ over the internal edges of $T$ in any order.
(a) Using Equation (9), determine the change in total lengths $s_1(e)$, $s_2(e)$ for each of the two possible
tree swaps across $e$. Let $s(e) = \max(0, s_1(e), s_2(e))$.
(b) Form a heap containing all the values of $s(e)$ which are positive.
3. Perform the best swap and update the data.
(a) Assuming the heap is nonempty, let $e = (v, w)$ be the best edge on the heap. Let $A$, $B$, $C$, and $D$
denote the four subtrees which meet at $e$, with roots $a$, $b$, $c$, and $d$, as in Figure 2(a), such that $a$
and $b$ are incident to $v$, while $c$ and $d$ are incident to $w$. Suppose $B \leftrightarrow C$ is the indicated swap.
Remove edges $(v, b)$ and $(w, c)$ from the topology and add edges $(v, c)$ and $(w, b)$.
(b) Loop over the subtrees $S$ of $A \cup C$ in any order. Compute $\Delta_{S|B \cup D}$ by averaging $\Delta_{S|B}$ and $\Delta_{S|D}$
using Equation (2). Do the same for $B \cup D$.
(c) Set $s(e) = 0$ and remove it from the heap. Let $f$ range over the four edges incident to $e$, recalculate
$s(f)$ by testing the two possible tree swaps across $f$, and, if $s(f) > 0$, insert $s(f)$ into the heap.
(d) If the heap is nonempty, return to Step 3(a). Otherwise, use the matrix of average distances and
Equations (3) and (4) to assign branch lengths to all of the edges of the final tree.
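The control flow of Steps 2 and 3 can be sketched with Python's `heapq` module. This is a schematic illustration under stated assumptions, not the authors' code: `gain`, `apply_swap`, and `neighbors` are hypothetical callbacks standing in for Equation (9), the topology and average-distance updates, and the four incident edges. The sketch also substitutes lazy invalidation (skipping outdated heap entries when they are popped) for the explicit removal described in Step 3(c).

```python
import heapq


def greedy_swaps(edges, gain, apply_swap, neighbors):
    """Apply the best positive-gain swap repeatedly until none remains.

    gain(e)       -> s(e), the best length reduction over the two swaps across e
    apply_swap(e) -> performs that swap and updates whatever gain() reads
    neighbors(e)  -> edges whose gain must be recomputed after swapping across e
    Returns the number of swaps performed.
    """
    # heapq is a min-heap, so gains are stored negated.
    heap = [(-gain(e), e) for e in edges if gain(e) > 0]
    heapq.heapify(heap)
    performed = 0
    while heap:
        neg, e = heapq.heappop(heap)
        current = gain(e)
        if current <= 0 or current != -neg:
            continue                   # outdated entry: the gain changed since insertion
        apply_swap(e)
        performed += 1
        for f in neighbors(e):
            g = gain(f)
            if g > 0:
                heapq.heappush(heap, (-g, f))
    return performed


# Toy stand-in: three "edges" on a path, with gains kept in a dictionary.
gains = {0: 3.0, 1: 1.0, 2: 0.0}
order = []


def toy_gain(e):
    return gains[e]


def toy_apply(e):
    order.append(e)
    gains[e] = 0.0                     # swapping again across e no longer helps
    if e == 0:
        gains[1] = 2.0                 # the swap changes a neighbouring gain


def toy_neighbors(e):
    return [f for f in (e - 1, e + 1) if f in gains]


count = greedy_swaps([0, 1, 2], toy_gain, toy_apply, toy_neighbors)
```

Lazy invalidation keeps the sketch short at the cost of leaving outdated entries in the heap until they are popped; an indexed heap with true deletion would follow Step 3(c) more literally.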
3. Let $(s, u)$ be the edge where $k$ is inserted, with $S$ and $U$ the subtrees having roots $s$ and $u$, respectively,
and $T_{k-1} = S \cup U$ and $S \cap U = \emptyset$.
(a) Loop over the subtrees $Z$ of $S$.
i. Let $Y$ be the complement of $Z$ in $T_{k-1}$.
ii. Loop over $X \subseteq Z$ and use Equation (11) to calculate $\Delta^{T_k}_{X|Y \cup \{k\}}$.
(b) Repeat (a) with $U$ in the place of $S$.
A similar adjustment is done for BNNI. Suppose we start with the topology $T$, as shown in Figure 2(a),
and swap subtrees $B$ and $C$ to form the topology $T'$. Suppose $x$ and $y$ are nodes in $A \cup \{v\}$, with $y$ on the
path from $x$ to $v$, and $l$ edges between $y$ and $v$. Let $X$ and $Y$ be the nonintersecting subtrees with roots $x$
and $y$, respectively. We allow for the possibility that $l = 0$, in which case we choose $X \subseteq A$. (See Fig. 5.)
Our updating equation is:
\[
\Delta^{T'}_{X|Y} = \Delta^T_{X|Y} - 2^{-(l+2)} \Delta^T_{X|B} + 2^{-(l+2)} \Delta^T_{X|C}. \tag{18}
\]
\[
\Delta^{T'}_{X|B \cup D} = \frac{1}{2}\left(\Delta^T_{X|B} + \Delta^T_{X|D}\right).
\]
Perform analogous computations to compute $\Delta^{T'}_{Y|A \cup C}$ for all subtrees $Y \subset B \cup D$.
(c) Calculate
\[
\Delta^{T'}_{A \cup C|B \cup D} = \frac{1}{4}\left(\Delta^T_{A|B} + \Delta^T_{A|D} + \Delta^T_{C|B} + \Delta^T_{C|D}\right).
\]
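Equation (18) can be checked against a direct recomputation of the balanced average after the swap. The sketch below is illustrative only: the nested-tuple encoding, the names `bal` and `update_eq18`, the extra subtree `S` hanging off $y$, and the toy matrix are assumptions. It also assumes that $Y$ is the subtree rooted at $y$ on the side away from $x$, so that it contains $B$, $C$, and $D$.

```python
def bal(x, y, dist):
    """Balanced average distance between two disjoint subtrees.

    A subtree is a taxon index (int) or a pair (left, right) of subtrees.
    """
    if isinstance(x, tuple):
        return 0.5 * (bal(x[0], y, dist) + bal(x[1], y, dist))
    if isinstance(y, tuple):
        return 0.5 * (bal(x, y[0], dist) + bal(x, y[1], dist))
    return dist[x][y]


def update_eq18(old_xy, d_xb, d_xc, l):
    """Equation (18): average X|Y after swapping B and C, with l edges from y to v."""
    weight = 2.0 ** -(l + 2)
    return old_xy - weight * d_xb + weight * d_xc


# Toy symmetric dissimilarity matrix on 7 taxa (illustrative values only).
n = 7
dist = [[0 if i == j else abs(i - j) + (i * j) % 3 for j in range(n)]
        for i in range(n)]

X = (0, 1)
S = 2                       # hypothetical subtree hanging off y, so that l = 1
B, C, D_sub = 3, (4, 5), 6

Y_before = (S, (B, (C, D_sub)))   # Y as seen from X in T
Y_after = (S, (C, (B, D_sub)))    # Y as seen from X in T', after swapping B and C

direct = bal(X, Y_after, dist)
updated = update_eq18(bal(X, Y_before, dist), bal(X, B, dist), bal(X, C, dist), l=1)
```

Under these assumptions the swap moves $B$ one level deeper and $C$ one level shallower as seen from $X$, so each weight changes by exactly $2^{-(l+2)}$, which is the factor in Equation (18); the update therefore costs $O(1)$ once $\Delta^T_{X|B}$ and $\Delta^T_{X|C}$ are stored.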
ACKNOWLEDGMENTS
Special thanks go to Stéphane Guindon, who generated the data sets used in Section 4, and to Mike
Steel for his help and advice.
REFERENCES
Aldous, D.J. 2001. Stochastic models and descriptive statistics for phylogenetic trees from Yule to today. Statist. Sci.
16, 23–34.
Bruno, W.J., Socci, N.D., and Halpern, A.L. 2000. Weighted neighbor joining: A likelihood-based approach to
distance-based phylogeny reconstruction. Mol. Biol. Evol. 17, 189–197.
Bryant, D., and Waddell, P. 1998. Rapid evaluation of least-squares and minimum-evolution criteria on phylogenetic
trees. Mol. Biol. Evol. 15, 1346–1359.
Bulmer, M. 1991. Use of the method of generalized least squares in reconstructing phylogenies from sequence data.
Mol. Biol. Evol. 8, 868–883.
Cormen, T.H., Leiserson, C.E., and Rivest, R.L. 2000. Introduction to Algorithms, MIT Press, Cambridge, MA.
Csűrös, M. 2002. Fast recovery of evolutionary trees with thousands of nodes. J. Comp. Biol. 9, 277–297.
Denis, F., and Gascuel, O. 2002. On the consistency of the minimum evolution principle of phylogenetic inference.
Discr. Appl. Math. In press.
Erdős, P.L., Steel, M., Székely, L., and Warnow, T. 1999. A few logs suffice to build (almost) all trees: Part II. Theo.
Comp. Sci. 221, 77–118.
Felsenstein, J. 1989. PHYLIP—Phylogeny Inference Package (Version 3.2). Cladistics 5, 164–166.
Felsenstein, J. 1997. An alternating least-squares approach to inferring phylogenies from pairwise distances. Syst. Biol.
46, 101–111.
Fitch, W.M., and Margoliash, E. 1967. Construction of phylogenetic trees. Science 155, 279–284.
Gascuel, O. 1997a. BIONJ: An improved version of the NJ algorithm based on a simple model of sequence data.
Mol. Biol. Evol. 14, 685–695.
Gascuel, O. 1997b. Concerning the NJ algorithm and its unweighted version, UNJ. In Mirkin, B., McMorris, F., Roberts,
F., and Rzhetsky, A., eds. Mathematical Hierarchies and Biology, American Mathematical Society, Providence, RI.
Gascuel, O. 2000. On the optimization principle in phylogenetic analysis and the minimum-evolution criterion. Mol.
Biol. Evol. 17, 401–405.
Gascuel, O., Bryant, D., and Denis, F. 2001. Strengths and limitations of the minimum evolution principle. Syst. Biol.
50, 621–627.
Guindon, S., and Gascuel, O. 2002. Efficient biased estimation of evolutionary distances when substitution rates vary
across sites. Mol. Biol. Evol. 19, 534–543.
Harding, E. 1971. The probabilities of rooted tree-shapes generated by random bifurcation. Adv. Appl. Probab. 3,
44–77.
Kidd, K., and Sgaramella-Zonta, L. 1971. Phylogenetic analysis: Concepts and methods. Am. J. Human Genet. 23,
235–252.
Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies
of nucleotide sequences. J. Mol. Evol. 16, 111–120.
Kuhner, M.K., and Felsenstein, J. 1994. A simulation comparison of phylogeny algorithms under equal and unequal
rates. Mol. Biol. Evol. 11, 459–468.
McKenzie, A., and Steel, M. 2000. Distributions of cherries for two models of trees. Math. Biosci. 164, 81–92.
Nei, M., and Jin, L. 1989. Variances of the average numbers of nucleotide substitutions within and between populations.
Mol. Biol. Evol. 6, 290–300.
Pauplin, Y. 2000. Direct calculation of a tree length using a distance matrix. J. Mol. Evol. 51, 41–47.
Rambaut, A., and Grassly, N.C. 1997. Seq-Gen: An application for the Monte Carlo simulation of DNA sequence
evolution along phylogenetic trees. Comput. Appl. Biosci. 13, 235–238.
Robinson, D., and Foulds, L. 1981. Comparison of phylogenetic trees. Math. Biosci. 53, 131–147.
Rzhetsky, A., and Nei, M. 1993. Theoretical foundation of the minimum-evolution method of phylogenetic inference.
Mol. Biol. Evol. 10, 1073–1095.
Saitou, N., and Nei, M. 1987. The neighbor-joining method: A new method for reconstructing phylogenetic trees.
Mol. Biol. Evol. 4, 406–425.
Sneath, P.H.A., and Sokal, R.R. 1973. Numerical Taxonomy, 230–234, W.H. Freeman, San Francisco.
Swofford, D. 1996. PAUP—Phylogenetic Analysis Using Parsimony (and other methods), Version 4.0.
Swofford, D.L., Olsen, G.J., Waddell, P.J., and Hillis, D.M. 1996. Phylogenetic inference. In Hillis, D., Moritz, C.,
and Mable, B., eds., Molecular Systematics, 470–514, Sinauer, Sunderland, MA.
Tarjan, R.E. 1983. Data Structures and Network Algorithms, SIAM, Philadelphia.
Vach, W. 1989. Least squares approximation of additive trees. In Opitz, O., ed. Conceptual and Numerical Analysis
of Data, Springer-Verlag, Berlin.
Yule, G. 1925. A mathematical theory of evolution, based on the conclusions of Dr. J.C. Willis. Philos. Trans. Roy.
Soc. London Ser. B, Biological Sciences 213, 21–87.
E-mail: gascuel@[Link]