Molecular Phylogeny

The document provides an overview of molecular phylogenetics, detailing the processes of evolution, the significance of phylogenetic trees, and methods for constructing these trees. It discusses the importance of genetic data in understanding evolutionary relationships and the various models and techniques used for phylogenetic reconstruction, including maximum likelihood and Bayesian methods. Additionally, it highlights the challenges and considerations in estimating genetic distances and the role of bootstrapping in assessing the reliability of phylogenetic trees.

Uploaded by

trangnt.m22bio

Molecular phylogenetics

Introduction and application

University of Science and Technology of Hanoi

MSc. class
Chung The Hao, PhD
Why evolution?

Universality Diversity
Darwin’s finches
Peter and Rosemary Grant
Natural selection in ground finches.
The processes of evolution

Mutation: Genetic variability that evolution will act upon.

Natural selection: Differential survival and reproduction of individuals due to differences in phenotype. Under natural selection, mutations can be beneficial, deleterious, or neutral.
What is phylogenetics?

A phylogeny is a diagram describing the ancestral relationships of organisms.

Phylogenetics looks for homology as evidence of common ancestry.

[Figure: anatomical homology of the forelimb in animals.]

Homology: similarity among organisms due to inheritance from a shared common ancestor.
Why phylogenetics?
• A phylogeny is a confident hypothesis/model of the evolutionary relationships among taxonomic groups.

• Provides an important evolutionary framework to understand biology, and to ask further questions about population structure, co-evolution and evolutionary rates.

Genomics in infectious diseases:

• Confidently differentiate organisms at strain level, without the need to refer to phenotypic data.
Evolutionary tree of Life
Why use molecular data?
• Similarity in nucleotide sequence almost always points to homology.

• For microbes, we can’t construct trees based on known relationships or morphology.

• Constructing trees from microbial sequence data lets us infer evolutionary relationships and understand epidemiology.

[Figure: genetic comparison between HIV-1 and Simian Immunodeficiency Virus (SIV).]
How to read a tree?
Some terminology

Mutation/Polymorphism: a mutant allele that arises in a population

Substitution: a mutant allele that is fixed in the population (i.e. it is carried through into the following generations).

Phylogenetics is built on substitutions.

[Figure: lineages traced from past to present. A substitution is heritable and phylogenetically meaningful; a mutation that is not fixed is not phylogenetically meaningful.]

Some terms

[Figure: rooted tree with tips A, B, C, D and E, labelled as follows.]

• Tips/Taxa/Leaves: represent the TAXA (genes, populations, species, etc.) used to infer the phylogeny, i.e. YOUR SEQUENCES
• Branches or Lineages
• Nodes or Divergence Points (represent hypothetical ancestors of the taxa)
• Most recent common ancestor (MRCA) or ROOT of the Tree
Similarity vs. relatedness
Sequence similarity and relatedness are not the same thing, even though evolutionary relationships are based on certain types of similarity.

Similarity: being similar in a measurable metric (substitution differences)

Relatedness: genetic/evolutionary connectedness (historical fact)

Two taxa can be most similar without being most closely related.

[Figure: example tree of Taxa A, B, C and D with labelled branch lengths. Two pairwise comparisons are marked, one with 3 differences and one with 7 differences. One axis is labelled "This axis means nothing!"; the other axis usually indicates genetic distance (occasionally time).]

Unrooted trees
The possible unrooted trees of four taxa (A, B, C, D)

Tree 1: A and B | C and D
Tree 2: A and C | B and D
Tree 3: A and D | B and C

• Unrooted trees tell us the similarity among taxa, but not the ancestry nor the origin

• The number of unrooted trees increases in a greater than exponential manner with the number of taxa

# Taxa (N)    # Unrooted trees
3             1
4             3
5             15
6             105
7             945
8             10,395
9             135,135
10            2,027,025
...           ...
30            ≈8.69 x 10^36
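The counts in the table follow the standard formula for unrooted, strictly bifurcating trees with N labelled tips, (2N - 5)!!, i.e. the product of the odd numbers from 3 up to 2N - 5. The slides do not state the formula, so the sketch below is an illustration under that assumption; the function name is made up for this example.

```python
def count_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted, strictly bifurcating trees for n_taxa labelled tips."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Double factorial (2N - 5)!!: multiply the odd numbers 3, 5, ..., 2N - 5.
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


for n in range(3, 11):
    print(n, count_unrooted_trees(n))
```

Because the count grows faster than exponentially, exhaustive evaluation of every tree quickly becomes impractical, which is why the tree-building methods later in the deck rely on heuristic searches.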
Finding a root
Inferring evolutionary relationships requires a rooted phylogeny

[Figure: the same unrooted tree of taxa A, B, C and D with the root placed on two different branches, giving two different rooted trees.]

• Both trees are technically correct, but give us different evolutionary stories

• Finding a correct root is important and also difficult.

Finding a root
By outgroup:

• Use taxa (outgroup) that are known to fall outside of the group of interest

• Requires some prior knowledge about the relationship between taxa

[Figure: rooted tree with the monophyletic clade of interest and the outgroup/basal taxon labelled.]

By midpoint or distance:

• Default in many software packages

• Root the tree at the midway point between the two most distant taxa.

• OFTEN WRONG.

Example: d (A,D) = 10 + 3 + 5 = 18, so Midpoint = 18 / 2 = 9

[Figure: unrooted tree of taxa A, B, C and D with branch lengths 10 (to A), 5 (to D), 2 and 2 (to B and C) and an internal branch of length 3.]
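The midpoint example can be reproduced with a small sketch. The edge list below is an assumption about the slide's figure (A and C joined at one internal node, B and D at the other, with the branch lengths 10, 2, 3, 2 and 5); the internal node names "X" and "Y" and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical unrooted tree as (node, node, branch length) edges.
EDGES = [("A", "X", 10.0), ("C", "X", 2.0), ("X", "Y", 3.0),
         ("B", "Y", 2.0), ("D", "Y", 5.0)]


def build_adjacency(edges):
    adjacency = {}
    for u, v, length in edges:
        adjacency.setdefault(u, []).append((v, length))
        adjacency.setdefault(v, []).append((u, length))
    return adjacency


def path_between(adjacency, start, end):
    """Return the unique path as [(node, length of the edge leading into it)]."""
    stack = [(start, None, [(start, 0.0)])]
    while stack:
        node, parent, path = stack.pop()
        if node == end:
            return path
        for neighbour, length in adjacency[node]:
            if neighbour != parent:
                stack.append((neighbour, node, path + [(neighbour, length)]))
    raise ValueError("nodes are not connected")


def midpoint_root(edges):
    adjacency = build_adjacency(edges)
    leaves = sorted(n for n, nbrs in adjacency.items() if len(nbrs) == 1)
    # The two most distant taxa define the path that is cut in half.
    pair = max(combinations(leaves, 2),
               key=lambda p: sum(w for _, w in path_between(adjacency, *p)))
    path = path_between(adjacency, *pair)
    total = sum(w for _, w in path)
    walked = 0.0
    for (previous, _), (node, length) in zip(path, path[1:]):
        if walked + length >= total / 2:
            # Root sits on edge previous-node, this far from `previous`.
            return pair, total, (previous, node, total / 2 - walked)
        walked += length


print(midpoint_root(EDGES))
```

With these assumed branch lengths the root falls on the long branch leading to A, 9 units from the tip. Midpoint rooting only recovers the true root when lineages evolve at similar rates, which is one reason the slide flags it as often wrong.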
How to build a tree?

A very short introduction

From sequences to tree
1. Formulate a hypothesis!

2. Gather appropriate sequences

• From your samples
• From publicly available databases

3. Align sequences
• Do you have an appropriate outgroup?

4. Run preliminary trees

5. Determine your evolution model(s)

6. Then run more trees, test, rerun, and carry out further analyses
Our goal: to reconstruct evolution

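[Figure: what a phylogeny is, shown as an evolutionary hypothesis. The ancestral sequence ACAGAT at time t7 diverges through internal nodes t6 and t5, with the substitutions C(2)>T(2), G(4)>A(4), A(4)>G(4) and A(5)>T(5) marked on the branches.]

Observed data: ACAGAT, ACAGTT, ATAGAT, ATAAAT (sampled at t1 = t2 = t3 = t4 = 0)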
DNA sequence alignment
• DNA sequence alignment (input) is the most important thing in
phylogenetic reconstruction.

• Purpose: pin-point homologous nucleotides

• Could be easy or difficult

Easy:

Taxon 1  GCGGCCCA TCAGGTAGTT GGTGG
Taxon 2  GCGGCCCA TCAGGTAGTT GGTGG
Taxon 3  GCGTTCCA TCAGCTGGTT GGTGG
Taxon 4  GCGTCCCA TCAGCTAGTT GGTGG
Taxon 5  GCGGCGCA TTAGCTAGTT GGTGA
         ******** ********** *****

Difficult due to insertions or deletions (indels):

         TTGACATG CCGGGG---A AACCG
         TTGACATG CCGGTG--GT AAGCC
         TTGACATG -CTAGG---A ACGCG
         TTGACATG -CTAGGGAAC ACGCG
         TTGACATC -CTCTG---A ACGCG
         ******** ?????????? *****

Always perform a manual check after alignment.
Estimating genetic distances
Simplest way: Genetic distance is the observed mutational differences between taxa

[Figure: DNA sequence alignment between HIV-1 and SIV.]

Multiple substitutions at a single site – hidden information.

[Figure: two examples of multiple substitutions at a single site. In one, 1 mutation is counted when 3 have occurred; in the other, 0 mutations are counted when 3 have occurred.]
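The "simplest way" of counting observed differences can be sketched as a proportion of differing sites (the p-distance). The two sequences are Taxon 1 and Taxon 3 from the easy alignment example earlier in the deck, with the spaces removed; skipping gapped columns is an assumption of this sketch, not something the slides specify.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # assumption: ignore columns containing a gap
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


taxon_1 = "GCGGCCCATCAGGTAGTTGGTGG"
taxon_3 = "GCGTTCCATCAGCTGGTTGGTGG"
print(p_distance(taxon_1, taxon_3))  # 4 differences over 23 sites
```

This count only sees the net difference at each site, so any site that changed more than once is undercounted.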
The problem of multiple substitutions

• When % divergence is low, observed distance (p) is a good estimator of genetic distance (d)

• When % divergence is high, p underestimates d and a correction statistic is required (i.e. a model of DNA substitution).
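As one concrete correction, the Jukes-Cantor model (introduced on the following slides) converts an observed proportion p into an estimated distance d = -(3/4) ln(1 - 4p/3). The slides do not give this formula, so the sketch below is an illustration using the standard result; the function name is hypothetical.

```python
import math


def jc69_distance(p: float) -> float:
    """Jukes-Cantor corrected distance (substitutions per site) from p-distance."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


for p in (0.01, 0.10, 0.30, 0.50):
    print(f"p = {p:.2f}  d = {jc69_distance(p):.4f}")
```

At low divergence the corrected and observed values are nearly identical, while at high divergence d grows well beyond p, matching the two bullets above.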
DNA substitution model
Models of DNA sequence evolution are required to recover missing
information through correcting the problem of multiple substitutions.

A model includes:
• The frequencies of each base (A, T, C, G)

• The probability of substitution between bases (A to C, C to T, …)

• The probability of substitution along a sequence (different sites/regions evolve at different rates).

A good model = A good tree

DNA substitution model

From simplest to most complex:

1. Base frequencies are equal and all substitutions are equally likely (Jukes-Cantor)

2. Base frequencies are equal but transitions and transversions occur at different rates (Kimura 2-parameter)

3. Unequal base frequencies and transitions and transversions occur at different rates (Hasegawa-Kishino-Yano)

4. Unequal base frequencies and all substitution types occur at different rates (General Time Reversible)

[Figure: substitution rate diagrams among A, C, G and T. Jukes-Cantor (most simple): a single rate a for every substitution. General Time Reversible (most complex): six different rates, a to f.]
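The nesting of these models can be made concrete with a rate matrix in which the rate from base i to base j is an exchangeability r_ij multiplied by the frequency of j. This is the usual time-reversible parameterisation rather than anything shown on the slides; the matrix is left unnormalised and all names are chosen for this illustration.

```python
BASES = "ACGT"
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def exchangeabilities(transition=1.0, transversion=1.0):
    """Build the six pairwise rates from a transition and a transversion rate."""
    rates = {}
    for i, x in enumerate(BASES):
        for y in BASES[i + 1:]:
            pair = frozenset((x, y))
            rates[pair] = transition if pair in TRANSITIONS else transversion
    return rates


def rate_matrix(exchange, freqs):
    """Return Q with Q[i][j] = r_ij * pi_j off the diagonal; rows sum to zero."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                q[i][j] = exchange[frozenset((x, y))] * freqs[y]
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 here, so this sums the rest
    return q


equal = {base: 0.25 for base in BASES}
jc = rate_matrix(exchangeabilities(), equal)                 # one rate
k2p = rate_matrix(exchangeabilities(transition=2.0), equal)  # two rates

for name, q in (("Jukes-Cantor", jc), ("Kimura 2-parameter", k2p)):
    print(name)
    for row in q:
        print("  ", [round(value, 3) for value in row])
```

Passing unequal frequencies gives an HKY-style matrix, and passing six distinct exchangeabilities gives the General Time Reversible case.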

Among-site rate variation

Biological sequences (genes) have conserved and non-conserved regions for optimization of functionality
→ Different rates of evolution. Different genetic distances

Using Gamma distribution to model among-site rate variation
• Large alpha: little variation
• Small alpha: high variation (often the case)

[Figure: two panels, "Frequent among-site rate variation" and "Little among-site rate variation".]
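The effect of alpha can be seen by drawing per-site rates from a Gamma distribution with shape alpha and scale 1/alpha, so that the mean rate is 1. That mean-one parameterisation is the common convention and is assumed here; the sample size and seed are arbitrary choices for the illustration.

```python
import random
import statistics


def sample_site_rates(alpha, n_sites, rng):
    """Draw relative rates with mean 1 and variance 1/alpha."""
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]


rng = random.Random(1)
small_alpha = sample_site_rates(0.5, 20000, rng)  # strong rate variation
large_alpha = sample_site_rates(5.0, 20000, rng)  # little rate variation

for label, rates in (("alpha = 0.5", small_alpha), ("alpha = 5.0", large_alpha)):
    print(label,
          "mean", round(statistics.fmean(rates), 2),
          "variance", round(statistics.pvariance(rates), 2))
```

With a small alpha most sites are nearly invariant while a few evolve very fast; with a large alpha the rates cluster around the mean.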
Tree building methods
Methods for inferring phylogenies

Tree-Building Methods

Method                            Model of DNA evolution   Approach
Parsimony                         No explicit model        Application of the parsimony principle
Distance                          Explicit model           Pairwise comparison of sequences
Maximum likelihood and Bayesian   Explicit model           Statistical approach
Maximum parsimony phylogenies
Relies on finding the tree with the smallest number of character changes (substitutions)

Advantages

• Intuitive explanation: ‘simplest’ evolutionary scenario

Limitations

• No measure of uncertainty for the tree obtained
• Computationally intensive
• Ignores different types of substitutions
• No explicit model of DNA substitution
• Evolution in real life is not necessarily parsimonious
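Counting character changes on a fixed tree can be sketched with Fitch's algorithm, a standard way to score parsimony that the slides do not name. The four sequences are the observed data from the earlier "reconstruct evolution" slide; the taxon labels T1 to T4 and the rooted representation of each four-taxon topology are assumptions of this example.

```python
def fitch(node, states):
    """Return (possible ancestral states, minimum changes) for one site."""
    if isinstance(node, str):          # a tip: its observed base, no changes
        return {states[node]}, 0
    left, right = node
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:                         # children agree: no extra change
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


alignment = {"T1": "ACAGAT", "T2": "ACAGTT", "T3": "ATAGAT", "T4": "ATAAAT"}
trees = {
    "T1+T2 | T3+T4": (("T1", "T2"), ("T3", "T4")),
    "T1+T3 | T2+T4": (("T1", "T3"), ("T2", "T4")),
    "T1+T4 | T2+T3": (("T1", "T4"), ("T2", "T3")),
}
for label, tree in trees.items():
    print(label, parsimony_score(tree, alignment))
```

The topology with the lowest score is the maximum parsimony tree for this alignment; note that every substitution type costs the same, which is one of the limitations listed above.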
Distance-based phylogenetic reconstruction

Relies on agglomerative clustering algorithms (UPGMA, Neighbor-joining)

Rationale

1. Compute pairwise genetic distance (D)

2. Group closest sequences

3. Update D

4. Go back to (2) until all sequences are grouped
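The four steps above can be sketched with UPGMA, one of the two algorithms the slide names. This minimal version returns only the topology as nested tuples and omits branch lengths; the distance values are invented for the illustration.

```python
from itertools import combinations


def upgma(names, distances):
    """Cluster taxa by repeatedly merging the closest pair (topology only)."""
    d = {frozenset(pair): value for pair, value in distances.items()}
    clusters = {name: 1 for name in names}        # cluster -> number of tips
    while len(clusters) > 1:
        # Step 2: group the closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Step 3: update D with the size-weighted average distance.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
        # Step 4: loop until everything is in one cluster.
    return next(iter(clusters))


# Step 1: pairwise genetic distances (hypothetical values).
pairwise = {
    ("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0,
    ("A", "D"): 10.0, ("B", "D"): 10.0, ("C", "D"): 10.0,
}
print(upgma(["A", "B", "C", "D"], pairwise))
```

UPGMA assumes all lineages evolve at the same rate; Neighbor-joining relaxes that assumption but follows the same compute, group, update loop.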

[Figure: sequences are converted to pairwise distances with the chosen evolution model, followed by tree-building.]
Distance-based phylogenetic reconstruction

Advantages

• Simple
• Flexible (many distance and clustering algorithms)
• Fast and scalable (to large datasets)

Limitations

• Sensitive to distance/clustering choice

• Return one single tree, no measure of uncertainty for
the tree built
• Oversimplifies most evolutionary relationships
• Rarely publishable.
Statistical phylogenetic reconstruction
Approaches relying on a model of sequence evolution:

• Maximum Likelihood: find the tree and evolutionary rates with the highest likelihood
• Bayesian: find the tree and evolutionary rates according to posterior probability.

Rationale:

1. Start from a random or pre-defined tree (Neighbor-joining tree)

2. Compute initial likelihood/posterior

3. Permute branches, sample new parameters and compute the new likelihood/posterior

4. Accept or reject the new tree based on likelihood/posterior improvement.

5. Go back to (3) until convergence.

Maximum likelihood phylogenetics
• Likelihood is a quantity proportional to the probability of observing an outcome/data/event, X, given a hypothesis, H.
– P ( X | H ) or P ( X | p )
– P ( Data | model of evolution)

• ML evaluates the probability of the observed data under each phylogenetic hypothesis (evolution model + unrooted tree).

• Performs many iterations of tree search, looking for the tree topology with the highest likelihood.

• Returns 1 tree with the highest likelihood
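A full tree search is too long to show here, but the core idea, scoring P(Data | model) for each candidate and keeping the best, can be sketched on the smallest possible case: the distance between two sequences under the Jukes-Cantor model. The site counts, the grid search and the function names are assumptions made for this illustration.

```python
import math


def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of two aligned sequences separated by distance d."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay   # a site shows the same base in both
    p_diff = 0.25 - 0.25 * decay   # a site shows one specific other base
    # Each site also carries the equal base frequency 1/4 of its first base.
    return ((n_sites - n_diff) * math.log(0.25 * p_same)
            + n_diff * math.log(0.25 * p_diff))


def ml_distance(n_sites, n_diff, step=0.001, max_d=3.0):
    """Grid search for the distance with the highest likelihood."""
    candidates = [step * i for i in range(1, int(max_d / step) + 1)]
    return max(candidates, key=lambda d: jc69_log_likelihood(d, n_sites, n_diff))


# Hypothetical data: 100 aligned sites, 30 of which differ.
print(ml_distance(100, 30))
```

The maximum lands at the Jukes-Cantor corrected distance for p = 0.30 (about 0.383), not at the raw 0.30. Real ML programs optimise every branch length and the topology together rather than scanning a grid.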

Hill Climbing

• Imagine tree ‘space’ is a hill

• Better trees (measured by likelihood) are higher
• We can find the best tree using a robot with a simple program:
– Accept uphill moves
– Reject downhill moves

[Figure: a robot climbing towards ‘better’ trees; uphill moves are ticked and the downhill move is crossed.]
Hill Climbing
• Local maxima are a problem for methods using hill climbing algorithms to find the best tree
• One way to reduce the probability of being stuck in a local maximum is to repeat analyses from different starting points
• I.e. beam in a number of robots to different starting positions
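The robot analogy can be sketched on a toy one-dimensional landscape, where each list position stands in for a tree and its height for the likelihood. The heights and starting positions are invented; real tree space is far larger and not linear.

```python
def hill_climb(heights, start):
    """Accept uphill moves, reject downhill moves, stop at a peak."""
    position = start
    while True:
        neighbours = [i for i in (position - 1, position + 1)
                      if 0 <= i < len(heights)]
        best = max(neighbours, key=lambda i: heights[i])
        if heights[best] <= heights[position]:
            return position            # no uphill move left
        position = best


def best_of_restarts(heights, starts):
    """Run one robot per starting point and keep the highest peak found."""
    peaks = [hill_climb(heights, start) for start in starts]
    return max(peaks, key=lambda i: heights[i])


landscape = [1, 3, 5, 4, 2, 6, 9, 12, 8, 3]   # local peak at 2, global at 7

print(hill_climb(landscape, 0))                # stuck on the local peak
print(best_of_restarts(landscape, [0, 4, 9]))  # restarts reach the global peak
```

Multiple starts reduce the chance of reporting a local maximum but do not guarantee that the global one is found.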
Statistical phylogenetic reconstruction

Advantages

• very flexible
• consistent with an explicit model of evolution
• statistically consistent (allows for model comparison)
• parameter estimation (evolutionary rate, transition rate)
• (Bayesian) 1000s of trees → provides measure of uncertainty
• (Bayesian) hypothesis testing, complex models, demography

Limitations
• computer-intensive, complicated statistics & methodology
• (ML) no measure of uncertainty for the single tree obtained
• (Bayesian) not ideal for ‘beginners’
Phylogenetic reconstruction - summary

Method      Data used           Tree search        Evolutionary model
Distance    Pairwise distance   Simple algorithm   Can be complex
Parsimony   All sites           Hill climbing      Simple

Resampling characters (alignment columns) with replacement

Taxa   Characters
       2 5 9 2 7 7 2 1 6
A      C G C C T T C A A
B      G G C G T T G A G
C      G G G G T T G A A
D      C C C C T T C T G
E      C A C C T T C T A
Bootstrapping
Inferred “true” tree, from the original alignment:

Taxon A : ATG-CGA-GTT-TAG-CAG
Taxon B : ATG-CGA-GCT-TAA-CTG
Taxon C : ATA-CTA-GCT-TAG-CTG
Taxon D : ATG-CTA-TCT-TAG-GTG

[Figure: trees built from four resampled alignments (s1 to s4) are compared with the inferred tree to give statistical support for each node.]

Node support for trees:

A+B : 4/4 = 100%
C+D : 3/4 = 75%
A+B+C+D : 4/4 = 100%

A tree without measures of statistical support for the nodes (bootstraps or posteriors) is meaningless!
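The two halves of bootstrapping, resampling alignment columns with replacement and counting how often a clade reappears, can be sketched as follows. The alignment is the four-taxon example from the slide with the separators removed. Building a tree from each replicate is omitted, so the replicate clade sets are hypothetical and simply chosen to match the slide's 4/4 and 3/4 counts.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the same length."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    resampled = {taxon: "".join(seq[i] for i in columns)
                 for taxon, seq in alignment.items()}
    return resampled, columns


def node_support(clade, replicate_clades):
    """Percentage of replicate trees that contain the clade."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


alignment = {
    "A": "ATGCGAGTTTAGCAG",
    "B": "ATGCGAGCTTAACTG",
    "C": "ATACTAGCTTAGCTG",
    "D": "ATGCTATCTTAGGTG",
}
replicate, columns = bootstrap_alignment(alignment, random.Random(0))
print(columns)

# Hypothetical clades recovered from trees built on replicates s1 to s4.
replicate_clades = [
    {frozenset("AB"), frozenset("CD")},
    {frozenset("AB"), frozenset("CD")},
    {frozenset("AB"), frozenset("CD")},
    {frozenset("AB"), frozenset("ABC")},
]
print(node_support(frozenset("AB"), replicate_clades))
print(node_support(frozenset("CD"), replicate_clades))
```

In practice hundreds or thousands of replicates are used, and each replicate tree is built with the same method as the original tree.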
Questions?
Phylogenetic analysis of full-length genomes of 2019-nCoV and
representative viruses of the genus Betacoronavirus
Vibrio cholerae genomic epidemiology in Yemen
Case studies in
molecular phylogeny
Study design
Always frame a hypothesis/research question first.

Which organisms/disease/serotypes are you interested in?

Sampling:
• How do you access the samples? Ethical approval?

• Is it the right sample set to answer your questions?

• Is the sampling representative?

• Is the sample size sufficient?

• When/where/how are the samples collected?

• Any other data associated with the samples?

The study’s results/interpretation are only as good as its sampling allows

Investigation
The global spread of
fluoroquinolone resistant
Shigella sonnei
The genus Shigella

• Among the top 4 bacterial pathogens for pediatric diarrheal disease.

• An Enterobacteriaceae with 4 species: dysenteriae, boydii, flexneri and sonnei.

• No licensed vaccine. First line recommended treatment is antimicrobial: fluoroquinolone (FQ) or 3rd generation cephalosporin (WHO 2005).

• Both S. sonnei and S. flexneri are increasingly multi-drug resistant.

Shigella sonnei evolution

S. sonnei arose in Europe in the late 17th century, with lineage III the most widespread and drug resistant.

Feil E., 2012

S. sonnei genomic phylogeny

Holt et al., 2012, Nat. Gen.
Shigella sonnei in Vietnam
[Figure: phylogeny of S. sonnei in Vietnam, with isolates resistant to nalidixic acid and isolates resistant to ceftriaxone marked.]

Holt et al., 2013

S. sonnei (Global 3) was introduced into Vietnam in the 1980s and underwent a clonal expansion through fixation of a colicin plasmid, gyrA mutations and an ESBL plasmid.
How about Shigella in other Asian countries?
Fluoroquinolone resistant S. sonnei in Bhutan
Background: Diarrheal surveillance in Thimphu, Bhutan (2011-2013) reported the presence of FQ resistant S. sonnei (Ruekit et al., 2014)

Objective: Using genomics to understand the population of Bhutanese S. sonnei

Samples: 71 S. sonnei (Bhutan) + global collection

Phylogeny of S. sonnei in Bhutan

S. sonnei in Bhutan belongs to the Central Asia expansion of lineage III.

Phylogeny of S. sonnei in Bhutan

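[Figure: phylogeny of S. sonnei with columns for gyrA and parC mutations and other genetic markers (rotated column labels not recoverable), coloured by country: Bhutan, Pakistan, Sri Lanka, Cambodia, Thailand, South Korea, Morocco, Egypt, Senegal, Madagascar, France, Brazil. Scale bar: 0.014 substitutions/site.]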
Chung The et al., 2015, MGen

• FQ resistance is conferred by triple mutations in gyrA and parC.

A global problem?

FQ resistant S. sonnei around the globe share the same PFGE pattern.

De Lappe et al., 2015 - Ireland

Ruekit et al., 2014 - Bhutan

Gaudreau et al., 2011 - Canada

Nandy et al., 2011 - India

→ Hypothesized that global FQ resistant S. sonnei is clonal

Global emergence of FQ resistant S. sonnei

Aims: To unravel the nature of the recent global surge in FQ resistant S. sonnei, through the use of genomics and molecular microbiology

Sample: Global representative collection of 70 FQ resistant S. sonnei

Country | Ciprofloxacin resistant isolates (N) | Patient group | Region of recent travel history (N)
Bhutan | 12 | Hospitalised children <5 years old | NA
Vietnam | 11 | Hospitalised children <5 years old | NA
Thailand | 1 | Hospitalised children <5 years old | NA
Cambodia | 1 | Hospitalised children <5 years old | NA
Ireland | 16 | Primarily patients with recent travel history | India (9), Germany (1), Morocco (1), No travel (5), Unknown (4)
Australia | 19 | Patients with recent travel history | India (15), Cambodia (3), Thailand (1), Southeast Asia (1)
USA | 10 | Unknown | Unknown

Evolution of FQ resistant S. sonnei

• All FQR S. sonnei belong to the Central Asia III clade

• FQR is due to triple mutations (gyrA, parC)

→ A global clonal emergence of FQR S. sonnei

Chung The et al., 2016, PLoS Med.

Global collection: 395 CenAsiaIII S. sonnei
Global phylogeny of FQ resistant S. sonnei

Chung The et al., 2019, Nat Comms.

• FQR S. sonnei arose from early 2007, most likely originating from South Asia.

• Evidence of clonal expansion and establishment in Southeast Asia and likely Europe.
Your turn now
Ancient tuberculosis in North
America
Tuberculosis in Peruvian ancient skeletons
Tuberculosis in America today is dominated by the European-derived Mycobacterium tuberculosis lineages

→ Tuberculosis was introduced to America by European settlers.

How about pre-Columbian TB?

Three Peruvian skeletons (1028–1280 AD), with signs of active tuberculosis, preserved sufficient amounts of tuberculosis DNA

→ Bacterial DNA was sequenced and compared to modern isolates.
What are these ancient TB?
Singapore Zika study
Dengue and Zika virus

Flaviviridae family of viruses

Timeline of Zika pandemic

Petersen et al 2016, NEJM

• As of 6 April 2016, Zika virus transmission was documented in a total of 62 countries and territories.

• Caused ~711,381 infections with 18 deaths.

• There is no specific treatment or vaccine for Zika virus.

An introduction?
Molecular epidemiology of global ZIKV
• Retrospective screening from 2014 to 2016: No confirmed Zika cases until Aug 2016.

• The Singapore outbreak is not linked to the South America epidemic.

• Clinical and mosquito ZIKV clustered in clade A

• The Singapore outbreak is caused by ZIKV clade A, which existed in May before the initial detection in August 2016.
Salmonella monophasic Typhimurium ST34 circulation in Vietnam
Salmonella enterica subsp. enterica

• Enterobacteriaceae. Extremely diverse (>2,500 serovars)

• Ancient human and animal pathogens, including host-generalist and human-adapted.

• Major burden of disease:
– Typhoid (S. Typhi, S. Paratyphi)
– Blood stream infections (S. Typhimurium, S. Enteritidis, etc.)
– Gastroenteritis (multiple serovars)

• Estimated ~94 million gastroenteritis cases caused by nontyphoidal Salmonella (NTS)
Salmonella outbreak

• Primarily foodborne, with various sources

• Food production, distribution and consumption at global level

→ Potential for multi-country outbreaks.

• Sickened >450 patients in 17 countries

• Recall of ~3,000 tonnes of chocolate worldwide

• Outbreak caused by two strains of Salmonella monophasic Typhimurium ST34

• Linked to contamination in a dairy butter tank in Belgium.
Clonal expansions
in Vietnam

• All clonal expansions emerged in Vietnam from 2003 - 2009

• SEA isolates carry more clinically important AMR genes (blaCTX-M-55, qnrS1, mcr3.1, mphA).

clone VBSI VN1 VN2 VN3 VN4

tMRCA 2003.16 2006.07 2007.39 2007.73 2008.64
AMR qnrS1 blaCTX-M-55, qnrS1, qnrS1, mphA qnrS1, mcr3.1,
mcr3.1 mphA
plasmid IncHI2 IncA/C IncHI2 IncA/C
