MBE Advance Access published December 3, 2015
Wasabi: an integrated platform for evolutionary
sequence analysis and data visualisation
Andres Veidenberg, Alan Medlar and Ari Löytynoja*
Institute of Biotechnology, University of Helsinki, P.O.Box 65, Helsinki, Finland
* Corresponding author: E-mail: [email protected].
Abstract
Downloaded from http://mbe.oxfordjournals.org/ at University of California, San Diego on December 5, 2015
Wasabi is an open source, web-based environment for evolutionary sequence analysis. Wasabi visualises
sequence data together with a phylogenetic tree within a modern, user-friendly interface: the interface
hides extraneous options, supports context-sensitive menus and drag-and-drop editing, and displays additional
information, such as ancestral sequences, associated with specific tree nodes. The Wasabi environment
supports reproducibility by automatically storing intermediate analysis steps and includes built-in
functions to share data between users and publish analysis results. For computational analysis, Wasabi
supports PRANK and PAGAN for phylogeny-aware alignment and alignment extension, and it can be
easily extended with other tools. Along with drag-and-drop import of local files, Wasabi can access
remote data via URL and import sequence data, GeneTrees and EPO alignments directly from Ensembl.
To demonstrate a typical workflow using Wasabi, we reproduce key findings from recent comparative
genomics studies, including a reanalysis of the EGLN1 gene from the tiger genome study: these case
studies can be browsed within Wasabi at http://wasabiapp.org:8000?id=usecases. Wasabi runs inside a
web browser and does not require any installation. One can start using it at http://wasabiapp.org. All
source code is licensed under the GPLv3.
Key words: Evolutionary sequence analysis, reproducible research, data visualisation
Introduction
In evolutionary sequence analysis, phylogenetic trees and sequence alignments are intrinsically linked: sequence alignments define the character homologies upon which phylogenetic inference is based, and trees are integral in correcting for hierarchical dependencies among sequence data. In multiple sequence alignment, this connection was noticed early (Sankoff, 1975) and was central in the first progressive alignment algorithm (Hogeweg and Hesper, 1984). While popular alignment programs use a tree to guide the alignment procedure, they consider the tree a nuisance parameter. The phylogeny-aware algorithm (Löytynoja and Goldman, 2005, 2008; Löytynoja et al., 2012) makes the alignment/phylogeny dependence explicit again: reconstructed ancestral sequences are aligned according to a guide phylogeny, and the resulting gap patterns closely reflect the tree topology.
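As a minimal illustration of how a guide tree orders a progressive alignment, the Python sketch below lists the merge steps in post-order, so that every internal node is processed only after both of its children. The nested-tuple tree, the function name `progressive_order` and the `anc1`, `anc2`, ... labels are assumptions made for this example; a phylogeny-aware aligner such as PRANK would, at each step, align the two child sequences and reconstruct the ancestral sequence, which this sketch does not attempt.

```python
from typing import List, Tuple, Union

# Illustrative guide-tree representation: leaves are sequence names,
# internal nodes are 2-tuples of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def progressive_order(tree: Tree) -> List[Tuple[str, str, str]]:
    """Return (left, right, ancestor) merge steps in post-order.

    Each internal node receives a hypothetical label such as 'anc1'.
    A progressive aligner would align `left` with `right` at that step;
    a phylogeny-aware one would also reconstruct `ancestor` there.
    """
    steps: List[Tuple[str, str, str]] = []
    counter = 0

    def visit(node: Tree) -> str:
        nonlocal counter
        if isinstance(node, str):
            return node  # leaf: an observed sequence
        left = visit(node[0])  # both children are handled first (post-order)
        right = visit(node[1])
        counter += 1
        ancestor = f"anc{counter}"
        steps.append((left, right, ancestor))
        return ancestor

    visit(tree)
    return steps


guide = ((("human", "chimp"), "gorilla"), ("mouse", "rat"))
for step in progressive_order(guide):
    print(step)
# ('human', 'chimp', 'anc1')
# ('anc1', 'gorilla', 'anc2')
# ('mouse', 'rat', 'anc3')
# ('anc2', 'anc3', 'anc4')
```

The root step comes last, so the final ancestor (`anc4` here) is only formed once both subtrees have been aligned.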
© The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology
and Evolution. All rights reserved. For permissions, please e-mail: [email protected]
The phylogeny-aware approach has been shown to outperform other alignment methods in comparative sequence analysis (Fletcher and Yang, 2010; Jordan and Goldman, 2012).
Table 1. Minimum version of web browsers supporting the basic and the full set of Wasabi features.
         IE    Firefox  Chrome  iOS   Android
Basic    9.0   3.0      4.0     3.2*  41
Full     10.0  10.0     12.0    3.2*  41
NOTE. Desktop browsers: Internet Explorer (IE), Firefox, Chrome. Mobile browsers: Safari on iOS, Chrome on Android. * Refers to iOS version.
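The phylogeny-aware approach interprets gaps relative to the tree, which is what allows insertions and deletions to be told apart. A minimal sketch of that distinction, assuming a parent (ancestral) sequence and one of its children are already aligned and gaps are written as '-': a column with a residue in the child but a gap in the parent is labelled an insertion on that branch, and the reverse a deletion. The function `classify_branch_gaps` is hypothetical and is not PRANK's algorithm, which infers ancestral sequences and gap placement jointly during alignment.

```python
from typing import List


def classify_branch_gaps(parent: str, child: str, gap: str = "-") -> List[str]:
    """Label each alignment column for the branch parent -> child.

    'match'     : both sequences have a residue
    'insertion' : residue in child, gap in parent (gained on this branch)
    'deletion'  : residue in parent, gap in child (lost on this branch)
    'absent'    : gap in both (the column belongs to another part of the tree)
    """
    if len(parent) != len(child):
        raise ValueError("aligned sequences must have equal length")
    labels = []
    for p, c in zip(parent, child):
        if p != gap and c != gap:
            labels.append("match")
        elif p == gap and c != gap:
            labels.append("insertion")
        elif p != gap and c == gap:
            labels.append("deletion")
        else:
            labels.append("absent")
    return labels


print(classify_branch_gaps("AC-GT", "ACTG-"))
# ['match', 'match', 'insertion', 'match', 'deletion']
```

Without the ancestral row, the last two columns would both look like an ordinary gap in one sequence; the parent sequence is what tells the two cases apart.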
In this article we present Wasabi: a web-based analysis environment that reflects the inter-dependence between the phylogeny and the alignment in its design and displays each sequence next to the corresponding node in the phylogenetic tree. The joint visualisation of tree and alignment allows phylogenies to be assessed next to the sequence data they were inferred from, and alignments to be evaluated in the context of the underlying phylogeny. By additionally integrating orthologous and inferred ancestral sequences, Wasabi highlights information that cannot easily be shown with either the tree or the alignment in isolation.
Wasabi is inspired by our earlier tool, webPRANK (Löytynoja and Goldman, 2010), a web interface to the PRANK alignment program. webPRANK was limited to the analysis of small data sets and to a single analysis method. Wasabi, in contrast, was developed using modern web technologies to scale to much larger data sets while maintaining responsiveness. Wasabi additionally contains features for integrating heterogeneous data sets and for file management, and provides an intuitive, modern web interface (see Figure 1 for an overview) for navigating and selecting appropriate analysis options. Finally, given the increasing complexity of data analysis, Wasabi allows users to save all intermediate analysis steps to aid reproducibility, and to share project data and analysis steps with collaborators.
RESULTS
Overview
Wasabi is a web application for evolutionary sequence analysis that is compatible with all web browsers supporting the HTML5 standard (see Table 1). Wasabi has been designed with the goal of providing a comprehensive set of analysis tools that enable users to analyse their own data, integrate it with external data sets if necessary, and visualise the results. Data analysis is complicated because input data need to be in specific formats, programs have a large number of options, and poor file management can lead to careless mistakes. Wasabi aims to incorporate a wide set of tools; it converts automatically between file formats and provides a hierarchy of program options that highlights the parameters of interest for different scenarios.
Visualisation
Many evolutionary analysis methods output additional information that is not generally visualised. For example, along with aligned sequences, the programs PRANK and PAGAN
can output inferred ancestral sequences and posterior probabilities of different evolutionary processes (Löytynoja and Goldman, 2008). Whereas the majority of sequence alignment viewers omit this information (see Table 2), Wasabi either displays it by default (e.g. by distinguishing between insertion and deletion gaps) or provides access to it via context-sensitive menus. Ancestral sequences are not shown by default, but can be shown or hidden for individual nodes or entire subtrees. Using this function, the user can locate mutation events on specific tree branches and gain additional insights, e.g. about mutation events after a distant branching event, or contrast the mutation processes in different evolutionary lineages. Wasabi incorporates similar metadata from external data sets, including ancestral sequences and duplication event nodes from EPO (Paten et al., 2008) and GeneTree (Vilella et al., 2009) alignments imported from Ensembl.
FIG. 1. Overview of the Wasabi interface. The toolbar contains menus and buttons for data management, analysis tools, zoom levels, undo/redo and server notifications. Pop-up menus are associated with tree nodes and reveal ancestral sequences, which are displayed in lighter colours next to the node. A windowing system provides access to other functionality (e.g. sequence translation and analysis library, shown above).
Data import/integration/sharing
Wasabi supports a large number of input file formats for importing sequence and phylogenetic tree data, including FASTA, ClustalW, Phylip, Newick, Extended Newick (Cardona et al., 2008), NEXUS (Maddison et al., 1997), HSAML and PhyloXML (Han and Zmasek, 2009). User and analysis data are stored in a structured, file-based database. By default, Wasabi stores the intermediate results of an analysis, together with their associated metadata, as a branching tree that tracks parallel pipelines and allows for reproducibility.
Wasabi currently supports importing external data from Ensembl (homologous gene sets and alignment slices) using the Ensembl REST API
(Flicek et al., 2013): a dedicated import menu supports Ensembl IDs and allows querying data sets, e.g. by species and gene names. Wasabi can additionally read input files from URLs.
Results can be shared between users via URLs, allowing access to single data sets or even entire analysis trees. The recipient of a URL can display the linked data and continue working on a copy inside Wasabi before returning any modified data to the sender, e.g. to be reintegrated into the original analysis (see SFig. 1).
Feature comparison
Wasabi's core features are largely lacking from existing sequence alignment viewers. In Table 2 we compare Wasabi to JalView 2.8 (Waterhouse et al., 2009), Mega 6.06 (Tamura et al., 2013), Seaview 4.0 (Gouy et al., 2010) and ClustalX 2.1 (Larkin et al., 2007). Existing programs either display the phylogeny in a separate window with limited editing and annotation capabilities, or omit the tree altogether (ClustalX). Other features, such as data/analysis management, Ensembl integration and plugin extension, are either missing or have only limited support.
Table 2. Main Wasabi features in alternative programs.
                       Was.  Clust.  JalV.  Mega  Seav.
Phylogeny integration  X     –       –      –     –
Ancestral sequences    X     –       –      X     –
Analysis history       X     –       –      X     –
Sharing URL            X     –       –      –     –
User accounts          X     –       –      –     –
Web access             X     –       X      –     –
Ensembl import         X     –       X      –     X
Plugin extension       X     –       –      –     X
NOTE. Program versions: ClustalX 2.1, JalView 2.8, Mega 6.06, Seaview 4. Was., Wasabi; Clust., ClustalX; JalV., JalView; Seav., Seaview.
Performance
Although Wasabi is a web application, its performance is comparable to that of a "native" application. Table 3 shows the time taken by several sequence alignment viewers to load and display protein alignments of increasing size (methodology described in Supplementary material). We note that not all programs tested perform the same functions upon loading the data: for example, ClustalX calculates conservation scores and JalView calculates consensus scores, whereas other programs (Wasabi, Seaview) do neither. Of all the applications tested, Wasabi and Mega have the most predictable behaviour, though we note that for the 1K x 1K and 1K x 10K data sets, Wasabi is roughly twice as fast as Mega. While it is unlikely that users would want to visualise an alignment as large as the 1K x 100K data set, this test demonstrates both the scalability of Wasabi's design and that being web-based imposes few limitations.
Table 3. Time to open a FASTA file with random protein sequence data (in seconds).
Data size   Wasabi  ClustalX  JalView  Mega  Seaview
1K x 1K     2.0     9.4       0.8      5.3   0.2
1K x 10K    2.8     87.5      4.6      5.2   2.9
1K x 100K   9.6     TO        OM       6.4   22.8
NOTE. Input data size as # sequences x # sites, in thousands. TO, timeout; OM, out of memory. Program versions as in Table 2. All programs were tested with default settings on Mac OS X 10.9 running on a MacBook Pro with a 2.4 GHz CPU, 4 GB RAM and an SSD hard disk. Wasabi ver. 140615 in Chrome ver. 39. Time limit: 15 minutes.
Example of Wasabi workflow
To demonstrate the capabilities of Wasabi, we performed a common analysis task: the
comparison of a gene sequence from a species of interest to its homologs. We reanalysed the EGLN1 gene data set highlighted in the recent tiger genome study (Cho et al., 2013). EGLN1 is involved in the cellular response to hypoxia. Cho et al. identified a single amino acid change that might be related to high-altitude adaptation in snow leopards. We reassembled the EGLN1 sequence for the big cats (performed on the command line, see Supplementary Material) and then used Wasabi to download the corresponding Ensembl GeneTrees data set, extend the Ensembl alignment with the big cat sequences using PAGAN, remove paralogs and incomplete sequences, and visualise the results. This analysis consists of a handful of simple steps, all performed within Wasabi's graphical environment (Figure 2). Our analysis confirms the lysine-to-methionine substitution in the snow leopard lineage. However, our extended data set shows that this sequence position is not strictly conserved in other lineages and, interestingly, an identical, independent lysine-to-methionine substitution is seen in alpaca, another species adapted to high altitudes. The resulting data sets, including intermediate checkpoints, can be explored with Wasabi at http://wasabiapp.org:8000?id=usecases. Other examples are provided in Supplementary material.
FIG. 2. The reanalysis of the EGLN1 gene data set consists of seven steps. (1) Import of Ensembl GeneTrees data set. (2) Addition of tiger, lion and snow leopard sequences with PAGAN. (3) Removal of paralogs and incomplete or non-mammalian sequences. (4) Correction of big cats' placement (error caused by high sequence similarity and missing data in cat). (5) Removal of alignment gaps. (6) Translation to codons. (7) Visualisation of the resulting alignment and the lysine-to-methionine substitution in snow leopard (circled).
DISCUSSION
Wasabi provides an intuitive interface and a scalable platform for evolutionary sequence analysis via the web. Despite being under active development, Wasabi is being used every day by researchers to perform analyses and share results with collaborators.
Wasabi is a centralized service. Some benefits of this, such as transparent sharing of results between user accounts, are obvious. Others are more indirect: new features and bug fixes can be deployed immediately to the web, benefiting many users at once. Wasabi, however, is open source, allowing for local (communal lab server) and personal (laptop/desktop) installations. For example, a research group can host its own private Wasabi installation to keep its data in-house, while still being able to share analyses with local colleagues.
Future plans for Wasabi focus on looking beyond sequence alignment. Integrating downstream analysis tools such as SLR and PAML (Massingham and Goldman, 2005; Yang, 2007) within Wasabi would allow an investigator to go from raw data to alignment and inference of selection without leaving the Wasabi environment. The additional visualisation elements required for displaying selection inference results could be used for any column-wise data, e.g. to indicate sequence conservation or alignment uncertainty, or to show annotation tracks. Related to this, we are developing a plugin system that will ease the integration of other command line tools. Plugin writers will not need to do any programming, but will instead specify a JSON configuration file detailing how command line options are exposed through Wasabi (see the online documentation for a minimal plugin example: http://wasabiapp.org/plugins).
Finally, we are developing a stripped-down version of Wasabi for data visualisation only. A version of this will be integrated in the Ensembl browser for the display of Compara alignment data. The tool will be provided as a separate library for easy integration with other web resources.
METHODS
Wasabi integrates a phylogenetic tree and sequence alignment viewer together with a toolbar and pop-up dialog windows to access information and analysis options. It does so by leveraging many of the features introduced in HTML5 and therefore requires a modern web browser. For older web browsers, Wasabi is designed to degrade gracefully using JavaScript or other means.
Architecture
Wasabi uses the Model-View-ViewModel (MVVM) design pattern, implemented by the Knockout JavaScript library, to make interface elements and their contents dynamically reflect
program state. This pattern is used throughout the Wasabi client interface.
Phylogenetic trees are drawn using scalable vector graphics (SVG) to support multiple zoom levels, aspect ratios and screen resolutions. Rendering utilises a modified version of jsPhyloSVG (Smits and Ouverney, 2010). All SVG elements are individually addressable, so that mouse interactions can be associated with them and CSS styling applied dynamically; e.g. when a user edits the tree by performing a drag-and-drop operation, eligible drop targets are highlighted as bolded tree branches.
Sequences are always positioned next to the corresponding tree elements; moving or deleting tree nodes results in the same actions being performed on the corresponding sequences. Sequence alignments are rendered using an HTML5 canvas element for performance reasons. Drawing the visible portion of the alignment is optimised by pre-rendering all glyphs, using the current typeface and colour scheme, to individual canvas elements, which are then copied to the main canvas as needed.
Tile-based rendering
Wasabi was designed to scale to large data sets and breaks down the visualisation of sequence data into smaller, manageable pieces called tiles. Each tile is a fixed-size canvas element four times the size of the viewport. When the user scrolls the viewport beyond the rendered area, a new tile is added and filled with data (see SFig. 2). This process continues until there is no more sequence data or the browser limit on total canvas size has been reached, in which case non-visible tiles are recycled.
Backend
Web applications are restricted from writing files to the local hard disk and from launching external applications. Wasabi therefore relies on a separate backend server to manage user accounts, data and analysis results, perform analyses and provide task progress notifications to the client. The Wasabi server backend is written in Python and supports Windows, Mac OS and Linux. When the backend server runs on an internet-addressable host, Wasabi becomes a centrally managed web application; this forms the basis for Wasabi's collaborative sharing features.
Supplementary Material
Supplementary material is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
We acknowledge the Ensembl team for considering our feedback in the development of the Ensembl REST API.
Funding: This work was supported by the Biocenter Finland, Biocentrum Helsinki and Marie Curie Career Integration Grants.
References
Cardona, G., Rossell, F., and Valiente, G. 2008. Extended Newick: it is time for a standard representation of phylogenetic networks. BMC Bioinformatics, 9(1): 532.
Cho, Y. S., Hu, L., Hou, H., Lee, H., Xu, J., Kwon, S., Oh, S., Kim, H.-M., Jho, S., Kim, S., Shin, Y.-A., Kim, B. C., Kim, H., Kim, C.-U., Luo, S.-J., Johnson, W. E., Koepfli, K.-P., Schmidt-Küntzel, A., Turner, J. A., Marker, L., Harper, C., Miller, S. M., Jacobs, W., Bertola, L. D., Kim, T. H., Lee, S., Zhou, Q., Jung, H.-J., Xu, X., Gadhvi, P., Xu, P., Xiong, Y., Luo, Y., Pan, S., Gou, C., Chu, X., Zhang, J., Liu, S., He, J., Chen, Y., Yang, L., Yang, Y., He, J., Liu, S., Wang, J., Kim, C. H., Kwak, H., Kim, J.-S., Hwang, S., Ko, J., Kim, C.-B., Kim, S., Bayarlkhagva, D., Paek, W. K., Kim, S.-J., O'Brien, S. J., Wang, J., and Bhak, J. 2013. The tiger genome and comparative analysis with lion and snow leopard genomes. Nat. Commun., 4: 2433.
Fletcher, W. and Yang, Z. 2010. The effect of insertions, deletions, and alignment errors on the branch-site test of positive selection. Mol. Biol. Evol., 27(10): 2257–2267.
Flicek, P., Amode, M. R., Barrell, D., Beal, K., Billis, K., Brent, S., Carvalho-Silva, D., Clapham, P., Coates, G., Fitzgerald, S., Gil, L., Girón, C. G., Gordon, L., Hourlier, T., Hunt, S., Johnson, N., Juettemann, T., Kähäri, A. K., Keenan, S., Kulesha, E., Martin, F. J., Maurel, T., McLaren, W. M., Murphy, D. N., Nag, R., Overduin, B., Pignatelli, M., Pritchard, B., Pritchard, E., Riat, H. S., Ruffier, M., Sheppard, D., Taylor, K., Thormann, A., Trevanion, S. J., Vullo, A., Wilder, S. P., Wilson, M., Zadissa, A., Aken, B. L., Birney, E., Cunningham, F., Harrow, J., Herrero, J., Hubbard, T. J. P., Kinsella, R., Muffato, M., Parker, A., Spudich, G., Yates, A., Zerbino, D. R., and Searle, S. M. J. 2013. Ensembl 2014. Nucleic Acids Res., 42(Database issue): D749–D755.
Gouy, M., Guindon, S., and Gascuel, O. 2010. SeaView version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol. Biol. Evol., 27(2): 221–224.
Han, M. V. and Zmasek, C. M. 2009. phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics, 10(1): 356.
Hogeweg, P. and Hesper, B. 1984. The alignment of sets of sequences and the construction of phyletic trees: an integrated method. J. Mol. Evol., 20(2): 175–186.
Jordan, G. and Goldman, N. 2012. The effects of alignment error and alignment filtering on the sitewise detection of positive selection. Mol. Biol. Evol., 29(4): 1125–1139.
Larkin, M. A., Blackshields, G., Brown, N. P., Chenna, R., McGettigan, P. A., McWilliam, H., Valentin, F., Wallace, I. M., Wilm, A., Lopez, R., Thompson, J. D., Gibson, T. J., and Higgins, D. G. 2007. Clustal W and Clustal X version 2.0. Bioinformatics, 23(21): 2947–2948.
Löytynoja, A. and Goldman, N. 2005. An algorithm for progressive multiple alignment of sequences with insertions. Proc. Natl. Acad. Sci. USA, 102(30): 10557–10562.
Löytynoja, A. and Goldman, N. 2008. A model of evolution and structure for multiple sequence alignment. Philos. Trans. R. Soc. Lond. B Biol. Sci., 363(1512): 3913–3919.
Löytynoja, A. and Goldman, N. 2008. Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis. Science, 320(5883): 1632–1635.
Löytynoja, A. and Goldman, N. 2010. webPRANK: a phylogeny-aware multiple sequence aligner with interactive alignment browser. BMC Bioinformatics, 11(1): 579.
Löytynoja, A., Vilella, A. J., and Goldman, N. 2012. Accurate extension of multiple sequence alignments using a phylogeny-aware graph algorithm. Bioinformatics, 28(13): 1684–1691.
Maddison, D. R., Swofford, D. L., and Maddison, W. P. 1997. NEXUS: an extensible file format for systematic information. Syst. Biol., 46(4): 590–621.
Massingham, T. and Goldman, N. 2005. Detecting amino acid sites under positive selection and purifying selection. Genetics, 169(3): 1753–1762.
Paten, B., Herrero, J., Fitzgerald, S., Beal, K., Flicek, P., Holmes, I., and Birney, E. 2008. Genome-wide nucleotide-level mammalian ancestor reconstruction. Genome Res., 18: 1829–1843.
Sankoff, D. 1975. Minimal mutation trees of sequences. SIAM J. Appl. Math., 28(1): 35–42.
Smits, S. A. and Ouverney, C. C. 2010. jsPhyloSVG: a JavaScript library for visualizing interactive and vector-based phylogenetic trees on the web. PLoS One, 5(8): 6–9.
Tamura, K., Stecher, G., Peterson, D., Filipski, A., and Kumar, S. 2013. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol. Biol. Evol., 30(12): 2725–2729.
Vilella, A. J., Severin, J., Ureta-Vidal, A., Heng, L., Durbin, R., and Birney, E. 2009. EnsemblCompara GeneTrees: complete, duplication-aware phylogenetic trees in vertebrates. Genome Res., 19(2): 327–335.
Waterhouse, A. M., Procter, J. B., Martin, D. M. A., Clamp, M., and Barton, G. J. 2009. Jalview version 2: a multiple sequence alignment editor and analysis workbench. Bioinformatics, 25(9): 1189–1191.
Yang, Z. 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol., 24(8): 1586–1591.