
MBE Advance Access published December 3, 2015


Wasabi: an integrated platform for evolutionary sequence analysis and data visualisation
Andres Veidenberg, Alan Medlar and Ari Löytynoja*
Institute of Biotechnology, University of Helsinki, P.O.Box 65, Helsinki, Finland
* Corresponding author: E-mail: [email protected].

Abstract

Downloaded from http://mbe.oxfordjournals.org/ at University of California, San Diego on December 5, 2015


Wasabi is an open source, web-based environment for evolutionary sequence analysis. Wasabi visualises sequence data together with a phylogenetic tree within a modern, user-friendly interface: the interface hides extraneous options, supports context-sensitive menus and drag-and-drop editing, and displays additional information, such as ancestral sequences, associated with specific tree nodes. The Wasabi environment supports reproducibility by automatically storing intermediate analysis steps and includes built-in functions to share data between users and publish analysis results. For computational analysis, Wasabi supports PRANK and PAGAN for phylogeny-aware alignment and alignment extension, and it can be easily extended with other tools. Along with drag-and-drop import of local files, Wasabi can access remote data via URL and import sequence data, GeneTrees and EPO alignments directly from Ensembl. To demonstrate a typical workflow using Wasabi, we reproduce key findings from recent comparative genomics studies, including a reanalysis of the EGLN1 gene from the tiger genome study: these case studies can be browsed within Wasabi at http://wasabiapp.org:8000?id=usecases. Wasabi runs inside a web browser and does not require any installation. One can start using it at http://wasabiapp.org. All source code is licensed under the GPLv3.

Key words: Evolutionary sequence analysis, reproducible research, data visualisation

Introduction

In evolutionary sequence analysis, phylogenetic trees and sequence alignments are intrinsically linked: sequence alignments define character homologies upon which phylogenetic inference is based and trees are integral in correcting for hierarchical dependencies among sequence data. In multiple sequence alignment, this connection was noticed early (Sankoff, 1975) and was central in the first progressive alignment algorithm (Hogeweg and Hesper, 1984). While popular alignment programs use a tree to guide the alignment procedure, they consider the tree a nuisance parameter. The phylogeny-aware algorithm (Löytynoja and Goldman, 2005, 2008; Löytynoja et al., 2012) makes the alignment/phylogeny dependence explicit again: reconstructed ancestral sequences are aligned according to a guide phylogeny and the gap patterns created closely reflect the tree topology.

© The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology
and Evolution. All rights reserved. For permissions, please e-mail: [email protected]
The phylogeny-aware approach has been shown to outperform other alignment methods in comparative sequence analysis (Fletcher and Yang, 2010; Jordan and Goldman, 2012).

In this article we present Wasabi: a web-based analysis environment that reflects the inter-dependence between the phylogeny and the alignment in its design and displays each sequence next to the corresponding node in the phylogenetic tree. The joint visualisation of tree and alignment allows for the assessment of phylogenies next to the sequence data they were inferred from or for the evaluation of alignments in the context of the underlying phylogeny. By additionally integrating ortholog and inferred ancestral sequences, Wasabi highlights information that cannot easily be shown with either the tree or the alignment in isolation.

Wasabi is inspired by our earlier tool, webPRANK (Löytynoja and Goldman, 2010), a web interface to the PRANK alignment program. webPRANK was limited to the analysis of only small data sets and to a single analysis method. Wasabi, however, was developed using modern web technologies to scale to much larger data sets, while maintaining responsiveness. Wasabi additionally contains features for integrating heterogeneous data sets, file management and an intuitive, modern web interface (see Figure 1 for overview) for navigating and selecting appropriate analysis options. Additionally, due to the increasing complexity of data analysis, Wasabi allows users to save all intermediate analysis steps to aid reproducibility and allows for sharing of project data and analysis steps between collaborators.

RESULTS

Overview

Wasabi is a web-application for evolutionary sequence analysis that is compatible with all web browsers supporting the HTML5 standard (see Table 1). Wasabi has been designed with the goal of providing a comprehensive set of analysis tools enabling users to analyse their own data, integrate it with external data sets, if necessary, and visualise the results. Data analysis is complicated because input data needs to be in specific formats, programs have a large number of options and poor file management can lead to careless mistakes. Wasabi aims to incorporate a wide set of tools, automatically converts between file formats and provides a hierarchy of program options to highlight parameters of interest for different scenarios.

Table 1. Minimum version of web browsers supporting basic and full set of Wasabi features.
         IE     Firefox  Chrome  iOS    Android
Basic    9.0    3.0      4.0     3.2*   41
Full     10.0   10.0     12.0    3.2*   41
NOTE: Desktop browsers: Internet Explorer (IE), Firefox, Chrome. Mobile browsers: Safari on iOS, Chrome on Android. * Refers to iOS version.

Visualisation

Many evolutionary analysis methods output additional information that is not generally visualised. For example, along with aligned sequences, the programs PRANK and PAGAN

can output inferred ancestral sequences and posterior probabilities of different evolutionary processes (Löytynoja and Goldman, 2008). Whereas a majority of sequence alignment viewers omit this information (see Table 2), Wasabi either displays it by default (e.g. distinguishing between insertion and deletion gaps) or else provides access to it via context-sensitive menus. Ancestral sequences are not shown by default, but can be shown or hidden for individual nodes or entire subtrees. Using this function, the user can locate mutation events at specific tree branches and gain additional insights e.g. about mutation events after a distant branching event, or contrast the mutation processes in different evolutionary lineages. Wasabi incorporates similar metadata from external data sets including ancestral sequences and duplication event nodes from EPO (Paten et al., 2008) and GeneTree (Vilella et al., 2009) alignments imported from Ensembl.

FIG. 1. Overview of Wasabi interface. The toolbar contains menus and buttons for data management, analysis tools, zoom levels, undo/redo and server notifications. Pop-up menus are associated with tree nodes and reveal ancestral sequences which are displayed in lighter colours next to the node. A windowing system provides access to other functionality (e.g. sequence translation and analysis library, shown above).

Data import/integration/sharing

Wasabi supports a large number of input file formats for importing sequence and phylogenetic tree data, including: FASTA, ClustalW, Phylip, Newick, Extended Newick (Cardona et al., 2008), NEXUS (Maddison et al., 1997), HSAML and PhyloXML (Han and Zmasek, 2009). User and analysis data is stored in a structured, file-based database. By default, Wasabi stores intermediate results from an analysis in the form of a branching tree to track parallel pipelines and associated metadata allowing for reproducibility.

Wasabi currently supports importing external data from Ensembl (homologous gene sets and alignment slices) using the Ensembl REST API

(Flicek et al., 2013): a dedicated import menu supports Ensembl IDs and allows querying data sets e.g. with species and gene names. Wasabi can additionally read input files from URLs.

Results can be shared between users via URLs allowing access to single data sets or even entire analysis trees. The recipient of the URL can display the linked data and continue working on a copy inside Wasabi before returning any modified data to the sender, e.g. to be reintegrated into the original analysis (see SFig. 1).

Feature comparison

Wasabi's core features are largely lacking from existing sequence alignment viewers. In Table 2 we compare Wasabi to JalView 2.8 (Waterhouse et al., 2009), Mega 6.06 (Tamura et al., 2013), Seaview 4.0 (Gouy et al., 2010) and ClustalX 2.1 (Larkin et al., 2007).

Existing programs either display the phylogeny in a separate window with limited editing and annotation capabilities, or else omit the tree altogether (ClustalX). Other features, such as data/analysis management, Ensembl integration and plugin extension, are either missing or have only limited support.

Table 2. Main Wasabi features in alternative programs.
                        Was.  Clust.  JalV.  Mega  Seav.
Phylogeny integration   X     –       –      –     –
Ancestral sequences     X     –       –      X     –
Analysis history        X     –       –      X     –
Sharing URL             X     –       –      –     –
User accounts           X     –       –      –     –
Web access              X     –       X      –     –
Ensembl import          X     –       X      –     X
Plugin extension        X     –       –      –     X
NOTE: Program versions: ClustalX 2.1, JalView 2.8, Mega 6.06, Seaview 4.

Performance

Although Wasabi is a web application, its performance is comparable to a "native" application. Table 3 shows the time taken by several sequence alignment viewers to load and display protein alignments of increasing size (methodology described in Supplementary material). We note that not all programs tested perform exactly the same functions upon loading the data, for example, ClustalX calculates conservation scores, JalView calculates consensus scores, whereas other methods (Wasabi, Seaview) do neither. Of all the applications tested, Wasabi and Mega have the most predictable behaviour, though we note that in the 1-10K sequence range, Wasabi is twice as fast. While it is unlikely users would want to visualise a 100K sequence alignment, it demonstrates both the scalability of Wasabi's design and that there are few limitations with being web-based.

Table 3. Time to open a FASTA file with random protein sequence data (in seconds).
Data size    Wasabi  ClustalX  JalView  Mega  Seaview
1K x 1K      2.0     9.4       0.8      5.3   0.2
1K x 10K     2.8     87.5      4.6      5.2   2.9
1K x 100K    9.6     TO        OM       6.4   22.8
NOTE: Input data size as # sequences x # sites, in thousands. TO=timeout. OM=out of memory. Program versions as in Table 2. All programs tested with default settings on Mac OSX 10.9 running on a Macbook Pro with a 2.4GHz CPU, 4GB RAM and an SSD harddisk. Wasabi ver. 140615 in Chrome ver. 39. Time limit: 15 minutes.

Example of Wasabi workflow

To demonstrate the capabilities of Wasabi, we performed a common analysis task: the

comparison of a gene sequence from a species of interest to its homologs. We reanalysed the EGLN1 gene data set highlighted in the recent tiger genome study (Cho et al., 2013). EGLN1 is involved in the cellular response to hypoxia. Cho et al. identified a single amino acid change that might be related to high-altitude adaptation in snow leopards. We reassembled the EGLN1 sequence for the big cats (performed on the command line, see Supplementary Material) and then used Wasabi to download the corresponding Ensembl GeneTrees data set, extended the Ensembl alignment with the big cat sequences using PAGAN, removed paralogs and incomplete sequences and visualised the results. This analysis consists of a handful of easy steps, all performed within Wasabi's graphical environment (Figure 2).

FIG. 2. The reanalysis of the EGLN1 gene data set consists of seven steps. (1) Import of Ensembl GeneTrees data set. (2) Addition of tiger, lion and snow leopard sequences with PAGAN. (3) Removal of paralogs and incomplete or non-mammalian sequences. (4) Correction of big cats' placement (error caused by high sequence similarity and missing data in cat). (5) Removal of alignment gaps. (6) Translation to codons. (7) Visualisation of the resulting alignment and the lysine-to-methionine substitution in snow leopard (circled).

Our analysis confirms the lysine-to-methionine substitution in the snow leopard lineage. However, our extended data set shows that the sequence position is not strictly conserved in other lineages and, interestingly, an identical, independent lysine-to-methionine substitution is seen in alpaca, another species adapted to high altitudes. The resulting data sets, including intermediate checkpoints, can be explored with Wasabi

at http://wasabiapp.org:8000?id=usecases. Other examples are provided in Supplementary material.

DISCUSSION

Wasabi provides an intuitive interface and scalable platform for evolutionary sequence analysis via the web. Despite being under active development, Wasabi is being used every day by researchers to perform analyses and share results with collaborators.

Wasabi is a centralized service. Some benefits from this, such as transparent sharing of results between user accounts, are obvious. Others are more indirect: new features and bug fixes can be deployed immediately to the web, benefiting many users at once. Wasabi, however, is open source, allowing for local (communal lab server) and personal (laptop/desktop) installations. For example, a research group can host their own private Wasabi installation to keep their data in-house, while still being able to share analyses with local colleagues.

Future plans for Wasabi focus on looking beyond sequence alignment. The integration of downstream analysis tools such as SLR and PAML (Massingham and Goldman, 2005; Yang, 2007) within Wasabi would allow an investigator to go from raw data to alignment and inference of selection without needing to leave the Wasabi environment. The additional visualisation elements required for the display of selection inference results could be used for any column-wise data, e.g. to indicate sequence conservation or alignment uncertainty or to show annotation tracks. Related to this, we are developing a plugin system that will ease the integration of other command line tools. Plugin writers will not need to do any programming, but instead specify a JSON configuration file detailing how command line options are exposed through Wasabi (see online documentation for a minimal plugin example http://wasabiapp.org/plugins).

Finally, we are developing a stripped-down version of Wasabi for data visualisation only. A version of this will be integrated in the Ensembl browser for the display of Compara alignment data. The tool will be provided as a separate library for easy integration with other web resources.

METHODS

Wasabi integrates a phylogenetic tree and sequence alignment viewer together with a toolbar and pop-up dialog windows to access information and analysis options. It does so by leveraging many of the features introduced in HTML5 and therefore requires a modern web browser. For older web browsers, Wasabi is designed to degrade gracefully using JavaScript or other means.

Architecture

Wasabi uses the Model-View-ViewModel (MVVM) design pattern implemented by the Knockout JavaScript library to make interface elements and their contents dynamically reflect

program state. This is used throughout the Wasabi client interface.

Phylogenetic trees are drawn using scalable vector graphics (SVG) to support multiple zoom levels, aspect ratios and screen resolutions. Rendering utilises a modified version of jsPhyloSVG (Smits and Ouverney, 2010). All SVG elements are individually addressable for associating mouse interactions and dynamically applying CSS styling, e.g. when a user edits the tree by performing a drag-and-drop operation, eligible drop targets are highlighted as bolded tree branches.

Sequences are always positioned next to corresponding tree elements; moving and deleting tree nodes results in the same actions being performed on the sequence. Sequence alignments are rendered using an HTML5 canvas element for performance reasons. Drawing the visible portion of the alignment is optimised by pre-rendering all glyphs using the current typeface and colour scheme to individual canvas elements, which are then copied to the main canvas as needed.

Tile-based rendering

Wasabi was designed to scale to large data sets and breaks down the visualisation of sequence data into smaller manageable pieces, called tiles. Each tile is a fixed-size canvas element four times the size of the viewport. In response to the user scrolling the viewport beyond the rendered area, a new tile is added and filled with data (see SFig. 2). This process continues until there is no more sequence data or the browser limit of total canvas size has been reached, in which case, non-visible tiles are recycled.

Backend

Web applications are restricted from writing files to the local hard disk and from launching external applications. Wasabi relies on a separate backend server to manage user accounts, data and analysis results, perform analyses and provide task progress notifications to the client. The Wasabi server backend is written in Python and supports Windows, Mac OS and Linux. By running the Wasabi backend server on an internet-addressable host, it becomes a centrally managed web application and forms the basis for Wasabi's collaborative sharing features.

Supplementary Material

Supplementary material is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We acknowledge the Ensembl team for considering our feedback in the development of the Ensembl REST API.

Funding: This work was supported by the Biocenter Finland, Biocentrum Helsinki and Marie Curie Career Integration Grants.

References

Cardona, G., Rosselló, F., and Valiente, G. 2008. Extended newick: it is time for a standard representation of phylogenetic networks. BMC Bioinformatics, 9(1): 532.

Cho, Y. S., Hu, L., Hou, H., Lee, H., Xu, J., Kwon, S., Oh, S., Kim, H.-M., Jho, S., Kim, S., Shin, Y.-A., Kim, B. C., Kim, H., Kim, C.-U., Luo, S.-J., Johnson, W. E., Koepfli, K.-P., Schmidt-Küntzel, A., Turner, J. A., Marker, L., Harper, C., Miller, S. M., Jacobs, W., Bertola, L. D., Kim, T. H., Lee, S., Zhou, Q., Jung, H.-J., Xu, X., Gadhvi, P., Xu, P., Xiong, Y., Luo, Y., Pan, S., Gou, C., Chu, X., Zhang, J., Liu, S., He, J., Chen, Y., Yang, L., Yang, Y., He, J., Liu, S., Wang, J., Kim, C. H., Kwak, H., Kim, J.-S., Hwang, S., Ko, J., Kim, C.-B., Kim, S., Bayarlkhagva, D., Paek, W. K., Kim, S.-J., O'Brien, S. J., Wang, J., and Bhak, J. 2013. The tiger genome and comparative analysis with lion and snow leopard genomes. Nat. Commun., 4(May): 2433.

Fletcher, W. and Yang, Z. 2010. The effect of insertions, deletions, and alignment errors on the branch-site test of positive selection. Mol. Biol. Evol., 27(10): 2257–2267.

Flicek, P., Amode, M. R., Barrell, D., Beal, K., Billis, K., Brent, S., Carvalho-Silva, D., Clapham, P., Coates, G., Fitzgerald, S., Gil, L., Girón, C. G., Gordon, L., Hourlier, T., Hunt, S., Johnson, N., Juettemann, T., Kähäri, A. K., Keenan, S., Kulesha, E., Martin, F. J., Maurel, T., McLaren, W. M., Murphy, D. N., Nag, R., Overduin, B., Pignatelli, M., Pritchard, B., Pritchard, E., Riat, H. S., Ruffier, M., Sheppard, D., Taylor, K., Thormann, A., Trevanion, S. J., Vullo, A., Wilder, S. P., Wilson, M., Zadissa, A., Aken, B. L., Birney, E., Cunningham, F., Harrow, J., Herrero, J., Hubbard, T. J. P., Kinsella, R., Muffato, M., Parker, A., Spudich, G., Yates, A., Zerbino, D. R., and Searle, S. M. J. 2013. Ensembl 2014. Nucleic Acids Res., 42(Database issue): D749–55.

Gouy, M., Guindon, S., and Gascuel, O. 2010. SeaView version 4: A multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol. Biol. Evol., 27(2): 221–224.

Han, M. V. and Zmasek, C. M. 2009. phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics, 10(1): 356.

Hogeweg, P. and Hesper, B. 1984. The alignment of sets of sequences and the construction of phyletic trees: an integrated method. J Mol Evol, 20(2): 175–186.

Jordan, G. and Goldman, N. 2012. The effects of alignment error and alignment filtering on the sitewise detection of positive selection. Mol. Biol. Evol., 29(4): 1125–1139.

Larkin, M. A., Blackshields, G., Brown, N. P., Chenna, R., McGettigan, P. A., McWilliam, H., Valentin, F., Wallace, I. M., Wilm, A., Lopez, R., Thompson, J. D., Gibson, T. J., and Higgins, D. G. 2007. Clustal W and Clustal X version 2.0. Bioinformatics, 23(21): 2947–2948.

Löytynoja, A. and Goldman, N. 2005. An algorithm for progressive multiple alignment of sequences with insertions. Proceedings of the National Academy of Sciences of the United States of America, 102(30): 10557–10562.

Löytynoja, A. and Goldman, N. 2008. A model of evolution and structure for multiple sequence alignment. Philos Trans R Soc Lond B Biol Sci, 363(1512): 3913–3919.

Löytynoja, A. and Goldman, N. 2008. Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis. Science, 320(5883): 1632–1635.

Löytynoja, A. and Goldman, N. 2010. webPRANK: a phylogeny-aware multiple sequence aligner with interactive alignment browser. BMC Bioinformatics, 11(1): 579.

Löytynoja, A., Vilella, A. J., and Goldman, N. 2012. Accurate extension of multiple sequence alignments using a phylogeny-aware graph algorithm. Bioinformatics, 28(13): 1684–1691.

Maddison, D. R., Swofford, D. L., and Maddison, W. P. 1997. Nexus: An extensible file format for systematic information. Systematic Biology, 46(4): 590–621.

Massingham, T. and Goldman, N. 2005. Detecting amino acid sites under positive selection and purifying selection. Genetics, 169(3): 1753–1762.

Paten, B., Herrero, J., Fitzgerald, S., Beal, K., Flicek, P., Holmes, I., and Birney, E. 2008. Genome-wide nucleotide-level mammalian ancestor reconstruction. Genome Res., 18: 1829–1843.

Sankoff, D. 1975. Minimal mutation trees of sequences. SIAM J. Appl. Math., 28(1): 35–42.

Smits, S. A. and Ouverney, C. C. 2010. jsPhyloSVG: A javascript library for visualizing interactive and vector-based phylogenetic trees on the web. PLoS One, 5(8): 6–9.

Tamura, K., Stecher, G., Peterson, D., Filipski, A., and Kumar, S. 2013. MEGA6: Molecular evolutionary genetics analysis version 6.0. Mol. Biol. Evol., 30(12): 2725–2729.

Vilella, A. J., Severin, J., Ureta-Vidal, A., Heng, L., Durbin, R., and Birney, E. 2009. EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res., 19(2): 327–335.

Waterhouse, A. M., Procter, J. B., Martin, D. M. A., Clamp, M., and Barton, G. J. 2009. Jalview version 2–a multiple sequence alignment editor and analysis workbench. Bioinformatics, 25(9): 1189–1191.

Yang, Z. 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol., 24(8): 1586–1591.
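
The Data import/integration/sharing section states that intermediate results are stored as a branching tree that tracks parallel pipelines and their metadata. The following Python sketch illustrates one way such a history could be modelled. It is not Wasabi's storage format: the class AnalysisStep, its fields and the step labels are hypothetical names chosen for illustration, with the labels loosely following the EGLN1 example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class AnalysisStep:
    """One stored result; its children are alternative follow-up steps."""
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)
    parent: Optional["AnalysisStep"] = field(default=None, repr=False)
    children: List["AnalysisStep"] = field(default_factory=list)

    def add_step(self, name: str, **metadata: str) -> "AnalysisStep":
        """Attach a new step derived from this one and return it."""
        child = AnalysisStep(name=name, metadata=dict(metadata), parent=self)
        self.children.append(child)
        return child

    def lineage(self) -> List[str]:
        """Names of all steps from the root down to this step."""
        names = []
        node: Optional["AnalysisStep"] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


# Hypothetical history with two parallel branches after the extension step
root = AnalysisStep("Ensembl GeneTree import")
extended = root.add_step("PAGAN extension", program="PAGAN")
pruned = extended.add_step("paralogs removed")
unpruned = extended.add_step("all sequences kept")

print(" > ".join(pruned.lineage()))
```

Because every step keeps a link to its parent, the full sequence of operations behind any result can be read back from that result alone, which is the property that matters for reproducibility; sibling steps under one parent correspond to parallel pipelines.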

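
The tile-based rendering described in METHODS (tiles added as the user scrolls, non-visible tiles recycled once a size limit is reached) can be sketched as simple bookkeeping. The Python below is a minimal one-dimensional illustration under stated assumptions, not Wasabi's JavaScript implementation: the class TileManager and its parameters are hypothetical, "four times the size of the viewport" is read here as four times the viewport width, and max_tiles stands in for the browser limit on total canvas size.

```python
from typing import Dict, List


class TileManager:
    """Tracks which horizontal tiles of an alignment are currently rendered."""

    def __init__(self, viewport_width: int, total_width: int, max_tiles: int):
        if max_tiles < 2:
            raise ValueError("max_tiles must allow two visible tiles")
        self.viewport_width = viewport_width
        self.tile_width = 4 * viewport_width  # assumption: 4x viewport width
        self.total_width = total_width        # pixel width of all sequence data
        self.max_tiles = max_tiles            # stand-in for the canvas limit
        self.rendered: Dict[int, str] = {}    # tile index -> rendered content

    def visible_tiles(self, scroll_x: int) -> List[int]:
        """Indices of tiles overlapping the viewport at this scroll offset."""
        last_px = min(scroll_x + self.viewport_width, self.total_width) - 1
        first = scroll_x // self.tile_width
        last = max(first, last_px // self.tile_width)
        return list(range(first, last + 1))

    def scroll_to(self, scroll_x: int) -> List[int]:
        """Render missing visible tiles, recycling a hidden tile if needed."""
        visible = self.visible_tiles(scroll_x)
        for index in visible:
            if index in self.rendered:
                continue
            if len(self.rendered) >= self.max_tiles:
                # Limit reached: reuse a tile that is not on screen
                hidden = next(i for i in self.rendered if i not in visible)
                del self.rendered[hidden]
            self.rendered[index] = f"tile {index}"
        return sorted(self.rendered)


manager = TileManager(viewport_width=100, total_width=2000, max_tiles=2)
for offset in (0, 350, 800):
    print(offset, manager.scroll_to(offset))
```

Since a tile is wider than the viewport, at most two tiles are visible at once in this sketch, so a hidden tile is always available for recycling whenever the limit is at least two.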