Workflow Hub for Automated Metagenomic Exploration (WHAM!)
WHAM!: a web-based visualization suite for user-defined analysis of
metagenomic shotgun sequencing data
https://doi.org/10.1186/s12864-018-4870-z
WHAM! is an interactive, customizable tool for downstream metagenomic
and metatranscriptomic analysis. Its user-friendly interface lets
microbiome and ecology researchers explore their data and create
publication-quality figures without a command line interface or in-house
bioinformatics support.
As metagenomic and metatranscriptomic shotgun sequencing data
become both less expensive to generate and more readily available,
researchers have turned to automated pipelines such as MetaPhlAn [1],
HUMAnN2 [2, 3], MEGAN [4] and SAMSA [5] for annotation and analysis.
While these applications provide high quality functional and taxonomic
annotations, a computational hurdle still exists between the data output
and biologically interpretable results. Output formats from annotation
pipelines are typically cumbersome tables and large matrices of genes,
assigned taxa, and abundance or expression levels. Further, because of
the size and density of this information, data-driven discoveries are
inhibited.
However, many of these tools have limitations for downstream
visualization and user-based data exploration. Although these tools
include a visualization script to generate relative abundance plots for a
particular pathway or gene family of interest, users are limited in figure
customization and must use the command line.
WHAM! is an easy-to-use, web-based R Shiny application that generates
publication-quality figures for metagenomic sequencing analyses
(https://ruggleslab.shinyapps.io/wham_v1/). The application
employs a number of R packages for visualization, including ggplot2 [14],
psych [15], gplots [16], and plotly [17] (for source code and the full list
of packages and dependencies, please
see https://github.com/ruggleslab/jukebox/tree/master/wham_v1).
All dependencies are packaged within the application, so users only need
web access and input data.
Input options: The first input option is a TSV output of gene families,
pathways or Gene Ontology (GO) terms and their abundance or expression
levels, in the specified format (based on the Huttenhower bioBakery
pipeline).
The second input option is the European Bioinformatics Institute (EBI)
Metagenomics service, for which the user can upload up to two files:
a file of functional features (InterPro protein families, GO terms, etc.)
and/or a taxa file, each in the specified format.
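
As an illustration of the TSV input described above, the sketch below reads a tab-separated abundance table with Python's standard `csv` module. These notes do not give WHAM!'s exact column layout, so the layout here (one feature column followed by one numeric column per sample), the column names, and the function name `read_abundance_table` are all assumptions made for the example.

```python
import csv
import io

# Hypothetical example table. The real WHAM! input layout is not spelled out
# in these notes, so these column names and values are illustrative only.
EXAMPLE_TSV = (
    "Feature\tsample_A\tsample_B\tsample_C\n"
    "pathway_1\t10.0\t12.5\t0.0\n"
    "pathway_2\t3.0\t3.1\t2.9\n"
)


def read_abundance_table(handle):
    """Read a tab-separated table into (sample_names, {feature: [values]}).

    Assumes the first column names the feature and every remaining column
    holds a numeric abundance or expression level for one sample.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_names = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        table[row[0]] = [float(value) for value in row[1:]]
    return sample_names, table


samples, abundances = read_abundance_table(io.StringIO(EXAMPLE_TSV))
```

To read a real file, pass an open file handle instead of the `io.StringIO` wrapper.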
Filtering out low-variance features: features can be filtered by a
variance percentile cutoff in order to speed up differential abundance
calculations and visualizations.
Reduces Overfitting: By removing features that do not contribute
much to the prediction, it helps reduce overfitting.
Saves Computational Resources: Fewer features mean less
computational power is needed for training models.
Improves Model Performance: Removing low variance features can
improve the performance of machine learning models.
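
The variance percentile cutoff can be sketched as follows. This is a minimal Python illustration, not WHAM!'s own code (the application is written in R). The function name `filter_low_variance`, the use of the sample variance, the nearest-rank percentile rule, and the toy table are assumptions.

```python
import math
import statistics


def filter_low_variance(table, percentile):
    """Keep features whose variance is at or above a percentile cutoff.

    table      : {feature_name: [value_per_sample, ...]}, at least two
                 samples per feature
    percentile : number from 0 to 100; the cutoff is the nearest-rank
                 percentile of all feature variances (an assumed convention)
    """
    if not table:
        return {}
    variances = {name: statistics.variance(values)
                 for name, values in table.items()}
    ordered = sorted(variances.values())
    # Nearest-rank percentile: the smallest value with at least
    # `percentile` percent of the variances at or below it.
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    cutoff = ordered[rank - 1]
    return {name: values for name, values in table.items()
            if variances[name] >= cutoff}


# Toy data (hypothetical): four features measured in four samples.
example = {
    "flat": [5, 5, 5, 5],
    "low": [1, 2, 1, 2],
    "mid": [1, 5, 1, 5],
    "high": [0, 10, 0, 10],
}
kept = filter_low_variance(example, 75)
```

With a cutoff at the 75th percentile, only the two most variable features in the toy table remain, so any later per-feature test runs on half as many rows.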
Navigate to the ‘Groups’ tab: users manually separate their samples into
as many as 10 experimental groups.
Wilcoxon test p-value cutoffs
non-parametric Spearman correlation analysis
Benjamini-Hochberg correction
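
The three statistical pieces named above can be sketched in plain Python. These notes do not say which variant of each method WHAM! uses, so the code below is an assumption-laden illustration rather than the application's implementation: the Wilcoxon test is written as a two-sample rank-sum test with a normal approximation (no tie or continuity correction), Spearman's coefficient is computed as the Pearson correlation of average ranks, and all function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def rank_sum_p_value(group_a, group_b):
    """Two-sided Wilcoxon rank-sum p-value using the normal approximation."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u_stat = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_stat - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def spearman_rho(x, y):
    """Spearman correlation (undefined if either input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over its own rank and all larger ranks.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Toy usage (hypothetical data): one p-value per feature, then adjust.
p_separated = rank_sum_p_value([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
p_identical = rank_sum_p_value([1, 2, 3], [1, 2, 3])
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
rho = spearman_rho([1, 2, 3, 4, 5], [2, 4, 6, 8, 100])
```

A p-value cutoff would then be applied to the adjusted values, for example keeping features whose adjusted p-value falls below a user-chosen threshold. For small groups or heavily tied data, an exact test from a statistics library is a better choice than the normal approximation used here.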