Published June 26, 2019 | Version v1
Poster | Open Access

Machine-learning based spectral similarity measures to better identify different yet related compounds from large metabolomic datasets.

  • 1. Netherlands eScience Center
  • 2. School of Computing Science, University of Glasgow
  • 3. Bioinformatics Group, Department of Plant Sciences, Wageningen University

Description

High-throughput mass spectrometry is now used extensively and has become an important tool in many areas of the life sciences and medicine. Analyzing and interpreting the resulting complex mass spectral data remains challenging, in particular for mixtures containing large numbers of unidentified compounds. One key challenge in extracting useful information from such data is determining whether spectra belong to identical or similar molecules. This is typically done by computing spectral similarity scores, which are currently often based on comparing peak positions (and the intensities of matching peaks), for instance by calculating a modified cosine score (as used in GNPS molecular networking). Such measures work well for spectra of very similar compounds, but often perform poorly at finding similarities between spectra of notably different yet related compounds. We assess this using Tanimoto coefficients between the molecular fingerprints of the compounds in a large set (>10,000) of MS/MS reference spectra as a benchmark. Here we propose a number of alternative approaches for measuring spectral similarity that are based on established machine-learning algorithms, including techniques adapted from natural language processing as well as PCA and deep autoencoders. We will present several measures that outperform the modified cosine score in selecting spectra of structurally closely related molecules in datasets containing potentially unknown compounds. We further find that some of the presented measures have complementary characteristics: they can either be combined or be used to address different types of similarity. Taken together, we conclude that these novel spectral similarity measures are a promising alternative to established measures.
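To make the two quantities in this abstract more concrete, the following is a minimal Python sketch of a cosine score between two MS/MS spectra and of the Tanimoto coefficient used as the structural benchmark. It is not the poster's method. It assumes centroided spectra given as lists of (m/z, intensity) pairs, a hypothetical matching tolerance of 0.1 Da, and fingerprints given as sets of "on" bit indices. The function names `cosine_score`, `modified_cosine` and `tanimoto` are illustrative only. `modified_cosine` follows the general idea of the modified cosine score, where peaks may also match when shifted by the precursor m/z difference, and uses a greedy peak assignment. The actual GNPS implementation (peak filtering, intensity scaling, matching strategy) may differ.

```python
import math
from typing import Sequence, Set, Tuple

Peak = Tuple[float, float]  # (m/z, intensity); hypothetical representation


def _norm(peaks: Sequence[Peak]) -> float:
    """Euclidean norm of a spectrum's intensity vector."""
    return math.sqrt(sum(intensity ** 2 for _, intensity in peaks))


def cosine_score(spec1: Sequence[Peak], spec2: Sequence[Peak],
                 tolerance: float = 0.1,
                 shifts: Sequence[float] = (0.0,)) -> float:
    """Greedy cosine score between two centroided spectra.

    Peak i of spec1 and peak j of spec2 are candidate matches if
    |mz_i - (mz_j + s)| <= tolerance for some shift s in `shifts`.
    Each peak may be used at most once.
    """
    candidates = []
    for i, (mz1, int1) in enumerate(spec1):
        for j, (mz2, int2) in enumerate(spec2):
            if any(abs(mz1 - (mz2 + s)) <= tolerance for s in shifts):
                candidates.append((int1 * int2, i, j))
    # Greedy assignment: accept the largest intensity products first.
    candidates.sort(reverse=True)
    used1: Set[int] = set()
    used2: Set[int] = set()
    total = 0.0
    for product, i, j in candidates:
        if i not in used1 and j not in used2:
            used1.add(i)
            used2.add(j)
            total += product
    denom = _norm(spec1) * _norm(spec2)
    return total / denom if denom > 0 else 0.0


def modified_cosine(spec1: Sequence[Peak], precursor1: float,
                    spec2: Sequence[Peak], precursor2: float,
                    tolerance: float = 0.1) -> float:
    """Cosine score that also matches peaks shifted by the precursor
    m/z difference (the idea behind the modified cosine score)."""
    return cosine_score(spec1, spec2, tolerance,
                        shifts=(0.0, precursor1 - precursor2))


def tanimoto(fp1: Set[int], fp2: Set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints, each given as
    the set of indices of its 'on' bits."""
    union = len(fp1 | fp2)
    return len(fp1 & fp2) / union if union else 0.0


if __name__ == "__main__":
    # Toy spectra (hypothetical): spec_b mimics a +14 Da analogue of spec_a,
    # so two of its three fragments are shifted by the precursor difference.
    spec_a = [(100.0, 1.0), (150.0, 0.5), (200.0, 0.8)]
    spec_b = [(100.0, 1.0), (164.0, 0.5), (214.0, 0.8)]
    print(f"cosine:          {cosine_score(spec_a, spec_b):.3f}")  # 0.529
    print(f"modified cosine: {modified_cosine(spec_a, 250.0, spec_b, 264.0):.3f}")  # 1.000
    print(f"tanimoto:        {tanimoto({1, 4, 7, 9}, {1, 4, 8, 9}):.3f}")  # 0.600
```

In the toy example, the plain cosine only matches the unshifted peak at m/z 100 (about 0.53). Allowing the precursor shift recovers all three matches (1.0). The greedy assignment is a common simplification. An optimal one-to-one matching would need an assignment algorithm such as the Hungarian method.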

Files

Poster_2019_metabolomics_Huber_final.pdf (3.8 MB, md5:520653ddd45e6858650ce0b3ffcc2423)