Global Phenotypic Data Sharing Standards to Maximize Diagnostics and Mechanism Discovery
Melissa Haendel, PhD @ontowonka
Prevailing clinical genomic pipelines leverage only a tiny fraction of the available data
PATIENT EXOME / GENOME
PATIENT CLINICAL PHENOTYPES
PUBLIC GENOMIC DATA
PUBLIC CLINICAL PHENOTYPE, DISEASE DATA
POSSIBLE DISEASES
DIAGNOSIS & TREATMENT
PATIENT ENVIRONMENT
PUBLIC ENVIRONMENT, DISEASE DATA
PATIENT OMICS PHENOTYPES
PUBLIC OMICS PHENOTYPES, CORRELATIONS
Under-utilized data
Genes + Environment = Phenotypes
Computable encodings are essential
Base pairs
Variant notation (e.g., HGVS)
SNOMED-CT
Medical procedure coding
Environment Ontology
The Human Phenotype Ontology
Hyposmia
Abnormality of globe location
eyeball of camera-type eye
sensory perception of smell
Abnormal eye morphology
Motor neuron atrophy
Deeply set eyes
motor neuron CL
34,571 annotations in 22 species
157,534 phenotype annotations
2,150 phenotype annotations
• 11,813 phenotype terms
• 127,125 rare disease-phenotype annotations
• 136,268 common disease-phenotype annotations
http://bit.ly/hpo-paper
Existing clinical vocabularies don't adequately cover phenotypic descriptions
Winnenburg and Bodenreider, 2014
[Bar chart: percent coverage (0 to 100) by vocabulary: HPO, UMLS, SNOMED CT, CHV, MedDRA, MeSH, NCIT, ICD10, OMIM]
=> HPO is now in the UMLS
monarchinitiative.org
Why model organisms matter to patients
Model data can provide up to 80% phenotypic coverage of the human coding genome
Fuzzy phenotype matching for diagnosis
Deep phenotyping and "fuzzy" matching algorithms improve diagnostics
Bone et al. Computational evaluation of exome sequence data using human and model organism phenotypes improves diagnostic efficiency. Genetics in Medicine (2015). doi:10.1038/gim.2015.137
Phenotypic profile
Genes
Heterozygous, missense mutation STIM-1
Heterozygous, missense mutation STIM-1
Stim1Sax/Sax
4.9% of exomes with dual molecular diagnoses, differentiated with deep phenotyping
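One way to picture "fuzzy" matching is that two phenotype terms need not be identical to count as similar, as long as they share ancestors in the ontology. The Python sketch below is only an illustration under stated assumptions: the is-a hierarchy is a toy built from a few labels on the HPO slide, the labels marked hypothetical are not in the slides, and Jaccard overlap of ancestor sets stands in for the similarity measure. It is not the algorithm used by Bone et al.

```python
# Toy is-a edges (child -> parents). This hierarchy is an illustrative
# assumption, not the real HPO graph.
IS_A = {
    "Deeply set eyes": ["Abnormality of globe location"],
    "Proptosis": ["Abnormality of globe location"],      # hypothetical sibling term
    "Abnormality of globe location": ["Abnormal eye morphology"],
    "Abnormal eye morphology": ["Phenotypic abnormality"],
    "Hyposmia": ["Abnormality of smell"],                 # hypothetical parent label
    "Abnormality of smell": ["Phenotypic abnormality"],
    "Phenotypic abnormality": [],                         # hypothetical root
}


def ancestors(term):
    """Return the term itself plus every term reachable via is-a edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, []))
    return seen


def term_similarity(a, b):
    """Jaccard overlap of the two ancestor sets (1.0 means identical terms)."""
    set_a, set_b = ancestors(a), ancestors(b)
    return len(set_a & set_b) / len(set_a | set_b)


def profile_score(patient_terms, candidate_terms):
    """Average, over patient terms, of the best match in the candidate profile."""
    best = [max(term_similarity(p, c) for c in candidate_terms) for p in patient_terms]
    return sum(best) / len(best)


if __name__ == "__main__":
    print(term_similarity("Deeply set eyes", "Proptosis"))
    print(profile_score(["Deeply set eyes", "Hyposmia"], ["Proptosis", "Hyposmia"]))
```

In this toy, an exact-match comparison would score "Deeply set eyes" against "Proptosis" as zero, while the ancestor-based score gives partial credit because both sit under the same globe-location branch.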
Matchmaker Exchange for patients, diseases, and model organisms to aid diagnosis and mechanistic discovery
www.monarchinitiative.org
http://bit.ly/Monarch-MME
Goal: Get clinical sites & public databases to provide standardized phenotype data
Journals are now requiring HPO terms
Robinson, P. N., Mungall, C. J., & Haendel, M. (2015). Capturing phenotypes for precision medicine. Molecular Case Studies, 1(1), a000372. doi:10.1101/mcs.a000372
HPO language translations
We need your help! http://bit.ly/hpo-translations
Translation of labels, synonyms, and text definitions
Italian, Spanish, Russian, French, German, English layperson, Japanese, Chinese
100%, 11%, 12%, 100%, 19%, 19%, near 100%, 20%
How much phenotyping is enough?
Enlarged ears (2)
Dark hair (6)
Female (4)
Male (4)
Blue skin (1)
Pointy ears (1)
Hair absent on head (1)
Horns present (1)
Hair present on head (7)
Enlarged lip (2)
Increased skin pigmentation (3)
bit.ly/annotationsufficiency
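The counts on this slide lend themselves to a small worked example: a phenotype seen in few individuals is more informative than one seen in almost all of them. The Python sketch below scores each phenotype by information content, -log2(count / cohort size). The cohort size of 8 is an assumption inferred from Female (4) + Male (4), and information content is used here as one common way to express rarity; it is not presented as the scoring method behind the annotation sufficiency link.

```python
import math

# Counts copied from the slide. COHORT_SIZE is inferred (4 female + 4 male),
# not stated in the source.
COHORT_SIZE = 8
COUNTS = {
    "Enlarged ears": 2,
    "Dark hair": 6,
    "Female": 4,
    "Male": 4,
    "Blue skin": 1,
    "Pointy ears": 1,
    "Hair absent on head": 1,
    "Horns present": 1,
    "Hair present on head": 7,
    "Enlarged lip": 2,
    "Increased skin pigmentation": 3,
}


def information_content(count, total=COHORT_SIZE):
    """Bits of information carried by a phenotype seen in `count` of `total` individuals."""
    return -math.log2(count / total)


# Most informative (rarest) phenotypes first; ties keep the slide's order.
RANKED = sorted(COUNTS, key=lambda name: information_content(COUNTS[name]), reverse=True)

if __name__ == "__main__":
    for name in RANKED:
        print(f"{name:30s} {information_content(COUNTS[name]):.2f} bits")
```

Under these assumptions, "Blue skin" carries the most information and "Hair present on head" the least, which is why recording the former narrows down a profile far more than recording the latter.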
Genes + Environment = Phenotypes
Biology central dogma
Standards for exchanging data must be up to these challenges.
Genes Environment Phenotypes
Genes: VCF, GFF, BED
Phenotypes: PXF
Standard exchange mechanisms exist for genes … but for phenotypes? Environment?
Introducing PhenoPackets
A packet of phenotype data to be used anywhere, written by anyone
http://phenopackets.org
What does a phenopacket look like?
• Alacrima
• Sleep Apnea
• Microcephaly
phenotype_profile:
  - entity: "patient16"
    phenotype:
      types:
        - id: "HP:0000522"
          label: "Alacrima"
      onset:
        description: "at birth"
        types:
          - id: "HP:0003577"
            label: "Congenital onset"
    evidence:
      - types:
          - id: "ECO:0000033"
            label: "Traceable Author Statement"
        source:
          - id: "PMID:"
• Clinical labs
• Public databases
• Journals
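To show what "computable" means for a record like this, the Python sketch below builds the same nested structure as the slide and pulls out every ontology identifier in it. It is a minimal illustration using only the standard library, not a PhenoPackets reference implementation: the helper names `make_profile_entry` and `collect_ids` are hypothetical, the output is JSON rather than YAML, and the `PMID:` value is left truncated exactly as on the slide.

```python
import json


def make_profile_entry(entity, term_id, label, onset_id, onset_label, onset_text, source_id):
    """Build one phenotype_profile entry with the fields shown on the slide."""
    return {
        "entity": entity,
        "phenotype": {
            "types": [{"id": term_id, "label": label}],
            "onset": {
                "description": onset_text,
                "types": [{"id": onset_id, "label": onset_label}],
            },
        },
        "evidence": [
            {
                "types": [{"id": "ECO:0000033", "label": "Traceable Author Statement"}],
                "source": [{"id": source_id}],
            }
        ],
    }


def collect_ids(node):
    """Walk nested dicts and lists, returning every value stored under an "id" key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "id":
                found.append(value)
            else:
                found.extend(collect_ids(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_ids(item))
    return found


PACKET = {
    "phenotype_profile": [
        make_profile_entry(
            entity="patient16",
            term_id="HP:0000522",
            label="Alacrima",
            onset_id="HP:0003577",
            onset_label="Congenital onset",
            onset_text="at birth",
            source_id="PMID:",  # truncated on the slide; left as-is
        )
    ]
}

if __name__ == "__main__":
    print(json.dumps(PACKET, indent=2))
    print(collect_ids(PACKET))
```

Because every phenotype, onset, and evidence code is an identifier rather than free text, a consumer can index or compare records without parsing prose.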
Layperson HPO + Phenopackets
• Dry eyes
• Stops breathing during sleep
• Small head
phenotype_profile:
  - entity: "Grace"
    phenotype:
      types:
        - id: "HP:0000522"
          label: "Alacrima"
      onset:
        description: "at birth"
        types:
          - id: "HP:0003577"
            label: "Congenital onset"
    evidence:
      - types:
          - id: "ECO:0000033"
            label: "Traceable Author Statement"
        source:
          - id: "https://twitter.com/examplepatient/status/123456789"
• Patient registries
• Social media
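The step from patient wording to clinical terms can be pictured as a synonym lookup. The Python sketch below is a toy under stated assumptions: the three pairs are taken by matching the bullets on this slide with those on the previous one, only the identifier for Alacrima appears in the source, and `to_clinical` is a hypothetical helper. The actual layperson synonyms are maintained in the HPO itself.

```python
# Lay phrase -> clinical label, paired from the bullets on the two slides.
# Only Alacrima's identifier (HP:0000522) appears in the source, so no other
# identifiers are filled in here.
LAY_TO_CLINICAL = {
    "dry eyes": "Alacrima",
    "stops breathing during sleep": "Sleep Apnea",
    "small head": "Microcephaly",
}


def to_clinical(lay_terms):
    """Split lay phrases into (mapped clinical labels, phrases with no known mapping)."""
    mapped, unmapped = [], []
    for phrase in lay_terms:
        key = phrase.strip().lower()
        if key in LAY_TO_CLINICAL:
            mapped.append(LAY_TO_CLINICAL[key])
        else:
            unmapped.append(phrase)
    return mapped, unmapped


if __name__ == "__main__":
    print(to_clinical(["Dry eyes", "Small head", "Always tired"]))
```

Phrases that do not map are returned separately rather than dropped, so a registry or curator could review them later.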
Standards are vital to realize a mechanistic classification of disease
www.monarchinitiative.org
Leadership: Melissa Haendel, Chris Mungall, Peter Robinson,
Tudor Groza, Damian Smedley, Sebastian Köhler, Julie McMurry
Funding: NIH Office of Director: 2R24OD011883; NHGRI UDP: HHSN268201300036C,
HHSN268201400093P;


Editor's Notes

  • #4 The classic G+E=P. But the = has a lot that can be applied to aid the linking.
  • #5 Represent organism as a biological subject. Represent diseases/genotypes as collections of nodes in the graph. Interoperable with other bioinformatics resources and leverage modern semantic standards.
  • #7 Data from mouse, rat, zebrafish, worm, fruit fly. Human: OMIM, ClinVar. Orthology via PANTHER v9.
  • #9 Example showing how adding fuzzy phenotype matching improves disease diagnosis beyond what sequence-based methods achieve alone.
  • #12 Translation teams at: https://github.com/Human-Phenotype-Ontology/HPO-translations/blob/master/README.md Contact: [email protected]
  • #13 Knowing the normal distribution and clustering of phenotypes tells us that blue skin is rare and can reliably distinguish between phenotype profiles. Likewise, if the first phenotype entered is enlarged lip, the next one to ask about would be enlarged ears. The combination of three non-unique phenotypes offers a perfect match.
  • #14 The classic G+E=P. But the = has a lot that can be applied to aid the linking.
    G-P or D (disease): causes, contributes to, is risk factor for, protects against, correlates with, is marker for, modulates, involved in, increases susceptibility to
    G-G (kind of): regulates, negatively regulates (inhibits), positively regulates (activates), directly regulates, interacts with, co-localizes with, co-expressed with
    P/D - P/D: part of, results in, co-occurs with, correlates with, hallmark of (P->D)
    E-P: contributes to (E->P), influences (E->P), exacerbates (E->P), manifest in (P->E)
    G-E (kind of): expressed in, expressed during, contains, inactivated by
  • #15 The classic G+E=P. But the = has a lot that can be applied to aid the linking.
  • #19 This figure is adapted from National Research Council (U.S.), Committee on A Framework for Developing a New Taxonomy of Disease. Toward Precision Medicine: Building a Knowledge Network for Biomedical Research and a New Taxonomy of Disease. Washington, D.C.: National Academies Press, 2011. xiii, 128 p. http://www.nap.edu/catalog/13284/toward-precision-medicine-building-a-knowledge-network-for-biomedical-research Figure 3.1 (page 52): Building a biomedical Knowledge Network for basic discovery and Medicine.
  • #20 Fully translational – from bench to bedside – group of stakeholders, contributors, and partners