New York State Center of Excellence in Bioinformatics & Life Sciences Biomedical Ontology in Buffalo Part I: The Gene Ontology Barry Smith and Werner Ceusters
Date posted: 21-Dec-2015


New York State Center of Excellence in Bioinformatics & Life Sciences

Biomedical Ontology in Buffalo

Part I: The Gene Ontology

Barry Smith and Werner Ceusters


Biomedical data is siloed

• Lab / pathology data

• Electronic Health Record data

• Clinical trial data

• Patient histories

• Medical imaging

• Microarray data

• Protein chip data

• Flow cytometry

• Genotype / SNP data

Biomedical data is siloed

• Data in Pittsburgh

• Data owned by Medicare

• Data owned by the NIH

• Data owned by HIV researchers

• Data owned by the Cleveland Clinic

• Data owned by regional health organizations

• Data owned by mouse biologists

• Data owned by Dr McFritz

NIH mandates for data reusability


Ontology: An antidote to silos

Department of Philosophy, 135 Park Hall, University at Buffalo, Buffalo NY 14260

promoting:

• information retrieval

• information consistency, and thus continuity and cumulation

• information integration

• reasoning


Uses of ‘ontology’ in PubMed abstracts


By far the most successful: GO (The Gene Ontology)

You’re interested in which genes control heart muscle development

17,536 results

[Figure: gene tree over all genes (14010), with axis labels "attacked", "control" and "time". Cluster labels: puparial adhesion, molting cycle, hemocyanin; defense response, immune response; response to stimulus, Toll regulated genes; JAK-STAT regulated genes; immune response, Toll regulated genes; amino acid catabolism, lipid metabolism; peptidase activity, protein catabolism, immune response.]

Microarray data shows changed expression of thousands of genes.

How will you spot the patterns?

You’re interested in which of your hospital’s patient data is relevant to understanding how genes control heart muscle development

• Lab / pathology data

• EHR data

• Clinical trial data

• Family history data

• Medical imaging

• Microarray data

• Model organism data

• Flow cytometry

• Mass spec

• Genotype / SNP data

How will you spot the patterns?

How will you find the data you need?

GO provides a controlled system of 25,000 categories for use in annotating data

• multi-species (model organism research)

• multi-disciplinary

• open source


Definitions

Gene products involved in cardiac muscle development in humans

Hierarchical view representing relations between represented types


The GO categorizations are organized in a way that provides a tool for algorithmic reasoning
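One way to read "algorithmic reasoning" here is that an annotation to a specific category also counts for every more general category above it. A minimal Python sketch of that idea, assuming a toy is_a hierarchy; the term names, gene names and helper functions are hypothetical and are not taken from GO itself:

```python
from collections import defaultdict

# Hypothetical is_a edges (child -> parents). Illustrative names, not real GO terms.
IS_A = {
    "cardiac muscle development": ["muscle development"],
    "skeletal muscle development": ["muscle development"],
    "muscle development": ["organ development"],
    "organ development": [],
}

# Hypothetical direct annotations (gene product -> categories).
ANNOTATIONS = {
    "geneA": ["cardiac muscle development"],
    "geneB": ["skeletal muscle development"],
    "geneC": ["organ development"],
}


def ancestors(term):
    """Return the term together with every term reachable via is_a."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, []))
    return seen


def genes_by_term():
    """Map each term to the genes annotated to it or to any more specific term."""
    index = defaultdict(set)
    for gene, terms in ANNOTATIONS.items():
        for term in terms:
            for general in ancestors(term):
                index[general].add(gene)
    return index


index = genes_by_term()
print(sorted(index["muscle development"]))
```

A query for the general category then returns gene products that were only ever annotated to its more specific subcategories, without anyone having recorded that fact by hand.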


$100 million invested in literature curation using GO

over 11 million annotations relating gene products described in the UniProt, Ensembl and other databases to terms in the GO

experimental results reported in 52,000 scientific journal articles manually annotated by expert biologists using GO

One standard method

Sjöblom T, et al. analyzed 13,023 genes in 11 breast and 11 colorectal cancers

using baseline functional information captured by GO for given gene product types

identified 189 genes as being mutated at significant frequency and thus as providing targets for diagnostic and therapeutic intervention.

Science. 2006 Oct 13;314(5797):268-74.

Uses of GO in studies of:

• Persistent changes in spinal cord gene expression after recovery from inflammatory hyperalgesia: a preliminary study on pain memory. PMID: 18366630

• Spinal cord transcriptional profile analysis reveals protein trafficking and RNA processing as prominent processes regulated by tactile allodynia. PMID: 17069981

• Immune system involvement in abdominal aortic aneurysms. PMID: 17634102

• Biomedical discovery acceleration, with applications to craniofacial development. PMID: 19325874


Ontology in Buffalo

Part 2: Problems of Clinical Ontologies


Source of all data

Reality!

Ultimate goal

A digital copy of the world

Requirements for this digital copy

• R1: A faithful representation of reality

• R2: … of everything that is digitally registered,

– what is generic: scientific theories

– what is specific: what individual entities exist and how they relate

• R3: … which is computable, in order to …

– … allow queries over the world’s past and present

– … make predictions (diagnostic support, early warnings …)

– … fill in gaps

– … identify mistakes

– ...

… the ultimate crystal ball


The ‘binding’ wall

How to do it right?

A cartoon of the world

“Better Information” must cover …

• EHR-EMR-ENR-…

• PHR

• Various modality-related databases

– Lab, imaging, …

• Textbooks

• Classification systems

• Terminologies

• Ontologies

Patient-specific information

Scientific “knowledge”


Key question

How to extend to clinical medicine the standard of quality of the GO and other ontologies based in biological science?

NCI Thesaurus (April 2008)


NCI Thesaurus (April 2008)

?


MeSH: some paths from top to Wolfram Syndrome

Wolfram Syndrome

All MeSH Categories

Diseases Category

Nervous System Diseases

Cranial Nerve Diseases

Optic Nerve Diseases

Optic Atrophy

Optic Atrophies, Hereditary

Neurodegenerative Diseases

Heredodegenerative Disorders, Nervous System

Eye Diseases

Eye Diseases, Hereditary

Optic Nerve Diseases

Male Urogenital Diseases

Urologic Diseases

Kidney Diseases

Diabetes Insipidus

Female Urogenital Diseases and Pregnancy Complications

Female Urogenital Diseases
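A hierarchy in which one category sits under several parents can be explored by enumerating every path from the top to that category. The Python sketch below does this for Wolfram Syndrome. The edges are an assumption, pieced together from the node labels on this slide for illustration only, and should not be read as the authoritative MeSH tree; `CHILDREN` and `paths_to` are hypothetical names.

```python
# Hypothetical parent -> children edges, assembled from the node labels on the slide.
# This is an illustrative subset, not the actual MeSH tree.
CHILDREN = {
    "Diseases Category": [
        "Nervous System Diseases",
        "Eye Diseases",
        "Male Urogenital Diseases",
    ],
    "Nervous System Diseases": ["Neurodegenerative Diseases"],
    "Neurodegenerative Diseases": ["Heredodegenerative Disorders, Nervous System"],
    "Heredodegenerative Disorders, Nervous System": ["Wolfram Syndrome"],
    "Eye Diseases": ["Eye Diseases, Hereditary"],
    "Eye Diseases, Hereditary": ["Wolfram Syndrome"],
    "Male Urogenital Diseases": ["Urologic Diseases"],
    "Urologic Diseases": ["Kidney Diseases"],
    "Kidney Diseases": ["Diabetes Insipidus"],
    "Diabetes Insipidus": ["Wolfram Syndrome"],
}


def paths_to(target, node="Diseases Category", prefix=()):
    """Return every path from `node` down to `target` as a tuple of labels."""
    path = prefix + (node,)
    if node == target:
        return [path]
    found = []
    for child in CHILDREN.get(node, []):
        found.extend(paths_to(target, child, path))
    return found


paths = paths_to("Wolfram Syndrome")
for p in paths:
    print(" > ".join(p))
```

Each printed path places the same disease under a different top-level branch, so what an entry means depends on how the links along the path are interpreted.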


What would it mean if used in the context of a patient?

[Figure: the MeSH paths to Wolfram Syndrome from the previous slide, annotated with "has" links and question marks ("???").]

Biomedical Ontology in Buffalo

Part 3: What we do

The GO is amazingly successful in overcoming silo problems, but it covers only generic biological entities of three sorts:

– cellular components

– molecular functions

– biological processes

and it does not provide representations of diseases, symptoms, …


The core of biomedical ontology in Buffalo

– extending the methodology of high quality ontologies to other domains of biology and medicine, and to EHRs and coding systems

– combining ontology with referent tracking

The Open Biomedical Ontologies (OBO) Foundry

Columns (RELATION TO TIME): CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT

Rows (GRANULARITY):

ORGAN AND ORGANISM

– Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)

– Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)

– Occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT

– Independent continuant: Cell (CL); Cellular Component (FMA, GO)

– Dependent continuant: Cellular Function (GO)

MOLECULE

– Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)

– Dependent continuant: Molecular Function (GO)

– Occurrent: Molecular Process (GO)

National Center for Biomedical Ontology (NCBO)

NIH Roadmap Center for Biomedical Computing

Collaboration of:

Stanford Biomedical Informatics Research

Mayo Clinic

University at Buffalo

National Center for Ontological Research (NCOR)

• Army Net-Centric Data Strategy Center of Excellence

– Biometrics Ontology

– Command and Control Ontology

– Universal Core Semantic Layer

Current funded biomedical ontology projects

• Protein Ontology (PRO) (NIH/NIGMS)

• Infectious Disease Ontology (IDO) (NIH/NIAID)

• Realism-Based Versioning for Biomedical Ontologies (SNOMED) (NIH/NLM)

• Ontology for Risks Against Patient Safety (RAPS) (EU)

• DSM Ontology (to support work on revision of the Diagnostic and Statistical Manual of Mental Disorders)

• Cleveland Clinic Semantic Database in Cardiothoracic Surgery


IDO Consortium

• MITRE, Mount Sinai, UT Southwestern – Influenza

• IMBB/VectorBase – Vector borne diseases (A. gambiae, A. aegypti, I. scapularis, C. pipiens, P. humanus)

• Colorado State University – Dengue Fever

• Duke University – Tuberculosis

• Cleveland Clinic – Infective Endocarditis

• University of Michigan – Brucellosis


“Better Information” must cover …

• EHR-EMR-ENR-…

• PHR

• Various modality-related databases

– Lab, imaging, …

• Textbooks

• Classification systems

• Terminologies

• Ontologies

Patient-specific information

Scientific “knowledge”


Ontologies

Keeping track of what is general (diabetes, malaria, nasal bone, nose …)


Referent tracking

Keeping track of what is particular (this particular nasal bone, this particular fracture, this particular swimming pool, this particular image …)


eyeGENE


Ontology for Risks Against Patient Safety


REMINE: RT-based adverse event analysis

IUI | Particular description | Properties

#1 | the patient who is treated | #1 member C1 since t2

#2 | #1’s treatment | #2 instance_of C3; #2 has_participant #1 since t2; #2 has_agent #3 since t2

#3 | the physician responsible for #2 | #3 member C4 since t2

#4 | #1’s arthrosis | #4 member C5 since t1

#5 | #1’s anti-inflammatory treatment | #5 part_of #2; #5 member C2 since t3

#6 | #1’s physiotherapy | #6 part_of #2

#7 | #1’s stomach | #7 member C6 since t2

#8 | #7’s structure integrity | #8 instance_of C8 since t0; #8 inheres_in #7 since t0

#9 | #1’s stomach ulcer | #9 part_of #7 since t3

#10 | coming into existence of #9 | #10 has_participant #9 at t3

#11 | change brought about by #9 | #11 has_agent #9 since t3; #11 has_participant #8 since t3; #11 instance_of C10 at t3

#12 | noticing the presence of #9 | #12 has_participant #9 at t3+x; #12 has_agent #3 at t3+x

#13 | cognitive representation in #3 about #9 | #13 is_about #9 since t3+x
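The REMINE entries above can be read as time-stamped statements about numbered particulars. As a rough sketch of how such statements might be stored and queried, the Python below transcribes a subset of them as tuples. The `Assertion` type, its field names and the `about` helper are hypothetical choices for illustration, not part of any referent tracking software.

```python
from typing import NamedTuple, Optional


class Assertion(NamedTuple):
    subject: str         # IUI of the particular the statement is about
    relation: str        # e.g. "part_of", "has_participant"
    obj: str             # another IUI, or a class code such as "C3"
    time: Optional[str]  # e.g. "since t2", "at t3"; None where the slide gives no time


# A subset of the entries from the slide, transcribed as tuples.
ASSERTIONS = [
    Assertion("#1", "member", "C1", "since t2"),
    Assertion("#2", "instance_of", "C3", None),
    Assertion("#2", "has_participant", "#1", "since t2"),
    Assertion("#2", "has_agent", "#3", "since t2"),
    Assertion("#5", "part_of", "#2", None),
    Assertion("#9", "part_of", "#7", "since t3"),
    Assertion("#10", "has_participant", "#9", "at t3"),
    Assertion("#12", "has_participant", "#9", "at t3+x"),
    Assertion("#13", "is_about", "#9", "since t3+x"),
]


def about(iui):
    """Return every assertion in which the given IUI is the subject or the object."""
    return [a for a in ASSERTIONS if iui in (a.subject, a.obj)]


for a in about("#9"):
    print(a.subject, a.relation, a.obj, a.time or "")
```

Because every statement names its particulars by identifier, everything recorded about the stomach ulcer (#9) can be collected with a single lookup, including who noticed it and when.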