Text-based Discovery in Biomedicine: The Architecture of the DAD-system

Page 1

Social Pharmacy and Pharmacoepidemiology

Lister Hill National Center for Biomedical Communications

Text-based Discovery in Biomedicine

The Architecture of the DAD-system

Marc Weeber (1,2), Henny Klein (1), Alan R. Aronson (2), Jim G. Mork (2), Lolkje T. W. de Jong-van den Berg (1), Rein Vos (1,3)

(1) Department of Social Pharmacy and Pharmacoepidemiology, Groningen University Institute for Drug Exploration, The Netherlands

(2) Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD

(3) Health Ethics and Philosophy, Faculty of Health Sciences, University of Maastricht, The Netherlands

Page 2

Introduction

• Goal: finding new biomedical knowledge by combining existing knowledge as represented in the medical literature

• Motivation: avoiding re-invention of the wheel; re-using specific knowledge outside its original domain of discovery

Page 3

Swanson

• AB: Raynaud’s disease is characterized by high blood viscosity and high platelet aggregation

• BC: Fish oil is known to reduce blood viscosity and platelet aggregation

A – B – C: A – C?

Page 4

Vos and Rikken

• Drugs instead of diet factors

• Intermediate (B) terms are adverse drug reactions

• Drug – Adverse drug reactions – Disease: The DAD-system

• Vos (1991) Drugs looking for diseases

Page 5

Existing Techniques

• Swanson & Smalheiser:
  • Single words/multi word terms
  • MEDLINE titles
  • No statistics

• Gordon & Lindsay:
  • Single words/multi word terms
  • Information Retrieval statistics
  • Replication of Swanson’s discoveries

Page 6

New Techniques

• Use of UMLS concepts

• PubMed

• MetaMap: mapping free text (MEDLINE titles and abstracts) to concepts

• Interactive web interface

Page 7

Two-step Approach

• Open discovery, generating a hypothesis: A – ? – ?

• Closed discovery, testing a hypothesis: A – ? – C

Page 8

Why UMLS Concepts?

• Use of only biomedically relevant information

• Useful transition from single word to multi word term

• Semantic information (semantic types) for filtering (e.g. select only Disease or Syndrome)
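The filtering idea on this slide can be sketched in a few lines. This is only an illustration: it assumes concepts arrive as (name, semantic type) pairs, and the function name and the sample pairings are hypothetical, not taken from the DAD-system.

```python
def filter_by_semantic_type(concepts, allowed_types):
    """Keep only the concept names whose semantic type is in allowed_types."""
    allowed = set(allowed_types)
    return [name for name, sem_type in concepts if sem_type in allowed]


# Illustrative input; the name/type pairings are assumed for the example.
concepts = [
    ("Raynaud's Disease", "Disease or Syndrome"),
    ("Blood Viscosity", "Physiologic Function"),
    ("Fish Oil", "Lipid"),
]

diseases = filter_by_semantic_type(concepts, ["Disease or Syndrome"])
print(diseases)
```

Passing a different list of semantic types to the same function gives a different filter, which is how one routine could serve several filtering steps.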

Page 9

DAD-system

[Architecture diagram. Components shown: Metathesaurus, Specialist Lexicon, Semantic Network, KS, PubMed, MetaMap]

Page 10

DAD-system

[Architecture diagram. Components shown: Metathesaurus, Specialist Lexicon, Semantic Network, KS, PubMed, MetaMap, MySQL Database, Filter, Txt2Con, Query, Show, Select]

Page 11

DAD-system

[Architecture diagram. Components shown: Metathesaurus, Specialist Lexicon, Semantic Network, KS, PubMed, MetaMap, MySQL Database, Filter, Txt2Con, Query, Show, Select]

Page 12

Open Discovery (A)

• Query (user input): raynaud’s disease

Page 13

Open Discovery (A)

• Mapping text to concept through MetaMap: Raynaud's Disease [Disease or Syndrome]

Page 14

Open Discovery (A)

• Synonym lookup: Raynaud's syndrome, Raynaud's disease/phenomenon

• Variant generation: e.g. syndrome / syndromes

Page 15

Open Discovery (A)

• PubMed query: raynaud OR raynauds

• Processing: query in titles and abstracts

• Result: 1,246 MEDLINE citations
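As a rough illustration of how the generated variants could be turned into the OR query shown on this slide, here is a small sketch. It assumes the variants have already been produced by the lexical tools; `build_or_query` is a hypothetical helper, and no PubMed call is made.

```python
def build_or_query(terms):
    """Join search terms with OR, lower-casing them and dropping duplicates in order."""
    unique_terms = []
    for term in terms:
        normalized = term.strip().lower()
        if normalized and normalized not in unique_terms:
            unique_terms.append(normalized)
    return " OR ".join(unique_terms)


# Variants for the A-concept; the capitalized duplicate is dropped.
query = build_or_query(["raynaud", "raynauds", "Raynaud"])
print(query)
```

The same helper would also cover multi-word variants such as "blood viscosity", since it treats each term as an opaque string.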

Page 16

Open Discovery (A)

• Text to concept mapping of all citations

• Sentences with Raynaud’s disease

• Result: 1,278 UMLS concepts
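The sentence-level step can be sketched as follows, assuming each sentence has already been mapped to a set of concept names (the mapping itself is MetaMap's job and is not reproduced here). The function name and the three toy sentences are illustrative only.

```python
from collections import Counter


def cooccurring_concepts(sentences, target):
    """Count concepts that share at least one sentence with the target concept.

    Each sentence is represented as the set of concepts mapped from its text.
    """
    counts = Counter()
    for concepts in sentences:
        if target in concepts:
            counts.update(c for c in concepts if c != target)
    return counts


# Toy data, invented for the example.
sentences = [
    {"Raynaud's Disease", "Blood Viscosity"},
    {"Raynaud's Disease", "Blood Viscosity", "Platelet Aggregation"},
    {"Fish Oil", "Platelet Aggregation"},
]

counts = cooccurring_concepts(sentences, "Raynaud's Disease")
print(counts.most_common())
```

The per-concept counts correspond to the kind of frequency information that a later slide reports as a range.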

Page 17

Open Discovery (A)

• Select functional/physiological concepts

• Semantic types in filter:
  Body Location or Region
  Biologic Function
  Cell Function
  Phenomenon or Process
  Physiologic Function
  Tissue

Page 18

Open Discovery (A – B)

• Result: 57 Concepts

• Frequency range: 1–18

Page 19

Open Discovery (A – B)

• Selected B-concepts:
  Plasma Viscosity Level
  Blood Viscosity
  Platelet Adhesiveness
  Platelet Aggregation
  Effects, Blood Coagulation

Page 20

Open Discovery (A – B)

• Variants:
  plasma, plasmas
  viscosity, viscous
  aggregation, aggregations, aggregating
  coagulation, coagulating

Page 21

Open Discovery (A – B)

• PubMed query: blood coagulation OR blood viscosity OR plasma viscosity OR platelet adhesiveness OR platelet aggregation

• Result: 10,611 MEDLINE citations

Page 22

Open Discovery (A – B)

• Concepts in sentences with B-concepts: 7,702

• Concepts not in Raynaud sentences: 6,747
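The step from 7,702 to 6,747 concepts is a set difference: concepts found alongside the B-concepts, minus those already found in Raynaud sentences. A minimal sketch under that reading, with hypothetical names and toy concept sets:

```python
def novel_candidates(b_linked, a_linked):
    """Concepts seen with the B-concepts but never seen with the A-concept."""
    return set(b_linked) - set(a_linked)


# Toy sets, invented for the example.
b_linked = {"Fish Oil", "Blood Viscosity", "Aspirin"}
a_linked = {"Blood Viscosity", "Aspirin"}
candidates = novel_candidates(b_linked, a_linked)
print(candidates)

# Arithmetic on the slide's own counts: how many concepts were set aside
# because they already occur in Raynaud sentences.
set_aside = 7702 - 6747
print(set_aside)
```

On the slide's figures, 955 of the 7,702 concepts are dropped because they already co-occur with the starting concept.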

Page 23

Open Discovery (A – B)

• Filter for dietary related concepts

• Semantic types in filter:
  Vitamin
  Lipid
  Element, Ion, or Isotope

Page 24

Open Discovery (A – B – C)

• Result: 206 Concepts

• Rank order on relations

• Fish oil related concepts:
  Eicosapentaenoic Acid
  Fish Oil
  Fatty Acids, Omega 3
  MAXEPA
  Omega-3 Polyunsaturated Fatty Acid
  Cod Liver Oil
  Salmon Oil
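The slide does not spell out how "rank order on relations" is computed. One plausible reading, used here purely as an assumption, is to rank each candidate C-concept by the number of distinct B-concepts it is linked to. The function name and the link data below are invented for illustration.

```python
def rank_by_b_links(c_to_b):
    """Order C-concepts by number of distinct linked B-concepts, highest first.

    Ties are broken alphabetically so the output is deterministic.
    """
    return sorted(c_to_b, key=lambda c: (-len(set(c_to_b[c])), c))


# Invented link data: which B-concepts each candidate co-occurs with.
c_to_b = {
    "Fish Oil": ["Blood Viscosity", "Platelet Aggregation"],
    "Cod Liver Oil": ["Platelet Aggregation"],
    "Eicosapentaenoic Acid": [
        "Blood Viscosity",
        "Platelet Aggregation",
        "Blood Coagulation",
    ],
}

ranking = rank_by_b_links(c_to_b)
print(ranking)
```

Other scoring choices, such as weighting by co-occurrence frequency, would fit the same interface.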

Page 25

Closed Discovery (A – C)

• A: Raynaud’s Disease

• C:
  Eicosapentaenoic Acid
  Fish Oil
  Fatty Acids, Omega 3
  MAXEPA
  Omega-3 Polyunsaturated Fatty Acid
  Cod Liver Oil
  Salmon Oil

Page 26

Closed Discovery (A – C)

• A (Raynaud’s disease): 1,246 citations, 1,278 concepts

• C (fish oil concepts): 463 citations, 1,795 concepts

• 479 common concepts
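In closed discovery the candidate linking concepts are those found on both sides, i.e. the intersection of the two concept sets. A minimal sketch with toy sets (the names and data are hypothetical):

```python
def common_concepts(a_concepts, c_concepts):
    """Concepts that occur with both the A-concept and the C-concepts."""
    return set(a_concepts) & set(c_concepts)


# Toy sets, invented for the example.
a_side = {"Blood Viscosity", "Platelet Aggregation", "Cold Temperature"}
c_side = {"Blood Viscosity", "Platelet Aggregation", "Triglycerides"}

shared = common_concepts(a_side, c_side)
print(sorted(shared))
```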

Page 27

Closed Discovery (A – C)

Functional / Physiological Filter

45 B-concepts

Page 28

Closed Discovery (A – B – C)

• Known concepts:
  Plasma Viscosity Level
  Blood Viscosity
  Platelet Adhesiveness
  Platelet Aggregation
  Effects, Blood Coagulation

• New concepts:
  Vasodilatation
  Veins, Capillaries
  Dinoprostone
  Fibrinolysis
  Deformability
  Rheology

Page 29

Juxtaposition

Page 30

Success / Failure

+ Simulation of Raynaud’s disease – fish oil and migraine – magnesium

+ Discovery of new therapeutic applications for thalidomide

- Mapping (Mg = milligram / magnesium)

- Association defined by co-occurrence

Page 31

Future

• Better semantic analysis: increase(A,B) and decrease(B,C)

• Better user interface

• More databases, e.g. finding genetic bases for diseases
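To make the increase/decrease idea concrete, the sketch below keeps only those B-concepts where the A-side and C-side relations point in opposite directions. This is one hypothetical way to use such predicates, not a description of a planned implementation; the function name and the "Vasodilatation" entry are illustrative, while the other entries follow the Raynaud and fish oil statements earlier in the talk.

```python
OPPOSITE = {"increase": "decrease", "decrease": "increase"}


def opposing_links(a_relations, c_relations):
    """Return B-concepts whose A-side and C-side predicates are opposite."""
    return sorted(
        b
        for b, predicate in a_relations.items()
        if predicate in OPPOSITE and c_relations.get(b) == OPPOSITE[predicate]
    )


# A-side: what is raised in the disease; C-side: what the substance does.
a_relations = {"Blood Viscosity": "increase", "Platelet Aggregation": "increase"}
c_relations = {
    "Blood Viscosity": "decrease",
    "Platelet Aggregation": "decrease",
    "Vasodilatation": "increase",  # illustrative entry with no A-side match
}

links = opposing_links(a_relations, c_relations)
print(links)
```

Compared with plain co-occurrence, a filter of this kind would discard pairs that are mentioned together without an opposing effect.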