Citation
Biomedical Informatics
Data ➜ Information ➜ Knowledge
BMI
Biomedical Named Entity Recognition
Ramakanth Kavuluru
NLP Seminar – 8/21/2012
What are named entities?
• The benefits of taking cholesterol lowering statin drugs outweigh the risks even among people who are likely to develop diabetes.
• Acute exposure to resveratrol inhibits AMPK activity in human skeletal muscle cells
Entity types annotated on the slide: Biologically Active Substance, Drug, Disorder, Organic Chemical, Enzyme, Cell
What are named entities?
• The benefits of taking cholesterol lowering statin drugs outweigh the risks even among people who are likely to develop diabetes.
• Acute exposure to resveratrol inhibits AMPK activity in human skeletal muscle cells
Annotations on the slide: Cholesterol lowering drugs, Drug, Biological Function
Why do we need to extract them?
• To provide effective semantic search
– Clinical trial recruitment: find all discharge summaries of patients that have a history of diabetes and obesity and have taken statins as part of their treatment.
– Literature review: find all biomedical articles that discuss the dopamine neurotransmitter in the context of depressive disorders.
Why do we need to extract them?
• To use as features in machine learning for effective text classification
• To build semantic clusters of textual documents to understand evolving themes
• To reduce noise by avoiding keywords that are not indicative of the classes or clusters
• Recently, as a first step in relation extraction and hence in knowledge discovery
A major task in text mining
• Extract information from textual data
• Use this information to solve problems
• What type of information?
– Relevant concepts: a medical condition or finding, a drug, a gene or protein, an emotion (hope, love, …)
– Relevant (binary) relations: drug TREATS a condition, protein CAUSES a disease
• What are the typical questions?
– Does a pathology report indicate a reportable case?
– Which patients satisfy the criteria for a clinical trial?
Knowledge Discovery
• VIP Peptide – increases – Catecholamine Biosynthesis
• Catecholamines – induce – β-adrenergic receptor activity
• β-adrenergic receptors – are involved in – fear conditioning
VIP Peptide – affects – fear conditioning?
In Cattle
In Rats
In Humans
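The chaining idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CONCEPT_OF` map is hypothetical (it stands in for the NER and normalization step, and it deliberately collapses "Catecholamine Biosynthesis" and "Catecholamines" into one concept), and the function name `candidate_hypotheses` is made up for this example.

```python
from collections import defaultdict

# Hypothetical mention-to-concept map. In practice this is what NER and
# concept normalization would have to provide; it is a strong simplification.
CONCEPT_OF = {
    "VIP Peptide": "vip peptide",
    "Catecholamine Biosynthesis": "catecholamine",
    "Catecholamines": "catecholamine",
    "β-adrenergic receptor activity": "beta-adrenergic receptor",
    "β-adrenergic receptors": "beta-adrenergic receptor",
    "fear conditioning": "fear conditioning",
}

# The three relations from the slide, as (subject, predicate, object) triples.
TRIPLES = [
    ("VIP Peptide", "increases", "Catecholamine Biosynthesis"),
    ("Catecholamines", "induce", "β-adrenergic receptor activity"),
    ("β-adrenergic receptors", "are involved in", "fear conditioning"),
]


def candidate_hypotheses(triples):
    """Return (start, end) concept pairs linked only through intermediate concepts."""
    graph = defaultdict(set)
    for subj, _relation, obj in triples:
        graph[CONCEPT_OF[subj]].add(CONCEPT_OF[obj])

    hypotheses = set()
    for start in list(graph):
        direct = graph[start]
        seen = set()
        stack = [(node, 1) for node in direct]
        while stack:
            node, depth = stack.pop()
            if node in seen or node == start:
                continue
            seen.add(node)
            # Only pairs with no direct relation are new hypotheses.
            if depth >= 2 and node not in direct:
                hypotheses.add((start, node))
            stack.extend((nxt, depth + 1) for nxt in graph.get(node, ()))
    return hypotheses


for start, end in sorted(candidate_hypotheses(TRIPLES)):
    print(f"{start} – affects? – {end}")
```

The chain only connects because the mentions were first mapped to shared concepts, which is why entity recognition comes before relation extraction. Each pair is a hypothesis to check, not a finding.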
Clinical NER
Concept Type Attributes
• Disorder/Symptom
• Medication
• Procedures
Present/historical/absent, Acute? Uncertain?
Present/historical/future
Why is NER Hard?
Linguistic Variation
• Derivational variation: cranial, cranium
• Inflectional variation: coughed, coughing
• Synonymy
– neurofibromin 2, merlin, NF2 protein, and schwannomin
– Addison’s disease, adrenal insufficiency, hypocortisolism, bronzed disease
– Feeding problems in newborn: “The mother said she was having trouble feeding the baby.”
Polysemy
• Merlin – both a bird and protein in UMLS
• Discharge
– Patient was prescribed codeine upon discharge
– The discharge was yellow and purulent
• Abbreviations
– APC: activated protein C, adenomatosis polyposis coli, antigen presenting cell, aerobic plate count, advanced pancreatic cancer, age period cohort, antibody producing cells, atrial premature complex
Negation
• Nearly half of all clinical concepts in dictated narratives are negated
– There is no maxillary sinus tenderness
• Implied absence without negation
– Lungs are clear upon auscultation
So,
– Rales: Absent
– Rhonchi: Absent
– Wheezing: Absent
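A minimal sketch of the two cases above, assuming a simple trigger-word heuristic rather than any particular tool. The trigger list, the token window, and the `IMPLIED_ABSENT` table are all hypothetical and far smaller than anything usable on real notes.

```python
import re

# Hypothetical, deliberately tiny trigger list.
NEGATION_TRIGGERS = ("no", "not", "without", "denies", "negative for")

# Hypothetical lookup for findings whose absence is implied by a normal exam phrase.
IMPLIED_ABSENT = {
    "lungs are clear": ["rales", "rhonchi", "wheezing"],
}


def is_negated(sentence, concept, window=5):
    """True if a trigger occurs within `window` words before the concept mention."""
    text = sentence.lower()
    pos = text.find(concept.lower())
    if pos == -1:
        return False
    preceding = re.findall(r"[a-z]+", text[:pos])[-window:]
    context = " ".join(preceding)
    return any(
        re.search(rf"\b{re.escape(trigger)}\b", context)
        for trigger in NEGATION_TRIGGERS
    )


def implied_absent(sentence):
    """Findings implied to be absent although no negation word appears."""
    text = sentence.lower()
    findings = []
    for phrase, absent in IMPLIED_ABSENT.items():
        if phrase in text:
            findings.extend(absent)
    return findings


print(is_negated("There is no maxillary sinus tenderness", "maxillary sinus tenderness"))
print(implied_absent("Lungs are clear upon auscultation"))
```

The first case is caught by a surface cue. The second needs domain knowledge, which is why implied absence is the harder problem.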
Controlled Terminologies
Controlled vocabularies or taxonomies
– Gene Ontology (gene products)
• most cited, 450 per year in PubMed
• Total of 33,000+ terms
– SNOMED CT (300K+ concepts)
– NCI Thesaurus, ICD-9/10, ICD-O-3, LOINC, MedlinePlus
– UMLS Metathesaurus (integration of 140+ vocabularies)
• 2.3 million concepts
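More on the Metathesaurus
• CUIs: concept unique identifiers, one per meaning
• LUIs: lexical (term) unique identifiers, grouping lexical variants of a string
• SUIs: string unique identifiers, one per distinct string
• AUIs: atom unique identifiers, one per occurrence of a string in a source vocabulary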
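One way to picture how the four identifier levels nest is the small data model below. It is a sketch only: the class names are made up for this example, every identifier value is a placeholder rather than a real UMLS identifier, and the assignment of strings to source vocabularies is invented.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    aui: str     # one string as it appears in one source vocabulary
    source: str


@dataclass
class StringForm:
    sui: str     # one distinct string (case and spelling sensitive)
    text: str
    atoms: list = field(default_factory=list)


@dataclass
class Term:
    lui: str     # groups strings that are lexical variants of each other
    strings: list = field(default_factory=list)


@dataclass
class Concept:
    cui: str     # one meaning, shared by all synonymous terms
    terms: list = field(default_factory=list)

    def all_strings(self):
        return [s.text for term in self.terms for s in term.strings]

    def atom_count(self):
        return sum(len(s.atoms) for term in self.terms for s in term.strings)


# Placeholder identifiers and invented source assignments, for illustration only.
nf2 = Concept("C_DEMO_1", [
    Term("L_DEMO_1", [
        StringForm("S_DEMO_1", "Neurofibromin 2", [Atom("A_DEMO_1", "MSH"), Atom("A_DEMO_2", "NCI")]),
        StringForm("S_DEMO_2", "neurofibromin 2", [Atom("A_DEMO_3", "NCI")]),
    ]),
    Term("L_DEMO_2", [
        StringForm("S_DEMO_3", "merlin", [Atom("A_DEMO_4", "MSH")]),
    ]),
])

print(nf2.all_strings())
print(nf2.atom_count())
```

Reading from the bottom up: atoms roll up into strings, strings into terms, and terms into a single concept, so synonyms such as "merlin" and "neurofibromin 2" end up under one CUI.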
Semantic Types and Relations
• NLM Semantic Network, the type system behind the UMLS Metathesaurus
– Semantic Types (135)
• Semantic Groups (15)
– Semantic Relations (54)
• SPECIALIST Lexicon
– Malaria, malarial
– Hyperplasia, hyperplastic
How do we extract named entities?
MetaMap from NLM
Identify phrases: Use SPECIALIST parser
Map to CUIs: Use SPECIALIST Lexicon, Metathesaurus and Semantic Network
Output of syntactic analysis
• Syntactic analysis of “ocular complications of myasthenia gravis”
– Part-of-speech tags: ocular (adj), complications (noun), of (prep), myasthenia (noun), gravis (noun)
– Gives noun phrases (NP): “ocular complications” and “myasthenia gravis”
– Prepositions are ignored
– In a given NP, you have a head and modifiers
• Ocular (mod) and complications (head)
• How about “male pattern baldness”?
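The phrase-splitting step can be sketched as follows, assuming the part-of-speech tags are already available. This is not the SPECIALIST parser. The function names are hypothetical, and treating the rightmost word as the head is an assumption that fits “ocular complications” and “male pattern baldness” but does not hold for every phrase.

```python
def noun_phrases(tagged_tokens):
    """Split a tagged token sequence into phrases, dropping prepositions."""
    phrases, current = [], []
    for word, tag in tagged_tokens:
        if tag == "prep":
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(word)
    if current:
        phrases.append(current)
    return phrases


def head_and_modifiers(phrase_words):
    """Assumption: the rightmost word is the head, everything before it modifies it."""
    return phrase_words[-1], phrase_words[:-1]


tagged = [
    ("ocular", "adj"),
    ("complications", "noun"),
    ("of", "prep"),
    ("myasthenia", "noun"),
    ("gravis", "noun"),
]

phrases = noun_phrases(tagged)
print(phrases)
print(head_and_modifiers(phrases[0]))
print(head_and_modifiers(["male", "pattern", "baldness"]))
```

With more than one modifier, as in “male pattern baldness”, the head is still a single word, but how the modifiers group with each other is left open by this sketch.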
Variant Generation
Candidate identification
• Look for all variants in Metathesaurus strings and identify those candidate concepts (CUIs) that contain at least one variant as a substring
• Example: for “ocular complication”, obtain all Metathesaurus strings that contain any of the following as substrings
– Optic complication
– Eyes complication
– Ophthalmic complicated
– ….
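A toy version of this lookup is sketched below. The variant table and the miniature "Metathesaurus" are invented for illustration, the concept identifiers are placeholders, and combining word-level variants into phrase-level variants follows the slide's example rather than any documented MetaMap internals.

```python
from itertools import product

# Hypothetical word-level variants; a real system would generate these
# from a lexicon instead of a hand-written table.
VARIANTS = {
    "ocular": ["ocular", "optic", "eyes", "ophthalmic"],
    "complication": ["complication", "complications", "complicated"],
}

# Invented concept strings with placeholder identifiers.
TOY_METATHESAURUS = {
    "X0000001": ["Ophthalmic complication", "Eye complication"],
    "X0000002": ["Optic complication of surgery"],
    "X0000003": ["Myasthenia gravis"],
}


def phrase_variants(phrase):
    """All combinations of per-word variants, e.g. 'optic complication'."""
    per_word = [VARIANTS.get(word, [word]) for word in phrase.lower().split()]
    return {" ".join(combo) for combo in product(*per_word)}


def candidate_cuis(phrase):
    """Concepts with at least one string containing a phrase variant as a substring."""
    variants = phrase_variants(phrase)
    hits = set()
    for cui, strings in TOY_METATHESAURUS.items():
        for string in strings:
            if any(variant in string.lower() for variant in variants):
                hits.add(cui)
    return sorted(hits)


print(candidate_cuis("ocular complication"))
```

Substring matching is generous on purpose: it over-generates candidates and leaves the job of picking the best one to the later evaluation step.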
Mapping and Evaluation
• So now we have a bunch of candidate CUIs based on the presence of variants of the given phrase in Metathesaurus strings. How do we select the best candidate?
• Use several measures to compute a rank
– Centrality (involvement of the head)
– Variation (average of inverse distance scores)
– Coverage
– Cohesiveness
Final Score
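As a rough sketch of how the four measures might be combined, assume each is already a value between 0 and 1 and use the weighting given in Aronson's description of the MetaMap program: coverage and cohesiveness count twice as much as centrality and variation, and the result is scaled to 0–1000. The function name and the example values are hypothetical.

```python
def final_score(centrality, variation, coverage, cohesiveness):
    """Weighted average of the four measures, scaled to the range 0..1000.

    Assumption: all four inputs are already normalized to [0, 1].
    """
    for value in (centrality, variation, coverage, cohesiveness):
        if not 0 <= value <= 1:
            raise ValueError("each measure must lie between 0 and 1")
    weighted = (centrality + variation + 2 * (coverage + cohesiveness)) / 6
    return round(1000 * weighted)


# Hypothetical measure values for two candidates of the same phrase.
candidates = {
    "candidate A": final_score(1, 1, 1, 1),
    "candidate B": final_score(1, 1, 0.5, 0.5),
}
best = max(candidates, key=candidates.get)
print(candidates, best)
```

Under this weighting, a candidate that involves the head but covers only half of the phrase loses a third of the maximum score, so partial matches rank clearly below complete ones.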
MetaMap Options
• Types of variants: include or exclude derivational variants
• Word sense disambiguation
– Discharge (bodily secretion vs. releasing the patient)
• Concept gaps
– “Obstructive apnea” mapping to “obstructive sleep apnea” or “obstructive neonatal apnea”
• Term processing
– Process the input string as a single concept, that is, don’t split it into noun phrases
Output options
• Human readable format
• XML format
• Restrictions based on certain vocabularies: consider only ICD-9
• Restrictions based on certain types: consider only pharmacological substances (i.e., drugs)
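The two kinds of restriction can be pictured as a filter over mapped concepts. The sketch below does not call MetaMap. The candidate records, their placeholder identifiers, and the `restrict` function are all hypothetical. The semantic type abbreviations (`phsu` for Pharmacologic Substance, `dsyn` for Disease or Syndrome, `hlca` for Health Care Activity) and the source names follow UMLS conventions, but their assignment to these records is illustrative.

```python
# Hypothetical mapping results with placeholder identifiers.
candidates = [
    {"cui": "X0000010", "name": "Codeine",
     "semtypes": {"phsu", "orch"}, "sources": {"MSH", "RXNORM"}},
    {"cui": "X0000011", "name": "Patient Discharge",
     "semtypes": {"hlca"}, "sources": {"MSH"}},
    {"cui": "X0000012", "name": "Diabetes Mellitus",
     "semtypes": {"dsyn"}, "sources": {"ICD9CM", "MSH"}},
]


def restrict(records, semtypes=None, sources=None):
    """Keep records that share at least one allowed semantic type and source."""
    kept = []
    for record in records:
        if semtypes is not None and not (record["semtypes"] & set(semtypes)):
            continue
        if sources is not None and not (record["sources"] & set(sources)):
            continue
        kept.append(record)
    return kept


# Only drugs, then only concepts present in ICD-9.
print([r["name"] for r in restrict(candidates, semtypes={"phsu"})])
print([r["name"] for r in restrict(candidates, sources={"ICD9CM"})])
```

Restricting by type or vocabulary trades recall for precision, which helps when only one kind of entity matters for the task.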
DEMO TIME: Daniel Harris
References
• An Overview of MetaMap: Historical Perspective and Recent Advances, Alan Aronson and Francois Lang
• Effective Mapping of Biomedical Text to the UMLS Metathesaurus: The MetaMap Program, Alan Aronson
• Comparison of LVG and MetaMap Functionality, Alan Aronson
• Lexical, Terminological, and Ontological Resources for Biological Text Mining, Olivier Bodenreider