An Introduction to Clinical Natural Language Processing

Leonard D’Avolio, Dina Demner-Fushman, Wendy W. Chapman

Page 2:

Questions addressed in this ½ day tutorial

• What is natural language processing (NLP)?
• Why does it matter?
• How is it being used?
• What are the basic approaches to it?
• What considerations are there in using it?
• How should you evaluate it?
• Where is the field today?
• Where is it headed?
• How can I learn more?

Page 3:

Format

• Focus on clinical NLP
• Some discussion of literature & phenotyping
• 70% basic, 30% intermediate
• A lot of material covered at a high level
• PLEASE interrupt with questions
• Planned 15 minute break
• Don’t forget your survey
• Part 2 in Jefferson East

Page 4:

Outline

1. What is NLP and how is it used in medicine? (Dina)
2. Goals and challenges of clinical NLP (Wendy)
3. The methods of NLP (Leonard)
4. Annotation & evaluation (Dina)
5. Implementation considerations (Wendy)
6. Current state, future progress, available resources (Leonard)

Page 5:

What is NLP and how is it being used in medicine?
Dina Demner-Fushman

Page 6:

Why natural language processing?

• Increasing amounts of biomedical literature
  – Extracting facts, relations, events into knowledge repositories (text mining)
  – Model organism database curation
  – Question answering (TREC Genomics track)
  – Literature based discovery
• Increasing demands for use of EMR data
  – Phenotyping for genomic-related analysis
  – Linking evidence for evidence-based medicine
  – Biosurveillance
  – Quality measures
• Majority of EMR data is free text

Page 7:

What is natural language processing?

Electronic Medical Records, MEDLINE Articles / Abstracts → Natural Language Processing (Classify, Extract, Summarize) → Structured Data (machine interpretable)

Page 8:

Examples of Uses of Clinical NLP
• Classify: classify a chief complaint into a syndrome category, e.g. “SOB/cough” = Respiratory
• Extract
• Summarize

BioNLP Examples
• Classify: triage of articles likely to have experimental evidence; find evidence to assign top-level GO terms
• Extract
• Summarize
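The chief-complaint example on this slide can be sketched as a keyword lookup. This is only an illustration: the `SYNDROME_KEYWORDS` table and the `classify_chief_complaint` function are hypothetical names, and the keyword lists are made up rather than taken from any deployed biosurveillance system.

```python
import re

# Hypothetical keyword table; a real system would use a curated lexicon.
SYNDROME_KEYWORDS = {
    "Respiratory": {"sob", "cough", "wheezing", "dyspnea"},
    "Gastrointestinal": {"vomiting", "diarrhea", "nausea"},
}

def classify_chief_complaint(text):
    """Return the syndrome whose keywords match the most tokens, or 'Other'."""
    tokens = set(re.split(r"[^a-z]+", text.lower())) - {""}
    best, best_hits = "Other", 0
    for syndrome, keywords in SYNDROME_KEYWORDS.items():
        hits = len(tokens & keywords)
        if hits > best_hits:
            best, best_hits = syndrome, hits
    return best

print(classify_chief_complaint("SOB/cough"))  # Respiratory
```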

Page 9:

Examples of Uses of Clinical NLP
• Classify
• Extract: # of lymph nodes removed during colorectal cancer surgery
• Summarize

BioNLP Examples
• Classify
• Extract: bio-molecular events, e.g. phosphorylation of TRAF2 -> (Type:Phosphorylation, Theme:TRAF2)
• Summarize

Page 10:

Examples of Uses of Clinical NLP
• Classify
• Extract
• Summarize: from a H&P note, list chronic conditions; summarize family history of prostate cancer

BioNLP Examples
• Classify
• Extract
• Summarize: summarize full text documents; Gene Reference into Function (GeneRIF)

Page 11:

Biomedical Usage

Page 12:

Goals and Challenges of Clinical NLP
Wendy Chapman

 

Page 13:

Detect Nosocomial Infections
Antibiotic Assistant* (LDS Hospital)

Inputs: temperature; white blood cell count; infiltrate compatible with pneumonia (from chest x-ray report); . . .

1) Alert physician: patient might need antiinfective therapy
2) Suggest type and dose of antibiotic
  – allergies
  – insurance
  – age
  – renal function

* Evans RS, et al. N Engl J Med 1998

Page 14:

Phenotyping
Identify symptoms that co-occur with lung cancer

ED Report → NLP System → Feature 1, Feature 2, …, Feature n → Classifier → Predictive of Lung Cancer / Not

Page 15:

Two Simple NLP Tasks

1. Find all relevant phrases in ED Report
2. Map individual phrases to standard features

Your Task

• Highlight every instance of features in sample report
• Mark most specific instance
  – E.g., “chest pain” preferred over “pain”
• Do not mark time, negation, or uncertainty
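The two tasks above (find phrases, then map them to standard features, preferring the most specific phrase) can be sketched with a longest-match dictionary lookup. The `LEXICON` entries and the `find_features` name are hypothetical; they stand in for whatever controlled vocabulary a real project would use.

```python
import re

# Hypothetical phrase-to-feature lexicon, for illustration only.
LEXICON = {
    "chest pain": "Chest pain",
    "pain": "Pain",
    "productive cough": "Productive cough",
    "cough": "Cough",
    "short of breath": "Dyspnea",
}

def find_features(report):
    """Return (phrase, feature) pairs, preferring the longest phrase at each position."""
    # Longest phrases first, so "chest pain" wins over "pain".
    phrases = sorted(LEXICON, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, phrases)) + r")\b", re.IGNORECASE
    )
    return [(m.group(1), LEXICON[m.group(1).lower()]) for m in pattern.finditer(report)]

print(find_features("Patient is short of breath with chest pain and a cough."))
```

Note that this sketch deliberately ignores time, negation, and uncertainty, as the exercise instructs.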

Page 16:

Find Relevant Features in ED Report

Productive cough
Dyspnea
Sinusitis
Pneumonia
Wheezing
Tachypnea
Fever
Rales
Cervical adenopathy

Possible values: acute, historical, absent

Page 17:

How did you do?

Page 18:

Why is NLP Difficult?

• Named entity recognition
  – Linguistic variation
  – Polysemy
  – Finding validation
  – Implication
• Contextual attribute assignment
  – Negation
  – Uncertainty
  – Temporality
• Discourse processing
  – Report structure
  – Coreference

Page 19:

Linguistic Variation
Different Words with the Same Meaning

Derivation: mediastinal = mediastinum
Inflection: opacity = opacities; cough = coughed
Synonymy:
  – Addison’s Disease: Addison melanoderma, adrenal insufficiency, adrenocortical insufficiency, asthenia pigmentosa, bronzed disease, melasma addisonii, …
  – Chest wall tenderness: chest wall did demonstrate some slight tenderness when the patient had pressure applied to the right side of the thoracic cage

Page 20:

Polysemy
One Word With Multiple Meanings

General polysemy:
  – Patient was prescribed codeine upon discharge
  – The discharge was yellow and purulent
Acronyms and Abbreviations:
  – APC: activated protein c, adenomatosis polyposis coli, adenomatous polyposis coli, antigen presenting cell, aerobic plate count, advanced pancreatic cancer, age period cohort, alfalfa protein concentrated, allophycocyanin, anaphase promoting complex, anoxic preconditioning, anterior piriform cortex, antibody producing cells, atrial premature complex, …

Page 21:

Negation
Approximately half of all clinical concepts in dictated reports are negated*

Explicit negation: “The mediastinum is not widened”
  – Mediastinal widening: absent
Implied absence without negation: “Lungs are clear upon auscultation”
  – Rales/crackles: absent
  – Rhonchi: absent
  – Wheezing: absent

*Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. Evaluation of negation phrases in narrative clinical reports. Proc AMIA Symp. 2001:105-9.
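Explicit negation of the kind shown above is often approached with trigger phrases that precede a finding. The sketch below is a simplified illustration in that spirit, not the published NegEx algorithm: the trigger list is a small made-up subset, and `is_negated` and its `window` parameter are hypothetical names.

```python
import re

# A few pre-finding negation triggers; an illustrative subset only.
NEGATION_TRIGGERS = ["no", "not", "denies", "without", "no evidence of"]

def is_negated(sentence, finding, window=5):
    """True if a trigger occurs within `window` tokens before the finding."""
    s = sentence.lower()
    idx = s.find(finding.lower())
    if idx < 0:
        return False
    preceding = re.findall(r"[a-z]+", s[:idx])[-window:]
    context = " " + " ".join(preceding) + " "
    return any(" " + trigger + " " in context for trigger in NEGATION_TRIGGERS)

print(is_negated("The mediastinum is not widened", "widened"))  # True
```

Implied absence ("Lungs are clear upon auscultation") contains no trigger at all, so a rule like this cannot detect it; that case needs inference beyond pattern matching.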

Page 22:

Uncertainty

Unsure: treated for a presumptive sinusitis

Reasoning: It was felt that the patient probably had a cerebrovascular accident involving the left side of the brain. Other differentials entertained were perhaps seizure and the patient being post-ictal when he was found, although this consideration is less likely

Reason for exam: R/O out pneumonia.

Page 23:

Temporality
Clinical reports tell a story

Past medical history: History of CHF presenting with shortness of left-sided chest pain.
Hypothetical or non-specific mentions: He should return for fever or increased shortness of breath.
Temporal course of disease: Patient presents with chest pain … After administration of nitroglycerin, the chest pain resolved.

Page 24:

Finding Validation
Mention of a finding in the text does not guarantee the patient has the finding
  – She received her influenza vaccine
  – His temperature was taken in the ED

Some findings require values
  – Fever: Temperature 38.5C
  – Oxygen desaturation: Oxygen saturation low; Oxygen saturation 85% on room air
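The fever example can be sketched as value extraction followed by a threshold check. The 38.0 °C cutoff, the pattern, and the `fever_status` name are assumptions made for illustration; clinical definitions of fever vary.

```python
import re

FEVER_THRESHOLD_C = 38.0  # assumed cutoff for illustration; definitions vary

# "temperature", up to 20 non-digit characters, a number, then a C or F unit.
TEMP_PATTERN = re.compile(
    r"temperature\D{0,20}?(\d{2,3}(?:\.\d+)?)\s*°?\s*([CF])\b", re.IGNORECASE
)

def fever_status(text):
    """Return 'present', 'absent', or None when no temperature value is stated."""
    m = TEMP_PATTERN.search(text)
    if not m:
        return None
    value, unit = float(m.group(1)), m.group(2).upper()
    celsius = value if unit == "C" else (value - 32) * 5 / 9
    return "present" if celsius >= FEVER_THRESHOLD_C else "absent"

print(fever_status("Temperature 38.5C"))                    # present
print(fever_status("His temperature was taken in the ED"))  # None
```

The second call shows the slide's point: the word "temperature" alone says nothing about whether the patient has a fever.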

Page 25:

Implication

Audience for patient reports is physicians
  – Lay people less accurate at determining if a chest x-ray report shows evidence of pneumonia
  – Pneumonia not mentioned in 2/3 of positive reports

Sentence level inference: “There were hazy opacities in the lower lobes” → Localized infiltrate
Report level inference: Localized infiltrates → Probable pneumonia

Page 26:

Report Structure

Anatomic location sometimes in section header
  – NECK: no adenopathy.
Some sections carry more weight
  – IMPRESSION: atelectasis
Some reports contain pasted text difficult to process
  – Cardiovascular: [ ] Angina [ ] MI [x ] HTN [ ] CHF [ ] PVD [ ] DVT [ ] Arrhythmias [ ] Previous PTCA [ ] Previous Cardiac Surgery [ ] Negative - Denies CV problems
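Pasted checklist text like the cardiovascular line above defeats sentence-oriented processing, but it is regular enough for a small dedicated parser. This is a sketch assuming the bracket layout shown on the slide; `parse_checklist` is a hypothetical helper name.

```python
import re

# "[ ]" or "[x ]" followed by a label that runs up to the next "[".
CHECKBOX = re.compile(r"\[\s*(x?)\s*\]\s*([^\[]+)", re.IGNORECASE)

def parse_checklist(line):
    """Return (section header, {item label: checked?}) for one checklist line."""
    header, _, body = line.partition(":")
    items = {}
    for mark, label in CHECKBOX.findall(body):
        items[label.strip()] = bool(mark)
    return header.strip(), items

section, items = parse_checklist(
    "Cardiovascular: [ ] Angina [ ] MI [x ] HTN [ ] CHF"
)
print(section, [name for name, checked in items.items() if checked])
```

Unchecked items here mean "not asserted", which is different from a documented absence; how to treat them is a design decision for the downstream task.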

Page 27:

Coreference

Chest x-ray again shows a well-circumscribed nodule located in the left upper lobe. The tumor has increased in size since the last exam with a diameter of approximately 2 cm.

  – How big is the nodule?
  – Has the nodule increased in size?
  – Where is the tumor?

Page 28:

References

Mutalik PG, Deshpande A, Nadkarni PM. Use of general-purpose negation detection to augment concept indexing of medical documents: a quantitative study using the UMLS. J Am Med Inform Assoc. 2001 Nov-Dec;8(6):598-609.

Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform. 2001 Oct;34(5):301-10.

Uzuner O, Zhang X, Sibanda T. Machine learning and rule-based approaches to assertion classification. J Am Med Inform Assoc. 2009 Jan-Feb;16(1):109-15.

Sneiderman CA, Rindflesch TC, Aronson AR. Finding the findings: identification of findings in medical literature using restricted natural language processing. Proc AMIA Annu Fall Symp. 1996:239-43.

Fiszman M, Chapman WW, Aronsky D, Evans RS, Haug PJ. Automatic detection of acute bacterial pneumonia from chest X-ray reports. J Am Med Inform Assoc. 2000 Nov-Dec;7(6):593-604.

Zhou L, Melton GB, Parsons S, Hripcsak G. A temporal constraint structure for extracting temporal information from clinical narrative. J Biomed Inform. 2005 Sep 15.

Harkema H, Dowling JN, Thornblade T, Chapman WW. ConText: an algorithm for determining negation, experiencer, and temporal status from clinical reports. J Biomed Inform. 2009 Oct;42(5):839-51.

Page 29:

The Methods of Clinical NLP
how this stuff works
Leonard D’Avolio

Page 30:

The NLP Process
Developing / using NLP is a process

Page 31:

The NLP Process
Find the right documents

Page 32:

The NLP Process
Create the “gold standard”

Page 33:

The NLP Process
Train the system

Page 34:

The NLP Process
Evaluate the system

Page 35:

Methods of NLP

• A number of approaches have evolved
  – Simple rules-based
  – Symbolic, grammatical NLP
  – Machine learning
• NLP can be considered a series of transforms

Think “PIPELINE”
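The "series of transforms" idea can be sketched as a list of functions, each consuming the previous stage's output. The stage names below (`split_sentences`, `tokenize`, `flag_margin_mentions`) are hypothetical and deliberately naive; real pipelines use far more careful components at each step.

```python
from functools import reduce

def split_sentences(text):
    """Naive sentence splitter: break on periods."""
    return [s.strip() for s in text.split(".") if s.strip()]

def tokenize(sentences):
    """Lowercase and split each sentence on whitespace."""
    return [s.lower().split() for s in sentences]

def flag_margin_mentions(token_lists):
    """Keep only sentences that mention a margin."""
    return [t for t in token_lists if "margin" in t or "margins" in t]

def run_pipeline(text, stages):
    """Feed the output of each stage into the next."""
    return reduce(lambda data, stage: stage(data), stages, text)

PIPELINE = [split_sentences, tokenize, flag_margin_mentions]
result = run_pipeline("Prostate removed. Surgical margins are negative.", PIPELINE)
print(result)  # [['surgical', 'margins', 'are', 'negative']]
```

Keeping stages as interchangeable functions is what makes it possible to swap one approach for another at a single step.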

Page 36:

Research Scenario
Positive margins after RRP = 2 x 4 times risk of cancer recurrence

Goal: EXTRACT MARGIN STATUS

Page 38:

Simple Rules-Based Approach

Page 39:

Simple Rules-Based Approach
Heuristics, Probabilities, Combination of the two

Page 40:

Simple Rules-Based Approach

Page 41:

Simple Rules-Based Approach

Page 42:

Simple Rules-Based Approach

Page 43:

Simple Rules-Based Approach

Pros
  – Simple
  – Regular expressions included in many programming languages
  – Great for semi-structured (consistently formatted) targets
Cons
  – Patterns must consider all possible configurations.
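For the margin-status scenario, a rules-based extractor might look like the sketch below. The two patterns and the `margin_status` name are illustrative assumptions; they are not the patterns used in the tutorial's own slides, which did not survive in this transcript.

```python
import re

# Illustrative patterns only; real pathology reports need many more variants.
NEGATIVE = re.compile(r"\bmargins?\b[^.]*\b(negative|free of|uninvolved)\b", re.IGNORECASE)
POSITIVE = re.compile(r"\bmargins?\b[^.]*\b(positive|involved)\b", re.IGNORECASE)

def margin_status(report):
    """Return 'negative', 'positive', or 'unknown' for a pathology report."""
    if NEGATIVE.search(report):
        return "negative"
    if POSITIVE.search(report):
        return "positive"
    return "unknown"

print(margin_status("Surgical margins are negative for tumor."))  # negative
print(margin_status("Margins are not involved by tumor."))        # positive (wrong)
```

The second call illustrates the "Cons" entry: the phrasing "not involved" was not anticipated, so the pattern set misreads it. Every such configuration has to be added by hand.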

Page 44:

Symbolic or Grammatical NLP Approach
Many of the same components…
…plus…

Page 45:

Symbolic or Grammatical NLP Approach
POS tagging & phrase chunking are active areas of research

Page 46:

Symbolic or Grammatical NLP Approach
Sometimes called “concept mapping”

Page 47:

Symbolic or Grammatical NLP Approach

Page 48:

Symbolic or Grammatical NLP Approach

Pros
  – Robust – reduces complexity by mapping to standard terms
  – Great for mapping large numbers of concepts
Cons
  – Complex – more steps, more opportunities to introduce error
  – Which controlled vocabulary?
  – Can be slow

Page 49:

Machine Learning Approach
Classification Model
Several open source ML packages available (decision trees, SVMs, neural nets)

Page 50:

Machine Learning Approach

Page 51:

Machine Learning: Which ‘features’ to learn from?

Page 52:

Machine Learning Approach

Pros
  – Targeted approach = high accuracy
  – Capable of learning from examples
  – Great for extracting few predetermined targets
Cons
  – Requires manual training
  – New target = new training effort
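"Learning from examples" can be illustrated with a tiny bag-of-words naive Bayes classifier written from scratch. This is a sketch under stated assumptions: naive Bayes is swapped in here for simplicity (the slides name decision trees, SVMs, and neural nets), and the four training sentences are made up to stand in for a manually labeled gold standard.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (text, label). Returns per-label word counts and label counts."""
    word_counts, label_counts = defaultdict(Counter), Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def predict(model, text):
    """Return the label with the highest log-probability under naive Bayes."""
    word_counts, label_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)
        for w in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

TRAINING = [  # made-up sentences, not real report text
    ("surgical margins are positive", "positive"),
    ("tumor present at the inked margin", "positive"),
    ("margins are negative for tumor", "negative"),
    ("margins free of carcinoma", "negative"),
]
model = train(TRAINING)
print(predict(model, "margins are free of tumor"))  # negative
```

The "Cons" still apply: a different extraction target would need its own labeled examples and a new round of training.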

Page 53:

Machine Learning Approach
Not limited to a step in the pipeline
Also used increasingly in POS tagging & mapping to ontologies

Page 54:

The Hybrid Approach
What if RegExs don’t cut it?
Swap them out for Grammatical NLP approach

Page 55:

References

Natural language processing: Manning & Schütze. Foundations of Statistical Natural Language Processing. MIT Press. 1999

Regular expressions: Java Tutorial, http://java.sun.com/docs/books/tutorial/essential/regex/

Machine learning: Witten & Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Academic Press. 2001

Page 56:

Annotation and Evaluation
Dina Demner-Fushman

Page 57:

Manual Annotation

• Purposes
• Levels
• Guidelines
• Methods (manual, assisted)
• Tools
• Format (embedded/standoff)
• Collection size (number needed, representative sample)
• Annotators (linguists, domain experts, crowdsourcing)
• Annotator agreement
• Preservation/dissemination/reporting

Page 58:

Annotation purposes

• System development
  – Rule generation (manual or automatic)
  – Statistical modeling – supervised machine learning (training + validation/optimization)
• Evaluation
  – Testing on a held-out set or cross-validation
• Clinical data quality assurance
• Reusable collection (corpus)
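Cross-validation, mentioned under Evaluation above, reuses one annotated collection for both training and testing by rotating which part is held out. A minimal sketch, with `k_fold_splits` as a hypothetical helper name:

```python
def k_fold_splits(items, k):
    """Yield (train, test) lists; every item appears in exactly one test fold."""
    folds = [items[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

documents = list(range(10))  # stand-ins for annotated document IDs
splits = list(k_fold_splits(documents, 5))
for train_ids, test_ids in splits:
    print(len(train_ids), len(test_ids))  # 8 2 on each of the five folds
```

In practice the documents would usually be shuffled first, and documents from the same patient kept in the same fold, so that test folds stay independent of training data.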

Page 59:

Annotation levels

• Meta – information about the corpus
• Document – type, relevancy to topic, quality, structure
• Pragmatic – purpose of a sentence interpreted in context using world knowledge, involves inference
• Discourse – contextual features, links between instances of concepts, or concepts across sentences
• Semantics – formal representation of meaning using concepts, frames
• Syntax – part of speech, phrases, relations between phrases
• Lexical

Page 60:

Annotation levels example

Meta: XYZ hospital, respiratory problems, …
Document: Patient #13, Discharge summary # 1, …

Page 61:

Annotation guidelines

• Define task and annotation purpose
• Be clear
• Be concise
• Avoid bias
• Iteratively refine using representative sample
• Come to consensus
• Finalize before annotating reference standard

Page 62:

Annotation methods
Trade-off of manual vs assisted annotation

  – Bias
  – Accuracy / consistency
  – Speed-up
  – Training

Page 63:

Annotation tools

• Read and write formatted text (markup language)
• Allow defining / linking an annotation schema
• Provide for span selection & markup (color-coding)
• Minimize annotation steps and navigation
• Link ontologies
• Compute inter-annotator agreement
• Provide for reconciliation of annotator disagreement
• Provide web-service/API
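Inter-annotator agreement on span annotations can be computed as a pairwise F-measure, which needs no count of true negatives. The sketch below assumes each annotation is an exact `(start, end, label)` tuple and that only exact matches count; `pairwise_f_measure` is a hypothetical name.

```python
def pairwise_f_measure(spans_a, spans_b):
    """Agreement between two annotators' span sets: 2|A∩B| / (|A| + |B|)."""
    a, b = set(spans_a), set(spans_b)
    if not a and not b:
        return 1.0  # both annotators marked nothing: treated as full agreement
    return 2 * len(a & b) / (len(a) + len(b))

annotator_1 = [(0, 5, "cough"), (10, 18, "fever"), (30, 36, "rales")]
annotator_2 = [(0, 5, "cough"), (10, 18, "fever"), (40, 48, "dyspnea")]
print(round(pairwise_f_measure(annotator_1, annotator_2), 3))  # 0.667
```

Treating either annotator as the reference gives the same value, because swapping the roles only swaps precision and recall.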

Page 64:

References

Linguistic annotation: Wynne M (editor). 2005. Developing Linguistic Corpora: a Guide to Good Practice. Oxford: Oxbow Books. Available from http://ahds.ac.uk/linguistic-corpora/

Issues: Hovy E, Lavid J. Corpus annotation tutorial. http://www.lrec-conf.org/lrec2008/IMG/pdf/Corpus_annotation.Tutorial-outline.pdf

Clinical text: AMIA NLP-SIG annotation project. Available from http://understandit.net/r02.01.11/index.php?title=AnnotationProjectAnnotationSchema

Annotator agreement: Hripcsak G, Rothschild AS. Agreement, the f-measure, and reliability in information retrieval. J Am Med Inform Assoc. 2005 May-Jun;12(3):296-8. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1090460/

Page 65:

Evaluating NLP
Dina Demner-Fushman

  – Overview
  – Judges
  – Metrics
  – Evaluation methods
  – Large-scale evaluations

Page 66:

Evaluation roots

Human/biomedical studies
  – Subjects
  – Outcomes/Statistics
Software/NLP evaluation
  – Quality of the algorithm
  – Quality of implementation
  – Quality of results
Human-computer interaction
  – Usability testing: heuristic, user-centered, scenario-based

Page 67:

What is evaluated?

Software
  – System components
  – Black/Glass-box (results/algorithm and implementation)
  – Task-specific (intrinsic/extrinsic)
  – Manual/automatic
Application
  – Interface (HCI): qualitative/quantitative
  – Access (API, service)
Impact/Outcome
  – Healthcare process
  – Patient’s experience

Page 68:

Judges: who is evaluating?

• Experts vs. convenience population vs. end-users
• How many?
• Consensus (reliability, agreement) vs. pyramid
• Capturing judgments in reusable test collections

Page 69:

Evaluation Metrics

                         Reference Standard
                         positive    negative
NLP output   positive    a (TP)      b (FP)
             negative    c (FN)      d (TN)

Recall (Sensitivity) = a / (a + c)
Precision (PPV) = a / (a + b)
Fall-out (1 - Specificity) = b / (b + d) = 1 - d / (b + d)
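The three formulas translate directly into code. The cell counts in the example are invented purely to show the arithmetic.

```python
def recall(a, c):
    """Sensitivity: true positives over all reference positives."""
    return a / (a + c)

def precision(a, b):
    """Positive predictive value: true positives over all system positives."""
    return a / (a + b)

def fall_out(b, d):
    """1 - specificity: false positives over all reference negatives."""
    return b / (b + d)

# Invented counts: a = TP, b = FP, c = FN, d = TN.
a, b, c, d = 40, 10, 20, 30
print(round(recall(a, c), 3), precision(a, b), fall_out(b, d))  # 0.667 0.8 0.25
```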

Page 70:

Evaluation Metrics

                         Reference Standard
                         positive    negative
NLP output   positive    a (TP)      b (FP)
             negative    c (FN)      d (TN)

F-measure: harmonic mean of precision and recall

What if enumerating all true positive/negative examples is not possible or practical?

Mean average precision, binary preference
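A short sketch of the F-measure and of average precision for one ranked list. The `beta` weighting parameter is the standard generalization (beta = 1 gives the plain harmonic mean). The `average_precision` function assumes every relevant item appears somewhere in the ranked list; mean average precision is then the mean of this value over queries. Binary preference is not shown.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta=1: balanced F1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def average_precision(ranked_relevance):
    """ranked_relevance: booleans for one ranked list, best-ranked item first."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank  # precision at this relevant item's rank
    return total / hits if hits else 0.0

print(round(f_measure(0.8, 0.6), 3))                       # 0.686
print(round(average_precision([True, False, True]), 3))    # 0.833
```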

Page 71:

Software evaluation

• Establish strong baseline
  – For extraction of patient-oriented outcomes from MEDLINE abstracts, selecting the last 3 sentences achieves 75% accuracy
• Select evaluation metrics appropriate for the task
  – Utility measure for text categorization (Genomics track)
  – “This measure contains coefficients for the utility of retrieving a relevant and retrieving a nonrelevant document normalized by the best possible score” http://ir.ohsu.edu/genomics/2005protocol.html
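
The quoted description suggests a utility of the general form sketched below: a weighted sum over retrieved relevant and nonrelevant documents, divided by the score of a perfect run. The function name and the coefficient values in the example are placeholders for illustration, not the track's official settings; the protocol page gives the actual values.

```python
def normalized_utility(tp, fp, total_relevant, u_r, u_nr):
    """Utility of a categorization run, normalized by the best possible score.

    tp, fp          retrieved relevant / retrieved nonrelevant documents
    total_relevant  number of relevant documents in the collection
    u_r, u_nr       utility coefficients (u_nr is normally negative)
    """
    raw = u_r * tp + u_nr * fp
    best = u_r * total_relevant  # every relevant document retrieved, no false positives
    return raw / best


# Placeholder coefficients, chosen only to show the arithmetic.
score = normalized_utility(tp=30, fp=50, total_relevant=40, u_r=20, u_nr=-1)
```

Unlike the F-measure, this kind of score lets the task owner state how many false positives one true positive is worth.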

Page 72:

End-user evaluation

Use
• Log files
• Observation
• Surveys

Impact
• Cost / time
• Outcomes

Page 73:

Community-wide evaluations

Format
• Post-hoc (TREC)
• Gold standard provided
• Pros/cons: cost vs. coverage

Clinical
• i2b2
• CMC

Page 74:

References

Friedman CP, Wyatt JC. Evaluation Methods in Biomedical Informatics. 2nd ed., New York: Springer, 2006.

Sparck-Jones K, Galliers JR. Evaluating Natural Language Processing Systems. Springer, 1996.

van Rijsbergen CJ. Information Retrieval, 2nd ed. London: Butterworths, 1979. http://www.dcs.gla.ac.uk/Keith/pdf/Chapter7.pdf

Hripcsak G, Wilcox A. Reference standards, judges, and comparison subjects: roles for experts in evaluating system performance. J Am Med Inform Assoc. 2002 Jan-Feb;9(1):1-15. Available online from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC349383

Passonneau RJ, Nenkova A. Evaluating Content Selection in Human- or Machine-Generated Summaries: The Pyramid Scoring Method. http://www1.cs.columbia.edu/~library/TR-repository/reports/reports-2003/cucs-025-03.pdf

Page 75:

Implementation Considerations

Wendy Chapman

Page 76:

Implementing NLP

• Getting an NLP system up and running
• Case study

Page 77:

Preprocessing → NLP System → Post-processing

Page 78:

The devil is in the details

• Remove extraneous characters: control characters, foreign characters (é)
• Remove extra line feeds, etc. (pul-_monary)
• Preserve/enhance section labels (“IMPRESSION:_”)
• Reformat to improve readability
• De-identify

Preprocessing → NLP System → Post-processing

• Obtain source feeds
• Assess completeness
• De-duplicate
• Clean, “sectionize,” format
• De-identify
• Load database
• Hand-off to NLP system
• Quality assurance

Slide courtesy David Carrell
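
Several of the cleanup items on this slide can be sketched with the Python standard library. This is a minimal sketch, not the pipeline the slide describes: the function names, the section-label pattern (an upper-case phrase followed by a colon at the start of a line), and the sample report are assumptions. Note that rejoining words split by a hyphen and a line feed will also merge genuine compounds that happen to break at the hyphen. De-identification is out of scope here.

```python
import re
import unicodedata


def clean_report(text):
    """Strip control characters and repair line-feed damage in a report."""
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop control characters (Unicode category Cc) except newline and tab.
    text = "".join(
        ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    # Trim trailing blanks so hyphens at line ends are easy to detect.
    text = re.sub(r"[ \t]+\n", "\n", text)
    # Rejoin words hyphenated across a line break: "pul-\nmonary" -> "pulmonary".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of three or more line feeds into one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def section_labels(text):
    """Return upper-case labels such as IMPRESSION that start a line."""
    return re.findall(r"^([A-Z][A-Z /]{2,}):", text, flags=re.MULTILINE)


raw = "FINDINGS:\x0b Mild pul-\r\nmonary edema.\r\n\r\n\r\n\r\nIMPRESSION: No pneumonia.\x00"
cleaned = clean_report(raw)
labels = section_labels(cleaned)
```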

Page 79:

Preprocessing → NLP System → Post-processing

A lot of tasks and a lot of people

Tasks
• Obtain source feeds
• Assess completeness
• De-duplicate
• Clean, “sectionize,” format
• De-identify
• Load database
• Sample
• Hand-off to NLP system
• Quality assurance

People
• Human Subjects/IRB
• Source system manager
• Network/database administrator
• Programmer
• Investigator
• Informatics/NLP expert
• Clinician (“domain expert”)
• Chart abstractor

Slide courtesy David Carrell
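
As one illustration of the "De-duplicate" task, the sketch below drops reports whose text is identical after whitespace and case normalization. The normalization rule, the function name, and the sample reports are assumptions; real feeds may need fuzzier matching (for example, addenda that repeat an earlier report).

```python
import hashlib


def dedupe_reports(reports):
    """Keep the first report for each distinct normalized text.

    reports is an iterable of (report_id, text) pairs.
    """
    seen = set()
    unique = []
    for report_id, text in reports:
        # Collapse whitespace and ignore case before hashing.
        normalized = " ".join(text.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((report_id, text))
    return unique


# Hypothetical feed with one duplicate that differs only in spacing and case.
feed = [
    ("r1", "Chest  X-ray: clear."),
    ("r2", "chest x-ray:   clear."),
    ("r3", "Chest X-ray: infiltrate."),
]
kept_ids = [report_id for report_id, _ in dedupe_reports(feed)]
```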

Page 80:

Slide courtesy David Carrell

Page 81:

Preprocessing → NLP System → Post-processing

Map NLP output to your vocabulary and your task

• Which CUIs map to Productive Cough?
• Which combination of radiological findings & attributes = evidence of acute bacterial pneumonia?
• Does the patient have a recurrent breast cancer?

Instance, Report, Patient
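
Assuming the three labels on this slide denote levels of aggregation, post-processing has to roll instance-level NLP output up to the report and then the patient. The sketch below uses a deliberately simple rule (any non-negated mention makes the concept present at the next level); that rule, the field names, and the sample data are assumptions, and a real task would need its own logic.

```python
from collections import defaultdict


def roll_up(instances):
    """Aggregate mention-level findings to report level and patient level.

    Each instance is a dict with patient_id, report_id, concept, negated.
    """
    report_level = defaultdict(bool)
    for inst in instances:
        key = (inst["patient_id"], inst["report_id"], inst["concept"])
        # Assumed rule: one affirmed mention is enough for the report.
        report_level[key] = report_level[key] or not inst["negated"]

    patient_level = defaultdict(bool)
    for (patient_id, _report_id, concept), present in report_level.items():
        pkey = (patient_id, concept)
        # Assumed rule: one positive report is enough for the patient.
        patient_level[pkey] = patient_level[pkey] or present
    return dict(report_level), dict(patient_level)


# Hypothetical NLP output for two patients.
mentions = [
    {"patient_id": "p1", "report_id": "r1", "concept": "productive cough", "negated": True},
    {"patient_id": "p1", "report_id": "r1", "concept": "productive cough", "negated": False},
    {"patient_id": "p1", "report_id": "r2", "concept": "productive cough", "negated": True},
    {"patient_id": "p2", "report_id": "r3", "concept": "productive cough", "negated": True},
]
by_report, by_patient = roll_up(mentions)
```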

Page 82:

Case Study
Case-control observational GWAS study

Hypothesis
Biomarkers in patients with prostate cancer can be used to predict time to survival, informing course of care

Page 83:

Targeted Phenotype

• Prostate cancer
• Co-morbidities
• Basic demographics
• Disease characteristics
  • TNM Staging
  • Gleason score
  • PSA
• Treatments administered
  • Surgery
  • Chemo
  • Watchful waiting

Page 84:

The NLP Process

Defining the target

Page 85:

Data challenges

Prostate cancer
• ICD-9 codes don’t cut it
  • VA Boston: 18% of path reports 60 days before / after 1st ICD-9 were prostate cancer related
• No standardized titles on path reports
  • Biopsy?
  • Post-op?

Into NLP just to find the right documents
Phenotyping with NLP is really several projects in one

Page 86:

The NLP Process

Extracting key variables

Page 87:

Annotation challenges

TNM Staging, Gleason score, PSA

• Gleason score different on post-op than biopsy
  • Pathological vs. estimate
• Patient level vs. document level
  • PSA at 4 visits
  • Conflicting Gleason scores
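
The document-level versus patient-level problem can be made concrete with a small sketch that pulls Gleason scores out of each document and flags disagreement rather than silently picking one. The regular expression only covers the "3 + 4" style of reporting, and the pattern, function names, and sample sentences are all assumptions for illustration.

```python
import re

# Matches e.g. "Gleason score 3+3" or "Gleason grade: 3 + 4" (assumed phrasing).
GLEASON_PATTERN = re.compile(
    r"gleason\s*(?:score|grade|sum)?\s*(?:of|is|:|=)?\s*(\d)\s*\+\s*(\d)",
    re.IGNORECASE,
)


def gleason_scores(text):
    """Return (primary, secondary) pattern pairs found in one document."""
    return [(int(p), int(s)) for p, s in GLEASON_PATTERN.findall(text)]


def patient_gleason(documents):
    """Collect document-level scores for one patient and flag conflicts.

    documents is a list of (doc_type, text) pairs, e.g. ("biopsy", "...").
    """
    mentions = []
    for doc_type, text in documents:
        for primary, secondary in gleason_scores(text):
            mentions.append(
                {"doc_type": doc_type, "primary": primary,
                 "secondary": secondary, "total": primary + secondary}
            )
    totals = {m["total"] for m in mentions}
    return {"mentions": mentions, "conflict": len(totals) > 1}


# Hypothetical biopsy and post-op reports for one patient.
docs = [
    ("biopsy", "Gleason score 3+3=6 in two cores."),
    ("post-op", "Final Gleason grade: 3 + 4 = 7."),
]
summary = patient_gleason(docs)
```

Keeping the document type with each mention leaves the choice between biopsy and post-op values to the annotation guideline rather than to the code.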

Page 88:

• Start with a wish list and whittle down
  • Cost vs. benefit will become clear
• Define categorical variables
  • Versus highlighting strings
• Create clear instructions
  • Training
  • Pilot, pilot, pilot
• Plan for several iterations

Page 89:

The NLP Process

Creating your training / test sets

Page 90:

Designing your “gold standard”

• Several variables = several measures of accuracy
  • What if tumor staging F-measure is .97 but co-morbidities is .6?
  • Effects must be accounted for in study design
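
Because each variable gets its own accuracy figure, it helps to score them separately against the gold standard. The sketch below computes one F1 per variable at the patient level; the counting rules (a wrong value counts as both a false positive and a false negative), the function name, and the toy data are assumptions.

```python
def per_variable_f1(gold, predicted, variables):
    """F1 per variable; gold and predicted map patient_id -> {variable: value}.

    A value of None (or a missing key) means "not present / not extracted".
    """
    results = {}
    for var in variables:
        tp = fp = fn = 0
        for patient_id, gold_values in gold.items():
            g = gold_values.get(var)
            p = predicted.get(patient_id, {}).get(var)
            if p is not None and p == g:
                tp += 1
            else:
                if p is not None:
                    fp += 1  # extracted something that is wrong or spurious
                if g is not None:
                    fn += 1  # gold value was missed or mis-extracted
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        results[var] = 2 * precision * recall / total if total else 0.0
    return results


# Toy gold standard and system output for three patients.
gold = {
    "p1": {"stage": "T2", "comorbidity": "diabetes"},
    "p2": {"stage": "T3", "comorbidity": None},
    "p3": {"stage": "T1", "comorbidity": "CHF"},
}
system = {
    "p1": {"stage": "T2", "comorbidity": None},
    "p2": {"stage": "T3", "comorbidity": "COPD"},
    "p3": {"stage": "T1", "comorbidity": "CHF"},
}
scores = per_variable_f1(gold, system, ["stage", "comorbidity"])
```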

Page 91:

The NLP Process

Extracting key variables

Page 92:

NLP algorithm development

• Are you reinventing the wheel?*
• Is it important that it scale
  • Other projects?
  • Beyond your institution?

*Pratt, AW. 1969. “Automated processing of medical English.” International Conference on Computational Linguistics, Sweden

Page 93:

Current state, future progress, available resources

Leonard D’Avolio

Page 94:

State of the Science

NLP is not “off the shelf”
• Opportunity to reduce effort

Several approaches can yield similar performance
• i2b2 challenge

First increase in open source components
• Weka, MMTx, Stanford parser
• Lots of ‘glue code’

Now increase in open source frameworks
• GATE, UIMA

End-to-end information retrieval using open source frameworks
• ARC

Page 95:

Progression of Field: More Resources

“Closed” Concept Mapping Systems
• MedLEE
• Knowledge Map
• MVCS

Open Components
• Stanford Parser
• IBM Parser
• OpenNLP
• Weka (ML)
• MALLET (ML)
• UMLS
• NegEx (negation)

Open Frameworks
• UIMA
• GATE

Open Concept Mapping Systems
• MetaMap
• HITEx (GATE)
• Topaz (GATE)
• cTAKES (UIMA)
• MedKAT (UIMA)

Open Corpora
• Cincinnati
• Pittsburgh NLP Repository
• i2b2
• MIMIC 1 & 2

Open IR Systems
• ARC (UIMA + MALLET)

Tools Registries
• RDS
• ORBIT
• Eagle-I

Hosted Environments
• iDASH
• VINCI

Page 96:

Future of NLP

Information quality – context is key
• Error propagates in pipelines
• Information not captured for our secondary uses
• Scrap idealized test sets

Greater code reuse
• Less glue code
• Will allow focus on improving specific components

Increase in open source data sets & shared task challenges

Drive adoption of NLP
• More data driving greater demand / new uses
• Reduce current dependency on system developers

Page 97:

Current Process

D’Avolio et al. “Evaluation of a generalizable approach to information retrieval using the Automated Retrieval Console.” 2011. 17(4)

Page 98:

What it should be

D’Avolio et al. “Evaluation of a generalizable approach to information retrieval using the Automated Retrieval Console.” 2011. 17(4)

Page 99:

Best approach to NLP?

VS.

Page 100:

Best approach to NLP?

Page 101:

Worst approach to NLP?

Page 102:

Resources

WEKA: http://www.cs.waikato.ac.nz/ml/weka/

MALLET: http://mallet.cs.umass.edu/

MetaMap: http://mmtx.nlm.nih.gov/

UMLS: http://www.nlm.nih.gov/research/umls/

OpenNLP: http://opennlp.sourceforge.net/

HITEx (hosted by i2b2): https://www.i2b2.org/resrcs/hive.html

cTAKES: https://cabig-kc.nci.nih.gov/Vocab/KC/index.php/OHNLP_Documentation_and_Downloads

UIMA: http://incubator.apache.org/uima/

GATE: http://gate.ac.uk/

ARC: http://research.maveric.org/mig/arc.html

Page 103:

Resources (cont)

Topaz: http://www.dbmi.pitt.edu/blulab/resources.asp#Topaz

NegEx: http://code.google.com/p/negex/

ConText: http://www.dbmi.pitt.edu/chapman/ConText.html

Cincinnati Pediatric Corpus: http://www.computationalmedicine.org/project/cpc.php

Pittsburgh NLP Repository: http://www.dbmi.pitt.edu/blulab/nlprepository.html

MIT MIMIC Repository (structured and unstructured): http://mimic.mit.edu/mimic-ii-database.html

ORBIT Project: http://orbit.nlm.nih.gov/

iDASH: http://iDash.ucsd.edu

Page 104:

Contact information

Leonard D’Avolio, PhD
Associate Center Director for Biomedical Informatics
MAVERIC, VA Boston Healthcare System

Wendy Chapman, PhD
Division of Biomedical Informatics
University of CA, San Diego

Dina Demner-Fushman, MD, PhD
National Library of Medicine