Determining the Syntactic Structure of Medical Terms in Clinical Notes Bridget T. McInnes Ted...
![Page 1: Determining the Syntactic Structure of Medical Terms in Clinical Notes Bridget T. McInnes Ted Pedersen Serguei V. Pakhomov bthomson@cs.umn.edu.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d425503460f94a1d2f1/html5/thumbnails/1.jpg)
Determining the Syntactic Structure of Medical Terms in Clinical Notes
Bridget T. McInnes
Ted Pedersen
Serguei V. Pakhomov
Goal
The goal of this presentation is to present a simple but effective approach for identifying the syntactic structure of three-word terms.
Importance
Potentially improve the analysis of unrestricted medical text:
  Mapping of medical text to standardized terminologies
  Unsupervised syntactic parsing
Syntactic Structure of Terms
[Figure: the four candidate structures for a three-word term w1 w2 w3: Monolithic, Non-branching, Right-branching, and Left-branching. In the diagram, blue = independence and green = dependence.]
Example
small bowel obstruction
Syntactic Structure of Example
small bowel obstruction
[Figure: "small bowel obstruction" drawn under each of the four candidate structures: Monolithic, Non-branching, Right-branching, and Left-branching.]
Method used to determine the structure of a term
The Log Likelihood Ratio is based on the ratio between the observed probability of a term and the probability with which it would be expected to occur:

$$\frac{\text{Probability of Term Occurring}}{\text{Expected Probability of Term}}$$
Log Likelihood Ratio
The expected probability of a term is often based on the Non-branching (Independence) Model
$$\frac{P(\text{small bowel obstruction})}{P(\text{small})\,P(\text{bowel})\,P(\text{obstruction})}$$

Numerator: OBSERVED PROBABILITY
Denominator: EXPECTED PROBABILITY
Extended Log Likelihood Ratio
The expected probabilities can also be calculated using two other hypotheses (models), giving three models in total:
Non-branching: P(small) P(bowel) P(obstruction)
Left-branching: P(small bowel) P(obstruction)
Right-branching: P(small) P(bowel obstruction)
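The three expected probabilities can be sketched in a few lines of Python. This is an illustration only: the function name and the input probabilities are made up, and the left/right labels follow the expected-value definitions given at the end of the deck (left-branching groups the first two words, right-branching the last two).

```python
def expected_probabilities(p_w1, p_w2, p_w3, p_w1w2, p_w2w3):
    """Expected probability of a three-word term w1 w2 w3 under each model."""
    return {
        "non-branching": p_w1 * p_w2 * p_w3,  # all three words independent
        "left-branching": p_w1w2 * p_w3,      # [w1 w2] independent of w3
        "right-branching": p_w1 * p_w2w3,     # w1 independent of [w2 w3]
    }

# Hypothetical probabilities, not taken from the slides.
expected = expected_probabilities(
    p_w1=0.01, p_w2=0.005, p_w3=0.004, p_w1w2=0.003, p_w2w3=0.002
)
for model, value in expected.items():
    print(f"{model}: {value:.2e}")
```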
Three Log Likelihood Ratio Equations
Non-branching:
$$\frac{P(\text{small bowel obstruction})}{P(\text{small})\,P(\text{bowel})\,P(\text{obstruction})}$$

Left-branching:
$$\frac{P(\text{small bowel obstruction})}{P(\text{small bowel})\,P(\text{obstruction})}$$

Right-branching:
$$\frac{P(\text{small bowel obstruction})}{P(\text{small})\,P(\text{bowel obstruction})}$$
Expected Probability
The expected probability of a term differs from model to model, and so does the Log Likelihood Ratio (LL):
Non-branching: P(small) P(bowel) P(obstruction), LL = 11,635.45
Left-branching: P(small bowel) P(obstruction), LL = 5,169.81
Right-branching: P(small) P(bowel obstruction), LL = 8,532.90
Model Fitting
The model with the lowest Log Likelihood Ratio best describes the underlying structure of the term:
Non-branching: P(small) P(bowel) P(obstruction), LL = 11,635.45
Left-branching: P(small bowel) P(obstruction), LL = 5,169.81
Right-branching: P(small) P(bowel obstruction), LL = 8,532.90
Here the Left-branching model has the lowest value, so the term is analyzed as [small bowel] obstruction.
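The selection step amounts to taking a minimum over the three scores. The sketch below uses the LL values from this slide, pairing each value with the expected-probability product listed in the same position and labelling the products according to the expected-value definitions at the end of the deck; the variable names are illustrative.

```python
# LL values from the slide, keyed by the model that produced them.
ll_scores = {
    "non-branching": 11635.45,   # P(small) P(bowel) P(obstruction)
    "left-branching": 5169.81,   # P(small bowel) P(obstruction)
    "right-branching": 8532.90,  # P(small) P(bowel obstruction)
}

# The model whose expected counts are closest to the observed counts
# has the lowest score.
best_model = min(ll_scores, key=ll_scores.get)
print(best_model)  # left-branching
```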
ReCap
The Log Likelihood Ratio is calculated for each possible model:
  Non-branching
  Right-branching
  Left-branching
The probabilities for each model are obtained from a corpus
The term is assigned the structure whose model has the lowest Log Likelihood Ratio
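As a rough illustration of how the probabilities might be obtained from a corpus, the sketch below counts unigrams, bigrams, and trigrams in a toy token list and turns one count into a relative frequency. The whitespace tokenisation, the helper name `ngram_counts`, and the sample sentence are assumptions for the example; the slides do not describe the counting procedure in detail.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Toy "corpus", invented for the example.
tokens = "patient has small bowel obstruction and small bowel dilation".split()

unigrams = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)

# Relative frequency of the bigram "small bowel" among all bigrams.
p_small_bowel = bigrams[("small", "bowel")] / sum(bigrams.values())
print(p_small_bowel)  # 0.25
```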
Test Set
Contains 708 three-word terms from SNOMED-CT:
Monolithic: 73 terms
Non-branching: 6 terms
Right-branching: 378 terms
Left-branching: 251 terms
Test Set (cont)
Syntactic structure of each term was determined through the consensus of two medical text index experts (kappa = 0.704)
The probabilities were obtained from over 10,000 Mayo Clinic clinical notes
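For readers unfamiliar with the agreement figure, the sketch below shows how a kappa value can be computed from two annotators' labels. It assumes the statistic is Cohen's kappa (the slide does not say which variant was used), and the label lists are invented for the example.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label at random.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical annotations of six terms.
annotator_1 = ["left", "left", "right", "right", "mono", "right"]
annotator_2 = ["left", "right", "right", "right", "mono", "right"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 3))  # 0.714
```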
Monolithic Results
[Bar chart: percentage agreement with human experts, by technique. Left branching: 35.5; Right branching: 53.4; Our Method: 74.8.]
Results without Monolithic Terms
[Bar chart: percentage agreement with human experts, by technique. Left branching: 39.5; Right branching: 59.5; Our Method: 83.5.]
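The two baseline figures in each chart can be reproduced from the test-set counts, on the assumption that a baseline labels every term with the same structure and that the test set contains 251 left-branching and 378 right-branching terms (this reading of the flattened test-set slide is an inference, supported by the fact that the arithmetic matches the charts).

```python
# Test-set composition (708 terms in total).
counts = {
    "monolithic": 73,
    "non-branching": 6,
    "right-branching": 378,
    "left-branching": 251,
}
total = sum(counts.values())                  # 708
without_mono = total - counts["monolithic"]   # 635

baselines = {}
for label in ("left-branching", "right-branching"):
    baselines[label] = (
        round(100 * counts[label] / total, 1),         # monolithic terms included
        round(100 * counts[label] / without_mono, 1),  # monolithic terms excluded
    )

print(baselines)
# {'left-branching': (35.5, 39.5), 'right-branching': (53.4, 59.5)}
```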
Limitations
Monolithic structures:
  possibly identify them through collocation extraction or dictionary lookup
As the number of words in a term grows, so does the number of hypotheses (models) to be evaluated:
  only consider adjacent models
  limit the length of the terms to 5 or 6 words
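To give a feel for the growth, the sketch below enumerates every way of splitting a term into groups of adjacent words. Reading "adjacent models" as flat groupings of neighbouring words is an assumption made for this example; under that reading an n-word term has 2^(n-1) groupings, including the single-group (monolithic) case.

```python
from itertools import combinations

def contiguous_segmentations(words):
    """All ways to split a word sequence into groups of adjacent words."""
    n = len(words)
    result = []
    for k in range(n):                            # k = number of cut points
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            result.append(
                [tuple(words[a:b]) for a, b in zip(bounds, bounds[1:])]
            )
    return result

three = contiguous_segmentations(["small", "bowel", "obstruction"])
for segmentation in three:
    print(segmentation)

# Number of groupings for terms of 3 to 6 words.
sizes = {n: len(contiguous_segmentations(list(range(n)))) for n in range(3, 7)}
print(sizes)  # {3: 4, 4: 8, 5: 16, 6: 32}
```

For three words the four groupings correspond to the monolithic, right-branching, left-branching, and non-branching structures.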
Conclusions
Presented a simple but effective method to identify the structure of three-word terms
The method uses the Log Likelihood Ratio
Could be extended to identify the structure of four-, five- and six-word terms
Future Work
Improve the accuracy of the method:
  explore other measures of association (Chi-squared, Phi, Dice coefficient ...)
  incorporate multiple measures together
Extend our method to four- and five-word terms:
  difficulty: finding a test set
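For reference, the measures named above have simple closed forms on a 2x2 contingency table for a word pair. The sketch below is a generic illustration with invented counts; how these measures would be adapted to the three-word models is left open by the slides, and the function and parameter names are assumptions.

```python
import math

def association_measures(n11, n12, n21, n22):
    """Dice, Phi and Pearson's Chi-squared for a 2x2 word-pair table.

    n11: both words occur together     n12: first word without the second
    n21: second word without the first n22: neither word occurs
    """
    n1p, n2p = n11 + n12, n21 + n22   # row totals
    np1, np2 = n11 + n21, n12 + n22   # column totals
    total = n1p + n2p
    phi = (n11 * n22 - n12 * n21) / math.sqrt(n1p * n2p * np1 * np2)
    return {
        "dice": 2 * n11 / (n1p + np1),
        "phi": phi,
        "chi_squared": total * phi ** 2,  # for a 2x2 table, chi-squared = N * phi^2
    }

# Hypothetical counts.
measures = association_measures(n11=10, n12=5, n21=5, n22=80)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```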
Thank you
Software:
Ngram Statistics Package (NSP): www.d.umn.edu/~tpederse/nsp.html
Log Likelihood Ratio Models: www.cs.umn.edu/~bthomson/mti.html
Log Likelihood Equation
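$$LL = 2 \sum_{x,y,z} n_{xyz} \log \frac{n_{xyz}}{m_{xyz}}$$

where $n_{xyz}$ are the observed counts and $m_{xyz}$ the counts expected under the model.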
Expected Values
$$LL = 2 \sum_{x,y,z} n_{xyz} \log \frac{n_{xyz}}{m_{xyz}}$$

Non-branching: $m_{xyz} = \dfrac{n_{x++}\, n_{+y+}\, n_{++z}}{n_{+++}^{2}}$
Left-branching: $m_{xyz} = \dfrac{n_{xy+}\, n_{++z}}{n_{+++}}$
Right-branching: $m_{xyz} = \dfrac{n_{x++}\, n_{+yz}}{n_{+++}}$
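Putting the equation and the three expected-value definitions together, a minimal sketch might look as follows. It builds a 2x2x2 table (each position either matches the term's word or does not) from a list of trigrams, which is one common way to set up the counts but is an assumption here, as are the function name and the toy data. For the non-branching model the product of three marginals is divided by the squared total, so that the expected counts sum to the number of trigrams.

```python
import math
from collections import Counter

def log_likelihoods(trigrams, term):
    """LL = 2 * sum(n * log(n / m)) for `term` under the three models."""
    x, y, z = term
    observed = Counter((a == x, b == y, c == z) for a, b, c in trigrams)
    total = sum(observed.values())

    def marginal(i=None, j=None, k=None):
        """Sum the table over every position left as None."""
        return sum(
            count for (a, b, c), count in observed.items()
            if (i is None or a == i) and (j is None or b == j) and (k is None or c == k)
        )

    expected = {
        "non-branching": lambda i, j, k:
            marginal(i=i) * marginal(j=j) * marginal(k=k) / total ** 2,
        "left-branching": lambda i, j, k:
            marginal(i=i, j=j) * marginal(k=k) / total,
        "right-branching": lambda i, j, k:
            marginal(i=i) * marginal(j=j, k=k) / total,
    }
    scores = {}
    for model, m in expected.items():
        # Cells with a zero observed count contribute nothing to the sum.
        scores[model] = 2 * sum(
            n * math.log(n / m(*cell)) for cell, n in observed.items() if n > 0
        )
    return scores

# Toy data in which the first two words are tied together and the third
# word varies independently of them.
pairs = [("small", "bowel")] * 3 + [("large", "intestine")] * 2
thirds = ["obstruction", "obstruction", "dilation"]
toy_trigrams = [(a, b, c) for a, b in pairs for c in thirds]

scores = log_likelihoods(toy_trigrams, ("small", "bowel", "obstruction"))
print(min(scores, key=scores.get))  # left-branching
```

On this toy data the left-branching model reproduces the observed counts exactly, so its score is zero and it is selected.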