I2B2 Shared Task 2011 Coreference Resolution in Clinical Text David Hinote Carlos Ramirez.
![Page 1: I2B2 Shared Task 2011 Coreference Resolution in Clinical Text David Hinote Carlos Ramirez.](https://reader033.fdocuments.us/reader033/viewer/2022042718/56649ee15503460f94bf2440/html5/thumbnails/1.jpg)
I2B2 Shared Task 2011
Coreference Resolution in Clinical Text
David Hinote Carlos Ramirez
![Page 2: I2B2 Shared Task 2011 Coreference Resolution in Clinical Text David Hinote Carlos Ramirez.](https://reader033.fdocuments.us/reader033/viewer/2022042718/56649ee15503460f94bf2440/html5/thumbnails/2.jpg)
What is coreference resolution?
• Nouns, pronouns, and phrases that refer to the same object, person or idea are coreferent.
o Example: "Alexander was playing soccer yesterday. He fell and broke his knee."
o "Alexander", "he", and "his" refer to the same person, so they are said to be coreferent.
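One way to picture the result of coreference resolution is as chains: groups of mentions joined by pairwise links. The sketch below is illustrative only and is written in Python, not taken from the authors' Java application. The function name `build_chains` and the use of plain strings as mentions are assumptions; a real system would identify each mention by its position in the document.

```python
from collections import defaultdict

def build_chains(mentions, links):
    """Group mentions into coreference chains from pairwise links (union-find)."""
    parent = {m: m for m in mentions}

    def find(m):
        # Walk up to the chain representative, shortening the path as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in links:
        # Merge the two chains that a and b belong to.
        parent[find(a)] = find(b)

    chains = defaultdict(list)
    for m in mentions:  # document order, so each chain stays ordered
        chains[find(m)].append(m)
    # A mention linked to nothing is not a chain; keep groups of two or more.
    return [chain for chain in chains.values() if len(chain) > 1]

# Mentions from the example sentence, in document order (hypothetical input).
mentions = ["Alexander", "soccer", "He", "his", "knee"]
links = [("Alexander", "He"), ("He", "his")]
print(build_chains(mentions, links))  # [['Alexander', 'He', 'his']]
```

Because links are transitive, "Alexander" and "his" end up in the same chain even though no direct link between them was given.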
![Page 3: I2B2 Shared Task 2011 Coreference Resolution in Clinical Text David Hinote Carlos Ramirez.](https://reader033.fdocuments.us/reader033/viewer/2022042718/56649ee15503460f94bf2440/html5/thumbnails/3.jpg)
The i2b2 Challenge
I2B2: "Informatics for Integrating Biology and the Bedside"
• This program has issued a challenge in NLP involving coreference resolution.
• The challenge is to find coreferential relations within a given medical document.
• The concepts that can corefer are all annotated.
• There are 5 classes of concepts:
o Problem
o Person
o Treatment
o Test
o Pronoun
![Page 4: I2B2 Shared Task 2011 Coreference Resolution in Clinical Text David Hinote Carlos Ramirez.](https://reader033.fdocuments.us/reader033/viewer/2022042718/56649ee15503460f94bf2440/html5/thumbnails/4.jpg)
Concept Mentions
People
• Any mention that refers to a person, or group of people
o Dr. Lightman, The patient, cardiology
Problems
• A mention that refers to the reason the subject of the document is in the hospital
o Heart attack, blood pressure, broken leg
Tests
• Tests performed by doctors
o EKG, temperature, CAT scan
Treatments
• Solutions to the problem mentions, or work performed to cure patients
o Brain surgery, ice pack, Tylenol
Pronouns can refer to any of the four other types of mentions
![Page 5: I2B2 Shared Task 2011 Coreference Resolution in Clinical Text David Hinote Carlos Ramirez.](https://reader033.fdocuments.us/reader033/viewer/2022042718/56649ee15503460f94bf2440/html5/thumbnails/5.jpg)
Approaches for Competing
• Using tools already made & publicly available
o Stanford NLP
o BART Coreference
o LingPipe
o CherryPicker
o Reconcile
o ARKRef
o Apache OpenNLP
• Coding our own Coreference Tool
![Page 6: I2B2 Shared Task 2011 Coreference Resolution in Clinical Text David Hinote Carlos Ramirez.](https://reader033.fdocuments.us/reader033/viewer/2022042718/56649ee15503460f94bf2440/html5/thumbnails/6.jpg)
Other Coreference Tools
• We obtained versions of other Coreference tools and tested them on our data.
• All tools we found were either still in their initial development stages, or were built for one specific purpose and left alone afterwards (e.g. coreference on the MUC datasets).
• Testing shows that at best, the other tools we found do not perform acceptably with our data.
• After attempts to train other tools using our data failed, we felt it best to code our own approach.
![Page 7: I2B2 Shared Task 2011 Coreference Resolution in Clinical Text David Hinote Carlos Ramirez.](https://reader033.fdocuments.us/reader033/viewer/2022042718/56649ee15503460f94bf2440/html5/thumbnails/7.jpg)
Other Tools Statistics
![Page 8: I2B2 Shared Task 2011 Coreference Resolution in Clinical Text David Hinote Carlos Ramirez.](https://reader033.fdocuments.us/reader033/viewer/2022042718/56649ee15503460f94bf2440/html5/thumbnails/8.jpg)
Algorithm
• Because the data we are working on is so specific, we chose to use a rule based approach to coreference resolution.
• This means that we try to learn the characteristics of each coreferent link ourselves, and program a method for the link manually.
• We examine concepts in a file, and if they meet our criteria, we create a method to link them.
• The idea is to write rules that are specific, yet general enough to apply to similar mentions in all documents.
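The rule-based idea above can be sketched as a list of hand-written predicates tried on every pair of mentions. This is a minimal Python illustration, not the authors' Java code: the `Mention` fields, the determiner list, and the single `same_text_rule` are all hypothetical stand-ins for the kind of rule the slides describe.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Mention:
    text: str
    kind: str  # "problem", "person", "treatment", "test" or "pronoun"
    line: int  # hypothetical position field

# Illustrative list of leading words to ignore when comparing mentions.
DETERMINERS = {"the", "a", "an", "his", "her", "this", "that"}

def normalize(text):
    """Lowercase the mention and drop leading determiners."""
    words = text.lower().split()
    while words and words[0] in DETERMINERS:
        words.pop(0)
    return " ".join(words)

def same_text_rule(a, b):
    """Hypothetical rule: same class, not pronouns, same normalized text."""
    norm = normalize(a.text)
    return (a.kind == b.kind and a.kind != "pronoun"
            and norm != "" and norm == normalize(b.text))

RULES = [same_text_rule]  # further hand-written rules would be appended here

def find_links(mentions):
    """Return every pair of mentions accepted by at least one rule."""
    return [(a, b) for a, b in combinations(mentions, 2)
            if any(rule(a, b) for rule in RULES)]

mentions = [
    Mention("chest pain", "problem", 3),
    Mention("The chest pain", "problem", 9),
    Mention("Tylenol", "treatment", 12),
]
links = find_links(mentions)
```

Here `links` holds the single pair of "chest pain" mentions. Keeping each rule as a separate function makes it easy to add, remove, or tighten one rule without touching the others.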
![Page 9: I2B2 Shared Task 2011 Coreference Resolution in Clinical Text David Hinote Carlos Ramirez.](https://reader033.fdocuments.us/reader033/viewer/2022042718/56649ee15503460f94bf2440/html5/thumbnails/9.jpg)
Our Application
• To help visualize coreferent links and see what links our program detects, we use a GUI created with Java.
• We develop the program using the Mercurial version control system, which lets us keep each other's code up to date.
• Uses our coded algorithms to determine coreferent links between the given concept mentions.
• It displays coreferent links as lines.
o Blue for true links.
o Red for links that are detected by our algorithm.
![Page 10: I2B2 Shared Task 2011 Coreference Resolution in Clinical Text David Hinote Carlos Ramirez.](https://reader033.fdocuments.us/reader033/viewer/2022042718/56649ee15503460f94bf2440/html5/thumbnails/10.jpg)
Our Application
Programmed in Java, our application can use databases and the internet to gather information about the concept mentions being tested.
• We have set up a database to hold data that gives meaning to the concept mentions being tested, or to certain key words in a sentence that contains a mention. If words or phrases meet our criteria, they can be added to the appropriate table straight from the program window.
• For each mention, information is extracted by the program from Google.com searches as well, which can give the program a wealth of information about the mentions.
![Page 11: I2B2 Shared Task 2011 Coreference Resolution in Clinical Text David Hinote Carlos Ramirez.](https://reader033.fdocuments.us/reader033/viewer/2022042718/56649ee15503460f94bf2440/html5/thumbnails/11.jpg)
Sample file
• Viewing Concepts & I2B2 Chain
![Page 12: I2B2 Shared Task 2011 Coreference Resolution in Clinical Text David Hinote Carlos Ramirez.](https://reader033.fdocuments.us/reader033/viewer/2022042718/56649ee15503460f94bf2440/html5/thumbnails/12.jpg)
File with both UHD and I2B2 Links Shown
![Page 13: I2B2 Shared Task 2011 Coreference Resolution in Clinical Text David Hinote Carlos Ramirez.](https://reader033.fdocuments.us/reader033/viewer/2022042718/56649ee15503460f94bf2440/html5/thumbnails/13.jpg)
Statistics for our System
![Page 14: I2B2 Shared Task 2011 Coreference Resolution in Clinical Text David Hinote Carlos Ramirez.](https://reader033.fdocuments.us/reader033/viewer/2022042718/56649ee15503460f94bf2440/html5/thumbnails/14.jpg)
Progress
• We are currently at around 75% F1 score. (Averaged over all test files.)
• Most algorithms for resolving coreference tend to have accuracy in the 60% range.
• With the time we have left, we will definitely increase this score.
• We still haven't added detection for "Treatment" type concepts, which constitute a significant percentage of the concepts not found when computing our F1 score.
• Detection for "Test" type concepts still needs work.
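For reference, an F1 score combines precision and recall as F1 = 2PR / (P + R). The slides do not say which coreference metric underlies the reported figure, so the Python sketch below assumes the simplest case: comparing predicted links against true links pair by pair, then averaging over files. The function names and the sample data are hypothetical.

```python
def link_prf(gold, predicted):
    """Precision, recall and F1 over unordered mention pairs."""
    gold = {frozenset(link) for link in gold}
    predicted = {frozenset(link) for link in predicted}
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

def average_f1(per_file):
    """Mean F1 over (gold, predicted) link lists, one entry per file."""
    scores = [link_prf(gold, pred)[2] for gold, pred in per_file]
    return sum(scores) / len(scores) if scores else 0.0

# Made-up links for one file: 2 of 3 predictions are right, 2 of 4 true links found.
gold = [("a", "b"), ("b", "c"), ("d", "e"), ("f", "g")]
predicted = [("b", "a"), ("d", "e"), ("x", "y")]
precision, recall, f1 = link_prf(gold, predicted)
```

With these inputs precision is 2/3, recall is 1/2, and F1 is 4/7. A concept class with no detection at all, such as "Treatment" here, lowers recall because its true links are never predicted.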
![Page 15: I2B2 Shared Task 2011 Coreference Resolution in Clinical Text David Hinote Carlos Ramirez.](https://reader033.fdocuments.us/reader033/viewer/2022042718/56649ee15503460f94bf2440/html5/thumbnails/15.jpg)
Current work
Test Mentions
• Precision on "Test" type concepts is relatively low (30%).
• Mainly this is because many of the tests involve specific body parts (e.g. "chest x-ray" and "chest CT" are sometimes linked by our rules).
• Tests also often involve times (e.g. "an x-ray was performed on 5 Aug." would link with "the x-ray on... December 10, 2010").
• They also involve position (e.g. "x-ray on left lung", "x-ray on right lung").
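One possible way to cut these false links is a compatibility check that refuses to link two test mentions whose modifiers disagree. The Python sketch below is an assumption about how such a filter could look, not the authors' rule: the word lists are illustrative and far from complete, and date comparison is left out because it needs real date parsing.

```python
import re

SIDES = {"left", "right", "bilateral"}                    # illustrative
MODALITIES = {"x-ray", "ct", "mri", "ekg", "ultrasound"}  # illustrative

def features(text):
    """Return the side words and test-type words found in a mention."""
    words = set(re.findall(r"[a-z][a-z\-]*", text.lower()))
    return words & SIDES, words & MODALITIES

def conflicts(found_a, found_b):
    """True when both mentions name a value and the values do not overlap."""
    return bool(found_a) and bool(found_b) and found_a.isdisjoint(found_b)

def may_link_tests(a, b):
    """Block a link when the side or the kind of test differs."""
    side_a, kind_a = features(a)
    side_b, kind_b = features(b)
    return not (conflicts(side_a, side_b) or conflicts(kind_a, kind_b))

print(may_link_tests("chest x-ray", "chest CT"))                      # False
print(may_link_tests("x-ray on left lung", "x-ray on right lung"))    # False
print(may_link_tests("chest x-ray", "the x-ray"))                     # True
```

A mention that states no side or no test type is treated as compatible with anything, so the check only removes links when there is explicit disagreement.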
![Page 16: I2B2 Shared Task 2011 Coreference Resolution in Clinical Text David Hinote Carlos Ramirez.](https://reader033.fdocuments.us/reader033/viewer/2022042718/56649ee15503460f94bf2440/html5/thumbnails/16.jpg)
Current Work
Problem Mentions
• Work on these mentions is about 50% complete.
• To finish, a few more database tables will need to be set up, and certain types of medical vocabulary loaded into them.
• We will also need a system for finding phrases that are made of different words but mean the same thing, i.e. a thesaurus.
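A thesaurus lookup of this kind can be sketched as a table that maps each phrase to a shared concept id. The Python example below is only an illustration: the in-memory `SYNONYM_GROUPS` list stands in for the database tables mentioned above, and its two entries are sample vocabulary, not the authors' data.

```python
# Each set holds phrases that name the same medical concept (sample entries).
SYNONYM_GROUPS = [
    {"heart attack", "myocardial infarction", "mi"},
    {"high blood pressure", "hypertension", "htn"},
]

# Map every phrase to the index of its group for constant-time lookup.
CONCEPT_ID = {phrase: index
              for index, group in enumerate(SYNONYM_GROUPS)
              for phrase in group}

def same_meaning(a, b):
    """True if the phrases are identical or belong to the same synonym group."""
    a, b = a.strip().lower(), b.strip().lower()
    if a == b:
        return True
    concept = CONCEPT_ID.get(a)
    return concept is not None and concept == CONCEPT_ID.get(b)

print(same_meaning("Heart attack", "myocardial infarction"))  # True
print(same_meaning("heart attack", "hypertension"))           # False
```

Phrases missing from the table never match anything but themselves, so an incomplete vocabulary costs recall rather than producing wrong links.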
![Page 17: I2B2 Shared Task 2011 Coreference Resolution in Clinical Text David Hinote Carlos Ramirez.](https://reader033.fdocuments.us/reader033/viewer/2022042718/56649ee15503460f94bf2440/html5/thumbnails/17.jpg)
Possible future problems
• The main risk with a rule based approach is that our rules might be too specific to work with the contest data once it's distributed.
• Given the execution speed of our program, we should have enough time to do any necessary modifications in the three days between contest data being sent and results submitted.
• There is also a slight problem with the fact that our application is made for a very specific purpose and is probably hard to generalize beyond the context of medical documents.
• Most coreference resolution tools are this way though.
• Not being able to code fast enough!
![Page 18: I2B2 Shared Task 2011 Coreference Resolution in Clinical Text David Hinote Carlos Ramirez.](https://reader033.fdocuments.us/reader033/viewer/2022042718/56649ee15503460f94bf2440/html5/thumbnails/18.jpg)
Future Necessities
• A reliable way to find the temporal setting of a particular sentence.
o Did an injury described happen 20 years ago, or is the doctor giving instructions for a future case? These are not coreferent even though they may be the same word.
• Thesaurus work
o Finding phrases that mean the same thing but use completely different words
• Output
o The program does not yet output files in the I2B2 competition format; we will add this feature as the competition deadline draws near.