Post on 01-Apr-2015
Automatic Mapping of Clinical Documentation
to SNOMED CT
Holger Stenzhorn Saarland University Hospital, Homburg, Germany
Edson Pacheco
Percy Nohama
Stefan Schulz Freiburg University Medical Center, Germany
Federal Technological University of Paraná, Brazil
Introduction Methods Results Conclusion
Background
• Important role of narrative content in the EHR
• Manual coding: cost, quality and scope problems
• Increasing demand for high-quality structured data
• SNOMED CT as a new terminological standard claims to
represent the whole clinical process
Can language technology help semantically enrich narratives in the Electronic Health Record?
Case study
• Source:
– discharge summaries from the
cardiology department of the
Hospital de Clínicas de
Porto Alegre, Brazil
– Language:
Portuguese
• Target
– SNOMED Clinical Terms, 01/2009
– Languages: English, Spanish
Sample Discharge Summary
# HAS # DM # Miocardiopatia dilatada chagásica (FE 35%) # Ca de prostata -
orquiectomia (2004) # Cardiopatia isquêmica - IAM em 2005, com colocação de
stent em DA e lesão severa inoperável em CD Pct vem a emergência em 20/03
com quadro de dor torácica típica, sem elevação enzimática, com diagnóstico
de angina instável e fibrilação atrial não identificada em avaliações prévias.
Adicionalmente, apresentava descompensação do diabetes com sindrome
hiperosmlar não cetótica. Recebe tratamento clínico para otimização do quadro
e é submetido a novo cateterismo em 28/03, que demonstra CD ocluída no
terço proximal, DA com stent rpoximal com lesão de 40% no seu interior e Mg
de Cx com lesão de 60-65%. Recebe alta em bom estado geral, sem dor
torácica, anticoagulado, com plano de retorno ambulatorial para equipe de
cardiopatia isquêmica e para o ambulatório de anticoagulação.
Acronyms
Abbreviations
Punctuation errors
Typing errors
Telegram Style
NLP pipeline
• Discharge summaries: sentence detection → spell checking → acronym expansion → NE recognition → POS tagging → NP extraction → context detection → morpho-semantic abstraction → MID representation of the term candidates
• SNOMED CT: SCT-EN and SCT-SP → subset creation → morpho-semantic abstraction → MID representation of SNOMED CT
Language processing tools implemented
• Sentence splitter, POS tagger: OpenNLP, trained on
manually annotated texts
• Acronym expander: regular-expression matching against an acronym
database, disambiguation by local context (token
co-occurrence in a three-token window)
• Noun phrase detector: driven by typical POS patterns in
Spanish SNOMED CT descriptions (with a few adaptations to
Portuguese, owing to the similarity between the two
languages)
• Not yet implemented: spell checker, NE recognizer,
context (e.g. negation) detector
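The acronym expander described above could look roughly like the following minimal sketch. The database entries, the context token sets and the function name `expand_acronyms` are illustrative assumptions, not the authors' data or code; the sketch only shows the general idea of regular-expression lookup plus disambiguation by token co-occurrence in a three-token window.

```python
import re

# Hypothetical acronym database: each acronym maps to candidate expansions,
# each paired with tokens that typically co-occur with that sense.
ACRONYM_DB = {
    "DA": [
        ("artéria descendente anterior", {"stent", "lesão", "cateterismo"}),
        ("doença de Alzheimer", {"demência", "memória"}),
    ],
    "IAM": [("infarto agudo do miocárdio", set())],
}

WINDOW = 3  # number of tokens inspected on each side of the acronym
TOKEN_RE = re.compile(r"\w+")
# Case-sensitive pattern built from the database keys (longest first).
ACRONYM_RE = re.compile(
    r"\b(" + "|".join(sorted(map(re.escape, ACRONYM_DB), key=len, reverse=True)) + r")\b"
)


def expand_acronyms(text):
    """Replace known acronyms by the expansion that best fits the local context."""

    def replace(match):
        left = TOKEN_RE.findall(text[:match.start()])[-WINDOW:]
        right = TOKEN_RE.findall(text[match.end():])[:WINDOW]
        context = {token.lower() for token in left + right}
        candidates = ACRONYM_DB[match.group(1)]
        # Pick the sense sharing the most tokens with the window;
        # on a tie, the first listed sense wins.
        expansion, _ = max(candidates, key=lambda c: len(c[1] & context))
        return expansion

    return ACRONYM_RE.sub(replace, text)


print(expand_acronyms("colocação de stent em DA e lesão severa"))
```

Matching is case-sensitive in this sketch so that the Portuguese preposition "da" is not mistaken for the acronym "DA".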
Morphosemantic Abstraction
• Using MSI (morphosemantic indexing) toolkit
(Averbis GmbH, Freiburg)
• Extraction of significant word fragments (subwords) and mapping to
semantic identifiers (MIDs):
• #cardiac = {heart, cardiac, herz, kard, corac, cardiac, coeur, … }
• #inflamm = { inflamm, -itic, -itis, -phlog, entzuend, -itis, inflam, flog, inflam, flog, ... }
• Thesaurus: ~21,000 equivalence classes
• Lexicon entries:
– English: ~23,000
– German: ~24,000
– Portuguese: ~15,000
– Spanish: ~11,000
– French: ~8,000
– Swedish: ~10,000
– Italian: ~4,000
Equivalence classes and their subwords:
• MUSCLE: muscle, myo, muskel, muscul
• INFLAMM: inflamm, -itis, inflam, entzünd
• HEART: herz, heart, card, corazon
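To illustrate the idea of subword extraction and mapping to MIDs, here is a minimal sketch. It uses a toy lexicon built from the subwords shown above and a greedy longest-match segmentation; both the segmentation strategy and the function name `to_mids` are assumptions for illustration and do not describe how the MSI toolkit works internally.

```python
# Toy subword lexicon built from the equivalence classes shown above.
# The MID names are illustrative.
SUBWORD_TO_MID = {
    "muscle": "#muscle", "myo": "#muscle", "muskel": "#muscle", "muscul": "#muscle",
    "inflamm": "#inflamm", "itis": "#inflamm", "inflam": "#inflamm", "entzünd": "#inflamm",
    "heart": "#heart", "herz": "#heart", "card": "#heart", "corazon": "#heart",
}
MAX_LEN = max(len(subword) for subword in SUBWORD_TO_MID)


def to_mids(word):
    """Segment a word greedily into known subwords and return their MIDs."""
    word = word.lower()
    mids = []
    i = 0
    while i < len(word):
        # Try the longest possible subword starting at position i first.
        for length in range(min(MAX_LEN, len(word) - i), 0, -1):
            mid = SUBWORD_TO_MID.get(word[i:i + length])
            if mid is not None:
                mids.append(mid)
                i += length
                break
        else:
            i += 1  # no known subword starts here; skip one character
    return mids


print(to_mids("myocarditis"))
print(to_mids("Herzmuskelentzündung"))
```

With this toy lexicon, the English "myocarditis" and the German "Herzmuskelentzündung" yield the same set of MIDs, which is what makes a language-independent comparison possible.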
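Methods: NLP pipeline
• Discharge summaries: sentence detection → spell checking → acronym expansion → NE recognition → POS tagging → NP extraction → context detection → morpho-semantic abstraction → MID representation of the term candidates
• SNOMED CT: SCT-EN and SCT-SP → subset creation → morpho-semantic abstraction → MID representation of SNOMED CT
• Mapping heuristics: applied to the MID representations of the term candidates and of SNOMED CT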
SNOMED CT Concepts as Subwords
SNOMED CT concept description              MIDs
ENG: Congestive heart failure              #abund #cardiac #deficien
ENG: Congestive heart disease              #abund #cardiac #disorder
ENG: Congestive cardiac failure            #abund #cardiac #deficien
SPA: Insuficiencia cardíaca                #insuff #cardiac
SPA: Insuficiencia cardíaca congestiva     #insuff #cardiac #abund
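The effect of the MID representation on these descriptions can be made visible with a small sketch that groups them by their MID sets. The function name `group_by_mids` is hypothetical; the data are the five descriptions listed above.

```python
from collections import defaultdict

# The five descriptions listed above, with their MID representations.
DESCRIPTIONS = [
    ("ENG", "Congestive heart failure", "#abund #cardiac #deficien"),
    ("ENG", "Congestive heart disease", "#abund #cardiac #disorder"),
    ("ENG", "Congestive cardiac failure", "#abund #cardiac #deficien"),
    ("SPA", "Insuficiencia cardíaca", "#insuff #cardiac"),
    ("SPA", "Insuficiencia cardíaca congestiva", "#insuff #cardiac #abund"),
]


def group_by_mids(descriptions):
    """Group descriptions whose MID sets are identical (word order is ignored)."""
    groups = defaultdict(list)
    for lang, term, mids in descriptions:
        groups[frozenset(mids.split())].append((lang, term))
    return dict(groups)


for mids, terms in group_by_mids(DESCRIPTIONS).items():
    print(sorted(mids), terms)
```

In this data the two English synonyms collapse into one group, whereas the Spanish "Insuficiencia cardíaca congestiva" carries #insuff rather than #deficien, so exact MID-set equality would not link it to the English descriptions.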
Mapping heuristics
• For each term candidate
• decide whether there is a matching SNOMED description
• if yes, find the best SNOMED description
• map to the pertaining SNOMED description
• Preference criteria:
• matching with “term-typical” POS patterns
• MID coincidence (weighted by tf-idf)
• threshold: 60%
• In case of failure: test whether the term candidate corresponds
to a coordination of two SNOMED concepts, checking the plausibility
of the coordination against the SNOMED relationship table
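The MID coincidence criterion with its 60% threshold might be sketched as follows. The slides do not give the exact scoring formula, so this sketch makes several assumptions: each MID occurs once per term (so only the idf part of tf-idf varies), the score is the idf-weighted overlap divided by the idf-weighted union, and the function names are hypothetical. POS-pattern preference and the two-concept fallback are not covered.

```python
import math
from collections import Counter

# MID sets of the SNOMED CT descriptions from the previous slide.
DESCRIPTIONS = {
    "Congestive heart failure": {"#abund", "#cardiac", "#deficien"},
    "Congestive heart disease": {"#abund", "#cardiac", "#disorder"},
    "Congestive cardiac failure": {"#abund", "#cardiac", "#deficien"},
    "Insuficiencia cardíaca": {"#insuff", "#cardiac"},
    "Insuficiencia cardíaca congestiva": {"#insuff", "#cardiac", "#abund"},
}
THRESHOLD = 0.60


def idf_weights(descriptions):
    """Return idf weights per MID and a default weight for unseen MIDs."""
    n = len(descriptions)
    df = Counter(mid for mids in descriptions.values() for mid in mids)
    weights = {mid: math.log(1 + n / count) for mid, count in df.items()}
    return weights, math.log(1 + n)


def score(candidate, description, weights, unseen):
    """idf-weighted overlap of two MID sets, normalized by their weighted union."""
    def weight(mid):
        return weights.get(mid, unseen)

    total = sum(weight(m) for m in candidate | description)
    shared = sum(weight(m) for m in candidate & description)
    return shared / total if total else 0.0


def best_match(candidate, descriptions=DESCRIPTIONS, threshold=THRESHOLD):
    """Return (description, score) for the best match, or None below the threshold."""
    weights, unseen = idf_weights(descriptions)
    term, best = max(
        ((t, score(candidate, mids, weights, unseen)) for t, mids in descriptions.items()),
        key=lambda pair: pair[1],
    )
    return (term, best) if best >= threshold else None


print(best_match({"#insuff", "#cardiac", "#abund"}))
print(best_match({"#renal"}))
```

Rare MIDs such as #disorder receive a higher weight than frequent ones such as #cardiac, so agreement on a rare MID counts for more than agreement on a common one.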
Gold standard (kappa = 0.89)
First results
Number of tokens (MIDs)    Correct mappings
2                          66%
3                          71%
4                          80%
5                          89%
6                          79%
7                          80%
8                          75%
9                          45%
10                         25%
Conclusion
• Work in progress
– Encouraging preliminary results
– SNOMED mapping possible across language boundaries
• Future work
– Implement and test the pipeline elements not yet implemented
– Measure the impact of each pipeline element on mapping quality
– Scientific challenges:
• Automated context (e.g. plan, order, negation) identification
• Use of SNOMED CT’s ontological structure to improve mapping results
Acknowledgements
• German Research Foundation (DFG)
• International Bureau of the
German Ministry of Research (BMBF-IB)
• Brazilian National Research Council (CNPq)
• Hospital de Clínicas de Porto Alegre (HCPA)
• Averbis GmbH, Germany