
Panel: Automatic Clinical Text De-Identification: Is It Worth It, and Could It Work for Me?

Hercules Dalianis Clinical Text Mining Group Department of Computer and Systems Sciences (DSV) [email protected]

Background

•  Starting 2007 •  Karolinska University Hospital, Stockholm •  Greater Stockholm (City Council) 2 million inhabitants •  1800 beds/inpatients •  550 clinical units

Hercules Dalianis, MEDINFO 2013

TakeCare EPR system

•  Swedish electronic patient record system, now owned by CompuGroup Medical

•  Centralized, text file based
•  Built on the APL programming language
•  Data transferred to a MySQL database to make it manageable (Intelligence)


Ethical permission

•  What type of research will be carried out
•  How will it be carried out
•  No social security number
•  No personal names
•  Safeguarding of data


Encryption and safeguarding

•  Encrypted server
•  Password protected
•  Locked in an alarmed room
•  Server locked to a rack
•  No Internet connection
•  Few people have access to this server (and they have to sign a security paper)

=> Probably safer than at the hospital


Trust, Trust and more Trust

•  Good contacts with hospital management
•  They decide for the whole hospital/all clinical units
•  No psychiatric or venereal diseases, no undocumented refugees


•  We obtained 1 million patient records from 550 clinical units from the years 2006-2010
•  In several extracts, which are still continuing
•  Each patient has a unique social security number, from birth to death; it was replaced by a serial number
•  All patient names removed
•  The rest, including sensitive text, is present


Stockholm EPR Corpus

DEID work

•  Yes, we did it also to obtain an overview of what problems may occur

•  We followed HIPAA*) but adapted it for Swedish conditions

*) Health Insurance Portability and Accountability Act



The Stockholm EPR PHI*) corpus

•  100 electronic patient records (EPRs) in Swedish

•  Five clinics: Neurology, Orthopaedics, Infection, Dental Surgery and Nutrition
•  20 patients from each clinic, 50% men, 50% women
•  380 000 tokens
•  Three annotators annotated the whole corpus

*) Protected Health Information

28 PHI-classes

•  Account_Number, Age, Age_Over_89, Biometric_Identifier, Date_Part, Full_Date, Year, First_Name, Last_Name, Patient_First_Name, Patient_Last_Name, Relative_First_Name, Relative_Last_Name, Clinician_First_Name, Clinician_Last_Name, Location, Country, Municipality, Organization, Street_Address, Town, Health_Care_Unit, Device_Identifier_and_Serial_Number, Ethnicity, Fax_Number, Phone_Number, Relation, Uncertain

Consensus eight annotation classes

•  Age
•  Date_Part
•  Full_Date
•  First_Name
•  Last_Name
•  Health_Care_Unit
•  Location
•  Phone_Number


Annotation classes and instances

Class               Instances
Age                        56
Full date                 710
Date part                 500
First name                923
Last name                 928
Location                1 021
Health care unit          148
Phone number              135
Sum                     4 421


•  380 000 tokens •  4 421 sensitive instances •  ~ 1 percent sensitive information
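
The "~ 1 percent" figure can be checked from the class counts on the previous slide. The short sketch below only redoes that arithmetic; it assumes, as a simplification, that each annotated instance is compared against the token count directly, although an instance may span more than one token.

```python
# Counts per consensus annotation class, copied from the slide above.
instances = {
    "Age": 56,
    "Full date": 710,
    "Date part": 500,
    "First name": 923,
    "Last name": 928,
    "Location": 1021,
    "Health care unit": 148,
    "Phone number": 135,
}
total_tokens = 380_000  # corpus size in tokens, from the slide

total_instances = sum(instances.values())
share = total_instances / total_tokens

print(total_instances)  # 4421
print(f"{share:.2%}")   # 1.16%
```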


Eight annotation classes training and test using Stanford NER-CRF


•  Precision 0.95-0.74
•  Recall 0.83-0.36
•  F-score 0.90-0.49
•  The 8 annotation classes and the words
•  The rest is a black box
   –  Window breadth
   –  Distance between words etc.


Conditional Random fields à la Stanford NER

Research on Stockholm EPR Corpus

•  DEID and Resynthesis
•  Factuality level detection of diagnoses
•  Negation detection
•  Detecting the amount of hospital-acquired infections (HAI)
•  Detection of adverse drug events
•  Comorbidities


Conclusion

•  Preferable to work on original data
•  Too costly and difficult to de-identify data
•  Not safe enough
•  De-identification makes the data too noisy


References

•  Velupillai, S., H. Dalianis, M. Hassel and G. H. Nilsson. 2009. Developing a standard for de-identifying electronic patient records written in Swedish: precision, recall and F-measure in a manual and computerized annotation trial. International Journal of Medical Informatics (2009), doi:10.1016/j.ijmedinf.2009.04.005

•  Dalianis, H. and S. Velupillai. 2010. De-identifying Swedish Clinical Text - Refinement of a Gold Standard and Experiments with Conditional Random Fields, Journal of Biomedical Semantics 2010, 1:6 (12 April 2010)


•  Alfalahi, A., S. Brissman and H. Dalianis. 2012. Pseudonymisation of person names and other PHIs in an annotated clinical Swedish corpus. In the Proceedings of the Third Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2012) held in conjunction with LREC 2012, May 26, Istanbul, pp 49-54


Comorbidities in Comorbidity-view

•  Which ICD-10 codes co-occur with which other ones



Comorbidity View


123 H - IVA 322916614D 2007-08-21 9:12 1944 Kvinna Anamnesis Kvinna med hjrtsvikt, förmaksflimmer, angina pectoris. Ensamstående änka. Tidigare CVL med sequelae högersidig hemipares och afasi. Tidigare vårdad för krampanfall misstänkt apoplektisk. Inkommer nu efter att ha blivit hittad på en stol och sannolikt suttit så över natten. Inkommer nu för utredning. Sonen Johan är med.

Example record (Anonymized manually)

23 H - IVA 322916614D 2008-08-21 10:54 1944 Kvinna Bedömning Grav hjärtsvikt efter hjärtinfarkt x 2 inklusive eoisod med asystoli och HLR. EF 20-25%. Neurologisk påverkan med hösidig svaghet. Blodprov. Odlingar tas i blod och urin. Remiss skickas pulm-rtg enl dr Svenssons anteckning. Atelektaser. Pneumoni, I110. Hjärtinsufficiens, ospecificerad, I509



(English translation) 123 H - IVA 322916614D 2008-08-21 9:12 1944 Woman Anamnesis

Woman with hert failures, atrial fibrillation, and angina pectoris. Single widow. Former CVL with sequele, rght hemiparesis and aphasia. Prior hospital care for seizures, suspected to be apoepeleptic. Arrive to hospital after being found in a chair and probably been sitting there over night. Arrive for further investigation and care. Accompanied by her son Johan.


123 H - IVA 322916614D 2008-08-21 10:54 1944 Woman Assessment/Plan Severe heart failure after heart infarction x 2, including episode with cardiac arrest and acute cardiac arrest treatment. Ejection fraction (EF) 20-25%. Neurological symptoms with right sided hemiparesis. Blood samples. Culture for blood and urine. Referral for pulmonary x-ray according to dr Svensson’s notes. Atelectases. Pneumonia, I110. Heart failure, unspecified, I509.

Automatic Clinical Text De-Identification: Is It Worth It, and Could It Work for Me?

Stéphane M. Meystre, Biomedical Informatics, University of Utah, USA
Hercules Dalianis, Computer and Systems Sciences, Stockholm University, Sweden
Pierre Zweigenbaum, ILES, LIMSI-CNRS, France

Medinfo 2013, Copenhagen, August 23, 2013

De-identification

Privacy and confidentiality of clinical data

In the U.S., the HIPAA (Health Insurance Portability and Accountability Act) protects the confidentiality of patient data. The Common Rule protects the confidentiality of research subjects. These laws typically require the informed consent of the patient and approval of the IRB to use data for research purposes, but these requirements are waived if data are de-identified.

De-identification means that explicit identifiers are hidden or removed. It is often used interchangeably with anonymization, but the latter implies that the data cannot be linked to identify the patient (i.e., de-identified is often far from anonymous). Scrubbing is also sometimes used as a synonym of de-identification.

De-identification (cont.)

According to the HIPAA, the Safe Harbor Methodology requires the following PHI to be removed:

1. Names
2. All geo-subdivisions smaller than a State
3. All elements of dates (except year)
4. Phone numbers
5. Fax numbers
6. Electronic mail addresses
7. Social Security numbers
8. Medical record numbers
9. Health plan beneficiary numbers
10. Account numbers
11. Certificate/license numbers
12. Vehicle identifiers and serial numbers
13. Device identifiers and serial numbers
14. Web Universal Resource Locators
15. Internet Protocol address numbers
16. Biometric identifiers, including finger and voice prints
17. Full face photographic images and any comparable images
18. Any other unique identifying number, characteristic, or code

De-identification (cont.)

Manual text de-identification is a lengthy and costly process (about 90 s per document).

NLP can be used to automatically de-identify electronic clinical documents.

Several NLP-based applications have been developed for clinical text de-identification, but:

• they are developed for one or a few clinical note types,
• in a specific institution or specialty,
• to detect and remove/hide certain categories of PHI only...

Overall, their generalizability is a problem, but one that can be improved.

Presenters

Hercules Dalianis, PhD
Professor in Computer and Systems Sciences at Stockholm University, Sweden.
De-identifying Swedish health records

Pierre Zweigenbaum, PhD
Director of Research at the CNRS, in the LIMSI, Orsay, France.
De-identification of French clinical records

Stéphane Meystre, MD, PhD
Assistant Professor in Biomedical Informatics at the University of Utah, USA.
De-identification of clinical documents at the U.S. VHA, and issues related to de-identification (impact, risk for re-identification)

Automatic VHA Clinical Text De-Identification

Stéphane M. Meystre
Biomedical Informatics, University of Utah


VA clinical data de-identification

VA Center for Healthcare Informatics Research (CHIR) de-identification project:

National project to advance the methodology for automated de-identification of patient data with a systematic approach of evaluating existing de-identification systems, exploring innovative methods and techniques for de-identification, and combining the best-performing ones in a best-of-breed application.

Also includes the evaluation of the level of anonymity of de-identified clinical notes, and the impact of text de-identification on subsequent uses of the clinical notes.

Existing data de-identification evaluation

Literature review of related publications:
• Large variety of PHI categories detected
• Large variety of methods used

"Out-of-the-box" evaluation:

Text de-identification systems

- Rule-based systems:
  • HMS Scrubber (Beckwith et al., 2006);
  • MeDS (Friedlin and McDonald, 2008); and
  • MIT deid system (Neamatullah et al., 2008).

- Machine learning-based systems:
  • MITRE Identification Scrubber Toolkit (MIST) (Aberdeen et al., 2010)
  • Health Information DE-identification (HIDE) system (Gardner and Xiong, 2009).

- Traditional NER system:
  • Stanford NER system (Finkel et al., 2005)

Existing data de-identification evaluation

"Out-of-the-box" evaluation (cont.):

Training:

- Rule-based systems run "out-of-the-box"

- Machine learning-based systems trained with other corpus of 225 randomly selected VHA clinical documents, manually annotated for PHI (names).

- Stanford NER system run with trained models available with its distribution.

Testing with corpus of 50 randomly selected VHA clinical documents, manually annotated for PHI (names).


"Out-of-the-box" evaluation results:

System          Precision   Recall   F2-measure
HMS Scrubber        0.150    0.675        0.397
MeDS                0.149    0.768        0.419
MIT deid            0.636    0.893        0.826
MIST                0.865    0.319        0.356
HIDE                0.975    0.376        0.429
Stanford NER        0.692    0.723        0.716
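
The F2-measure reported here weights recall more heavily than precision, which suits de-identification because a missed identifier costs more than a spurious one. As a minimal sketch, assuming the standard F-beta definition with beta = 2 (the function name `f_beta` is ours, not from the evaluated systems), the values can be recomputed from precision and recall:

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta: weighted harmonic mean of precision and recall.

    beta > 1 gives recall more weight than precision.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall copied from three rows of the results above.
rows = {
    "HMS Scrubber": (0.150, 0.675),
    "MIT deid": (0.636, 0.893),
    "HIDE": (0.975, 0.376),
}
for name, (p, r) in rows.items():
    print(f"{name}: F2 = {f_beta(p, r):.3f}")
```

Note how HIDE's very high precision does little for its F2 score: with beta = 2 the low recall dominates.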


Our "best-of-breed" approach (BoB)

Ferrandez O, South BR, Shen S, Friedlin FJ, Samore MH, Meystre SM. BoB, a best-of-breed automated text de-identification system for VHA clinical documents. JAMIA. 2012 Sep 4.


Processing stages (diagram):

1. Pre-processing
2. High-sensitivity extraction component
3. False positives filtering component


NLP pre-processing:

• Sentence segmentation (adapted from OpenNLP and retrained models with VHA clinical text)
• Tokenization
• Part-of-speech tagging (adapted from OpenNLP and cTAKES trained models)
• Phrase chunking (adapted from OpenNLP and cTAKES trained models)
• LVG normalization (NLM development)

Our "best-of-breed" approach

High-sensitivity extraction component:

Mostly based on rules (context keywords and regex patterns) and dictionary lookups (Lucene with common English words and frequently occurring names from the 1990 U.S. Census).

• Dependent on the quality of the patterns and dictionary completeness
• PHI formats and instances not supported will be missed!!

Add machine learning based on sequence labeling (traditional NER tasks): Stanford CoreNLP library (CRF) trained to recognize person names (using our VHA training corpus).

Goal is to maximize recall, even if precision suffers.
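
To make the rule and dictionary idea concrete, here is a minimal sketch of a recall-oriented candidate extractor. It is not BoB's implementation: the two regular expressions, the three-entry dictionary, and the function name `extract_candidates` are all illustrative assumptions. The timestamp pattern mimics the '09/09/09@1200' format mentioned in the evaluation slides.

```python
import re

# Hypothetical pattern for timestamps such as '09/09/09@1200'.
DATE_AT_TIME = re.compile(r"\b\d{2}/\d{2}/\d{2}@\d{4}\b")
# Hypothetical context-keyword rule: a capitalized word following a title.
TITLE_NAME = re.compile(r"\b(?:Dr|Mr|Mrs|Ms)\.?\s+([A-Z][a-z]+)")
# Tiny stand-in for a real name dictionary (e.g. census name lists).
NAME_DICTIONARY = {"smith", "johnson", "williams"}


def extract_candidates(text):
    """Return sorted (start, end, category, source) candidates, favoring recall."""
    candidates = []
    for m in DATE_AT_TIME.finditer(text):
        candidates.append((m.start(), m.end(), "Date", "regex"))
    for m in TITLE_NAME.finditer(text):
        candidates.append((m.start(1), m.end(1), "Name", "context"))
    for m in re.finditer(r"[A-Za-z]+", text):
        if m.group().lower() in NAME_DICTIONARY:
            candidates.append((m.start(), m.end(), "Name", "dictionary"))
    return sorted(set(candidates))


note = "Seen 09/09/09@1200 by Dr. Svensson; patient Smith reports pain."
for start, end, category, source in extract_candidates(note):
    print(category, source, note[start:end])
```

Recording which method produced each candidate (the `source` field) matters downstream: the slides list the detection method as one of the features used by the false positives filter.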


False positives filtering component: based on machine learning classifiers

• Classifies candidate annotations as true or false positives
• Support Vector Machine classifier (LIBSVM, RBF kernel) and various features (lexical, morphological, syntactic, and the method used to detect the PHI (name) candidate)

Training the model for person names (diagram):

• Positive training examples: correct annotations derived from the high-sensitivity extraction component
• Negative training examples: incorrect annotations derived from the high-sensitivity extraction component


Summative evaluation with reference standard of 800 VHA clinical notes (500 training, 300 testing):

Evaluation of BoB

Notes on the successive configurations (MIT, Rules, CRF, Rules+CRF):

• MIT: some PHI categories have very low recall because of missing rules/patterns or dictionary entries.
• Rules: rules/patterns and dictionary entries specific to VHA clinical notes were required (e.g., date pattern for formats like ‘09/09/09@1200’), and dictionary fuzzy-matches were also added.
• CRF and Rules+CRF: CRFs allowed detecting PHI missing in rules/patterns or dictionaries, but added significant noise.



Recall (R) per PHI category for each configuration, and precision (P) for the full BoB system:

PHI categories              MIT     Rules   CRF     Rules+CRF   BoB full   BoB full
                            R       R       R       R           R          P
Patient Name                0.590   0.972   0.953   0.992       0.980      0.707
Relative Name               0.600   0.960   0.960   0.960       0.920      0.707
Healthcare Provider Name    0.319   0.920   0.898   0.963       0.943      0.707
Other Person Name           0.111   1       0.667   1           0.888      0.707
Street City                 0.828   0.962   0.872   0.974       0.943      0.679
State Country               0.689   0.953   0.757   0.973       0.878      0.751
Deployment                  0.057   1       --      1           0.887      0.859
ZIP Code                    1       1       --      1           1          1
Healthcare Units            0.008   0.832   0.755   0.914       0.811      0.836
Other Organizations         0.033   0.824   0.549   0.912       0.725      0.578
Date                        0.399   0.963   0.917   0.977       0.971      0.934
Age > 89                    0.250   1       --      1           1          0.8
Phone Number                0.494   0.989   --      0.989       0.956      1
Electronic Address          1       1       --      1           1          1
SSN                         1       1       --      1           1          0.964
Other ID Number             0.117   0.978   --      0.978       0.917      0.831
Overall macro-averaged      0.468   0.960   --      0.977       0.926      0.841

Overall micro-averaged      MIT     Rules   CRF     Rules+CRF   BoB full
Precision                   0.311   0.362   --      0.346       0.836
Recall                      0.350   0.928   --      0.961       0.922
F1-measure                  0.329   0.521   --      0.509       0.877
F2-measure                  0.341   0.707   --      0.709       0.904


Acknowledgments

Oscar Ferrandez Escamez (University of Utah, now Nuance)
Brett South (University of Utah and SLC VA)
Shuying Shen (University of Utah and SLC VA)
Jeffrey Friedlin (Regenstrief Institute)
Matthew Maw (SLC VA)
Matthew Samore (University of Utah and SLC VA)

Funding by VA HSR&D (CHIR; HIR 08-374)

Thank you!

Questions and comments: [email protected]

Quality of De-Identification, and Impact on Clinical Information

Stéphane M. Meystre
Biomedical Informatics, University of Utah


Generalizability of de-identification

PHI content varies significantly between various clinical corpora.

De-identification applications tested “out-of-the-box” with our VHA corpus: low performance!

• Rule-based systems reach 32-26% recall and 14-42% precision (fully-contained matches, one overall PHI category)
• Machine learning-based systems reach 28-30% recall and 56-58% precision (trained with the i2b2 deid corpus)

Generalizability of de-identification

The VHA training and testing corpora

• Variety of clinical notes (stratified random sample)
• Annotated for all HIPAA categories, some VHA-specific categories (deployment locations, units), and eponyms
• 500 documents for training, 300 documents for testing

The 2006 i2b2 de-identification challenge corpus

• Discharge summaries from Partners Healthcare, de-identified and PHI resynthesized with "± realistic" surrogates
• Selection of PHI categories: a subset of HIPAA (Patient, Doctor, Hospital, IDs, Dates, Phone numbers, Ages)
• 669 documents for training, 220 documents for testing

Generalizability evaluation

Applications training and testing (diagram, three configurations):

• VHA corpus for both training and testing (all / some / no dictionaries*)
• i2b2 corpus for both training and testing (no dictionaries*)
• One corpus for training and the other for testing (no dictionaries*)

*Dictionaries used by MIST and HIDE



Results (VHA corpus)

                                      MIST*    HIDE**   BoB
Overall micro-avg (PHI level)
  Precision                           0.926    0.933    0.836
  Recall                              0.888    0.863    0.922
  F1-measure                          0.907    0.897    0.877
  F2-measure                          0.895    0.877    0.904
Overall macro-avg (PHI-type level)
  Recall                              0.737    0.729    0.926

* Best MIST configuration, with no dictionaries
** Best HIDE configuration, with selected dictionaries
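
The gap between micro-averaged recall (computed over all PHI instances) and macro-averaged recall (the mean of per-type recalls) shows up when a system does well on frequent PHI types but poorly on rare ones. The sketch below illustrates the two averages; the counts are hypothetical and are not taken from the study.

```python
def micro_macro_recall(counts):
    """counts maps PHI type -> (true_positives, false_negatives)."""
    tp_total = sum(tp for tp, _ in counts.values())
    fn_total = sum(fn for _, fn in counts.values())
    # Micro: pool all instances, so frequent types dominate.
    micro = tp_total / (tp_total + fn_total)
    # Macro: average the per-type recalls, so every type counts equally.
    per_type = [tp / (tp + fn) for tp, fn in counts.values()]
    macro = sum(per_type) / len(per_type)
    return micro, macro


# Hypothetical counts: two frequent types detected well, one rare type missed.
counts = {"Date": (950, 50), "Person Name": (450, 50), "Deployment": (2, 8)}
micro, macro = micro_macro_recall(counts)
print(f"micro = {micro:.3f}, macro = {macro:.3f}")  # micro = 0.928, macro = 0.683
```

A pattern like this is one plausible reading of why a system can show high micro-averaged recall and a clearly lower macro-averaged one.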


Results (VHA/i2b2 corpora)

Training with our VHA corpus, and testing with the i2b2 corpus (MIST and HIDE with no dictionaries)

                                      MIST     HIDE     BoB
Overall micro-avg (PHI level)
  Precision                           0.705    0.712    0.691
  Recall                              0.749    0.576    0.820
  F1-measure                          0.726    0.637    0.750
  F2-measure                          0.740    0.599    0.790
Overall macro-avg (PHI-type level)
  Recall                              0.610    0.461    0.664


Results (VHA/i2b2 corpora)

i2b2 corpus and combination with VHA evaluation:

• Training and testing with i2b2 corpora allows for good performance, even if dictionaries are less useful (BoB's CRF-based NER helped here).
• Generalizability remains an issue for all systems when training with one corpus type and testing with another one. No system achieved good results (overall macro-averaged recall 46-66%).
• BoB’s design still reaches our goal, with the highest recall among the three systems and similar precision.


Some clinical information is more likely to be mistakenly considered as PHI.

Eponyms, for example, could easily be considered as person names. In our corpus, they represent various categories of clinical information:

• Procedures and signs (40% of eponyms): Hartmann, Nissen, Roux, Whipple, Apgar, Babinski, etc.
• Diseases (36%): Alzheimer, Addison, Asperger, Basedow, Crohn, Cushing, Graves, Hodgkin, Parkinson, Raynaud, etc.
• Devices (18%): Adson, Foley, Kelly, Swan-Ganz, etc.
• Anatomical structures (6%): Achilles, His, Langerhans, etc.

Impact on Clinical Information

Overlap of 2010 i2b2 challenge concepts and BoB PHI annotations: 849 concepts overlapped partly with PHI annotations, reaching an average of 1.78% of all concept annotations.

Partial overlap:

i2b2 categories   i2b2 annot.   PHI overlap #   Eponyms   Overlap [%]
Problem                 19667             187        18          0.95
Test                    13833             180        41          1.30
Treatment               14185             482        53          3.40

Impact on Clinical Information (cont.)

Partial overlap details:

                           Problem    Test   Treatment   No match
Clinical Eponyms                18      41          53        156
Person Names                   162     103         383       3074
Street or City                   2       1           3        433
State or Country                12      12          18        905
Deployment                      --      --          --          0
ZIP code                        --      --          --          0
Healthcare Unit Name            --      17          53       1289
Other Organization Name         --       9          15        196
Date                             4      20           1       5436
Age > 89                        --      --          --         13
Phone Number                    --      --          --        153
Electronic Address              --      --          --          0
SSN                             --      --          --          0
Other ID Number                  7      18           9        919
Total matches                  187     180         482          0
No match                     19466   13626       13675




Partial overlap details (cont.):

Most overlap happened with Person Names annotations:

PHI             i2b2 categ.   Overlap %
Person Names    Treatment         45.11
Person Names    Problem           19.08
Person Names    Test              12.13
                                  76.33% overall

Most frequent overlap examples:

• Person Names - Treatment: Colace, Lopressor, Senna, Contin...
• Person Names - Problem: MR, E.Coli, Pseudomonas, Addison...
• Person Names - Test: Apgars, Papanicolaou, SP Stickney, Hct...


Even an efficient text de-identification system can mistakenly consider clinical information as PHI. This overlap is only 1.78%, even when counting partial matches.

Another study by Deleger et al. compared automated medication extraction from clinical text before and after de-identification. They found no significant difference.

When comparing SNOMED-CT concept annotations by cTAKES before and after de-identification, we found 1.2-3% of concepts lost, depending on de-identification accuracy (partly significant difference). Most concepts “lost” were false positives (e.g., “VA” recognized as “vertebral artery”).

Deleger L, Molnar K, Savova G, Xia F, Lingren T, Li Q, et al. Large-scale evaluation of automated clinical note de-identification and its impact on information extraction. JAMIA. 2012. Aug.2


All methods to assess the risk for re-identification were applied to a small number of structured and coded data (demographics, location), not to narrative text (work by Malin, El Emam etc.).

Clinical notes are rich in clinical and social information that can be unique and could be used to re-identify a patient.

This risk is significant (23% of 2010 i2b2 corpus documents have unique ICD-9-CM or CPT codes), but limited by access to other identified data sets with clinical codes.

How to limit this risk?

• Exclude narrative text from de-identified data sets?
• Require controlled access and data use agreements?
• Apply anonymization techniques to non-PHI content?

Risk of re-identification

De-identification of French Clinical Texts: The LIMSI Experiments

Cyril Grouin, Pierre Zweigenbaum

LIMSI-CNRS, Orsay, France

MEDINFO 2013 Panel on De-Identification, Copenhagen, 23/8/2013

De-Identification of French Clinical Texts
Previous Work

• Ruch et al. (2000)
• Grouin (2002)
• Grouin et al. (2009)
• Proux et al. (2009)

LIMSI Experiments in De-Identification

Expert-based methods
• Localization of DE-ID to process French
• MEDINA

Machine-Learning methods
• CRF-based entity detection

Cross-corpus experiments
• Cardiology (discharge reports, hospital 1)
• Fetopathology (multiple report types, OCR’ed, hospital 2)
• Mixed (multiple report types, hospital 3)

Expert-based Methods: Localization
De-id (Neamatullah et al., 2008)

• Starting from DE-ID, which de-identifies English clinical texts
  – Lexicons
  – Patterns
• Translated the lexicons
• Started to translate the patterns, but
  – too much dependence on language (word order, etc.)
  – program not written with localization in mind
• Decided to stop and to develop a new system

Expert-based Methods
MEDINA (Grouin et al., 2009)

• Lexicons
  – General lexicon: inflected forms, lemma, POS
  – Specific lexicons: towns, first names, last names
  – Apply through exact match
• Patterns
  – Character properties
  – Trigger words
  – Neighborhood of already (de-)identified entities

Machine-learning Methods
Conditional Random Fields (see Grouin, MEDINFO 2013)

• Linear-chain CRF
  – Wapiti (Lavergne et al., 2010)
  – http://wapiti.limsi.fr/
• Features:
  – surface features: token, capitalization, digit, punctuation, length
  – morpho-syntactic: POS via TreeTagger
  – semantic types: lexicon, CUI via UMLS
  – distributional analysis: clustering via Brown et al.’s (1992) algorithm
• Automatic feature selection: L1 regularization
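
As an illustration of the surface features listed above, the following sketch computes them per token, together with the neighboring tokens as minimal context. It is only a sketch: the feature names and the function `surface_features` are our own assumptions, and this is not the feature template format that Wapiti itself reads.

```python
import string


def surface_features(tokens, i):
    """Surface features for tokens[i], plus its neighbors as context."""
    tok = tokens[i]
    return {
        "token": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "is_punct": bool(tok) and all(c in string.punctuation for c in tok),
        "length": len(tok),
        # Sentence-boundary markers stand in for missing neighbors.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


tokens = ["Dr", ".", "Svensson", "2008-08-21"]
features = [surface_features(tokens, i) for i in range(len(tokens))]
print(features[2])
```

Features of this kind generalize beyond the training vocabulary: an unseen surname still looks capitalized and still follows "Dr .".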

Evaluation: Cardiology and Fetopathology

Cardiology Corpus

              P       R       F       Confidence
Rule-based    0.855   0.830   0.843   [0.821, 0.864]
CRF           0.909   0.858   0.883   [0.864, 0.901]

Fetopathology Corpus (OCR’ed, no adaptation)

              P       R       F       Confidence
Rule-based    0.678   0.684   0.681   [0.633, 0.729]
CRF           0.732   0.565   0.638   [0.585, 0.692]

Cardiology Corpus (details)

                     Rule-based                CRF
                     P       R       F         P       R       F
Dates (238)          0.920   0.874   0.897     0.987   0.946   0.966
Last names (205)     0.903   0.907   0.905     0.892   0.883   0.887
First names (109)    0.777   0.927   0.845     0.822   0.890   0.855
Hospital (43)        0.500   0.372   0.427     0.931   0.628   0.750
Town (22)            0.688   0.500   0.579     0.632   0.545   0.585
Zip codes (8)        1.000   1.000   1.000     1.000   0.750   0.857
Phone (8)            1.000   1.000   1.000     0.857   0.750   0.800

Cardiology vs New, Varied Corpus

                     P       R       F
MEDINA-Rules
  Detection          0.862   0.825   0.846
  Typing             0.846   0.804   0.824
CRF-other
  Detection          0.929   0.798   0.858
  Typing             0.529   0.428   0.473
CRF-test 10×cv
  Detection          0.991   0.934   0.962
  Typing             0.959   0.876   0.916

Limitations

• Size of annotated corpora
  – More precisely, number of training examples
• Should handle “boilerplate” material differently
  – Address in header
  – Signature in footer
• Lexicons are always incomplete
  – Lexicon features may however receive high confidence
  – which may prevent the classifier from learning features with better generalization power

Types of features
Generalization power

• Current token is a clue: learn specific names, locations, etc.
  Example: Smith
• Current token is in a lexical class: lexicons of names, locations, etc.
  Example: Michael|Paul|Laura|...
• Context of current token is a clue
  Examples: Dr. xxxxx; xxxxx, Ph.D.; xxxxx has undergone
• Current token belongs to a class
  Examples: xxxxx (Capitalized); xxxxx (NNP); xxxxx (drug, see also lexicon)
• Context of current token belongs to a class
  Example: xxxxx (NNP) xxxxx

De-Identification and Loss of Information

• A recurring comment / question during presentations
  – Does de-identification remove information?
• Removing identifying pieces of information
  – Pseudonymization
  – Date shifting
• Different goals for de-identification
  – Perform Natural Language Processing research
  – Publish case report
  – ...
• Inside hospital information system
  – Extracted information should be handled as other structured information
  – Apply standard procedures for structured data
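
Date shifting, mentioned above, is a way to hide real dates while limiting information loss: every date of a given patient is moved by the same offset, so the intervals between events stay intact. The sketch below is one possible realization under our own assumptions (a keyed hash of a patient identifier, a hypothetical secret key, and a maximum shift of 365 days); it is not a method described in the slides.

```python
import hashlib
import hmac
from datetime import date, timedelta

SECRET_KEY = b"replace-with-a-project-secret"  # hypothetical key, kept private


def patient_offset(patient_id: str, max_days: int = 365) -> timedelta:
    """Derive a stable, patient-specific shift of 1..max_days days."""
    digest = hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256).digest()
    return timedelta(days=int.from_bytes(digest[:4], "big") % max_days + 1)


def shift_date(d: date, patient_id: str) -> date:
    """Shift a date back by the patient's offset; intervals are preserved."""
    return d - patient_offset(patient_id)


admitted = shift_date(date(2008, 8, 21), "patient-123")
discharged = shift_date(date(2008, 8, 29), "patient-123")
print((discharged - admitted).days)  # 8
```

Because the offset is derived rather than stored, the same patient always receives the same shift across extracts, as long as the key is kept secret and unchanged.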

Thank you
