Text Mining on Free-Text Based Anatomic Pathology Information Systems: A Front-End Data Integration...



Text Mining on Free-Text Based Anatomic Pathology Information Systems: A Front-End Data Integration Approach

Zhuang Zuo, MD, PhD

Carole W. Boudreaux, MD

Department of Pathology

University of South Alabama College of Medicine, Mobile, AL

([email protected])

The Needs

To find cases by searching the pathology report database with free-text keywords:

• Research: case series

• Education: conferences

• QA/QC

• Correlations among services

• And more

Limitations of Current LIS

Cerner Classic

• Proprietary database

• Pathology reports stored as free-text

• Does not support free-text search

• SNOMED search blocked

• Bandwidth nearly saturated

Design

• Relational database

• Front-end data integration

• Automatic data extraction

• Free-text search in multiple fields

• Boolean search support

• No change to current LIS

• HIPAA compliance
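The design points on multi-field free-text search and Boolean search can be illustrated with a small sketch. It uses Python and an in-memory SQLite database purely for illustration (the slides name Microsoft SQL Server, Access, and Visual Basic). The table layout, the column names, the `build_query` and `search` helpers, and the three sample rows are all hypothetical and synthetic.

```python
import sqlite3

# Hypothetical schema: column names are illustrative, not the authors'.
SCHEMA = """
CREATE TABLE reports (
    accession TEXT PRIMARY KEY,
    mrn       TEXT,
    diagnosis TEXT,
    gross     TEXT
);
"""

# Free-text columns that every keyword is matched against.
SEARCHABLE = ("diagnosis", "gross")


def build_query(all_terms=(), any_terms=(), none_terms=()):
    """Return (sql, params) for an AND / OR / NOT keyword search."""
    def field_match():
        # A keyword matches a report if it occurs in any searchable column.
        return "(" + " OR ".join(f"{col} LIKE ?" for col in SEARCHABLE) + ")"

    def params_for(term):
        return [f"%{term}%"] * len(SEARCHABLE)

    clauses, params = [], []
    for term in all_terms:                       # AND terms
        clauses.append(field_match())
        params += params_for(term)
    if any_terms:                                # OR terms
        clauses.append("(" + " OR ".join(field_match() for _ in any_terms) + ")")
        for term in any_terms:
            params += params_for(term)
    for term in none_terms:                      # NOT terms
        clauses.append("NOT " + field_match())
        params += params_for(term)
    where = " AND ".join(clauses) if clauses else "1"
    sql = f"SELECT accession FROM reports WHERE {where} ORDER BY accession"
    return sql, params


def search(conn, **terms):
    sql, params = build_query(**terms)
    return [row[0] for row in conn.execute(sql, params)]


def make_demo_db():
    """Build an in-memory database holding three synthetic reports."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO reports VALUES (?, ?, ?, ?)",
        [
            ("S-1", "100", "Invasive ductal carcinoma, breast", "Breast mass, 2 cm"),
            ("S-2", "101", "Benign nevus", "Skin, shave biopsy"),
            ("S-3", "100", "Ductal carcinoma in situ", "Breast core biopsy"),
        ],
    )
    return conn
```

With this sketch, `search(make_demo_db(), all_terms=["carcinoma", "breast"], none_terms=["in situ"])` selects only the synthetic report `S-1`. Keywords are passed as bound parameters rather than concatenated into the SQL string, which keeps user input out of the query text.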

Technology

• Cerner Classic LIS

• WRQ Reflection 2 terminal emulation software with VBScript

• Microsoft SQL Server, Access

• Microsoft Visual Basic

[Diagram labels: LIS, Data Extraction, Data Cleaning, Data Integration, PC, Data Mining, GUI, SQL Query]
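The extraction, cleaning, and integration stages named in the diagram might look roughly like the following sketch, which turns one block of captured report text into a relational row. It is written in Python with SQLite for illustration only. The section headers, the `reports` table, and the sample text in `RAW` are assumptions; the slides do not describe the actual report layout or parsing rules.

```python
import re
import sqlite3

# Hypothetical section headers; the real report layout is not given in the slides.
SECTIONS = ("CLINICAL HISTORY", "GROSS DESCRIPTION", "FINAL DIAGNOSIS")
HEADER_RE = re.compile(r"^\s*(%s)\s*:\s*(.*)$" % "|".join(SECTIONS))

# Synthetic sample of captured terminal text, including a stray form feed.
RAW = (
    "CLINICAL HISTORY: Breast mass\x0c\n"
    "GROSS DESCRIPTION: Received in formalin,\n"
    "   a 2 cm   nodule\n"
    "FINAL DIAGNOSIS:\n"
    "  Fibroadenoma\n"
)


def clean(text):
    """Cleaning: replace non-printable residue and collapse whitespace."""
    text = re.sub(r"[^\x20-\x7E\n]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sections(raw):
    """Extraction: split report text into {section name: cleaned text}."""
    fields, current = {}, None
    for line in raw.split("\n"):
        match = HEADER_RE.match(line)
        if match:
            current = match.group(1)
            fields[current] = [match.group(2)]
        elif current is not None:
            fields[current].append(line)   # continuation of the current section
    return {name: clean("\n".join(lines)) for name, lines in fields.items()}


def load(conn, accession, raw):
    """Integration: store one parsed report as a row in a relational table."""
    parsed = split_sections(raw)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reports "
        "(accession TEXT PRIMARY KEY, history TEXT, gross TEXT, diagnosis TEXT)"
    )
    conn.execute(
        "INSERT INTO reports VALUES (?, ?, ?, ?)",
        (
            accession,
            parsed.get("CLINICAL HISTORY", ""),
            parsed.get("GROSS DESCRIPTION", ""),
            parsed.get("FINAL DIAGNOSIS", ""),
        ),
    )
```

Once each section sits in its own column, the free-text fields can be queried with ordinary SQL without touching the source LIS.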

Interface

• Save the result as an HTML file

A sample search

• Module switch panel

• Search within previous result

• Navigate history search results

• Get all reports of a matched patient
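The interface features labeled here (searching within a previous result, navigating the search history, and pulling all reports for a matched patient) could be modeled as in the sketch below. It is a plain-Python illustration only. The `Report` and `SearchSession` classes and their methods are hypothetical names, and the original application was built with Visual Basic against a relational database.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Report:
    accession: str
    mrn: str       # medical record number
    text: str


@dataclass
class SearchSession:
    reports: list
    history: list = field(default_factory=list)   # one result list per search
    position: int = -1                            # index of the result shown

    def _push(self, results):
        # A new search discards any "forward" history, as a browser does.
        self.history = self.history[: self.position + 1] + [results]
        self.position = len(self.history) - 1
        return results

    def search(self, keyword, within_previous=False):
        """Case-insensitive keyword search, optionally within the current result."""
        if within_previous and self.position >= 0:
            pool = self.history[self.position]
        else:
            pool = self.reports
        needle = keyword.lower()
        return self._push([r for r in pool if needle in r.text.lower()])

    def back(self):
        if self.position > 0:
            self.position -= 1
        return self.history[self.position] if self.history else []

    def forward(self):
        if self.position < len(self.history) - 1:
            self.position += 1
        return self.history[self.position] if self.history else []

    def patient_reports(self, mrn):
        """All reports for one patient, regardless of the current search."""
        return [r for r in self.reports if r.mrn == mrn]
```

Keeping each result list in the history makes "search within previous result" a filter over the list currently shown, and "back" and "forward" simple index moves.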

A sample HTML output
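As a rough idea of how search results could be saved as an HTML file, the following Python sketch renders result rows into a simple table. The function names, the two-column layout, and the page structure are assumptions; the slides do not specify the actual output format.

```python
from html import escape


def results_to_html(query, rows):
    """Render (accession, diagnosis) rows as a minimal HTML page.

    All report text is escaped so that characters such as "<" or "&"
    in free text cannot break the markup.
    """
    body = "\n".join(
        f"<tr><td>{escape(accession)}</td><td>{escape(diagnosis)}</td></tr>"
        for accession, diagnosis in rows
    )
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>Search: {escape(query)}</title></head>\n"
        "<body>\n<table>\n"
        "<tr><th>Accession</th><th>Diagnosis</th></tr>\n"
        f"{body}\n"
        "</table>\n</body></html>\n"
    )


def save_html(path, query, rows):
    """Write the rendered page to disk."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(results_to_html(query, rows))
```

Because saved files would contain report text, where they are stored still has to satisfy the HIPAA requirement listed under Design.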

Results

• 377,172 anatomic pathology reports

• 198,187 unique medical record numbers

• Distribution by subspecialty:

– cytopathology (51.6%)

– surgical pathology (40.3%)

– dermatopathology (6.3%)

– hematopathology (1%)

– autopsy (0.5%)

– neuropathology (0.3%)

Results

• Searches using diagnosis and/or protocol keywords in combination with general case information were tested on a PC:

– Results were returned almost instantly.

– Searches were thorough and specific, and were significantly faster and of higher quality than SNOMED searches on the Cerner LIS.

• Data mining performed on 228 pediatric autopsy reports produced high-quality results.

• Ongoing development: synoptic input and CoPathPlus migration.

Conclusions

• A relatively simple solution to transform unstructured data into relational data.

• Extends the query capabilities of a legacy LIS.

• Front-end data integration:

– zero impact on the existing LIS

– functions independently

– flexible and easily adaptable across platforms

• A potential middleware solution for third-party software, data transfer, and data mining.

• Readily scalable: from a Microsoft Access database application to an enterprise database Web application.