Awakening Clinical Data: Semantics for Scalable Medical Research Informatics

Satya S. Sahoo, Division of Medical Informatics, Electrical Engineering and Computer Science Department, Case Western Reserve University, Cleveland, OH, USA

description

Health care data is growing at an explosive rate, driven by highly detailed recording of physiological processes, high-resolution scanning techniques (e.g. MRI), wireless health monitoring systems, and the move of traditional patient information to Electronic Medical Record (EMR) systems. The challenges in leveraging these huge data resources and transforming them into knowledge for improving patient care include the size of the datasets, their multi-modality, and the traditional forms of heterogeneity (syntactic, structural, and semantic). In addition, the US NIH is emphasizing multi-center clinical studies, which increases the complexity of data access, sharing, and integration. In this talk, I explore potential solutions to these challenges that use the semantics of clinical data, both implicit and explicit, together with Semantic Web technologies. I specifically discuss the ontology-driven Physio-MIMI platform for clinical data management in multi-center research studies.

Further Details: http://cci.case.edu/cci/index.php/Satya_Sahoo
Presentation at: Dagstuhl Seminar: Semantic Data Management 2012
Author: Satya S. Sahoo

Transcript of Awakening Clinical Data: Semantics for Scalable Medical Research Informatics

Page 1: Awakening Clinical Data: Semantics for Scalable Medical Research Informatics

Awakening Clinical Data: Semantics for Scalable Medical Research Informatics

Satya S. Sahoo, Division of Medical Informatics

Electrical Engineering and Computer Science Department Case Western Reserve University

Cleveland, OH, USA

Page 2: Awakening Clinical Data: Semantics for Scalable Medical Research Informatics

Patient Reports

Polysomnograms 1-20GB each

source: PRISM project, BME dept CWRU

source: NLM and Wikipedia source: Physio-MIMI, PRISM CWRU

source: PRISM project CWRU

500-600MB per patient per stay in EMU

Epilepsy Monitoring Unit (EMU) Data

Pathology Reports, Tissue Bank

National Sleep Research Resource: 500 TB

Case Western EMU: 250 TB

Wireless Health Data source: CWRU School of Engineering

MRI: 50-100MB PET: 60-100MB

MRI, PET scans

143,961 Patients per year (e.g. Emory)

~5.6 billion wireless connections and growing

Big Picture of Data in Clinical Research

Page 3: Awakening Clinical Data: Semantics for Scalable Medical Research Informatics

•  Ultra large volume of data and growing rapidly

•  Data is Multi-modal, Heterogeneous

•  Heterogeneity: Syntactic, Structural, Semantic

Big Picture of Data in Clinical Research

Page 4: Awakening Clinical Data: Semantics for Scalable Medical Research Informatics

Patient Reports

Polysomnograms

source: PRISM project, BME dept CWRU

source: NLM and Wikipedia source: Physio-MIMI, PRISM CWRU

source: PRISM project CWRU

Epilepsy Monitoring Unit (EMU) Data

Pathology Reports, Tissue Bank

Exemplar: Sleep Medicine Research

Wireless Health Data source: CWRU School of Engineering

MRI, PET scans

Scalability in Medical Informatics: Beyond Volume

Page 5: Awakening Clinical Data: Semantics for Scalable Medical Research Informatics

•  Multi-Center Studies with differing administrative requirements – business logic

•  Dynamic data – grows over project duration

•  Data Semantics as foundation to support a wide spectrum of users – clinicians, nurse practitioners, research fellows

Scalability in Medical Informatics: Beyond Volume

Page 6: Awakening Clinical Data: Semantics for Scalable Medical Research Informatics

A Wish List for Scalable Clinical Data Management

•  Reconcile Data Heterogeneity – most critical to successful translational research
   o  Syntactic heterogeneity – less of a problem, data dictionaries help
   o  Structural heterogeneity – problematic, XML somewhat helpful
   o  Semantic heterogeneity – a huge problem, ontologies to the rescue?

•  Provenance – essential for data quality, compliance, insight
   o  Blood Oxygen Baseline: oxygen saturation during the first 15 or 30 seconds of sleep
   o  Patient blood report last month cause of change in medication – Domain Provenance (not just tuple provenance)

•  Intuitive access to information – clinical trials eligibility, cohort identification

•  Scalable - Data sources, research partners added or removed dynamically
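The Blood Oxygen Baseline item above hinges on a detail that a bare number does not carry: whether 15 or 30 seconds were averaged. A minimal Python sketch of keeping that domain provenance attached to the derived value follows. The class `DerivedValue`, the function `blood_oxygen_baseline`, the recording identifiers, and the sample trace are illustrative assumptions, not part of Physio-MIMI.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class DerivedValue:
    """A computed value bundled with the context needed to interpret it."""
    name: str
    value: float
    window_seconds: int   # how much of the start of sleep was averaged
    sample_rate_hz: int   # sampling rate of the source signal
    source: str           # hypothetical identifier of the originating recording


def blood_oxygen_baseline(spo2, sample_rate_hz, window_seconds, source):
    """Average SpO2 over the first `window_seconds` of sleep."""
    n = window_seconds * sample_rate_hz
    if n <= 0 or len(spo2) < n:
        raise ValueError("recording is shorter than the requested window")
    return DerivedValue("blood_oxygen_baseline", mean(spo2[:n]),
                        window_seconds, sample_rate_hz, source)


# Made-up 1 Hz SpO2 trace: 15 s at 97%, then 15 s at 95%.
trace = [97.0] * 15 + [95.0] * 15
site_a = blood_oxygen_baseline(trace, 1, 15, "site-A/recording-001")
site_b = blood_oxygen_baseline(trace, 1, 30, "site-B/recording-001")
print(site_a.value, site_b.value)  # 97.0 96.0
```

The same trace yields two different "baselines"; only the recorded window length explains the gap, which is the kind of information tuple-level provenance alone would not capture.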

Page 7: Awakening Clinical Data: Semantics for Scalable Medical Research Informatics

A “not to do” list for Clinical Data Management

•  No Linked Open Patient Data – HIPAA, HITECH Act (US), Data Protection Act (UK)
   o  De-identified data – IRB approval

•  Ontology as global schema – but no RDF
   o  Vast majority as RDB
   o  Practical issues with RDF – cannot be institution-specific URI (privacy)

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch

Page 8: Awakening Clinical Data: Semantics for Scalable Medical Research Informatics

Physio-MIMI: Multi‐Modality, Multi‐Resource Environment for Physiological and Clinical Research

Sleep Domain Ontology

Any number of new centers

FMA

OGMS …

SNOMED-CT

Clinical Researcher

Page 9: Awakening Clinical Data: Semantics for Scalable Medical Research Informatics

Physio-MIMI: Enabling Scalable Medical Research

•  NCRR‐funded, multi‐CTSA site project: Sleep medicine as exemplar

•  Federated data management – scalable, adapts to changing data access policies

•  Ontology-driven:
   o  Data mappings – Ontology class to data dictionary terms (manually curated)
   o  Drive query interface
   o  Manage provenance

•  Privacy aware, IRB-compliant

•  Collaboration among Case Western, U. of Michigan, Marshfield Clinic and U. of Wisconsin, Madison
   o  Now Harvard Medical School
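One way to picture the federated, privacy-aware pattern in the bullets above is the Python sketch below: each site translates an ontology-level concept into its own data dictionary term and returns only an aggregate count. This is an illustration under stated assumptions, not the Physio-MIMI implementation. The site names, the concept name `ApneaHypopneaIndex`, the column names, and the rows are all hypothetical.

```python
# Hypothetical per-site rows; in a federated setup these never leave the site.
SITE_ROWS = {
    "site_a": [{"ahi": 4.0}, {"ahi": 22.5}, {"ahi": 31.0}],
    "site_b": [{"AHI_TOTAL": 17.0}, {"AHI_TOTAL": 2.1}],
}

# Manually curated mappings: ontology class -> local data dictionary term.
MAPPINGS = {
    "site_a": {"ApneaHypopneaIndex": "ahi"},
    "site_b": {"ApneaHypopneaIndex": "AHI_TOTAL"},
}


def site_count(site, concept, predicate):
    """Runs at the site: translate the concept, return only an aggregate count."""
    column = MAPPINGS[site].get(concept)
    if column is None:
        return 0  # this site has no mapping for the concept
    return sum(1 for row in SITE_ROWS[site] if predicate(row[column]))


def federated_count(concept, predicate, sites):
    """Runs at the coordinator: fan the query out and collect per-site counts."""
    return {site: site_count(site, concept, predicate) for site in sites}


counts = federated_count("ApneaHypopneaIndex", lambda v: v >= 15,
                         ["site_a", "site_b"])
print(counts, sum(counts.values()))  # {'site_a': 2, 'site_b': 1} 3
```

Adding or removing a center in this sketch means adding or removing one entry in `MAPPINGS`; the query itself stays expressed in ontology terms.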

Page 10: Awakening Clinical Data: Semantics for Scalable Medical Research Informatics

Key Resource: Sleep Domain Ontology (SDO) https://mimi.case.edu/concepts

Page 11: Awakening Clinical Data: Semantics for Scalable Medical Research Informatics

Data Mappings: SDO to Data Dictionary

Physio-Map Module

•  Visual interface

•  Stores mappings in XML – moving towards rules

•  Dynamically executed in response to user query

User Voting
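The slide does not show the mapping format, so the following Python sketch only illustrates the general idea of mappings stored in XML and applied at query time. The XML element and attribute names, the concept `BodyMassIndex`, and the table and column names are invented for the example and should not be read as Physio-Map's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping document; element and attribute names are illustrative.
MAPPING_XML = """
<mappings>
  <mapping concept="BodyMassIndex" site="site_a" table="visits" column="bmi"/>
  <mapping concept="BodyMassIndex" site="site_b" table="anthro" column="BMI_CALC"/>
</mappings>
"""


def load_mappings(xml_text):
    """Index mappings by (concept, site) for lookup at query time."""
    root = ET.fromstring(xml_text)
    return {
        (m.get("concept"), m.get("site")): (m.get("table"), m.get("column"))
        for m in root.iter("mapping")
    }


def to_sql(mappings, concept, site, op, value):
    """Rewrite one ontology-level condition as a parameterized count query."""
    if op not in {"<", "<=", "=", ">=", ">"}:
        raise ValueError(f"unsupported operator: {op}")
    table, column = mappings[(concept, site)]
    return f"SELECT COUNT(*) FROM {table} WHERE {column} {op} ?", (value,)


mappings = load_mappings(MAPPING_XML)
print(to_sql(mappings, "BodyMassIndex", "site_b", ">=", 30))
# ('SELECT COUNT(*) FROM anthro WHERE BMI_CALC >= ?', (30,))
```

Because the translation happens when the query arrives, a corrected mapping takes effect on the next query without touching the site's data.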

Page 12: Awakening Clinical Data: Semantics for Scalable Medical Research Informatics

Provenance: Contextual Metadata for Clinical Research

Slide courtesy: Remo Mueller

Page 13: Awakening Clinical Data: Semantics for Scalable Medical Research Informatics

Provenance: To Trace Variations in Data and Results

Slide courtesy: Remo Mueller

Page 14: Awakening Clinical Data: Semantics for Scalable Medical Research Informatics

Modified from slide courtesy: Remo Mueller

Page 15: Awakening Clinical Data: Semantics for Scalable Medical Research Informatics

Provenance: Source information for Patient Data

Slide courtesy: Remo Mueller

Page 16: Awakening Clinical Data: Semantics for Scalable Medical Research Informatics

Intuitive Query Interface: Ontology (SDO)-driven Visual Aggregator and Explorer (VisAgE)

DataSets

Ontology Concept – Type of Query Widget

Page 17: Awakening Clinical Data: Semantics for Scalable Medical Research Informatics

Physio-MIMI in National Sleep Research Resource

•  National Sleep Research Resource (NSRR) – scored and awaiting funding review

•  Collaboration between Harvard Medical School (domain experts) and Case Western (CS) with 15 projects
   o  50,000 sleep research studies – total size of 500TB

•  Semantic Data Integration – SDO and Sleep Provenance Ontology (extending W3C PROV Ontology PROV-O)

•  Signal processing tools – using a common format called European Data Format (EDF), XML-based

•  Domain analysis, cross-linking – secure Web access
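For readers unfamiliar with EDF, the sketch below reads the fixed part of an EDF file header in Python. It relies on the published EDF layout, in which the file begins with 256 bytes of fixed-width ASCII fields followed by 256 header bytes per signal. The function name and the synthetic demo header are assumptions made for illustration; the slides do not describe the project's own signal tools.

```python
# (name, width in bytes) for the fixed 256-byte EDF header, in file order.
EDF_HEADER_FIELDS = [
    ("version", 8),
    ("patient_id", 80),
    ("recording_id", 80),
    ("start_date", 8),         # dd.mm.yy
    ("start_time", 8),         # hh.mm.ss
    ("header_bytes", 8),       # total header size, including per-signal part
    ("reserved", 44),
    ("num_records", 8),
    ("record_duration_s", 8),
    ("num_signals", 4),
]


def parse_edf_fixed_header(raw: bytes) -> dict:
    """Split the first 256 bytes of an EDF file into named fields."""
    if len(raw) < 256:
        raise ValueError("an EDF fixed header is 256 bytes long")
    fields, offset = {}, 0
    for name, width in EDF_HEADER_FIELDS:
        fields[name] = raw[offset:offset + width].decode("ascii").strip()
        offset += width
    for key in ("header_bytes", "num_records", "num_signals"):
        fields[key] = int(fields[key])
    fields["record_duration_s"] = float(fields["record_duration_s"])
    return fields


# Synthetic header for demonstration only: 2 signals, 960 records of 30 s.
values = ["0", "demo-subject", "demo-recording", "01.01.12", "22.30.00",
          "768", "", "960", "30", "2"]
demo = b"".join(text.ljust(width).encode("ascii")
                for text, (_, width) in zip(values, EDF_HEADER_FIELDS))

header = parse_edf_fixed_header(demo)
hours = header["num_records"] * header["record_duration_s"] / 3600
print(header["num_signals"], hours)  # 2 8.0
```

With a real file, the same function would be applied to the first 256 bytes read in binary mode.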

Page 18: Awakening Clinical Data: Semantics for Scalable Medical Research Informatics

Challenges: Semantics in Large Scale Clinical Data

•  Incentives for adopting RDF in clinical data management – what is not already possible in RDB?

•  OWL2, RDFS reasoning – Privacy aware reasoning, semantics-aware access control (Nguyen et al. 2012)

•  Missing Semantics?
   o  Variable, missing provenance in original study - re-create provenance with (limited) provenance?
   o  Fine-level granularity for semantic annotation of signal data – currently not scalable

•  A little semantics does not go too far in clinical data
   o  Need for greater involvement of Semantic Web community in development of EHR systems

Page 19: Awakening Clinical Data: Semantics for Scalable Medical Research Informatics

Acknowledgements

•  Guo-Qiang Zhang, Remo Mueller, Samden Lhatoo, Susan Redline, Alireza Bozorgi

•  Division of Medical Informatics: Lingyun Luo, Joe Teagno, Meng Zhao, Jake Luo, Licong Cui, Chien-Hung Chen, Catherine Jayapandian

•  Physio-MIMI Team: http://physiomimi.case.edu/

•  Contact Information: [email protected], http://cci.case.edu/cci/index.php/Satya_Sahoo