Grid And Healthcare For IOM July 2009
Grid computing and health information sharing
— A platform proposal —
Ian Foster
Director, Computation Institute
Chan Soon-Shiong Scholar
U. Chicago & Argonne Natl Lab
National Coalition for Health Integration
Carl Kesselman
Co-Director, Center for Health Informatics
University of Southern California
2
Responding to a pandemic
3
Addressing urban health needs
4
Important characteristics
We must integrate systems that may not have worked together before
These are human systems, with differing goals, incentives, capabilities
All components are dynamic—change is the norm, not the exception
Processes are evolving rapidly too
We are not building something simple like a
bridge or an airline reservation system
5
Healthcare is a complex adaptive system
A complex adaptive system is a collection of individual
agents that have the freedom to act in ways that are not
always predictable and whose actions are interconnected
such that one agent’s actions changes the context
for other agents.
Crossing the Quality Chasm, IOM, 2001; pp 312-13
Non-linear and dynamic
Agents are independent and intelligent
Goals and behaviors often in conflict
Self-organization through adaptation and learning
No single point(s) of control
Hierarchical decomposition has limited value
6
We need to function in the zone of complexity
[Figure: Stacey diagram with axes "Agreement about outcomes" and "Certainty about outcomes", each running from low to high; regions: plan and control, zone of complexity, chaos]
Ralph Stacey, Complexity and Creativity in Organizations, 1996
8
We call these groupings virtual organizations (VOs)
A set of individuals and/or institutions engaged in the controlled sharing of resources in pursuit of a common goal
Healthcare = dynamic, overlapping VOs, linking
Patient – primary care
Sub-specialist – hospital
Pharmacy – laboratory
Insurer – …
But the U.S. health system is marked by fragmented and inefficient VOs with insufficient mechanisms for controlled sharing
I advocate … a model of virtual integration rather than true vertical integration … G. Halvorson, CEO Kaiser
9
The Grid paradigm
Principles and mechanisms for dynamic VOs
Leverage service oriented architecture (SOA)
Loose coupling of data and services
Open software, architecture
[Figure: timeline from 1995 to 2010 across fields: computer science, physics, astronomy, engineering, biology, biomedicine, healthcare]
10
The Grid paradigm and healthcare information integration
[Figure: layered platform architecture]
Data sources: Radiology, Medical records, Pathology, Genomics, Labs, RHIO
Platform services:
Name data and move it around
Make data usable and useful
Make data accessible over the network
Cross-cutting: Manage who can do what
11
The Grid paradigm and healthcare information integration
[Figure: layered platform architecture]
Data sources: Radiology, Medical records, Pathology, Genomics, Labs, RHIO
Platform services: Management, Integration, Publication
Cross-cutting: Security and policy
Above the platform services:
Transform data into knowledge
Enhance user cognitive processes
Incorporate into business processes
12
The Grid paradigm and healthcare information integration
[Figure: layered platform architecture]
Data sources: Radiology, Medical records, Pathology, Genomics, Labs, RHIO
Platform services: Management, Integration, Publication
Cross-cutting: Security and policy
Value services: Analysis, Cognitive support, Applications
13
We partition the multi-faceted interoperability problem
Process interoperability: integrate work across healthcare enterprise
Data interoperability
Syntactic: move structured data among system elements
Semantic: use information across system elements
Systems interoperability: communicate securely, reliably among system elements
[Figure: layer diagram with labels Analysis, Management, Integration, Publication, Applications]
14
Security and policy: Managing who can do what
Familiar division of labor
Publication level: bridge between local and global
Integration level: VO-specific policies, based on attributes
Attribute authorities
Identity-based authZ: simplest, not scalable
Unix Access Control Lists (Discretionary Access Control: DAC): groups, directories, simple admin
POSIX ACLs/MS-ACLs: finer-grained admin policy
Role-based Access Control (RBAC): separation of role/group from rule admin
Mandatory Access Control (MAC): clearance, classification, compartmentalization
Attribute-based Access Control (ABAC): generalization of attributes
>>> Policy language abstraction level and expressiveness >>>
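The step from identity-based authorization to attribute-based access control can be illustrated with a minimal Python sketch. It is not the Globus or caGrid GAARDS implementation; the class names, attribute names ("vo", "role", "type"), and the example rule are all hypothetical and chosen only to show a rule that matches on attributes instead of on a list of user names.

```python
from dataclasses import dataclass


@dataclass
class Request:
    subject: dict   # attributes an attribute authority asserts about the caller (hypothetical)
    resource: dict  # attributes describing the data object (hypothetical)
    action: str


@dataclass
class Rule:
    action: str
    subject_must_have: dict
    resource_must_have: dict

    def permits(self, req: Request) -> bool:
        # A rule matches when the action is equal and every required
        # attribute is present with the required value.
        return (
            req.action == self.action
            and all(req.subject.get(k) == v for k, v in self.subject_must_have.items())
            and all(req.resource.get(k) == v for k, v in self.resource_must_have.items())
        )


def is_permitted(rules: list, req: Request) -> bool:
    # Default deny: access is granted only if some rule matches.
    return any(rule.permits(req) for rule in rules)


# Hypothetical VO policy: radiologists in the trial VO may read the VO's images.
rules = [
    Rule(
        action="read",
        subject_must_have={"vo": "trial-vo", "role": "radiologist"},
        resource_must_have={"vo": "trial-vo", "type": "image"},
    )
]

image = {"vo": "trial-vo", "type": "image"}
radiologist_req = Request({"vo": "trial-vo", "role": "radiologist"}, image, "read")
pharmacist_req = Request({"vo": "trial-vo", "role": "pharmacist"}, image, "read")

print(is_permitted(rules, radiologist_req))  # True
print(is_permitted(rules, pharmacist_req))   # False
```

The rule never names an individual user, which is why this style can scale to dynamic VOs where membership changes often: adding a radiologist changes the attribute authority's assertions, not the policy.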
16
Globus / caGrid GAARDS
17
Publication: Make information accessible
Make data available in a remotely accessible, reusable manner
Leave mediation for integration layer
Gateway from local policy/protocol into wide area mechanisms (transport, security, …)
18
Imaging clinical trials use case
[Figure: two virtual organizations (VOs); labels: NANT, COG, Children's Oncology Group, Neuroblastoma Cancer Foundation]
19
Automating service creation, deployment
caGrid, Introduce, gRAVI: Ohio State, U. Chicago
Introduce: define service, create skeleton, discover types, add operations, configure security
Grid Remote Application Virtualization Infrastructure: wrap executables
[Figure: service lifecycle; labels: Appln, Service, Introduce, Create, Container, Transfer GAR, Deploy, Store, Repository Service, Advertise, Index service, Discover, Invoke; get results]
20
As of Oct 19, 2008:
122 participants
105 services (70 data, 35 analytical)
21
Management: Naming and moving data
Persistent, uniform global naming of objects, independent of type
Orchestration of data movement among services
[Figure: three panels, each showing data (D) and services S1, S2, S3]
22
Naming health objects: A prerequisite to management
The naming problem:
“Health objects” = patient information, images, records, etc.
“Names” refer to health objects in records, files, databases, papers, reports, research, emails, etc.
Challenges:
No systematic way of naming health objects
Many health objects, like DICOM images and reports, include references to other objects through non-unique, ambiguous, PHI-tainted identifiers
A framework for distributed digital object services: Kahn, Wilensky, 1995
23
Health Object Identifier (HOI) naming system
uri:hdl://888.us.npi.1234567890.dicom/8A648C33-A5…4939EBE
uri:hdl:// : HOI’s URI schema identifier, based on Handle
888: CHI’s top-level naming authority
us.npi.1234567890: National Provider Id used in hierarchical Identifier Namespace
dicom: Application Context’s Namespace governed by provider Naming Authority
8A648C33-A5…4939EBE: Random String for Identifier-Body, PHI-free and guaranteed unique
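The structure of the example identifier can be made concrete with a small Python sketch that mints and parses names of the same shape. This is an illustration under stated assumptions, not the CHI implementation: the class and function names are hypothetical, the split of the authority part into "first component, middle components, last component" is inferred from the single example on the slide, and a random UUID stands in for the random identifier body (a UUID is PHI-free and unique only with very high probability, whereas the slide says guaranteed).

```python
import uuid
from dataclasses import dataclass

PREFIX = "uri:hdl://"  # scheme prefix as it appears in the slide's example


@dataclass(frozen=True)
class HOI:
    top_authority: str  # e.g. "888"
    provider_ns: str    # e.g. "us.npi.1234567890"
    app_context: str    # e.g. "dicom"
    body: str           # random, PHI-free identifier body

    def __str__(self) -> str:
        return f"{PREFIX}{self.top_authority}.{self.provider_ns}.{self.app_context}/{self.body}"


def mint_hoi(top_authority: str, npi: str, app_context: str) -> HOI:
    # Assumption: a random UUID is an acceptable stand-in for the random body.
    body = str(uuid.uuid4()).upper()
    return HOI(top_authority, f"us.npi.{npi}", app_context, body)


def parse_hoi(text: str) -> HOI:
    if not text.startswith(PREFIX):
        raise ValueError("not an HOI: missing prefix")
    authority, sep, body = text[len(PREFIX):].partition("/")
    if not sep or not body:
        raise ValueError("not an HOI: missing identifier body")
    parts = authority.split(".")
    if len(parts) < 3:
        raise ValueError("not an HOI: authority part too short")
    # Assumed layout: first component is the top-level authority, last is the
    # application context, everything in between is the provider namespace.
    return HOI(parts[0], ".".join(parts[1:-1]), parts[-1], body)


hoi = mint_hoi("888", "1234567890", "dicom")
assert parse_hoi(str(hoi)) == hoi
print(parse_hoi(str(hoi)).provider_ns)  # us.npi.1234567890
```

Because the body carries no patient information, the name can appear in records, emails, and reports without leaking PHI; resolving a name to the object it denotes is left to the naming service and is not sketched here.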
24
Data movement in clinical trials
25
Community public health: Digital retinopathy screening network
26
Integration: Making data usable and useful
[Figure: degree of communication (0% to 100%) plotted against degree of prior syntactic and semantic agreement (0% to 100%), comparing a rigid standards-based approach, a loosely coupled approach, and an adaptive approach]
27
Integration: Generally used approaches
Allow free text and lose interoperability
Tightly encode data elements specific to purpose but lose expressivity/re-use and interoperability
Post-hoc tying data elements to biomedical vocabularies
Constraining choices to concepts in biomedical vocabularies
Assemble raw data into warehouses
28
Semantic expressivity is generally problematic in biomedical data
Biomedical concepts are context dependent
For billing data, ICD and CPT work
For quality/effectiveness/research more detail is required
Encode data for semantic interoperability and re-use, or collect specific to context?
Physicians prefer free text
Biomedical researchers collect data in highly specific contexts
-> tying data to standard vocabularies alone is insufficient and burdensome
29
Integration via mediation
Map between models
Scoped to domain use
Multiple concurrent use
Bottom up mediation: between standards and versions, between local versions in absence of agreement
[Figure: mediator architecture (Levy 2000). A query over the global data model passes through query reformulation (giving a query in the union of exported source schemas), query optimization (giving a distributed query execution), and the query execution engine, which sends a query in the source schema to each wrapper]
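The mediation idea can be sketched in a few lines of Python: a query is posed against a global data model, each wrapper reformulates it into its own source schema, and results come back translated into the global model. This is a toy illustration, not OGSA-DAI/OGSA-DQP or the system on the slides; the class names, field names, and sample records are hypothetical, and query optimization is omitted entirely.

```python
class Wrapper:
    """Exposes one source through a mapping from global attributes to local fields."""

    def __init__(self, name, rows, mapping):
        self.name = name
        self.rows = rows        # records in the source's own schema
        self.mapping = mapping  # global attribute -> local field name

    def can_answer(self, where):
        return all(attr in self.mapping for attr in where)

    def query(self, where):
        # Reformulate the global query into the source schema.
        local_where = {self.mapping[attr]: value for attr, value in where.items()}
        hits = [
            row for row in self.rows
            if all(row.get(field) == value for field, value in local_where.items())
        ]
        # Translate matching records back into the global data model.
        return [
            {attr: row.get(field) for attr, field in self.mapping.items()}
            for row in hits
        ]


class Mediator:
    """Sends a global-model query to every wrapper that can answer it."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def query(self, where):
        results = []
        for wrapper in self.wrappers:
            if wrapper.can_answer(where):
                results.extend(wrapper.query(where))
        return results


# Two hypothetical sites that store the same facts under different field names.
site_a = Wrapper(
    "site_a",
    [{"spec_id": "A1", "tissue": "tumor"}, {"spec_id": "A2", "tissue": "normal"}],
    {"specimen_id": "spec_id", "tissue_type": "tissue"},
)
site_b = Wrapper(
    "site_b",
    [{"SampleNo": "B7", "Kind": "tumor"}],
    {"specimen_id": "SampleNo", "tissue_type": "Kind"},
)

mediator = Mediator([site_a, site_b])
tumor_specimens = mediator.query({"tissue_type": "tumor"})
print(tumor_specimens)
```

Neither site has to change its schema or agree on a standard with the other; the only shared artifact is the per-source mapping, which is what makes the approach bottom-up.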
30
ECOG 5202 integrated sample management
No coordinated data systems
[Figure: architecture diagram; labels: Web portal, Mediator, OGSA-DQP, OGSA-DAI (three instances), CHI appliance (four instances), ECOG CC, MD Anderson, ECOG PCO]
31
Analytics: Transform data into knowledge
“The overwhelming success of genetic and genomic research efforts has created an enormous backlog of data with the potential to improve the quality of patient care and cost effectiveness of treatment.”
— US Presidential Council of Advisors on Science and Technology, Personalized Medicine Themes, 2008
32
Microarray clustering using Taverna
1. Query and retrieve microarray data from a caArray data service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/CaArrayScrub
2. Normalize microarray data using GenePattern analytical service: node255.broad.mit.edu:6060/wsrf/services/cagrid/PreprocessDatasetMAGEService
3. Hierarchical clustering using geWorkbench analytical service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/HierarchicalClusteringMage
[Figure: Taverna workflow diagram; legend: workflow in/output, caGrid services, “Shim” services, others]
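The three workflow steps can be mimicked locally with a short Python sketch. It does not call the caGrid services listed above and is not a Taverna workflow; each function is a hypothetical stand-in for one remote service, the expression values are made-up toy data, and the normalization (log2 transform followed by per-gene centering) and single-linkage clustering are assumptions chosen for brevity.

```python
import math


def retrieve_microarray_data():
    # Stand-in for step 1 (data service). Toy expression values, not real data.
    return {
        "gene_a": [1.0, 2.0, 4.0],
        "gene_b": [2.0, 4.0, 8.0],
        "gene_c": [8.0, 4.0, 1.0],
        "gene_d": [16.0, 8.0, 2.0],
    }


def normalize(data):
    # Stand-in for step 2: log2-transform, then center each gene on its mean.
    normalized = {}
    for gene, values in data.items():
        logs = [math.log2(v) for v in values]
        mean = sum(logs) / len(logs)
        normalized[gene] = [x - mean for x in logs]
    return normalized


def hierarchical_clustering(data, n_clusters):
    # Stand-in for step 3: agglomerative clustering with single linkage.
    clusters = [[gene] for gene in sorted(data)]

    def linkage(a, b):
        return min(math.dist(data[x], data[y]) for x in a for y in b)

    while len(clusters) > n_clusters:
        pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


# The workflow is the composition of the three steps.
clusters = hierarchical_clustering(normalize(retrieve_microarray_data()), n_clusters=2)
print(clusters)  # [['gene_a', 'gene_b'], ['gene_c', 'gene_d']]
```

In the workflow on the slide, each of these functions would be a remote service invocation, with shim services converting between the output format of one step and the input format of the next.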
33
Many many tasks: Identifying potential drug targets
2M+ ligands x protein target(s)
(Mike Kubal, Benoit Roux, and others)
34
[Figure: docking workflow for one protein target, from start to report]
Inputs: PDB protein descriptions (1 protein, 1 MB); ZINC 3-D structures (2M structures, 6 GB)
Manually prep DOCK6 rec file and FRED rec file; each receptor (1 per protein) defines the pocket to bind to
DOCK6 and FRED: ~4M x 60s x 1 cpu, ~60K cpu-hrs; select best ~5K from each
Amber prep: 2. AmberizeReceptor, 4. perl: gen nabscript (BuildNABScript from the NAB script template and NAB script parameters, which define flexible residues and #MD steps)
Amber Score: 1. AmberizeLigand, 3. AmberizeComplex, 5. RunNABScript
Amber: ~10K x 20m x 1 cpu, ~3K cpu-hrs; select best ~500
GCMC: ~500 x 10hr x 100 cpu, ~500K cpu-hrs
For 1 target: 4 million tasks, 500,000 cpu-hrs (50 cpu-years)
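The per-stage estimates can be reproduced with simple arithmetic. The sketch below only multiplies out the task counts and durations quoted on the slide; the variable names are mine, and a year is assumed to be 365 days. The exact products come out somewhat higher than the slide's rounded figures, which reads as order-of-magnitude rounding on the slide.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: 365-day year

# DOCK6 and FRED: ~4M tasks x 60 s x 1 cpu
dock_tasks = 4_000_000
dock_cpu_hrs = dock_tasks * 60 / 3600

# Amber: ~10K tasks x 20 min x 1 cpu
amber_tasks = 10_000
amber_cpu_hrs = amber_tasks * 20 / 60

# GCMC: ~500 tasks x 10 hr x 100 cpus
gcmc_tasks = 500
gcmc_cpu_hrs = gcmc_tasks * 10 * 100

total_tasks = dock_tasks + amber_tasks + gcmc_tasks
total_cpu_hrs = dock_cpu_hrs + amber_cpu_hrs + gcmc_cpu_hrs
total_cpu_years = total_cpu_hrs / HOURS_PER_YEAR

print(f"DOCK6/FRED: {dock_cpu_hrs:,.0f} cpu-hrs")   # slide: ~60K
print(f"Amber:      {amber_cpu_hrs:,.0f} cpu-hrs")  # slide: ~3K
print(f"GCMC:       {gcmc_cpu_hrs:,.0f} cpu-hrs")   # slide: ~500K
print(f"Total:      {total_tasks:,} tasks, {total_cpu_hrs:,.0f} cpu-hrs, "
      f"{total_cpu_years:.0f} cpu-years")
```

The GCMC stage dominates the compute cost even though it has the fewest tasks, while the docking stage dominates the task count; that combination is what makes this a many-task workload.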
35
DOCK on BG/P: ~1M tasks on 118,000 CPUs
CPU cores: 118784
Tasks: 934803
Elapsed time: 7257 sec
Compute time: 21.43 CPU years
Average task time: 667 sec
Relative Efficiency: 99.7% (from 16 to 32 racks)
Utilization: Sustained: 99.6%, Overall: 78.3%
[Figure: plot against time (secs)]
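As a rough consistency check on these figures, overall utilization can be read as compute time divided by the CPU time available during the run (cores times elapsed time). That definition is my assumption, not something the slide states, and a 365-day year is assumed; under it the quoted numbers agree to within rounding.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year

cpu_cores = 118_784        # from the slide
elapsed_sec = 7_257        # from the slide
compute_cpu_years = 21.43  # from the slide

# CPU time that was available while the run held the machine.
available_cpu_years = cpu_cores * elapsed_sec / SECONDS_PER_YEAR

# Assumed definition: overall utilization = compute time / available time.
overall_utilization = compute_cpu_years / available_cpu_years

print(f"Available: {available_cpu_years:.2f} CPU years")
print(f"Overall utilization: {overall_utilization:.1%}")  # slide reports 78.3%
```

The gap between the sustained figure (99.6%) and the overall figure is consistent with most of the idle time falling at the start and end of the run rather than in the middle, though the slide does not break this down.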
36
Recap
Increased recognition that information systems and data understanding are the limiting factor
… much of the promise associated with health IT requires high levels of adoption … and high levels of use of interoperable systems (in which information can be exchanged across unrelated systems) …. RAND COMPARE
Health system is a complex, adaptive system
There is no single point(s) of control. System behaviors are often unpredictable and uncontrollable, and no one is “in charge.” W. Rouse, NAE Bridge
With diverse and evolving requirements and user communities
… I advocate … a model of virtual integration rather than true vertical integration…. G. Halvorson, CEO Kaiser
37
Functioning in the zone of complexity
[Figure: Stacey diagram with axes "Agreement about outcomes" and "Certainty about outcomes", each running from low to high; regions: plan and control, chaos]
Ralph Stacey, Complexity and Creativity in Organizations, 1996
38
The Grid paradigm and healthcare information integration
[Figure: layered platform architecture]
Data sources: Radiology, Medical records, Pathology, Genomics, Labs, RHIO
Platform services: Management, Integration, Publication
Cross-cutting: Security and policy
Value services: Analysis, Cognitive support, Applications