Post on 23-Oct-2020
What’s the role of health systems?
Lucila Ohno-Machado, MD, PhD
University of California San Diego
12/7/16 HIMSS Chile
Disclosures
• No relevant financial relationships with commercial interests
• Grant funding from NIH, PCORI, VHA and the University of California
• Editor-in-Chief of the J Amer Med Inform Assoc
• On staff at the Veteran’s Administration
• Non-paid advisor for not-for-profit institutions
• Not speaking on behalf of any of these entities
Electronic Health Records
Big Data
Electronic Health Records (EHRs) in most health systems can be used for research
• Predictive analytics on big data help clinicians and administrators, but
• Re-identification success from ‘de-identified’ data is possible, and depends on many factors:
• What is disclosed, to whom
• Who an attacker is interested in identifying and how much she is willing to spend
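The dependence of re-identification risk on what is disclosed can be illustrated with a small sketch. It counts how many records become unique once a given set of quasi-identifiers is released. The field names (`zip3`, `birth_year`, `sex`) and the rows are hypothetical and are not taken from the talk.

```python
from collections import Counter

def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def fraction_unique(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination appears exactly once."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    unique = sum(
        1 for r in records
        if sizes[tuple(r[q] for q in quasi_identifiers)] == 1
    )
    return unique / len(records)

# Hypothetical 'de-identified' rows: no names, but quasi-identifiers remain.
records = [
    {"zip3": "921", "birth_year": 1970, "sex": "F"},
    {"zip3": "921", "birth_year": 1970, "sex": "F"},
    {"zip3": "921", "birth_year": 1985, "sex": "M"},
    {"zip3": "920", "birth_year": 1970, "sex": "F"},
]

print(fraction_unique(records, ["zip3"]))                       # one field disclosed
print(fraction_unique(records, ["zip3", "birth_year", "sex"]))  # three fields disclosed
```

Disclosing more fields shrinks the groups a record can hide in, so an attacker with an outside data source needs less effort to single someone out.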
Patient Interaction
Analytics, Distributed Analysis, Machine Learning
Data Structuring, Data Modeling, Natural Language Processing
Predictive Modeling, Evaluation Methods
Decision Support Tools, Guidelines, Alerts & Reminders
Data Collection Tools, Clinical Data Warehouse
mHealth, Sensors, Environment
Data De-Identification, Privacy Technology
Communication Strategies, Informed Consent, Consumer Health Informatics
Bioinformatics, Genomics, Proteomics
Who owns the data?
Who can use the data?
• Patient
• Researcher
• Institution
The Evolution of Healthcare IT Systems (M Hogarth, UC Davis)
Laboratory Information System
“Electronic Health Record” System
Radiology Information System
Pharmacy Information System
Data Access/Exchange v1.0: within institutional systems
Data Access/Exchange v2.0: EHR to EHR
Data Access/Exchange v3.0: clinical data networks, federated querying
[Diagram: LIS, RX, and EHR systems shown at each stage]
Example of Consent for Care
[institution]
[institution]
[institution]
[state]
[institution]
Knowledge & Tools
Privacy
Consent
Data
• Share access to data and computation
• Train the new generation of data scientists
• Provide innovative software, platform, and infrastructure
• Protect privacy
• Algorithms
• Tools
• Infrastructure
• Policies
iDASH
[Diagram: iDASH layers. Knowledge & Tools; Services; Platform; Data (sensors, genomic, clinical). Labels: Service, WWW, Apps, Exec., Aggreg., Hosting, Sharing, Policies, Platform, Research, Develop., Federation]
Biomedical Data Science
Data Discovery Index Consortium
NIH BD2K Initiative
Do for data what PubMed did for the literature
The consortium has funded several institutions around the world
U24AI117966
DEMO: Tuesday 11:15am, Salon A-3, Lower Level
Metadata Ingestion
Terminology server
• Query expansion
• Result ranking
DataMed User Interface
Search Engine
Metadata Management
• Mapping
• Indexing
Data sources
• Repositories
• Data Sets
• Funding Agencies
• Data Producers
• Publishers
Digesting, Indexing, Finding, Figuring out, Interoperating with, Counting, Using, Leveraging, Testing data
Metadata elements identified by combining the two complementary approaches
top-down approach bottom-up approach
The development process in a nutshell
Model serialized as JSON schemas and mapping to schema.org
(v1.0, v1.1, v2.0, v2.1)
S Asunta-Sansone 2016
Metadata
Data and Precision Medicine Components
Site 1 ... Site N
Sources: EHR system, claims data, patient-reported data, wearables
OMOP CDM data
Data transformation into PM CDM
PM CDM data
Data & Research Support Center
Data transfer > quarterly
M Hogarth 2016
Doing the Right Thing
Is it ethical to share? People have not been explicitly asked and don’t know who is sharing what.
Could people choose? Is it practical? What if a massive number of people withdraw their data?
Is it ethical not to share? New discoveries and acceleration of science depend on sharing.
pSCANNER (patient-centered SCAlable National Network for Effectiveness Research)
Supported by the Patient-Centered Outcomes Research Institute Contract CDRN-1306-04819
pSCANNER is a stakeholder-governed,
distributed clinical data network that aims to make health
data more accessible and usable for the generation of
scientific evidence that patients, clinicians, and other
stakeholders together use to make more informed health
decisions.
Stakeholder-governed
Involve stakeholders (patients, patient advocates, clinicians, and researchers) as advisors in the pSCANNER work.
• Set research priorities and design research
• Develop education and communication materials
Weight management/obesity
Congestive heart failure
Kawasaki disease
Governance
Patient-Centered Outcomes Research (PCOR)
Focuses on patients’ needs and preferences
and on outcomes most important to
them.
Helps patients and other healthcare
stakeholders, such as caregivers,
clinicians, insurers, policymakers and
others, make better-informed decisions
about health and healthcare options.*
*The word options implies the comparison of different types of treatments, medications, or healthcare practices.
Comparative Effectiveness Research (CER) is one type of PCOR
“Preparing for the Community Forum: Thinking about quality health care” AHRQ Community Forum pg. 8
Comparative Effectiveness Research (CER)
Medicine A vs. Medicine B
Which of the two medications?
Comparative Effectiveness Research (CER)
Three common types of PCOR and CER research design
• Randomized Controlled Trials
• Pragmatic Trials
• Observational Studies
pSCANNER is part of PCORnet
PCORnet seeks to improve the nation’s capacity to conduct
clinical research by creating a large, highly representative,
national patient-centered network that supports more
efficient clinical trials and observational studies.
PCORnet embodies a “community of research” by uniting systems, patients & clinicians
11 Clinical Data Research Networks (CDRNs)
18 Patient-Powered Research Networks (PPRNs)
PCORnet:
A national infrastructure for patient-centered clinical research
13 CDRNs and 20 PPRNs Funded
This map depicts the number of PCORI-funded Patient-Powered or Clinical Data Research Networks that have coverage in each state.
24 Million Patients
9 health systems
Phase 1
UCD 2.3M
UCSF 3.2M
UCLA 4.3M
UCI 300k
UC ReX
SCANNER
USC LA
VA VINCI 11M
UCSD 2.3M
CTSA hub
Network
AltaMed 607k
TCC 240k
QueensCare 19k
Phase 2
UC Davis 2.3M
UCSF 3.2M
UCLA 4.3M
UC Irvine 300k
USC Keck 2M
LA Children’s 200k
LA DHS 600K
UC ReX
SCANNER
LA
VA VINCI 11M
UCSD 2.3M
Cedars-Sinai 2M
University of Washington CTSA
CTSA hub
Network
San Mateo Medical Center 77k
University of Colorado
AltaMed 200k
Children’s Clinic 24k
QueensCare 19k
30 Million people
14 health systems
U Texas Houston
Rutgers
Emory U
Collaborating Patient-Powered Research Networks
PPRN: Targeted Condition (IRB approved)
Health eHeart: Heart Disease
AR-PoWER: Inflammatory Arthritis
CENA: Alström Syndrome, Dyskeratosis Congenita, Gaucher Disease, Hepatitis, Inflammatory Breast Cancer, Joubert Syndrome, Klinefelter Syndrome and Associated Chromosomal Anomalies, Metachromatic Leukodystrophy, Pseudoxanthoma Elasticum
DuchenneConnect: Duchenne and Becker Muscular Dystrophy
iConquerMS: Multiple Sclerosis
PPRN (in process): Targeted Condition
ImproveCareNow: Inflammatory Bowel Disease
PRIDEnet: Sexual and Gender Minorities
Crohn's & Colitis Foundation of America (CCFA): Crohn's Disease and Ulcerative Colitis
MoodNetwork: Mood Disorders
Slide from Dr. Maehara
International collaborations are welcome and needed
• Federated data for distributed analytics
• Common data model
• Minimal computational infrastructure
• Compatible institutional policies
• Agreement on rules of engagement
• Shared ethics principles
How pSCANNER improves health research
1. Research has focused on the priorities of scientists and clinicians, which may be different from those of patients, family members, and caregivers.
2. Health research often requires large numbers of study participants, but many individual healthcare organizations do not have enough patients with a given condition.
3. Evidence from health research takes years to make it into practice because of the challenges in adapting and communicating research findings.
http://urp.ucsd.edu/for-students/what-is-research.html (Accessed 12/5/2014)
Data
• Local data in EHRs, clinical, and administrative systems
• Standardized data from public health and other sources
pSCANNER standardizes data into the Common Data Model for PCORnet
• Local data are cleaned and harmonized for pSCANNER
pSCANNER supports big science
pSCANNER will enable researchers to obtain data from distributed sites covering over 32 million patients in a privacy-preserving environment.
[Figure: pSCANNER Hub linking UCLA, UCD, UCSF, Cedars-Sinai, VA, AltaMed, TCC, QueensCare, UCI, and UCSD, each labeled with its patient count]
Population & Outcomes Characterization
What are the environmental and genetic determinants of Kawasaki Disease?
Do antibiotics contribute to
childhood obesity?
Which aspirin dose is better in coronary artery disease?
Can we use EHR data for research?
Preserving Privacy
Privacy Preserving Analytics for KD in African-Americans
Consent for Data and Biosample
Sharing in Underserved Populations
Partnership for Epidemiological
Research
Study on Latinos
Which DNA variants are implicated in KD susceptibility in this population? (Emory, Genome Institute of Singapore, Imperial College)
Does consent rate depend on who is obtaining the consent? (Maricopa Health System, FQHS in Arizona)
Do patients understand what they consented for? (San Diego State University)
What type of ‘sharing’ is acceptable? (University of Oklahoma)
Strong Heart Study on American Indian Populations
Data and Biospecimen Sharing
Privacy Preserving Computation
Personal monitors: no regulation of apps
My neighbor’s data: does she have Disease X?
Other databases: linking data for re-identification
Public ‘de-identified’ database of Condition X
Research database with “de-identified” EHRs, genomes
EHRs: disease, family history, etc.
Health Insurance Portability & Accountability Act
HIPAA ‘De-identified’ data
• removal of 18 identifiers, such as dates, biometrics, names, etc.
• expert certification of low risk of re-identification
• ‘Limited’ data sets have ‘de-identified data’ plus dates
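The difference between a ‘de-identified’ record and a ‘limited’ data set, as summarized above, can be sketched in a few lines. This is a simplified illustration and not a compliance tool: the field names are hypothetical, only a handful of the 18 identifier categories are represented, and dates are simply reduced to the year.

```python
from datetime import date

# Hypothetical field names standing in for a few of the 18 identifier categories.
DIRECT_IDENTIFIERS = {"name", "ssn", "phone", "email", "fingerprint_id"}
DATE_FIELDS = {"birth_date", "admission_date"}

def to_limited_data_set(record):
    """Drop direct identifiers but keep full dates."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

def to_deidentified(record):
    """Drop direct identifiers and reduce every date to its year."""
    out = {}
    for key, value in to_limited_data_set(record).items():
        if key in DATE_FIELDS:
            out[key.replace("_date", "_year")] = value.year
        else:
            out[key] = value
    return out

# Hypothetical patient record.
record = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "birth_date": date(1950, 3, 14),
    "admission_date": date(2016, 11, 2),
    "diagnosis": "Kawasaki disease",
}

print(to_limited_data_set(record))
print(to_deidentified(record))
```

Even after this step the remaining fields can act as quasi-identifiers, which is why the expert-certification route asks for an assessment of re-identification risk rather than a fixed field list.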
Biometrics and Protected Health Information
PHI requires HIPAA compliance
• Biometrics require HIPAA compliance
Biometrics are
Protected Health Information (PHI)
Genomes are Biometrics
PHI requires HIPAA compliance
• Genomes should be treated as HIPAA identifiers
Biometrics are
Protected Health Information (PHI)
New DNA tests on a secret sample collected from a relative of suspect Albert DeSalvo triggered the exhumation after authorities said there was a “familial match” with genetic material preserved in the killing of Sullivan…
Authorities made the match through DNA taken from a water bottle thrown away by DeSalvo’s nephew…
But a lawyer for the DeSalvos told CNN the family was “outraged, disgusted and offended” by the decision to secretly take a DNA sample of one of its members…
Technology “solutions” (mitigation strategies)
Data-centric
• Add noise to data
(e.g., differential privacy)
• Operate on encrypted data
(e.g., homomorphic encryption)
• Multiparty computation
(e.g., distributed analytics)
Institution/People centric
• Data broker
(e.g., clinical data research networks)
• Patient-defined data sharing permissions
(e.g., consent management)
Policies
Ohno-Machado L. To Share or Not To Share: That Is Not the Question. Science Translational Medicine, 2012 4(165)
homomorphic encryption
secure multiparty computation
differential privacy
Institutional and Data-Centric Strategies
NIH U54HL108460
Statistical Data Release (macrodata release)
Mohammed N. Privacy Preserving Heterogeneous Health Data Sharing. J Am Med Inform Assoc 2013
Jiang XL. Differential-Private Data Publishing Through Component Analysis. Transactions on Data Privacy 2013
Courtesy of Li Xiong
ME-1310-07058 (Xiong)
[Figure: original records and original histogram]
Statistical Data Release: Disclosure Risk
Courtesy of Li Xiong
[Figure: original records, original histogram, and perturbed histogram with differential privacy]
Statistical Data Release: Differential Privacy
Courtesy of Li Xiong
Differential Privacy (Dwork et al)
A privacy mechanism A gives ε-differential privacy if for all neighbouring databases D, D’, and for any possible output S ∈ Range(A),
Pr[A(D) = S] ≤ exp(ε) × Pr[A(D’) = S]
• D and D’ are neighboring databases if they differ on at most one record
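One standard way to satisfy the definition above for a histogram release is the Laplace mechanism. The sketch below is an illustration under stated assumptions and is not the method behind the figures on these slides: it assumes neighbouring databases differ by adding or removing one record, so a single record changes one bin count by at most 1, and it uses hypothetical bin labels and function names.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_histogram(counts, epsilon, rng=None):
    """Release histogram counts with Laplace noise of scale 1/epsilon.

    Assumes one record affects a single bin count by at most 1 (sensitivity 1).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    return {
        bin_label: count + laplace_noise(1.0 / epsilon, rng)
        for bin_label, count in counts.items()
    }

# Hypothetical age histogram for a small cohort.
original = {"0-17": 12, "18-44": 40, "45-64": 31, "65+": 9}
perturbed = dp_histogram(original, epsilon=0.5, rng=random.Random(7))
for label in original:
    print(label, original[label], round(perturbed[label], 1))
```

Smaller values of ε add more noise and give stronger protection. Small bins are affected most in relative terms, which is the privacy and utility trade-off the perturbed histogram illustrates.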
Courtesy of Li Xiong
Clinical Data Research Networks
30 Million people
14 health systems
UC Davis 2.3M
UCSF 3.2M
UCLA 4.3M
UC Irvine 300k
USC Keck 2M
LA Children’s 200k
LA DHS 600K
UC ReX
SCANNER
LA
VA VINCI 11M
UCSD 2.3M
Cedars-Sinai 2M
University of Washington CTSA
CTSA hub
Network
San Mateo Medical Center 77k
University of Colorado
AltaMed 200k
Children’s Clinic 24k
QueensCare 19k
U Texas Houston
Rutgers
Emory U
Patient-Centered Outcomes Research Institute CDRN-1306-04819
Ohno-Machado L. To Share or Not To Share: That Is Not the Question. Science Translational Medicine, 2012 4(165)
homomorphic encryption
secure multiparty computation
Data sharing ecosystem
Sharing Data, Tools, Systems
differential privacy
indexing
User requests data for Quality Improvement or Research
Trusted Broker(s)
• Identity & Trust Management
• Policy enforcement
Diverse Healthcare Entities in 3 different states (federal, state, private)
Distributed Computing
Wu Y et al. Grid Binary LOgistic REgression (GLORE): Building Shared Models Without Sharing Data. JAMIA, 2012
Wu Y et al. Grid Multi-Category Response Logistic Models. BMC Med Inform Dec Making 2015
NIH U54HL108460
Distributed Regression Model
Conclusion: no patient data needs to be sent from the sites, only aggregates
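The conclusion above can be made concrete with a minimal sketch of distributed logistic regression. It is written in the spirit of the approach cited on the previous slide and is not the authors' implementation. Each site returns only gradient and Hessian sums, and a coordinator combines them in a Newton-Raphson update. The function names, the single-covariate model, and the data are all hypothetical.

```python
import math

def site_aggregates(rows, beta):
    """Run at a site: return gradient and Hessian sums for local (x, y) rows.

    Only these five numbers leave the site, never the rows themselves.
    """
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in rows:
        p = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1), (h00, h01, h11)

def fit(sites, iterations=25):
    """Run at the coordinator: Newton-Raphson on the summed site aggregates."""
    beta = [0.0, 0.0]  # intercept and slope
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for rows in sites:
            (a0, a1), (b00, b01, b11) = site_aggregates(rows, beta)
            g0 += a0
            g1 += a1
            h00 += b00
            h01 += b01
            h11 += b11
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^{-1} g for the 2x2 information matrix H.
        beta[0] += (h11 * g0 - h01 * g1) / det
        beta[1] += (h00 * g1 - h01 * g0) / det
    return beta

# Hypothetical (covariate, outcome) rows held at two sites.
site_a = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 0)]
site_b = [(5.0, 1), (6.0, 0), (7.0, 1), (8.0, 1)]

print(fit([site_a, site_b]))    # distributed fit
print(fit([site_a + site_b]))   # pooled fit on the same rows
```

Because the log-likelihood is a sum over patients, the summed site aggregates equal the pooled ones, so both fits agree up to floating-point rounding. Aggregates can still leak information when a site is very small, which is one reason such methods are paired with the other mitigation strategies in this talk.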
Distributed Analytics across Horizontal and Vertical Partitions
Patient Age Genome data
A1 45 ACTGACT
A2 32 ACTTAGT
Patient Age Genome data
B1 48 CCTGACT
B2 72 CCTTAGT
Patient Age Genome data
A1 45 ACTGACT
A2 32 ACTTAGT
Li Y, et al. VERTIcal Grid lOgistic regression (VERTIGO) J Am Med Inf Assoc. 2015
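The two layouts in the tables above can be written out as data structures. In this sketch the site names (`clinic`, `genome_lab`) and helper functions are hypothetical, and the plain join is shown only to make the vertical layout visible. The cited VERTIGO work concerns fitting models over vertically partitioned data, which this sketch does not attempt.

```python
# Horizontal partition: each site holds all columns for its own patients.
site_a = [
    {"patient": "A1", "age": 45, "genome": "ACTGACT"},
    {"patient": "A2", "age": 32, "genome": "ACTTAGT"},
]
site_b = [
    {"patient": "B1", "age": 48, "genome": "CCTGACT"},
    {"patient": "B2", "age": 72, "genome": "CCTTAGT"},
]

# Vertical partition: each site holds different columns for the same patients.
clinic = {"A1": {"age": 45}, "A2": {"age": 32}}
genome_lab = {"A1": {"genome": "ACTGACT"}, "A2": {"genome": "ACTTAGT"}}

def horizontal_mean_age(sites):
    """Combine per-site (sum, count) aggregates; no row leaves a site."""
    partials = [(sum(r["age"] for r in rows), len(rows)) for rows in sites]
    total, n = map(sum, zip(*partials))
    return total / n

def vertical_join(left, right):
    """Link columns on the shared patient key (in the clear, for illustration only)."""
    return {pid: {**left[pid], **right[pid]} for pid in left.keys() & right.keys()}

print(horizontal_mean_age([site_a, site_b]))
print(vertical_join(clinic, genome_lab))
```

Horizontal partitions combine naturally through sums and counts. Vertical partitions need record linkage across sites, which is where privacy-preserving methods have more work to do.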
International Collaboration
Slide from Dr. Shuang Wang
2nd Genome Privacy Protection Challenge
• Task 1: Homomorphic encryption (HME) based secure genomic data analysis
• Task 2: Secure comparison among genomic data in a distributed setting
Focus on secure outsourcing and secure data analysis in a distributed setting
Technology “solutions” (mitigation strategies)
Data-centric
• Add noise to data
(e.g., differential privacy)
• Operate on encrypted data
(e.g., homomorphic encryption)
• Multiparty computation
(e.g., distributed analytics)
Institution/People centric
• Data broker
(e.g., clinical data research networks)
• Patient-defined data sharing permissions
(e.g., consent management)
Policies
Consent Management System
Patient I (Patient Interface): Do I wish to disclose data D to U? Yes
Patient I: I can check that U looked at my data D
Sharing Look-up
Trusted broker
• Data use agreements
• Study registry
Healthcare Institutions
User U requests Data D on individual I
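The consent flow in this diagram (a patient decides whether user U may see data D, and can later check who looked) can be sketched as a small broker object. The class and method names are hypothetical and do not describe the actual system behind the slide.

```python
from dataclasses import dataclass, field

@dataclass
class ConsentBroker:
    """Hypothetical trusted broker: stores patient choices and logs every look-up."""
    permissions: set = field(default_factory=set)  # (patient, data_class, user)
    audit_log: list = field(default_factory=list)  # (patient, data_class, user, granted)

    def allow(self, patient, data_class, user):
        """Patient answers 'yes' to disclosing a data class to a user."""
        self.permissions.add((patient, data_class, user))

    def revoke(self, patient, data_class, user):
        """Patient withdraws an earlier permission."""
        self.permissions.discard((patient, data_class, user))

    def request(self, user, patient, data_class):
        """User asks for data; the sharing look-up decides and the request is logged."""
        granted = (patient, data_class, user) in self.permissions
        self.audit_log.append((patient, data_class, user, granted))
        return granted

    def who_looked(self, patient):
        """Patient reviews every request made about their data."""
        return [entry for entry in self.audit_log if entry[0] == patient]

broker = ConsentBroker()
broker.allow("patient-I", "data-D", "user-U")
print(broker.request("user-U", "patient-I", "data-D"))  # permitted request
print(broker.request("user-V", "patient-I", "data-D"))  # no permission on file
print(broker.who_looked("patient-I"))
```

Logging denied requests as well as granted ones lets the patient see attempted access, not only successful access.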
People-Centered Strategies
NIH R01HG008802
Informed CONsent for Clinical data Use in Research
iCONCUR
Courtesy of H Kim
NIH R01HG008802
My Sharing Choices • 14 Data Classes
+ Sharing for stem cell research
Courtesy of H Kim
Sharing Preference Distribution
Supported by the NIH Grant U54 HL108460 to the University of California, San Diego
Courtesy of H Kim
Preliminary data on >1k patients
NIH R01HG008802
Big Data are a Big Deal
App data: no regulation
Social network: can be used
Databases: can be purchased
Asking people: may help
EHRs: can be linked
Informatics Training for Global Health
NIH D43TW007015
Thank you
Acknowledgements to a large team
of study participants, community engagement professionals, privacy technology colleagues, advisors, funding agencies
Funding Sources
NIH R01HG008802
NIH U24AI117966
Department of Veterans Affairs I01HX000982
Patient-Centered Outcomes Research Institute CDRN-1306-04819
NIH T15LM011271
NIH U54HL108460