Managing and Analyzing Global Health Data
UNIVERSITY OF WASHINGTON
Managing and Analyzing
Global Health Data
Seattle, August 30, 2011
Peter Speyer, Director of Data Development
IHME Background
• Global institute dedicated to providing independent, rigorous, and scientific measurements and evaluations to accelerate progress on global health
• Part of the Department of Global Health at the University of Washington
• Funded by the Bill & Melinda Gates Foundation and the state of Washington (“core funding”), and other funders through specific research grants
• Created in 2007
• 70 researchers, 30 staff
IHME Mission
Our goal is to improve the health of the world’s populations
by providing the best information on population health
Health-related data
• Social determinants
• Risk factors
Health data
Population-based data
• Household/facility surveys
• Census
• Vital registration
• Registries (provider, disease)
Facility-based data
• Health records
• Administrative data (financial, operational)
• Research data (DSS, clinical trials, etc.)
Individual-based data
• Personal health records
• “Quantified self”
• Disease-based social networks
Health Data Innovation
Patient engagement
Open data
Health apps
Key health data challenges
Find & access data
Disseminate data
Use data
Key health data challenges
• Lack of transparency
• Timeliness of data
• Lack of documentation
• Access vs. privacy
Key health data challenges
• Sheer quantity of data files (30TB, 20K+ source datasets, 40M files)
• Diverse source data types and formats (pdf, csv, SPSS, CSPro,…)
• Data quality issues
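A first step toward taming this many files in this many formats is a simple inventory by format. The following is a minimal sketch in Python, not IHME's actual tooling: the extension-to-format mapping, the function names, and the sample paths are all hypothetical, and real source files may need content-based detection rather than extensions.

```python
from collections import Counter
from pathlib import PurePath

# Hypothetical mapping from file extension to a format label. The slide
# names pdf, csv, SPSS and CSPro; the extensions chosen here are assumptions.
FORMAT_BY_EXTENSION = {
    ".pdf": "pdf",
    ".csv": "csv",
    ".sav": "SPSS",
    ".dat": "CSPro",
}


def classify(path: str) -> str:
    """Return a format label for a file path, or 'unknown' if unrecognized."""
    suffix = PurePath(path).suffix.lower()
    return FORMAT_BY_EXTENSION.get(suffix, "unknown")


def inventory(paths):
    """Count how many files fall under each format label."""
    return Counter(classify(p) for p in paths)


# Illustrative paths only; they do not refer to real datasets.
sample_paths = [
    "survey/2005/report.PDF",
    "vr/deaths_1990.csv",
    "dhs/household.sav",
    "census/raw.dat",
    "notes/readme",
]
counts = inventory(sample_paths)
```

Files that land in the "unknown" bucket are the ones that would need manual review or a more robust detection step.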
Key health data challenges
• Make results data engaging
• Accountability: share results, code, source data
• Accommodate diverse audiences (expertise, geographies)
Example: Global Burden of Disease
Mortality & causes of death
• Sources: census, surveys, vital registration, verbal autopsy
• Estimates: covariate models, spatial-temporal regressions; weighted combination of models
Morbidity
• Sources: literature reviews, surveys, registries, hospital data
• Disease modeling: compartmental Bayesian model
• Health severity weights
Burden of disease
• DALYnator
300 diseases
40 risk factors
21 regions
1990, 2005, 2010
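The burden-of-disease step combines the mortality and morbidity results into disability-adjusted life years (DALYs), conventionally defined as years of life lost (YLL) plus years lived with disability (YLD). The sketch below shows one common simplified formulation in Python. It is an illustration under stated assumptions, not the DALYnator itself: the function names and the input numbers are hypothetical, and the real estimation works per disease, age, sex, region, and year.

```python
def years_of_life_lost(deaths: float, remaining_life_expectancy: float) -> float:
    """YLL: deaths times the standard remaining life expectancy at age of death."""
    return deaths * remaining_life_expectancy


def years_lived_with_disability(cases: float, severity_weight: float) -> float:
    """YLD: number of cases times a health severity weight between 0 and 1."""
    if not 0.0 <= severity_weight <= 1.0:
        raise ValueError("severity weight must lie between 0 and 1")
    return cases * severity_weight


def dalys(deaths, remaining_life_expectancy, cases, severity_weight) -> float:
    """DALY = YLL + YLD for a single disease and population group."""
    return (years_of_life_lost(deaths, remaining_life_expectancy)
            + years_lived_with_disability(cases, severity_weight))


# Made-up inputs for illustration only; these are not GBD estimates.
example = dalys(deaths=100, remaining_life_expectancy=30,
                cases=2000, severity_weight=0.25)
```

With these made-up inputs, mortality contributes 3,000 years and morbidity 500 years, which shows how the severity weight scales non-fatal outcomes relative to deaths.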
GBD Country Years, Causes of Death 1950-2009
GBD Country Years, Causes of Death 1950-2009
Data source            Countries   Site-years   # of Deaths
VR                     128         4,190        722,267,710
Household surveys      136         2,827        10,132,976
Surveillance systems   12          126          717,698
National VA            21          71           301,855
Subnational VA         59          442          2,606,815
Mortuary registries    6           25           54,316
TOTAL                              7,680        735,564,116
Solutions: computing infrastructure
• Analysis with statistical packages
– Projects with 100K+ lines of code
• File system
– 60TB disk space
– Redundant backup
• Cluster with 63 nodes (+300% in 2011), ~2000 cores
– Runs 24x7, very little downtime
• Virtual environments to test new applications, serve them to collaborators, etc.
Solutions: Global Health Data Exchange
Objectives
• Transparency => data catalog
• Access => data repository
• Information => data community (future)
Approach
• One record per dataset
• Standardized metadata
• Internal users (10K records): files on file server
• External users (5K records): files for download
Implementation
• CMS: Drupal
• Search: SOLR
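The catalog approach above (one record per dataset, standardized metadata, search through Solr) can be sketched in Python as follows. This is a hedged illustration, not the GHDx implementation: the record fields, the core name `catalog`, the host, and the field name `country` are all assumptions. Only the query parameters `q`, `fq`, `rows`, and `wt` are standard Solr select parameters.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class DatasetRecord:
    """One catalog record per dataset; the field names are hypothetical."""
    title: str
    country: str
    year: int
    data_type: str
    files: list = field(default_factory=list)


def solr_select_url(base_url, core, text, filters=None, rows=10):
    """Build a Solr select URL with a free-text query and optional filter queries."""
    params = [("q", text), ("rows", rows), ("wt", "json")]
    for name, value in (filters or {}).items():
        # Each filter becomes its own fq parameter, e.g. country:"Kenya".
        params.append(("fq", f'{name}:"{value}"'))
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Illustrative record and query; neither refers to a real catalog entry.
record = DatasetRecord(title="Example household survey", country="Kenya",
                       year=2008, data_type="Household survey")
url = solr_select_url("http://localhost:8983/solr", "catalog",
                      "verbal autopsy", filters={"country": record.country})
```

Keeping the metadata in a fixed record structure is what makes filtered search possible: every dataset exposes the same fields, so a filter such as country applies across the whole catalog.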
UNIVERSITY OF WASHINGTON
Thank you!
[email protected]
@peterspeyer
www.ghdx.org
Peter Speyer
Director of Data Development