Publicly Available Large Data Sets for Health Outcomes Research: Pearls, Pitfalls, Prices & More
October, 2018
Lakshika Tennakoon, MD, MSc, MPhil, DTM&H
Research Scientist
Trauma, Acute Care and Critical Care Surgery
Stanford University
Aims
• To encourage use of public data for research
• To characterize existing large clinical databases
Best Currently Available Databases
Database | Dates | Source
Nationwide Inpatient Sample (NIS) | 1988-2016 | HCUP
Nationwide Emergency Department Sample (NEDS) | 2006-2016 | HCUP
Nationwide Readmissions Database (NRD) | 2010-2016 | HCUP
Kids' Inpatient Database (KID) | 1997, 2000, 2003, 2006, 2009, 2012, 2016 | HCUP
National Trauma Data Bank (NTDB) | 2002-2016 | ACS
National Surgical Quality Improvement Program (NSQIP) | 2005-2016 | ACS
National Ambulatory Medical Care Survey (NAMCS) | 1993-2015 | CDC
Databases (continued)
Database | Dates | Source
National Health and Nutrition Examination Survey (NHANES) | 1999-2015 | CDC
National Hospital Ambulatory Medical Care Survey (NHAMCS) | 1992-2015 | CDC
Medicare/SEER | 1991-2015 | Government
MarketScan | 2002-2011 | Private
Hospital-based registry data | | Hospital based
Nationwide Inpatient Sample (NIS)
• The largest publicly available all-payer inpatient care database in the United States
• Includes all discharges from a 20% stratified sample of US hospitals
• NIS data can be weighted to generate national estimates
• Years available: 1988 to 2016
• Has 8 million hospital stays a year
• NIS_2015_CORE data file has 7,153,989 records
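The weighting idea above can be sketched in a few lines of Python. This is an illustration only: the four records are invented, and the column names DISCWT (discharge weight) and DIED are assumed here, so check them against the data dictionary for the year you purchase.

```python
import csv
import io

# Invented four-row extract standing in for an NIS core file.
# DISCWT and DIED are assumed column names; the values are made up.
SAMPLE = """DISCWT,DIED
5.0,0
5.0,1
4.5,0
5.5,0
"""

def national_estimates(rows):
    """Return (weighted discharges, weighted deaths) from sampled records."""
    total = 0.0
    deaths = 0.0
    for row in rows:
        weight = float(row["DISCWT"])
        total += weight          # each sampled stay stands for `weight` stays nationally
        if row["DIED"] == "1":
            deaths += weight
    return total, deaths

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
total, deaths = national_estimates(rows)
print(f"Sampled records: {len(rows)}")
print(f"Estimated national discharges: {total:.1f}")
print(f"Estimated in-hospital mortality: {deaths / total:.1%}")
```

This gives point estimates only. Standard errors require the survey design (strata and hospital clusters), which this sketch ignores.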
NIS-Requirements
Cost & Data Load Software
• Cost of 2016 NIS: $625
• Original data come as CSV or ASCII files
• Load programs are available in: Stata, SAS, SPSS
• Data storage: large databases need a server or Box
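For users without Stata, SAS, or SPSS, a file this large can still be summarized without loading it into memory at once. The sketch below assumes a CSV export with a header row; the in-memory file and the column names AGE and LOS are placeholders for illustration, not a description of the actual HCUP layout.

```python
import csv
import io
from itertools import islice

def iter_chunks(handle, chunk_size):
    """Yield lists of up to chunk_size row dicts from an open CSV file."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk

# Stand-in for a multi-gigabyte file. With real data, replace this with
# open(<your file path>, newline="").
fake_file = io.StringIO("AGE,LOS\n34,2\n71,5\n58,3\n90,8\n12,1\n")

n_records = 0
total_los = 0
for chunk in iter_chunks(fake_file, chunk_size=2):
    # Only one chunk is held in memory at a time.
    n_records += len(chunk)
    total_los += sum(int(row["LOS"]) for row in chunk)

print(n_records, total_los / n_records)
```

With real files, a chunk size in the tens of thousands of rows is a more sensible starting point than 2.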
Citing HCUP Databases
• Citing HCUP databases in abstract and manuscript: as specified in the HCUP DUAs, include the database name, HCUP, and AHRQ, as demonstrated below for each HCUP database:
• HCUP Nationwide Inpatient Sample (NIS). Healthcare Cost and Utilization Project (HCUP). 2007-2009. Agency for Healthcare Research and Quality, Rockville, MD. www.hcup-us.ahrq.gov/nisoverview.jsp
What Data Elements Are in the NIS?
Data Files
▪ Core Data
▪ Hospital Data
▪ Illness Severity Data
▪ Cost-to-Charge Ratio Data
▪ Diagnosis & Procedure Groups Data
▪ https://www.hcup-us.ahrq.gov/db/nation/nis/nisdde.jsp
Core Data File
• Age at admission
• Gender of patient
• Race of patient
• Location of patient’s residence
• Median household income for patient's ZIP code
• ICD-9-CM diagnoses: primary and secondary diagnoses, number of diagnoses, diagnosis coding system
• External causes of injury and poisoning: ECODE 1-4, number of external causes of injury
• ICD-9-CM procedures: primary and secondary procedures, number of procedures, procedure systems, duration of primary and secondary procedures
• Total charges
• Disposition
• Length of stay
Hospital Data File
• Hospital bed size
• Type of hospital: government or private; government, nonfederal, public; private, non-profit; private, investor-owned
• Hospital location: rural or urban
• Location/teaching status of hospital: rural, urban non-teaching, urban teaching
• Region of hospital: Northeast, Midwest, South, West
• Hospital weights: weight to hospitals in AHA universe, weight to hospitals in the State
Severity of Illness Data
• Severity of Illness Subclass
• Risk of Mortality Subclass
• 29 comorbid conditions: Alcohol Abuse, Depression, Drug Abuse, Liver Disease, Renal Failure, Obesity, ...
• Defined by the Elixhauser comorbidity scale
• https://www.hcup-us.ahrq.gov/toolssoftware/comorbidityicd10/comorbidity_icd10.jsp
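One common use of the comorbidity flags is a simple per-patient count for descriptive tables. The sketch below covers only the six conditions named above, and the CM_* variable names are assumptions for illustration; the real severity file has its own names for all 29 flags. A raw count is a convenience summary, not a validated risk index.

```python
# Assumed flag names for the six conditions listed on the slide.
COMORBIDITY_FLAGS = [
    "CM_ALCOHOL", "CM_DEPRESS", "CM_DRUG",
    "CM_LIVER", "CM_RENLFAIL", "CM_OBESE",
]

def comorbidity_count(record):
    """Count how many of the listed comorbidity flags equal 1."""
    return sum(1 for flag in COMORBIDITY_FLAGS if record.get(flag) == 1)

# Two invented patients.
patients = [
    {"CM_ALCOHOL": 0, "CM_DEPRESS": 1, "CM_DRUG": 0,
     "CM_LIVER": 0, "CM_RENLFAIL": 1, "CM_OBESE": 1},
    {"CM_ALCOHOL": 0, "CM_DEPRESS": 0, "CM_DRUG": 0,
     "CM_LIVER": 0, "CM_RENLFAIL": 0, "CM_OBESE": 0},
]

counts = [comorbidity_count(p) for p in patients]
print(counts)
```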
Cost-to-Charge Ratio Data
• Year
• Hospital Unique Identifier
• Wage Index
• CCR_NIS (the cost-to-charge ratio, for NIS 2012 to current)
• Calculate “Total Cost” from the ratio above and the “Total Charges” (TOTCHG) variable, which is available in the NIS core data file
• Formula (Stata): gen Total_COSTS = TOTCHG*CCR_NIS
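The same calculation can be sketched in Python for readers not using Stata. It treats CCR_NIS as the hospital's cost-to-charge ratio, as the formula does. The join key HOSP_NIS is an assumed name for the hospital identifier, and all rows are invented.

```python
import csv
import io

# Invented rows. HOSP_NIS as the linking key is an assumption.
CORE = """HOSP_NIS,TOTCHG
101,20000
101,
202,5000
"""
CCR = """HOSP_NIS,CCR_NIS
101,0.25
202,0.5
"""

def load_ratios(handle):
    """Map each hospital identifier to its cost-to-charge ratio."""
    return {row["HOSP_NIS"]: float(row["CCR_NIS"]) for row in csv.DictReader(handle)}

def total_cost(row, ratios):
    """Return TOTCHG * CCR_NIS, or None when the charge or the ratio is missing."""
    ratio = ratios.get(row["HOSP_NIS"])
    if ratio is None or row["TOTCHG"] == "":
        return None
    return float(row["TOTCHG"]) * ratio

ratios = load_ratios(io.StringIO(CCR))
costs = [total_cost(row, ratios) for row in csv.DictReader(io.StringIO(CORE))]
print(costs)
```

Returning None for missing charges keeps those stays visible, so they can be counted and reported rather than silently treated as zero cost.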
Nationwide Emergency Department Sample (NEDS)
• NEDS is the largest all-payer ED database in the United States
• Includes a 20% stratified sample of US hospital-based emergency departments
• Years available: 2006-2016
• Number of ED visits: between 25 and 30 million (unweighted) records for ED visits from 950 hospitals
• Cost of NEDS 2016: $1,000
What Data Elements Are in the NEDS?
Four Data Files per Year
▪ Core data
▪ Emergency department data
▪ Inpatient data
▪ Hospital weights data
Examples of NEDS-Based Research
Nationwide Readmissions Database (NRD)
• Used to calculate national readmission rates for all payers and the uninsured
• Provides nationally representative information on hospital readmissions for all ages
• Unweighted NRD data cover approximately 12 million discharges each year
• Has Core data, Hospital data, Illness severity data, and Cost-to-Charge Ratio data
• Years available: 2010-2016
• Cost of NRD 2016 data: $1,000
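A readmission rate comes from linking each patient's stays and measuring the gap between one discharge and the next admission. The sketch below is a simplified, unweighted illustration on invented records. The variable names NRD_VisitLink (patient linkage), NRD_DaysToEvent (relative admission timing), and LOS are assumed here and should be checked against the NRD documentation. A real analysis would also apply discharge weights and exclusions such as deaths, transfers, and end-of-year discharges.

```python
from collections import defaultdict

# Invented stays for three patients.
STAYS = [
    {"NRD_VisitLink": "A", "NRD_DaysToEvent": 100, "LOS": 4},
    {"NRD_VisitLink": "A", "NRD_DaysToEvent": 120, "LOS": 2},
    {"NRD_VisitLink": "B", "NRD_DaysToEvent": 50, "LOS": 3},
    {"NRD_VisitLink": "B", "NRD_DaysToEvent": 200, "LOS": 1},
    {"NRD_VisitLink": "C", "NRD_DaysToEvent": 10, "LOS": 5},
]

def readmission_flags(stays, window=30):
    """Return one boolean per stay: was it followed by a readmission within `window` days?"""
    by_patient = defaultdict(list)
    for stay in stays:
        by_patient[stay["NRD_VisitLink"]].append(stay)

    flags = []
    for visits in by_patient.values():
        visits.sort(key=lambda s: s["NRD_DaysToEvent"])
        for i, current in enumerate(visits):
            if i + 1 == len(visits):
                flags.append(False)  # no later stay for this patient
                continue
            discharge_day = current["NRD_DaysToEvent"] + current["LOS"]
            gap = visits[i + 1]["NRD_DaysToEvent"] - discharge_day
            flags.append(0 <= gap <= window)
    return flags

flags = readmission_flags(STAYS)
readmitted = sum(flags)
rate = readmitted / len(flags)
print(f"{readmitted} of {len(flags)} index stays readmitted within 30 days ({rate:.0%})")
```

Patient A is readmitted 16 days after discharge and counts; patient B returns after 147 days and does not.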
KID (Kids’ Inpatient Database)
• Only all-payer pediatric inpatient care database in the USA
• Contains 2-3 million hospital stays
• Helps to develop national & regional estimates on diseases
• Data available for demographics, injury characteristics, diagnosis, hospital characteristics, outcomes, and healthcare cost
• Need to sign a DUA
• Cost of KID 2016 data: $500
National Trauma Data Bank (NTDB)
• The largest registry of trauma patients admitted to trauma centers in the United States
• Data is not weighted
• No DUA (data user agreement)
• Samples are obtained from trauma center registries
▪ In 2011, 747 trauma centers were included
• Years available: 2002-2016
• Data files are in CSV format
• Cost of 2016 NTDB data: $300
NTDB Data
• Demographic data
• Injury severity data
• Emergency department data
• Mechanisms of Injury data
• ICD9 and ICD10 Procedure data
• ICD9 and ICD10 Diagnosis data
• Discharge disposition data
• Facility data
• Vital signs data
• Protective devices & transportation data
• Comorbid and complications data
National Surgical Quality Improvement Program (NSQIP)
• A nationally validated, risk-adjusted, and outcomes-based program
• NSQIP has prospective and outcomes data
• Years available: 2005 - 2011
• NSQIP aims to measure and improve the quality of surgical care across surgical specialties
• 680 hospitals participated in NSQIP in 2017
What Data Elements Are in the NSQIP?
• Preoperative risk factors
• Intraoperative variables
• 30-day postoperative mortality and morbidity outcomes
• Demographic data
• Current Procedural Terminology (CPT) data
• Health and behavior data
• Physical examination data
• Free data for NSQIP participating hospitals
• Data request process: www.facs.org/quality-programs/acs-nsqip
• Need to sign a DUA (Data User Agreement)
• Download the data: www.facs.org/quality-programs/acs-nsqip
• Data files available in 3 different formats: Text, SPSS, SAS
MarketScan Data
• Private database
• MarketScan is broadly representative of the commercially insured population of the United States
• High quality, longitudinal, and patient level data
• Low percentage of missing data
• Years available: 2002-2011
• Need to sign a DUA (Data User Agreement)
• Cost around $50,000/year
What Data Elements Are in MarketScan?
• Patient socio-demographic data
• Admission date and type
• Diagnosis code (principal and secondary)
• Discharge status
• Procedure code (principal and secondary)
• Length of stay
• Place of service
• Provider ID
• Data on drugs/medications
SEER-Medicare Data
• SEER-Medicare Linked Database
• Medicare beneficiaries with cancer
• Data derived from the Surveillance, Epidemiology and End Results (SEER) program
• Diagnosis & procedure codes: ICD9, ICD10, CPT, HCPCS (Healthcare Common Procedure Coding System)
• Patient Demographic and Socioeconomic Characteristics
• Comorbidity
• Breast, Colorectal, and Prostate Cancer Screening
• Radiation Therapy (includes codes to identify radiation therapy)
• Chemotherapy Use (includes codes to identify chemotherapy)
• Complications of Cancer Treatment
• Surveillance After Cancer Treatment
• Data sets available from 1991-2015
• Need to sign a DUA (Data User Agreement)
• Physician Characteristics
• Hospital Characteristics
• Health Care Costs Related to Cancer Treatment
National Health and Nutrition Examination Survey (NHANES)
• Cross-sectional, high quality survey data on adults and children in the United States
• Data available on a nationally representative sample of about 5,000 persons each year
• Years available:
1971-75: NHANES I
1976-80: NHANES II
1982-84: Hispanic Health and Nutrition Examination Survey (HHANES)
1988-94: NHANES III
1999-present: National Health and Nutrition Examination Survey (Continuous NHANES)
• Free to download the data from the CDC website
What Data Elements Are in the NHANES?
• Socio-demographic data
• Dietary data
• Clinical examination data (medical, dental, and physiological measurements)
• Laboratory data
• Questionnaire data
• Genetic data
• Mortality data
• NHANES Medicare Utilization and Expenditure Linked Files (Restricted Data)
• NHANES Linked Mortality Files
• NHANES Linked Social Security Administration Files (Restricted Data)
National Ambulatory Medical Care Survey (NAMCS)
• National survey of ambulatory medical care services in the United States
• Data represent a sample of visits to non-federally employed, office-based physicians who are primarily engaged in direct patient care
• NAMCS has high quality cross-sectional data
• Years available: 1973-Current
• Free to download the data from the CDC website
What Data Elements Are in the NAMCS?
• Socio-demographic data
• Source of payment and number of past visits
• Patient’s primary care physician information
• Diagnosis
• Chronic disease checklist and disease management programs
• Screening and diagnostic services
• Treatments and drugs prescribed
• Physician specialty
• EMR use and practice parameters
• Sources of revenue
• Providers seen and duration of care under those providers
Evidence Based Research-NAMCS
National Hospital Ambulatory Medical Care Survey (NHAMCS)
• National probability sample survey of visits to emergency and outpatient departments in nonfederal, general, and short-stay hospitals in the United States
• Records-based survey data, producing annual estimates of the number and attributes of visits to hospital emergency departments (EDs) in the U.S.
• The survey is visit-based, so prevalence and incidence rates cannot be calculated from it
• Years available: 1992-Current
• Free to download the data from the CDC website
Thank you !
Lakshika email: [email protected]