1 — U.S. BUREAU OF LABOR STATISTICS • bls.gov
Developing a Data Quality Profile for the Consumer Expenditure Survey
Yezzi Angi Lee, Veri Crain, Scott Fricker, Evan Hubener,
Clayton Knappenberger, Brandon Kopp,
Julie Sullivan, and Lucilla Tan
August 1st, 2017
Presentation Outline
Purpose: to share the challenges encountered in the initial stages of this development process, report on interim progress, and offer thoughts on next steps.
What is a Data Quality Profile (DQP)?
Challenges
Iterative approach to development
Interim results
Moving forward
What is a Data Quality Profile (DQP)?
“A comprehensive report prepared by producers of survey data that provided information data users need to assess the quality of the data” (Survey Research Center, 2010)
“To provide researchers and data users with a single source for a wide range of information on the quality of AHS data” (Quality Profile of the American Housing Survey, 1996)
More Examples: Vary in Breadth and Depth of Coverage
BRFSS 2013 Summary Data Quality Report: response rates; 23-page annual publication
https://www.cdc.gov/brfss/annual_data/2013/pdf/2013_dqr.pdf
American Housing Survey 1996 Quality Profile: total survey error dimensions; 80+ pages (1996)
https://www.census.gov/content/dam/Census/programs-surveys/ahs/publications/h12195-1.pdf
Data Quality Profile for the CE
Internal: “Monitoring; establish baselines”
External: “Fitness for use”
Multi-dimensional Definition of Data Quality Adopted for CE
Total Quality Management (TQM) Dimensions
Relevance
Accuracy
Coherence
Timeliness
Accessibility
Interpretability
Total Survey Error (TSE) Sources
Frame (coverage)
Specification (construct)
Sampling
Measurement
Nonresponse
Processing (data edit)
Post-survey adjustment
(Gonzalez et al. 2009) https://www.bls.gov/cex/ovrvwdataqualityrpt.pdf
Challenges
Metric documentation: efficient and robust
Infrastructure: continuous and adaptable to change
Goal: to achieve reproducibility and interpretability of metrics
CE DQP Challenges
1. Requires participation and coordination across the survey program
2. Resource-intensive to develop and maintain
(Diagram labels: TSE, TQM)
CE Strategy to identify metrics
TQM: Survey as a manufacturing process
Total Quality Management (TQM) Dimensions
Relevance
Accuracy
Coherence
Timeliness
Accessibility
Interpretability
Proposed Framework
(Diagram: Activity 1, Activity 2, ... each linked to its Issue(s); each issue linked to a Monitoring method/Metric and the Quality dimension(s) affected)
Identify key stages in the CE life cycle
For each stage, identify the major activities
For each activity, identify the issue(s) of concern
Propose how to monitor each issue identified
Identify the quality dimension(s) affected
(Fricker et al. 2012)
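The stage, activity, and issue hierarchy produced by these steps can be sketched as a small Python data structure that flattens into one row per proposed metric. This is a minimal illustration under stated assumptions, not the CE program's implementation: the class names, the `metric_inventory` helper, and the example entry are all hypothetical; only the quality dimension names come from the TQM list in this presentation.

```python
from dataclasses import dataclass, field

# TQM quality dimensions listed in this presentation.
TQM_DIMENSIONS = {"Relevance", "Accuracy", "Coherence", "Timeliness",
                  "Accessibility", "Interpretability"}


@dataclass
class Issue:
    description: str          # issue of concern for an activity
    metric: str               # proposed monitoring method / metric
    dimensions: list          # quality dimension(s) affected

    def __post_init__(self):
        # Reject dimensions that are not part of the adopted TQM definition.
        unknown = set(self.dimensions) - TQM_DIMENSIONS
        if unknown:
            raise ValueError(f"unknown quality dimension(s): {sorted(unknown)}")


@dataclass
class Activity:
    name: str                 # major activity within a stage
    issues: list = field(default_factory=list)


@dataclass
class Stage:
    name: str                 # key stage in the survey life cycle
    activities: list = field(default_factory=list)


def metric_inventory(stages):
    """Flatten the stage/activity/issue tree into one row per proposed metric."""
    return [
        (stage.name, activity.name, issue.description, issue.metric,
         tuple(issue.dimensions))
        for stage in stages
        for activity in stage.activities
        for issue in activity.issues
    ]


# Hypothetical example entry, for illustration only.
collection = Stage("Data collection", [
    Activity("Interviewing", [
        Issue("Sampled households do not respond", "Response rate", ["Accuracy"]),
    ]),
])

for row in metric_inventory([collection]):
    print(row)
```

Validating dimensions at construction time is one way to keep every proposed metric tied to the adopted quality definition; a spreadsheet with the same columns would serve the same purpose.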
Example of metric metadata description using a template
Metric Name
Description
Metric interpretation
Survey
Quality dimension
CALCULATION
Formula
Data source and variables
Frequency
Level of aggregation
Maintained by
MONITORING
Target / Threshold / Tolerance
Presentation / display
NOTES/COMMENTS
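One way to enforce consistent documentation is to store each metric's metadata as a record with one field per template heading and to flag blank required fields before a metric is published. The sketch below assumes that approach; the field names, the choice to treat notes as optional, and the `missing_fields` helper are hypothetical, not part of the CE template itself.

```python
from dataclasses import dataclass, fields, asdict
import json


@dataclass
class MetricMetadata:
    # General description (hypothetical field names, one per template heading)
    name: str = ""
    description: str = ""
    interpretation: str = ""
    survey: str = ""
    quality_dimension: str = ""
    # CALCULATION
    formula: str = ""
    data_source_and_variables: str = ""
    frequency: str = ""
    level_of_aggregation: str = ""
    maintained_by: str = ""
    # MONITORING
    target_threshold_tolerance: str = ""
    presentation: str = ""
    # NOTES/COMMENTS (treated as optional in this sketch)
    notes: str = ""


OPTIONAL_FIELDS = {"notes"}


def missing_fields(meta):
    """Return the names of required template fields that are still blank."""
    return [f.name for f in fields(meta)
            if f.name not in OPTIONAL_FIELDS and not getattr(meta, f.name).strip()]


# Hypothetical, partially filled entry.
meta = MetricMetadata(name="Response rate", survey="CE Interview Survey",
                      quality_dimension="Accuracy")
print(missing_fields(meta))
print(json.dumps(asdict(meta), indent=2))
```

Serializing to JSON keeps the metadata readable and versionable alongside the code that computes the metric, which supports reproducibility over time.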
Proposed framework: Criteria for Metric Prioritization
Specific – targeted at identified risk
Measurable – can be used to determine progress
Achievable – realistically attainable
Relevant – actionable, not just “good to know”
Timely – available when needed
S.M.A.R.T
Iterative approach to DQP development: “Learn by doing, Refine and Scale up!”
Lessons Learned
2009: DQ definition adopted for CE
2011: Lessons learned
• Understand the task for which we want to develop metrics
• Importance of metric metadata documentation for reproducibility and interpretation over time
2012: Propose framework for DQP
• Ensure consistency in documenting key elements of metric metadata: use of a template
2013-14: Measurement error study (Westat contract)
• No single “best” method: multiple methods and indicators (MMI) approach
2015: DQP Version 1
• Response rates and edit rates
2016: MMI follow-up
• External indicators feasibility study
2017: DQP Version 2 in progress
Example of CE DQP Version 1
https://www.bls.gov/cex/ce_dqreport.pdf
1. Response rates
2. Nonresponse rates
3. Expenditure edit rates
4. Income imputation rates
* Reporting period: 2009-2013
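The four Version 1 metrics are all ratios of counts. The sketch below shows one way to compute them as unweighted percentages. The formulas are assumptions for illustration only: the published CE definitions (for example, which units count as eligible, whether rates are weighted, and which nonresponse reasons are reported) may differ and should be taken from the report linked above. The function names and nonresponse reason labels are hypothetical.

```python
def percent(numerator, denominator):
    """Return numerator / denominator as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


def response_rate(completed, eligible):
    # Assumed: unweighted completed interviews over eligible sample units.
    return percent(completed, eligible)


def nonresponse_rates(nonresponse_by_reason, eligible):
    # Assumed: one rate per (hypothetical) nonresponse reason, same denominator.
    return {reason: percent(count, eligible)
            for reason, count in nonresponse_by_reason.items()}


def edit_rate(edited, reported):
    # Assumed: share of reported expenditure values changed during processing.
    return percent(edited, reported)


def imputation_rate(imputed, total):
    # Assumed: share of income values that had to be imputed.
    return percent(imputed, total)


# Toy counts, not CE data.
print(response_rate(650, 1000))                                       # 65.0
print(nonresponse_rates({"refusal": 200, "noncontact": 100}, 1000))   # {'refusal': 20.0, 'noncontact': 10.0}
print(edit_rate(12, 400))                                             # 3.0
print(imputation_rate(30, 600))                                       # 5.0
```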
CE DQP Version 2
DQP Version 2: Scale up from DQP version 1
► Updated metric reporting period: 2010-2015
► New metric added: Use of Records by Survey Mode
► Metrics refined:
• Response rates: additional breakouts by survey wave (Internal)
• Expenditure edit rates: differentiated between processed and reported data (Internal)
► Addition of visual summary of metric trends
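The Version 2 refinements add breakouts by survey wave and a new records-use metric by survey mode. Both reduce to the same operation: a percentage computed within groups. The sketch below assumes hypothetical case-level records with `wave`, `mode`, `completed`, and `used_records` fields; these names and the toy values are illustrative, not the CE file layout, and the production work described in this presentation was done in SAS rather than Python.

```python
from collections import defaultdict


def percent_by_group(records, group_key, flag_key):
    """Percent of records whose flag_key is true, broken out by group_key."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        if rec[flag_key]:
            hits[group] += 1
    return {group: 100.0 * hits[group] / totals[group] for group in sorted(totals)}


# Hypothetical case-level records; field names and values are illustrative only.
cases = [
    {"wave": 1, "mode": "personal visit", "completed": True, "used_records": True},
    {"wave": 1, "mode": "telephone", "completed": False, "used_records": False},
    {"wave": 2, "mode": "telephone", "completed": True, "used_records": False},
    {"wave": 2, "mode": "personal visit", "completed": True, "used_records": True},
]

# Response rate by survey wave (all eligible cases).
print(percent_by_group(cases, "wave", "completed"))         # {1: 50.0, 2: 100.0}

# Use of records by survey mode (completed interviews only).
completed = [c for c in cases if c["completed"]]
print(percent_by_group(completed, "mode", "used_records"))  # {'personal visit': 100.0, 'telephone': 0.0}
```

Keeping one generic grouping function means a new breakout (by wave, mode, or another variable) is a change of arguments rather than new code.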
DQP Version 2: Scale up from DQP version 1
Production Process
► Coordinated team from 3 areas of the CE Program
► Use of the metric metadata template for documentation
► All coding for metric analysis and graphs done in SAS
Moving forward
Lessons Learned from DQP 2
Spend more time creating and reviewing the data
Spend more time exploring and discussing metric ideas, and document them!
Consult “topic experts”
Moving the DQP into routine production will require further consideration of the infrastructure needed to support it
Next
Upcoming: Data Quality Profile Version 2 will be available to public users in September
We would appreciate your feedback and comments!
Contact Information
Yezzi Angi Lee
Economist
Division of Consumer Expenditure Surveys
www.bls.gov/cex
202-691-5154