Survival Guide: Taming the Data Quality Beast
4/23/15
By Shauna Ayers and Catherine Cruz Agosto
About Availity
• Availity is a trusted intermediary for information exchange between health plans and providers
• Availity eases the complexity of moving business and clinical information to health care stakeholders nationwide
• Availity’s real-time, point-to-point connectivity provides speed and accuracy at the intersection of health care and technology
• Availity’s tools include:
– A multi-payer Web Portal
– An all-payer Advanced Clearinghouse
– A powerful Revenue Cycle Management suite
– A smarter Patient Access solution
Overview
• Data Quality Definitions and Impact
• The 5 Goals of Data Quality
• The 4 Pillars of Data Quality
• The Flow of Your Data
• The 4 V’s of Your Data Sets
• The Properties of Your Data
• Sharing the Health of Your Data
Definitions and Impact
• Data quality is data's fitness and usability for its intended purpose.
• Data quality assurance is the monitoring and analysis of data sets and the processes that create or manipulate data, in order to ensure the data’s quality meets the company's needs.
• The role of data quality assurance within the company is to identify problems with its data and to manage these problems, preventing them wherever possible, and correcting those that cannot be prevented.
• Functions supporting data quality assurance, and frequently integrated with it, include but are not limited to data governance, data architecture, data stewardship, data quality testing, and data cleansing.
The 5 Goals of Data Quality
• Prevent
• Detect
• Communicate
• Mitigate
• Correct
These goals guide us and light our path.
The 4 Pillars of Data Quality
• Analysis and Profiling
• Strategies and Tactics
• Testing
• Intelligence
The Flow of Your Data
• Data is not static. It constantly flows between data sets and applications in continuing waves of gathering, delivery, storage, integration / transformation, retrieval and analysis.
• …So, how do we test a moving target?
The 4 V’s of Your Data Sets
The scale of your data is driven by the four V’s:
• Volume
• Variety
• Vitality
• Velocity
The boundaries of each data set are defined by business rules and constraints. The content of each data set is what is measured or evaluated.
The Properties of Your Data
The quality of your data is driven by various properties:
• Accuracy
• Completeness
• Timeliness
• Consistency
• Validity
• Temporal Reliability
• Interpretability
• Accessibility
• Usage
• Precision
• Uniqueness
Property + Business Value = Impact of Quality Problem
Sharing the Health of Your Data
To find your quarry, and tame it, you must be able to see the forest for the trees.
Artifacts used to communicate data system health:
• Dashboards
• System monitoring alerts
• Reports
• Bug-tracking tickets
Analysis and Profiling Pillar
Analyzing the data gives valuable insight into its structure and content, and can reveal patterns that had not been seen previously. Profiling allows similar data to be grouped.
• Categorization
• Methods
• “Gotchas” and possible challenges
• Gathering metrics
– On data
– On test coverage
• Dependencies, relationships and patterns
Strategies and Tactics Pillar
Most companies use a mix of strategies and tactics, such as:
• Input validation
• Critical value checks (sampling or periodic analysis of standing data)
• In-line validation
• Hash values and checksums
• Tolerance checks and statistical analysis
• Architectural and domain integrity checks
Without a plan, your results can be haphazard.
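One of the tactics listed above, the tolerance check, can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `within_tolerance`, the use of daily record counts as the measured value, and the three-standard-deviation threshold are all hypothetical choices, not something the presentation prescribes.

```python
from statistics import mean, stdev

def within_tolerance(history, latest, max_sigma=3.0):
    """Return True if `latest` is within max_sigma standard deviations of the historical mean."""
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    if sigma == 0:
        # No historical variation: only an identical value is in tolerance.
        return latest == mu
    return abs(latest - mu) <= max_sigma * sigma

# Hypothetical daily record counts for one data set.
daily_counts = [1000, 1020, 980, 1010, 990]
print(within_tolerance(daily_counts, 1005))  # a typical day
print(within_tolerance(daily_counts, 200))   # a suspicious drop
```

A fixed sigma threshold is the simplest option; a real process-control check would likely tune the threshold per data set.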
Testing Pillar
Types of tests
• Count checks
• Compare checks
• Business Rule Validation
• Null value checks
• Code Checks
Methods and Strategies
• Exploratory
• Manual
• Automated
Tools
• Buying vs. In-house
• A machine cannot replace a human
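Two of the test types above, count checks and null value checks, might look like the following when automated. This is a minimal sketch that assumes rows are available as Python dictionaries; the function names and sample rows are hypothetical.

```python
def count_check(source_rows, target_rows):
    """Return True when source and target hold the same number of rows."""
    return len(source_rows) == len(target_rows)

def null_check(rows, required_fields):
    """Return (row_index, field) pairs where a required field is missing or None."""
    problems = []
    for index, row in enumerate(rows):
        for field in required_fields:
            if row.get(field) is None:
                problems.append((index, field))
    return problems

# Hypothetical source and target rows.
source = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
target = [{"id": 1, "name": "Ann"}, {"id": 2, "name": None}]

print(count_check(source, target))          # counts match
print(null_check(target, ["id", "name"]))   # but one name was lost
```

The example shows why the two checks complement each other: the row counts agree even though a required value is missing in the target.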
Intelligence Pillar
Data Quality intelligence provides visibility of the data environment, supporting:
• Operational Troubleshooting
• Process Improvement
• Risk Analysis
• Data Governance and Regulatory Compliance
Metrics useful for DQ Intelligence
• Current state: unresolved defects or failed tests
• Property Tolerances: e.g., histogram analysis, % change over time
• Defect Trends over time: defect count by data set or type
• Test Coverage: % implemented/% possible
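Two of the metrics above reduce to simple arithmetic. The sketch below assumes that "% change over time" compares two consecutive measurements and that test coverage is the number of implemented checks divided by the number of possible checks; both function names are hypothetical.

```python
def percent_change(previous, current):
    """Percentage change from `previous` to `current`."""
    if previous == 0:
        raise ValueError("percent change is undefined when the previous value is 0")
    return (current - previous) / previous * 100.0

def coverage_percent(implemented, possible):
    """Share of possible data quality checks that are actually implemented."""
    if possible <= 0:
        raise ValueError("possible must be positive")
    return implemented / possible * 100.0

print(percent_change(200, 250))   # defect count rose from 200 to 250
print(coverage_percent(30, 40))   # 30 of 40 identified checks exist
```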
Property: Accuracy
• Definition: Whether the data values stored for an object are the correct values. To be correct, a data value must be the right value, and must be represented in a consistent and unambiguous form.
• Possible DQ checks: Hash values and checksums, business rule validations, source-to-target value comparisons
• Examples:
– Mismatch between labeling and content
– American vs European date formats
– “John Doe” vs “JOHN DOE”
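A hash-based source-to-target value comparison, as named in the checks above, could be sketched like this. It assumes rows are dictionaries that share a key field; `row_hash`, `mismatched_keys`, and the sample data are hypothetical, and SHA-256 is simply one reasonable hash choice.

```python
import hashlib

def row_hash(row, fields):
    """Hash the chosen fields of a row; the unit separator avoids ambiguous joins."""
    joined = "\x1f".join(str(row[field]) for field in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def mismatched_keys(source, target, key, fields):
    """Return keys present in both data sets whose field values differ."""
    target_hashes = {row[key]: row_hash(row, fields) for row in target}
    return [
        row[key]
        for row in source
        if row[key] in target_hashes
        and row_hash(row, fields) != target_hashes[row[key]]
    ]

# Hypothetical rows: the second name changed case on its way to the target.
source = [{"id": 1, "name": "Ann Lee"}, {"id": 2, "name": "John Doe"}]
target = [{"id": 1, "name": "Ann Lee"}, {"id": 2, "name": "JOHN DOE"}]
print(mismatched_keys(source, target, "id", ["name"]))
```

Whether a case difference counts as a defect is a business decision; a hash comparison flags it either way because it compares exact representations.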
Property: Completeness
• Definition: Whether all the data required to meet the requirements or business need is available in the target.
• Possible DQ checks: Source-to-target count checks, compare checks, not-null checks
• Examples:
– Inconsistent data types between source and target
– Unenforced column is null in the target
– Missing criteria in a filter, causing records to be missed
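When a count check fails, a compare check on record identifiers shows which records were lost. The sketch below assumes both sides expose a list of IDs; `compare_ids` and the sample values are hypothetical.

```python
def compare_ids(source_ids, target_ids):
    """Report IDs that did not arrive in the target and IDs that appear only there."""
    source_set, target_set = set(source_ids), set(target_ids)
    return {
        "missing_in_target": sorted(source_set - target_set),
        "unexpected_in_target": sorted(target_set - source_set),
    }

# Hypothetical IDs: record 3 was filtered out, record 5 has no source.
print(compare_ids([1, 2, 3, 4], [1, 2, 4, 5]))
```

Note that the two lists here have the same length, so a count check alone would pass.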
Property: Timeliness
• Definition: Whether data is visible when the user or consuming application expects it to be.
• Possible DQ checks: Process control tolerance checks, ID comparisons, missing update checks
• Examples:
– Package delivery
– Credit card account activity
– CRM data
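A missing update check can be approximated by flagging records whose last update is older than an agreed maximum age. This sketch assumes each record carries a timezone-aware last-updated timestamp; the function name, record IDs, and six-hour limit are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def stale_records(last_updated, now, max_age):
    """Return IDs whose last update is older than `max_age` relative to `now`."""
    return sorted(
        record_id
        for record_id, timestamp in last_updated.items()
        if now - timestamp > max_age
    )

# Hypothetical package-tracking records with their last update times (UTC).
now = datetime(2015, 4, 23, 12, 0, tzinfo=timezone.utc)
last_updated = {
    "pkg-1": datetime(2015, 4, 23, 11, 30, tzinfo=timezone.utc),
    "pkg-2": datetime(2015, 4, 22, 9, 0, tzinfo=timezone.utc),
}
print(stale_records(last_updated, now, timedelta(hours=6)))
```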
Property: Consistency
• Definition: The process works the same way every time. No matter which source you get the data from, values that describe the same thing should match.
• Possible DQ checks: Business Rule Validation, source-to-target compare
• Examples:
– Table A shows one address for a customer and Table B shows another
– Account information differs when the profile is viewed on the website vs the mobile app
Property: Validity
• Definition: The correctness and reasonableness of data, how well it conforms to the syntax (format, type, range) of its definition.
• Possible DQ checks: Input validation, parametric checks, domain checks
• Examples:
– Two-digit years on birthdates for Medicare enrollees
– Negative cycle times
– Invalid customer codes
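The three examples above map onto a format check, a range check, and a domain check. The sketch below is illustrative only: the field names, the `YYYY-MM-DD` date format, and the set of valid customer codes are assumptions invented for the example.

```python
import re

# Hypothetical domain of allowed customer codes.
VALID_CUSTOMER_CODES = {"RETAIL", "WHOLESALE", "PARTNER"}
# Format check: four-digit year, two-digit month and day.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

def validate_record(record):
    """Return a list of validity errors for one record (empty if valid)."""
    errors = []
    birthdate = str(record.get("birthdate", ""))
    if not DATE_PATTERN.fullmatch(birthdate):
        errors.append("birthdate must be YYYY-MM-DD with a four-digit year")
    cycle_time = record.get("cycle_time_seconds")
    if not isinstance(cycle_time, (int, float)) or cycle_time < 0:
        errors.append("cycle_time_seconds must be a non-negative number")
    if record.get("customer_code") not in VALID_CUSTOMER_CODES:
        errors.append("customer_code is not in the allowed domain")
    return errors

bad = {"birthdate": "38-04-12", "cycle_time_seconds": -5, "customer_code": "XX"}
for error in validate_record(bad):
    print(error)
```

The pattern checks only the shape of the date; confirming that it is a real calendar date would need an additional parse step.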
Property: Temporal Reliability
• Definition: Time-dependent data
• Possible DQ checks: Source-to-target count checks, compare checks
• Examples:
– Source-to-view change from daily to real-time
– Process that loads data to the source table is delayed
Property: Interpretability
• Definition: How easy it is to extract understandable information from the data
• Possible DQ checks: Histograms, source-to-target ID compares over date range
• Examples:
– Units of measurement: Metric mishap caused loss of NASA orbiter
Property: Accessibility
• Definition: Is it available?
• Possible DQ checks: Security checks, source-to-target checks
• Examples:
– User unable to search for data when using one identifier but can find record using a different identifier
– Order specific
Property: Usage
• Definition: Does the data support the usage to which it is being applied?
• Possible DQ checks: Duplicate checks, histograms, ID compares over time, domain checks
• Examples:
– Time Zone assumptions: Data from the future
– Page rankings derived from links to the page
– Cross-grain configuration values (“All” or “Other”)
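The "data from the future" example above typically arises when a local timestamp is read as if it were UTC. The sketch below shows one way such rows could be detected, assuming all timestamps are stored as timezone-aware values; the function name and the UTC+9 offset are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def future_timestamps(timestamps, now):
    """Return the timestamps that are later than `now` (all values timezone-aware)."""
    return [ts for ts in timestamps if ts > now]

now = datetime(2015, 4, 23, 11, 5, tzinfo=timezone.utc)
utc_plus_9 = timezone(timedelta(hours=9))

# The same wall-clock reading, 20:00, interpreted two ways.
correct = datetime(2015, 4, 23, 20, 0, tzinfo=utc_plus_9)    # 11:00 UTC, in the past
misread = datetime(2015, 4, 23, 20, 0, tzinfo=timezone.utc)  # 20:00 UTC, in the future

print(future_timestamps([correct, misread], now))
```

Only the misread value is flagged, which makes a future-timestamp check a cheap indicator of time zone assumptions gone wrong.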
Property: Precision
• Definition: Correlation between what is reality and what is shown in the data.
• Possible DQ checks: Business Rule Validation, source-to-target comparison
• Examples:
– Incorrect address displayed for customer
– Showing Customer A data in Customer B’s account page
– Calculations
Property: Uniqueness
• Definition: What makes a data entity one of its kind.
• Possible DQ checks: Duplicate checks
• Examples:
– Multiple customer entries in CRM system
– Multiple conflicting configuration entries for same entity
– Duplicate inventory entries
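A duplicate check groups rows by the fields that should make an entity unique. The sketch below assumes that case and extra whitespace should be ignored when comparing those fields, which is a normalization choice made for this example; `find_duplicates` and the sample rows are hypothetical.

```python
from collections import defaultdict

def find_duplicates(rows, key_fields):
    """Return lists of row indexes that share the same normalized key."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        # Normalize: collapse whitespace and ignore case.
        key = tuple(
            " ".join(str(row[field]).split()).casefold() for field in key_fields
        )
        groups[key].append(index)
    return [indexes for indexes in groups.values() if len(indexes) > 1]

# Hypothetical CRM entries: the first two describe the same customer.
customers = [
    {"name": "John Doe", "zip": "32256"},
    {"name": "JOHN  DOE", "zip": "32256"},
    {"name": "Jane Roe", "zip": "32256"},
]
print(find_duplicates(customers, ["name", "zip"]))
```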
Overall picture / conclusion
• Any expedition to ensure data quality in the living, dynamic data ecosystem that occurs in every company requires the following:
– clear goals to guide efforts,
– a functional framework providing the tools to work with,
– an understanding of the living flow of your data,
– an understanding of its fundamental shape and nature,
– clear communication of these elements to all members of the party involved.