A Hitchhikers Guide to Data Quality_20150331

A HITCHHIKER'S GUIDE TO DATA QUALITY Tatiana Stebakova The Data & Information Assembly Australia April 2015

Transcript of A Hitchhikers Guide to Data Quality_20150331

Page 1: A Hitchhikers Guide to Data Quality_20150331

A HITCHHIKER'S GUIDE TO DATA QUALITY

Tatiana Stebakova

The Data & Information Assembly Australia April 2015

Page 2: A Hitchhikers Guide to Data Quality_20150331

Content

Evolution of DQ Governance approach over the past 10 years

How to make a quantum leap from DQ theory to execution, personal view

You’ve done it all by the book, but there is little traction in data quality

DQ and systems thinking

Don’t panic!

Page 3: A Hitchhikers Guide to Data Quality_20150331

Evolution of DQ Governance approach over the past 10 years

Data Duplicates – still magic words

Data Quality Frameworks - from emergence to maturity

Senior Management Support - a breakthrough

Senior Architects Support – little change

Data Quality Governance - from novelty to mainstream

Data Quality Tools and Technology – from luxury to BAU

Metadata - from “what is it?” to “new black”

Page 4: A Hitchhikers Guide to Data Quality_20150331

How to make a quantum leap from DQ theory to execution, personal view

Page 5: A Hitchhikers Guide to Data Quality_20150331

Step 1. Data Quality Justification

DQ Horror stories

About 6.5 million Americans are 112 or older. The US Social Security office has 6.5 million people on record as having reached the age of 112, even though only 42 people are known to be that old globally.

"Studies in cost analysis show that between 15% to > 20% of a company’s operating revenue is spent doing things to get around or fix data quality issues"

Larry English

Option 1 – What can we gain?

Option 2 – Scare technique

Page 6: A Hitchhikers Guide to Data Quality_20150331

Option 3 (my favourite) – Risks

"Poor data is like a dirty windscreen. You can continue driving as your vision degrades, but at some point you must stop and clear the windscreen or risk everything"

Ken Orr

Page 7: A Hitchhikers Guide to Data Quality_20150331

Step 2. Build DQ requirements into the solution architecture and the system development contract

Example of DQ requirements

ETL solution SHALL have capability to perform Column integrity screening/ profiling

ETL solution SHALL have capability to perform Data Structure screening/ profiling

ETL solution SHALL have capability to perform Compliance to Business rule screening/ profiling
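The three screening capabilities above could be exercised along the following lines. This is only a minimal sketch: the sample rows, the column names and the age rule (0 to 120) are assumptions made for illustration, not part of the original requirements.

```python
# Hypothetical sample batch; field names are illustrative only.
ROWS = [
    {"customer_id": "C001", "age": "34", "country": "AU"},
    {"customer_id": "C002", "age": "", "country": "AU"},
    {"customer_id": "C003", "age": "212", "country": "NZ"},
    {"customer_id": "C004", "country": "AU"},  # "age" column is missing
]

EXPECTED_COLUMNS = {"customer_id", "age", "country"}


def screen_structure(row):
    """Data structure screen: the row carries exactly the expected columns."""
    return set(row) == EXPECTED_COLUMNS


def screen_column_integrity(row):
    """Column integrity screen: key is present and age is a whole number."""
    return bool(row.get("customer_id")) and row.get("age", "").isdigit()


def screen_business_rule(row):
    """Business rule screen (assumed rule): age must be between 0 and 120."""
    age = row.get("age", "")
    return age.isdigit() and 0 <= int(age) <= 120


def profile(rows):
    """Count how many rows fail each screen."""
    failures = {"structure": 0, "column_integrity": 0, "business_rule": 0}
    for row in rows:
        if not screen_structure(row):
            failures["structure"] += 1
        if not screen_column_integrity(row):
            failures["column_integrity"] += 1
        if not screen_business_rule(row):
            failures["business_rule"] += 1
    return failures


report = profile(ROWS)
print(report)
```

A row that fails an earlier screen is also counted as failing the later ones here; a real profiling tool might report only the first failure per row.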

ETL controls solution SHALL capture and store the date and time that the data batch extraction process completed successfully.

Editorial note: This may or may not be the same date as the Batch Business Schedule Date. It is recommended to use the ISO 8601 standard to represent the date/time.
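As an illustration of the editorial note, a control record might keep the extraction completion time and the business schedule date as two separate ISO 8601 values. The function name, the field names and the batch identifier below are hypothetical.

```python
from datetime import date, datetime, timezone


def extraction_control_record(batch_id, business_schedule_date, completed_at=None):
    """Build a control record with ISO 8601 date and date/time strings.

    All field names are assumptions made for this sketch.
    """
    # Default to the current UTC time when no completion time is supplied.
    completed_at = completed_at or datetime.now(timezone.utc)
    return {
        "batch_id": batch_id,
        "business_schedule_date": business_schedule_date.isoformat(),
        "extraction_completed_at": completed_at.isoformat(timespec="seconds"),
    }


record = extraction_control_record(
    "BATCH-0001",
    date(2015, 3, 31),
    completed_at=datetime(2015, 4, 1, 2, 15, 0, tzinfo=timezone.utc),
)
print(record)
```

In this example the batch for business date 2015-03-31 finished extracting early on the following day, which is why the two values are stored separately.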

Quality should be built into the product, and testing alone cannot be relied on to ensure product quality (FDA, Current Good Manufacturing Practice)

The … ETL controls solution SHALL perform a periodic full snapshot of the same data for reconciliation purposes, if Delta files are used.
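One way to picture the reconciliation requirement is to replay the delta records onto the target and compare the result with a full snapshot. The record layout (key, value) and the operation names "upsert" and "delete" are assumptions for this sketch.

```python
def apply_deltas(state, deltas):
    """Replay delta records onto a copy of the target state."""
    result = dict(state)
    for op, key, value in deltas:
        if op == "delete":
            result.pop(key, None)
        else:  # treat anything else as an upsert
            result[key] = value
    return result


def reconcile(target, snapshot):
    """Compare the delta-built target with a full snapshot of the source."""
    return {
        "missing_in_target": sorted(snapshot.keys() - target.keys()),
        "unexpected_in_target": sorted(target.keys() - snapshot.keys()),
        "value_mismatch": sorted(
            k for k in snapshot.keys() & target.keys() if snapshot[k] != target[k]
        ),
    }


# Hypothetical data: one delta was lost (C004) and one update was missed (C001).
initial = {"C001": "Alice", "C002": "Bob"}
deltas = [("upsert", "C003", "Carol"), ("delete", "C002", None)]
target = apply_deltas(initial, deltas)
snapshot = {"C001": "Alicia", "C003": "Carol", "C004": "Dan"}

report = reconcile(target, snapshot)
print(report)
```

The periodic snapshot is what exposes the drift; the delta stream alone gives no indication that anything was dropped.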

The … ETL solution SHALL have capability to perform Data Structure screening/profiling

The … data extract process SHALL support logical data consistency (temporal relationship of data).
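A temporal relationship check can be as simple as confirming that related dates appear in a plausible order. The account records and the rule "an account cannot close before it opens" are assumed examples, not taken from the original requirement.

```python
from datetime import date

# Hypothetical extract; the ordering rule below is an assumed example.
ACCOUNTS = [
    {"id": "A1", "opened_on": date(2010, 5, 1), "closed_on": date(2014, 7, 9)},
    {"id": "A2", "opened_on": date(2013, 2, 11), "closed_on": None},  # still open
    {"id": "A3", "opened_on": date(2015, 1, 20), "closed_on": date(2014, 12, 31)},
]


def temporal_violations(records):
    """Return ids of records whose close date precedes their open date."""
    return [
        r["id"]
        for r in records
        if r["closed_on"] is not None and r["closed_on"] < r["opened_on"]
    ]


violations = temporal_violations(ACCOUNTS)
print(violations)
```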

Page 8: A Hitchhikers Guide to Data Quality_20150331

Step 3. Build data quality requirements into the system operation contract, plus DQ KPIs

“I’ve never been a good spectator. Either I’m playing the game or I’m not interested.”

Christiaan Barnard, the first surgeon to perform a heart transplant

… solution shall have a capability to measure and report on the data quality Key Performance Indicators (KPIs) as defined by the Governance authority.

KPI Examples:

• customer record uniqueness

• directory currency and accessibility

• information provenance.

• uptake rate - coverage

• quality of records per DQ dimensions and characteristics

• response time for typical transactions.
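As a rough sketch of how the first KPI in the list, customer record uniqueness, might be measured: the share of records that remain distinct after normalising a matching key. The choice of key fields (name and date of birth) and the normalisation are assumptions; a governance authority would define the real matching rule.

```python
def uniqueness_kpi(records, key_fields):
    """Share of records that are distinct on the normalised matching key."""
    if not records:
        return 1.0
    # Normalise by trimming whitespace and lower-casing each key field.
    keys = {
        tuple(str(r[f]).strip().lower() for f in key_fields) for r in records
    }
    return len(keys) / len(records)


# Hypothetical customer records; the first two describe the same person.
CUSTOMERS = [
    {"name": "Ann Lee", "dob": "1980-01-02"},
    {"name": " ann lee ", "dob": "1980-01-02"},
    {"name": "Bob Ray", "dob": "1975-05-06"},
    {"name": "Cy Poe", "dob": "1990-09-09"},
]

kpi = uniqueness_kpi(CUSTOMERS, ["name", "dob"])
print(f"Customer record uniqueness: {kpi:.0%}")
```

Reporting the value per batch over time turns the KPI into a trend that can be written into an operation contract.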

Page 9: A Hitchhikers Guide to Data Quality_20150331

You’ve done it all by the book, but there is little traction in Data quality.

Don’t be afraid

From Hitchhiker to Hijacker

Become a driver. Apply for architect, project lead or data management jobs

Drop your “data quality bugs/requirements” anywhere you can

Look for opportunities. Change your strategy all the time

Disguise your requirements; do not call them DQ requirements

Lean on standards

Do not reference DQ gurus. Reference Technology gurus instead

Befriend architects

Be patient, keep cool

“Success is not final, failure is not fatal: it is the courage to continue that counts.” Winston Churchill

Page 10: A Hitchhikers Guide to Data Quality_20150331

Data Quality and systems thinking

Complex adaptive systems (CAS) are dynamic systems able to adapt to a changing environment, where all participants are closely linked with each other, making up an “IT ecosystem” (MIT)

Within such an ecosystem, change becomes not so much adaptation as co-evolution with all other related systems

Rules of flocking:

Follow the leader

Align with neighbours

Avoid overcrowding

Page 11: A Hitchhikers Guide to Data Quality_20150331

Systems thinking – delayed response

Launch date - 2 March 2004

Mission duration 10 years, 11 months and 23 days

6.5 billion kilometres

“After 10 years, and a journey of more than six billion kilometres, the Rosetta spacecraft sent its fridge-sized Philae lander down to Comet 67P/Churyumov-Gerasimenko”.

Page 12: A Hitchhikers Guide to Data Quality_20150331

Questions