· • The difference between profiling and auditing. 4 Basic system architecture Staging Data...


OWB Data Quality – Best Practices
Jean-Pierre Dijcks
December 2008


Agenda

• Building a data quality firewall
• The importance of data rules
• The difference between profiling and auditing


Basic system architecture

[Diagram: Data Sources (Siebel CRM, Oracle EBS, PeopleSoft, SAP/R3, Other Sources) and Message Queues; Staging Data Layer; Operational data layer; Performance data layer]


Building a data quality firewall

[Diagram: Data Sources (Siebel CRM, Oracle EBS, PeopleSoft, SAP/R3, Other Sources) and Message Queues; Staging Data Layer; Operational data layer; Performance data layer. Data quality steps overlaid: Data Profiling, Schema & Data Type Correction, Stage 2 Data Correction, Data Audits (shown twice), Data Governance]


Building a data quality firewall

[Diagram: Data Sources (Siebel CRM, Oracle EBS, PeopleSoft, SAP/R3, Other Sources) and Message Queues; Staging Data Layer; Operational data layer; Performance data layer; Profile Workspace. Caption: Move Sample Data to Profile Workspace]


Schema and Data Type Correction

• Leverage data profiling for
  • Generating the staging area tables
  • Schema corrections
  • Data Type corrections (enforce real data types)

[Diagram: Oracle EBS; Staging Data Layer; Schema & Data Type Correction. Steps: Profile data; Untangle for lookups or recoding; Discuss with business users]
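As an illustration of "enforce real data types", the sketch below suggests a type for a column that arrived in staging as text. It is a minimal example, not how OWB profiling works: the function name `infer_type`, the type labels, and the single date format are assumptions made for this example.

```python
from datetime import datetime
from typing import Iterable, Optional


def infer_type(values: Iterable[Optional[str]], date_format: str = "%Y-%m-%d") -> str:
    """Suggest a type that fits every non-empty value of a text column.

    Hypothetical helper: returns INTEGER, NUMBER, DATE, or VARCHAR.
    """
    candidates = ["INTEGER", "NUMBER", "DATE"]  # most specific first
    seen_value = False
    for raw in values:
        if raw is None or raw.strip() == "":
            continue  # nulls and blanks do not constrain the type
        seen_value = True
        text = raw.strip()
        still_valid = []
        for candidate in candidates:
            try:
                if candidate == "INTEGER":
                    int(text)
                elif candidate == "NUMBER":
                    float(text)
                else:
                    datetime.strptime(text, date_format)
                still_valid.append(candidate)
            except ValueError:
                pass  # this value rules the candidate out
        candidates = still_valid
        if not candidates:
            return "VARCHAR"  # at least one value fits no stricter type
    return candidates[0] if seen_value else "VARCHAR"
```

A column reported as INTEGER or DATE here is a candidate for a typed staging column; a VARCHAR result usually means the column needs a closer look with the business users first.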


Anatomy of the operational data layer

Goal:
• Create lowest grain data for reporting
• Create a schema to service all applications with correct data
• Act as source for performance layer

Characteristics:
• De-normalized but still close to 3NF
• Relationships established and enforced
• Data corrected and de-duplicated
• Permanent data

[Diagram: Operational data layer]


Loading the operational layer

• Leverage in-database architecture
  • Do all the hard work here!
  • Load between schemas, not databases
  • Huge performance gains through OWB architecture

• Embed data quality into the loads
  • Create a data quality firewall
    • Strictly enforce all required rules
    • Document all erroneous data and correct if desired

• Do matching and merging to create uniqueness from many data flows
  • Create master data records
  • Re-code as necessary
  • Re-key as necessary
  • Keep cross references
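The match, merge, re-key, and cross-reference steps can be sketched as follows. This is a deliberately simple illustration that matches on a normalized name only; it is not OWB's match-merge operator, and the record layout and function names are assumptions made for this example.

```python
from typing import Dict, List, Tuple

SourceRecord = Tuple[str, str, str]  # (source system, source key, name)


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(name.lower().split())


def match_and_merge(
    records: List[SourceRecord],
) -> Tuple[Dict[int, str], List[Tuple[str, str, int]]]:
    """Return master records keyed by a new surrogate key, plus a cross reference.

    The cross reference maps every (source system, source key) to its master key,
    so the original keys are never lost when records are re-keyed.
    """
    masters: Dict[int, str] = {}
    key_by_match: Dict[str, int] = {}
    xref: List[Tuple[str, str, int]] = []
    for source, source_key, name in records:
        match_value = normalize(name)
        if match_value not in key_by_match:
            new_key = len(masters) + 1  # re-key: assign the next surrogate key
            key_by_match[match_value] = new_key
            masters[new_key] = name.strip()  # first record seen becomes the master
        xref.append((source, source_key, key_by_match[match_value]))
    return masters, xref
```

Real matching would typically use fuzzy comparisons on several attributes; the point here is only that merged records get one new key while the cross reference preserves every source key.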


Data Quality Fire Wall

Cleanse:
• De-duplicate incoming data
• Fix data issues
  • Name and address
  • String comparisons

Protect:
• Enforce referential integrity
• Enforce data rules
• Enforce data types and conversions

Report:
• Data issues
• Quality levels
• Quality trends

[Diagram: Operational data layer with Cleanse, Protect, Report]
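The "Protect" and "Report" parts of the firewall can be sketched as a load step that checks every row against named rules, sets failing rows aside together with the rules they broke, and reports a quality level. The rule names, row layout, and `apply_firewall` function are assumptions for illustration, not OWB-generated code.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Rule = Tuple[str, Callable[[Row], bool]]  # (rule name, check that returns True when the row is valid)


def apply_firewall(rows: List[Row], rules: List[Rule]) -> Tuple[List[Row], List[dict], float]:
    """Split rows into accepted and rejected, and compute the share of rows accepted."""
    accepted: List[Row] = []
    rejected: List[dict] = []
    for row in rows:
        failed = [name for name, check in rules if not check(row)]
        if failed:
            # Document the erroneous row and every rule it violated.
            rejected.append({"row": row, "failed_rules": failed})
        else:
            accepted.append(row)
    quality_level = len(accepted) / len(rows) if rows else 1.0
    return accepted, rejected, quality_level


# Example rules: a referential integrity check and a simple data rule.
known_customers = {1, 2}
example_rules: List[Rule] = [
    ("customer_exists", lambda r: r["customer_id"] in known_customers),
    ("amount_is_positive", lambda r: isinstance(r["amount"], (int, float)) and r["amount"] > 0),
]
```

Only the accepted rows would continue into the operational layer; the rejected list is what feeds correction and the data issue reports.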


Feeding non-DW systems

• Always load from the operational layer

• Delivers flexibility and lowest grain to external systems

• Aggregate on the way out if required (not typical)

• Delivers clean data, with measured service levels for DQ

[Diagram: Operational data layer with Cleanse, Protect, Report]


Data Quality
The importance of data rules

[Diagram: 1) Profile (Know your data); Data Rules; 2) Generate (Correction Mappings, Data Auditors); 3) Operate (Coherent Data); 4) Monitor (Audit Results and trends); 5) Report. Outcomes contrasted: Trust your data (Information) versus Fear your data (Ignorance)]


Data Quality
Data Profiling – Unique Capabilities

• Complete offering

• Two usage modes:
  • Use to investigate unknown data
  • Use to validate known business rules against real data
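The two usage modes can be illustrated with a small sketch: `profile_column` summarizes a column nobody has looked at yet, and `rule_compliance` measures how well real data follows a rule the business already believes in. Both functions and the pattern notation (9 for a digit, A for a letter) are assumptions for this example, not OWB's profiler.

```python
import re
from collections import Counter
from typing import Callable, Dict, Optional, Sequence


def value_pattern(value: str) -> str:
    """Reduce a value to its shape: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"[0-9]", "9", value))


def profile_column(values: Sequence[Optional[str]]) -> Dict[str, object]:
    """Mode 1: investigate unknown data with basic column statistics."""
    non_null = [v for v in values if v is not None]
    patterns = Counter(value_pattern(v) for v in non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "top_pattern": patterns.most_common(1)[0][0] if patterns else None,
    }


def rule_compliance(values: Sequence[Optional[str]], rule: Callable[[str], bool]) -> float:
    """Mode 2: share of non-null values that satisfy a known business rule."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 1.0  # nothing to violate the rule
    return sum(1 for v in non_null if rule(v)) / len(non_null)
```

For a postal code column, a dominant pattern of `99999` found in the first mode is exactly the kind of finding that becomes a rule to validate in the second.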


Data Profiling vs. Data Auditing

Data Profiling:
• Ad-hoc when required
• Discovery in search of unknowns
• Time consuming
• Resource intensive

Data Auditing:
• Continuous processes
• Planned to be done repetitively
• Gathers information over time
• Small tasks

Both serve the same purpose through different means
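To make the auditing side concrete, the sketch below runs one small, fixed check on every load and keeps the scores so that a trend can be read off later. The `Auditor` class and its method names are hypothetical and only illustrate the "small tasks, repeated, gathering information over time" idea.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


@dataclass
class Auditor:
    """Applies one rule to each batch and remembers the score of every run."""

    rule: Callable[[dict], bool]
    history: List[Tuple[str, float]] = field(default_factory=list)

    def run(self, label: str, batch: Sequence[dict]) -> float:
        """Score one batch as the share of rows passing the rule, and record it."""
        passed = sum(1 for row in batch if self.rule(row))
        score = passed / len(batch) if batch else 1.0
        self.history.append((label, score))
        return score

    def trend(self) -> float:
        """Change in score between the two most recent runs (0.0 if fewer than two)."""
        if len(self.history) < 2:
            return 0.0
        return self.history[-1][1] - self.history[-2][1]
```

Each run is cheap compared with a full profile, which is why it can be scheduled with every load.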


Demonstration


Performance Tips for Data Profiling

• Data Profiling is a highly processor- and I/O-intensive process

• Run large profiles (>10M rows in a table) on multi-processor machines

• Use parallel:
  • OWB uses /*+ PARALLEL(<TBL>) */ hints in DP queries
  • Default degree of parallelism is picked up from database

• Balance your configuration
  • Stripe data across disks using ASM
  • Make sure I/O and CPU ratios are remotely correct
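To show where such a hint sits in a statement, the sketch below builds a simple profiling-style query with a PARALLEL hint. The query shape and the `profiling_query` helper are assumptions for illustration; the SQL that OWB actually generates is not shown in these slides.

```python
import re

# Simple unquoted identifier: starts with a letter, then letters, digits, _, $ or #.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_$#]*")


def profiling_query(table: str, column: str) -> str:
    """Build a row/distinct/null count query that carries a PARALLEL hint.

    No degree is given in the hint, so the database default applies.
    """
    for name in (table, column):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a simple identifier: {name!r}")
    return (
        f"SELECT /*+ PARALLEL({table}) */ "
        f"COUNT(*) AS row_count, "
        f"COUNT(DISTINCT {column}) AS distinct_count, "
        f"COUNT(*) - COUNT({column}) AS null_count "
        f"FROM {table}"
    )
```

The identifier check is there because the names are pasted into the SQL text; anything that is not a plain table or column name is rejected.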


Performance Tips for Data Profiling

• When loading the workspace you are moving lots of data => optimize this:
  • Place the profile workspace in the same database as the source data
  • Enable the source tables for parallel reads
  • Consider moving the data with regular OWB maps first, or use Transportable Tablespaces or Data Pump

• Memory:
  • SGA should be no less than 500MB, preferably around 2-3GB for most profiles
  • Buffer cache hit ratio >95%
  • Library cache hit ratio >99%
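The slides give targets for the two hit ratios but not how to compute them. The sketch below uses the commonly cited formulas, assuming the inputs are read from the V$SYSSTAT statistics (physical reads, db block gets, consistent gets) and the V$LIBRARYCACHE columns (PINS, PINHITS); the function names and the result layout are illustrative.

```python
from typing import Dict


def buffer_cache_hit_ratio(physical_reads: int, db_block_gets: int, consistent_gets: int) -> float:
    """1 - physical reads / logical reads, where logical reads = db block gets + consistent gets."""
    logical_reads = db_block_gets + consistent_gets
    if logical_reads == 0:
        raise ValueError("no logical reads recorded yet")
    return 1.0 - physical_reads / logical_reads


def library_cache_hit_ratio(pins: int, pin_hits: int) -> float:
    """Share of library cache pins that were satisfied without a reload."""
    if pins == 0:
        raise ValueError("no library cache pins recorded yet")
    return pin_hits / pins


def check_memory_targets(sga_mb: float, buffer_ratio: float, library_ratio: float) -> Dict[str, bool]:
    """Compare measured values with the targets listed above."""
    return {
        "sga_at_least_500mb": sga_mb >= 500,
        "buffer_cache_above_95pct": buffer_ratio > 0.95,
        "library_cache_above_99pct": library_ratio > 0.99,
    }
```

Hit ratios are coarse indicators, so a failed check is a prompt to look closer rather than proof of a sizing problem.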


Further Reference Material

• http://blogs.oracle.com/warehousebuilder
  • Data Quality posts about:
    • Using data rules for Referential Integrity
    • Key Quality Indicators
    • Match and Merge

• Demonstrations on OTN
  • Data Profiling and Corrections
  • Fuzzy match and merging
  • Name and address cleansing

• Training
  • Extending your Knowledge (data profiling hands-on)


New Features for DQ
Beta Program for 11gR2

If you are interested in the beta please contact the OWB product management team:

• Michelle Bird ([email protected])

Or directly go to: http://otnbeta.oracle.com/bpo/prospects/index.htm

Make sure to mention Michelle as sponsor.


Questions


High-Level Data Integration Roadmap
Natural Upgrade Path for Existing Solutions

[Diagram: timeline with labels Spring 2009, Summer 2009, CY2010, CY2011, Unified Team, Unified Platform]

• OWB/ODI Investments are Fully Protected

• No Forced Migrations
• Natural Upgrade Path
• Unified Platform aims to be a Superset of Existing Products (no regression)
