® IBM Software Group ©IBM Corporation IBM Information Server Cleanse - QualityStage.


Page 1:

IBM Information Server

Cleanse - QualityStage

Page 2:

IBM Information Server

Delivering information you can trust

• Understand: Discover, model, and govern information structure and content
• Cleanse: Standardize, merge, and correct information
• Transform: Combine and restructure information for new uses
• Deliver: Synchronize, virtualize and move information for in-line delivery

Platform Services: Parallel Processing, Connectivity, Metadata, Deployment, Administration

Support for Service-Oriented Architectures

Page 3:

The IBM Solution: IBM Information Server

Delivering information you can trust

Understand, Cleanse, Transform, Deliver

• Parallel Processing
• Rich Connectivity to Applications, Data, and Content
• Unified Deployment
• Unified Metadata Management

Cleanse: WebSphere QualityStage

Data cleansing, standardization, matching, and survivorship for enhancing data quality and creating coherent business views

Page 4:

Need for Data Quality

Critical Problems
• Need to create & maintain 360 degree views of customers, suppliers, products, locations, events
• Need to leverage data - make reliable decisions, comply with regulations, meet service agreements

Why?
• No common standards across organization
• Unexpected values stored in fields
• Required information buried in free-form fields
• Fields evolve - used for multiple purposes
• No reliable keys for consolidated views
• Operational data degrades 2% per month

Alternative Approaches
• Denial - problem misunderstood and ignored until too late; load and explode
• Hand-coding - clerical exception processing; very time consuming and resource intensive
• Simplistic cleansing apps - evolved from direct marketing & list hygiene, lack flexibility

Kent Fried Chick

Kentucky Fried

Kentucky Fried Chicken

KFC

Molly Talber DBA KFC

Mrs. M. Talber

John & Molly Talber

Talber, KFC, ATIMA

Data Sources / Data Values

227G CB&NATURAL STICKMOZZ WRAPPER

227G CB&NAT STICK P QUE/MOZZ WRAPP.


Page 5:

Why Should I Care About Cleansing Information?

Lack of information standards
• Different formats & structures across different systems

Data surprises in individual fields
• Data misplaced in the database

Information buried in free-form fields

Data myopia
• Lack of consistent identifiers inhibits a single view

The redundancy nightmare
• Duplicate records with a lack of standards

Kate A. Roberts 416 Columbus Ave #2, Boston, Mass 02116

Catherine Roberts Four sixteen Columbus APT2, Boston, MA 02116

Mrs. K. Roberts 416 Columbus Suite #2, Suffolk County 02116

Name                     Tax ID        Telephone
J Smith DBA Lime Cons.   228-02-1975   6173380300
Williams & Co. C/O Bill  025-37-1888   415-392-2000
1st Natl Provident       34-2671434    3380321
HP 15 State St.          508-466-1200  Orlando

WING ASSY DRILL 4 HOLE USE 5J868A HEXBOLT 1/4 INCH

WING ASSEMBY, USE 5J868-A HEX BOLT .25” - DRILL FOUR HOLES

USE 4 5J868A BOLTS (HEX .25) - DRILL HOLES FOR EA ON WING ASSEM

RUDER, TAP 6 WHOLES, SECURE W/KL2301 RIVETS (10 CM)

19-84-103 RS232 Cable 6' M-F CandS

CS-89641 6 ft. Cable Male-F, RS232 #87951

C&SUCH6 Male/Female 25 PIN 6 Foot Cable

90328574 IBM 187 N.Pk. Str. Salem NH 01456
90328575 I.B.M. Inc. 187 N.Pk. St. Salem NH 01456
90238495 Int. Bus. Machines 187 No. Park St Salem NH 04156
90233479 International Bus. M. 187 Park Ave Salem NH 04156
90233489 Inter-Nation Consults 15 Main Street Andover MA 02341
90345672 I.B. Manufacturing Park Blvd. Bostno MA 04106

Page 6:

Importance of Data Quality

Low data quality impacts an organization in several ways
• Poor data quality leads to misguided marketing promotions
• Cross-sell opportunities may be missed because the same customer appears several times in slightly different ways
• Valued customers may not be recognized during support calls or other important touchpoints
• Data mining is difficult because related items are not detected as related

What is good data quality?
• Two percent of “bad” data doesn’t sound that bad?
• Two percent of 10M rows means that you have 200K errors
• 200K errors add up to a big problem for analytics/operations/anything!
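
The arithmetic behind the two-percent figures can be checked with a minimal sketch. The function names are illustrative, and the compounding model for the "2% per month" degradation quoted earlier in the deck is an assumption; the slides only give the monthly figure.

```python
def bad_rows(total_rows, error_rate):
    """Number of rows expected to be in error at a given error rate."""
    return round(total_rows * error_rate)


def share_still_accurate(monthly_decay, months):
    """Share of records still accurate after a number of months.

    Assumption: the decay compounds month over month. The slides quote
    only the monthly rate, not how it accumulates.
    """
    return (1 - monthly_decay) ** months


# Two percent of 10M rows, as on the slide.
print(bad_rows(10_000_000, 0.02))

# Share of operational data still accurate after a year, if nothing is fixed.
print(round(share_still_accurate(0.02, 12), 3))
```

Under the compounding assumption, roughly a fifth of the records would have degraded after twelve months.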

Page 7:

Enterprise initiatives…
• Supply chain collaboration & item synchronization
• Inventory consolidation
• Single view of a customer or supplier
• ERP Implementations
• ERP instance consolidation
• IT System renovation
• Consolidation resulting from M&A activity
• Enterprise Data Warehouse
• Compliance & Regulatory projects (SOX, HIPAA, ACCORD, etc.)

…need high quality data…

…to satisfy critical business requirements.
• Compliance
• Business to Business Standards
• Risk Management
• Reduce Costs & Increase Productivity
• Increase Revenue / CRM Payoff
• Business Intelligence Payoff

Page 8:

IBM WebSphere QualityStage

Shared design environment with DataStage increases functionality and reduces development time

Visual match rule interface simplifies match tuning

Service orientation provides ‘continuous’ quality & delivers confidence in your data

Parallel architecture shortens execution time


Page 9:

WebSphere QualityStage Process

How will you get an accurate, consolidated view of your business?

Sources: Customers, Transactions, Vendors / Suppliers, Products / Materials

1. Free Form Investigation
2. Data Standardization
3. Data Matching
4. Data Survivorship

Target: Database with Consolidated Views

Page 10:

Why Investigate

Discover trends and potential anomalies in the data

100% visibility of single domain and free-form fields

Identify invalid and default values

Reveal undocumented business rules and common terminology

Verify the reliability of the data in the fields to be used as matching criteria

Gain complete understanding of data within context

Page 11:

Investigation - Free Form

“The instructions for handling the data are inherent within the data itself.”

Parsing: Separating multi-valued fields into individual pieces

123 | St. | Virginia | St.

Lexical analysis: Determining business significance of individual pieces

123    | St.         | Virginia | St.
number | street type | state    | street type

Context Sensitive: Identifying various data structures and content

123          | St. Virginia | St.
House Number | Street Name  | Street Type
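
The three investigation passes on this slide can be illustrated with a minimal sketch. It is not the QualityStage implementation: the lookup tables, function names, and the positional rule in the context-sensitive pass are all assumptions made for this one example.

```python
# Hypothetical lookup tables; a real rule set would be far larger.
STREET_TYPES = {"ST", "AVE", "RD", "BLVD"}
STATES = {"VIRGINIA", "GEORGIA", "OHIO"}


def parse(text):
    """Parsing: split a free-form field into individual pieces."""
    return text.split()


def lexical_class(token):
    """Lexical analysis: classify one piece without looking at its neighbours."""
    word = token.rstrip(".").upper()
    if word.isdigit():
        return "number"
    if word in STREET_TYPES:
        return "street type"
    if word in STATES:
        return "state"
    return "unknown"


def contextual_fields(tokens):
    """Context-sensitive pass: use position to decide what each piece means.

    Assumed rule: a leading number is the house number, a trailing
    street type is the street type, and everything in between is the
    street name, whatever its lexical class.
    """
    classes = [lexical_class(t) for t in tokens]
    fields = {"house number": "", "street name": "", "street type": ""}
    start, end = 0, len(tokens)
    if classes and classes[0] == "number":
        fields["house number"] = tokens[0]
        start = 1
    if end - start >= 2 and classes[-1] == "street type":
        fields["street type"] = tokens[-1]
        end -= 1
    fields["street name"] = " ".join(tokens[start:end])
    return fields


tokens = parse("123 St. Virginia St.")
print([lexical_class(t) for t in tokens])
print(contextual_fields(tokens))
```

The lexical pass alone labels "Virginia" as a state and the first "St." as a street type; only the positional pass recovers "St. Virginia" as the street name.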

Page 12:

Rule Sets

Pre-defined rules for parsing and standardizing:
• Name
• Address
• Area (City, State and Zip)

Multi-national address processing

Validate structure:
• Tax ID
• US Phone
• Date
• Email

Append ISO country codes

Pre-process or filter name, address and area

Rule sets are stored in the common repository
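
Structure validation of the kind listed above can be sketched with regular expressions. The patterns, the date format, and the function names are assumptions for illustration only; they are not the rule sets shipped with QualityStage.

```python
import re
from datetime import datetime

# Hypothetical layout patterns (assumptions, not the product's rules).
TAX_ID = re.compile(r"^(\d{3}-\d{2}-\d{4}|\d{2}-\d{7})$")  # 3-2-4 or 2-7 digit layout
US_PHONE = re.compile(r"^\(?\d{3}\)?[ -]?\d{3}-?\d{4}$")   # 10 digits, optional punctuation
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")          # something@something.something


def valid_date(text, fmt="%m/%d/%Y"):
    """True when the text parses as a real calendar date in the given format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def validate(kind, value):
    """Check the structure of a value; kind is 'tax_id', 'phone', 'date' or 'email'."""
    if kind == "date":
        return valid_date(value)
    patterns = {"tax_id": TAX_ID, "phone": US_PHONE, "email": EMAIL}
    return bool(patterns[kind].match(value))


# Values taken from the Name / Tax ID / Telephone sample earlier in the deck.
for kind, value in [("tax_id", "228-02-1975"), ("tax_id", "508-466-1200"),
                    ("phone", "415-392-2000"), ("phone", "3380321")]:
    print(kind, value, validate(kind, value))
```

A check like this only confirms that a value has the right shape; it cannot tell whether a well-formed tax ID or phone number actually belongs to the entity on the record.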

Page 13:

Standardization - Example

Input File:

Address Line 1               Address Line 2
639 N MILLS AVENUE           ORLANDO, FLA 32803
306 W MAIN STR,              CUMMING, GA 30130
3142 WEST CENTRAL AV         TOLEDO OH 43606
843 HEARD AVE                AUGUSTA-GA-30904
1139 GREENE ST ACCT #1234    AUGUSTA GEORGIA 30901
4275 OWENS ROAD SUITE 536    EVANS GA 30809

Result File:

House #  Dir  Str. Name  Type  Unit  No.  NYSIIS   City     SOUNDEX  State  Zip    ACCT#
639      N    MILLS      AVE              MAL      ORLANDO  O645     FL     32803
306      W    MAIN       ST               MAN      CUMMING  C552     GA     30130
3142     W    CENTRAL    AVE              CANTRAL  TOLEDO   T430     OH     43606
843           HEARD      AVE              HAD      AUGUSTA  A223     GA     30904
1139          GREENE     ST               GRAN     AUGUSTA  A223     GA     30901  1234
4275          OWENS      RD    STE   536  ON       EVANS    E152     GA     30809
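
Two pieces of this example can be reproduced with a short sketch: the abbreviation of address words and the SOUNDEX column for the city. The code below uses the standard American Soundex algorithm and a hypothetical abbreviation table built only from the words on this slide; it is not the QualityStage rule set, and the NYSIIS column is not covered.

```python
# Hypothetical abbreviation table, limited to words seen in the example.
ABBREVIATIONS = {"AVENUE": "AVE", "AV": "AVE", "STR": "ST",
                 "ROAD": "RD", "SUITE": "STE", "WEST": "W"}

# Standard Soundex digit groups.
CODES = {}
for digit, letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for letter in letters:
        CODES[letter] = str(digit)


def standardize_words(line):
    """Upper-case a line, drop stray commas and periods, abbreviate known words."""
    words = [w.strip(",.") for w in line.upper().split()]
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def soundex(name):
    """American Soundex: first letter plus three digits, zero padded."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


print(standardize_words("4275 OWENS ROAD SUITE 536"))
for city in ["ORLANDO", "CUMMING", "TOLEDO", "AUGUSTA", "EVANS"]:
    print(city, soundex(city))
```

The phonetic codes give matching a key that tolerates spelling variation, which is why the result file carries them alongside the standardized fields.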

Page 14:

Why Match

Identify duplicate entities within one or more files

Perform householding

Create consolidated view of customer

Establish cross-reference linkage

Enrich existing data with new attributes from external sources

Page 15:

Two Methods to Decide a Match

WILLIAM J KAZANGIAN 128 MAIN ST 02111 12/8/62

WILLAIM JOHN KAZANGIAN 128 MAINE AVE 02110 12/8/62

Are these two records a match?

Deterministic Decision Tables:
• Fields are compared
• Letter grade assigned
• Combined letter grades are compared to a vendor delivered file
• Result: Match; Fail; Suspect

B B A A B D B A = BBAABDBA

Probabilistic Record Linkage:
• Fields are evaluated for degree-of-match
• Weight assigned: represents the “information content” by value
• Weights are summed to derive a total score
• Result: Statistical probability of a match

+5 +2 +20 +3 +4 -1 +7 +9 = +49
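
The two decision methods can be contrasted in a minimal sketch that reuses the letter grades and weights shown on this slide. The decision table contents, the cutoff values, and the function names are hypothetical. The weight formula is the Fellegi-Sunter form commonly used in probabilistic record linkage; the slide does not say which formulation QualityStage uses.

```python
import math

# Per-field results for the two KAZANGIAN records, copied from the slide.
GRADES = ["B", "B", "A", "A", "B", "D", "B", "A"]
WEIGHTS = [5, 2, 20, 3, 4, -1, 7, 9]

# Hypothetical decision table: combined grade pattern -> outcome.
DECISION_TABLE = {"AAAAAAAA": "Match", "BBAABDBA": "Suspect"}


def deterministic_decision(grades, table):
    """Look the combined grade pattern up in a table; unlisted patterns fail."""
    return table.get("".join(grades), "Fail")


def agreement_weight(m, u):
    """Weight for an agreeing field: log2(m / u).

    m = probability the field agrees for true matches,
    u = probability it agrees by chance for non-matches.
    Rare values have a small u and therefore carry more information.
    """
    return math.log2(m / u)


def probabilistic_decision(weights, match_cutoff, suspect_cutoff):
    """Sum the field weights and compare the score to two assumed cutoffs."""
    score = sum(weights)
    if score >= match_cutoff:
        return score, "Match"
    if score >= suspect_cutoff:
        return score, "Suspect"
    return score, "Fail"


print(deterministic_decision(GRADES, DECISION_TABLE))
print(probabilistic_decision(WEIGHTS, match_cutoff=40, suspect_cutoff=20))
```

The deterministic table must list every grade pattern in advance, while the probabilistic score handles any combination of field results and lets one strong agreement (the +20 here) outweigh a minor disagreement.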

Page 16:

Why Survive

Provide consolidated view of data

Provide consolidated view containing the “best-of-breed” data

Resolve conflicting values and fill missing values

Cross-populate best available data

Implement business and mapping rules

Create cross-reference keys

Page 17:

Survivorship - Example

Survivorship Input (Match Output)

Group  Legacy  First   Middle  Last     No.   Dir.  Str. Name   Type  Unit No.
1      D150    Bob             Dixon    1500  SE    ROSS CLARK  CIR
1      A1367   Robert          Dickson  1500        ROSS CLARK  CIR
23     D689    Ernest  A       Obrian   5901  SW    74TH        ST    STE 202
23     A436    Ernie   Alex    O’Brian  5901  SW    74TH        ST
23     D352    Ernie           Obrian   5901        74          ST    # 202

Consolidated Output

Group  First   Middle  Last     No.   Dir.  Str. Name   Type  Unit No.
1      Robert          Dickson  1500  SE    ROSS CLARK  CIR
23     Ernie   Alex    O’Brian  5901  SW    74TH        ST    STE 202

Group  Legacy
1      D150
1      A1367
23     D689
23     A436
23     D352
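
The consolidation in this example can be sketched as per-field survivorship rules applied within each match group. The slide does not state which rules produced its output; the rules below (most frequent value for the first name, longest value for every other field) are assumptions chosen because they reproduce the consolidated rows shown, and all names are illustrative.

```python
from collections import Counter, defaultdict

FIELDS = ["group", "legacy", "first", "middle", "last",
          "no", "dir", "street", "type", "unit"]
ROWS = [
    (1, "D150", "Bob", "", "Dixon", "1500", "SE", "ROSS CLARK", "CIR", ""),
    (1, "A1367", "Robert", "", "Dickson", "1500", "", "ROSS CLARK", "CIR", ""),
    (23, "D689", "Ernest", "A", "Obrian", "5901", "SW", "74TH", "ST", "STE 202"),
    (23, "A436", "Ernie", "Alex", "O’Brian", "5901", "SW", "74TH", "ST", ""),
    (23, "D352", "Ernie", "", "Obrian", "5901", "", "74", "ST", "# 202"),
]
RECORDS = [dict(zip(FIELDS, row)) for row in ROWS]


def longest(values):
    """Assumed rule: the longest value carries the most information."""
    return max(values, key=len)


def most_frequent(values):
    """Assumed rule: the most common value wins; ties go to the longer value."""
    counts = Counter(values)
    return max(values, key=lambda v: (counts[v], len(v)))


# Hypothetical rule assignment; fields not listed use `longest`.
RULES = {"first": most_frequent}


def survive(records):
    """Build one consolidated record per match group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["group"]].append(rec)
    output = []
    for group_id, members in groups.items():
        best = {"group": group_id}
        for field in FIELDS:
            if field in ("group", "legacy"):
                continue
            values = [m[field] for m in members if m[field]]  # ignore blanks
            best[field] = RULES.get(field, longest)(values) if values else ""
        output.append(best)
    return output


def cross_reference(records):
    """Keep the link from each legacy key to its match group."""
    return [(rec["group"], rec["legacy"]) for rec in records]


for row in survive(RECORDS):
    print(row)
```

Keeping the cross-reference list alongside the consolidated rows means each source record can still be traced to the surviving record it contributed to.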

Page 18:

How Does WebSphere QualityStage Integrate

Database: DB2, Oracle, Sybase, Onyx, IDMS, etc.

Data Extraction and Load Routines

QualityStage
1. Investigation
2. Standardization
3. Integration
4. Survivorship

Target: DB2, Oracle, Sybase, Onyx, IDMS, etc.

Page 19:

WebSphere DataStage and WebSphere QualityStage: Fully Integrated!

Page 20:

QualityStage: Data Quality Extensions

IBM WebSphere QualityStage GeoLocator

IBM WebSphere QualityStage Postal Verification Products
• WAVES (WorldWide): IBM WebSphere Worldwide Address Verification Solution

IBM WebSphere QualityStage Postal Certification Products
• CASS (United States)
• SERP (Canada)
• DPID (Australia)

IBM Information Server Data Quality Module for SAP

IBM WebSphere QualityStage for Siebel

Page 21:

Key Strengths for IBM QualityStage

Intuitive, “Design as you think” User Interface

Simple rule design & fine tuning

Seamless Data Flow integration

Intuitive rule design & fine tuning

Defining the technology standard with SOA

Industry leading probabilistic matching engine

Page 22:

Thank You