INTRODUCTION TO DATA QUALITY SERVICES Presentation by Tim Mitchell (Artis Consulting)

Posted on 16-Jan-2016

INTRODUCTION TO DATA QUALITY SERVICES

Presentation by Tim Mitchell (Artis Consulting)

www.TimMitchell.net

Today’s Agenda

Overview of DQS

Structure

Knowledge Base

DQS Project

Operations

Matching

Cleansing

Administration

SSIS Component

Shortcomings

About the Presenter

Tim Mitchell

BI Consultant, Artis Consulting

North Texas SQL Server User Group

SQL Server MVP

Contributing author, MVP Deep Dives Vol 2

Coauthor, SSIS Design Patterns

TimMitchell.net | twitter.com/Tim_Mitchell

Housekeeping

Questions

Surveys

Overview of Data Quality Services

What is DQS?

DQS is a knowledge-driven data cleansing and matching service

Built on top of SQL Server 2012

Simple yet powerful interface

What is DQS?

Replaces manual data quality work you’re already doing

Stored procedures

Triggers

Custom applications

DQS Structure

DQS Structure and Flow

[Diagram: Knowledge Base (Domains, Composite Domains, Matching Policies); Matching Projects; Cleansing Projects]

Knowledge Base

Starting point for data quality provisioning

Uses locally customized data stores or marketplace data sources

Highly reusable and evolutionary

Key elements:

Domains

Matching policies

Knowledge Base

Created by:

Knowledge discovery

Domain management

Matching rule

Domains

Domain = data field

Domain rules

Composite domains

Allow greater flexibility in domain rules
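
To make "domain = data field" and "domain rules" concrete, here is a minimal Python sketch. It is not DQS code: the `Domain` class, its `cleanse` method, the State rule, and the sample values are all hypothetical. It only illustrates that a domain pairs one data field with known corrections and validation rules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Domain:
    """Hypothetical stand-in for a knowledge base domain (one data field)."""
    name: str
    # Known bad value -> correct value (illustrative data only).
    corrections: Dict[str, str] = field(default_factory=dict)
    # Domain rules: each returns True when a value is acceptable.
    rules: List[Callable[[str], bool]] = field(default_factory=list)

    def cleanse(self, value: str) -> Tuple[str, str]:
        """Return (output value, status) without touching the input source."""
        candidate = value.strip()
        if candidate in self.corrections:
            return self.corrections[candidate], "corrected"
        if all(rule(candidate) for rule in self.rules):
            return candidate, "valid"
        return candidate, "invalid"


# A hypothetical State domain: two-letter, upper-case codes only.
state = Domain(
    name="State",
    corrections={"Texas": "TX", "Tex.": "TX"},
    rules=[lambda v: len(v) == 2 and v.isupper()],
)
```

Calling `state.cleanse("Texas")` returns the corrected code with a status, while an unknown value that fails the rule is flagged as invalid rather than changed.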

Data Quality Project

Create interactive projects for data matching and cleansing

Leverage one or more domains in an existing knowledge base

Somewhat reusable

Data Quality Project

Nondestructive – no changes to source of data to be cleansed

No changes to the KB either

Separately, DQS project data can be used to improve the knowledge base

DQS Operations

Cleansing

Process data against known entities and domain rules

Similar to Fuzzy Lookup transform in SSIS

Matching

Group data together

Similar to Fuzzy Grouping transform in SSIS
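
The difference between the two operations can be sketched in a few lines of Python. This is an illustration only, using the standard library's `difflib` rather than whatever algorithm DQS uses internally; the function names, the city list, and the 0.8 cutoff are assumptions made for the example.

```python
from difflib import SequenceMatcher, get_close_matches

# Hypothetical "known entities" for a City field.
KNOWN_CITIES = ["Dallas", "Houston", "Austin"]


def cleanse(value, known=KNOWN_CITIES, cutoff=0.8):
    """Cleansing: compare one value against known entities (lookup-style)."""
    matches = get_close_matches(value, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def match(records, threshold=0.8):
    """Matching: group similar records together (grouping-style)."""
    groups = []
    for rec in records:
        for group in groups:
            # Compare against the first member of each existing group.
            score = SequenceMatcher(None, rec.lower(), group[0].lower()).ratio()
            if score >= threshold:
                group.append(rec)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([rec])
    return groups
```

Cleansing takes a single value and a reference list; matching takes only the data set itself and looks for likely duplicates within it.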

DQS Administration

Monitor past activity

Set logging options

Set confidence thresholds
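
As a rough illustration of what a confidence threshold does, the Python sketch below sorts a proposed correction into one of three outcomes by its score. The function name, the outcome labels, and the 0.7 / 0.8 values are assumptions chosen for the example, not settings read from a DQS installation.

```python
def classify(score, suggest_min=0.7, auto_correct_min=0.8):
    """Decide what to do with a proposed correction, given its confidence score.

    The thresholds are illustrative assumptions; raising them makes the
    process more conservative, lowering them makes it more aggressive.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= auto_correct_min:
        return "auto-correct"
    if score >= suggest_min:
        return "suggest"
    return "leave unchanged"
```

With these assumed values, a score of 0.75 is surfaced as a suggestion for review instead of being applied automatically.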

DQS and SSIS

SQL Server Integration Services has an integrated hook into DQS

DQS Cleansing Component

Provides automated, noninteractive data cleansing operations

Demos

Shortcomings

V1 product

No API – must use DQS client interactively

SSIS component only does cleansing

Final Thoughts

CU1 performance improvements

http://bit.ly/IKmMow

DQS videos / blogs

http://technet.microsoft.com/en-us/sqlserver/hh780961

My blog (www.TimMitchell.net)

DQS/MDS virtual chapter

masterdata.sqlpass.org

Questions?