DQ 951 IDDUserGuide En

  • 8/13/2019 DQ 951 IDDUserGuide En

    Informatica Data Director for Data Quality (Version 9.5.1)

    User Guide

    Informatica Data Director for Data Quality User Guide

    Version 9.5.1

    December 2012

    Copyright (c) 1998-2012 Informatica. All rights reserved.

    This software and documentation contain proprietary information of Informatica Corporation and are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. This Software may be protected by U.S. and/or international Patents and other Patents Pending.

    Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement and as provided in DFARS 227.7202-1(a) and 227.7702-3(a) (1995), DFARS 252.227-7013(1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable.

    The information in this product or documentation is subject to change without notice. If you find any problems in this product or documentation, please report them to us in writing.

    Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange, PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange, Informatica OnDemand, Informatica Identity Resolution, Informatica Application Information Lifecycle Management, Informatica Complex Event Processing, Ultra Messaging and Informatica Master Data Management are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.

    Portions of this software and/or documentation are subject to copyright held by third parties, including without limitation: Copyright DataDirect Technologies. All rights reserved. Copyright Sun Microsystems. All rights reserved. Copyright RSA Security Inc. All Rights Reserved. Copyright Ordinal Technology Corp. All rights reserved. Copyright Aandacht c.v. All rights reserved. Copyright Genivia, Inc. All rights reserved. Copyright Isomorphic Software. All rights reserved. Copyright Meta Integration Technology, Inc. All rights reserved. Copyright Intalio. All rights reserved. Copyright Oracle. All rights reserved. Copyright Adobe Systems Incorporated. All rights reserved. Copyright DataArt, Inc. All rights reserved. Copyright ComponentSource. All rights reserved. Copyright Microsoft Corporation. All rights reserved. Copyright Rogue Wave Software, Inc. All rights reserved. Copyright Teradata Corporation. All rights reserved. Copyright Yahoo! Inc. All rights reserved. Copyright Glyph & Cog, LLC. All rights reserved. Copyright Thinkmap, Inc. All rights reserved. Copyright Clearpace Software Limited. All rights reserved. Copyright Information Builders, Inc. All rights reserved. Copyright OSS Nokalva, Inc. All rights reserved. Copyright Edifecs, Inc. All rights reserved. Copyright Cleo Communications, Inc. All rights reserved. Copyright International Organization for Standardization 1986. All rights reserved. Copyright ej-technologies GmbH. All rights reserved. Copyright Jaspersoft Corporation. All rights reserved. Copyright is International Business Machines Corporation. All rights reserved. Copyright yWorks GmbH. All rights reserved. Copyright Lucent Technologies. All rights reserved. Copyright (c) University of Toronto. All rights reserved. Copyright Daniel Veillard. All rights reserved. Copyright Unicode, Inc. Copyright IBM Corp. All rights reserved. Copyright MicroQuill Software Publishing, Inc. All rights reserved. Copyright PassMark Software Pty Ltd. All rights reserved. Copyright LogiXML, Inc. All rights reserved. Copyright 2003-2010 Lorenzi Davide, All rights reserved. Copyright Red Hat, Inc. All rights reserved. Copyright The Board of Trustees of the Leland Stanford Junior University. All rights reserved. Copyright EMC Corporation. All rights reserved. Copyright Flexera Software. All rights reserved.

    This product includes software developed by the Apache Software Foundation (http://www.apache.org/), and other software which is licensed under the Apache License, Version 2.0 (the "License"). You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

    This product includes software which was developed by Mozilla (http://www.mozilla.org/), software copyright The JBoss Group, LLC, all rights reserved; software copyright 1999-2006 by Bruno Lowagie and Paulo Soares and other software which is licensed under the GNU Lesser General Public License Agreement, which may be found at http://www.gnu.org/licenses/lgpl.html. The materials are provided free of charge by Informatica, "as-is", without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose.

    The product includes ACE(TM) and TAO(TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University, University of California, Irvine, and Vanderbilt University, Copyright (c) 1993-2006, all rights reserved.

    This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit (copyright The OpenSSL Project. All Rights Reserved) and redistribution of this software is subject to terms available at http://www.openssl.org and http://www.openssl.org/source/license.html.

    This product includes Curl software which is Copyright 1996-2007, Daniel Stenberg. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://curl.haxx.se/docs/copyright.html. Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.

    The product includes software copyright 2001-2005 (c) MetaStuff, Ltd. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://www.dom4j.org/license.html.

    The product includes software copyright 2004-2007, The Dojo Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://dojotoolkit.org/license.

    This product includes ICU software which is copyright International Business Machines Corporation and others. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://source.icu-project.org/repos/icu/icu/trunk/license.html.

    This product includes software copyright 1996-2006 Per Bothner. All rights reserved. Your right to use such materials is set forth in the license which may be found at http://www.gnu.org/software/kawa/Software-License.html.

    This product includes OSSP UUID software which is Copyright 2002 Ralf S. Engelschall, Copyright 2002 The OSSP Project, Copyright 2002 Cable & Wireless Deutschland. Permissions and limitations regarding this software are subject to terms available at http://www.opensource.org/licenses/mit-license.php.

    This product includes software developed by Boost (http://www.boost.org/) or under the Boost software license. Permissions and limitations regarding this software are subject to terms available at http://www.boost.org/LICENSE_1_0.txt.

    This product includes software copyright 1997-2007 University of Cambridge. Permissions and limitations regarding this software are subject to terms available at http://www.pcre.org/license.txt.

    This product includes software copyright 2007 The Eclipse Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://www.eclipse.org/org/documents/epl-v10.php.

    This product includes software licensed under the terms at http://www.tcl.tk/software/tcltk/license.html, http://www.bosrup.com/web/overlib/?License, http://www.stlport.org/doc/license.html, http://www.asm.ow2.org/license.html, http://www.cryptix.org/LICENSE.TXT, http://hsqldb.org/web/hsqlLicense.html, http://httpunit.sourceforge.net/doc/license.html, http://jung.sourceforge.net/license.txt, http://www.gzip.org/zlib/zlib_license.html, http://www.openldap.org/software/release/license.html, http://www.libssh2.org, http://slf4j.org/license.html, http://www.sente.ch/software/OpenSourceLicense.html, http://fusesource.com/downloads/license-agreements/fuse-message-broker-v-5-3-license-agreement; http://antlr.org/license.html; http://aopalliance.sourceforge.net/; http://www.bouncycastle.org/licence.html; http://www.jgraph.com/jgraphdownload.html; http://www.jcraft.com/jsch/LICENSE.txt; http://jotm.objectweb.org/bsd_license.html; http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231; http://www.slf4j.org/license.html; http://developer.apple.com/library/mac/#samplecode/HelpHook/Listings/HelpHook_java.html; http://nanoxml.sourceforge.net/orig/copyright.html; http://www.json.org/license.html; http://forge.ow2.org/projects/javaservice/, http://www.postgresql.org/about/licence.html, http://www.sqlite.org/copyright.html, http://www.tcl.tk/

    Table of Contents

    Preface
        Informatica Resources
            Informatica Customer Portal
            Informatica Documentation
            Informatica Web Site
            Informatica How-To Library
            Informatica Knowledge Base
            Informatica Multimedia Knowledge Base
            Informatica Global Customer Support

    Chapter 1: Introduction to Informatica Data Director for Data Quality
        Informatica Data Director for Data Quality Overview
        Tasks and Workflows
        Informatica Data Director for Data Quality User Interface
            Logging In to Informatica Data Director for Data Quality
            Tasks and Dashboard Views
            Task Administration Options

    Chapter 2: Tasks
        Tasks Overview
        Human Tasks and Task Instances
        Task Types
        Task Status
            Task Ownership
        Task Data Export
            Exception Task Data
            Duplicate Task Data
        Task Administration
        Multiple Task Completion
        Exporting Task Data
        Assigning a Task to a User
        Viewing Tasks Assigned to Other Users
        Viewing a List of Tasks in a Human Task
        Completing Multiple Tasks

    Chapter 3: Exception Records
        Exception Records Overview
        Data View for Exception Records
            Exception Task Filters
        Exception Record Correction
        Exception Record Review
        Exception Record Actions and Status
        Exception Record Example
        Editing Exception Records
        Filtering Exception Tasks in the Data View
        Setting the Status of an Exception Record
        Reviewing Exception Records

    Chapter 4: Duplicate Records
        Duplicate Records Overview
        Data View for Duplicate Records
            Duplicate Task Filters
        Duplicate Record Correction
        Duplicate Record Review
        Duplicate Record Actions and Status
        Duplicate Record Example
        Filtering Duplicate Tasks in the Data View
        Editing a Cluster in a Duplicate Task
        Creating a Cluster
        Finding Duplicate Records in Multiple Clusters
        Setting the Status of a Cluster
        Reviewing Duplicate Records

    Chapter 5: Audit Trail Operations
        Audit Trail Operations Overview
        Audit Trail Data
        Audit View Filters
            Status Filter Options
        Audit View
            Opening an Audit Trail from the Data View
            Opening an Audit Trail from the Inbox
        Filtering Records in the Audit View

    Index

    Preface

    The Informatica Data Director for Data Quality User Guide describes the features and functionality of Informatica Data Director for Data Quality.

    Informatica Data Director for Data Quality is a web-based application that data analysts use to perform tasks on database tables. Use this guide if you are assigned a task in Informatica Data Director for Data Quality.

    Informatica Resources

    Informatica Customer Portal

    As an Informatica customer, you can access the Informatica Customer Portal site at http://mysupport.informatica.com. The site contains product information, user group information, newsletters, access to the Informatica customer support case management system (ATLAS), the Informatica How-To Library, the Informatica Knowledge Base, the Informatica Multimedia Knowledge Base, Informatica Product Documentation, and access to the Informatica user community.

    Informatica Documentation

    The Informatica Documentation team makes every effort to create accurate, usable documentation. If you have questions, comments, or ideas about this documentation, contact the Informatica Documentation team through email at [email protected]. We will use your feedback to improve our documentation. Let us know if we can contact you regarding your comments.

    The Documentation team updates documentation as needed. To get the latest documentation for your product, navigate to Product Documentation from http://mysupport.informatica.com.

    Informatica Web Site

    You can access the Informatica corporate web site at http://www.informatica.com. The site contains information about Informatica, its background, upcoming events, and sales offices. You will also find product and partner information. The services area of the site includes important information about technical support, training and education, and implementation services.

    Informatica How-To Library

    As an Informatica customer, you can access the Informatica How-To Library at http://mysupport.informatica.com. The How-To Library is a collection of resources to help you learn more about Informatica products and features. It includes articles and interactive demonstrations that provide solutions to common problems, compare features and behaviors, and guide you through performing specific real-world tasks.

    Informatica Knowledge Base

    As an Informatica customer, you can access the Informatica Knowledge Base at http://mysupport.informatica.com. Use the Knowledge Base to search for documented solutions to known technical issues about Informatica products. You can also find answers to frequently asked questions, technical white papers, and technical tips. If you have questions, comments, or ideas about the Knowledge Base, contact the Informatica Knowledge Base team through email at [email protected].

    Informatica Multimedia Knowledge Base

    As an Informatica customer, you can access the Informatica Multimedia Knowledge Base at http://mysupport.informatica.com. The Multimedia Knowledge Base is a collection of instructional multimedia files that help you learn about common concepts and guide you through performing specific tasks. If you have questions, comments, or ideas about the Multimedia Knowledge Base, contact the Informatica Knowledge Base team through email at [email protected].

    Informatica Global Customer Support

    You can contact a Customer Support Center by telephone or through Online Support. Online Support requires a user name and password. You can request a user name and password at http://mysupport.informatica.com.

    Use the following telephone numbers to contact Informatica Global Customer Support:

    North America / South America
        Toll Free
            Brazil: 0800 891 0202
            Mexico: 001 888 209 8853
            North America: +1 877 463 2435

    Europe / Middle East / Africa
        Toll Free
            France: 0805 804632
            Germany: 0800 5891281
            Italy: 800 915 985
            Netherlands: 0800 2300001
            Portugal: 800 208 360
            Spain: 900 813 166
            Switzerland: 0800 463 200
            United Kingdom: 0800 023 4632
        Standard Rate
            Belgium: +31 30 6022 797
            France: +33 1 4138 9226
            Germany: +49 1805 702 702
            Netherlands: +31 306 022 797
            United Kingdom: +44 1628 511445

    Asia / Australia
        Toll Free
            Australia: 1 800 151 830
            New Zealand: 09 9 128 901
        Standard Rate
            India: +91 80 4112 5738

    CHAPTER 1

    Introduction to Informatica Data Director for Data Quality

    This chapter includes the following topics:

    Informatica Data Director for Data Quality Overview

    Tasks and Workflows

    Informatica Data Director for Data Quality User Interface

    Informatica Data Director for Data Quality Overview

    Informatica Data Director for Data Quality is a web-based application that you can use to view and update records in a database table.

    You use Informatica Data Director for Data Quality to view and update records that were processed in other Informatica applications. The records may contain errors, or they may contain duplicate information.

    Informatica Data Director for Data Quality organizes the records into units of work called tasks. A task identifies a set of records and specifies the operations you can perform to resolve data quality issues in the records.

    When you log in to Informatica Data Director for Data Quality, you view the list of tasks that you are assigned in the Inbox for your account. You select a task, and you work on the data records that it contains. After you view or edit a record in a task, you update the status of the record to indicate if it is fit for storage in the database. When you complete work on a task, Informatica Data Director for Data Quality removes the task from your Inbox.

    Tasks and Workflows

    A workflow is a set of events, tasks, and decisions that Informatica users define for a data set.

    Informatica users create workflows in the Informatica Developer application. Informatica stores a workflow as an object in a database called the Model repository. When a workflow runs, it creates the tasks you perform in Informatica Data Director for Data Quality.

    A workflow can contain different types of tasks. At the workflow level, the tasks that you perform in Informatica Data Director for Data Quality are called Human tasks, because each task needs human interaction to complete.

    A workflow that contains a Human task must also contain a Mapping task. A Mapping task runs an Informatica data process called a mapping. A mapping applies data quality algorithms to data records and corrects data errors where possible. When a workflow runs with a Mapping task, the mapping writes the corrected records to a specified database table. The mapping writes records that contain unresolved data quality issues to another table. The Human task reads the table of unresolved records and assigns the records to Informatica Data Director for Data Quality users for manual review.
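    The record flow described above can be illustrated with a short Python sketch. This is not Informatica code: the Record class, the function names, and the round-robin assignment are all assumptions made for illustration. The guide states only that clean records go to one table, unresolved records go to another, and the Human task assigns the unresolved records to users.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    record_id: int
    values: dict
    # Hypothetical field: data quality issues the mapping could not fix.
    issues: list = field(default_factory=list)


def run_mapping(records):
    """Split records into a clean table and a table of unresolved records."""
    clean_table, exception_table = [], []
    for record in records:
        if record.issues:
            exception_table.append(record)
        else:
            clean_table.append(record)
    return clean_table, exception_table


def assign_human_tasks(exception_table, users):
    """Assign unresolved records to users in turn (round-robin is an assumption)."""
    assignments = {user: [] for user in users}
    for index, record in enumerate(exception_table):
        assignments[users[index % len(users)]].append(record.record_id)
    return assignments


records = [
    Record(1, {"name": "Ann Lee"}),
    Record(2, {"name": ""}, issues=["missing name"]),
    Record(3, {"name": "Bo Chan"}, issues=["possible duplicate"]),
]
clean, exceptions = run_mapping(records)
assignments = assign_human_tasks(exceptions, ["analyst_a", "analyst_b"])
```

    In this sketch, record 1 lands in the clean table, and records 2 and 3 are split between the two hypothetical analysts.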

    Informatica Data Director for Data Quality User Interface

    The Informatica Data Director for Data Quality web interface contains views to display Inbox and task data.

    The interface has the following views:

    Dashboard. Displays the list of tasks assigned to you and tasks that you completed.

    Data. Displays any currently open task.

    Tasks. Displays the Inbox. The Inbox lists the tasks assigned to you. If the Human task identifies you as a business administrator for task instances, the Tasks view displays the tasks that you administer in the Task Administration view.

    Audit. Displays an audit trail for changes you made in a task.

    Logging In to Informatica Data Director for Data Quality

    Use a web browser to log in to Informatica Data Director for Data Quality.

    You need the Informatica Data Director for Data Quality URL and a user name and password. An administrator defines the URL and user credentials in the Informatica Administrator application. Ask an administrator for a user name, password, and URL.

    1. Open a web browser.

    2. Enter the URL for Informatica Data Director for Data Quality. The URL has the following structure:

    http://[host_name]:[port_number]/IDDService/login.jsp

    3. Enter the user name and password for your account.

    After you log in, select the Tasks view and open the Inbox to view the list of tasks assigned to you.
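    As a minimal sketch of how the URL pattern in step 2 is filled in, the following Python helper substitutes a host name and port number into the documented structure. The function name, the host idd.example.com, and the port 8085 are placeholders chosen for illustration. Use the values that your administrator provides.

```python
def idd_login_url(host_name: str, port_number: int) -> str:
    """Fill in the documented pattern http://[host_name]:[port_number]/IDDService/login.jsp."""
    if not host_name:
        raise ValueError("host_name must not be empty")
    if not 1 <= port_number <= 65535:
        raise ValueError("port_number must be between 1 and 65535")
    return f"http://{host_name}:{port_number}/IDDService/login.jsp"


# Placeholder values only; they do not come from the guide.
login_url = idd_login_url("idd.example.com", 8085)
```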

    Tasks and Dashboard Views

    The Tasks and Dashboard views display the tasks that are assigned to you.

    The views display the tasks in a table. When you select the Tasks view, select the Inbox tab to view the table of tasks.

    The Tasks view also displays a Task Administration tab. The Task Administration tab displays the Tasks view options and also displays options that allow you to release tasks and assign tasks to other users.

    Use the Tasks and Dashboard views to perform the following operations:

    Select the Tasks view or the Dashboard view to view the list of tasks you need to complete. On the Tasks view, you view tasks in the Inbox.

    If you are a business administrator, select the Tasks view and then select the Task Administration tab to view the list of tasks that you administer. You can open a task, release a task from a user, and assign a task to a user.

    Inbox

    The Inbox stores the list of tasks that are assigned to you. Use the Inbox to view and open the tasks that you need to complete.

    When you are assigned a task, the task appears in the Inbox. Open a task from the Inbox and use the Data view to verify or update the records in the task. When you complete the task, Informatica Data Director removes the task from your Inbox.

    Note: The Dashboard view and the Inbox tab show the same task list.

    Inbox and Dashboard Options

    The Inbox and Dashboard views share a common set of options. You can use both views to manage the tasks assigned to you.

    The following table describes the options on the Inbox and Dashboard views:

    Option              Description
    Filter On, Off      Shows or hides the filter fields for the Inbox and Dashboard columns. Use the filters to search for tasks with common values in a column, such as task name or task owner. You can also use the filters to search for unique values, such as the task ID.
    Refresh             Updates the list of tasks.
    Task Details        Displays information for a task in a single dialog box. The information includes the task name, type, creation date, due date, and status.
    Open Task           Opens the task so you can edit records.
    Release Task        Releases the task for assignment to another user.
    View Audit Trail    Opens an audit trail that describes changes made to the records in the task.
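    The behavior of the column filters can be pictured with a short Python sketch. It is only an illustration: the dictionary keys, the sample tasks, and the case-insensitive substring match are assumptions, because the guide does not specify how the filter fields compare values.

```python
def filter_tasks(tasks, **criteria):
    """Return the tasks whose columns contain every criterion (case-insensitive)."""
    def matches(task):
        return all(
            str(wanted).lower() in str(task.get(column, "")).lower()
            for column, wanted in criteria.items()
        )
    return [task for task in tasks if matches(task)]


# Hypothetical task list; the column names are illustrative.
tasks = [
    {"task_id": 101, "title": "Customer exceptions", "owner": "analyst_a"},
    {"task_id": 102, "title": "Customer duplicates", "owner": "analyst_b"},
    {"task_id": 103, "title": "Vendor exceptions", "owner": "analyst_a"},
]

by_owner = filter_tasks(tasks, owner="analyst_a")  # common value in a column
by_id = filter_tasks(tasks, task_id=102)           # unique value
```

    Filtering on a shared value such as the owner returns several tasks, while filtering on the task ID returns a single task.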

    Inbox and Dashboard Columns

    The Inbox and Dashboard views list the tasks assigned to you and provide metadata about each task. The views organize the task data and metadata in a series of columns.

    The following table describes the columns on the Inbox and Dashboard views:

    Column Name    Description
    Task ID        Unique identifier for the task in the workflow. At the workflow level, the Task ID identifies the task instance in the Human task. The workflow stores the task ID value so that the ID is unique across multiple runs of the workflow.
    Task Title     Task title.
    Task Type      Task type. The types are correct exceptions, correct duplicates, review exceptions, and review duplicates.
    Owner          Name of the user currently assigned to the task.
    Due Date       Scheduled date for completion of the task.
    Status         Status of the task. If the task has not reached the due date, the status indicates the task is on time. If the task is incomplete on the due date, the status indicates the task is overdue.
    Created        Date the task was created.
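
    The Status rule can be expressed as a simple date comparison. The following Python sketch is illustrative only: the function name `inbox_status` and the labels "On time" and "Overdue" are assumptions, because the guide does not state the exact text that the Status column displays.

```python
from datetime import date


def inbox_status(due_date: date, today: date) -> str:
    """Return a hypothetical status label for a task that is still incomplete."""
    # Before the due date the task is on time; an incomplete task is
    # treated as overdue from the due date onward.
    return "On time" if today < due_date else "Overdue"


due = date(2013, 1, 15)
early = inbox_status(due, date(2013, 1, 14))
late = inbox_status(due, date(2013, 1, 15))
```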

    Task Administration Options

    If you are a business administrator, you can use the Tasks view to view the tasks that you administer. You can

    perform any action from the Task Administration tab that other users can perform from the Inbox or Dashboard.

    Use the Task Administration tab under the Tasks view to open the list of tasks. The Task Administration tab

    displays additional options for business administrators.

    The following table describes the additional options on the Task Administration tab:

    Option                   Description
    Reassign Task            Changes the owner of a task.
    Complete Linked Tasks    Completes all task instances that are linked to the current task instance.
    View Task                Opens a task to view the work performed.


    CHAPTER 2

    Tasks

    This chapter includes the following topics:

    Tasks Overview
    Human Tasks and Task Instances
    Task Types
    Task Status
    Task Data Export
    Task Administration
    Multiple Task Completion
    Exporting Task Data
    Assigning a Task to a User
    Viewing Tasks Assigned to Other Users
    Viewing a List of Tasks in a Human Task
    Completing Multiple Tasks

    Tasks Overview

    The objective of a task is to verify that a set of data records is ready to move to the next stage in a workflow. The

    steps you take to complete a task depend on the type of task and the condition of the records in the task.

    You can perform the following steps in a task:

    Correct exceptions.

    Consolidate clusters of duplicate records to a single preferred record.

    Update the status of a record or cluster.

    Review the work done by another user, if the task reads data from an earlier task.

    If you are a business administrator, you can manage the tasks that the Human task assigned to you for administration.


    Task Status

    The task status defines the next stage in the workflow for the records in the task. For example, when you complete

    a task to correct exception records, the records may pass to another task that reviews the records.

    When you complete work on a task, you use the Task Actions menu to set the task status.

    Note: A task that corrects data is not always followed by a task that reviews the data. A workflow developer can

    specify correction and review tasks in any order.

    Consider the following guidelines when you set the task status:

    You cannot undo the task status. When you set the task status, the task disappears from your Inbox.

    The Task Actions menu can have one or more options. When the menu contains multiple options, you select

    the option that represents the status of the records in the task.

    The options on the Task Actions menu are defined in the workflow that contains the task. You cannot add or

    edit options on the Task Actions menu.

    Task Ownership

    A workflow can assign a task to more than one user. For example, the workflow may identify a group of users to

    work on a Human task.

    When you open a task, you become the owner of the task. If the workflow assigned the task to multiple users,

    Informatica Data Director removes the task from the Inbox of the other users when you take ownership of the task.
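
    The ownership rule can be modeled in a few lines. The following Python sketch is a conceptual illustration, not product code: the `Task` class, the `inboxes` dictionary, and the `open_task` function are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    """Hypothetical model of a task instance that a workflow assigned to several users."""
    task_id: str
    assigned_users: set = field(default_factory=set)
    owner: Optional[str] = None


def open_task(task, user, inboxes):
    """Make `user` the owner and remove the task from the Inbox of every other assigned user."""
    task.owner = user
    for other in task.assigned_users - {user}:
        inboxes[other].discard(task.task_id)
    return task


task = Task("T-1", assigned_users={"ana", "ben", "cho"})
inboxes = {name: {"T-1"} for name in task.assigned_users}
open_task(task, "ben", inboxes)
```

    After the call, only the new owner still has the task in the Inbox.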

    Task Data Export

    You can export task data to a delimited file. Export data when you want to share the current state of the data with

    others in applications such as Microsoft Excel.

    You can share the data with colleagues who work on the same data project. Export data when you want to verify

    that the updates you make in a task conform to the business rules defined for the project.

    You export all data associated with the task instance, including the record or cluster data and the status data

    saved for the task. The data includes any additional column data created by the Mapping task or Human task that

    ran in the workflow.

    Exception Task Data

    When you export data from an exception task, you export all record data and status data from the database. The

    export process includes any additional columns created by the Mapping task or Human task that ran in the

    workflow.

    The following table describes the metadata columns that export by default with exception data:

    Column Name       Description
    ROW_IDENTIFIER    Identifies the record row in the database table.
    REVIEW_STATUS     The status assigned to the record in Informatica Data Director for Data Quality. The status can be one of the following values:
                      - REVIEWED. You marked the record as reviewed.
                      - NULL. You did not mark the record as reviewed.
    WORKFLOW_ID       Identifies the workflow that contains the Human task associated with the task.
    USER_COMMENT      Any comment added to the record in Informatica Data Director for Data Quality.
    UPDATED_STATUS    The update status of the record in the task. The status can be one of the following values:
                      - UPDATED. You added a comment to the record, or you marked the record as reviewed.
                      - ACCEPTED. You accepted the record for storage in the database table that contains valid data.
                      - REJECTED. You rejected the record as unsuitable for storage in the database table that contains valid data.
                      - REPROCESS. You indicated that the record needs further processing in another application.
                      - NULL. You did not update the record.
    RECORD_STATUS     The record status set by the workflow. The workflow sets the status value when it writes the record to an exception table for analysis in a Human task. The default value is INVALID. An INVALID value indicates that the current record cannot be stored in the database that contains valid data.
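
    An exported exception file can be summarized outside Informatica Data Director for Data Quality. The following Python sketch counts records by UPDATED_STATUS value. It rests on assumptions: the sample data, the NAME column, the comma delimiter, and the treatment of an empty field as NULL are illustrative, because the guide does not specify how the export writes these details.

```python
import csv
import io
from collections import Counter

# Hypothetical sample of an exported exception task. The NAME column and
# the comma delimiter are assumptions made for illustration.
SAMPLE_EXPORT = """ROW_IDENTIFIER,NAME,REVIEW_STATUS,UPDATED_STATUS,RECORD_STATUS
1,Ann Lee,REVIEWED,ACCEPTED,INVALID
2,Bob Roy,,REJECTED,INVALID
3,Cy Oh,,,INVALID
4,Di Wu,REVIEWED,REPROCESS,INVALID
"""


def summarize_updated_status(export_file, delimiter=","):
    """Count records per UPDATED_STATUS value; an empty field is counted as NULL."""
    counts = Counter()
    for row in csv.DictReader(export_file, delimiter=delimiter):
        status = (row.get("UPDATED_STATUS") or "").strip() or "NULL"
        counts[status] += 1
    return counts


summary = summarize_updated_status(io.StringIO(SAMPLE_EXPORT))
print(dict(summary))
```

    To run the sketch on a real export, pass an open file object instead of the in-memory sample and adjust the delimiter to match the file.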

    Duplicate Task Data

    When you export data from a duplicate task, you export all cluster data and status data from the database. The

    export process includes any additional columns created by the Mapping task or Human task that ran in the workflow.

    The following table describes the metadata columns that export by default with duplicate data:

    Column Name              Description
    ROW_IDENTIFIER           Identifies the record row in the database table.
    SEQUENTIAL_CLUSTER_ID    Unique identifier for the cluster in the database table.
    CLUSTER_ID               Identifies the cluster that the record belongs to. The Mapping task assigns a cluster ID value to each record in the table.
    MATCH_SCORE              Decimal value between 0 and 1. Identifies the degree of similarity between two records in the cluster.
    IS_MASTER                Indicates if the record is the preferred record in the cluster. The possible values are Y and N.
    UPDATED_STATUS           The update status of the record. The status can be one of the following values:
                             - UPDATED. You updated a value in the record.
                             - NULL. You did not update the record.
                             - EXTRACTED. You removed the record from a cluster.
    USER_COMMENT             Any comment added to the cluster in Informatica Data Director for Data Quality.
    REVIEW_STATUS            The status assigned to the cluster in Informatica Data Director for Data Quality. The status can be one of the following values:
                             - REVIEWED. You confirmed the record as reviewed.
                             - NULL. You did not mark the record as reviewed.
    WORKFLOW_ID              Identifies the workflow that contains the Human task associated with the task.
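
    The CLUSTER_ID and IS_MASTER columns make it possible to check an exported duplicate file for clusters that lack a single preferred record. The following Python sketch is one way to do so. The sample rows, the NAME column, and the comma delimiter are assumptions, and the "exactly one Y per cluster" check is an interpretation of the column descriptions rather than a rule stated in the guide.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample of an exported duplicate task.
SAMPLE_EXPORT = """ROW_IDENTIFIER,CLUSTER_ID,MATCH_SCORE,IS_MASTER,NAME
1,10,1,Y,Ann Lee
2,10,0.92,N,Anne Lee
3,11,1,N,Bob Roy
4,11,0.88,N,Rob Roy
"""


def clusters_without_single_master(export_file, delimiter=","):
    """Return the CLUSTER_ID values that do not contain exactly one IS_MASTER = Y record."""
    masters = defaultdict(int)
    for row in csv.DictReader(export_file, delimiter=delimiter):
        is_master = row["IS_MASTER"].strip().upper() == "Y"
        masters[row["CLUSTER_ID"]] += 1 if is_master else 0
    return sorted(cid for cid, count in masters.items() if count != 1)


problems = clusters_without_single_master(io.StringIO(SAMPLE_EXPORT))
```

    In the sample, cluster 10 has one preferred record and cluster 11 has none, so only cluster 11 is reported.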

    Task Administration

    If you are a business administrator, you manage the status of a set of task instances. For example, if a user is

    unable to complete a task instance on schedule, you can reassign the task to another user.

    The Task Administration tab displays the tasks that you manage in addition to any task that is assigned to you.

    Use Task Administration tab options to perform the following operations:

    Open tasks assigned to you. If you are assigned a task to complete, you can use the Task Administration tab

    in the same way as the Inbox.

    Assign tasks to users. For example, you can assign a task to a new user if the current user did not complete

    the task.

    View the work performed by a user on a task. For example, you can review the rate of user progress in a task

    and verify that the user is performing the task correctly.

    View the list of tasks that share a common Human task in a workflow.

    Complete multiple tasks. You can move a set of tasks to a completed state when the tasks represent a

    common Human task in a workflow. When you complete the tasks, the records in the tasks move from the

    Human task to the next stage in the workflow.

    Multiple Task Completion

    You can complete multiple tasks in a single operation.

    Complete multiple tasks in the following cases:

    The workflow process failed, and you want to run the workflow again.

    Users did not complete the task instances by the scheduled date, and you want to delete the tasks from the

    user Inbox views.


    Use the Complete Linked Tasks option on the Task Administration view to complete multiple tasks. The

    Complete Linked Tasks option identifies the current task as an instance of a Human task, and it completes all

    task instances that share the Human task as a parent. Informatica Data Director for Data Quality removes the

    tasks from the Inbox views of the users who worked on the tasks.

    When you complete all task instances for a Human task, the Human task releases the record data to the next

    stage in the workflow. The record data includes all updates and status changes made by the users who worked on the tasks in Informatica Data Director for Data Quality.
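
    The selection logic behind Complete Linked Tasks can be pictured as a lookup on the parent Human task. The following Python sketch is a conceptual model only: the dictionary keys `task_id`, `human_task`, and `completed` and the function name are hypothetical, and the product's internal data model is not documented here.

```python
def complete_linked_tasks(tasks, selected_task_id):
    """Mark every task instance that shares the selected task's parent Human task as completed."""
    parent = next(t["human_task"] for t in tasks if t["task_id"] == selected_task_id)
    completed = []
    for t in tasks:
        if t["human_task"] == parent:
            t["completed"] = True
            completed.append(t["task_id"])
    return completed


tasks = [
    {"task_id": "T-1", "human_task": "H-A", "completed": False},
    {"task_id": "T-2", "human_task": "H-A", "completed": False},
    {"task_id": "T-3", "human_task": "H-B", "completed": False},
]
done = complete_linked_tasks(tasks, "T-1")
```

    Selecting T-1 completes T-1 and T-2, which share a parent, and leaves T-3 untouched.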

    Exporting Task Data

    Export task data to a delimited file.

    1. Open the task in the Data view.

    2. Open the Record Actions menu and select Export Data.

    The Export Data dialog box appears.

    3. Verify or edit the file name. By default, the Export Data dialog box uses the task name as the file name.

    4. Select or clear the option to export field names as the first row of the file.

    5. Click OK.

    6. Select an option to open or save the file.
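
    Because step 4 makes the field-name row optional, a script that reads the exported file may need to detect whether that row is present. The following Python sketch checks the first row against metadata column names that the guide lists for both exception and duplicate exports. The function name, the sample data, and the comma delimiter are assumptions.

```python
import csv
import io

# Metadata column names that appear in both export tables in this guide.
KNOWN_METADATA_COLUMNS = {
    "ROW_IDENTIFIER",
    "REVIEW_STATUS",
    "WORKFLOW_ID",
    "USER_COMMENT",
    "UPDATED_STATUS",
}


def first_row_is_header(export_text, delimiter=","):
    """Return True if the first row contains at least one known metadata column name."""
    first_row = next(csv.reader(io.StringIO(export_text), delimiter=delimiter), [])
    return any(cell.strip() in KNOWN_METADATA_COLUMNS for cell in first_row)


with_header = "ROW_IDENTIFIER,NAME,UPDATED_STATUS\n1,Ann Lee,ACCEPTED\n"
without_header = "1,Ann Lee,ACCEPTED\n"
```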

    Assigning a Task to a User

    Assign tasks to users on the Task Administration tab. Assign a task when a task has no assigned user, or when

    a user cannot complete a task on time.

    1. On the Tasks view, select the Task Administration tab.

    2. Select a task from the task list.

    3. Click Reassign Task.

    4. Select a user to perform the task.

    Viewing Tasks Assigned to Other Users

    Use the Task Administration options to open a task and view the progress made by a user in the task.

    1. On the Tasks view, select the Task Administration tab.

    2. Select a task from the task list.

    3. Click View Task.

    The task opens in the Data view.


    Viewing a List of Tasks in a Human Task

    To view the tasks that a workflow generated from a single Human task, use the Complete Linked Tasks option.

    1. On the Tasks view, select the Task Administration tab.

    2. Select a task from the task list.

    3. Click Complete Linked Tasks.

    4. Review the information for each task.

    The task list displays the following information for each task:

    Task ID

    Name of the task

    Task type

    Task owner

    Due date

    Status

    Do not click OK in the task list. If you click OK, you advance all tasks to the next stage in the workflow and remove all tasks from the Inbox of each owner.

    Completing Multiple Tasks

    To complete all tasks that have a common Human task as a parent, use the Complete Linked Tasks option.

    When you complete a set of tasks, you end all work on the Human task and you advance the task records to the

    next stage in the workflow.

    Note: The action to complete the tasks does not update any record or status data.

    1. On the Tasks view, select the Task Administration tab.

    2. Select a task from the task list.

    3. Click Complete Linked Tasks. The list of tasks opens.

    4. Verify that the list contains the tasks you want to complete.

    5. Click OK.

    If you open the Inbox or Dashboard after you complete the tasks, the Inbox or Dashboard may not display any

    change to the task list. To view the current list of tasks in the Inbox or Dashboard, refresh the browser window.


    CHAPTER 3

    Exception Records

    This chapter includes the following topics:

    Exception Records Overview
    Data View for Exception Records
    Exception Record Correction
    Exception Record Review
    Exception Record Actions and Status
    Exception Record Example
    Editing Exception Records
    Filtering Exception Tasks in the Data View
    Setting the Status of an Exception Record
    Reviewing Exception Records

    Exception Records Overview

    An exception is a record that may contain one or more data errors. A workflow adds a record to a correct

    exceptions task when other users or software processes cannot determine that the record data is correct.

    When you correct exception records, you examine the records in the task for errors that you can fix. When you

    review exception records, you verify the work done in an earlier task. After you edit or review a record, you update

    the record status. You set the task status to indicate the overall status of the records in the task.

    The workflow adds metadata columns to the records in the task. You use the metadata columns to update the

    status of each record.

    You can perform the following actions to correct exceptions:

    Edit a record.

    When you open a task, the Data view displays the records in the task and indicates the cells that contain data

    quality issues. You can select a cell and correct the error that it contains.

    Accept a record for storage in the database.

    You can mark a record as acceptable for permanent storage in the database. You can edit the record, or you

    can accept the current data in the record.

    Reject a record from the database.

    You can decide that a record does not belong in the database table. You mark the record for deletion from the

    table.


    Note: The task does not remove records from the table. The records are removed in a later task in the

    workflow or by a user in another application.

    Return a record for further processing.

    You can decide that a record cannot be fixed in manual review. For example, you cannot determine the

    correct state of the data in the record. You mark the record for additional processing in another application.

    Clear the status of a record.

    You can undo the status that you set for a record. For example, if you mark a record for rejection from the

    database, but later you change your mind, you can clear the status and select a different status.

    You can update or clear a record status at any time before you complete the task.

    Complete the task.

    You use the Task Actions menu to indicate that you have completed work on the task. The menu options

    define the next step for the task data. The options are determined by the previous task in the workflow.

    Data View for Exception Records

    The Data view shows the records in a task. When you open an exception task, the Data view lists the records in

    the task and provides a set of options you can use to complete the task.

    Use the Customize Table option to organize the data columns in the Data view. Use the Filter option to filter the

    records that appear in the Data view.

    The following table describes the options on the Data view:

    Option               Description
    Filter Exceptions    Filters the list of records based on criteria you specify. You can filter records by issue type, priority, and status.
    Customize Table      Selects the table columns to display.
    Edit                 Enables edit features for a record. You must click Edit before you make changes to a row.
    Undo                 Reverses the last change you made in a record.
    Redo                 Makes the change to a record that you previously reversed.
    Find                 Finds records that match the criteria you enter in the filter field for a column. The filter fields and the Filter Exceptions options work independently of each other.
    Task Actions         Sets the status of the task. The workflow uses the Task Actions setting to determine the next step for the task data when you complete the task.
    Record Actions       Sets the status of a record. You set the record status when you finish work on the record. You can also undo or redo any edit to the record you select.


    Exception Task Filters

    You can filter the records by the types of issue they contain, the priority assigned to them, and the current status

    of the records.

    The following table describes the filter options:

    Option           Description
    Type of issue    Indicates the type of data quality issue that the workflow identified in the record data. The data quality issue indicates that the record is an exception in the database. Hold the cursor over the red icon in a table cell to view the issue name.
    Priority         Indicates the priority that the workflow assigned to the data quality issue in the record.
    Status           Indicates the status of the record in the current task, based on the data quality of the record. You can choose from the following status options:
                     - Accepted. Records accepted for storage in the database.
                     - Rejected. Records rejected as unsuitable for storage in the database.
                     - Reprocessed. Records that need further analysis in another application.
                     - Empty. Records with no current status.
    Review           Indicates the review status of the record in the current task. You can choose from the following review options:
                     - Reviewed. Records that are marked as reviewed.
                     - Empty. Records that are not marked as reviewed.
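
    The Status and Review filters behave like simple predicates on each record. The following Python sketch shows the idea on in-memory data. The record layout, the function name, and the choice to combine the two criteria with a logical AND are assumptions, because the guide does not describe how the Filter dialog box combines criteria. An empty string stands for the Empty option.

```python
def filter_records(records, status=None, review=None):
    """Keep records that match the given status and review values; None applies no filter."""
    def matches(record):
        if status is not None and record.get("status", "") != status:
            return False
        if review is not None and record.get("review", "") != review:
            return False
        return True

    return [r for r in records if matches(r)]


# Hypothetical records; "" represents the Empty option in the filter table.
records = [
    {"id": 1, "status": "Accepted", "review": "Reviewed"},
    {"id": 2, "status": "Rejected", "review": ""},
    {"id": 3, "status": "", "review": ""},
]
accepted = filter_records(records, status="Accepted")
untouched = filter_records(records, status="", review="")
```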

    Exception Record Correction

    The records in a correct exceptions task contain data quality issues that an earlier data process has identified. The

    data quality issues may or may not indicate an error in the data.

    When you work in a correct exceptions task, you complete one or more of the following steps:

    Verify that the records contain an error. If the record does not contain an error, you can accept the record for

    storage in the table.

    Update the records with correct data. If you can update a record with correct data, you can accept the record

    for storage in the table.

    Identify records that cannot be used by the business. If you cannot update a record to a usable form, you can

    reject the record.

    Update the status of a record. The status you set determines how the workflow processes the record when the

    task completes.

    Set the task status. You select the status that best represents the current state of the record data. Set the task

    status when you finish work on the task data.


    Exception Record Review

    When you perform a task to review exception record data, you validate the work done by another user in an earlier

    task.

    The steps to complete the review task are similar to the steps to correct the data. In a review task, you can verify

    or undo the work performed in the earlier task. You can find both types of task in your Inbox.

    When you review exception records, you examine the changes made by the previous user and the status assigned

    to each record. Consider the following questions:

    Are the data updates correct in each record?

    Does the record status reflect the current state of the record?

    The level of data accuracy and completeness in the record must match the record status. For example, if a record

    contains errors in a review task, the record status may indicate that the record requires additional processing, or

    that the record is unfit for storage in the database. When you fix or update data in a record, you must also verify that

    the record status is accurate.

    Exception Record Actions and Status

    You use the Record Actions menu options to update record data and to set the status of a record. When you set

    the record status, you update a metadata field that other users or processes can use after the task completes.

    The Record Actions menu displays the following options:

    Menu Option              Description
    Edit                     Enables edit features for a record. You must click Edit before you make changes to a row.
    Undo                     Reverses the last change you made in a record.
    Redo                     Makes the change to a record that you previously reversed.
    Comment                  Adds a comment that you type to the audit trail for the record.
    Reprocess Record         Adds a value to the record to indicate that the record must be reprocessed by another application.
    Accept Record            Adds a value to the record to indicate that the record is acceptable for storage in the database.
    Reject Record            Adds a value to the record to indicate that the record can be deleted from the database.
    Clear Record Status      Clears the status indicator that you set or another user set for the record.
    Approve Record Edit      Review tasks only. Adds a value to the record to confirm the edit made to the record.
    Reject Record Edit       Review tasks only. Adds a value to the record to reject the edit made to the record.
    Mark as Reviewed         Correct tasks only. Confirms that you have reviewed the record.
    Clear Reviewer Status    Clears the status indicator set for the record.
    Export Data              Exports all record and task data from the task in a delimited file.
    View Audit Trail         Opens the audit trail view for the task.
    Close                    Closes the task.

    Note: When you approve or reject an edit in a review task, you add a status indicator to the record. The status

    update does not update the data in other fields in the record. The next steps in the workflow or data project

    determine any further update performed on the record.

    Exception Record Example

    You are part of a team of data stewards at a retail organization. Your role is to maintain the data quality of a set of

    customer account records. You are concerned that the records in the data set contain errors.

    A member of your team uses the Developer tool to evaluate the accuracy of the customer account data. The

    developer creates a workflow with a Mapping task and a Human task. The Mapping task categorizes the data

    according to different levels of data accuracy. In some cases, the mapping cannot verify the accuracy of the

    records. The workflow passes the unresolved records to the Human task, and the Human task creates task instances for Informatica Data Director for Data Quality users.

    When you open a task in Informatica Data Director for Data Quality, the Data view displays the records assigned

    to you. The task uses red indicators to identify the cells that contain problem data. You update every cell that

    contains data that you can fix.

    If you determine that the record is correct in its current state, select Accept Record. If you cannot edit the record,

    select Reprocess Record. If the record does not belong in the table, select Reject Record.

    Use the Task Actions menu to set the task status when you complete work on the task. You can set the task

    status at any time. Select the task status that represents the state of the records in the task and the type of data

    operation that the records now require. The workflow may define a single task status that moves the records to the

    next task in the workflow.

    Consider the following factors when you examine the data:

    A record may contain no errors. The purpose of the task is to evaluate data quality in cases where a software

    process was unable to do so.

    You update the record status when you finish work on the highlighted cells in the record. You must update the

    record status whether you edit the record or not.


    Editing Exception Records

    When you open a task that contains exceptions, the Data view uses a red icon to identify data values that may

    contain errors. Examine the values and correct any error you find.

    You can edit records when you correct exceptions and review exceptions.

    1. Open the task in the Data view, and click Edit.

    2. In any record, select a data value that contains an error.

    3. Enter the correct data value.

    4. Click Save.

    The icon applied to the data value changes from red to green.

    5. Set the record status to Accept Record. The status indicates that the record is now acceptable for storage in

    the database.

    Repeat the steps for other records in the task.

    Note: You may not know the correct data values for every record in the task. If you cannot edit a record, set the

    status to Reprocess Record. If you determine that the record is not acceptable in any form, set the status to

    Reject Record.

    Filtering Exception Tasks in the Data View

    You can filter the data records on the types of issue they contain, on the priority of the data quality issue in the

    record, and the status of the record. By default, the task does not apply a filter to the data.

    Use the Filter option to filter the records that display in the Data view.

    1. Open a correct exception or review exception task.

    2. In the Data view, click Filter. The Filter dialog box opens.

    3. Select the filter criteria to apply to the task data.

    4. Click Apply to apply the filter to the records in the task.

    Setting the Status of an Exception Record

    When you complete work on a record, you set the record status. You do not need to edit a record to set the status.

    1. Click Edit in the Data view.

    2. Select a record.

    3. Open the Record Actions menu and select the status you want to apply to the record.

    To indicate that a record contains correct business information and can remain in the database, select

    Accept Record.

    To indicate that a record does not contain usable information and can be deleted from the database, select

    Reject Record.


    To indicate that the record needs further processing before it can be returned to the database, click

    Reprocess Record.

    You can use the Clear Record Status option on the Record Actions menu to clear the status you set.
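
    The relationship between the Record Actions options and the UPDATED_STATUS values in an exported file can be sketched as a lookup table. The pairing below is inferred from the option descriptions and the export column descriptions in this guide. It is an assumption, as are the function name and the use of `None` to represent a cleared (NULL) status.

```python
# Inferred pairing of menu options and exported UPDATED_STATUS values (assumption).
ACTION_TO_STATUS = {
    "Accept Record": "ACCEPTED",
    "Reject Record": "REJECTED",
    "Reprocess Record": "REPROCESS",
    "Clear Record Status": None,
}


def apply_record_action(record, action):
    """Return a copy of the record with UPDATED_STATUS set according to the chosen action."""
    if action not in ACTION_TO_STATUS:
        raise ValueError(f"unsupported action: {action}")
    updated = dict(record)
    updated["UPDATED_STATUS"] = ACTION_TO_STATUS[action]
    return updated


record = {"ROW_IDENTIFIER": "1", "UPDATED_STATUS": None}
rejected = apply_record_action(record, "Reject Record")
cleared = apply_record_action(rejected, "Clear Record Status")
```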

    Reviewing Exception Records

    When you review the output of a task that corrects exceptions, you validate the status of each record. The status

    determines how the records are treated in the next stage of the workflow. The review task ends when you review

    all records and set the task status.

    Perform the following steps for all records in the task:

    1. Open the task in the Data view.

    2. Verify that the status of each record represents the information in the record.

    For example, a record may be marked for deletion from the database, but you may decide that the record

    contains usable information. You update the status so that the record is reprocessed and not deleted.

    If a record is marked for storage in the database but needs additional work, click Edit and clear the record

    status.

    If you identify an error in the record, click Edit and update the record. When you are satisfied that the

    record is correct, you can update the record status.

    3. After you review all records, set the task status.


    CHAPTER 4

    Duplicate Records

    This chapter includes the following topics:

    Duplicate Records Overview
    Data View for Duplicate Records
    Duplicate Record Correction
    Duplicate Record Review
    Duplicate Record Actions and Status
    Duplicate Record Example
    Filtering Duplicate Tasks in the Data View
    Editing a Cluster in a Duplicate Task
    Creating a Cluster
    Finding Duplicate Records in Multiple Clusters
    Setting the Status of a Cluster
    Reviewing Duplicate Records

    Duplicate Records Overview

    When you correct duplicate records, you edit the contents of one or more clusters. A cluster contains records that

    may or may not be duplicates of each other.

    If two or more records are duplicates, you consolidate the records into a single record called the preferred record.

    If a record is not a duplicate of another record in the cluster, you remove it from the cluster. You can move a

    record from one cluster to another. You can create a cluster with a single record if the record is unique. The task is

    complete when you edit all clusters.

    A cluster contains two or more records. Each cluster identifies a preferred record. The preferred record contains

    the most accurate representation of the information in the cluster. By default, Informatica Data Director for Data

    Quality selects the first record in the cluster as the preferred record. To update a cluster, replace data values in

    the preferred record with more accurate values from any other record in the cluster. When the preferred record is

    complete, mark the cluster as reviewed and start work on the next cluster in the task.

    Note: Two or more records are duplicates when they contain the same business information. Records can contain

    similar data but not represent the same information to the business.
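
    The consolidation step described above can be illustrated with a small model. In the following Python sketch, the first record in the cluster starts as the preferred record, and the caller names the records whose values should replace the preferred values. The function name, the `overrides` parameter, and the sample data are hypothetical. The choice of which value is more accurate remains a manual decision, which the sketch represents as an explicit argument.

```python
def consolidate(cluster, overrides=None):
    """Build a preferred record from a cluster of candidate duplicates.

    cluster: list of record dictionaries; the first record is the default preferred record.
    overrides: maps a column name to the index of the record that supplies the better value.
    """
    preferred = dict(cluster[0])  # copy so that the original record is unchanged
    for column, source_index in (overrides or {}).items():
        preferred[column] = cluster[source_index][column]
    return preferred


cluster = [
    {"name": "A. Lee", "city": "Dublin"},
    {"name": "Ann Lee", "city": "Dublin"},
]
preferred = consolidate(cluster, overrides={"name": 1})
```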

    Perform the following actions to correct duplicates:


    Examine and edit the cluster records.

    When you open a task, the Data view displays the records in the first cluster and nominates a record as the

    preferred record. Examine the values in each record in the cluster. If you find values that contain more

    accurate information than the preferred record values, replace the preferred record values.

    Find records in other clusters.

    If you expect that the task data set contains duplicate records across more than one cluster, search for

    records in other clusters and display the clusters together onscreen. If duplicate records exist across the

    clusters, move records from one cluster to another.

    Create a cluster, and identify unique records.

    A cluster may contain records that you can consolidate into two unique preferred records. In this case, create

    a cluster and define a preferred record for each cluster.

    A cluster may contain a record that is not a duplicate of any other record in the cluster. In this case, create a

    cluster and add the record to it. The new cluster contains a single record.

    Identify redundant records.

    When you complete the preferred record in a cluster, mark the cluster as reviewed. Informatica Data Director

    for Data Quality marks the preferred record for storage in the database table and marks the remaining records as redundant. The database deletes the redundant records in a later stage of the workflow.

    Change the status of the cluster.

    You can mark a cluster as reviewed, and you can undo the status that you set for a cluster. For example, if you

    create a preferred record, but later you change your mind, you can clear the status.

    Complete the task.

    To indicate that you completed work on the task, use the Task Actions menu options. The options determine

    the possible next steps for the task data. The workflow defines the options.

    Data View for Duplicate Records

    The Data view shows the clusters in the task and provides a set of options you can use to complete the task.

    When you open a duplicate task, the Data view organizes the clusters on a series of tabs. Each tab displays the

    records in a single cluster.

    Use the Customize Table option to organize the data columns in the Data view. Use the Filter option to filter the

    clusters that appear in the Data view.

    The following table describes the options on the Data view:

    Option Description

    Customize Table Selects the table columns to display.

    Filter Filters the clusters that display in the Data view. You can use

    filters to view clusters with a specified status or to find records

    that contain specified values.

    Edit Enables edit features for a cluster. You must click Edit before

    you make changes to a row.


    Undo Reverses the last change you made in a record.

    Redo Makes the change to a record that you previously reversed.

    Task Actions Sets the status of the task. The workflow uses the Task

    Actions setting to determine the next step for the task. Select

    a task action when all clusters in the task are ready for the

    next stage in the workflow.

    Cluster Actions Sets the status of a cluster. Set the status when you finish

    work on the cluster. You can also undo or redo any update to

    the cluster you select.

    Duplicate Task Filters

    You can sort the clusters that appear in the Data view. You can also filter the view to display only the clusters that

    contain a data value that you specify.

    Use the Filter options to filter and sort the clusters. You sort the clusters by the status assigned to each cluster by

    the user who worked on the task.

    The following table describes the status options:

    Option Description

    Accepted Identifies clusters marked as accepted for storage in the database.

    Rejected Identifies clusters marked as unsuitable for storage in the database.

    Reviewed Identifies clusters that are marked as reviewed.

    Duplicate Record Correction

    The clusters in a correct duplicate task contain records that may contain duplicate information.

    When you work in a correct duplicates task, complete one or more of the following steps:

    Verify that the records in the cluster represent different versions of the same record.

    If the cluster records are duplicates of a single record, select the most accurate data values in each record and

    add the values to the preferred record.

    If the cluster contains information from more than one record, create a cluster for each unique record and configure a preferred record in each cluster.

    If the cluster contains a record that does not match the other records, create a cluster and add the record to the

    cluster. The new cluster contains a single preferred record.

    Find records in other clusters that may be duplicates of a record in the current cluster. If you believe that

    duplicate records exist across more than one cluster, you can search the task data set for records that match

    data values you specify.

    Update the status of a cluster. The status you set determines how the workflow processes the preferred record.


    Set the task status. Select the status that best represents the current state of the cluster data. Set the task

    status when you finish work on the task.

    Duplicate Record Review

    When you perform a task to review the records in a cluster, you validate the work done by another user in an

    earlier task.

    The steps to complete the review task are similar to the steps performed in the task that identified the preferred

    record data. In a review task, you verify the work performed in the earlier task.

    Note: You can find both types of task in your Inbox.

    When you review cluster data, examine the preferred record defined by the previous user and the other records in

    the cluster. Consider the following questions:

    Does the preferred record represent the most accurate version of the records in the cluster? Update the

    preferred record if you find more accurate data in another record in the cluster.

    Do the other records in the cluster include any record that the business may require? Create a cluster and add

    non-redundant records to the new cluster. Then define a preferred record in the new cluster.

    Duplicate Record Actions and Status

    You use the Cluster Actions menu options to configure one or more preferred records and to set the status of

    one or more clusters. When you set the cluster status, you update a metadata field that other users or processes

    can use after the task completes.

    The Cluster Actions menu displays the following options:

    Menu Option Description

    Find Cluster(s) Finds records in other clusters that contain data values you

    specify.

    Create Cluster Creates an empty cluster below the current cluster in the Data

    view.

    Use the Move Record option to move a record between clusters.

    Confirm Cluster Review Correct tasks only. Confirms that you have reviewed the

    cluster and do not plan further changes.

    Clear Cluster Status Clears the status indicator for the cluster.

    Export Data Exports all cluster and task data from the task in a delimited

    file.

    View Audit Trail Opens the audit trail view for the task.

    Close Closes the task.


    View Comment Review tasks only. Opens any comment added to the cluster.

    Mark as Accepted Review tasks only. Adds a value to the cluster to indicate that

    you accept the cluster update operation performed in an earlier task.

    Mark as Rejected Review tasks only. Adds a value to the cluster to indicate that

    you reject the cluster update operation performed in an earlier

    task.

    Note: When you mark a cluster as accepted or rejected in a review task, you add a status indicator to the cluster.

    The status update does not update the data in the cluster records. The next steps in the workflow or data project

    determine any further update of the cluster records.

    Duplicate Record Example

    As a data steward in a retail organization, you are concerned that the customer account data includes duplicate

    records.

    A member of your team uses the Developer tool to evaluate the levels of duplication in the customer account data.

    The developer creates a workflow with a Mapping task and a Human task. The Mapping task sorts the records into

    clusters according to the levels of similarity between them. Some records are similar but non-identical, and the

    mapping cannot determine if the records are genuine duplicates. The workflow passes the unresolved records to

    the Human task, and the Human task creates task instances for Informatica Data Director for Data Quality users.

    When you open a task in Informatica Data Director for Data Quality, the Data view displays the clusters assigned

    to you. You review the records in the cluster and identify the data values that most accurately identify the

    customer. Each cluster includes a default preferred record. If the preferred record does not contain the most accurate data values, update the preferred record with the values from other records in the cluster. You must also

    determine if the records in the cluster represent one or more customer accounts.

    Consider the following questions when you examine the data:

    What type of business rules apply to the data? For example, can a table contain more than one account record

    for a customer? The answers can help you determine the primary key columns in the data. Primary key

    columns must contain unique values.

    By default, the task selects the first record in every cluster as the preferred record. Do you need to update the

    current preferred record? Update column values one by one to build the preferred version of the account

    information.

    Do you need to search outside the current cluster for duplicate records? You can search the task for records

    that share data values with the current cluster. You can add records from one cluster to another cluster. You

    can create a cluster with one or more records from the current cluster.

    Do you want to edit the data in a record? You may want to edit a record if you cannot create a preferred record

    that you think is valid. However, you cannot edit data in a cluster. A workflow does not evaluate the accuracy of

    a record when it adds it to a cluster. The purpose of the task is to define a single record that best represents

    the information in the data set.

    When you finish work on the cluster, you open the Cluster Actions menu and set the cluster status to Confirm

    Cluster Review. You update the cluster status whether you edit the preferred record or not.


    Use the Task Actions menu to set the task status when you complete work on the task. You can set the task

    status at any time.

    The workflow defines the task status options that you see in the task. The task options indicate the paths that the

    records can take through the workflow. Select the task status that represents the overall state of the records in the

    task and the type of data operation that the records now require. The workflow may define a single task status that

    moves the records to the next task in the workflow.

    Filtering Duplicate Tasks in the Data View

    Use the Filter option to filter the duplicate clusters that display in the Data view.

    By default, the task does not apply any filter.

    1. Open a correct duplicate or review duplicate task.

    2. In the Data view, click Filter. The Filter dialog box opens.

    3. Enter a data value to use as a column filter. You select a data column and you enter a value that must occur

    in the column.

    When you apply the filter, the Data view displays the clusters in which one or more records contain the data

    value in the column you specify.

    4. Use the Up and Down arrows to sort the clusters by status.

    If you do not sort the clusters, the Data view displays the clusters in numerical order by cluster ID.

    5. Click Apply to apply the filter to the clusters in the task.
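    The effect of the column filter and the status sort in the steps above can be sketched in Python. This is a minimal illustration that assumes each cluster is a dictionary with hypothetical id, status, and records keys. It does not reflect how the product stores or queries clusters.

```python
def filter_clusters(clusters, column, value):
    """Keep the clusters in which at least one record has `value` in `column`."""
    return [cluster for cluster in clusters
            if any(record.get(column) == value for record in cluster["records"])]


def sort_clusters(clusters, by_status=False):
    """Sort by status when requested; otherwise use numerical cluster ID order."""
    if by_status:
        return sorted(clusters, key=lambda cluster: (cluster["status"], cluster["id"]))
    return sorted(clusters, key=lambda cluster: cluster["id"])


# Hypothetical sample data: an empty status string means no status is set.
clusters = [
    {"id": 2, "status": "Reviewed", "records": [{"city": "Cork"}, {"city": "Cork"}]},
    {"id": 1, "status": "", "records": [{"city": "Dublin"}, {"city": "Cork"}]},
    {"id": 3, "status": "", "records": [{"city": "Galway"}, {"city": "Galway"}]},
]

# Clusters 1 and 2 each contain a record with "Cork" in the city column.
matches = sort_clusters(filter_clusters(clusters, "city", "Cork"))
```

    A cluster passes the filter when any one of its records matches, so the result can include records that do not contain the value.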

    Editing a Cluster in a Duplicate Task

    When you open a task that contains clusters, the clusters in the task appear on a series of tabs under the Data

    view. The first cluster is open.

    The cluster identifies the first record in the cluster as the preferred record by default. Examine the records in the

    cluster and select any data value that you want to add to the preferred record. You can select values from multiple

    records.

    1. Compare the preferred record with the other records in the cluster.

    Identify the most accurate values in each column in the cluster.

    2. Click Edit in the Data view.

    3. Click a value in a record to add that value to the preferred record.

    Repeat the steps for all values that you want to add to the preferred record. When you complete work in a cluster,

    update the cluster status.


    Creating a Cluster

    Create a cluster when the current cluster contains information that identifies more than one non-duplicate record.

    When you create a cluster, verify the preferred record in the new cluster.

    1. On the Cluster Actions menu, select Create Cluster.

    The new cluster appears in the Data view below the current cluster.

    2. Select a record to add to the new cluster.

    3. Click Move Record.

    The record becomes the preferred record in the new cluster.

    4. Move any other record that matches the preferred record in the new cluster.

    If the new cluster contains a single record, the task treats the preferred record as a unique record.
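    The steps above amount to splitting one list of records into two. The Python sketch below models each cluster as a plain list in which the first element plays the role of the preferred record. The function names and record IDs are hypothetical and are used only for illustration.

```python
def create_cluster(clusters, current_position):
    """Insert an empty cluster directly below the cluster at current_position."""
    new_cluster = []
    clusters.insert(current_position + 1, new_cluster)
    return new_cluster


def move_record(source, target, record):
    """Move a record between clusters. The first record moved into an empty
    cluster lands at position 0, which this sketch treats as the preferred record."""
    source.remove(record)
    target.append(record)


# One cluster that mixes two customers: A1 and A2 match, B1 does not.
clusters = [[{"id": "A1"}, {"id": "A2"}, {"id": "B1"}]]

new_cluster = create_cluster(clusters, 0)
move_record(clusters[0], new_cluster, clusters[0][2])
```

    After the move, the new cluster holds a single record, which corresponds to the unique-record case described above.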

    Finding Duplicate Records in Multiple Clusters

    Use the Find Cluster(s) option to find records in other clusters that may match records in the current cluster. You

    specify a data value to search for and the record column that must store the data value.

    1. In the Data view, select Find Cluster(s).

    The Find dialog box opens.

    2. Enter the data value you want to find. You enter the full data value as it appears in the record column, or you

    enter a wildcard value, such as an asterisk.

    3. Select the column that contains the data value to search for.

    4. Click Find.

    The search operation returns all records that contain the value in the column you specify.

    5. Select any record in the search results that matches a record in the open cluster. You can use the CTRL key

    to select multiple records.

    The Data view displays the clusters that contain the records you select. You can use the Move Record option

    to move a record from one cluster to the other.
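    A search of this kind can be approximated in Python with shell-style wildcards from the standard fnmatch module. Treating the asterisk as "any run of characters" is an assumption about the wildcard behavior. The guide does not define the exact matching rules, and the find_records function and sample names are hypothetical.

```python
from fnmatch import fnmatchcase


def find_records(clusters, column, pattern):
    """Return (cluster_id, record) pairs whose column value matches the pattern.

    The pattern is either a full value or a shell-style wildcard such as 'J* Smith'.
    """
    return [(cluster_id, record)
            for cluster_id, records in clusters.items()
            for record in records
            if fnmatchcase(str(record.get(column, "")), pattern)]


# Hypothetical task data: cluster ID mapped to the records in that cluster.
clusters = {
    1: [{"name": "John Smith"}, {"name": "J. Smith"}],
    2: [{"name": "Jon Smith"}, {"name": "Ann Jones"}],
}

hits = find_records(clusters, "name", "J* Smith")
cluster_ids = sorted({cluster_id for cluster_id, _ in hits})
```

    The set of cluster IDs in the result corresponds to the clusters that would be displayed together so that records can be moved between them.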

    Setting the Status of a Cluster

    When you complete work on a cluster, you set the cluster status. You do not need to update the preferred record

    in the cluster before you set the status.

    1. Click Edit in the Data view.

    2. Open the Record Actions menu and select the Reviewed option.

    You can use the Clear Record Status option on the Cluster Actions menu to clear the status you set.


    Reviewing Duplicate Records

    When you review the output of a task that corrected duplicate records, you validate that the preferred records

    represent the best version of the data in the clusters. You review one cluster at a time. The review task ends when

    you review all clusters and set the task status.

    Perform the following steps for all clusters in the task:

    1. Open a cluster in the Data view.

    2. Compare the preferred record with the other records in the cluster.

    3. Verify that the preferred record contains the most accurate version of the data in the cluster.

    If a cluster is classified as reviewed but needs additional work, click Edit and clear the cluster status.

    If you identify an error in the preferred record, click Edit and update the preferred record. When you are

    satisfied that the cluster is correct, you can update the cluster status.

    After you review the records in all clusters, you can set the task status.


    CHAPTER 5

    Audit Trail Operations

    This chapter includes the following topics:

    Audit Trail Operations Overview

    Audit Trail Data

    Audit View Filters

    Audit View

    Filtering Records in the Audit View

    Audit Trail Operations Overview

    Informatica Data Director for Data Quality stores audit trail data for all updates made in a task. Use the audit trail

    data to review the changes to the task data.

    You can perform the following operations on an audit trail:

    View the list of task updates since the task was created.

    Filter the audit trail by date, user name, and by type of update.

    Add or remove columns from the audit trail view.

    When a user edits a value in a record, the audit trail adds an edit tool icon to the value. Place the cursor over the

    icon to see the earlier state of the value.

    Note: When you view the audit trail for a duplicate task, the audit trail lists the records that users updated in the

    task. The audit trail does not display cluster data.

    Audit Trail Data

    Each row in the audit trail represents a single data update. If you make multiple updates to a record, the audit trail

    adds an entry for each update. The audit trail organizes record updates in chronological order. If a task contains

    no updates, the audit trail is empty.
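    The one-row-per-update behavior can be illustrated with a short Python sketch. The AuditEntry and AuditTrail classes, the user names, and the timestamps are hypothetical. The sketch only shows the logging pattern and does not describe how the product stores audit data.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AuditEntry:
    """One audit trail row: a single update to a single value."""
    updated_by: str
    updated: datetime
    column: str
    old_value: str
    new_value: str


class AuditTrail:
    """Hypothetical audit log that stays empty until the first update."""

    def __init__(self):
        self.entries = []

    def apply_update(self, record, column, new_value, user, when):
        # Log the earlier state of the value, then change the record.
        self.entries.append(
            AuditEntry(user, when, column, record[column], new_value))
        record[column] = new_value

    def rows(self):
        # One row per update, in chronological order.
        return sorted(self.entries, key=lambda entry: entry.updated)


record = {"name": "J. Smith", "city": "Dubln"}
trail = AuditTrail()
trail.apply_update(record, "name", "John Smith", "steward1", datetime(2012, 12, 3, 14, 0))
trail.apply_update(record, "city", "Dublin", "reviewer2", datetime(2012, 12, 4, 9, 30))
```

    Two updates to the same record produce two rows, and each row keeps the earlier value.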

    An audit trail displays the record data columns that you can edit in the task. In addition, an audit trail displays

    metadata columns that identify the user who updated the task and the date and type of update.


    The following table describes the metadata columns in an audit trail:

    Column Name Description

    Updated By The user who updated the record.

    Updated The date of the record update.

    Comment Any comment added by a user.

    Status Any status update made by a user.

    Review Any review status set by a user.

    Use the Customize Table option to organize the data columns that display in the Audit view.

    Audit View Filters

    You can use the Filter option to filter the records that display in the Audit view.

    The following table describes the filter options:

    Option Description

    From, To The date range for the updates you want to view.

    User The user who performed the updates you want to view.

    Status The status of the record in the current task, based on the data quality of the record.

    Review The review status of the record in the current task. You can

    choose from the following review options:

    - Reviewed. Records that are marked as reviewed.

    - Empty. Records that are not yet reviewed.

    - Cleared. Records in a previously reviewed state that a user

    updated to unreviewed.


    Status Filter Options

    The status options you can use in the Audit view depend on the type of task you open.

    The following table describes the status options you can set as filters in the Audit view:

    Status Task Type Description

    Accepted Exception Records accepted for storage in the

    database.

    Cleared Exception Records with a status update that a

    user deleted.

    Empty Duplicate, Exception Records with no status update.

    Moved into cluster Duplicate Records that moved into the specified

    cluster.

    Moved out of cluster Duplicate Records that were moved out of the specified cluster.

    Rejected Exception Records rejected as unsuitable for

    storage in the database.

    Reprocessed Exception Records that need further analysis in

    another application.

    Audit View

    An audit trail opens in the Audit view. You can open an audit trail from the Data view or from the Inbox.

    When you open an audit trail from the Data view, the audit trail displays the user updates in the current task.

    When you open an audit trail from the Inbox, the audit trail displays user updates for the task you select.

    Opening an Audit Trail from the Data View

    1. Open a task in the Data view.

    2. In an exception task, select Record Actions.

    In a duplicate task, select Cluster Actions.

    3. Click View Audit Trail.

    The audit data displays in the Audit view.

    Opening an Audit Trail from the Inbox

    1. Select the Tasks view, and open the Inbox.

    2. Select a task.

    3. Click View Audit Trail.


    The audit data displays in the Audit view.

    Filtering Records in the Audit View

    Use the Filter option to filter the records that display in the Audit view.

    By default, the audit trail does not apply any filter.

    1. Open a task in the Audit view, and click Filter. The Filter dialog box opens.

    2. Select the filter criteria to apply to the task data.

    3. Click Apply to apply the filter to the records in the audit trail.
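    The filter criteria combine as a logical AND: a row appears only if it satisfies every criterion that is set. The Python sketch below assumes audit rows are dictionaries with hypothetical updated_by, updated, and review keys, and it assumes that the From and To dates are inclusive. Neither detail is confirmed by the guide.

```python
from datetime import date


def filter_audit_rows(rows, date_from=None, date_to=None, user=None, review=None):
    """Return the rows that satisfy every criterion that is not None."""
    def keep(row):
        if date_from is not None and row["updated"] < date_from:
            return False
        if date_to is not None and row["updated"] > date_to:
            return False
        if user is not None and row["updated_by"] != user:
            return False
        if review is not None and row["review"] != review:
            return False
        return True

    return [row for row in rows if keep(row)]


# Hypothetical audit rows.
rows = [
    {"updated_by": "steward1", "updated": date(2012, 12, 3), "review": "Reviewed"},
    {"updated_by": "steward1", "updated": date(2012, 12, 10), "review": "Empty"},
    {"updated_by": "reviewer2", "updated": date(2012, 12, 4), "review": "Cleared"},
]

first_week = filter_audit_rows(rows, date_from=date(2012, 12, 1), date_to=date(2012, 12, 7))
by_steward = filter_audit_rows(first_week, user="steward1")
```

    With no criteria set, the function returns every row, which mirrors the default state in which no filter applies.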


    INDEX

    A

    audit trails

    reading an audit trail 27

    Audit view

    filtering audit records 28, 30

    C

    cluster

    creating a cluster 25

    Find Cluster(s) option 25

    updating cluster status 25

    D

    Dashboard view

    columns 3

    options 3

    Data view

    duplicate tasks 20

    exception tasks 13

    filtering the cluster data view 21, 24

    filtering the exception data view 14, 17

    duplicate records

    correct duplicates task 19

    creating a cluster 25

    editing duplicate record clusters 24

    searching clusters 25

    steps to correct duplicates 21

    table metadata 8

    updating cluster status 25

    E

    exception records

    correct exceptions task 12

    editing exception records 17

    steps to correct exceptions 14

    table metadata 7

    updating record status 17

    I

    Inbox tab

    columns 3

    options 3

    task lists 3

    Inbox view

    options 4

    Informatica Data Director for Data Quality

    logging in 2

    overview 1

    user interface 2

    M

    Model repository 1

    O

    options

    cluster actions 22

    cluster status 22

    exception record status 15

    Task Administration tab 4

    P

    preferred record

    creating a preferred record 24

    R

    review task

    review duplicates 26

    review exceptions 18

    steps to review clusters 22

    steps to review exceptions 15

    T

    task

    correct duplicates 19

    correct exceptions 12

    description 1

    exporting task data 7, 10

    Human task 1, 6

    Mapping task 1

    task instances 1, 6

    task operations 5

    task status 7

    tasks and workflows 1

    types of task 6

    task administration

    assigning a task to a user 10

    viewing tasks assigned to others 10

    task administration options 9

    Task Administration tab 2

    Tasks view

    Dashboard 3

    Inbox 3

    Task Administration 4


    V

    views

    Dashboard view 2