DQ 951 IDDUserGuide En
8/13/2019 DQ 951 IDDUserGuide En
Informatica Data Director for Data Quality (Version 9.5.1)
User Guide
Informatica Data Director for Data Quality User Guide
Version 9.5.1
December 2012
Copyright (c) 1998-2012 Informatica. All rights reserved.
This software and documentation contain proprietary information of Informatica Corporation and are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. This Software may be protected by U.S. and/or international Patents and other Patents Pending.
Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement and as provided in DFARS 227.7202-1(a) and 227.7702-3(a) (1995), DFARS 252.227-7013(1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable.
The information in this product or documentation is subject to change without notice. If you find any problems in this product or documentation, please report them to us in writing.
Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange, PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange, Informatica OnDemand, Informatica Identity Resolution, Informatica Application Information Lifecycle Management, Informatica Complex Event Processing, Ultra Messaging and Informatica Master Data Management are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.
Portions of this software and/or documentation are subject to copyright held by third parties, including without limitation: Copyright DataDirect Technologies. All rightsreserved. Copyright Sun Microsystems. All rights reserved. Copyright RSA Security Inc. All Rights Reserved. Copyright Ordinal Technology Corp. All rightsreserved.Copyright Aandacht c.v. All rights reserved. Copyright Genivia, Inc. All rights reserved. Copyright Isomorphic Software. All rights reserved. Copyright MetaIntegration Technology, Inc. All rights reserved. Copyright Intalio. All rights reserved. Copyright Oracle. All rights reserved. Copyright Adobe Systems Incorporated. Arights reserved. Copyright DataArt, Inc. All rights reserved. Copyright ComponentSource. All rights reserved. Copyright Microsoft Corporation. All rights reserved.Copyright Rogue Wave Software, Inc. All rights reserved. Copyright Teradata Corporation. All rights reserved. Copyright Yahoo! Inc. All rights reserved. Copyright
Glyph & Cog, LLC. All rights reserved. Copyright Thinkmap, Inc. All rights reserved. Copyright Clearpace Software Limited. All rights reserved. Copyright InformationBuilders, Inc. All rights reserved. Copyright OSS Nokalva, Inc. All rights reserved. Copyright Edifecs, Inc. All rights reserved. Copyright Cleo Communications, Inc. All righreserved. Copyright International Organization for Standardization 1986. All rights reserved. Copyright ej-technologies GmbH. All rights reserved. Copyright JaspersoCorporation. All rights reserved. Copyright is International Business Machines Corporation. All rights reserved. Copyright yWorks GmbH. All rights reserved. CopyrightLucent Technologies. All rights reserved. Copyright (c) University of Toronto. All rights reserved. Copyright Daniel Veillard. All rights reserved. Copyright Unicode, Inc.
Copyright IBM Corp. All rights reserved. Copyright MicroQuill Software Publishing, Inc. All rights reserved. Copyright PassMark Software Pty Ltd. All rights reserved.Copyright LogiXML, Inc. All rights reserved. Copyright 2003-2010 Lorenzi Davide, All rights reserved. Copyright Red Hat, Inc. All rights reserved. Copyright The Boof Trustees of the Leland Stanford Junior University. All rights reserved. Copyright EMC Corporation. All rights reserved. Copyright Flexera Software. All rights reserved
This product includes software developed by the Apache Software Foundation (http://www.apache.org/), and other software which is licensed under the Apache License,Version 2.0 (the "License"). You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writingsoftware distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See theLicense for the specific language governing permissions and limitations under the License.
This product includes software which was developed by Mozilla (http://www.mozilla.org/), software copyright The JBoss Group, LLC, all rights reserved; software copyright1999-2006 by Bruno Lowagie and Paulo Soares and other software which is licensed under the GNU Lesser General Public License Agreement, which may be found at httwww.gnu.org/licenses/lgpl.html. The materials are provided free of charge by Informatica, "as-is", without warranty of any kind, either express or implied, including but notlimited to the implied warranties of merchantability and fitness for a particular purpose.
The product includes ACE(TM) and TAO(TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University, University of California, Irvineand Vanderbilt University, Copyright () 1993-2006, all rights reserved.
This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit (copyright The OpenSSL Project. All Rights Reserved) and redistributionthis software is subject to terms available at http://www.openssl.org and http://www.openssl.org/source/license.html.
This product includes Curl software which is Copyright 1996-2007, Daniel Stenberg, . All Rights Reserved. Permissions and limitations regarding this
software are subject to terms available at http://curl.haxx.se/docs/copyright.html. Permission to use, copy, modify, and distribute this software for any purpose with or withoufee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.
The product includes software copyright 2001-2005 () MetaStuff, Ltd. All Rights Reserved. Permissions and limitations regarding this software are subject to terms availabat http://www.dom4j.org/ license.html.
The product includes software copyright 2004-2007, The Dojo Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to termsavailable at http://dojotoolkit.org/license.
This product includes ICU software which is copyright International Business Machines Corporation and others. All rights reserved. Permissions and limitations regarding thsoftware are subject to terms available at http://source.icu-project.org/repos/icu/icu/trunk/license.html.
This product includes software copyright 1996-2006 Per Bothner. All rights reserved. Your right to use such materials is set forth in the license which may be found at httpwww.gnu.org/software/ kawa/Software-License.html.
This product includes OSSP UUID software which is Copyright 2002 Ralf S. Engelschall, Copyright 2002 The OSSP Project Copyright 2002 Cable & WirelessDeutschland. Permissions and limitations regarding this software are subject to terms available at http://www.opensource.org/licenses/mit-license.php.
This product includes software developed by Boost (http://www.boost.org/) or under the Boost software license. Permissions and limitations regarding this software are subto terms available at http:/ /www.boost.org/LICENSE_1_0.txt.
This product includes software copyright 1997-2007 University of Cambridge. Permissions and limitations regarding this software are subject to terms available at http://www.pcre.org/license.txt.
This product includes software copyright 2007 The Eclipse Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to termsavailable at http:// www.eclipse.org/org/documents/epl-v10.php.
This product includes software licensed under the terms at http://www.tcl.tk/software/tcltk/license.html, http://www.bosrup.com/web/overlib/?License, http://www.stlport.org/doc/ license.html, http://www.asm.ow2.org/license.html, http://www.cryptix.org/LICENSE.TXT, http://hsqldb.org/web/hsqlLicense.html, http://httpunit.sourceforge.net/doc/license.html, http://jung.sourceforge.net/license.txt , http://www.gzip.org/zlib/zlib_license.html, http://www.openldap.org/software/release/license.html, http://www.libssh2.orghttp://slf4j.org/license.html, http://www.sente.ch/software/OpenSourceLicense.html, http://fusesource.com/downloads/license-agreements/fuse-message-broker-v-5-3- licenagreement; http://antlr.org/license.html; http://aopalliance.sourceforge.net/; http://www.bouncycastle.org/licence.html; http://www.jgraph.com/jgraphdownload.html; http://www.jcraft.com/jsch/LICENSE.txt. http://jotm.objectweb.org/bsd_license.html; . http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231; http://www.slf4j.orglicense.html; http://developer.apple.com/library/mac/#samplecode/HelpHook/Listings/HelpHook_java.html; http://nanoxml.sourceforge.net/orig/copyright.html; http://www.json.org/license.html; http://forge.ow2.org/projects/javaservice/, http://www.postgresql.org/about/licence.html, http://www.sqlite.org/copyright.html, http://www.tcl.tk/
Table of Contents
Preface
  Informatica Resources
    Informatica Customer Portal
    Informatica Documentation
    Informatica Web Site
    Informatica How-To Library
    Informatica Knowledge Base
    Informatica Multimedia Knowledge Base
    Informatica Global Customer Support
Chapter 1: Introduction to Informatica Data Director for Data Quality
  Informatica Data Director for Data Quality Overview
  Tasks and Workflows
  Informatica Data Director for Data Quality User Interface
    Logging In to Informatica Data Director for Data Quality
    Tasks and Dashboard Views
    Task Administration Options
Chapter 2: Tasks
  Tasks Overview
  Human Tasks and Task Instances
  Task Types
  Task Status
    Task Ownership
  Task Data Export
    Exception Task Data
    Duplicate Task Data
  Task Administration
  Multiple Task Completion
  Exporting Task Data
  Assigning a Task to a User
  Viewing Tasks Assigned to Other Users
  Viewing a List of Tasks in a Human Task
  Completing Multiple Tasks
Chapter 3: Exception Records
  Exception Records Overview
  Data View for Exception Records
    Exception Task Filters
  Exception Record Correction
  Exception Record Review
  Exception Record Actions and Status
  Exception Record Example
  Editing Exception Records
  Filtering Exception Tasks in the Data View
  Setting the Status of an Exception Record
  Reviewing Exception Records
Chapter 4: Duplicate Records
  Duplicate Records Overview
  Data View for Duplicate Records
    Duplicate Task Filters
  Duplicate Record Correction
  Duplicate Record Review
  Duplicate Record Actions and Status
  Duplicate Record Example
  Filtering Duplicate Tasks in the Data View
  Editing a Cluster in a Duplicate Task
  Creating a Cluster
  Finding Duplicate Records in Multiple Clusters
  Setting the Status of a Cluster
  Reviewing Duplicate Records
Chapter 5: Audit Trail Operations
  Audit Trail Operations Overview
  Audit Trail Data
  Audit View Filters
    Status Filter Options
  Audit View
    Opening an Audit Trail from the Data View
    Opening an Audit Trail from the Inbox
  Filtering Records in the Audit View
Index
Preface
The Informatica Data Director for Data Quality User Guide describes the features and functionality of Informatica
Data Director for Data Quality.
Informatica Data Director for Data Quality is a web-based application that data analysts use to perform tasks on
database tables. Use this guide if you are assigned a task in Informatica Data Director for Data Quality.
Informatica Resources
Informatica Customer Portal
As an Informatica customer, you can access the Informatica Customer Portal site at
http://mysupport.informatica.com. The site contains product information, user group information, newsletters,
access to the Informatica customer support case management system (ATLAS), the Informatica How-To Library,
the Informatica Knowledge Base, the Informatica Multimedia Knowledge Base, Informatica Product
Documentation, and access to the Informatica user community.
Informatica Documentation
The Informatica Documentation team makes every effort to create accurate, usable documentation. If you have
questions, comments, or ideas about this documentation, contact the Informatica Documentation team through
email at [email protected]. We will use your feedback to improve our documentation. Let us
know if we can contact you regarding your comments.
The Documentation team updates documentation as needed. To get the latest documentation for your product,
navigate to Product Documentation from http://mysupport.informatica.com.
Informatica Web Site
You can access the Informatica corporate web site at http://www.informatica.com. The site contains information
about Informatica, its background, upcoming events, and sales offices. You will also find product and partner
information. The services area of the site includes important information about technical support, training and
education, and implementation services.
Informatica How-To Library
As an Informatica customer, you can access the Informatica How-To Library at http://mysupport.informatica.com.
The How-To Library is a collection of resources to help you learn more about Informatica products and features. It
includes articles and interactive demonstrations that provide solutions to common problems, compare features and
behaviors, and guide you through performing specific real-world tasks.
Informatica Knowledge Base
As an Informatica customer, you can access the Informatica Knowledge Base at http://mysupport.informatica.com.
Use the Knowledge Base to search for documented solutions to known technical issues about Informatica
products. You can also find answers to frequently asked questions, technical white papers, and technical tips. If
you have questions, comments, or ideas about the Knowledge Base, contact the Informatica Knowledge Base
team through email at [email protected].
Informatica Multimedia Knowledge Base
As an Informatica customer, you can access the Informatica Multimedia Knowledge Base at
http://mysupport.informatica.com. The Multimedia Knowledge Base is a collection of instructional multimedia files
that help you learn about common concepts and guide you through performing specific tasks. If you have
questions, comments, or ideas about the Multimedia Knowledge Base, contact the Informatica Knowledge Base
team through email at [email protected].
Informatica Global Customer Support
You can contact a Customer Support Center by telephone or through Online Support. Online Support requires
a user name and password. You can request a user name and password at http://mysupport.informatica.com.
Use the following telephone numbers to contact Informatica Global Customer Support:
North America / South America
  Toll Free
    Brazil: 0800 891 0202
    Mexico: 001 888 209 8853
    North America: +1 877 463 2435
Europe / Middle East / Africa
  Toll Free
    France: 0805 804632
    Germany: 0800 5891281
    Italy: 800 915 985
    Netherlands: 0800 2300001
    Portugal: 800 208 360
    Spain: 900 813 166
    Switzerland: 0800 463 200
    United Kingdom: 0800 023 4632
  Standard Rate
    Belgium: +31 30 6022 797
    France: +33 1 4138 9226
    Germany: +49 1805 702 702
    Netherlands: +31 306 022 797
    United Kingdom: +44 1628 511445
Asia / Australia
  Toll Free
    Australia: 1 800 151 830
    New Zealand: 09 9 128 901
  Standard Rate
    India: +91 80 4112 5738
Chapter 1: Introduction to Informatica Data Director for Data Quality
This chapter includes the following topics:
Informatica Data Director for Data Quality Overview
Tasks and Workflows
Informatica Data Director for Data Quality User Interface
Informatica Data Director for Data Quality Overview
Informatica Data Director for Data Quality is a web-based application that you can use to view and update records
in a database table.
You use Informatica Data Director for Data Quality to view and update records that were processed in other
Informatica applications. The records may contain errors, or they may contain duplicate information.
Informatica Data Director for Data Quality organizes the records into units of work called tasks. A task identifies a
set of records and specifies the operations you can perform to resolve data quality issues in the records.
When you log in to Informatica Data Director for Data Quality, the Inbox for your account lists the tasks that are
assigned to you. You select a task, and you work on the data records that it contains. After you view or
edit a record in a task, you update the status of the record to indicate whether it is fit for storage in the database. When
you complete work on a task, Informatica Data Director for Data Quality removes the task from your Inbox.
Tasks and Workflows
A workflow is a set of events, tasks, and decisions that Informatica users define for a data set.
Informatica users create workflows in the Informatica Developer application. Informatica stores a workflow as an
object in a database called the Model repository. When a workflow runs, it creates the tasks you perform in
Informatica Data Director for Data Quality.
A workflow can contain different types of tasks. At the workflow level, the tasks that you perform in
Informatica Data Director for Data Quality are called Human tasks, because each task needs human interaction to
complete.
A workflow that contains a Human task must also contain a Mapping task. A Mapping task runs an Informatica
data process called a mapping. A mapping applies data quality algorithms to data records and corrects data errors
where possible. When a workflow runs with a Mapping task, the mapping writes the corrected records to a
specified database table. The mapping writes records that contain unresolved data quality issues to another table.
The Human task reads the table of unresolved records and assigns the records to Informatica Data Director for
Data Quality users for manual review.
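The hand-off between the Mapping task and the Human task can be pictured with a small sketch. It is illustrative only and does not use any Informatica API: the `Record` class, the `run_mapping` and `assign_for_review` functions, and the round-robin distribution are all assumptions made for this example. The guide does not describe how a Human task distributes records among users.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical record; 'issues' holds unresolved data quality issues."""
    record_id: int
    values: dict
    issues: list = field(default_factory=list)


def run_mapping(records):
    """Stand-in for a mapping: split records into two 'tables'.

    Records without unresolved issues go to the clean table, the rest
    go to the table that the Human task reads.
    """
    clean, exceptions = [], []
    for record in records:
        (exceptions if record.issues else clean).append(record)
    return clean, exceptions


def assign_for_review(exceptions, users):
    """Distribute unresolved records across users (round-robin is assumed)."""
    assignments = {user: [] for user in users}
    for index, record in enumerate(exceptions):
        assignments[users[index % len(users)]].append(record.record_id)
    return assignments


records = [
    Record(1, {"city": "Dublin"}),
    Record(2, {"city": ""}, issues=["missing city"]),
    Record(3, {"city": "Cork"}),
    Record(4, {"city": "??"}, issues=["invalid city"]),
]
clean, exceptions = run_mapping(records)
assignments = assign_for_review(exceptions, ["analyst_a", "analyst_b"])
```

In this sketch, records 1 and 3 land in the clean table, and records 2 and 4 are split between the two hypothetical analysts for manual review.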
Informatica Data Director for Data Quality User Interface
The Informatica Data Director for Data Quality web interface contains views to display Inbox and task data.
The interface has the following views:
Dashboard. Displays the list of tasks assigned to you and tasks that you completed.
Data. Displays any currently open task.
Tasks. Displays the Inbox. The Inbox lists the tasks assigned to you. If the Human task identifies you as a
business administrator for task instances, the Tasks view displays the tasks that you administer in the Task Administration view.
Audit. Displays an audit trail for changes you made in a task.
Logging In to Informatica Data Director for Data Quality
Use a web browser to log in to Informatica Data Director for Data Quality.
You need the Informatica Data Director for Data Quality URL and a user name and password. An administrator
defines the URL and user credentials in the Informatica Administrator application. Ask an administrator for a user
name, password, and URL.
1. Open a web browser.
2. Enter the URL for Informatica Data Director for Data Quality.
The URL has the following structure:
http://[host_name]:[port_number]/IDDService/login.jsp
3. Enter the user name and password for your account.
After you log in, select the Tasks view and open the Inbox to view the list of tasks assigned to you.
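As a small illustration of the URL structure in step 2, the following sketch builds a login URL from a host name and port number. The function name `idd_login_url`, the host `idd.example.com`, and the port `8085` are placeholders invented for this example. Use the values that your administrator provides.

```python
def idd_login_url(host_name: str, port_number: int) -> str:
    """Build the login URL from the structure given in the guide:
    http://[host_name]:[port_number]/IDDService/login.jsp
    """
    return f"http://{host_name}:{port_number}/IDDService/login.jsp"


# Placeholder host and port, not real defaults.
login_url = idd_login_url("idd.example.com", 8085)
```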
Tasks and Dashboard Views
The Tasks and Dashboard views display the tasks that are assigned to you.
The views display the tasks in a table. When you select the Tasks view, select the Inbox tab to view the table of
tasks.
The Tasks view also displays a Task Administration tab. The Task Administration tab displays the Tasks view
options and also displays options that allow you to release tasks and assign tasks to other users.
Use the Tasks and Dashboard views to perform the following operations:
Select the Tasks view or the Dashboard view to view the list of tasks you need to complete. On the Tasks view,
you view tasks in the Inbox.
If you are a business administrator, select the Tasks view and then select the Task Administration tab to view
the list of tasks that you administer. You can open a task, release a task from a user, and
assign a task to a user.
Inbox
The Inbox stores the list of tasks that are assigned to you. Use the Inbox to view and open the tasks that you
need to complete.
When you are assigned a task, the task appears in the Inbox. Open a task from the Inbox and use the Data view
to verify or update the records in the task. When you complete the task, Informatica Data Director removes the
task from your Inbox.
Note: The Dashboard view and the Inbox tab show the same task list.
Inbox and Dashboard Options
The Inbox and Dashboard views share a common set of options. You can use both views to manage the tasks
assigned to you.
The following table describes the options on the Inbox and Dashboard views:
Option            Description
Filter On, Off    Shows or hides the filter fields for the Inbox and Dashboard columns. Use the filters to search for tasks with common values in a column, such as task name or task owner. You can also use the filters to search for unique values, such as the task ID.
Refresh           Updates the list of tasks.
Task Details      Displays information for a task in a single dialog box. The information includes the task name, type, creation date, due date, and status.
Open Task         Opens the task so you can edit records.
Release Task      Releases the task for assignment to another user.
View Audit Trail  Opens an audit trail that describes changes made to the records in the task.
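The effect of the column filters can be sketched outside the product as a simple search over task rows. The example below is a rough model, not the application's filter logic: the `filter_tasks` function, the dictionary keys, and the case-insensitive substring matching are assumptions made for illustration.

```python
def filter_tasks(tasks, **column_filters):
    """Return the tasks whose columns contain every filter value.

    Matching is a case-insensitive substring test, which is an assumption.
    """
    def matches(task):
        return all(
            str(value).lower() in str(task.get(column, "")).lower()
            for column, value in column_filters.items()
        )

    return [task for task in tasks if matches(task)]


# Hypothetical task rows; the keys loosely mirror the view columns.
tasks = [
    {"task_id": 101, "title": "Correct customer exceptions", "owner": "analyst_a"},
    {"task_id": 102, "title": "Review customer duplicates", "owner": "analyst_b"},
    {"task_id": 103, "title": "Correct supplier exceptions", "owner": "analyst_a"},
]

by_owner = filter_tasks(tasks, owner="analyst_a")  # common value in a column
by_id = filter_tasks(tasks, task_id=102)           # unique value
```

Filtering on a shared value such as the owner returns several tasks, while filtering on the task ID returns a single task.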
Inbox and Dashboard Columns
The Inbox and Dashboard views list the tasks assigned to you and provide metadata about each task. The views
organize the task data and metadata in a series of columns.
The following table describes the columns on the Inbox and Dashboard views:
Column Name  Description
Task ID      Unique identifier for the task in the workflow. At the workflow level, the Task ID identifies the task instance in the Human task. The workflow stores the task ID value so that the ID is unique across multiple runs of the workflow.
Task Title   Task title.
Task Type    Task type. The types are correct exceptions, correct duplicates, review exceptions, and review duplicates.
Owner        Name of the user currently assigned to the task.
Due Date     Scheduled date for completion of the task.
Status       Status of the task. If the task has not reached the due date, the status indicates the task is on time. If the task is incomplete on the due date, the status indicates the task is overdue.
Created      Date the task was created.
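The Status rule can be written as a short check for an incomplete task. This is a sketch of one reading of the rule, not the product's logic: the `task_status` function and the status strings are invented for this example, and it assumes that a task still open on the due date itself already counts as overdue.

```python
from datetime import date


def task_status(due_date: date, today: date) -> str:
    """Status of an incomplete task under the assumed reading of the rule.

    Before the due date the task is on time. On or after the due date
    it is treated as overdue (assumption).
    """
    return "on time" if today < due_date else "overdue"


due = date(2012, 12, 14)
status_before = task_status(due, date(2012, 12, 10))
status_on_due_date = task_status(due, date(2012, 12, 14))
```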
Task Administration Options
If you are a business administrator, you can use the Tasks view to view the tasks that you administer. You can
perform any action from the Task Administration tab that other users can perform from the Inbox or Dashboard.
Use the Task Administration tab under the Tasks view to open the list of tasks. The Task Administration tab
displays additional options for business administrators.
The following table describes the additional options on the Task Administration tab:
Option                 Description
Reassign Task          Changes the owner of a task.
Complete Linked Tasks  Completes all task instances that are linked to the current task instance.
View Task              Opens a task to view the work performed.
Chapter 2: Tasks
This chapter includes the following topics:
Tasks Overview
Human Tasks and Task Instances
Task Types
Task Status
Task Data Export
Task Administration
Multiple Task Completion
Exporting Task Data
Assigning a Task to a User
Viewing Tasks Assigned to Other Users
Viewing a List of Tasks in a Human Task
Completing Multiple Tasks
Tasks Overview
The objective of a task is to verify that a set of data records is ready to move to the next stage in a workflow. The
steps you take to complete a task depend on the type of task and the condition of the records in the task.
You can perform the following steps in a task:
Correct exceptions.
Consolidate clusters of duplicate records to a single preferred record.
Update the status of a record or cluster.
Review the work done by another user, if the task reads data from an earlier task.
If you are a business administrator, you can manage the tasks that the Human task assigned to you for administration.
Task Status
The task status defines the next stage in the workflow for the records in the task. For example, when you complete
a task to correct exception records, the records may pass to another task that reviews the records.
When you complete work on a task, you use the Task Actions menu to set the task status.
Note: A task that corrects data is not always followed by a task that reviews the data. A workflow developer can
specify correction and review tasks in any order.
Consider the following guidelines when you set the task status:
You cannot undo the task status. When you set the task status, the task disappears from your Inbox.
The Task Actions menu can have one or more options. When the menu contains multiple options, you select
the option that represents the status of the records in the task.
The options on the Task Actions menu are defined in the workflow that contains the task. You cannot add or
edit options on the Task Actions menu.
Task Ownership
A workflow can assign a task to more than one user. For example, the workflow may identify a group of users to
work on a Human task.
When you open a task, you become the owner of the task. If the workflow assigned the task to multiple users,
Informatica Data Director removes the task from the Inbox of the other users when you take ownership of the task.
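The ownership rule can be pictured with a small model. The sketch below is illustrative only: the `inboxes` dictionary, the function name `take_ownership`, and the user and task names are hypothetical and do not correspond to any Informatica API.

```python
def take_ownership(inboxes, task_id, user):
    """Make `user` the owner of `task_id` and remove the task from every other Inbox.

    `inboxes` maps a user name to the set of task IDs shown in that user's Inbox.
    """
    if task_id not in inboxes.get(user, set()):
        raise ValueError(f"Task {task_id} is not in the Inbox of {user}")
    for name, tasks in inboxes.items():
        if name != user:
            tasks.discard(task_id)  # the task disappears for the other users
    return user

# Hypothetical example: the workflow assigned T1 to both users.
inboxes = {"ann": {"T1", "T2"}, "bob": {"T1"}}
owner = take_ownership(inboxes, "T1", "ann")
```

After the call, the task remains in the Inbox of the new owner only.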
Task Data Export
You can export task data to a delimited file. Export data when you want to share the current state of the data with
others in applications such as Microsoft Excel.
You can share the data with colleagues who work on the same data project. Export data when you want to verify
that the updates you make in a task conform to the business rules defined for the project.
You export all data associated with the task instance, including the record or cluster data and the status data
saved for the task. The data includes any additional column data created by the Mapping task or Human task that
ran in the workflow.
Exception Task Data
When you export data from an exception task, you export all record data and status data from the database. The
export process includes any additional columns created by the Mapping task or Human task that ran in the
workflow.
The following table describes the metadata columns that export by default with exception data:
Column Name Description
ROW_IDENTIFIER Identifies the record row in the database table.
REVIEW_STATUS The status assigned to the record in Informatica Data Director
for Data Quality.
The status can be one of the following values:
- REVIEWED. You marked the record as reviewed.
- NULL. You did not mark the record as reviewed.
WORKFLOW_ID Identifies the workflow that contains the Human task
associated with the task.
USER_COMMENT Any comment added to the record in Informatica Data Director
for Data Quality.
UPDATED_STATUS The update status of the record in the task.
The status can be one of the following values:
- UPDATED. You added a comment to the record, or you
marked the record as reviewed.
- ACCEPTED. You accepted the record for storage in the
database table that contains valid data.
- REJECTED. You rejected the record as unsuitable for
storage in the database table that contains valid data.
- REPROCESS. You indicated that the record needs further
processing in another application.
- NULL. You did not update the record.
RECORD_STATUS The record status set by the workflow. The workflow sets the
status value when it writes the record to an exception table for
analysis in a Human task. The default value is INVALID. An
INVALID value indicates that the current record cannot be
stored in the database that contains valid data.
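Once exported, the metadata columns can be inspected outside the application. The following Python sketch counts records by UPDATED_STATUS value. It assumes a comma-delimited export with field names in the first row, and it treats an empty cell as NULL; the delimiter, the empty-cell convention, the `CUSTOMER_NAME` column, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter

def summarize_updated_status(export_file, delimiter=","):
    """Count records per UPDATED_STATUS value in an exported exception file."""
    counts = Counter()
    for row in csv.DictReader(export_file, delimiter=delimiter):
        # Assumption: an empty cell represents NULL (the record was not updated).
        status = (row.get("UPDATED_STATUS") or "").strip() or "NULL"
        counts[status] += 1
    return counts

# Hypothetical sample data, not taken from a real export.
sample = io.StringIO(
    "ROW_IDENTIFIER,CUSTOMER_NAME,UPDATED_STATUS,REVIEW_STATUS\n"
    "1,Ann Lee,ACCEPTED,REVIEWED\n"
    "2,Bob Ray,REJECTED,\n"
    "3,Cy Dunn,,\n"
)
counts = summarize_updated_status(sample)
```

For a real export, pass an open file object and the delimiter that the file actually uses.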
Duplicate Task Data
When you export data from a duplicate task, you export all cluster data and status data from the database. The
export process includes any additional columns created by the Mapping task or Human task that ran in the
workflow.
The following table describes the metadata columns that export by default with duplicate data:
Column Name Description
ROW_IDENTIFIER Identifies the record row in the database table.
SEQUENTIAL_CLUSTER_ID Unique identifier for the cluster in the database table.
CLUSTER_ID Identifies the cluster that the record belongs to. The Mapping
task assigns a cluster ID value to each record in the table.
MATCH_SCORE Decimal value between 0 and 1. Identifies the degree of
similarity between two records in the cluster.
IS_MASTER Indicates if the record is the preferred record in the cluster.
The possible values are Y and N.
UPDATED_STATUS The update status of the record.
The status can be one of the following values:
- UPDATED. You updated a value in the record.
- NULL. You did not update the record.
- EXTRACTED. You removed the record from a cluster.
USER_COMMENT Any comment added to the cluster in Informatica Data
Director for Data Quality.
REVIEW_STATUS The status assigned to the cluster in Informatica Data Director
for Data Quality.
The status can be one of the following values:
- REVIEWED. You confirmed the record as reviewed.
- NULL. You did not mark the record as reviewed.
WORKFLOW_ID Identifies the workflow that contains the Human task
associated with the task.
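The CLUSTER_ID and IS_MASTER columns make it possible to check an exported file for clusters that lack a single preferred record. The sketch below is one way to do that in Python. It assumes a comma-delimited export with field names in the first row; the function names and the sample rows are hypothetical.

```python
import csv
import io

def preferred_records_by_cluster(export_file, delimiter=","):
    """Map each CLUSTER_ID to the ROW_IDENTIFIER values flagged IS_MASTER = Y."""
    masters = {}
    for row in csv.DictReader(export_file, delimiter=delimiter):
        # setdefault keeps clusters that have no preferred record in the result.
        rows = masters.setdefault(row["CLUSTER_ID"], [])
        if row["IS_MASTER"].strip().upper() == "Y":
            rows.append(row["ROW_IDENTIFIER"])
    return masters

def clusters_without_single_master(masters):
    """Return the cluster IDs that do not have exactly one preferred record."""
    return sorted(c for c, rows in masters.items() if len(rows) != 1)

# Hypothetical sample data, not taken from a real export.
sample = io.StringIO(
    "ROW_IDENTIFIER,CLUSTER_ID,MATCH_SCORE,IS_MASTER\n"
    "10,1,0.95,Y\n"
    "11,1,0.91,N\n"
    "12,2,0.88,N\n"
    "13,2,0.88,N\n"
)
masters = preferred_records_by_cluster(sample)
problems = clusters_without_single_master(masters)
```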
Task Administration
If you are a business administrator, you manage the status of a set of task instances. For example, if a user is
unable to complete a task instance on schedule, you can reassign the task to another user.
The Task Administration tab displays the tasks that you manage in addition to any task that is assigned to you.
Use Task Administration tab options to perform the following operations:
Open tasks assigned to you. If you are assigned a task to complete, you can use the Task Administration tab
in the same way as the Inbox.
Assign tasks to users. For example, you can assign a task to a new user if the current user did not complete
the task.
View the work performed by a user on a task. For example, you can review the rate of user progress in a task
and verify that the user is performing the task correctly.
View the list of tasks that share a common Human task in a workflow.
Complete multiple tasks. You can move a set of tasks to a completed state when the tasks represent a
common Human task in a workflow. When you complete the tasks, the records in the tasks move from the
Human task to the next stage in the workflow.
Multiple Task Completion
You can complete multiple tasks in a single operation.
Complete multiple tasks in the following cases:
The workflow process failed, and you want to run the workflow again.
Users did not complete the task instances by the scheduled date, and you want to delete the tasks from the
user Inbox views.
Use the Complete Linked Tasks option on the Task Administration tab to complete multiple tasks. The
Complete Linked Tasks option identifies the current task as an instance of a Human task, and it completes all
task instances that share the Human task as a parent. Informatica Data Director for Data Quality removes the
tasks from the Inbox views of the users who worked on the tasks.
When you complete all task instances for a Human task, the Human task releases the record data to the next
stage in the workflow. The record data includes all updates and status changes made by the users who worked on
the tasks in Informatica Data Director for Data Quality.
Exporting Task Data
Export task data to a delimited file.
1. Open the task in the Data view.
2. Open the Record Actions menu and select Export Data.
The Export Data dialog box appears.
3. Verify or edit the file name. By default, the Export Data dialog box uses the task name as the file name.
4. Select or clear the option to export field names as the first row of the file.
5. Click OK.
6. Select an option to open or save the file.
Assigning a Task to a User
Assign tasks to users on the Task Administration tab. Assign a task when a task has no assigned user, or when
a user cannot complete a task on time.
1. On the Tasks view, select the Task Administration tab.
2. Select a task from the task list.
3. Click Reassign Task.
4. Select a user to perform the task.
Viewing Tasks Assigned to Other Users
Use the Task Administration options to open a task and view the progress made by a user in the task.
1. On the Tasks view, select the Task Administration tab.
2. Select a task from the task list.
3. Click View Task.
The task opens in the Data view.
Viewing a List of Tasks in a Human Task
To view the tasks that a workflow generated from a single Human task, use the Complete Linked Tasks option.
1. On the Tasks view, select the Task Administration tab.
2. Select a task from the task list.
3. Click Complete Linked Tasks.
4. Review the information for each task.
The task list displays the following information for each task:
Task ID
Name of the task
Task type
Task owner
Due date
Status
Do not click OK in the task list. If you click OK, you advance all tasks to the next stage in the workflow and remove
all tasks from the Inbox of each owner.
Completing Multiple Tasks
To complete all tasks that have a common Human task as a parent, use the Complete Linked Tasks option.
When you complete a set of tasks, you end all work on the Human task and you advance the task records to the
next stage in the workflow.
Note: The action to complete the tasks does not update any record or status data.
1. On the Tasks view, select the Task Administration tab.
2. Select a task from the task list.
3. Click Complete Linked Tasks. The list of tasks opens.
4. Verify that the list contains the tasks you want to complete.
5. Click OK.
If you open the Inbox or Dashboard after you complete the tasks, the Inbox or Dashboard may not display any
change to the task list. To view the current list of tasks in the Inbox or Dashboard, refresh the browser window.
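The effect of the Complete Linked Tasks option can be sketched as a selection by parent Human task. The Python model below is illustrative only. The `TaskInstance` class, its fields, and the task names are hypothetical, and the sketch does not call any Informatica interface.

```python
from dataclasses import dataclass

@dataclass
class TaskInstance:
    task_id: str
    human_task: str          # hypothetical name of the parent Human task
    owner: str
    completed: bool = False

def complete_linked_tasks(tasks, selected_id):
    """Complete every task instance that shares a parent Human task with the selected task."""
    selected = next(t for t in tasks if t.task_id == selected_id)
    linked = [t for t in tasks if t.human_task == selected.human_task]
    for task in linked:
        # Only the completion flag changes; record and status data are untouched.
        task.completed = True
    return [t.task_id for t in linked]

# Hypothetical task list for illustration.
tasks = [
    TaskInstance("T1", "Review_Customers", "ann"),
    TaskInstance("T2", "Review_Customers", "bob"),
    TaskInstance("T3", "Fix_Orders", "cy"),
]
done = complete_linked_tasks(tasks, "T1")
```

Selecting any one instance completes all of its siblings, which is why the list of linked tasks is worth verifying before you confirm.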
CHAPTER 3
Exception Records
This chapter includes the following topics:
Exception Records Overview
Data View for Exception Records
Exception Record Correction
Exception Record Review
Exception Record Actions and Status
Exception Record Example
Editing Exception Records
Filtering Exception Tasks in the Data View
Setting the Status of an Exception Record
Reviewing Exception Records
Exception Records Overview
An exception is a record that may contain one or more data errors. A workflow adds a record to a correct
exceptions task when other users or software processes cannot determine that the record data is correct.
When you correct exception records, you examine the records in the task for errors that you can fix. When you
review exception records, you verify the work done in an earlier task. After you edit or review a record, you update
the record status. You set the task status to indicate the overall status of the records in the task.
The workflow adds metadata columns to the records in the task. You use the metadata columns to update the
status of each record.
You can perform the following actions to correct exceptions:
Edit a record.
When you open a task, the Data view displays the records in the task and indicates the cells that contain data
quality issues. You can select a cell and correct the error that it contains.
Accept a record for storage in the database.
You can mark a record as acceptable for permanent storage in the database. You can edit the record, or you
can accept the current data in the record.
Reject a record from the database.
You can decide that a record does not belong in the database table. You mark the record for deletion from the
table.
Note: The task does not remove records from the table. The records are removed in a later task in the
workflow or by a user in another application.
Return a record for further processing.
You can decide that a record cannot be fixed in manual review. For example, you cannot determine the
correct state of the data in the record. You mark the record for additional processing in another application.
Clear the status of a record.
You can undo the status that you set for a record. For example, if you mark a record for rejection from the
database, but later you change your mind, you can clear the status and select a different status.
You can update or clear a record status at any time before you complete the task.
Complete the task.
You use the Task Actions menu to indicate that you have completed work on the task. The menu options
define the next step for the task data. The options are determined by the previous task in the workflow.
Data View for Exception Records
The Data view shows the records in a task. When you open an exception task, the Data view lists the records in
the task and provides a set of options you can use to complete the task.
Use the Customize Table option to organize the data columns in the Data view. Use the Filter option to filter the
records that appear in the Data view.
The following table describes the options on the Data view:
Option Description
Filter Exceptions Filters the list of records based on criteria you specify. You
can filter records by issue type, priority, and status.
Customize Table Selects the table columns to display.
Edit Enables edit features for a record. You must click Edit before
you make changes to a row.
Undo Reverses the last change you made in a record.
Redo Makes the change to a record that you previously reversed.
Find Finds records that match the criteria you enter in the filter
field for a column. The filter fields and the Filter Exceptions
options work independently of each other.
Task Actions Sets the status of the task. The workflow uses the Task
Actions setting to determine the next step for the task data
when you complete the task.
Record Actions Sets the status of a record. You set the record status when
you finish work on the record. You can also undo or
redo any edit to the record you select.
Exception Task Filters
You can filter the records by the types of issue they contain, the priority assigned to them, and the current status
of the records.
The following table describes the filter options:
Option Description
Type of issue Indicates the type of data quality issue that the workflow
identified in the record data. The data quality issue indicates
that the record is an exception in the database.
Hold the cursor over the red icon in a table cell to view the
issue name.
Priority Indicates the priority that the workflow assigned to the data
quality issue in the record.
Status Indicates the status of the record in the current task, based on
the data quality of the record. You can choose from the
following status options:
- Accepted. Records accepted for storage in the database.
- Rejected. Records rejected as unsuitable for storage in the
database.
- Reprocessed. Records that need further analysis in
another application.
- Empty. Records with no current status.
Review Indicates the review status of the record in the current task.
You can choose from the following review options:
- Reviewed. Records that are marked as reviewed.
- Empty. Records that are not marked as reviewed.
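The filter options can be thought of as criteria applied to each record. The Python sketch below assumes that the selected criteria combine with a logical AND, which the guide does not state. The record layout, the field names, and the use of an empty string for the Empty status are hypothetical.

```python
def filter_records(records, issue_type=None, priority=None, status=None):
    """Return the records that match every criterion that is not None."""
    def matches(rec):
        return (
            (issue_type is None or issue_type in rec["issues"])
            and (priority is None or rec["priority"] == priority)
            and (status is None or rec["status"] == status)
        )
    return [rec for rec in records if matches(rec)]

# Hypothetical records; an empty status string stands for the Empty option.
records = [
    {"id": 1, "issues": {"Invalid ZIP"}, "priority": "High", "status": "Accepted"},
    {"id": 2, "issues": {"Missing phone"}, "priority": "Low", "status": ""},
    {"id": 3, "issues": {"Invalid ZIP", "Missing phone"}, "priority": "High", "status": ""},
]
open_zip_issues = filter_records(records, issue_type="Invalid ZIP", status="")
high_priority = filter_records(records, priority="High")
```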
Exception Record Correction
The records in a correct exceptions task contain data quality issues that an earlier data process has identified. The
data quality issues may or may not indicate an error in the data.
When you work in a correct exceptions task, you complete one or more of the following steps:
Verify that the records contain an error. If the record does not contain an error, you can accept the record for
storage in the table.
Update the records with correct data. If you can update a record with correct data, you can accept the record
for storage in the table.
Identify records that cannot be used by the business. If you cannot update a record to a usable form, you can
reject the record.
Update the status of a record. The status you set determines how the workflow processes the record when the
task completes.
Set the task status. You select the status that best represents the current state of the record data. Set the task
status when you finish work on the task data.
Exception Record Review
When you perform a task to review exception record data, you validate the work done by another user in an earlier
task.
The steps to complete the review task are similar to the steps to correct the data. In a review task, you can verify
or undo the work performed in the earlier task. You can find both types of task in your Inbox.
When you review exception records, you examine the changes made by the previous user and the status assigned
to each record. Consider the following questions:
Are the data updates correct in each record?
Does the record status reflect the current state of the record?
The level of data accuracy and completeness in the record must match the record status. For example, if a record
contains errors in a review task, the record status may indicate that the record requires additional processing, or
that the record is unfit for storage in the database. When you update data in a record, you must also verify that
the record status is accurate.
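A reviewer can apply the same consistency rule to a list of records programmatically. The sketch below flags records that are marked ACCEPTED while issues remain open. The `open_issues` field is a hypothetical count added for the example; it is not one of the exported metadata columns.

```python
def status_conflicts(records):
    """Return the IDs of records marked ACCEPTED although issues remain open."""
    return [
        rec["ROW_IDENTIFIER"]
        for rec in records
        if rec["UPDATED_STATUS"] == "ACCEPTED" and rec["open_issues"] > 0
    ]

# Hypothetical records for illustration.
records = [
    {"ROW_IDENTIFIER": "1", "UPDATED_STATUS": "ACCEPTED", "open_issues": 0},
    {"ROW_IDENTIFIER": "2", "UPDATED_STATUS": "ACCEPTED", "open_issues": 2},
    {"ROW_IDENTIFIER": "3", "UPDATED_STATUS": "REPROCESS", "open_issues": 1},
]
conflicts = status_conflicts(records)
```

Record 3 is not flagged because a REPROCESS status is consistent with unresolved issues.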
Exception Record Actions and Status
You use the Record Actions menu options to update record data and to set the status of a record. When you set
the record status, you update a metadata field that other users or processes can use after the task completes.
The Record Actions menu displays the following options:
Menu Option Description
Edit Enables edit features for a record. You must click Edit before
you make changes to a row.
Undo Reverses the last change you made in a record.
Redo Makes the change to a record that you previously reversed.
Comment Adds a comment that you type to the audit trail for the record.
Reprocess Record Adds a value to the record to indicate that the record must be
reprocessed by another application.
Accept Record Adds a value to the record to indicate that the record is
acceptable for storage in the database.
Reject Record Adds a value to the record to indicate that the record can be
deleted from the database.
Clear Record Status Clears the status indicator that you set or another user set for
the record.
Approve Record Edit Review tasks only. Adds a value to the record to confirm the
edit made to the record.
Reject Record Edit Review tasks only. Adds a value to the record to reject the
edit made to the record.
Mark as Reviewed Correct tasks only. Confirms that you have reviewed the record.
Clear Reviewer Status Clears the status indicator set for the record.
Export Data Exports all record and task data from the task in a delimited
file.
View Audit Trail Opens the audit trail view for the task.
Close Closes the task.
Note: When you approve or reject an edit in a review task, you add a status indicator to the record. The status
update does not update the data in other fields in the record. The next steps in the workflow or data project
determine any further update performed on the record.
Exception Record Example
You are part of a team of data stewards at a retail organization. Your role is to maintain the data quality of a set of
customer account records. You are concerned that the records in the data set contain errors.
A member of your team uses the Developer tool to evaluate the accuracy of the customer account data. The
developer creates a workflow with a Mapping task and a Human task. The Mapping task categorizes the data
according to different levels of data accuracy. In some cases, the mapping cannot verify the accuracy of the
records. The workflow passes the unresolved records to the Human task, and the Human task creates task
instances for Informatica Data Director for Data Quality users.
When you open a task in Informatica Data Director for Data Quality, the Data view displays the records assigned
to you. The task uses red indicators to identify the cells that contain problem data. You update every cell that
contains data that you can fix.
If you determine that the record is correct in its current state, select Accept Record. If you cannot edit the record,
select Reprocess Record. If the record does not belong in the table, select Reject Record.
Use the Task Actions menu to set the task status when you complete work on the task. You can set the task
status at any time. Select the task status that represents the state of the records in the task and the type of data
operation that the records now require. The workflow may define a single task status that moves the records to the
next task in the workflow.
Consider the following factors when you examine the data:
A record may contain no errors. The purpose of the task is to evaluate data quality in cases where a software
process was unable to do so.
You update the record status when you finish work on the highlighted cells in the record. You must update the
record status whether you edit the record or not.
Editing Exception Records
When you open a task that contains exceptions, the Data view uses a red icon to identify data values that may
contain errors. Examine the values and correct any error you find.
You can edit records when you correct exceptions and review exceptions.
1. Open the task in the Data view, and click Edit.
2. In any record, select a data value that specifies an error.
3. Enter the correct data value.
4. Click Save.
The icon applied to the data value changes from red to green.
5. Set the record status to Accept Record. The status indicates that the record is now acceptable for storage in
the database.
Repeat the steps for other records in the task.
Note: You may not know the correct data values for every record in the task. If you cannot edit a record, set the
status to Reprocess Record. If you determine that the record is not acceptable in any form, set the status to
Reject Record.
Filtering Exception Tasks in the Data View
You can filter the data records on the types of issue they contain, on the priority of the data quality issue in the
record, and the status of the record. By default, the task does not apply a filter to the data.
Use the Filter option to filter the records that display in the Data view.
1. Open a correct exceptions or review exceptions task.
2. In the Data view, click Filter.
The Filter dialog box opens.
3. Select the filter criteria to apply to the task data.
4. Click Apply to apply the filter to the records in the task.
Setting the Status of an Exception Record
When you complete work on a record, you set the record status. You do not need to edit a record to set the status.
1. Click Edit in the Data view.
2. Select a record.
3. Open the Record Actions menu and select the status you want to apply to the record.
To indicate that a record contains correct business information and can remain in the database, select
Accept Record.
To indicate that a record does not contain usable information and can be deleted from the database, select
Reject Record.
To indicate that the record needs further processing before it can be returned to the database, select
Reprocess Record.
You can use the Clear Record Status option on the Record Actions menu to clear the status you set.
Reviewing Exception Records
When you review the output of a task that corrects exceptions, you validate the status of each record. The status
determines how the records are treated in the next stage of the workflow. The review task ends when you review
all records and set the task status.
Perform the following steps for all records in the task:
1. Open the task in the Data view.
2. Verify that the status of each record represents the information in the record.
For example, a record may be marked for deletion from the database, but you may decide that the record
contains usable information. You update the status so that the record is reprocessed and not deleted.
If a record is marked for storage in the database but needs additional work, click Edit and clear the record
status.
If you identify an error in the record, click Edit and update the record. When you are satisfied that the
record is correct, you can update the record status.
3. After you review all records, set the task status.
CHAPTER 4
Duplicate Records
This chapter includes the following topics:
Duplicate Records Overview
Data View for Duplicate Records
Duplicate Record Correction
Duplicate Record Review
Duplicate Record Actions and Status
Duplicate Record Example
Filtering Duplicate Tasks in the Data View
Editing a Cluster in a Duplicate Task
Creating a Cluster
Finding Duplicate Records in Multiple Clusters
Setting the Status of a Cluster
Reviewing Duplicate Records
Duplicate Records Overview
When you correct duplicate records, you edit the contents of one or more clusters. A cluster contains records that
may or may not be duplicates of each other.
If two or more records are duplicates, you consolidate the records into a single record called the preferred record.
If a record is not a duplicate of another record in the cluster, you remove it from the cluster. You can move a
record from one cluster to another. You can create a cluster with a single record if the record is unique. The task is
complete when you edit all clusters.
A cluster contains two or more records. Each cluster identifies a preferred record. The preferred record contains
the most accurate representation of the information in the cluster. By default, Informatica Data Director for Data
Quality selects the first record in the cluster as the preferred record. To update a cluster, replace data values in
the preferred record with more accurate values from any other record in the cluster. When the preferred record is
complete, mark the cluster as reviewed and start work on the next cluster in the task.
Note: Two or more records are duplicates when they contain the same business information. Records can contain
similar data but not represent the same information to the business.
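The consolidation step can be modeled as copying selected values into the preferred record. The Python sketch below is a simplified illustration. The function name `consolidate`, the `overrides` mapping, and the sample data are hypothetical; only the default of using the first record as the preferred record comes from the text above.

```python
def consolidate(cluster, preferred_index=0, overrides=None):
    """Build the preferred record for a cluster.

    `overrides` maps a column name to the index of the record whose value
    should replace the value in the preferred record.
    """
    preferred = dict(cluster[preferred_index])  # copy, so the source records stay unchanged
    for column, source_index in (overrides or {}).items():
        preferred[column] = cluster[source_index][column]
    return preferred

# Hypothetical cluster: the second record has a fuller name and a phone number.
cluster = [
    {"name": "J. Smith", "city": "Dublin", "phone": ""},
    {"name": "John Smith", "city": "Dublin", "phone": "555-0100"},
]
preferred = consolidate(cluster, overrides={"name": 1, "phone": 1})
```

Which value counts as more accurate remains a judgment for the person who works on the task; the sketch only records the outcome of that decision.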
Perform the following actions to correct duplicates:
Examine and edit the cluster records.
When you open a task, the Data view displays the records in the first cluster and nominates a record as the
preferred record. Examine the values in each record in the cluster. If you find values that contain more
accurate information than the preferred record values, replace the preferred record values.
Find records in other clusters.
If you expect that the task data set contains duplicate records across more than one cluster, search for
records in other clusters and display the clusters together onscreen. If duplicate records exist across the
clusters, move records from one cluster to another.
Create a cluster, and identify unique records.
A cluster may contain records that you can consolidate into two unique preferred records. In this case, create
a cluster and define a preferred record for each cluster.
A cluster may contain a record that is not a duplicate of any other record in the cluster. In this case, create a
cluster and add the record to it. The new cluster contains a single record.
Identify redundant records.
When you complete the preferred record in a cluster, mark the cluster as reviewed. Informatica Data Director
for Data Quality marks the preferred record for storage in the database table and marks the remaining records
as redundant. The database deletes the redundant records in a later stage of the workflow.
Change the status of the cluster.
You can mark a cluster as reviewed, and you can undo the status that you set for a cluster. For example, if you
create a preferred record, but later you change your mind, you can clear the status.
Complete the task.
To indicate that you completed work on the task, use the Task Actions menu options. The options determine
the possible next steps for the task data. The workflow defines the options.
Data View for Duplicate Records
The Data view shows the clusters in the task and provides a set of options you can use to complete the task.
When you open a duplicate task, the Data view organizes the clusters on a series of tabs. Each tab displays the
records in a single cluster.
Use the Customize Table option to organize the data columns in the Data view. Use the Filter option to filter the
clusters that appear in the Data view.
The following table describes the options on the Data view:
Option Description
Customize Table Selects the table columns to display.
Filter Filters the clusters that display in the Data view. You can use
filters to view clusters with a specified status or to find records
that contain specified values.
Edit Enables edit features for a cluster. You must click Edit before
you make changes to a row.
Undo Reverses the last change you made in a record.
Redo Makes the change to a record that you previously reversed.
Task Actions Sets the status of the task. The workflow uses the Task
Actions setting to determine the next step for the task. Select
a task action when all clusters in the task are ready for the
next stage in the workflow.
Cluster Actions Sets the status of a cluster. Set the status when you finish
work on the cluster. You can also undo or redo any update to
the cluster you select.
Duplicate Task Filters
You can sort the clusters that appear in the Data view. You can also filter the view to display only the clusters that
contain a data value that you specify.
Use the Filter options to filter and sort the clusters. You sort the clusters by the status assigned to each cluster by
the user who worked on the task.
The following table describes the status options:
Option Description
Accepted Identifies clusters marked as accepted for storage in the database.
Rejected Identifies clusters marked as unsuitable for storage in the database.
Reviewed Identifies clusters that are marked as reviewed.
Duplicate Record Correction
The clusters in a correct duplicate task contain records that may contain duplicate information.
When you work in a correct duplicates task, complete one or more of the following steps:
Verify that the records in the cluster represent different versions of the same record.
If the cluster records are duplicates of a single record, select the most accurate data values in each record and
add the values to the preferred record.
If the cluster contains information from more than one record, create a cluster for each unique record and
configure a preferred record in each cluster.
If the cluster contains a record that does not match the other records, create a cluster and add the record to the
cluster. The new cluster contains a single preferred record.
Find records in other clusters that may be duplicates of a record in the current cluster. If you believe that
duplicate records exist across more than one cluster, you can search the task data set for records that match
data values you specify.
Update the status of a cluster. The status you set determines how the workflow processes the preferred record.
Set the task status. Select the status that best represents the current state of the cluster data. Set the task
status when you finish work on the task.
Duplicate Record Review
When you perform a task to review the records in a cluster, you validate the work done by another user in an
earlier task.
The steps to complete the review task are similar to the steps performed in the task that identified the preferred
record data. In a review task, you verify the work performed in the earlier task.
Note: You can find both types of task in your Inbox.
When you review cluster data, examine the preferred record defined by the previous user and the other records in
the cluster. Consider the following questions:
Does the preferred record represent the most accurate version of the records in the cluster? Update the
preferred record if you find more accurate data in another record in the cluster.
Do the other records in the cluster include any record that the business may require? Create a cluster and add
non-redundant records to the new cluster. Then define a preferred record in the new cluster.
Duplicate Record Actions and Status
You use the Cluster Actions menu options to configure one or more preferred records and to set the status of
one or more clusters. When you set the cluster status, you update a metadata field that other users or processes
can use after the task completes.
The Cluster Actions menu displays the following options:
Menu Option Description
Find Cluster(s) Finds records in other clusters that contain data values you
specify.
Create Cluster Creates an empty cluster below the current cluster in the Data
view.
Use the Move Record option to move a record between clusters.
Confirm Cluster Review Correct tasks only. Confirms that you have reviewed the
cluster and do not plan further changes.
Clear Cluster Status Clears the status indicator for the cluster.
Export Data Exports all cluster and task data from the task in a delimited
file.
View Audit Trail Opens the audit trail view for the task.
Close Closes the task.
View Comment Review tasks only. Opens any comment added to the cluster.
Mark as Accepted Review tasks only. Adds a value to the cluster to indicate that
you accept the cluster update operation performed in an earlier task.
Mark as Rejected Review tasks only. Adds a value to the cluster to indicate that
you reject the cluster update operation performed in an earlier
task.
Note: When you mark a cluster as accepted or rejected in a review task, you add a status indicator to the cluster.
The status update does not update the data in the cluster records. The next steps in the workflow or data project
determine any further update of the cluster records.
Duplicate Record Example
As a data steward in a retail organization, you are concerned that the customer account data includes duplicate
records.
A member of your team uses the Developer tool to evaluate the levels of duplication in the customer account data.
The developer creates a workflow with a Mapping task and a Human task. The Mapping task sorts the records into
clusters according to the levels of similarity between them. Some records are similar but non-identical, and the
mapping cannot determine if the records are genuine duplicates. The workflow passes the unresolved records to
the Human task, and the Human task creates task instances for Informatica Data Director for Data Quality users.
When you open a task in Informatica Data Director for Data Quality, the Data view displays the clusters assigned
to you. You review the records in the cluster and identify the data values that most accurately identify the
customer. Each cluster includes a default preferred record. If the preferred record does not contain the most
accurate data values, update the preferred record with the values from other records in the cluster. You must also
determine if the records in the cluster represent one or more customer accounts.
Consider the following questions when you examine the data:
What type of business rules apply to the data? For example, can a table contain more than one account record
for a customer? The answers can help you determine the primary key columns in the data. Primary key
columns must contain unique values.
By default, the task selects the first record in every cluster as the preferred record. Do you need to update the
current preferred record? Update column values one by one to build the preferred version of the account
information.
Do you need to search outside the current cluster for duplicate records? You can search the task for records
that share data values with the current cluster. You can add records from one cluster to another cluster. You
can create a cluster with one or more records from the current cluster.
Do you want to edit the data in a record? You may want to edit a record if you cannot create a preferred record
that you think is valid. However, you cannot edit data in a cluster. A workflow does not evaluate the accuracy of
a record when it adds it to a cluster. The purpose of the task is to define a single record that best represents
the information in the data set.
When you finish work on the cluster, you open the Cluster Actions menu and set the cluster status to Confirm
Cluster Review. You update the cluster status whether you edit the preferred record or not.
Use the Task Actions menu to set the task status when you complete work on the task. You can set the task
status at any time.
The workflow defines the task status options that you see in the task. The task options indicate the paths that the
records can take through the workflow. Select the task status that represents the overall state of the records in the
task and the type of data operation that the records now require. The workflow may define a single task status that
moves the records to the next task in the workflow.
Filtering Duplicate Tasks in the Data View
Use the Filter option to filter the duplicate clusters that display in the Data view.
By default, the task does not apply any filter.
1. Open a correct duplicate or review duplicate task.
2. In the Data view, click Filter. The Filter dialog box opens.
3. Enter a data value to use as a column filter. You select a data column and you enter a value that must occur
in the column.
When you apply the filter, the Data view displays the clusters in which one or more records contain the data
value in the column you specify.
4. Use the Up and Down arrows to sort the clusters by status.
If you do not sort the clusters, the Data view displays the clusters in numerical order by cluster ID.
5. Click Apply to apply the filter to the clusters in the task.
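The filter and sort behavior in the steps above can be modeled outside the product. The following Python sketch is illustrative only: the Cluster class, the filter_clusters function, and the sample data are hypothetical and are not part of Informatica Data Director for Data Quality. It assumes that a column filter keeps a cluster when at least one record holds the value, and that unsorted results appear in cluster ID order.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for a cluster shown in the Data view."""
    cluster_id: int
    status: str  # "", "Accepted", "Rejected", or "Reviewed"
    records: list = field(default_factory=list)  # each record: dict of column -> value


def filter_clusters(clusters, column=None, value=None, sort_by_status=False):
    """Keep clusters in which one or more records contain `value` in `column`."""
    if column is not None:
        clusters = [c for c in clusters
                    if any(r.get(column) == value for r in c.records)]
    if sort_by_status:
        # Sort by status first, then by cluster ID within each status.
        return sorted(clusters, key=lambda c: (c.status, c.cluster_id))
    # Default order: numerical order by cluster ID.
    return sorted(clusters, key=lambda c: c.cluster_id)


clusters = [
    Cluster(2, "Reviewed", [{"city": "Dublin"}, {"city": "Cork"}]),
    Cluster(1, "", [{"city": "Dublin"}]),
    Cluster(3, "Accepted", [{"city": "Galway"}]),
]
dublin = filter_clusters(clusters, column="city", value="Dublin")
print([c.cluster_id for c in dublin])  # [1, 2]
```

Cluster 2 passes the filter even though only one of its two records matches, which mirrors the rule stated in step 3.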
Editing a Cluster in a Duplicate Task
When you open a task that contains clusters, the clusters in the task appear on a series of tabs under the Data
view. The first cluster is open.
The cluster identifies the first record in the cluster as the preferred record by default. Examine the records in the
cluster and select any data value that you want to add to the preferred record. You can select values from multiple
records.
1. Compare the preferred record with the other records in the cluster.
Identify the most accurate values in each column in the cluster.
2. Click Edit in the Data view.
3. Click a value in a record to add that value to the preferred record.
Repeat the steps for all values that you want to add to the preferred record. When you complete work in a cluster,
update the cluster status.
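As a rough illustration of the procedure above, the following Python sketch builds a preferred record by copying selected column values from other records in the cluster. The build_preferred function and the sample data are hypothetical; the sketch assumes only what the text states, namely that the first record is the default preferred record and that values can come from multiple records.

```python
def build_preferred(records, choices):
    """Return a preferred record built from a cluster.

    records: list of dicts; the first record is the default preferred record.
    choices: dict of column -> index of the record whose value to copy.
    """
    preferred = dict(records[0])  # copy, so the source record is not changed
    for column, source_index in choices.items():
        preferred[column] = records[source_index][column]
    return preferred


cluster = [
    {"name": "J. Murphy", "phone": "555-0100", "city": "Dublin"},
    {"name": "John Murphy", "phone": "", "city": "Dublin"},
    {"name": "John Murphy", "phone": "555-0100", "city": "Dublin 4"},
]
# Take the name from the second record and the city from the third record.
preferred = build_preferred(cluster, {"name": 1, "city": 2})
print(preferred)
```

The phone value stays as it was in the default preferred record because no other record was selected for that column.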
Creating a Cluster
Create a cluster when the current cluster contains information that identifies more than one non-duplicate record.
When you create a cluster, verify the preferred record in the new cluster.
1. On the Cluster Actions menu, select Create Cluster.
The new cluster appears in the Data view below the current cluster.
2. Select a record to add to the new cluster.
3. Click Move Record.
The record becomes the preferred record in the new cluster.
4. Move any other record that matches the preferred record in the new cluster.
If the new cluster contains a single record, the task treats the preferred record as a unique record.
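The steps above can be sketched in Python as follows. The create_cluster, move_record, and preferred_record functions are hypothetical names used for illustration. The sketch assumes that each cluster is an ordered list of records and that the first record moved into an empty cluster becomes its preferred record.

```python
def create_cluster(clusters, current_index):
    """Insert an empty cluster directly below the current cluster."""
    new_index = current_index + 1
    clusters.insert(new_index, [])
    return new_index


def move_record(clusters, source_index, target_index, record):
    """Move a record from one cluster to another."""
    clusters[source_index].remove(record)
    clusters[target_index].append(record)


def preferred_record(cluster):
    """Assumption: the first record in a cluster is its preferred record."""
    return cluster[0] if cluster else None


smith = {"name": "Ann Smith"}
smyth = {"name": "Ann Smyth"}
jones = {"name": "Bob Jones"}  # does not match the other records
clusters = [[smith, smyth, jones]]

new_index = create_cluster(clusters, 0)
move_record(clusters, 0, new_index, jones)
print(preferred_record(clusters[new_index]))  # {'name': 'Bob Jones'}
```

The new cluster holds a single record, so in the terms of this section the task would treat that record as a unique record.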
Finding Duplicate Records in Multiple Clusters
Use the Find Cluster(s) option to find records in other clusters that may match records in the current cluster. You
specify a data value to search for and the record column that must store the data value.
1. In the Data view, select Find Cluster(s).
The Find dialog box opens.
2. Enter the data value you want to find. You enter the full data value as it appears in the record column, or you
enter a wildcard value, such as an asterisk.
3. Select the column that contains the data value to search for.
4. Click Find.
The search operation returns all records that contain the value in the column you specify.
5. Select any record in the search results that matches a record in the open cluster. You can use the CTRL key
to select multiple records.
The Data view displays the clusters that contain the records you select. You can use the Move Record option
to move a record from one cluster to the other.
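The search in the steps above resembles a column lookup with optional wildcards. The following Python sketch is one way to picture it. The find_records function and the sample data are hypothetical, and the use of the standard fnmatch module is an assumption: this guide states only that an asterisk can serve as a wildcard and does not define the full matching rules of the product.

```python
from fnmatch import fnmatchcase


def find_records(clusters, column, pattern):
    """Return (cluster_id, record) pairs whose `column` value matches `pattern`.

    clusters: dict of cluster_id -> list of record dicts.
    pattern: a full value, or a value that contains the * wildcard.
    """
    return [(cluster_id, record)
            for cluster_id, records in clusters.items()
            for record in records
            if fnmatchcase(str(record.get(column, "")), pattern)]


clusters = {
    1: [{"name": "Jon Smith"}, {"name": "John Smith"}],
    2: [{"name": "J. Smith"}, {"name": "Ann Lee"}],
}
matches = find_records(clusters, "name", "J*Smith")
print([cluster_id for cluster_id, _ in matches])  # [1, 1, 2]
```

Note that fnmatchcase also treats ? and [ ] as special characters, which may differ from the behavior of the Find dialog box.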
Setting the Status of a Cluster
When you complete work on a cluster, you set the cluster status. You do not need to update the preferred record
in the cluster before you set the status.
1. Click Edit in the Data view.
2. Open the Record Actions menu and select the Reviewed option.
You can use the Clear Cluster Status option on the Cluster Actions menu to clear the status you set.
Reviewing Duplicate Records
When you review the output of a task that corrected duplicate records, you validate that the preferred records
represent the best version of the data in the clusters. You review one cluster at a time. The review task ends when
you review all clusters and set the task status.
Perform the following steps for all clusters in the task:
1. Open a cluster in the Data view.
2. Compare the preferred record with the other records in the cluster.
3. Verify that the preferred record contains the most accurate version of the data in the cluster.
If a cluster is classified as reviewed but needs additional work, click Edit and clear the cluster status.
If you identify an error in the preferred record, click Edit and update the preferred record. When you are
satisfied that the cluster is correct, you can update the cluster status.
After you review the records in all clusters, you can set the task status.
CHAPTER 5
Audit Trail Operations
This chapter includes the following topics:
Audit Trail Operations Overview
Audit Trail Data
Audit View Filters
Audit View
Filtering Records in the Audit View
Audit Trail Operations Overview
Informatica Data Director for Data Quality stores audit trail data for all updates made in a task. Use the audit trail
data to review the changes to the task data.
You can perform the following operations on an audit trail:
View the list of task updates since the task was created.
Filter the audit trail by date, user name, and by type of update.
Add or remove columns from the audit trail view.
When a user edits a value in a record, the audit trail adds an edit tool icon to the value. Place the cursor over the
icon to see the earlier state of the value.
Note: When you view the audit trail for a duplicate task, the audit trail lists the records that users updated in the
task. The audit trail does not display cluster data.
Audit Trail Data
Each row in the audit trail represents a single data update. If you make multiple updates to a record, the audit trail
adds an entry for each update. The audit trail organizes record updates in chronological order. If a task contains
no updates, the audit trail is empty.
An audit trail displays the record data columns that you can edit in the task. In addition, an audit trail displays
metadata columns that identify the user who updated the task and the date and type of update.
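The audit trail behavior described above, with one row for each update listed in chronological order, can be pictured with a small Python sketch. The AuditEntry and AuditTrail classes are hypothetical and do not reflect how the product stores audit data. The old_value field stands in for the earlier state of a value that the edit icon displays.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AuditEntry:
    """One row in the audit trail: a single data update."""
    updated_by: str
    updated: datetime
    column: str
    old_value: str
    new_value: str


class AuditTrail:
    def __init__(self):
        self.entries = []  # empty when the task contains no updates

    def record(self, user, when, column, old_value, new_value):
        """Add one entry for each update, even to the same record."""
        self.entries.append(AuditEntry(user, when, column, old_value, new_value))

    def rows(self):
        """Return the entries in chronological order."""
        return sorted(self.entries, key=lambda entry: entry.updated)


trail = AuditTrail()
trail.record("bsmith", datetime(2012, 12, 3, 14, 30), "city", "Dublin", "Dublin 4")
trail.record("ajones", datetime(2012, 12, 3, 9, 15), "name", "J. Murphy", "John Murphy")
print([entry.updated_by for entry in trail.rows()])  # ['ajones', 'bsmith']
```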
The following table describes the metadata columns in an audit trail:
Column Name Description
Updated By The user who updated the record.
Updated The date of the record update.
Comment Any comment added by a user.
Status Any status update made by a user.
Review Any review status set by a user.
Use the Customize Table option to organize the data columns that display in the Audit view.
Audit View Filters
You can use the Filter option to filter the records that display in the Audit view.
The following table describes the filter options:
Option Description
From, To The date range for the updates you want to view.
User The user who performed the updates you want to view.
Status The status of the record in the current task, based on the data quality of the record.
Review The review status of the record in the current task. You can
choose from the following review options:
- Reviewed. Records that are marked as reviewed.
- Empty. Records that are not yet reviewed.
- Cleared. Records in a previously reviewed state that a user
updated to unreviewed.
Status Filter Options
The status options you can use in the Audit view depend on the type of task you open.
The following table describes the status options you can set as filters in the Audit view:
Status                Task Type             Description
Accepted              Exception             Records accepted for storage in the database.
Cleared               Exception             Records with a status update that a user deleted.
Empty                 Duplicate, Exception  Records with no status update.
Moved into cluster    Duplicate             Records that moved into the specified cluster.
Moved out of cluster  Duplicate             Records that were moved out of the specified cluster.
Rejected              Exception             Records rejected as unsuitable for storage in the database.
Reprocessed           Exception             Records that need further analysis in another application.
Audit View
An audit trail opens in the Audit view. You can open an audit trail from the Data view or from the Inbox.
When you open an audit trail from the Data view, the audit trail displays the user updates in the current task.
When you open an audit trail from the Inbox, the audit trail displays user updates for the task you select.
Opening an Audit Trail from the Data View
1. Open a task in the Data view.
2. In an exception task, select Record Actions.
In a duplicate task, select Cluster Actions.
3. Click View Audit Trail.
The audit data displays in the Audit view.
Opening an Audit Trail from the Inbox
1. Select the Tasks view, and open the Inbox.
2. Select a task.
3. Click View Audit Trail.
The audit data displays in the Audit view.
Filtering Records in the Audit View
Use the Filter option to filter the records that display in the Audit view.
By default, the audit trail does not apply any filter.
1. Open a task in the Audit view, and click Filter. The Filter dialog box opens.
2. Select the filter criteria to apply to the task data.
3. Click Apply to apply the filter to the records in the Audit view.
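The filter criteria in this chapter (date range, user, status, and review status) combine naturally as a set of optional conditions. The following Python sketch shows one way to express that. The filter_audit function and the row layout are hypothetical, and the sketch assumes that the From and To dates are inclusive and that any criterion left unset is ignored.

```python
from datetime import date


def filter_audit(rows, date_from=None, date_to=None,
                 user=None, status=None, review=None):
    """Return the audit rows that satisfy every criterion that is set."""
    def keep(row):
        if date_from is not None and row["updated"] < date_from:
            return False
        if date_to is not None and row["updated"] > date_to:
            return False
        if user is not None and row["updated_by"] != user:
            return False
        if status is not None and row["status"] != status:
            return False
        if review is not None and row["review"] != review:
            return False
        return True

    return [row for row in rows if keep(row)]


rows = [
    {"updated_by": "ajones", "updated": date(2012, 12, 1),
     "status": "Moved into cluster", "review": "Reviewed"},
    {"updated_by": "bsmith", "updated": date(2012, 12, 3),
     "status": "Empty", "review": "Empty"},
    {"updated_by": "ajones", "updated": date(2012, 12, 5),
     "status": "Moved out of cluster", "review": "Cleared"},
]
recent = filter_audit(rows, date_from=date(2012, 12, 2), user="ajones")
print(len(recent))  # 1
```

With no criteria set, the function returns every row, which matches the default state in which the audit trail does not apply any filter.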