DQ 951HF2 UserGuide En

  • 8/11/2019 DQ 951HF2 UserGuide En

    Informatica Data Quality (Version 9.5.1 HotFix 2)

    User Guide

    Informatica Data Quality User Guide

    Version 9.5.1 HotFix 2, June 2013

    Copyright (c) 2009-2013 Informatica Corporation. All rights reserved.

    This software and documentation contain proprietary information of Informatica Corporation and are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. This Software may be protected by U.S. and/or international Patents and other Patents Pending.

    Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement and as provided in DFARS 227.7202-1(a) and 227.7702-3(a) (1995), DFARS 252.227-7013 (1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable.

    The information in this product or documentation is subject to change without notice. If you find any problems in this product or documentation, please report them to us in writing.

    Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange, PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange, Informatica On Demand, Informatica Identity Resolution, Informatica Application Information Lifecycle Management, Informatica Complex Event Processing, Ultra Messaging and Informatica Master Data Management are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.

    Portions of this software and/or documentation are subject to copyright held by third parties, including without limitation: Copyright DataDirect Technologies. All rights reserved. Copyright Sun Microsystems. All rights reserved. Copyright RSA Security Inc. All Rights Reserved. Copyright Ordinal Technology Corp. All rights reserved. Copyright Aandacht c.v. All rights reserved. Copyright Genivia, Inc. All rights reserved. Copyright Isomorphic Software. All rights reserved. Copyright Meta Integration Technology, Inc. All rights reserved. Copyright Intalio. All rights reserved. Copyright Oracle. All rights reserved. Copyright Adobe Systems Incorporated. All rights reserved. Copyright DataAInc. All rights reserved. Copyright ComponentSource. All rights reserved. Copyright Microsoft Corporation. All rights reserved. Copyright Rogue Wave Software, Inc. All rights reserved. Copyright Teradata Corporation. All rights reserved. Copyright Yahoo! Inc. All rights reserved. Copyright Glyph & Cog, LLC. All rights reserved. Copyright Thinkmap, Inc. All rights reserved. Copyright Clearpace Software Limited. All rights reserved. Copyright Information Builders, Inc. All rights reserved. Copyright OSS Nokalva, Inc. All rights reserved. Copyright Edifecs, Inc. All rights reserved. Copyright Cleo Communications, Inc. All rights reserved. Copyright International Organization for Standardization 1986. All rights reserved. Copyright ej-technologies GmbH. All rights reserved. Copyright Jaspersoft Corporation. All rights reserved. Copyright International Business Machines Corporation. All rights reserved. Copyright yWorks GmbH. All rights reserved. Copyright Lucent Technologies. All rights reserved. Copyright (c) University of Toronto. All rights reserved. Copyright Daniel Veillard. All rights reserved. Copyright Unicode, Inc. Copyright IBM Corp. All rights reserved. Copyright MicroQuill Software Publishing, Inc. All rights reserved. Copyright PassMark Software Pty Ltd. All rights reserved. Copyright LogiXML, Inc. All rights reserved. Copyright 2003-2010 Lorenzi Davide, All rights reserved. Copyright Red Hat, Inc. All rights reserved. Copyright The Board of Trustees of the Leland Stanford Junior University. All rights reserved. Copyright EMC Corporation. All rights reserved. Copyright Flexera Software. All rights reserved. Copyright Jinfonet Software. All rights reserved. Copyright ApInc. All rights reserved. Copyright Telerik Inc. All rights reserved. Copyright BEA Systems. All rights reserved.

    This product includes software developed by the Apache Software Foundation (http://www.apache.org/), and/or other software which is licensed under various versions of the Apache License (the "License"). You may obtain a copy of these Licenses at http://www.apache.org/licenses/. Unless required by applicable law or agreed to in writing, software distributed under these Licenses is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the Licenses for the specific language governing permissions and limitations under the Licenses.

    This product includes software which was developed by Mozilla (http://www.mozilla.org/), software copyright The JBoss Group, LLC, all rights reserved; software copyright 1999-2006 by Bruno Lowagie and Paulo Soares and other software which is licensed under various versions of the GNU Lesser General Public License Agreement, which may be found at http://www.gnu.org/licenses/lgpl.html. The materials are provided free of charge by Informatica, "as-is", without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose.

    The product includes ACE(TM) and TAO(TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University, University of California, Irvine, and Vanderbilt University, Copyright (c) 1993-2006, all rights reserved.

    This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit (copyright The OpenSSL Project. All Rights Reserved) and redistribution of this software is subject to terms available at http://www.openssl.org and http://www.openssl.org/source/license.html.

    This product includes Curl software which is Copyright 1996-2007, Daniel Stenberg, . All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://curl.haxx.se/docs/copyright.html. Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.

    The product includes software copyright 2001-2005 (c) MetaStuff, Ltd. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://www.dom4j.org/license.html.

    The product includes software copyright 2004-2007, The Dojo Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://dojotoolkit.org/license.

    This product includes ICU software which is copyright International Business Machines Corporation and others. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://source.icu-project.org/repos/icu/icu/trunk/license.html.

    This product includes software copyright 1996-2006 Per Bothner. All rights reserved. Your right to use such materials is set forth in the license which may be found at http://www.gnu.org/software/kawa/Software-License.html.

    This product includes OSSP UUID software which is Copyright 2002 Ralf S. Engelschall, Copyright 2002 The OSSP Project, Copyright 2002 Cable & Wireless Deutschland. Permissions and limitations regarding this software are subject to terms available at http://www.opensource.org/licenses/mit-license.php.

    This product includes software developed by Boost (http://www.boost.org/) or under the Boost software license. Permissions and limitations regarding this software are subject to terms available at http://www.boost.org/LICENSE_1_0.txt.

    This product includes software copyright 1997-2007 University of Cambridge. Permissions and limitations regarding this software are subject to terms available at http://www.pcre.org/license.txt.

    This product includes software copyright 2007 The Eclipse Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://www.eclipse.org/org/documents/epl-v10.php.

    This product includes software licensed under the terms at http://www.tcl.tk/software/tcltk/license.html, http://www.bosrup.com/web/overlib/?License, http://www.stlport.org/doc/license.html, http://asm.ow2.org/license.html, http://www.cryptix.org/LICENSE.TXT, http://hsqldb.org/web/hsqlLicense.html, http://httpunit.sourceforge.net/doc/license.html, http://jung.sourceforge.net/license.txt, http://www.gzip.org/zlib/zlib_license.html, http://www.openldap.org/software/release/license.html, http://www.libssh2.org, http://slf4j.org/license.html, http://www.sente.ch/software/OpenSourceLicense.html, http://fusesource.com/downloads/license-agreements/fuse-message-broker-v-5-3-license-agreement; http://antlr.org/license.html; http://aopalliance.sourceforge.net/; http://www.bouncycastle.org/licence.html; http://www.jgraph.com/jgraphdownload.html; http://www.jcraft.com/jsch/LICENSE.txt; http://jotm.objectweb.org/bsd_license.html; http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231; http://www.slf4j.org/license.html; http://nanoxml.sourceforge.net/orig/copyright.html; http://www.json.org/license.html; http://forge.ow2.org/projects/javaservice/, http://www.postgresql.org/about/licence.html, http://www.sqlite.org/copyright.html, http://www.tcl.tk/software/tcltk/license.html, http://www.jaxen.org/faq.html, http://www.jdom.org/docs/faq.html, http://www.slf4j.org/license.html; http://www.iodbc.org/dataspace/iodbc/wiki/iODBC/License; http://www.keplerproject.org/md5/license.html; http://www.toedter.com/en/jcalendar/license.html; http://www.edankert.com/bounce/index.html; http://www.net-snmp.org/about/license.html; http://www.openmdx.org/#FAQ; http://www.php.net/license/3_01.txt; http://srp.stanford.edu/license.txt; http://www.schneier.com/blowfish.html; http://www.jmock.org/license.html; http://xsom.java.net; and http://benalman.com/about/license/; https://github.com/CreateJS/EaselJS/blob/master/src/easeljs/display/Bitmap.js; http://www.h2database.com/html/license.html#summary; and http://jsoncpp.sourceforge.net/LICENSE.

    This product includes software licensed under the Academic Free License (http://www.opensource.org/licenses/afl-3.0.php), the Common Development and Distribution License (http://www.opensource.org/licenses/cddl1.php), the Common Public License (http://www.opensource.org/licenses/cpl1.0.php), the Sun Binary Code License Agreement Supplemental License Terms, the BSD License (http://www.opensource.org/licenses/bsd-license.php), the MIT License (http://www.opensource.org/licenses/mit-license.php) and the Artistic License (http://www.opensource.org/licenses/artistic-license-1.0).

    This product includes software copyright 2003-2006 Joe Walnes, 2006-2007 XStream Committers. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://xstream.codehaus.org/license.html. This product includes software developed by the Indiana University Extreme! Lab. For further information please visit http://www.extreme.indiana.edu/.

    This Software is protected by U.S. Patent Numbers 5,794,246; 6,014,670; 6,016,501; 6,029,178; 6,032,158; 6,035,307; 6,044,374; 6,092,086; 6,208,990; 6,339,775; 6,640,226; 6,789,096; 6,820,077; 6,823,373; 6,850,947; 6,895,471; 7,117,215; 7,162,643; 7,243,110; 7,254,590; 7,281,001; 7,421,458; 7,496,588; 7,523,121; 7,584,422; 7,676,516; 7,720,842; 7,721,270; and 7,774,791, international Patents and other Patents Pending.

    DISCLAIMER: Informatica Corporation provides this documentation "as is" without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of noninfringement, merchantability, or use for a particular purpose. Informatica Corporation does not warrant that this software or documentation is error free. The information provided in this software or documentation may include technical inaccuracies or typographical errors. The information in this software and documentation is subject to change at any time without notice.

    NOTICES

    This Informatica product (the "Software") includes certain drivers (the "DataDirect Drivers") from DataDirect Technologies, an operating company of Progress Software Corporation ("DataDirect") which are subject to the following terms and conditions:

    1. THE DATADIRECT DRIVERS ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.

    2. IN NO EVENT WILL DATADIRECT OR ITS THIRD PARTY SUPPLIERS BE LIABLE TO THE END-USER CUSTOMER FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, CONSEQUENTIAL OR OTHER DAMAGES ARISING OUT OF THE USE OF THE ODBC DRIVERS, WHETHER OR NOT INFORMED OF THE POSSIBILITIES OF DAMAGES IN ADVANCE. THESE LIMITATIONS APPLY TO ALL CAUSES OF ACTION, INCLUDING, WITHOUT LIMITATION, BREACH OF CONTRACT, BREACH OF WARRANTY, NEGLIGENCE, STRICT LIABILITY, MISREPRESENTATION AND OTHER TORTS.

    Part Number: DQ-UG-95100-HF2-0001

    Table of Contents

    Preface

    Informatica Resources

    Informatica My Support Portal

    Informatica Documentation

    Informatica Web Site

    Informatica How-To Library

    Informatica Knowledge Base

    Informatica Support YouTube Channel

    Informatica Marketplace

    Informatica Velocity

    Informatica Global Customer Support

    Part I: Informatica Data Quality Concepts

    Chapter 1: Introduction to Data Quality

    Data Quality Overview

    Chapter 2: Reference Data

    Reference Data Overview

    User-Defined Reference Data

    Informatica Reference Data

    Reference Data and Transformations

    Reference Tables

    Reference Table Structure

    Managed and Unmanaged Reference Tables

    Content Sets

    Character Sets

    Classifier Models

    Pattern Sets

    Probabilistic Models

    Regular Expressions

    Token Sets

    Creating a Content Set

    Creating a Reusable Content Expression

    Generating Reference Data from a Midstream Profile

    Chapter 3: Classifier Models

    Classifier Models Overview

    Part II: Data Quality Features in Informatica Developer

    Chapter 5: Column Profiles in Informatica Developer

    Column Profile Concepts Overview

    Column Profile Options

    Rules

    Scorecards

    Column Profiles in Informatica Developer

    Filtering Options

    Sampling Properties

    Creating a Single Data Object Profile

    Chapter 6: Column Profile Results in Informatica Developer

    Column Profile Results in Informatica Developer

    Column Value Properties

    Column Pattern Properties

    Column Statistics Properties

    Exporting Profile Results from Informatica Developer

    Chapter 7: Rules in Informatica Developer

    Rules in Informatica Developer Overview

    Creating a Rule in Informatica Developer

    Applying a Rule in Informatica Developer

    Chapter 8: Scorecards in Informatica Developer

    Scorecards in Informatica Developer Overview

    Creating a Scorecard

    Exporting a Resource File for Scorecard Lineage

    Viewing Scorecard Lineage from Informatica Developer

    Chapter 9: Mapplet and Mapping Profiling

    Mapplet and Mapping Profiling Overview

    Running a Profile on a Mapplet or Mapping Object

    Comparing Profiles for Mapping or Mapplet Objects

    Generating a Mapping from a Profile

    Chapter 10: Reference Tables

    Reference Tables Overview

    Reference Table Data Properties

    Creating a Reference Table Object

    Creating a Reference Table from a Flat File

    Creating a Reference Table from a Relational Source

    Copying a Reference Table in the Model Repository

    Editing Reference Table Data

    Finding Data Values in a Reference Table

    Part III: Data Quality Features in Informatica Analyst

    Chapter 11: Column Profiles in Informatica Analyst

    Column Profiles in Informatica Analyst Overview

    Column Profiling Process

    Profile Options

    Profile Results Option

    Sampling Options

    Drilldown Options

    Creating a Column Profile in the Analyst Tool

    Editing a Column Profile

    Running a Profile

    Creating a Filter

    Managing Filters

    Synchronizing a Flat File Data Object

    Synchronizing a Relational Data Object

    Chapter 12: Column Profile Results in Informatica Analyst

    Column Profile Results in Informatica Analyst Overview

    Profile Summary

    Column Values

    Column Patterns

    Column Statistics

    Column Profile Drilldown

    Drilling Down on Row Data

    Applying Filters to Drilldown Data

    Column Profile Export Files in Informatica Analyst

    Profile Export Results in a CSV File

    Profile Export Results in Microsoft Excel

    Exporting Profile Results from Informatica Analyst

    Chapter 13: Rules in Informatica Analyst

    Rules in Informatica Analyst Overview

    Predefined Rules

    Predefined Rules Process

    Applying a Predefined Rule

    Expression Rules

    Expression Rules Process

    Creating an Expression Rule

    Chapter 14: Scorecards in Informatica Analyst

    Scorecards in Informatica Analyst Overview

    Informatica Analyst Scorecard Process

    Metrics

    Metric Weights

    Adding Columns to a Scorecard

    Running a Scorecard

    Viewing a Scorecard

    Editing a Scorecard

    Defining Thresholds

    Metric Groups

    Drilling Down on Columns

    Viewing Trend Charts

    Scorecard Notifications

    Notification Email Message Template

    Setting Up Scorecard Notifications

    Configuring Global Settings for Scorecard Notifications

    Scorecard Integration with External Applications

    Viewing a Scorecard in External Applications

    Scorecard Lineage

    Viewing Scorecard Lineage in Informatica Analyst

    Chapter 15: Exception Record Management

    Exception Record Management Overview

    Exception Management Process Flow

    Reserved Column Names

    Exception Management Tasks

    Viewing and Editing Bad Records

    Updating Bad Record Status

    Viewing and Filtering Duplicate Record Clusters

    Editing Duplicate Record Clusters

    Consolidating Duplicate Record Clusters

    Viewing the Audit Trail

    Chapter 16: Reference Tables

    Reference Tables Overview

    Reference Table Properties

    General Reference Table Properties

    Reference Table Column Properties

    Create Reference Tables

    Creating a Reference Table in the Reference Table Editor

    Create a Reference Table from Profile Data


    Creating a Reference Table from Profile Columns

    Creating a Reference Table from Column Values

    Creating a Reference Table from Column Patterns

    Create a Reference Table From a Flat File

    Analyst Tool Flat File Properties

    Creating a Reference Table from a Flat File

    Create a Reference Table from a Database Table

    Creating a Reference Table from a Database Table

    Creating a Database Connection for a Reference Table

    Copying a Reference Table in the Model Repository

    Reference Table Updates

    Managing Columns

    Managing Rows

    Finding and Replacing Values

    Exporting a Reference Table

    Enable and Disable Edits to an Unmanaged Reference Table

    Audit Trail Events

    Viewing Audit Trail Events

    Rules and Guidelines for Reference Tables

    Index


    Preface

    The Informatica Data Quality User Guide is written for Informatica users who create and run data quality processes in the Informatica Developer and Informatica Analyst client applications. The Informatica Data Quality User Guide contains information about profiles and other objects that you can use to analyze the content and structure of data and to find and fix data quality issues.

    Informatica Resources

    Informatica My Support Portal

    As an Informatica customer, you can access the Informatica My Support Portal at http://mysupport.informatica.com.

    The site contains product information, user group information, newsletters, access to the Informatica customer support case management system (ATLAS), the Informatica How-To Library, the Informatica Knowledge Base, Informatica Product Documentation, and access to the Informatica user community.

    Informatica Documentation

    The Informatica Documentation team makes every effort to create accurate, usable documentation. If you have questions, comments, or ideas about this documentation, contact the Informatica Documentation team through email at [email protected]. We will use your feedback to improve our documentation. Let us know if we can contact you regarding your comments.

    The Documentation team updates documentation as needed. To get the latest documentation for your product,navigate to Product Documentation from http://mysupport.informatica.com.

    Informatica Web Site

    You can access the Informatica corporate web site at http://www.informatica.com. The site contains information about Informatica, its background, upcoming events, and sales offices. You will also find product and partner information. The services area of the site includes important information about technical support, training and education, and implementation services.

    Informatica How-To Library

    As an Informatica customer, you can access the Informatica How-To Library at http://mysupport.informatica.com. The How-To Library is a collection of resources to help you learn more about Informatica products and features. It includes articles and interactive demonstrations that provide solutions to common problems, compare features and behaviors, and guide you through performing specific real-world tasks.


    Informatica Knowledge Base

    As an Informatica customer, you can access the Informatica Knowledge Base at http://mysupport.informatica.com. Use the Knowledge Base to search for documented solutions to known technical issues about Informatica products. You can also find answers to frequently asked questions, technical white papers, and technical tips. If you have questions, comments, or ideas about the Knowledge Base, contact the Informatica Knowledge Base team through email at KB_Feedback@informatica.com.

    Informatica Support YouTube Channel

    You can access the Informatica Support YouTube channel at http://www.youtube.com/user/INFASupport. The Informatica Support YouTube channel includes videos about solutions that guide you through performing specific tasks. If you have questions, comments, or ideas about the Informatica Support YouTube channel, contact the Support YouTube team through email at [email protected] or send a tweet to @INFASupport.

    Informatica Marketplace

    The Informatica Marketplace is a forum where developers and partners can share solutions that augment, extend, or enhance data integration implementations. By leveraging any of the hundreds of solutions available on the Marketplace, you can improve your productivity and speed up time to implementation on your projects. You can access Informatica Marketplace at http://www.informaticamarketplace.com.

    Informatica Velocity

    You can access Informatica Velocity at http://mysupport.informatica.com. Developed from the real-world experience of hundreds of data management projects, Informatica Velocity represents the collective knowledge of our consultants who have worked with organizations from around the world to plan, develop, deploy, and maintain successful data management solutions. If you have questions, comments, or ideas about Informatica Velocity, contact Informatica Professional Services at [email protected].

    Informatica Global Customer Support

    You can contact a Customer Support Center by telephone or through the Online Support. Online Support requires a user name and password. You can request a user name and password at http://mysupport.informatica.com.


    Use the following telephone numbers to contact Informatica Global Customer Support:

    North America / South America

    Toll Free
    Brazil: 0800 891 0202
    Mexico: 001 888 209 8853
    North America: +1 877 463 2435

    Europe / Middle East / Africa

    Toll Free
    France: 0805 804632
    Germany: 0800 5891281
    Italy: 800 915 985
    Netherlands: 0800 2300001
    Portugal: 800 208 360
    Spain: 900 813 166
    Switzerland: 0800 463 200
    United Kingdom: 0800 023 4632

    Standard Rate
    Belgium: +31 30 6022 797
    France: +33 1 4138 9226
    Germany: +49 1805 702702
    Netherlands: +31 30 6022 797
    United Kingdom: +44 1628 511445

    Asia / Australia

    Toll Free
    Australia: 1 800 120 365
    Asia Pacific: 00 080 00016360
    China: 400 810 0900


    CHAPTER 1

    Introduction to Data Quality

    This chapter includes the following topic:

    Data Quality Overview

    Data Quality Overview

    Use Informatica Data Quality to analyze the content and structure of your data and enhance the data in ways that meet your business needs.

    You use Informatica applications to design and run processes to complete the following tasks:

    Profile data. Profiling reveals the content and structure of data. Profiling is a key step in any data project, as it can identify strengths and weaknesses in data and help you define a project plan.

    Create scorecards to review data quality. A scorecard is a graphical representation of the quality measurements in a profile.

    Standardize data values. Standardize data to remove errors and inconsistencies that you find when you run a profile. You can standardize variations in punctuation, formatting, and spelling. For example, you can ensure that the city, state, and ZIP code values are consistent.

    Parse data. Parsing reads a field composed of multiple values and creates a field for each value according to the type of information it contains. Parsing can also add information to records. For example, you can define a parsing operation to add units of measurement to product data.

    Validate postal addresses. Address validation evaluates and enhances the accuracy and deliverability of postal address data. Address validation corrects errors in addresses and completes partial addresses by comparing address records against address reference data from national postal carriers. Address validation can also add postal information that speeds mail delivery and reduces mail costs.

    Find duplicate records. Duplicate analysis calculates the degrees of similarity between records by comparing data from one or more fields in each record. You select the fields to be analyzed, and you select the comparison strategies to apply to the data. The Developer tool enables two types of duplicate analysis: field matching, which identifies similar or duplicate records, and identity matching, which identifies similar or duplicate identities in record data.

    Manage exceptions. An exception is a record that contains data quality issues that you correct by hand. You can run a mapping to capture any exception record that remains in a data set after you run other data quality processes. You review and edit exception records in the Analyst tool or in Informatica Data Director for Data Quality.

    Create reference data tables. Informatica provides reference data that can enhance several types of data quality processes, including standardization and parsing. You can create reference tables using data from profile results.

    Create and run data quality rules. Informatica provides rules that you can run or edit to meet your project objectives. You can create mapplets and validate them as rules in the Developer tool.


    Collaborate with Informatica users. The Model repository stores reference data and rules, and this repository is available to users of the Developer tool and Analyst tool. Users can collaborate on projects, and different users can take ownership of objects at different stages of a project.

    Export mappings to PowerCenter. You can export and run mappings in PowerCenter. You can export mappings to PowerCenter to reuse the metadata for physical data integration or to create web services.
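    The duplicate analysis task in the list above compares data from selected fields and combines the comparisons into a degree of similarity for each pair of records. The following Python sketch illustrates that idea only; it is not the algorithm that the Match transformation uses. The records, the field weights, and the 0.85 threshold are hypothetical, and the standard library `difflib.SequenceMatcher` ratio stands in for a comparison strategy.

```python
from difflib import SequenceMatcher

# Hypothetical records and field weights; in a real project you select the fields.
RECORDS = [
    {"name": "John Smith", "city": "Boston"},
    {"name": "Jon Smith", "city": "Boston"},
    {"name": "Maria Garcia", "city": "Austin"},
]
WEIGHTS = {"name": 0.7, "city": 0.3}


def field_similarity(a: str, b: str) -> float:
    """Score two field values from 0.0 (different) to 1.0 (identical), ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(rec_a: dict, rec_b: dict, weights: dict) -> float:
    """Combine per-field scores into one weighted score for the record pair."""
    total = sum(weights.values())
    return sum(w * field_similarity(rec_a[f], rec_b[f]) for f, w in weights.items()) / total


def find_duplicates(records: list, weights: dict, threshold: float = 0.85) -> list:
    """Return (i, j, score) for every record pair whose score meets the threshold."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            score = record_similarity(records[i], records[j], weights)
            if score >= threshold:
                pairs.append((i, j, round(score, 3)))
    return pairs


print(find_duplicates(RECORDS, WEIGHTS))
```

    Weighting the name field more heavily than the city field is a design choice for this sketch: two people in the same city are common, so a shared city alone should not make records look like duplicates.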


    CHAPTER 2

    Reference Data

    This chapter includes the following topics:

    Reference Data Overview

    User-Defined Reference Data

    Informatica Reference Data

    Reference Data and Transformations

    Reference Tables

    Content Sets

    Reference Data Overview

    A reference data object contains a set of data values that you can use to perform search operations on source data. You can create reference data objects in the Developer tool and Analyst tool, and you can import reference data objects to the Model repository. The Data Quality Content installer includes reference data objects that you can import.

    You can create and edit the following types of reference data:

    Reference tables

    A reference table contains standard and alternative versions of a set of data values. You add a reference table to a transformation in the Developer tool to verify that source data values are accurate and correctly formatted.

    A reference table typically contains at least two columns. One column contains the standard or preferred version of a string, and other columns contain alternative versions. When you add a reference table to a transformation, the transformation searches the input port data for values that also appear in the table. You can create tables with any data that is useful to the data project you work on.

    Content sets

    Content sets are repository and file objects that contain reference data values. Content sets are similar in structure to reference tables, but they are more commonly used for lower-level [...]. There are different types of content sets. When you add a content set to a transformation, the transformation searches the input port data for values that appear in the content set or for strings that match the data patterns defined in the content set.

    The Data Quality Content installer includes reference data objects that you can import. You download the Data Quality Content installer from Informatica.

    The Data Quality Content installer includes the following types of reference data:


    Informatica reference tables

    Database tables created by Informatica. You import Informatica reference tables when you import accelerator objects from the Content Installer. The reference tables contain standard and alternative versions of common business terms from several countries. The types of reference information include telephone area codes, postcode formats, first names, Social Security number formats, occupations, and acronyms. You can edit Informatica reference tables.

    Informatica content sets

    Content sets created by Informatica. You import content sets when you import accelerator objects from the Content Installer. A content set contains different types of reference data that you can use to perform search operations in data quality transformations.

    Address reference data files

    Reference data files that identify all valid addresses in a country. The Address Validator transformation reads this data. You cannot create or edit address reference data files.

    The Content Installer installs files for the countries that you have purchased. Address reference data is current for a defined period, and you must refresh your data regularly, for example every quarter. You cannot view or edit address reference data.

    Identity population files

    Contain information on types of personal, household, and corporate identities. The Match transformation and the Comparison transformation use this data to parse potential identities from input fields. You cannot create or edit identity population files.

    The Content Installer writes population files to the file system.

    User-Defined Reference Data

    You can use the values in a data object to create a reference data object.

    For example, you can select a data object or profile column that contains values that are specific to a project or organization. The column values let you create custom reference data objects for a project.

    You can build a reference data object from a data column in the following cases:

    The data rows in the column contain the same type of information.

    The column contains a set of data values that are either correct or incorrect for the project.

    Note: Create a reference object with incorrect values when you want to search a data set for incorrect values.

    The following table lists common examples of project data columns that can contain reference data:

    Information: Reference Data Example

    Stock Keeping Unit (SKU) codes: Use an SKU column to create a reference table of valid SKU codes for an organization. Use the reference table to find correct or incorrect SKU codes in a data set.

    Employee codes: Use an employee code or employee ID column to create a reference table of valid employee codes. Use the reference table to find errors in employee data.

    Customer account numbers: Run a profile on a customer account column to identify account number patterns. Use the profile to create a token set of incorrect data patterns. Use the token set to find account numbers that do not conform to the correct account number structure.

    Customer names: When a customer name column contains first, middle, and last names, you can create a probabilistic model that defines the expected structure of the strings in the column. Use the probabilistic model to find data strings that do not belong in the column.
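    The customer account number example above profiles a column for patterns and keeps the incorrect patterns as a token set. The following Python sketch shows the idea under several assumptions: the pattern notation (`9` for a digit, `X` for a letter) and the sample values are hypothetical, the most frequent pattern is assumed to be the correct account number structure, and a plain Python set stands in for the token set.

```python
from collections import Counter


def value_pattern(value: str) -> str:
    """Map digits to '9' and letters to 'X'; keep other characters such as '-'."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("X")
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical column values from a customer account column.
ACCOUNTS = ["AC-10234", "AC-55810", "AC-77312", "AC1234", "ac-9921x"]

# Profile step: count how often each pattern occurs.
pattern_counts = Counter(value_pattern(v) for v in ACCOUNTS)

# Assumption: the most frequent pattern is the correct structure.
correct_pattern, _count = pattern_counts.most_common(1)[0]

# Stand-in for a token set of incorrect data patterns.
incorrect_patterns = {p for p in pattern_counts if p != correct_pattern}

nonconforming = [v for v in ACCOUNTS if value_pattern(v) in incorrect_patterns]
print(nonconforming)  # ['AC1234', 'ac-9921x']
```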

    Informatica Reference Data

    You purchase and download address reference data and identity population data from Informatica. You purchase an annual subscription to address data for a country, and you can download the latest address data from Informatica at any time during the subscription period.

    The Content Installer user downloads and installs reference data separately from the applications. Contact an Administrator tool user for information about the reference data installed on your system.

    Reference Data and Transformations

    Several transformations read reference data to perform data quality tasks.

    The following transformations can read reference data:

    Address Validator. Reads address reference data to verify the accuracy of addresses.

    Case Converter. Reads reference data tables to identify strings that must change case.

    Classifier. Reads content set data to identify the type of information in a string.

    Comparison. Reads identity population data during duplicate analysis.

    Labeler. Reads content set data to identify and label strings.

    Match. Reads identity population data during duplicate analysis.

    Parser. Reads content set data to parse strings based on the information they contain.

    Standardizer. Reads reference data tables to standardize strings to a common format.

    You can create reference data objects in the Developer tool and Analyst tool. For example, you can create a reference table from column profile data. You can export reference tables to the file system.

    The Data Quality Content Installer file set includes Informatica reference data objects that you can import.


    Reference Tables

    A reference table contains the standard versions of a set of data values and alternative versions of the values that might occur in business data.

    You add a reference table to a transformation in the Developer tool. You use the transformation to find reference data values in input data and to write the standard values as output data.

    You create a reference table in the following ways:

    Create an empty reference table and enter the data values.

    Create a reference table from data in a flat file.

    Create a reference table from data in another database table.

    Create a reference table from column profile results.

    You can create a reference table in the Developer tool or Analyst tool. You can edit reference table data in the Developer tool. You can edit reference table data and metadata in the Analyst tool. When you create a reference table, the Model repository stores the table metadata as a repository object.

    Reference Table Structure

    Most reference tables contain at least two columns. One column contains the correct or required versions of the data values. Other columns contain different versions of the values, including alternative versions that may appear in the source data.

    The column that contains the correct or required values is called the valid column. When a transformation reads a reference table in a mapping, the transformation looks for values in the non-valid columns. When the transformation finds a non-valid value, it returns the corresponding value from the valid column. You can also configure a transformation to return a single common value instead of the valid values.
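    The lookup behavior described above can be pictured as a dictionary from each non-valid value to the valid value in the same row. The following Python sketch is an illustration only, not the transformation's implementation. The sample rows, the convention that the first entry in each row is the valid value, whitespace tokenization, and case-insensitive matching are all assumptions.

```python
from typing import Optional

# Hypothetical reference table rows: the first value is the valid value,
# the remaining values are alternative (non-valid) versions.
REFERENCE_ROWS = [
    ["Street", "St", "St.", "Str"],
    ["Avenue", "Ave", "Ave.", "Av"],
    ["Road", "Rd", "Rd."],
]


def build_lookup(rows: list) -> dict:
    """Map each lowercased non-valid value to the valid value from its row."""
    lookup = {}
    for valid, *alternatives in rows:
        for alt in alternatives:
            lookup[alt.lower()] = valid
    return lookup


def standardize(value: str, lookup: dict, common_value: Optional[str] = None) -> str:
    """Replace every token found in a non-valid column.

    If common_value is set, write it instead of the valid value, mirroring the
    option to return a single common value.
    """
    out = []
    for token in value.split():
        match = lookup.get(token.lower())
        if match is None:
            out.append(token)  # not in the table: pass through unchanged
        else:
            out.append(common_value if common_value is not None else match)
    return " ".join(out)


lookup = build_lookup(REFERENCE_ROWS)
print(standardize("12 Main St.", lookup))                          # 12 Main Street
print(standardize("4 Oak Ave", lookup, common_value="<ST_TYPE>"))  # 4 Oak <ST_TYPE>
```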

    The valid column can contain data that is formally correct, such as ZIP codes. It can contain data that is relevant to a project, such as stock keeping unit (SKU) numbers that are unique to an organization. You can also create a valid column from bad data, such as values that contain known data errors that you want to search for.

    For example, a Developer tool user creates a reference table that contains a list of valid SKU numbers in a retail organization. The user adds the reference table to a Labeler transformation and creates a mapping with the transformation. The user runs the mapping on a product database table. When the mapping runs, the Labeler creates a column that identifies the product records that do not contain valid SKU numbers.
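    The SKU example above amounts to a membership test against the valid column, with the result written to a new label column. A minimal Python sketch follows; the SKU values, product rows, label names, and the choice to ignore case and surrounding spaces are all hypothetical.

```python
# Hypothetical valid column of a reference table that lists an organization's SKUs.
VALID_SKUS = {"SKU-1001", "SKU-1002", "SKU-2040"}


def label_sku(sku: str) -> str:
    """Return a label for one SKU value; matching ignores case and surrounding spaces."""
    return "VALID_SKU" if sku.strip().upper() in VALID_SKUS else "INVALID_SKU"


# Hypothetical product records: (product name, SKU).
products = [("Widget", "SKU-1001"), ("Gadget", "SKU-9999"), ("Gizmo", " sku-2040 ")]

# The third element plays the role of the label column that the mapping writes.
labeled = [(name, sku, label_sku(sku)) for name, sku in products]
invalid = [name for name, _sku, label in labeled if label == "INVALID_SKU"]
print(invalid)  # ['Gadget']
```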

    Reference Tables and the Parser Transformation

    You create a reference table with a single column when you want to use the table data in a pattern-based parsing operation. You configure the Parser transformation to perform pattern-based parsing, and you import the data to the transformation configuration.

    Managed and Unmanaged Reference Tables

    Reference tables store metadata in the Model repository. Reference tables can store column data in the reference data warehouse or in another database.

    When the reference data warehouse stores the column data, Informatica services identify the table as a managed reference table. When another database stores the column data, Informatica services identify the table as an unmanaged reference table. The Content Management Service specifies the database connection for the reference data warehouse.

    You can edit managed and unmanaged reference table data in the Developer tool and Analyst tool. You can edit managed and unmanaged reference table metadata in the Analyst tool.


    Before you edit the table, verify that you have the required privileges on the following services:

    Content Management Service. To edit reference table data, you need the Edit Reference Table Data privilege. To edit reference table metadata, you need the Edit Reference Table Metadata privilege.

    Model Repository Service. To view the project that contains the reference table, you need the Create Project privilege.

    Use the Security options in the Administrator tool to review or update the service privileges.

    To edit data in an unmanaged reference table, also verify that you configured the reference table object to permit edits.

    Note: If you edit the metadata for an unmanaged reference table in a database application, use the Analyst tool to synchronize the Model repository with the database table. You must synchronize the Model repository and the database table before you use the unmanaged reference table in the Developer tool.

    Content Sets

    A content set is a Model repository object that you use to store reusable content expressions. A content expression is an expression that you can use in Labeler and Parser transformations to identify data.

    You can create content sets to organize content expressions into logical groups. For example, if you create a number of content expressions that identify Portuguese strings, you can create a content set that groups these content expressions. Create content sets in the Developer tool.

    Content expressions include character sets, pattern sets, regular expressions, and token sets. Content expressions can be system-defined or user-defined. System-defined content expressions cannot be added to content sets. User-defined content expressions can be reusable or non-reusable.

    Character Sets

    A character set contains expressions that identify specific characters and character ranges. You can use character sets in Labeler transformations that use character labeling mode.

    Character ranges specify a sequential range of character codes. For example, the character range "[A-C]" matches the uppercase characters "A," "B," and "C." This character range does not match the lowercase characters "a," "b," or "c."

    Use character sets to identify a specific character or range of characters as part of labeling operations. For example, you can label all numerals in a column that contains telephone numbers. After labeling the numbers, you can identify patterns with a Parser transformation and write problematic patterns to separate output ports.
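    Character labeling as described above replaces each character that falls in a character set's range with that set's label, which exposes the shape of the data. The following Python sketch assumes hypothetical labels ("N" for numerals, "U" for the "[A-C]" range), hypothetical telephone values, and a hypothetical expected pattern; characters outside every set pass through unchanged. It is not the Labeler transformation's implementation.

```python
import re

# Hypothetical character sets: label -> list of (start, end) character ranges.
CHARACTER_SETS = {
    "N": [("0", "9")],
    "U": [("A", "C")],  # mirrors the "[A-C]" example: uppercase only
}


def char_label(ch: str) -> str:
    """Return the label of the first character set whose range contains ch."""
    for label, ranges in CHARACTER_SETS.items():
        for start, end in ranges:
            if ord(start) <= ord(ch) <= ord(end):
                return label
    return ch  # characters outside every set pass through unchanged


def label_string(value: str) -> str:
    return "".join(char_label(ch) for ch in value)


print(label_string("(555) 123-4567"))  # (NNN) NNN-NNNN
print(label_string("AbC"))             # UbU
# The same range written as a regular expression also ignores lowercase letters.
print(re.findall(r"[A-C]", "ABCabc"))  # ['A', 'B', 'C']

# After labeling, values whose pattern differs from the expected one are suspect.
phones = ["(555) 123-4567", "555-1234", "(555) 987-6543"]
expected = "(NNN) NNN-NNNN"
problems = [p for p in phones if label_string(p) != expected]
print(problems)  # ['555-1234']
```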


    Character Set Properties

    Configure properties that determine character labeling operations for a character set.

    The following table describes the properties for a user-defined character set:

    Property: Description

    Label: Defines the label that a Labeler transformation applies to data that matches the character set.

    Standard Mode: Enables a simple editing view that includes fields for the start range and end range.

    Start Range: Specifies the first character in a character range.

    End Range: Specifies the last character in a character range. For a range with a single character, leave this field blank.

    Advanced Mode: Enables an advanced editing view where you can manually enter character ranges using range characters and delimiter characters.

    Range Character: Temporarily changes the symbol that signifies a character range. The range character reverts to the default character when you close the character set.

    Delimiter Character: Temporarily changes the symbol that separates character ranges. The delimiter character reverts to the default character when you close the character set.
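    The range character and delimiter character properties above matter when one of the default symbols must appear as a literal character in a range. The following Python sketch parses a range string under a hypothetical advanced-mode syntax ("A-C,0-9,_"); the default "-" and "," symbols are assumptions, not documented defaults. A single character is treated as a one-character range, matching the rule that you leave End Range blank for a single character.

```python
def parse_ranges(spec: str, range_char: str = "-", delimiter: str = ",") -> list:
    """Parse a hypothetical advanced-mode string such as 'A-C,0-9,_' into (start, end) pairs."""
    ranges = []
    for part in spec.split(delimiter):
        if not part:
            continue  # tolerate empty entries such as a trailing delimiter
        if len(part) == 3 and part[1] == range_char:
            start, end = part[0], part[2]
        elif len(part) == 1:
            start = end = part  # single character: end range left "blank"
        else:
            raise ValueError(f"cannot parse range {part!r}")
        if ord(start) > ord(end):
            raise ValueError(f"start {start!r} comes after end {end!r}")
        ranges.append((start, end))
    return ranges


print(parse_ranges("A-C,0-9,_"))  # [('A', 'C'), ('0', '9'), ('_', '_')]
# With the delimiter changed to ';', a comma can appear as a literal character.
print(parse_ranges("a:z;,", range_char=":", delimiter=";"))  # [('a', 'z'), (',', ',')]
```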

    Classifier Models

    A classifier model analyzes input strings and determines the types of information that they contain. You use a classifier model in a Classifier transformation.

    Use a classifier model when input strings contain significant amounts of data. For example, you can use a classifier model to identify the subject matter in a set of documents. You export the text from each document, and you store each document as a separate field in a single data column. The Classifier transformation reads the data and classifies the subject matter in each field according to the labels defined in the classifier model.

    The classifier model contains the following columns:

    Data column

    A column that contains the words and phrases that are likely to exist in the input data. The transformation compares the input data with the data in this column.

    Label column

    A column that contains descriptive labels that can define the information in the data. The transformation returns a label from this column as output.

    The classifier model also contains compilation data that the Classifier transformation uses to calculate the correct information type for the input data.

    You create a classifier model in the Developer tool. The Model repository stores the metadata for the classifier model object. The column data and compilation data are stored in a file in the Informatica directory structure.
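    To make the relationship between the data column, the label column, and the compilation data concrete, the following Python sketch scores input text against per-label word counts. This is a deliberately simple word-overlap classifier, not the method the Classifier transformation uses; the model rows and labels are hypothetical, and the word counts only stand in for compilation data.

```python
import re
from collections import Counter, defaultdict

# Hypothetical model rows: (data column text, label column value).
MODEL_ROWS = [
    ("invoice payment due amount balance", "FINANCE"),
    ("quarterly revenue profit tax", "FINANCE"),
    ("server outage network latency", "IT"),
    ("password reset laptop network", "IT"),
]


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def compile_model(rows) -> dict:
    """Stand-in for compilation data: how often each word appears under each label."""
    counts = defaultdict(Counter)
    for text, label in rows:
        counts[label].update(tokenize(text))
    return dict(counts)


def classify(text: str, compiled: dict):
    """Return the label whose words overlap most with the input, or None if none overlap."""
    words = tokenize(text)
    scores = {label: sum(c[w] for w in words) for label, c in compiled.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


compiled = compile_model(MODEL_ROWS)
print(classify("The network had high latency after the outage", compiled))  # IT
print(classify("Tax payment overdue", compiled))                            # FINANCE
```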


    Pattern Sets

    A pattern set contains expressions that identify data patterns in the output of a token labeling operation. You can use pattern sets to analyze the Tokenized Data output port and write matching strings to one or more output ports. Use pattern sets in Parser transformations that use pattern parsing mode.

    For example, you can configure a Parser transformation to use pattern sets that identify names and initials. This transformation uses the pattern sets to analyze the output of a Labeler transformation in token labeling mode. You can configure the Parser transformation to write names and initials in the output to separate ports.
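    The names-and-initials example above works in two stages: token labeling produces a sequence of labels, and pattern parsing matches that sequence against a pattern set to decide which output port receives each token. The following Python sketch imitates both stages. The INIT and WORD labels, the patterns, and the port names are hypothetical; real pattern sets can also use wildcards, which this sketch omits.

```python
import re


def token_label(token: str) -> str:
    """Hypothetical token labels: INIT for one letter with an optional '.', else WORD."""
    return "INIT" if re.fullmatch(r"[A-Za-z]\.?", token) else "WORD"


# Hypothetical pattern set: label sequence -> output port for each position.
PATTERN_SET = {
    ("INIT", "WORD"): ("initial", "surname"),
    ("WORD", "INIT", "WORD"): ("first_name", "initial", "surname"),
    ("WORD", "WORD"): ("first_name", "surname"),
}


def parse_by_pattern(value: str) -> dict:
    tokens = value.split()
    labels = tuple(token_label(t) for t in tokens)  # plays the role of tokenized data
    ports = PATTERN_SET.get(labels)
    if ports is None:
        return {"unparsed": value}  # no pattern matched: route to an overflow port
    return dict(zip(ports, tokens))


print(parse_by_pattern("J. Smith"))
print(parse_by_pattern("Mary K. Jones"))
print(parse_by_pattern("Dr Mary Ann Jones"))
```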

    Pattern Set Properties

    Configure properties that determine the patterns in a pattern set.

    The following table describes the property for a user-defined pattern set:

    Property: Description

    Pattern: Defines the patterns that the pattern parser searches for. You can enter multiple patterns for one pattern set. You can enter patterns constructed from a combination of wildcards, characters, and strings.

    Probabilistic Models

    A probabilistic model identifies data values by the types of information that they represent and by the position of the values in an input string.

    You use probabilistic models with the Labeler and Parser transformations.

    A probabilistic model contains the following columns:

    An input column that represents the data on the input port. You populate the column with sample data from the input port. The model uses the sample data as reference data in parsing and labeling operations.

    One or more label columns that identify the types of information in each input string. You add the label columns to the model, and you assign labels to the data values in each string. Use the label columns to indicate the correct position of the data values in the string.

    You create a probabilistic model in the Developer tool. The Model repository stores the metadata for the probabilistic model object. The column data and compilation data are stored in a file in the Informatica directory structure.

    The probabilistic model also contains compilation data that the transformations can use to calculate the correct information type for the input data. You update the model logic when you compile the model in the Developer tool.

    Regular Expressions

    In the context of content sets, a regular expression is an expression that you can use in parsing and labeling operations. Use regular expressions to identify one or more strings in input data. You can use regular expressions in Parser transformations that use token parsing mode. You can also use regular expressions in Labeler transformations that use token labeling mode.

    Parser transformations use regular expressions to match patterns in input data and parse all matching strings to one or more outputs. For example, you can use a regular expression to identify all email addresses in input data and parse each email address component to a different output.

    Labeler transformations use regular expressions to match an input pattern and create a single label. Regular expressions that have multiple outputs do not generate multiple labels.
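    To show the difference between parsing and labeling with the same expression, here is a Python analogy. It uses the standard `re` module, not the product's expression engine. The email expression and its three capture groups are assumptions for illustration; the groups stand in for three output ports. `parse` returns every match, split into one value per output. `label` returns a single label for a matching input, however many groups the expression defines.

```python
import re
from typing import List, Optional, Tuple

# Illustrative expression with three groups: account, domain name, top-level domain.
EMAIL = re.compile(r"([\w.+-]+)@([\w-]+(?:\.[\w-]+)*)\.([A-Za-z]{2,})")

def parse(text: str) -> List[Tuple[str, str, str]]:
    """Parser-style: every match, split into one value per output."""
    return EMAIL.findall(text)

def label(token: str, name: str = "EMAIL") -> Optional[str]:
    """Labeler-style: a full match yields one label, whatever the group count."""
    return name if EMAIL.fullmatch(token) else None

print(parse("Contact ann@example.com or bob.lee@mail.example.org"))
# [('ann', 'example', 'com'), ('bob.lee', 'mail.example', 'org')]
print(label("ann@example.com"))  # EMAIL
```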


    Regular Expression Properties

    Configure properties that determine how a regular expression identifies and writes output strings.

    The following table describes the properties for a user-defined regular expression:

    Property              Description

    Number of Outputs     Defines the number of output ports that the regular expression writes.

    Regular Expression    Defines a pattern that the Parser transformation uses to match strings.

    Test Expression       Contains data that you enter to test the regular expression. As you type data in this field, the field highlights strings that match the regular expression.

    Next Expression       Moves to the next string that matches the regular expression and changes the font of that string to bold.

    Previous Expression   Moves to the previous string that matches the regular expression and changes the font of that string to bold.
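    The Test Expression, Next Expression, and Previous Expression properties behave like a list of matches with a cursor. The following Python sketch models that behavior with `re.finditer`. The `MatchBrowser` class is hypothetical, and Python `re` syntax stands in for the product's expression syntax. The wrap-around at either end is also an assumption; this section does not confirm it.

```python
import re
from typing import List, Tuple

class MatchBrowser:
    """Finds every match of an expression in test data and steps through them."""

    def __init__(self, expression: str, test_data: str) -> None:
        self.text = test_data
        # Spans of all non-overlapping matches: the strings the test field would highlight.
        self.spans: List[Tuple[int, int]] = [
            m.span() for m in re.finditer(expression, test_data)
        ]
        self.index = -1  # no match selected yet

    def _current(self) -> str:
        start, end = self.spans[self.index]
        return self.text[start:end]

    def next(self) -> str:
        """Select the following match, wrapping to the first one at the end."""
        if not self.spans:
            return ""
        self.index = (self.index + 1) % len(self.spans)
        return self._current()

    def previous(self) -> str:
        """Select the preceding match, wrapping to the last one at the start."""
        if not self.spans:
            return ""
        if self.index < 0:
            self.index = len(self.spans) - 1
        else:
            self.index = (self.index - 1) % len(self.spans)
        return self._current()

browser = MatchBrowser(r"\d+", "a1 b22 c333")
print(browser.spans)  # [(1, 2), (4, 6), (8, 11)]
print(browser.next(), browser.next(), browser.previous())  # 1 22 1
```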

    Token Sets

    A token set contains expressions that identify specific tokens. You can use token sets in Labeler transformations that use token labeling mode. You can also use token sets in Parser transformations that use token parsing mode.

    Use token sets to identify specific tokens as part of labeling and parsing operations. For example, you can use a token set to label all email addresses that use an "AccountName@DomainName" format. After labeling the tokens, you can use the Parser transformation to write email addresses to output ports that you specify.
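    In token labeling mode, each token that matches a token set receives that set's label. The Python sketch below makes several assumptions: whitespace tokenization, an invented `NUMBER` set alongside the email set, and a placeholder label `OTHER` for tokens that match no set. The product's tokenizer and default labels may differ. The hypothetical `email_tokens` helper imitates the follow-up Parser step that writes the labeled email addresses to an output.

```python
import re
from typing import Dict, List, Pattern

# Hypothetical token sets: label -> expression that must match the whole token.
TOKEN_SETS: Dict[str, Pattern[str]] = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),  # AccountName@DomainName
    "NUMBER": re.compile(r"\d+"),
}

def label_tokens(text: str, unmatched: str = "OTHER") -> List[str]:
    """Split on whitespace and give each token the label of the first set it matches."""
    labels: List[str] = []
    for token in text.split():
        labels.append(
            next((name for name, rx in TOKEN_SETS.items() if rx.fullmatch(token)), unmatched)
        )
    return labels

def email_tokens(text: str) -> List[str]:
    """Parser-style follow-up: keep only the tokens labeled EMAIL."""
    return [t for t, lab in zip(text.split(), label_tokens(text)) if lab == "EMAIL"]

print(label_tokens("mail ann@example.com 42"))  # ['OTHER', 'EMAIL', 'NUMBER']
```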

    Token Set Properties

    Configure properties that determine the labeling operations for a token set.

    The following table describes the properties for a user-defined token set:

    Property              Token Set Mode        Description

    Name                  N/A                   Defines the name of the token set.

    Description           N/A                   Describes the token set.

    Token Set Options     N/A                   Defines whether the token set uses regular expression mode or character mode.

    Label                 Regular Expression    Defines the label that a Labeler transformation applies to data that matches the token set.

    Regular Expression    Regular Expression    Defines a pattern that the Labeler transformation uses to match strings.

    Test Expression       Regular Expression    Contains data that you enter to test the regular expression. As you type data in this field, the field highlights strings that match the regular expression.

    Next Expression       Regular Expression    Moves to the next string that matches the regular expression and changes the font of that string to bold.

    Previous Expression   Regular Expression    Moves to the previous string that matches the regular expression and changes the font of that string to bold.

    Label                 Character             Defines the label that a Labeler transformation applies to data that matches the character set.

    Standard Mode         Character             Enables a simple editing view that includes fields for the start range and end range.

    Start Range           Character             Specifies the first character in a character range.

    End Range             Character             Specifies the last character in a character range. For single-character ranges, leave this field blank.

    Advanced Mode         Character             Enables an advanced editing view where you can manually enter character ranges using range characters and delimiter characters.

    Range Character       Character             Temporarily changes the symbol that signifies a character range. The range character reverts to the default character when you close the character set.

    Delimiter Character   Character             Temporarily changes the symbol that separates character ranges. The delimiter character reverts to the default character when you close the character set.
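    In character mode, a token set is defined by character ranges instead of an expression. The following Python sketch models a range as a start character and an optional end character. A missing end stands for a single-character range, as the End Range row describes. Several details are assumptions made for illustration: the comma delimiter, the hyphen range character, the example ranges, and the rule that every character of a token must fall inside some range.

```python
from typing import List, Optional, Tuple

# (start range, end range); None as the end means a single-character range.
CharRange = Tuple[str, Optional[str]]

def parse_advanced(spec: str, range_char: str = "-", delimiter: str = ",") -> List[CharRange]:
    """Parse an advanced-mode style string such as "A-Z,0-9,_" into ranges.
    The default range and delimiter characters here are assumptions."""
    ranges: List[CharRange] = []
    for part in spec.split(delimiter):
        if len(part) == 3 and part[1] == range_char:
            ranges.append((part[0], part[2]))
        elif len(part) == 1:
            ranges.append((part, None))
        else:
            raise ValueError(f"cannot parse range {part!r}")
    return ranges

def in_ranges(ch: str, ranges: List[CharRange]) -> bool:
    """True if ch equals a single-character range or lies within a start-end range."""
    return any(ch == start if end is None else start <= ch <= end for start, end in ranges)

def label_token(token: str, ranges: List[CharRange], label: str) -> Optional[str]:
    """Apply the label only if every character of the token is covered (assumed rule)."""
    return label if token and all(in_ranges(c, ranges) for c in token) else None

CODE_RANGES = parse_advanced("A-Z,0-9,_")
print(CODE_RANGES)                                # [('A', 'Z'), ('0', '9'), ('_', None)]
print(label_token("AB_12", CODE_RANGES, "CODE"))  # CODE
print(label_token("ab", CODE_RANGES, "CODE"))     # None
```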

    Creating a Content Set

    Create content sets to group content expressions according to business requirements. You create content sets in the Developer tool.

    1. In the Object Explorer view, select the project or folder where you want to store the content set.


    2. Click File > New > Content Set.

    3. Enter a name for the content set.

    4. Optionally, select Browse to change the Model repository location for the content set.

    5. Click Finish.

    Creating a Reusable Content Expression

    Create reusable content expressions from within a content set. You can use these content expressions in Labeler transformations and Parser transformations.

    1. Open a content set in the editor and select the Content view.

    2. Select a content expression view.

    3. Click Add.

    4. Enter a name for the content expression.

    5. Optionally, enter a text description of the content expression.

    6. If you selected the Token Set expression view, select a token set mode.

    7. Click Next.

    8. Configure the content expression properties.

    9. Click Finish.

    Tip: You can create content expressions by copying them from another content set. Use the Copy To and Paste From options to create copies of existing content expressions. You can use the CTRL key to select multiple content expressions when using these options.

    Generating Reference Data from a Midstream Profile

    You can run a profile on mapping data to create a data source for a reference data object. For example, run a profile on the object that you connect to a Labeler or Parser transformation. You can add the profile data to a probabilistic model.

    When you create a probabilistic model with data from the midstream profile, you customize the model for the mapping data.

    Complete the following steps to run a midstream mapping profile and generate input data for a probabilistic model:

    1. Open the mapping that contains the transformation that you will connect to the Labeler or Parser transformation.

    2. Select a data object and click Profile Now.

    Select the Results tab in the profile, and review the profile results.

    3. Under Column Profiling, select the column you want to add to the probabilistic model.

    4. Under Details, select the option to Show Values.

    The editor displays the data values in the column you selected.

    Note: You can select all values in the column or a subset of values.

    5. If you want to add a subset of column values to a probabilistic model, follow these steps:

    a. Use the Shift or Ctrl keys to select one or multiple values from the editor.

    b. Right-click the values and select Send to > Export Results to File.

    6. If you want to add all column values to a probabilistic model, click the option to Export Value Frequencies to File.


    7. In the Export dialog box, enter a file name. You can save the file on the Informatica services machine or on the Developer client machine.

    If you save the file on the client machine, enter a path to the file.

    You can use the file as a data source for the Label or Data column in the probabilistic model.
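    Export Value Frequencies to File writes the distinct values of the selected column together with how often each value occurs. As a rough analogy, the Python sketch below computes the frequencies for one column and writes them to a CSV file. The `value,frequency` layout, the file name, and the sample values are assumptions, not the documented export format. To mimic exporting a subset of values, pass only the selected values.

```python
import csv
import os
import tempfile
from collections import Counter
from typing import Iterable, List, Tuple

def value_frequencies(values: Iterable[str]) -> List[Tuple[str, int]]:
    """Distinct values with their counts, most frequent first (ties keep first-seen order)."""
    return Counter(values).most_common()

def export_frequencies(values: Iterable[str], path: str) -> List[Tuple[str, int]]:
    """Write a value,frequency CSV file (assumed layout) and return the rows written."""
    rows = value_frequencies(values)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["value", "frequency"])
        writer.writerows(rows)
    return rows

column = ["Mr", "Ms", "Mr", "Dr", "Mr"]
out_path = os.path.join(tempfile.gettempdir(), "title_frequencies.csv")
print(export_frequencies(column, out_path))  # [('Mr', 3), ('Ms', 1), ('Dr', 1)]
```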


    CHAPTER 3

    Classifier Models

    This chapter includes the following topics:

    Classifier Models Overview

    Classifier Model Structure

    Classifier Model Reference Data

    Classifier Model Label Data

    Classifier Scores

    Classifier Model Views

    Classifier Model Filters

    Creating a Classifier Model from a Data Object

    Copy and Paste Operations

    Classifier Models Overview

    A classifier model is a reference data object. Use a classifier model to analyze long text strings that contain multiple values. A classifier model identifies the most common type of information in each string.

    You add a classifier model to a Classifier transformation. The transformation searches for common values between the classifier model data and the data in each input row. The transformation uses the common values to categorize the type of information that each row represents.

    You use a classifier model when the input data has the following characteristics:

    The input data contains text. Classifier models apply natural language processes to text data to identify the types of information in the text. Natural language processes detect relevant words in the input string. Natural language processes disregard words that are not relevant.

    The input data strings contain multiple values. For example, you can create a data column that contains the contents of an email message in each field.

    The Classifier transformation reads string datatypes. The transformation imposes no limit on the length of the input strings.

    You compile classifier models in the Developer tool. When you compile a model, you create associations between similar data values in the model. The Classifier transformation uses the compiled data to search for information in the input data.


    Classifier Transformation Example

    You can use a classifier model and a Classifier transformation to categorize email messages based on the text that they contain.

    For example, you work in a customer support center, and you review the email messages that the organization receives from customers. The organization has customers in many countries, and it receives emails in many languages. You want to sort the emails by language, so that you can send each email to the center that can best reply to the customer.

    You complete the following steps to sort the emails:

    1. You write the email messages to a single file or a database table.

    2. You create a classifier model that contains sample text for each language.

    Note: You can use sample data from the email messages as source data for the model. Copy the email message text to a file or database table, and create a data source from the file or table in the Model repository.

    3. You add the classifier model to a Classifier transformation.

    4. You add the transformation to a mapping, and you connect the transformation ports to the data source and data targets. You create a data target for each language.

    When you run the mapping, the Classifier transformation analyzes the email messages and writes the email text to the correct data target. You can share the data target with the team members in the appropriate support center.

    Classifier Model Structure

    A classifier model contains a column of reference data values and a column that specifies a label for each row of reference data values. When a Classifier transformation compares the input data and the model data, the transformation returns the label that most closely describes the input data.

    A classifier model also contains compilation data. The Classifier transformation uses the compilation data to calculate similarities between the reference data and the input data. When you compile a model, you create or update the compilation data.

    The Model repository stores the classifier model object. When you create or update a classifier model, you write the reference data and the compiled metadata to a file on the Developer tool machine. The file is read-only. You can read the file path in the classifier model in the Developer tool.

    Compiling a Classifier Model

    Each time you edit a label value or reference data value in a classifier model, you must compile the model. When you compile the model, you update the compilation data in the model.

    To update the compilation data, open the model in the Developer tool and click Compile.

    Classifier Model Reference Data

    A classifier model contains a reference data column that can include sentences, paragraphs, or pages of text. The reference data represents the different types of text input that a Classifier transformation can read in a mapping. When you create a model, verify that the reference data includes the types of text that you expect to find when you run the mapping.

    You can use the mapping source data to create a classifier model. Select a sample of the source data and copy the data sample to the model.

    Consider the following rules and guidelines when you work with classifier model reference data:

    A reference data field can be of any length. You can enter pages of text into each data field.

    You import reference data from a data object.

    You cannot edit reference data values. However, you can delete a data row.

    Adding Data and Label Values to a Classifier Model

    Use a data source to append data to a classifier model. You can add data values and label values.

    1. Open the content set that contains the model.

    2. Select the model name and click Edit.

    3. Click Append Data.

    The Classifier Model wizard opens.

    4. Browse the Model repository and select the data object that you want to use. Click Next.

    Note: Do not select a social media data object as a data source.

    5. Review the columns on the data object, and select a column to add as a data column or label column for the model. You can add a reference data column and a label column in the same operation.

    To use a data source column as the reference data column in the model, select the column name and click Data.

    You can select multiple data columns. The classifier model merges the contents of the columns you select into a single column.

    To use a data source column as the label column for the model, select the column name and click Label.

    Click Next.

    6. Select the number of rows to copy from the data source.

    Select all rows, or enter the number of rows to copy. If you enter a number, the model counts the rows from the start of the data set.

    7. Click Finish and save the model.

    After you append the data, verify that the data rows you added include label values.
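    Steps 5 and 6 amount to three operations: merge the selected data columns, pair the result with the label column, and keep the first N rows. The Python sketch below shows that logic on rows held as dictionaries. The space used to join several data columns, the column names, and the `append_rows` helper are hypothetical. The helper also reports the positions of rows that have no label, which is the check that the last sentence asks for.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[str]]

def append_rows(
    source: Sequence[Row],
    data_columns: Sequence[str],
    label_column: Optional[str],
    limit: Optional[int] = None,
) -> Tuple[List[Tuple[str, Optional[str]]], List[int]]:
    """Return (data, label) pairs for the model and the positions of rows with no label."""
    selected = source if limit is None else source[:limit]  # count rows from the start
    model_rows: List[Tuple[str, Optional[str]]] = []
    missing: List[int] = []
    for i, row in enumerate(selected):
        # Merge the chosen data columns into a single reference data value.
        data = " ".join(row[c] for c in data_columns if row.get(c))
        label = row.get(label_column) if label_column else None
        if not label:
            missing.append(i)
        model_rows.append((data, label))
    return model_rows, missing

SOURCE: List[Row] = [
    {"subject": "Bonjour", "body": "Merci pour votre aide", "lang": "French"},
    {"subject": "Hello", "body": "Thanks for your help", "lang": None},
    {"subject": "Hola", "body": "Gracias por su ayuda", "lang": "Spanish"},
]
rows, missing = append_rows(SOURCE, ["subject", "body"], "lang", limit=2)
print(rows)     # [('Bonjour Merci pour votre aide', 'French'), ('Hello Thanks for your help', None)]
print(missing)  # [1]
```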

    Deleting Data Values from a Classifier Model

    You can delete data values in the default view and in the detailed view of a classifier model. To delete all data values, use the default view.

    1. Open the classifier model in the Developer tool. To open the model, select the model name in the content set and click Edit.

    2. Select the row that contains the data you want to delete.

    You can select a single row, multiple rows, or all rows.

    3. Click Delete.


    4. In the Manage Labels dialog box, select one or more labels to delete.

    You can select multiple labels.

    5. Click Delete.

    6. Click OK to close the dialog box.

    Classifier Scores

    A Classifier transformation compares each row of input data with every row of reference data in a classifier model. The transformation calculates a score for each comparison. The scores represent the degrees of similarity between the input row and the reference data rows.

    When you run a mapping that contains a Classifier transformation, the mapping returns the label that identifies the reference data row with the highest score. The score range is 0 through 1. A high score indicates a strong match between the input data and the model data.

    Review the classifier scores to verify that the label output accurately describes each row of input data. You can also review the scores to verify that the classifier model is appropriate to the input data. If the transformation output contains a large percentage of low scores, the classifier model might be inappropriate. To improve the comparisons, compile the model again. If the compiled model does not improve the scores, replace the model in the transformation.
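    This section does not describe how the Classifier transformation computes its scores. The Python sketch below illustrates the general idea: score an input row against every reference row and return the label of the best match. It swaps in bag-of-words cosine similarity, which also produces scores from 0 through 1 for word counts. The tokenization, the sample model rows, and the `classify` helper are assumptions. This is not the product's algorithm.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

def bag(text: str) -> Counter:
    """Lowercased word counts (the tokenization is an assumption)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(text: str, model: List[Tuple[str, str]]) -> Tuple[str, float]:
    """Score the input against every (reference data, label) row; return the best label and score."""
    words = bag(text)
    best_label, best_score = "", 0.0
    for data, label in model:
        score = cosine(words, bag(data))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

MODEL = [
    ("thank you for your help with my order", "English"),
    ("merci pour votre aide avec ma commande", "French"),
    ("gracias por su ayuda con mi pedido", "Spanish"),
]

label, score = classify("Merci, ma commande est arrivée", MODEL)
print(label, round(score, 2))  # French 0.51
```

    In this sketch, an input that shares no words with any reference row scores 0 and gets an empty label. To spot the case where many scores are low, you could count how many rows fall below a threshold that you choose.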

    Classifier Model Views

    You can use the default view and the detailed view to update the data in a classifier model. The default view displays the label values and data values in a table. The detailed view displays the data values in a series of text boxes.

    Use the default view to review and update the labels on each row. You can select one row, multiple rows, or all rows. The default view can display approximately 100 characters of row data. The detailed view can display all data in each row. Click the Classifier Model Data option to toggle between the views. Use the detailed view to review and update the data in a single row.

    You can add data, filter data rows, and add labels to rows in each view. You can search the data values in a single row in the detailed view.


    The following image shows the default view of a classifier model that contains data for language classification:

    Classifier Model Filters

    You can apply filters to classifier model data in the default and detailed views.

    You can use filters to perform the following tasks:

    Find data values that do not have an associated label. Use the label options to filter the data rows that display in a classifier model. If a data row does not use a label, add a label to the row.

    Find data values in reference data rows. Use the filter in the default view to find data values in the reference data. Verify that the reference data overlaps with the source data in a mapping.

    Find a data value within a reference data row. Use the filter in the detailed view when you need to verify that a reference data row contains a data value. A data row can contain a large quantity of data values.

    Finding Values in Reference Data

    Use the filter in the default view to verify that reference data rows contain the data values you expect.

    1. Open the content set that contains the model.

    2. Select the model name and click Edit.

    3. Type a text value in the filter field.

    The Developer tool displays the data rows that contain the filter text.
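    The default-view filter can be pictured as a substring test over the reference data column. In this minimal Python sketch, rows are (reference data, label) pairs, and matching ignores case. Case-insensitive matching is an assumption, because this section does not say how the Developer tool compares filter text.

```python
from typing import List, Optional, Tuple

ModelRow = Tuple[str, Optional[str]]  # (reference data, label)

def filter_rows(rows: List[ModelRow], text: str) -> List[ModelRow]:
    """Rows whose reference data contains the filter text (case-insensitive by assumption)."""
    needle = text.casefold()
    return [row for row in rows if needle in row[0].casefold()]

ROWS = [
    ("Thank you for your help", "English"),
    ("Merci pour votre aide", "French"),
    ("Gracias por su ayuda", "Spanish"),
]
print(filter_rows(ROWS, "MERCI"))  # [('Merci pour votre aide', 'French')]
```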


    Finding Data Rows with No Labels

    Clear all label options to display the reference data rows that do not have a label.

    When you open a classifier model, the Developer tool displays all rows and labels by default.

    1. Open the content set that contains the model.

    2. Select the model name and click Edit.

    3. In the default view, select or clear a label.

    When you select a label, the model displays the data strings that you associated with the label. When you clear a label, the model hides the data strings that you associated with the label.

    4. To verify that all data strings use a label, clear all the label values. The model displays any string that does not use a label.

    5. Click a label value to add the label to the data string.
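    The label check boxes act as a visibility rule. The Python sketch below assumes that a labeled row is visible only when its label is selected and that an unlabeled row is always visible. This agrees with step 4 but is otherwise an assumption. The sample rows are invented.

```python
from typing import Iterable, List, Optional, Tuple

ModelRow = Tuple[str, Optional[str]]  # (reference data, label or None)

def visible_rows(rows: List[ModelRow], selected_labels: Iterable[str]) -> List[ModelRow]:
    """Rows whose label is selected, plus rows that have no label at all."""
    selected = set(selected_labels)
    return [row for row in rows if row[1] is None or row[1] in selected]

ROWS = [
    ("Thank you for your help", "English"),
    ("Merci pour votre aide", None),
    ("Gracias por su ayuda", "Spanish"),
]
# Clearing every label leaves only the rows that still need one.
print(visible_rows(ROWS, []))  # [('Merci pour votre aide', None)]
```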

    Finding a Value in a Data Row

    Use the filter in the detailed view to search for a data value in a single data row.

    1. Open the content set that contains the model.

    2. Select the model name and click Edit.

    3. Select the detailed view.

    4. In the detailed view, enter a value in the filter field.

    The model displays the data rows that contain the value.

    5. Select a data row to search.

    6. Type the search value in the search field below the data row.

    The model highlights the first instance of the value in the row.

    7. Click the Down arrow to find the next instance of the value in the row.

    Use the Up and Down arrows to move through the values in the data row.
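    The search in the detailed view steps through the occurrences of a value inside one row. Here is a minimal Python sketch. It assumes exact, case-sensitive matching and wrap-around at either end; this section states neither. `occurrences` and `move` are hypothetical helpers. Down maps to the next offset and Up to the previous one.

```python
from typing import List

def occurrences(row_text: str, value: str) -> List[int]:
    """Start offsets of each non-overlapping, case-sensitive occurrence of value."""
    if not value:
        return []
    found: List[int] = []
    pos = row_text.find(value)
    while pos != -1:
        found.append(pos)
        pos = row_text.find(value, pos + len(value))
    return found

def move(current: int, count: int, down: bool) -> int:
    """Index of the next (Down) or previous (Up) occurrence, wrapping at either end."""
    return (current + (1 if down else -1)) % count

row = "la commande, la facture, la livraison"
hits = occurrences(row, "la")
print(hits)                                  # [0, 13, 25]
print(hits[move(0, len(hits), down=True)])   # 13
print(hits[move(0, len(hits), down=False)])  # 25
```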

    Creating a Classifier Model from a Data Object

    Use a data object as the source for classifier model data.

    A classifier model performs best when you use the input data from the Classifier transformation as the source for the model reference data. For example, you can run a profile on the transformation object in the mapping. Create a data object from the profile results.

    1. In the Object Explorer view, open or create a content set.

    2. Select the Content view.

    3. Select Classifier Models, and click Add.

    The Classifier Model wizard opens.

    4. Enter a name for the classifier model.

    Optionally, enter a text description of the model.

    5. Browse the Model repository and select the data object that contains the reference data.

    Click Next.


    6. Review the columns on the data object, and select a column to add as reference data values or label values for the model.

    To add a data column as reference data, select the column name and click Data.

    To use a data column as a source for label values, select the column name and click Label.

    Click Next.

    7. Select the number of rows to copy from the data source.

    Select all rows, or enter the number of rows to copy. If you enter a number, the model counts the rows from the start of the data set.

    8. Click Finish and save the model.

    After you create the classifier model, compile the model.

    Copy and Paste Operations

    You can copy a classifier model from one content set to another in a Model repository. Copy a classifier model to share resources with other Developer tool users.

    You can copy a model to another content set, or you can import a model to the current content set. You can import multiple models from multiple content sets in the repository in a single operation.

    When you copy a model, the Content Management Service creates a copy of the model data file on the service machine. Each model uses a different data file.

    Copying a Classifier Model to Another Content Set

    You can copy a classifier model from one content set to another in a Model repository. When you copy a classifier model, you specify the model object and the source and destination content sets.

    1. Open the content set that contains the classifier model.

    2. Select a classifier model and click Copy To.

    3. Browse the Model repository and select a content set.

    You can copy the classifier model to a content set in the current project or another project.

    4. Click OK.

    The Developer tool copies the classifier model to the selected content set.

    Importing a Classifier Model from Another Content Set

    You can import a classifier model from one content set to another in a Model repository. When you import a classifier model, you specify one or more model objects and the source and destination content sets.

    1. Open the content set that will contain the classifier model.

    2. Select a classifier model and click Paste From.

    3. Browse the Model repository and select a classifier model.

    You can paste the classifier model from a content set in the current pr