Informatica 8.1 - Advanced Features Training v1

  • Agenda
    PowerCenter 8 Framework and Services
    PowerCenter Options
    Workflow Recovery
    Metadata Manager
    Data Analyzer
    Error Handling
    PowerCenter Upgrade

  • PowerCenter 8 Framework and Services
    New services framework
    Domains, Nodes, and Services
    Admin Console
    Application Services
    Core Services

  • PowerCenter 8.x Architecture

    Center Of Excellence-Data Warehousing

  • PowerCenter Domains
    A domain is the primary unit for management and administration of services in PowerCenter. PowerCenter has a service-oriented architecture that lets you scale services and share resources across multiple machines, and the domain supports the administration of the PowerCenter services. A domain contains the following components:
    One or more nodes
    Service Manager
    Application services

    Domain with three nodes

  • Domain Window
    Overview tab. Displays an overview grid listing all services and the status of the related service processes in the domain.
    Properties tab. View or modify domain resilience properties or the LDAP authentication module configuration.
    Resources tab. View available resources for each node in the domain.
    Permissions tab. View or modify user permissions on the domain.
    Log Management. Purge and export service logs.
    Shutdown. Shut down the domain to perform administrative tasks on the domain.
    Legend link. Click the Legend link to view information about the icons used in the overview grid.

  • Domain Advanced Features
    The domain can be backed up.
    The domain and all services can be shut down from the console.
    Domain log events can be filtered by category.
    A node can be moved to another physical machine.
    A domain activity monitoring report is available.
    Users can be assigned read-only privileges.
    The Integration Service can recover terminated tasks while a workflow is running.

  • Administration Console
    The Administration Console is a web application used to manage a PowerCenter domain. Use it to perform administrative tasks such as managing logs, user accounts, and domain objects (services, nodes, and licenses):
    Manage application services. Manage all application services in the domain, such as the Integration Service and the Repository Service.
    Configure nodes. Configure node properties, such as the backup directory and resources. You can also shut down and restart nodes.
    Manage domain objects. Create folders to organize domain objects and manage security by setting permissions on them.
    View and edit domain object properties. View and edit properties for all objects in the domain, including the domain object itself.
    View log events. Use the Log Viewer to view domain, Integration Service, SAP BW Service, Web Services Hub, and Repository Service log events.

  • Nodes
    A node is a logical representation of a physical machine in the domain. When you install PowerCenter Services on a machine, you add the machine to the domain as a node; you can add multiple nodes to a domain. Each node runs a Service Manager that manages domain operations on that node. A node is either a gateway node or a worker node:
    Gateway node. Any node you configure to serve as a gateway for the domain; the gateway node in control is the master gateway.
    Worker node. Any node not configured to serve as a gateway. It can run application services, but it cannot serve as a gateway.

  • Node Window
    Node status. View the status of the node.
    Properties tab. View or modify node properties, such as the repository backup directory or the range of port numbers for processes that run on the node.
    Processes tab. View the status of processes configured to run on the node.
    Resources tab. View or modify resources assigned to the node.
    Permissions tab. View or modify user permissions on the node.

  • Application Services
    Application services are a group of services that represent PowerCenter server-based functionality. When you configure an application service, you designate the node where it runs. Types of application services:
    Repository Service
    Integration Service
    SAP BW Service
    Web Services Hub

  • Repository Service
    The Repository Service is an application service that manages the repository. It retrieves, inserts, and updates metadata in the repository database tables.
    Service and service process status. View the status of the service and the service processes for each node.
    Action list. Manage the contents of the repository and perform other administrative tasks.
    Properties tab. View and edit service process properties on each assigned node.
    Connections tab. View and terminate repository connections.
    Locks tab. View the object locks in the repository.
    Permissions tab. View or modify user permissions on the Repository Service.
    Logs tab. View log events for the service.

  • Integration Service
    The Integration Service is an application service that runs data integration sessions and workflows.
    Service and service process status. View the status of the service and the service processes for each node.
    Properties tab. View or modify Integration Service properties.
    Associated Repository tab. View the Repository Service associated with the Integration Service.
    Processes tab. View or modify the service process properties on each assigned node.
    Permissions tab. View or modify user permissions on the Integration Service.
    Logs tab. View log events for the service.

  • SAP BW Service
    The SAP BW Service is an application service that listens for RFC requests from SAP BW and initiates workflows to extract from or load to SAP BW.
    Service and service process status. View the status of the service and the service processes for each node.
    Properties tab. Manage general properties and node assignments.
    Associated Integration Service tab. View or modify the Integration Service associated with the SAP BW Service.
    Processes tab. View or modify the directory of the BW Param parameter file.
    Permissions tab. View or modify user permissions on the SAP BW Service.
    Logs tab. View log events for the service.

  • Web Services Hub
    The Web Services Hub is a web service gateway for external clients. It processes SOAP requests from web service clients that want to access PowerCenter functionality through web services; web service clients access the Integration Service and Repository Service through the Web Services Hub.
    Properties tab. View or modify Web Services Hub properties.
    Associated Repository tab. View the Repository Services associated with the Web Services Hub.
    Permissions tab. View or modify user permissions on the Web Services Hub.
    Logs tab. View log events for the service.

  • Core Services
    The PowerCenter architecture has a new set of core services, which comprises:
    Log Service
    Gateway Service
    Administration Service
    Configuration Service
    Authentication Service
    Domain Service

  • Log Service
    The Service Manager on the master gateway node controls the Log Manager. When the Log Manager receives log events, it generates log event files. The Log Manager creates the following types of log files:
    Log event files. Store log events in binary format.
    Guaranteed Message Delivery files. Store Service Manager and application service log events.

  • Gateway Service
    The Gateway Service receives service requests from clients and routes them to the appropriate service and node. In the Administration Console, you designate the nodes that can serve as gateways. The master gateway node maintains a connection to the domain configuration database.

  • Administration Service
    The Administration Console has an Administration tab for administration tasks such as:
    Manage users. Create, view, and modify domain users.
    User activity monitoring. Generate a user activity report based on user name and time period.
    Domain activity monitoring. Generate a domain activity report for a specific time period.

  • Configuration Service
    PowerCenter has a Configuration Service where you can configure session and miscellaneous properties, such as whether to enforce code page compatibility.

  • Authentication Service
    PowerCenter has an Authentication Service to validate users and access. It uses the following methods to authenticate users:
    PowerCenter default authentication. You create users and maintain passwords in the repository, and the security module verifies users against these user names and passwords.
    Lightweight Directory Access Protocol (LDAP) authentication. LDAP defines a network protocol for accessing a directory service. If you use LDAP to authenticate users, the repository security module passes a user login to the external directory for authentication.

  • Domain Service
    PowerCenter introduces a service-oriented architecture that provides the ability to scale services and share resources across multiple machines. It also introduces the domain, a collection of nodes and services that serves as the primary unit of administration for the PowerCenter environment.

  • Questions

  • PowerCenter Options
    PowerCenter 8 Packaging
    Team-Based Development
    Data Profiling Option
    Data Cleanse and Match Option
    Partitioning
    High Availability
    Grids
    Pushdown Optimization
    Unstructured Data
    Data Federation
    Real-Time Option

  • PowerCenter 8 Packaging

  • Team-Based Development
    Version control can be enabled using the team-based development option. Team-based development is used when different roles are assigned to team members and, after an activity completes, objects are merged in one common place. You can enable version control for a new or existing repository. A versioned repository can store multiple versions of objects, so you can maintain multiple versions of an object, control development of the object, and track changes. You can also use labels and deployment groups to associate groups of objects and copy them from one repository to another. When you enable version control for a repository, the repository assigns all versioned objects version number 1, and each object has an active status.
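The versioning behavior described above (every versioned object starts at version number 1, and each check-in creates a new version) can be sketched conceptually. This is only an illustration of the idea; the class and method names are hypothetical and not Informatica's actual repository API:

```python
# Conceptual sketch of a versioned repository: every object starts at
# version 1, and each check-in appends a new, higher-numbered version.
class VersionedRepository:
    def __init__(self):
        self.objects = {}  # name -> list of (version, definition)

    def add_object(self, name, definition):
        # A newly versioned object is assigned version number 1.
        self.objects[name] = [(1, definition)]

    def check_in(self, name, definition):
        # Each check-in creates the next version of the object.
        versions = self.objects[name]
        next_version = versions[-1][0] + 1
        versions.append((next_version, definition))
        return next_version

    def latest(self, name):
        return self.objects[name][-1]

repo = VersionedRepository()
repo.add_object("m_load_customers", "v1 mapping definition")
repo.check_in("m_load_customers", "v2 mapping definition")
print(repo.latest("m_load_customers"))  # (2, 'v2 mapping definition')
```

Labels and deployment groups would, in this model, simply be named collections of (name, version) pairs that can be copied to another repository.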

  • Enabling Version Control for a Repository
    Ensure that all users disconnect from the repository.
    In the Administration Console, change the operating mode of the Repository Service to exclusive.
    Enable the Repository Service.
    Select the Repository Service in the Navigator. In the general properties, click Edit and select Enable Version Control.
    In the Repository Authentication window, enter the user name and password of a repository user with the necessary administrative privileges. The user must be the Administrator user or have the Super User privilege.
    Change the operating mode of the Repository Service back to normal.

  • Data Profiling Option
    Data profiling is a technique used to analyze source data. You can profile source data to suggest candidate keys, detect data patterns, evaluate join criteria, and determine information such as implicit data types. The data profiling option can be used in the following situations:
    During mapping development
    During production, to maintain data quality
    The Designer provides a Profile Manager and a Profile Wizard to create, upgrade, and run a data profile. You can create two types of data profile:
    Auto profile. Contains a predefined set of functions for profiling source data.
    Custom profile. A data profile you define with the functions you need to profile source data.
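A few of the profiling functions named above (row count, distinct value count, candidate key evaluation) can be sketched generically on sample rows. This is an illustration of what such functions compute, not the output of PowerCenter's Profile Wizard:

```python
# Generic sketch of three profiling functions on in-memory sample rows;
# illustrative only, not PowerCenter's data profiling implementation.
rows = [
    {"id": 1, "state": "CA", "amount": 100},
    {"id": 2, "state": "NY", "amount": 250},
    {"id": 3, "state": "CA", "amount": 100},
]

def row_count(rows):
    return len(rows)

def distinct_value_count(rows, column):
    return len({r[column] for r in rows})

def candidate_keys(rows):
    # A column is a candidate key when its values are unique across all rows.
    return [c for c in rows[0] if distinct_value_count(rows, c) == len(rows)]

print(row_count(rows))                      # 3
print(distinct_value_count(rows, "state"))  # 2
print(candidate_keys(rows))                 # ['id']
```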

  • Data Profiling Option (Functions)
    Source-level functions: functional dependency analysis, candidate key and redundant column analysis, business rule validation.
    Column-level functions: aggregate functions, domain validation.
    Inter-source functions: inter-source structure inference, orphan analysis.

    Dashboard and reporting capability

  • Creating an Auto Profile
    When you create an auto profile, the Designer creates a data profile with the following functions:
    Aggregate functions
    Candidate key evaluation
    Distinct value count
    Domain inference
    Functional dependency analysis
    Redundancy evaluation
    Row count

  • Creating a Custom Profile
    To create a custom profile, complete the following steps:
    Enter a data profile name and optionally add a description.
    Add sources to the data profile.
    Add, edit, or delete profile functions and enable session configuration.
    Configure profile functions.
    Configure the profile session if you enabled session configuration.

  • Profile Manager
    You can run a profile session from the Profile Manager to quickly profile sources during mapping development. A session run from the Profile Manager is called an interactive session.

  • Data Cleanse and Match Option
    Data cleansing and parsing: standardizes, validates, enhances, and corrects name/address and corporate data; provides address validation for every country worldwide.
    Data matching: identifies relationships between data records for de-duplication or group-based processing; processes multiple sets of business rules concurrently; uses householding techniques to identify members of common households or corporations.
    Full integration across the entire data integration platform: leverages PowerCenter's performance and scalability; speeds the integration process by allowing data quality solutions to be executed from within PowerCenter mappings and workflows.

  • Data Cleanse and Match Option (Contd.)
    The transformation language includes a group of functions to eliminate data errors. You can complete the following tasks with data cleansing functions:
    Test input values.
    Convert the data type of an input value.
    Trim string values.
    Replace characters in a string.
    Encode strings.
    Match patterns with regular expressions.
    A data cleansing transformation is built using the Custom transformation. You can write your own procedure or program in C, C++, or Java for the data cleansing logic and call it from the Custom transformation.
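The categories of cleansing tasks listed above (trimming, replacing characters, regular-expression matching) can be illustrated with plain string operations. This sketch uses ordinary Python rather than PowerCenter's transformation-language functions, and the phone-number rule is a hypothetical example:

```python
import re

# Illustrative cleansing steps analogous to the tasks above: trim a string,
# replace noise characters, then test the value against a regex pattern.
def clean_phone(raw):
    value = raw.strip()                      # trim string values
    value = re.sub(r"[.\-\s()]", "", value)  # replace/remove noise characters
    # Match the cleaned value against a 10-digit pattern; reject otherwise.
    return value if re.fullmatch(r"\d{10}", value) else None

print(clean_phone("  (415) 555-0123 "))  # 4155550123
print(clean_phone("not a phone"))        # None
```

In PowerCenter itself, logic like this would live in a Custom transformation procedure written in C, C++, or Java, as the slide describes.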

  • Partitioning
    Partitioning adds partition points to increase the number of transformation threads and improve session performance.

  • Pipeline Partitioning
    Set partition attributes, including partition points, the number of partitions, and the partition types.
    You can let the Integration Service set partitioning at run time by enabling dynamic partitioning.
    Configure memory requirements and cache directories for each transformation.
    The Integration Service evaluates mapping variables for each partition in a target load order group.
    For multiple partitions, the Workflow Manager verifies that the Integration Service can maintain data consistency in the session using the partitions.
    Add or edit partition points in the session properties.

  • Partitioning Attributes
    Partition points. Mark thread boundaries and divide the pipeline into stages.
    Number of partitions. Set the number of partitions at any partition point.
    Partition types. The partition type controls how the Integration Service distributes data among partitions at partition points.

  • Partition Types
    Database partitioning. Reads partitioned data from the corresponding nodes in the database.
    Hash auto-keys. A hash function groups rows of data among partitions; all grouped or sorted ports are used as a compound partition key.
    Hash user keys. A hash function groups rows of data among partitions based on the ports you choose as the partition key.
    Key range. Distributes rows of data based on a port or set of ports that you define as the partition key.
    Pass-through. Processes data without redistributing rows among partitions.
    Round-robin. Each partition processes approximately the same number of rows.
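The row-distribution behavior of three of these partition types can be sketched as follows. This is a simplified conceptual model, not the Integration Service's actual implementation:

```python
# Simplified sketch of three partition types assigning rows to partitions.
def round_robin(rows, n):
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)  # roughly equal row counts per partition
    return parts

def hash_user_keys(rows, n, key):
    parts = [[] for _ in range(n)]
    for row in rows:
        # Rows with the same key value always land in the same partition.
        parts[hash(row[key]) % n].append(row)
    return parts

def key_range(rows, ranges, key):
    # ranges: one (low, high) pair per partition.
    parts = [[] for _ in ranges]
    for row in rows:
        for i, (low, high) in enumerate(ranges):
            if low <= row[key] < high:
                parts[i].append(row)
                break
    return parts

rows = [{"id": i} for i in range(6)]
print([len(p) for p in round_robin(rows, 3)])                     # [2, 2, 2]
print([len(p) for p in key_range(rows, [(0, 3), (3, 6)], "id")])  # [3, 3]
```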

  • High Availability

  • High Availability
    High availability is a PowerCenter option that eliminates single points of failure in a domain and provides minimal service interruption in the event of failure. Its components are:
    Resilience. The ability of PowerCenter services to tolerate transient network failures until either the resilience timeout expires or the external system failure is fixed.
    Failover. The migration of a service or task to another node when the node running the service process becomes unavailable.
    Recovery. The automatic completion of tasks after a service is interrupted. Automatic recovery is available for Integration Service and Repository Service tasks.
    You achieve a greater degree of availability when you configure more than one node to serve as a gateway and configure backup nodes for application services.

  • Achieving High Availability
    When you design a highly available PowerCenter environment, you can configure nodes and services to minimize failover or to optimize performance.
    Minimize service failover: configure two nodes as gateways, and configure different primary nodes for each application service.
    Optimize performance: configure gateway nodes on machines dedicated to serving as gateways, and configure backup nodes for the Integration Service and the Repository Service.

  • Grid Option

  • Features of the Enterprise Grid Option
    A grid is a group of nodes in a domain. The option provides sophisticated distribution of sessions across a server grid, as well as execution of a single session across multiple nodes. A heterogeneous grid can be created, with both UNIX and Windows machines in the same grid. The Load Balancer, a component of the Integration Service, dispatches tasks to nodes in the grid, matching resource requirements and distributing only to available nodes.

    Dynamic partitioning. The Integration Service determines the number of partitions to create at run time, scaling the number of session partitions based on factors such as source database partitions or the number of nodes in the grid. This is useful if the volume of data increases over time or you add more CPUs.
    Session on grid. Distributes session partitions to different nodes.
    Workflow on grid. Distributes workflow tasks to different nodes.
    Retry of sessions and fault tolerance for automatic failover.
    Improved recovery, so that a session can be handed off to a new node and automatically restarted.
    Grid management is dynamic: grid nodes can be added and removed dynamically and seamlessly, addressing scalability requirements.

  • Grids
    A grid is an alias assigned to a group of nodes that run sessions and workflows. When you run a workflow or session on a grid, you distribute the processing across multiple nodes in the grid. In the Administration Console, you assign nodes to the grid.
    Properties tab. View or modify node assignments to a grid.
    Permissions tab. View or modify user permissions on the grid.
    When you run an Integration Service on a grid, a master service process runs on one node and worker service processes run on the remaining nodes in the grid. The master service process runs the workflow and workflow tasks, and it distributes the Session, Command, and predefined Event-Wait tasks to itself and other nodes.

  • Running a Workflow on a Grid
    The Integration Service designates one service process as the master service process and the service processes on other nodes as worker service processes. The master service process receives requests, runs the workflow and workflow tasks (including the Scheduler), and communicates with worker service processes on other nodes. The master service process also runs the Load Balancer, which dispatches tasks to nodes in the grid.
    Service process distribution for a workflow running on a grid

  • Running a Session on a Grid
    The master service process runs the workflow and workflow tasks, including the Scheduler, and the Load Balancer distributes Command tasks. When the Load Balancer dispatches a Session task, it distributes the session threads to separate DTM processes. The master service process starts a temporary preparer DTM process; after the preparer DTM process prepares the session, it acts as the master DTM process, which monitors the DTM processes running on other nodes.
    Service process and DTM distribution for a session running on a grid

  • Pushdown Optimization
    With pushdown optimization, the Integration Service executes SQL against the source or target database instead of processing the transformation logic itself.

    When sources and targets are on the same database, pushdown optimization avoids pulling the data into PowerCenter and then pushing it back out again. This can be useful when you move data from a staging area to a data warehouse that exists in the same database.

    The Integration Service analyzes the mapping, writes one or more SQL statements based on the mapping transformation logic, and pushes them to the database. It converts the expressions in the transformations by determining equivalent operators, variables, and functions in the database.

    If there is no equivalent operator, variable, or function, the Integration Service processes the transformation logic itself.

    For example, the Integration Service translates the aggregate function STDDEV() to STDDEV_SAMP() on Teradata and STDEV() on Microsoft SQL Server. However, no database supports the aggregate function FIRST(), so the Integration Service processes any transformation that uses it.
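The translate-or-fall-back decision can be sketched as a lookup table. The STDDEV mappings below come from the text; the table structure and function names are an illustration, not Informatica's implementation:

```python
# Sketch of the pushdown decision: translate a function to its database
# equivalent when one exists, otherwise process it in the Integration
# Service. Only the STDDEV entries are taken from the slide text.
EQUIVALENTS = {
    "STDDEV": {"teradata": "STDDEV_SAMP", "sqlserver": "STDEV"},
    "FIRST": {},  # no database equivalent: cannot be pushed down
}

def plan(function, database):
    db_func = EQUIVALENTS.get(function, {}).get(database)
    if db_func:
        return f"push down as {db_func}()"
    return "process in Integration Service"

print(plan("STDDEV", "teradata"))   # push down as STDDEV_SAMP()
print(plan("STDDEV", "sqlserver"))  # push down as STDEV()
print(plan("FIRST", "teradata"))    # process in Integration Service
```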

  • Pushdown Optimization Configuration
    Configure pushdown optimization in the Performance settings on the Properties tab of the session properties. A session can use one of three pushdown optimization options:
    Source-side pushdown optimization. The Integration Service pushes as much transformation logic as possible to the source database.
    Target-side pushdown optimization. The Integration Service pushes as much transformation logic as possible to the target database.
    Full pushdown optimization. The Integration Service pushes as much transformation logic as possible to both the source and target databases.

  • Pushdown Optimization Configuration (Contd.)

    The $$PushdownConfig mapping parameter lets you run the same session with different types of pushdown optimization.

    For example, you might use full pushdown optimization during the day but no pushdown optimization from midnight until 2 a.m., when the database is scheduled for routine maintenance. Or you might use partial pushdown optimization during the peak hours of the day but full pushdown optimization from midnight until 2 a.m., when activity is low.
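The scheduling idea behind $$PushdownConfig (same session, different pushdown level by time of day) can be sketched as a simple time-window rule. The mode names and window boundaries here are illustrative, not PowerCenter parameter values:

```python
# Sketch of choosing a pushdown level by hour of day, mirroring the
# maintenance-window example above; mode names are illustrative.
def pushdown_mode(hour):
    if 0 <= hour < 2:
        return "none"  # routine database maintenance window
    return "full"      # normal processing hours

print(pushdown_mode(13))  # full
print(pushdown_mode(1))   # none
```

In PowerCenter, the chosen value would be supplied to the session through the $$PushdownConfig mapping parameter rather than computed in the mapping.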

    Use the Pushdown Optimization Viewer to preview the SQL statements.

    Pushdown optimization increases performance by intelligently leveraging the PowerCenter data server and a relational database engine, pushing transformation logic to the source or target database, and reducing movement of data (when source and target are on the same database).

  • Unstructured Data Option
    Unstructured/semi-structured data
    Parsing Designer
    Metadata
    Office documents
    Native formats
    Industry standards
    Template libraries

  • Unstructured Data
    Using the Informatica Unstructured Data option, organizations can:
    Define complex data transformations without writing code.
    Immediately deploy and reuse transformations across enterprise software infrastructure, preserving investment and promoting loosely coupled, service-oriented integration.
    Access structured, semi-structured, and unstructured data: Microsoft Word, Excel, PowerPoint, PDF, WordPerfect, Star Office, ASCII reports, HTML, undocumented binaries, RPG, ANSI.
    Handle industry-specific formats such as HL7, ACORD, FIXML, SWIFT, PL1, MVR, ASTM, EDI-X12, EDIFACT, and XML standards.
    Handle legacy formats, including COBOL and legacy reports (positional, non-positional, variant files, and undocumented binaries).
    The option leads the industry in platform coverage, scalability, and throughput for complex data transformations.

  • Unstructured Data Option Features
    One codeless Data Transformation Designer for all types of data: a visual data transformation designer for easy management of complex transformations of structured, semi-structured, and unstructured data.
    Wizard-driven and Eclipse-ready: integrated with the Eclipse development framework, with a guided mode offering context-sensitive help in the Intelliscript window and an XSD editor integrated with CM studio.
    Example-driven transformation: allows users to define, test, and debug a transformation using a visual mark-and-map process directly on a sample of the data source.
    Internationalization support: localized for German, Japanese, and French.
    Available as pre-integrated agents for its supported environments, including open source platforms, or as a callable software component from any C/C++, Java, or .NET application, on virtually all major distributed hardware/OS platforms.

  • Components
    Top-level components:
    Parser. Converts source documents, which can be in any format, to XML.
    Serializer. Converts XML documents to output documents, which can be in any format.
    Transformer. A component that modifies data.
    Mapper. Converts XML documents to a different XML structure or schema.
    Nested components:
    Formats. Define the overall format of documents, such as the delimiters that ContentMaster should use to interpret the documents.
    Document processors. Operate on a document as a whole, performing preliminary conversions before parsing or final operations after serializing.
    Anchors. Define the data in a source document that a parser should process and extract. The anchors specify how a parser should search for the data and where it should store the data it extracts.
    Data holders. The XML elements, XML attributes, and variables that transformations use for data storage. The elements and attributes are defined in XSD schemas; ContentMaster uses XSD to define data holders, to help it process XML input, and to help it construct valid XML output.

  • Data Federation
    Data federation integrates PowerCenter and Composite Information Server to give you access to multiple, disparate data sources. The PowerCenter Data Federation Option provides Enterprise Information Integration (EII) capabilities, allowing you to combine these data sources into a virtual database layer for use by applications and front-end reporting tools. PowerCenter integrates with the following components of Composite Information Server to provide federated data access:
    Composite Server. Core runtime environment that lets users access data sources through JDBC and ODBC.
    Composite JDBC. Lets clients access the Composite Server with a JDBC driver.
    Composite ODBC. Lets clients access the Composite Server with an ODBC driver.

  • Data Federation Option (Contd.)
    Read data from a Composite virtual database. Useful for data warehouse prototyping and application migration prototyping.
    Extend Composite data sources by writing data to Composite Information Server. Useful for populating a virtual database with data from any supported PowerCenter source through ODBC or web services.
    Profile Composite virtual database data. Useful for verifying assumptions about your source data.
    View virtual database data in Data Analyzer. Useful for obtaining a single view of the customer.

  • Composite Data Services
    Composite data services are tabular data and procedures that you publish as a relational schema. There are two types of data services:
    Composite databases. A Composite database is a virtual database on the Composite Server. It can contain tables and views from several data sources.
    Web services. You publish data sources, views, and web services to Composite data services to make them available to client applications.
    PowerCenter can access Composite data services through ODBC or SOAP; Data Analyzer can access them through JDBC.
    Composite Information Server, PowerCenter, and Data Analyzer integration

  • Real-Time Data
    Real-time data processing is on-demand processing of data from operational data sources, databases, and data warehouses. You process real-time data by configuring the latency for a session or workflow according to the time-value of the data. Latency is the time from when data changes on a source to the time when a workflow or session extracts the data and loads it to a target. If you have the Real-Time Option, you can configure the session for flush latency; flush latency determines when the Integration Service commits real-time data to the target.

  • Real-Time Processing
    The Real-Time Option with flush can be used to process the following types of real-time data:
    Messages and message queues. Read from messages and message queues, and write to messages, messaging applications, and message queues.
    Web service messages. Receive a message from a web service client through the Web Services Hub, transform the data, and load the data to a target or send a message back to the web service client.
    Changed source data. Extract changed data in real time from a source table using the PowerExchange Listener and write the data to a target.

  • Configuring Real-Time Sessions
    Configure the following reader and flush latency properties:
    Reader session conditions. The Integration Service stops reading from a source when it reaches the reader session conditions.
    Flush latency. The Integration Service commits messages to the target when it reaches the flush latency interval.
    Commit type. You can configure a source-based or target-based commit type for real-time sessions.
    Message recovery. You can enable recovery on a real-time session to recover read messages from a failed session.
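The interaction of a reader session condition and the flush latency interval can be sketched as a simple loop over timestamped messages. This is a conceptual model of the behavior described above, not the Integration Service's implementation; the message-count limit stands in for a reader session condition:

```python
# Sketch of a real-time reader: commit the buffer whenever the flush
# latency interval elapses, and stop reading when a reader condition
# (here, a message-count limit) is reached.
def run_session(messages, flush_latency, message_limit):
    """messages: list of (timestamp, payload). Returns committed batches."""
    buffer, commits = [], []
    last_commit, read = 0, 0
    for ts, payload in messages:
        if read >= message_limit:      # reader session condition reached
            break
        buffer.append(payload)
        read += 1
        if ts - last_commit >= flush_latency:
            commits.append(buffer)     # flush latency interval reached
            buffer, last_commit = [], ts
    if buffer:
        commits.append(buffer)         # final commit at end of session
    return commits

msgs = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
print(run_session(msgs, flush_latency=2, message_limit=3))  # [['a', 'b'], ['c']]
```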

  • Questions

  • Workflow Recovery
    Workflow recovery principles
    Task recovery strategy
    Workflow recovery options
    State of operation
    Resuming sessions
    Recovery using the command line

  • Workflow Recovery Principles
    Workflow recovery lets you continue processing the workflow and workflow tasks from the point of interruption. You can recover a workflow if the Integration Service can access the workflow state of operation, which includes the status of tasks in the workflow and workflow variable values. You can configure the workflow to:
    Enable recovery. When you enable a workflow for recovery, the Integration Service saves the workflow state of operation in a shared location. You can recover the workflow if it terminates, stops, or aborts.
    Suspend. When you configure a workflow to suspend on error, the Integration Service stores the workflow state of operation in memory. You can recover the suspended workflow if a task fails.


  • Task Recovery Strategy
    Each task in the workflow has a recovery strategy:
    - Restart task. When the Integration Service recovers a workflow, it restarts each recoverable task configured with a restart strategy. You can configure Session and Command tasks with a restart recovery strategy.
    - Fail task and continue workflow. When the Integration Service recovers a workflow, it does not recover the task. The task status becomes failed, and the Integration Service continues running the workflow.
    - Resume from the last checkpoint. The Integration Service recovers a stopped, aborted, or terminated session from the last checkpoint. You can configure a Session task with a resume strategy.
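The three strategies amount to a small dispatch on the task's configured setting. The sketch below expresses that dispatch in Python; it is an illustration of the concept, not PowerCenter internals, and the strategy keys and callbacks are invented names.

```python
def recover_task(task, run_task, resume_task):
    """Sketch: apply a task's configured recovery strategy during
    workflow recovery. Strategy names mirror the slide above;
    the dict layout and callbacks are hypothetical."""
    strategy = task["recovery_strategy"]
    if strategy == "restart":
        return run_task(task)                  # rerun the task from scratch
    if strategy == "fail_and_continue":
        task["status"] = "failed"              # do not recover; workflow goes on
        return None
    if strategy == "resume_from_last_checkpoint":
        return resume_task(task, task.get("checkpoint"))
    raise ValueError(f"unknown strategy: {strategy}")
```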


  • Workflow Recovery Options
    To configure a workflow for recovery, you must enable the workflow for recovery or configure the workflow to suspend on task error.
    You can recover a workflow if it stops, aborts, terminates, or suspends.


  • State of Operation
    When you recover a workflow or session, the Integration Service restores the workflow or session state of operation to determine where to begin recovery processing.
    The Integration Service stores the workflow state of operation in the shared location $PMStorageDir.
    Workflow state of operation includes the following information:
    - Active service requests
    - Completed and running task status
    - Workflow variable values
    Session state of operation includes the following information:
    - Source
    - Transformation
    - Relational target recovery data


  • Resuming Sessions
    Types of recovery:
    - Incremental. The Integration Service starts processing data at the point of interruption. It does not read or transform rows that it processed before the interruption.
    - Full. The Integration Service reads all source rows again and performs all transformation logic if it cannot perform incremental recovery. It begins writing to the target at the last commit point. If any session component requires full recovery, the Integration Service performs full recovery on the session.
    When you configure session recovery to resume from the last checkpoint, the Integration Service creates checkpoints in $PMStorageDir to determine where to start processing session recovery.
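Incremental recovery can be illustrated with a checkpoint file, in the spirit of the $PMStorageDir checkpoints described above. This is a minimal sketch: the file name and JSON layout are invented, and real sessions checkpoint per commit interval rather than per row.

```python
import json
import os

def process_with_checkpoint(rows, target, storage_dir):
    """Sketch of incremental recovery: persist the index of the last
    committed row, and on restart skip rows processed before the
    interruption instead of rereading them (full recovery would
    read every source row again)."""
    ckpt = os.path.join(storage_dir, "session.ckpt")   # invented file name
    start = 0
    if os.path.exists(ckpt):
        with open(ckpt) as f:
            start = json.load(f)["next_row"]           # resume point
    for i in range(start, len(rows)):
        target.append(rows[i])                         # "write to target"
        with open(ckpt, "w") as f:                     # record the checkpoint
            json.dump({"next_row": i + 1}, f)
```

Running the function a second time against the same storage directory processes nothing, because the checkpoint already marks every row as committed; that skip-what-was-done behavior is the essence of incremental recovery.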


  • Recovery Using the Command Line
    The pmcmd command line program has a new -recovery option that works with the starttask and startworkflow commands. If the task you start is a session, specify the -recovery option to run the session based on the configured recovery strategy.
    - ResumeWorkflow. Resumes a suspended workflow.
    - ResumeWorklet. Resumes a suspended worklet.
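A recovery run is typically launched from a script; the sketch below assembles a pmcmd startworkflow call in Python. The service, domain, credential, folder, and workflow values are placeholders, and the option spellings (-sv, -d, -u, -p, -f) follow common pmcmd usage; verify them against your pmcmd reference before relying on this.

```python
def build_pmcmd_recovery(service, domain, user, pwd, folder, workflow):
    """Sketch: assemble a pmcmd startworkflow invocation that requests
    recovery mode via the -recovery option described above. Option
    names should be checked against the pmcmd command reference."""
    return ["pmcmd", "startworkflow",
            "-sv", service, "-d", domain,
            "-u", user, "-p", pwd,
            "-f", folder,
            "-recovery",          # run per the configured recovery strategy
            workflow]

# The resulting list can be passed to subprocess.run(...) on a machine
# where pmcmd is installed and on the PATH.
```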


  • Questions


  • Metadata Manager
    - Introduction
    - Metadata Reporting
    - Metadata Browser
    - Data Lineage
    - Where-used Analysis


  • Introduction
    Metadata Manager is a metadata management tool that you can use to browse and analyze metadata from disparate metadata repositories.
    Metadata Manager uses Data Analyzer functionality. Use the embedded Data Analyzer features to design, develop, and deploy metadata reports and dashboards.
    Metadata Manager uses workflows to extract metadata from source repositories and load it into a centralized metadata warehouse called the Metadata Manager Warehouse.
    Metadata Manager provides the following tools:
    - Metadata Manager Console
    - Metadata Manager Custom Metadata Configurator
    - Metadata Manager Interface


  • Metadata Manager Components
    Metadata Manager works within a web-based framework that requires the interaction of the following components:
    - Application server. Helps the Metadata Manager Server manage its processes efficiently.
    - Metadata Manager Server. Manages the source repository metadata stored in the Metadata Manager Warehouse.
    - Metadata Manager Warehouse. Stores the Metadata Manager metadata, such as the Metadata Manager reporting schema, user profiles, and reports.
    - PowerCenter repository. Stores the workflows, which are XConnect components that extract source metadata and load it into the Metadata Manager Warehouse.
    - Web server. Fetches and transmits Metadata Manager pages to web browsers.


  • Metadata Manager Architecture


  • Metadata Reports
    You can browse and analyze PowerCenter metadata with PowerCenter Repository Reports.
    Metadata Reports prepackages a set of reports and dashboards that can be easily customized to meet business needs.
    You can analyze the following types of metadata stored in the repository:
    - Source and target metadata
    - Transformation metadata
    - Mapping and mapplet metadata
    - Workflow and worklet metadata
    - Session metadata
    - Change management metadata
    - User and group metadata
    - Operational metadata


  • Repository Reports
    Use PowerCenter Repository Reports to browse and analyze PowerCenter metadata. Types of reports:
    - Configuration Management. With Configuration Management reports, you can analyze deployment groups and PowerCenter repository object labels.
    - Operations. With Operations reports, you can analyze operational statistics for workflows, worklets, and sessions. Operational reports provide information such as connection usage, service load by period, workflow and session load times, completion status, and errors.
    - PowerCenter Objects. With PowerCenter Object reports, you can identify PowerCenter objects, their properties, and their interdependencies with other repository objects.
    - Security. With the Security report, you can analyze users, groups, and their associations within the repository.


  • Metadata Browser
    - Metadata Manager Console. Set up, configure, and run XConnects, which load source repository metadata into the Metadata Manager Warehouse. Each XConnect consists of a preliminary transformation process and PowerCenter workflows that load metadata from a particular source repository into the Metadata Manager Warehouse. The Metadata Manager Console can also be used to set up connections to source repositories and other Metadata Manager components.
    - Metadata Manager Custom Metadata Configurator. Create XConnects to load metadata from source repositories for which Metadata Manager does not package XConnects.
    - Metadata Manager Interface. Browse source repository metadata and run reports to analyze the metadata. Also use it to configure metamodels, set up source repositories, configure the reporting schema, and set up access and privileges for users and groups.


  • Where-Used Analysis
    Used to determine where an object is used in one or more source repositories.


  • Data Lineage
    Metadata Manager data lineage is used to analyze where data originates, how the data is transformed and what objects transform it, and where the data ends.
    When you display the data lineage for an object, the Designer connects to the Metadata Manager server and requests a data lineage run on the specified PowerCenter object. The Metadata Manager server displays data lineage for the object in an Internet Explorer browser window.
    Before you can display data lineage for PowerCenter objects, make sure that the following requirements are met:
    - Metadata Manager is installed and working properly.
    - The PowerCenter repository metadata is loaded into Metadata Manager.
    - The Adobe SVG Viewer is installed on the machine where you run the Designer.


  • Accessing Data Lineage
    You can access data lineage from the following tools in the Designer:
    - Source Analyzer
    - Target Designer
    - Transformation Developer
    - Mapplet Designer
    - Mapping Designer
    You can display data lineage on the following PowerCenter objects and their ports:
    - Source definitions
    - Target definitions
    - Transformations


  • Questions


  • Error Handling
    - Error Categories
    - Error Logging
    - Error Handling Strategies


  • Error Handling Strategies
    You can configure the session to either stop or continue running when it encounters a pre- or post-session stored procedure error.
    By default, the Integration Service stops the session when a pre- or post-session stored procedure database error occurs.


  • Session Errors
    - Pre-read and pre-load stored procedures are considered pre-session stored procedures. You can configure the session to stop or to continue upon a stored procedure error.
    - Post-read and post-load stored procedures are considered post-session stored procedures. You can configure the session to stop or to continue upon a stored procedure error.
    - A row error causes the Integration Service to skip the row and issue an error message, which displays in the session log file.
    - The ERROR function can be used in Expression transformations to validate data. Use ERROR within an IIF or DECODE function to set rules for skipping rows.
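The row-skipping behavior of ERROR inside IIF can be mimicked in Python. This is a hedged analogue, not Informatica expression syntax: rows that fail a validation rule are logged and skipped rather than loaded, just as ERROR causes the Integration Service to skip a row and write a message to the session log.

```python
def validate_rows(rows, rule, error_log):
    """Sketch of ERROR-within-IIF semantics: when the rule fails for a
    row, record an error message (the session-log analogue) and skip
    the row; otherwise pass the row through to the target."""
    passed = []
    for row in rows:
        if rule(row):
            passed.append(row)
        else:
            error_log.append(f"skipped row {row!r}: rule failed")
    return passed
```

For example, a rule such as `lambda r: r["salary"] >= 0` skips rows with negative salaries, roughly what an IIF condition wrapping ERROR would do in an Expression transformation.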


  • Questions


  • PowerCenter Upgrade
    - A new Upgrade Wizard in the Admin Console
    - PowerCenter Pre-upgrade Steps
    - PowerCenter Upgrade Steps
    - PowerCenter Post-upgrade Steps


  • A New Upgrade Wizard in the Admin Console
    - Integrated UI that takes the user through the various steps of the upgrade
    - Allows the user to switch in and out of the Upgrade UI to perform other administrative activities
    - Can handle multiple repositories (global/local) and multiple PowerCenter Servers in one shot
    - Live feedback during repository upgrade as the user goes through the upgrade process
    - Provides a detailed upgrade summary report at the end
    - Upgrade Path


  • PowerCenter Pre-upgrade Steps
    - Install and configure the PowerCenter domain and nodes.
    - Prepare the repository for upgrade: create a copy of the repository by copying it to a new database or by restoring a backup of the repository to a new database.
    - Copy the Repository Agent and PowerCenter Server configuration files to the appropriate folder locations: copy the configuration files to a node in the domain under the server/upgrade/cfgfiles directory. To upgrade a single configuration file, copy it to a directory that can be accessed from the node hosting the PowerCenter domain.


  • PowerCenter Upgrade Steps
    1. Select and validate the configuration files for the Repository Agent or PowerCenter Server.
    2. Upgrade the global repository: upgrade a Repository Agent configuration to a Repository Service and upgrade the contents of the global repository.
    3. Upgrade the local repository: upgrade a Repository Agent configuration to a Repository Service and upgrade the contents of the local repository.
    4. Upgrade the PowerCenter Server configuration: upgrade a PowerCenter Server configuration to an Integration Service.
    5. View the results of the upgrade: the Upgrade Wizard lists the results of all upgrade activities.


  • PowerCenter Post-upgrade Reference
    - If you plan to use a UTF-8 repository: back up the upgraded repository, create a UTF-8 database, and restore the repository backup file to the new database.
    - Command line script files: the password encryption algorithm has changed, so environment password variables need to be reset. pmcmd command line changes are backward compatible. pmrepagent commands are replaced with corresponding pmrep commands.
    - Move PowerCenter Server run-time files: move run-time files used by the PowerCenter Server to the new installation.


  • Questions


    The wizard-driven auto approach enables developers to gain valuable insights quickly and with minimal effort. Users have a choice of auto or custom profiling for generating rules that drive profiling. The custom profile feature provides users full control over the profile creation process, allowing users to gather detailed metrics about their source data and capture exception rows for analysis prior to building an integration workflow.

    Functional Dependencies Analysis function. Determines exact and approximate dependencies between columns in a source.
    Candidate key analysis. Calculates the number and percentage of unique values in one or more source columns.
    Redundancy column analysis. Calculates the number of duplicate values in one or more source columns.
    Aggregate functions. Calculates an aggregate value for numeric or string values in a column. Use aggregate functions to count null values, determine average values, determine minimum or maximum values, and determine minimum and maximum lengths for string values.
    Domain validation. You can supply a list of values to be validated against a column's data.
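The candidate-key and aggregate-function metrics above are simple column statistics. A minimal Python sketch, with a column represented as a plain list (the function names and result layout are invented for illustration):

```python
def candidate_key_stats(column):
    """Candidate-key analysis sketch: number and percentage of unique
    non-null values in a column."""
    non_null = [v for v in column if v is not None]
    unique = len(set(non_null))
    return {"unique": unique,
            "pct_unique": 100.0 * unique / len(non_null) if non_null else 0.0}

def aggregate_stats(column):
    """Aggregate-function sketch: null count plus min and max values."""
    non_null = [v for v in column if v is not None]
    return {"nulls": len(column) - len(non_null),
            "min": min(non_null) if non_null else None,
            "max": max(non_null) if non_null else None}
```

A column where pct_unique is 100 is a candidate key; values well below 100 indicate duplicates, which is what redundancy analysis counts.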

    Inter-source Structure Inference function. Determines primary key-foreign key relationships among multiple sources. Use this function to analyze up to 15 sources.
    Orphan analysis. You can specify up to six join conditions in the Orphan Analysis and Join Complexity Evaluation functions.
    Reporting. You can load source rows as part of verbose data and view them in an integrated report.
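Orphan analysis looks for child rows whose key values have no match in the parent source. A set-difference sketch in Python (a single-key illustration; the product supports multi-condition joins, and these names are invented):

```python
def find_orphans(child_rows, parent_rows, child_key, parent_key):
    """Sketch of orphan analysis: return child rows whose foreign-key
    value does not appear among the parent's primary-key values."""
    parent_keys = {r[parent_key] for r in parent_rows}
    return [r for r in child_rows if r[child_key] not in parent_keys]
```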

    This feature enables organizations to compare and correct address information against postal service directories from more than 195 countries, resulting in greater quality and accuracy of address information.

    Increases developer productivity with pre-built data quality rules

    Structured data is what Informatica has handled in the past; however, roughly 80% of data in the enterprise is unstructured and not stored in databases or flat files: data stored in Excel, Word, PDFs, PowerPoint files, etc. A partnership with Item Field allows PowerCenter to integrate all of the unstructured and semi-structured data with the structured data across the enterprise. With this unstructured data option, organizations can seamlessly access, discover, integrate, and deliver enterprise data currently locked in documents and industry-specific data formats.

    Over time, you may need to modify metadata to support changes in applications. For example, you may have a business intelligence application that calculates metric A based on a given rule. Report M requires a new definition for metric A. Before you change the metric definition, you need to determine the impact on all objects in the business intelligence application that use metric A. Multiple reports may use metric A, where some of the reports require the old metric calculation.An object may be used more than once in a single repository or in multiple repositories.

    The difference between where-used analysis and data lineage is that where-used analysis finds where a piece of information is used, whereas data lineage shows how the information is derived and how it is subsequently used.

    Select the location of the PowerCenter Server and Repository Agent configuration files. You can either select a single configuration file and its type, or select a node in the domain on which all the configuration files are available. The wizard additionally allows validation of the configuration files to check whether they are appropriate.

    Upgrade the global repository: you can upgrade multiple global repositories. If there are no global repositories, you can skip this step and move on. You need to specify a license, the node on which the Repository Service should run, and optionally a folder in the domain.

    Upgrade the local repository: you can upgrade multiple local repositories and assign each to a particular global repository if needed.

    Specify the associated Repository Service, and the domain for that Repository Service, for each Integration Service.