Informatica Debugger and Performance Tuning


Wipro Technologies

Informatica PowerCenter Debugger, Tracing Levels and Performance Tuning

Author: Amaranadh Palappareddi

Revision History

Document Reference | Author | Reason for Issue | Effective Date

Approved By: Amaranadh Palappareddi

Table of Contents

Informatica Debugger
    1. Overview
    2. Debugger Sessions
    3. Debug Process
    4. Privileges Required

Informatica Tracing Levels
    1. Normal
    2. Verbose initialization
    3. Verbose data
    4. Terse

Informatica Performance Tuning
    1. Effective Mapping Design
        1.1 Description
        1.2 General Suggestions for Optimizing
        1.3 Lookup Transformation Optimizing Tips
        1.4 Operations and Expression Optimizing Tips
    2. Performance Tuning at Workflows
        2.1 Overview
        2.2 Identifying workflow bottlenecks
    3. Optimizing Workflow for better performance (session)

Informatica Debugger

1. Overview

To gather troubleshooting information, we can debug any valid mapping. We configure and run the Debugger from within the Mapping Designer. The Debugger uses a session to run the mapping on the Integration Service. When we run the Debugger, it pauses at breakpoints so we can view and edit transformation output data.

We can configure the Debugger in two situations:

- Before running a session: after we save a mapping, we can run some initial tests on it.
- After running a session: if the session fails or produces unexpected results, we can configure the Debugger against that session and find the cause.

2. Debugger Sessions

When configuring the Debugger, we can select one of the following debugger session types:

- Existing non-reusable session: uses existing source, target, and non-reusable session configuration properties. The Debugger does not suspend on error.
- Existing reusable session: uses existing source, target, and session configuration properties. When we run the Debugger, it creates and runs a debug workflow for the reusable session.
- Create a debug session instance: we define source, target, and session configuration properties ourselves through the Debugger Wizard. The Debugger creates and runs a debug workflow for the session.

3. Debug Process

The steps to debug a mapping are:

- Create breakpoints: create breakpoints in the mapping where we want to evaluate data and error conditions (see the example at the end of this section).
- Configure the Debugger: use the Debugger Wizard to configure the Debugger and select a session. When we create a debug session, we can also select the sources and targets in the wizard.
- Run the Debugger: when we run the Debugger, it connects to the Integration Service. The Integration Service runs the session and the connected workflow, reads the breakpoints, and pauses the Debugger when a breakpoint condition evaluates to true.
- Monitor the Debugger: while the Debugger runs, we can view transformation, target, and mapping output data as well as the debug log and session log. The main windows are:
  o Debug log: view messages from the Debugger.
  o Target window: view target data.
  o Instance window: view transformation data.
- Modify data and breakpoints: we can modify data when the Debugger pauses and see the effect on transformations, mappings, and targets as the data moves through the pipeline.

Note: The Designer saves breakpoint information in the workspace files. We can copy this information and reuse it in another mapping.
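As an illustration only, a data breakpoint on a hypothetical Expression transformation instance might look like the sketch below; the instance name, port name, and value are assumptions, and in practice the condition is assembled from the port, operator, and value fields in the Breakpoint Editor rather than typed as free text.

    Breakpoint type:          Data
    Transformation instance:  EXP_CALC_TOTALS      (hypothetical)
    Condition:                TOTAL_SAL > 100000   (hypothetical port and value)

With such a breakpoint, the Debugger pauses the first time a row satisfies the condition, so the row can be inspected in the Instance window before it moves further down the pipeline.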

4. Privileges Required

A set of privileges is required to work with the Debugger.

To create or edit breakpoints, one of the following:
- Use Designer privilege with read permission on the folder
- Super User privilege

To run the Debugger, one of the following:
- Use Designer and Use Workflow Manager privileges with read and execute permission on the folder
- Use Designer and Workflow Operator privileges with read permission on the folder
- Super User privilege

To run the Debugger using an Integration Service enabled in safe mode, one of the following:
- Admin Integration Service, Use Designer, and Use Workflow Manager privileges with read and execute permission on the folder
- Admin Integration Service, Use Designer, and Workflow Operator privileges with read permission on the folder
- Super User privilege

Informatica Tracing Levels

The Informatica server creates a session log file for each session. It writes information about the session into the log file, such as:

- The initialization process
- Creation of SQL commands for the reader and writer threads
- Start and end times for target loading
- Errors encountered during the session and general information
- Execution of post-session commands
- Load summary of reader, writer, and DTM statistics
- Integration Service version and build number

The amount of detail in the session log file depends on the tracing level set in the session properties. The tracing levels are:

1. Normal

2. Verbose initialization

3. Verbose data

4. Terse

1. Normal:

The PowerCenter server logs initialization and status information, skipped rows due to transformation errors, and a summary of the successful and rejected target rows, but no row-level detail.

2. Verbose initialization:

In addition to normal tracing, the session log file contains the location of the data cache and index cache files that are created, and detailed transformation statistics for every transformation in the mapping.

3. Verbose data:

In addition to the verbose initialization records, the session log records every row processed by the Integration Service.

Verbose initialization and verbose data are used for debugging purposes.

Advantages:

- It shows exactly where the Integration Service truncates string data to fit the precision of a column, and it provides detailed transformation statistics.
- When row error logging is enabled, errors are written to both the session log and the error log.

Disadvantages:

- Performance decreases because rows are logged one by one.
- The cache files consume more space than with the other tracing levels.

4. Terse:

To reduce the amount of time spent writing to the session log file, set the tracing level to terse. At this level the PowerCenter server does not write error messages or row-level information for rejected data.

Advantages:

For better mapping execution performance, set the tracing level to terse.

Disadvantages:

It is not possible to determine exactly where a record is being truncated or rejected.

Informatica Performance Tuning

1. Effective Mapping Design

1.1 Description

Although Power Center environments vary widely, most sessions and/or mappings can benefit from the implementation of common objects and optimization procedures. Follow these procedures and rules of thumb when creating mappings to help ensure optimization.

1.2 General Suggestions for Optimizing

1. Reduce the number of transformations. There is always overhead involved in moving data between transformations.

2. Consider more shared memory for a large number of transformations. Session shared memory between 12 MB and 40 MB should suffice.

3. Calculate once, use many times (see the expression sketch after this list).
   o Avoid calculating or testing the same value over and over.
   o Calculate it once in an expression, and set a True/False flag.
   o Within an expression, use variable ports to calculate a value that can be used multiple times within that transformation.

4. Only connect what is used.
   o Delete unnecessary links between transformations to minimize the amount of data moved, particularly out of the Source Qualifier.
   o This also helps maintenance. If a transformation needs to be reconnected, it is best to have only the necessary ports set as input and output.
   o In Lookup transformations, change unused ports to be neither input nor output. This makes the transformation cleaner, and it keeps the generated SQL override as small as possible, which cuts down on the amount of cache needed and thereby improves performance.

5. Watch the data types.
   o The engine automatically converts compatible types.
   o Sometimes data conversion is excessive. Data types are automatically converted when types differ between connected ports. Minimize data type changes between transformations by planning the data flow before developing the mapping.

6. Facilitate reuse.
   o Plan for reusable transformations up front.
   o Use variables: both mapping variables and variable ports. Variable ports are especially beneficial when they can be used to calculate a complex expression or perform a disconnected lookup call only once instead of multiple times.
   o Use mapplets to encapsulate multiple reusable transformations.
   o Use mapplets to leverage the work of critical developers and minimize mistakes when performing similar functions.

7. Only manipulate data that needs to be moved and transformed.
   o Reduce the number of non-essential records that are passed through the entire mapping.
   o Use active transformations that reduce the number of records as early in the mapping as possible (i.e., place filters and aggregators as close to the source as possible).
   o Select the appropriate driving/master table when using joins. The table with the smaller number of rows should be the driving/master table for a faster join.

8. Utilize single-pass reads.
   o Redesign mappings to use one Source Qualifier to populate multiple targets, so the server reads the source only once. If you have different Source Qualifiers for the same source (e.g., one for delete and one for update/insert), the server reads the source once for each Source Qualifier.
   o Remove or reduce field-level stored procedures. If you use field-level stored procedures, the PowerCenter server has to call the stored procedure for every row, slowing performance.
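As a hedged illustration of the "calculate once, use many times" tip, the sketch below shows a hypothetical Expression transformation in which a validity flag is computed once in a variable port and reused by two output ports; all port names and values (v_IS_VALID_CUST, o_STATUS, o_DISCOUNT, CUST_TYPE, ORDER_AMT) are assumptions, not from this document.

    Port (type)                   Expression
    ---------------------------   ----------------------------------------------------
    v_IS_VALID_CUST (variable)    IIF(ISNULL(CUST_TYPE) OR CUST_TYPE = 'UNKNOWN', 0, 1)
    o_STATUS        (output)      IIF(v_IS_VALID_CUST = 1, 'ACTIVE', 'REVIEW')
    o_DISCOUNT      (output)      IIF(v_IS_VALID_CUST = 1 AND ORDER_AMT > 1000, 0.05, 0)

    -- Hypothetical ports; variable ports are evaluated in port order for each row,
    -- so the flag is computed once rather than repeated in every output expression.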

1.3 Lookup Transformation Optimizing Tips

1. When your source is large, cache lookup table columns for lookup tables of 500,000 rows or less. This typically improves performance by 10 to 20 percent.

2. The rule of thumb is not to cache any table over 500,000 rows. This holds only if the standard row byte count is 1,024 or less; if the row byte count is more than 1,024, the 500,000-row threshold has to be adjusted down as the number of bytes increases (e.g., a 2,048-byte row can drop the cacheable row count to between 250,000 and 300,000, so the lookup table should not be cached in that case). This is only a general rule; try running the session with the large lookup cached and not cached. Caching is often still faster on very large lookup tables.

3. When using a Lookup transformation, improve lookup performance by placing all conditions that use the equality operator (=) first in the list of conditions on the Condition tab.

4. Cache lookup tables only if the number of lookup calls is more than 10 to 20 percent of the lookup table rows. For fewer lookup calls, do not cache if the number of lookup table rows is large. For small lookup tables (i.e., fewer than 5,000 rows), cache when there are more than 5 to 10 lookup calls.

5. Replace a lookup with DECODE or IIF for small sets of values (see the sketch after this list).

6. If lookups are cached and performance is poor, consider replacing them with an unconnected, uncached lookup.

7. For overly large lookup tables, use dynamic caching along with a persistent cache. Cache the entire table to a persistent file on the first run and enable the update else insert option on the dynamic cache; the engine will then never have to go back to the database to read data from this table. You can also partition this persistent cache at run time for further performance gains.

8. Review complex expressions. Examine mappings via Repository Reporting and Dependency Reporting within the mapping. Minimize aggregate function calls. For certain types of aggregations, replace an Aggregator transformation with an Expression transformation and an Update Strategy transformation.
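As a hedged sketch of tip 5, a lookup against a tiny, static reference table can be replaced with a DECODE expression in an Expression transformation; the port name COUNTRY_CODE and the code/name pairs are hypothetical.

    -- Hypothetical replacement for a small lookup: return the country name
    -- for a handful of known codes, and 'OTHER' as the default.
    DECODE(COUNTRY_CODE,
           'US', 'United States',
           'IN', 'India',
           'GB', 'United Kingdom',
           'OTHER')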

1.4 Operations and Expression Optimizing Tips

1. Numeric operations are faster than string operations.

2. Optimize char-varchar comparisons (i.e., trim spaces before comparing).

3. Operators are faster than functions (i.e., || vs. CONCAT); see the sketch after this list.

4. Optimize IIF expressions.

5. Avoid date comparisons in lookups; replace them with string comparisons.

6. Test expression timing by replacing the expression with a constant.

7. Use flat files. Flat files located on the server machine load faster than a database located on the server machine. Fixed-width files are faster to load than delimited files because delimited files require extra parsing. If processing intricate transformations, consider first loading the source flat file into a relational database, which allows the PowerCenter mappings to access the data in an optimized fashion by using filters and custom SQL SELECTs where appropriate.

8. If working with data that cannot be returned sorted (e.g., web logs), consider using the Sorter Advanced External Procedure.

9. Use a Router transformation to separate data flows instead of multiple Filter transformations.

10. Use a Sorter transformation or hash auto-keys partitioning before an Aggregator transformation to optimize the aggregation. With a Sorter transformation, the Sorted Input option can be used even if the original source cannot be ordered.

11. Use a Normalizer transformation to pivot rows rather than multiple instances of the same target.

12. Rejected rows from an Update Strategy are logged to the bad file. Consider filtering before the Update Strategy if retaining these rows is not critical, because logging causes extra overhead on the engine. Choose the option in the Update Strategy to discard rejected rows.

13. When using a Joiner transformation, be sure to make the source with the smaller amount of data the master source.

14. If an update override is necessary in a load, consider using a Lookup transformation just in front of the target to retrieve the primary key. The primary-key update will be much faster than the non-indexed lookup override.
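As a hedged sketch of tips 2 and 3, the expressions below trim a CHAR port before comparing it and build a full name with the || operator instead of nested CONCAT calls; the port names (CUST_CODE, FIRST_NAME, LAST_NAME) are hypothetical.

    -- Trim trailing padding before comparing a CHAR port to a VARCHAR value:
    IIF(RTRIM(CUST_CODE) = 'GOLD', 1, 0)

    -- Operator form (typically cheaper than nested function calls):
    FIRST_NAME || ' ' || LAST_NAME

    -- Equivalent function form:
    CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)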

2. Performance Tuning at Workflows

2.1 Overview

1) The first step in performance tuning is to identify performance bottlenecks. The session task is one of the places where a bottleneck can occur.

2) The strategy is to identify a performance bottleneck, eliminate it, and then identify the next one, until you are satisfied with the performance. To tune session performance, you can use the test load option to run sessions.

2.2 Identifying workflow bottlenecks

1) If you do not have a source, target, or mapping bottleneck, you may have a session bottleneck.

2) To identify a session bottleneck, use the performance details. The Integration Service creates performance details when you enable Collect Performance Data in the Performance settings of the session properties.

3) Small cache sizes, low buffer memory, and small commit intervals can cause session bottlenecks.

4) Encountering deadlocks can slow session performance.

3. Optimizing Workflow for better performance (session)

1. Use a grid to balance the Integration Service workload.

2. Run independent sessions and workflows concurrently to improve session and workflow performance.

3. Increase the buffer memory allocation for sources and targets that require additional memory blocks. If the Integration Service cannot allocate enough memory blocks to hold the data, it fails the session.

4. Set the optimal location and size for the caches.

5. Increase the commit interval. Each time the Integration Service commits changes to the target, performance slows; increasing the interval at which it commits reduces this overhead.

6. Disable high precision. Performance slows when the Integration Service reads and manipulates data with the high precision data type.

7. Reduce error tracing. Reducing the error tracing level reduces the number of log events generated by the Integration Service.

8. Remove staging areas. When you use a staging area, the Integration Service performs multiple passes over the data.

9. Running workflows and sessions on the nodes of a grid provides the following performance gains: it balances the Integration Service workload, processes concurrent sessions faster, and processes partitions faster.

10. When you run a workflow on a grid, the Integration Service loads memory and CPU resources on the nodes without requiring coordination between them.

11. When you configure the Integration Service for high availability, it recovers workflows and sessions that fail because of temporary network or machine failures. To enable recovery, the Integration Service writes the state of each workflow and session to temporary files in a shared directory, which may decrease performance.
