Informatica Best Practices


Informatica - Best Practices

- HOW TO: Take repository backup on a daily basis
- HOW TO: Retain multiple session and workflow log files in the session log folder
- HOW TO: Renaming Informatica Repository Name

HOW TO: Take Informatica repository backup on a daily basis

It is recommended to take a full backup of the development repository on a daily basis and keep it for at least a week, so that an accidentally deleted mapping can be restored from the backup.

The following basic batch program takes a repository backup, zips it, and keeps one backup per weekday, giving a rolling week of backups:

REM ----------------------------------------------------------
REM --SET VARIABLES
REM ----------------------------------------------------------
set LOGFILENM=<Repository Name>
set REP=<Repository Name>
set USER=Administrator
set PWD=<Admin Password>
REM BKPPATH should end with a trailing backslash
set BKPPATH=Drive:\<Location>\
set DAY=%DATE:~0,3%
REM ----------------------------------------------------------
ECHO. | TIME > %LOGFILENM%.log
ECHO "Backup Started & Connecting to Repository" >> %LOGFILENM%.log
pmrep connect -r %REP% -n %USER% -x %PWD% -h DEVMAX -o 5001
ECHO "Repository Connected" >> %LOGFILENM%.log
pmrep Backup -o %BKPPATH%%LOGFILENM%_%DAY%.rep -f -b -j -q
ECHO. | TIME >> %LOGFILENM%.log
ECHO "Backup Completed" >> %LOGFILENM%.log
zip -r -9 -S -m %BKPPATH%%LOGFILENM%_%DAY%.rep.zip %BKPPATH%%LOGFILENM%_%DAY%.rep
ECHO "Zipping complete!!!" >> %LOGFILENM%.log

Example: if LOGFILENM = DV_REP, the output of the above script on a Monday will be DV_REP_Mon.zip.

The above batch program can be scheduled on the Informatica server to run nightly, for example with the Windows Task Scheduler as sketched below.
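A minimal scheduling sketch, assuming the script above has been saved as D:\scripts\rep_backup.bat; the task name, script path, and run time are placeholders, not part of the original script:

REM Register the backup script with the Windows Task Scheduler to run nightly at 01:00.
REM Adjust the task name, path and time for your environment.
schtasks /create /tn "INFA_Repo_Backup" /tr "D:\scripts\rep_backup.bat" /sc daily /st 01:00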

HOW TO: Retain multiple session and workflow log files in the session log folder

You can set sessions to retain log files according to the following options: 

1. Session Runs. The Integration Service creates a designated number of session log files. Configure the number of session logs in the Save Session Log for These Runs option. The Integration Service does not archive binary logs. 

2. Session Time Stamp. The Integration Service creates a log for all sessions, appending a time stamp to each log. When you save a session log by time stamp, the Integration Service archives the binary logs and text log files. 


To retain multiple session and workflow log files in the session log folder edit the default session configuration as follows: 

1. Open the Workflow Manager.
2. Select Tasks > Session Configuration > Edit.
3. Click the Properties tab.
4. For "Save Session log by", select Session runs or Session timestamp from the list.
5. Optional. If you selected Session runs in step 4, enter a numeric value for the "Save session log for these runs" attribute.

Note: These settings take effect for new sessions you create. For old existing sessions you will see the Revert option; if you click Revert, the session will pick up your modified session configuration settings.

 


This change will be applicable to all new sessions.

$PMSessionLogCount Integration Service Variable

You can also use the $PMSessionLogCount service variable to set the number of session logs the Integration Service retains.

1. Go to the Administration Console.
2. Go to the Integration Service > Properties > General Properties.
3. Set the value of $PMSessionLogCount.

In this example it is set to 2, which keeps a history of 3 (2 + 1) log files.

4. Open the Workflow Manager.
5. Select Tasks > Session Configuration > Edit.
6. Select the Properties tab.
7. For "Save session log for these runs", use the $PMSessionLogCount variable set in the Administration Console:

 


HOW TO: Renaming Informatica Repository Name

There may be cases when you would like to rename an Informatica repository; an effective way of doing this is described below. Make sure that none of the developers are connected to the repository while performing the following operations.

Step 1: Using the Repository Server Administration Console, create a repository backup with the file name "BI_Informatica_Rep", skipping workflow and session logs (this will save time and space), as shown in Figure 1.

Figure 1: Repository Backup - Advanced Options

Step 2: Validate that the backup has been created under X:\Infa_714\RepositoryServer\bin\Backup.
Step 3: Create a copy of the license file, NewName.lic, from OldName.lic; the license file can be found at X:\Infa_714\RepositoryServer\bin.
Step 4: Using the Repository Server Administration Console, drop the existing repository.
Step 5: Using the Repository Server Administration Console, create the new repository from the backup file "BI_Informatica_Rep", as shown in Figure 2.

 Figure 2: Create New Repository using Restore
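A minimal command-line sketch of steps 2 and 3, using the paths mentioned above; OldName and NewName are placeholders for the old and new repository names:

REM Step 2 - confirm the backup file has been created.
dir "X:\Infa_714\RepositoryServer\bin\Backup"

REM Step 3 - copy the existing license file to one matching the new repository name.
copy "X:\Infa_714\RepositoryServer\bin\OldName.lic" "X:\Infa_714\RepositoryServer\bin\NewName.lic"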


Informatica Performance Improvement Tips

We often come across situations where the Data Transformation Manager (DTM) takes more time to read from a source or to write to a target. The following standards/guidelines can improve the overall performance.

- Use a single Source Qualifier if the source tables reside in the same schema.
- Make use of the Source Qualifier "Filter" property if the source type is relational.
- If subsequent sessions do a lookup on the same table, use a persistent cache in the first session. The data remains in the cache and is available to the subsequent sessions.
- Use flags as integers, as integer comparison is faster than string comparison.
- Use the table with the smaller number of records as the master table for joins.
- While reading from flat files, define the appropriate data type instead of reading as String and converting.
- Connect only the ports that are required by subsequent transformations; otherwise check whether those ports can be removed.
- Suppress the generated ORDER BY by adding a comment ('--') at the end of the override query in Lookup transformations.
- Minimize the number of Update Strategy transformations.
- Group by simple columns in transformations such as Aggregator and Source Qualifier.
- Use a Router transformation in place of multiple Filter transformations.
- Turn off verbose logging while moving the mappings to UAT/production environments.
- For large volumes of data, drop indexes before loading and recreate them after the load.
- For large volumes of records, use bulk load and increase the commit interval to a higher value.
- Set 'Commit on Target' in the sessions.

Leveraging Metadata in Informatica Workflow-Session/Analysis

We can leverage the metadata collected in the Informatica repository for many interesting analyses. A few of the scenarios where I have leveraged the Informatica metadata are the following.

These SQL queries can be executed against an Oracle database with no changes and require little modification for other databases.

Failed Sessions

The following query lists the failed sessions. To make it work for the last 'n' days, replace SYSDATE-1 with SYSDATE-n.

QUERY:

SELECT SUBJECT_AREA AS FOLDER_NAME,
       SESSION_NAME,
       LAST_ERROR AS ERROR_MESSAGE,
       DECODE (RUN_STATUS_CODE, 3, 'Failed', 4, 'Stopped', 5, 'Aborted') AS STATUS,
       ACTUAL_START AS START_TIME,
       SESSION_TIMESTAMP
  FROM REP_SESS_LOG
 WHERE RUN_STATUS_CODE != 1
   AND TRUNC(ACTUAL_START) BETWEEN TRUNC(SYSDATE - 1) AND TRUNC(SYSDATE)
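A hedged sketch of running such a query on a schedule, assuming it has been saved as failed_sessions.sql and that SQL*Plus can reach the repository database; the connection string and file locations are placeholders:

REM Run the failed-sessions query against the repository database and save the output.
REM repuser/reppwd@repdb and the paths are placeholders for your environment.
sqlplus -s repuser/reppwd@repdb @D:\scripts\failed_sessions.sql > D:\reports\failed_sessions_%DATE:~0,3%.txt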


Long Running Sessions

The following query lists sessions that ran for more than 10 minutes. To make it work for the last 'n' days, replace SYSDATE-1 with SYSDATE-n.

QUERY:

SELECT SUBJECT_AREA AS FOLDER_NAME,
       SESSION_NAME,
       SUCCESSFUL_SOURCE_ROWS AS SOURCE_ROWS,
       SUCCESSFUL_ROWS AS TARGET_ROWS,
       ACTUAL_START AS START_TIME,
       SESSION_TIMESTAMP
  FROM REP_SESS_LOG
 WHERE RUN_STATUS_CODE = 1
   AND TRUNC(ACTUAL_START) BETWEEN TRUNC(SYSDATE - 1) AND TRUNC(SYSDATE)
   AND (SESSION_TIMESTAMP - ACTUAL_START) > (10/(24*60))
 ORDER BY SESSION_TIMESTAMP

Invalid Tasks

The following query lists the folder name, task type, task name, version number, and last-saved time for all invalid tasks.


QUERY:

SELECT SUBJECT_AREA AS FOLDER_NAME,
       DECODE(IS_REUSABLE, 1, 'Reusable', ' ') || ' ' || TASK_TYPE_NAME AS TASK_TYPE,
       TASK_NAME AS OBJECT_NAME,
       VERSION_NUMBER,
       LAST_SAVED
  FROM REP_ALL_TASKS
 WHERE IS_VALID = 0
   AND IS_ENABLED = 1
 ORDER BY SUBJECT_AREA, TASK_NAME

Merge Rows as Columns / Transpose records

The Requirement: Converting rows to columns

Source:

Customer  Product  Cost
Cust1     P1       10
Cust1     P2       20
Cust1     P3       30
Cust2     ABC      10
Cust2     P2       25
Cust2     Def      10

Target:

Customer  Product1  Cost1  Product2  Cost2  Product3  Cost3
Cust1     P1        10     P2        20     P3        30
Cust2     ABC       10     P2        25     Def       10

The above illustration helps in understanding the requirement. We had to merge multiple records into one record based on certain criteria. The design had to be reusable, since each dimension within the data mart required this flattening logic.


1. Approach:

The use of an Aggregator transformation would group the records by a key, but retrieving the values of a particular column as individual columns is a challenge. Hence we designed a component, 'Flattener', based on an Expression transformation. Flattener is a reusable component, a mapplet, that performs the function of flattening records. It consists of an Expression and a Filter transformation: the Expression is used to club each incoming record based on certain logic, and the decision to write the record to the target is taken using the Filter transformation.

2. Design:

The mapplet can receive up to five inputs, of the following data types:

i_Col1 (string), Customer
i_Col2 (string), Product
i_Col3 (decimal), Cost
i_Col4 (decimal) and
i_Col5 (date/time)

We have kept the names generic so that different data types are accepted and the mapplet can be used in any scenario where there is a need for flattening records. The mapplet gives out 15 sets of 5 output ports each, in the following manner:

o_F1_1 (string), Customer
o_F2_1 (string), Product1
o_F3_1 (decimal), Cost1
o_F4_1 (decimal) and
o_F5_1 (date/time)

o_F1_2 (string), Customer
o_F2_2 (string), Product2
o_F3_2 (decimal), Cost2
o_F4_2 (decimal) and
o_F5_2 (date/time)

... and so on.

The output record has repetitive sets of 5 columns each (each set refers to one incoming row). Based on the requirement, the number of occurrences of these sets can be increased, and only the required fields need to be used/mapped. For the above example we use just two strings and one decimal, for mapping Customer, Product and Cost.

The mapplet receives records from its parent mapping. The Expression initially saves each incoming value to a variable and compares it with its counterpart that came in earlier and is held in its cache, as long as the condition to flatten is satisfied.

Syntax to store current and previous values (port name, data type, port type, expression):

i_Col2      string   i
prv_Col2    string   v   curr_Col2
curr_Col2   string   v   i_Col2

Because variable ports are evaluated in port order, prv_Col2 picks up the value that curr_Col2 still holds from the previous row before curr_Col2 is overwritten with the current row's i_Col2.

The condition/logic to flatten records is parameterized and decided before the mapping is called, thus increasing the code's scalability. The parameterized logic is passed to the Expression transformation via a mapplet parameter. The value is used as an expression to perform the evaluation, and the result is a flag value of either 1 or 2.

Syntax for port - flag:

flag   integer   v   $$Expr_compare

An example of the parameterized expression:


$$Expr_compare =  iif (curr_Col1 = prv_Col1 AND curr_Col2 != prv_Col2, 1, iif (curr_Col1 != prv_Col1,2))

A variable port named "rec_count" is incremented based on the flag.

Syntax for port - rec_count:

rec_count   integer   v   iif (flag = 2, 0, iif (flag = 1, rec_count + 1, rec_count))

The Expression transformation now uses the values in the ports "flag" and "rec_count" to decide the placeholder for each incoming input value, i.e. the column in the target table that this data would ultimately move into. This process is iterative and goes on as long as the comparison logic ($$Expr_compare) holds good, i.e. until all records are flattened per the logic. An example of the placeholder expression is shown below:

v_Field1   <data type>   v   iif (flag = 2 AND rec_count = 0, curr_Col1, v_Field1)

The port "write_flag_1" is set to 1 when the comparison logic fails (meaning the flattening of the current group is complete). The Filter transformation writes the record out only once it is completely transposed.

Filter condition:

write_flag_1   integer   v   iif (flag = 2 AND write_flag > 1, 1, 0)

3. Outcome:

After developing and implementing the code we found it to be a useful utility, so we thought of sharing it, and we would like to hear suggestions from readers on performing the same functionality in a different way. Please do share your views.

Informatica and Stored Procedures

A. Described below is a scenario where the requirement is to use a stored procedure that returns a cursor as a source. By and large, PowerCenter does not support a stored procedure that returns a cursor as a source. The workaround for this is:

1. Create the procedure that will load the data into a new table:

CREATE OR REPLACE PROCEDURE load (p_initial_date IN DATE, p_final_date IN DATE) AS
  str_load  VARCHAR2(500);
  str_clean VARCHAR2(500);
BEGIN
  str_clean := 'DELETE FROM EMP';
  str_load  := 'INSERT INTO EMP SELECT * FROM EMPLOYEE WHERE DOJ BETWEEN trunc(:1) AND trunc(:2)';
  EXECUTE IMMEDIATE str_clean;
  EXECUTE IMMEDIATE str_load USING p_initial_date, p_final_date;
EXCEPTION
  WHEN OTHERS THEN
    ROLLBACK;
END load;

2. Create the table that will receive the data from the procedure: 

SQL> create table EMP as SELECT * from EMPLOYEE where 1 > 2;

3. Add a Stored Procedure transformation to the PowerCenter mapping. This transformation will execute the new procedure, called LOAD in this example.

4. Set the stored procedure type to Source Pre-load, so that it is executed before the source table is read.


5. Import the EMP table as a Source Definition. This table will be populated by the new stored procedure.

If the original stored procedure is used by the customer application and you cannot change its source code, you can create a new stored procedure that calls the original one and performs the insert into the new table by looping over the returned cursor.

B. Given below is a situation where you want to pass a mapping variable to a Stored Procedure transformation (it can be either connected or unconnected).

Connected Stored Procedure 

The parameters that are passed to a connected Stored Procedure have to be linked from another transformation. Given below are the steps to pass a mapping variable to a connected Stored Procedure transformation:

1. Create an Expression transformation.
2. Create an output port in the Expression transformation with the following expression:

$$mapping_variable 

This sets the value of this output port to the mapping variable. 

3. Link this output port to the Stored Procedure transformation.

Unconnected Stored Procedure 

For unconnected Stored Procedure transformations you can use the mapping variable in the expression that calls the stored procedure. Follow the steps below to pass a mapping variable to an unconnected Stored Procedure transformation:

1. Create an Expression transformation.
2. Create an output port in the Expression transformation with the following expression:

:SP.GET_NAME_FROM_ID($$mapping_variable, PROC_RESULT)

If you attempt to use a mapping variable to store the output value of the stored procedure, the session will fail with the error below.

“TE_7002 Transformation Parse Fatal Error; transformation stopped: invalid function reference. Failed to Initialize Server Transformation.”

To resolve the issue replace the mapping variable with the PROC_RESULT system variable. 

Example:

Incorrect, using a mapping variable:
:SP.PROCEDURE(FIELD1, $$mapping_variable)

Correct, using the PROC_RESULT system variable:
:SP.PROCEDURE(FIELD1, PROC_RESULT)
or


:SP.PROCEDURE($$mapping_variable,PROC_RESULT) 

The PROC_RESULT system variable assigns the stored procedure output to the port with this expression.

Output Files in Informatica

The Integration Service process generates output files when we run workflows and sessions. By default, the Integration Service logs status and error messages to log event files.

Log event files are binary files that the Log Manager uses to display log events. When we run each session, the Integration Service also creates a reject file. Depending on transformation cache settings and target types, the Integration Service may create additional files as well. The Integration Service creates the following output files:

Output Files

Session Details/logs:

When we run a session, the Integration Service creates a session log file with the load statistics, table names, error information, threads created, etc., based on the tracing level set in the session properties.


We can monitor session details in the session run properties while the session is running, or after it has failed or succeeded.

Workflow Log:

The workflow log is available in the Workflow Monitor. The Integration Service process creates a workflow log for each workflow it runs. It writes information to the workflow log such as:

o Initialization of processes
o Workflow task run information
o Errors encountered
o Workflow run summary

The Integration Service can also be configured to suppress writing messages to the workflow log file. 

As with Integration Service logs and session logs, the Integration Service process enters a code number into the workflow log file message along with message text. 

Performance Detail File:

The Integration Service process generates performance details for session runs. Through the performance details file we can determine where session performance can be improved. Performance details provide transformation-by-transformation information on the flow of data through the session.

Reject Files:

By default, the Integration Service process creates a reject file for each target in the session. The reject file contains rows of data that the writer does not write to targets.

The writer may reject a row in the following circumstances:
o It is flagged for reject by an Update Strategy or Custom transformation.
o It violates a database constraint, such as a primary key constraint.
o A field in the row was truncated or overflowed.
o The target database is configured to reject truncated or overflowed data.

Note: By default, the Integration Service process saves the reject file in the directory entered for the service process variable $PMBadFileDir in the Workflow Manager, and names the reject file target_table_name.bad. We can view this file name at the session level:

Open the session, select any of the targets, and view the options:
o Reject file directory
o Reject file name

If you enable row error logging, the Integration Service process does not create a reject file. 
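A small operational sketch for checking reject files after a load; the directory is a placeholder for the location configured as $PMBadFileDir, and the snippet is meant to run inside a batch file:

REM List any non-empty reject (.bad) files so that rejected rows can be reviewed.
REM D:\Informatica\BadFiles is a placeholder for the $PMBadFileDir directory.
for %%F in ("D:\Informatica\BadFiles\*.bad") do if %%~zF GTR 0 echo Reject rows found in %%F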

Row Error Logs:

When we configure a session, we can choose to log row errors in a central location. When a row error occurs, the Integration Service process logs error information that allows us to determine the cause and source of the error.


The Integration Service process logs information such as source name, row ID, current row data, transformation, timestamp, error code, error message, repository name, folder name, session name, and mapping information. 

If we enable flat file logging, by default the Integration Service process saves the error log file in the directory entered for the service process variable $PMBadFileDir in the Workflow Manager.

Recovery Tables Files:

The Integration Service process creates recovery tables on the target database system when it runs a session enabled for recovery. 

When you run a session in recovery mode, the Integration Service process uses information in the recovery tables to complete the session. 

When the Integration Service process performs recovery, it restores the state of operations to recover the workflow from the point of interruption. 

The workflow state of operations includes information such as active service requests, completed and running status, workflow variable values, running workflows and sessions, and workflow schedules. 

Control File:

When we run a session that uses an external loader, the Integration Service process creates a control file and a target flat file. 

The control file contains information about the target flat file such as data format and loading instructions for the external loader. 

The control file has an extension of .ctl. The Integration Service process creates the control file and the target flat file in the Integration Service variable directory, $PMTargetFileDir, by default. 

Email:

We can compose and send email messages by creating an Email task in the Workflow Designer or Task Developer; the Email task can be placed in a workflow or associated with a session.

The Email task allows us to automatically communicate information about a workflow or session run to designated recipients.

Email tasks in the workflow send email depending on the conditional links connected to the task. For post-session email, we can create two different messages, one to be sent if the session completes successfully, the other if the session fails. 

We can also use variables to generate information about the session name, status, and total rows loaded. 

Indicator File:

If we use a flat file as a target, we can configure the Integration Service to create an indicator file for target row type information. 

For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete, or reject. 

The Integration Service process names this file target_name.ind and stores it in the Integration Service variable directory, $PMTargetFileDir, by default. 

Target or Output File:


If the session writes to a target file, the Integration Service process creates the target file based on a file target definition. 

By default, the Integration Service process names the target file based on the target definition name. 

If a mapping contains multiple instances of the same target, the Integration Service process names the target files based on the target instance name. 

The Integration Service process creates this file in the Integration Service variable directory, $PMTargetFileDir, by default. 

Cache Files:

When the Integration Service process creates memory cache, it also creates cache files. The Integration Service process creates cache files for the following mapping objects: 

o Aggregator transformation o Joiner transformation o Rank transformation o Lookup transformation o Sorter transformation o XML target 

By default, the DTM creates the index and data files for Aggregator, Rank, Joiner, and Lookup transformations and XML targets in the directory configured for the $PMCacheDir service process variable.

Processing Multiple XML Files through Informatica - 1

Problem Statement: The data to be processed in Informatica consisted of XML files. The number of XML files to be processed was dynamic, and there was also a need to capture the name of the XML file from which data was being processed.

Resolution:
Option 1 - Using a file list as part of indirect file sources in the session
Option 2 - Using a parameter file and a workflow variable

Implementation Details for Option 1: Using a File List

The names of the XML files to be processed were read using a batch script, and a file list containing the XML file names was created. The file list name was set in the source properties at the session level. The XML files were read sequentially and the data pertaining to each XML file was processed. Since the number of XML files to be processed was dynamic, the need of the hour was to achieve looping in Informatica.

Challenge in using a file list - A file list is created in a session to run multiple source files for one source instance in the mapping. When a file list is used in a mapping as multiple source files for one source instance, the properties of all files must match the source definition. File lists are configured in session properties by entering the file name of the file list in the Source Filename field and the location of the file list in the Source File Directory field. When the session starts, the Integration Service reads the file list, then locates and reads the first file source in the list. After the Integration Service reads the first file, it locates and reads the next file in the list. The issue of using XML file names in a file list was further compounded by Informatica grouping records pertaining to similar XML nodes together. This led to difficulty in identifying which record belonged to which XML file.

Batch Script - Batch scripts controlled the overall looping in Informatica by performing the tasks below:
• Reading XML file names from the staging location and creating a file list containing the XML file names.
• Moving XML files from the staging location to the archive location.
• Verifying whether there are any more XML files to be processed and, depending on the outcome, either looping the process by invoking the first workflow or ending the process.
• Invoking the appropriate workflows using pmcmd commands.
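A minimal sketch of such a controlling batch script; the directories, the Integration Service/domain/credentials, and the folder and workflow names are placeholders, and the pmcmd options shown assume a domain-based (PowerCenter 8.x-style) login:

REM Build a file list of the XML files waiting in the staging folder (one file name per line).
dir /b "D:\xml_staging\*.xml" > "D:\filelists\xml_filelist.txt"

REM If the file list is not empty, trigger the XML load workflow; otherwise the looping ends.
for %%A in ("D:\filelists\xml_filelist.txt") do (
  if %%~zA GTR 0 (
    pmcmd startworkflow -sv IS_DEV -d Domain_DEV -u Administrator -p AdminPwd -f XML_FOLDER -wait wf_load_xml
  ) else (
    echo No more XML files to process.
  )
)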

Workflow Details - There were two Informatica workflows designed to achieve the looping:

• First workflow - creates the indirect file to be used as the source in the session properties and triggers the second workflow. Details of the workflow:
  o A command task executes a DOS batch script which creates the indirect file after reading the XML file names from a pre-defined location on the server.
  o A command task executes the second workflow to process the data within the XML files.

• Second workflow - reads and processes the XML files and populates the staging tables. Details of the workflow:
  o A session reads the XML files using the indirect file and loads them into the staging tables.
  o A command task moves the XML files just processed into an archive folder, using a batch script.
  o A command task executes a batch script which checks whether there are any more XML files to be processed. If yes, it triggers the first workflow again, ensuring that all XML files are processed and loaded into the staging tables. If no, the process completes.

Informatica Development Best Practice – Workflow


Workflow Manager default properties can be modified to improve the overall performance, and a few of them are listed below. These properties can directly impact the ETL runtime and need to be configured based on:

i) Source database
ii) Target database
iii) Data volume

Category: Session Properties

While loading staging tables for FULL loads, the "Truncate target table" option should be checked. Based on the target database and whether a primary key is defined, the Integration Service fires a TRUNCATE or DELETE statement.

Database    Primary Key Defined        No Primary Key
DB2         TRUNCATE                   TRUNCATE
INFORMIX    DELETE                     DELETE
ODBC        DELETE                     DELETE
ORACLE      DELETE UNRECOVERABLE       TRUNCATE
MSSQL       DELETE                     TRUNCATE
SYBASE      TRUNCATE                   TRUNCATE

The "Commit Interval" property (default value: 10,000) should be increased for volumes of more than 1 million records. The database rollback segment size should also be increased when increasing the commit interval.

Insert/Update/Delete options should be set as determined by the target population method.

Target Option        Integration Service Behaviour
Insert               Uses the target update option:
                       Update as Update
                       Update as Insert
                       Update else Insert
Update as Update     Updates all rows as update
Update as Insert     Inserts all rows
Update else Insert   Updates existing rows, else inserts

Category: Partitioning

- The maximum number of partitions for a session should be 1.5 times the number of processors on the Informatica server, i.e. 1.5 x 4 processors = 6 partitions.
- Key-value partitions should be used only when an even distribution of data can be obtained; in other cases, pass-through partitions should be used. A source filter should be added to distribute the data evenly between pass-through partitions.
- The key value should contain ONLY numeric values: MOD(NVL(<numeric key value>, 0), <number of partitions defined>), e.g. MOD(NVL(product_sys_no, 0), 6).
- If a session contains "N" partitions, increase the DTM buffer size to at least "N" times the value for the session with one partition.
- If the source or target database is MPP (Massively Parallel Processing), enable Pushdown Optimization. By enabling this, the Integration Service will push as much transformation logic as possible to the source database, the target database, or both (FULL), based on the settings. This property can be ignored for conventional databases.

Informatica Development Best Practices – Mapping

The following are generally accepted best practices for Informatica PowerCenter ETL development which, if implemented, can significantly improve overall performance.

Category: Source Extracts

- Loading data from fixed-width files takes less time than from delimited files, since delimited files require extra parsing. In the case of fixed-width files, the Integration Service knows the start and end position of each column up front, which reduces processing time. (Benefit: Performance Improvement)
- Flat files located on the server machine load faster than a database located on the server machine. (Benefit: Performance Improvement)

Category: Mapping Designer

- There should be a placeholder transformation (Expression) immediately after the source and one before the target. Data type and data width changes are bound to happen during the development phase, and these placeholder transformations are used to preserve the port links between transformations. (Benefit: Best Practices)
- Connect only the ports that are required in targets to subsequent transformations. Also, active transformations that reduce the number of records should be used as early in the mapping as possible. (Benefit: Code Optimization)
- If a join must be used in the mapping, select the appropriate driving/master table; the table with the smaller number of rows should be the driving/master table. (Benefit: Performance Improvement)

Category: Transformations

- If there are multiple lookup conditions, place the condition with the "=" sign first in order to optimize lookup performance. Also, indexes on the database table should include every column used in the lookup condition. (Benefit: Code Optimization)
- Persistent caches should be used if the lookup data is not expected to change often. These cache files are saved and can be reused for subsequent runs, eliminating querying of the database. (Benefit: Performance Improvement)
- The Integration Service processes numeric operations faster than string operations. For example, if a lookup is done on a large amount of data on two columns, EMPLOYEE_NAME and EMPLOYEE_ID, configuring the lookup around EMPLOYEE_ID improves performance. (Benefit: Code Optimization)
- Replace complex filter expressions with a flag (Y/N). Complex logic should be moved to an Expression transformation and the result stored in a port; the Filter expression takes less time to evaluate this port than to execute the entire logic. (Benefit: Best Practices)
- The PowerCenter server automatically converts between compatible data types, which slows down performance considerably. For example, if a mapping moves data from an Integer port to a Decimal port, then back to an Integer port, the conversion may be unnecessary. (Benefit: Performance Improvement)
- Assigning default values to a port, and transformation errors written to the session log, will always slow down session performance. Try removing default values and eliminating transformation errors. (Benefit: Performance Improvement)


How can we use the pmcmd command in a workflow or to run a session?

pmcmd> startworkflow -f foldername workflowname
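In command-line (non-interactive) mode, pmcmd also needs connection details. A hedged example, where the service, domain, credentials, folder and workflow names are placeholders (older PowerCenter versions use host/port options instead of a domain):

REM Start a workflow from the command line and wait for it to complete.
pmcmd startworkflow -sv IS_DEV -d Domain_DEV -u Administrator -p AdminPwd -f MyFolder -wait wf_daily_load

REM Fetch run details for the same workflow.
pmcmd getworkflowdetails -sv IS_DEV -d Domain_DEV -u Administrator -p AdminPwd -f MyFolder wf_daily_load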

Unit testing is of two types:

1. Quantitative testing

2. Qualitative testing

Steps:

1. First validate the mapping.

2. Create a session on the mapping and then run the workflow.

Once the session has succeeded, right-click on the session and go to the statistics tab. There you can see how many source rows were applied, how many rows were loaded into the targets, and how many rows were rejected. This is called quantitative testing.

Once the rows are successfully loaded, we go for qualitative testing.

Steps:

1. Take the DATM (the document in which all business rules are mapped to the corresponding source columns) and check whether the data is loaded into the target table according to the DATM. If any data is not loaded according to the DATM, check the code and rectify it.

This is called qualitative testing.

This is what a developer will do in unit testing.

Difference between static and dynamic cache: Static - once the data is cached, it will not change; for example, an unconnected lookup uses a static cache. Dynamic - the cache is updated to reflect changes in the table (or source) it refers to (e.g. a connected lookup).

While using a static cache in a lookup we can use all operators (=, <, >, ...) when giving the condition in the condition tab, but when using a dynamic cache we can only use the = operator.

Static cache: it is a read-only cache. Dynamic cache: it is a read-and-write cache. Static: Informatica returns a value when the condition is true; if it is false, it returns the default value in a connected lookup and NULL in an unconnected lookup. Dynamic: it returns a new row only when the condition is false (i.e. the row is not already in the cache, so it is inserted).


Connected Lookup vs Unconnected Lookup:

- Connected: receives input values directly from the pipeline. Unconnected: receives input values from the result of a :LKP expression in another transformation.
- Connected: you can use a dynamic or static cache. Unconnected: you can use a static cache.
- Connected: the cache includes all lookup columns used in the mapping. Unconnected: the cache includes all lookup output ports in the lookup condition and the lookup/return port.
- Connected: supports user-defined default values. Unconnected: does not support user-defined default values.

Stop Vs Abort

Stop: If the session you want to stop is part of a batch, you must stop the batch; if the batch is part of a nested batch, stop the outermost batch.

Abort: You can issue the abort command; it is similar to the stop command except that it has a 60-second timeout. If the server cannot finish processing and committing data within 60 seconds, it kills the DTM process and terminates the session.

Here's the difference:

ABORT is equivalent to:
1. kill -9 on Unix (NOT kill -7, but yes, kill -9)
2. SIGTERM ABEND (force ABEND) on a mainframe
3. A Windows force-quit on an application.

What does this do? Each session uses SHARED/LOCKED (semaphore) memory blocks. The ABORT function kills JUST THE CODE threads, leaving the memory LOCKED, SHARED and allocated. The good news: it appears that the AIX operating system cleans up these lost memory blocks. The bad news: most other operating systems DO NOT CLEAR THE MEMORY, leaving the memory "taken" from the system. The only way to clear this memory is to warm-boot/cold-boot (restart) the Informatica SERVER machine; yes, the entire box must be restarted to get the memory back.

If you find your box running slower and slower over time, or not having enough memory to allocate new sessions, then I suggest that ABORT not be used.

So then the question is: When I ask for a STOP, it takes forever. How do I get the session to stop fast?

Well, first things first: STOP is a REQUEST to stop. It fires a request (equivalent to a Ctrl-C in SQL*Plus) to the source database and waits for the source database to clean up. The bigger the data in the source query, the more time it takes to roll back the source query to maintain transaction consistency in the source database (i.e. a join of huge tables, a big GROUP BY, a big ORDER BY).


It then cleans up the buffers in memory by releasing the data (without writing to the target) but it WILL run the data all the way through to the target buffers, never sending it to the target DB. The bigger the session memory allocations, the longer it takes to clean up.

Then it fires a request to stop against the target DB, and waits for the target to roll-back. The higher the commit point, the more data the target DB has to "roll-back".

FINALLY, it shuts the session down.

WHAT IF I NEED THE SESSION STOPPED NOW? Pick up the phone, call the source system DBA, and have them KILL the source query IN THE DATABASE. This will send an EOF (end of file) downstream to Informatica, and Informatica will take less time to stop the session.

If you use abort, be aware, you are choosing to "LOSE" memory on the server in which Informatica is running (except AIX).

If you use ABORT and you then re-start the session, chances are, not only have you lost memory - but now you have TWO competing queries on the source system after the same data, and you've locked out any hope of performance in the source database. You're competing for resources with a defunct query that's STILL rolling back.

There are three ways to improve session performance for an Aggregator transformation:

A) Size the caches appropriately:

1) Size of the data cache = bytes required for variable columns + bytes required for output columns.

2) Size of the index cache = size of the ports used in the GROUP BY clause.

B) If you provide sorted data on the GROUP BY ports, aggregation will be faster; sort the ports used in the Aggregator's GROUP BY with an upstream Sorter transformation (and enable the Sorted Input option on the Aggregator).

C) We can use incremental aggregation if the data that has already been aggregated is not expected to change.

When we create a flat file target with an Oracle source, how can I make the first row of the flat file contain the column names?

Use a pre-SQL statement, but this is a hard-coded method; if you change the column names or add extra columns to the flat file, you will have to change the statement.

You can also achieve this by changing the setting in the Informatica Repository Manager to display the column headings. The only disadvantage of this is that it will be applied to all the files generated by this server.

IIF(CUME(1) = 1, 'col1,col2,col3,col4' || CHR(10) || TO_CHAR(col1), TO_CHAR(col1))

In version 8.2 of Informatica, the session properties have an option called "Header Options". Use that to get the field names as the first row in the target flat file.

When importing a flat file into the Target Designer, a flat file import wizard appears. It has an option 'Import field names from first line'; check this option so that the Integration Service treats the first row's values as column names.

Can anyone explain error handling in Informatica with examples, so that it will be easy to explain in an interview?

There is a file called the bad file, which generally has the format *.bad and contains the records rejected by the Informatica server. There are two kinds of indicators: one for the row type and one for the column type. The row indicator signifies what operation was going to take place (i.e. insert, delete, update, etc.). The column indicators contain information about why the column was rejected (such as violation of a not-null constraint, value error, overflow, etc.). If one rectifies the errors in the data present in the bad file and then reloads the data into the target, the table will contain only valid data.

What is the difference between the IIF and DECODE functions?

You can use nested IIF statements to test multiple conditions. The following example tests for various conditions and returns 0 if sales is zero or negative:

IIF( SALES > 0, IIF( SALES < 50, SALARY1, IIF( SALES < 100, SALARY2, IIF( SALES < 200, SALARY3, BONUS))), 0 )

You can use DECODE instead of IIF in many cases, and DECODE may improve readability. The following shows how you can use DECODE instead of IIF:

DECODE( TRUE,
SALES > 0 AND SALES < 50, SALARY1,
SALES > 49 AND SALES < 100, SALARY2,
SALES > 99 AND SALES < 200, SALARY3,
SALES > 199, BONUS)

The DECODE function can be used in a SQL statement, whereas IIF cannot be used in a SQL statement.

DECODE is similar to the switch statement in the C language, where we test for multiple conditions. A single IIF can test only one condition; to test multiple conditions we have to use nested IIFs, and DECODE is more readable than nested IIF.

Question: What is Target Update Override? What is its use?


Answers

It overrides the default update statement in the target properties.

A target update override is similar to a source qualifier override; it is useful for updating the target without using an Update Strategy transformation.

When we don't have primary keys defined at the database level and still need to update this target from Informatica, we need to define keys at the Informatica level and use the update override in the target properties. This way we can update the table.

The Integration Service updates target tables based on key values. However, you can override the default UPDATE statement for each target in a mapping. You might want to update the target based on non-key columns.

When the Integration Service executes SQL against a source, target, or lookup database, it searches the reserved words file stored in the Integration Service installation directory and encloses matching reserved words in quotes. If you use a target update override, you must manually put all reserved words in quotes.

For a mapping without an Update Strategy transformation or a Custom transformation with the update strategy property enabled, configure the session to mark source rows as update. The Target Update option only affects source rows marked as update; the Integration Service processes all rows marked as insert, delete, or reject normally.

When you configure the session, mark source rows as data-driven. The Target Update Override then only affects source rows marked as update by the Update Strategy or Custom transformation.

For example, a mapping passes the total sales for each salesperson to the T_SALES table. The Designer generates the following default UPDATE statement for the target T_SALES:

UPDATE T_SALES SET EMP_NAME = :TU.EMP_NAME, DATE_SHIPPED = :TU.DATE_SHIPPED, TOTAL_SALES = :TU.TOTAL_SALES WHERE EMP_ID = :TU.EMP_ID


Because the target ports must match the target column names, the update statement includes the keyword :TU to specify the ports in the target transformation.

If you modify the UPDATE portion of the statement, be sure to use :TU to specify ports.

You can override the WHERE clause to include non-key columns.

For example, you might want to update records for employees named Mike Smith only. To do this, you edit the WHERE clause as follows:

UPDATE T_SALES SET DATE_SHIPPED = :TU.DATE_SHIPPED, TOTAL_SALES = :TU.TOTAL_SALES WHERE :TU.EMP_NAME = EMP_NAME AND EMP_NAME = 'MIKE SMITH'

Target update (using an Update Strategy transformation) works only when we have a primary key in the target; the update happens based on this key. If we don't have the primary key, the update will not work.

To overcome this, if our target does not have any key column, or if we want to update the target based on a column other than the primary key column, we need to use the Target Update Override option available in the target properties.

How did you implement the Update Strategy / Target Update Override?

Ans: Target Update Override. By default, the Informatica Server updates targets based on key values. However, you can override the default UPDATE statement for each target in a mapping; you might want to update the target based on non-key columns. For a mapping without an Update Strategy transformation, configure the session to mark source records as update. If your mapping includes an Update Strategy transformation, the Target Update option only affects source records marked as update; the Informatica Server processes all records marked as insert, delete, or reject normally. When you configure the session, mark source records as data-driven. The Target Update Override then only affects source rows marked as update by the Update Strategy transformation.

Overriding the WHERE Clause

You can override the WHERE clause to include non-key columns. For example, you might want to update records for employees named Mike Smith only. To do this, you edit the WHERE clause as follows:

UPDATE T_SALES SET DATE_SHIPPED = :TU.DATE_SHIPPED, TOTAL_SALES = :TU.TOTAL_SALES WHERE :TU.EMP_NAME = EMP_NAME AND EMP_NAME = 'MIKE SMITH'

Entering a Target Update Statement

Follow these instructions to enter a target update statement:
1. Double-click the title bar of a target instance.
2. Click Properties.
3. Click the arrow button in the Update Override field. The SQL Editor displays.
4. Select Generate SQL. The default UPDATE statement appears.
5. Modify the update statement. You can override the WHERE clause to include non-key columns.
6. Click OK.

How to list the top 10 salaries without using a Rank transformation?

By using a Sorter transformation with SAL as the sort key (descending) and a Filter transformation (fed by a row counter, e.g. a sequence or an Expression variable port) to pass only the first 10 records.

To configure a Joiner transformation to emulate a Lookup transformation in a PowerCenter mapping, do the following:
1. Create a Source Definition based on the database table used for the lookup.
2. Add this Source Definition to the mapping.
3. Connect this Source Definition to the Joiner transformation as the master source.
4. Connect the original source table to the Joiner as the detail source.
5. Change the Join Type to Master Outer Join.
6. Configure the join condition. The join condition must use the same ports as the lookup condition.

Limitations: a Joiner can be used to emulate a lookup, but with certain limitations:
- Only equi-join conditions are supported.
- Duplicate matching is not supported.
- The Joiner will always cache; non-cached operation is not an option.

The Lookup will always return one value when it can find data, and you can choose to get the first or last value using the lookup property "Lookup Policy on Multiple Match". However, as the Joiner does not have that property, you will have to ensure that all key values occur only once. If this is not ensured, you will be allowing duplicates of your input data to flow into the mapping.

Scenario: Very often you might have to compare incoming data with previously loaded data in the warehouse. Often the target table is loaded into a lookup, comparisons are made, and actions are taken. This is not always the most efficient way: searching for a value in a cache with millions of rows takes longer than searching in a cache with a few hundred rows, which can result in poor performance. You might want to keep the caches small; however, a data warehouse only gets bigger and performance goes down. The input data often has the same size for each delivery, as deltas are often created by some smart mechanism. So how would you cache your input data instead of the target data and make comparisons that way? How do you make the cache smaller and yet increase performance?

You can put your input data in a lookup cache and read all rows from the warehouse table as a source, then make comparisons and act. However, this cannot detect new data, because new data will never be looked up: it does not exist in the warehouse table. One way to address this issue is to use the Joiner transformation. The Lookup transformation is a specialized Joiner transformation; when you use the outer join option, the Joiner can operate similarly to a Lookup, and if you use the sorted input option, a very small cache is created.

Yes, we can reproduce the functionality of a Lookup transformation using a Joiner transformation.

The only difference is the behaviour on multiple matches: a Joiner will return all the matched rows, whereas a Lookup will return only the first row or the last row, or report an error.

In terms of performance, a Joiner is more costly than a Lookup.

SCD 123

http://www.dwforum.net/data_warehousing_concepts/scd_12_129.0.html

The SCD2 concept is that if a source record finds a match in the target, it is still inserted into the target as a new record. So in SCD2 there will be multiple records in the target for the same natural key, i.e. for a single Empno there can be more than one record; that is how it maintains history.

 

The points are as below

 

1) When the matched record is inserted into the target dimension table, it keeps the source key value (say, Empno), but the target table's primary key is the surrogate key, which is not based on the employee key. That is to say, in the target table the employee number may be repeated but the surrogate key will be unique.

2) The Lookup transformation always uses the condition Src.empno = Tgt.empno. When a single source record matches more than one record in the target, you can choose First, Last or Any if you use a static lookup, so a static lookup is easier. In the case of a dynamic lookup, if a single source record finds more than one corresponding record in the target, the lookup will fail. So you add a SQL override in the dynamic lookup that extracts only one matching record (the last record) from the target; then it will work, otherwise it will not.

3) In a dynamic lookup, if NewLookupRow is:

   0 - you can drop the row.

   1 - insert it, using the Sequence Generator for the surrogate key.

   2 - it will be inserted, but use a new Sequence Generator whose start value is somewhat higher than the previous Sequence Generator's last value, so that the two do not conflict.

4) Override the SQL query in the dynamic lookup so that in every run, for each source record that finds a match in the target, it always returns the last matched record.

 

Then it works fine. But the better solution for SCD2 is a static lookup.

SQL Override

Target update override (done in the Designer on the target instance; also viewable in the session's target options): Update Override is an option available on the TARGET instance.

Update override with a non-primary-key field in the WHERE condition.
Example: target table Target1 has A1, B1 defined as the primary key (local to Informatica, in the Designer).
UPDATE OVERRIDE SQL ON TARGET TABLE:

UPDATE Target1 SET C=:TU.C,D=:TU.D WHERE A1=:TU.A1 AND B1=:TU.B1

--// already generated by Informatica

SQL Override and Lookup SQL Override

A SQL override is an option available in the Source Qualifier, and a Lookup SQL override is available in the Lookup transformation; in both you can include joins, filters, GROUP BY and ORDER BY. There are some differences between the two overrides:
1. A SQL override limits the number of incoming rows entering the mapping pipeline; a Lookup SQL override limits the number of lookup rows, avoiding a whole-table scan and saving lookup time and the cache it uses.
2. A Lookup SQL override uses an ORDER BY clause by default; a SQL override does not, and we should provide it in our query if we require it.
3. A SQL override can provide any kind of join by writing the query; a Lookup SQL override provides only non-equi joins.
4. A Lookup SQL override gives only one record even if it finds multiple records for a single condition; a SQL override does not do that.

What is the difference between ODBC, DSN, NATIVE and TCP/IP connections?

ODBC refers to the database drivers used to connect from the client to the relational database. DSN is the data source name, the name given to the ODBC connection by the customer in order to access the database from the client. Native drivers are the drivers that allow the client to access the source and target after loading them from the database into the workspace. TCP/IP is the protocol used for connecting remote computers; here the repository client contacts the Informatica Repository Server through TCP/IP.