Informatica Best Practices
- HOW TO: Take Informatica repository backup on a daily basis
- HOW TO: Retain multiple session and workflow log files in the session log folder
- HOW TO: Renaming an Informatica repository

HOW TO: Take Informatica repository backup on a daily basis

It is always recommended to take a full backup of the development repository on a daily basis and to keep each backup for at least a week, so that an accidentally deleted mapping can be restored from the backup. The basic batch program below takes the repository backup automatically, zips it, and keeps it for a week:

REM ----------------------------------------------------------
REM -- SET VARIABLES
REM ----------------------------------------------------------
set LOGFILENM=<Repository Name>
set REP=<Repository Name>
set USER=Administrator
set PWD=<Admin Password>
set BKPPATH=Drive:\<Location>
set DAY=%DATE:~0,3%
REM ----------------------------------------------------------
ECHO. | TIME > %LOGFILENM%.log
ECHO "Backup Started & Connecting to Repository" >> %LOGFILENM%.log
pmrep connect -r %REP% -n %USER% -x %PWD% -h DEVMAX -o 5001
ECHO "Repository Connected" >> %LOGFILENM%.log
pmrep Backup -o %BKPPATH%%LOGFILENM%_%DAY%.rep -f -b -j -q
ECHO. | TIME >> %LOGFILENM%.log
ECHO "Backup Completed" >> %LOGFILENM%.log
zip -r -9 -S -m %LOGFILENM%_%DAY%.rep.zip %LOGFILENM%_%DAY%.rep
ECHO "Zipping complete!!!" >> %LOGFILENM%.log

Example: if LOGFILENM = DV_REP, the output of the above script on a Monday run will be DV_REP_Mon.zip.
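The one-week retention comes from the weekday suffix: %DATE:~0,3% yields a three-letter day name, so a daily run produces only seven distinct file names and each backup overwrites the file from the same weekday of the previous week. A minimal Python sketch of that naming scheme (the helper name is ours, not part of the script):

```python
from datetime import date


def backup_name(repo: str, day: date) -> str:
    """Build a weekday-suffixed backup file name, e.g. DV_REP_Mon.zip.

    With only seven possible suffixes, a daily run keeps exactly one
    week of history before names repeat and older files are replaced.
    """
    return f"{repo}_{day.strftime('%a')}.zip"


# 2024-01-01 is a Monday; one week later the name repeats, so the
# older backup is overwritten automatically.
assert backup_name("DV_REP", date(2024, 1, 1)) == "DV_REP_Mon.zip"
assert backup_name("DV_REP", date(2024, 1, 1)) == backup_name("DV_REP", date(2024, 1, 8))
```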
This batch program can be scheduled on the Informatica server to run nightly.

HOW TO: Retain multiple session and workflow log files in the session log folder
You can set sessions to retain log files according to the following options:
1. Session Runs. The Integration Service creates a designated number of session log files. Configure the number of session logs in the Save Session Log for These Runs option. The Integration Service does not archive binary logs.
2. Session Time Stamp. The Integration Service creates a log for all sessions, appending a time stamp to each log. When you save a session log by time stamp, the Integration Service archives the binary logs and text log files.
To retain multiple session and workflow log files in the session log folder edit the default session configuration as follows:
1. Open the Workflow Manager.
2. Select Tasks > Session Configuration > Edit.
3. Click the Properties tab.
4. For "Save Session log by", select Session runs or Session timestamp from the list.
5. Optional. If you selected Session runs in step 4, enter a numeric value for the "Save session log for these runs" attribute.
Note: These settings take effect for new sessions you create. Existing sessions show a Revert option; if you click Revert, the session picks up your modified configuration file settings.
This change applies to all new sessions.

$PMSessionLogCount Integration Service Variable

You can also use the $PMSessionLogCount service variable to have the Integration Service create the configured number of session logs.
1. Go to the Administration Console.
2. Go to the Integration Service > Properties > General Properties.
3. Set the value of $PMSessionLogCount.
In this example it is set to 2, which keeps a history of 3 (2 + 1) log files.
4. Open the Workflow Manager.
5. Select Tasks > Session Configuration > Edit.
6. Select the Properties tab.
7. Use the variable $PMSessionLogCount set in the Administration Console.
HOW TO: Renaming Informatica Repository Name
There may be cases when you would like to rename an Informatica repository; an effective way of doing so follows. Make sure that none of the developers are connected to the repository while performing these operations.
Step 1: Using Repository Server Administration, create a repository backup with the file name "BI_Informatica_Rep", skipping workflow and session logs (this saves time and space), as shown in Figure 1.
Figure 1: Repository Backup - Advanced Options
Step 2: Validate that the backup was created in X:\Infa_714\RepositoryServer\bin\Backup.
Step 3: Create a copy of the license file, NewName.lic from OldName.lic; the license file can be found at X:\Infa_714\RepositoryServer\bin.
Step 4: Using Repository Server Administration, drop the existing repository.
Step 5: Using Repository Server Administration, create the new repository from the backup file "BI_Informatica_Rep", as shown in Figure 2.
Figure 2: Create New Repository using Restore
Informatica Performance Improvement Tips
We often come across situations where the Data Transformation Manager (DTM) takes more time to read from a source or to write into a target. The following standards/guidelines can improve the overall performance.
- Use a single Source Qualifier if the source tables reside in the same schema.
- Make use of the Source Qualifier "Filter" property if the source type is relational.
- If subsequent sessions do a lookup on the same table, use a persistent cache in the first session; the data remains in the cache and is available to the subsequent sessions.
- Use flags as integers, as integer comparison is faster than string comparison.
- Use the table with the lesser number of records as the master table for joins.
- While reading from flat files, define the appropriate data type instead of reading as string and converting.
- Connect only those ports that are required by subsequent transformations; otherwise, check whether the ports can be removed.
- Suppress the generated ORDER BY using a '--' comment at the end of the query override in Lookup transformations.
- Minimize the number of Update Strategy transformations.
- Group by simple columns in transformations such as Aggregator and Source Qualifier.
- Use a Router transformation in place of multiple Filter transformations.
- Turn off verbose logging while moving the mappings to UAT/production environments.
- For large volumes of data, drop indexes before loading and recreate them after the load.
- For large volumes of records, use bulk load.
- Increase the commit interval to a higher value for large volumes of data.
- Set 'Commit on Target' in the sessions.
Leveraging Metadata in Informatica Workflow-Session/Analysis
We can leverage the metadata collected in the Informatica repository for many interesting analyses; a few of the scenarios where I have leveraged the Informatica metadata follow.

These SQL queries can be executed against an Oracle database with no changes, and require little modification for other databases.
Failed Sessions

The following query lists the failed sessions. To make it work for the last 'n' days, replace SYSDATE - 1 with SYSDATE - n.

QUERY:
SELECT SUBJECT_AREA AS FOLDER_NAME,
       SESSION_NAME,
       LAST_ERROR AS ERROR_MESSAGE,
       DECODE (RUN_STATUS_CODE, 3, 'Failed', 4, 'Stopped', 5, 'Aborted') AS STATUS,
       ACTUAL_START AS START_TIME,
       SESSION_TIMESTAMP
  FROM REP_SESS_LOG
 WHERE RUN_STATUS_CODE != 1
   AND TRUNC(ACTUAL_START) BETWEEN TRUNC(SYSDATE - 1) AND TRUNC(SYSDATE)
Long-running Sessions

The following query lists long-running sessions. To make it work for the last 'n' days, replace SYSDATE - 1 with SYSDATE - n.

QUERY:
SELECT SUBJECT_AREA AS FOLDER_NAME,
       SESSION_NAME,
       SUCCESSFUL_SOURCE_ROWS AS SOURCE_ROWS,
       SUCCESSFUL_ROWS AS TARGET_ROWS,
       ACTUAL_START AS START_TIME,
       SESSION_TIMESTAMP
  FROM REP_SESS_LOG
 WHERE RUN_STATUS_CODE = 1
   AND TRUNC(ACTUAL_START) BETWEEN TRUNC(SYSDATE - 1) AND TRUNC(SYSDATE)
   AND (SESSION_TIMESTAMP - ACTUAL_START) > (10 / (24 * 60))
 ORDER BY SESSION_TIMESTAMP
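The (10 / (24 * 60)) term in the query above expresses a 10-minute threshold as a fraction of a day, because subtracting two Oracle DATE values yields a difference in days. A quick sketch of the conversion (the helper name is illustrative):

```python
def minutes_as_day_fraction(minutes: float) -> float:
    """Convert a duration in minutes to the fraction-of-a-day unit
    that Oracle DATE subtraction produces (1440 minutes per day)."""
    return minutes / (24 * 60)


# A session that ran for 15 minutes exceeds the 10-minute threshold
# used in the long-running-sessions query.
session_duration_days = 15 / (24 * 60)
assert session_duration_days > minutes_as_day_fraction(10)
```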
Invalid Tasks

The following query lists the folder name, task type, task name, version number, and last-saved time for all invalid tasks.

QUERY:
SELECT SUBJECT_AREA AS FOLDER_NAME,
       DECODE(IS_REUSABLE, 1, 'Reusable', ' ') || ' ' || TASK_TYPE_NAME AS TASK_TYPE,
       TASK_NAME AS OBJECT_NAME,
       VERSION_NUMBER,
       LAST_SAVED
  FROM REP_ALL_TASKS
 WHERE IS_VALID = 0
   AND IS_ENABLED = 1
 ORDER BY SUBJECT_AREA, TASK_NAME
Merge Rows as Columns / Transpose records
The Requirement: Converting rows to columns.

Input:
Customer  Product  Cost
Cust1     P1       10
Cust1     P2       20
Cust1     P3       30
Cust2     ABC      10
Cust2     P2       25
Cust2     Def      10

Output:
Customer  Product1  Cost1  Product2  Cost2  Product3  Cost3
Cust1     P1        10     P2        20     P3        30
Cust2     ABC       10     P2        25     Def       10

The illustration above should help in understanding the requirement. We had to merge multiple records into one record based on certain criteria. The design had to be reusable, since each dimension within the data mart required this flattening logic.
1. Approach:

The use of an Aggregator transformation would group the records by a key, but retrieving the values for a particular column as individual columns is a challenge. Hence we designed a component, 'Flattener', based on an Expression transformation.

Flattener is a reusable component: a mapplet that performs the function of flattening records. It consists of an Expression and a Filter transformation. The Expression is used to club each incoming record based on certain logic; the decision to write the record to the target is taken by the Filter transformation.

2. Design:

The mapplet can receive up to five inputs, of the following data types:

i_Col1 (string), Customer
i_Col2 (string), Product
i_Col3 (decimal), Cost
i_Col4 (decimal) and
i_Col5 (date/time)

We have kept the names generic, accepting different data types, so that the mapplet can be used in any scenario where there is a need for flattening records. The mapplet gives out 15 x 5 output ports (15 sets of 5 columns), in the following manner:

o_F1_1 (string), Customer
o_F2_1 (string), Product1
o_F3_1 (decimal), Cost1
o_F4_1 (decimal) and
o_F5_1 (date/time)

o_F1_2 (string), Customer
o_F2_2 (string), Product2
o_F3_2 (decimal), Cost2
o_F4_2 (decimal) and
o_F5_2 (date/time)

... and so on.

The output record will have repetitive sets of 5 columns each (each set refers to one incoming row). Based on the requirement, the number of occurrences of these sets can be increased, and only the required fields need be used/mapped. For the above example we use just two strings and one decimal, for mapping Customer, Product and Cost.

The mapplet receives records from its parent mapping. The Expression initially saves each incoming value to a variable and compares it with its counterpart that came in earlier, which is held in its cache as long as the condition to flatten is satisfied.
Syntax to store current and previous values:

Port       Type    I/O/V     Expression
i_Col2     string  input
prv_Col2   string  variable  curr_Col2
curr_Col2  string  variable  i_Col2

Because variable ports are evaluated in order, prv_Col2 picks up the previous row's value of curr_Col2 before curr_Col2 is overwritten with the current input.
The condition/logic to flatten records is parameterized and decided before the mapping is called, thus increasing the code's scalability. The parameterized logic is passed to the Expression transformation via a mapplet parameter. The value is used as an expression to perform the evaluation, and the result is a flag value of either 1 or 2.
Syntax for port - flag:

flag  integer  variable  $$Expr_compare
An example for parameterized expression
$$Expr_compare = iif (curr_Col1 = prv_Col1 AND curr_Col2 != prv_Col2, 1, iif (curr_Col1 != prv_Col1,2))
A variable port named "rec_count" is incremented based on the flag.

Syntax for port - rec_count:

rec_count  integer  variable  IIF(flag = 2, 0, IIF(flag = 1, rec_count + 1, rec_count))

The Expression transformation now uses the values in the ports "flag" and "rec_count" to decide the placeholder for each incoming input value, i.e. the column in the target table into which this data will ultimately move. This process is iterative and goes on until the comparison logic ($$Expr_compare) holds good, i.e. until all records are flattened per the logic. An example of the placeholder expression is shown below:

v_Field1  <data type>  variable  IIF(flag = 2 AND rec_count = 0, curr_Col1, v_Field1)

Port "write_flag_1" is set to 1 when the comparison logic fails (meaning flattening is complete). The Filter transformation filters out the record once it is completely transposed.

Filter condition:

write_flag_1  integer  variable  IIF(flag = 2 AND write_flag > 1, 1, 0)
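The compare/flag/rec_count mechanics above can be mimicked in plain Python to show the intent: rows arriving grouped by key are buffered until the key changes, then emitted as one wide record. This is only a sketch of the flattening logic, not the mapplet itself:

```python
def flatten(rows):
    """Transpose (customer, product, cost) rows into one wide record
    per customer, mirroring the mapplet's compare-and-count logic.
    Rows must arrive grouped by customer, as in the source data."""
    out = []
    prev_key = None
    current = []
    for key, product, cost in rows:
        if key != prev_key and prev_key is not None:  # flag = 2: key changed
            out.append([prev_key] + current)          # emit the completed record
            current = []
        current.extend([product, cost])               # fill the next column set
        prev_key = key
    if prev_key is not None:                          # flush the final group
        out.append([prev_key] + current)
    return out


rows = [("Cust1", "P1", 10), ("Cust1", "P2", 20), ("Cust1", "P3", 30),
        ("Cust2", "ABC", 10), ("Cust2", "P2", 25), ("Cust2", "Def", 10)]
# Produces one wide record per customer:
# [['Cust1', 'P1', 10, 'P2', 20, 'P3', 30],
#  ['Cust2', 'ABC', 10, 'P2', 25, 'Def', 10]]
```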
3. Outcome:

After developing the code and implementing it, we found it to be a useful utility, so we thought of sharing it. We would like to hear suggestions from readers on performing the same functionality in a different way; please do share your views.
Informatica and Stored Procedures
A. Described below is a scenario where the requirement is to use a stored procedure that returns a cursor as a source.

By and large, PowerCenter does not support a stored procedure that returns a cursor as a source. The workaround is:

1. Create a procedure that loads the data into a new table:

CREATE OR REPLACE PROCEDURE load (p_initial_date IN DATE, p_final_date IN DATE) AS
  str_load  VARCHAR2(500);
  str_clean VARCHAR2(500);
BEGIN
  str_clean := 'DELETE FROM EMP';
  str_load  := 'INSERT INTO EMP SELECT * FROM EMPLOYEE ' ||
               'WHERE DOJ BETWEEN TRUNC(:start_date) AND TRUNC(:end_date)';
  EXECUTE IMMEDIATE str_clean;
  EXECUTE IMMEDIATE str_load USING p_initial_date, p_final_date;
EXCEPTION
  WHEN OTHERS THEN
    ROLLBACK;
END load;
2. Create the table that will receive the data from the procedure:
SQL> create table EMP as SELECT * from EMPLOYEE where 1 > 2;
3. Add a Stored Procedure transformation to the PowerCenter mapping. This transformation will execute the new procedure, called LOAD in this example.
4. Set the run method to Source Pre Load, so that it executes before the source table is read.
5. Import the EMP table as a Source Definition. This table will be populated by the new stored procedure.
If the original stored procedure is used by the customer application and you can't change its source code, you can create a new stored procedure that calls the original one (without inserting into a table) and performs the insert into the new table by looping over the returned cursor.
B. Given below is a situation where you wanted to pass a mapping variable to a stored procedure transformation (it can either be connected or unconnected).
Connected Stored Procedure
The parameters that are passed to a connected Stored Procedure have to be linked from another transformation. Given below are the steps to pass a mapping variable to a connected Stored Procedure transformation:
1. Create an Expression transformation. 2. Create an output port in the Expression transformation with the following expression:
$$mapping_variable
This sets the value of this output port to the mapping variable.
3. Link this output port to the Stored Procedure transformation.
Unconnected Stored Procedure
For unconnected Stored Procedure transformations, you can use the mapping variable in the expression that calls the stored procedure. Follow the steps below to pass a mapping variable to an unconnected Stored Procedure transformation:
1. Create an Expression transformation. 2. Create an output port in the Expression transformation with the following expression:
:SP.GET_NAME_FROM_ID($$mapping_variable, PROC_RESULT)
If you attempt to use a mapping variable to store the output value of the stored procedure, the session fails with the error below.
“TE_7002 Transformation Parse Fatal Error; transformation stopped: invalid function reference. Failed to Initialize Server Transformation.”
To resolve the issue replace the mapping variable with the PROC_RESULT system variable.
Example:

Incorrect, using a mapping variable:
:SP.PROCEDURE(FIELD1, $$mapping_variable)

Correct, using the PROC_RESULT system variable:
:SP.PROCEDURE(FIELD1, PROC_RESULT)
or
:SP.PROCEDURE($$mapping_variable,PROC_RESULT)
The PROC_RESULT system variable assigns the stored procedure output to the port with this expression.
Output Files in Informatica
The Integration Service process generates output files when we run workflows and sessions. By default, the Integration Service logs status and error messages to log event files.
Log event files are binary files that the Log Manager uses to display log events. When we run each session, the Integration Service also creates a reject file. Depending on transformation cache settings and target types, the Integration Service may create additional files as well. The Integration Service creates the following output files:
Output Files
Session Details/logs:
When we run a session, the Integration Service creates a session log file containing load statistics, table names, error information, threads created, etc., based on the tracing level set in the session properties.
We can monitor session details in the session run properties while the session is running, or after it has failed or succeeded.
Workflow Log:
Workflow log is available in Workflow Monitor. The Integration Service process creates a workflow log for each workflow it runs. It writes information in the workflow log such as
o Initialization of processes, o Workflow task run information, o Errors encountered and o Workflows run summary.
The Integration Service can also be configured to suppress writing messages to the workflow log file.
As with Integration Service logs and session logs, the Integration Service process enters a code number into the workflow log file message along with message text.
Performance Detail File:
The Integration Service process generates performance details for session runs. Through the performance details file we can determine where session performance can be
improved. Performance details provide transformation-by-transformation information on the flow of data
through the session.
Reject Files:
By default, the Integration Service process creates a reject file for each target in the session. The reject file contains rows of data that the writer does not write to targets.
The writer may reject a row in the following circumstances: o It is flagged for reject by an Update Strategy or Custom transformation. o It violates a database constraint such as primary key constraint o A field in the row was truncated or overflowed o The target database is configured to reject truncated or overflowed data.
Note: By default, the Integration Service process saves the reject file in the directory entered for the service process variable $PMBadFileDir in the Workflow Manager, and names the reject file target_table_name.bad. We can view this file name in session level.
Open the session, select any of the targets, and view the options: o Reject file directory. o Reject file name.
If you enable row error logging, the Integration Service process does not create a reject file.
Row Error Logs:
When we configure a session, we can choose to log row errors in a central location. When a row error occurs, the Integration Service process logs error information that allows you to determine the cause and source of the error.
The Integration Service process logs information such as source name, row ID, current row data, transformation, timestamp, error code, error message, repository name, folder name, session name, and mapping information.
If we enable flat file logging, by default the Integration Service process saves the file in the directory entered for the service process variable $PMBadFileDir in the Workflow Manager.
Recovery Tables Files:
The Integration Service process creates recovery tables on the target database system when it runs a session enabled for recovery.
When you run a session in recovery mode, the Integration Service process uses information in the recovery tables to complete the session.
When the Integration Service process performs recovery, it restores the state of operations to recover the workflow from the point of interruption.
The workflow state of operations includes information such as active service requests, completed and running status, workflow variable values, running workflows and sessions, and workflow schedules.
Control File:
When we run a session that uses an external loader, the Integration Service process creates a control file and a target flat file.
The control file contains information about the target flat file such as data format and loading instructions for the external loader.
The control file has an extension of .ctl. The Integration Service process creates the control file and the target flat file in the Integration Service variable directory, $PMTargetFileDir, by default.
Email:
We can compose and send email messages by creating an Email task in the Workflow Designer or Task Developer; the Email task can be placed in a workflow or associated with a session.
The Email task allows you to automatically communicate information about a workflow or session run to designated recipients.
Email tasks in the workflow send email depending on the conditional links connected to the task. For post-session email, we can create two different messages, one to be sent if the session completes successfully, the other if the session fails.
We can also use variables to generate information about the session name, status, and total rows loaded.
Indicator File:
If we use a flat file as a target, we can configure the Integration Service to create an indicator file for target row type information.
For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete, or reject.
The Integration Service process names this file target_name.ind and stores it in the Integration Service variable directory, $PMTargetFileDir, by default.
Target or Output File:
If the session writes to a target file, the Integration Service process creates the target file based on a file target definition.
By default, the Integration Service process names the target file based on the target definition name.
If a mapping contains multiple instances of the same target, the Integration Service process names the target files based on the target instance name.
The Integration Service process creates this file in the Integration Service variable directory, $PMTargetFileDir, by default.
Cache Files:
When the Integration Service process creates memory cache, it also creates cache files. The Integration Service process creates cache files for the following mapping objects:
o Aggregator transformation o Joiner transformation o Rank transformation o Lookup transformation o Sorter transformation o XML target
By default, the DTM creates the index and data files for Aggregator, Rank, Joiner, and Lookup transformations and XML targets in the directory configured for the $PMCacheDir service process variable.
Processing Multiple XML Files through Informatica - 1

Problem Statement: The data to be processed in Informatica came as XML files, and the number of XML files to be processed was dynamic. There was also a need to capture the name of the XML file from which each record was processed.

Resolution:
Option 1 - Using a file list as part of indirect file sources in the session
Option 2 - Using a parameter file and workflow variables
Implementation Details for Option 1: Using a File List

The XML file names to be processed were read using a batch script, and a file list containing the XML file names was created. This file list name was set under the source properties at session level. The XML files were read sequentially, and the data pertaining to each XML file was processed. Since the number of XML files to be processed was dynamic, the need of the hour was to achieve looping in Informatica.

Challenge in using a file list: a file list is created in a session to run multiple source files for one source instance in the mapping. When a file list is used as multiple source files for one source instance, the properties of all files must match the source definition. File lists are configured in the session properties by entering the file name of the file list in the Source Filename field and the location of the file list in the Source File Directory field. When the session starts, the Integration Service reads the file list, then locates and reads the first file source in the list; after reading the first file, it locates and reads the next file in the list. The issue with using XML file names in a file list was further compounded by Informatica grouping records pertaining to similar XML nodes together, which made it difficult to identify which record belonged to which XML file.
Batch Script - batch scripts controlled the overall looping in Informatica by encompassing the tasks below:
• Reading the XML file names from the staging location and creating the file list containing the XML file names.
• Moving the XML files from the staging location to the archive location.
• Verifying whether there are any more XML files to be processed and, depending on the outcome, either looping the process by invoking the first workflow or ending the process.
• Invoking the appropriate workflows using pmcmd commands.

Workflow Details - two Informatica workflows were designed to achieve the looping:
• The first workflow created the indirect file to be used as the source in the session properties and triggered the second workflow. Details:
o A Command task executes a DOS batch script which creates the indirect file after reading the XML file names from a predefined location on the server.
o A Command task executes the second workflow to process the data within the XML files.
• The second workflow read and processed the XML files and populated the staging tables. Details:
o A session reads the XML files using the indirect file and loads the data into the staging tables.
o A Command task moves the XML file just processed into an archive folder, using a batch script.
o A Command task executes a batch script which checks whether there are any more XML files to be processed. If yes, it triggers the first workflow again, ensuring all XML files are processed and loaded into the staging tables. If no, the process completes.
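The batch-script half of the loop (build the indirect file list, archive the processed files, decide whether to loop again) can be sketched in Python. The directory names and list-file name are illustrative, and the listing/archiving steps that the two workflows perform separately are compressed into one helper here:

```python
import os
import shutil


def build_file_list(staging_dir: str, archive_dir: str, list_path: str) -> bool:
    """Write an indirect-file list of the XML files waiting in staging,
    then move them to the archive. Returns True if there was anything
    to process, which is the batch script's loop-again signal."""
    xml_files = sorted(f for f in os.listdir(staging_dir) if f.endswith(".xml"))
    if not xml_files:
        return False  # nothing left: end the loop
    with open(list_path, "w") as file_list:
        for name in xml_files:
            file_list.write(os.path.join(staging_dir, name) + "\n")
    for name in xml_files:
        shutil.move(os.path.join(staging_dir, name),
                    os.path.join(archive_dir, name))
    return True  # trigger the first workflow again
```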
Informatica Development Best Practice – Workflow
Workflow Manager default properties can be modified to improve overall performance; a few of them are listed below. These properties can directly impact the ETL runtime and need to be configured based on:

i) Source database
ii) Target database
iii) Data volume
Category Technique
Session Properties
While loading staging tables for FULL loads, the "Truncate target table" option should be checked. Based on the target database and whether a primary key is defined, the Integration Service fires a TRUNCATE or DELETE statement:

Database   Primary Key Defined    No Primary Key
DB2        TRUNCATE               TRUNCATE
INFORMIX   DELETE                 DELETE
ODBC       DELETE                 DELETE
ORACLE     DELETE UNRECOVERABLE   TRUNCATE
MSSQL      DELETE                 TRUNCATE
SYBASE     TRUNCATE               TRUNCATE
The workflow property "Commit interval" (default value: 10,000) should be increased for volumes of more than 1 million records. The database rollback segment size should also be increased when raising the commit interval.
Insert/Update/Delete options should be set as determined by the target population method.
Target Option        Integration Service Behavior
Insert               Inserts all rows
Update as Update     Updates all rows as update
Update as Insert     Inserts all rows
Update else Insert   Updates existing rows, else inserts
Partition:
- The maximum number of partitions for a session should be 1.5 times the number of processors on the Informatica server, i.e. 1.5 x 4 processors = 6 partitions.
- Key-value partitions should be used only when an even distribution of data can be obtained; in other cases, pass-through partitions should be used.
- A source filter should be added to evenly distribute the data between pass-through partitions.
- A key value should contain ONLY numeric values: MOD(NVL(<Numeric Key Value>, 0), <No. of partitions defined>), e.g. MOD(NVL(product_sys_no, 0), 6).
- If a session contains N partitions, increase the DTM buffer size to at least N times the value for the session with one partition.
- If the source or target database is MPP (Massively Parallel Processing), enable Pushdown Optimization. With this enabled, the Integration Service pushes as much transformation logic as possible to the source database, the target database, or both (FULL), based on the settings. This property can be ignored for conventional databases.
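The MOD(NVL(key, 0), partitions) expression above can be mirrored in Python to see how it spreads rows across partitions; the sample keys are made up:

```python
def partition_of(key, partitions=6):
    """Mirror MOD(NVL(key, 0), partitions): NULL keys fall into
    partition 0, everything else is spread round-robin by modulo."""
    return (key if key is not None else 0) % partitions


keys = [101, 102, 103, 104, 105, 106, None]
buckets = [partition_of(k) for k in keys]
# 101..106 land in partitions 5, 0, 1, 2, 3, 4; the NULL key lands in 0.
```

Consecutive surrogate keys therefore distribute evenly, which is why the guideline restricts key-value partitioning to numeric keys with a roughly uniform spread.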
Informatica Development Best Practices – Mapping
The following are generally accepted “Best Practices” for Informatica PowerCenter ETL development and if implemented, can significantly improve the overall performance.
Category: Source Extracts
- Loading data from fixed-width files takes less time than from delimited files, since delimited files require extra parsing. In the case of fixed-width files, the Integration Service knows the start and end position of each column up front, which reduces processing time. (Performance Improvement)
- Flat files located on the server machine load faster than a database located on the server machine. (Performance Improvement)

Category: Mapping Designer
- There should be a placeholder transformation (Expression) immediately after the source and one before the target. Data type and data width changes are bound to happen during the development phase, and these placeholder transformations preserve the port links between transformations. (Best Practices)
- Connect only the ports that are required in targets to subsequent transformations. Also, active transformations that reduce the number of records should be used as early as possible in the mapping. (Code Optimization)
- If a join must be used in the mapping, select the appropriate driving/master table: the table with the lesser number of rows should be the driving/master table. (Performance Improvement)

Category: Transformations
- If there are multiple lookup conditions, place the condition with the "=" sign first in order to optimize lookup performance. Also, indexes on the database table should include every column used in the lookup condition. (Optimization)
- Persistent caches should be used if the lookup data is not expected to change often. These cache files are saved and can be reused for subsequent runs, eliminating the need to query the database. (Performance Improvement)
- The Integration Service processes numeric operations faster than string operations. For example, if a lookup is done on a large amount of data on two columns, EMPLOYEE_NAME and EMPLOYEE_ID, configuring the lookup on EMPLOYEE_ID improves performance. (Code Optimization)
- Replace a complex filter expression with a flag (Y/N). The complex logic should be moved to an Expression transformation and the result stored in a port; the Filter expression then takes less time to evaluate this port than to execute the entire logic. (Best Practices)
- The PowerCenter Server automatically converts between compatible data types, which can slow performance considerably. For example, if a mapping moves data from an Integer port to a Decimal port and back to an Integer port, the conversion may be unnecessary. (Performance Improvement)
- Assigning default values to a port and writing transformation errors to the session log will always slow session performance. Try removing default values and eliminating transformation errors. (Performance Improvement)
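The flag technique above (push the complex logic into an Expression port, then filter on the flag) looks like this in outline; the business rule used here is invented for illustration:

```python
def eligibility_flag(row):
    """Expression transformation: evaluate the complex rule once and
    expose it as a simple Y/N port (the rule itself is a placeholder)."""
    return "Y" if row["amount"] > 100 and row["status"] == "ACTIVE" else "N"


def filter_rows(rows):
    """Filter transformation: test only the precomputed flag, not the
    full expression, for every row."""
    flagged = [dict(r, flag=eligibility_flag(r)) for r in rows]
    return [r for r in flagged if r["flag"] == "Y"]


rows = [{"amount": 150, "status": "ACTIVE"},
        {"amount": 50, "status": "ACTIVE"},
        {"amount": 200, "status": "CLOSED"}]
# Only the first row passes the flag filter.
```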
How can we use the pmcmd command in a workflow or to run a session?

pmcmd> startworkflow -f foldername workflowname
Unit testing is of two types:
1. Quantitative testing
2. Qualitative testing

Steps:
1. First validate the mapping.
2. Create a session on the mapping and then run the workflow.

Once the session has succeeded, right-click on the session and go to the statistics tab. There you can see how many source rows were applied, how many rows were loaded into the targets, and how many rows were rejected. This is called quantitative testing.

If the rows are loaded successfully, we then go for qualitative testing.

Steps:
1. Take the DATM (the DATM is where all business rules are mapped to the corresponding source columns) and check whether the data is loaded into the target table according to the DATM. If any data is not loaded according to the DATM, go and check the code and rectify it.

This is called qualitative testing.

This is what a developer does in unit testing.
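Quantitative testing, as described above, boils down to reconciling the session statistics: every source row applied should either load or be rejected. A toy check of that arithmetic (the numbers are illustrative):

```python
def reconcile(applied: int, loaded: int, rejected: int) -> bool:
    """Quantitative unit test: every source row applied to the target
    must either be loaded or be rejected."""
    return applied == loaded + rejected


# Statistics as read from the session's statistics tab.
assert reconcile(applied=1000, loaded=990, rejected=10)
assert not reconcile(applied=1000, loaded=980, rejected=10)  # 10 rows unaccounted for
```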
Difference between static and dynamic cache:

Static - once the data is cached, it does not change. For example, an unconnected lookup uses a static cache.
Dynamic - the cache is updated to reflect updates in the table (or source) to which it refers (e.g. a connected lookup).

While using a static cache in a lookup, we can use all operators, such as =, <, >, when giving the condition in the condition tab; with a dynamic cache we can use only the = operator.

Static cache: it is a read-only cache. Dynamic cache: it is a read-and-write cache. Static: Informatica returns a value when the condition is true; if it is false, it returns the default value in a connected lookup and NULL in an unconnected lookup. Dynamic: it returns a value only if the condition is false.
Connected lookup vs. unconnected lookup:
- Connected: receives input values directly from the pipeline. Unconnected: receives input values from the result of a :LKP expression in another transformation.
- Connected: you can use a dynamic or static cache. Unconnected: you can use only a static cache.
- Connected: the cache includes all lookup columns used in the mapping. Unconnected: the cache includes only the lookup ports in the lookup condition and the lookup/return port.
- Connected: supports user-defined default values. Unconnected: does not support user-defined default values.
Stop vs. Abort
Stop: If the session you want to stop is part of a batch, you must stop the batch; if the batch is part of a nested batch, stop the outermost batch.
Abort: You can issue the abort command; it is similar to the stop command except that it has a 60-second timeout. If the server cannot finish processing and committing data within 60 seconds, it kills the session.
Here's the difference:
ABORT is equivalent to:
1. kill -9 on Unix (NOT kill -7, but yes, kill -9)
2. SIGTERM / forced ABEND on the mainframe
3. Force Quit on a Windows application.
What does this do? Each session uses shared/locked memory blocks (semaphores). The ABORT function kills just the code threads, leaving the memory locked, shared, and allocated. The good news: the AIX operating system appears to clean up these lost memory blocks. The bad news: most other operating systems do NOT clear the memory, leaving it "taken" from the system. The only way to reclaim this memory is to warm-boot/cold-boot (restart) the Informatica server machine; yes, the entire box must be restarted to get the memory back.
If you find your box running slower and slower over time, or not having enough memory to allocate new sessions, then I suggest that ABORT not be used.
So then the question is: When I ask for a STOP, it takes forever. How do I get the session to stop fast?
Well, first things first: STOP is a REQUEST to stop. It fires a request (equivalent to a Ctrl-C in SQL*Plus) to the source database and waits for the source database to clean up. The bigger the data in the source query, the more time it takes to roll back the source query to maintain transaction consistency in the source database (i.e., joins of huge tables, big GROUP BY, big ORDER BY).
It then cleans up the buffers in memory by releasing the data (without writing to the target) but it WILL run the data all the way through to the target buffers, never sending it to the target DB. The bigger the session memory allocations, the longer it takes to clean up.
Then it fires a request to stop against the target DB, and waits for the target to roll-back. The higher the commit point, the more data the target DB has to "roll-back".
FINALLY, it shuts the session down.
What if I need the session stopped NOW? Pick up the phone, call the source system DBA, and have them kill the source query in the database. This sends an EOF (end of file) downstream to Informatica, and Informatica will take less time to stop the session.
If you use abort, be aware, you are choosing to "LOSE" memory on the server in which Informatica is running (except AIX).
If you use ABORT and you then re-start the session, chances are, not only have you lost memory - but now you have TWO competing queries on the source system after the same data, and you've locked out any hope of performance in the source database. You're competing for resources with a defunct query that's STILL rolling back.
There are three ways to improve session performance for an Aggregator transformation:
A)
1) Size of the data cache = bytes required for variable columns + bytes required for output columns.
2) Size of the index cache = size of the ports used in the GROUP BY clause.
B) If you provide sorted data for the group-by ports, aggregation will be faster; so sort the ports used in the Aggregator's group-by with a Sorter transformation.
C) We can use incremental aggregation if we know that the data which is already aggregated will not change.
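Point B can be illustrated in plain Python (not Informatica code; the field names are made up). With sorted input, an aggregator only needs to hold one group in memory at a time, which is why the Sorted Input option shrinks the caches:

```python
import itertools

# Illustrative sketch: with input pre-sorted on the group key (as a
# Sorter transformation would provide it), aggregation can stream and
# keep only the current group in memory instead of caching every group.
def aggregate_sorted(rows, key, value):
    # rows must already be sorted by `key`
    for k, group in itertools.groupby(rows, key=lambda r: r[key]):
        yield k, sum(r[value] for r in group)  # one group in memory at a time

rows = sorted(
    [{"dept": "A", "sal": 10}, {"dept": "B", "sal": 5}, {"dept": "A", "sal": 20}],
    key=lambda r: r["dept"],
)
totals = dict(aggregate_sorted(rows, "dept", "sal"))
```

Note that `itertools.groupby` only groups adjacent rows, which is exactly why the input must arrive sorted, mirroring the Aggregator's Sorted Input requirement.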
When we create a target as a flat file and the source as Oracle, how can I make the first row of the flat file contain the column names?
Use a pre-SQL statement, but this is a hardcoded method: if you change the column names or add extra columns to the flat file, you will have to change the statement.
You can also achieve this by changing the setting in the Informatica Repository Manager to display the column headings. The only disadvantage is that it will be applied to all the files generated by this server.
IIF(CUME(1) = 1, 'col1,col2,col3,col4' || CHR(10) || TO_CHAR(col1), TO_CHAR(col1))
In version 8.2 of Informatica, the session properties include an option called "Header Options". Use that to get the field names as the first row in the target flat file.
When importing a flat file into the Target Designer, a flat-file import wizard appears. It has an option "Import field names from first line". Check this option so that the Integration Service treats the first-row values as column names.
Can anyone explain error handling in Informatica with examples, so that it will be easy to explain in an interview?
There is a file called the bad file, which generally has the format *.bad and contains the records rejected by the Informatica server. Each record carries two kinds of indicators: one for the row and one for each column. The row indicator signifies what operation was going to take place (i.e., insert, delete, or update). The column indicators contain information about why the column was rejected (such as violation of a NOT NULL constraint, value error, overflow, etc.). If one rectifies the errors in the data present in the bad file and then reloads the data into the target, the table will contain only valid data.
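As a small sketch of reading those indicators, here is a Python snippet that decodes the leading row indicator of a reject-file line. The indicator codes used below (0 insert, 1 update, 2 delete, 3 reject) are the standard PowerCenter row indicators; the sample line itself is made up for illustration:

```python
# Decode the row indicator at the start of a .bad file entry.
# 0/1/2/3 are the standard PowerCenter row-indicator codes; the
# column layout of the sample line is a simplified assumption.
ROW_INDICATORS = {"0": "insert", "1": "update", "2": "delete", "3": "reject"}

def row_operation(bad_line, sep=","):
    """Return the operation the rejected row was intended for."""
    indicator = bad_line.split(sep, 1)[0]
    return ROW_INDICATORS.get(indicator, "unknown")

op = row_operation("3,D,1001,D,Smith,N")  # a rejected row in this sketch
```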
What is the difference between the IIF and DECODE functions?
You can use nested IIF statements to test multiple conditions. The following example tests for various conditions and returns 0 if sales is zero or negative:
IIF( SALES > 0, IIF( SALES < 50, SALARY1, IIF( SALES < 100, SALARY2, IIF( SALES < 200, SALARY3, BONUS))), 0 )
You can use DECODE instead of IIF in many cases. DECODE may improve readability. The following shows how you can use DECODE instead of IIF:
DECODE( TRUE,
SALES > 0 AND SALES < 50, SALARY1,
SALES > 49 AND SALES < 100, SALARY2,
SALES > 99 AND SALES < 200, SALARY3,
SALES > 199, BONUS)
The DECODE function can be used in a SQL statement, whereas the IIF statement cannot.
DECODE is similar to the switch statement in the C language, where we test for multiple conditions. A single IIF can test only one condition; to test multiple conditions we have to use nested IIFs. DECODE is more readable than nested IIF.
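A rough Python analogue of the `DECODE(TRUE, cond1, val1, cond2, val2, ..., default)` idiom above may make the "first true condition wins" behaviour concrete (the function name and salary values are illustrative, not part of any Informatica API):

```python
# Hypothetical Python analogue of DECODE(TRUE, ...): conditions are
# evaluated in order and the value of the first true one is returned,
# exactly like the nested-IIF version shown earlier.
def decode_true(*pairs, default=None):
    for condition, value in pairs:  # (condition, value) tuples, in order
        if condition:
            return value
    return default

SALARY1, SALARY2, SALARY3, BONUS = 100, 200, 300, 400
sales = 75
result = decode_true(
    (0 < sales < 50, SALARY1),
    (49 < sales < 100, SALARY2),
    (99 < sales < 200, SALARY3),
    (sales > 199, BONUS),
    default=0,
)
```

With `sales = 75`, the second condition is the first true one, so the result is SALARY2, the same answer the nested IIF would give.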
Question: What is the Target Update Override? What is its use?
Answers
It overrides the default UPDATE statement in the target properties.
The target update override is similar to a Source Qualifier override; it is useful for updating the target without using an Update Strategy transformation.
When we don't have primary keys defined at the database level and still need to update the target from Informatica, we define keys at the Informatica level and use the update override in the target properties. This way we can update the table.
The Integration Service updates target tables based on key values. However you can override the default UPDATE statement for each target in a mapping. You might want to update the target based on non-key columns.
When the Integration Service executes SQL against a source, target, or lookup database, it searches the reserved words file stored in the Integration Service installation directory and encloses matching reserved words in quotes. If you use a target update override, you must manually put all reserved words in quotes.
For a mapping without an Update Strategy transformation, or a Custom transformation with the update strategy property enabled, configure the session to mark source rows as update. The Target Update option only affects source rows marked as update; the Integration Service processes all rows marked as insert, delete, or reject normally.
When you configure the session, mark source rows as data-driven. The Target Update Override then only affects source rows marked as update by the Update Strategy or Custom transformation.
For example, a mapping passes the total sales for each salesperson to the T_SALES table.
The Designer generates the following default UPDATE statement for the target T_SALES:
UPDATE T_SALES SET EMP_NAME = :TU.EMP_NAME, DATE_SHIPPED = :TU.DATE_SHIPPED, TOTAL_SALES = :TU.TOTAL_SALES WHERE EMP_ID = :TU.EMP_ID
Because the target ports must match the target column names, the update statement includes the keyword :TU to specify the ports in the target transformation.
If you modify the UPDATE portion of the statement, be sure to use :TU to specify ports.
You can override the WHERE clause to include non-key columns.
For example, you might want to update records only for employees named Mike Smith. To do this, you edit the WHERE clause as follows:
UPDATE T_SALES SET DATE_SHIPPED = :TU.DATE_SHIPPED, TOTAL_SALES = :TU.TOTAL_SALES WHERE :TU.EMP_NAME = EMP_NAME AND EMP_NAME = 'MIKE SMITH'
Target update (using an Update Strategy) works only when we have a primary key in the target; the update happens based on this key. If we don't have a primary key, the update will not work.
To overcome this, if our target has no key column, or if we want to update the target based on a column other than the primary key column, we need to use the Target Update Override option available in the target properties.
How did you implement the Update Strategy?
Ans: Target Update Override. By default, the Informatica Server updates targets based on key values. However, you can override the default UPDATE statement for each target in a mapping. You might want to update the target based on non-key columns. For a mapping without an Update Strategy transformation, configure the session to mark source records as update. If your mapping includes an Update Strategy transformation, the Target Update option only affects source records marked as update; the Informatica Server processes all records marked as insert, delete, or reject normally. When you configure the session, mark source records as data-driven. The Target Update Override only affects source rows marked as update by the Update Strategy transformation.
Overriding the WHERE Clause
You can override the WHERE clause to include non-key columns. For example, you might want to update records for employees named Mike Smith only. To do this, you edit the WHERE clause as follows:
UPDATE T_SALES SET DATE_SHIPPED = :TU.DATE_SHIPPED,
TOTAL_SALES = :TU.TOTAL_SALES WHERE :TU.EMP_NAME = EMP_NAME and EMP_NAME = 'MIKE SMITH'
Entering a Target Update Statement
Follow these instructions to create an update statement:
1. Double-click the title bar of a target instance.
2. Click Properties.
3. Click the arrow button in the Update Override field.
4. The SQL Editor displays.
5. Select Generate SQL. The default UPDATE statement appears.
6. Modify the update statement. You can override the WHERE clause to include non-key columns.
7. Click OK.
How to list the top 10 salaries without using a Rank transformation?
Use a Sorter transformation with SAL as a sorted (descending) port, then a Filter transformation to pass only the first 10 records.
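The Sorter-plus-counter idea can be sketched in plain Python (illustrative only; the field names are made up, and in a real mapping the row counter would live in an Expression transformation feeding the Filter):

```python
# Sketch of "top 10 without Rank": sort by salary descending (the
# Sorter stage), then keep only the first n rows via a running row
# count (the Expression + Filter stages).
def top_n_salaries(rows, n=10):
    ordered = sorted(rows, key=lambda r: r["sal"], reverse=True)  # Sorter
    kept, count = [], 0
    for row in ordered:
        count += 1          # Expression keeps a running row counter...
        if count <= n:      # ...and the Filter passes rows where count <= n
            kept.append(row)
    return kept

rows = [{"emp": i, "sal": i * 10} for i in range(1, 26)]
top = top_n_salaries(rows, n=10)
```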
To configure a Joiner transformation to emulate a Lookup transformation in a PowerCenter mapping, do the following:
1. Create a Source Definition based on the database table used for the lookup.
2. Add this Source Definition to the mapping.
3. Connect this Source Definition to the Joiner transformation as the master source.
4. Connect the original source table to the Joiner as the detail source.
5. Change the Join Type to Master Outer Join.
6. Configure the join condition. The join condition must use the same ports as the lookup condition.
Limitations: a Joiner can be used to emulate the lookup, but with certain limitations:
- Only equi-join conditions are supported.
- Duplicate matching is not supported.
- The Joiner will always cache; non-cached operation is not an option.
The Lookup always returns one value when it finds matching data, and you can choose to get the first or last value using the lookup property Lookup Policy on Multiple Match. However, as the Joiner does not have that property, you will have to ensure that all key values occur only once. If this is not ensured, you will be allowing duplicates of your input data to flow into the mapping.
Scenario: Very often you have to compare incoming data to previously loaded data in the warehouse. Often the target table is loaded into a lookup, comparisons are made, and actions are taken. This is not always the most efficient way: searching for a value in a cache with millions of rows takes longer than searching a cache with a few hundred rows, which can result in poor performance. You might want to keep the caches small; however, a data warehouse only gets bigger, and performance goes down. The input data often has roughly the same size for each delivery, since deltas are usually created by some smart mechanism. So why not cache your input data instead of the target data and make the comparisons that way? How can you make the cache smaller and still increase performance?
You could put your input data in a lookup cache and read all rows from the warehouse table as a source, then compare and act. However, this cannot detect new data, because new data will never be looked up: it does not yet exist in the warehouse table. One way to address this issue is the Joiner transformation. The Lookup transformation is essentially a specialized Joiner: with the outer join option the Joiner can operate like a Lookup, and with the sorted input option a very small cache is created.
Yes, we can reproduce the functionality of a Lookup transformation using a Joiner transformation.
The main difference is the behaviour on multiple matches: a Joiner returns all the matched rows, whereas a Lookup returns only the first row, the last row, or reports an error.
In terms of performance, a Joiner is more costly than a Lookup.
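The multiple-match difference can be shown with a small Python sketch (illustrative only; "lookup" here means a PowerCenter Lookup with Multiple Match policy set to first, and the data is made up):

```python
# Compare multiple-match behaviour: a lookup returns a single row per
# key (here: the first match), while a joiner emits one output row per
# matching master row.
def lookup_first(detail, master, key):
    index = {}
    for m in master:                 # keep only the FIRST master row per key
        index.setdefault(m[key], m)
    return [(d, index.get(d[key])) for d in detail]

def join_all(detail, master, key):
    return [(d, m) for d in detail for m in master if d[key] == m[key]]

master = [{"id": 1, "v": "a"}, {"id": 1, "v": "b"}]  # duplicate key 1
detail = [{"id": 1}]

first = lookup_first(detail, master, "id")   # one pair
every = join_all(detail, master, "id")       # two pairs
```

This is exactly why the emulation in the previous answer requires unique key values on the master side: otherwise the Joiner duplicates the detail rows.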
SCD 1/2/3
http://www.dwforum.net/data_warehousing_concepts/scd_12_129.0.html
The SCD2 concept is that if a source record finds a match in the target, it is still inserted into the target as a new record. So with SCD2 there can be multiple records in the target for the same natural key; e.g., for a single Empno there may be more than one record, which is how history is maintained.
The points are as below:
1) When a matched record is inserted into the target dimension table, it keeps the source key value (say, Empno), but the target table's primary key is the surrogate key, which is not built on the employee key. That is, in the target table the employee number may repeat, but the surrogate key will be unique.
2) The Lookup transformation condition is always Src.empno = Tgt.empno. When a single source record matches more than one record in the target, you can choose First, Last, or Any with a static lookup, so a static lookup is easier. With a dynamic lookup, if a single source record finds more than one corresponding record in the target, the lookup will fail. So you add a SQL override in the dynamic lookup that extracts only one matching record (the last record) from the target; then it will work, otherwise not.
3) In a dynamic lookup, if NewLookupRow is 0, you can drop the row.
If it is 1, insert it, using a Sequence Generator for the surrogate key.
If it is 2, it will also be inserted, but use a new Sequence Generator whose start value is greater than the previous Sequence Generator's last value, so that the two do not conflict.
4) Override the SQL query in the dynamic lookup so that on every run, for each source record that finds a match in the target, it always returns the last matched record.
Then it works fine. But the better solution is a static lookup for SCD2.
SQL Override
Target update override (set in the Designer on the target instance; also visible in the session's target options). The Update Override is an option available on the TARGET instance.
Update override with a non-primary-key field in the WHERE condition. Example: target table Target1 has A1, B1 defined as the primary key (local to Informatica in the Designer). The update override SQL on the target table:
UPDATE Target1 SET C = :TU.C, D = :TU.D WHERE A1 = :TU.A1 AND B1 = :TU.B1
--// already generated by Informatica
SQL Override and Lookup SQL Override
The SQL override is an option available in the Source Qualifier; the Lookup SQL override is available in the Lookup transformation. Both let you include joins, filters, GROUP BY, and ORDER BY. The differences between the two overrides:
1. The SQL override limits the number of incoming rows entering the mapping pipeline; the Lookup SQL override limits the number of lookup rows, avoiding a whole-table scan and saving lookup time and cache space.
2. The Lookup SQL override uses an ORDER BY clause by default; the SQL override does not, and you must provide one in your query if you require it.
3. The SQL override can express any kind of join by writing the query; the Lookup SQL override supports only non-equi joins.
4. The Lookup SQL override returns only one record even if it finds multiple records for a single condition; the SQL override does not do that.
What is the difference between ODBC, DSN, Native, and TCP/IP connections?
ODBC is a set of database drivers used to connect from the client to the relational database. DSN is the data source name: the name of the ODBC connection given by the user in order to access the database from the client. Native drivers are the vendor-supplied drivers that allow the client to access the source and target databases directly. TCP/IP is the protocol used for connecting to remote computers; here the client contacts the Informatica repository server through TCP/IP.