
Transcript of Informatica Interview Questions

1) What is the Admin Console? Can you explain it in detail?

ANS:

There are two types of client tools in Informatica:

1) Web-based client tool - we need a browser to access it --- Admin Console

2) Windows-based client tools - we don't need a browser to access them --- Repository Manager, Designer, Workflow Manager, Workflow Monitor...

The Admin Console is a web-based client tool for administering all the Informatica services used by the client tools such as Repository Manager, Designer, etc., and for administering users and groups. It is responsible for creating Repository Services and Integration Services and is accessed by the administrator.

------------------------------------------------------------------------------------------------------------------------------------------

2) Can anyone explain clearly what is meant by a dirty dimension and a junk dimension, with an example?

ANS:

Junk Dimension:

A dimension table containing flags, gender codes, free text and similar attributes that are not useful on their own for generating reports; such a table is called a junk dimension.

Dirty Dimension:

A dimension whose records contain duplicates (the same member is maintained more than once).

-------------------------------------------------------------------------------------------------------------------------------------

3) I need your help with faster loading from a flat file to an Oracle table.

The flat file contains 20 million records.

Please suggest the best way to load it and get better session performance.

ANS:

1) Use the external loading option in Informatica at the session level; for this you need to configure SQL*Loader as the external loader.

2) Split your single flat file in Unix into around 10 files based on size, enable partitioning in the session, and select target load type "Bulk".

(or)

I know there is the SQL*Loader feature in Informatica; is there anything I have to do at the DB level? I mean for the table, since the table is currently in an Oracle database.
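One DB-level option (a sketch, not from the thread, and only possible when the file is reachable from the database server and an Oracle directory object already points at it) is an external table, so the load becomes a plain direct-path INSERT ... SELECT. Table, column, directory and file names below are hypothetical:

CREATE TABLE emp_stage_ext (
  empno  NUMBER,
  ename  VARCHAR2(30),
  sal    NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY etl_dir            -- pre-created with CREATE DIRECTORY
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('emp_feed.dat')
)
REJECT LIMIT UNLIMITED;

-- direct-path load into the real target table
INSERT /*+ APPEND */ INTO emp_target
SELECT empno, ename, sal FROM emp_stage_ext;

This avoids a separate SQL*Loader control file, but it only helps when the database can see the file; otherwise the external-loader and bulk/partitioning options above are the usual route.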


4) Can anybody explain the difference between 1) a session variable and a mapping variable, and 2) a session parameter and a mapping parameter?

ANS:

Session variable: yes, they serve the same purpose, but if we define the same variable in both the session and the mapping with some value, Informatica will take the value defined in the session.

Mapping variable: a mapping variable represents a value that can be changed during the mapping run. A mapping variable is required to define incremental extraction; it is used together with the Source Qualifier transformation to perform the incremental extraction.
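For illustration, a typical incremental-extraction filter (a sketch; the mapping variable $$LastRunDate, the table src_orders and the column last_updated_ts are hypothetical names) goes into the Source Qualifier SQL override, and Informatica expands the variable before the query is sent to the database:

-- SQL override in the Source Qualifier; only rows changed since the last run are read
SELECT *
FROM   src_orders
WHERE  last_updated_ts > TO_DATE('$$LastRunDate', 'MM/DD/YYYY HH24:MI:SS')

The variable is then advanced (for example with SETMAXVARIABLE in an Expression port) so that the persisted value moves forward after each successful run.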

5) 1) How to load the first half of the records to one target and the remaining half to another target?

2) Where can we use status codes?

3) There are 2 workflows, wkf1 and wkf2; wkf2 should execute only if wkf1 succeeds. How?

4) What is the vi editor?

5) What operations can be performed on a materialized view?

6) When do we go for a connected lookup and when for an unconnected lookup?

7) How does a target table get refreshed daily?

8) Session variables and workflow variables?

9) How to delete a particular row, e.g. the 2nd row, in a file in Unix?

10) What properties are set to improve session performance?

6) Can an unconnected lookup return multiple values? Please give a brief explanation.

(I) No, it cannot. It can return only one value. Google it and you will get better answers. (II) No lookup can return more than one value.

If the lookup finds multiple matches, there are a few options you can select:

1) custom 2) first 3) last


If the question is about multiple return ports: you can get multiple fields out of an unconnected lookup. To achieve it, concatenate the fields in the lookup SQL override (field1 || field2).

In an Expression you can split that field again into two fields and use them as two separate fields.

(III) Using an unconnected lookup we can also return multiple column values, even though strictly speaking an unconnected lookup returns only one value. So here I concatenate all the column values using the || operator and return that single column from the lookup; after that I split the column back into multiple values using SUBSTR and INSTR.

Ex: table EMP (No, Name, Sal, Loc, Dname). Here I pass only No to the lookup on EMP and I want all the other columns.

Note: enable the lookup SQL override: select name || '~' || sal || '~' || loc || '~' || dname from emp (give the return port the datatype String with a length of, say, 10000).

For each eno it returns all the columns in one string; based on '~' we split the single column into multiple columns.
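The SUBSTR/INSTR splitting is the same logic whether you write it in the Expression ports or in plain Oracle SQL; here is a minimal sketch in SQL (the '~'-delimited sample value is made up just to show the parsing):

SELECT SUBSTR(v, 1, INSTR(v, '~') - 1)                                                   AS name,
       SUBSTR(v, INSTR(v, '~') + 1, INSTR(v, '~', 1, 2) - INSTR(v, '~') - 1)             AS sal,
       SUBSTR(v, INSTR(v, '~', 1, 2) + 1, INSTR(v, '~', 1, 3) - INSTR(v, '~', 1, 2) - 1) AS loc,
       SUBSTR(v, INSTR(v, '~', 1, 3) + 1)                                                AS dname
FROM   (SELECT 'SMITH~800~DALLAS~RESEARCH' AS v FROM dual);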

7) How to load the filename of the flat file into the session statistics table, i.e. the audit table?

(I) Which flat file name do you want to load? I mean, is it a source or a target, or something else?

(II) From Informatica 8 onwards it is very simple:

1) Click on the source definition and edit the Properties tab. 2) Select the option "Add Currently Processed Flat File Name Port".

(III) Yes, I know that option...

Session1 is flat file to target table (this is the general business-requirement load); here we are not loading the filename.

The source filename is loaded into a different table, which is the audit table.

This means that after session1 completes I will run session2 (which loads the audit table).

Here I am wondering how we can get the "Currently Processed File Name" from session1 into session2.

(IV) Declare one variable under mapping variables & parameters and hard-code the required source file name in its initial value or in a parameter file.

------------------------------------------------------------------------------------------------------------------


8) Joiner and Lookup transformations - which one gives better performance?

1) How can we tell which particular transformation (Lookup or Joiner) will give good performance?

2) I have multiple transformations; if instead of those multiple transformations we use a mapplet, which one gives better performance?

Ans) My answer is that both are the same, but instead of using multiple transformations we use a mapplet; however, the mapplet will not show the transformations...

ANS: As per my knowledge... (I) 1) Lookup and Joiner can be used for the same purpose.

Both have their own advantages and disadvantages; performance differs based on the requirement.

The Joiner takes the master table into cache by default; it has to use the cache for the join.

The Lookup also uses a cache, but you can minimize the cache usage in a lookup and you can write a SQL override in the lookup to filter only the required fields and records by adding conditions in the override.

2) We don't use mapplets just to replace multiple transformations. Mapplets are preferred for repeated logic, so that the logic can be reused in different mappings. Performance won't differ by converting multiple transformations into a mapplet.

(II) The Lookup is a passive transformation, which means it processes all the records whether the condition is true or false (if the condition is true it returns the corresponding value, and if it is false it returns NULL), whereas the Joiner is active and passes on only the records that satisfy the join condition. The Joiner is the better performer here.

But we can't say in general which one is good; that depends on the requirement.

------------------------------------------------------------------------------------------------------------

9) 1) How to load the first half of the records to one target and the remaining half to another target?

2) Where can we use status codes?

3) There are 2 workflows, wkf1 and wkf2; wkf2 should execute only if wkf1 succeeds. How?

4) What is the vi editor?

5) What operations can be performed on a materialized view?

6) When do we go for a connected lookup and when for an unconnected lookup?

7) How does a target table get refreshed daily?

8) Session variables and workflow variables?

9) How to delete a particular row, e.g. the 2nd row, in a file in Unix?

10) What properties are set to improve session performance?


-----------------------------------------------------------------------------------------------------------------

10) I need a solution for this:

i/p1: a, b, c, d

i/p2: e, f, g, h

o/p should be:
a e
b f
c g
d h

ANS:

(I) Please give some clarity on this question, Rashmi.

What I am asking is: are i/p 1 and i/p 2 different files, or two ports in the same file? And is the o/p two columns or the same column?

(II) Please find the solution below for your requirement:

Source1 -- SQ --> SEQ -->
Source2 -- SQ --> SEQ --> Joiner [it joins the 2 pipelines into one; the condition is Source1.SEQ = Source2.SEQ] --> load the records into the target.

Here I am generating sequence numbers for the first file and the second file, so I am joining the two files based on the sequence numbers.
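The pairing logic is the same as joining two row sets on a generated row number; a SQL sketch of the idea (src1, src2 and the column val are hypothetical, and ROWNUM here simply reflects whatever order the database returns the rows in):

SELECT a.val AS col1, b.val AS col2
FROM   (SELECT val, ROWNUM AS rn FROM src1) a
       JOIN (SELECT val, ROWNUM AS rn FROM src2) b
         ON a.rn = b.rn;

In the mapping, the two Sequence Generators play the role of ROWNUM.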

10) Can anybody please explain the Unix commands that are frequently used in Informatica development?

If anybody has related documents, please share.

ANS: Regarding Unix usage in Informatica:

First of all, in Informatica development projects Unix is not mandatory; some projects use a Windows-based environment. You do not need to know Unix commands when you are working in a Windows environment.

Even if you are working in a Unix environment, if the sources and the target are databases then you hardly use Unix.

Basic Unix commands used in development:

ftp  -- connect to Unix for file transfer or download
put  -- place a file on Unix
get  -- get a file from Unix
cp   -- copy a file to a different file
rm   -- remove a file
vi   -- vi editor, to modify the content of a file
grep -- search for content in a file

These are some basic commands.

-------------------------------------------------------------------------------------------------------

11) How to split one half of the records to one target and the other half to another target using an unconnected lookup? Can anyone explain how to do this?

12) Regarding update strategy: without using an Update Strategy transformation, how can I update the table with a simple mapping?

ANS:

1) Enable the update override on the target in the mapping. 2) Select "Treat source rows as" Update. 3) In the target session properties enable only "Update as Update".
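For reference, a target update override is just an UPDATE statement with :TU. references to the target definition's ports; a minimal sketch (the table and column names are hypothetical):

UPDATE t_customer
SET    cust_name = :TU.cust_name,
       city      = :TU.city
WHERE  cust_id = :TU.cust_id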

13) Comparison null operator: what is the null comparison operator in a lookup? Where do we use it in a dynamic lookup, and in which situation?

ANS: Null comparison in a lookup with dynamic cache: when we specify a condition in the Lookup transformation, for null values the Integration Service takes the default value instead of the null value, so during the comparison it uses the default only; an unconnected lookup gives null only.

14) How can I look up data without using a Lookup transformation?

ANS: With a Joiner transformation.

15) Table A:

col1
bangalore#chennai#Hyd
Delhi#bombay

Required format:

col1        col2      col3
bangalore   chennai   Hyd
Delhi       bombay

How can we divide one row into multiple columns like this, using Informatica, and also what is the SQL query for that (Oracle)?

ANS:

(I) Yes, you can use a Normalizer transformation. (II) Use INSTR and SUBSTR in Oracle to achieve the result. (III) When you import the flat file, set the delimiter as '#' and then map directly to the target; you will get the output you are asking for. (IV) If the data is coming from a database, we can use the SUBSTR and INSTR approach.
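Since the question also asks for an Oracle query: on 10g and later the INSTR/SUBSTR idea can be written more compactly with REGEXP_SUBSTR (a sketch; the table name table_a is an assumption), and a missing third value simply comes back as NULL:

SELECT REGEXP_SUBSTR(col1, '[^#]+', 1, 1) AS col1_new,
       REGEXP_SUBSTR(col1, '[^#]+', 1, 2) AS col2,
       REGEXP_SUBSTR(col1, '[^#]+', 1, 3) AS col3
FROM   table_a;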

16) Flat file: hi, can you please help with this?

I have a flat file; the content of the file is huge and it contains a lot of unwanted data...

We should not use any transformations, and the source is connected to the target; it is a one-to-one mapping only.

How do we remove the unwanted data, and how can we achieve this? Please help me.

ANS: (I) 1) Put some filter condition in the Source Qualifier, like Deptno = 10.

2) Use a pre-session Unix script to validate the flat file, copy the required records from the flat file to another file, and then trigger the workflow.

Hope this may help.

(II) Can you please tell me where we have to implement the Unix script?

(III) On the Informatica server itself. Move/copy your file into a particular directory of the Unix server where Informatica is installed, and from there use a Unix script to do all the data validation and format correction, then trigger your Informatica job.

(IV) In the Source Qualifier transformation write a query in the SQL override; with the user-defined join, source filter and select distinct options we can restrict the unwanted data.

17) I have two employee tables with the same structure but different data. Now I want to find max(sal) from the two tables and compare them, like max(sal) of emp1 > max(sal) of emp2. If emp1 has the higher max(sal) I need to pass one set of records, and if emp2 has the higher max(sal) I need to pass another set of records.

Please help me ASAP.

18) What is an associated port in Informatica?

What is data validation?

What is versioning in Informatica?

ANS:

1. The associated port is the port that appears by default when we enable the dynamic cache in a Lookup transformation. It is compared with the input ports, and the result of the comparison drives the "NewLookupRow" port.

2. Data validation means that when source data comes into the staging area we have to validate it; we have 4 techniques:

1. data cleansing (removing unwanted data) 2. data scrubbing (adding/enriching data) 3. data merging (merging the data from different sources) 4. data aggregation (loading summarized data)

3. Versioning is a good concept in Informatica: when a group of members accesses the same repository and you want to restrict others from changing your mappings, use version control. There are two options: check-in and check-out. Check-in is used to commit the mapping; the Informatica server gives it a version number starting with 1. Check-out is used to edit the mapping and apply changes.

19) Do you have any idea about LRF (Load Ready File) in Informatica?

What is an LRF? When does it come into the picture? How do we deal with it?

ANS: (I) An LRF is an "indicator file". Using an Event Wait task in Informatica, we wait for the file in order to trigger the jobs/sessions.

(II) I think LRF is a different concept, because an "indicator file" maintains IDs and is one of the session output files listed in the session log statistics. The indicator file gives each record a different ID: inserts get one ID, updates get another ID, and so on.

20) Which transformation is used instead of a lookup to check whether a record already exists or not?

ANS:

(I) Stored Procedure transformation (II) Joiner transformation

21) How to print a single record three times? For example, in the EMP table I have the KING record only once (i.e. no duplicate records),

but my requirement is to print the same record three times.

ANS:

(I) I guess you can take a Union transformation and do it. Take 3 instances of the EMP source and union them in a Union transformation, and you get 3 duplicates of each record in the target. I have never tried it, but I guess it works.

Let me know if I am wrong. (II) Create the target instance three times... Connect all the output ports to the three instances...

You will get the expected output...

@shashank: No need for a Union transformation here......

(III) Taking 3 instances of the target works, but the full set of records is not coming even after I removed the primary key; when I used the Union transformation they loaded successfully with 3 duplicates of every record.
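If the source is relational, the same effect can also be pushed into the Source Qualifier SQL override by cross-joining the source with a three-row generator; a sketch in Oracle syntax (not from the thread):

-- every EMP row is returned three times
SELECT e.*
FROM   emp e
       CROSS JOIN (SELECT LEVEL AS copy_no FROM dual CONNECT BY LEVEL <= 3) c;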

22) What is the difference between $ and $$ in Informatica? If anybody knows about this, please post the information.

ANS: (I) $ is a mapping parameter & $$ is a session parameter. (II) Three types of variables are available in Informatica. They are:

1) Pre-defined variables 2) User-defined variables 3) System variables

Pre-defined variables:
----------------------------
Denoted by a single dollar ($). Defined by Informatica itself.

Example: $Source, $Target

User-defined variables:
----------------------------
Denoted by a double dollar ($$). Defined by the Informatica developers depending on the logic; mapping parameters and mapping variables use user-defined variables.

Example: $$Currdate, $$deptno

System variables:
---------------------
Denoted by a triple dollar ($$$).

Example: $$$SessStartTime

@Chocsweet: Please check the points above; what you are saying is not the exact answer.

(III) If $ is used for pre-defined variables, then do session parameters come under pre-defined variables?

Examples of session parameters: $InputFile

----------------------------------------------------------------------------------------------------

23) TCS question: 1. What is throughput in Informatica?

2. What is pipeline partitioning?

ANS: Throughput is nothing but the speed at which the Informatica server reads data from the source and writes data to the target per second. It is displayed when you double-click a session in the Workflow Monitor: a window shows the rates at which the Informatica server reads and writes. Check it out.

24) SQL Transformation. Hi dudes...

What is the use of the SQL transformation? When do we use it? In what situations is the SQL transformation used in a project?

ANS: 1. The SQL transformation supports DDL as well as DML commands, whereas other transformations do not support DDL commands. 2. Use it when you want to create a table dynamically (i.e. while the session is running). 3. It also depends on the LLD preparation; if the design calls for it, the SQL transformation is mandatory, and it also depends on the client requirements.

25) Mapping lock: under what circumstances does a mapping get locked, and how do we release the lock from the mapping?

ANS:

(I) Locks in Informatica: hi Shashank,

There are different situations in which mappings or workflows are locked by other users.

1) The repository is disconnected while you are working on one of the mappings. If you reconnect to the repository and open the mapping you were working on, sometimes it will say the mapping is locked by user (Pavan).

2) The Designer or Workflow Manager was not closed or disconnected properly; if you close the window directly, locks remain on the corresponding mapping or workflow.

3) If two people are working on the same mapping/workflow, the second person gets a message "m_test mapping is locked by user (pavan)".

If you want to release the locks, go to the Repository Manager, check the user locks, and release them.

Hope this helps you understand locks in Informatica.

(III) It is a write-intent lock if some user has the object open. This lock can be released by disconnecting the respective connection ID (user-specific) for that object in the Admin Console.

(IV) I think from Infa 8.6.1 onwards this activity is pushed to the Admin Console to manage, but in earlier versions all the locks can be managed from the Repository Manager.

26) I have a scenario like this:

Rno  subject  Marks
1    maths    60
1    science  72
2    maths    67
2    science  82

Now I need output like this:

Rno  subject  Marks  subject  Marks
1    maths    60     science  72
2    maths    67     science  82

ANS: (I) I think a Normalizer will not work here; maybe it can be achieved using an Aggregator transformation, but I don't know exactly.

(II) I think the Normalizer transformation would give us the input of this example as its output, i.e. when the output of this particular example is used as the Normalizer's input.
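The Aggregator idea in (I) corresponds to a grouped pivot; written as Oracle SQL it may be easier to see the logic (a sketch; the table name marks_tab is an assumption):

SELECT rno,
       MAX(DECODE(subject, 'maths',   subject)) AS subject1,
       MAX(DECODE(subject, 'maths',   marks))   AS marks1,
       MAX(DECODE(subject, 'science', subject)) AS subject2,
       MAX(DECODE(subject, 'science', marks))   AS marks2
FROM   marks_tab
GROUP BY rno;

In a mapping the equivalent is an Aggregator grouped on Rno, with output ports like MAX(marks, subject = 'maths').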

27) In which scenario do we go for mapplets? Please tell me, in real time, with an example.

ANS: (I) Definition and limits

Mapplets

When the server runs a session using a mapplet, it expands the mapplet. The server then runs the session as it would any other session, passing data through each transformation in the mapplet.

If you use a reusable transformation in a mapplet, changes to it can invalidate the mapplet and every mapping using the mapplet.

You can create a non-reusable instance of a reusable transformation.

Mapplet objects:
(a) Input transformation
(b) Source qualifier
(c) Transformations, as you need
(d) Output transformation

A mapplet won't support:

- Joiner
- Normalizer
- Pre/post-session stored procedures
- Target definitions
- XML source definitions

Types of mapplets:

(a) Active mapplets - contain one or more active transformations
(b) Passive mapplets - contain only passive transformations

Copied mapplets are not instances of the original mapplet; if you make changes to the original, the copy does not inherit your changes.

You can use a single mapplet more than once in a mapping.

Ports

Default value for an input port - NULL
Default value for an output port - ERROR
Default value for variables - default values are not supported

Example: in one of my projects we have an error strategy which is applicable to all the mappings. It captures the error records, flags them, attaches an error message to each error record, and writes them to the error table. Since this logic is common to all the mappings, we implemented it in a MAPPLET.

Anyone in any domain can give this as an example of a mapplet.

We route the error records based on the key columns being null or holding invalid values; whatever scenarios we treat as errors, we capture those and send only the error records to the mapplet input. In the mapplet we convert a single row into multiple rows if the record contains more than one error: for example, if one row has three errors we create three records out of it and show three errors for the same record.

Example:

empno  name     date        sal
101    praveen  12/12/1999  10000
Null   pavan    33/13/1990  afdsfdsfds

In the above example the second record has three errors: empno is null, the date is an invalid date, and sal is not valid. So we generate a row number for each input record to identify which record had the error, and our output in the error file looks like this:

rowid  tablename  error_field  error_msg
2      emp        empno        empno is null
2      emp        date         date invalid
2      emp        sal          invalid salary

In the mapplet:

Mapplet Input --> Expression --> Unconnected Lookup --> Mapplet Output

Actually the mapplet only segregates the errors, assigns the error code, looks up the error message table based on the error code, and assigns the error messages to each error. It also computes an error count for each record in the Expression; based on that we set the occurs option in the Normalizer to split a single row into multiple rows.


The rest is outside the mapplet; I'm just explaining through to the end, where the data is loaded into the error table. After the mapplet output the flow goes: Mapplet Output --> Normalizer --> error table.

In the Normalizer we have to set the occurs option based on the maximum number of possible errors.

Please let me know if you need more about this...

In the Expression we validate what kind of error it is and assign a code for each type of error; based on the code we look up the error-message lookup file.

There is no specific reason to use an unconnected lookup; you can use a connected one also. We required only one return port from the lookup, so we used an unconnected lookup, which also gives a performance improvement: a connected lookup creates a cache for each column in the lookup, which might hit performance, so we used the unconnected one.

Hope this cleared your doubts; if not, reply back.

(I) Can you explain to me how a mapplet does not support the Joiner transformation? (II) The Joiner transformation is supported in mapplets. (III) Sorry, I just copied that material from an old document (version 6.x); in the latest versions the Joiner works.

28) Hi All,

I am getting an error while installing Infa 8.6 on my PC. The OS is Vista and the database is Oracle 10g (Vista compatible).

No logs have been created.

Error Details :-

Informatica PowerCenter 8.6.0

cannot start Informatica Services.

Use the error below and catalina.out and node.log in the server/tomcat/logs directory on the current machine to get more information. Select Retry to continue the installation.

EXIT CODE: S

Please Help me out regarding this issue.

ANS:

Check the following..... (I)

1) Be sure that Java is installed as its own installation, not bundled with other software. 2) Put the Informatica service bin path in the environment PATH of the system. 3) While the installer was pinging the domain I stopped and re-started the service, changed the service user ID from a network domain user to the system account, stopped the service and disabled it, then after saving re-enabled it and restarted, changed the service ID back to a domain user ID, etc., etc., etc... 4) Used infacmd commands to try to add a domain and/or ping the domain externally. I did find one time I had to manually create a domain using infacmd. 5) Handle compatibility issues.

Let me know if it works or not.

(II) Thanks a lot Aparna.... As per your suggestions the Infa server got successfully installed... I found a few errors with respect to the configuration in the Admin Console and the client; I'll let you know later.

29) I am new to Informatica... can anyone explain the incremental aggregation property to me? I know we need to check the incremental aggregation property in the session, but my question is: just by checking that option, how does Informatica filter out only the new records from the source? Do we need to do some kind of lookup in the mapping, or is it enough if we just tick this incremental aggregation option?

ANS: (I) When you send data a second time and you have new values to be aggregated,

then you go for incremental aggregation. When you select incremental aggregation in the session properties, the data is copied into a cache, the cache is checked for the new values, and the aggregation is performed.

(II) Thanks Shashank... you mean to say the source data as-is is copied into the cache? And it compares the new source with the cache and processes only the new records for aggregation?

(III) It is not the source data that goes into the cache; it is the target (aggregated) data, and the source data is compared with this cache.

(IV) Okay... but the target is aggregated... for example:

Source - Jan, Feb and Mar; target may be Qtr?

How does it compare? (V) As per my knowledge,

the property (checkbox) specified in the session tab is NOT responsible for filtering out the new records and aggregating them. The option is used only for incremental aggregation, also called running aggregation. The onus is on the user to filter the delta records (changed/new) on the source side and pass them on to the Aggregator. When you check the option in the session properties, Infa creates two sets of caches for data and index (the original set of caches and a backup set of caches which is used for recovery purposes).

e.g. existing data in the emp table as of 20-Sep:
-------------------------------------------------
empid  allowances  date
1      100         20sep
2      200         20sep

Incoming records on 21-Sep:
----------------------------------
1      200         21sep
2      300         21sep
3      400         21sep

After running the session with incremental aggregation enabled, the target table would look like this:
--------------------------------------------------------------------
empid  allowances  date
1      100         20sep
2      200         20sep
1      300         21sep
2      500         21sep
3      400         21sep

Note the aggregation of the allowances that has taken place above, and also the new record that has been inserted into the table.

(VI) Yes, that's correct, as explained by Shiva Prasad. In real time this incremental aggregation is used in the telecom and insurance domains:

* Telecom domain: for a postpaid connection, the total balance of what you have talked up to today for this month is calculated with incremental aggregation.

* Insurance domain: to sum up your premium amount from the start effective date to date, they use incremental aggregation.

These are some small real-time examples. If any explanation is needed, reply in this forum itself.

(VII) So we need to use a lookup and update strategy to filter out the new/changed records, just like an SCD, and pass them to the Aggregator?

30) Scenario:

Source:
id  name
1   abc
2   def
3   ghi
1   abc
2   def
2   def
3   ghi

Target:
id  name
1   abc
1   abc1
2   def
2   def2
2   def3
3   ghi1
3   ghi2

How to achieve this logic?

ANS: (I)

select id,
       name || decode(row_number() over (partition by id order by id), 1, null,
                      row_number() over (partition by id order by id) - 1)
from   table_name

Use the above query in the Source Qualifier and load the records into your target.

(II) If this is a flat file, how do we solve this? (III) Source (flat file) -- SQ --> Rank transformation (group by id and select a large number of ranks, say 10000) --> Expression (drag all the fields from the Rank transformation: id, name, rank). Disable the output port in the Expression for the columns name and rank, and create one output port named name_rn with this formula: name || decode(rank, 1, null, rank - 1). Now connect id and name_rn from the Expression to the target.

--------------------------------------------------------------------------

31) Please tell me about ETL testing using Informatica: the role of ETL testing experts, and at what stage ETL testing people come into the project.

ANS: (I) Basically ETL testing is of 4 types:

1. Unit test: for the developed mappings.

2. Integration test: gathering all the mappings into one location; this is done by seniors.

3. System test: this is also one of the important tests.

4. UAT (user acceptance test): this is shown to the client; it is the final test.

For testing there are different scenarios: the test spec is provided by your organisation, then the test procedure, the test case, the expected result, the actual result, and the status (pass or fail).

(II) Apart from the mappings, the following test scenarios should be considered:

1) Existing data testing
2) New data creation
3) Editing/deletion of existing data, and how the UI reacts to the changes
4) What the filters/SQs/target connections/jobs and tasks are

32) This is a TCS interview question: if the parameter file is missing, what error will come? Please answer.

ANS: (I) PETL_24049 Failed to get the initialization properties from the master service process for the prepare phase [Session task instance [Session_name]: Unable to read variable definition from parameter file [file_name]] with error code [32522632].

This is the error I got; if it's wrong, let me know the correct error...

(II) I guess that even if the parameter file goes missing, if you have assigned initial values for the parameters at the mapping level you shouldn't be getting the error...

33) I don't have source data but I have to test my mapping; how is that possible in Informatica?

ANS: (I) Hi guys,

Siva is right here: without sample data you can't test the mapping. Deepthi might have been confused here; without loading to the target table you can test with sample data using the debugger or the test load option.

(II) A mapping is not valid without a source or a target, and the debugger cannot be used on an invalid mapping.

------------------------------------------------

34) How do you migrate from one environment to another, like development to production, in Informatica?

ANS: (I) Migrating code from dev --> test --> prod can be done with different methods; it differs from project to project.

1) Take a repository dump and create a new repository with a different name in prod.

2) Export and import workflows from the Repository Manager. 3) Use Unix scripts to migrate workflows from one folder to a new folder. Who is responsible for this migration also differs: 1) some projects have a separate admin team that handles the migration; 2) experienced people in the team do the migration; 3) sometimes development is handled by one company and a separate vendor (a different company) maintains the environment, in which case we have to raise a migration request to the respective team to migrate the code. Hope this helps you understand the migration methods.

(II) Import & export through the Repository Manager; import & export through the pmrep command line.

Keep both repositories open in an Informatica Repository Manager client and copy-paste or drag & drop; or use deployment groups.

(III) Another way is called a Deployment Group.

Trust me, this works amazingly. I'm not sure from which version onwards it is available, but I think 8.6.1 onwards.

You need not worry about dumps or export/import; this works just like copying a workflow from repo to repo. But make sure you use Informatica version control the right way.

---------------------------------------

34) Error table in the Informatica repository: does anybody know about the "error table" in the Informatica repository? It has a unique SEQUENCE_NUMBER, ERROR_CODE, ERROR_DATA... Please, if anybody has knowledge of this, please explain it to us.

ANS:

(I) Informatica from 7.1.4 onwards provides separate repository tables for errors. In the session properties you have the option to log errors to error tables; Informatica builds 4 separate error tables to log the errors.

For error information, Infa 7.1 onwards provides 4 built-in error tables: PMERR_SESS, PMERR_TRANS, PMERR_DATA, PMERR_MSG. These tables log the error details on a row-by-row basis. These errors relate only to Informatica functions, validations or technical errors, not data-related errors; Informatica logs technical issues related to the Informatica process.

Check the Workflow Administration Guide for further explanation. (II) That's right, Pawan...

These OPB* tables are specific to Informatica, but I think you can get the error code and description in some error table as well; the Informatica log is built based on that repository information. Sorry, I do not have access to any OPB tables, else I could help you out.

----------------------------------------------------------------------------------------------

35) Splitting source columns: I have a flat file source with 6 columns.

I want to load the first 3 columns into target1

and the next 3 columns into target2.

How?

ANS: (I) From the Source Qualifier, directly connect the first 3 columns to the first target and the remaining 3 columns to the second target.

RamaKrishna: the Router concept will not come into the picture here.

(II) Ramakrishna, Abdull: wrong answer; a Router is of no use here.

Siva's answer is correct: you can directly connect the columns to different targets based on the requirement.

(III) I don't think you need to go for a Sorter or Rank for this solution; look at the problem clearly, it is simply loading from source to target.

The first three columns need to be loaded to target 1 and the next three columns to target 2;

he is not talking about records, only columns. (IV) Yes, you can directly map the first 3 ports to target1 and the remaining 3 ports to target2.

No need for a Router... etc.

36) I got a question like:

Without using an Update Strategy transformation, can we update the target in Informatica? If it is possible, how?

Waiting for a reply.

ANS: (I) By using the session properties:

Update else Insert

(II) Update the target by using Data Driven: select update, insert or delete for the target in the session's mapping tab.

(III) In the target there is one option: in the target properties there is an Update Override option;

using this we can update the target.

37) What is a transformation error? Hello friends, could anyone tell me what a transformation error is?

ANS: (I) Transformation errors come from things like

the SQL override in the Source Qualifier transformation, some date conversions, and the lookup override in the Lookup transformation.

38) Please provide large databases: I want to gain more knowledge of Informatica, so for the source database it would be better to have a database with more fact and dimension tables and more records. Does anyone have a large sample database, just like the EMP database?

ANS: (I) In Cognos you will find the GO Sales database with lakhs of records. (II) There are the SH (Sales History) and OE (Order Entry) schemas

in Oracle, so those may be helpful for your requirement.

38) Weird question faced in an interview: I have a source file of 3 GB. I need to load the data into 2 target tables according to the following condition:

T1: should have 2 GB of the data. T2: should have the remaining 1 GB of data.

Please answer this question for me; I faced it in a DELL interview.

ANS: (I) Hi... the only solution I can think of as of now is to split the file into two

based on size and then load them into their respective targets.

The 'split' Unix command can be used to split the files. (II) Yes, that may be possible, but he asked me to implement a mapping for

the above scenario.

(III) So here you want to load two-thirds of your input data into the first target and the remaining one-third into the second target.

Source (flat file/DB) --> SQ --> EXP (pass all the columns and add a sequence number using a Sequence Generator) --> RTR (two groups) --> targets (T1, T2)

First group condition: MOD(seq, 3) = 1 OR MOD(seq, 3) = 2 -- pass this to T1. Default group: pass this to T2.

-----------------------------------------------------------------------------

39) Informatica 8.6 installation.

ANS:

(I) It is just "next, next", like that only.

But you should have the following:

1) Two DB users 2) The Oracle port number (the default is 1521) and the Oracle SID (the default is ORCL)

The other parts of the installation are:

1) Creating the repository 2) Creating the Integration Service

After this you can start using Informatica.

I hope it will be useful. (II) After installing, the Integration Service and the Repository Service are not working.

(III) Change the operating mode from exclusive to normal

and click on enable; then your Informatica Repository Service will be running.

40) I have 2 workflows, namely wkf1 and wkf2. After the execution of wkf1 we have to execute wkf2; if wkf1 has not executed successfully, we should not execute wkf2. We have to do it automatically.

ANS: (I) Use a Command task in workflow1 as the final task and start workflow2 using the

pmcmd command.... Once all the tasks in workflow1 are executed successfully, the pmcmd command in the Command task will trigger workflow2 automatically...

(II) Hi Prasad,

with this approach we can't achieve all the requirements: you can start wkf2 at any time in your case, but the requirement is that wkf2 shouldn't run if wkf1 has not succeeded.

Solution: in this case you have to create a flat file at the end of wkf1 (using touch xyz.txt) and in wkf2 use an Event Wait task with a file-watch event on that flat file. With this solution, if you run wkf2 it will wait for the flat file to be created; if it has not been created it won't run the rest of wkf2.

Hope this will be your solution. (III) Use the post-session command of wkf1 to create a null/zero-byte file;

the file created can be used as a file watcher in wkf2.

Create an Event Wait task and use it as the first task in wkf2, so that it starts the following sessions of wkf2 only after the creation of the zero-byte file (i.e. after wkf1

completes). You can select the option to delete the file-watch file in the Event Wait task (depending on your requirement).

41) How to delete duplicates in flat files?

ANS: (I) Using a Sorter transformation: check the Distinct option in the Properties tab.

42) Help me please with this data issue, hi friends.

While loading from a flat file (fixed width) to an Oracle table, the data got loaded successfully, but when I checked the session log file I got the below error:

Severity: ERROR
Timestamp: 9/16/2010 1:31:45 PM
Node: NODE_02
Thread: TRANSF_1_1_1
Process ID: 6749
Message Code: TT_11132
Message: Transformation [e_Donnelly] had an error evaluating output column [v_DATE2]. Error message is [<<Expression Error>> [TO_DATE]: invalid string for converting to Date... t:TO_DATE(s:LTRIM(s:RTRIM(s:' ',s:' '),s:' '),s:'YYYY-MM-DD')].

ANS: (I) Hi Karthik,

This error will occur when incompatible date formats are used. The default date format in PowerCenter is MM/DD/YYYY HH24:MI:SS, and hence SYSDATE is converted to MM/DD/YYYY HH24:MI:SS, but in this case TO_DATE is expecting the input in the DD-MON-YYYY format.

Solution

To resolve this, convert the date format of SYSDATE to the format used by the TO_DATE function.

Example:

TO_DATE(TO_CHAR(SYSDATE,'DD-MON-YYYY'),'DD-MON-YYYY')

(II) The data is in string format, like 20070802.

This data is getting loaded, but some rows have no value and that is creating the error for those rows.

The file is fixed width. (III) If some records have nulls, you have to filter out those records or send

a default date for those records. If you try to apply the TO_DATE function to null values it will throw an error like this.

Use a condition something like this:

IIF(ISNULL(IN_DATE), TO_DATE('19991231','YYYYMMDD'), TO_DATE(IN_DATE,'YYYYMMDD'))

Try this logic; it will resolve your issue.


43) Source Qualifier to Expression: hi friends, I have 2 source qualifiers (EMP & DEPT) and I want to use an Expression transformation into which I want to copy ports from both transformations. I did it and the ports were copied from both tables' SQs, but links are forming from only one Source Qualifier. Why is that? If you don't get it I will explain again.

ANS: (I) You have to define the join condition in the Source Qualifier.

Let me know if it works or not. (II) Hi Aparna,

thanks for the reply. I got this question in an interview. His requirement is that he has two individual source qualifiers and he wants to copy the ports from the two SQs to an Expression without using any other transformation, like:

source1 -- source1_SQ --------->
                                 Expression transformation
source2 -- source2_SQ --------->

Like the above - is it possible?

(III) You can use one SQ -> Lookup -> Expression. (IV) Hi Ratna Kumar,

I got you, but he said no other transformation after the SQ, only a direct Expression - that is how he asked me.

(V) If the two tables come from the same database, then remove one of the two source qualifiers and create one Source Qualifier transformation, then combine the two tables (EMP, DEPT) in it, because they have common columns. After that connect the ports to the Expression transformation. The Source Qualifier is an active transformation, and only multi-input-group transformations accept two inputs, so an Expression can never act like that. Read the Transformation Guide for assistance.

44) Duplicates in a flat file: I have a flat file and it has duplicate records.

I want to send:

1. Distinct records to target 1. 2. Duplicate records to target 2.

How to do this?

ANS: (I) Use an Expression transformation to compare the rows and set a flag, then

use a Router transformation.

(II) Duplicates in a flat file: for example, assume you have a DEPT table and (DEPTNO, DEPT_NAME, LOC) are the fields in the table. Now your scenario is to compare the records by using DEPTNO.

Table data:
DEPTNO, DEPT_NAME, LOC
10, AAA, CHE
10, BBB, BGR
20, CCC, HYD
30, DDD, KMU
30, EEE, TPJ

Your Expression transformation ports will be like this (port name, I, O, V, expression):

DEPTNO    (input port)
DEPT_NAME (input port)
LOC       (input port)

Now create new ports in this order:

PRE_RECORD (variable port)  expression: TEMP
TEMP       (variable port)  expression: DEPTNO
FLAG_DUP   (output port)    expression: IIF(PRE_RECORD = DEPTNO, 'Y', 'N')

This is used to compare the records, and it will flag 'Y' if the record is a duplicate. Then use a Router.

-----------------------------------------------------

45) Please explain this query:

Select * from emp e where &N=(select count(distinct(sal)) from emp f where f.sal>=e.sal)

How does it get sorted? Will it create a virtual table to sort, or what does it actually do? And what is the need for the count? Please help me.

ANS:

(I) It is querying the emp table in order to find the Nth highest salary, e.g. the 5th highest salary.

Before executing this query you can give this command: "set autotrace on". Then you will see how the SQL got executed.

(II) Hi Aparna, thanks for your reply, but it is showing the following errors. Could you help me?

SQL> set autotrace on
SP2-0613: Unable to verify PLAN_TABLE format or existence
SP2-0611: Error enabling EXPLAIN report
SP2-0618: Cannot find the Session Identifier. Check PLUSTRACE role is enabled
SP2-0611: Error enabling STATISTICS report

(III) Which version of the Oracle DB are you using?

(IV) I'm using Oracle 9i. -- You have to run these scripts before you set autotrace on:

$ORACLE_HOME/sqlplus/admin/plustrce.sql
$ORACLE_HOME/rdbms/admin/utlxplan.sql
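On the original question of why the COUNT is needed: for each outer row, the correlated subquery counts how many distinct salaries are greater than or equal to that row's salary, and the row(s) for which that count is exactly N hold the Nth highest salary. A worked sketch with N = 2 (the same query, with the &N substitution variable replaced):

-- 2nd highest salary: rows whose salary has exactly 2 distinct salaries at or above it
SELECT *
FROM   emp e
WHERE  2 = (SELECT COUNT(DISTINCT f.sal)
            FROM   emp f
            WHERE  f.sal >= e.sal);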

46) Is the Router an active or a passive transformation?

ANS: The Router is an active transformation because the number of rows that pass through it can change: a row can satisfy more than one output group, or none at all.

---------------------------------------------------------------------------------------------------------------

47) Suppose I have one source which is linked to 3 targets. When the workflow runs for the first time, only the first target should be populated and the other two (second and third) should not be populated. When the workflow runs for the second time, only the second target should be populated and the other two (first and third) should not be populated. When the workflow runs for the third time, only the third target should be populated and the other two (first and second) should not be populated. Could anyone help?

ANS:

At first look - just posting the thought, hope this helps you:

Take a Sequence Generator through an Expression and generate the numbers starting from 0.

Use a Router transformation, connected to the target tables, and define three group conditions:

Table1 : MOD(SEQ_VALUE,3)=0

Table2 : MOD(SEQ_VALUE,3)=1

Table3 : MOD(SEQ_VALUE,3)=2

48) Can anybody solve this scenario please? My source is:

id
1
1
1
1
2
2
2
2
3
3
3
3

Then my targets are like:

target1:

id
1
2
3

target2:

id
1
1
1
2
2
2
3
3
3


ANS: (I) This question is similar to an earlier post;

to do this:

SQ --> Sorter --> Expression --> Router --> target1 and target2. (II) Routing of unique values into one target. (III) Hi friends,

I have got one source table (here the source is a flat file) like this:

SNO,SNAME,EDUCATION
101,VIJAY,BTECH
102,PRAMODH,BCOM
103,KISHORE,BTECH
104,SHANKAR,MSC
105,RAJESH,BCOM
106,MOHAN,BTECH
107,SNEHA,MCA
108,KAMALINI,BCOM
109,PAYAL,MCA
110,SHANTI,MSC

and I need the output in such a way that the unique values are routed into one target and the duplicate values into the other, within a single mapping only:

SNO,SNAME,EDUCATION
101,VIJAY,BTECH
102,PRAMODH,BCOM
104,SHANKAR,MSC
107,SNEHA,MCA

and in the other target table I need it like this:

SNO,SNAME,EDUCATION
103,KISHORE,BTECH
105,RAJESH,BCOM
106,MOHAN,BTECH
108,KAMALINI,BCOM
109,PAYAL,MCA
110,SHANTI,MSC

(IV) Hi Shankar,

Din & Ramesh are correct; maybe you got confused with Jai's answer using an Aggregator - an Aggregator is not required here. I'm giving you the logic here, check it out:

Source Qualifier -> Sorter -> Expression -> Router -> 2 targets

In the Sorter: sort on EDUCATION.

In the Expression: the input & output ports are ENO, ENAME, EDUCATION.

Take two variable ports and one output port:
V_FLAG : IIF(V_TEMP = EDUCATION, 1, 0)
V_TEMP : EDUCATION
O_FLAG : V_FLAG

In the Router, create two groups:

unique values group    : O_FLAG = 0
duplicate values group : O_FLAG = 1

Map those values to the corresponding group.

Hi friends, correct me if anything is wrong in that code. (V) Thanks Pavan for your detailed explanation.

Here I followed another approach and got it: I used Source Qualifier -> Rank -> Router -> target. In the Rank I grouped by EDUCATION and gave ranking from the bottom (say ranking up to 5), and in the Router I used an output group condition rank = 1, which obviously routes the unique values to one target, while the default group routes the duplicate values to another target.

(VI) I have a doubt, Shankar; the way you are proceeding seems doubtful to me.

Here is one example. Suppose in your target table you have rows as below:

SNO,SNAME,EDUCATION
101,VIJAY,BTECH
102,PRAMODH,BCOM
104,SHANKAR,MSC
107,SNEHA,MCA

and in the other target table:

SNO,SNAME,EDUCATION
103,KISHORE,BTECH
105,RAJESH,BCOM
106,MOHAN,BTECH
108,KAMALINI,BCOM
109,PAYAL,MCA
110,SHANTI,MSC

Now if the next day a row comes with the data:

SNO,SNAME,EDUCATION
111,AMIT,BTECH

Tell me, which table will that row get inserted into?

So what I want to suggest is: we can have a lookup on the existing table and give a condition - if the value is found, put the row into table 1; if not, put it into table 2.

(VII) Shankar wants to find the duplicate and unique records and put them into two different targets on the same day; you are thinking about the next day as well. If you use a lookup on the target table, on the first day your approach won't work out: on the first day none of the records are in the target table, so all the records go to one target. I can understand your point, but Shankar's requirement is not that.

Shankar, your approach is right; it will work for your requirement. In Informatica we can do it in N number of ways. Thanks for sharing another approach as well.

49) Interview question: hi, how do we extract the original records to one target & the duplicate records to another target? Thanks in advance...

ANS: Your solution is not complete/correct; maybe you got confused with the question. Check the similar question in the old posts: http://www.orkut.co.in/Main#CommMsgs?cmm=791012&tid=2543660813794278811&kw=v_count

---------------------------------------------------------------------------------------------------------------

50) What are source commit and target commit intervals?

Where can we find rejected records and how do we reload those records?

What operations can be performed on materialized views?

What is the exact difference between the Lookup and Joiner transformations?

ANS: (I) The main differences between Lookup and Joiner are:

the Lookup can perform non-equijoins, while the Joiner performs only equijoins.

A second difference is that a join with the Joiner normally relies on a primary key-foreign key relationship, whereas with the Lookup transformation we just need a matching port.

If there is any other difference, tell me. (II) The commit interval is the amount of data loaded into the target during the session run.

Mainly we have 3 types of commit points:

Target-based commit: the PowerCenter Server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size, the commit interval, and the PowerCenter Server configuration for writer timeout. Source-based commit: the PowerCenter Server commits data based on the number of source rows. The commit point is the commit interval you configure in the session properties.

User-defined commit: the PowerCenter Server commits data based on transactions defined in the mapping properties. You can also configure some commit and rollback options in the session properties. Rejected records are saved by the session in the .bad (reject) file.

(III) The source commit interval is after how many source records you want to commit the load to the target,

e.g. a commit interval of 1000 means commit after reading 1000 records,

and the target commit point is reached after 1000 records arrive at the target.

These are just like savepoints in Oracle.

To improve session performance you can increase the commit interval.

51) In an Expression transformation no values are getting passed from the output port; even if I 'hard-code' a value in the output port, it is not coming to the next transformation...

ANS: (I) If the value is coming into the Expression then it is not hard-coded.

Hard-coded means that the value remains the same for that port. For example, if we want a port STATE to always have the value 'New York', then we make that port output-only (output checked, input unchecked) and write 'New York' in the expression part.

51) Hi all, can anybody clarify these questions for me?

1) A surrogate key comes under which category?
2) Where do we use a surrogate key (technically)?
3) Which should we use first, and why? i) Expression transformation ii) Filter transformation
4) What is the basic difference between a star schema and a snowflake schema?
5) How do we do performance tuning in Informatica?
6) What is the difference between data migration and a data warehouse?
7) What are dynamic and static lookup transformations?
8) Why choose Oracle or SQL Server for a data warehouse?

ANS: (I) 1) A surrogate key comes under which category? 2) Where do we use a surrogate key (technically)?

A surrogate key is an artificial identifier for an entity. Surrogate key values are generated by the system sequentially (like the identity property in SQL Server and a sequence in Oracle); they do not describe anything. A primary key is a natural identifier for an entity. Primary key values are entered manually by the user and uniquely identify each row, with no repetition of data. The need for a surrogate key rather than just the natural primary key: if a column is made a primary key and later the datatype or length of that column needs to change, then all the foreign keys that depend on that primary key must be changed as well, making the database unstable. Surrogate keys make the database more stable because they insulate the primary- and foreign-key relationships from changes in the data types and lengths.

(II) 3) Which should we use first: the Expression transformation or the Filter transformation?

As per the Informatica documentation on bottlenecks, we should use the Filter transformation first, to remove as many unwanted records as possible, and then let the Expression transformation process the records based on the user requirement.

But if the requirement demands it, you can use the Expression transformation first. For example, if you have to set some flag and then filter records based on that flag, in such a situation you have to use the Expression first.

(III) 5) How do we do performance tuning in Informatica?

Excerpts from the Informatica manuals:

The goal of performance tuning is to optimize session performance by eliminating performance bottlenecks. To tune the performance of a session, first you identify a performance bottleneck, eliminate it, and then identify the next performance bottleneck, until you are satisfied with the session performance. You can use the test load option to run sessions when you tune session performance.

The most common performance bottleneck occurs when the PowerCenter Server writes to a target database. You can identify performance bottlenecks by the following methods:

♦ Running test sessions. You can configure a test session to read from a flat file source or to write to a flat file target to identify source and target bottlenecks.
♦ Studying performance details. You can create a set of information called performance details to identify session bottlenecks. Performance details provide information such as buffer input and output efficiency.
♦ Monitoring system performance. You can use system monitoring tools to view percent CPU usage, I/O waits, and paging to identify system bottlenecks.

Once you determine the location of a performance bottleneck, you can eliminate the bottleneck by following these guidelines:

♦ Eliminate source and target database bottlenecks. Have the database administrator optimize database performance by optimizing the query, increasing the database network packet size, or configuring index and key constraints.
♦ Eliminate mapping bottlenecks. Fine-tune the pipeline logic and transformation settings and options in mappings to eliminate mapping bottlenecks.
♦ Eliminate session bottlenecks. You can optimize the session strategy and use performance details to help tune the session configuration.
♦ Eliminate system bottlenecks. Have the system administrator analyze information from system monitoring tools and improve CPU and network performance.

(IV) 4) What is the basic difference between a star schema and a snowflake schema?

Star schema: a star schema is a relational database schema for representing multidimensional data. It is the simplest form of data warehouse schema and contains one or more dimensions and fact tables. It is called a star schema because the entity-relationship diagram between the dimensions and the fact table resembles a star, where one fact table is connected to multiple dimensions. The center of the star schema consists of a large fact table, which points towards the dimension tables. The advantages of a star schema are slicing down, increased performance, and easy understanding of the data.

Snowflake schema: a snowflake schema is a term that describes a star schema structure normalized through the use of outrigger tables, i.e. the dimension table hierarchies are broken into simpler tables. In a star schema every dimension has a primary key.
• In a star schema a dimension table has no parent table, while in a snowflake schema a dimension table has one or more parent tables.
• Hierarchies for the dimensions are stored in the dimension table itself in a star schema, whereas the hierarchies are broken into separate tables in a snowflake schema; these hierarchies help to drill down the data from the topmost to the lowest level.
• A star schema uses fewer joins than a snowflake schema, so its performance is faster.
• Last but not least, the star schema is more common than the snowflake schema.

(V) 6)What is diffrence between data migration and data warehouse.Data Migration: It is a process of migration of data from one database location either relational or non relational to another database location.

Data Warehouse: it is a store or warehouse where we manage non-operational (non-transactional) data for historical usage, business analysis, or trend analysis. Remember that a DW can be relational but cannot be OLTP; interviewers confuse candidates that way.

7)What is dynamic and static lookup transformation.

Uncached lookup
You cannot insert or update the cache. You cannot use a flat file lookup. When the condition is true, the PowerCenter Server returns a value from the lookup table. When the condition is not true, it returns the default value for connected transformations and NULL for unconnected transformations.

Static cache
You cannot insert or update the cache. You can use a relational or a flat file lookup. When the condition is true, the PowerCenter Server returns a value from the lookup cache. When the condition is not true, it returns the default value for connected transformations and NULL for unconnected transformations.

Dynamic cache
You can insert or update rows in the cache as you pass rows to the target. You can use a relational lookup only. When the condition is true, the PowerCenter Server either updates rows in the cache or leaves the cache unchanged, depending on the row type; this indicates that the row is in the cache and target table, and you can pass updated rows to the target table. When the condition is not true, the PowerCenter Server either inserts rows into the cache or leaves the cache unchanged, depending on the row type; this indicates that the row is not in the cache or target table, and you can pass inserted rows to the target table.

(VI) 8) Why choose Oracle or SQL Server for a data warehouse?

This is a debatable topic. There can be many reasons: technical, financial, user requirements, etc. One should give the reason after analysing all of these. I have worked with both SQL Server and Oracle; both have pros and cons, and one cannot easily say that Oracle is better than SQL Server.
----------------------------------------------------------------------------------------

52) 1) I have one table containing 1000 records. I want to load the first five records into one target and the next five records into a second target, alternately, up to 1000 records.

2) I have one table containing 10 records and that table contains some duplicates. Now I want to load the duplicates and the original records into two targets.

3) How do I load the same table 10 times into a target in one mapping?

Please help with these questions.

ANS:(I) Q1 -

Use two counter variables in Expression.

Counter1 - increment it from 1 to 5 and then cycle it back to 1. Counter2 - whenever Counter1 becomes 1, increment Counter2.

So for Target 1: if MOD(Counter2, 2) = 1. For Target 2: if MOD(Counter2, 2) = 0.

Hope this serves the requirement.
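To see the same alternating assignment outside Informatica, here is a rough SQL sketch; the table name src_table and the ordering column id are hypothetical, and in the real mapping the Expression counters above do this work.

-- Rows 1-5 fall in block 0 (TARGET_1), rows 6-10 in block 1 (TARGET_2), and so on.
SELECT s.*,
       CASE WHEN MOD(TRUNC((ROW_NUMBER() OVER (ORDER BY s.id) - 1) / 5), 2) = 0
            THEN 'TARGET_1'
            ELSE 'TARGET_2'
       END AS target_flag
FROM   src_table s;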

(II) There could be multiple approaches to handle this:

1. You can use an Aggregator transformation.
2. From your source table, connect two Source Qualifiers. In the first SQ, use a DISTINCT query to get the unique records and load them into the first target; in the second SQ, write a query to get the duplicates and load them into the second target (see the sketch below).
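As a rough illustration of approach 2, the two Source Qualifier queries could look something like the following; src_table and its columns are hypothetical names.

-- First target: one copy of every record
SELECT DISTINCT col1, col2, col3
FROM   src_table;

-- Second target: records that occur more than once
SELECT col1, col2, col3
FROM   src_table
GROUP  BY col1, col2, col3
HAVING COUNT(*) > 1;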

(III) Q3 - For this also there are multiple ways; it depends on what kind of requirement you have:

1. Use Normalizer with OCCURS as 10.

2. Use Unix Script

3. You can connect 10 SQs from your source.

4. At Workflow level also you should be able to use a workflow variable and a decision task I guess.

5. You can also schedule your workflow to run for 10 times one after another.

It depends on the actual business requirement.
-----------------------------------------------------------------------------------------------

53) Comparing two tables: I have two employee tables with the same structure but with different data. Now I want to find the MAX(sal) from the two tables and compare them (MAX(sal) of emp1 > MAX(sal) of emp2). If emp1 has the higher MAX(sal) I need to pass one set of records, and if emp2 has the higher MAX(sal) I need to pass another set of records.

please help me asap.

ANS:(I) I guess SQL transformation can come to your rescue here.

In the Source Qualifier, use:

SELECT COUNT(*) FROM TABLE1
UNION
SELECT COUNT(*) FROM TABLE2

In the filter check the condition count1 > count2. After that you can use SQL transformation and write your SELECT query there.

this is one approach.. there could be other ways as well-----------------------------------------------------------------------------------------------------
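Another way, sketched purely in SQL and assuming emp1 and emp2 really do have the same structure as stated in the question, is to compare the two MAX(sal) values directly; ties go to emp2 in this sketch.

-- Return the rows of whichever table has the higher MAX(sal)
SELECT * FROM emp1
WHERE  (SELECT MAX(sal) FROM emp1) >  (SELECT MAX(sal) FROM emp2)
UNION ALL
SELECT * FROM emp2
WHERE  (SELECT MAX(sal) FROM emp2) >= (SELECT MAX(sal) FROM emp1);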

54) hi,

i have n number of flat files like a.txt,b.txt..............n.txt

In a.txt:
C1,C2,C3
1,2,3

In b.txt:
C1,C2,C3
2,3,4

... and so on up to n.txt. I want to insert the data into a target as:

C1  C2  C3  C4
1   2   3   a.txt (source file name)
2   3   4   b.txt (source file name)
...

Briefly, I want to load into the target all the data from the sources (a.txt, b.txt, ..., n.txt) as well as the source file name,

and

if I add another (n+1)th file to the source system, its data should also be added to the target.

ANS: (I) There is an option in the flat file source definition, "Add Currently Processed Flat File Name Port". If you check that option you will achieve the desired output.


(II) Hi, first read your input files using the Indirect option in the session. Then, in the flat file source properties, the "add currently processed file name" port gives you the desired output. For the Indirect option, check the Informatica help.

55) 1) What are push and pull ETL strategies? 2) Can anyone explain a factless fact table with an example? 3) What is a junk dimension? Can you explain it with an example?

ANS: (I) Push and pull strategies determine how data comes from the source system to the ETL server.

Push: the source system pushes (sends) data to the ETL server.

Pull: the ETL server pulls (gets) data from the source system.

(II) A fact table without any measures is called a factless fact table.

(III) A "junk" dimension is a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes. A good example would be a trade fact in a company that brokers equity trades.

56) Pre- and post-SQL overrides

Hi guys, can someone help me on the conditions under which we prefer pre- and post-SQL overrides, and what their advantages and disadvantages are?

ANS: (I) If you want to do anything on the table before data loading or after data loading, you use pre- and post-SQL.

for ex. dropping and creating indexes

57) source filter condition

What is the main difference between the source filter that we give in the Source Qualifier and a filter condition that we add by overriding the Source Qualifier query?

ANS:

(I) Hi, if you set the filter at the SQ level, it limits the source data (it filters the data at the source, so it increases performance). If you use the filter condition with a Filter transformation, it limits the target rows (it retrieves all rows from the source, then applies the filter condition and loads the data into the target); retrieving all the records from the source takes time, so it increases the session time.

(II) Paluri was not asking about the difference between the source filter in the Source Qualifier transformation and the filter in a Filter transformation. His question was about the Source Qualifier source filter versus a SQL override with a filter condition. Both filter records; in the source filter you can only give a filter condition to drop unwanted records or for testing purposes, whereas the SQL override in the Source Qualifier lets you customize the default SQL generated by the Source Qualifier.

(III) Is there any difference in the way they execute? I mean, do both filter conditions execute at the database level?

(IV) The purpose of both is the same, but they execute at different levels. The SQL override executes at the database level, whereas a plain filter condition executes at the Informatica level; that is, the Source Qualifier reads all the data from the database and then passes records based on the filter condition.

(V) If you observe the session log, the SELECT query issued to the database includes a WHERE clause (the WHERE clause is the condition we added in the source filter of the Source Qualifier). As per my understanding, the source filter condition is added to the SELECT statement first and then issued to the database.

(VI) For clarity, try to run a mapping with a SQL override containing a WHERE clause and no filter, and then run the same mapping with just a filter condition (no SQL override); then you will understand the difference.

(VII) If we run with just a filter condition (no SQL override) and observe the log for the SQL query issued against the database, it only adds the filter condition. If we override the query, it takes the overridden query.

(VIII) If we don't give any filter condition or SQL override, the log shows the default SQL query without any customizations.

58) mapping xml doubt

How do I export a whole mapping to an XML file? How do I use that XML file on another system? What are the steps to follow to do this?

ANS: (I) In the Designer, go to Repository and click on Export; it will save the file in XML format. To import, go to Repository, click on Import, select the XML file you saved, and follow the steps.

(II) First expand the Navigator window and select the mapping you want to export, then go to the menus, select the Export option, and click OK.

(III) I exported it as an XML file, but when I import it, it says it is not a valid XML.

59) how to maintain history data in oracle

Hi Friends,

my source table is having like this

sno  sname  comm
---  -----  ----
1    anand  500
2    reddy  1000
1    anand  2000
2    reddy  2000
3    vinod  1000

i want target table like this

sno  sname  comm
---  -----  ----
1    anand  2500
2    reddy  3000
3    vinod  1000

Note : This is only in oracle not for Informatica....

ANS:

(I) Your question title (maintaining history data) and what you want to achieve are different. In order to get the output you show in the target table, here is the query:

SELECT sno, sname, SUM(comm)
FROM   table_name
GROUP  BY sno, sname;

60) Why Union is Active transformation


Hi guys, can anybody explain why the Union transformation is active? I tried Union but it doesn't remove any duplicate records.

ANS:

(I) WHY UNION IS ACTIVE

First, we know a transformation is said to be active if it changes the number of rows that pass through it. Union combines two (or more) pipelines of similar structure, so the set of records changes, and therefore it is active.

In SQL, if you want to remove duplicate records you use UNION; otherwise you use UNION ALL to display the duplicate records as well. The Union transformation behaves like UNION ALL, so it does not remove duplicates.

1) What is Direct and Indirect in flat file property

2) What are the types of Caches

3) Scenario: I have 3 tables. The 1st table has Emp ID, the 2nd table has Telephone Number, Address, Location, and the 3rd table has Bank Name, Account Number. I want all 3 tables in one target table (the table is in denormalized form). ** Emp ID is common to all 3 tables.
Other part of the scenario: can I solve the above problem with an unconnected Lookup?
Other part of the scenario: if we are using a Joiner, what is the join condition?

4) Scenario : I want to update the target table only, without using Dynamic Cache

5) Scenario: I have used a SQL override in a Lookup. There are 5 ports, from Col A to Col D, but I have used the override only for the first 3 columns (Col A, Col B, Col C) with ORDER BY Col B, and the mapping is validated. When I run the mapping, the session throws an error saying the SQL override is not valid.

6) Scenario : I have to join table is it better using the Sql override or lookup or Joiner. Performance wise which is better

7) Is doing the override in the Source Qualifier better for performance?

8) Scenario: In workflow I have a reusable Session, the same session is reused across other workflow. Any change made in either of the Session does reflect in the other Sessions.

9) Can we make any changes in reusable & Non reusable Session often

10) Scenario : In my mapping the Update Strategy is not Updating

11) Scenario: is it better to filter the rows in a Filter transformation or in the Source Qualifier?


ANSWERS :::

1> Direct: when we want to load data from one flat file only. Indirect: when we want to load data from two or more flat files with the same structure.

2> Static Cache, Dynamic Cache, Persistent Cache, Recache Cache, Shared Cache

3> Please explain more. Is there any common column?

4> Use a Filter transformation after the dynamic cache and give the filter condition NewLookupRow = 0.

5> After the SQL override add "--" (without quotes), otherwise the override won't work.

6> If tables are from same database use Source Qualifier else joiner. Lookup is not a good option.

7> Yes, the override in the Source Qualifier is better.

8> When you make changes to a reusable task definition in the Task Developer, the changes reflect in the instance of the task in the workflow only if you have not edited the instance.

9> Yes, we can make.

10> Check whether you have set the option Treat Source Row as Update in session properties or not. Set it to Update.

11> It is better in the Source Qualifier.

3> We need to join the three tables using Joiner. We need 2 joiner transformations to join them. Join condition would be empid of one table=empid of another table.

We do not need to use unconnected lookup as it is used to update Slowly changing dimension tables.

4> We can use a simple lookup to know whether the record exists in the target table or not, and then use an Update Strategy.

We can also use Unconnected lookup.


Other option is to set the Treat Source Row as Update in session properties.

1) How to delete header and footer in flat file.

2) In the source we have 1000 rows and I have 3 targets. The first 100 rows have to go to the 1st target, the next 200 rows to the 2nd target, and the rest of the rows to the 3rd target.

3) I have some duplicate rows in the source table and I have 2 targets; the unique records have to be loaded into the 1st target and the duplicate records into the 2nd target.

4) I have Empno, Name, Loc in the source table, and we have two targets, Tgt_India and Tgt_USA. When an employee moves from India to the USA, the row for that employee must be inserted into the USA target and deleted from the India target, and vice versa.

SOLUTIONS

1> Skip the first row to delete header. Not sure for footer.

2> Use an Expression transformation after the source and use a mapping variable of the Count aggregation type. Then use a Router to split the records into 3 groups and load them into the 3 target tables. (REFER TO FLATFILES FOLDER MAPPING)

3> Already discussed in thread.

4> Insert the row into the appropriate table using a Router and use a post-session success command to delete the row from the other table. (REFER TO FLATFILES MAPPING)

SQL STATEMENTS ::

1> DELETE FROM emp_india a
   WHERE empno = (SELECT empno FROM emp_usa b WHERE a.empno = b.empno);

2> DELETE FROM emp_india
   WHERE empno IN (SELECT empno FROM emp_usa);
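For solution 2 above (first 100 rows to the first target, next 200 to the second, the rest to the third), the same row-numbering idea can be sketched in SQL; src_table and the ordering column id are hypothetical, and in the mapping the count variable plus Router plays this role.

-- rn 1-100 -> TGT_1, rn 101-300 -> TGT_2, the rest -> TGT_3
SELECT t.*,
       CASE WHEN rn <= 100 THEN 'TGT_1'
            WHEN rn <= 300 THEN 'TGT_2'
            ELSE 'TGT_3'
       END AS target_flag
FROM  (SELECT s.*, ROW_NUMBER() OVER (ORDER BY s.id) AS rn
       FROM   src_table s) t;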

Aim of Informatica ::

Informatica is an ETL tool used to extract data from OLTP sources, apply transformations to cleanse the data and implement the business logic, and then load it into the star-schema tables (the warehouse).

ARCHITECTURE

[Diagram: Data Warehouse Architecture and Informatica Architecture (version 6.2 / 7.1.1). It shows the client tools (Repository Admin Console, Repository Manager, Designer, Workflow Manager, Workflow Monitor) used by the users, the Repository Server and Informatica Server, the sources, the source repository, and the target data warehouse / data mart.]

SCHEMA

1. Star Schema

2. Snowflake Schema

3. Fact Constellation or Galaxy Schema

DWH CONCEPTS | TYPES OF SCHEMA

STAR SCHEMA


Star schema architecture is the simplest data warehouse design. The main feature of a star schema is a table at the center, called the fact table and the dimension tables which allow browsing of specific categories, summarizing, drill-downs and specifying criteria. Typically, most of the fact tables in a star schema are in database third normal form, while dimensional tables are de-normalized (second normal form).

Fact table

The fact table is not a typical relational database table as it is de-normalized on purpose - to enhance query response times. The fact table typically contains records that are ready to explore, usually with ad hoc queries. Records in the fact table are often referred to as events, due to the time-variant nature of a data warehouse environment.The primary key for the fact table is a composite of all the columns except numeric values / scores (like QUANTITY, TURNOVER, exact invoice date and time).

Typical fact tables in a global enterprise data warehouse are (apart for those, there may be some company or business specific fact tables):

sales fact table - contains all details regarding sales
orders fact table - in some cases the table can be split into open orders and historical orders; sometimes the values for historical orders are stored in a sales fact table
budget fact table - usually grouped by month and loaded once at the end of a year
forecast fact table - usually grouped by month and loaded daily, weekly or monthly
inventory fact table - reports stocks, usually refreshed daily

Dimension table

Nearly all of the information in a typical fact table is also present in one or more dimension tables. The main purpose of maintaining Dimension Tables is to allow browsing the categories quickly and easily. The primary keys of each of the dimension tables are linked together to form the composite primary key of the fact table. In a star schema design, there is only one de-normalized table for a given dimension.

Typical dimension tables in a data warehouse are:

time dimension table
customers dimension table
products dimension table
key account managers (KAM) dimension table
sales office dimension table

Star schema example

An example of a star schema architecture is depicted below.
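The original diagram is not reproduced here; as a rough stand-in, the DDL below sketches a small hypothetical sales star schema (all table and column names are made up for illustration).

CREATE TABLE dim_time     (time_key     NUMBER PRIMARY KEY, calendar_date DATE, month_name VARCHAR2(20), year_no NUMBER);
CREATE TABLE dim_customer (customer_key NUMBER PRIMARY KEY, customer_name VARCHAR2(100), city VARCHAR2(50), country VARCHAR2(50));
CREATE TABLE dim_product  (product_key  NUMBER PRIMARY KEY, product_name VARCHAR2(100), brand_name VARCHAR2(50), category VARCHAR2(50));

-- The fact table sits at the centre and references every dimension.
CREATE TABLE fact_sales (
  time_key     NUMBER REFERENCES dim_time(time_key),
  customer_key NUMBER REFERENCES dim_customer(customer_key),
  product_key  NUMBER REFERENCES dim_product(product_key),
  quantity     NUMBER,
  turnover     NUMBER(12,2)
);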

  SNOWFLAKE SCHEMA

Snowflake schema architecture is a more complex variation of a star schema design. The main difference is that dimensional tables in a snowflake schema are normalized, so they have a typical relational database design.

Snowflake schemas are generally used when a dimensional table becomes very big and when a star schema can’t represent the complexity of a data structure. For example if a PRODUCT dimension table contains millions of rows, the use of snowflake schemas should significantly improve performance by moving out some data to other table (with BRANDS for instance).

The problem is that the more normalized the dimension table is, the more complicated SQL joins must be issued to query them. This is because in order for a query to be answered, many tables need to be joined and aggregates generated.

An example of a snowflake schema architecture is depicted below.
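Again the diagram is not reproduced; the hypothetical DDL below shows how the product dimension from the star example could be snowflaked by moving brand attributes into an outrigger table, as described above.

-- Brand attributes moved out of the product dimension into a parent table.
CREATE TABLE dim_brand (
  brand_key  NUMBER PRIMARY KEY,
  brand_name VARCHAR2(50)
);

CREATE TABLE dim_product_sf (
  product_key  NUMBER PRIMARY KEY,
  product_name VARCHAR2(100),
  brand_key    NUMBER REFERENCES dim_brand(brand_key)   -- normalized: brand is now a separate table
);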


GALAXY SCHEMA

For each star schema or snowflake schema it is possible to construct a fact constellation schema. This schema is more complex than star or snowflake architecture, which is because it contains multiple fact tables. This allows dimension tables to be shared amongst many fact tables.That solution is very flexible, however it may be hard to manage and support.

The main disadvantage of the fact constellation schema is a more complicated design because many variants of aggregation must be considered.

In a fact constellation schema, different fact tables are explicitly assigned to the dimensions, which are for given facts relevant. This may be useful in cases when some facts are associated with a given dimension level and other facts with a deeper dimension level.

Use of that model should be reasonable when for example, there is a sales fact table (with details down to the exact date and invoice header id) and a fact table with sales forecast which is calculated based on month, client id and product id.

In that case using two different fact tables on a different level of grouping is realized through a fact constellation model.

 Normalization


What is Normalization?

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process:

Eliminating redundant data. Ensuring data dependencies make sense (storing only related data in a table).

First Normal Form

First Normal Form (1NF) sets the very basic rules for an organized database:

Eliminate duplicative columns from the same table.
Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Second Normal Form

Second Normal form(2 NF) further addresses the concept of removing duplicative data.

Meet all the requirements of the first normal form.
Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
Create relationships between these new tables and their predecessors through the use of foreign keys.

Third Normal Form

Third Normal Form (3NF) removes columns that are not dependent upon the primary key:

Meet all the requirements of the second normal form.
Remove columns that are not dependent upon the primary key.
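As a small, purely illustrative sketch (the table and column names are invented), the DDL below shows an order-entry design organized along these normal-form rules: no repeating customer or product details on each order line, and every non-key column depending only on its own table's key.

CREATE TABLE customers (customer_id NUMBER PRIMARY KEY, customer_name VARCHAR2(100), customer_city VARCHAR2(50));
CREATE TABLE products  (product_id  NUMBER PRIMARY KEY, product_name  VARCHAR2(100), product_price NUMBER(10,2));
CREATE TABLE orders    (order_no    NUMBER PRIMARY KEY, customer_id   NUMBER REFERENCES customers(customer_id));
CREATE TABLE order_lines (
  order_no   NUMBER REFERENCES orders(order_no),
  product_id NUMBER REFERENCES products(product_id),
  qty        NUMBER,
  PRIMARY KEY (order_no, product_id)
);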


Install

Insert the Informatica CD into the CD-ROM drive and click on the CD drive.


Click on the icon

A prompt box appears with options; click on the "PowerCenter for Windows" button. During the process it asks you to configure the Repository Server and the Informatica Server. You can do that now or later; we describe how to configure them in the next few lines.

Installation is complete.

Configure Repository Server

Go to Program Files > Informatica PowerCenter 7.1.1 > Repository Server > Repository Setup. You see the Repository Server prompt screen with different options. Set the "Server Port Number" to any value between 5002 and 65535. Set the administrative password of your own choice (it is case sensitive). The Repository Server is now configured.

Add Data Source

Open Control Panel, click on Administrative Tools > Data Sources (ODBC), then click the Add button. It shows the set of drivers for different data sources. Select the right data source; for example, if your database is Oracle, select the respective Oracle driver. Click OK. It asks for a description of the data source; you can name it as you want, for example ora_data_src. Click OK. The data source has been added successfully.

Start Services

Click on Control Panel, then click on Administrative Tools > Services. You see the services list on the screen. Right-click on "Informatica Repository Server" and click Start. Right-click on "Informatica" and click Start. The services have started successfully.


Create two users

Login to your database .

Create two users, one for the repository and the other for the target database. These are necessary to secure the data in the repositories. For example, Repository (username :: Rep_one, password :: r) and Target (username :: trg_wh, password :: t). After creating the users, test them by connecting to the database.

 


Create Repository (Version 7.1.1)

All Programs > Informatica PowerCenter 7.1.1 > Informatica PowerCenter Client > Repository Admin Console. Under Console Root, right-click on Informatica Repository Server and click on New Server Registration. A prompt appears asking for a port number. Host name :: your computer name, port number :: the Repository Server port number. Now your system is connected to the Repository Server. Click on the server (represented by your host name) and click Connect. It asks you to enter the password; make sure the port number and password are the same as those of the Repository Server. Now right-click on the Repositories folder and click New Repository. Enter the repository name and select Global Repository (optional). Now click on the LICENSE tab and enter the keys in the following order: Product License Key, Option Key, and Connectivity License Key. Click OK. It takes some time to create the repository. Now your repository is running.

Create Temporary Server ( Workflow Manager)


Click on Workflow Manager (Informatica client). Start the Informatica services and open the repository. On the menu bar, click on Server and choose "Add Server". Enter the new server name; the host name must be the computer name. Click OK. On the menu bar, click on "Connections" and choose the Relational option. Now create two relational connections, one for the source and the other for the target database (be careful when you enter the login information). The server is created successfully.

Configure Informatica Server

Go to All Programs > Informatica PowerCenter 7.1.1 > Informatica PowerCenter Server > Informatica Server Setup. It prompts with a box; click on the Continue button. You now see a larger prompt with a set of tabs with different options.

In the first tab, "Server":
Server Name :: temporary server name
TCP/IP Host Name :: computer name

Click on the "Repository" tab:
Repository Name :: name of the repository created in the Admin Console
Repository User :: username of the repository
Repository Password :: password of the repository
Repository Server Host Name :: computer name

Now move to the "License" tab. Enter the Option key and click Update; enter the Connectivity key and click Update. Click OK.

REPOSITORY ADMIN CONSOLE

Actions

Create local or global repositories
Start repositories
Back up a repository
Move a copy of a repository to a different server
Disable a repository
Export connection information
Notify users :: a notification message can be sent to all the users connected to the repository
Propagate
Register repositories
Restore a repository
Upgrade a repository

REPOSITORY MANAGER

Actions

Create local or global repositories
Start repositories
Back up a repository
Move a copy of a repository to a different server
Disable a repository
Export connection information
Notify users :: a notification message can be sent to all the users connected to the repository
Propagate
Register repositories
Restore a repository
Upgrade a repository

DESIGNER MENU DESIGNER OVERVIEW

Design mappings, which represent how to move data from source to target.
Design mapplets.
Create reusable and non-reusable transformations.
Access multiple repositories and folders/templates/tables at a time.
Many more features such as Data Profiling, Propagate, Debugger, Versioning, etc.

Different Data Providers/Sources

Flat files (Notepad, Excel)
Relational data, views, synonyms (Oracle, SQL Server, Access)
XML data
COBOL data

 DESIGNER MENU Important Designer Tools


Source Analyzer

Create Source Definition - Import data from Flat Files/ Relational/ Application/ XML/ COBOL

Warehouse Designer

Create Target Definition

Transformation Developer

Create Reusable Transformation

Mapplet Designer

Create Mapplets (Group of transformation which can be reused in different mappings)

Mapping Designer

Create mappings - represent how the data moves from the source to the target table. A mapping consists of source definitions, mapplets, transformations, and target definitions.

MAPPLETS

When you have to create a Mapplet ?

Create a mapplet when you need a particular set of transformations that uses the same logic in multiple mappings, so that you can reuse the group of transformations in multiple mappings.

Create a mapplet

1. You can create mapplets in the Mapplet Designer tool.
2. A Mapplet Input transformation is used only when you don't want to use a source definition in the Mapplet Designer.
3. A Mapplet Output transformation is always used whenever you create a mapplet.
4. Example mapplet flows:
   Source > Sorter > Expression > Mapplet Output
   Mapplet Input > Sorter > Expression > Mapplet Output

Advantages

Include source definitions. You can use multiple source definitions and source qualifiers to provide source data for a mapping.

Accept data from sources in a mapping. If you want the mapplet to receive data from the mapping, you can use an Input transformation to receive source data.

Include multiple transformations. A mapplet can contain as many transformations as you need.


Pass data to multiple transformations. You can create a mapplet to feed data to multiple transformations. Each Output transformation in a mapplet represents one output group in a mapplet.

Contain unused ports. You do not have to connect all mapplet input and output ports in a mapping.

Limitations

Must use a reusable Sequence Generator transformation. Only the Normal Stored Procedure transformation can be used. A mapplet cannot use:

o Normalizer transformations
o COBOL sources
o XML Source Qualifier transformations
o XML sources
o Target definitions
o Other mapplets


Slowly Changed Dimension

It is a dimension which changes slowly over time.

Slowly Changed Dimension Mapping

Type - Description

SCD Type 1 (Slowly Changing Dimension) - inserts new dimensions; overwrites existing dimensions with the changed values (shows current data only).

SCD Type 2 / Version Data (Slowly Changing Dimension) - inserts new and changed dimensions; creates a version number and increments the primary key to track changes.

SCD Type 2 / Flag Current (Slowly Changing Dimension) - inserts new and changed dimensions; flags the current version and increments the primary key to track changes.

SCD Type 2 / Date Range (Slowly Changing Dimension) - inserts new and changed dimensions; creates an effective date range to track changes.

SCD Type 3 (Slowly Changing Dimension) - inserts new dimensions; updates changed values in existing dimensions; optionally uses the load date to track changes.

Data Profiling

Data profiling is a technique used to analyze source data. PowerCenter Data Profiling can help you evaluate source data and detect patterns and exceptions. PowerCenter lets you profile source data to suggest candidate keys, detect data patterns, evaluate join criteria, and determine information such as the implicit datatype.

You can use Data Profiling to analyze source data in the following situations:

During mapping development
During production, to maintain data quality

VERSIONING

How to Enable Version Control?

In the Repository Admin Console, select the repository for which you want to enable version control, choose Action > Properties, select the Supports Version Control option, and click OK.

Advantage of Versioning ?

A repository enabled for version control maintains an audit trail of version history. It stores multiple versions of an object as you check out, modify, and check it in. As the number of versions of an object grows, you may want to view the object version history. You may want to do this for the following reasons:

Determine what versions are obsolete and no longer necessary to store in the repository. Troubleshoot changes in functionality between different versions of metadata.

WORKFLOW MANAGER   Actions

Create Reusable tasks , Worklets , Workflows. Schedule Workflows. Configure tasks.

Workflow

A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data.

Worklets

A worklet is an object that represents a set of tasks.


When to create worklets? Create a worklet when you want to reuse a set of workflow logic in several workflows. Use the Worklet Designer to create and edit worklets.

Where to use worklets? You can run worklets inside a workflow. The workflow that contains the worklet is called the parent workflow. You can also nest a worklet in another worklet.

TASKS  

There are many tasks available , which are used to create workflows and worklets.

Types of Tasks

Task Name (Reusable) - Description

Session (yes) - a set of instructions to run a mapping.

Command (yes) - specifies shell commands to run during the workflow. You can choose to run the Command task only if the previous task in the workflow completes.

Email (yes) - sends email during the workflow.

Control (no) - stops or aborts the workflow.

Decision (no) - specifies a condition to evaluate in the workflow. Use the Decision task to create branches in a workflow.

Event-Raise (no) - represents the location of a user-defined event. The Event-Raise task triggers the user-defined event when the PowerCenter Server runs the Event-Raise task.

Event-Wait (no) - waits for a user-defined or a predefined event to occur. Once the event occurs, the PowerCenter Server completes the rest of the workflow.

Timer (no) - waits for a specified period of time before running the next task.

WORKFLOW MONITOR

You can monitor workflows and tasks in the Workflow Monitor. View details about a workflow or task in Gantt Chart view or Task view.


Actions: you can run, stop, abort, and resume workflows from the Workflow Monitor. You can also view the log file and performance data.

TRANSFORMATIONS

Transformation (Active / Passive) - Description

SORTER (Active) - sorts rows in ascending or descending order and can also return distinct records.

RANK (Active) - top or bottom 'N' analysis.

JOINER (Active) - joins two sources coming from different or the same locations.

FILTER (Active) - filters out the rows that do not meet the condition.

ROUTER (Active) - useful to test multiple conditions.

AGGREGATOR (Active) - performs group calculations such as COUNT, MAX, MIN, SUM, AVG (mainly calculations over multiple rows or groups).

NORMALIZER (Active) - reads COBOL files (denormalized format); splits a single row into multiple rows.

SOURCE QUALIFIER (Active) - represents flat file or relational data; performs many tasks such as overriding the default SQL query, filtering records, joining data from two or more tables, etc.

UNION (Active) - merges data from multiple sources, similar to the UNION ALL SQL statement that combines the results of two or more SELECT statements. Like UNION ALL, the Union transformation does not remove duplicate rows.

EXPRESSION (Passive) - calculates values in a single row before you write to the target.

LOOKUP (Passive) - looks up data in a flat file or a relational table, view, or synonym.

STORED PROCEDURE (Passive) - calls stored procedures that automate tasks too complicated for standard SQL statements.

XML SOURCE QUALIFIER (Active) - when you add an XML source definition to a mapping, you need to connect it to an XML Source Qualifier transformation.

UPDATE STRATEGY (Active) - flags rows for insert, delete, update, or reject.


Why we have to create Mapping Parameters or Variables ?

You can use mapping parameters and variables to make mappings more flexible; you can reuse a mapping by varying the parameters and variables.

Representation

$$parametername/$$variablename

Parameters

A mapping parameter represents a constant value that you can define before running a session. A mapping parameter retains the same value throughout the entire session.

Variables

A mapping variable represents a value that can change through the session. The PowerCenter Server saves the value of a mapping variable to the repository at the end of each successful session run and uses that value the next time you run the session.

Default Values of Mapping Parameter and Variables

Data type - Default value
String - empty string
Numeric - 0
Datetime - 1/1/1753 A.D.

DEBUGGER

Actions

You can debug a valid mapping to gain troubleshooting information about data and error conditions.

Situation to run the Debugger

o Before you run a session. After you save a mapping, you can run some initial tests with a debug session before you create and configure a session in the Workflow Manager.

o After you run a session. If a session fails or if you receive unexpected results in your target, you can run the Debugger against the session.

1.Define Data Warehouse ?

“A subject-oriented , integrated , time-variant and non-volatile collection of data in support of management's decision making process”


2. What is junk dimension? What is the difference between junk dimension and degenerated dimension?

A "junk" dimension is a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes.where as A degenerate dimension is data that is dimensional in nature but stored in a fact table.

Junk dimension: columns that are rarely used or not used (flags, indicators, and the like) are grouped together to form a dimension; this is called a junk dimension.

Degenerate dimension: a column that is dimensional in nature but is kept in the fact table rather than in its own dimension table.

Ex. The EMP table has empno, ename, sal, job, deptno.

If we carry only columns such as empno and ename without building a full dimension around them, they act as a degenerate dimension.

3.Differnce between Normalization and Denormalization?

Normalization is the process of removing redundancies. OLTP systems use the normalization process.

Denormalization is the process of allowing redundancies. OLAP/DW uses the denormalized approach to capture a greater level of detailed data (each and every transaction).

4. Why fact table is in normal form?

A fact table consists of measurements of business requirements and the foreign keys of dimension tables as per business rules.

There can be just surrogate keys within a star schema, which itself is denormalized. Now, if there were also foreign keys on the dimensions, I would agree. Being in normal form, more granularity is achieved with less coding, i.e. fewer joins while retrieving the facts.

5. What is Difference between E-R Modeling and Dimensional Modeling?

Basic difference is E-R modeling will have logical and physical model. Dimensional model will have only physical model. E-R modeling is used for normalizing the OLTP database design.

Dimensional modeling is used for de-normalizing the ROLAP/MOLAP design. Adding to the point:


E-R modeling revolves around the Entities and their relationships to capture the overall process of the system.

Dimensional model / Multidimensional Modeling revolves around Dimensions (point of analysis) for decision-making and not to capture the process.

In ER modeling the data is in normalized form. So more number of Joins, which may adversely affect the system performance. Whereas in Dimensional Modeling the data is denormalized, so less number of joins, by which system performance will improve.

6. What is conformed fact?

Conformed dimensions are the dimensions which can be used across multiple data marts in combination with multiple fact tables accordingly.

Conformed facts are allowed to have the same name in separate tables and can be combined and compared mathematically. Conformed dimensions are those tables that have a fixed structure; there is no need to change the metadata of these tables and they can go along with any number of facts in that application without any changes.

A dimension table which is used by more than one fact table is known as a conformed dimension.

7. What are the methodologies of Data Warehousing?

There are mainly two methodologies:

1. Inmon model (top-down approach :: Data Warehouse --> Data Marts). The Inmon model is structured as a normalized structure.

2. Kimball model (bottom-up approach :: Data Marts --> Data Warehouse). The Kimball model is always structured as a denormalized (dimensional) structure.

8. What are data validation strategies for data mart validation after loading process?

Data validation is to make sure that the loaded data is accurate and meets the business requirements. Strategies are different methods followed to meet the validation requirements.

9. What is surrogate key?

Surrogate key is the primary key for the Dimensional table. Surrogate key is a substitution for the natural primary key.

Data warehouses typically use a surrogate key (also known as an artificial or identity key) for the dimension tables' primary keys. They can use the Informatica Sequence Generator, an Oracle sequence, or SQL Server identity values for the surrogate key.

It is useful because the natural primary key (e.g. the customer number in the Customer table) can change, which makes updates more difficult; surrogate keys are also used in SCDs to preserve historical data.
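A minimal Oracle sketch of this idea, with invented table and sequence names:

-- System-generated surrogate key alongside the natural key from the source.
CREATE SEQUENCE customer_dim_seq START WITH 1 INCREMENT BY 1 CACHE 100;

CREATE TABLE customer_dim (
  customer_sk     NUMBER PRIMARY KEY,   -- surrogate key
  customer_number VARCHAR2(20),         -- natural key from the source system
  customer_name   VARCHAR2(100)
);

INSERT INTO customer_dim (customer_sk, customer_number, customer_name)
VALUES (customer_dim_seq.NEXTVAL, 'C-1001', 'Anand');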

10. What is meant by metadata in context of a Data warehouse and how it is important?

Metadata (or meta data) is data about data. Examples of metadata include data element descriptions, data type descriptions, attribute/property descriptions, range/domain descriptions, and process/method descriptions. The repository environment encompasses all corporate metadata resources: database catalogs, data dictionaries, and navigation services. Metadata includes things like the name, length, valid values, and description of a data element. Metadata is stored in a data dictionary and repository. It insulates the data warehouse from changes in the schema of operational systems.

Metadata synchronization is the process of consolidating, relating, and synchronizing data elements with the same or similar meaning from different systems. Metadata synchronization joins these differing elements together in the data warehouse to allow for easier access.

In the context of a data warehouse, metadata means the information about the data. This information is stored in the designer repository. A business analyst or data modeler usually captures information about data - the source (where and how the data originated), the nature of the data (char, varchar, nullable, existence, valid values, etc.) and the behavior of the data (how it is modified/derived and its life cycle) - in a data dictionary, a.k.a. metadata. Metadata is also present at the data mart level, for subsets, facts and dimensions, the ODS, etc. For a DW user, metadata provides vital information for analysis / DSS.

11. What are the possible data marts in Retail sales?

Product information, sales information

12. What is the main difference between schema in RDBMS and schemas in Data Warehouse?

RDBMS schema:
* Used for OLTP systems
* Traditional and old schema
* Normalized
* Difficult to understand and navigate
* Cannot solve extract and complex problems
* Poorly modelled

DWH schema:
* Used for OLAP systems
* New generation schema
* Denormalized
* Easy to understand and navigate
* Extract and complex problems can be easily solved
* Very good model

13.What is Dimensional Modeling?

In Dimensional Modeling, Data is stored in two kinds of tables: Fact Tables and Dimension tables.


Fact Table contains fact data e.g. sales, revenue, profit etc.....  Dimension table contains dimensional data such as Product Id, product name, product description etc.....

Dimensional Modeling is a design concept used by many data warehouse designers to build their data warehouse. In this design model all the data is stored in two types of tables - Facts table and Dimension table. Fact table contains the facts/measurements of the business and the dimension table contains the context of measurements i.e., the dimensions on which the facts are calculated.

14. Why is Data Modeling Important? 

The data model is also detailed enough to be used by the database developers to use as a "blueprint" for building the physical database. The information contained in the data model will be used to define the relational tables, primary and foreign keys, stored procedures, and triggers. A poorly designed database will require more time in the long-term. Without careful planning you may create a database that omits data required to create critical reports, produces results that are incorrect or inconsistent, and is unable to accommodate changes in the user's requirements.

15. What does level of Granularity of a fact table signify?

Level of granularity influences the amount of space required for the database. It indicates the extent of aggregation that will be permitted to take place on the fact data: more granularity implies more aggregation potential and vice versa. In simple terms, the level of granularity defines the extent of detail. As an example, consider a geographical level of granularity: we may analyze data at the levels of COUNTRY, REGION, TERRITORY, CITY and STREET. In this case, we say the highest level of granularity (the finest detail) is STREET. Level of granularity also means the upper/lower level of the hierarchy, up to which we can see/drill the data in the fact table.

16. What is degenerate dimension table?

Dimension values that are stored in the fact table are called degenerate dimensions. These dimensions don't have their own dimension tables.

17. How do you load the time dimension?

In Data warehouse we manually load the time dimension, Every Data warehouse maintains a time dimension. It would be at the most granular level at which the business runs at (ex: week day, day of the month and so on). Depending on the data loads, these time dimensions are updated. Weekly process gets updated every week and monthly process, every month.

18. Difference between Snowflake and Star Schema. What are situations where Snow flake Schema is better than Star Schema to use and when the opposite is true?

Star schema and snowflake both serve the purpose of dimensional modeling when it comes to data warehouses. 


Star schema is a dimensional model with a fact table (large) and a set of dimension tables (small). The whole set-up is totally denormalized.

However in cases where the dimension tables are split to many tables that are where the schema is slightly inclined towards normalization (reduce redundancy and dependency) there comes the snowflake schema. 

The nature/purpose of the data that is to be feed to the model is the key to your question as to which is better.

Star schema

Contains the dimension tables mapped around one or more fact tables. It is a denormalized model. There is no need to use complicated joins, and queries return results quickly.

Snowflake schema

It is the normalized form of the star schema. It involves deeper joins, because the dimension tables are split into many pieces. We can easily make modifications directly in the tables, but we have to use more complicated joins since there are more tables, so there will be some delay in processing queries.

19. Why do you need Star schema?

1) It contains fewer joins
2) Simpler database design
3) Supports drill-up / drill-down options

20. Why do you need Snowflake schema?

Sometimes we need to split separate dimensions out of existing dimensions; in that case we go for a snowflake.

Disadvantage of snowflake: query performance is lower because more joins are involved.

21. What is conformed fact?

Conformed dimensions are the dimensions which can be used across multiple data marts in combination with multiple fact tables accordingly.

Conformed facts are allowed to have the same name in separate tables and can be combined and compared mathematically. Conformed dimensions are those tables that have a fixed structure; there is no need to change the metadata of these tables and they can go along with any number of facts in that application without any changes.

A dimension table which is used by more than one fact table is known as a conformed dimension.

22. What are conformed dimensions

They are dimension tables in a star schema data mart that adhere to a common structure, and therefore allow queries to be executed across star schemas. For example, the Calendar dimension is commonly needed in most data marts. By making this Calendar dimension adhere to a single structure, regardless of what data mart it is used in your organization, you can query by date/time from one data mart to another to another.

Conformed dimensions are dimensions which are common to multiple cubes (cubes are the schemas containing fact and dimension tables). Consider Cube-1 containing F1, D1, D2, D3 and Cube-2 containing F2, D1, D2, D4, where F are facts and D are dimensions; here D1 and D2 are the conformed dimensions.

23. What is Fact table

A fact table in a data warehouse describes the transaction data: it contains the key figures (measures) and the keys that relate it to the dimension tables. Dimension tables, by contrast, contain the descriptive data from which dimensions are created.

24. What are semi-additive and factless facts, and in which scenarios would you use such kinds of fact tables?

Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others. For example:  Current Balance and Profit Margin are the facts. Current Balance is a semi-additive fact, as it makes sense to add them up for all accounts (what's the total current balance for all accounts in the bank?), but it does not make sense to add them up through time (adding up all current balances for a given account for each day of the month does not give us any useful information
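The semi-additive point can be sketched in SQL against a hypothetical account_balance_fact table (names and dates are made up for illustration):

-- Additive across accounts: total balance of all accounts on one snapshot date.
SELECT snapshot_date, SUM(current_balance) AS total_balance
FROM   account_balance_fact
WHERE  snapshot_date = DATE '2024-01-31'
GROUP  BY snapshot_date;

-- Not additive across time: summing one account's daily balances is not meaningful;
-- use an average (or the period-end value) instead.
SELECT account_id, AVG(current_balance) AS avg_balance
FROM   account_balance_fact
WHERE  snapshot_date BETWEEN DATE '2024-01-01' AND DATE '2024-01-31'
GROUP  BY account_id;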

A factless fact table captures the many-to-many relationships between dimensions, but contains no numeric or textual facts. They are often used to record events or coverage information. Common examples of factless fact tables include: - Identifying product promotion events (to determine promoted products that didn't sell) - Tracking student attendance or registration events - Tracking insurance-related accident events - Identifying building, facility, and equipment schedules for a hospital or university


25. What are the Different methods of loading Dimension tables

Conventional load: before loading the data, all the table constraints are checked against the data. Direct load (faster loading): all the constraints are disabled and the data is loaded directly; later the data is checked against the table constraints and the bad data is not indexed. The conventional and direct load methods are applicable only to Oracle; the naming convention is not a general one applicable to other RDBMSs like DB2 or SQL Server.

26.What are Aggregate tables

Aggregate tables contain redundant data that is summarized from other data in the warehouse. These are the tables which contain aggregated / summarized data. E.g Yearly, monthly sales information. These tables will be used to reduce the query execution time.

An aggregate table contains a summary of existing warehouse data, grouped to certain levels of the dimensions. Retrieving the required data from the detailed table, which can have millions of records, takes more time and also affects server performance. To avoid this, we can aggregate the table to the required level and use it. These tables reduce the load on the database server, increase query performance, and return results very quickly.
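A rough sketch of building such a summary, assuming a hypothetical daily sales fact table:

-- Monthly aggregate built from the detailed daily fact.
CREATE TABLE agg_sales_monthly AS
SELECT product_key,
       TRUNC(sale_date, 'MM') AS sale_month,
       SUM(quantity)          AS total_quantity,
       SUM(sale_amount)       AS total_amount
FROM   fact_sales_daily
GROUP  BY product_key, TRUNC(sale_date, 'MM');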

27. What is a dimension table

A dimensional table is a collection of hierarchies and categories along which the user can drill down and drill up. it contains only the textual attributes.



25. Why are OLTP database designs not generally a good idea for a Data Warehouse

OLTP cannot store historical information about the organization. It is used for storing the details of daily transactions while a datawarehouse is a huge storage of historical information obtained from different datamarts for making intelligent decisions about the organization.

26. What is the need of surrogate key; why primary key not used as surrogate key

Surrogate key is an artificial identifier for an entity. Surrogate key values are generated by the system sequentially (like the Identity property in SQL Server or a sequence in Oracle); they do not describe anything by themselves.

A primary key is a natural identifier for an entity. Primary key values come from the data entered by the user and uniquely identify each row; there is no repetition of data.

Need for surrogate key not Primary Key

If a column is made a primary key and later there needs a change in the datatype or the length for that column then all the foreign keys that are dependent on that primary key should be changed making the database Unstable

Surrogate Keys make the database more stable because it insulates the Primary and foreign key relationships from changes in the data types and length.

For Example : You are extracting Customer Information from OLTP Source and after ETL process, loading customer information in a dimension table (DW).  If you take SCD Type 1,  Yes you can use Primary Key of Source CustomerID as Primary Key in Dimension Table. But if you would like to preserve history of customer in Dimension table i.e. Type 2. Then you need another unique no apart from CustomerID.  There you have to use Surrogate Key. 

Another reason : If you have AlphaNumeric as a CustomerID. Then you have to use surrogate key in Dimension Table. It is advisable to have system generated small integer number as a surrogate key in the dimension table. so that indexing and retrieval is much faster.

27. What is data cleaning? how is it done?

Data cleansing: the act of detecting and removing and/or correcting a database's dirty data (i.e., data that is incorrect, out-of-date, redundant, incomplete, or formatted incorrectly). It can be done using the existing ETL tools or third-party tools such as Trillium, etc.

28. What are slowly changing dimensions

Dimensions that change over time are called Slowly Changing Dimensions. For instance, a product price changes over time; People change their names for some reason; Country and State names may change over time. These are a few examples of Slowly Changing Dimensions since some changes are happening to them over a period of time

29. What are Data Marts

Data Mart is a segment of a data warehouse that can provide data for reporting and analysis on a section, unit, department or operation in the company, e.g. sales, payroll, production. Data marts are sometimes complete individual data warehouses which are usually smaller than the corporate data warehouse.

Data mart: a data mart is a small data warehouse. In general, a data warehouse is divided into smaller units according to the business requirements. For example, the data warehouse of an organization may be divided into individual data marts such as a sales data mart, a finance data mart, a marketing data mart, an HR data mart, etc. Data marts are used to improve performance during data retrieval.

30. Can a dimension table contains numeric values?

No. Generally only the fact table holds the numeric measures; dimension tables hold the descriptive (textual) attributes.

31. Explain degenerated dimension in detail.

A degenerate dimension is a dimension that does not have any source dimension table in the OLTP system.

It is generated at the time of the transaction, like an invoice number that is created when the invoice is raised.

It is not used for linking to a dimension table and it is not a foreign key, but we can use these degenerate dimensions as (part of) the primary key of the fact table.

A degenerate dimension is a dimension which has only a single attribute.

This dimension is typically represented as a single field in a fact table.

The data items that are not facts, and the data items that do not fit into the existing dimensions, are termed degenerate dimensions.

Degenerate dimensions are the fastest way to group similar transactions.

Degenerate dimensions are used when fact tables represent transactional data.

32. Give examples of degenerated dimensions

Degenerated Dimension is a dimension key without corresponding dimension. Example:

     In the PointOfSale Transaction Fact table, we have:

Date Key (FK), Product Key (FK), Store Key (FK), Promotion Key (FK), and POS Transaction Number.

The Date dimension corresponds to the Date Key, and the Product dimension corresponds to the Product Key. In a traditional parent-child database, the POS Transaction Number would be the key to the transaction header record that contains all the info valid for the transaction as a whole, such as the transaction date and store identifier. But in this dimensional model, we have already extracted this info into other dimensions. Therefore, the POS Transaction Number looks like a dimension key in the fact table but does not have a corresponding dimension table; it is a degenerate dimension.
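A hedged DDL sketch of that fact table (column names invented to match the example):

CREATE TABLE pos_sales_fact (
  date_key           NUMBER,          -- FK to the date dimension
  product_key        NUMBER,          -- FK to the product dimension
  store_key          NUMBER,          -- FK to the store dimension
  promotion_key      NUMBER,          -- FK to the promotion dimension
  pos_transaction_no VARCHAR2(20),    -- degenerate dimension: no dimension table of its own
  sales_quantity     NUMBER,
  sales_amount       NUMBER(12,2)
);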


33. What are the steps to build the data warehouse

1. Gathering business requirements:

•  Identifying Sources

•  Identifying Facts

•  Defining Dimensions

•  Define Attributes

•  Redefine Dimensions & Attributes

•  Organise Attribute Hierarchy & Define Relationship

•  Assign Unique Identifiers

•  Additional conventions: Cardinality/Adding ratios

•  Understand the business requirements.

2. Once the business requirements are clear, identify the grains (levels).

3. Once the grains are defined, design the dimension tables at the lowest-level grain.

4. Once the dimensions are designed, design the fact table with the key performance indicators (facts).

5. Once the dimension and fact tables are designed, define the relationships between the tables using primary keys and foreign keys. In the logical phase the database design looks like a star, which is why it is called a star schema design.
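
As a tool-agnostic illustration of the resulting star layout (not a prescription), the Python sketch below shows a dimension keyed by a small integer surrogate key and fact rows that carry only surrogate keys plus the measures. The table and column names are made up.

    # Hypothetical product dimension: surrogate key -> descriptive attributes.
    dim_product = {
        1: {"product_id": "AB-100", "name": "Battery", "category": "Electronics"},
        2: {"product_id": "CD-200", "name": "Charger", "category": "Electronics"},
    }

    # Fact rows reference dimensions only through surrogate keys and hold the measures.
    fact_sales = [
        {"date_key": 20100115, "product_key": 1, "store_key": 7, "qty": 3, "amount": 8.97},
        {"date_key": 20100116, "product_key": 2, "store_key": 7, "qty": 1, "amount": 12.50},
    ]

    # Resolving a fact row back to its product attributes via the surrogate key.
    row = fact_sales[0]
    print(dim_product[row["product_key"]]["name"], row["qty"], row["amount"])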

34. What are the different architectures of a data warehouse?

1. Top down - (Bill Inmon)

2. Bottom up - (Ralph Kimball)

There are three types of architectures.

•  Data warehouse basic architecture:

In this architecture end users access data that is derived from several sources through the data warehouse.

Architecture: Source --> Warehouse --> End Users

•  Data warehouse with staging area Architecture:

Whenever the data derived from the sources needs to be cleaned and processed before it is put into the warehouse, a staging area is used.

Architecture: Source --> Staging Area -->Warehouse --> End Users

•  Data warehouse with staging area and data marts Architecture:

When the warehouse architecture needs to be customized for different groups in the organization, data marts are added and used.

Architecture: Source --> Staging Area --> Warehouse --> Data Marts --> End Users

Q1> How do you change a parameter when you move it from development to production? Ans :: We can manually move the parameter file and save it on the production server. (Posted :: Vijay)

Q2> How do you retain a variable value when you move it from development to production? Ans :: While moving the variables to production, make sure you assign a default value when creating the variables in the development environment. When the code is moved, the repository is checked for a persisted value for that variable; if there is no value, the default value is used. (Posted :: Vijay)

Q3> How do you reset the sequence generator value when you move it from development to production? Ans :: Keep the sequence value as 1 in development and move the code to production. (Posted :: Vijay)

Q4> How do you delete duplicate values in UNIX? Ans :: sort <filename> | uniq (uniq removes only adjacent duplicate lines, so sort the file first; sort -u <filename> does both in one step).

Q5> How do you find the number of rows committed to the target when a session fails? Ans :: Check the session log file.

Q6> How do you remove duplicate records from a flat file (other than using a Sorter transformation and mapping variables)? Ans :: (i) Dynamic Lookup (ii) Sorter and Aggregator

Q7> How do you generate a sequence of values when the target has more than 2 billion records (with the Sequence Generator we can generate only up to about 2 billion values)? Ans :: Create a stored procedure at the database level and call it using a Stored Procedure transformation.

Q8> I have to generate a target field in Informatica which does not exist in the source table: a batch number. There are 1000 rows altogether. The first 100 rows should have the same batch number 100, the next 100 should have batch number 101, and so on. How can we do this using Informatica? Ans :: Develop a mapping flow:

Source > Sorter > Sequence Generator (generates row numbers via NEXTVAL) > Expression (derives the batch number) > Target

Expression :: BATCH_NUMBER = 100 + TRUNC((NEXTVAL - 1) / 100)

(Rows 1-100 get batch number 100, rows 101-200 get 101, and so on, without needing a long DECODE chain.)
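
A quick check of the batch-number arithmetic outside Informatica: the Python sketch below applies the same formula to row numbers 1 through 1000 (the function name and defaults are illustrative only).

    # Row numbers 1..100 -> batch 100, 101..200 -> batch 101, ..., 901..1000 -> batch 109.
    def batch_number(row_number, batch_size=100, first_batch=100):
        return first_batch + (row_number - 1) // batch_size

    assert batch_number(1) == 100
    assert batch_number(100) == 100
    assert batch_number(101) == 101
    assert batch_number(1000) == 109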

Q9> Suppose we have a flat file in the source system and it is in the correct path, but when we run the workflow we get the error "File Not Found". What might be the reason? Ans :: The "source file name" was not entered properly at the session level.

Q10> How do you load 3 flat files into a single target file? Ans :: Use the indirect file option (configured at the session level); the files must share the same structure.

Q11>There are 4 columns in the table

Target Definition :: Store_id, Item, Qty, Price
     101, battery, 3, 2.99
     101, battery, 1, 3.19
     101, battery, 2, 2.59

Source Definition :: Store_id, Item, Qty, Price
     101, battery, 3, 2.99
     101, battery, 1, 3.19
     101, battery, 2, 2.59
     101, battery, 2, 17.34

How can we do this using Aggregator?

Ans :: Source > Aggregator (group by Store_id, Item, Qty) > Target

Tip :: The Aggregator does not sort the data; for ports without an aggregate function it returns the last row of each group. To keep the price shown in the target (2.59 rather than 17.34 for the Qty = 2 group), apply an aggregate function such as FIRST(Price) on the Price port; an illustrative sketch follows.
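
For illustration only, the Python sketch below mirrors the same grouping idea: group on (Store_id, Item, Qty) and keep the first price seen per group, which is what applying FIRST(Price) in the Aggregator would do under these assumptions.

    rows = [
        (101, "battery", 3, 2.99),
        (101, "battery", 1, 3.19),
        (101, "battery", 2, 2.59),
        (101, "battery", 2, 17.34),   # same group as the previous row; dropped by "keep first"
    ]

    first_per_group = {}
    for store_id, item, qty, price in rows:
        # setdefault keeps only the first price encountered for each group key.
        first_per_group.setdefault((store_id, item, qty), price)

    for (store_id, item, qty), price in first_per_group.items():
        print(store_id, item, qty, price)   # three rows, matching the target definition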

Q12> If the default query is not generated in the Source Qualifier, what is the reason and how do you solve it? Ans :: (i) If the source is a flat file, you cannot use this feature in the Source Qualifier.

(ii) If you are using a relational table as the source and you forget to make the connection from the Source Qualifier to the next transformation, you cannot generate the SQL query.

ABBREVIATIONS

ASCII American Standard Code for Information Interchange

BI Business Intelligence

BO Business Object

BPM Business Process Modeling

C/S Client/Server

DBA Database Administrator

DBMS Database Management System

DDL Data Definition Language

DM Data Modeling

DSN Data Source Name

DSS Decision Support System

DW Data Warehouse

ERD Entity Relationship Diagram

ERP Enterprise Resource Planning

ETL Extract Transformation Loading

GB Giga Bytes

GUI Graphical User Interface

HOLAP Hybrid Online Analytical Processing

HTML Hyper Text Markup Language

JDBC Java Database Connectivity

MB Mega Bytes

MDBMS Multi-dimensional Data Base Management System

MOLAP Multi-dimensional On Line Analytical Processing

ODBC Open Data Base Connectivity

ODS Operational Data Store

OLAP On Line Analytical Processing

OLTP On Line Transaction Processing

OS Operating System

PCS Power Center Server

QA Quality Assurance

RDBMS Relational Data Base Management System

ROLAP Relational Online Analytical Processing

SCD Slowly Changing Dimension

SQA Software Quality Assurance

SRS Software Requirement Specification

TB Tera Bytes

TCP/IP Transmission Control Protocol/Internet Protocol

VPN Virtual Private Network

XML eXtensible Markup Language

What are the tasks that the Load Manager process performs?

ANS:

Manages session and batch scheduling: When you start the Informatica server, the Load Manager launches and queries the repository for a list of sessions configured to run on the Informatica server. When you configure a session, the Load Manager maintains a list of sessions and session start times. When you start a session, the Load Manager fetches the session information from the repository to perform validations and verifications prior to starting the DTM process.

Locking and reading the session: When the Informatica server starts a session, the Load Manager locks the session in the repository. Locking prevents you from starting the same session again while it is running.

Reading the parameter file: If the session uses a parameter file, the Load Manager reads the parameter file and verifies that the session-level parameters are declared in the file.

Verifying permissions and privileges: When the session starts, the Load Manager checks whether or not the user has the privileges to run the session.

Creating log files: The Load Manager creates a log file that contains the status of the session.

What is DTM process?

ANS:

After the Load Manager performs validations for the session, it creates the DTM process. The DTM's job is to create and manage the threads that carry out the session tasks. It creates the master thread, and the master thread creates and manages all the other threads.

DTM means Data Transformation Manager. In Informatica this is the main background process; it runs after the Load Manager completes its work. In this process the Informatica server looks up the source and target connections in the repository and, if they are correct, fetches the data from the source and loads it into the target.

The Informatica integration server maintains two types of executing processes: 1. Load Manager, 2. Data Transformation Manager.

The DTM reads from the source using reader threads and loads the data into the target using writer threads.

What are the different threads in DTM process?

ANS:

Master thread: Creates and manages all other threads.

Mapping thread: One mapping thread is created for each session; it fetches session and mapping information.

Pre- and post-session threads: Created to perform pre- and post-session operations.

Reader thread: One thread is created for each partition of a source; it reads data from the source.

Writer thread: Created to load data into the target.

Transformation thread: Created to transform data.

What are the data movement modes in Informatica?

ANS:

The data movement mode determines how the Informatica server handles character data. You choose the data movement mode in the Informatica server configuration settings. Two data movement modes are available in Informatica: ASCII mode and Unicode mode.

What are the output files that the Informatica server creates while running a session?

ANS:

Informatica server log: The Informatica server (on UNIX) creates a log for all status and error messages (default name: pm.server.log). It also creates an error log for error messages. These files are created in the Informatica home directory.

Session log file: The Informatica server creates a session log file for each session. It writes information about the session into the log file, such as the initialization process, creation of SQL commands for reader and writer threads, errors encountered, and the load summary. The amount of detail in the session log file depends on the tracing level that you set.

Session detail file: This file contains load statistics for each target in the mapping. Session details include information such as table name and the number of rows written or rejected. You can view this file by double-clicking on the session in the monitor window.

Performance detail file: This file contains information known as session performance details, which helps you see where performance can be improved. To generate this file, select the performance detail option in the session property sheet.

Reject file: This file contains the rows of data that the writer does not write to the targets.

Control file: The Informatica server creates a control file and a target file when you run a session that uses the external loader. The control file contains information about the target flat file, such as the data format and loading instructions for the external loader.

Post-session email: Post-session email allows you to automatically communicate information about a session run to designated recipients. You can create two different messages: one if the session completes successfully, the other if the session fails.

Indicator file: If you use a flat file as a target, you can configure the Informatica server to create an indicator file. For each target row, the indicator file contains a number indicating whether the row was marked for insert, update, delete, or reject.

Output file: If the session writes to a target file, the Informatica server creates the target file based on the file properties entered in the session property sheet.

Cache files: When the Informatica server creates a memory cache, it also creates cache files.

The Informatica server creates index and data cache files for the following transformations: Aggregator, Joiner, Rank, and Lookup.

In which circumstances does the Informatica server create reject files?

ANS:

When it encounters DD_REJECT in an Update Strategy transformation, when a row violates a database constraint, or when a field in the row was truncated or overflowed.

What is polling?

ANS:

It displays the updated information about the session in the monitor window. The monitor window displays the status of each session when you poll the informatica server.

-----------------------------------------------------------------------------------------------------------------------------------------

Can you copy the session to a different folder or repository?

ANS: Yes. Using the Copy Session wizard, you can copy a session into a different folder or repository, but the target folder or repository must contain the mapping used by that session. If the target folder or repository does not have the mapping of the session being copied, you have to copy the mapping first before you copy the session.

What is batch and describe about types of batches?

ANS:

A grouping of sessions is known as a batch. Batches are of two types: sequential, which runs sessions one after the other, and concurrent, which runs sessions at the same time. If you have sessions with source-target dependencies, you have to use a sequential batch to start the sessions one after another. If you have several independent sessions, you can use concurrent batches, which run all the sessions at the same time.

Can you copy the batches?

ANS: NO.

What are the session parameters?

ANS:

Session parameters, like mapping parameters, represent values you might want to change between sessions, such as database connections or source files. The Server Manager also allows you to create user-defined session parameters. The following are user-defined session parameters: database connections; source file name, used when you want to change the name or location of the session source file between session runs; target file name, used when you want to change the name or location of the session target file between session runs; and reject file name, used when you want to change the name or location of the session reject file between session runs.

What is parameter file?

ANS:

A parameter file defines the values for the parameters and variables used in a session. A parameter file is a text file created with an editor such as WordPad or Notepad. You can define the following values in a parameter file: mapping parameters, mapping variables, and session parameters.
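
For illustration, a minimal sketch of what such a parameter file might look like; the folder, workflow, session, connection, and parameter names are made up, and the exact section-heading syntax can vary by PowerCenter version.

    [MyFolder.WF:wf_load_customers.ST:s_m_load_customers]
    $DBConnection_Source=DEV_ORACLE
    $InputFile1=/data/dev/customers.dat
    $$LoadDate=2010-01-31
    $$CountryFilter=USA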

Performance tuning in Informatica?

ANS: The goal of performance tuning is to optimize session performance so that sessions run within the available load window for the Informatica server. You can increase session performance in the following ways.

•  Network: data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster, so network connections often affect session performance. Avoid unnecessary network hops.

•  Flat files: if your flat files are stored on a machine other than the Informatica server, move those files to the machine that hosts the Informatica server.

•  Relational data sources: minimize the connections to sources, targets, and the Informatica server. Moving the target database onto the server system may improve session performance.

•  Staging areas: if you use staging areas, you force the Informatica server to perform multiple data passes; removing staging areas may improve session performance.

•  You can run multiple Informatica servers against the same repository; distributing the session load across them may improve performance.

•  Running the Informatica server in ASCII data movement mode improves session performance, because ASCII mode stores a character in one byte while Unicode mode takes two bytes per character.

•  If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Single-table SELECT statements with an ORDER BY or GROUP BY clause may also benefit from optimization such as adding indexes.

•  You can improve session performance by configuring the network packet size, which allows more data to cross the network at one time (in the Server Manager, choose the server and configure the database connections).

•  If the target has key constraints and indexes, they slow the loading of data. Drop the constraints and indexes before you run the session and rebuild them after the session completes.

•  Running parallel sessions using concurrent batches reduces the overall loading time, so concurrent batches may also increase session performance.

•  Partitioning the session improves performance by creating multiple connections to sources and targets and loading data in parallel pipelines.

•  If a session contains an Aggregator transformation, you can use incremental aggregation to improve session performance.

•  Avoid transformation errors to improve session performance.

•  If the session contains a Lookup transformation, you can improve performance by enabling the lookup cache.

•  If the session contains a Filter transformation, place it as close to the source as possible, or use a filter condition in the Source Qualifier.

•  Aggregator, Rank, and Joiner transformations often decrease session performance because they must group data before processing it; to improve performance in this case, use the sorted ports / sorted input option.

•  Increasing the temporary database space also improves performance.

1) If a single flat file has multiple delimiters, how can I load the flat file?

2) If a flat file is comma-delimited and, while loading data, one of my fields contains a "," within the value, how can I handle this case?

3) While loading multiple flat files using the indirect loading method, how can I generate the list file if I have n flat files of similar structure?

4) While loading multiple flat files using indirect loading, I want to load the data into one target and the file names into another target. How can you do this?

ANS: 1) According to my knowledge, check the answers below.

1) It is not possible to load a file with multiple delimiters; Informatica cannot handle this case. Request the source data team to send the file with only a single delimiter.

2) Informatica cannot handle a field value that contains the ',' delimiter in this setup. Either request the source system to change the delimiter to some other character, or see the optional-quotes approach in the second answer below.

3) The list of files is appended into the file list using a UNIX script. Once all the files have been FTPed to the source directory, the UNIX script generates the file list with all the files that exist in the source file directory (a small sketch of such a script is shown after these answers).

4) In the mapping we can get the file name along with the field values. Connect that file name field to a Sorter or Aggregator to get the distinct file names and load them into the file-name target table or file; connect the remaining fields (other than the file name) to the other target.
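
For illustration, a small Python sketch of the file-list generation described in answer 3 (a shell script would serve equally well); the directory and file names are hypothetical.

    import glob

    # Collect every matching flat file in the (hypothetical) source directory and
    # write one path per line into the list file used for indirect loading.
    source_files = sorted(glob.glob("/data/source/customers_*.dat"))
    with open("/data/source/customers_filelist.txt", "w") as list_file:
        for path in source_files:
            list_file.write(path + "\n")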

Let me know if you require any further information.

2) From Informatica 8.x onwards it is possible:

1) Enable the property below, Column Delimiters:

One or more characters used to separate columns of data. Delimiters can be either printable or single-byte unprintable characters, and must be different from the escape character and the quote character (if selected). To enter a single-byte unprintable character, click the Browse button to the right of this field. In the Delimiters dialog box, select an unprintable character from the Insert Delimiter list and click Add. You cannot select unprintable multibyte characters as delimiters. The maximum number of delimiters is 80.

2) Enable the property below:

Optional Quotes

Select No Quotes, Single Quote, or Double Quotes. If you select a quote character, the Integration Service ignores delimiter characters within the quote characters. Therefore, the Integration Service uses quote characters to escape the delimiter. For example, a source file uses a comma as a delimiter and contains the following row: 342-3849, ‘Smith, Jenna’, ‘Rockville, MD’, 6. If you select the optional single quote character, the Integration Service ignores the commas within the quotes and reads the row as four fields.
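
The same quote-escaping idea can be seen outside Informatica with Python's csv module, parsing the example row above, where the single-quoted fields contain the comma delimiter (illustration only, not the Integration Service itself).

    import csv, io

    line = "342-3849, 'Smith, Jenna', 'Rockville, MD', 6"
    reader = csv.reader(io.StringIO(line), delimiter=",", quotechar="'", skipinitialspace=True)
    print(next(reader))   # ['342-3849', 'Smith, Jenna', 'Rockville, MD', '6'] -> four fields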

4) Enable the property below:

Add Currently Processed Flat File Name Port.

The Designer adds the CurrentlyProcessedFileName port as the last column on the Columns tab. The CurrentlyProcessedFileName port is a string port with default precision of 256 characters.