Waiting for a file and processing it

One of the perennial issues developers face is wanting to wait for incoming files in a
directory and, when those files arrive, to process each one in a scenario. ODI provides a
tool to wait for files (OdiWaitForFile) and to take an action once they arrive. The
complexity comes when you want to wait for files (many of which may arrive at the same
time) and process each one individually. To address this, we need to add a little cunning
to our approach.

The starting point is to create a new ODI procedure, with a set of options:

This procedure will be used in the package which we create. The package will have three steps:

• An OdiWaitForFile step, which polls the appropriate directory, waiting for files to
be ready to process. (Note that it is best practice when moving files, especially
when receiving files transmitted using FTP, to move two files: the file you want and
an associated marker file, which is moved only AFTER the transfer of the actual file
has finished. This technique gets around the problem of files created with a “slow
write” being processed before they are complete. The marker file may contain the
name of the file you want, be named similarly, or be transmitted only on completion
of all files in the transmission.)

• “Process Waiting Files”, the procedure we create to deal with the file(s) which
need to be processed.

• A “Start Self” step, which starts a new execution of the whole wait-and-process
package. (The reason we don’t just loop is so that we can see that each task has run,
and potentially purge the log of completed executions.)
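
The marker-file convention mentioned for the OdiWaitForFile step can be sketched in plain Python as follows. This is only an illustration of the idea, not ODI code: the function name `ready_files` and the `.done` marker suffix are assumptions made for the example, and your own transfers may name marker files differently.

```python
import glob
import os


def ready_files(directory, pattern="*.zip", marker_suffix=".done"):
    """Return data files in `directory` whose marker file has also arrived.

    Assumption (hypothetical convention): the sender writes
    `<data file name><marker_suffix>` only after the data file itself
    has been completely transmitted.
    """
    ready = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        # A data file counts as ready only when its marker exists too.
        if os.path.exists(path + marker_suffix):
            ready.append(path)
    return ready
```

With `a.zip`, `a.zip.done` and `b.zip` present, only `a.zip` is reported as ready; `b.zip` is left alone until its marker arrives.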

In the package, the OdiWaitForFile step is configured with the following parameters. In
this case I am waiting on the d:/Temp directory for all files with a .zip extension. I have
told ODI not to do anything with the file (“Action: NONE”), although it may be an idea to
move the file to a “processing” directory. I have not used any of the other options, which
would let me, for instance, specify the number of files to wait for (I might want to
process batches).
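
To make the waiting behaviour concrete, here is a minimal Python sketch of polling a directory until enough matching files are present. It mimics the behaviour described above and is not how OdiWaitForFile is implemented; the function name `wait_for_files` and its parameter names are assumptions chosen for the example.

```python
import glob
import os
import time


def wait_for_files(directory, pattern, file_count=1, poll_interval=1.0, timeout=None):
    """Poll `directory` until at least `file_count` files match `pattern`.

    Returns the sorted list of matching paths, or an empty list if
    `timeout` seconds elapse first. With timeout=None it waits forever.
    """
    deadline = None if timeout is None else time.time() + timeout
    while True:
        matches = sorted(glob.glob(os.path.join(directory, pattern)))
        if len(matches) >= file_count:
            return matches
        if deadline is not None and time.time() >= deadline:
            return []
        time.sleep(poll_interval)
```

A call such as `wait_for_files("d:/Temp", "*.zip")` corresponds to the configuration described here; passing `file_count=5` would correspond to waiting for a batch.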

When ODI has detected files which match the criteria specified, it moves on to the next
step in the package, which is my procedure to process the incoming files. This procedure
has a set of options that I need to fill in. Of these options:

• PATTERN is the pattern for the files that we actually want to process, not those we
are waiting on (which may be different).

• SCEN_NAME is the name of the scenario to be executed for each of the files.

• SYNC_MODE is 1 for synchronous and 2 for asynchronous. Be careful with this one:
if you execute asynchronously, you will get as many concurrent executions of the
scenario as there are files. If the tasks within the scenario have not been
specifically modified to allow for concurrent execution, you may have problems (by
default, for instance, all the temporary tables for a particular interface will use
the same name).

• INCOMING_DIRECTORY is where the files for processing are located. This will be the
same directory as in the first step, unless you chose the MOVE action there to move
the files to a separate directory.

The last step I have put in the package starts the package itself again, asynchronously.
It uses SESS_NAME (the session name) as the name of the scenario to start. This may not
be correct if the original scenario was started using a NAME= parameter, because the
session name may then no longer match the scenario name.

Procedure Detail

This is the detail of the procedure I created. As you can see, it has only four steps. It
could be done in fewer, but this makes the process more readable.

Figure 1 The steps of the file processing procedure

The first step creates the temporary table I will use to store the names of the files to
process. I have put this table in the Sunopsis Memory Engine, an in-memory database
which is part of the ODI product.

In this first step, as I only have one command, I put it in the “Command on Target” tab.
The first part of the command gets the JDBC connection we will use; to supply the
parameters of that connection, I have set the database I wish to use on the “Command on
Source” tab (see the following images). As the table is created in memory in the
execution agent and should be disposed of on completion of the session, there should be
little chance of it taking up too much memory. I have also created it with the session ID
as well as the file name, in case it is needed, for example to tell sessions apart.
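
As a rough illustration of this step, the sketch below creates an equivalent work table in an in-memory database. It uses Python's `sqlite3` module purely as a stand-in for the Sunopsis Memory Engine, and the table name `FILES_TO_PROCESS` and its column names are assumptions; inside ODI the statement would run over the JDBC connection obtained in the procedure step.

```python
import sqlite3


def create_file_table(conn):
    """Create the work table holding one row per file to process.

    FILES_TO_PROCESS, SESSION_ID and FILENAME are hypothetical names.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS FILES_TO_PROCESS ("
        " SESSION_ID INTEGER NOT NULL,"
        " FILENAME TEXT NOT NULL)"
    )
    conn.commit()


# An in-memory database disappears when the connection is closed,
# which mirrors the table being disposed of at the end of the session.
conn = sqlite3.connect(":memory:")
create_file_table(conn)
```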

Figure 2 Command on target for the create table step of the procedure

Figure 3 Command on Source for the create table step of the procedure

Next is the step to retrieve the list of files and insert them into the newly created
in-memory table. Here we use some native functionality of the Jython scripting
environment: glob.glob(filepattern) returns a list of the matching file names, which we
can then insert into the database with INSERT statements.
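
The glob-and-insert logic can be sketched as below. This is a minimal standalone version, not the code from the procedure: it assumes a Python DB-API connection that accepts `?` placeholders (as `sqlite3` does) in place of the JDBC connection used in ODI, and the table name `FILES_TO_PROCESS` with columns `SESSION_ID` and `FILENAME` is hypothetical.

```python
import glob
import os


def insert_file_names(conn, session_id, directory, pattern):
    """Insert one row per file in `directory` matching `pattern`.

    Assumes the hypothetical table FILES_TO_PROCESS(SESSION_ID, FILENAME)
    already exists on `conn`. Returns the number of rows inserted.
    """
    # glob.glob returns the paths that match the shell-style pattern.
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    cursor = conn.cursor()
    cursor.executemany(
        "INSERT INTO FILES_TO_PROCESS (SESSION_ID, FILENAME) VALUES (?, ?)",
        [(session_id, path) for path in paths],
    )
    conn.commit()
    return len(paths)
```

Here `directory` and `pattern` play the roles of the INCOMING_DIRECTORY and PATTERN options of the procedure.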

Figure 4 Command on Source for the Retrieve file list step of the procedure

Figure 5 shows the “Command on Source” for the step which executes the scenario for each
file; it retrieves the file names from the table. Note that we set the Technology and
Schema here to match the memory engine, pre-defined in Topology.

Figure 5 Execute Scenario for each file step "Command on Source"

For the “Command on Target”, we execute the ODI tool command “OdiStartScen” once for
each of the files retrieved by the “Command on Source”. Note that to substitute the value
of FILENAME from the result set of the source command, we use the ‘#’ prefix. Note also
that the name of the variable I am passing in to each of the scenarios is set here as
“ORACLE.FileToBeProcessed”. This implies that I have a project with the code “ORACLE”
and that the variable is called “FileToBeProcessed”. It might be better to use a global
variable, to eliminate the need to edit the project code in the procedure; either way, a
variable “FileToBeProcessed” needs to be created. This does suppose that the scenario you
are starting declares this variable, which may then be used in the scenario, including in
resource names (file names).
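
The effect of running the target command once per source row can be pictured with the short Python sketch below, which expands a list of file names into one OdiStartScen command string each. In the procedure itself ODI performs this expansion through the ‘#’ substitution; the helper name `build_start_scen_commands`, the quoting style and the `-SCEN_VERSION=-1` default are assumptions for illustration and should be checked against the tool reference for your ODI version.

```python
def build_start_scen_commands(filenames, scen_name, sync_mode=1,
                              scen_version="-1",
                              variable="ORACLE.FileToBeProcessed"):
    """Return one OdiStartScen command string per file name.

    `variable` is the project-qualified variable the started scenario is
    expected to declare; the default comes from the example in the text.
    """
    if sync_mode not in (1, 2):
        raise ValueError("sync_mode must be 1 (synchronous) or 2 (asynchronous)")
    template = ('OdiStartScen "-SCEN_NAME=%s" "-SCEN_VERSION=%s" '
                '"-SYNC_MODE=%d" "-%s=%s"')
    return [template % (scen_name, scen_version, sync_mode, variable, name)
            for name in filenames]
```

With `sync_mode=2`, every command in the returned list would start its scenario without waiting for the previous one, which is exactly the concurrency risk noted for the SYNC_MODE option.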

Figure 6 Execute Scenario for each file step "Command on Target"

The last step in the procedure is just a “tidy-up” step, to drop the table I created
earlier. As there is only one command, this goes on the “Command on Target” tab.

Figure 7 Last step, Drop Table, Clean up after yourself

Appendix 1: Getting the value of parameters passed to a scenario

One last tidbit which may be useful: if you are passing variables into a scenario and
want to confirm that they have been set, note that by default their values are not shown
in the log. A couple of workarounds exist to get that information:

1) Put a tool step in your package, something like an OdiSleep, and modify the code to be executed as illustrated:

The result of which will show in the log as follows:

2) The second option is to do a similar thing, but to put the code into a procedure as follows:

As you can see, here I simply made a Java BeanShell step and put the code inside a
comment (/* */), so it is ignored by the interpreter.
