Waiting for a file and processing it
A perennial requirement for developers is to wait for incoming files in a directory and,
when those files arrive, to process each one in a scenario. ODI provides a tool to wait
for files (the OdiFileWait tool) and lets you take an action once they have arrived.
The complexity comes when you want to wait for files (many of which may arrive at the
same time) and process each one individually. To address this, we need to add a little
cunning to our approach.
The starting point is to create a new ODI procedure, with a set of options:
This procedure will be used in the package which we create. The package will have three
steps:
• An OdiFileWait step, which polls the appropriate directory, waiting for files to
be ready to process. (Note that it is best practice when moving files, especially
when receiving files transmitted using FTP, to move two files: the file you
want and an associated marker file, which is sent AFTER the transfer of the
actual file has completed. This technique gets round the problem of files created
with a “slow write” being processed before they are ready. The marker file may
contain the name of the file you want, or be named similarly, or be transmitted
only on completion of all files in the transmission.)
• “Process Waiting Files” is the procedure we create to be able to deal with the
file(s) which need to be processed.
• A “Start Self” step, which starts a new execution of the whole wait-and-process
package. (The reason we don’t just loop is so that we can see that each task has
run, and can potentially purge the log of completed executions.)
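The marker-file convention from the first bullet can be sketched in a few lines of Python. This is only an illustration of the idea, not something ODI provides: the function name `ready_files` and the `.done` marker suffix are assumptions made for the example, and your sender may use a different naming scheme.

```python
import glob
import os


def ready_files(directory, data_pattern="*.zip", marker_suffix=".done"):
    """Return the data files whose companion marker file has already arrived."""
    ready = []
    for path in sorted(glob.glob(os.path.join(directory, data_pattern))):
        # Assumed convention: the sender writes <name>.zip first and
        # <name>.zip.done afterwards, so a marker means the data file is complete.
        if os.path.exists(path + marker_suffix):
            ready.append(path)
    return ready
```

A file that is still being written has no marker yet, so it is simply skipped until a later poll.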
In the package, the OdiFileWait step is configured to wait on the d:/Temp directory
for all files with a .zip extension. I have told ODI not to do anything with the file
(“Action: NONE”), although it may be a good idea to move the file to a “processing”
directory. I have not used any of the other options, which would let me, for instance,
specify the number of files to wait for (I might want to process batches).
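To make the behaviour of the wait step concrete, here is a rough Python stand-in for "poll a directory until enough matching files are present". It is not how ODI implements the tool; the function and parameter names (`wait_for_files`, `file_count`, `poll_interval`, `timeout`) are my own, chosen only to echo the options discussed above.

```python
import glob
import os
import time


def wait_for_files(directory, pattern, file_count=1, poll_interval=1.0, timeout=None):
    """Poll until at least file_count files match; return the sorted matches.

    With timeout=None the call waits indefinitely. Otherwise it gives up after
    `timeout` seconds and returns an empty list.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        matches = sorted(glob.glob(os.path.join(directory, pattern)))
        if len(matches) >= file_count:
            return matches
        if deadline is not None and time.monotonic() >= deadline:
            return []
        time.sleep(poll_interval)
```

For example, `wait_for_files("d:/Temp", "*.zip", file_count=5)` would return only once a batch of five files is waiting.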
When ODI has detected files which match the criteria specified, it moves on to the
next step in the package, which is my procedure to process the incoming files. This
procedure has a set of options which I need to fill in:
• PATTERN is for the files that we actually want to process – not those we are
waiting on (which may be different).
• SCEN_NAME is the name of the scenario to be executed for each of the files.
• SYNC_MODE is 1 for Synchronous and 2 for Asynchronous. Be careful with
this one: if you execute asynchronously, you will get as many concurrent
executions of the scenario as there are files. If the tasks within the scenario have
not been specifically modified to allow for concurrent execution, you may have
problems. (By default, for instance, all the temporary tables for a particular
interface will use the same name.)
• INCOMING_DIRECTORY is where the files for processing are located. This
will be the same directory as in the first step, unless you chose the MOVE action
to move the files to a separate directory.
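The difference between the two SYNC_MODE values can be illustrated with a small simulation. The sketch below uses standard Python 3 threads rather than ODI itself, and the function `run_scenario_per_file` is hypothetical: it runs a dummy "scenario" for each file and reports the peak number running at the same time.

```python
import threading


def run_scenario_per_file(files, synchronous):
    """Run a dummy scenario per file; return the peak number running at once."""
    if not files:
        return 0
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}
    # In asynchronous mode each job waits at the barrier until all jobs have
    # started, which makes the overlap deterministic for this demonstration.
    barrier = None if synchronous else threading.Barrier(len(files))

    def scenario(filename):
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        if barrier is not None:
            barrier.wait()
        with lock:
            state["running"] -= 1

    if synchronous:
        for name in files:
            scenario(name)
    else:
        threads = [threading.Thread(target=scenario, args=(name,)) for name in files]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
    return state["peak"]
```

With three files, the synchronous run never has more than one scenario active, while the asynchronous run has all three active together, which is exactly when shared temporary table names become a problem.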
The last step I have put in the package is the execution of the package itself,
asynchronously. This uses SESS_NAME as the name of the scenario to start, which may
not be correct if the original scenario was started using a NAME= parameter.
Procedure Detail
This is the detail of the procedure I created. As you can see, it has only four steps. It
could be done in fewer, but this makes the process more readable.
Figure 1 The steps of the file processing procedure
The first step creates the temporary table I will use to store the names of the files
to process. I have put this table in the Sunopsis Memory Engine, an in-memory
database which is part of the ODI product.
In this first step, as I only have one command, I put it on the “Command on Target” tab
(Figure 2). The first part of this gets us the JDBC connection we will use; to supply the
parameters of that connection, I have set the database I wish to use on the “Command
on Source” tab (Figure 3). As the table will be created in memory in the execution
agent, and it should be disposed of on completion of the session, there should be little
chance of this taking up too much memory. I have also created it with the session ID
as well as the file name, just in case.
Figure 2 Command on target for the create table step of the procedure
Figure 3 Command on Source for the create table step of the procedure
Next is the step to retrieve the list of files and insert them into the newly created
in-memory table. Here we use some native functionality of the Jython scripting
environment: the call glob.glob(filepattern) returns a list of the matching file names,
which we then insert into the table with INSERT statements.
Figure 4 Command on Source for the Retrieve file list step of the procedure
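A minimal sketch of the create-table and retrieve-file-list steps follows. It is not the code from the figures: it uses Python's built-in sqlite3 in-memory database as a stand-in for the Sunopsis Memory Engine, and the table name FILES_TO_PROCESS, the column names and the function `load_file_list` are assumptions made for the example.

```python
import glob
import os
import sqlite3


def load_file_list(conn, session_id, incoming_directory, pattern):
    """Store the names of all matching files in a work table; return the count."""
    # Hypothetical work table keyed by session ID as well as file name.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS FILES_TO_PROCESS ("
        "SESSION_ID VARCHAR(20), FILENAME VARCHAR(255))"
    )
    filenames = sorted(glob.glob(os.path.join(incoming_directory, pattern)))
    conn.executemany(
        "INSERT INTO FILES_TO_PROCESS (SESSION_ID, FILENAME) VALUES (?, ?)",
        [(session_id, name) for name in filenames],
    )
    conn.commit()
    return len(filenames)
```

Calling it with `sqlite3.connect(":memory:")` keeps the table in memory, so it disappears when the connection is closed, much as the in-memory table here is discarded with the session.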
In the step which executes the scenario, the “Command on Source” retrieves the file
names from the table. Note that we set the Technology and Schema here to match the
memory engine, pre-defined in Topology.
Figure 5 Execute Scenario for each file step "Command on Source"
For the “Command on Target” we execute the OdiTool command “OdiStartScen” once
for each of the files retrieved by the “Command on Source”. Note that to substitute
the value of FILENAME from the source command's result set, we use the ‘#’ prefix
(#FILENAME). Note also that the name of the variable I am passing in to each of the
scenarios is set here as “ORACLE.FileToBeProcessed”. This implies that I have a
project with the code “ORACLE” and a variable called “FileToBeProcessed”, which
needs to be created. It might be better to use a global variable, to eliminate the need
to edit the project code in the procedure. This also assumes that the scenario you are
starting declares this variable, which may then be used in the scenario, including in
resource names (file names), etc.
Figure 6 Execute Scenario for each file step "Command on Target"
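The row-by-row substitution can be pictured with the Python sketch below, which expands one command template per source row. The exact command text in Figure 6 is not reproduced here; the function `build_start_commands`, the scenario name LOAD_ZIP and the version value "001" are illustrative assumptions, while -SCEN_NAME, -SCEN_VERSION and -SYNC_MODE are OdiStartScen parameters.

```python
def build_start_commands(rows, scen_name, sync_mode, scen_version="001",
                         variable="ORACLE.FileToBeProcessed"):
    """Expand the target command once per source row, replacing #FILENAME."""
    # The template mirrors what would sit on the "Command on Target" tab;
    # #FILENAME is the placeholder filled from the source result set.
    template = "OdiStartScen -SCEN_NAME=%s -SCEN_VERSION=%s -SYNC_MODE=%s -%s=#FILENAME" % (
        scen_name, scen_version, sync_mode, variable)
    commands = []
    for row in rows:
        commands.append(template.replace("#FILENAME", row["FILENAME"]))
    return commands
```

Given two rows from the source query, this yields two OdiStartScen commands that differ only in the file name passed to the variable.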
The last step in the procedure is just a “tidy-up” step, to drop the table I created earlier.
As there is only one command, this goes on the “Command on Target” tab.
Figure 7 Last step, Drop Table, Clean up after yourself
Appendix 1: Getting the value of parameters passed to a scenario
One last extra little tidbit which may be useful: if you are passing variables into a
scenario and want to know that those variables have been set, by default these are not
shown in the log. A couple of workarounds exist to get that information:
1) Put a tool step in your package, something like an OdiSleep, and modify the code
to be executed as illustrated:
The result of which will show in the log as follows:
2) The second option is to do a similar thing, but to put the code into a procedure as
follows:
As you can see, here I simply made a Java BeanShell step and put the commented (/*
*/) code into there, so it is ignored by the interpreter.