DS Routine

Abstract

DataStage is a powerful ETL tool with many built-in stages and routines that cover most required functionality. For the things DataStage EE cannot do, parallel routines can be written in C++. Parallel routines are invoked by parallel jobs. Unlike a server routine, a parallel routine is mainly used in the Transformer stage and cannot be used in a job sequence as a job control method. This paper introduces how to create and use a parallel routine in a parallel job.

Introduction

We can use the Parallel Routine window to create, view, or edit a parallel job routine. There are two types of parallel routine:

External Function. This calls a function from a UNIX shared library and may be used anywhere an expression can be defined. Any external function that is defined appears in the expression editor operand menu under Parallel Routines.

External Before/After Routine. This calls a routine from a UNIX shared library and can be specified in the Triggers page of a Transformer stage Properties dialog box.

Tutorial of creating and invoking a Parallel Routine

Parallel routines are C++ components built and compiled external to DataStage. Note: they must be compiled as C++ components, not C. In practice this means compiling the C/C++ program with g++ rather than gcc.

Here's the typical sequence of steps for creating a DataStage parallel routine: Create --> Compile --> Link --> Execute

1) Create

Create a C/C++ program with main(), test it, and if the test succeeds, remove main(). The following is the C file ParaTest.c:

    #include <stdio.h>

    int trans(int i)
    {
        if (i > 5)
            return i;
        else
            return i + 5;
    }

    int main()
    {
        int a = 6;
        printf("%d", trans(a));
        return 0;
    }

Test the program. If it runs successfully, rewrite it without main():

    #include <stdio.h>

    int trans(int i)
    {
        if (i > 5)
            return i;
        else
            return i + 5;
    }

Save this version as IntTest.c.
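As an alternative to keeping two copies of the source, the test driver can stay in the same file behind a preprocessor guard. The following is only a sketch: the macro name PARATEST_MAIN is a hypothetical name chosen for this example and is not part of the original tutorial or of DataStage.

```cpp
// IntTest.c with an optional test driver (sketch).
// PARATEST_MAIN is a hypothetical macro name chosen for this example.
#include <stdio.h>

// Routine body from the tutorial: values above 5 pass through unchanged,
// all other values are increased by 5.
int trans(int i)
{
    if (i > 5)
        return i;
    else
        return i + 5;
}

#ifdef PARATEST_MAIN
// Compiled only when PARATEST_MAIN is defined on the command line,
// for example: g++ -DPARATEST_MAIN IntTest.c
int main()
{
    int a = 6;
    printf("%d\n", trans(a));
    return 0;
}
#endif
```

Without the macro defined, the file contains only trans() and can be compiled to an object file in the same way as the version without main().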

2) Compile

Compile using the compiler options specified under APT_COMPILEOPT:

    g++ -O -fPIC -Wno-deprecated -c IntTest.c

This generates an object file named IntTest.o.

Note: The compiler and compiler options can be found in "DataStage --> Administrator --> Properties --> Environment --> Parallel --> Compiler". Create the object (*.o) file and place it in the directory that the routine's Library Path will point to (in this example, /home/dsadm/4Train/ParaRoutine).

3) Link

Use the Parallel Routine window to create a parallel job routine, and link the above object (*.o) file, IntTest.o, to it by making the relevant entries in the General tab:

Routine Name: myRoutine
Type: External Function
Object Type: Object
External subroutine name: trans (the function name specified inside your C/C++ program)
Library Path: /home/dsadm/4Train/ParaRoutine/IntTest.o

Also specify the Return Type, and if input parameters are to be passed, specify them in the Arguments tab. Because the function returns an int value, we choose int as the return type.

The Arguments tab: the job passes one argument to the function trans, so we define an argument named i. The native type is the type of the argument passed by the job that invokes the routine. The default type is int. If the data handled in the job is of char or another type, we should define the corresponding type, such as char*.
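To illustrate the char* case, here is a minimal sketch of a routine that takes a string argument and returns an int. The function countDigits is hypothetical and not part of the original tutorial, and the sketch assumes the job passes the column value as a NUL-terminated C string. In the Arguments tab the native type would then be char*, with int as the return type.

```cpp
// Hypothetical routine for illustration; countDigits is not part of the tutorial.
#include <cctype>

// Returns the number of decimal digit characters in a NUL-terminated string.
// Assumption: the caller passes a valid C string; a null pointer is treated
// as an empty string.
int countDigits(char *s)
{
    if (s == 0)
        return 0;

    int n = 0;
    for (char *p = s; *p != '\0'; ++p) {
        // Cast to unsigned char so isdigit is well defined for all char values.
        if (std::isdigit(static_cast<unsigned char>(*p)))
            ++n;
    }
    return n;
}
```

Returning an int keeps the example simple, because the routine does not have to allocate or own any string memory.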

4) Execute

Now your parallel routine is available inside your jobs. Create a test job and call the parallel routine in it: in the Transformer, call the routine in an output column derivation. Then compile and run the job. Create a job named paraRoutine1 as the following snapshot shows:

After the job has run successfully, we can view the result. It is obvious that the data whose value [...] Environment to set the LD_LIBRARY_PATH variable.

Now rerun the job and view the result:
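As a rough cross-check of what the output column should contain, the derivation can be simulated outside DataStage. The sketch below applies trans() to every value of an in-memory column. The helper applyTrans and the sample values are assumptions made for this example and do not come from the job itself.

```cpp
#include <cstddef>
#include <vector>

// Same logic as the tutorial routine: values above 5 are unchanged,
// all other values are increased by 5.
int trans(int i)
{
    return (i > 5) ? i : i + 5;
}

// Hypothetical helper: mimics a Transformer derivation that calls trans()
// once per input row and writes the result to an output column.
std::vector<int> applyTrans(const std::vector<int> &column)
{
    std::vector<int> out;
    out.reserve(column.size());
    for (std::size_t k = 0; k < column.size(); ++k)
        out.push_back(trans(column[k]));
    return out;
}
```

For an input column holding 1, 5, 6 and 10, applyTrans returns 6, 10, 6 and 10, which is the pattern to look for in the job output.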