Posted on 02-Dec-2015
Custom Stages
© 2002. Infosys Technologies Ltd.
Agenda
Introduction
Types of Stages
How to build Custom Stages
Introduction
DataStage provides a large number of built-in Stages to extract and transform data.
In addition to the existing Stages, it also provides the capability to build custom Stages.
Types of Stages
There are three different types of Stages that can be built.
Custom
– Use an existing Orchestrate operator as a Stage in parallel jobs.
Build
– Create your own operators and use them in a Stage.
Wrapper
– Specify a UNIX command to be executed as a Stage.
Custom Stage
Custom Stages use already existing Orchestrate operators.
Steps to define a Custom Stage:
• Select the category in the repository.
• Select File -> New Parallel Stage -> Custom.
• On the General page, specify the name of the operator to be used.
• On the Links page, specify the minimum and maximum number of input and output links.
• On the Properties page, specify the properties.
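The Links page limits can be read as a simple range check on the number of links attached to the stage in a job. The sketch below models that check; the CustomStageDef struct, its field names and the operator name example_op are hypothetical illustrations, not DataStage identifiers.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of what is entered when defining a Custom Stage.
// Field names are illustrative only; they are not DataStage identifiers.
struct CustomStageDef {
    std::string operatorName;  // General page: existing Orchestrate operator to use
    int minInputs;             // Links page: minimum number of input links
    int maxInputs;             // Links page: maximum number of input links
    int minOutputs;            // Links page: minimum number of output links
    int maxOutputs;            // Links page: maximum number of output links
};

// True if the definition itself is consistent (operator named, 0 <= min <= max).
bool isConsistent(const CustomStageDef& d) {
    return !d.operatorName.empty() &&
           d.minInputs >= 0 && d.minInputs <= d.maxInputs &&
           d.minOutputs >= 0 && d.minOutputs <= d.maxOutputs;
}

// True if a job design with the given link counts stays within the limits.
bool linksAllowed(const CustomStageDef& d, int inputs, int outputs) {
    return inputs >= d.minInputs && inputs <= d.maxInputs &&
           outputs >= d.minOutputs && outputs <= d.maxOutputs;
}
```

For example, a definition with exactly one input link and one or two output links would reject a job that attaches no input or three outputs.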
Wrapped Stages
Wrapper Stages use UNIX commands.
When defining a Wrapped Stage you provide the following information:
• Details of the UNIX command that the stage will execute.
• Description of the data that will be input to the stage.
• Description of the data that will be output from the stage.
• Definition of the environment in which the command will execute.
– The UNIX command can be any command, such as sort, grep, or a script.
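In the simplest case, a wrapped command receives the stage's input records on standard input and returns its output records on standard output. A minimal POSIX sketch of that data flow, assuming newline-delimited text records: the helper runWrappedCommand is hypothetical and only imitates the idea; it is not how DataStage itself launches the command.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>
#include <unistd.h>

// Hypothetical helper: pipe text records through a UNIX command.
// Input records go to the command's stdin, output records come from its stdout.
std::vector<std::string> runWrappedCommand(const std::string& command,
                                           const std::vector<std::string>& records) {
    assert(!command.empty());
    // Stage the input records in a temporary file, one record per line.
    char path[] = "/tmp/wrapped_stage_XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1) throw std::runtime_error("mkstemp failed");
    FILE* in = fdopen(fd, "w");
    if (in == nullptr) {
        close(fd);
        unlink(path);
        throw std::runtime_error("fdopen failed");
    }
    for (const std::string& r : records) std::fprintf(in, "%s\n", r.c_str());
    std::fclose(in);

    // Redirect the file to the command's stdin and read its stdout.
    const std::string shellCmd = command + " < " + path;
    FILE* pipe = popen(shellCmd.c_str(), "r");
    if (pipe == nullptr) {
        unlink(path);
        throw std::runtime_error("popen failed");
    }
    std::vector<std::string> out;
    std::string line;
    int c;
    while ((c = std::fgetc(pipe)) != EOF) {
        if (c == '\n') {
            out.push_back(line);
            line.clear();
        } else {
            line.push_back(static_cast<char>(c));
        }
    }
    if (!line.empty()) out.push_back(line);  // last line without a trailing newline
    pclose(pipe);
    unlink(path);
    return out;
}
```

Calling runWrappedCommand("sort", ...) or runWrappedCommand("grep a", ...) shows the point of a Wrapped Stage: the command is reused unchanged and only the record flow around it is defined.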
Build Stages
Enables you to create your own operators.
Operators are written in C++.
Gives you the control of a full programming language.
Build Stages
A Buildop provides a simple means of creating your own operator. It does not use an existing operator or executable.
Reasons to use a Buildop include:
• The functionality of multiple Stages can be combined into a single Stage
• Complex business logic that cannot easily be implemented using existing Stages
• Lookups across a range of values
• Surrogate key generation
• Better performance, as there is no unwanted functionality
• Reusability: a Buildop can be used within a project, and can also be exported and used in other projects
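A range lookup matches a key against intervals rather than exact values, which is one case where hand-written Build Stage logic can help. A minimal sketch, assuming non-overlapping inclusive ranges held in a std::map keyed by lower bound; RangeTable is a hypothetical name and the sample ranges are made up.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical range-lookup table: each entry maps the inclusive range
// [low, high] to a value. Ranges are assumed not to overlap.
class RangeTable {
public:
    void add(long long low, long long high, const std::string& value) {
        assert(low <= high);
        ranges_[low] = Entry{high, value};
    }

    // Returns the value whose range contains key, or std::nullopt if none does.
    std::optional<std::string> lookup(long long key) const {
        auto it = ranges_.upper_bound(key);  // first range starting after key
        if (it == ranges_.begin()) return std::nullopt;
        --it;                                // last range starting at or before key
        if (key <= it->second.high) return it->second.value;
        return std::nullopt;                 // key falls in a gap between ranges
    }

private:
    struct Entry {
        long long high;
        std::string value;
    };
    std::map<long long, Entry> ranges_;  // keyed by range lower bound
};
```

Because std::map keeps its keys sorted, upper_bound finds the candidate range with a binary search, so each lookup is O(log n) in the number of ranges.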
Build Stage
The interface is similar to that of a Wrapped Stage.
When defining a Build Stage, you provide:
– Input interface/schema
– Output interface/schema
– Transfer type: if “Auto Transfer” is selected, all input columns are passed to the output.
– Header files and definitions
– Code to be executed before any records are processed (Pre-Loop)
– Code to be executed for each input record (Per-Record)
– Code to be executed after all records are processed (Post-Loop)
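To make the three code sections concrete, here is a hypothetical, framework-free simulation of the order in which they run: Pre-Loop once, Per-Record once per input record, Post-Loop once at the end. The Record type, the SurrogateKeyStage class and the KEY column are illustrative assumptions; copying the whole input record mimics “Auto Transfer”.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical record model: column name -> value.
using Record = std::map<std::string, std::string>;

// Hypothetical stage that adds a surrogate key column named KEY.
class SurrogateKeyStage {
public:
    explicit SurrogateKeyStage(long long startKey) : startKey_(startKey) {}

    // Pre-Loop: runs once, before any input record is processed.
    void preLoop() {
        nextKey_ = startKey_;
        processed_ = 0;
    }

    // Per-Record: runs once for each input record.
    Record perRecord(const Record& input) {
        Record output = input;                       // mimics Auto Transfer: copy all input columns
        output["KEY"] = std::to_string(nextKey_++);  // computed output column
        ++processed_;
        return output;
    }

    // Post-Loop: runs once, after the last record; here it reports the count.
    long long postLoop() const { return processed_; }

private:
    long long startKey_;
    long long nextKey_ = 0;
    long long processed_ = 0;
};

// Drives the three sections in the order described for a Build Stage.
std::vector<Record> runStage(SurrogateKeyStage& stage, const std::vector<Record>& input) {
    std::vector<Record> output;
    stage.preLoop();
    for (const Record& rec : input) output.push_back(stage.perRecord(rec));
    const long long count = stage.postLoop();
    assert(count == static_cast<long long>(input.size()));  // every record went through Per-Record
    return output;
}
```

In a parallel job each partition runs its own copy of the stage, so a real surrogate key generator would also have to keep keys unique across partitions; this sketch ignores partitioning.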
Steps
Steps for defining a Build Stage
1. Select the Stage Types category in which the Stage is to be created.
2. Choose File -> New Parallel Stage -> Build from the main menu, or New Parallel Stage -> Build from the shortcut menu.
• The General tab has Stage Type, Category, Operator (by default the Build Stage name) and Class Name (by default the Build Stage name).
• The Creator tab holds general information: the version of the Build Stage, the author name and copyright information.
• On the Properties page, all the options to be passed to the Build Stage as run-time options are defined.
• The Build page contains three tabs:
  • Interfaces – contains the defined input and output interfaces/schemas.
  • Logic – contains the Definitions, Pre-Loop, Per-Record and Post-Loop sections.
  • Advanced
Build Stage Macros
There are a number of macros you can use when specifying Pre-Loop, Per-Record, and Post-Loop code.
• Informational
• Flow-control
• Input and output
• Transfer
This slide shows the Interfaces tab of the Build page, which contains the defined input and output interfaces.
Build Stage Macros
Informational Macros
These macros are used to determine the number of inputs, outputs, and transfers.
• inputs() - returns the number of inputs to the stage.
• outputs() - returns the number of outputs from the stage.
• transfers() - returns the number of transfers in the stage.
Flow-Control Macros
These macros are used to override the default behavior of the Per-Record loop in the stage definition.
• endLoop() - stops looping once the current loop iteration has completed and any auto outputs for it have been written.
• nextLoop() - immediately moves control to the start of the next loop iteration, without writing any outputs.
• failStep() - returns a failed status and terminates the job.
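The difference between nextLoop() and endLoop() is easiest to see in a loop. The sketch below is a plain C++ emulation written for this explanation and does not use the DataStage macros; the LoopAction enum and processAll function are hypothetical. Skip plays the role of nextLoop(), Stop plays the role of endLoop(), and an exception stands in for failStep().

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for the flow-control macros (not DataStage code).
enum class LoopAction {
    Continue,  // normal iteration: the auto output is written
    Skip,      // like nextLoop(): go to the next record without writing output
    Stop       // like endLoop(): write this iteration's auto output, then stop
};

// Emulates a Per-Record loop with auto output. failStep() is modeled by the
// per-record function throwing, which aborts the whole run.
std::vector<int> processAll(const std::vector<int>& input,
                            const std::function<LoopAction(int)>& perRecord) {
    std::vector<int> output;
    for (int rec : input) {
        const LoopAction action = perRecord(rec);
        if (action == LoopAction::Skip) continue;  // nextLoop(): no auto write
        output.push_back(rec);                     // auto output for this iteration
        if (action == LoopAction::Stop) break;     // endLoop(): no further iterations
    }
    return output;
}
```

With input {1, -2, 3, 99, 5}, a rule that skips negative values and stops at 99 yields {1, 3, 99}: the -2 is dropped, 99 is still written because endLoop() lets the current iteration finish, and 5 is never processed.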
Build Stage Macros
Input and Output Macros
The following macros are available:
• readRecord(input) - reads the next record from input, if there is one. If there is no record, the next call to inputDone() will return true.
• writeRecord(output) - writes a record to output.
• inputDone(input) - returns true if the last call to readRecord() for the specified input failed to read a new record, because the input has no more records.
• holdRecord(input) - auto input is suspended for the current record, so that the operator does not automatically read a new record at the start of the next loop.
• discardRecord(output) - auto output is suspended for the current record, so that the operator does not output the record at the end of the current loop.
• discardTransfer(index) - auto transfer is suspended for the specified transfer, so that it is not performed at the end of the current loop.
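When a stage reads its input explicitly, the semantics above lead to a read-ahead loop: read first, then check inputDone() before using the record. The sketch models only those documented semantics with a hypothetical InputPort class; it is not the DataStage API.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical input port modeling the documented semantics: inputDone()
// becomes true only after a readRecord() call finds no further record.
class InputPort {
public:
    explicit InputPort(std::vector<long long> records) : records_(std::move(records)) {}

    void readRecord() {
        if (pos_ < records_.size()) {
            current_ = records_[pos_++];
            done_ = false;
        } else {
            done_ = true;  // the read failed: input is exhausted
        }
    }

    bool inputDone() const { return done_; }
    long long current() const { return current_; }

private:
    std::vector<long long> records_;
    std::size_t pos_ = 0;
    long long current_ = 0;
    bool done_ = false;
};

// Read-ahead loop: read first, then check inputDone() before using the record.
long long sumAll(InputPort& in) {
    long long total = 0;
    in.readRecord();
    while (!in.inputDone()) {
        total += in.current();
        in.readRecord();
    }
    return total;
}
```

Checking inputDone() only after a read is what keeps the loop from processing a stale record once the input runs out, and it also handles an empty input without any special case.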
Build Stage Macros
Transfer Macros
The following macros are available:
• doTransfer(index) – performs the transfer specified by index.
• doTransfersFrom(input) - performs all transfers from the specified input.
• doTransfersTo(output) - performs all transfers to the specified output.
• transferAndWriteRecord(output) - transfers and writes a record for the specified output. Calling this macro is equivalent to calling the macros doTransfersTo() and writeRecord().
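Since transferAndWriteRecord(output) is equivalent to doTransfersTo(output) followed by writeRecord(output), the order matters: the transferred columns must reach the output buffer before it is written. The toy model below illustrates this; the OutputPort struct and the functions transferColumns, emitRecord and transferAndEmit are hypothetical, and records are assumed to be column-to-value maps.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical record model: column name -> value.
using Record = std::map<std::string, std::string>;

// Hypothetical output port: the record being built plus the records
// already written to the output link.
struct OutputPort {
    Record buffer;
    std::vector<Record> written;
};

// Plays the role of doTransfersTo(output): copy the input columns into the buffer.
void transferColumns(OutputPort& out, const Record& input) {
    for (const auto& [column, value] : input) out.buffer[column] = value;
}

// Plays the role of writeRecord(output): emit the buffered record.
// Clearing the buffer afterwards is a choice of this model only.
void emitRecord(OutputPort& out) {
    out.written.push_back(out.buffer);
    out.buffer.clear();
}

// Plays the role of transferAndWriteRecord(output): transfer, then write.
void transferAndEmit(OutputPort& out, const Record& input) {
    transferColumns(out, input);
    emitRecord(out);
}
```

A column computed in the Per-Record code (FLAG, say) and the transferred input columns end up in the same written record, which is the effect the combined macro is meant to provide.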
Build Stage
This page contains all header file information and definitions
Example
The Definitions tab contains header files and definitions:
#include "apt_util/string.h"
#include "apt_util/ints.h"
int iHold = 0;
int iVar = 0;
int iCounter=0;
struct extract_type
{
long long gst_i;
long long mail_addr_i;
char surname[32];
long long acct_cd_seq_i;
long long dummy_grp_seq_i;
char grp_end_d[10];
};
struct extract_type extract_rec[100];
Example
The Pre-Loop section contains code to be executed before any input is processed.
The Per-Record section contains the logic to be applied to each record:
if (input.MAIL_ADDR_I!=tempMail )
{
// reading first record
extract_rec[i].gst_i=input.GST_I;
extract_rec[i].mail_addr_i=input.MAIL_ADDR_I;
extract_rec[i].acct_cd_seq_i=input.ACCT_CD_SEQ_I;
// Begin of Grouping logic
Each input column is accessed as input.Column, where input is the name of the input interface.
The Per-Record section contains the code to be executed for each input record. This slide shows that code.
Example
Code is written in C++, the same as any C++ program but without a main() function.
//write output to output interface
for ( m=0;m<i;m++)
{ output.GST_I=extract_rec[m].gst_i;
output.MAIL_ADDR_I=extract_rec[m].mail_addr_i;
output.ACCT_CD_SEQ_I=extract_rec[m].acct_cd_seq_i;
output.PRIM_LAST_NAME=extract_rec[m].surname;
// Writing the record to Output
writeRecord(output.portid_);
}
Data is passed to the output interface by assigning the computed values to output.Column, where output is the name of the output interface.
The record is written by calling the writeRecord(output) macro, which writes the assigned values to the output interface.
Example
The Post-Loop section contains code to be executed after processing. It is written in the same way as the Pre-Loop and Per-Record sections, but it runs after the Per-Record loop has completed for all input records.
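The slides show only fragments of the grouping example, so its full logic is not available. The sketch below shows the general pattern the fragments suggest, assuming input sorted by MAIL_ADDR_I: buffer records while the key stays the same, write the buffered group when the key changes (Per-Record), and write the last group in Post-Loop, because no later key change will trigger it. The InRec struct and the MailGrouper class are hypothetical, and a std::vector replaces the fixed extract_rec[100] array.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record layout based on the column names used on the slides.
struct InRec {
    long long gst_i;
    long long mail_addr_i;
    long long acct_cd_seq_i;
    std::string surname;
};
using OutRec = InRec;  // same columns on output in this sketch

// Buffers consecutive records that share MAIL_ADDR_I (input assumed sorted).
class MailGrouper {
public:
    // Per-Record: flush the previous group when the key changes.
    void perRecord(const InRec& input) {
        if (!group_.empty() && input.mail_addr_i != group_.front().mail_addr_i) flush();
        group_.push_back(input);
    }

    // Post-Loop: the last group never sees a key change, so flush it here.
    void postLoop() { flush(); }

    const std::vector<OutRec>& output() const { return output_; }
    int groups() const { return groups_; }

private:
    void flush() {
        if (group_.empty()) return;
        // Group-level logic would go here; this sketch just writes each record.
        for (const InRec& r : group_) output_.push_back(r);  // stands in for writeRecord()
        ++groups_;
        group_.clear();
    }

    std::vector<InRec> group_;
    std::vector<OutRec> output_;
    int groups_ = 0;
};
```

Forgetting the Post-Loop flush is a common bug in this pattern: the final group would silently be dropped. Using a growable vector also means a group larger than 100 records cannot overflow a fixed-size buffer.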