Sequentional File

7/25/2019 Sequentional File

http://slidepdf.com/reader/full/sequentional-file 1/4

Sequential File:

The Sequential File stage is a file stage. It allows you to read data from or write data to one or more flat files, as shown in the figure below:

The stage executes in parallel mode by default if reading multiple files, but executes sequentially if it is only reading one file.

In order to read a sequential file, DataStage needs to know the format of the file.

If you are reading a delimited file, you need to specify the delimiter in the Format tab.
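To see why the delimiter has to be declared, here is a minimal Python sketch (not DataStage code) that parses the same lines with two different delimiter settings. The sample data and the helper name `read_delimited` are assumptions made up for this illustration.

```python
import csv
import io


def read_delimited(text, delimiter):
    """Parse delimited text into rows, splitting fields on the given delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Hypothetical pipe-delimited sample data.
sample = "101|Alice|NY\n102|Bob|CA\n"

rows_ok = read_delimited(sample, "|")     # delimiter matches the file
rows_wrong = read_delimited(sample, ",")  # wrong delimiter: each line stays one field

print(rows_ok)
print(rows_wrong)
```

With the wrong delimiter, every line comes back as a single field, which is roughly what a reader sees when the format does not match the file.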

Reading Fixed width File:

Double-click on the Sequential File stage and go to the Properties tab.

Source:

File: Give the file name including path.

Read Method: Whether to specify filenames explicitly or use a file pattern.

Important Options:

First Line is Column Names: If set true, the first line of a file contains column names on writing and is ignored on reading.

Keep File Partitions: Set True to partition the read data set according to the organization of the input file(s).

Reject Mode: Continue to simply discard any rejected rows; Fail to stop if any row is rejected; Output to send rejected rows down a reject link.

For fixed-width files, however, you can configure the stage to behave differently:

- You can specify that single files can be read by multiple nodes. This can improve performance on cluster systems.

- You can specify that a number of readers run on a single node. This means, for example, that a single file can be partitioned as it is read.

These two options are mutually exclusive.

Scenario 1:

Reading file sequentially.

Scenario 2:

Read From Multiple Nodes = Yes

Once we add Read From Multiple Nodes = Yes, the stage executes in Parallel mode by default.

If you run the job with the above configuration, it will abort with the following fatal error:

sff_SourceFile: The multinode option requires fixed length records.

(That means you can use this option to read fixed-width files only.)
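One way to understand the fixed-length requirement: when every record has the same length, each reader can compute its own byte range with simple arithmetic and start reading without scanning for record boundaries. The Python sketch below only illustrates that arithmetic; it is not how DataStage is implemented, and the record length, sample data, and function names are assumptions.

```python
RECORD_LEN = 10  # hypothetical fixed record length in bytes, no record delimiter


def reader_ranges(file_size, record_len, num_readers):
    """Split a file of fixed-length records into one byte range per reader."""
    total_records = file_size // record_len
    base, extra = divmod(total_records, num_readers)
    ranges = []
    start_record = 0
    for reader in range(num_readers):
        # The first `extra` readers take one additional record each.
        count = base + (1 if reader < extra else 0)
        ranges.append((start_record * record_len,
                       (start_record + count) * record_len))
        start_record += count
    return ranges


def read_partition(data, byte_range, record_len):
    """Cut one reader's byte range into individual fixed-length records."""
    start, end = byte_range
    chunk = data[start:end]
    return [chunk[i:i + record_len] for i in range(0, len(chunk), record_len)]


# Five hypothetical 10-byte records: REC0000000 ... REC0000004
data = b"".join(f"REC{n:07d}".encode("ascii") for n in range(5))

ranges = reader_ranges(len(data), RECORD_LEN, 2)
partitions = [read_partition(data, r, RECORD_LEN) for r in ranges]
print(ranges)
print(partitions)
```

With delimited, variable-length records the start of the Nth record cannot be computed this way, which is presumably why the multinode option rejects them.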

To fix the above issue, go to the Format tab and add the additional parameters as shown below.

Now the job finishes successfully. Please see the DataStage monitor below for the performance improvement compared with reading from a single node.

Scenario 3:

Read a delimited file by adding Number Of Readers Per Node instead of the multinode option to improve read performance. Once we add this option, the Sequential File stage executes in parallel mode by default.

If we are reading from and writing to fixed-width files, it is always good practice to add the APT_STRING_PADCHAR DataStage environment variable and assign 0x20 as its default value, so that DataStage pads with spaces. Otherwise DataStage will pad with the null value (the DataStage default padding character).
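The difference between the two padding characters is easy to see outside DataStage. The Python sketch below pads the same value to a fixed width with 0x20 (space) and with 0x00 (null); the helper name `pad_field` and the field width are assumptions for illustration only.

```python
def pad_field(value, width, pad_char):
    """Left-justify value in a fixed-width field, filling the rest with pad_char."""
    if len(value) > width:
        raise ValueError(f"value {value!r} is longer than field width {width}")
    return value.ljust(width, pad_char)


space_padded = pad_field("NY", 5, "\x20")  # 0x20 is the ASCII space
null_padded = pad_field("NY", 5, "\x00")   # 0x00 is the null character

print(repr(space_padded))
print(repr(null_padded))

# A plain whitespace trim removes space padding but leaves null padding in place.
print(repr(space_padded.rstrip()))
print(repr(null_padded.rstrip()))
```

Null padding is invisible in many viewers but survives ordinary trimming, so it tends to surface later as comparison or load problems in downstream systems.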

Always keep Reject Mode = Fail to make sure the DataStage job fails if we receive data in an unexpected format from source systems.
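The three Reject Mode settings can be sketched in Python as follows. This is only an illustration of the Continue / Fail / Output behaviour described in this document, not DataStage code; the rejection rule used here (a record with the wrong length) and the function name are assumptions.

```python
def read_records(lines, record_len, reject_mode="Fail"):
    """Return (good, rejects) for the given lines under a reject mode.

    Continue: silently drop bad records.
    Fail:     stop at the first bad record.
    Output:   collect bad records separately, like a reject link.
    """
    if reject_mode not in ("Continue", "Fail", "Output"):
        raise ValueError(f"unknown reject mode: {reject_mode}")
    good, rejects = [], []
    for line_no, line in enumerate(lines, start=1):
        if len(line) == record_len:
            good.append(line)
        elif reject_mode == "Fail":
            raise ValueError(
                f"record {line_no} has length {len(line)}, expected {record_len}"
            )
        elif reject_mode == "Output":
            rejects.append(line)
        # reject_mode == "Continue": the bad record is simply discarded
    return good, rejects


lines = ["AAAAA", "BBB", "CCCCC"]  # hypothetical data; the second record is too short

continue_result = read_records(lines, 5, "Continue")
output_result = read_records(lines, 5, "Output")
print(continue_result)
print(output_result)
```

With Continue, the short record disappears without a trace, which is why Fail is the safer default when a format change at the source must not go unnoticed.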

Sequential File with Duplicate Records

A sequential file has 8 records with one column. Below are the values in the column, separated by spaces:

1 1 2 2 3 4 5 6

In a parallel job, after reading the sequential file, 2 more sequential files should be created, one with duplicate records and the other without duplicates.

File 1 records separated by space: 1 1 2 2

File 2 records separated by space: 3 4 5 6

How will you do it?

Sol1:

1. Introduce a Sort stage immediately after the Sequential File stage.

2. Select the key change column property in the Sort stage; you can treat 0 as unique and 1 as duplicate, or vice versa, as you wish.

3. Put a Filter or Transformer stage next to it, and now you have unique records on one link and duplicates on the other link.
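The steps above can be sketched in Python to see what a key change flag produces. This is an illustration only: it assumes the flag is 1 on the first row of each key group and 0 on later rows (how the Sort stage's key change column is understood to behave here, to be verified in your environment), and the sample values and function name are assumptions.

```python
def add_key_change(values):
    """Sort values and flag the first row of each key group with 1, later rows with 0."""
    flagged = []
    previous = object()  # sentinel that never equals a real value
    for value in sorted(values):
        flagged.append((value, 1 if value != previous else 0))
        previous = value
    return flagged


values = [1, 1, 2, 2, 3, 4, 5, 6]  # sample input for the sketch
flagged = add_key_change(values)

# Filter/Transformer step: route rows by the flag.
first_rows = [v for v, flag in flagged if flag == 1]
repeat_rows = [v for v, flag in flagged if flag == 0]

print(first_rows)   # first occurrence of every key
print(repeat_rows)  # second and later occurrences
```

Under this assumption the flag separates first occurrences from repeats (1 2 3 4 5 6 versus 1 2), so sending every row of a repeated key to one file would additionally need a per-key count.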

Sol2: (Should check though)

First, take a source file and connect it to a Copy stage. Then one link is connected to the Aggregator stage and another link is connected to the Lookup stage or Join stage. In the Aggregator stage, use the count function to calculate how many times each value repeats in the key column.

After calculating that, connect it to a Filter stage where we filter on cnt=1 (cnt is the new column holding the repeat count).

Then the o/p from the filter is connected to the Lookup stage as the reference. In the Lookup stage, set LOOKUP FAILURE=REJECT.

Then place two output links on the lookup: one collects the non-repeated values, and the other collects the repeated values on the reject link.
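The flow above (count per key, keep cnt=1 as the reference, reject lookup failures) can be mimicked in a few lines of Python to check the logic. This is a sketch of the data flow, not DataStage code; the function name and sample values are assumptions.

```python
from collections import Counter


def split_by_count(values):
    """Split values into (non_repeated, repeated) using a per-key count."""
    counts = Counter(values)                                     # Aggregator: count rows per key
    reference = {key for key, cnt in counts.items() if cnt == 1}  # Filter: cnt = 1
    non_repeated, repeated = [], []
    for value in values:                                         # Lookup against the reference
        if value in reference:
            non_repeated.append(value)  # lookup succeeded: main output link
        else:
            repeated.append(value)      # lookup failed: reject link
    return non_repeated, repeated


non_repeated, repeated = split_by_count([1, 1, 2, 2, 3, 4, 5, 6])
print(non_repeated)
print(repeated)
```

Because the decision is made per key rather than per row, every row of a repeated key lands on the reject side, so both copies of a duplicated value end up in the same file.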