Sequential File
Uploaded by sandeep-lakde
7/25/2019 Sequentional File
http://slidepdf.com/reader/full/sequentional-file 1/4
Sequential File:
The Sequential File stage is a file stage. It allows you to read data from or write
data to one or more flat files, as shown in the figure below.
The stage executes in parallel mode by default if it is reading multiple files, but executes sequentially if it is reading only one file.
To read a sequential file, DataStage needs to know the format of the file.
If you are reading a delimited file, you need to specify the delimiter on the Format tab.
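To illustrate why the stage needs format information, here is a minimal Python sketch (not DataStage code) that parses the same logical record in delimited and in fixed-width form. The function names, the sample records, and the column widths are assumptions made up for this example.

```python
import csv
import io


def parse_delimited(line, delimiter=","):
    """Split one delimited record into fields (hypothetical helper)."""
    return next(csv.reader(io.StringIO(line), delimiter=delimiter))


def parse_fixed_width(line, widths):
    """Slice one fixed-width record into fields using assumed column widths."""
    fields = []
    pos = 0
    for width in widths:
        # Each field occupies exactly `width` characters; strip trailing padding.
        fields.append(line[pos:pos + width].rstrip())
        pos += width
    return fields


# The delimited reader needs the delimiter; the fixed-width reader needs the widths.
delimited_fields = parse_delimited("001|Ann|NY", delimiter="|")
fixed_fields = parse_fixed_width("001Ann  NY", [3, 5, 2])
```

In both cases the parser cannot find the field boundaries without being told the format, which is the information the Format tab supplies.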
Reading Fixed width File:
Double-click the Sequential File stage and go to the Properties tab.
Source:
File: Give the file name, including the path.
Read Method: Whether to specify file names explicitly or use a file pattern.
Important Options:
First Line is Column Names: If set to True, the first line of a file contains column names on writing and is ignored on reading.
Keep File Partitions: Set to True to partition the read data set according to the organization of the input file(s).
Reject Mode: Continue to simply discard any rejected rows; Fail to stop if any row is rejected; Output to send rejected rows down a reject link.
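The three Reject Mode settings can be pictured with a small Python sketch. This is only an analogy for the behaviour described above, not how DataStage implements it; read_rows and its parameters are hypothetical names.

```python
def read_rows(lines, parse=int, reject_mode="Continue"):
    """Parse each line; handle bad rows according to reject_mode.

    Hypothetical helper mimicking the three settings:
      "Continue" - silently discard rejected rows
      "Fail"     - stop as soon as any row is rejected
      "Output"   - collect rejected rows separately (the "reject link")
    """
    if reject_mode not in ("Continue", "Fail", "Output"):
        raise ValueError(f"unknown reject mode: {reject_mode!r}")
    accepted, rejected = [], []
    for line in lines:
        try:
            accepted.append(parse(line))
        except ValueError as exc:
            if reject_mode == "Fail":
                raise RuntimeError(f"row rejected: {line!r}") from exc
            if reject_mode == "Output":
                rejected.append(line)
            # "Continue": the bad row is dropped and reading goes on.
    return accepted, rejected
```

For example, read_rows(["1", "x", "3"], reject_mode="Output") keeps 1 and 3 and routes "x" to the rejected list, while the same call with reject_mode="Fail" raises an error at "x".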
For fixed-width files, however, you can configure the stage to behave differently:
- You can specify that single files can be read by multiple nodes. This can improve performance on cluster systems.
- You can specify that a number of readers run on a single node. This means, for example, that a single file can be partitioned as it is read.
These two options are mutually exclusive.
Scenario 1:
Reading the file sequentially.
Scenario 2:
Read From Multiple Nodes = Yes
Once we add Read From Multiple Nodes = Yes, the stage executes in parallel mode by default.
If you run the job with the above configuration, it will abort with the following fatal error:
sff_SourceFile: The multinode option requires fixed length records. (That means you can use this option to read fixed-width files only.)
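One plausible way to see why the multinode option needs fixed-length records: when every record has the same length, each reader can compute its own byte range without scanning the file for delimiters. The Python sketch below shows that arithmetic only. It is an illustration under that assumption, not DataStage's actual implementation, and record_ranges is a hypothetical name.

```python
def record_ranges(file_size, record_length, readers):
    """Split a fixed-width file into per-reader (start, end) byte ranges.

    Every range starts and ends on a record boundary, so no reader
    ever sees a partial record. Hypothetical helper for illustration.
    """
    if record_length <= 0 or readers <= 0:
        raise ValueError("record_length and readers must be positive")
    if file_size % record_length != 0:
        raise ValueError("file size is not a whole number of records")
    total_records = file_size // record_length
    base, extra = divmod(total_records, readers)
    ranges = []
    start = 0
    for i in range(readers):
        # The first `extra` readers take one additional record each.
        count = base + (1 if i < extra else 0)
        end = start + count * record_length
        ranges.append((start, end))
        start = end
    return ranges
```

For a 100-byte file with 10-byte records and 3 readers, this yields the ranges (0, 40), (40, 70) and (70, 100). With variable-length delimited records there is no such formula, because a record boundary can only be found by reading the data.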
To fix the above issue, go to the Format tab and add the additional parameters as shown below.
Now the job finishes successfully. See the DataStage monitor below for the performance improvement compared with reading from a single node.
Scenario 3: Read a delimited file by adding Number of Readers Per Node instead of the multinode option to improve read performance. Once we add this option, the Sequential File stage executes in parallel mode by default.
If we are reading from and writing to fixed-width files, it is always good practice to add the
APT_STRING_PADCHAR DataStage environment variable and assign 0x20 as the default value so that it
pads with spaces; otherwise DataStage pads with the null value (the DataStage default padding character).
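The difference between the two padding characters can be shown with a short Python sketch. It only imitates the effect on a fixed-width field; pad_field is a hypothetical helper, not a DataStage function.

```python
def pad_field(value, width, pad_char="\x00"):
    """Right-pad `value` to `width` characters (hypothetical helper).

    The default pad character is NUL (0x00), mirroring the default
    padding described above; pass chr(0x20) to pad with spaces.
    """
    if len(pad_char) != 1:
        raise ValueError("pad_char must be a single character")
    if len(value) > width:
        raise ValueError("value is longer than the field width")
    return value.ljust(width, pad_char)


null_padded = pad_field("AB", 5)               # 'AB' followed by three NUL characters
space_padded = pad_field("AB", 5, chr(0x20))   # 'AB' followed by three spaces
```

NUL-padded fields often look fine on screen but break downstream tools that expect printable text, which is a common reason to prefer space padding in fixed-width files.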
Always keep Reject Mode = Fail to make sure the DataStage job fails if we get an unexpected format from
source systems.
Sequential File with Duplicate Records:
A sequential file has 8 records with one column. Below are the values in the column, separated by
spaces:
1 1 2 2 3 4 5 6
In a parallel job, after reading the sequential file, 2 more sequential files should be created: one
with duplicate records and the other without duplicates.
File 1 records separated by space: 1 1 2 2
File 2 records separated by space: 3 4 5 6
How will you do it?
Sol1:
1. Introduce a sort stage right after the sequential file.
2. Select a property (key change column) in the sort stage, and you can assign 0-Unique or 1-duplicate, or vice versa, as you wish.
3. Put a filter or transformer next to it, and now you have unique records in one link and duplicates in the other
link.
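The sort-plus-key-change approach can be sketched in Python as follows. The sketch assumes the key change flag is 1 on the first row of each sorted key group and 0 on the following rows. The function name and the sample values 1 1 2 2 3 4 5 6 are used for illustration only.

```python
def split_by_key_change(values):
    """Sort the values, flag the first row of each key group, then split.

    Assumption: key_change = 1 for the first row of a group and 0 for
    every later row with the same key (hypothetical helper).
    """
    first_rows, repeat_rows = [], []
    previous = object()  # sentinel that never equals a real value
    for value in sorted(values):
        key_change = 1 if value != previous else 0
        if key_change == 1:
            first_rows.append(value)   # link 1: one row per distinct key
        else:
            repeat_rows.append(value)  # link 2: the extra copies
        previous = value
    return first_rows, repeat_rows


first_rows, repeat_rows = split_by_key_change([1, 1, 2, 2, 3, 4, 5, 6])
```

Note that under this assumption the first link holds one copy of every key (1 2 3 4 5 6) and the second link holds only the extra copies (1 2). Putting all copies of a repeated key in one file and the never-repeated keys in the other needs an additional step, such as the count-based approach.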
Sol2: (Should check though)
First of all, take a source file and connect it to a copy stage. Then 1 link is connected to the
aggregator stage and another link is connected to the lookup stage or join stage. In the aggregator
stage, use the count function to calculate how many times each value repeats in the key
column.
After calculating that, connect it to the filter stage, where we filter on cnt=1 (cnt is the new
column holding the repeat count). Then the o/p from the filter is connected to the lookup stage as the reference. In the lookup stage, set
LOOKUP FAILURE=REJECT.
Then place two output links for the lookup: one collects the non-repeated values and the other collects the repeated values in the reject link.
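The copy, aggregator, filter and lookup flow can be approximated in Python like this. It is a sketch of the data flow only, with hypothetical names, and it uses the sample values 1 1 2 2 3 4 5 6 for illustration.

```python
from collections import Counter


def split_with_count_lookup(values):
    """Mimic the copy -> aggregator -> filter -> lookup flow (hypothetical helper)."""
    # Aggregator: count how many times each key value occurs.
    counts = Counter(values)
    # Filter: keep only keys with cnt = 1; these form the lookup reference.
    reference = {key for key, cnt in counts.items() if cnt == 1}
    matched, rejected = [], []
    # Lookup on the second copy of the stream, with lookup failure = reject.
    for value in values:
        if value in reference:
            matched.append(value)    # output link: non-repeated values
        else:
            rejected.append(value)   # reject link: repeated values
    return matched, rejected


non_repeated, repeated = split_with_count_lookup([1, 1, 2, 2, 3, 4, 5, 6])
```

With these sample values the output link receives 3 4 5 6 and the reject link receives 1 1 2 2, which matches the two target files in the question.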