
DS314SVR

EXERCISES SERVER EDITION


Copyright This document and the software described herein are the property of Ascential Software Corporation and its licensors and contain confidential trade secrets. All rights to this publication are reserved. No part of this document may be reproduced, transmitted, transcribed, stored in a retrieval system or translated into any language, in any form or by any means, without prior permission from Ascential Software Corporation. Copyright © 2002 Ascential Software Corporation. All rights reserved. Ascential Software Corporation reserves the right to make changes to this document and the software described herein at any time and without notice. No warranty is expressed or implied other than any contained in the terms and conditions of sale.

Ascential Software Corporation 50 Washington Street

Westboro, MA 01581-1021 USA Phone: (508) 366-3888 Fax: (508) 389-8749

Ardent, Axielle, DataStage, Iterations, MetaBroker, MetaStage, and uniVerse are registered trademarks of Ascential Software Corporation. Pick is a registered trademark of Pick Systems. Ascential Software is not a licensee of Pick Systems. Other trademarks and registered trademarks are the property of the respective trademark holder.

04-08-2002


Exercise 2-1: Install the DataStage Server on Windows NT

Task: Install DataStage server on Windows NT

Create a project and name it DS314Svr. You will use this project for your exercises.


Exercise 2-2: Install the DataStage Clients

Task: Install DataStage client applications Install the Developer’s edition on your Windows workstation.

Task: Test the connection to the server

1 Select one of the DataStage client applications (other than Administrator) in the Programs folder, e.g., DataStage Manager. The Attach to Project window is displayed.

2 Type the name of the computer on which the DataStage server was installed in the Host system box. If the server is on your client machine, you can type localhost.

3 Type your user name and password.

4 Click the drop-down menu arrow in the Project box. If the list of installed projects is displayed, then the connection is established.

5 Click Cancel to quit without opening the DataStage client application.


Exercise 3-1: Using Administrator

Task: Set timeout period

1 Log on to Administrator.

2 On the General tab, set the inactivity timeout period to never time out.

3 Click Apply.

Task: Set project defaults

1 Click the Projects tab.

2 Select your project and then click Properties. The Project Properties General tab is displayed.

3 Click Enable job administration in Director.

4 For your project, specify that job logs are to be purged up to the previous 3 job runs.

5 Click on the Permissions tab. Specify that guests can do development work in DataStage.

6 Click on the Tracing tab. Verify that server-side tracing is disabled.

7 Click on the Schedule tab. Enter your system user name and password. Click Test to verify that your user name can be used to schedule jobs.

• Note: The Windows Scheduler service needs to be running.

8 Click on the Tunables tab. Examine the information.

9 Click OK. Then close down Administrator.

10 Stop and start the DataStage services.


Exercise 4-1: Building a Job

Important! • Before working on the exercises, download the Exercise Files and load them into a directory named c:\DS314SvrFiles. Click the Exercise Files button on the Exercise page to do this. Note: If you are copying this directory from the CD, be sure to remove the Read Only properties of the directory and its files.

• Recommendation! Prefix the names of jobs and other objects you create in DataStage with your initials. This will allow you to distinguish your jobs from jobs in the Solution set with the same names that you import for examination.

• Click the Exercise Jobs button to download the jobs and other DataStage objects in the Solution Set. You can check your jobs against the jobs in this file. You will learn how to import jobs from this file in Module 5.

Assumptions: • There is a file named Jobs.txt in your DS314SvrFiles directory.

Task: Start Designer and attach to your project

1 Start Designer.

2 On the Attach to Project window log on to your project. The New window is displayed.

3 Click the Server icon and then click OK to open a new Server job.

Task: Create a job that copies data in a sequential file

In this task you create a simple job that copies a sequential file. Since it is a straight copy, you will not use a transformer stage. Detailed steps follow this summary of the job:

• Job name: JobsCopy

• File to copy: Jobs.txt

• File directory: \DS314SvrFiles

• Name of output file: JobsCopy.txt

• Directory to copy to: \DS314SvrFiles


• Overwrite the data in the output file

• Include column names in the first record (line) of the output file

Steps

1 Add the two sequential stages and the link between them.

− Give appropriate names to the stages and link

2 Open the Jobs.txt file in a text editor and determine the following:

− Column delimiter

− Column definitions

− Does the first line contain column names?

3 Define the source sequential stage.

− Use the View Data button in the stage to verify that you have defined the data correctly.

4 Define the target sequential stage.

5 When you define the metadata using the Columns tab, manually type in the values as shown in the accompanying screenshot.

6 Add an Annotation stage to your job. Open the Annotation stage.

Document what your job does. Choose a suitable font and color for the text and background.

7 Open the Property Browser window. Select each of your stages and view the information that is displayed in the Property Browser window.

8 Click on your link label. Add appropriate documentation in the Property Browser window as to what is going in the link and what is going out of the link.


9 Save your job.

10 Compile your job.

11 Move to Director from the Designer and then validate and run your job.

− View the log events as your job runs

− Determine how many rows were copied from the Jobs.txt file by double-clicking on the events in the log and viewing the information.

12 Go back to Designer and view the data in the output file.

Task: Create a job with a transformer

In this task you create a simple job that extracts data from the jobs.txt file and then loads just the job IDs and descriptions into a sequential file. Here are the details:


• Job name: JobNames

• File to copy: Jobs.txt

• File directory: \DS314SvrFiles

• Name of output file: JobNames.txt

• Directory to copy to: \DS314SvrFiles

• Overwrite the data in the output file.

• Exclude the column names record from the first line in the output file.

• Use the pipe (|) for the column delimiter in the output file.

• In the output file, make the first column the job description and the second column the job ID.

Steps

1 More or less the same steps as in the previous task, except that you need to add a transformer stage and define the movement of data from the input link to the output link.

2 To define the movement of data through the transformer, open the transformer and drag the source columns from the upper left pane to the upper right pane.

3 Validate and run your job. View the data in the output file after the job runs.


Exercise 5-1: Exporting and Importing DataStage Objects

Task: Start DataStage Manager

If Designer is open, start Manager from within Designer. Otherwise, start Manager and log onto your project.

Task: Create a jobs category

1 In Manager select the Jobs branch. The jobs you created in previous exercises should be listed.

2 Click File>New Category. The Create Category window is displayed.

3 Create a new category under Jobs named DS314Svr. Drag your existing jobs to this new category.

Task: Determine whether any table definitions exist

Expand the Table Definitions branch and view its contents.

Task: Export job components

In this task you export the existing job components in your project. This backs up the jobs you created in the previous exercise.

1 Close Designer and Director before you begin an export. Otherwise, some of the objects you want to export may be “locked”. Don’t forget to save any open jobs.

2 Click Export>DataStage Components to open the Export window.

3 Export all the jobs in the DS314Svr category you created earlier to a file named DS314SvrExport.dsx in the DS314SvrFiles directory. You will need to go to the Options tab and change the Selection to By Category.

− Note: The directory you export the file into is on the client. If the DataStage server is on a different machine, create a directory on your client machine in which to export the file.

Task: View the contents of a DataStage export file

1 Click Import>DataStage Components to open the DataStage Repository Import window.


2 Specify the file to import, namely, the file you exported in the previous task.

3 Select Import selected and then OK. The DataStage Import window is displayed.

4 Verify that the jobs you exported are listed.

5 Click Cancel twice to close the import windows.

Task: Import a DataStage file

In the previous task, you verified that your jobs existed in the export file. So go ahead and delete the jobs you exported and then recover them by importing the DS314SvrExport.dsx file. (Alternatively, open a different project in Manager and import them into this other project.)

1 Select your jobs and then click File>Delete.

2 Start the import process. On the DataStage Repository Import window, select Import all.

3 After you complete the import process, verify that the jobs exist in Manager.


Exercise 5-2: Importing Metadata

Task: Import a table definition from a sequential file

In this task, you import the metadata for the jobs.txt file so that you can load the column definitions into the jobs you design.

1 Click Import>Table Definitions>Sequential File Definitions to open the Import Meta Data (Sequential) window and start the import process.

2 In the Directory box, select the \DS314SvrFiles directory, which contains the Jobs.txt file.

3 Select the Jobs.txt file and then click Import.

4 Specify the general format on the Format tab. The first line in the Jobs.txt file contains column names.

5 Click the Define tab and examine the column definitions. When you are done, click OK to perform the import and return to the Import Meta Data window.

6 Click Close to return to Manager.

7 In Manager, locate your new table definition.

8 Select your table definition and click the right mouse button. Then click Properties on the menu to open the Table Definition window.

9 Change the name of your table definition to Jobs.

10 Browse through the tabs of the metadata you imported.

11 Click OK to save your modifications.

Task: Define a job with an intermediate sequential stage

In this task you define a job with an intermediate stage in order to experiment with a different job configuration.

Open and save your JobsCopy job under the name JobsCopyNames. Modify the job so that it first makes a copy of the jobs.txt file and then extracts data from the copy and loads it into a third sequential file with the same format as JobNames.txt.

Use new names for the output files, so they don’t get confused with the previous files, that is, JobsCopy2.txt and JobNames2.txt, respectively.

Detailed steps follow.


1 In Manager, select your JobsCopy job. Click the right mouse button and then click Edit. This opens your job in Designer. Save your job as JobsCopyNames.

2 Add the additional transformer and sequential stage.

3 Open up the JobsCopy stage. Notice that it has both an Inputs tab and an Outputs tab.

4 On the Inputs General tab, change the name of the file to JobsCopy2.txt.

5 On the Outputs General tab, specify the same filename, because you want to extract data from the copy after it’s created.

6 On the Format tab, click Load to load the Jobs format metadata you imported in the previous task.

7 On the Columns tab, click Load to load the Jobs column definitions you imported in the previous task.

8 Complete the job design specification and then compile your job.

9 Open up Director and validate your job. You will receive a fatal error: “Failed to open JobsCopy2.txt in directory …” It is important that you understand why.

The validation process checks whether all the source files can be opened. It fails because JobsCopy2.txt hasn’t been created yet. Since it will be created when the first link in the job executes, this is not a problem.

10 Run your job and monitor its progress in the log. How many rows were read?


11 Click Tools>New Monitor to open a monitor on the job. (If the Link name column isn’t visible, click the right mouse and select Show links.)

12 Examine the information provided in the monitor:

− The Stage name column lists the stages that perform some sort of processing.

− The Started at column displays the time the processing of the stage began.

− The Elapsed time column (hidden) shows the amount of processing time.

− The Link name column lists a link through which rows travel.

− The Type column specifies whether the link is a primary input link (<<Pri) or an output link (>Out).

− The Num rows column lists the number of rows going through the link.

13 In Designer, view the data of the JobNames stage without opening the stage. (Hint: Click the right mouse over the stage and select View… data.)

Task: Examine the Manager “Extended Job View”

1 In Manager, click View>Extended Job View. Notice that the jobs are now listed in the Manager left pane (as well as the right pane).

2 Select your job in the left pane and expand its sub-branches.

3 View the information contained in each of the sub-branches. Double-click on elements in the right pane to view their contents as well.


Exercise 6-1: Extract Relational Data

Task: Set up an ODBC connection (Windows NT only)

Set up a System DSN ODBC connection to a Microsoft Access data source named DSN314Svr. Select the database DSN314Svr.mdb in the C:\DS314SvrFiles directory.

Task: Import relational metadata

In Manager, import the table definition for the Employees relational table. After you import it, examine its definition in Manager.

Task: Extract data from a relational table

Create a new job named ODBCMidLevelEmployees. This job is to extract data from the Employees table and load it into a new file named MidLevelEmployees.txt. Your job should load the file with employees whose job levels are between 120 and 170. The records in the file should be sorted by last name.

Detailed steps follow:

1 Add an ODBC source stage and a sequential target stage and link. You do not need a transformer.

2 Open the ODBC source stage. Load the column definitions for the Employees table from the table definition you imported in the previous task.

3 On the Selection tab, define a WHERE clause that selects the employees with job levels between 120 and 170.


4 In the Other clauses box, add an ORDER BY clause that orders the records by employee last name.
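For reference, the SQL that DataStage generates from steps 3 and 4 should resemble the sketch below (the column names are assumptions based on a typical Employees table; confirm them against the table definition you imported):

    SELECT emp_id, fname, minit, lname, job_id, job_lvl, hire_date
    FROM Employees
    WHERE job_lvl BETWEEN 120 AND 170
    ORDER BY lname

Note that BETWEEN is inclusive at both ends, which matches the “between 120 and 170” requirement.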

5 Click the View SQL tab and examine the SQL DataStage generates.

6 Click View Data. Verify that your data is ordered by last name and only contains records with the appropriate job levels.

7 Compile, validate, and run your job.

8 Examine the data in your target sequential file to verify that it is correct.

9 Open the DSN314Svr.mdb database and delete the Employees table (you will recreate this table in the next exercise).


Exercise 6-2: Load Relational Data

Task: Load a relational table

In this task you create a new job named ODBCEmployees. This job recreates the table named Employees in the DSN314Svr database and loads it with the data in the DSN314Svr Employees.txt file.

1 Create a new job named ODBCEmployees.

2 Add a sequential source stage and an ODBC target stage. You do not need a transformer.

3 Define the source stage to extract data from the DSN314Svr Employees.txt file.

− If necessary, import the table definition from the file and load the format and columns metadata.

− Click View Data to check whether the job will extract the data correctly.

4 Open the target stage. On the Stage General tab:

− Specify the data source name.

− Click Get SQLInfo to retrieve the quote character and schema delimiter from the database.

5 Click the Inputs tab. On the General tab:

− Specify the name of the table to create and load.

− Specify that you want the table created in the target database.


− Specify that you want the table cleared before rows are inserted.

6 On the Columns tab:

− Examine the column definitions that are loaded from your source stage specification.

− Specify emp_id as the table key.

7 On the View SQL tab:

− Examine the SQL used to insert rows into the target table.

8 On the Edit DDL tab:

− Specify that you want to drop the table before you create it. This will allow you to run the job multiple times without errors. If you don’t drop the table each time, the second time you will get an error to the effect that “You can’t create a table with the same name as an existing table.”

− If the CREATE statement hasn’t been generated, click the Create DDL button to generate it.

9 On the Transaction Handling tab:

− Specify that you want to COMMIT every 5 rows.

10 Save the job.

11 Compile the job.

12 Validate your job before you run it to verify that a connection can be established and that the DDL can be prepared.

− If you get a “failed to connect” error, it may be that something went wrong in defining the ODBC data source or you haven’t specified the correct data source name in the target stage.

− If you get an “unsupported data type” error, it may be that the data type of one of your columns is not supported by the target relational database. Try changing the column definitions in your target to commonly supported types. (If you make any changes, don’t forget to regenerate the DDL.)

13 Run your job.

− View the data loaded into the target table.

14 Optional: If you have MS Access installed, open the database in MS Access and examine the table definition and data.


Exercise 7-1: Constraints and Derivations

Task: Create a job with constraints and derivations

In this task you create a job that splits the data in the Jobs.txt file into four output files. You will direct the data to the different output files using constraints. You will also define some simple derivations.

Your job should meet the following specifications:

• Job name: JobLevels

• Source file: Jobs.txt

• Target file 1: LowLevelJobs.txt

− min_lvl between 0 and 25 inclusive.

− Same column types and headings as Jobs.txt.

− Include column names in the first line of the output file.

− Job description column should be preceded by the string “Job Title:” and embedded within square brackets. For example, if the job description is “Designer”, the derived value is: “Job Title: [Designer]”.

• Target file 2: MidLevelJobs.txt

− min_lvl between 26 and 100 inclusive.

− Same format and derivations as Target file 1.

• Target file 3: HighLevelJobs.txt

− min_lvl between 101 and 500 inclusive.

− Same format and derivations as Target file 1.

• Rejects file: JobRejects.txt

− min_lvl is out of range, i.e., below 0 or above 500.

− This file has only two columns: job_id and reject_desc.

− reject_desc is a variable-length text field, maximum length 100. It should contain a string of the form: “Level out of range: <min_lvl>”, where <min_lvl> is the value in the min_lvl field.

Steps

1 Open Designer and create a new job. Save it as JobLevels.


2 Add sequential file stages for the source and target files. Add a transformer stage and appropriate links.

3 Give appropriate names to each of the stages and links.

4 Save your job.

5 Define your source stage.

− Load the format and column definitions from Manager.

− View the data to verify that it has been correctly defined.

6 Open up the transformer.

− Define the columns for each of the target links.

− Hint: Use cut-and-paste and drag-and-drop to save time. The only column you need to define manually is reject_desc.

7 Define the constraints.

− Click the Constraints button on the Transformer Stage window to open the Transformer Stage Constraints window.

− Define each constraint.

− The link to the rejects file should be defined as a Reject Row.

− The rejects link must be the last link. If necessary, open the Output Link Execution Order window to specify this.


8 Create a stage variable named JobDescription. Initialize its value to the empty string.

− Define the derivation that is to be output to the job description columns for each of the output links.

− Insert the concatenation operator (:) from the Operators menu or type it in.

9 Define the derivations for the job_desc columns for each of the non-reject links.

− The derivation is the same for all of them. Use the stage variable you defined earlier in your derivations.

10 Define the derivation for the reject_desc column.

− Insert the concatenation operator (:) from the Operators menu or type it in.

− Note: You can concatenate integer values with string values. The proper conversions will be performed.
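To make steps 8–10 concrete, here is a sketch of what the expressions might look like, assuming the input link is named JobsIn (adjust the link and column names to match your job):

− JobDescription stage variable derivation: "Job Title: [" : JobsIn.job_desc : "]"

− job_desc derivation on each non-reject link: JobDescription

− reject_desc derivation: "Level out of range: " : JobsIn.min_lvl

− Constraint on the LowLevelJobs link: JobsIn.min_lvl >= 0 And JobsIn.min_lvl <= 25

The integer min_lvl concatenates directly onto the string, as the note above explains.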

11 Define the output stages.

− DataStage automatically provides the column definitions.

− Include column names in the output files.

12 Compile your job. Fix any errors.

13 Go to Director and validate your job. It should validate OK.

14 Run and monitor your job in the log. Define a new monitor for the job.

− How many rows were rejected?

− How many rows went down each of the links?

− How many rows were written to each of the output files?

15 Return to Designer and view the data in each of the output files.

− Verify that the files contain the right records.

− Verify that the job_desc columns contain data in the right format.

− Verify that the reject_desc column contains data in the right format.

16 Fix any bugs and recompile and run as necessary.


Exercise 8-1: Creating BASIC Expressions

Assumptions: • There is a file named DS314SvrEmployees.txt in your DS314SvrFiles directory.

Task: Create the EmployeesInfo.txt file

In this task you create a job named EmployeesInfo that extracts a variety of information from the DS314SvrEmployees.txt file and puts it into a sequential file named EmployeesInfo.txt.

You should also create an EmployeesRejects.txt file that contains the records rejected because of errors.

The following table describes the columns in the EmployeesInfo.txt file.

EmployeesInfo.txt File

Column      Description

RowNum      The row number (1, 2, …) in the output file.

FullName    The employee’s full name in the format: “ACCORTI, P. M.”

Emp_Sex     Contains one of two values: “MALE”, “FEMALE”. The last non-blank character of the employee’s ID indicates the sex. If the last character is neither lowercase nor uppercase “F” or “M”, send the row to the rejects file.

Emp_Prefix  First three non-blank characters of the employee’s ID, converted to lowercase.

The following table describes the columns in the EmployeesRejects.txt file.

EmployeesRejects.txt File

Column      Description

row_number  The number of the row in the input file.

emp_id      The emp_id value from the input file.

Steps

1 Create a new job called EmployeesInfo.

2 Add three sequential stages and a transformer. Name the stages and links appropriately.

3 If necessary, import the column definitions and format of the DS314SvrEmployees.txt file into the DataStage Manager Sequential\Source folder.

4 Define the source stage.

5 Open up the transformer and define the columns for the EmployeeInfoOut and RejectOut links.

6 Define the derivation for RowNum.

− Hint: In the derivation cell for RowNum, scroll through the list of system variables.

7 Define the derivation for FullName.

− Use the concatenation operator to string together the appropriate string values from the input columns.

− Use the UPCASE function to convert the last name to uppercase. The syntax is: UPCASE(string)

− Use the substring function to extract the first character from the first name. The syntax is: string[start, length]


− Optional: Derive the correct string when the middle initial is missing. The IF THEN ELSE operator may be useful.
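As a sketch, a FullName derivation along these lines produces the “ACCORTI, P. M.” format (the link name EmployeesIn and the column names lname, fname, and minit are assumptions; use the names from your imported metadata):

    UPCASE(EmployeesIn.lname) : ", " : EmployeesIn.fname[1,1] : "." : (If TRIM(EmployeesIn.minit) = "" Then "" Else " " : EmployeesIn.minit[1,1] : ".")

The If Then Else clause handles the optional missing middle initial mentioned above.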

8 Define the derivation for Emp_Sex.

− In this derivation you will need to nest functions. For example UPCASE(TRIM(“ abcd”)[1,2]) = “AB” applies the substring operator to the string after it is trimmed of blanks. UPCASE converts the resulting string to uppercase.

− Use the TRIM function to remove the blanks before extracting the last character from the employee ID.

− Use the LEN function to determine the position of the last character for use in the substring function or, alternatively, use the RIGHT function. Syntax: RIGHT(string, length)

− Keep in mind that the character indicating the sex might be either lower or uppercase.

− Use the IF THEN ELSE operator to conditionally move the appropriate value to the target field.

9 Define the derivation for Emp_Prefix.
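Sketches for steps 8 and 9, under the same naming assumptions as the FullName example (EmployeesIn.emp_id as the input column):

− Emp_Sex: If UPCASE(RIGHT(TRIM(EmployeesIn.emp_id), 1)) = "F" Then "FEMALE" Else "MALE"

− Emp_Prefix: DOWNCASE(TRIM(EmployeesIn.emp_id)[1,3])

The Emp_Sex expression assumes the row has already passed the constraint in step 11, which is where you test that the last character is an “F” or “M” (in either case) and reject the row otherwise.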

10 Define the derivations for the columns in the RejectOut link.

11 Define the constraints.

− Define the constraint for the EmployeesInfoOut link. Define the other link as a reject link.

12 Define the two output stages.

13 Compile and run your job.

14 Verify that the data in the two output files is correct.


Exercise 8-2: Manipulating Dates

Task: Convert the hire date to a new format

In this task, you add a column to your EmployeesInfo output file to store the employee’s hire date in the format “MM/DD/YYYY”.

1 Save your EmployeesInfo job as EmployeesInfoHire.

2 Open the transformer and then add a new fixed-length string column to the EmployeeInfoOut link called HireDate to store the hire date.

3 Define the derivation for the new column.

− Examine the source file to determine the existing format of the hire dates.

− Use the Iconv and Oconv functions to convert the date to the new format.
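For example, if you find that the source stores hire dates as YYYY-MM-DD, a derivation along these lines performs the conversion (the conversion codes below are assumptions; choose the Iconv code that matches the format you actually find in the first bullet):

    Oconv(Iconv(EmployeesIn.hire_date, "D-YMD[4,2,2]"), "D/MDY[2,2,4]")

Iconv converts the external date string to DataStage’s internal date format; Oconv then formats the internal date as “MM/DD/YYYY”.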

4 Compile and run your job.

5 Verify that the output is correct.

Task: Define a new data element In this task, you define a new data element called DATE.STANDARD for date strings in the format “YY-MM-DD”. You will use this data element in the next task to define a transform that converts dates in the DATE.STANDARD format to the DATE.TAG format. In the subsequent task, you will use this transform in a derivation to determine the quarter an employee was hired.



1 Start Manager.

2 Select the Data Elements branch and then click File>New Data Element.

3 Define the element in category Custom\Dates.

4 Give a short description describing the format.

5 On the SQL Properties tab, select Char with precision 8.

Task: Define a Transform

In this task, you define a transform named STANDARD.TO.TAG that converts dates from the DATE.STANDARD to the DATE.TAG format.

1 Select the Transforms branch and then click File>New Transform to open the Transform window.

2 Store the new transform in the category Custom\Dates.

3 On the Details page define one new Visible Argument with the DATE.STANDARD data element.

4 Select a Target data element type of DATE.TAG.

5 In the Definition box, use a combination of the Iconv and Oconv functions to define the conversion.
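A definition along these lines should work, assuming you named the argument Arg1 (DATE.TAG dates have the format “YYYY-MM-DD”):

    Oconv(Iconv(Arg1, "D-YMD[2,2,2]"), "D-YMD[4,2,2]")

The inner Iconv reads the two-digit-year DATE.STANDARD string into an internal date; the outer Oconv writes it back out with a four-digit year.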

Task: Determine the quarter the employee was hired

In this task, you create a new job named EmployeesInfoHireQuarter that determines the quarter an employee was hired. To do this you add the following new column to the EmployeesInfo.txt file and then define the appropriate derivation.

• The QuarterHired column stores a QUARTER.TAG indicating the quarter during the year the employee was hired.

Steps

1 In Designer, save your EmployeesInfoHire job as EmployeesInfoHireQuarter.

2 Open up the transformer for your job and add the new column, QuarterHired, to your EmployeeInfoOut link.

− Select QUARTER.TAG as its data element.

3 Define the derivation for the QuarterHired column.

− Hint: The STANDARD.TO.TAG transform converts a DATE.STANDARD to a DATE.TAG, and the TAG.TO.QUARTER transform converts a DATE.TAG to a QUARTER.TAG.

4 Compile and run your job.


Exercise 9-1: Troubleshooting Problem

Assumptions: • There is a file named MyCustomer.dsx in your DS314SvrFiles directory.

• There is a file named Customers.txt in your DS314SvrFiles directory.

Task: Define an ODBC data source

Define an ODBC data source named DS314Svr for an MS Access database.

1 Open the ODBC Administrator and click the System DSN tab.

2 Click Add.

3 Select the MS Access driver and then click Finish.

4 Specify the data source name.

5 Click the Create button and create the database file in your DS314SvrFiles directory.

Task: Import MyCustomer.dsx

Import MyCustomer.dsx, which contains the CustomersToMyCust job, into your project and then open the job in Designer.

Task: Troubleshoot the CustomersToMyCust job

In this task you will troubleshoot the CustomersToMyCust job using the Director job log.

1 Compile the job.

2 Validate the job. View the job log. A warning message appears.

3 Determine what the problem is, apply a fix, and then try again. Continue until there are no more warning messages that occur during validation.

4 Clear the job log.

5 Run the job with a stop after 5 warning messages. Warning messages appear.


6 Determine what the problem is, apply a fix, and then try again. Continue until there are no more warning messages.

7 How many rows were read? How many rows were written to the target file? Does this indicate a problem? If so fix it.


Another Troubleshooting Problem

Assumptions: • There is a file named DS314SvrEmployeesCopy.dsx in your DS314SvrFiles directory.

• There is a file named DS314SvrEmployees2.txt in your DS314SvrFiles directory.

Task: Import DS314SvrEmployeesCopy.dsx

Import DS314SvrEmployeesCopy.dsx, which contains the EmployeesCopy job, into your project and then open the job in Designer.

Task: Troubleshoot the EmployeesCopy job

In this task you will troubleshoot the EmployeesCopy job using the Director job log. This job is supposed to copy the DS314SvrEmployees2.txt file exactly into the EmployeesCopies.txt file.

1 Compile the job. You should get compile errors. Fix the errors and then recompile.

2 Clear your job log.

3 Validate your job. The job should fail validation. Fix the errors and then revalidate.

4 Run your job. You should get some errors and warnings. Fix the errors and then rerun your job until it runs without errors and warning messages.

5 Open up the DataStage Director Monitor window with the links showing. Is there any information that indicates that the job failed to work as designed? If so, fix the problems and then rerun your job.


Exercise 9-2: Debugging a Job

Assumptions: • There is a file named CustomerAddressesByCountryDebug.dsx in your DS314SvrFiles directory.

• There is a file named Customers.txt in your DS314SvrFiles directory.

Task: Step through a job

In this task you will run the debugger on the CustomerAddressesByCountryDebug job to verify that it is working correctly.

1 Import the CustomerAddressesByCountryDebug.dsx job into your project.

2 Open the job in Designer and compile the job.

3 Set a breakpoint on the CustomersIn link.

4 Open the Debug Window and add a watch variable to the lower pane for the CustomersIn.Country.

5 Step some way through the job using Next Link to verify that rows that are read in are written to the correct links.

Task: Set conditional breakpoints

1 Stop your job.

2 Set breakpoints on the CustomersIn and MexicoCustomersOut links.

3 Open up the Edit Breakpoints window.

4 Set conditions on the two breakpoints so that they break when the customer’s country is Mexico.
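For example, a condition of the form CustomersIn.Country = "Mexico" on the CustomersIn breakpoint (the column name is assumed from the watch variable above) makes the debugger stop only on rows for Mexico; the condition on the MexicoCustomersOut breakpoint is analogous.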

5 Verify that each Mexico customer that is written to the CustomersIn link is also written to the MexicoCustomersOut link.

6 Stop your job and clear all breakpoints. Set a new breakpoint and breakpoint condition to verify that no USA customer is rejected.


Exercise 10-1: Defining Lookups

Assumptions: • There is a file named DS314SvrEmployees.txt and a file named Jobs.txt in your DS314SvrFiles directory.

• There is an ODBC data source named DS314SvrEmployees. This ODBC data source was defined in an earlier module exercise.

Task: Create and load a hashed file

In this task you create and load the JobsHashed file. You will use this hashed file in a later task to look up an employee’s title.

1 Create a new job called LoadHashed.

2 Add a sequential source stage and a hashed file target stage with a link between them.

3 Define the source stage to extract data from the Jobs.txt file.

4 Define the target Hashed File stage to load a hashed file named JobsHashed in your project.

− On the Inputs General tab specify that you want write caching enabled.

− Specify that you want the hashed file created by the job.

− Specify that you want the file cleared before writing.

− Use the column definitions that are pushed across the link from the source stage. Specify that job_id is the key.

5 Compile your job.

6 Validate your job. Verify from the job log that write caching is enabled.


7 Run your job.

8 Examine the data in the hashed file to verify that it is correct.


Task: Define a hashed file lookup

In this task, you design a job that adds job information to an employee’s record using a hashed file lookup.

1 Create a job called LookupJob. It extracts data from the DS314SvrEmployees.txt file and loads it into a file named EmployeeJobs.txt. This file is to have the following columns, with the specified derived values.

Column          Derivation

Emp_full_name   Full name of employee in format: “Art F. Walter”

Job_desc        Job description from the Jobs.txt file

Hire_date       In format: YYYY-MM-DD

Standard_Bonus  A standard bonus based on job_lvl: 10 * job_lvl

Max_Bonus       A bonus based on max_lvl: 10 * max_lvl

2 Create a rejects file called EmployeeJobsRejects.txt. It should contain the employee ID, job ID, and row number of any record with an invalid job ID, that is, a job ID that is not found in the JobsHashed file. These records should not be written to the EmployeeJobs.txt file.

3 Open the hashed file lookup stage.

− Specify that you want the hashed file preloaded in memory. You do this on the Outputs General tab. Preloading the hashed file in memory will speed up the lookup. You can specify the size of the cache on the Tunables tab in Administrator.

4 In the Transformer:

− Define the lookup key expression.

− Define the constraints. Specify the RejectsOut link as a reject link. Define the constraint for the EmployeesJobsOut link; remember that records should not be written out this link if the lookup fails, so your constraint must take this into account.

− Specify the derivations described earlier.
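As a sketch, the expressions might look like this, assuming the stream input link is named EmployeesIn and the hashed file reference link is named JobsHashedIn (substitute your own link names):

− Lookup key expression (on the reference link’s job_id key): EmployeesIn.job_id

− EmployeesJobsOut constraint: Not(JobsHashedIn.NOTFOUND)

− Standard_Bonus derivation: 10 * EmployeesIn.job_lvl

− Max_Bonus derivation: 10 * EmployeesIn.max_lvl

The NOTFOUND link variable is set when a reference lookup fails, so this constraint passes only rows whose job ID was found; rows that fail the lookup fall through to the RejectsOut reject link.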

5 Compile, run, test, and debug your job.

− What rows were rejected?

− Are job descriptions correctly added?


− Is the date formatted correctly in the target file?

Task: Replace your hashed file lookup with an ODBC lookup

− Create a job named LoadODBC that creates and loads an ODBC table named JobsODBC in the DS314SvrEmployees data source from the Jobs.txt text file.

− Save your LookupJob job as LookupJobODBC.

− Replace the reference lookup to the hashed file with a lookup to the JobsODBC table.


− Note: To preserve some of your definitions, first add the new stage, move the link, and then delete the original stage.


Exercise 11-1: Aggregating Data

Assumptions: • There is a file named DS314SvrSales_tab.txt in your DS314SvrFiles directory.

Task: Aggregate data

In this task you create a job called AggregateTotals that calculates the total sales for each store by year. The source data is from the DS314SvrSales_tab.txt sequential file. The data in this source file is sorted by stor_id, then ord_num, in ascending order.

1 Create a new job called AggregateTotals.

2 Add a sequential source stage, a Transformer stage, an Aggregator stage, and a sequential target stage, with links between them.

3 In the Transformer stage:

− Calculate the sales amount (qty * price) for each sale.

− Create a column in the CalcSalesOut link to store the year the sale was made. Name the column year_of_sale. The year is a 4-character string (YEAR.TAG) representing the year. Use a built-in transform to extract the year from the order date.
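Sketches of the two CalcSalesOut derivations (the link name SalesIn and the column names qty, price, and ord_date are assumptions; a built-in transform such as TAG.TO.YEAR extracts a YEAR.TAG from a DATE.TAG):

− sales_amount: SalesIn.qty * SalesIn.price

− year_of_sale: TAG.TO.YEAR(SalesIn.ord_date)

If the order dates are not already in DATE.TAG format (“YYYY-MM-DD”), convert them first with Iconv/Oconv as in Exercise 8-2.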

4 In the Aggregator stage:

− On the Inputs tab verify that columns for the store ID, sales amount, and year of sale have been input.

− On the Inputs tab specify how the input data is sorted.

− On the Outputs tab, add columns for the store ID, year, and total sales amount.

− Specify which column(s) to group by.


− Add derivations for each column.

5 Run your job.

6 Examine the data in the target file to verify that it is correct.


Exercise 12-1: Job Control

Assumptions: • You previously created the LoadHashed and LookupJob jobs in your project.

Task: Add job parameters

In this task you design a job with parameters. The job extracts rows from the Jobs.txt file based on the min_lvl value. The user can specify a range of min_lvl values and can specify the name of the target file.

1 Create a new job called JobDescriptions. It extracts data from the Jobs.txt file, has a transformer, and writes to a sequential target file.

2 Add the following three job parameters to your job, with suitable prompts and default values:

− MinimumJobLevel

− MaximumJobLevel

− TargetFileName

3 In the transformer, add a constraint that writes all job descriptions within the values entered for MinimumJobLevel and MaximumJobLevel.
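A sketch of such a constraint, assuming the input link is named JobsIn (job parameters are referenced by name inside expressions):

    JobsIn.min_lvl >= MinimumJobLevel And JobsIn.min_lvl <= MaximumJobLevel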

4 In the target stage, add the parameter TargetFileName as the file name.

5 Compile, validate, and run your job.

Task: Use a Before/After routine

In this task, you add a Before/After routine to your job to make a copy of your target file with a new name after it is created.


1 Use the ExecDos built-in Before/After routine along with the DOS command copy c:\DS314SvrFiles\<filename parameter> c:\DS314SvrFiles\Backup.txt to create a backup copy of the target file.

2 Compile, validate, and run your job.

− Verify that the backup copy was created and contains the correct data.

Task: Create a controlling job that writes a message to the log

In this task you create a job that writes a message to the log after running a second job.

1 Create a new job called LogAndRun.

2 Open up the Job Properties window to the Job control tab.

3 Add your JobDescriptions job to the edit box, specifying suitable parameters as prompted.

4 Insert a call to the DSLogInfo function after your job is run to display a message in the log (e.g., “Running job JobDescriptions.”). The syntax is as follows, where JobName is the name of your controlling job, LogAndRun:

CALL DSLogInfo(“message to display”, “JobName”)
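For orientation, the job control code that DataStage generates for step 3 looks roughly like the sketch below, with the DSLogInfo call from step 4 added after the run (the parameter values shown are assumptions; the generated code uses whatever values you entered):

    hJob = DSAttachJob("JobDescriptions", DSJ.ERRFATAL)
    ErrCode = DSSetParam(hJob, "MinimumJobLevel", "0")
    ErrCode = DSSetParam(hJob, "MaximumJobLevel", "100")
    ErrCode = DSSetParam(hJob, "TargetFileName", "JobDescriptions.txt")
    ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
    ErrCode = DSWaitForJob(hJob)
    Call DSLogInfo("Running job JobDescriptions.", "LogAndRun")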

5 Compile and run your job.

6 Verify that JobDescriptions has run and that your message is displayed in the job log.

Task: Create a job that runs two other jobs

In this task you create a job that first runs your LoadHashed job and then runs LookupJob.

1 Open up your LoadHashed job and save it as LoadHashedParam. Add a job parameter named HashFileName for passing the name of the hashed file for loading. In your target stage, use this parameter in place of the hashed file name.

− Compile and run your job.

− Verify that you can view the data using the new parameter.

2 Open up your LookupJob job and save it as LookupJobParam. Add the job parameter named HashFileName. In your Hashed File lookup stage, use this parameter in place of the hashed file name.

− Compile and run your job.

− Verify that you can view the data using the new parameter.

3 Create a new job named LoadAndLookupJobParam.

4 Open up the Job Properties window to the Job control tab.

5 Add the LoadHashedParam job and then the LookupJobParam job.

6 Run your new batch job. Monitor its progress in the job log. Verify that it is working correctly.

Task: Use a parameter in your controlling job

1 Add a parameter named cjHashedFileName to your controlling job.

2 On the Job Control tab, add the cjHashedFileName parameter in the DSSetParam functions in place of the values.

3 Run your controlling job. Test it with a new hashed file.

− View the data in the new hashed file to verify that it has been properly created and loaded.


Task: Use the DataStage Job Sequencer

In this task, you will accomplish the same thing as the previous task using the Job Sequencer.

1 In Designer, create a new Job Sequence named LoadAndLookupJobParamJC.

2 Click View>Repository.

3 Drag the LoadHashedParam and the LookupJobParam jobs from the Repository to the canvas.

− Give appropriate names to the Job Activity stages that are added to the canvas.

4 Add a link from the LoadHashedParam Job Activity stage to the LookupJobParam Job Activity stage.

5 Add a Description Annotation stage to your job. In it describe what the job does.

− Open the Job Properties General tab and verify that the description shows up there in the Short Job Description box.

6 Add a job parameter named cjHashedFileName to your job sequence. The procedure is the same as adding a parameter to a job.

7 Open up the Job Activity stages and insert your Job Sequence parameter in the Value Expression boxes next to each of the controlled jobs’ job parameters.

8 In the LoadHashedParam Job Activity stage, click the Triggers tab. Specify that the link is to be triggered only if the LoadHashedParam job runs OK.

9 Compile and run your Job Sequence.

− Examine the job log for the Job Sequence and also for each of the controlled jobs.

− Examine the data to verify that the job worked correctly.


Exercise 12-2: Using Containers

Task: Create a Shared Container

In this task you create a shared container that validates sales.

1 Import the GetValidSalesJC job into your project from the DS314SvrFiles directory. It is in a file named GetValidSalesJC.dsx.

2 Open the job in Designer.

3 Open up the transformer and examine the constraint. Notice that the constraint implements some simple range checking and record validation. Records that satisfy the constraint are written to the target file. Other records go down the reject link into the rejects file.

4 Compile and run the job. Verify that records with invalid record IDs are written to the rejects file.

5 Select both the transformer and rejects stages. Click Edit>Construct Container>Shared. Create a new shared container named ProcessSalesSC.


6 Save your job as GetValidSalesSC.

7 Compile and run your job, which now has the shared container in it.

8 In Designer, select the shared container in your job and click the right mouse button. Click Open.

9 View the data in the rejects file to verify that it contains the invalid records.

10 Click Window then select your job to close the shared container and return to your job.

11 Click View>Repository. Verify that your new shared container is in the Shared Containers branch.


Task: Use a Shared Container in a job

In this task, you read records from the Sales.txt file, process them through the shared container, and then write them to a file named ValidSales.txt.

1 Create a new job named GetSales.

2 Drag the Shared Container you created in the previous task from the Repository to the canvas.

3 Connect a Sequential stage to the input side of the Container.

4 Connect a Sequential stage to the output side of the Container. Provide appropriate names to the links and stages.

5 Edit the Sequential source stage so that it extracts data from the Sales.txt file.

6 Edit the Sequential target stage so that it loads data into a file named ValidSales.txt, which has the same format as the Sales.txt file.

7 Double-click on the Container.

− On the Inputs tab, map the SalesIn link to the container. Validate the mapping.

− On the Outputs tab, map the SalesOut link to the container. Validate the mapping.

8 Compile and run your job.

− Verify that no rows with store IDs less than or equal to 7000 are written to the target stage.


− Verify that rows with store IDs less than or equal to 7000 are written to the rejects file you specified.


Exercise 13-1: Working with Plug-Ins

Assumptions: • You have a copy of the DataStage installation CD.

Task: Install the Sort plug-in

• Note: If the Sort plug-in is already installed, choose another plug-in to install.

1 Close down any DataStage clients.

2 Place the DataStage Installation CD in a drive.

3 Start the DataStage Server installation and let DataStage shut down the Universe services. Setup will then search for installed components, search existing projects, and retrieve license details.

4 At the Welcome dialog click the Add Plugins button.

5 Select the Sort Plug-in and then click Next.

6 At the Setup Complete dialog click Finish. This will restart the Universe Services. You do not need to review the Release Notes.

7 Exit the master setup window.

8 Start Manager and verify that the Sort plug-in is installed.

Task: Sort records using the Sort plug-in

1 Create a job called SortEmployees with a sequential source and a sequential target. The data to sort is in the DS314SvrEmployees.txt file.

2 Add the Sort stage to your diagram.

3 Add links between the stages and name the links and stages appropriately.


4 Define two job parameters called FieldName (default emp_id) and SortType (default ascending). The former specifies the field to sort by; the latter specifies whether to sort in ascending or descending order.

5 Open up the sort stage and define the input and output link columns.

6 Click on the Stage Properties tab. Insert the job parameters to specify the field and sort order.
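
Note: Job parameters are referenced in stage properties using the #ParamName# syntax; DataStage substitutes the current values at run time. As a sketch (the exact property layout of the Sort plug-in may differ), a sort specification built from the two parameters might read:

   #FieldName# #SortType#

which, with the defaults above, resolves to emp_id ascending when the job runs.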

7 Compile, validate, and run your job. Select different fields to sort by and different sort orders. Verify that the output is correct.


Exercise 14-1: Scheduling and Reporting

Assumptions:
• The Windows NT Schedule service has been started.

Task: Schedule a job

1 In Director, schedule your SortEmployees job to run in 3 minutes and in 6 minutes. Have it sort the employees by emp_id in ascending order the first time and in descending order the second.

2 Schedule your SortEmployees job to run every Sunday at 2:30PM.

3 Click the Schedule button in the Director toolbar to view the schedule.

Task: Generate a job report

1 In Manager, click Tools>Reporting Assistant.

2 Click the Selection button, choose Jobs, and then select one or more of your jobs.

3 Verify that DSReporting is selected as the Target Schema DSN on the Update Options tab.

4 Click Update Now.

5 Click Doc. Tool to open the Reporting Assistant database.

6 Click the Jobs tab and select your job.

7 Click File>Print Preview. Optionally, print the report.


Exercise 15-1: Creating Row Buffered Jobs

Assumptions:
• There is a table named Orders in your DSN314Svr.mdb Access database.

Task: Create and test the row buffered job

In this task you create a job named SalesRowBuffered that extracts data from the Orders table of the DSN314Svr Access database and writes the data to a sequential file named Orders.txt.

You should run the job twice, once with row buffering enabled and once with buffering disabled. Record statistics from each job execution.

Steps

1 Log onto Administrator, click the Projects tab, select the DS314Svr project, and click the Properties button. Click the Tunables tab and verify that the Enable row buffering checkbox is unchecked.


2 Using DataStage Manager, import the Orders table meta data from the DSN314Svr ODBC data source into the ODBC\DSN314Svr category. The Orders table contains over 55,000 records and will provide enough processing time to give meaningful statistics.

3 Create a new server job called SalesRowBuffered.

4 Add one ODBC stage and a sequential stage. Name the stages and links appropriately.

5 Define the source stage. Go to the columns tab, load the meta data and select all columns.

6 Open up the sequential stage. The output directory should be C:\DS314SvrFiles and the file name Orders.txt.

7 Compile the job.

8 Log onto DataStage Director and execute the job.

9 While the job is running, click Tools>New Monitor.

10 Right-click on the stage name and verify that Show links is enabled.

11 After the job finishes, scroll to the right and note the rows/sec value. Record it in the row labelled "Disabled (1st run)" in the table below.

SalesRowBuffered Job Execution Statistics

Row Buffering        Rows/Sec
Disabled (1st run)   ________
Enabled (2nd run)    ________

12 Edit the job again in Designer, click the Job Properties icon, then the Performance tab. Enable in-process row buffering.


13 Click OK and recompile the job.

14 Test your job in Director, use the job monitor, and record the new rows/sec value.

15 Compare your results with other students.

NOTE: The job in this exercise consists of one process. Therefore, you do NOT use inter-process row buffering, since there are not multiple processes between which your job needs to communicate. However, using this option will not harm job performance in this instance.

Typical results:

Without row buffering: 9,230 rows/sec
With row buffering: 11,075 rows/sec

The test was run on a single-processor 800 MHz Dell Latitude with 500 MB of memory and row buffering set to default values.

Your results may differ.


Exercise 15-2: Using the IPC Stage

Assumptions:
• There is a table named Orders in your DSN314Svr.mdb Access database.

Task: Create IPC job

In this task you create a job named SalesIPC that extracts data from the Orders table of the DSN314Svr Access database and writes it to a sequential file named Orders.txt. However, you will introduce the IPC stage into this job, which forces multi-processing.

Record statistics from the job execution and compare the results to those obtained in exercise 15-1.

Steps

1 Using Director, execute the SalesRowBuffered job.

2 Invoke a new monitor (Tools>New Monitor).

3 Double-click on the first stage under Stagename. Note the value of User; this value identifies the NT process ID.

Note that this job has only one process.

4 Edit job SalesRowBuffered and save as SalesIPC.

Page 70: DS314SVR - pudn.comread.pudn.com/downloads155/ebook/688807/Student_Exercises.pdf · DataStage Essentials Exercise 4-1 • Overwrite the data in the output file • Include column

DataStage Essentials Exercise 15-2

5 Add one IPC stage between the ODBC and sequential stages. Name the stages and links appropriately. You may need to rebuild the meta data on the links.

6 Compile the job.

7 Log onto DataStage Director and execute the job.

8 Invoke a new monitor. You should see two Stagename entries.

9 Double-click on the first stage name and note the value for User. Do the same for the second stage name. Are the User values different? Different values indicate that the job is running as two separate NT processes.

10 After the job finishes, scroll to the right and note the rows/sec values. Record them in the row labelled "With IPC stage" in the table below; fill in the "Without IPC stage" row from your Exercise 15-1 results.

SalesIPC Job Execution Runs

IPC Stage            Rows/Sec
Without IPC stage    ________
With IPC stage       ________

11 The IPC stage adds overhead to a DataStage job. On a multi-processor machine this overhead is offset by the performance gain from running the stages simultaneously on multiple CPUs. Does the IPC stage improve performance on a single-processor machine? Probably not. If you have the opportunity to run this same test on a multi-processor machine, you should see a performance increase.


Typical results:

Note that you have two links and therefore get two performance figures.

IPC without row buffering: 9,230 / 9,230 rows/sec
IPC with in-process row buffering: 9,230 / 11,075 rows/sec
IPC with inter-process row buffering: 9,230 / 11,075 rows/sec

The test was run on a single-processor 800 MHz Dell Latitude with 500 MB of memory and row buffering set to default values.

Your results may differ.

Conclusion: If running on a single-processor machine, enable row buffering and do not bother with the IPC stage. However, these results will differ if you are executing on a multi-processor machine.


Exercise 15-3: Partitioning Data

Assumptions:
• There is a table named Orders in your DSN314Svr.mdb Access database.

Task: Create and test the data partitioned job

In this task you create a job named SalesPartitioned that extracts data from the Orders table of the DSN314Svr Access database and writes it to a sequential file named Orders.txt.

The job will create four partitions that will run concurrently.

Steps

1 Create a new server job called SalesPartitioned.

2 Add one ODBC stage and a sequential stage. Add four transformers, one partitioner, and one collector. Add links from the ODBC stage to the partitioner, between the partitioner and the transformers, between the transformers and the collector stage, and from the collector to the sequential stage. Name all stages and links appropriately.


3 Define the properties for the ODBC Sales source stage. Go to the columns tab, load the Orders meta data and select all columns.

4 Open up the sequential stage. The output directory should be C:\DS314SvrFiles and the file name Orders.txt. Go to the columns tab and load the Orders meta data.

5 Open the partitioner stage, click the Properties tab, and verify that round-robin is the selected algorithm.

6 For each transformer, right-click on the stage and use the Auto-Match columns option to push columns from input to output links.

7 Open the collector stage, click the Properties tab, and verify that round-robin is the selected collection algorithm.

8 Select job properties and click the Performance tab. Enable row buffering and select inter-process. Note that the default time-out value is 10 seconds.

9 Compile your job and test in Director.

10 If your job fails (and it probably will) with a "timeout waiting for mutex" error, return to the Performance tab in the Designer job properties and double the time-out value. Recompile the job and test again.

11 Use the job monitor to display rows/sec and record the values in the table below.

SalesPartitioned Job Execution Run

DataStage Link    Rows/Sec
1                 ________
2                 ________
3                 ________
4                 ________


Typical results:

Note that you have four transformers with four input links and four output links. You also have one output link from the ODBC stage and one input link into the sequential stage. The performance statistics will create a busy screen.

You are required to enable row buffering with inter-process because you have multiple processes running that must exchange data.

ODBC output link: 2,417 rows/sec

Data partitioner output: 629 rows/sec

Data collector input: 629 rows/sec

Data collector output: 2,637 rows/sec

The test was run on a single-processor 800 MHz Dell Latitude with 500 MB of memory and row buffering set to default values.

Your results may differ.

Conclusion: If running on a single-processor machine, enable row buffering and do not use data partitioning. However, these results will differ, and you will get much better performance, if you are executing on a multi-processor machine. These results also indicate that if your development environment differs significantly from your production environment, you should complete your tuning work by testing some jobs in the production environment.


Exercise Final: Putting it all together

Assumptions:
• Successful completion of the DS314Svr modules, including all exercises.

Overview: In this exercise you will create a series of jobs that will extract data from a set of database tables, transform and summarize the data, then load the results to target database tables. The final application will run several jobs in parallel, then run two more jobs serially. A master control job will manage the job execution network.

You will then apply job optimization techniques to maximize job performance.

Review previous modules in this course as needed to perform the detailed tasks.

Business Problem: Our company uses a datamart (a set of database tables) for product sales analysis. The datamart is used by business analysts to run a variety of ad hoc queries but is unsuitable for end users seeking simple bottom-line reports. These end users have requested a simple MS Access database table that provides summary sales information.

Solution: We have decided to build a pilot project to show end users that we can extract and reformat data to suit their needs and deliver this data on a regular schedule. To begin the project they would like two summary tables: one that gives them total sales by product and another table that gives them total sales by promotion.

We have decided to build an ETL system using DataStage to extract data from the datamart, flatten the data structures, summarize the data, and load it into database tables that can be easily displayed by our end users. The system will comprise the following components:

• Extraction jobs that unload the dimension tables from the datamart and load these to target hashed files.

• An extraction job that unloads sales data.

• A lookup job that flattens (denormalizes) the sales data.


• A summarize-and-load job that builds summary totals from the appropriate fields and loads the result to a target database table.

• Batch control jobs that manage the correct sequence of job executions.

Task: Create jobs for extract, transform, and load of target reporting file.

Steps

1 In a class discussion, review the overall structure of the application.

2 Review the layouts for all database tables in the DS314SvrFinal.mdb Access database.

3 Using DataStage Administrator create a new project named DS314SvrFinal.

4 Build an ODBC connection to DS314SvrFinal.mdb.

5 Import meta data from the Products, Stores, Timex, Promotions, and SalesFact tables.

6 Create four jobs that extract data from each of the dimension tables (Products, Stores, Timex, and Promotions) and load them into corresponding hashed files. Use the appropriate key field as the hashed file key. Use a parameter named DSN314SvrSource to point to the DSN314SvrFinal.mdb database.

7 Create one job that extracts data from the SalesFact table, performs lookups to the Products, Stores, Timex, and Promotions hashed files, and loads the results to a sequential file. Use a parameter named DSN314SvrSource to point to the DSN314SvrFinal.mdb database. Enter constraints to reject records that fail any of the hashed file lookups. The final sequential file should contain the following fields: from SalesFact, dollar_sales, unit_sales, dollar_cost, and customer_count; from Product, full_description, SKU_number, and brand; from Promotion, promotion_name, ad_type, and promo_cost; from Store, name, city, and store_state; and from Timex, datex, day_of_week, and fiscal_period.
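
Note: In a Server transformer, each reference (lookup) link exposes a NOTFOUND link variable that is true when the lookup fails. A constraint along these lines keeps only fully matched rows; this is a sketch, and the reference link names ProductsLkp, StoresLkp, TimexLkp, and PromosLkp are assumptions for illustration:

   Not(ProductsLkp.NOTFOUND) And Not(StoresLkp.NOTFOUND) And
   Not(TimexLkp.NOTFOUND) And Not(PromosLkp.NOTFOUND)

Mark another output link as the reject link to catch the rows that fail.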

8 Create one job that reads the sequential file created in the preceding step, sums the data on dollar_sales and groups by product name, then loads the result to a database table (you will create the table) named SumProduct in the DS314SvrFinal.mdb database. Use a parameter named DSN314SvrTarget to point to the DSN314SvrFinal.mdb database.

9 Create one job that reads the same sequential file, sums the data on dollar_sales and groups by promotion_name, then loads the results to a database table named SumPromo in the DS314SvrFinal.mdb database. Use a parameter named DSN314SvrTarget to point to the DSN314SvrFinal.mdb database.

10 Create a controlling job using the job sequencer that will run the hashed file load jobs in parallel.

11 Create a controlling job using the job sequencer that will run the sum table load jobs in parallel.

12 Create a master controlling job that will run the entire network (hashed file load jobs, lookup job, and sum table loads) serially.

13 After you successfully test your job network, schedule the Batch:master to run five minutes from the current time.
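
Note: The sequencer builds these control jobs visually, but the same network can also be written in DataStage BASIC job control code (Job Properties > Job control). The sketch below shows the shape of such a controller; the job names (LoadProducts, LoadStores, LoadTimex, LoadPromotions, LookupSales, SumProductLoad, SumPromoLoad) are assumptions standing in for whatever you named your jobs, and error handling is kept to a minimum:

   * Start the four hashed file load jobs in parallel
   Loaders = "LoadProducts,LoadStores,LoadTimex,LoadPromotions"
   Dim LoadHandle(4)
   For I = 1 To 4
      LoadHandle(I) = DSAttachJob(Field(Loaders, ",", I), DSJ.ERRFATAL)
      ErrCode = DSRunJob(LoadHandle(I), DSJ.RUNNORMAL)
   Next I

   * Wait for all four loads to finish before continuing
   For I = 1 To 4
      ErrCode = DSWaitForJob(LoadHandle(I))
      ErrCode = DSDetachJob(LoadHandle(I))
   Next I

   * Run the lookup job by itself
   hLookup = DSAttachJob("LookupSales", DSJ.ERRFATAL)
   ErrCode = DSRunJob(hLookup, DSJ.RUNNORMAL)
   ErrCode = DSWaitForJob(hLookup)
   ErrCode = DSDetachJob(hLookup)

   * Start the two summary load jobs in parallel, then wait for both
   hSumProd = DSAttachJob("SumProductLoad", DSJ.ERRFATAL)
   hSumPromo = DSAttachJob("SumPromoLoad", DSJ.ERRFATAL)
   ErrCode = DSRunJob(hSumProd, DSJ.RUNNORMAL)
   ErrCode = DSRunJob(hSumPromo, DSJ.RUNNORMAL)
   ErrCode = DSWaitForJob(hSumProd)
   ErrCode = DSWaitForJob(hSumPromo)
   ErrCode = DSDetachJob(hSumProd)
   ErrCode = DSDetachJob(hSumPromo)

A job sequence compiles to BASIC code of roughly this shape; writing it by hand is only necessary when you need logic the sequencer cannot express.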

Bonus Problem

The above solution used the field dollar_sales from the SalesFact table. This field represents gross revenue, the primary metric. If the end users instead want to view gross_profit (a new field calculated as dollar_sales minus dollar_cost), what changes would you make to this application?

Modify your jobs as appropriate.
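
One possible approach (a sketch; the input link name SalesIn is an assumption): in the lookup job's transformer, add a gross_profit output column whose derivation is

   SalesIn.dollar_sales - SalesIn.dollar_cost

then carry the new column through the sequential file and have the summary jobs sum gross_profit instead of dollar_sales.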
