Teradata utilities- Tpump

Teradata Utilities: TPump Reprinted for KV Satish Kumar, IBM [email protected] Reprinted with permission as a subscription benefit of Books24x7, http://www.books24x7.com/


Table of Contents

Chapter 5: TPump
    Overview
        Why it is Called "TPump"
    TPump Has Many Unbelievable Abilities
        TPump Has Some Limits
    Supported Input Formats
    TPump Commands and Parameters
    LOAD Parameters IN COMMON with MultiLoad
    .BEGIN LOAD Parameters UNIQUE to TPump
    TPUMP Example
    Creating a Flatfile for our Tpump Job to Utilize
    Creating a Tpump Script
    Executing the Tpump Script
    TPump Script with Error Treatment Options
    A TPump Script that Uses Two Input Data Files
    A TPump UPSERT Sample Script
    Monitoring TPump
    Handling Errors in TPump Using the Error Table
        One Error Table
    Common Error Codes and What They Mean
    RESTARTing TPump
    TPump and MultiLoad Comparison Chart


Chapter 5: TPump

"Diplomacy is the art of saying "Nice Doggie" until you can find a rock."– Will Rogers

Overview

The chemistry of relationships is very interesting. Frederick Buechner once stated, "My assumption is that the story of any one of us is in some measure the story of us all." In this chapter, you will find that TPump has similarities with the rest of the family of Teradata utilities. But this newer utility has been designed with fewer limitations and many distinguishing abilities that the other load utilities do not have.

Do you remember the first Swiss Army™ knife you ever owned? Aside from its original intent as a compact survival tool, this knife has thrilled generations with its multiple capabilities. TPump is the Swiss Army™ knife of the Teradata load utilities. Just as this knife was designed for small tasks, TPump was developed to handle batch loads with low volumes. And, just as the Swiss Army™ knife easily fits in your pocket when you are loaded down with gear, TPump is a perfect fit when you have a large, busy system with few resources to spare. Let's look in more detail at the many facets of this amazing load tool.

Why it is Called "TPump"

TPump is the shortened name for the load utility Teradata Parallel Data Pump. To understand this, you must know how the load utilities move the data. Both FastLoad and MultiLoad assemble massive volumes of data rows into 64K blocks and then move those blocks. Picture in your mind the way that huge ice blocks used to be floated down long rivers to large cities prior to the advent of refrigeration. There they were cut up and distributed to the people. TPump does NOT move data in large blocks. Instead, it loads data one row at a time, using row hash locks. Because it locks at this level, and not at the table level like MultiLoad, TPump can make many simultaneous, or concurrent, updates on a table.

Envision TPump as the water pump on a well. Pumping in a very slow, gentle manner results in a steady trickle of water that could be pumped into a cup. But strong and steady pumping results in a powerful stream of water that would require a larger container. TPump is a data pump which, like the water pump, may allow either a trickle-feed of data to flow into the warehouse or a strong and steady stream. In essence, you may "throttle" the flow of data based upon your system and business user requirements. Remember, TPump is THE PUMP!

TPump Has Many Unbelievable Abilities

Just in Time: Transactional systems, such as those implemented for ATM machines or Point-of-Sale terminals, are known for their tremendous speed in executing transactions. But how soon can you get the information pertaining to that transaction into the data warehouse? Can you afford to wait until a nightly batch load? If not, then TPump may be the utility that you are looking for! TPump allows the user to accomplish near real-time updates from source systems into the Teradata data warehouse.

Reprinted for [email protected], IBM Coffing Data Warehousing, Coffing Publishing (c) 2005, Copying Prohibited

Throttle-switch Capability: What about the throttle capability that was mentioned above? With TPump you may stipulate how many updates may occur per minute. This is also called the statement rate. In fact, you may change the statement rate during the job, "throttling up" the rate with a higher number, or "throttling down" the number of updates with a lower one. An example: Having this capability, you might want to throttle up the rate during the period from 12:00 noon to 1:30 PM when most of the users have gone to lunch. You could then lower the rate when they return and begin running their business queries. This way, you need not have such clearly defined load windows, as the other utilities require. You can have TPump running in the background all the time, and just control its flow rate.
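The lunch-hour example can be pictured as a small schedule that maps the time of day to a statement rate. The Python sketch below is only an illustration: the window boundaries come from the example above, but both rate values and the function name are hypothetical, and how a new rate is handed to a running TPump job is outside the scope of the sketch.

```python
from datetime import time

# Hypothetical schedule values for illustration; these are not TPump defaults.
LUNCH_WINDOW = (time(12, 0), time(13, 30))  # 12:00 noon to 1:30 PM
PEAK_RATE = 6000   # "throttled up" while most users are away
BASE_RATE = 1000   # "throttled down" while business queries are running


def statement_rate(now: time) -> int:
    """Return the statement rate (statements per minute) to request at `now`."""
    start, end = LUNCH_WINDOW
    return PEAK_RATE if start <= now < end else BASE_RATE


print(statement_rate(time(12, 45)))  # 6000, inside the lunch window
print(statement_rate(time(15, 0)))   # 1000, users are back at work
```

A scheduler could call such a function every few minutes and adjust the job only when the returned value changes.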

DML Functions: Like MultiLoad, TPump does DML functions, including INSERT, UPDATE and DELETE. These can be run solo, or in combination with one another. Note that it also supports UPSERTs like MultiLoad. But here is one place that TPump differs vastly from the other utilities: FastLoad can only load one table and MultiLoad can load five tables. But, when it pulls data from a single source, TPump can load more than 60 tables at a time! And the number of concurrent instances in such situations is unlimited. That's right, not 15, but unlimited for Teradata! Well OK, maybe limited by your computer. I cannot imagine my laptop running 20 TPumps, but Teradata does not care.

How could you use this ability? Well, imagine partitioning a huge table horizontally into multiple smaller tables and then performing various DML functions on all of them in parallel. Keep in mind that TPump places no limit on the number of jobs that may be established. Now, think of ways you might use this ability in your data warehouse environment. The possibilities are endless.

More benefits: Just when you think you have pulled out all of the options on a Swiss Army™ knife, there always seems to be just one more blade or tool you had not noticed. Similar to the knife, TPump always seems to have another advantage in its list of capabilities. Here are several that relate to TPump requirements for target tables. TPump allows both Unique and Non-Unique Secondary Indexes (USIs and NUSIs), unlike FastLoad, which allows neither, and MultiLoad, which allows just NUSIs. Like MultiLoad, TPump allows the target tables to either be empty or to be populated with data rows. Tables allowing duplicate rows (MULTISET tables) are allowed. Besides this, Referential Integrity is allowed and need not be dropped. As to the existence of Triggers, TPump says, "No problem!"

Support Environment compatibility: The Support Environment (SE) works in tandem with TPump to enable the operator to have even more control in the TPump load environment. The SE coordinates TPump activities, assists in managing the acquisition of files, and aids in the processing of conditions for loads. The Support Environment aids in the execution of DML and DDL that occur in Teradata, outside of the load utility.

Stopping without Repercussions: Finally, this utility can be stopped at any time and all of the locks may be dropped with no ill consequences. Is this too good to be true? Are there no limits to this load utility? TPump does not like to steal any thunder from the other load utilities, but it just might become one of the most valuable survival tools for businesses in today's data warehouse environment.

TPump Has Some Limits

TPump has rightfully earned its place as a superstar in the family of Teradata load utilities. But this does not mean that it has no limits. It has a few that we will list here for you:

Rule #1: No concatenation of input data files is allowed. TPump is not designed to support this.

Rule #2: TPump will not process aggregates, arithmetic functions or exponentiation. If you need data conversions or math, you might consider using an INMOD to prepare the data prior to loading it.

Rule #3: The use of the SELECT function is not allowed. You may not use SELECT in your SQL statements.

Rule #4: No more than four IMPORT commands may be used in a single load task. This means that at most, four files can be directly read in a single run.

Rule #5: Dates before 1900 or after 1999 must be represented by the yyyy format for the year portion of the date, not the default format of yy. This must be specified when you create the table. Any dates using the default yy format for the year are taken to mean 20th century years.

Rule #6: On some network attached systems, the maximum file size when using TPump is 2GB. This is true for a computer running under a 32-bit operating system.

Rule #7: TPump performance will be diminished if Access Logging is used. The reason for this is that TPump uses normal SQL to accomplish its tasks. Besides the extra overhead incurred, if you use Access Logging for successful table updates, then Teradata will make an entry in the Access Log table for each operation. This can cause the potential for row hash conflicts between the Access Log and the target tables.

Teradata Utilities: BTEQ, FastLoad, MultiLoad, TPump, and FastExport, Second Edition

Supported Input Formats

TPump, like MultiLoad, supports the following five format options: BINARY, FASTLOAD, TEXT, UNFORMAT and VARTEXT. But TPump is quite finicky when it comes to data format errors. Such errors will generally cause TPump to terminate. You have got to be careful! In fact, you may specify an Error Limit to keep TPump from terminating prematurely when faced with a data format error. You can specify a number (n) of errors that are to be tolerated before TPump will halt. Here is a data format chart for your reference:

BINARY
    Each record is a 2-byte integer, n, that is followed by n bytes of data. A byte is the smallest address space you can have in Teradata.

FASTLOAD
    This format is the same as Binary, plus a marker (X'0A' or X'0D') that specifies the end of the record.

TEXT
    Each record has a variable number of bytes and is followed by an end of the record marker.

UNFORMAT
    The format for these input records is defined in the LAYOUT statement of the TPump script using the components FIELD, FILLER and TABLE.

VARTEXT
    This is variable length text RECORD format separated by delimiters such as a comma. For this format you may only use VARCHAR, LONG VARCHAR (IBM) or VARBYTE data formats in your TPump LAYOUT. Note that two delimiter characters in a row denote a null value between them.

Figure 6-1
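To make two of these record formats concrete, here is a minimal Python sketch that walks a buffer of BINARY records (a 2-byte length n followed by n bytes of data) and splits a VARTEXT record so that two delimiters in a row yield a null. The function names are hypothetical. The byte order and the unsigned reading of the 2-byte length are assumptions that depend on the client platform that wrote the file.

```python
import struct
from typing import Iterator, List, Optional


def iter_binary_records(buf: bytes, byteorder: str = "<") -> Iterator[bytes]:
    """Yield the data portion of each BINARY-format record in `buf`.

    Each record is a 2-byte integer n followed by n bytes of data.
    `byteorder` ("<" little-endian, ">" big-endian) is an assumption
    about the platform that produced the file.
    """
    pos = 0
    while pos < len(buf):
        (n,) = struct.unpack_from(byteorder + "H", buf, pos)
        pos += 2
        record = buf[pos:pos + n]
        if len(record) != n:
            raise ValueError("truncated record at offset %d" % (pos - 2))
        yield record
        pos += n


def split_vartext(line: str, delimiter: str = ",") -> List[Optional[str]]:
    """Split a VARTEXT record; an empty field between delimiters is a null.

    Treating a leading or trailing empty field as null as well is a
    simplification made by this sketch.
    """
    return [field if field != "" else None for field in line.split(delimiter)]


data = struct.pack("<H", 3) + b"abc" + struct.pack("<H", 2) + b"de"
print(list(iter_binary_records(data)))   # [b'abc', b'de']
print(split_vartext("123,Smith,,FR"))    # ['123', 'Smith', None, 'FR']
```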

TPump Commands and Parameters

Each command in TPump must begin on a new line, preceded by a dot. It may utilize several lines, but must always end in a semi-colon. Like MultiLoad, TPump makes use of several optional parameters in the .BEGIN LOAD command. Some are the same ones used by MultiLoad. However, TPump has other parameters. Let's look at each group.

LOAD Parameters IN COMMON with MultiLoad


PARAMETER / WHAT IT DOES

ERRLIMIT errcount [errpercent]
    You may specify the maximum number of errors, or the percentage, that you will tolerate during the processing of a load job. The key point here is that you should set the ERRLIMIT to a number greater than the PACK number. The reason for this is that sometimes, if the PACK factor is a larger number than the ERRLIMIT, the job will terminate, telling you that you have gone over the ERRLIMIT. When this happens, there will be no entries in the error tables.

CHECKPOINT (n)
    In TPump, the CHECKPOINT refers to the number of minutes, or frequency, at which you wish a checkpoint to occur. This is unlike MultiLoad, which allows either minutes or the number of rows.

SESSIONS (n)
    This refers to the number of SESSIONS that should be established with Teradata. TPump places no limit on the number of SESSIONS you may have. For TPump, the optimal number of sessions is dependent on your needs and your host computer (like a laptop).

TENACITY
    Tells TPump how many hours to try logging on when less than the requested number of sessions is available.

SLEEP
    Tells TPump how frequently, in minutes, to try establishing additional sessions on the system.

Figure 6-2

.BEGIN LOAD Parameters UNIQUE to TPump

MACRODB <databasename>
    This parameter identifies a database that will contain any macros utilized by TPump. Remember, TPump does not run the SQL statements by itself. It places them into Macros and executes those Macros for efficiency.

NOMONITOR
    Use this parameter when you wish to keep TPump from checking either statement rates or update status information for the TPump Monitor application.

PACK (n)
    Use this to state the number of statements TPump will "pack" into a multiple-statement request. Multi-statement requests improve efficiency in either a network or channel environment because they use fewer sends and receives between the application and Teradata.

RATE
    This refers to the Statement Rate. It shows the initial maximum number of statements that will be sent per minute. A zero or no number at all means that the rate is unlimited. If the Statement Rate specified is less than the PACK number, then TPump will send requests that are smaller than the PACK number.

ROBUST ON/OFF
    ROBUST defines how TPump will conduct a RESTART. ROBUST ON means that one row is written to the Logtable for every SQL transaction. The downside of running TPump in ROBUST mode is that it incurs additional, and possibly unneeded, overhead. ON is the default.

    If you specify ROBUST OFF, you are telling TPump to utilize "simple" RESTART logic: Just start from the last successful CHECKPOINT. Be aware that if some statements are reprocessed, such as those processed after the last CHECKPOINT, then you may end up with extra rows in your error tables. Why? Because some of the statements in the original run may have found errors, in which case they would have recorded those errors in an error table.

SERIALIZE OFF/ON
    You only use the SERIALIZE parameter when you are going to specify a PRIMARY KEY in the .FIELD command. For example, ".FIELD Salaryrate * DECIMAL KEY." If you specify SERIALIZE, TPump will ensure that all operations on a row will occur serially. If you code "SERIALIZE", but do not specify ON or OFF, the default is ON. Otherwise, the default is OFF unless doing an UPSERT.

Figure 6-3
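Two of the interactions described in the parameter charts lend themselves to a quick sanity check before a job is submitted: ERRLIMIT should be greater than PACK, and a RATE below PACK produces requests smaller than the PACK number. The Python sketch below encodes only those two statements; the function name and the wording of the notes are hypothetical, and it is not part of TPump itself.

```python
from typing import List


def check_begin_load(errlimit: int, pack: int, rate: int = 0) -> List[str]:
    """Return advisory notes for a combination of .BEGIN LOAD parameters.

    A `rate` of 0 stands for "unlimited", as in the RATE description.
    """
    notes = []
    if errlimit <= pack:
        notes.append(
            "ERRLIMIT %d is not greater than PACK %d: the job may stop on "
            "ERRLIMIT with no entries in the error table" % (errlimit, pack)
        )
    if 0 < rate < pack:
        notes.append(
            "RATE %d is below PACK %d: requests will carry fewer than %d "
            "statements" % (rate, pack, pack)
        )
    return notes


for note in check_begin_load(errlimit=5, pack=40, rate=10):
    print(note)
```

An empty list means neither condition applies, for example `check_begin_load(50, 40, 1000)`.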

TPUMP Example

"Don't use a big word where a diminutive one will suffice."- Unknown

Don't use a big utility where TPump will suffice. TPump is great when you just want to trickle information into a table at all times. Think of it as a water hose filling up a bucket. Instead of filling the bucket up a glass of water at a time (FastLoad), we can just trickle the information in using a hose (TPump). The great thing about TPump is that, like a pump, we can trickle in data or we can fire hose it in. If users are not on the system then we want to crank up the fire hose. If users are on the system and many of them are accessing a table we should trickle in the rows.

For our TPUMP exercise, let's create an empty table:

Now execute the script:

Creating a Flatfile for our Tpump Job to Utilize

"In order to be irreplaceable one must always be different."– Coco Chanel

TPump is irreplaceable because no other utility works like it. TPump can also use flat files to populate a table. While the script is somewhat different compared to other utilities, TPump's structure isn't completely foreign.

Let's create our flat file to populate our empty table

Now we can use the flat file to populate our table:

Creating a Tpump Script

"Acting is all about honesty. If you can fake that, you've got it made."– George Burns

George Burns wasn't a big fan of Teradata, because there's no way that one could fake his or her way through a TPump script. The following two slides show a basic TPump script and point out the important parts of that script:

Let's create our TPUMP script


Executing the Tpump Script

"Unless you believe, you will not understand."– Saint Augustine

After running through these utility exercises, Teradata is destined to make you, the reader, a believer. Utilities such as TPump are very hard to grasp at first. But if you believe utilities work and continue to analyze them, enlightenment is just around the corner.

Executing our new TPUMP script

Let's check out our new table:

Much of the TPump command structure should look quite familiar to you. It is quite similar to MultiLoad. In this example, the Student_Names table is being loaded with new data from the university's registrar. It will be used as an associative table for linking various tables in the data warehouse.

    /* This script inserts rows into a table called
       student_names from a single file */

    .LOGTABLE WORK_DB.LOG_PUMP;
    .RUN FILE C:\mydir\logon.txt;
    DATABASE SQL01;

Sets up a Logtable and then logs on with .RUN. The logon.txt file contains: .logon TDATA/SQL01,SQL01; Also specifies the database to find the necessary tables.

    .BEGIN LOAD
        ERRLIMIT 5
        CHECKPOINT 1
        SESSIONS 64
        TENACITY 2
        PACK 40
        RATE 1000
        ERRORTABLE SQL01.ERR_PUMP;

Begins the Load Process; specifies optional parameters. Names the error table for this run.

    .LAYOUT FILELAYOUT;
    .FIELD Student_ID * INTEGER;
    .FIELD Last_Name * CHAR(20);
    .FILLER More_Junk * CHAR(20);
    .FIELD First_Name * CHAR(14);

    /* start comment - this could also be coded as:
    .FIELD Student_ID * INTEGER;
    .FIELD Last_Name * CHAR(20);
    .FIELD First_Name 45 CHAR(14);
    end of the comment */

Names the LAYOUT of the INPUT record. Notice the dots before the .FIELD and .FILLER commands and the semi-colons after each FIELD definition. Also, the More_Junk field moves the field pointer to the start of the First_Name data. Notice the comment in the script.

    .DML LABEL INSREC;
    INSERT INTO SQL01.Student_Names
    ( Student_ID
     ,Last_Name
     ,First_Name )
    VALUES
    ( :Student_ID
     ,:Last_Name
     ,:First_Name );

Names the DML Label. Tells TPump to INSERT a row into the target table and defines the row format. Comma separators are placed in front of the following column or value for easier debugging. Lists, in order, the VALUES to be INSERTed. Colons precede VALUEs.

    .IMPORT INFILE CDW_import.txt
        FORMAT TEXT
        LAYOUT FILELAYOUT
        APPLY INSREC;

Names the IMPORT file; names the LAYOUT to be called from above; tells TPump which DML Label to APPLY.

    .END LOAD;
    .LOGOFF;

Tells TPump to stop loading and logs off all sessions.


Figure 6-4

Step One: Setting up a Logtable and Logging onto Teradata. First, you define the Logtable using the .LOGTABLE command. We have named it LOG_PUMP in the WORK_DB database. The Logtable is automatically created for you. It may be placed in any database by qualifying the table name with the name of the database by using syntax like this: <databasename>.<tablename>

Next, the connection is made to Teradata. Notice that the commands in TPump, like those in MultiLoad, require a dot in front of the command key word.

Step Two: Begin load process, add parameters, naming the Error Table. Here, the script reveals the parameters requested by the user to assist in managing the load for smooth operation. It also names the one error table, calling it SQL01.ERR_PUMP. Now let's look at each parameter:

• ERRLIMIT 5 says that the job should terminate after encountering five errors. You may set the limit that is tolerable for the load.

• CHECKPOINT 1 tells TPump to pause and evaluate the progress of the load in increments of one minute.

• SESSIONS 64 tells TPump to establish 64 sessions with Teradata.

• TENACITY 2 says that if there is any problem establishing sessions, then to keep on trying for a period of two hours.

• PACK 40 tells TPump to "pack" 40 statements into each multiple-statement request. With the single INSERT in this script, that is 40 data rows loaded at one time.

• RATE 1000 means that at most 1,000 statements, here 1,000 data rows, will be sent per minute.
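As a rough check on how these values interact, the arithmetic can be sketched in Python. This is a simplified model built only from the parameter descriptions (RATE as statements per minute, PACK as statements per request, CHECKPOINT in minutes); the function names are hypothetical, and a real job will rarely run at exactly the maximum rate.

```python
def requests_per_minute(rate: int, pack: int) -> int:
    """Multi-statement requests needed per minute to send `rate` statements."""
    if rate <= 0:
        raise ValueError("a RATE of zero means unlimited; nothing to compute")
    per_request = min(pack, rate)          # a RATE below PACK shrinks the request
    return (rate + per_request - 1) // per_request  # integer ceiling


def statements_per_checkpoint(rate: int, checkpoint_minutes: int) -> int:
    """Upper bound on statements sent between two checkpoints."""
    return rate * checkpoint_minutes


print(requests_per_minute(1000, 40))       # 25
print(statements_per_checkpoint(1000, 1))  # 1000
```

With PACK 40 and RATE 1000, that is about 25 requests of 40 statements each per minute, and with CHECKPOINT 1 at most 1,000 statements lie between two checkpoints.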

Step Three: Defining the INPUT flat file structure. TPump, like MultiLoad, needs to know the structure of the INPUT flat file record. You use the .LAYOUT command to name the layout. Following that, you list the columns and data types of the INPUT file using the .FIELD, .FILLER or .TABLE commands. Did you notice that an asterisk is placed between the column name and its data type? This means to automatically calculate the next byte in the record. It is used to designate the starting location for this data based on the previous field's length. If you are listing fields in order and need to skip a few bytes in the record, you can either use a character .FILLER with the correct number of bytes to position the cursor at the next field, or the "*" can be replaced by a number that equals the lengths of all previous fields added together plus 1 extra byte. When you use this technique, the .FILLER is not needed. In our example, this says to begin with Student_ID, continue on to load Last_Name, and finish when First_Name is loaded.
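The position arithmetic behind the asterisk can be reproduced in a few lines of Python. The sketch assumes a 4-byte INTEGER and one byte per CHAR position, and the tuple layout and function name are hypothetical. Under those assumptions it arrives at 45 for First_Name, the same explicit position shown in the commented alternative of the script.

```python
from typing import List, Tuple

# (name, length in bytes, is_filler). Lengths assume a 4-byte INTEGER and
# one byte per CHAR position, matching the example layout in the script.
LAYOUT = [
    ("Student_ID", 4, False),
    ("Last_Name", 20, False),
    ("More_Junk", 20, True),   # the .FILLER that is skipped
    ("First_Name", 14, False),
]


def start_positions(layout) -> List[Tuple[str, int]]:
    """Return (field name, 1-based start position) for every non-filler field."""
    positions = []
    next_start = 1  # what "*" resolves to for the first field
    for name, length, is_filler in layout:
        if not is_filler:
            positions.append((name, next_start))
        next_start += length  # lengths of all previous fields, plus 1
    return positions


print(start_positions(LAYOUT))
# [('Student_ID', 1), ('Last_Name', 5), ('First_Name', 45)]
```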

Step Four: Defining the DML activities to occur. At this point, the .DML LABEL names and defines the SQL that is to execute. It also names the columns receiving data and defines the sequence in which the VALUES are to be arranged. In our example, TPump is to INSERT a row into the SQL01.Student_Names table. The data values coming in from the record are named in the VALUES with a colon prior to the name. This provides the PE (Parsing Engine) with information on what substitution is to take place in the SQL. Each LABEL used must also be referenced in an APPLY clause of the .IMPORT command.

Step Five: Naming the INPUT file and defining its FORMAT. Using the .IMPORT INFILE command, we have identified the INPUT data file as "CDW_import.txt". The file was created using the TEXT format.


Step Six: Associate the data with the description. Next, we told the IMPORT command to use the LAYOUT called "FILELAYOUT."

Step Seven: Telling TPump to start loading. Finally, we told TPump to APPLY the DML LABEL called INSREC, that is, to INSERT the data rows into the target table.

Step Eight: Finishing loading and logging off of Teradata. The .END LOAD command tells TPump to finish the load process. Finally, TPump logs off of the Teradata system.

TPump Script with Error Treatment Options

    /* !/bin/ksh* */

Load with a Shell Script

    /* ++++++++++++++++++++++++++++++++++ */
    /* TPUMP SCRIPT - CDW */
    /*This script loads SQL01.Student_Profile4 */
    /* Version 1.1 */
    /* Created by Coffing Data Warehousing */
    /* +++++++++++++++++++++++++++++++++++++ */

Names and describes the purpose of the script; names the author.

    /* Setup the TPUMP Logtables, Logon Statements and Database Default */
    .LOGTABLE SQL01.LOG_PUMP;
    .LOGON CDW/SQL01,SQL01;
    DATABASE SQL01;

Sets up a Logtable and then logs on to Teradata. Specifies the database containing the table.

    /* Begin Load and Define TPUMP Parameters and Error Tables */
    .BEGIN LOAD
        ERRLIMIT 5
        CHECKPOINT 1
        SESSIONS 1
        TENACITY 2
        PACK 40
        RATE 1000
        ERRORTABLE SQL01.ERR_PUMP;

BEGINS THE LOAD PROCESS. SPECIFIES MULTIPLE PARAMETERS TO AID IN PROCESS CONTROL. NAMES THE ERROR TABLE; TPump HAS ONLY ONE ERROR TABLE.

    .LAYOUT FILELAYOUT;
    .FIELD Student_ID * VARCHAR (11);
    .FIELD Last_Name * VARCHAR (20);
    .FIELD First_Name * VARCHAR (14);
    .FIELD Class_Code * VARCHAR (2);
    .FIELD Grade_Pt * VARCHAR (8);

Names the LAYOUT of the INPUT file. Defines the structure of the INPUT file; here, all fields are Variable CHARACTER data and the file has a comma delimiter. See .IMPORT below for the file type and the declaration of the delimiter.

    .DML LABEL INSREC
        IGNORE DUPLICATE ROWS
        IGNORE MISSING ROWS
        IGNORE EXTRA ROWS;
    INSERT INTO Student_Profile4
    ( Student_ID
     ,Last_Name
     ,First_Name
     ,Class_Code
     ,Grade_Pt )
    VALUES
    ( :Student_ID
     ,:Last_Name
     ,:First_Name
     ,:Class_Code
     ,:Grade_Pt );

Names the DML Label; SPECIFIES 3 ERROR TREATMENT OPTIONS with the ; after the last option. Tells TPump to INSERT a row into the target table and defines the row format. Note that we place comma separators in front of the following column or value for easier debugging. Lists, in order, the VALUES to be INSERTed. A colon always precedes values.

    .IMPORT INFILE Cdw_import.txt
        FORMAT VARTEXT ","
        LAYOUT FILELAYOUT
        APPLY INSREC;

Names the IMPORT file; names the LAYOUT to be called from above; tells TPump which DML Label to APPLY. Notice the FORMAT with a comma in the quotes to define the delimiter between fields in the input record.

    .END LOAD;
    .LOGOFF;

Tells TPump to stop loading and logs off all sessions.

Figure 6-5

A TPump Script that Uses Two Input Data Files

/* !/bin/ksh* */Load Runs from a ShellScript

/* ++++++++++++++++++++++++++++++++++ *//* TPUMP SCRIPT using 2 Input Files – CDW *//* It loads STUDT_CONTACT Target Table – CDW *//*This script loads SQL01. Student_Profile3 *//* Version 1.1 *//* Created by Coffing Data Warehousing *//* ++++++++++++++++++++++++++++++++++++++++ */

Names and describesthe purpose of thescript; names theauthor.

.LOGTABLE SQL01.LOG_TPMP;

.LOGON CDW/SQL01,SQL01;

DATABASE SQL01;

Sets Up a Logtable andthen logs on toTeradata.

Specifies the databaseto work in (optional).

.BEGIN LOADERRLIMIT 5CHECKPOINT 1SESSIONS 1TENACITY 2PACK 40RATE 1000ERRORTABLE WORK_DB.ERR_TPMP ;

Begins the load process

Specifies multipleparameters to aid inload management

Names the error table;TPump HAS ONLY ONEERROR TABLE PER

Teradata Utilities: BTEQ, FastLoad, MultiLoad, TPump, and FastExport, Second Edition 13

Reprinted for [email protected], IBM Coffing Data Warehousing, Coffing Publishing (c) 2005, Copying Prohibited

Page 16: Teradata utilities- Tpump

TARGET TABLE

.LAYOUT REC_LAYOUT1 INDICATORS;.FIELD Student_ID * INTEGER;.FIELD Last_name * CHAR(20);.FIELD First_name * VARCHAR(14);.FIELD Class_code * CHAR(2);.FIELD Grade_Pt * DECIMAL(8,2);

Defines the LAYOUT forthe 1st INPUT file alsohas the indicators forNULL data.

.LAYOUT REC_LAYOUT2;.FILLER Rec_Type * CHAR(1);.FIELD Last_name * CHAR(20);.FIELD First_name * VARCHAR(14);.FIELD Student_ID * INTEGER;.FIELD Class_code * CHAR(2);.FIELD Grade_Pt * DECIMAL(8,2);

Defines the LAYOUT forthe 2nd INPUT file with adifferent arrangement offields

.DML LABEL INSREC1IGNORE DUPLICATE ROWSIGNORE EXTRA ROWS;

INSERT INTO Student_Profile_OLD( Student_ID

,Last_Name ,First_Name ,Class_Code ,Grade_Pt )VALUES

( :Student_ID,:Last_Name,:First_Name,:Class_Code,: Grade_Pt );

Names the 1st DMLLabel and specifies 2Error Treatmentoptions.

Tells TPump to INSERTa row into the targettable and defines therow format.

Lists, in order, theVALUES to beINSERTed. A colonalways precedesvalues.

.DML LABEL INSREC2IGNORE DUPLICATE ROWS;

INSERT INTO Student_Profile_NEW( Student_ID

,Last_Name ,First_Name ,Class_Code ,Grade_Pt )VALUES

(:Student_ID,:Last_Name,:First_Name,:Class_Code,:Grade_Pt );

Names the 2nd DMLLabel and specifies 1Error Treatmentoptions.

Tells TPump to INSERTa row into the targettable and defines therow format.

Lists, in order, theVALUES to beINSERTed. A colonalways precedesvalues.

.IMPORT INFILE FILE-REC1.DATFORMAT FASTLOADLAYOUT REC_LAYOUT1APPLY INSREC1;

.IMPORT INFILE FILE-REC2.DATFORMAT TEXTLAYOUT REC_LAYOUT2APPLY INSREC2 ;

Names the TWO ImportFiles as FILE-REC1.DATand FILE-REC2.DAT.The file name is underWindows so the "-"isfine.

Names the TWOLayouts that define thestructure of the INPUTDATA files;

Names the TWO INPUTdata files

.END LOAD;

Teradata Utilities: BTEQ, FastLoad, MultiLoad, TPump, and FastExport, Second Edition 14

Reprinted for [email protected], IBM Coffing Data Warehousing, Coffing Publishing (c) 2005, Copying Prohibited

Page 17: Teradata utilities- Tpump

.LOGOFF; Tells TPump to stoploading and logs off allsessions.

Figure 6-7
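The effect of the two layouts in Figure 6-7 can be pictured with a small Python sketch. It is only an illustration of the idea, not how TPump is implemented: the `bind` helper and the sample values are hypothetical, while the field names and their order come from REC_LAYOUT1 and REC_LAYOUT2 above. Each layout attaches names to positions in its own file, so both files end up supplying the same named values to their INSERT statements even though the fields arrive in a different order.

```python
# Field order taken from the two .LAYOUT definitions in Figure 6-7.
REC_LAYOUT1 = ["Student_ID", "Last_name", "First_name", "Class_code", "Grade_Pt"]
# None stands for the .FILLER position (Rec_Type): it is read from the
# record but is not made available to the DML as a value.
REC_LAYOUT2 = [None, "Last_name", "First_name", "Student_ID", "Class_code", "Grade_Pt"]


def bind(layout, values):
    """Hypothetical helper: pair each positional value with its field name."""
    return {name: value for name, value in zip(layout, values) if name is not None}


# Made-up sample records, one per input file.
from_file1 = bind(REC_LAYOUT1, [1001, "Smith", "Ann", "FR", 3.10])
from_file2 = bind(REC_LAYOUT2, ["A", "Smith", "Ann", 1001, "FR", 3.10])

# Both layouts yield the same named values despite the different field order.
print(from_file1 == from_file2)
```

Because the DML refers to fields by name (:Student_ID, :Last_Name, and so on), the INSERT statements do not need to know which arrangement the input file used.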

A TPump UPSERT Sample Script

/* this is an UPSERT TPump script */
.LOGTABLE SQL01.CDW_LOG;
.LOGON CDW/SQL01,SQL01;

Sets up a Logtable and then logs on to Teradata.

.BEGIN LOAD
   ERRLIMIT 5
   CHECKPOINT 10
   SESSIONS 10
   TENACITY 2
   PACK 10
   RATE 10
   ERRORTABLE SQL01.SWA_ET;

Begins the load process.

Specifies multiple parameters to aid in load management.

Names the error table; TPump HAS ONLY ONE ERROR TABLE PER TARGET TABLE.

.LAYOUT INREC INDICATORS;
.FIELD StudentID * INTEGER;
.FIELD Last_name * CHAR(20);
.FIELD First_name * VARCHAR(14);
.FIELD Class_code * CHAR(2);
.FIELD Grade_Pt * DECIMAL(8,2);

Defines the LAYOUT for the INPUT file; also has the indicators for NULL data.

.DML LABEL UPSERTER
   DO INSERT FOR MISSING UPDATE ROWS;

UPDATE Student_Profile
SET Last_Name = :Last_Name
   ,First_Name = :First_Name
   ,Class_Code = :Class_Code
   ,Grade_Pt = :Grade_Pt
WHERE Student_ID = :StudentID;

INSERT INTO Student_Profile
VALUES ( :StudentID
        ,:Last_Name
        ,:First_Name
        ,:Class_Code
        ,:Grade_Pt );

Names the DML Label and specifies DO INSERT FOR MISSING UPDATE ROWS, the option that turns the UPDATE and INSERT pair into an UPSERT.

Tells TPump to UPDATE the row in the target table whose Student_ID matches the input record.

Tells TPump to INSERT the row when the UPDATE finds no row to change. Lists, in order, the VALUES to be INSERTed. A colon always precedes values.

.IMPORT INFILE UPSERT-FILE.DAT
   FORMAT FASTLOAD
   LAYOUT INREC
   APPLY UPSERTER;

Names the Import File as UPSERT-FILE.DAT. The file name is under Windows so the "-" is fine.

The file type is FASTLOAD.

.END LOAD;

.LOGOFF;

Tells TPump to stop loading and logs off all sessions.

Figure 6-8

NOTE: The above UPSERT uses the same syntax as MultiLoad. This continues to work. However, there might soon be another way to accomplish this task. NCR has built an UPSERT and we have tested the following statement, without success:

UPDATE SQL01.Student_Profile
SET Last_Name = :Last_Name
   ,First_Name = :First_Name
   ,Class_Code = :Class_Code
   ,Grade_Pt = :Grade_Pt
WHERE Student_ID = :Student_ID;
ELSE INSERT INTO SQL01.Student_Profile
VALUES ( :Student_ID
        ,:Last_Name
        ,:First_Name
        ,:Class_Code
        ,:Grade_Pt );

We are not sure if this will be a future technique for coding a TPump UPSERT, or if it is handled internally. For now, use the original coding technique.
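The UPSERT behavior requested by DO INSERT FOR MISSING UPDATE ROWS can be sketched in a few lines of Python. This is a simplified model and not TPump's internal logic: the in-memory dictionary standing in for Student_Profile, the `upsert` function, and the sample rows are all hypothetical. It shows the rule the script relies on, which is that the UPDATE is tried first and the INSERT runs only when the UPDATE changed no rows.

```python
def upsert(table, record):
    """Try the UPDATE first; INSERT only when the UPDATE touched no rows."""
    student_id = record["Student_ID"]
    # Activity count: how many rows the UPDATE's WHERE clause matched.
    activity_count = 1 if student_id in table else 0
    if activity_count:
        table[student_id].update(record)  # UPDATE ... WHERE Student_ID = :StudentID
        return "UPDATE"
    table[student_id] = dict(record)      # the INSERT for the missing update row
    return "INSERT"


# Hypothetical stand-in for the Student_Profile table, keyed by Student_ID.
table = {1001: {"Student_ID": 1001, "Last_Name": "Smith", "Grade_Pt": 2.5}}

first = upsert(table, {"Student_ID": 1001, "Last_Name": "Smith", "Grade_Pt": 3.1})
second = upsert(table, {"Student_ID": 2002, "Last_Name": "Jones", "Grade_Pt": 3.8})
print(first, second, len(table))
```

The first record matches an existing Student_ID and is applied as an UPDATE; the second has no matching row, so it is INSERTed.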

Monitoring TPump

TPump comes with a monitoring tool called the TPump Monitor. This tool allows you to check the status of TPump jobs as they run and to change the statement rate on the fly (remember "throttle up" and "throttle down"?). Key to this monitor is the "SysAdmin.TpumpStatusTbl" table in the Data Dictionary Directory. If your Database Administrator creates this table, TPump will update it on a minute-by-minute basis when it is running. You may update the table to change the statement rate for an IMPORT. If you want TPump to run unmonitored, then the table is not needed.

You can start a monitor program under UNIX with the following command:

tpumpmon [-h] [TDPID/] <UserName>,<Password> [,<AccountID>]

Below is a chart that shows the Views and Macros used to access the "SysAdmin.TpumpStatusTbl" table. Queries may be written against the Views. The macros may be executed.

Views and Macros to access the table SysAdmin.TpumpStatusTbl

View     SysAdmin.TPumpStatus
View     SysAdmin.TPumpStatusX
Macro    Sysadmin.TPumpUpdateSelect
Macro    TPumpMacro.UserUpdateSelect

Figure 6-9

Handling Errors in TPump Using the Error Table

One Error Table

Unlike FastLoad and MultiLoad, TPump uses only ONE Error Table per target table, not two. If you name the table, TPump will create it automatically. Entries are made to this table whenever errors occur during the load process. Like MultiLoad, TPump offers the option to either MARK errors (include them in the error table) or IGNORE errors (pay no attention to them whatsoever). These options are listed in the .DML LABEL sections of the script and apply ONLY to the DML functions in that LABEL. The general default is to MARK. If you specify nothing, TPump will assume the default. When doing an UPSERT, this default does not apply.

The error table does the following:

• Identifies errors

• Provides some detail about the errors

• Stores a portion of the actual offending row for debugging

When compared to the error tables in MultiLoad, the TPump error table is most similar to the MultiLoad Acquisition error table. Like that table, it stores information about errors that take place while TPump is trying to acquire data. TPump reports errors that occur while the data is being moved, such as data translation problems. It also reports any difficulties compiling valid Primary Indexes. Remember, TPump has less tolerance for errors than FastLoad or MultiLoad.

COLUMNS IN THE TPUMP ERROR TABLE

ImportSeq   Sequence number that identifies the IMPORT command where the error occurred
DMLSeq      Sequence number for the DML statement involved with the error
SMTSeq      Sequence number of the DML statement being carried out when the error was discovered
ApplySeq    Sequence number that tells which APPLY clause was running when the error occurred
SourceSeq   The number of the data row in the client file that was being built when the error took place
DataSeq     Identifies the INPUT data source where the error row came from
ErrorCode   System code that identifies the error
ErrorMsg    Generic description of the error
ErrorField  Number of the column in the target table where the error happened; is left blank if the offending column cannot be identified. This is different from MultiLoad, which supplies the column name.
HostData    The data row that contains the error, limited to the first 63,728 bytes related to the error

Figure 6-10
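One way to put these columns to work after a load is to tally the error rows by ErrorCode and note which input rows (SourceSeq) were affected. The Python sketch below assumes the rows have already been fetched from the error table by whatever client you use. The `error_rows` values are made up for illustration and the `summarize` helper is hypothetical; only the column names and the two message texts come from this chapter.

```python
from collections import defaultdict

# Hypothetical rows, shaped like the result of selecting three columns
# from the TPump error table. The values are invented for illustration.
error_rows = [
    {"ErrorCode": 2816, "SourceSeq": 14,
     "ErrorMsg": "Failed to insert duplicate row into TPump Target Table."},
    {"ErrorCode": 2818, "SourceSeq": 27,
     "ErrorMsg": "Activity count zero for TPump UPDATE or DELETE."},
    {"ErrorCode": 2816, "SourceSeq": 31,
     "ErrorMsg": "Failed to insert duplicate row into TPump Target Table."},
]


def summarize(rows):
    """Group the input row numbers (SourceSeq) under each ErrorCode."""
    by_code = defaultdict(list)
    for row in rows:
        by_code[row["ErrorCode"]].append(row["SourceSeq"])
    return dict(by_code)


summary = summarize(error_rows)
for code in sorted(summary):
    print(code, len(summary[code]), "row(s), SourceSeq:", summary[code])
```

Grouping by ErrorCode first makes it easy to separate expected conditions, such as duplicate rows, from problems that need a closer look at the offending input rows.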

Common Error Codes and What They Mean

TPump users often encounter three error codes that pertain to:

• Missing data rows

• Duplicate data rows

• Extra data rows

Become familiar with these error codes and what they mean. This could save you time getting to the root of some common errors you could see in your future!

#1: Error 2816: Failed to insert duplicate row into TPump Target Table.

Nothing is wrong when you see this error. In fact, it can be a very good thing. It means that TPump is notifying you that it discovered a DUPLICATE row. This error jumps to life when one of the following options has been stipulated in the .DML LABEL:

• MARK DUPLICATE INSERT ROWS

• MARK DUPLICATE UPDATE ROWS


Note that the original row will be inserted into the target table, but the duplicate row will not.

#2: Error 2817: Activity count greater than ONE for TPump UPDATE/DELETE.

Sometimes you want to know if there were too many "successes." This is the case when there are EXTRA rows when TPump is attempting an UPDATE or DELETE.

TPump will log an error whenever it sees an activity count greater than one for any such extra rows if you have specified either of these options in a .DML LABEL:

• MARK EXTRA UPDATE ROWS

• MARK EXTRA DELETE ROWS

At the same time, the associated UPDATE or DELETE will be performed.

#3: Error 2818: Activity count zero for TPump UPDATE or DELETE.

Sometimes, you want to know if a data row that was supposed to be updated or deleted wasn't! That is when you want to know that the activity count was zero, indicating that the UPDATE or DELETE did not occur. To see this error, you must have used one of the following parameters:

• MARK MISSING UPDATE ROWS

• MARK MISSING DELETE ROWS
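The relationship between the activity count, the MARK options, and error codes 2817 and 2818 can be summarized in a short Python sketch. It is a simplified model of the rules described above, not TPump's own code: the `classify` function and the way the options are passed in as strings are assumptions made for illustration.

```python
def classify(statement, activity_count, marked):
    """Return the error code logged for an UPDATE or DELETE, or None.

    statement      -- "UPDATE" or "DELETE"
    activity_count -- number of rows the statement affected
    marked         -- set of MARK options from the .DML LABEL, written here
                      as strings such as "MISSING UPDATE" or "EXTRA DELETE"
    """
    if activity_count == 0 and f"MISSING {statement}" in marked:
        return 2818  # Activity count zero for TPump UPDATE or DELETE
    if activity_count > 1 and f"EXTRA {statement}" in marked:
        return 2817  # Activity count greater than ONE for TPump UPDATE/DELETE
    return None      # exactly one row affected, or the condition is not MARKed


options = {"MISSING UPDATE", "EXTRA UPDATE", "EXTRA DELETE"}
print(classify("UPDATE", 0, options))
print(classify("DELETE", 3, options))
print(classify("UPDATE", 1, options))
print(classify("DELETE", 0, options))
```

The last call returns None because MARK MISSING DELETE ROWS was not among the options, so a DELETE that affects no rows is not logged in this model.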

RESTARTing TPump

Like the other utilities, a TPump script is fully restartable as long as the log table and error table are not dropped. As mentioned earlier, you have a choice of setting ROBUST either ON (default) or OFF. ROBUST ON carries more overhead: it provides a higher degree of data integrity at the cost of lower performance.

TPump and MultiLoad Comparison Chart


Function                                MultiLoad                       TPump

Error Tables must be defined            Optional, 2 per target table    Optional, 1 per target table
Work Tables must be defined             Optional, 1 per target table    No
Logtable must be defined                Yes                             Yes
Allows Referential Integrity            No                              Yes
Allows Unique Secondary Indexes         No                              Yes
Allows Non-Unique Secondary Indexes     Yes                             Yes
Allows Triggers                         No                              Yes
Loads a maximum of n number of tables   Five                            60
Maximum Concurrent Load Instances       15                              Unlimited
Locks at this level                     Table                           Row Hash
DML Statements Supported                INSERT, UPDATE, DELETE,         INSERT, UPDATE, DELETE,
                                        "UPSERT"                        "UPSERT"
How DML Statements are Performed        Runs actual DML commands        Compiles DML into MACROS and executes
DDL Statements Supported                All                             All
Transfers data in 64K blocks            Yes                             No, moves data at row level
RESTARTable                             Yes                             Yes
Stores UPI Violation Rows               Yes, with MARK option           Yes, with MARK option
Allows use of Aggregated, Arithmetic    No                              No
calculations or Conditional
Exponentiation
Allows Data Conversion                  Yes                             Yes
Performance Improvement                 As data volumes increase        By using multi-statement requests
Table Access During Load                Uses WRITE lock on tables in    Allows simultaneous READ and WRITE
                                        Application Phase               access due to Row Hash Locking
Effects of Stopping the Load            Consequences                    No repercussions
Resource Consumption                    Hogs available resources        Allows consumption management via
                                                                        Parameters

Figure 6-11
