Teradata Basic


What is a Relational Data Model?

A Relational Data Model is a defined number of tables, made up of columns and rows, which represent a situation.

Teradata stores its information inside tables. A table consists of rows and columns. A row is one instance of all columns. According to relational concepts, column positions are arbitrary and a column always contains like data. Teradata does not care in what order you define the columns, and it does not care about the order of rows in a table. Rows are arbitrary too, but once a row format is established Teradata will use that format, because a Teradata table can have only one row format. There are many benefits to not requiring rows to be stored in order: unordered data does not have to be maintained to preserve the order, and unordered data is independent of the query.

Primary Keys are Different than Primary Indexes

The Primary Key of a table is the column or group of columns whose values uniquely identify each row of that table.

Every table has to have a primary key and only one

Tables are very flexible when it comes to defining how a table's data can be laid out. However, every table must have a primary key, because each row within that table must always be uniquely identifiable. If the table happens to have several possible column combinations that could work as a primary key, only one can be chosen. You cannot have more than one primary key on a table. The smallest group of columns, often just one, is usually the best.

Foreign Keys

A foreign key is a normal column in one table that happens to be a primary key in another table. Foreign keys help to relate tables together. This is where the term relational database comes from.

Primary Key Foreign Key Quiz

Below you see the Department Table and the Employee Table. They have a relation. How many Primary Key Foreign Key relationships do these two tables have together? Remember that a Foreign Key is a normal column in one table that is the Primary Key of another table (hint, hint).

Primary Key Foreign Key Quiz Answers

There are two Primary Key Foreign Key relationships between the tables below. The first relationship is the Primary Key of the Department Table, which is Dept_No. Dept_No is a normal column in the Employee Table. Notice that the two columns have exactly the same name. The second Primary Key Foreign Key relationship is between the Employee_No of the Employee Table and the Mgr_No of the Department Table. Notice that these have different names. They are nonetheless said to be part of the same domain. That means that both columns have the same data type, the same range of values, and represent the same thing: both represent Employee_Nos. In the Employee Table all Employee_Nos are listed. In the Department Table only the Employee_Nos for managers are listed.

The Primary Index

"Alone we can do so little; together we can do so much."

- Helen Keller

Helen Keller may have been blind, but she saw so much more than the rest of us. Can you imagine living in a world of such darkness, yet becoming such a shining light? Helen Keller was the ultimate leader and she helped millions realize that they should always continue to learn, and that the journey of life is the ultimate destination.

Teradata uses the Primary Index of each table to route every row to its proper AMP. This is why each table in Teradata is required to have a Primary Index. A great Teradata database design begins with choosing the correct Primary Index. The Primary Index will determine on which AMP a row will reside. Because this concept is extremely important, let me state again that the Primary Index value for a row is the only thing that will determine on which AMP a row will reside.

Many people new to Teradata assume that the most important concept concerning the Primary Index is data distribution. INCORRECT! The Primary Index does determine data distribution, but even more importantly, the Primary Index provides the fastest physical path to retrieving data. The Primary Index also plays an incredibly important role in how joins are performed. Remember these three important concepts of the Primary Index and you are well on your way to a great Physical Database Design.

The Primary Index plays 3 roles:

1. Data Distribution
2. Fastest Way to Retrieve Data
3. Incredibly important for Joins

What needs to be known prior to selecting the Primary Index to ensure excellent distribution? The columns that define the index. If they are unique or nearly unique, then Teradata will spread the data evenly.

Two Types of Primary Indexes (UPI or NUPI)

"A man who chases two rabbits catches none."

- Roman Proverb

Every table must have at least one column as the Primary Index. The Primary Index is defined when the table is created. There are only two types of Primary Indexes: a Unique Primary Index (UPI) or a Non-Unique Primary Index (NUPI).

"A man who chases two rabbits misses both by a HARE! A person who chases two Primary Indexes misses both by an ERR!"

- Tera-Tom Proverb

Every table must have one and only one Primary Index. Because Teradata distributes the data based on the value of the Primary Index columns, a table must have a Primary Index, and there can be only one Primary Index per table.

The Primary Index is the physical mechanism used to distribute and retrieve data. The Primary Index consists of exactly the columns named in its definition. It can be as little as one column or a multi-column index of up to 16 columns in releases before V2R5. Most databases use the Primary Key as the physical mechanism; Teradata uses the Primary Index. There are two reasons you might pick a Primary Index that differs from your Primary Key: (1) performance and (2) known access paths.

A table can have only one Primary Index, but that Primary Index can consist of a single column or a combination of columns. With V2R5 and V2R6, the combination can include up to 64 columns.

Unique Primary Index (UPI)

"Always remember that you are unique just like everyone else."

- Anonymous

A Unique Primary Index (UPI) is unique and can't have any duplicates. It is as unique as you are. Nobody is like you and you are extremely beautiful and amazing. Not one other person in the history of mankind has ever been exactly like you. You are the creation of your beautiful parents and must realize how important you are to the world. A Unique Primary Index is not as amazing as you are, but it is also special.

A Unique Primary Index means that the values for the selected column must be unique. If you try to insert a row with a Primary Index value that is already in the table, the row will be rejected. A Unique Primary Index will always spread the table rows evenly amongst the AMPs. Please don't assume this is always the best thing to do. Below is a table that has a Unique Primary Index. We have selected EMP to be our Primary Index. Because we have designated EMP to be a Unique Primary Index, there can be no duplicate employee numbers in the table.

Employee Table

EMP (UPI)   DEPT   LNAME    FNAME    SAL
1           40     BROWN    CHRIS    95000.00
2           20     JONES    JEFF     70000.00
3           20     NGUYEN   XING     55000.00
4           ?      BROWN    SHERRY   34000.00

A Unique Primary Index (UPI) will always spread the rows of the table evenly amongst the AMPs. UPI access is always a one-AMP operation. It also requires no duplicate row checking.
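The rejection rule for a UPI can be pictured with a small Python sketch. This is only a toy model, not Teradata code: the `UpiTable` class, the exception name, and the duplicate row for employee 2 are all made up for illustration, while the first two rows come from the Employee Table above.

```python
# Toy model of a table with a Unique Primary Index (illustration only,
# not Teradata code). UpiTable and the exception name are hypothetical.

class DuplicatePrimaryIndexError(Exception):
    """Raised when a row's Primary Index value is already in the table."""


class UpiTable:
    def __init__(self, pi_column):
        self.pi_column = pi_column
        self.rows = {}  # Primary Index value -> row

    def insert(self, row):
        key = row[self.pi_column]
        if key in self.rows:
            # A UPI allows no duplicate Primary Index values.
            raise DuplicatePrimaryIndexError(
                f"{self.pi_column}={key} is already in the table"
            )
        self.rows[key] = row


employee = UpiTable("EMP")
employee.insert({"EMP": 1, "DEPT": 40, "LNAME": "BROWN", "FNAME": "CHRIS"})
employee.insert({"EMP": 2, "DEPT": 20, "LNAME": "JONES", "FNAME": "JEFF"})

# Hypothetical second row for employee 2: it must be rejected.
try:
    employee.insert({"EMP": 2, "DEPT": 40, "LNAME": "SMITH", "FNAME": "PAT"})
    rejected = False
except DuplicatePrimaryIndexError:
    rejected = True

print("duplicate rejected:", rejected)
```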

Non-Unique Primary Index (NUPI)

"You miss 100 percent of the shots you never take."

- Wayne Gretzky

Take a shot at using a Non-Unique Primary Index in your Teradata tables. A Non-Unique Primary Index (NUPI) means that the values for the selected column can be non-unique. You can have many rows with the same value in the Primary Index. A Non-Unique Primary Index will almost never spread the table rows evenly. Please don't assume this is always a bad thing. Below is a table that has a Non-Unique Primary Index. We have selected LNAME to be our Primary Index. Because we have designated LNAME to be a Non-Unique Primary Index, we are anticipating that there will be individuals in the table with the same last name.

EMP   DEPT   LNAME (NUPI)   FNAME    SAL
1     40     BROWN          CHRIS    95000.00
2     20     JONES          JEFF     70000.00
3     20     NGUYEN         XING     55000.00
4     ?      BROWN          SHERRY   34000.00

A Non-Unique Primary Index (NUPI) will almost NEVER spread the rows of the table evenly amongst the AMPs.

A Non-Unique Primary Index (NUPI) will contain like data. There can be more than one row with the same Primary Index value because it is non-unique. An All-AMP operation will take longer if the data is unevenly distributed. You might pick a NUPI over a UPI because the NUPI column may be more effective for query access and joins.
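To see why a NUPI keeps like data together, here is a minimal Python sketch using the four rows of the table above. It assumes a hypothetical 4-AMP system and uses CRC-32 with a simple modulo as a stand-in for Teradata's hashing algorithm and Hash Map, so the AMP numbers it prints mean nothing in themselves. The point is only that equal LNAME values always land on the same AMP.

```python
import zlib

NUM_AMPS = 4  # hypothetical system size


def amp_for(pi_value):
    """Pick an AMP for a Primary Index value.

    Stand-in only: CRC-32 plus modulo, NOT Teradata's hashing algorithm.
    """
    return zlib.crc32(str(pi_value).encode("utf-8")) % NUM_AMPS


# (EMP, DEPT, LNAME, FNAME) rows from the table above.
rows = [
    (1, 40, "BROWN", "CHRIS"),
    (2, 20, "JONES", "JEFF"),
    (3, 20, "NGUYEN", "XING"),
    (4, None, "BROWN", "SHERRY"),
]

# With a NUPI on LNAME, the AMP depends only on the last name.
amp_of_emp = {emp: amp_for(lname) for emp, _dept, lname, _fname in rows}

for emp, amp in amp_of_emp.items():
    print(f"EMP {emp} -> AMP {amp}")
```

Employees 1 and 4 share the last name BROWN, so they share an AMP no matter which hash function is used. Many rows with the same value would pile up on that one AMP, which is the uneven distribution described above.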

Primary Index Explained in Simple Terms

All Teradata tables must have one and only one Primary Index. The Primary Index is used to distribute a table's rows to the proper AMP. The Primary Index is also used when retrieving the data.

Primary Index (PI) Data Distribution in Theory

"Acting is all about honesty. If you can fake that, you've got it made."

- George Burns

To store the data, the value(s) in the PI are hashed through a calculation to determine which AMP will possess the row. The same data values always hash to the same row hash and are therefore always associated with the same AMP. The PI is what makes or breaks the system. The PI is responsible for all of the system's data distribution.

Our example below is designed only to show in theory how Teradata places a row on an AMP. We are going to divide the Primary Index value by two. The output is called the Row-Hash. The Row-Hash answer will point to a bucket in the Hash Map. That bucket will tell Teradata which AMP will hold the row.

Primary Index with Row Hash

Here we have divided the Primary Index by two, taken each Row-Hash answer, and by utilizing the Hash Map placed all the rows on the proper AMP.
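The divide-by-two teaching example can be written as a short Python sketch. The contents of the Hash Map, the sample Primary Index values, and the choice of integer division are assumptions made for illustration, since the original figure is not reproduced here.

```python
# Teaching model only: "hash" a Primary Index value by dividing it by two.
# The Hash Map contents below are hypothetical.

HASH_MAP = [1, 2, 3, 4, 1, 2, 3, 4]  # bucket number -> AMP number


def toy_row_hash(pi_value):
    # Assumption: integer division, so every answer is a whole bucket number.
    return pi_value // 2


def toy_amp(pi_value):
    # The Row-Hash points to a bucket; the bucket names the AMP.
    return HASH_MAP[toy_row_hash(pi_value)]


for pi_value in (4, 6, 9, 12):
    print(f"PI {pi_value}: Row-Hash {toy_row_hash(pi_value)}, AMP {toy_amp(pi_value)}")
```

In this toy model the Primary Index values 8 and 9 both produce a Row-Hash of 4. Two different values sharing a Row-Hash is the situation the Uniqueness value exists to handle.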

Primary Index adding a Uniqueness Value

Once the correct destination AMP has been decided on for a row, both the row and the Row-Hash value are placed on the AMP. The AMP will look at the Row-Hash and add a Uniqueness value. The Row-Hash and the Uniqueness value make up the Row-ID. If a row is delivered to an AMP and the AMP notices it has already received a row for that table with the same Row-Hash, it will assign a Uniqueness value of two. If another row comes in with the same Row-Hash, it will receive a Uniqueness value of three, and so on.

Primary Index Quiz

Below you see a table with four employees. Notice that the Primary Index is Employee_No. Your job is to place the rows on the proper AMP and include the Row-ID. We have done the first one for you. Remember to divide the Primary Index value by two, then take the answer and point to the Hash Map. The row will go to the AMP listed in the bucket of the Hash Map. Then add a Uniqueness value to the Row-Hash to complete the Row-ID.

Turning the Primary Index Value into a Row Hash

The Primary Index is the only thing that determines where a row will reside. It is important that you understand this process. Here are the fundamentals in the simplest form. When a new row arrives in Teradata, the following steps occur:

- Teradata's PE examines the Primary Index value of the row.
- Teradata takes that Primary Index value and runs it through a Hashing Algorithm.
- The output of the Hashing Algorithm (i.e., a formula) is a 32-bit Row Hash.

The 32-bit Row Hash will perform two functions:

1. The 32-bit Row Hash will point to a certain spot on the Hash Map, which will indicate which AMP will hold the row.
2. The 32-bit Row Hash will always remain with the row as part of a Row Identifier (Row ID).

Hashing is a mathematical process in which an index value (UPI or NUPI) is converted into a 32-bit value. The input to the hashing algorithm is the Primary Index value, and its 32-bit output is called the Row Hash.

The Row is Delivered to the Proper AMP

Now that we know that Employee 99 is to be delivered to AMP 4, Teradata packs up the row, places the Row Hash on the front of the row, and delivers it to AMP 4.

The entire row for employee 99 is delivered to the proper AMP accompanied by the Row Hash, which will always remain with the row as part of the Row ID.

Review:

1. A row is to be inserted into a Teradata table.
2. The Primary Index value for the row is put into the Hash Algorithm.
3. The output is a 32-bit Row Hash.
4. The Row Hash points to a bucket in the Hash Map.
5. The bucket points to a specific AMP.
6. The row, along with the Row Hash, is delivered to that AMP.
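The review steps can be traced in a small Python sketch. It is a simplified model under stated assumptions: CRC-32 stands in for Teradata's hashing algorithm, the system has four AMPs, and the Hash Map simply cycles through them. None of these details come from Teradata itself.

```python
import zlib

NUM_AMPS = 4           # hypothetical system size
NUM_BUCKETS = 2 ** 16  # one bucket per possible 16-bit value

# Hypothetical Hash Map: bucket number -> AMP number (1-based).
HASH_MAP = [(bucket % NUM_AMPS) + 1 for bucket in range(NUM_BUCKETS)]

# Each AMP's storage: a list of (row_hash, row) pairs.
amps = {amp: [] for amp in range(1, NUM_AMPS + 1)}


def row_hash(pi_value):
    """Return a 32-bit hash. Stand-in (CRC-32), NOT Teradata's algorithm."""
    return zlib.crc32(str(pi_value).encode("utf-8")) & 0xFFFFFFFF


def insert_row(row, pi_column):
    rh = row_hash(row[pi_column])  # steps 2-3: PI value -> 32-bit Row Hash
    bucket = rh >> 16              # step 4: Row Hash points to a bucket
    amp = HASH_MAP[bucket]         # step 5: the bucket names an AMP
    amps[amp].append((rh, row))    # step 6: row and Row Hash go to that AMP
    return amp


first = insert_row({"EMP": 99}, "EMP")

# Hashing the same Primary Index value again selects the same AMP.
amp_again = HASH_MAP[row_hash(99) >> 16]
print(first, amp_again)
```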

The Row Hash determines the Row's Destination

The first 16 bits of the Row Hash (a.k.a. the Destination Selection Word) are used to locate an entry in the Hash Map. This entry is called a Hash Map Bucket. The only thing that resides inside a Hash Map Bucket is the AMP number where the row will reside.

The first 16 bits of the Row Hash of 00001111000011110000111100001111 are used to locate a bucket in the Hash Map. A bucket will contain an AMP number. We now know that employee 99, whose row hash is 00001111000011110000111100001111, will reside on AMP 4. Note: The AMP uses the entire 32 bits in storing and accessing the row.

If we took employee 99 and ran it through the hashing algorithm again and again, we would always get a row hash of 00001111000011110000111100001111. If we take the row hash of 00001111000011110000111100001111 again and again, it would always point to the same bucket in the hash map.

In other words, the Teradata Hashing Algorithm is repeatable. Every time employee 99 is run through the hashing algorithm, it returns the same Row Hash. This Row Hash will point to the same Hash Bucket every time. That is how Teradata knows which AMP will hold row 99. It does the math and it always gets what it always got! Hash values are calculated using a hashing formula. The Hash Map will automatically change if you add additional AMPs.
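The bit arithmetic for employee 99 can be checked with a few lines of Python. The Row Hash and the AMP number come from the text above. The dictionary used as a Hash Map is a hypothetical stand-in that holds only the one bucket we know about.

```python
# Row Hash for employee 99, as given in the text.
ROW_HASH_BITS = "00001111000011110000111100001111"

row_hash = int(ROW_HASH_BITS, 2)  # the full 32-bit value
bucket = row_hash >> 16           # keep only the first (high-order) 16 bits

# Hypothetical Hash Map holding just the one entry stated in the text.
hash_map = {bucket: 4}
amp = hash_map[bucket]

print(f"Row Hash : {row_hash} (0x{row_hash:08X})")
print(f"Bucket   : {bucket} (bits {bucket:016b})")
print(f"AMP      : {amp}")
```

Shifting right by 16 discards the low-order half of the Row Hash, which leaves the same 16 bits that appear at the front of the bit string.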

The AMP will add a Uniqueness Value

When the AMP receives a row it will place the row into the proper table, and the AMP checks whether it has any other rows in the table with the same row hash. If this is the first row with this particular row hash, the AMP will assign a 32-bit uniqueness value of 1. If this is the second row with that particular row hash, the AMP will assign a uniqueness value of 2. The 32-bit row hash and the 32-bit uniqueness value make up the 64-bit Row ID. The Row ID is how tables are sorted on an AMP.
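A rough Python sketch of this bookkeeping follows. It models one table on one AMP, and it assumes the Row ID is formed by placing the 32-bit row hash in front of the 32-bit uniqueness value. The `Amp` class and its counter are illustrative inventions, not a description of Teradata internals.

```python
from collections import defaultdict


class Amp:
    """Toy model of one table's rows on a single AMP (hypothetical)."""

    def __init__(self):
        self._last_uniqueness = defaultdict(int)  # row hash -> last value issued
        self.rows = []                            # (row_id, row), sorted by Row ID

    def receive(self, row_hash, row):
        # First row with this row hash gets 1, the second gets 2, and so on.
        self._last_uniqueness[row_hash] += 1
        uniqueness = self._last_uniqueness[row_hash]

        # 32-bit row hash followed by 32-bit uniqueness value = 64-bit Row ID.
        row_id = (row_hash << 32) | uniqueness

        self.rows.append((row_id, row))
        self.rows.sort(key=lambda pair: pair[0])  # rows are kept in Row ID order
        return row_id


amp = Amp()
row_hash = 0b00001111000011110000111100001111  # Row Hash used in the text

first_id = amp.receive(row_hash, {"LNAME": "BROWN", "FNAME": "CHRIS"})
second_id = amp.receive(row_hash, {"LNAME": "BROWN", "FNAME": "SHERRY"})

print(f"{first_id:064b}")
print(f"{second_id:064b}")
```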

The Row Hash always accompanies the row when an AMP receives it. The AMP will then assign a Uniqueness Value to the Row Hash. It assigns a 1 if this is the first row with that Row Hash, a 2 if it is the second, a 3 if it is the third, etc.

An Example of a UPI Table

Below is an example of a portion of a table on one AMP. The table has a Unique Primary Index of EMP.

An Example of an NUPI Table

Below is an example of a portion of a table on one AMP. The table has a Non-Unique Primary Index (NUPI) on the Last Name called LNAME.

How Teradata Retrieves Rows with a Primary Index

In the example below, a user runs a query looking for information on Employee 99. The PE sees that the Primary Index column EMP is used in the SQL WHERE clause. Because this is a Primary Index access operation, the PE knows this is a one-AMP operation. The PE hashes 99 and the Row Hash is 00001111000011110000111100001111. This points to a bucket in the Hash Map that represents AMP 4. AMP 4 is sent a message to get the row with Row Hash 00001111000011110000111100001111 and make sure it's EMP 99.

What is a Binary Search?

When an AMP searches for a row using a Primary Index, the AMP can perform a Binary Search. Do you remember the show "Name that Tune"? Contestants had to name songs while only getting to listen to a couple of notes. One of the famous lines from the show was, "I can name that tune in only 7 notes". This is how an AMP searches its rows for a Primary Index row.

Since each table is sorted by the Primary Index Row-ID, and all Row-IDs are made up of zeros and ones, Teradata can search the rows like a phone book by doing a Binary Search. This means that the AMP can go to the middle of the rows and pick a row. The system will say either "Too high", "Too low" or "Got it"! If the system says "Too low" or "Too high", then the AMP will go halfway up or down the file and check again. It will then again hear either "Too high", "Too low" or "Got it"! This will go on until the AMP gets the right row. This is much faster than starting at row 1 and checking row by row. Now imagine there are millions of rows. A Binary Search will speed up the find dramatically.
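The "Too high", "Too low", "Got it" game can be written out as a short Python sketch. It is a generic binary search over a sorted list of stand-in Row-IDs, not Teradata's actual file system logic, and the one million even numbers used as Row-IDs are made up for the demonstration.

```python
def binary_search(sorted_row_ids, target):
    """Return (index, probes); index is None if the target is absent."""
    low, high = 0, len(sorted_row_ids) - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2  # go to the middle of the remaining rows
        probes += 1
        if sorted_row_ids[mid] == target:
            return mid, probes   # "Got it"
        if sorted_row_ids[mid] < target:
            low = mid + 1        # "Too low": search the upper half
        else:
            high = mid - 1       # "Too high": search the lower half
    return None, probes


# One million sorted stand-in Row-IDs (hypothetical values).
row_ids = list(range(0, 2_000_000, 2))

index, probes = binary_search(row_ids, 1_234_568)
print(f"found at position {index} after {probes} probes")

missing, _ = binary_search(row_ids, 3)
print("odd Row-ID present:", missing is not None)
```

Each probe halves the rows still in play, so one million rows never need more than 20 probes, compared with up to a million checks when reading row by row.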