Teradata Indexes


Indexes


Objectives

After completing this module, you should be able to:

Define primary and secondary indexes and their purposes.

Distinguish between a primary index and a primary key.

Distinguish between a UPI and a NUPI.

Define a Partitioned Primary Index and its purpose.

Distinguish between a USI and a NUSI.

Explain the makeup of the Row-ID and its role in row storage.

Describe the sequence of events for locating a row.

Explain the roles of the hashing algorithm and hash map in locating a row.

Describe the operation of full table scans in Teradata.

Indexes in Teradata

Indexes are used to access rows from a table without having to search the whole table. In the Teradata RDBMS, an index is made up of one or more columns in a table. Once Teradata indexes are selected, they are maintained by the system. While other vendors may require data partitioning or index maintenance, these tasks are unnecessary with Teradata.

In the Teradata RDBMS, there are two types of indexes:

Primary Indexes define the way the data is distributed.

Primary Indexes and Secondary Indexes are used to locate the data rows more efficiently than scanning the whole table.

You specify which column(s) are used as the Primary Index when you create a table. Secondary Index column(s) can be specified when you create a table or at any time during the life of the table.

Data Distribution

When the Primary Index for a table is well chosen, the table rows are evenly distributed across the AMPs for the best performance. The way to guarantee even distribution of data is by choosing a Primary Index whose columns contain unique values. The values do not have to be evenly spaced, or even "truly random"; they just have to be unique to be evenly distributed.

The even distribution enables each AMP to be responsible for only a subset of the rows in a table. If the data is evenly distributed, the work is evenly divided among the AMPs so they can work in parallel and complete their processing about the same time. Even data distribution is critical to performance because it optimizes the parallel access to the data.

Unevenly distributed data, also called "skewed data," causes slower response time as the system waits for the AMP(s) with the most data to finish their processing. The slowest AMP becomes a bottleneck.

When data is loaded into the Teradata RDBMS:

The system automatically distributes the data across the AMPs based on row content (the Primary Index values).

The distribution is the same regardless of the data volume being loaded. In other words, large tables are distributed the same way as small tables.

Data is not distributed in any particular order. The automatic, unordered distribution of data eliminates tasks for a Teradata DBA that are necessary with some other relational database systems. The DBA does not waste time on labor-intensive data maintenance tasks. Some benefits of unordered data include:

Prior to loading the data, no initial data ordering or sorting is necessary.

Once data is loaded, no data maintenance is necessary to preserve the order.

SQL requests can be formulated without regard to the data order.

A Teradata system provides high performance because it distributes the data evenly across the AMPs for parallel processing.
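One way to see how evenly a table's rows are spread is to count rows per AMP using the Teradata hash functions HASHROW, HASHBUCKET, and HASHAMP. The query below is a minimal sketch; the table name employee and the column employee_number are hypothetical and stand in for a table and its Primary Index column(s).

```sql
-- Hypothetical table and column names; substitute the table's actual
-- Primary Index column(s).
SELECT HASHAMP(HASHBUCKET(HASHROW(employee_number))) AS amp_number
     , COUNT(*)                                      AS row_count
FROM employee
GROUP BY 1
ORDER BY 1;
```

Row counts that are roughly equal across AMPs suggest even distribution. A few AMPs with much larger counts point to skewed data.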

Question

Which of the following statements do you think are true about data distribution and Teradata indexes? (Choose two answers.)

A. If a table has 103 rows but there are 4 AMPs in the system, each AMP will not have exactly the same number of rows from that table. However, if the Primary Index is chosen well, each AMP still will contain some rows from that table.

B. The rows of a table are stored on a single disk for best access performance.

C. Skewed data leads to poor performance in processing data access requests.

D. Teradata RDBMS performance can be increased by maintaining the indexes and conducting periodic data partitioning and sorting.

Primary Index (PI)

A Primary Index is the mechanism for assigning a data row to an AMP and a location on the AMP's disks. It is also used to access rows without having to search the entire table. You specify the column(s) that comprise the Primary Index for a table when the table is created. For a given row, the Primary Index value is the combination of the data values in the Primary Index columns.

Choosing a Primary Index for a table is perhaps the most critical decision a database designer makes, because this choice affects both data distribution and access.

Primary Index Rules

The following rules govern how Primary Indexes in a Teradata system must be defined and how they function:

Rule 1: One Primary Index per table.

Rule 2: A Primary Index value can be unique or non-unique.

Rule 3: The Primary Index value can be NULL.

Rule 4: The Primary Index value can be modified.

Rule 5: The Primary Index of a table cannot be modified.

Rule 6: A Primary Index has a limit of 16 columns.

Rule 1: One PI Per Table

Each table must have a Primary Index. The Primary Index is the only way for the system to determine where a row will be physically stored. While a Primary Index may be composed of multiple columns, the table can have only one (single- or multiple-column) Primary Index.

Rule 2: Unique or Non-Unique PI

There are two types of Primary Index:

Unique Primary Index (UPI) - For a given row, the combination of the data values in the columns of a Unique Primary Index is not duplicated in any other row of the table. This uniqueness guarantees uniform data distribution and direct access. For example, in the case where old employee numbers are sometimes recycled, the combination of the Last Name and Employee Number columns would be a UPI.

Non-Unique Primary Index (NUPI) - For a given row, the combination of the data values in the columns of a Non-Unique Primary Index can be duplicated in other rows within the table. A NUPI can cause skewed data, but in specific instances can still be a good Primary Index choice. For example, either the Department Number column or the Hire Date column might be a good choice for a NUPI if you will be accessing the table most often via these columns.

Rule 3: PI Can Be NULL

If the Primary Index is unique, you could have one row with a null value. If you have multiple rows with a null value, the Primary Index must be Non-Unique.

Rule 4: PI Value Can Be Modified

The Primary Index value can be modified. For example, in an employee table whose Primary Index is the department number, if Loretta Ryan changes departments, the Primary Index value for her row changes.

When you update the index value in a row, Teradata re-hashes it and redistributes the row to its new location based on its new index value.

Rule 5: PI Cannot Be Modified

The Primary Index of a table cannot be modified.

In the event that you need a new Primary Index, you must drop the table, recreate it with the new Primary Index, and reload the table.

In Teradata RDBMS V2R5, the ALTER TABLE statement allows you to change the PI of a table if the table is empty.

Rule 6: PI Has 16-Column Limit

You can designate a Primary Index that is composed of 1 to 16 columns.

In Teradata RDBMS V2R5, the maximum number of columns in an index is increased to 64.

SQL Syntax for Creating a Primary Index

Every table has a Primary Index, and it is normally specified in the CREATE TABLE statement in SQL.

If you do not specify a Primary Index in the CREATE TABLE statement, the system will use the Primary Key as the Primary Index. If a Primary Key has not been specified, the system will choose the first unique column. If there are no unique columns, the system will use the first column in the table and designate it as a Non-Unique Primary Index.

Creating a Unique Primary Index

The SQL syntax to create a Unique Primary Index is:

CREATE TABLE sample_1
  (col_a INT
  ,col_b INT
  ,col_c INT)
UNIQUE PRIMARY INDEX (col_b);

Creating a Non-Unique Primary Index

The SQL syntax to create a Non-Unique Primary Index is:

CREATE TABLE sample_2
  (col_x INT
  ,col_y INT
  ,col_z INT)
PRIMARY INDEX (col_x);

Modifying the Primary Index of a Table

As mentioned in the Primary Index rules, you cannot modify the Primary Index of a table. In the event that you need a new Primary Index, you must drop the table, recreate it with the new Primary Index, and reload the table.
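The rebuild described above can be sketched as follows. This is one possible variation, not the only procedure: it copies the rows with INSERT ... SELECT into a replacement table rather than reloading from an external source. The name sample_2_new and the choice of col_y as the new Primary Index are hypothetical.

```sql
-- Sketch only: sample_2_new is a hypothetical name for the replacement table.
CREATE TABLE sample_2_new
  (col_x INT
  ,col_y INT
  ,col_z INT)
PRIMARY INDEX (col_y);

-- Copy the rows; each row is hashed on col_y and stored on the AMP
-- that its new Primary Index value maps to.
INSERT INTO sample_2_new
SELECT col_x, col_y, col_z
FROM sample_2;

-- Replace the old table once the copy has been verified.
DROP TABLE sample_2;
RENAME TABLE sample_2_new TO sample_2;
```

Keeping the old table until the copy is verified costs extra disk space, but avoids a window in which the data exists only outside the database.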

Data Mechanics of Primary Indexes

This section describes how Primary Indexes are used in:

Data distribution

Data access

Distributing Rows to AMPs

Rows are distributed to AMPs during the following operations:

Loading data into a table (one or more rows, using a data loading utility)

Inserting or updating rows (one or more rows, using SQL)

Changing the system configuration (redistribution of data, caused by reconfigurations to add or delete AMPs)

When loading data or inserting rows, the data being affected by the load or insert is not available to other users until the transaction is complete. During a reconfiguration, no data is accessible to users until the system is operational in its new configuration.

Row Distribution Process

The process the system uses for inserting a row on an AMP is described below:

1. The system uses the Primary Index value in each row as input to the hashing algorithm.

2. The output of the hashing algorithm is the row hash value (in this example, 646).

3. The system looks at the hash map, which identifies the specific AMP where the row should be stored (in this example, AMP 3).

4. The row is stored on the target AMP.

UPI: The system automatically checks for duplicate UPI values when rows are loaded or inserted. If a row already exists with the UPI value, the new row is not added.

NUPI: The system does not check for duplicate NUPI values. If a row already exists with the NUPI value, the new row is added to the same AMP.
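The distribution steps above can be traced for a single value with the Teradata hash functions. In this sketch, 1018 is an arbitrary, hypothetical Primary Index value; the actual hash, bucket, and AMP numbers returned depend on the system configuration.

```sql
-- 1018 is an arbitrary, hypothetical Primary Index value.
SELECT HASHROW(1018)                      AS row_hash     -- steps 1-2: hash the PI value
     , HASHBUCKET(HASHROW(1018))          AS hash_bucket  -- bucket looked up in the hash map
     , HASHAMP(HASHBUCKET(HASHROW(1018))) AS amp_number;  -- step 3: AMP that owns the bucket
```

Because the result depends only on the value and the hash map, the same Primary Index value always maps to the same AMP.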

Hash Map

A hash map is an array that associates hash bucket numbers with specific AMPs. While it has a limited number of hash buckets, there are enough hash buckets to minimize the number of hash collisions (when the hashing algorithm calculates the same row hash value for two different rows).

The hash map is a GDO (globally distributed object), which is a file that is copied and distributed to every node in the system. If an AMP is executing a request that requires information in a GDO, it can access the copy of the GDO on its node.

Duplicate Row Hash Values

It is possible for the hashing algorithm to end up with the same row hash value for two different rows. There are two ways this could happen:

Duplicate NUPI values: If a Non-Unique Primary Index is used, duplicate NUPI values will produce the same row hash value.

Hash synonym: Also called a hash collision, this occurs when the hashing algorithm calculates an identical row hash value for two different Primary Index values. Hash synonyms are very rare, so a Unique Primary Index still gives uniform data distribution.

To differentiate each row in a table, every row is assigned a unique Row ID. The Row ID is the combination of the row hash value and a uniqueness value.

Row ID = Row Hash Value + Uniqueness Value

The uniqueness value is used to differentiate between rows whose Primary Index values generate identical row hash values. In most cases, only the row hash value portion of the Row ID is needed to locate the row.

When each row is inserted, the AMP adds the row ID, stored as a prefix of the row. The first row inserted with a particular row hash value is assigned a uniqueness value of 1. The uniqueness value is incremented by 1 for any additional rows inserted with the same row hash value.

Duplicate Rows

A duplicate row is a row in a table whose column values are identical to another row in the same table. In other words, the entire row is the same, not just an index. Although duplicate rows are not allowed in the relational model (because every Primary Key must be unique), Teradata does allow duplicate rows because the capability is a part of the ANSI standard.

Because duplicate rows are allowed in Teradata, how does it affect the UPI, which, by definition, is unique? When you create a table, the following definitions determine whether or not it can contain duplicate rows:

MULTISET tables: May contain duplicate rows. Teradata will not check for duplicate rows.

SET tables: The default. Teradata checks for and does not permit duplicate rows. If a SET table is created with a Unique Primary Index, the check for duplicate rows is replaced by a check for duplicate index values.
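The two table kinds are selected with a keyword in the CREATE TABLE statement. The sketch below uses hypothetical table and column names.

```sql
-- Hypothetical tables for illustration.
CREATE SET TABLE dept_set
  (dept_number INT
  ,dept_name   VARCHAR(30))
UNIQUE PRIMARY INDEX (dept_number);   -- duplicate check is made on the UPI value

CREATE MULTISET TABLE call_log
  (caller_id INT
  ,call_date DATE)
PRIMARY INDEX (caller_id);            -- NUPI; identical rows are accepted
```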

Accessing a Row With a Primary Index

When a user submits an SQL request using the table name and Primary Index, the request becomes a one-AMP operation, which is the most direct and efficient way for the system to find a row. The process is explained below.

Hashing Process

1. The primary index value goes into the hashing algorithm.

2. The output of the hashing algorithm is the row hash value.

3. The hash map points to the specific AMP where the row resides.

4. The PE sends the request directly to the identified AMP.

5. The AMP locates the row(s) on its vdisk.

6. The row data is sent over the BYNET to the PE, and the PE sends the answer set on to the client application.

Choosing a Unique or Non-Unique Primary Index

Criteria for choosing a Primary Index include:

Uniqueness: A UPI guarantees even data distribution, so is often a good choice. A NUPI with few duplicate values could provide good (if not perfectly uniform) distribution, and might meet the other criteria better.

Use in value access: Retrievals, updates, and deletes that specify the Primary Index are much faster than those that do not. Because a Primary Index is a known access path to the data, it is best to choose column(s) that will be frequently used for access. For example, the following SQL statement would directly access a row based on the equality WHERE clause:

SELECT * FROM employee WHERE employee_ID = 'ABC456789';

A NUPI may be a better choice if the access is based on another, mostly unique column. For example, the table may be used by the Mail Room to track package delivery. In that case, a column containing room numbers or mail stops may not be unique if employees share offices, but it may still be a better choice for access.

Use in join access: SQL requests that use a JOIN statement perform the best when the join is done on a Primary Index. Consider Primary Key and Foreign Key columns as potential candidates for Primary Indexes. For example, if the Employee table and the Payroll table are related by the Employee ID column, then the Employee ID column could be a good Primary Index choice for one or both of the tables.

Non-volatile values: Look for columns where the values do not change frequently. For example, in an Invoicing table, the outstanding balance column for all customers probably has few duplicates, but probably changes too frequently to make a good Primary Index. A customer ID, statement number, or other more stable columns may be better choices.

When choosing a Primary Index, try to find the column(s) that best fit these criteria and the business need.

Questions

What do you think are key considerations in choosing a Primary Index? (Choose three.)

A. Column(s) containing unique (or nearly unique) values for uniform distribution.

B. Column(s) with values in sequential order for best load and access performance.

C. Column(s) frequently used in queries to access data or to join tables.

D. Column(s) with values that are stable (do not change frequently), to minimize redistribution of table rows.

E. Column(s) with many duplicate values for redundancy.

Partitioned Primary Index

In Teradata RDBMS V2R5 there is a new indexing mechanism called Partitioned Primary Index (PPI). PPI is used to improve performance for large tables when you submit queries that specify a range constraint. PPI allows you to reduce the number of rows to be processed by using a new technique called partition elimination. PPI will increase performance for incremental data loads, deletes, and data access when working with large tables with range constraints.

How Does PPI Work?

Data distribution with PPI is still based on the Primary Index:

Primary Index -> Hash Value -> Determines which AMP gets the row

With PPI, the order in which the rows are stored on the AMP is affected. Using the traditional method, a Non-Partitioned Primary Index (NPPI), the rows are stored in row hash order.

4 AMPs with Orders Table Defined with NPPI

Using PPI, the rows are stored first by partition and then by row hash. In our example, there are four partitions. Within the partitions, the rows are stored in row hash order.

4 AMPs with Orders Table Defined with PPI on O_Date

Data Storage Using PPI

To store rows using PPI, specify the partitioning in the CREATE TABLE statement. A request is run through the hashing algorithm as usual and produces the Base Table ID, the Partition number(s), the Row Hash, and the Primary Index values.
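As an illustration, an Orders table partitioned on O_Date might be declared as sketched below. The column list and the date range are assumptions chosen to give four monthly partitions; they are not taken from the original course material.

```sql
-- Hypothetical columns and date range; four monthly partitions.
CREATE TABLE orders
  (o_id    INTEGER NOT NULL
  ,o_date  DATE NOT NULL
  ,o_total DECIMAL(10,2))
PRIMARY INDEX (o_id)
PARTITION BY RANGE_N(o_date BETWEEN DATE '2003-01-01'
                            AND     DATE '2003-04-30'
                            EACH INTERVAL '1' MONTH);

-- A range constraint on the partitioning column lets each AMP skip
-- partitions that cannot contain qualifying rows.
SELECT *
FROM orders
WHERE o_date BETWEEN DATE '2003-02-01' AND DATE '2003-02-28';
```

The Primary Index here is non-unique because the partitioning column o_date is not part of it. Rows are still distributed to AMPs by the hash of o_id; only their order on each AMP changes.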


Access Without a PPI

Let's say you have a table with Store information by Location and did not use a PPI. If you query on Location 3 on this NPPI table, the entire table will be scanned to find records for Location 3 (Full Table Scan).

QUERY: SELECT * FROM Employee_NPPI WHERE Location_Number = 3;

PLAN: ALL-AMPs - Full Table Scan

Access With a PPI

In the same example for a PPI table, you would partition the table with as many Locations as you have (or expect to have soon). Then if you query on Location 3, each AMP uses partition elimination and only has to scan partition 3 for the query. This query will run much faster than the Full Table Scan in the previous example.

QUERY: SELECT * FROM Employee WHERE Location_Number = 3;

PLAN: ALL-AMPs - Single Partition Scan
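A table partitioned by location, and a way to inspect the resulting plan, could look like the sketch below. The table name employee_ppi, its columns, and the range of 1 to 50 locations are hypothetical.

```sql
-- Hypothetical definition: one partition per location, 1 through 50.
CREATE TABLE employee_ppi
  (employee_number INTEGER NOT NULL
  ,location_number INTEGER NOT NULL
  ,last_name       VARCHAR(30))
PRIMARY INDEX (employee_number)
PARTITION BY RANGE_N(location_number BETWEEN 1 AND 50 EACH 1);

-- EXPLAIN reports the plan without running the query.
EXPLAIN
SELECT *
FROM employee_ppi
WHERE location_number = 3;
```

If partition elimination applies, the plan text should refer to a single partition rather than to all rows of the table.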

Secondary Index (SI)

A Secondary Index is an alternate data access path. It allows you to access the data without having to do a full table scan. Secondary indexes do not affect how rows are distributed among the AMPs.

You can drop and recreate secondary indexes dynamically, as they are needed. Unlike Primary Indexes, Secondary Indexes are stored in separate subtables that require extra overhead in terms of disk space, and maintenance which is handled automatically by the system. So, Secondary Indexes do require some system resources.

Question

In what instances would it be a good idea to define a secondary index for a table? (This information will be covered in this module, but here is a preview.)

1. The Primary Index exists for even data distribution and data access, but a Secondary Index is defined to efficiently generate monthly reports based on a different set of columns.

2. The Product table is accessed by the retailer (who accesses data based on the retailer's product code column), and by a vendor (who accesses the same data based on the vendor's product code column).

3. The table already has a Unique Primary Index, but a second column must also have unique values. The column is specified as a Unique Secondary Index (USI) to enforce uniqueness on the second column.

4. All of the above.

Secondary Index Rules

Several rules that govern how Secondary Indexes must be defined and how they function are:

Rule 1: Secondary Indexes are optional.

Rule 2: Secondary Index values can be unique or non-unique.

Rule 3: Secondary Index values can be NULL.

Rule 4: Secondary Index values can be modified.

Rule 5: Secondary Indexes can be changed.

Rule 6: A Secondary Index has a limit of 16 columns.

Rule 1: Optional SI

While a Primary Index is required, a Secondary Index is optional. If one path to the data is sufficient, no Secondary Index need be defined.

You can define 0 to 32 Secondary Indexes on a table for multiple data access paths. Different groups of users may want to access the data in various ways. You can define a Secondary Index for each heavily used access path.

Rule 2: Unique or Non-Unique SI

Like Primary Indexes, Secondary Indexes can be unique or non-unique.

A Unique Secondary Index (USI) serves two possible purposes:

Enforces uniqueness in a column or group of columns. The database will check USIs to see if the values are unique. For example, if you have chosen different columns for the Primary Key and Primary Index, you can make the Primary Key a USI to enforce uniqueness on the Primary Key.

Speeds up access to a row. Accessing a row with a USI requires one or two AMPs, which is less direct than a UPI (one AMP) access, but more efficient than a full table scan.

A Non-Unique Secondary Index (NUSI) is usually specified to prevent full table scans, in which every row of a table is read. The Optimizer determines whether a full table scan or NUSI access will be more efficient, then picks the best method. Accessing a row with a NUSI requires all AMPs.

Rule 3: SI Can Be NULL

As with the Primary Index, the Secondary Index column may contain NULL values.

Rule 4: SI Value Can Be Modified

The values in the Secondary Index column may be modified as needed.

Rule 5: SI Can Be Changed

Secondary Indexes can be changed. Secondary Indexes can be created and dropped dynamically as needed. When the index is dropped, the system physically drops the subtable that contained it.

Rule 6: SI Has 16-Column Limit

You can designate a Secondary Index that is composed of 1 to 16 columns. For example, to use a Secondary Index defined on the Budget and Manager Employee Number columns, the user would specify values for both Budget and Manager Employee Number.

In Teradata RDBMS V2R5, the maximum number of columns in an index is increased to 64.

Using Secondary Indexes

In the example Department table, users will be accessing data based on the Department Name column. The values in that column are unique, so it has been made a USI for efficient access. In addition, the company wants reports on how many departments each manager is responsible for, so the Manager Employee Number can also be made a secondary index. It has duplicate values, so it is a NUSI.
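Secondary indexes like these can be added to an existing table with CREATE INDEX and removed with DROP INDEX. The table name department and the column names below are hypothetical renderings of the columns described above.

```sql
-- Hypothetical table and column names based on the Department example.
CREATE UNIQUE INDEX (department_name) ON department;       -- USI
CREATE INDEX (manager_employee_number) ON department;      -- NUSI

-- A secondary index can be dropped when it is no longer needed.
DROP INDEX (manager_employee_number) ON department;
```

Creating an index builds its subtable, and dropping the index removes that subtable; the base table rows are not redistributed in either case.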

How Secondary Indexes Are Stored

Secondary indexes are stored in index subtables. The subtables for USIs and NUSIs are distributed differently:

USI: The Unique Secondary Indexes are hash distributed separately from the data rows, based on their USI value. (As you remember, the base table rows are distributed based on the Primary Index value.) The subtable row may be stored on the same AMP or a different AMP than the base table row, depending on the hash value.

NUSI: The Non-Unique Secondary Indexes are stored in subtables on the same AMPs as their data rows. This reduces activity on the BYNET and essentially makes NUSI queries an AMP-local operation - the processing for the subtable and base table are done on the same AMP. However, in all NUSI access requests, all AMPs are activated because the non-unique value may be found on multiple AMPs.

Data Access Without a Primary Index

You can submit a request without specifying a Primary Index and still access the data. The following access methods do not use a Primary Index:

Unique Secondary Index (USI)

Non-Unique Secondary Index (NUSI)

Full Table Scan

Accessing Data with a USI

When a user submits an SQL request using the table name and a Unique Secondary Index, the request becomes a one- or two-AMP operation, as explained below.

USI Access

1. The SQL is submitted, specifying a USI (in this case, a customer number of 56).

2. The hashing algorithm calculates a row hash value (in this case, 602).

3. The hash map points to the AMP containing the subtable row corresponding to the row hash value (in this case, AMP 2).

4. The subtable indicates where the base row resides (in this case, row 778 on AMP 4).

5. The message goes back over the BYNET to the AMP with the row and the AMP accesses the data row (in this case, AMP 4).

6. The row is sent over the BYNET to the PE, and the PE sends the answer set on to the client application.

As shown in the example above, accessing data with a USI is typically a two-AMP operation. However, it is possible that the subtable row and base table row could end up being stored on the same AMP, because both are hashed separately. If both were on the same AMP, the USI request would be a one-AMP operation.

Accessing Data with a NUSI

When a user submits an SQL request using the table name and a Non-Unique Secondary Index, the request becomes an all-AMP operation, as explained below.

NUSI Access

1. The SQL is submitted, specifying a NUSI (in this case, a last name of "Adams").

2. The hashing algorithm calculates a row hash value for the NUSI (in this case, 567).

3. All AMPs are activated to find the hash value of the NUSI in their index subtables. The AMPs whose subtables contain that value become the participating AMPs in this request (in this case, AMP1 and AMP2). The other AMPs discard the message.

4. Each participating AMP locates the row IDs (row hash value plus uniqueness value) of the base rows corresponding to the hash value (in this case, the base rows corresponding to hash value 567 are 640, 222, and 115).

5. The participating AMPs access the base table rows, which are located on the same AMP as the NUSI subtable (in this case, one row from AMP 1 and two rows from AMP 2).

6. The qualifying rows are sent over the BYNET to the PE, and the PE sends the answer set on to the client application (in this case, three qualifying rows are returned).

Accessing Data Without Indexes

In Teradata, you can access data on any column, whether that column is an index or not. You can ask any question, of any data, at any time.

If the request does not use a defined index, Teradata does a full table scan. A full table scan is another way to access data without using Primary or Secondary Indexes. In evaluating an SQL request, the Optimizer examines all possible access methods and chooses the one it believes to be the most efficient.

While Secondary Indexes generally provide a more direct access path, in some cases the Optimizer will choose a full table scan because it is more efficient. A request could turn into a full table scan when:

An SQL request searches on a NUSI column with many duplicates. For example, if a request using last names in a Customer database searched on the very prevalent "Smith" in the United States, then the Optimizer may choose a full table scan to efficiently find all the many matching rows in the result set.

An SQL request uses a non-equality WHERE clause on an index column. For example, if a request searched an Employee database for all employees whose annual salary is greater than $100,000, then a full table scan would be used, even if the Salary column is an index. In this example, the full table scan can be avoided by using an equality WHERE clause on a defined index column.

An SQL request uses a range WHERE clause on an index column. For example, if a request searched an Employee database for all employees hired between January 2001 and June 2001, then a full table scan would be used, even if the Hire_Date column is an index.

For all requests, you must specify a value for each column in the index or Teradata will do a full table scan. A full table scan is an all-AMP operation, and each data row is accessed only once. As long as the choice of Primary Index has caused the table rows to distribute evenly across all of the AMPs, the parallel processing of the AMPs working simultaneously can accomplish the full table scan quickly.
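The requirement to supply a value for every index column can be illustrated with a two-column secondary index. The table department, its columns, and the literal values below are hypothetical.

```sql
-- Hypothetical NUSI on two columns of a hypothetical department table.
CREATE INDEX (budget, manager_employee_number) ON department;

-- Supplies a value for both index columns, so the index can be considered.
SELECT *
FROM department
WHERE budget = 500000
  AND manager_employee_number = 1019;

-- Supplies only one of the two index columns, so no index hash value
-- can be computed and the index cannot be used for hashed access.
SELECT *
FROM department
WHERE budget = 500000;
```

The index value is hashed as a whole, which is why a partial set of column values cannot be used to locate the index subtable rows.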

While full table scans are impractical and even disallowed on some commercial database systems, Teradata routinely permits ad hoc queries with full table scans.

Summary of Keys and Indexes

Some fundamental differences between Keys and Indexes are shown below:

Keys | Indexes
A relational modeling convention used in a logical data model. | A Teradata mechanism used in a physical database design.
Uniquely identify a row (Primary Key). | Used for row distribution (Primary Index).
Establish relationships between tables (Foreign Key). | Used for row access (Primary Index and Secondary Index).

While most commercial database systems use the Primary Key as a way to retrieve data, a Teradata system does not. In a Teradata system, you use the Primary Key only when designing a database, as a mechanism for maintaining referential integrity according to relational theory. The Teradata RDBMS itself does not require keys in order to manage the data, and can function fully with no awareness of Primary Keys.

The Teradata parallel architecture uses Primary Indexes to distribute and access the data rows. A Primary Index is always required when creating a Teradata table.

A Primary Index may include the same columns as the Primary Key, but does not have to. In some cases, you may want the Primary Key and Primary Index to be different. For example, a credit card account number may be a good Primary Key, but customers may prefer to use a different kind of identification to access their accounts.

Rules for Keys and Indexes

A summary of the rules for keys (in the relational model) and indexes (in the Teradata RDBMS) is shown below.

Rule | Primary Key | Foreign Key | Primary Index | Secondary Index
1 | One PK | Multiple FKs | One PI | 0 to 32 SIs
2 | Unique values | Unique or non-unique | Unique or non-unique | Unique or non-unique
3 | No NULLs | NULLs allowed | NULLs allowed | NULLs allowed
4 | Values should not change | Values may be changed | Values may be changed (redistributes row) | Values may be changed
5 | Column should not change | Column may change | Column cannot be changed (drop and recreate table) | Index may be changed (drop and recreate index)
6 | No column limit | No column limit | 16-column limit | 16-column limit
7 | n/a | FK must exist as PK in the related table | n/a | n/a

Defining Primary and Foreign Keys in Teradata

Although Primary Indexes are required and Primary Keys are not, you do have the option to define a Primary Key or Foreign Key for any table. When you define a Primary Key in a Teradata table, the RDBMS will implement the specified column(s) as an index. Because a Primary Key requires unique values, a defined Primary Key is implemented as one of the following:

Unique Primary Index (if the DBA did not specify the Primary Index in the CREATE TABLE statement)

Unique Secondary Index (If columns other than the Primary Index are chosen)

When a Primary Key is defined in Teradata SQL and implemented as an index, the rules that govern that type of index now apply to the Primary Key. For example, in relational theory, there is no limit to the number of columns in a Primary Key. However, if you specify a Primary Key in Teradata SQL, the 16-column limit for indexes now applies to that Primary Key.

In Teradata RDBMS V2R5, the maximum number of columns in an index is increased to 64.
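The second case, a Primary Key implemented as a Unique Secondary Index, arises when the CREATE TABLE statement names both a Primary Key and a different Primary Index. The table and column names in this sketch are hypothetical.

```sql
-- Hypothetical table: the Primary Key and Primary Index are on different columns.
CREATE TABLE card_account
  (account_number INTEGER NOT NULL PRIMARY KEY   -- implemented as a USI
  ,customer_id    INTEGER NOT NULL
  ,balance        DECIMAL(12,2))
PRIMARY INDEX (customer_id);                     -- NUPI used for distribution and access
```

If the PRIMARY INDEX clause were omitted, the Primary Key columns would instead become the Unique Primary Index, as described above.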

Questions

What provides uniform data distribution through the hashing algorithm?

UPI

NUPI

Both UPI and NUPI

Neither UPI nor NUPI

The output from the hashing algorithm is the:

hash map

uniqueness value

row ID

row hash

Choose the appropriate answers from the drop-down boxes that complete each sentence:

Accessing a row with a Unique Secondary Index (USI) typically requires one/two/all AMP(s).

Accessing a row with a Non-Unique Secondary Index (NUSI) requires one/two/all AMP(s).

A full table scan accesses one/two/all row(s).

Accessing a row with a Unique Primary Index (UPI) accesses one/two/all row(s) on one AMP.

Accessing a row with a Non-Unique Primary Index (NUPI) accesses multiple rows on one/two/all AMP(s).

The row ID helps the system to locate a row in case of a(n):

even distribution of rows.

Unique Primary Index.

multi-AMP request.

hash synonym.
