Teradata Indexes


  • date post

    14-Apr-2015
  • Category

    Documents


Teradata Indexes: What they are and how they work
Alison Torres, Director, Teradata Warehouse Consulting
Teradata Certified Master V2R3 & V2R5

Teradata Overview

3

Teradata Node
  • Communication Interfaces: LAN Gateway, Channel
  • Teradata RDBMS
  • PDE (Parallel Database Extensions)
  • UNIX / Windows / Linux O/S

4

Teradata expansion to MPP nodes
[Diagram: PE V-Procs, V-Nets, the BYNET switch, and eight AMP V-Procs, each paired with its own DATA store.]

5

Agenda
  • Primary Indexes
  • Partitioned Primary Index
  • Secondary Indexes
    > Unique and Non-unique
  • Other Types of Secondary Indexes
    > Single Table Join Index
    > Value Ordered Index
    > Sparse Join Index
    > Value Ordered Sparse Single Table Join Index
    > Hash Index

6

Primary Indexes
  • Primary physical access path
  • Mechanism used to assign a row to an AMP
  • A table must have one and only one Primary Index
  • The Primary Index cannot be changed without recreating the table
  • UPIs (Unique Primary Indexes) result in even distribution of the table's rows across all AMPs
  • UPIs ensure no duplicate rows
  • PI access is always a one-AMP operation
  • NUPIs (Non-Unique Primary Indexes) distribute the table's rows evenly in proportion to the degree of uniqueness of the index and the number of AMPs
  • Primary Indexes may or may not be the same as Primary Keys

7

Primary Indexes: How They Work
[Diagram: Primary Index column value(s) -> Teradata Hash Function -> RowHash (Hash Bucket) plus Data Columns -> Hash Map -> BYNET -> AMP. On each AMP, the rows of Table A (RH = RowHash, D = data) are stored ordered by RH.]
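The flow on this slide can be imitated with a toy model. The sketch below is only an illustration under stated assumptions: `row_hash` uses MD5 as a stand-in (it is not Teradata's hash function), the hash map is a simple modulo, and the names `row_hash`, `amp_for`, and `rows_per_amp` are hypothetical. It shows why a unique PI spreads rows evenly while a PI with only a few distinct values leaves some AMPs empty.

```python
import hashlib
from collections import Counter


def row_hash(pi_value) -> int:
    """Stand-in 32-bit hash of a Primary Index value (NOT Teradata's real function)."""
    digest = hashlib.md5(repr(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def amp_for(pi_value, n_amps: int) -> int:
    """Map a PI value to an AMP: row hash -> hash bucket -> toy hash map."""
    bucket = row_hash(pi_value) >> 16   # high-order 16 bits act as the hash bucket
    return bucket % n_amps              # toy hash map: bucket number modulo AMP count


def rows_per_amp(pi_values, n_amps: int):
    """Count how many rows land on each AMP."""
    counts = Counter(amp_for(v, n_amps) for v in pi_values)
    return [counts.get(amp, 0) for amp in range(n_amps)]


# Unique PI: 10,000 distinct values over 4 AMPs.
upi = rows_per_amp(range(10_000), 4)
# Highly non-unique PI: only 3 distinct values, so at most 3 AMPs receive rows.
nupi = rows_per_amp([i % 3 for i in range(10_000)], 4)

print("UPI rows per AMP: ", upi)
print("NUPI rows per AMP:", nupi)
```

Because the same PI value always hashes to the same AMP, a lookup by PI value only has to visit one AMP, which is the one-AMP access described earlier.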

8

Agenda

Partitioned Primary Index
> How PPIs work
> PPI Performance Considerations and Trade-offs

9

Partitioned Primary Index (PPI)
Description
> A table organization to optimize the physical database design for range-constrained queries
> Allows partitioning of large history tables by a numeric value (e.g., month) so queries can access only one month of a history

Your Benefit
> Significantly improve performance for range-constrained queries
> Strategic queries still see all data in one table, but tactical queries look only at the subset they need
> Performance improvements for other functions like deletes and updates
> Read only a subset of the table
> Easy to manage
> None of the pain of re-partitioning
> All the self-management you expect from Teradata
> Reduce high-volume batch insert times by 90%
> Delete large volumes of rows nearly instantaneously
> Drop unneeded secondary indexes or value-ordered join indexes

10

Partitioned Primary Index (PPI)
Overview of the Basics
> Rows are still hash distributed among the AMPs on the primary index columns
> Rows are ordered by partition, then by Primary Index hash within partition
> CREATE TABLE statement has a new PARTITION BY clause
> Partitioning column can be part of the primary index, but is not required to be
  • In many cases, better performance occurs when the partitioning column is part of the primary index
> Maximum of 65,535 partitions, numbered from one
> One or more columns in the partitioning expression

11

Partitioned Primary Index (PPI)
Restrictions on PPI Tables
> For the PI to be unique (UPI with partition), the partitioning column must be part of the PI
> No character or graphic comparison allowed in the partitioning expression
> PPIs allowed on base tables, Global Temporary, and Volatile Temporary tables only

Performance Awareness
> Possible degradation of PI access
  • If the partitioning column is not qualified, all partitions will be read
  • Joins on PI columns from a non-PPI table to a PPI table will result in comparing the PI column in every PPI table partition

12

Partitioned Primary Indexes: How They Work (since V2R5.0)
[Diagram: the Partitioning Columns pass through the User-specified Partitioning Function to give the Partition (P); the Primary Index Columns pass through the Teradata Hash Function to give the RowHash (Hash Bucket, RH); the row (P, RH, Data Columns D) is routed through the Hash Map and BYNET to an AMP. On each AMP, Table A is stored as Partition 1 through Partition 4, with rows ordered by RH within each partition.]

13

Non-PPI vs. PPI
  • AMP with PI: requires full table scan
  • AMP with PPI: partitions are accessed as needed

SEL ... WHERE order_date BETWEEN DATE '2007-01-01' AND DATE '2008-01-14';
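Partition elimination for a date range like the one above can be pictured with a small model. The sketch below assumes monthly partitions numbered from one; the start month `START`, the partition count `TOTAL`, and the helper names are hypothetical and not taken from the slides.

```python
from datetime import date


def month_partition(d: date, start: date) -> int:
    """Partition number (counting from 1) for monthly ranges beginning at `start`."""
    return (d.year - start.year) * 12 + (d.month - start.month) + 1


def partitions_to_scan(lo: date, hi: date, start: date) -> range:
    """Partitions that can contain rows with lo <= order_date <= hi."""
    return range(month_partition(lo, start), month_partition(hi, start) + 1)


START = date(2006, 1, 1)   # hypothetical first partition month
TOTAL = 36                 # hypothetical: three years of monthly partitions

needed = partitions_to_scan(date(2007, 1, 1), date(2008, 1, 14), START)
print(f"scan partitions {needed.start}..{needed.stop - 1}: "
      f"{len(needed)} of {TOTAL}")
```

Without the partitioning, the same predicate would have to be checked against every row on every AMP; with it, only the partitions in `needed` are read.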

14

Partitioned Primary Indexes
Trade-off considerations
> Potential advantages
  • Partition elimination
  • Finer granularity on separation of data
  • May eliminate need for some secondary indexes
  • Deletes by partition are nearly instantaneous
> Potential disadvantages
  • Rows are two bytes wider
  • PI cannot be defined as unique when the partitioning column is not part of the PI
  • Access can be degraded if the partitioning column is not specified in the query
  • Joins to non-partitioned tables with the same PI may be degraded

15

Partitioned Primary Indexes
Trade-off considerations
> Common errors
  • The Optimizer needs to see the partitioning column value as a constant to determine which partitions can be excluded
  • Use caution when doing range partition deletes: if data falls outside the partition range, it will be moved to the NO RANGE partition and the move will not be fast
> Conclusions
  • PPI can offer dramatic improvements in query response time and in high-volume data load and maintenance operations
  • There may be degradations in PI access and in join steps due to PPI
  • The DBA should understand the trade-off considerations
  • Testing of various alternatives will usually be necessary to get the maximum benefit from PPI

16

Agenda

Secondary Indexes
> Unique Secondary Index (USI)
> Non-unique Secondary Index (NUSI)

17

Secondary Indexes
> A secondary index is an alternate path to the rows of a table.
> Secondary indexes:
  • Do not affect table distribution.
  • Add overhead, both in terms of disk space and maintenance.
  • May be added or dropped dynamically as needed.
  • Are chosen to improve access performance.

18

Unique Secondary Index (USI) Access
Create USI: CREATE UNIQUE INDEX (cust) ON customer;
Access via USI: SELECT * FROM customer WHERE cust = 54;

[Diagram: the PE passes Table ID 100 and USI value 54 to the hashing algorithm, which yields Row Hash 602. The request travels over the BYNET to the AMP holding USI subtable row 602, 1, which points to base table RowID 778, 7 (Row Hash 778, Unique Val 7). A second BYNET step reaches the AMP holding that base table row.]

USI subtables (AMPs numbered in the order shown on the slide)

AMP    RowID     Cust   Base RowID
1      244, 1    74     884, 1
1      505, 1    77     639, 1
1      744, 4    51     915, 9
1      757, 1    27     388, 1
2      135, 1    98     555, 6
2      296, 1    84     536, 5
2      602, 1    54     778, 7
2      969, 1    49     147, 1
3      288, 1    31     638, 1
3      339, 1    40     640, 1
3      372, 2    45     471, 1
3      588, 1    95     778, 3
4      175, 1    37     107, 1
4      489, 1    72     717, 2
4      838, 1    12     147, 2
4      919, 1    62     822, 1

Base table (Cust = USI, Phone = NUPI)

AMP    RowID     Cust   Name     Phone
1      107, 1    37     White    555-4444
1      536, 5    84     Rice     666-5555
1      638, 1    31     Adams    111-2222
1      640, 1    40     Smith    222-3333
2      471, 1    45     Adams    444-6666
2      555, 6    98     Brown    333-9999
2      717, 2    72     Adams    666-7777
2      884, 1    74     Smith    555-6666
3      147, 1    49     Smith    111-6666
3      147, 2    12     Young    777-4444
3      388, 1    27     Jones    222-8888
3      822, 1    62     Black    444-5555
4      639, 1    77     Jones    777-6666
4      778, 3    95     Peters   555-7777
4      778, 7    54     Smith    555-7777
4      915, 9    51     Marsh    888-2222
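The two-step USI access on this slide can be traced with a small lookup model. This is a sketch only: it copies a few rows from the slide, assumes the AMP numbering follows the order in which the subtables appear, and replaces the real hashing step with a dictionary search (`_amp_holding` is a hypothetical helper).

```python
# AMP -> {cust value: base-table RowID}; subset of the slide's USI subtables.
USI_SUBTABLE = {
    1: {74: (884, 1), 77: (639, 1), 51: (915, 9), 27: (388, 1)},
    2: {98: (555, 6), 84: (536, 5), 54: (778, 7), 49: (147, 1)},
}

# AMP -> {RowID: (cust, name, phone)}; subset of the slide's base table.
BASE_TABLE = {
    2: {(555, 6): (98, "Brown", "333-9999"), (884, 1): (74, "Smith", "555-6666")},
    4: {(778, 7): (54, "Smith", "555-7777"), (639, 1): (77, "Jones", "777-6666")},
}


def _amp_holding(table, key):
    """Stand-in for hashing: find the AMP whose rows contain `key`."""
    for amp, rows in table.items():
        if key in rows:
            return amp
    raise KeyError(key)


def usi_lookup(cust):
    """Return (base row, AMPs touched) for a unique secondary index value."""
    index_amp = _amp_holding(USI_SUBTABLE, cust)   # step 1: AMP with the index row
    row_id = USI_SUBTABLE[index_amp][cust]         # index row gives the base RowID
    base_amp = _amp_holding(BASE_TABLE, row_id)    # step 2: AMP with the base row
    return BASE_TABLE[base_amp][row_id], [index_amp, base_amp]


row, amps = usi_lookup(54)
print(row, "via AMPs", amps)
```

At most two AMPs are involved for any USI value, regardless of how many AMPs the system has.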

19

Non-Unique Secondary Index (NUSI) Access
Create NUSI: CREATE INDEX (name) ON customer;
Access via NUSI: SELECT * FROM customer WHERE name = 'Adams';

[Diagram: the PE passes Table ID 100 and NUSI value 'Adams' to the hashing algorithm, which yields Row Hash 567. The request goes over the BYNET to AMP 1 through AMP 4; each AMP searches its own NUSI subtable for Row Hash 567 and reads the matching base table rows locally.]

NUSI subtables (AMPs numbered in the order shown on the slide)

AMP    RowID     Name     Base RowID(s)
1      448, 1    White    107, 1
1      656, 1    Rice     536, 5
1      567, 3    Adams    638, 1
1      432, 8    Smith    640, 1
2      567, 2    Adams    471, 1; 717, 2
2      852, 1    Brown    555, 6
2      432, 3    Smith    884, 1
3      432, 1    Smith    147, 1
3      770, 1    Young    147, 2
3      567, 6    Jones    338, 1
3      448, 4    Black    822, 1
4      262, 1    Jones    639, 1
4      396, 1    Peters   778, 3
4      432, 5    Smith    778, 7
4      155, 1    Marsh    915, 9

Base table (Name = NUSI, Phone = NUPI)

AMP    RowID     Cust   Name     Phone
1      107, 1    37     White    555-4444
1      536, 5    84     Rice     666-5555
1      638, 1    31     Adams    111-2222
1      640, 1    40     Smith    222-3333
2      471, 1    45     Adams    444-6666
2      555, 6    98     Brown    333-9999
2      717, 2    72     Adams    666-7777
2      884, 1    74     Smith    555-6666
3      147, 1    49     Smith    111-6666
3      147, 2    12     Young    777-4444
3      388, 1    27     Jones    222-8888
3      822, 1    62     Black    444-5555
4      639, 1    77     Jones    777-6666
4      778, 3    95     Peters   555-7777
4      778, 7    54     Smith    555-7777
4      915, 9    51     Marsh    888-2222

20

Full Table Scans vs. Non-Unique Secondary Index (NUSI)
Full Table Scans
  • Read every data block
    > Great when aggregating
[Diagram: Table Data Blocks]

NUSI
  • NUSI is useful when not every block is read
    > Usefulness depends on % of rows qualifying and number of rows in a data block
  • Optimizer will decide the best access method
  • Use EXPLAIN to determine index usage
  • Collect Statistics on NUSIs

21

Overlooked NUSI Criteria
Usage depends on rows per block that qualify
Example 1: IF >= 1 row per block qualifies, THEN a full table scan of the base table is faster than NUSI access and the NUSI is not used
> If 100 rows/block and 1% of the data qualifies, then every block will be read. Full Table Scan is faster.

Example 2: IF < 1 row per block qualifies, THEN NUSI access is faster than a full table scan
> If 100 rows/block and 1 in 1000 rows qualify, then 1 in every 10 blocks would be read. NUSI will be used.
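The rule of thumb in these two examples is plain arithmetic, sketched below. The function names are hypothetical, and the model is the slide's simplification (qualifying rows spread evenly over blocks), not the Optimizer's actual costing.

```python
def qualifying_rows_per_block(rows_per_block: int, one_in_n: int) -> float:
    """Average qualifying rows per block when 1 in `one_in_n` rows qualifies."""
    return rows_per_block / one_in_n


def blocks_read_fraction(rows_per_block: int, one_in_n: int) -> float:
    """Fraction of data blocks touched under the slide's even-spread assumption."""
    return min(1.0, qualifying_rows_per_block(rows_per_block, one_in_n))


def nusi_likely_used(rows_per_block: int, one_in_n: int) -> bool:
    """Slide's criterion: the NUSI pays off when < 1 row per block qualifies."""
    return qualifying_rows_per_block(rows_per_block, one_in_n) < 1.0


# Example 1: 100 rows/block, 1% (1 in 100) qualifies -> every block is read.
print(blocks_read_fraction(100, 100), nusi_likely_used(100, 100))
# Example 2: 100 rows/block, 1 in 1000 qualifies -> about 1 block in 10 is read.
print(blocks_read_fraction(100, 1000), nusi_likely_used(100, 1000))
```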

22

More NUSI Criteria
Uneven Distribution of Values
  • Some values represent a large percentage of the table, other values have few instances
    > Full Table Scan done for values that represent a large percent of the table
    > NUSI is used for values that represent a tiny percent of the table

Example:*
> Large corporation with 100,000 calls / month would do a Full Table Scan
> Residential phone customer with 20 calls / month would use the NUSI
* (Candidate for Sparse Single Table Join Index - STJI)

A sparse index is a special case of an STJI: create a join index and qualify it with a WHERE clause. You can't just put a WHERE clause on a secondary index.

23

NUSI - Index Covering
Index Covering
> Occurs when a query can be satisfied by columns in the secondary index
> Enables scanning the secondary index sub-table instead of the primary data table
> Savings based on number of bytes (columns) in NUSI definition versus number of bytes (columns) in table definition
> Example on next page
  • Table has 26 data columns; NUSI has 5 data columns
  • I/O savings ~ 60% - 80%

24

NUSI - Vertical Partitioning of Data
Index Covering - example - NUSI with 5 columns
  • Table has 26 data columns; NUSI has 5 data columns
  • I/O savings ~ 60% - 80%
[Diagram: Table Data beside the narrower NUSI. The NUSI contains the Row Hash Code of the data row; the query is satisfied with NUSI access only.]

25

NUSI on PI Columns of PPI Table
  • NUSIs can be defined on the same columns as the PI of the PPI table
  • For a given value, accessing a NUSI on a PI column of a PPI table results in a single-AMP operation
  • Example:
    > Access seasonal items sold in a store
    > PPI on Store and Item with Partition on Date
    > NUSI on Item
    > NUSI accesses only the partitions for the months when the item was sold

26

Agenda

Other Types of Secondary Indexes
> Single Table Join Index
> Value Ordered Index
> Sparse Join Index
> Value Ordered Sparse Single Table Join Index
> Hash Index

27

Other Types of Secondary Indexes
Join Index
> Used to define a pre-joined table on frequently joined columns (with optional aggregation) without denormalizing the database.
> Used to create a full or partial replication of a base table with a primary index on a foreign key column, to facilitate joins of very large tables by hashing their rows to the same AMP as the large table.
> Used to define a summary table without denormalizing the database.
> You can define a join index on one or several tables.

Sparse Index
> Any join index, whether simple or aggregate, multi-table or single-table, can be sparse.
> Uses a constant expression in the WHERE clause of its definition to narrowly filter its row population.

Value-Ordered NUSI
> Very efficient for range conditions and conditions with an inequality on the secondary index column set.

Hash Index
> Used for the same purposes as single-table join indexes.
> Creates a full or partial replication of a base table with a primary index on a foreign key column, to facilitate joins of very large tables by hashing them to the same AMP.
> Limited to one table only.

28

Spectacular Gains Through Indexes
Vertical Partitioning Indexes
> Non-Unique Secondary Index (NUSI)
> Single Table Join Index
> Release 5 - Expanded to 64 columns in an index

Enhanced Index Features
> Index Covering
> Value Ordering
> Sparse Index (Qualification of rows to put into STJI)

Horizontal Partitioning of Data Table
> Partitioned Primary Index (PPI)

29

Single Table Join Index (STJI) - Index Covering
More on Vertical Partitioning of Data
> Index Covering - example - STJI with 5 columns
[Diagram: Table Data beside an STJI holding columns 1, 5, 6, 11, and 15 plus RI. The STJI can have the Row Hash Code (ROWID) of the data row.]
  • Different structure, query satisfied with STJI access
  • Index maintained automatically

30

Single Table Join Index (STJI)
Similarities between STJI and NUSI
> STJI can be defined with the same columns as a NUSI
> Index covering applies to both STJI and NUSI
> Can do value ordering on both STJI and NUSI

Basic Differences between STJI and NUSI
> An STJI is similar to a table with a primary index and other columns defined
  • This means an STJI row can be stored on the same AMP as the table data row or on a different one; a NUSI row is stored on the same AMP as the table data row
> Cannot join to a NUSI, but can join to an STJI
> NUSI is supported by MultiLoad, but STJI is not
> All columns of a NUSI must be accessed for the index to be considered

31

Single Table Join Index (STJI) - Different Primary Index
  • Use a different Primary Index for the STJI to avoid row redistribution at time of query
  • Primary Index for base table is (store, item, date_sold)
  • Primary Index for STJI is (store, item)
  • Effect: for a given item and store, all rows with different dates are grouped together in the same data block*
* Same table, indexed two different ways

32

Single Table Join Index (STJI) - Different Primary Index
Find certain items sold in a set of stores for a given set of dates
  • Case: table PI = (store, item, date_sold)
    > Rows are redistributed and sorted to get all rows with the same store and item and date on the same AMP
  • Case: STJI PI = (store, item)
    > Rows have been redistributed and grouped together at the time the STJI was built

SELECT item, COUNT(DISTINCT(Store_no))
FROM Sales_History
WHERE On_hand_qty > 0
  AND qty_sold > 0
  AND item IN (x,y,z)
  AND date_sold IN (a,b,c)
  AND store IN (d,e,f)
GROUP BY 1
ORDER BY 1;

Result: Query suite ran 10 times faster with an STJI

33

Single Table Join Index (STJI) - Same Primary Index - Partial Covering
  • Use the same Primary Index for the STJI so the STJI row is on the same AMP as the table data row
  • Similar to NUSI: STJI row is on the same AMP as the data row
  • Useful for partial covering
    > Partial covering means qualification is done on index columns before accessing the primary data table for non-covered columns. This can reduce the number of rows to retrieve from the base table
  • Acts like a NUSI
    > AMP-local and no BYNET traffic

Useful for scoring:
> Only want 2000 names of the 20M
  • Scan the STJI, which is a narrower table
  • Then go back to the base table

34

STJI - Using LIKE clause

CREATE JOIN INDEX LIKETAB AS
SELECT car_license, pi_col, ROWID
FROM Customer_Info
PRIMARY INDEX (pi_col);

SELECT * FROM Customer_Info
WHERE car_license LIKE 'ABC%';

Query scans LIKETAB (a very narrow table), qualifies rows with car_license LIKE 'ABC%', then uses ROWID to get data from table Customer_Info, where the row is on the same AMP because both tables have the same primary index*
* Optimizer will NOT scan a NUSI for LIKE; instead it will scan the base table
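The access pattern just described (scan the narrow structure, then fetch the wide rows by ROWID) can be modelled in a few lines. The sketch below is an illustration only: the rows, the `name` column, and the function `select_like_prefix` are made up, and Python dictionaries stand in for the base table and the join index.

```python
# ROWID -> full (wide) base row; data invented for illustration.
BASE = {
    1: {"pi_col": 10, "car_license": "ABC123", "name": "White"},
    2: {"pi_col": 11, "car_license": "XYZ999", "name": "Rice"},
    3: {"pi_col": 12, "car_license": "ABC777", "name": "Adams"},
}

# Narrow structure comparable to LIKETAB: (car_license, pi_col, rowid).
LIKETAB = [(row["car_license"], row["pi_col"], rowid) for rowid, row in BASE.items()]


def select_like_prefix(prefix: str):
    """Emulate WHERE car_license LIKE '<prefix>%' via the narrow structure."""
    # Step 1: scan only the narrow tuples and keep qualifying ROWIDs.
    rowids = [rowid for lic, _pi, rowid in LIKETAB if lic.startswith(prefix)]
    # Step 2: fetch just those wide rows from the base table by ROWID.
    return [BASE[rowid] for rowid in rowids]


for row in select_like_prefix("ABC"):
    print(row["car_license"], row["name"])
```

The saving comes from step 1 reading far fewer bytes per row than a scan of the wide base table would.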

35

STJI - Using LIKE clause
Customer needed OLTP-type response time
> seconds, not minutes

LIKE clause on base table does a full table scan
> 40 million rows with 19 columns on a 2-node system
> Full table scan took 1 minute

Build STJI with column for LIKE plus ROWID of base table
> Use LIKE clause on the narrower table
> Took 4 seconds

36

Value Ordered Index
Value-ordering option on NUSI and STJI
> Numeric restriction of 4 bytes only
> Integer values only - no character data

V2R5 - Expanded from 16 to 64 columns

Syntax
CREATE INDEX OrdDate (orderdate) ORDER BY VALUES (orderdate) ON ORDERS;

37

Value Ordered Indexes

Value Ordered Non-Unique Secondary Index (VONUSI)
Value Ordered Single Table Join Index (VOSTJI)
Example
> Invoice Table - 60 Million rows
  • 1500 days X 100 outlets X 400 sales / day
> Invoice_Item Table - 240 Million rows
  • 1500 days X 100 outlets X 400 sales / day X 4 items / sale

38

Value Ordered Indexes
  • Table Data: rows are hashed; query does a full table scan
  • Value Ordered NUSI/STJI: scans only the value-specified portion of the table
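The difference between scanning hashed rows and scanning a value-ordered structure can be sketched with a sorted list and binary search. This is an analogy under stated assumptions, not Teradata's storage format: the data is synthetic (ten rows per calendar day), and `range_scan` is a hypothetical name.

```python
import bisect
from datetime import date, timedelta

START = date(2007, 1, 1)

# (orderdate, rowid) entries kept sorted by value, as a value-ordered index would be.
index = sorted((START + timedelta(days=i % 365), i) for i in range(3650))
keys = [order_date for order_date, _rowid in index]


def range_scan(lo: date, hi: date):
    """Return index entries with lo <= orderdate <= hi using two binary searches."""
    first = bisect.bisect_left(keys, lo)    # first entry >= lo
    last = bisect.bisect_right(keys, hi)    # one past the last entry <= hi
    return index[first:last]                # contiguous slice, no full scan


hits = range_scan(date(2007, 3, 1), date(2007, 3, 7))
print(len(hits), "entries read out of", len(index))
```

Because the qualifying entries are contiguous, only that slice is read; with hash ordering the same rows would be scattered through the whole table.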

39

Value Ordered Indexes
Teradata Database V2R4 - VONUSI & VOSTJI
[Chart: axis label "day"]

Note: Cannot join to a NUSI but can join to an STJI

40

SPARSE Join Indexes
Description
> Indexes a portion of the table that is used most frequently
> Uses WHERE clause predicates to limit the rows indexed
> Like other index choices, a sparse Join Index should be chosen to support high frequency queries requiring short response times

Your Benefit
> A sparse index can focus on the portion of the table(s) that are most frequently used:
  • Reduces the storage requirements for a join index and maintenance costs for updates
  • Makes access faster since the size of the join index is smaller
  • No change for the user: the optimizer will evaluate all join indexes, and choose one if it's appropriate for the specific query

[Diagram: an Indexed Column containing Null values; the Sparse Index (NOT NULL) covers only the non-null rows.]

41

Sparse Single Table Join Index
  • Form of Horizontal Partitioning
  • Built with a qualification of which rows to store in the index
    > CREATE JOIN INDEX J1 AS SELECT * FROM sales WHERE Status = 'ACTIVE';

42

Sparse Single Table Join Index
Residential Customers
> Comprise 94% of all customers
> Make 50% of all calls

Application needs quick response for Residential customer calls
Use Sparse STJI instead of NUSI
> Saves space and maintenance time

[Chart: Skewed Distribution of Phone Calls per Day; y-axis: Number of Calls (0 to 1200). Large Corporation averages 1000 calls per day; Small Businesses average 25 calls per day; Residential Customers average 2.2 calls per day.]

43

Sparse Single Table Join Index - Other Examples
Insurance - 10 year history
> Clients renew insurance every 6 months
> 1/20th of data represents current paid policies

Manufacturing parts - used and available in inventory
> Less than 1% of parts are available
> Retrieval looking for available parts

Retail - Filled versus unfilled orders
> Less than 0.5% of orders are unfilled

44

Value Ordered Sparse Single Table Join Index
Combination of Vertical and Horizontal Partitioning
> Sparse Join Index qualifies only active rows
> Build index only with columns needed

Value order the Sparse Join Index

[Diagram: base table with a 12-month history of all flyers and all flights; the index holds only today's flights and today's flyers.]

45

Value Ordered Sparse STJI on PPI Table
  • Join Index maintenance is not supported by MultiLoad
  • Qualify the Sparse Join Index on the partitioning value
    > Rebuilding the index only requires processing the qualifying partition
    > Time to rebuild and space required for a Sparse STJI can be quite small

[Diagram: retailer has 100M rows/day; the index holds a subset of the latest data.]

46

Hash Index - Similar to STJI
  • Hash Index automatically includes ROWID
  • Hash Index can have a primary index different from the primary index of the base table
    > A different primary index allows for redistribution of data at the time the hash index is built; useful for index covering
    > Also useful when the number of rows for a given value is less than half the number of AMPs in a table
  • Hash Index can be value ordered
  • Uses different DDL syntax to create it than an STJI

47

Hash Index - Similar to STJI
  • Hash Index can have the same primary index as the base table
    > Having the same primary index allows for Partial Covering, where rows are qualified by columns in the index and retrieval of table data is on the same AMP

Example - Marketing Campaign
  • Reduce the list of prospects with successive qualifying queries using the Hash Index until the final number is achieved, then get detailed data.

48

Join Index Types
  • Simple Join Indexes are, like all Teradata indexes, automatically updated with the base tables, and automatically evaluated and selected by the Optimizer.
  • Single Table Join Indexes, like Hashed NUSIs, are built on a single table, used primarily for covering (base table row IDs are optional) and can be hashed on a user-defined Primary Index.
  • Multi-table Join Indexes can store covering data from as many as 64 base tables. NUSIs can be defined on these indexes, and the user can define the PI column.
  • Aggregate Join Indexes may include SUM and COUNT values (from which averages may be calculated) on one or more of their columns. They may be defined on:
    > Single Tables - A columnar subset of a base table with aggregates automatically maintained by the software, or
    > Multiple Tables - A columnar subset of as many as 64 base tables with aggregate columns automatically maintained.
  • Sparse Join Indexes are defined with a WHERE clause that limits the number of base table rows included and the space required to store them.

49

Teradata Indexes
  • Primary Indexes
  • Partitioned Primary Index
  • Secondary Indexes
    > Unique and Non-unique
  • Other Types of Secondary Indexes
    > Single Table Join Index
    > Value Ordered Index
    > Sparse Join Index
    > Value Ordered Sparse Single Table Join Index
    > Hash Index

50

Alison [email protected]

Additional Considerations

52

Partitioned Primary Indexes

Obvious PPI candidate table
> Large sales history table with 24 full months and current month-to-date
> Nightly batch inserts of that day's transactions (high volume) and monthly deletes of oldest data
  • Consider partitioning by transaction date
> Primary Index is (product_code, transaction_date, agent_id)
  • No secondary indexes or join indexes
> Some queries access PI
> No other tables have the same primary index (and will not join to it using the PI)

53

Partitioned Primary Indexes
Obvious PPI candidate table
> What makes it such a good PPI candidate?
  • High volume of daily inserts, so there is a bias toward partitioning on transaction date
  • Transaction date is part of the PI, so that is even better
> Proposal: Convert to PPI, partitioned by transaction date with daily granularity
> Could also consider partitioning by product_code or agent_id
  • Would improve some queries
  • Would not improve batch insert or delete operations

54

Partitioned Primary Indexes
Obvious PPI candidate table
> Summary of benefits of PPI proposal:
> Improves performance of batch inserts of daily transactions and periodic bulk deletes of oldest data
  • Faster inserts: most inserts will be appended to the end of the table
  • Faster deletes: ALTER TABLE DROP RANGE is nearly instantaneous (disclaimer: no secondary index or join index)
> Many queries can benefit from partition elimination
> PI access is not degraded much, if at all
> Joins should not be degraded (but check EXPLAINs, and do comparison testing if they change)
  • If joins are slower, could consider weekly or monthly partitions instead of daily

55

Partitioned Primary Indexes
Obvious PPI candidate table
> Disadvantages of PPI proposal:
> Full table scan queries will have to read a little more data, due to the two-byte partition number embedded in each row
  • If rows average 50 bytes, then 4% more disk space is needed
  • Secondary index rows would also be wider (this example has none)
> Reconfig and Table Rebuild will be slower
  • These are infrequent operations
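The 4% figure above is the two-byte partition number divided by the average row size. A minimal sketch of that arithmetic, with a hypothetical function name and a hypothetical 200-byte row for comparison:

```python
def ppi_space_overhead(avg_row_bytes: float, partition_bytes: int = 2) -> float:
    """Fractional growth in row size from the embedded partition number."""
    return partition_bytes / avg_row_bytes


# Slide's case: 50-byte rows -> 2 / 50 = 4% more space.
print(f"{ppi_space_overhead(50):.1%}")
# Wider rows dilute the overhead, e.g. hypothetical 200-byte rows -> 1%.
print(f"{ppi_space_overhead(200):.1%}")
```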

56

Partitioned Primary Indexes

Obvious PPI candidate table - Alternative
> Why daily granularity?
> Finer granularity improves batch load performance more than coarser granularity
  • No difference for batch deletes
> Queries can get more partition elimination with finer granularity, if they specify short time intervals
> No real disadvantage to having daily partitions for this example (if join EXPLAINs are unchanged), so go for maximum granularity

57

Partitioned Primary Indexes
Maybe Yes/Maybe No PPI candidate table
> Large invoice table
  • Four years of history
  • Unique Primary Index is invoice number
  • Partition candidate is invoice date
  • Nightly batch inserts, monthly deletes of oldest data
  • Fairly high volume of PI accesses
  • Some time-constrained queries
  • Other tables have the same PI, and no invoice date column

58

Partitioned Primary Indexes
Maybe Yes/Maybe No PPI candidate table
> Why is it an uncertain candidate for partitioning?
  • PI is single column, unique, and used for access and joins
> Advantages of partitioning on invoice date
  • Inserts and deletes will be faster, but the secondary index will reduce the amount of improvement
  • Date-constrained queries will be faster
> Disadvantages of partitioning on invoice date
  • Must define the PI as non-unique, and define a unique secondary index to enforce uniqueness
  • A USI prevents MultiLoad protocol inserts
  • PI access will use secondary indexes and will take two or three times as long as non-PPI PI access
  • Joins to other tables with the same PI will probably be degraded

59

Partitioned Primary Indexes

Maybe Yes/Maybe No PPI candidate table
> Worthwhile to convert to PPI?
  • Need to measure the performance difference
  • Need to assess the relative business importance of the performance differences
  • Might consider schema changes to accommodate PPI. For example, denormalize other tables with the same PI by adding the invoice date column to improve join performance