
Page 1: TD 12 Overview

Teradata Database 12.0 Overview

Rich Charucki

Director – Product Management

[email protected]

Page 2: TD 12 Overview

Teradata Strategic Dimensions

§ Ease of Use

§ Enterprise Fit

§ Active Enable

§ Performance

§ Cost, Quality and Supportability

Page 3: TD 12 Overview

Teradata Database 12.0 Features for…

§ Ease of Use

§ Enterprise Fit

§ Active Enable

§ Performance

§ Cost, Quality and Supportability

Page 4: TD 12 Overview

Teradata Database 12.0 – Features for... Performance

§ Multi-Level Partitioned Primary Index

§ OCES-3

§ Enhanced Query Rewrite Capability

§ Extrapolate Statistics Outside of Range

§ Parameterized Statement Caching Improvements

§ Increase Statistics Intervals

§ Collect Statistics for Multi-Column NULL Values

§ Collect AMP Level Statistics Values

§ Windowed Aggregate Functions

§ Hash Bucket Expansion

Page 5: TD 12 Overview


Multi-Level Partitioned Primary Index

§ Description

– Extend the existing Partitioned Primary Index (PPI) capability to support and allow for the creation of a table or non-compressed join index with a Multi-Level Partitioned Primary Index (ML-PPI).

§ Benefit

– ML-PPI gives the Teradata Optimizer the opportunity to perform partition elimination at a more granular level, which in turn improves query performance.

§ Considerations

– The Teradata Optimizer determines whether the index and partitioning are usable as part of the best-cost query planning process and automatically engages the index in the plan used to execute a given query.

Page 6: TD 12 Overview


ML-PPI - Concepts

§ Multi-level partitioning allows each partition at a level to be sub-partitioned.

§ Each partitioning level is independently defined using a RANGE_N or CASE_N expression.

§ Internally, these range partitions are combined into a single partitioning expression that defines how the data is partitioned on the AMP.

§ If PARTITION BY is specified, the table is called a partitioned primary index (PPI) table.

Page 7: TD 12 Overview


ML-PPI - Concepts

§ If only one partitioning expression is specified, that PPI is called a single-level partitioned primary index (or single-level PPI).

§ If more than one partitioning expression is specified, that PPI is called a multi-level partitioned primary index (or multi-level PPI).

§ For PPI tables, the rows continue to be distributed across the AMPs in the same fashion, but on each AMP the rows are ordered first by partition number and then within each partition by hash.

§ Teradata combines multiple WHERE predicates that result in partition elimination.

§ In a ML-PPI table, any single partition or any number or combination of the partitions may be referenced and used for partition elimination.

Page 8: TD 12 Overview

ML-PPI – AMP-Level Row Grouping

CREATE TABLE Sales
  (storeid      INTEGER NOT NULL,
   productid    INTEGER NOT NULL,
   salesdate    DATE FORMAT 'yyyy-mm-dd' NOT NULL,
   totalrevenue DECIMAL(13,2),
   totalsold    INTEGER,
   note         VARCHAR(256))
UNIQUE PRIMARY INDEX (storeid, productid, salesdate)
PARTITION BY (
  RANGE_N(salesdate BETWEEN DATE '2003-01-01' AND DATE '2005-12-31'
          EACH INTERVAL '1' YEAR),
  RANGE_N(storeid BETWEEN 1 AND 300 EACH 100),
  RANGE_N(productid BETWEEN 1 AND 400 EACH 100));

[Table: sample Sales rows (note, totalsold, totalrevenue, salesdate, productid, storeid) shown as input to the partition function, together with the level 1/2/3 partition numbers (L1, L2, L3) computed for each row.]
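Because each level is defined independently, predicates on any subset of the partitioning columns can drive elimination against this table. A minimal sketch (predicate values are illustrative):

/* Eliminates at level 1 only (salesdate): scans one yearly partition. */
SELECT * FROM Sales
WHERE salesdate BETWEEN DATE '2004-01-01' AND DATE '2004-12-31';

/* Eliminates at levels 2 and 3 only (storeid, productid),
   even though no level-1 predicate is given. */
SELECT * FROM Sales
WHERE storeid = 42
  AND productid BETWEEN 100 AND 199;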

Page 9: TD 12 Overview

ML-PPI: Partitioning Visual Graphic

[Diagram: rows are grouped on the AMP by Level 1 (SalesDate), then Level 2 (Storeid), then Level 3 (Productid); e.g. the 2003-12-24 and 2004-11-09 sample rows land in different level-1 partitions.]

Page 10: TD 12 Overview

ML-PPI – Partition Scanning Visual Example

§ Data Scanned: Last Month vs. Same Month LY for one Region out of Four:

– No Partitioning

– Single Level Partitioning

– Multi-Level Partitioning

Page 11: TD 12 Overview


ML-PPI - Example

§ An insurance company often performs analysis for a specific state and within a date range that is a small percentage of the many years of claims history in their data warehouse.

§ Partition elimination using multiple expressions for filtering based on WHERE clause predicates would benefit performance.

§ If analysis is being performed for Connecticut claims, for claims in June 2005, or for Connecticut claims in June 2005, a partitioning of the data that allows elimination of all but the desired claims provides an extreme performance advantage.

§ Note that ML-PPI provides direct access to partitions regardless of how many levels are referenced in the query, assuring partition elimination and enhancing query performance.

Page 12: TD 12 Overview


ML-PPI - Example

CREATE TABLE claims

(claim_id INTEGER NOT NULL,

claim_date DATE NOT NULL,

state_id BYTEINT NOT NULL,

claim_info VARCHAR(20000) NOT NULL)

PRIMARY INDEX (claim_id)

PARTITION BY (

/* Level one partitioning expression. */

RANGE_N( claim_date BETWEEN DATE '1999-01-01' AND DATE '2005-12-31'

EACH INTERVAL '1' MONTH),

/* Level two partitioning expression. */

RANGE_N( state_id BETWEEN 1 AND 75 EACH 1));

Page 13: TD 12 Overview


ML-PPI - Example

§ Eliminating all but one month out of their many years of claims history would facilitate scanning of less than 5% of the claims history for satisfying the following query:

SELECT * FROM claims

WHERE claim_date

BETWEEN DATE '2005-06-01' AND DATE '2005-06-30';

Page 14: TD 12 Overview


ML-PPI - Example

§ Similarly, eliminating all but the Connecticut claims out of the many states where the insurance company does business would facilitate scanning of less than 5% of the claims history for satisfying the following query:

SELECT * FROM claims, states

WHERE claims.state_id = states.state_id

AND states.state = 'Connecticut';

Page 15: TD 12 Overview


ML-PPI - Example

§ Combining both of these predicates for partition elimination would facilitate scanning less than 0.5% of the claims history for satisfying the following query:

SELECT * FROM claims, states

WHERE claims.state_id = states.state_id

AND states.state = 'Connecticut'

AND claim_date

BETWEEN DATE '2005-06-01'

AND DATE '2005-06-30';

Page 16: TD 12 Overview


ML-PPI - Rules

§ Existing limits and restrictions for partitioned primary indexes also apply to a multi-level partitioned primary index with the following additions:

– If more than one partitioning expression is specified in the PARTITION BY clause, each such partitioning expression must consist solely of either a RANGE_N or CASE_N function

– If more than one partitioning expression is specified in the PARTITION BY clause, the product of the number of partitions defined by each partitioning expression must not exceed 65535 and each partitioning expression must define at least two partitions

– The maximum number of partitioning expressions is 15

– A partitioning expression must not contain the system-derived columns PARTITION#L1 through PARTITION#L15

Page 17: TD 12 Overview


ML-PPI: Partition Ordering

§ Partition Ordering:

– In a ML-PPI scheme, defined partitions are hierarchical in nature

– Query performance is still optimized through partition elimination even when only one-level of an ML-PPI scheme is specified

– Partitions can be ADDed only at the first level of an ML-PPI scheme

– The first level of partitioning should therefore be the level that is most likely to change (see the sketch below)
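For example, extending the level-one date ranges of the Sales table from the earlier slides might look like the following. A minimal sketch; the exact ALTER TABLE options (e.g. how rows outside dropped or added ranges are handled) should be verified against the release documentation:

ALTER TABLE Sales
MODIFY PRIMARY INDEX
ADD RANGE BETWEEN DATE '2006-01-01' AND DATE '2006-12-31'
    EACH INTERVAL '1' YEAR;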

Page 18: TD 12 Overview

OCES-3

§ Description

– Implement the next level of enhancements for the Optimizer Cost Estimation Sub-System, such as:

§ New costing methods

§ More accurate row and spool estimates

§ Expanded statistical information

– The goal is to improve the accuracy of costing the various operations within a query plan.

§ Benefit

– Major improvements to the accuracy of query planning will result in overall query performance improvement and a reduction of query rework efforts.

– Accurate plans feed more accurate and more granular workload management via workload categorizations and filters.

§ Considerations

– Potentially, some queries can have performance regressions; in most cases these will be treated as defects when the performance impact is larger than the standard 5% margin of error for performance testing.

Page 19: TD 12 Overview

OCES-3: Main Categories of Enhancements

§ Derived Statistics

– An expansion, re-interpretation and propagation of collected statistics

§ Improved Single Table Estimates

– Improve the accuracy of estimates that result from applying selection criteria on single tables

§ Handling of Stale Statistics

– Detection and adjustment of stale statistics by comparing against random AMP sampling

§ Consolidation of multiple sources of information

– Join index statistics, check constraints, and referential integrity constraints (hard or soft) can all supplement base table statistics

§ Many other minor costing enhancements

– Better skew detection during join planning

– Editing cost of result sets

– Nested joins and bit map index costing

Page 20: TD 12 Overview


OCES-3: Derived Statistics are Central to Enabling Other OCES Enhancements

§ Within the query: Statistics are re-assessed and adjusted after each logical operation (selection, join, aggregation)

– Previously, base table statistics were re-used for all steps

– New “derived statistics” allow for more accurate costing for multi-step plans

– Information about skew can now be applied to spool files

§ Across the session: Derived Statistics can be propagated to global or volatile tables

– Session-level derived statistics are held in memory across multiple requests

– Similar information as in the statistics histogram

– Used by standard insert/select operations

Page 21: TD 12 Overview


Enhanced Query Rewrite Capability

§ Description

– Query Rewrite (QRW) is the process of rewriting a query Q into a query Q' such that the two are semantically equivalent (produce the same answer set) and Q' runs faster than the original Q. Join elimination, view folding, transitive closure, predicate move-around and join index usage are examples of QRW techniques.

§ Benefit

– Architecture re-organization and code cleanup: QRW becomes a separate subsystem called directly by the parser, as opposed to being driven by the Resolver.

– Functional enhancements of the existing rewrite, mainly enhancing the logic of view folding to include a more general class of views involving outer joins.

– A new rewrite is added that pushes projections into views and can itself trigger other rewrites.

§ Considerations

– Query Rewrite requires no user intervention and is done entirely by the Optimizer. Some queries will run faster with these optimizations, and query Explain plans may change because of the extra conditions added or joins that have been eliminated.

Page 22: TD 12 Overview

Enhanced Query Rewrite Capability – Projection Pushdown

§ Projection Pushdown

– Eliminates columns in a view definition's SELECT list if the columns are not referenced by the query itself.

CREATE VIEW Sales_By_Product AS
SELECT Product_Key,
       Product_Name,
       SUM(Quantity * Amount) Total
FROM Sales, Product
WHERE Sales_Product_Key = Product_Key
GROUP BY Product_Key, Product_Name;

Original query:

SELECT MAX(Total) Max_Sale
FROM Sales_By_Product;

Rewritten query:

SELECT MAX(Total) Max_Sale
FROM (SELECT SUM(Quantity * Amount) Total
      FROM Sales, Product
      WHERE Sales_Product_Key = Product_Key
      GROUP BY Product_Key, Product_Name) Sales_By_Product;

The Projection Pushdown rewrite offers a performance gain and a reduction in spool consumption by spooling only the columns that are necessary to support the query. New: cases where the view or derived table must be spooled now have these optimizations applied.

Page 23: TD 12 Overview

Enhanced Query Rewrite Capability – Predicate Pushdown

§ Predicate Pushdown

– Provides the capability to rewrite certain queries such that WHERE predicates stated outside a view or derived table can be "pushed" inside the view or derived table and applied directly as part of query execution.

Original query:

SELECT MAX(Total) Total
FROM (SELECT Product_Key,
             Product_Name,
             SUM(Quantity * Amount) Total
      FROM Sales, Product
      WHERE Sales_Product_Key = Product_Key
      GROUP BY Product_Key, Product_Name) V
WHERE Product_Key IN (10, 20, 30);

Rewritten query:

SELECT MAX(Total) Total
FROM (SELECT Product_Key,
             Product_Name,
             SUM(Quantity * Amount) Total
      FROM Sales, Product
      WHERE Sales_Product_Key = Product_Key
        AND Product_Key IN (10, 20, 30)
      GROUP BY Product_Key, Product_Name) V;

QRW provides for diminished spool usage and a performance gain through the application of WHERE predicates directly inside a view or derived table. New: cases where the view or derived table must be spooled now have these optimizations applied.

Page 24: TD 12 Overview

Enhanced Query Rewrite Capability – Pushing Joins Into UNION ALL Views

§ Pushing Joins Into UNION ALL Views

– A cost-based rewrite that allows certain foreign-key/primary-key (FK-PK) joins to be applied before UNION ALL.

CREATE VIEW Jan_Feb_Sales AS
SELECT * FROM Sales1
UNION ALL
SELECT * FROM Sales2;

Original query:

SELECT SUM(Quantity * Amount) Total
FROM Jan_Feb_Sales, Product
WHERE Sales_Product_Key = Product_Key
  AND Product_Name LIKE 'Gourmet%';

Rewritten query:

SELECT SUM(Quantity * Amount) Total
FROM (SELECT Quantity, Amount
      FROM Sales1, Product
      WHERE Sales_Product_Key = Product_Key
        AND Product_Name LIKE 'Gourmet%'
      UNION ALL
      SELECT Quantity, Amount
      FROM Sales2, Product
      WHERE Sales_Product_Key = Product_Key
        AND Product_Name LIKE 'Gourmet%') Jan_Feb_Sales;

QRW provides for diminished spool usage and a performance gain through the application of joins and WHERE predicates at applicable points within the UNION ALL query.

Page 25: TD 12 Overview

Extrapolate Statistics Outside of Range

§ Description

– Enhance the Teradata Optimizer with a new extrapolation technique designed to provide a more accurate statistical estimate for date range-based queries that specify a "future" date outside the bounds of the statistics collected for the column.

§ Benefit

– Extrapolation for date range-based queries that supply "future" dates will result in better query plans, because cardinality estimation will be much more accurate. Because of the new extrapolation formula, it is also possible that statistics for the associated date columns will not have to be re-collected as often.

§ Considerations

– Extrapolation for date range-based queries will not change the procedure for dropping or collecting statistics, nor will the HELP STATISTICS features be affected. However, the information displayed within a query Explain plan will change because of the new estimated row counts.

– Specific consideration should be given to collecting statistics less frequently on columns that will now be extrapolated (see the sketch below).
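For instance, a site that re-collects date-column statistics frequently purely to keep the histogram's upper bound current might relax that schedule. A minimal sketch using the ordertbl example from the following slides:

COLLECT STATISTICS ON ordertbl COLUMN (O_ORDERDATE);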

Page 26: TD 12 Overview

Stale Statistics Detection

§ Currently, the table row count is estimated from random AMP sampling or from the statistics on the primary index (PI) of the table.

§ If the primary index has statistics collected, they are always trusted and used, ignoring the random AMP samples.

§ In TD 12.0, instead of always trusting the Primary Index histogram row count, the row counts from Random AMP Sampling and the histogram are compared and a decision is made based on certain normalization heuristics.

§ The histogram row count is compared with the table row count, and if the deviation is more than the defined threshold, the histogram is determined to be stale.

§ Stale histograms are specially tagged in the optimizer, and value count/row extrapolations are done when they are used for cardinality estimations.

§ Stale Statistics Detection also applies to tables that have zero statistics, and allows for table row count extrapolation.

Page 27: TD 12 Overview

Stale Statistics Detection

[Diagram: Prior to TD 12.0, the row count from the PI histogram fed the join planner directly, and PI random AMP samples were used only when no statistics existed on the PI. In TD 12.0, the PI random samples and the PI histogram row count both pass through normalization heuristics to produce the table demographics used by the join planner.]

Page 28: TD 12 Overview

Extrapolate Statistics Outside of Range

§ Assuming RPV (rows per value) is the same across the data table.

§ Statistics collected on the data table (days) cover the range l to h and contain v unique values. Distinct-value extrapolation estimates v' added unique values, which yields the extrapolated boundary:

e = h + (h − l) × v' / (v − 1)

Page 29: TD 12 Overview

Extrapolate Statistics Outside of Range – Closed Range Query - Example I

Query:

select * from ordertbl
where O_ORDERDATE between '07-17-2007' and '07-23-2007'

Statistics collected: 01/01/07 through 07/19/07 (average values per day = 1 million, for 200 days). Query date range: 07/17/07 to 07/23/07. Extrapolated number of distinct values = 20, giving an extrapolated boundary of 08/08/07:

08/08/07 ≈ '07-19-2007' + ('07-19-2007' − '01-01-2007') × 20 / 200

§ Current behavior: provides an estimate of approximately 3 million rows based on collected statistics

§ New behavior: provides an estimate of approximately 7 million rows based on current and extrapolated statistics

Page 30: TD 12 Overview

Extrapolate Statistics Outside of Range – Closed Range Query - Example II

Query:

select * from ordertbl
where O_ORDERDATE between '07-21-2007' and '07-25-2007'

Statistics collected: 01/01/07 through 07/19/07 (average values per day = 1 million). Query date range: 07/21/07 to 07/25/07, entirely beyond the collected statistics but within the extrapolated boundary (08/08/07).

§ Current behavior: provides "1 row" as the estimate and assumes statistics are correct

§ New behavior: provides an estimate of approximately 5 million rows based on extrapolated statistics

Page 31: TD 12 Overview

Extrapolate Statistics Outside of Range – Closed Range Query - Example III

Query:

select * from ordertbl
where O_ORDERDATE between '08-06-2007' and '08-11-2007'

Statistics collected: 01/01/07 through 07/19/07 (average values per day = 1 million). Query date range: 08/06/07 to 08/11/07, straddling the extrapolated boundary (08/08/07).

§ Current behavior: provides "1 row" as the estimate and assumes statistics are correct

§ New behavior: provides an estimate of approximately 3 million rows based on extrapolated statistics

Page 32: TD 12 Overview

Extrapolate Statistics Outside of Range – Open Range Query - Example I

Query:

select * from ordertbl
where O_ORDERDATE >= '07-16-2007'

Statistics collected: 01/01/07 through 07/19/07 (average values per day = 1 million). Query date range: 07/16/07 with no end date; extrapolated boundary = 08/08/07.

§ Current behavior: provides an estimate of approximately 4 million rows based on collected statistics

§ New behavior: provides an estimate of approximately 24 million rows based on extrapolated statistics

Page 33: TD 12 Overview

Extrapolate Statistics Outside of Range – Open Range Query - Example II

Query:

select * from ordertbl
where O_ORDERDATE >= '08-04-2007'

Statistics collected: 01/01/07 through 07/19/07 (average values per day = 1 million). Query date range: 08/04/07 with no end date, beyond the collected statistics but overlapping the extrapolated boundary (08/08/07).

§ Current behavior: provides "1 row" as the estimate and assumes statistics are correct

§ New behavior: provides an estimate of approximately 5 million rows based on extrapolated statistics

Page 34: TD 12 Overview

Extrapolate Statistics Outside of Range – Open Range Query - Example III

Query:

select * from ordertbl
where O_ORDERDATE >= '09-01-2007'

Statistics collected: 01/01/07 through 07/19/07 (average values per day = 1 million). Query date range: 09/01/07 with no end date, entirely beyond the extrapolated boundary (08/08/07), so the estimated number of rows is zero.

§ Current and new behavior remain the same: provides "one row" as the estimate (zero is rounded up to one)

Page 35: TD 12 Overview

Extrapolate Statistics Outside of Range – For Column With HIGH DATE

Query:

select * from ordertbl
where O_ORDERDATE >= '05-30-2007'

Statistics collected: 01/01/07 through 05/31/07 (average rows per day = 1 million until 05/31/07, after which rows become very sparse out to the HIGH DATE of 01/01/2147). Query date range: 05/30/07 with no end date; extrapolated boundary = 06/06/07.

§ Current behavior: provides an estimate of approximately 2 million+ rows based on collected statistics

§ New behavior: provides an estimate of approximately 8 million+ rows based on extrapolated statistics

Page 36: TD 12 Overview

Teradata Database 12.0 Features for…

§ Ease of Use

§ Enterprise Fit

§ Active Enable

§ Performance

§ Cost, Quality and Supportability

Page 37: TD 12 Overview

Teradata Database 12.0 – Features for... Active Enable

§ Online Archive

§ Bulk SQL Error Logging Tables

§ Full ANSI Merge-Into Capability

§ Replication Scalability

§ Restartable Scandisk

§ Checktable Utility Performance

§ Table Functions Without Join-Back

Page 38: TD 12 Overview

Online Archive

§ Description

– Online archive allows the archival of a running database; that is, a database can be archived while update transactions execute concurrently against the tables in the database. Transactional consistency is maintained by tracking any changes to a table in a log, such that changes applied to the table during the archive can be rolled back to the transactional consistency point after the restore.

§ Benefit

– Online archive removes the requirement of having a "window" where updates must be held up while backup procedures are executed. Additionally, object locking will be eased, and the full performance impact of permanent journals will be removed.

§ Considerations

– Online archive will be integrated into the Open Teradata Backup (OTB) suite of products associated with this release.

Page 39: TD 12 Overview

Online Archive – How it Works

§ Backup Method

– Initiate archive with the new LOGGING statement

– ARC takes a checkpoint on the table(s)

– Rows written to the transient journal are logged

– After the table is backed up, the log is backed up automatically to complete the archive

§ Restore Method

– Restore the table(s)

– ARC restores the log rows as part of the Restore process

– Log rows are automatically used to "roll" the restored table back to its state at the beginning of the archive

§ Locking

– Read lock required during the checkpoint

– "Drains" transactions and utilities to get a clean point at the beginning

– No locks held during the archive

– DDL requests are aborted (not blocked)

Page 40: TD 12 Overview

Bulk SQL Error Logging Tables

§ Description

– Provide support for complex error handling during bulk SQL INSERT, UPDATE and DELETE operations through the use of new SQL-based error tables.

§ Benefit

– A complementary capability to the native Teradata utilities, this feature increases flexibility in developing load strategies: SQL can be used for batch updates that contain errors, with error reporting similar to the current load utilities, while overcoming current restrictions on having unique indexes, join indexes and triggers resident on target tables.

– The feature allows a SQL DML statement to continue to completion after logging an error, rather than performing an abort and rollback.

§ Considerations

– Strong consideration should be given to re-evaluating current batch load/ETL processes to take advantage of bulk SQL load operations that are not currently considered due to current limitations/restrictions.

Page 41: TD 12 Overview


Bulk SQL Error Logging Tables – Error Table Creation Syntax

§ New Non-ANSI Syntax:

CREATE ERROR TABLE [<error table>] FOR <data table>;

– If an optional name is not supplied, the error table name will default to ET_<data table name>

– If the data table name is longer than 27 characters, it will be truncated at the end. No warning will be returned.

– If <error table> is not specified, or if <error table> is specified without an explicit database name, then the error table will be created in the current default database for the session

– An error table may be created for a data table with a maximum of 2,048 columns

– In addition to the data table contents, the error table will house 18 additional error related columns

– COMMENTs on columns in the data table will not be carried over to the error table. However, COMMENTs may be added to the error table columns if desired.

– Access rights required for CREATE ERROR TABLE statements would be the same as those for CREATE TABLE statements
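As a concrete illustration, either form of the statement against the claims table from the ML-PPI examples; a minimal sketch (the explicit name is illustrative):

CREATE ERROR TABLE FOR claims;               /* defaults to ET_claims */

CREATE ERROR TABLE claims_errors FOR claims; /* explicitly named */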

Page 42: TD 12 Overview

Bulk SQL Error Logging Tables - Logging

§ Bulk SQL Logging Options

– A "LOGGING ERRORS" option has been added to the existing SQL syntax for INSERT-SELECT and MERGE-INTO statements

– This option permits users to specify the kinds of errors that can be logged

– Errors are classified into 2 categories: Local and Non-local

§ Local errors are defined as errors that occur on the same AMP that inserts the data row

§ Non-local errors are defined as errors that occur on an AMP that does not own the data row

– The "LOGGING ERRORS" option is applicable to both ANSI and Teradata modes.

Page 43: TD 12 Overview

Bulk SQL Error Logging Tables - Logging

§ Logging Local Errors:

– Local errors comprise the following:

– Duplicate row errors (ANSI mode only):

§ An INSERT-SELECT of a duplicate row into a SET table is silently ignored in Teradata mode

– Duplicate Primary Key errors

– CHECK constraint violations

– LOB non-pad data truncation errors

– Data conversion errors that occur during data row inserts

§ Logging Non-Local Errors:

– Non-local errors comprise the following:

– Referential Integrity violations

– Unique Secondary Index violations

Page 44: TD 12 Overview

Bulk SQL Error Logging Tables – Insert-Select Syntax Extension

INS[ERT] [INTO] tablename
  { [VALUES] (expr [...,expr]) }
  { (columnname [...,columnname]) VALUES (expr [...,expr]) }
  { [(columnname [...,columnname])] subquery [error_logging_option] }
  { DEFAULT VALUES } ;

where error_logging_option is

  [ LOGGING [ { ALL [except_option] | DATA } ] ERRORS [error_limit_option] ]

where error_limit_option is

  [ WITH { NO LIMIT | LIMIT OF <number> } ]

and except_option is

  [ EXCEPT { REFERENCING | UNIQUE INDEX } ]
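Assuming the error table created on the previous slide, a minimal sketch of an INSERT-SELECT that logs errors instead of aborting (the staging table name is illustrative):

INSERT INTO claims
SELECT * FROM claims_staging
LOGGING ALL ERRORS WITH LIMIT OF 100;

Rows that fail with, for example, a CHECK constraint violation land in the error table with the error-describing columns populated, while the remaining rows are inserted; the statement is rejected once the 100-error limit is exceeded.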

Page 45: TD 12 Overview

Full ANSI Merge-Into SQL Capability

§ Description

– Enhance the Merge-Into SQL capability to support full ANSI functionality. This feature allows the database to perform a true bulk UPSERT operation with a standard SQL statement. Additionally, this enhancement provides non-ANSI extensions to support additional error-handling capabilities.

§ Benefit

– The new SQL MERGE functionality lifts the current restriction of supporting only single-row merges and allows multiple table rows to be processed in one statement. Bulk UPSERT processing will no longer be limited to the MultiLoad utility, and the extended error-handling capabilities allow native SQL to be used in load strategy scenarios while overcoming current utility restrictions regarding unique indexes, join indexes and triggers resident on target tables.

§ Considerations

– Strong consideration should be given to re-evaluating current batch load/ETL processes to take advantage of the full ANSI Merge-Into SQL capability for load operations that are not currently considered due to current limitations/restrictions

Page 46: TD 12 Overview

Full ANSI Merge-Into SQL Capability - Syntax

MERGE [ INTO ] <target table> [ [ AS ] <merge correlation name> ]
USING <table reference>
ON <search condition>
<merge operation specification>;

<merge correlation name> ::= <correlation name>

<merge operation specification> ::= <merge when clause> ...

<merge when clause> ::= <merge when matched clause> |
                        <merge when not matched clause>

<merge when matched clause> ::= WHEN MATCHED THEN <merge update specification>

<merge when not matched clause> ::= WHEN NOT MATCHED THEN <merge insert specification>

<merge update specification> ::= UPDATE SET <set clause list>

<merge insert specification> ::= INSERT [ ( <insert column list> ) ] [ <override clause> ] VALUES <merge insert value list>

<merge insert value list> ::= ( <merge insert value element> [ { , <merge insert value element> } ... ] )

<merge insert value element> ::= <value expression> |
                                 <contextually typed value specification>

Page 47: TD 12 Overview

Full ANSI Merge-Into SQL Capability – Syntax Extension

MERGE [INTO] tablename [ [AS] aname ]
USING { VALUES (expr [...,expr]) | ( subquery ) } [AS] source_tname (cname [...,cname])
ON match-condition
WHEN MATCHED THEN
  UPD[ATE] SET cname = expr [...,cname = expr]
WHEN NOT MATCHED THEN
  INS[ERT] { [VALUES] (expr [...,expr]) | (cname [...,cname]) VALUES (expr [...,expr]) }
[ LOGGING [ { ALL [except_option] | DATA } ] ERRORS [error_limit_option] ] ;

where error_limit_option is

  [ WITH { NO LIMIT | LIMIT OF <number> } ]

and except_option is

  [ EXCEPT { REFERENCING | UNIQUE INDEX } ]
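As a usage illustration, a minimal sketch of a bulk UPSERT into the Sales table from the ML-PPI slides (the staging source is illustrative):

MERGE INTO Sales AS t
USING (SELECT storeid, productid, salesdate, totalrevenue, totalsold, note
       FROM daily_sales_stage) AS s
ON t.storeid = s.storeid
AND t.productid = s.productid
AND t.salesdate = s.salesdate
WHEN MATCHED THEN
  UPDATE SET totalrevenue = s.totalrevenue,
             totalsold = s.totalsold
WHEN NOT MATCHED THEN
  INSERT VALUES (s.storeid, s.productid, s.salesdate,
                 s.totalrevenue, s.totalsold, s.note)
LOGGING ALL ERRORS WITH NO LIMIT;

Note that the ON condition covers the full primary index of the target, which is what lets each AMP apply its matched and unmatched rows locally.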

Page 48: TD 12 Overview

Bulk SQL Batch Test Result Summary

End-to-end comparison: MLoad (Acquisition + Apply phases) vs. FastLoad + SQL Bulk Batch. FastLoad alone = 14 sec.

                           100% INSERT      50% INS, 50% UPD   100% UPD
MLoad Total                296 sec          361 sec            342 sec
FastLoad + Merge           295 = 14 + 281   294 = 14 + 280     138 = 14 + 124
Elapsed Time Improvement   ~0%              19%                60%

Page 49: TD 12 Overview

Teradata Database 12.0 Features for…

§ Ease of Use

§ Enterprise Fit

§ Active Enable

§ Performance

§ Cost, Quality and Supportability

Page 50: TD 12 Overview

Teradata Database 12.0 – Features for… Enterprise Fit

§ Java Stored Procedures

§ Restore/Copy Dictionary Phase

§ Restore/Copy to Different Configuration Data Phase Performance

§ Cursor Positioning for MSR

§ UNICODE Support for password control & encryption

§ Custom Password Dictionary Support

§ New Password Encryption Algorithm

Page 51: TD 12 Overview


Java Stored Procedures

§ Description

– Provide the database user with a means to define an external stored procedure (XSP) written in the Java language which can use JDBC to dynamically execute SQL within the same session.

§ Benefit

– Java applications will now be able to access data from the Teradata database directly. This feature leverages the ever-present Java skills in our customer base.

§ Considerations

– Java Stored procedures will operate only on Linux or Windows based Teradata platforms. MP-RAS support is not planned.

Page 52: TD 12 Overview

Teradata Database 12.0 Features for…

§ Ease of Use

§ Enterprise Fit

§ Active Enable

§ Performance

§ Cost, Quality and Supportability

Page 53: TD 12 Overview


Teradata Database 12.0 Features for Cost, Quality & Supportability

§ Compression on Soft/Batch RI Columns

§ Dispatcher Fault Isolation

Page 54: TD 12 Overview

Allow Compression on Soft/Batch Referential Integrity Columns

§ Description

– Allow compression on columns that are either a Parent Key or a Foreign Key in a Soft/Batch Referential Integrity scheme.

§ Benefit

– Lifting this restriction allows two important features (Referential Integrity and Compression) to work together without limitation.

§ Considerations

– Columns that are part of a Primary Index designation are still not compressible. As such, if Primary Index columns are used in a PK-FK referential integrity scheme, they remain non-compressible (see the sketch below).
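A minimal sketch of what the lifted restriction permits, assuming the claims/states tables from the ML-PPI examples and an illustrative COMPRESS value list (REFERENCES WITH NO CHECK OPTION is the soft-RI form):

CREATE TABLE claims_sri
  (claim_id   INTEGER NOT NULL,
   claim_date DATE NOT NULL,
   state_id   BYTEINT NOT NULL COMPRESS (1, 5, 7), /* FK column, now compressible */
   claim_info VARCHAR(20000),
   FOREIGN KEY (state_id)
     REFERENCES WITH NO CHECK OPTION states (state_id))
PRIMARY INDEX (claim_id);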

Page 55: TD 12 Overview

Dispatcher Fault Isolation

§ Description

– Increase system availability by preventing DBS reset loops arising from re-submissions of fault-causing requests, and by gracefully aborting a fault-causing request or transaction where possible. Also, a session will be forcibly logged off when the number of fault instances exceeds a threshold value.

§ Benefits

– Escalating a fault to a DBS reset will be performed only as a last resort, thus naturally facilitating an increase in system availability.

§ Considerations

– With this feature, the Dispatcher joins both the Parser and the AMPs in supporting fault isolation.

Page 56: TD 12 Overview

Teradata Database 12.0 Features for…

§ Ease of Use

§ Enterprise Fit

§ Active Enable

§ Performance

§ Cost, Quality and Supportability

Page 57: TD 12 Overview

Teradata Database 12.0 – Features for… Ease of Use

§ TASM: Query Banding

§ TASM: Traffic Cop

§ TASM: Global/Multiple Exceptions

§ TASM: Utility Management

§ TASM: Open APIs

§ Enhanced Data Collection: DBQL & ResUsage

§ Enhanced Explain Plan Details

§ Stored Procedure Result Sets

§ SQL Invocation via External Stored Procedures

§ Index Wizard Support for PPI

§ Dynamic Result Row Specification on Table Functions

§ Normalized AMPUsage View for Coexistence

Page 58: TD 12 Overview


TASM: Query Banding

§ Description

– A method for “tagging” a query, utilizing a Name/Value pair identification scheme, such that a query’s originating source and purpose can be readily identified.

§ Benefit

– Increases the accuracy and granularity of a query's source and purpose, fosters better resource accounting, and makes the request-generating application an integral part of workload management. Additionally, TASM rules can be set up to act specifically on any existing Name/Value pair, enabling better and finer workload management.

§ Considerations

– Feature should be used for all applications, but especially for applications that submit queries through session-pooling that use a common logon user-id. Capability can also be set at both the Session and Transaction levels.

Page 59: TD 12 Overview

TASM: Query Banding – Usage

§ Examples

– SET QUERY_BAND = 'org=Finance;report=Fin123;' FOR SESSION;

– SET QUERY_BAND = 'Document=XY1234;Universe=East;' FOR TRANSACTION;

– SET QUERY_BAND = NONE FOR SESSION;

§ Note:

– Partner tools and Teradata applications are planning roadmaps to utilize/generate Query Banding.

– Customers need to have Query Banding added to their applications.
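A session can also read back the band it is carrying; a minimal sketch, assuming the GetQueryBand built-in function of this release:

SET QUERY_BAND = 'org=Finance;report=Fin123;' FOR SESSION;

SELECT GetQueryBand(); /* returns the session's current name/value pairs */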

Page 60: TD 12 Overview

TASM: "Traffic Cop"

§ Description

– Extends active workload management to automatically detect, notify, and act on planned and unplanned system and enterprise events. TASM then automatically implements a new set of workload management rules, a working value set (WVS), specific to the detected events and resulting system condition.

§ Benefits

– Ability to automatically adjust workload management rules when the system enters or exits a degraded mode, to ensure critical system work continues to get priority for resources. This also allows "application situational" events to be considered beyond just the prior date/time operating environments; for instance, workload management rule changes based on actual batch reports completing rather than an approximate completion time.

§ Considerations

– Involves creation and configuration of a TASM two-dimensional "State Matrix" in TDWM, aligning operating environments with system conditions.

Page 61: TD 12 Overview

TASM "Traffic Cop" Events

§ AMP Fatal: Number of AMPs reported as fatal at system startup.

§ Gateway Fatal: Number of gateways reported as fatal at system startup.

§ PE Fatal: Number of PEs reported as fatal at system startup.

§ Node Down: Maximum percent of nodes down in a clique at system startup.

§ AWT Limit: Number of AWTs used for MSGWORKNEW and MSGWORKONE work on an AMP.

– The levels of these two message work types are a barometer of the work level for an AMP. Configured as an AMP threshold with an AWT "limit" number and an associated qualification time (default of 180 seconds).

§ Flow Control: Number of AMPs in flow control.

– Configured as a "limit" number with an associated qualification time (default of 180 seconds).

§ User Defined: Notification via an API when the event occurs and completes, or can complete based on a time-out period set through the API at startup.

– Begin/End batch processing

– Begin/End key workload (end-of-month processing)

– Dual system offline/online

Page 62: TD 12 Overview

TASM "Traffic Cop" State Matrix

State Matrix - Detections can yield a change in system state and associated WVS. The state matrix facilitates simplicity.

Page 63: TD 12 Overview

TASM "Traffic Cop" State Matrix

All levels of priority workload and session concurrency allowed.

Page 64: TD 12 Overview

TASM "Traffic Cop" State Matrix

System exception occurs: node failure.

Page 65: TD 12 Overview

TASM "Traffic Cop" State Matrix

Workload automatically changes, and low-priority session work is reduced from 20 to 3 concurrent sessions.

Page 66: TD 12 Overview

TASM Utility Management

§ Description

– Provide a "Delay" option for Utility Throttles, allowing jobs exceeding the threshold to be queued instead of the prior limitation of only being able to "Reject" them.

– Extend utility management from load and export control to include backup and recovery jobs as well. An "Archive/Restore" option has been added to both Utility Throttles and WD Utility Classification Criteria.

§ Benefit

– Additional TASM functionality for load, export, backup, and restore jobs is a key area of focus in extending Teradata workload management. The feature provides the ability to ensure that these utilities do not impact higher priority system work and can be controlled during system state changes, or that they can be prioritized when deemed necessary (for instance, during a batch window).

§ Considerations

– TASM utility rules apply to FastLoad, MultiLoad, FastExport, the TPT Load/Update/Export operators, JDBC FastLoad, and ARCMAIN. Additionally, note that utility throttles apply only to the type and number of utilities running on Teradata; as such, they cannot be associated with Teradata Database objects.

Page 67: TD 12 Overview

TASM: Open APIs

§ Description

– Provide a set of Application Program Interfaces (APIs) that allow applications or third-party tools to interface directly with the Teradata Active Systems Management (TASM) software components.

§ Benefit

– The feature provides the mechanism by which applications or third-party tools may influence or enhance the working of the TASM software to suit their particular needs and requirements.

§ Considerations

– Use of the TASM Open APIs can have a profound effect on given workloads and how they are managed, and they should only be utilized if the required functionality is not available within the TASM framework.

Page 68: TD 12 Overview

Enhanced Data Collection: DBQL

§ Description

– Provide additions to the data columns in the Database Query Logging (DBQL) facility.

§ Benefit

– DBQL is an extremely valuable tool that facilitates query analysis, including the capture of executed SQL and associated resource consumption.

– Extending the DBQL content enables deeper query analysis, fosters better workload understanding, and provides the basis for query tracking and for optimizing performance.

§ Considerations

– As the content of DBQL expands now and in the future, database administrators should give greater consideration to adopting it.

Page 69: TD 12 Overview


Enhanced Data Collection: DBQL

§ Additional Information in DBQL:

– Parsing Engine CPU time

– High and low AMP byte counts for spool

– Normalized CPU data for co-existence systems

– Cost estimates (CPU, I/O, network, heuristics)

– Estimated processing time and row counts

– Additional Utility related information

Page 70: TD 12 Overview

Enhanced Data Collection: Resusage

§ Description

– Provide new tables in the Teradata Resource Sub-System (RSS) that give additional detail on AMP Worker Tasks (AWTs), and enhance other RSS tables with provisional information on workload definitions.

§ Benefit

– Enhancement of the Resusage system tables provides additional insight into system consumption and fosters better workload management.

§ Considerations

– As the content of the Resusage system tables expands now and in the future, database administrators should give greater consideration to leveraging their contents.

Page 71: TD 12 Overview

Enhanced Data Collection: Resusage

§ "ResUsageSAWT" Columns of Interest:

– MailBoxDepth: Current depth of the AMP work mailbox.

– FlowControlled: Specifies if an AMP is in flow control.

– FlowCtlCnt: Number of times during the log period that the system entered the flow control state.

– InuseMax: Maximum # of AWTs in use at any one time.

– WorkTypeInuse: Current # of AWTs in use during the log period for each work type for the VprId vproc.

– WorkTypeMax: Maximum # of AWTs in use at one time during the log period for each work type for the VprId vproc.

Page 72: TD 12 Overview

Enhanced Data Collection: Resusage

§ "ResUsageSPS" Columns of Interest:

– AGId: Identifies the current Allocation Group for the Perf Group ID.

– RelWgt: Relative weight of the Allocation Group.

– CPUTime: Milliseconds of CPU time consumed by the associated task.

– IOBlks: # of logical data blocks read and/or written by the PG.

– NumProcs: # of processes assigned to the PG.

– NumSets: Allocation Group set division type.

– NumRequests: # of requests for the AMP Worker Tasks.

– QWaitTime: Time that work requests waited on an input queue before being serviced.

– QWaitTimeMax: Maximum time that work requests waited.

– QLength: # of work requests waiting on the input queue.

– QLengthMax: Max # of work requests waiting on the input queue.

– ServiceTime: Time that work requests required for service.

– ServiceTimeMax: Max time that work requests required for service.

Page 73: TD 12 Overview

73

Enhanced Explain Plan Details

§ Description

– Enrich the content of SQL explain plans by additional information to the explain output including spool size estimates, view namesand actual column names for Hashing, Sorting or Grouping columns.

§ Benefits

– The enhancing of explain plan details facilitates explain outputreadability and understanding as well as aids in the debugging of complex queries and for identifying intermediate result spool skewing.

§ Considerations

– There is no special mechanism needed to acquire enhanced explain plan details. A simple “Explain SQL” statement will generate all the aforementioned features.
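A minimal sketch against the claims table from the ML-PPI examples (the plan text itself varies by system and statistics):

EXPLAIN
SELECT state_id, COUNT(*)
FROM claims
WHERE claim_date BETWEEN DATE '2005-06-01' AND DATE '2005-06-30'
GROUP BY state_id;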

Page 74: TD 12 Overview

Stored Procedure Result Sets

§ Description

– Provide functionality that allows a Stored Procedure to build and return answer sets as a result of its execution.

§ Benefits

– Extending the Stored Procedure capability in this way greatly simplifies application development against the Teradata database and provides a long-awaited capability.

§ Considerations

– Currently, without the Stored Procedure Result Set capability, temporary tables need to be created to store answer sets, and the Stored Procedure CALL must be followed by a separate SELECT statement.

– Strong consideration should be given to removing these intermediate steps from current applications (see the sketch below).
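A minimal sketch of the new style, assuming the DYNAMIC RESULT SETS and CURSOR WITH RETURN syntax of this release (procedure and column choices are illustrative):

CREATE PROCEDURE monthly_claims (IN from_date DATE, IN to_date DATE)
DYNAMIC RESULT SETS 1
BEGIN
  DECLARE rs CURSOR WITH RETURN ONLY FOR
    SELECT claim_id, claim_date, state_id
    FROM claims
    WHERE claim_date BETWEEN from_date AND to_date;
  OPEN rs; /* the open cursor is returned to the caller as a result set */
END;

The caller simply issues CALL monthly_claims(DATE '2005-06-01', DATE '2005-06-30') and fetches the returned answer set, with no intermediate temporary table.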

Page 75: TD 12 Overview


SQL Invocation via External Stored Procedures

§ Description

– Extend the current External Stored Procedure (XSP) capability to provide an interface that allows an XSP to invoke and use SQL in the current session.

§ Benefits

– This feature will foster greater application development and enhance the ability of a client application to access and use the Teradata database directly.

§ Considerations

– The initial primary development focus will be to use CLIv2 to facilitate and allow an XSP to submit SQL to the Teradata database.

Page 76: TD 12 Overview

Individual New Feature Performance – 12.0 compared to 6.2

§ Optimizer Cost Estimation Subsystem improvements: 5% improvement in plans generated by the new OCES vs. the old OCES (costing parameters).

§ Parameterized cache request enhancement: Up to 30% on queries that the new algorithm determines to cache.

§ Multi-Level PPI: Up to 30% on queries that can take advantage of partition elimination, e.g. multidimensional queries.

§ CheckTable: Up to 50% on Level-Two secondary index checking on large tables (> 20M rows).

§ Online Archive: 20% time-savings improvement over non-online archive.

§ New query rewrite: 10% to 30% on queries that QRW can optimize.

§ Bulk SQL Load: 20% to 60% over MultiLoad end to end.

Page 77: TD 12 Overview

Teradata Database 12.0 Features

Enterprise Fit

• Java SPs (with JDBC) (Linux and Windows)
• Cursor positioning for multi-statement requests
• UNICODE support for password control and encryption
• Custom password dictionary support
• New password encryption algorithm
• Restore/Copy Dictionary Phase
• Restore/Copy to Different Configuration Data Phase
• UNIX/Kerberos Authentication for Windows Clients

Ease of Use

• Additional EXPLAIN plan details
• TASM enhancements:
  – Query Banding
  – Traffic Cop enhancements
  – Global/Multiple Exceptions
  – Open API SQL capability for TDWM
  – Dynamic load utility management
• Data collection: DBQL, ResUsage
• Index Wizard support for PPI
• SQL invocation via External Stored Procedures
• Stored Procedure result sets
• Dynamic Result Row Specification on Table Functions
• Normalized AMPUsage View for coexistence

Active Enable

• Online Archive
• Replication Scalability
• Restartable Scandisk
• Bulk SQL error logging tables
• Full ANSI Merge-Into SQL capability
• CheckTable utility performance enhancements
• Table Function without Join Back

Performance

• OCES (phase 3)
• Statistics enhancements:
  – Increase statistics intervals
  – Extrapolate statistics outside range (e.g. DATE)
  – Collect stats for multi-column NULL values
  – Collect AMP-level statistics values
• Enhanced query rewrite capability:
  – Projection Pushdown
  – Push Joins into UNION ALL Views
• Parameterized statement caching improvements
• Hash bucket expansion
• Multi-level Partitioned Primary Index (PPI)
• Windowed Aggregate Functions

Cost, Quality, & Supportability

• Dispatcher Fault Isolation
• Compression on Soft/Batch Referential Integrity Columns

Page 78: TD 12 Overview


Questions.....

[email protected]