Teradata Database 13.10 Overview
Todd Walter, CTO, Teradata Labs
Fine Print
• Nothing in this presentation constitutes a commitment to deliver any specific functionality at any specific time.
• Current planning date for the 13.10 release is Q3 2010.
Key Features
What is a Temporal Database? Definitions
• Temporal – the ability to store all historic states of a given set of data (a database row) and, as part of the query, select a point in time at which to reference the data. Examples:
> What was this account balance (share price, inventory level, asset value, etc.) on this date?
> What data went into the calculation on 12/31/05, and what adjustments were made in 1Q06?
> On this historic date, what was the service level (contract status, customer value, insurance policy coverage) for said customer?
• Three Types of Temporal Tables
> Valid Time Tables
– When a fact is true in the modeled reality
– User-specified times
> Transaction Time Tables
– When a fact is stored in the database
– System-maintained time, no user control
> Bitemporal Tables
– Both Transaction Time and Valid Time
• User Defined Time
> User can add time period columns and take advantage of the added temporal operators
> Database does not enforce any rules on user-defined time columns
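As a sketch of how the three table types above might be declared (the table and column names here are hypothetical, not from this deck), a bitemporal table pairs a user-controlled valid-time period with a system-maintained transaction-time period:

```sql
-- Hypothetical bitemporal table: the VALIDTIME period is user-specified,
-- while the TRANSACTIONTIME period is maintained entirely by the database.
CREATE MULTISET TABLE policy
  (policy_id     INTEGER NOT NULL,
   customer_id   INTEGER NOT NULL,
   coverage_amt  DECIMAL(12,2),
   policy_vt     PERIOD(DATE) NOT NULL AS VALIDTIME,
   policy_tt     PERIOD(TIMESTAMP(6) WITH TIME ZONE) NOT NULL AS TRANSACTIONTIME)
PRIMARY INDEX (policy_id);
```

Declaring only one of the two qualifiers yields a valid-time-only or transaction-time-only table; a plain PERIOD column with neither qualifier is the user-defined-time case, on which the database enforces no temporal rules.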
Temporal Query
Provide a list of members who were reported as covered on Jan. 15, 2000 in the Feb. 1, 2000 NCQA report, with names as accurate as our best data shows today.
With Temporal Support:
VALIDTIME AS OF DATE '2000-01-15'
AND TRANSACTIONTIME AS OF DATE '2000-02-01'
SELECT member.member_id, member.member_nm
FROM edw.member_x_coverage, edw.member
WHERE member_x_coverage.member_id = member.member_id;
Without Temporal Support:
SELECT member.member_id
      ,member.member_nm
FROM edw.member_x_coverage coverage
    ,edw.member
WHERE coverage.member_id = member.member_id
  AND coverage.observation_start_dt <= '2000-02-01'
  AND (coverage.observation_end_dt > '2000-02-01'
       OR coverage.observation_end_dt IS NULL)
  AND coverage.effective_dt <= '2000-01-15'
  AND (coverage.termination_dt > '2000-01-15'
       OR coverage.termination_dt IS NULL);
Temporal Update – Bitemporal Table
Current valid time, current transaction time query: jeans (item 125, serial 102) are sold today (2005-08-30).
With Temporal Support:
UPDATE objectlocation
SET location = 'External'
WHERE item_id = 125
  AND item_serial_num = 102;
Without Temporal Support:
INSERT INTO objectlocation
SELECT item_id, item_serial_num, 'External', CURRENT_TIME, END(vt), CURRENT_TIME, 'Until_Closed'
FROM objectlocation
WHERE item_id = 125 AND item_serial_num = 102
  AND BEGIN(vt) <= CURRENT_TIME
  AND END(vt) > CURRENT_TIME
  AND END(tt) = 'Until_Closed';
INSERT INTO objectlocation
SELECT item_id, item_serial_num, location, BEGIN(vt), CURRENT_TIME, CURRENT_TIME, 'Until_Closed'
FROM objectlocation
WHERE item_id = 125 AND item_serial_num = 102
  AND BEGIN(vt) <= CURRENT_TIME
  AND END(vt) > CURRENT_TIME
  AND END(tt) = 'Until_Closed';
UPDATE objectlocation
SET END(tt) = CURRENT_TIME
WHERE item_id = 125 AND item_serial_num = 102
  AND BEGIN(vt) <= CURRENT_TIME
  AND END(vt) > CURRENT_TIME
  AND END(tt) = 'Until_Closed';
INSERT INTO objectlocation
SELECT item_id, item_serial_num, 'External', BEGIN(vt), END(vt), CURRENT_TIME, 'Until_Closed'
FROM objectlocation
WHERE item_id = 125 AND item_serial_num = 102
  AND BEGIN(vt) > CURRENT_TIME
  AND END(tt) = 'Until_Closed';
UPDATE objectlocation
SET END(tt) = CURRENT_TIME
WHERE item_id = 125 AND item_serial_num = 102
  AND BEGIN(vt) > CURRENT_TIME
  AND END(tt) = 'Until_Closed';
Moving Current Date in PPI
• Description
> Support use of the CURRENT_DATE and CURRENT_TIMESTAMP built-in functions in a partitioning expression.
> Ability to reconcile the values of these built-in functions to a newer date or timestamp using ALTER TABLE.
– Optimally reconciles the rows with the newly resolved date or timestamp value.
– Reconciles the PPI expression.
• Benefit
> Users can define 'moving' date and timestamp partitioning with ease, instead of manually redefining the PPI expression using constants.
– Date-based partitioning is the typical use for PPI. If a PPI is defined with a 'moving' current date or current timestamp, the partition that contains the recent data can be kept as small as possible for efficient access.
> Required for the Temporal semantics feature – provides the ability to define 'current' and 'history' partitions.
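A minimal sketch of the idea, with a hypothetical table (the RANGE_N bounds and the reconciliation statement follow the 13.10 syntax as described above):

```sql
-- Hypothetical: keep roughly the last year of orders in one-day partitions,
-- anchored to CURRENT_DATE as resolved when the table is created.
CREATE TABLE orders
  (order_id  INTEGER NOT NULL,
   order_dt  DATE NOT NULL)
PRIMARY INDEX (order_id)
PARTITION BY RANGE_N(order_dt BETWEEN CURRENT_DATE - INTERVAL '1' YEAR
                              AND     CURRENT_DATE
                              EACH    INTERVAL '1' DAY,
                     NO RANGE);

-- Later, re-resolve CURRENT_DATE and let the system reconcile rows and
-- the partitioning expression in one step:
ALTER TABLE orders TO CURRENT;
```

The point of the feature is the last statement: without it, the DBA would periodically rewrite the partitioning expression with new constant dates.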
Time Series Expansion Support
• Description
> New EXPAND ON clause added to SELECT to expand a row with a period column into multiple rows.
– EXPAND ON clause allowed in views and derived tables.
> EXPAND ON syntax supports multiple ways to expand rows.
• Benefit
> Permits time-based analysis on period values.
– Allows business questions such as 'Get the month-end average inventory cost during the last quarter of the year 2006'.
– Allows OLAP analysis on period data.
> Allows charting of period data in an Excel format.
> Provides infrastructure for sequenced query semantics on temporal tables.
Time Series Expansion Support
• What will it do?
> Expand a time period column and produce value-equivalent rows, one for each time granule in the period.
– Time granule is user specified.
– Permits a period representation of the row to be changed into an event representation.
> Following forms of expansion are provided:
– Interval expansion – by user-specified intervals such as INTERVAL '1' MONTH
– Anchor point expansion – by user-specified anchored points in a time line
– Anchor period expansion – by user-specified anchored time durations in a time line
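A sketch of the interval-expansion form (the inventory table and its period column are hypothetical):

```sql
-- Hypothetical: each inventory row carries a PERIOD(DATE) column, duration,
-- recording how long that cost was in effect. EXPAND ON emits one row per
-- one-month granule covered by the period; the expanded granule is exposed
-- through the alias (here, expd).
SELECT item_id,
       cost,
       BEGIN(expd) AS month_start
FROM inventory
EXPAND ON duration AS expd BY INTERVAL '1' MONTH;
```

This is the period-to-event conversion described above: a single row valid for six months becomes six rows, one per month, ready for OLAP-style aggregation.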
Geospatial Enhancements
• Description
> Enhancements to the Teradata 13 Geospatial offering, drastically increasing performance, adding functionality and providing integration points for partner tools.
• Benefits
> Increased performance by changing UDFs to fast path system functions.
> Replace the shapefile generator client tool (org2org) with a stored procedure for tighter integration with the database and tools such as ESRI ArcGIS.
> Provide geodetic distance methods – SphericalBufferMBR().
> WFS server provides better tool integration support for MapInfo and ESRI products.
ESRI ArcGIS Connecting to Teradata via Safe Software FME
1. FME connection in ArcView2. Connect to Teradata
via TPT API3. Select Teradata tables for ArcView
analysis
Projection of Impact Zone & Storm Path to Google Earth
Where do I deploy my catastrophe (cat) management team?
Algorithmic Compression
• Description
> Provides users the option of defining compression/decompression algorithms, implemented as UDFs, that are specified and applied to data at the column level in a row. Initially, Teradata will provide two compression/decompression algorithm sets: one for UNICODE columns and another for LATIN columns.
• Benefit
> Data compression is the process by which data is encoded so that it consumes less physical storage space. This capability reduces both the overall storage capacity needs and the number of physical disk I/Os required for a given operation. Additionally, because less physical data is being operated on, there is the potential to improve query response time as well.
• Considerations
> At some point, compressed data will have to be decompressed. This costs some extra CPU cycles, but in general the advantages of compression outweigh the extra cost of decompression.
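As a sketch of how a column might opt in to algorithmic compression (the table is hypothetical; the function pair shown is the Teradata-supplied set for UNICODE columns mentioned above):

```sql
-- Hypothetical table; TransUnicodeToUTF8 / TransUTF8ToUnicode are the
-- Teradata-supplied compress/decompress UDFs for UNICODE character data.
CREATE TABLE customer_notes
  (note_id   INTEGER NOT NULL,
   note_txt  VARCHAR(1000) CHARACTER SET UNICODE
             COMPRESS USING TD_SYSFNLIB.TransUnicodeToUTF8
             DECOMPRESS USING TD_SYSFNLIB.TransUTF8ToUnicode)
PRIMARY INDEX (note_id);
```

Compression and decompression then happen transparently on insert and select; queries against note_txt need no changes.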
Multi-Value Compression For Varchar Columns
• Example – Multi-Value Compression for Varchar Column:
CREATE TABLE Customer
(Customer_Account_Number INTEGER
,Customer_Name VARCHAR(150)
COMPRESS ('Rich', 'Todd')
,Customer_Address CHAR(200));
Block Level Compression
• Description
> Feature provides the capability to perform compression on whole data blocks at the file system level before the data blocks are actually written to storage.
• Benefit
> Block level compression reduces the actual storage required for the data, especially cool/cold data, and significantly reduces the I/O required to read that data.
• Considerations
> There is a CPU cost to compress or decompress whole data blocks. This is generally considered a good trade, since CPU cost is decreasing while I/O cost remains high.
User-Defined SQL Operators
• Description
> This feature allows users to define and encapsulate complex SQL expressions in a User Defined Function (UDF) database object.
• Benefits
> The SQL UDF feature allows users to define their own functions written using SQL expressions. Previously, the desired SQL expression had to be written into the query for each use, or alternatively an external UDF could be written in another programming language to provide the same capability.
> Additionally, SQL UDFs allow one to define functions available in other databases and with alternative syntax (e.g. ANSI).
• Considerations
> The Teradata SQL UDF feature is a subset of the SQL function feature described in the ANSI SQL:2003 standard.
> Additionally, this feature does not introduce any changes to the definition of the Dictionary tables per se, but adds rows to the DBC.TVM and DBC.UDFInfo tables to indicate the presence of a SQL UDF.
SQL UDF - Example
• The "Months_Between" Function:
CREATE FUNCTION Months_Between (Date1 DATE, Date2 DATE)
RETURNS INTERVAL MONTH(4)
LANGUAGE SQL
DETERMINISTIC
CONTAINS SQL
PARAMETER STYLE SQL
RETURN (CAST(Date1 AS DATE) - CAST(Date2 AS DATE)) MONTH(4);

SELECT MONTHS_BETWEEN ('2008-01-01', '2007-01-01');
MONTHS_BETWEEN ('2008-01-01', '2007-01-01')
---------------------------------------------------
12
Performance
Character-Based PPI (CPPI)
• Description
> This feature leverages current Teradata Partitioned Primary Index (PPI) technology and extends it to allow the use of character data (CHAR, VARCHAR, GRAPHIC, VARGRAPHIC) as a table partitioning mechanism.
• Benefit
> Currently, only integer datatypes are allowed as the partitioning mechanism in a PPI scheme, which facilitates superior query performance via partition elimination. Extending this capability to character-based datatypes allows for more partitioning options and in turn yields a query performance advantage similar to what current PPI technology gleans today.
• Considerations
> As with all Teradata index or partitioning database design choices, the Optimizer will determine the appropriate index/PPI to use that provides the best-cost plan for executing the query. No end-user query modification is required.
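A sketch of what a character-based partitioning expression might look like (the table and the range boundaries are hypothetical):

```sql
-- Hypothetical: partition customer rows by the leading letters of the
-- surname, so predicates like last_name LIKE 'Sm%' can eliminate partitions.
CREATE TABLE customer
  (customer_id  INTEGER NOT NULL,
   last_name    VARCHAR(50) NOT NULL)
PRIMARY INDEX (customer_id)
PARTITION BY RANGE_N(last_name BETWEEN 'A' AND 'G',
                                       'H' AND 'M',
                                       'N' AND 'S',
                                       'T' AND 'ZZ',
                     NO RANGE);
```

Before 13.10, achieving this required mapping character values to integers in the partitioning expression; CPPI lets the character column be used directly.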
Timestamp Partitioning
• Description
> Provide the capability that allows users to explicitly specify a time zone for PPI tables involving DateTime partitioning expressions, in order to make the expressions deterministic (i.e., not dependent on the session time zone).
> Implement the enhancements that extend PPI partition elimination to include timestamp data types in partitioning expressions.
• Benefit
> Ensuring that DateTime partitioning expressions are deterministic eliminates the possibility of errors that may occur as a result of incorrect dependence on session time zones.
> Extending this capability to timestamp data types as a partitioning mechanism allows for more partitioning options and in turn yields a query performance advantage similar to what current PPI technology gleans today.
• Considerations
> Enhancements related to deterministic time zone handling are applied to sparse join index search conditions as well.
Fastpath Functions
• Description
> The Fastpath Function project combines the extensibility, short development cycles, and ease-of-use aspects of UDFs with the high performance of Teradata system functions, yielding an alternate development path by which Teradata Engineering software developers may add new system functions to the Teradata server.
• Benefit
> The Fastpath Function project will allow Teradata to use a shorter development cycle to fulfill many customer-specific requests for new system functions that perform in the same manner as native Teradata system functions.
• Considerations
> Source code and/or libraries used in the development of Teradata system functions must be solely managed and maintained by Teradata Engineering. End users will not be able to develop Fastpath system functions.
FastExport – Without Spooling
• Description
> Enhance the FastExport utility with an option to execute in a mode that eliminates the requirement that the query data be spooled prior to the actual export.
• Benefit
> The "direct without spooling" method provides a mechanism to extract data from a Teradata table quickly and efficiently, with the main benefit realized as a performance gain and minimal resource utilization.
• Considerations
> The "direct without spooling" method is not transparent to the user and must be specified as a discrete option when executing the FastExport utility. It is a user decision to choose between the "spool" and "no spool" methods.
Teradata Workload Management
TASM: Additional Workload Definitions
• Description
> Feature increases the number of available TASM Workload Definitions (WDs) to 250 (instead of 40).
• Benefits
> Complex mixed workloads require a finer degree of granular control over the parts of the workload. Increasing the number of WDs will allow customers to better manage and report on resource usage of their system to meet either subject-area (e.g. by country, application or division) resource distribution requirements, or category-of-work (e.g. high vs. low priority) resource distribution requirements.
• Considerations
> Administrators should be aware that when defining a large number of workloads that run concurrently, it becomes difficult to create significant differentiation among the workloads as the resource division granularity itself gets very small.
TASM: Common Classifications
• Description
> This feature makes Workload Definition classification criteria available to Teradata Workload Management Categories 1, 2 and 3 (Filters, System Throttles and Workload Definitions) and additionally extends wildcard support to Filters and Throttles.
• Benefit
> The implementation of Common Classifications addresses the differences and delivers consistency between the TDWM categories (Filters, System Throttles and Workload Definitions), which improves the Teradata Workload Management user interface and its usability.
• Considerations
> Consideration should be given to re-evaluating the current settings for the different categories, insofar as common classification extends the ability to manage a workload in an easier and simpler fashion.
TASM: Common Classifications
• "Who" Criteria
> Account String / Account Name
> Teradata Username / Teradata Profile
> Application Name
> Client Address or Client Name
> QueryBand
• "Where" Criteria (Data Objects)
> Databases
> Tables / Views / Macros
> Stored Procedures
• "What" Criteria
> Statement Type (SELECT, DDL, DML)
> Utility Type
> AMP Limits, Row Count, Final Row Count
> Estimated Processing (CPU time)
> Join Types
– ALL or no joins
– ALL or no product joins
– ALL or no unconstrained product joins
TASM Utility Management
• Description
> This feature augments the existing Teradata Utility Management capability to provide controls similar to the workload management of regular SQL requests, and to provide for automatic selection of the number of sessions used by Teradata utilities.
• Benefits
> Feature provides more granular and centralized control of utility execution and allows deployment to a much wider audience of users and applications. Additionally, the use of Teradata utility sessions is moved inside the database and automated, eliminating the detailed management of sessions in each job.
• Considerations
> Consideration should be given to a reevaluation of current rule sets and settings to maximize control of the workload and relative utility execution.
> Throttling in TASM eliminates the need for Tenacity and Sleep. Execution of queued jobs becomes FIFO, and queued jobs run immediately when resources become available rather than at the end of the Sleep time.
TASM Utility Session Configuration Rules
• For the FastLoad, MultiLoad, and FastExport utilities, the DBS default for the number of AMP sessions is one per AMP.
• On a large system with hundreds or thousands of AMPs, this default becomes inappropriate.
• Currently, a user can override this default by changing the individual load/export script, changing the MAXSESS parameter in the configuration file, or specifying runtime parameters (i.e., MAXSESS or -M).
• These overriding methods are inconvenient.
• This feature allows a DBA to define TDWM rules in one central place that specify the number of AMP sessions to be used, based on a combination of the following criteria:
> Utility Name
> "Who" criteria (user, account, client address, query band, etc.)
> Data size
TASM Utility Session Configuration Rules
• Session configuration rules are optional.
• These rules are active when any category of TDWM is enabled.
• In each session configuration rule, the DBA specifies the criteria and the number of sessions to be used when those criteria are met.
• For example, for stand-alone MultiLoad jobs submitted by user Charucki, use 10 sessions.
• Session configuration rules also support the Archive/Restore utility.
• The DBA can define similar rules to specify the number of HUTPARSE sessions to be used for a specific set of criteria.
• A new internal DBSControl field, DisableTDWMSessionRules, is provided to disable user-defined session configuration rules and default session rules while TDWM is enabled.
• When this field is set, the client and DBS will operate as in Teradata 13.
Availability, Serviceability, DBA Tasks Improvements
Fault Isolation
• Description
> Remove cases where faults can cause restarts
> Specific cases
– EVL fault isolation
– Unprotected UDFs
– Dictionary cache re-initialization
• Benefits
> Identify and isolate the fault to only the query or session
> Issues in query calculation and qualification will be isolated
> Badly behaving UDFs will have less opportunity to affect the system
> Faults in the dictionary cache will result in the dictionary cache being flushed and reloaded rather than affecting the entire system
AMP Fault Isolation
• Description
> This feature is intended to catch those AMP errors that currently cause DBS restarts, where the error can instead be dealt with by taking a snapshot dump and aborting the transaction that caused the error
• Benefit
> This feature can reduce the number of DBS restarts for customers, thus improving overall system availability
• What will it do?
> Current AMP fault isolation only avoids a full database restart for errors when accessing spool tables
> The scope of fault isolation will be increased to cover ERRAMP* or ERRFIL* errors on permanent tables as well as spools
> Retrofitted to currently supported releases
Read From Fallback
• Description
> In the event of a data block read error (either an unreadable or a corrupt data block), this feature leverages the pre-existing Fallback table facility to transparently retrieve the required data block from the fallback copy.
• Benefit
> When fallback is available, this feature seriously improves fault tolerance and system availability. It significantly improves the value of having fallback and protects non-redundant (RAID 0 or JBOD) storage media, such as SSD, from data loss without restart/failover.
• Considerations
> Fallback does not need to be instantiated as a system-wide property; because fallback is a table-level attribute, it can be applied selectively to the largest/most critical customer tables.
> This facility does not in and of itself repair bad data blocks, but allows them to be read from fallback until they can be repaired.
Read From Fallback - Particulars
• Reading data blocks from the fallback copy is transparent to both users and applications. No manual intervention is required.
• Feature does not require any special or particular locking mechanism.
• A manual process is still required to rebuild the table to repair unreadable or corrupt data blocks.
• Facility cannot recover from data block errors in the Cylinder Index, NUSI secondary index subtables or Permanent Journals.
• Read errors are fallback recoverable on Teradata Data Dictionary tables, with the exception of the unhashed system tables such as the WAL log, Transient Journal and Space Accounting tables.
• Facility applies to SQL queries with data block read errors, SQL INSERT…SELECT statements, and the Archive utility where the block read error is on the source table only.
Transparent Cylinder Packing
• Description
> Develop a new file system background task that pro-actively and transparently monitors the utilization (high or low) of user data cylinders and packs/unpacks those cylinders accordingly, with the goal of returning them to a more efficiently utilized state.
• Benefit
1. Cylinder packing results in cylinders having a higher data block to cylinder index ratio, making Cylinder Read operations more effective by reading fewer unoccupied sectors.
2. Higher cylinder utilization translates into data tables occupying fewer cylinders, leaving more cylinders available for other purposes.
3. Diminishes the chances that a "mini-cylpack" operation will be executed and lessens the need for administrators to perform regularly scheduled Packdisk operations.
• Considerations
> This feature will have several customer-tunable parameters in DBSControl that allow customers to manage and adjust the level of impact of the Transparent Cylinder Packing operations.
Merge Data Blocks During Full Table Modify Operations
• Description
> During full table modification operations such as MultiLoad, INSERT…SELECT, and UPDATE or DELETE … WHERE, combine adjacent blocks when small blocks are present.
• Benefit
> Small data blocks increase the I/Os necessary to read a table and interfere with features such as compression and large cylinders.
> Reduces the instances of small data blocks by combining them when doing work on those blocks or adjacent ones.
Archive DBQL Rule Table
• Description
> Enhance the Teradata Archive utility to include two additional DBC tables in the DBC database (Dictionary) backup/restore:
– DBC.DBQLRuleTbl
– DBC.DBQLRuleCountTbl
• Benefit
> Inclusion of the additional DBC tables in the DBC Archive/Restore process provides a mechanism by which these tables can be archived/restored, and altogether eliminates the cumbersome task of having to redefine the appropriate DBQL rules after every Dictionary initialization.
> Implementation of this feature avoids the possibility of any table synchronicity issues and offers simplicity, convenience, and integrity when conducting a DBC archive/restore.
• Considerations
> DBC Archive will include these tables automatically in the Dictionary archive; no user intervention is required.
Be Aware: Especially if Considering Tech Refresh
Large Cylinder Support
• Description
> This feature increases the data storage cylinder size, the basic allocation unit for disk space in the Teradata file system. It also includes an increase in the Cylinder Index size, allowing a commensurate increase in the number of data blocks stored per cylinder.
• Benefit
> Eliminates the inefficiency associated with managing a large number of small cylinders on very large disk drives, allows larger AMP sizes (~10 TB per AMP), permits more efficient storage of Large Objects, and provides the foundation for block level compression by allowing more small blocks on a cylinder.
• Considerations
> This capability is only available starting in Teradata 13.10 and requires a System Initialization (SysInit) to be performed so that large cylinder support can be engaged. It is anticipated that this activity would typically be performed during technology refresh opportunities.
Packed Row Format for 64-bit Platforms
• Description
> With the introduction of Teradata 13.10, data will now be stored on the database in byte-packed format, whereas previously the data had been stored in byte-aligned format.
• Benefits
> Translates directly into a 4-7% disk space savings, insofar as less disk space is required to store byte-packed data than byte-aligned data. Additionally, enables data rows to be accessed using fewer I/Os, potentially enhancing the performance of some workloads.
• Considerations
> This capability is only available starting in Teradata 13.10 and requires a System Initialization (SysInit) to be performed so that packed row format support can be engaged. It is anticipated that this activity would typically be performed during technology refresh opportunities.
Enhanced Teradata Hashing Algorithm
• Description
> Enhance the Teradata hashing algorithm to reduce the effects of irregularities in character data on hash results.
• Benefit
> This enhancement is targeted to reduce the number of hash collisions for character data stored as either Latin or Unicode, notably strings that contain primarily numeric data. Reduction in hash collisions reduces access time per AMP and produces a more balanced row distribution, which in turn improves parallelism. Reduced access time and increased parallelism translate directly to better performance.
• Considerations
> This capability is only available starting in Teradata 13.10 and requires a System Initialization (SysInit) to be performed so that the enhanced hashing algorithm can be engaged. It is anticipated that this activity would typically be performed during technology refresh opportunities.
Teradata Database 13.10
Quality / Support-ability
• AMP fault isolation
• Parser diagnostic information capture
• Dictionary cache re-initialization
• EVL fault isolation and unprotected UDFs
Performance
• FastExport without spooling
• Character-based PPI
• Timestamp partition elimination
• User Defined Ordered Analytics
• Merge data blocks during full table modify operations
• Statement independence
• TVS Initial suggested temperature tables
Active Enable
• Restart time reduction
• Read from Fallback
• TASM: Workload Designer
• TASM: Utilities Management
• TASM: Additional Workload Definitions
Ease of Use
• Teradata 13.10 Teradata Express Edition
• Domain Specific System Functions
• Moving current date in PPI
• Automatic cylinder packing
Enterprise Fit
• Algorithmic Compression for Character Data
• VLC for VARCHAR columns
• Block level compression
• Variable fetch size (JDBC)
• User Defined SQL Operators
• Temporal Processing
• Temporal table support
• Period data type enhancements
• Replication support
• Time series Expansion support
• Archive DBQL rule table
• Enhanced trusted session security
• External Directory support enhancements
• Geospatial enhancements
• Statement Info Parcel Enhancements (JDBC)
• Support for IPv6
• Support unaligned row format for 64-bit platforms
• Enhanced hashing algorithm
• Large cylinder support
3/18/10
Teradata Developer Exchange
http://developer.teradata.com/
• What is it?
> Portal for technical insights
– Articles, blogs, podcasts
– Forums, FAQs, "How to", etc.
> Community of Teradata experts
– Customers, Teradata R&D and PS
> Share software
– Portlets, UDFs, SPs, scripts, etc.
– Sample applications
• Who can use it?
> Anyone (read only)
> Registered contributors
– Blogs, code, ratings, articles, etc.