Teradata SQL


Teradata RDBMS Architecture

The objective of this training is to understand Teradata SQL features. This module is structured as follows:

- Teradata Objects
- Teradata SQL
- Indexes
- ANSI vs. Teradata mode
- Data Types
- DDL
- DML
- SELECT statement
- HELP, SHOW, EXPLAIN
- Data Conversions
- Aggregation
- Subquery Processing
- Join Processing
- Date and Time Processing
- Character String Processing
- OLAP Functions
- SET Operators
- Data Manipulation
- Data Interrogation
- View Processing
- Macro Processing
- Reporting Totals and Subtotals
- Data Definition Language
- Temporary Tables
- Trigger Processing

Teradata Objects

There are five fundamental objects which may be found in a Teradata database.

Tables - rows and columns of data

Views - predefined subsets of existing tables

Macros - predefined, stored SQL statements

Triggers - SQL statements associated with a table

Stored Procedures - programs stored within Teradata

These objects are created, maintained and deleted using Structured Query Language (SQL). Object definitions are stored in the Data Dictionary/Directory (DD/D).

Figure: The DD/D holds the definitions of all database objects. Each database or user can contain tables, views, macros, triggers, and stored procedures.

Databases

A Teradata database is a defined logical repository for tables, views, macros, triggers, and stored procedures.

A database is empty until objects are created within it.

Teradata has the concept of parent and child databases.

A database has one and only one creator. The owner can be different from the creator if the database is ‘given’ to another user.

Users

• A Teradata user is a database with an assigned password.

• A user may log on to Teradata and access objects within itself and within other databases for which it has access rights.

• A user is an active repository, while a database is a passive repository.

• A user is empty until objects are created within it.
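As an illustration of the database/user distinction above, the statements below sketch how an administrator might create a database and a user and then transfer ownership with GIVE. The owner Sysadmin, the names Sales_DB, Dave_Jones, and HR_Admin, the password, and the space sizes are all hypothetical; the sketch assumes Sysadmin has enough permanent space to allocate.

```sql
-- Hypothetical names and sizes throughout.
-- A database: a passive repository that cannot log on.
CREATE DATABASE Sales_DB FROM Sysadmin
AS PERMANENT = 10000000 BYTES;

-- A user: a database with a password, so it can log on.
CREATE USER Dave_Jones FROM Sysadmin
AS PERMANENT = 5000000 BYTES,
   PASSWORD = Temp_Pw_123;

-- Transfer ownership of Sales_DB. The recorded creator does not change.
GIVE Sales_DB TO HR_Admin;
```

After the GIVE, HR_Admin becomes the owner (parent) of Sales_DB, while the creator remains the user who ran CREATE DATABASE.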

The Data Dictionary Directory (DD/D)

The DD/D

- is an integrated set of system tables

- contains definitions of and information about all objects in the system

- is entirely maintained by the RDBMS

- is “data about the data” or “metadata”

- is distributed across all AMPs like all tables

- may be queried by administrators or support staff

- is accessed via Teradata supplied views

Examples of views:

DBC.Tables - info about all tables

DBC.Users - info about all users

DBC.AllRights - info about access rights

DBC.AllSpace - info about space utilization
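A short sketch of querying two of these views, assuming the CUSTOMER_SERVICE database and the user Dave_Jones used elsewhere in this module exist and that the session is allowed to read the DBC views:

```sql
-- Objects in one database; TableKind is T for tables, V for views, M for macros.
SELECT TableName, TableKind
FROM DBC.Tables
WHERE DatabaseName = 'customer_service'
ORDER BY TableName;

-- Access rights held by one user, object by object.
SELECT DatabaseName, TableName, AccessRight, GrantorName
FROM DBC.AllRights
WHERE UserName = 'Dave_Jones'
ORDER BY DatabaseName, TableName;
```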

Structured Query Language (SQL)

SQL is a query language for relational database systems. It is:

- A fourth-generation language
- A set-oriented language
- A non-procedural language (e.g., it doesn't have IF, GO TO, DO, FOR NEXT, or PERFORM statements)

SQL consists of:

Data Definition Language (DDL) - Defines database structures (tables, users, views, macros, and triggers)

CREATE, DROP, ALTER

Data Manipulation Language (DML) - Manipulates rows and data values

SELECT, INSERT, UPDATE, DELETE

Data Control Language (DCL) - Grants and revokes access rights

GRANT, REVOKE

Teradata SQL also includes Teradata extensions to SQL: HELP, SHOW, EXPLAIN, CREATE MACRO, REPLACE MACRO
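One statement from each category, as a minimal sketch. The table Pay_Demo, its columns, the user Dave_Jones, and the sample values are assumptions for illustration only.

```sql
-- DDL: define a structure (hypothetical table).
CREATE TABLE Pay_Demo
( Employee_No INTEGER NOT NULL
, Salary      DECIMAL(10,2))
UNIQUE PRIMARY INDEX (Employee_No);

-- DML: manipulate rows.
INSERT INTO Pay_Demo VALUES (1006, 50000.00);
UPDATE Pay_Demo SET Salary = Salary * 1.05 WHERE Employee_No = 1006;
SELECT Employee_No, Salary FROM Pay_Demo;

-- DCL: grant and revoke access rights.
GRANT SELECT ON Pay_Demo TO Dave_Jones;
REVOKE SELECT ON Pay_Demo FROM Dave_Jones;

-- Teradata extensions.
HELP TABLE Pay_Demo;
SHOW TABLE Pay_Demo;
EXPLAIN SELECT * FROM Pay_Demo WHERE Employee_No = 1006;
```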

Indexes

An index is a mechanism that the SQL query optimizer can use to make table access more efficient. Teradata provides four different index types.

Primary index: All Teradata tables require a primary index because the system distributes table rows to the AMPs based on their primary index values. Primary index types are:

• Unique primary index (UPI)
• Nonunique primary index (NUPI)
• Nonpartitioned primary index (NPPI)
• Partitioned primary index (PPI)

Secondary index:

• Unique secondary index (USI)
• Nonunique secondary index (NUSI)

Join index:

• Multitable join index
• Single-table join index

Hash index

Primary Index

Controls data distribution and retrieval using the Teradata hashing algorithm.

Defined with the CREATE TABLE data definition statement.

If no explicit primary index is defined, then CREATE TABLE assigns one automatically.

Can be unique or non-unique and partitioned or non-partitioned.

If the primary index is not defined explicitly as unique, then the definition defaults to non-unique.

Can be composed of as many as 64 columns.

Can be generated automatically if defined on an identity column.

Exactly one primary index must be defined per table.

Improves performance when used correctly in the WHERE clause of an SQL data manipulation statement to perform the following actions:

• Single-AMP retrievals
• Joins between tables with identical primary indexes, the optimal scenario
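A sketch of the primary index choices above, using two hypothetical tables (Store and Sales_Txn) and an assumed one-year date range for the partitioning:

```sql
-- UPI: each Store_Id value identifies exactly one row.
CREATE TABLE Store
( Store_Id   INTEGER NOT NULL
, Store_Name VARCHAR(30))
UNIQUE PRIMARY INDEX (Store_Id);

-- NUPI plus partitioning (PPI): rows with the same Store_Id hash to the
-- same AMP and, on that AMP, are grouped into monthly partitions.
-- Dates outside 2023 would be rejected unless a NO RANGE partition is added.
CREATE TABLE Sales_Txn
( Txn_Id   INTEGER NOT NULL
, Store_Id INTEGER NOT NULL
, Txn_Date DATE NOT NULL
, Amount   DECIMAL(10,2))
PRIMARY INDEX (Store_Id)
PARTITION BY RANGE_N (Txn_Date BETWEEN DATE '2023-01-01'
                               AND     DATE '2023-12-31'
                               EACH INTERVAL '1' MONTH);

-- Equality on the full primary index: a single-AMP retrieval.
SELECT Store_Name FROM Store WHERE Store_Id = 42;

-- Both tables hash on Store_Id, so matching rows already share an AMP.
SELECT s.Store_Name, SUM(t.Amount) AS Total_Amount
FROM Store AS s
INNER JOIN Sales_Txn AS t
  ON s.Store_Id = t.Store_Id
GROUP BY s.Store_Name;
```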

Secondary Index

Can enhance the speed of data retrieval.

Can be Unique (USI) or non-unique (NUSI).

NUSIs can be hash-ordered or value-ordered.

Do not affect base table data distribution.

Maximum of 32 secondary and join indexes defined per table.

Can be composed of as many as 64 concatenated columns.

Can be created or dropped dynamically as data usage changes or if they are found not to be useful for optimizing data retrieval performance.

Require additional disk space to store subtables.

Require additional I/Os on INSERTs, DELETEs, and possibly on UPDATEs.

Should not be defined on columns whose values change frequently.

A composite secondary index is useful if it reduces the number of rows that must be accessed.
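A sketch of creating and dropping secondary indexes on the employee table shown later by SHOW TABLE; the index name Emp_Last_NUSI is hypothetical. A value-ordered NUSI is shown on hire_date because ordering by value suits range conditions.

```sql
-- Named NUSI on a column often used in equality conditions.
CREATE INDEX Emp_Last_NUSI (last_name) ON employee;

-- Value-ordered NUSI: the index subtable is sorted by hire_date values.
CREATE INDEX (hire_date) ORDER BY VALUES (hire_date) ON employee;

-- Composite NUSI: worthwhile only if the column combination is selective.
CREATE INDEX (last_name, first_name) ON employee;

-- A range query that the value-ordered NUSI may help.
SELECT last_name, first_name
FROM employee
WHERE hire_date BETWEEN DATE '1986-10-01' AND DATE '1986-10-31';

-- Drop an index when it no longer pays for its maintenance cost.
DROP INDEX Emp_Last_NUSI ON employee;
```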

Join Index

Join indexes are file structures designed to permit queries (join queries in the case of multitable join indexes) to be resolved by accessing the index instead of having to access and join their underlying base tables. A join index can:

- Join multiple tables (optionally with aggregation) in a prejoin table.
- Replicate all, or a vertical subset, of a single base table and distribute its rows using a different primary index than the base table, such as a foreign key column, to facilitate joins of very large tables by hashing them to the same AMP.
- Aggregate one or more columns of a single table as a summary table.

Join indexes are useful for:

- queries where the index table contains all the columns referenced by one or more joins, thereby allowing the Optimizer to cover all or part of the query by planning to access the index rather than its underlying base tables.
- queries that aggregate columns from tables with large cardinalities.
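A sketch of a multitable and a single-table join index over the employee and department tables used in this module. The index names are hypothetical, and department.department_name is an assumed column name for the DEPARTMENT NAME column shown later.

```sql
-- Multitable join index: employees prejoined to their departments.
CREATE JOIN INDEX Emp_Dept_JI AS
SELECT e.employee_number, e.last_name, d.department_number, d.department_name
FROM employee AS e
INNER JOIN department AS d
  ON e.department_number = d.department_number
PRIMARY INDEX (employee_number);

-- Single-table join index: a vertical subset of employee redistributed
-- by department_number instead of the base table's primary index.
CREATE JOIN INDEX Emp_By_Dept_JI AS
SELECT department_number, employee_number, last_name
FROM employee
PRIMARY INDEX (department_number);

-- EXPLAIN shows whether the Optimizer covers this query with Emp_Dept_JI.
EXPLAIN
SELECT e.last_name, d.department_name
FROM employee AS e
INNER JOIN department AS d
  ON e.department_number = d.department_number;
```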

Hash Index

Hash indexes are file structures that share properties with both single-table join indexes and secondary indexes.

Hash indexes are not indexes in the usual sense of the word. They are base tables that cannot be accessed directly by a query.

A hash index always has at least one of the following functions:

- It replicates all, or a vertical subset, of a single base table and distributes its rows with a user-specified partition key column set, such as a foreign key column, to facilitate joins of very large tables by hashing them to the same AMP.
- It provides an access path to base table rows to complete partial covers.

Hash indexes are useful for queries where the index table contains the columns referenced by a query, thereby allowing the Optimizer to cover it by planning to access the index rather than its underlying base table.
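A minimal sketch of a hash index on the employee table, distributed by department_number so it can sit on the same AMPs as department rows. The index name is hypothetical, and the exact rules for the BY and ORDER BY clauses may vary by release, so check the DDL reference for your version.

```sql
-- Columns carried in the hash index (a vertical subset of employee).
CREATE HASH INDEX Emp_Dept_HI
  (department_number, employee_number, last_name)
ON employee
-- Distribute index rows by department_number (the partition key column set).
BY (department_number)
-- Store rows on each AMP in hash order of department_number.
ORDER BY HASH (department_number);
```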

ANSI Vs TERADATA MODE

Teradata RDBMS has the ability to execute all SQL in either Teradata mode or in ANSI mode.

Teradata mode:

By default, every SQL request is implicitly a complete transaction: once a change is made, it is committed and becomes permanent. Each request carries an implied COMMIT, and several requests can be grouped into one explicit transaction with BEGIN TRANSACTION (BT) and END TRANSACTION (ET).

ANSI mode:

All SQL commands are considered to be part of the same logical transaction. A transaction is not complete until an explicit COMMIT is executed.
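A sketch contrasting the two modes. It assumes a hypothetical Pay_Table with Employee_No and Salary columns; in BTEQ the mode is chosen before logon with .SET SESSION TRANSACTION BTET (Teradata mode) or .SET SESSION TRANSACTION ANSI.

```sql
-- Teradata (BTET) mode: without BT/ET each UPDATE would commit on its own.
-- BT ... ET makes both updates one all-or-nothing transaction.
BT;
UPDATE Pay_Table SET Salary = Salary * 1.05 WHERE Employee_No = 1006;
UPDATE Pay_Table SET Salary = Salary * 1.05 WHERE Employee_No = 1008;
ET;

-- ANSI mode: changes accumulate until COMMIT (or are undone by ROLLBACK).
UPDATE Pay_Table SET Salary = Salary * 1.05 WHERE Employee_No = 1006;
UPDATE Pay_Table SET Salary = Salary * 1.05 WHERE Employee_No = 1008;
COMMIT;
```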

Data Types

Conforming to ANSI

INTEGER

SMALLINT

DECIMAL(X,Y)

NUMERIC(X,Y)

FLOAT

REAL


DOUBLE PRECISION

CHARACTER(X), CHAR(X)

VARCHAR(X)

CHARACTER VARYING(X)

CHAR VARYING(X)

DATE

TIME

TIMESTAMP

Specific to Teradata

BYTEINT

BYTE(X)

VARBYTE(X)

LONG VARCHAR

GRAPHIC(X)

VARGRAPHIC(X)

Teradata thus supports two categories of data types: those conforming to ANSI and those specific to Teradata.

Data Definition Language (DDL)

CREATE TABLE employee, FALLBACK,
  NO BEFORE JOURNAL,
  NO AFTER JOURNAL,
  FREESPACE = 30,
  DATABLOCKSIZE = 10000 BYTES
( employee_number INTEGER NOT NULL,
  dept_number SMALLINT,
  job_code INTEGER COMPRESS,
  last_name CHAR(20) NOT NULL,
  first_name VARCHAR(20),
  street_address VARCHAR(30) TITLE 'Address',
  city CHAR(15) DEFAULT 'Atlanta' COMPRESS 'Atlanta',
  state CHAR(2) WITH DEFAULT,
  birthdate DATE FORMAT 'MM/DD/YYYY',
  salary_amount DECIMAL(10,2),
  sex CHAR(1) UPPERCASE)
UNIQUE PRIMARY INDEX (employee_number),
INDEX (dept_number);

CREATE INDEX (job_code) ON employee;

DROP INDEX (job_code) ON employee;

DROP TABLE employee;

Data Manipulation Language (DML)

The SELECT statement is used to retrieve data from tables.

Who was hired on October 15, 1986?

EMPLOYEE (partial listing)

EMPLOYEE NUMBER (PK) | MANAGER EMPLOYEE NUMBER (FK) | DEPT NUMBER (FK) | JOB CODE (FK) | LAST NAME | FIRST NAME | HIRE DATE | BIRTH DATE | SALARY AMOUNT
1006 | 1019 | 301 | 312101 | Stein | John | 861015 | 631015 | 3945000
1008 | 1019 | 301 | 312102 | Kanieski | Carol | 870201 | 680517 | 3925000
1005 | 0801 | 403 | 431100 | Ryan | Loretta | 861015 | 650910 | 4120000
1004 | 1003 | 401 | 412101 | Johnson | Darlene | 861015 | 560423 | 4630000
1007 | 1005 | 403 | 432101 | Villegas | Arnando | 870102 | 470131 | 5970000
1003 | 0801 | 401 | 411100 | Trader | James | 860731 | 570619 | 4785000

Answer

LAST NAME | FIRST NAME
Stein | John
Ryan | Loretta
Johnson | Darlene

SELECT Last_Name ,First_Name
FROM Employee
WHERE Hire_Date = 861015;

SELECT statement

Basic SELECT command:

SELECT * FROM Student_Table ;

Compound Comparisons:

SELECT * FROM Student_Table WHERE Grade_Pt = 3.0 OR Grade_Pt = 4.0 AND Class_Code = 'FR' ;
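AND is evaluated before OR, so the compound comparison above returns every student with a 3.0 plus only those 4.0 students who are freshmen. This sketch makes both possible readings explicit with parentheses, using the same Student_Table:

```sql
-- What the query above actually means (AND binds tighter than OR).
SELECT *
FROM Student_Table
WHERE Grade_Pt = 3.0
   OR (Grade_Pt = 4.0 AND Class_Code = 'FR');

-- Freshmen with either a 3.0 or a 4.0: group the OR explicitly.
SELECT *
FROM Student_Table
WHERE (Grade_Pt = 3.0 OR Grade_Pt = 4.0)
  AND Class_Code = 'FR';
```

Parenthesizing mixed AND/OR conditions is cheap and removes any doubt about intent.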

Using NOT in SQL comparisons:

SELECT Last_Name ,First_Name ,Class_Code ,Grade_Pt FROM Student_Table WHERE NOT ( Grade_Pt >= 3.0 AND Grade_Pt IS NOT NULL AND Class_Code <> 'SR' AND Class_Code IS NOT NULL ) ;

Multiple value search (IN):

SELECT Last_Name ,Class_Code ,Grade_Pt FROM Student_Table WHERE Grade_Pt IN ( 2.0, 3.0, 4.0 ) ;

Using a quantifier instead of IN (= ANY is equivalent to IN):

SELECT Last_Name ,Class_Code ,Grade_Pt FROM Student_Table WHERE Grade_Pt = ANY ( 2.0, 3.0, 4.0 ) ;


Multiple value range search (BETWEEN, inclusive of both end values):

SELECT Grade_Pt FROM Student_Table WHERE Grade_Pt BETWEEN 2.0 AND 4.0 ;

Character string search (LIKE), here for last names whose second letter is 'a':

SELECT * FROM Student_Table WHERE Last_Name LIKE ('_a%') ;

Derived columns:

SELECT salary (FORMAT 'ZZZ,ZZ9.99') ,salary/12 (FORMAT 'Z,ZZ9.99') FROM Pay_Table ;

ORDER BY:

SELECT * FROM Student_Table WHERE Grade_Pt > 3 ORDER BY Grade_Pt DESC ;

DISTINCT:

SELECT DISTINCT Class_Code FROM Student_Table ORDER BY Class_Code ;


Creating a column alias name with AS:

SELECT salary AS annual_salary ,salary/12 AS Monthly_salary FROM Pay_Table ;

Creating a column alias name with NAMED (Teradata extension):

SELECT salary (NAMED Annual_salary) ,salary/12 (NAMED Monthly_salary) FROM Pay_Table ;

Naming Conventions

When creating an alias, only valid Teradata naming characters are allowed. The alias becomes the name of the column for the life of the SQL statement; the only difference from a real column name is that it is not stored in the Data Dictionary.

Breaking Conventions:

When it is necessary or desirable to use non-standard characters in a name, double quotes (") are placed around the name. This tells the PE that the word is not a reserved word and makes it a valid name. This is the only place where Teradata uses a double quote instead of a single quote (').

SELECT salary "Annual salary" ,salary/12 "Monthly_salary" FROM Pay_Table ORDER BY "Annual salary" ;

Databases and Users:

HELP DATABASE customer_service ;

HELP USER Dave_Jones ;

Tables, Views, and Macros:

HELP TABLE employee ;

HELP VIEW emp;

HELP MACRO payroll_3;

HELP COLUMN employee.*;

HELP COLUMN employee.last_name;

HELP COLUMN emp.* ;

HELP COLUMN emp.last;

HELP INDEX employee;

HELP STATISTICS employee;

HELP CONSTRAINT employee.over_21;

HELP Commands

HELP DATABASE customer_service;

*** Help information returned. 10 rows.

*** Total elapsed time was 1 second.

Table/View/Macro name Kind Comment

contact T ?

customer T ?

department T ?

employee T ?

employee_phone T ?

job T ?

location T ?

location_employee T ?

location_phone T ?

Example of HELP DATABASE

SHOW commands display how an object was created.

Command Returns

SHOW TABLE tablename; CREATE TABLE statement

SHOW VIEW viewname; CREATE VIEW statement

SHOW MACRO macroname; CREATE MACRO statement

SHOW TABLE employee;

CREATE SET TABLE CUSTOMER_SERVICE.employee ,FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL
     (
      employee_number INTEGER,
      manager_employee_number INTEGER,
      department_number INTEGER,
      job_code INTEGER,
      last_name CHAR(20) NOT CASESPECIFIC NOT NULL,
      first_name VARCHAR(30) NOT CASESPECIFIC NOT NULL,
      hire_date DATE NOT NULL,
      birthdate DATE NOT NULL,
      salary_amount DECIMAL(10,2) NOT NULL)
UNIQUE PRIMARY INDEX ( employee_number );

SHOW Command

The EXPLAIN Facility

The EXPLAIN modifier in front of any SQL statement generates an English translation of the Parser’s plan.

The request is fully parsed and optimized, but not executed.

EXPLAIN returns:

- Text showing how a statement will be processed (a plan)

- An estimate of how many rows will be involved

- A relative cost of the request (in units of time)

This information is useful for:

- predicting row counts

- predicting performance

- testing queries before production

- analyzing various approaches to a problem

EXPLAIN SELECT last_name, department_number FROM employee;

Explanation (partial):

3) We do an all-AMPs RETRIEVE step from CUSTOMER_SERVICE.employee by way of an all-rows scan with no residual conditions into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated to be 24 rows. The estimated time for this step is 0.15 seconds.

Data Conversion

CAST: Data can be converted from one type to another by using the CAST function.

SELECT CAST('ABCDE' AS CHAR(1)) AS Trunc ,CAST(128 AS CHAR(3)) AS OK ,CAST(127 AS INTEGER ) AS Bigger ,CAST(121.53 AS SMALLINT) AS Whole ,CAST(121.53 AS DECIMAL(3,0)) AS Rounder ;

Trunc | OK | Bigger | Whole | Rounder
A | 128 | 127 | 121 | 122

Note that the cast to SMALLINT drops the fraction (121), while the cast to DECIMAL(3,0) rounds it (122).

Implied CAST

Prior to CAST, conversion was requested by placing the "implied" data type conversion in parentheses after the column name.

SELECT 'ABCDE' (CHAR(1)) AS Shortened ,128 (CHAR(3)) AS OK ,-128 (CHAR(3)) AS N_OK ,128 (INTEGER) AS Bigger ,121.13 (SMALLINT) AS Whole ;

Shortened | OK | N_OK | Bigger | Whole
A | (blank) | - | 128 | 121

Subquery Processing

Using IN

SELECT Order_number ,Order_total FROM Order_Table WHERE Customer_number IN ( SELECT Customer_number FROM Customer_table WHERE Customer_name LIKE 'Bill%');

Using NOT IN

SELECT Customer_name ,Phone_number FROM Customer_Table WHERE Customer_number NOT IN ( SELECT Customer_number FROM Order_table) ;

Using ANY

SELECT Customer_name ,Phone_number FROM Customer_Table WHERE customer_number = ANY (SELECT customer_number FROM Order_Table WHERE Order_total > ( SELECT AVG(Order_total) FROM Order_Table ) );

Using EXISTS

SELECT Customer_name FROM Customer_table AS CUST WHERE EXISTS ( SELECT * FROM Order_table AS OT WHERE CUST.Customer_number = OT.Customer_number ) ;
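One caveat about the NOT IN form: if the subquery returns even one NULL, "x NOT IN (..., NULL)" can never be TRUE, so the query returns no rows at all. A sketch of two NULL-safe alternatives, using the same Customer_Table and Order_table:

```sql
-- NOT EXISTS is unaffected by NULL Customer_number values in Order_table.
SELECT Customer_name, Phone_number
FROM Customer_Table AS CUST
WHERE NOT EXISTS
  ( SELECT *
    FROM Order_table AS OT
    WHERE OT.Customer_number = CUST.Customer_number );

-- Or keep NOT IN but exclude NULLs from the subquery explicitly.
SELECT Customer_name, Phone_number
FROM Customer_Table
WHERE Customer_number NOT IN
  ( SELECT Customer_number
    FROM Order_table
    WHERE Customer_number IS NOT NULL );
```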

Join Processing

A join is the combination of two or more tables in the same FROM clause of a single SELECT statement.

The join types provided are:

Inner Join

Outer Join

Left Outer Join

Right Outer Join

Full Outer Join

Cross Join

Self Join
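A sketch of each join type against the employee and department tables used in this module. Column names follow the SHOW TABLE output shown later; department.department_name is an assumed name for the DEPARTMENT NAME column.

```sql
-- Inner join: only employees whose department_number matches a department.
SELECT e.last_name, d.department_name
FROM employee AS e
INNER JOIN department AS d
  ON e.department_number = d.department_number;

-- Left outer join: every employee; department_name is NULL when unmatched.
-- (RIGHT OUTER JOIN keeps every department instead.)
SELECT e.last_name, d.department_name
FROM employee AS e
LEFT OUTER JOIN department AS d
  ON e.department_number = d.department_number;

-- Full outer join: unmatched rows from both sides are kept.
SELECT e.last_name, d.department_name
FROM employee AS e
FULL OUTER JOIN department AS d
  ON e.department_number = d.department_number;

-- Cross join: every employee paired with every department.
SELECT e.last_name, d.department_name
FROM employee AS e
CROSS JOIN department AS d;

-- Self join: each employee with the manager's last name.
SELECT emp.last_name AS Employee_Name, mgr.last_name AS Manager_Name
FROM employee AS emp
INNER JOIN employee AS mgr
  ON emp.manager_employee_number = mgr.employee_number;
```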

Date and Time Processing

DATE, TIME and TIMESTAMP are valid Teradata data types.

The Teradata RDBMS stores the date on disk as an integer in YYYMMDD format, where YYY is the number of years since 1900. For January 1, 1999, Teradata stores 0990101 on the disk.

The following calculation demonstrates how Teradata converts a date to the YYYMMDD format for storage of January 1, 1999:

(1999 - 1900) * 10000 + (1 * 100) + 1 = 990000 + 100 + 1 = 990101

The stored data for the date January 1, 1999 is converted to: 990101


INTEGERDATE, in the form YY/MM/DD, is the default display format for most Teradata database client utilities.

The output date format can be changed with the DATEFORM setting at three levels:

System level definition:

MODIFY GENERAL 14 = 0 /* INTEGERDATE (YY/MM/DD) */
MODIFY GENERAL 14 = 1 /* ANSIDATE (YYYY-MM-DD) */

User level definition:

CREATE USER username ....... DATEFORM = {INTEGERDATE | ANSIDATE} ;

Session level declaration:

SET SESSION DATEFORM = {ANSIDATE | INTEGERDATE} ;

Because Teradata stores the date as an integer, it allows simple and complex mathematics to calculate new dates from existing dates; for example, adding an integer to a DATE moves it forward by that many days.

Other functions provided include ADD_MONTHS, EXTRACT, and OVERLAPS.
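A sketch of the storage formula and the date functions named above, using ANSI date literals so no table is needed:

```sql
-- Internal integer value: (1999 - 1900) * 10000 + 1 * 100 + 1
SELECT CAST(DATE '1999-01-01' AS INTEGER) AS Stored_Value;         -- 990101

SELECT DATE '1999-01-01' + 30                 AS Plus_30_Days      -- 1999-01-31
     , DATE '1999-03-01' - DATE '1999-01-01'  AS Days_Between      -- 59
     , ADD_MONTHS(DATE '1999-01-31', 1)       AS Plus_1_Month      -- 1999-02-28
     , EXTRACT(YEAR FROM DATE '1999-01-01')   AS Yr                -- 1999
     , CASE WHEN (DATE '1999-01-01', DATE '1999-03-31')
                 OVERLAPS (DATE '1999-03-01', DATE '1999-06-30')
            THEN 'Y' ELSE 'N' END             AS Overlap_Flag;     -- Y
```

ADD_MONTHS returns the last valid day of the target month when the starting day does not exist there, which is why January 31 plus one month gives February 28 in 1999.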

Character String Processing

Teradata provides character string processing functions like:

CHARACTERS - used to count the number of characters stored in a data column.

TRIM - used to eliminate space characters from fixed length data values.

SUBSTRING - used to retrieve a portion of the data stored in a column.

SUBSTR - the original Teradata substring operation.

POSITION - used to return a number that represents the starting location of a specified character string within character data.

INDEX - used to return a number that represents the starting position of a specified character string within character data.

ANSI mode is case sensitive and Teradata mode is not. Therefore, the output from most of the string processing functions will differ accordingly.
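A sketch applying each function above to string literals, so no table is needed; the expected values are shown as comments:

```sql
SELECT CHARACTERS('Teradata')              AS Char_Count   -- 8
     , TRIM(BOTH FROM '  SQL  ')           AS Trimmed      -- 'SQL'
     , SUBSTRING('Teradata' FROM 1 FOR 4)  AS Sub_ANSI     -- 'Tera'
     , SUBSTR('Teradata', 5, 4)            AS Sub_TD       -- 'data'
     , POSITION('data' IN 'Teradata')      AS Pos_ANSI     -- 5
     , INDEX('Teradata', 'data')           AS Pos_TD;      -- 5
```

SUBSTRING and POSITION are the ANSI forms; SUBSTR and INDEX are the original Teradata equivalents.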

OLAP Functions

Powerful OLAP (On-Line Analytical Processing) functions provide data mining capabilities to discover a wealth of knowledge from the data.

OLAP functions, combined with standard SQL within the data warehouse, provide the ability to analyze large amounts of historical business transactions from the past through the present.

Like traditional aggregates, OLAP functions operate on groups of rows and permit qualification and filtering of the group result.

Unlike aggregates, OLAP functions also return the individual row detail data and not just the final aggregated value.
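A sketch of that difference, using the Employee_table (Last_Name, Dept_no, Salary) from the reporting section: each detail row is returned together with its department total and its rank within the department, and QUALIFY filters on the OLAP result the way HAVING filters on an aggregate.

```sql
SELECT Last_Name
     , Dept_no
     , Salary
     -- Department total repeated on every detail row.
     , SUM(Salary) OVER (PARTITION BY Dept_no)                 AS Dept_Total
     -- 1 = highest salary in the department; ties share a rank.
     , RANK() OVER (PARTITION BY Dept_no ORDER BY Salary DESC) AS Dept_Rank
FROM Employee_table
-- Keep only the top three earners per department.
QUALIFY RANK() OVER (PARTITION BY Dept_no ORDER BY Salary DESC) <= 3;
```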

OLAP Functions provided by Teradata are:


SET Operators

The Teradata database provides the following SET operators:

INTERSECT - used to match or join the common domain values from two or more sets.

UNION - used to merge the rows from two or more sets. The join performed for a UNION is more similar to an OUTER JOIN.

EXCEPT - used to eliminate common domain values from the answer set by throwing away the matching values. This is the primary SET operator that provides a capability not available using either an INNER or OUTER JOIN.

MINUS - is exactly the same as EXCEPT. It was the original SET operator in Teradata before EXCEPT became the standard.
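A sketch of the four operators applied to customer numbers in the Customer_Table and Order_Table used in the subquery examples. Each operator removes duplicate rows from its result unless ALL is specified.

```sql
-- Customers who have at least one order (values common to both sets).
SELECT Customer_number FROM Customer_Table
INTERSECT
SELECT Customer_number FROM Order_Table;

-- Customers with no orders (values only in the first set).
SELECT Customer_number FROM Customer_Table
EXCEPT
SELECT Customer_number FROM Order_Table;

-- The same result with Teradata's original spelling.
SELECT Customer_number FROM Customer_Table
MINUS
SELECT Customer_number FROM Order_Table;

-- Every customer number appearing in either table.
SELECT Customer_number FROM Customer_Table
UNION
SELECT Customer_number FROM Order_Table;
```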

Data Interrogation Functions

Data Interrogation functions test the data values after a row passes the WHERE test and is read from the disk. These functions not only allow the data to be tested, but also allow additional logic to be incorporated into the SQL. Functions provided by Teradata are:

NULLIFZERO - compares the data value in a column with zero and, when it finds a zero, converts it to a NULL value for the life of the SQL statement.

NULLIF - is not limited to converting a zero to a NULL; it can convert any specified value to a NULL.

ZEROIFNULL - compares the data value in a column with NULL and, when the column contains a NULL, transforms it to a zero for the life of the SQL statement.

COALESCE - searches a value list, ranging from one to many values, and returns the first non-NULL value it finds. If all values in the list are NULL, it returns a NULL.

CASE - provides an additional test that allows for multiple comparisons on multiple columns with multiple outcomes. It also incorporates logic (the ELSE clause) to handle a situation in which none of the values compares equal.
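A sketch of each function against the Student_Table used earlier. The class codes 'SO' and 'JR' are assumed values alongside the 'FR' and 'SR' codes that appear in this module.

```sql
SELECT Last_Name
     , NULLIFZERO(Grade_Pt)             AS GPA_Or_Null      -- 0 becomes NULL
     , NULLIF(Class_Code, 'FR')         AS Non_Freshman     -- 'FR' becomes NULL
     , ZEROIFNULL(Grade_Pt)             AS GPA_Or_Zero      -- NULL becomes 0
     , COALESCE(Class_Code, 'UNKNOWN')  AS Class_Label      -- first non-NULL
     , CASE Class_Code
         WHEN 'FR' THEN 'Freshman'
         WHEN 'SO' THEN 'Sophomore'
         WHEN 'JR' THEN 'Junior'
         WHEN 'SR' THEN 'Senior'
         ELSE 'Not assigned'                                -- no match, or NULL
       END                              AS Class_Name
FROM Student_Table;
```

A common use of NULLIFZERO is guarding a divisor: dividing by NULLIFZERO(x) yields NULL instead of a division-by-zero error when x is 0.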

View Processing

Views are pre-defined subsets of existing tables consisting of specified columns and/or rows from the table(s).

A single table view:

- is a window into an underlying table

- allows users to read and update a subset of the underlying table

- has no data of its own

EMPLOYEE (Table): the base table, with the same rows as the partial listing shown under Data Manipulation Language.

Emp_403 (View)

EMP NO | DEPT NO | LAST NAME | FIRST NAME | HIRE DATE
1007 | 403 | Villegas | Arnando | 870102
1005 | 403 | Ryan | Loretta | 861015

Multi-Table Views

A multi-table view allows users to access data from multiple tables as if it were in a single table. Multi-table views are also called join views. Join views are used for reading only, not updating.

DEPARTMENT (Table)

DEPT NUMBER (PK) | DEPARTMENT NAME | BUDGET AMOUNT | MANAGER EMPLOYEE NUMBER (FK)
501 | marketing sales | 80050000 | 1017
301 | research and development | 46560000 | 1019
302 | product planning | 22600000 | 1016
403 | education | 93200000 | 1005
402 | software support | 30800000 | 1011
401 | customer support | 98230000 | 1003
201 | technical operations | 29380000 | 1025

EMPLOYEE (Table): the same rows as the partial listing shown under Data Manipulation Language.

EmpDept (View)

LAST NAME | DEPARTMENT NAME
Stein | research & development
Kanieski | research & development
Ryan | education
Johnson | customer support
Villegas | education
Trader | customer support
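A sketch of how the two views above could be defined. Column names follow the SHOW TABLE output for employee; department.department_name is an assumed name for the DEPARTMENT NAME column.

```sql
-- Single-table view: a window onto department 403 rows of employee.
CREATE VIEW Emp_403 AS
SELECT employee_number   AS Emp_No
     , department_number AS Dept_No
     , last_name
     , first_name
     , hire_date
FROM employee
WHERE department_number = 403;

-- Join view: last name with department name, for reading only.
CREATE VIEW EmpDept AS
SELECT e.last_name
     , d.department_name
FROM employee AS e
INNER JOIN department AS d
  ON e.department_number = d.department_number;

SELECT * FROM EmpDept ORDER BY last_name;
```

Neither view stores data; each SELECT against it is resolved against the base tables at run time. REPLACE VIEW changes a definition in place.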

MACRO Processing

Macros are SQL statements stored as an object in the Data Dictionary (DD).

Unlike a view, a macro can store one or multiple SQL statements.

Additionally, the SQL is not restricted to only SELECT operations. INSERT, UPDATE, and DELETE commands are valid within a macro.

When using BTEQ, conditional logic and BTEQ commands may also be incorporated into the macro.

You can only have one DDL statement within a macro.

If a macro contains DDL, it must be the last statement in the macro.

Macro commands:

CREATE MACRO - initially builds a new macro

REPLACE MACRO - used to modify an existing macro

EXECUTE MACRO - used to run a macro

DROP MACRO - deletes a macro from the DD.
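A sketch of the macro life cycle using the Employee_table (Last_Name, First_Name, Dept_no, Salary) from the reporting section. The macro names and the 2% raise are hypothetical; parameters are referenced inside the macro with a leading colon.

```sql
-- Parameterized single-statement macro.
CREATE MACRO Emp_By_Dept (dept INTEGER) AS
( SELECT Last_Name, First_Name, Salary
  FROM Employee_table
  WHERE Dept_no = :dept
  ORDER BY Last_Name; );

EXEC Emp_By_Dept (403);

-- Modify it in place.
REPLACE MACRO Emp_By_Dept (dept INTEGER) AS
( SELECT Last_Name, First_Name, Salary
  FROM Employee_table
  WHERE Dept_no = :dept
  ORDER BY Salary DESC; );

-- Multi-statement macro mixing UPDATE and SELECT.
CREATE MACRO Give_Raise (dept INTEGER) AS
( UPDATE Employee_table SET Salary = Salary * 1.02 WHERE Dept_no = :dept;
  SELECT Last_Name, Salary FROM Employee_table WHERE Dept_no = :dept; );

DROP MACRO Emp_By_Dept;
```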

Reporting TOTALS and SUBTOTALS

Teradata has the capability to generate totals and subtotals and, at the same time, display the detail data from the rows that go into creating the totals.

Totals (WITH):

SELECT Last_Name ,First_Name ,Dept_no ,Salary FROM Employee_table WITH SUM(Salary) ;

Subtotals (WITH ... BY):

SELECT Last_Name ,First_Name ,Dept_no ,Salary FROM Employee_table WITH SUM(Salary) (TITLE 'Departmental Salaries:') BY Dept_no ;

Temporary Tables

Why Temporary Tables?

You can usually use simpler SQL statements. The system doesn't have to do aggregation.

The system may access Accounts based on the Primary Index value, which results in a fast response.

Temporary Table types -

DERIVED TABLES: Tables that are created in spool and dropped when the query is completed.

VOLATILE TEMPORARY TABLES: Tables that exist only for the session that creates them; their definition is not stored in the Data Dictionary, and they do not survive a system restart.

GLOBAL TEMPORARY TABLES: Require a base definition, which is stored in the Data Dictionary (DD). A materialized instance remains until it is dropped or the session terminates.
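A sketch of the three kinds, each computing department salary totals from the Employee_table in the reporting section. The table names are hypothetical, and INTEGER/DECIMAL types are assumed for Dept_no and Salary.

```sql
-- Derived table: exists only in spool for this one query.
SELECT e.Last_Name, e.Salary, d.Avg_Sal
FROM Employee_table AS e
INNER JOIN ( SELECT Dept_no, AVG(Salary)
             FROM Employee_table
             GROUP BY Dept_no ) AS d (Dept_no, Avg_Sal)
  ON e.Dept_no = d.Dept_no
WHERE e.Salary > d.Avg_Sal;

-- Volatile table: no DD definition; gone when the session ends.
CREATE VOLATILE TABLE Dept_Sal_VT AS
( SELECT Dept_no, SUM(Salary) AS Total_Sal
  FROM Employee_table
  GROUP BY Dept_no )
WITH DATA
PRIMARY INDEX (Dept_no)
ON COMMIT PRESERVE ROWS;

-- Global temporary table: definition kept in the DD.
CREATE GLOBAL TEMPORARY TABLE Dept_Sal_GT
( Dept_no   INTEGER
, Total_Sal DECIMAL(12,2))
PRIMARY INDEX (Dept_no)
ON COMMIT PRESERVE ROWS;

-- The first INSERT materializes this session's private instance.
INSERT INTO Dept_Sal_GT
SELECT Dept_no, SUM(Salary) FROM Employee_table GROUP BY Dept_no;
```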


Trigger Processing

A trigger is an event-driven maintenance operation. The event is caused by a modification to one or more columns of a row in a table.

Triggering Statement

The user's initial SQL maintenance request that causes a row to change in a table and then causes a trigger to fire (execute).

It can be: INSERT, UPDATE, DELETE, INSERT/SELECT

It cannot be: SELECT

Triggered Statement

It is the SQL that is automatically executed as a result of a triggering statement.

It can be: INSERT, UPDATE, DELETE, INSERT/SELECT, ABORT/ROLLBACK, EXEC

It cannot be: BEGIN/END TRANSACTION, COMMIT, CHECKPOINT, SELECT
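A sketch of a row trigger that audits salary changes. The Salary_Audit table, the Employee_No column, and the trigger name are hypothetical; the UPDATE is the triggering statement and the INSERT is the triggered statement.

```sql
-- Hypothetical audit table written by the trigger.
CREATE TABLE Salary_Audit
( Employee_No INTEGER
, Old_Salary  DECIMAL(10,2)
, New_Salary  DECIMAL(10,2)
, Changed_At  TIMESTAMP(0));

CREATE TRIGGER Salary_Audit_Trg
AFTER UPDATE OF (Salary) ON Employee_table
REFERENCING OLD AS o NEW AS n
FOR EACH ROW
( INSERT INTO Salary_Audit
  VALUES (n.Employee_No, o.Salary, n.Salary, CURRENT_TIMESTAMP(0)); );

-- Triggering statement: fires the trigger once per updated row.
UPDATE Employee_table SET Salary = Salary * 1.05 WHERE Dept_no = 403;
```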

Stored Procedures

Teradata provides Stored Procedural Language (SPL) to create Stored Procedures.

These procedures allow the combination of both SQL and SPL control statements to manage the delivery and execution of the SQL.

The processing flow of a procedure is more like a program: it is a procedural set of commands, whereas SQL is a non-procedural language.

DDL is not allowed within a procedure.

Stored Procedure Commands:

CREATE PROCEDURE, REPLACE PROCEDURE, DROP PROCEDURE,

SHOW PROCEDURE, RENAME PROCEDURE.
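A sketch of a procedure that mixes SQL with SPL control statements (DECLARE, IF, SET), run with CALL. The procedure name, parameters, and message text are hypothetical; it assumes the Employee_table from the reporting section.

```sql
CREATE PROCEDURE Raise_Dept_Salary
( IN  p_dept INTEGER
, IN  p_pct  DECIMAL(5,2)
, OUT p_msg  VARCHAR(100))
BEGIN
  DECLARE v_count INTEGER;

  -- SQL statement; local variables and parameters take a colon here.
  SELECT COUNT(*) INTO :v_count
  FROM Employee_table
  WHERE Dept_no = :p_dept;

  -- SPL control logic that plain SQL lacks.
  IF v_count = 0 THEN
    SET p_msg = 'No employees found';
  ELSE
    UPDATE Employee_table
    SET Salary = Salary + Salary * :p_pct / 100
    WHERE Dept_no = :p_dept;
    SET p_msg = 'Updated ' || TRIM(CAST(v_count AS VARCHAR(11))) || ' employees';
  END IF;
END;

-- In BTEQ, the OUT parameter is passed by name.
CALL Raise_Dept_Salary (403, 5.00, p_msg);
```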
