Teradata Overview v 070503

Teradata Overview 7th May 2003


Page 1: Teradata Overview v 070503

Teradata Overview

7th May 2003

Page 2: Teradata Overview v 070503

Agenda

• Technical Summary of Teradata Database
• Our Development Priorities

Page 3: Teradata Overview v 070503

Teradata Database

• Is a Relational Database Management System
• Client Server architecture
• Support for open standards (ODBC, OLE-DB, ANSI)
• Support for emerging interoperability
• Built-in automatic parallel processing

– Enables SHARED NOTHING architecture
– Special purpose data load utilities
– Special purpose backup utilities

• Runs on Intel platforms
– NCR hardware (UNIX SVR4, W2K)
– and non-NCR hardware (W2K)
– and 64-bit Intel (HP-UX)

Page 4: Teradata Overview v 070503

Teradata is a S/W RDBMS

[Diagram: Teradata software layers on a Hardware Platform running a standard Operating System]
– Database Engine and Task Mgt
– DB Services: Locking, Memory mgt, Data buffers, Optimiser
– Operating System Interfaces
– Disk/File Access Interfaces
– Journal and User data

Page 5: Teradata Overview v 070503

Teradata is a DB Server

[Diagram: Teradata on an INTEL Hardware Platform running UNIX SVR4 or W2K]
– DB Services: Locking, Memory mgt, Data buffers, Optimiser
– Parallel Data Extensions
– (V)AMPs with their Journal and User Data
– Point-to-point SCSI Interface

(V)AMP = (Virtual) Access Module Processor

• Teradata is run as the only ‘application’ on the hardware platform

Page 6: Teradata Overview v 070503

Teradata - the VAMP

• SHARED NOTHING

• The VAMP is an autonomous copy of the RDBMS

• Each VAMP ‘owns’ a set of logical disks

• Multiple VAMPs run concurrently on the hardware node

• VNET: see later

• Typically 6-10 VAMPs per node

[Diagram: VAMP1 to VAMP4 on one INTEL Node, each VAMP with its own User Data, connected by the VNET]

Page 7: Teradata Overview v 070503

The Teradata Optimiser

• The Teradata Optimiser (Parsing Engine)

• Talks SQL
• No compiled plans
• One PE per external ‘data source’ connection
• Optimiser produces the data access plan using advanced statistics
> Cost based
> Not sensitive to sequence
> No hints
> No overrides

[Diagram: PE1 and PE2 with a Cache, connected by the VNET to VAMP1 to VAMP4, each with its own User Data]

SELECT CustName, CustAddress
FROM Customer
WHERE City = 'Altrincham'
ORDER BY 1;
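The bullets say the access plan is cost based and not sensitive to the sequence in which the SQL is written. The Python sketch below illustrates only those two properties and is not Teradata's optimiser: the table names, row counts, join selectivity and the cost model (total rows produced by the intermediate joins of a left-deep plan) are all invented assumptions.

```python
from itertools import permutations

# Hypothetical statistics: row counts per table (invented for illustration).
ROW_COUNTS = {"Customer": 1_000_000, "Orders": 5_000_000, "City": 500}
# Hypothetical join selectivity: each join keeps 1 in 1,000 row pairs.
SELECTIVITY_DIVISOR = 1_000


def plan_cost(join_order):
    """Toy cost model: total rows produced by the intermediate joins of a left-deep plan."""
    rows = ROW_COUNTS[join_order[0]]
    cost = 0
    for table in join_order[1:]:
        rows = rows * ROW_COUNTS[table] // SELECTIVITY_DIVISOR
        cost += rows
    return cost


def choose_join_order(tables):
    """Enumerate every join order and keep the cheapest one.

    Ties are broken by table name, so the answer never depends on the
    order in which the query happened to list the tables.
    """
    return min(permutations(tables), key=lambda order: (plan_cost(order), order))


print(choose_join_order(["Customer", "Orders", "City"]))
print(choose_join_order(["City", "Orders", "Customer"]))
# both print ('City', 'Customer', 'Orders')
```

Exhaustive enumeration is only practical for a handful of tables; it is used here because it makes the "cheapest plan wins, whatever the written order" behaviour easy to see.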

Page 8: Teradata Overview v 070503

Partitioning the Data

• Every table MUST be defined with a primary index (PI)
• Teradata partitions the table automatically using the PI column(s) as each row is inserted. This is called HASHing.
• Every table is distributed across every VAMP; the spread is even when the PI has many distinct values

[Diagram: a UNIX SVR4 or NT node with VAMP1 to VAMP6 connected by the VNET]

Page 9: Teradata Overview v 070503

Automatic Data Partitioning

[Diagram: a UNIX SVR4 or NT node with PE1, PE2 and VAMP1 to VAMP3, connected by the VNET]

INSERT INTO Employee (Name, EmpNo, DeptNo, DOB, Sex, EdLev)
VALUES ('SMITH T', 10021, 700, 460729, 'F', 16);

• The Parsing Engine ‘compiles’ the INSERT SQL
• The HASH routine generates a value between 0 and 65535
> The HASH Map locates a VAMP within the system
• Note how the VNET is used for message passing, to pass the row to its destination VAMP
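The hashing steps above can be sketched in Python. This is an illustration under stated assumptions, not Teradata's hash function or hash map format: MD5 stands in for the real hash routine, the top 16 bits of a 32-bit row hash are taken as the bucket number (0 to 65535), and the hypothetical hash map assigns buckets to VAMPs round-robin.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 65536  # 16-bit bucket numbers: 0 .. 65535


def row_hash(pi_value) -> int:
    """32-bit row hash of a primary index value.
    ASSUMPTION: MD5 stands in for Teradata's own (different) hash routine."""
    digest = hashlib.md5(str(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def hash_bucket(pi_value) -> int:
    """Top 16 bits of the row hash select the bucket."""
    return row_hash(pi_value) >> 16


def build_hash_map(num_vamps: int) -> list:
    """Hypothetical hash map: bucket b is owned by VAMP (b mod num_vamps)."""
    return [b % num_vamps for b in range(NUM_BUCKETS)]


def target_vamp(pi_value, hash_map: list) -> int:
    """The VAMP (0-based) that stores the row with this PI value."""
    return hash_map[hash_bucket(pi_value)]


def distribute(pi_values, hash_map) -> Counter:
    """Count how many rows each VAMP would receive."""
    return Counter(target_vamp(v, hash_map) for v in pi_values)


hash_map = build_hash_map(3)
print(target_vamp(10021, hash_map))                 # same EmpNo, same VAMP, every time
print(distribute(range(10000, 40000), hash_map))    # roughly 10,000 rows per VAMP
```

Because the map is only a bucket-to-VAMP table, the multi-node layout on Page 11 would, in this model, just need a map listing more VAMPs (for example `build_hash_map(6)`); rows still hash the same way and the VAMPs still look like one database.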

Page 10: Teradata Overview v 070503

Automatic Parallel Reads

• SQL request is optimised by the PE
> PE issues an ‘All-AMPs’ broadcast to the VNET
• Each AMP qualifies its rows autonomously
• PE waits for each AMP to broadcast ‘completion’
> PE issues an All-AMPs ‘send’ broadcast to the VNET
> Each AMP sends a row to the VNET
> VNET merges the qualifying rows

[Diagram: a UNIX SVR4 or NT node with PE1, PE2 and AMP1 to AMP3, connected by the VNET]

SELECT * FROM Employee
WHERE DeptNo = 700
ORDER BY EmpNo;
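The read steps above map onto a local filter-and-sort on each AMP followed by a merge. In this Python sketch, threads stand in for AMPs, lists of dicts stand in for each AMP's rows, and `heapq.merge` stands in for the VNET merge; the function names and sample rows are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def amp_step(amp_rows, predicate, sort_key):
    """One AMP: qualify only the rows it owns and sort them locally."""
    return sorted((row for row in amp_rows if predicate(row)), key=sort_key)


def all_amps_select(amp_partitions, predicate, sort_key):
    # 'All-AMPs' step: every AMP works on its own partition concurrently.
    with ThreadPoolExecutor(max_workers=len(amp_partitions)) as pool:
        spools = list(pool.map(lambda rows: amp_step(rows, predicate, sort_key),
                               amp_partitions))
    # Every AMP has completed. Merge the sorted per-AMP results, taking one row
    # at a time from whichever AMP currently holds the lowest sort key.
    return list(heapq.merge(*spools, key=sort_key))


# Hypothetical Employee rows, already spread across three AMPs.
employee_on_amps = [
    [{"EmpNo": 10021, "DeptNo": 700}, {"EmpNo": 10005, "DeptNo": 500}],
    [{"EmpNo": 10002, "DeptNo": 700}],
    [{"EmpNo": 10050, "DeptNo": 700}, {"EmpNo": 10001, "DeptNo": 700}],
]

# SELECT * FROM Employee WHERE DeptNo = 700 ORDER BY EmpNo;
result = all_amps_select(employee_on_amps,
                         predicate=lambda r: r["DeptNo"] == 700,
                         sort_key=lambda r: r["EmpNo"])
print([r["EmpNo"] for r in result])  # [10001, 10002, 10021, 10050]
```

`heapq.merge` pulls one row at a time from the per-AMP lists, which echoes the "each AMP sends a row" step: no AMP has to ship its whole answer set before the merged output begins.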

Page 11: Teradata Overview v 070503

Scalability - Multiple Nodes

• Teradata can utilise up to 512 loosely coupled hardware nodes
• BYNET is the hardware/software interconnect
> PCI card and cabling
> The BYNET performs the inter-node messaging
• The VAMPs appear as a single database image

[Diagram: two UNIX SVR4 or NT nodes, one with PE1, PE2 and VAMP1 to VAMP3, the other with PE3, PE4 and VAMP4 to VAMP6; each node has its own VNET and the nodes are linked by the BYNET]

Page 12: Teradata Overview v 070503

Teradata Core Client Tools

Utility      Parallel?  Purpose
MultiLoad    Yes        Fast Update, Insert, Upsert, Delete into 1-5 tables in 1 input pass.
TPump        Yes        Continuous Update, Insert, Upsert, Delete.
FastExport   Yes        Fast unload of data from tables.
BTEQ         Yes        More traditional execution of SQL for creating tables, reports, small updates.
FastLoad     Yes        Fast initial data load into a new table. Secondary indexes built later.

Page 13: Teradata Overview v 070503

Teradata is Teradata

• Automatic self-balancing data placement
• Automatic load balancing of client sessions
• Automatic parallelism for data load/update/archive
• Automatic transaction back-out and control
• Automatic checkpoint/restart of load/update/archive
• Automatic RAID disk transparency
• Automatic node recovery transparency
• Automatic workload management
• Automatic restart of database after abort
• Automatic data connectors for pipes, message queuing

• No Files, no TableSpaces, no Extents, no Datasets
• No single point of failure
• = Very High RAS (reliability, availability, serviceability)

Page 14: Teradata Overview v 070503

Teradata Speaks Many Languages

• Mainframes: IBM, Bull
• Desktop: Windows 9x, NT, XP, W2K
• Internet: Network Computers, MS Internet Explorer, Netscape, Java
• UNIX: NCR, Solaris, HP, AIX
• and more...

[Diagram: all of these clients connect to the Teradata Warehouse]

Page 15: Teradata Overview v 070503

Query Tools Client Server

• ODBC standard connection from user workstation
• Nominate all IP addresses in set-up for workload balancing

Middle Tier (ODBC connect):
– BO Server (Universe)
– MSI Server

User Tier (ODBC connect):
– BO Server
– DSS Agent Server
– Queryman
– TeraMiner

[Diagram: both tiers connect over the LAN to PE1 and PE2; the VNET links the PEs to VAMP1 to VAMP3]
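One way to picture "nominate all IP addresses for workload balancing" is a client that rotates new sessions across every nominated gateway address and skips any that are unavailable. The Python sketch below is a generic round-robin illustration, not the behaviour of any particular ODBC driver; the `SessionBalancer` class and the 192.0.2.x addresses (a documentation-only range) are hypothetical.

```python
from itertools import cycle


class SessionBalancer:
    """Hypothetical client-side balancer: hands out nominated gateway
    addresses round-robin, skipping any that have been marked down."""

    def __init__(self, addresses):
        self.addresses = list(addresses)
        self.down = set()
        self._ring = cycle(self.addresses)

    def mark_down(self, address):
        self.down.add(address)

    def next_address(self):
        # Try each nominated address at most once per call.
        for _ in range(len(self.addresses)):
            address = next(self._ring)
            if address not in self.down:
                return address
        raise ConnectionError("no nominated address is available")


balancer = SessionBalancer(["192.0.2.10", "192.0.2.11", "192.0.2.12"])
sessions = [balancer.next_address() for _ in range(6)]  # two sessions per address
```

Nominating only one address would send every session through a single entry point; listing them all lets the load spread and lets new sessions avoid an address that has failed.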

Page 16: Teradata Overview v 070503

Call Level Interface

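FastLoad

/* tdpid TDP0, user Vic, password Winch */
LOGON TDP0/Vic,Winch;

/* Remove error tables left by a previous run */
DROP TABLE INVOICELINE_ERROR1;
DROP TABLE INVOICELINE_ERROR2;

BEGIN LOADING INVOICELINE
    ERRORFILES INVOICELINE_ERROR1, INVOICELINE_ERROR2;

/* Input record layout and source file */
DEFINE ORDERNO    (CHAR(08)),
       ORDERQTY   (DECIMAL(5)),
       CUSTOMERNO (CHAR(08)),
       ITEMNO     (CHAR(08))
FILE = /Custdata;

SHOW;

/* Map each input field to its target column */
INSERT INTO INVOICELINE (OrderNumber, OrderQuantity, CustomerId, ProductId)
VALUES (:ORDERNO, :ORDERQTY, :CUSTOMERNO, :ITEMNO);

/* Initiates Step 2; the table becomes usable when it completes */
END LOADING;
LOGOFF;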

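• Connects over TCP/IP
• Empty, single target table only

[Diagram: the FastLoad client sends data over TCP/IP to PE1 and PE2 on a UNIX SVR4 node; the VNET passes rows to AMP1 to AMP3]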

Page 17: Teradata Overview v 070503

FastLoad

• Disables transient journaling for this job (= fast)
• BIG history loads: several files = several jobs

• Do checkpoint
• Can restart a job
• Do check the error tables as you go
• Each job moves one file's worth of data to Teradata
• The table is not usable until END LOADING (which initiates Step 2); the table is then usable

• Can abort a single job (drop all its tables) and start again
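The checkpoint/restart advice can be illustrated with a small Python model: progress is recorded every N input records in a persisted state, and a restarted job discards anything applied after the last checkpoint and resumes from there. This is a hypothetical sketch of the general technique, not FastLoad's internal restart logic; `load_with_checkpoints`, `state` and `fail_at` are invented for illustration.

```python
def load_with_checkpoints(records, table, state, checkpoint_every=1000, fail_at=None):
    """Append `records` to `table`, saving progress in state["checkpoint"]
    every `checkpoint_every` records. `fail_at` simulates a crash at that
    input record. Calling again with the same `state` restarts the job."""
    start = state.get("checkpoint", 0)
    # Anything applied after the last checkpoint is discarded and re-sent.
    del table[start:]
    for i in range(start, len(records)):
        if i == fail_at:
            raise RuntimeError(f"simulated failure at input record {i}")
        table.append(records[i])
        if (i + 1) % checkpoint_every == 0:
            state["checkpoint"] = i + 1
    state["checkpoint"] = len(records)
    return table


records = list(range(2500))
table, state = [], {}
try:
    load_with_checkpoints(records, table, state, fail_at=1700)
except RuntimeError:
    pass  # job failed; state["checkpoint"] == 1000
load_with_checkpoints(records, table, state)  # restart resumes at record 1000
```

A smaller checkpoint interval means less work is repeated after a failure, at the cost of recording progress more often.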

Page 18: Teradata Overview v 070503

MultiLoad

• Multiple input files
• Multiple target tables
• Logic for control of SQL processing

[Diagram: MultiLoad client → Call Level Interface → TCP/IP]

.BEGIN IMPORT MLOAD TABLES ACC_DATA
    WORKTABLES ACC_LOAD_DELTA_WT
    ERRORTABLES ACC_LOAD_DELTA_ET ACC_LOAD_DELTA_UV;

.LAYOUT ACCDELTA;
.FIELD ACC_NO * INTEGER;
.FIELD CONTROL_CDE * CHAR(1);
…

.DML LABEL INSACC;
INSERT INTO ACC_DATA (…);

.DML LABEL UPDACC DO INSERT FOR MISSING UPDATE ROWS;
UPDATE ACC_DATA SET …;
INSERT INTO ACC_DATA (…) VALUES (…);

.IMPORT INFILE MLOADIN
    LAYOUT ACCDELTA
    APPLY INSACC WHERE CONTROL_CDE = 'I'
    APPLY UPDACC WHERE CONTROL_CDE = 'U';

.END MLOAD;

[Diagram: a UNIX SVR4 node with PE1 and PE2, connected by the VNET to AMP1 to AMP3]

Page 19: Teradata Overview v 070503

MultiLoad

• Uses purpose-built MLOAD journals (not the Transient Journal); changes are sorted into the sequence in which they were read from the input file(s)
• UPSERT processing
• Must think in SET processing terms
• MultiLoad places an MLOAD lock on the table
• The table is not accessible (dirty reads only)
• NEVER delete the restart log table generated by MultiLoad
• NEVER abort a job: the table would remain inaccessible
• ALWAYS re-submit the script and allow it to finish
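The APPLY conditions and DO INSERT FOR MISSING UPDATE ROWS in the MultiLoad script combine into upsert logic that can be modelled in Python. This sketch assumes ACC_NO is a unique key, uses a dict as a stand-in for ACC_DATA and a list as a stand-in for the uniqueness-violation error table, and processes records in input-file order; `apply_acc_delta` and the BAL field are hypothetical.

```python
def _row(rec):
    """Target-table row: every input field except the control code."""
    return {k: v for k, v in rec.items() if k != "CONTROL_CDE"}


def apply_acc_delta(acc_data, delta):
    """acc_data: dict ACC_NO -> row (stand-in for ACC_DATA).
    delta: input records with ACC_NO, CONTROL_CDE and other fields.
    Returns rejected records (stand-in for the UV error table)."""
    errors = []
    for rec in delta:                             # input-file sequence
        code = rec["CONTROL_CDE"].strip()         # ignore trailing pad blanks
        acc_no = rec["ACC_NO"]
        if code == "I":                           # APPLY INSACC WHERE CONTROL_CDE = 'I'
            if acc_no in acc_data:
                errors.append((rec, "duplicate row"))
            else:
                acc_data[acc_no] = _row(rec)
        elif code == "U":                         # APPLY UPDACC WHERE CONTROL_CDE = 'U'
            if acc_no in acc_data:
                acc_data[acc_no].update(_row(rec))
            else:
                # DO INSERT FOR MISSING UPDATE ROWS: no row to update, so insert it
                acc_data[acc_no] = _row(rec)
        # Records matching neither APPLY condition are not applied.
    return errors


acc_data = {1: {"ACC_NO": 1, "BAL": 10}}
delta = [
    {"ACC_NO": 2, "CONTROL_CDE": "I", "BAL": 5},    # new account: inserted
    {"ACC_NO": 1, "CONTROL_CDE": "U", "BAL": 20},   # existing account: updated
    {"ACC_NO": 3, "CONTROL_CDE": "U", "BAL": 7},    # missing update row: inserted
    {"ACC_NO": 1, "CONTROL_CDE": "I", "BAL": 99},   # duplicate insert: rejected
]
errors = apply_acc_delta(acc_data, delta)
```

The per-record loop is only a readable model; the slide's point about thinking in SET terms is that MultiLoad applies these rules to whole sets of rows rather than one statement at a time.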

Page 20: Teradata Overview v 070503

Teradata Development Priorities

@ctive Data Warehousing

Page 21: Teradata Overview v 070503

Teradata EDW Positioning

[Diagram: the Enterprise Data Warehouse Environment sits between front-office and back-office operational systems, with arrows labelled "Better, Faster Customer Communications" and "Better, Faster Operational Actions"]
– CRM Front-Office Operational: Sales, Customer Service, E-Commerce, Marketing
– ERP / SCM Back-Office Operational: Billing & Collections, Service Provisioning
– Enterprise Resource Management

• Customer Relationships
• Demand Chain
• Supply Chain
• Financial Operations
• Business Process Management
• E-commerce
• Industry-specific operations

• A single view of the business
• Analysis of detail-level data
• Unlimited ability to grow
• Real-time access to the data from front- or back-office operational systems
• Near real-time data feeds from operational systems
• Eliminate expensive, inefficient data marts and Operational Data Stores

Page 22: Teradata Overview v 070503

Demand for Mixed Workload

Strategic Decision Support:
• Integrated, Strategic Decision Support Data
• Batch Updates
• Complex Queries

Tactical Decision Support:
• All Decision-Making Data Integrated
• Complex, Strategic Queries and Short, Tactical Queries
• Continuous Updates
• QueryManager

Page 23: Teradata Overview v 070503

Teradata Data Warehouse Solution

[Diagram: Operational Data passes through Data Transformation and Data Staging into the Teradata EDW, a single, shared database holding ODS data, the Centralised Data Warehouse & Management, and Logical Data Marts (DM); Business Users and IT Users query it]

Page 24: Teradata Overview v 070503

Traditional Data Warehousing

[Diagram: streams of information feed data into the Data Warehouse (Enterprise Information Repository); each User Interface / Application sends a Business Question as SQL and receives information that forms the Basis of Decision, leading to Action]

• Users construct Business Questions which are used to query the Data Warehouse
• Information is returned to the User Application
• Users make decisions based on the information, and then take action

Page 25: Teradata Overview v 070503

@ctive Data Warehousing

[Diagram: continuous streams of information feed data into the Data Warehouse (Enterprise Information Repository); a Continuous Query Process runs Business Questions as SQL queries; Application Triggers, with the open questions "Trigger point?" and "Trigger frequency?", turn the returned information into a Basis of Decision and an Automatic Action in the Business Application]

• ‘Triggers’ are identified and ‘optimized’ by the Data Warehouse
• The application continuously queries the Data Warehouse to analyze ‘real-time’ information (which is continuously refreshed)
• Results are compared to trigger points
• If a ‘threshold’ is reached, an ‘automatic action’ is initiated, or a User is ‘alerted’…
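The trigger loop described above (query continuously, compare each result to a trigger point, act when the threshold is reached) can be sketched in Python. `continuous_query`, `run_query`, `trigger_point`, `action` and `frequency_s` are hypothetical names: `run_query` stands in for a real query against the warehouse, and `frequency_s` models the slide's "trigger frequency?" question.

```python
import time
from typing import Callable


def continuous_query(run_query: Callable[[], float],
                     trigger_point: float,
                     action: Callable[[float], None],
                     polls: int,
                     frequency_s: float = 0.0) -> int:
    """Run run_query() `polls` times, `frequency_s` seconds apart, and call
    action(value) each time the result reaches trigger_point.
    Returns the number of times the action fired."""
    fired = 0
    for _ in range(polls):
        value = run_query()           # the continuously refreshed result
        if value >= trigger_point:    # threshold reached
            action(value)             # automatic action or user alert
            fired += 1
        time.sleep(frequency_s)
    return fired


# Canned results standing in for successive query answers.
results = iter([5.0, 12.0, 7.0, 15.0])
alerts = []
count = continuous_query(lambda: next(results), trigger_point=10.0,
                         action=alerts.append, polls=4)
# count == 2 and alerts == [12.0, 15.0]
```

The two open questions on the slide become the two tuning knobs here: the trigger point decides when to act, and the polling frequency decides how quickly a crossing is noticed versus how much query load the loop adds.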

Page 26: Teradata Overview v 070503

Teradata Scalability

[Diagram: Teradata scales along four axes: Amount of Detailed Data, Concurrent Users, Complexity of Data Model, and Query Complexity]

Complexity of Data Model (example entities):
• CUSTOMER: CUSTOMER NUMBER, CUSTOMER NAME, CUSTOMER CITY, CUSTOMER POST, CUSTOMER ST, CUSTOMER ADDR, CUSTOMER PHONE, CUSTOMER FAX
• ORDER: ORDER NUMBER, ORDER DATE, STATUS
• ORDER ITEM BACKORDERED: QUANTITY
• ITEM: ITEM NUMBER, QUANTITY, DESCRIPTION
• ORDER ITEM SHIPPED: QUANTITY, SHIP DATE

Query Complexity, from the start to the most complex:
• Simple, direct at the start
• Moderate, multi-table join
• Regression analysis
• Query tool support
• Complex, 58-way table join
• 15 pages, 37 FROM clauses, 7 UNIONs (largest table > 1 billion rows)