Teradata Overview v 070503
Uploaded by harishkode
Teradata Overview
7th May 2003
Agenda
• Technical Summary of Teradata Database
• Our Development Priorities
Teradata Database
• Is a Relational Database Management System
• Client Server architecture
• Support for open standards (ODBC, OLE-DB, ANSI)
• Support for emerging interoperability
• Built-in automatic parallel processing
– Enables SHARED NOTHING architecture
– Special purpose data loads
– Special purpose backup utilities
• Runs on Intel platforms
– NCR hardware (UNIX SVR4, W2K)
– and non NCR hardware (W2K)
– and 64 Bit Intel (HP-UX)
Operating System
Hardware Platform
DB Services
– Locking
– Memory mgt
– Data buffers
– Optimiser
Journal
User data
Operating System Interfaces
Database Engine and Task Mgt
Disk/File Access Interfaces
Teradata is a S/W RDBMS
Teradata is a DB Server
Unix SVR4 or W2K
Hardware Platform - INTEL
DB Services
– Locking
– Memory mgt
– Data buffers
– Optimiser
Journal
User Data
Parallel Data Extensions
(V)AMP = (Virtual) Access Module Processor
Point-to-point SCSI Interface
• Teradata is run as the only ‘application’ on the hardware platform
Teradata - the VAMP
• SHARED NOTHING
• The VAMP is an autonomous copy of the RDBMS
• Each VAMP ‘owns’ a set of logical disks
• Multiple VAMPs run concurrently on the hardware node
• VNET (see later)
• Typically 6-10 VAMPs per node
[Diagram: one INTEL node running VAMP1 to VAMP4, each VAMP owning its own User Data, all connected by the VNET]
The Teradata Optimiser
• The Teradata Optimiser (Parsing Engine)
• Talks SQL
• No compiled plans
• One PE per external ‘data source’ connection
• Optimiser produces the data access plan using advanced statistics
> Cost based
> Not sensitive to sequence
> No hints
> No overrides
PE1 PE2
SELECT CustName, CustAddress
FROM Customer
WHERE City = 'Altrincham'
ORDER BY 1;
[Diagram: PE1 and PE2 with a Cache, above VAMP1 to VAMP4 and their User Data, connected by the VNET; UNIX SVR4 or NT]
Partitioning the Data
• Every Table MUST be defined with a primary index
• Teradata partitions the table automatically by HASHing the PI column value as each row is inserted
• Every table is evenly distributed to every VAMP
VAMP1 VAMP2 VAMP3 VAMP4 VAMP5 VAMP6
VNET
Automatic Data Partitioning
UNIX SVR4 or NT
VAMP1 VAMP2 VAMP3
PE1 PE2
INSERT INTO Employee (Name, EmpNo, DeptNo, DOB, Sex, EdLev)
VALUES ('SMITH T', 10021, 700, 460729, 'F', 16);
VNET
• The Parsing Engine ‘compiles’ INSERT SQL
• The HASH routine will generate a value between 0 and 65535
> The HASH Map locates a VAMP within the system
• Note how the VNET is used for message passing - to pass the row to its destination VAMP
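The routing described on this slide can be sketched in Python. This is an illustration only, not Teradata's implementation: `zlib.crc32` stands in for the real HASH routine (which the slides do not describe), the 65,536-bucket count and the round-robin hash map are assumptions, and `build_hash_map`, `hash_bucket` and `amp_for` are hypothetical names.

```python
import zlib

NUM_BUCKETS = 65536  # assumption: hash values fall in 0..65535


def build_hash_map(num_amps):
    """Assign every hash bucket to an AMP, round-robin (illustrative only)."""
    return [bucket % num_amps for bucket in range(NUM_BUCKETS)]


def hash_bucket(pi_value):
    """Stand-in hash: CRC32 of the primary index value, folded to a bucket."""
    return zlib.crc32(str(pi_value).encode("utf-8")) % NUM_BUCKETS


def amp_for(pi_value, hash_map):
    """Look up the AMP that owns the row with this primary index value."""
    return hash_map[hash_bucket(pi_value)]


hash_map = build_hash_map(num_amps=3)
rows_per_amp = [0, 0, 0]
for emp_no in range(10000, 13000):  # hypothetical EmpNo values used as the PI
    rows_per_amp[amp_for(emp_no, hash_map)] += 1
print(rows_per_amp)  # row count landing on each AMP
```

Because the same primary index value always hashes to the same bucket, a later lookup by that value can be sent straight to the one AMP that owns the row.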
Automatic Parallel Reads
• SQL request is optimised by the PE
> PE issues an ‘All-AMPs’ broadcast to the VNET
• Each AMP qualifies its rows autonomously
• PE waits on each AMP to broadcast ‘completion’
> PE issues an All-AMPs ‘send’ broadcast to the VNET
> Each AMP sends a row to the VNET
> VNET merges the qualifying rows
UNIX SVR4 or NT
AMP1 AMP2 AMP3
PE1 PE2
SELECT *
FROM Employee
WHERE DeptNo = 700
ORDER BY EmpNo;
VNET
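The qualify-then-merge flow on this slide can be sketched in Python. It is a simplified, single-process illustration: real AMPs work concurrently, and the assumption that each AMP sorts its own qualifying rows before the merge is ours, not stated in the slides. The sample rows and the names `amp_scan` and `all_amps_select` are hypothetical.

```python
import heapq

# Hypothetical rows, already distributed across three AMPs.
amps = [
    [{"EmpNo": 10021, "DeptNo": 700}, {"EmpNo": 10007, "DeptNo": 300}],
    [{"EmpNo": 10003, "DeptNo": 700}, {"EmpNo": 10050, "DeptNo": 700}],
    [{"EmpNo": 10012, "DeptNo": 500}, {"EmpNo": 10009, "DeptNo": 700}],
]


def amp_scan(rows, dept_no):
    """Each AMP qualifies and sorts only the rows it owns."""
    qualifying = [row for row in rows if row["DeptNo"] == dept_no]
    return sorted(qualifying, key=lambda row: row["EmpNo"])


def all_amps_select(amps, dept_no):
    """Merge the per-AMP sorted streams into one ordered answer set."""
    streams = [amp_scan(rows, dept_no) for rows in amps]
    return list(heapq.merge(*streams, key=lambda row: row["EmpNo"]))


result = all_amps_select(amps, dept_no=700)
print([row["EmpNo"] for row in result])  # [10003, 10009, 10021, 10050]
```

The merge step only ever compares the next row from each stream, which is why no single component has to hold or sort the full answer set.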
Scalability - Multiple Nodes
• Teradata can utilise (up to 512) loosely coupled hardware nodes
• BYNet is the hardware/software interconnect
> PCI card and cabling
> The BYNet performs the inter-node messaging
• The VAMPs appear as a single database image
[Diagram: two nodes, each running UNIX SVR4 or NT. Node 1 hosts PE1, PE2 and VAMP1 to VAMP3 on its own VNET; node 2 hosts PE3, PE4 and VAMP4 to VAMP6 on its own VNET. The BYNET links the two nodes.]
Teradata Core Client Tools
Utility      Parallel?  Purpose
FastLoad     Yes        Fast initial data load into new table. Secondary indexes built later.
MultiLoad    Yes        Fast Update, Insert, Upsert, Delete into 1-5 tables for 1 input pass.
TPump        Yes        Continuous Update, Insert, Upsert, Delete.
FastExport   Yes        Fast unload of data from tables.
BTEQ         Yes        More traditional execution of SQL for creating tables, reports, tiny updates.
Teradata is Teradata
• Automatic self balancing data placement
• Automatic load balancing of client sessions
• Automatic parallelism for data load/update/archive
• Automatic transaction back-out and control
• Automatic checkpoint/restart of load/update/archive
• Automatic RAID disk transparency
• Automatic node recovery transparency
• Automatic workload management
• Automatic re-start of database after abort
• Automatic data connectors for pipes, message queuing
• No Files, no TableSpaces, no Extents, no Datasets
• No single point of failure
• = Very High RAS
Teradata Speaks Many Languages
Mainframes: IBM, Bull
Desktop: Windows 9x, NT, XP, W2K
Internet: Network Computers, MS Internet Explorer, Netscape, Java
UNIX: NCR, Solaris, HP, AIX
and more...
Teradata Warehouse
Query Tools Client Server
• ODBC standard connection from user workstation
• Nominate all IP addresses in set-up for workload balancing
Middle Tier ODBC connect
– BO Server (Universe)
– MSI Server
User Tier ODBC connect
– BO Server
– DSS Agent Server
Queryman
TeraMiner
[Diagram: clients connect over the LAN to PE1 and PE2, which reach VAMP1 to VAMP3 over the VNET]
Call Level Interface
FastLoad
LOGON TDP0/Vic, Winch;
DROP TABLE INVOICELINE_ERROR1;
DROP TABLE INVOICELINE_ERROR2;
BEGIN LOADING INVOICELINE
  ERRORFILES INVOICELINE_ERROR1, INVOICELINE_ERROR2;
DEFINE ORDERNO (CHAR(08)),
       ORDERQTY (DEC(05)),
       CUSTOMERNO (CHAR(08)),
       ITEMNO (CHAR(08))
  FILE = /Custdata;
SHOW;
INSERT INTO INVOICELINE (OrderNumber, OrderQuantity, CustomerId, ProductId)
  VALUES (:ORDERNO, :ORDERQTY, :CUSTOMERNO, :ITEMNO);
END LOADING;
• Empty single target only
[Diagram: the FastLoad client connects over TCP/IP to PE1 and PE2 on a UNIX SVR4 node; the VNET connects them to AMP1 to AMP3]
FastLoad
• Disables transient journals for this job (= fast)
• BIG History loads (several files = several jobs)
• Do Checkpoint
• Can re-start a job
• Do check the Error Tables as you go
• Each job moves one file's worth of data to Teradata
• Table is not useable until…
• END LOADING (initiates Step 2)… Table now useable
• Can abort a single job (Drop all Tables) and start again
MultiLoad
• Multiple input files
• Multiple target tables
• Logic for control of SQL processing
.BEGIN IMPORT MLOAD TABLES ACC_DATA
  WORKTABLES ACC_LOAD_DELTA_WT,
  ERRORTABLES ACC_LOAD_DELTA_ET ACC_LOAD_DELTA_UV;
.LAYOUT ACCDELTA
.FIELD ACC_NO INTEGER,
.FIELD CONTROL_CDE CHAR(1),
…...
.DML LABEL INSACC;
  INSERT INTO ACC_DATA (….
.DML LABEL UPDACC DO INSERT FOR MISSING UPDATE ROWS;
  UPDATE ACC_DATA SET….
  INSERT INTO ACC_DATA SET …..
.IMPORT INFILE MLOADIN LAYOUT ACCDELTA
  APPLY INSACC WHERE CONTROL_CDE = 'I '
  APPLY UPDACC WHERE CONTROL_CDE = 'U';
.END MLOAD;
[Diagram: the MultiLoad client uses the Call Level Interface over TCP/IP to reach PE1 and PE2 on a UNIX SVR4 node; the VNET connects them to AMP1 to AMP3]
MultiLoad
• Uses purpose built MLOAD journals (not Transient Journal)
• Sorts to the sequence processed from the input file(s)
• UPSERT processing
• Must think in SET processing terms
• MultiLoad places an MLOAD lock on the Table
• The table is not accessible (dirty read only)
• NEVER delete the restart table log which is generated by MultiLoad
• NEVER abort a job - the Table is still not accessible
• ALWAYS re-submit the script and allow to finish
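The UPSERT behaviour named above (an update that falls back to an insert when no row exists, as in the script's `DO INSERT FOR MISSING UPDATE ROWS` label) can be illustrated with a small Python sketch. It models the target table as a dictionary keyed by `ACC_NO` and routes records by `CONTROL_CDE`, mirroring the `APPLY ... WHERE` clauses. The `BALANCE` column and the `apply_delta` function are hypothetical, and work tables, error tables and locking are deliberately left out.

```python
def apply_delta(table, records):
    """Apply insert ('I') and upsert ('U') records keyed by ACC_NO."""
    for record in records:
        acc_no = record["ACC_NO"]
        values = {
            key: value
            for key, value in record.items()
            if key not in ("ACC_NO", "CONTROL_CDE")
        }
        if record["CONTROL_CDE"] == "I":
            # Plain insert; duplicate-key handling is omitted in this sketch.
            table[acc_no] = values
        elif record["CONTROL_CDE"] == "U":
            if acc_no in table:
                table[acc_no].update(values)  # the UPDATE found a row
            else:
                table[acc_no] = values  # missing update row: INSERT instead
    return table


acc_data = {1: {"BALANCE": 100}}
delta = [
    {"ACC_NO": 1, "CONTROL_CDE": "U", "BALANCE": 150},
    {"ACC_NO": 2, "CONTROL_CDE": "U", "BALANCE": 20},
    {"ACC_NO": 3, "CONTROL_CDE": "I", "BALANCE": 5},
]
apply_delta(acc_data, delta)
print(acc_data)  # {1: {'BALANCE': 150}, 2: {'BALANCE': 20}, 3: {'BALANCE': 5}}
```

The loop here handles one record at a time for readability; the slide's advice to think in SET processing terms still applies to the real utility, which applies the whole input as a batch.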
Teradata Development Priorities
@ctive Data Warehousing
ERP / SCM Back-Office Operational
– Enterprise Resource Management
– Billing & Collections
– Service Provisioning
CRM Front-Office Operational
– Sales
– Customer Service
– E-Commerce
– Marketing
Enterprise Data Warehouse Environment
Better, Faster Customer Communications
Better, Faster Operational Actions
• Customer Relationships
• Demand Chain
• Supply Chain
• Financial Operations
• Business Process Management
• E-commerce
• Industry-specific operations
• A single view of the business
• Analysis of detail-level data
• Unlimited ability to grow
• Real-time access to the data from front or back office operational systems
• Near real-time data feeds from operational systems
• Eliminate expensive, inefficient data marts and Operational Data Stores
Teradata EDW Positioning
Integrated, Strategic Decision Support Data
Batch Updates
Complex Queries
Strategic Decision Support
All Decision-Making Data Integrated
Complex, Strategic Queries
Short, Tactical Queries
Continuous Updates
QueryManager
Tactical Decision Support
Demand for Mixed Workload
Teradata Data Warehouse Solution
Operational Data
Business Users
IT Users
Data Transformation
Data Staging
Centralised Data Warehouse & Management
Logical Data Mart (DM)
Single Shared Teradata EDW Database
ODS Data
DM
Traditional Data Warehousing
[Diagram labels: SQL, Enterprise Information Repository, Data Warehouse, User Interface / Application, Business Question, Basis of Decision, Action, Information, Input, Data, Streams of information]
• Users construct Business Questions which are used to query the Data Warehouse
• Information is returned to the User Application
• Users make decisions based on the information, and then take action
@ctive Data Warehousing
[Diagram labels: SQL Query, TRIGGER POINT?, Enterprise Information Repository, ACTION, Application Triggers, Basis of Decision, Information, Data Warehouse, Business Questions, TRIGGER FREQUENCY?]
• ‘Triggers’ are identified, and ‘optimized’ by the Data Warehouse
• Application continuously queries the Data Warehouse to analyze ‘real-time’ information (which is continuously refreshed)
• Results are compared to trigger points
• If a ‘threshold’ is reached, an ‘automatic action’ is initiated, or a User is ‘alerted’…
[Diagram labels: Business Application, Data, Continuous streams of information, Automatic Action, Continuous Query Process, Triggers defined]
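The continuous-query loop described on this slide (query, compare to trigger points, act when a threshold is reached) can be sketched in Python. This is a minimal illustration under stated assumptions: the metric name `stock_outs`, the action name `alert_user`, and the functions `check_triggers` and `run_continuous_query` are all hypothetical, and the warehouse query is replaced by a canned data feed.

```python
def check_triggers(metrics, triggers):
    """Compare query results to trigger points; return the actions that fire."""
    fired = []
    for name, (threshold, action) in triggers.items():
        value = metrics.get(name)
        if value is not None and value >= threshold:
            fired.append((action, name, value))
    return fired


def run_continuous_query(query, triggers, cycles):
    """Poll the warehouse `cycles` times and collect every fired action."""
    actions = []
    for _ in range(cycles):
        actions.extend(check_triggers(query(), triggers))
    return actions


# Hypothetical data feed: the stock-out count rises on each refresh.
readings = iter([{"stock_outs": 2}, {"stock_outs": 4}, {"stock_outs": 7}])
triggers = {"stock_outs": (5, "alert_user")}
fired = run_continuous_query(lambda: next(readings), triggers, cycles=3)
print(fired)  # [('alert_user', 'stock_outs', 7)]
```

How often the loop polls corresponds to the "TRIGGER FREQUENCY?" question on the slide, and the threshold value to "TRIGGER POINT?"; both would be tuned per application.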
Teradata Scalability
Amount of Detailed Data
Concurrent Users
[Data model diagram]
CUSTOMER: CUSTOMER NUMBER, CUSTOMER NAME, CUSTOMER CITY, CUSTOMER POST, CUSTOMER ST, CUSTOMER ADDR, CUSTOMER PHONE, CUSTOMER FAX
ORDER: ORDER NUMBER, ORDER DATE, STATUS
ORDER ITEM BACKORDERED: QUANTITY
ITEM: ITEM NUMBER, QUANTITY, DESCRIPTION
ORDER ITEM SHIPPED: QUANTITY, SHIP DATE
Complexity of Data Model
• Simple Direct at the start
• Moderate Multi-table Join
• Regression analysis
• Query tool support
• Complex, 58-way table join
• 15 Pages, 37 From Clauses, 7 UNIONs (Largest table >1 B rows)
Query Complexity