Introduction to the Teradata RDBMS for UNIX
Version 2 Release 2.1.0 BD10-4955-B
BD10-4955-B 01.00.00 May 29, 1998
Introduction to the Teradata® RDBMS for UNIX® Version 2 Release 2.1
The product described in this book is a licensed product of NCR Corporation.
BYNET is a registered trademark of Teradata Corporation. CICS, CICS/ESA, CICS/VS, DATABASE2, DB2, IBM, MVS/ESA, MVS/XA, QMS, RACF, SQL/DS, VM/XA, and VTAM are trademarks or registered trademarks of International Business Machines Corporation in the U.S. and other countries. DBC/1012 is a registered trademark of Teradata Corporation. DEC, VAX, MicroVax, and VMS are registered trademarks of Digital Equipment Corporation. EXCELAN is a trademark of Excelan, Incorporated. HEWLETT-PACKARD is a registered trademark of Hewlett-Packard Company. INTELLECT and KBMS are trademarks of Trinzic Corporation. INTERTEST is a registered trademark of Computer Associates International, Inc. ISO is a trademark of International Standards Organization. MICROSOFT, MS-DOS, DOS/V, Windows, Windows 95, and Windows NT are registered trademarks of Microsoft Corporation. SABRE is a trademark of Seagate Technology, Inc. SAS and SAS/C are registered trademarks of SAS Institute Inc. SUN and SUN OS are trademarks of Sun Microsystems, Incorporated. TCP/IP protocol is a United States Department of Defense Standard ARPANET protocol. TERADATA is a registered trademark of Teradata Corporation. UNIX is a registered trademark of UNIX System Laboratories. YNET is a registered trademark of Teradata Corporation. X/Open and the X device are trademarks of X/Open Company Limited. XNS is a trademark of Xerox Corporation.
It is the policy of NCR Corporation (NCR) to improve products as new technology, components, software, and firmware become available. NCR, therefore, reserves the right to change specifications without prior notice.
All features, functions, and operations described herein may not be marketed by NCR in all parts of the world. In some instances, photographs are of equipment prototypes. Therefore, before using this document, consult with your NCR representative or NCR office for information that is applicable and current.
To maintain the quality of our information products, we need your comments on the accuracy, clarity, organization, and value of this book. Please complete the User Feedback Form and mail or e-mail the form to:
[email protected]
Information Engineering NCR Corporation 100 North Sepulveda Boulevard El Segundo, CA 90245-4361 U.S.A.
Copyright © 1998 By NCR Corporation Dayton, Ohio U.S.A. All Rights Reserved Printed in U.S.A.
About This Book
Note: The name of the Teradata Database System (DBS) has been changed to the Teradata Relational Database Management System (RDBMS) to more accurately reflect the true nature of the product. This change will take place over a period of time in documentation, product names, and screen displays. In the meantime, all occurrences of “Teradata Database System,” “Teradata DBS,” or “DBS” should be read as referring to the “Teradata Relational Database Management System.”
Purpose
This book provides an introduction to the Teradata RDBMS for UNIX.
Audience
This book is intended for anybody who uses the Teradata RDBMS for UNIX.
How This Book Is Organized
This book contains thirteen chapters, one appendix and a glossary:
Chapter 1, “Overview,” introduces the Teradata RDBMS, including its design philosophy and goals, its shared information architecture, and its scalability.
Chapter 2, “Teradata RDBMS Architecture,” introduces the hardware and software architecture that supports the Teradata RDBMS, including both client and server software. System 3500, System 4500, and System 5100 (WorldMark) hardware is described.
Chapter 3, “The Relational Model,” presents an overview of the relational model for database management, including an introduction to normalization and a brief discussion of Teradata RDBMS macros.
Chapter 4, “Data Definition,” describes the data definition capabilities of Teradata SQL, the Structured Query Language, including how to create, change, and delete databases, tables, indexes, and macros.
Chapter 5, “Data Manipulation,” describes the data manipulation capabilities of Teradata SQL, including the SELECT, INSERT, UPDATE, and DELETE statements.
Chapter 6, “Views,” introduces the concept of the view, emphasizing that views are virtual, not base tables. The chapter also describes why views are the recommended means by which to present base table information to end users.
Chapter 7, “Data Dictionary,” describes the Data Dictionary (DD), the system catalog for the Teradata RDBMS. The DD includes definitions for the database objects, user characteristics, and much more.
Chapter 8, “Application Development,” introduces application development in the Teradata RDBMS environment, including the use of embedded SQL and CLI calls in client programming languages.
Chapter 9, “Fault Tolerance,” describes fault tolerance in the Teradata RDBMS, including both hardware and software elements.
Chapter 10, “Concurrency Control and Recovery,” introduces the topic of concurrency control and transactions. Object locking, serializability of transactions, and the two-phase commit protocol for distributed databases are among the subjects described.
Chapter 11, “Security and Integrity,” discusses security and integrity in the Teradata RDBMS environment.
Chapter 12, “System Administration,” introduces system administration of the Teradata RDBMS. Topics include user and space allocation, accounting, monitoring, and server-resident utilities.
Chapter 13, “Operating and Configuration Specifications,” describes the capacities of and requirements for the Teradata RDBMS.
Appendix A, “How the Teradata RDBMS for UNIX Differs from the Teradata RDBMS for TOS,” describes the differences between Version 1 and Version 2 Teradata database management systems.
The “Glossary” defines frequently used terms in the Teradata RDBMS environment.
Prerequisites
You should be familiar with basic computer technology, NCR system hardware, the Teradata RDBMS, the system console environment, and X Windows.
It may be helpful to review the following books:
Introduction to Teradata RDBMS for UNIX
Teradata RDBMS for UNIX Support Utilities Reference
Changes to This Book
Changes made to the Introduction to the Teradata RDBMS for UNIX are focused on DR maintenance and include:
Join Index
DR 37060
Join Index represents a new type of indexing structure. For introductory information on Join Index see page 4-10 and page 4-17.
For general information on Join Index, see the Teradata RDBMS for UNIX V2R2.1 Base System Release Definition and Transmittal Document. For usage information see the section on Join Index in the Teradata RDBMS for UNIX Database Design and Administration Manual.
RFC to provide ESCON mainframe channel connectivity
DCR 7030
This DR addresses changes to the mainframe physical connection to the Teradata server. Pages in this document that are impacted: page 2-3, page 2-4, page 2-5 and page 2-11.
Hash Join
DR 39131
Hash Join is an alternative join scheme and is introduced on page 5-13.
Decimal 18 Default is Regression Problem
DR 39789
The increase of the maximum Decimal value for TotalDigits from 15 to 18 has caused regression problems in some customer applications and third-party vendor processes. This DR is addressed on page 4-2.
Minor wording changes include:
DR 38139
Throughout this reference, there is frequent mention of the DATE parameter in a 2-digit year format ‘YY/MM/DD’.
Teradata RDBMS V2R2.1 introduces a system-wide default called the CenturyBreak parameter, which the RDBMS software uses to internally convert 2-digit years (‘YY’) to the correct 4-digit years (‘XXYY’). This parameter is a new general field in the DBS control record.
For more information on the CenturyBreak parameter see Chapter 14, “Setting Up, Creating, and Modifying the Database Structure,” of the Teradata RDBMS for UNIX Database Design and Administration Manual.
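To make the idea of a century pivot concrete, the following Python sketch expands a two-digit year using a CenturyBreak-style value. It is an illustration only: the function names, the default pivot of 50, and the rule that years below the pivot map to 20YY while all others map to 19YY are assumptions made for this example. The Database Design and Administration Manual remains the authority on how the RDBMS actually applies the parameter.

```python
def expand_two_digit_year(yy: int, century_break: int = 50) -> int:
    """Expand a two-digit year to four digits using a pivot value.

    Assumed rule (illustration only): years below the pivot fall in the
    2000s, all other years fall in the 1900s.
    """
    if not 0 <= yy <= 99:
        raise ValueError("yy must be between 0 and 99")
    if not 0 <= century_break <= 100:
        raise ValueError("century_break must be between 0 and 100")
    century = 2000 if yy < century_break else 1900
    return century + yy


def expand_date(yy_mm_dd: str, century_break: int = 50) -> str:
    """Convert a 'YY/MM/DD' string to 'YYYY/MM/DD' (hypothetical helper)."""
    yy, mm, dd = yy_mm_dd.split("/")
    return f"{expand_two_digit_year(int(yy), century_break):04d}/{mm}/{dd}"
```

With the assumed pivot of 50, `expand_date("98/05/29")` yields a 1998 date, while a year such as `05` is read as 2005. Setting the pivot to 0 in this sketch treats every two-digit year as belonging to the 1900s.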
List of Acronyms
The following acronyms, listed in alphabetical order, are used in this book:
1NF First Normal Form
2NF Second Normal Form
API Application Programming Interface
ASF2 Archive Storage Facility 2
AWS Administrative Workstation
CMS Conversational Monitor System
FIPS Federal Information Processing Standards
I/O Input/Output
MOSI Micro Operating System Interface
MPP Massively Parallel Processing
MVS Multiple Virtual Storage
NUPI Nonunique Primary Index
NUSI Nonunique Secondary Index
ODBC Open Database Connectivity
OS/VS Operating System/Virtual Storage
PDE Parallel Database Extensions
RI Referential Integrity
SMP Symmetric Multi-Processing
TDP Teradata Director Program
TOS Teradata Operating System
TPA Trusted Parallel Application
TSO Time Sharing Option
UPI Unique Primary Index
USI Unique Secondary Index
VM/SP Virtual Machine/System Product
Teradata RDBMS for UNIX Library
Titles of publications in the Teradata RDBMS for UNIX library begin with Teradata RDBMS for UNIX. The following publications, listed in alphabetical order, apply to Teradata RDBMS for UNIX, Version 2 Release 2.1, and will be available May 29, 1998:
Electronic Versions of Teradata Publications
To obtain the latest version of Teradata RDBMS for UNIX publications, please visit our Internet site at:
http://www.info.ncr.com
BD10-5060-B Database Window Reference
BD10-5061-E Field Support Guide
BD10-4956-A Master Index, Bibliography, and Glossary
BD10-5062-D Messages Reference
BD10-5013-A Performance Monitor Reference
BD10-5064-C Resource Usage Macros and Tables
BD10-5052-B Security Administration Guide
B035-1507-048B SQL Quick Reference
BD10-5067-D Utilities Reference
B035-1902-048D Teradata RDBMS for UNIX V2R2.1 and Client 9801 User Documentation CD-ROM
Client Reference Library
The following publications, listed in alphabetical order, apply to Teradata Client 9801 products:
Product ID Publication Title
BD10-4971-B Robotic Library Manager Installation and User Guide
B035-3032-097B Robotic Library Manager Reference Card
BD10-4952-C Teradata Application Programming With Embedded SQL for C, COBOL, and PL/I
BD10-5069-C Teradata Archive/Recovery Reference for Channel-Attached Systems
BD10-5087-B Teradata Archive Storage Facility 2 (ASF2) Administration and Operations
BD10-5086-B Teradata Archive Storage Facility 2 (ASF2) Command Language Reference Manual
BD10-5091-C Teradata BTEQ Reference
B035-2401-038A Teradata Client Command Summary
BD10-5084-C Teradata Client for MVS Installation Guide
BD10-5095-C Teradata Client for NCR UNIX MP-RAS Installation Guide
BD10-5085-B Teradata Client for VM Installation Guide
BD10-5024-B Teradata Data Definition Language Processor Reference
B035-3027-107A Teradata Database Query Manager (DBQM) Administrator’s Guide
B035-3029-107A Teradata Database Query Manager (DBQM) Programmer’s Guide
Electronic Versions of Teradata Publications
To obtain the latest version of Teradata Client publications, please visit our Internet site at:
http://www.info.ncr.com
BD10-5094-B Teradata Enhanced Call-Level Interface Reference
BD10-5079-C Teradata FastExport Reference
BD10-4954-D Teradata FastLoad Reference
BD10-5075-A Teradata ITEQ User’s Guide for Channel-Attached Systems
BST0-2122-30 Teradata ITEQ Keypad Template
BST0-2122-34 Teradata ITEQ Keypad Template (3270 PC)
BST0-2126-20 Teradata ITEQ Reference
BD10-5076-C Teradata MultiLoad Reference
BST0-2141-00 Teradata ODBC Driver for Windows Installation and User’s Guide
B035-3021-018A Teradata Parallel Data Pump (TPump) Reference
BD10-5062-D Teradata RDBMS for UNIX Messages Reference
BD10-4966-C Teradata TDP Reference
BD10-5083-B Teradata TS/API Installation Guide
BD10-5082-B Teradata TS/API System & Database Administration Guide
BD10-5081-B Teradata TS/API User’s Guide
BD10-5090-A Teradata WinCLI Application Developer’s Guide
BD10-5093-A Teradata WinCLI Installation Guide
B035-1902-048D Teradata RDBMS for UNIX V2R2.1 and Client 9801 User Documentation CD-ROM
How to Order Teradata Publications
You may always order Teradata publications through your NCR Sales Representative, or you may use one of the methods listed below.
Order Form
To order Teradata publications, use the Information Products Order Form (form number IPP-WD02001).
Ordering Address
Send orders to the following address:
Electronic Versions of Teradata Publications
To obtain the latest version of Teradata publications, please visit our Internet site at:
http://www.info.ncr.com
Non-U.S. Orders
NCR IPP-BRUSSELS-OTC Rue de la Fusee 50 B-1130 Brussels Belgium
FAX: 32-2-727-95-50 PHONE: 32-2-727-95-49 or 32-2-727-95-71 E-MAIL: [email protected]
Contents
Preface

About This Chapter ... 2-1
Introduction ... 2-1
Hardware ... 2-1
System Configuration ... 2-3
Client Software ... 2-6
Server Software ... 2-8

About This Chapter ... 3-1
Introduction ... 3-1
What is a Relational Database? ... 3-1
Some Other Definitions ... 3-2

Introduction ... 4-19
Dropping a Table ... 4-19
Dropping an Index ... 4-19

The SELECT Statement ... 5-2
Introduction ... 5-2
Relational Algebra ... 5-2
Teradata SQL Expressions ... 5-3
Arithmetic Operators ... 5-3
Aggregate Operators ... 5-4
Comparison Operators ... 5-4
Logical Operators ... 5-5
Partial String Matching Operator ... 5-5
Set Operators ... 5-6
Other Operators ... 5-6
Arithmetic Functions ... 5-7
Using Fully Qualified Names to Reference Databases and Tables in Teradata SQL ... 5-8
Introduction ... 5-8
Fully Qualified Names ... 5-8

Select Specific Rows ... 5-11
Specifying Order in the Results Table ... 5-12
Defining Groups ... 5-12
Including Information from More Than

Using Teradata SQL in Application Programs ... 5-20
Introduction ... 5-20
Embedded SQL and Client Programming Languages ... 5-20
Cursors ... 5-21

Restrictions on DML Operations on Views ... 6-6
Introduction ... 6-6
Views with Aggregates ... 6-6
Views with Joins ... 6-6
For More Information ... 6-7

Using Macros as SQL Applications ... 8-4
Introduction ... 8-4
Creating a Macro ... 8-4
Using a Macro ... 8-5
Modifying a Macro ... 8-5
Deleting a Macro ... 8-5
Using the EXPLAIN Statement As a Tool To Optimize Your SQL Code ... 8-6
Introduction ... 8-6
Using EXPLAIN: First Example ... 8-7
Using EXPLAIN: Second Example ... 8-8

Introduction ... 8-11
TS/API Products ... 8-11
Compatible Third Party Software Products ... 8-11

For More Information ... 9-13

About This Chapter ... 10-1
Introduction ... 10-1
Concurrency Control ... 10-1
Recovery ... 10-1

About This Chapter ... 11-1
Introduction ... 11-1
Definition of Security ... 11-1
Definition of Integrity ... 11-1
Tools for Enforcing System Security ... 11-1
Tools for Enforcing System Integrity ... 11-2

About This Chapter ... 13-1
Introduction ... 13-1
For More Information ... 13-6

Appendix A How the Teradata RDBMS for UNIX Differs from the Teradata RDBMS for TOS
About This Appendix ... A-1
Teradata RDBMS for UNIX Differences ... A-2
Improved Performance and Added Features ... A-3
Increased Number of Hash Buckets ... A-3
Enhanced Row Evaluation ... A-4
File System Improvements ... A-4
Automatic Detection of Cylinder Fragmentation ... A-5

Additional General Improvements ... A-8
How the Teradata RDBMS for UNIX Differs

Glossary ... Glossary-1
List of Figures
Chapter 1
RDBMS ... 1-5
Chapter 3
Chapter 9
Fault Tolerance
Function of Time ... 10-15
Revision Record
Date Description
May 29, 1998 Supports Teradata RDBMS for UNIX V2R2.1.0
Chapter 1
Overview
Introduction
This chapter presents an overview of the Teradata Relational Database Management System (RDBMS), including perspectives on its design and brief reviews of the hardware and software systems that comprise the Teradata RDBMS.
Design Perspectives
The topic on design perspectives for the Teradata RDBMS includes descriptions of the following:
Research ideas leading to the eventual design
Design philosophy and goals
Scalability
Shared information architecture
Teradata Database Software
The topic on Teradata software includes descriptions of the following:
The structured query language (SQL) and its uses for application programming and interactive database queries
The Teradata database management system
The Teradata file system and disk handling system
Client Software
The topic on client software includes descriptions of the following:
The request handler (Call Level Interface, or CLI)
The data communications component (Teradata Director Program, or TDP)
Application development services, including:
  A SQL preprocessor
  CLI
  Third party query front ends, gateways, and fourth generation languages
Data loading utilities
The archive/restore utility
Design Perspectives
Introduction
This topic describes the considerations that went into the design of the original Teradata Database System. The topic also explains the overall perspectives behind the system.
Charter for the Teradata Database System
The original charter for development of the Teradata RDBMS included the following goals:
Large capacity database system with thousands of MIPS capable of storing terabytes of data and billions of rows
Fault tolerance to ensure data integrity
Network connectivity
Manageable growth
Relational database management system
Faster than other relational systems
Common access language
Single data store for multiple clients in a client/server architecture
Research Ideas Leading to the Design of the Teradata Database System
The hardware component of the first generation Teradata RDBMS was a database machine. The current generation machine is a general purpose massively parallel machine running the Teradata RDBMS as a trusted parallel application (TPA). The earliest database machines were composed of specialized hardware components. These machines were very expensive to implement and did not provide improved performance.
The concept behind the Teradata RDBMS was to build an inexpensive system using mostly off-the-shelf hardware components that would meet and exceed the performance of conventional database management systems using relational database management.
The design incorporates a parallel, distributed architecture in which the distributed functions communicate by means of a fast interconnect structure. This proprietary interconnect structure in the current architecture is known as the BYNET (for MPP systems) or the Vnet (for SMP systems).
Shared Information Architecture
One of the principal goals for the design of the Teradata RDBMS was to provide a single data store for any number of client architectures. This Shared Information Architecture (SIA) eliminates the need for maintaining duplicate databases on multiple platforms. With the SIA, most mainframe clients, workstations, and personal
Figure 1-1 Teradata RDBMS Shared Information Architecture
[Figure not reproduced. Surviving labels: Teradata RDBMS single data store; Unisys A-series.]
Teradata Database Software
Introduction
The Teradata Database Software is the foundation for the relational database server. Its purpose is to support SQL manipulations of the database.
The server software includes the following components:
Channel communications support
LAN gateway communications support
SQL parser
Request dispatcher
Session control
Database manager
File manager
Structured Query Language (SQL)
The structured query language (SQL) is a data sublanguage designed specifically for manipulating data in relational databases. SQL is the only language the Teradata RDBMS understands, so all database manipulations, whether embedded in an application program or resulting from an interactive query, use SQL and SQL only.
The figure shows a process flow of a SQL statement through the Teradata RDBMS on a channel-attached system.
Process flow in a network-attached system is somewhat different (substituting the Micro Operating System Interface (MOSI) and Micro Teradata Director Program (MTDP) for the TDP), but the basic idea is very similar.
Figure 1-2 Process Flow of a SQL Statement Through the Teradata RDBMS
[Figure not reproduced. Surviving labels: SQL query; Results table.]
The following table describes the process flow illustrated in Figure 1-2.
Stage Process
1 A user generates an SQL query on the channel-attached client. The query can come from a BTEQ session at an interactive terminal, from a compatible fourth generation language, or from an application program coded in a host language.
2 The request/results packaging component, CLI, packages the request and sends it to the TDP for routing to the server.
3 The TDP establishes a session, then routes the request across the communications channel to the parsing engine (PE).
4 The parser component of the PE opens the request package and parses the SQL code for processing, interprets it, checks its syntax, evaluates its semantics, and optimizes the access plan.
IF the SQL source code parses without errors, THEN the parser decodes the request into a series of work steps and passes them to the dispatcher.
IF the SQL source code parses with errors, THEN the dispatcher receives the appropriate error message and returns it to the requester. Processing terminates.
The dispatcher sequences the steps and passes them on to the BYNET (or Vnet) with instructions about whether the steps are for one Access Module Process (AMP), an AMP group, or for all AMPs.
5 The BYNET (or Vnet on a single node system) distributes the execution steps to the appropriate AMP for processing.
6 The AMPs process the execution steps by performing select, insert, delete, and update operations on the database. The AMPs carry out these operations by making calls to the file system.
The AMPs also perform other functions such as journaling, space accounting, and index maintenance.
7 The file system performs primitive physical data block operations by locating the data blocks to be manipulated and then passing control to the disk subsystem.
8 The disk subsystem retrieves the requested blocks for the file system.
9 The disk manager returns the requested blocks to the file system.
10 The file system returns the requested data to the database manager.
11 The database manager sends a message back to the dispatcher stating that the data is ready to be returned to the requesting user, then sorts and transmits the data to the interface engine over the BYNET.
12 The BYNET (or Vnet on a single node system) merges the sorted response and returns it to the requesting interface engine for packaging.
13 The dispatcher builds the response message and routes it to the communications channel driver for return to the requesting client system.
14 The TDP receives and unpacks the response messages and makes them available to CLI.
15 CLI passes the received data back to the requesting application in blocks.
16 The requesting application receives the response data in the form of a relational table.
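The middle stages of this flow (distributing a step to the AMPs, sorting locally, and merging the sorted responses) can be sketched in a few lines of Python. This is a toy model, not the RDBMS implementation: the `Amp` class, the `distribute` and `run_all_amp_select` functions, and the use of a simple hash on a key column to place rows are all assumptions introduced for illustration.

```python
import heapq
from typing import Callable, Dict, List

Row = Dict[str, object]


class Amp:
    """Toy stand-in for an AMP that owns a slice of the rows."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def select_sorted(self, predicate: Callable[[Row], bool], sort_key: str) -> List[Row]:
        # Stages 6 to 11: read the local rows, filter them, sort the local result.
        matches = [row for row in self.rows if predicate(row)]
        return sorted(matches, key=lambda row: row[sort_key])


def distribute(rows: List[Row], key: str, amp_count: int) -> List[Amp]:
    """Spread rows over AMPs by hashing a key column (assumed placement rule)."""
    amps = [Amp() for _ in range(amp_count)]
    for row in rows:
        amps[hash(row[key]) % amp_count].rows.append(row)
    return amps


def run_all_amp_select(
    amps: List[Amp], predicate: Callable[[Row], bool], sort_key: str
) -> List[Row]:
    # Stage 5: the same step is sent to every AMP.
    partial_results = [amp.select_sorted(predicate, sort_key) for amp in amps]
    # Stage 12: merge the already-sorted partial results into one response.
    return list(heapq.merge(*partial_results, key=lambda row: row[sort_key]))
```

Because each AMP returns its rows already sorted, the merge step only has to interleave the partial results; it never re-sorts the full answer set. That is the property the sketch is meant to show, whatever the real placement and transport mechanisms are.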
For More Information
For more information on the topics presented in this chapter, see the following Teradata RDBMS manuals.
IF you want to learn more about . . . THEN see this manual . . .
Structured Query Language: Teradata RDBMS for UNIX SQL Reference
Data flows through the Teradata RDBMS: Teradata RDBMS for UNIX Database Design and Administration
General aspects of the Teradata RDBMS: Teradata RDBMS for UNIX Database Design and Administration
Chapter 2
Teradata RDBMS Architecture
About This Chapter
The hardware that supports the Teradata software is based on off-the-shelf microprocessor technology combined with a proprietary communications network connecting the microprocessor elements.
The purpose of this chapter is to briefly mention and describe these hardware components and to describe the software architecture they support. Details are provided in the appropriate reference manuals.
Hardware
This manual documents the basic hardware configurations for both the SMP and MPP hardware platforms.
Unlike earlier database server technology supporting the Teradata database management system, these machines do not have specialized hardware processors.
Instead, they run virtual processors, called vprocs. These vprocs provide the parallel environment that enables the Teradata RDBMS to run on SMP and MPP systems.
The components of the SMP and MPP machines are:
Component: Node
Description: Basic hardware processing unit for the SMP and MPP machines.
Function: Symmetric multiprocessing (SMP) hardware unit with:
  Database software
  Client interface software
  UNIX operating system
  Multiprocessor shared-memory processors
  RAID disk arrays
  Failsafe power provisions

Component: BYNET
Description: Interprocessor network to link nodes. Note: Single-node configurations use the Vnet instead of the BYNET.
Function: Connects processors by broadcast, multicast, or point-to-point communication, depending on the situation. SMP and single-node MPP systems use a software emulation of the BYNET called Vnet.
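One way to picture "depending on the situation" is to relate the three communication modes to the number of AMPs a step is addressed to. The Python sketch below does exactly that. The mapping (one AMP to point-to-point, a group of AMPs to multicast, all AMPs to broadcast) is an assumption made for illustration, as are the `Delivery` enumeration and the `choose_delivery` function; the manual does not spell out the selection rule.

```python
from enum import Enum
from typing import FrozenSet, Iterable


class Delivery(Enum):
    POINT_TO_POINT = "point-to-point"
    MULTICAST = "multicast"
    BROADCAST = "broadcast"


def choose_delivery(targets: Iterable[int], all_amps: Iterable[int]) -> Delivery:
    """Pick a delivery mode from the set of AMPs a step is addressed to.

    Hypothetical rule: one target is point-to-point, every AMP is a
    broadcast, and anything in between is a multicast.
    """
    target_set: FrozenSet[int] = frozenset(targets)
    amp_set: FrozenSet[int] = frozenset(all_amps)
    if not target_set:
        raise ValueError("a step needs at least one target AMP")
    if not target_set <= amp_set:
        raise ValueError("unknown AMP in target list")
    if len(target_set) == 1:
        return Delivery.POINT_TO_POINT
    if target_set == amp_set:
        return Delivery.BROADCAST
    return Delivery.MULTICAST
```

In this sketch a step addressed to a single AMP is always treated as point-to-point, even on a system that has only one AMP, because the single-target check runs before the all-AMPs check.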
System Configuration
Base and range limits for the SMP systems are described in the following table.
Note: Specifications are subject to change.
System Component Minimum Maximum
EDAC Memory 256 megabytes
Serial (ESCON) and/or parallel (Bus & Tag) channel connection 1 64
CD-ROM drive 1

5100S SMP Nodes 1 1
Pentium CPU 4 32
EDAC Memory 256 megabytes
Serial (ESCON) and/or parallel (Bus & Tag) channel connection 1 64
CD-ROM drive 1

5100M Per Node
EDAC Memory 256 megabytes
Serial (ESCON) and/or parallel (Bus & Tag) channel connection 1 64
CD-ROM drive 1
Client Software 2
The SMP and MPP hardware supports the Teradata RDBMS running both with and without a channel- or network-attached client.
The following table describes the available client software, recognizing that the “client” may be the 3500/4100/4500/5100 machine itself. These products can also be used to access a Teradata RDBMS for TOS running on an NCR 3600 or DBC/1012 platform.
Contact your NCR representative for information on supported platforms for each product and for custom ports to other platforms.
Software | Description | Supported Access
 | | All channel- and network-attached clients
C Preprocessor | Permits embedding SQL in C programs. | All channel- and network-attached clients
COBOL Preprocessor | | Channel-attached clients
 | | Channel-attached clients
 | Can be embedded in application programs using function calls. | All channel- and network-attached clients
TDP | Data communication management. Handles sessions, logging, recovery, restarts, physical I/O from the PEs, and security. | Channel-attached clients
 | Handles logging, recovery, restarts, and physical I/O from the PEs. Session and security management are handled by the Gateway software on the server. | Network-attached clients
Archive/Restore | Archives data to tape; restores taped data to Teradata RDBMS. | Channel-attached clients
 | Archives data to tape; restores taped data to Teradata RDBMS. | SMP and MPP platforms
FastExport | Extracts large volumes of data from the Teradata RDBMS. | All channel- and network-attached clients
FastLoad | Performs high performance data loading into empty tables. | All channel- and network-attached clients
MultiLoad | Performs high performance data loading, including inserts, updates, and deletions, against up to 5 existing tables. | All channel- and network-attached clients
Teradata RDBMS Architecture About This Chapter
Server Software 2
The server software includes all the following:
- The Database Window
- The RDBMS Gateway
- A SQL parser and syntaxer
- A request dispatcher
- A session controller
- Facilities to control load balancing over the communications network
- The Teradata database management software
- The Teradata file system
- Teradata Parallel Database Extensions (PDE)
- The UNIX operating system
A server may also contain data loading utilities such as MultiLoad and FastLoad, data export utilities like FastExport, and the SQL data access utility BTEQ.
Teradata RDBMS Architecture Virtual Processors
Virtual Processors 2
Introduction 2
The versatility of the Teradata RDBMS is based on virtual processors (vprocs), which eliminate dependency on specialized physical processors.
This is made possible by the Parallel Database Extensions (PDE) for UNIX. The PDE is an interface layer between the Teradata RDBMS and the standard UNIX operating system that runs on the NCR server.
A vproc is a collection of tasks running under the multitasking environment of the UNIX operating system. The tasks in a vproc share resources with other tasks in the same vproc. Multiple vprocs can run on an SMP platform or a node.
The vprocs and the tasks running under them communicate using unique-address messaging, as if they were physically isolated from each other. This message communication is done using the Vnet software on single node platforms and using the BYNET and BYNET Driver Software on multinode platforms.
There are two types of vprocs:
Type | Description
PE | Performs session control and dispatching tasks as well as parsing functions.
AMP | Manages the distribution and retrieval of data on the virtual disks (vdisks), which are defined at system configuration time with the pdeconfig utility.
Each type of vproc is described in the following passages.
PEs 2
Each Parsing Engine (PE) executes the database software that manages sessions and decomposes SQL into parallel steps.
The software, as shown in Figure 2-1, consists of the following elements:
- Parser (including the Optimizer)
- Dispatcher
- Session Control
The Parser decomposes the SQL into relational data management processing steps.
The steps are passed to the Dispatcher, which sends the steps to the appropriate AMPs.
Session Control provides user session management such as establishing and terminating sessions.
Figure 2-1 PE Software Components
AMPs 2
Each Access Module Process (AMP) executes the database software that performs relational functions and data management.
Each AMP, as shown in Figure 2-2, is assigned a portion of the database to control.
Each AMP provides the following functions:
- Data access
- Concurrency control
- Journaling
- Cache management
- Recovery functions
Each AMP maintains its portion of the database tables stored on disks.
Figure 2-2 AMP Software Components
Teradata RDBMS Architecture The Parsing Engine
The Parsing Engine 2
Introduction 2
The Parsing Engine is the processor that communicates with the client system on one side and with the AMPs (via the BYNET or Vnet) on the other.
Each PE executes the database software that manages sessions, decomposes SQL statements into parallel steps, and returns the answer rows to the requesting client.
The major components of the PE are:
- Session Control
- SQL Parser
- Dispatcher
Client Interface 2
The client interface provides handshaking across the communications channel between the server and its client or clients.
For a mainframe link, the connection is made by means of either a serial (ESCON) or a parallel (Bus & Tag) channel, implemented by means of the Teradata Channel Interface (TCI) protocol handler.
In the case of a network link, the connection is made over a LAN using either TCP/IP or ISO/OSI protocols.
Session Control 2
Session numbers are assigned by the TDP and communicated to the server.
The PE establishes a session only if it can validate the username, password, and user type (application program, interactive BTEQ terminal, or third party software product). All subsequent traffic for the session is identified by its host id, session number, and request number.
Input Data Conversion 2
The Teradata RDBMS is an ASCII machine. The parsing engine converts EBCDIC (and other non-ASCII) input to ASCII before processing it.
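The conversion described above can be illustrated with a short sketch. It assumes the client sends text in the cp037 EBCDIC code page, which Python supports as a built-in codec; the code page and conversion tables actually used by the parsing engine are not stated here, so both are assumptions made for illustration.

```python
# Illustrative only: cp037 (EBCDIC US/Canada) is an assumed client code page.
EBCDIC_CODEC = "cp037"


def ebcdic_to_ascii(raw: bytes) -> str:
    """Decode EBCDIC bytes from a channel-attached client into ASCII text."""
    text = raw.decode(EBCDIC_CODEC)
    # Raises UnicodeEncodeError if a character has no ASCII equivalent.
    text.encode("ascii")
    return text


# Simulate a request arriving from an EBCDIC client.
incoming = "SELECT * FROM Table_01;".encode(EBCDIC_CODEC)
print(ebcdic_to_ascii(incoming))
```

The same bytes would be unreadable if interpreted directly as ASCII, which is why the conversion has to happen before the request is parsed.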
SQL Parser 2
The SQL parser handles all incoming SQL requests. It processes these requests as follows.
Stage | Process
1 | The Parser looks in the Request cache to determine if the request is already there. IF the request is in the Request cache, THEN the Parser generates AMP steps and passes them to the gncApply software. IF the request is not in the Request cache, THEN the Parser begins processing the request with the Syntaxer.
2 | The Syntaxer checks the syntax of an incoming request. IF there are no errors, THEN the Syntaxer converts the request to a parse tree and passes it to the Resolver. IF there are errors, THEN the Syntaxer passes an error message back to the requestor.
3 | The Resolver adds information from the Data Dictionary cache to convert database, table, view, and macro names to numeric identifiers, then produces lists of objects and access rights. The output is a Resolver tree, which the Resolver passes to a security checking mechanism.
4 | The security module checks access rights in the Data Dictionary. IF the access rights are valid, THEN the security module passes the request to the Optimizer. IF the access rights are not valid, THEN the security module aborts the request.
5 | The Optimizer determines the most effective way to access the data needed by the request.
6 | The Optimizer scans the request to determine where locks should be placed, then passes the optimized parse tree to the Generator.
7 | The Generator transforms the optimized parse tree into plastic steps and passes them to the gncApply software. Plastic steps are directives to the database management system that do not contain data values.
8 | gncApply takes the plastic steps produced by the Generator and transforms them into concrete steps. Concrete steps are directives to the database management system that contain user- and session-specific information as well as data parcels.
9 | gncApply passes the concrete steps to the Dispatcher.
The Dispatcher 2
The Dispatcher controls the sequence in which steps are executed. It also passes the steps to the BYNET (or Vnet on single node systems) to be distributed to the AMP database management software.
Stage | Process
1 | The Dispatcher receives concrete steps from gncApply.
2 | The Dispatcher places the first step on the BYNET (or Vnet), tells the BYNET whether the step is for one AMP, several AMPs, or all AMPs, and waits for a completion response. Whenever possible, the Teradata RDBMS performs steps in parallel to enhance performance.
3 | The Dispatcher receives a completion response from one or several AMPs and places the next step on the BYNET. It continues to do this until all the AMP steps associated with a request are done.
Note that AMP steps can be sent in any one of the following ways:
- Between one PE and one AMP using the hashing algorithm
- Among a selected group of AMPs (referred to as a dynamic BYNET (or Vnet) group)
- Among all AMPs in the system
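The path from an incoming request to the concrete steps handed to the Dispatcher can be sketched in a few lines. This is a toy model, not the Teradata implementation: the cache layout, the step fields, and the names parse and gnc_apply are all hypothetical, and the Syntaxer, Resolver, security check, Optimizer, and Generator are collapsed into a single stand-in that accepts only simple SELECT requests. It is meant to show two ideas from the text: a cached request skips the parsing stages, and plastic steps carry no data values until they are bound into concrete steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlasticStep:
    """A step template with no data values."""
    operation: str
    table: str


@dataclass(frozen=True)
class ConcreteStep:
    """A plastic step bound to session information and a data parcel."""
    operation: str
    table: str
    session: int
    data: tuple


request_cache = {}  # request text -> plastic steps (assumed cache layout)


def parse(request_text):
    """Return plastic steps for a request, consulting the cache first."""
    if request_text in request_cache:
        return request_cache[request_text]
    words = request_text.rstrip("; ").split()
    upper = [w.upper() for w in words]
    # Toy Syntaxer: accept only SELECT ... FROM <table> ...
    if not upper or upper[0] != "SELECT" or "FROM" not in upper[:-1]:
        raise ValueError("syntax error returned to the requestor")
    table = words[upper.index("FROM") + 1]
    # Toy Generator: emit value-free plastic steps.
    steps = [PlasticStep("lock", table), PlasticStep("read", table)]
    request_cache[request_text] = steps
    return steps


def gnc_apply(plastic_steps, session, data):
    """Bind session information and data values to produce concrete steps."""
    return [ConcreteStep(s.operation, s.table, session, data)
            for s in plastic_steps]


plastic = parse("SELECT * FROM Table_01 WHERE AcctNo = :acct ;")
concrete = gnc_apply(plastic, session=7, data=(129317,))
print(concrete)
```

Because the plastic steps are free of data values, the same cached steps can be bound again with a different account number or session without repeating the parsing stages.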
Dispatching the Steps 2
The Dispatcher controls the sequence in which steps are executed and passes the steps onto the Vnet (single node systems) or BYNET (multinode systems). Once the steps are handed over to the Vnet or BYNET, they are referred to as AMP steps. The Dispatcher tells the Vnet or BYNET whether an AMP step is for one AMP, a group of AMPs, or all AMPs.
When the Dispatcher receives a completion response from an AMP or AMPs, the Dispatcher sends the next step via the Vnet or BYNET until all of the AMP steps associated with a request are complete.
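The send-and-wait behavior described above can be sketched as a simple loop. The FakeNetwork class and the scope labels "one", "group", and "all" are hypothetical stand-ins for the Vnet or BYNET and its addressing modes; a real network would deliver the step to the AMPs and block until they respond.

```python
class FakeNetwork:
    """Stand-in for the Vnet or BYNET; records what it is asked to deliver."""

    def __init__(self):
        self.log = []

    def send(self, step, scope):
        # A real network would deliver the step and wait for the AMPs.
        self.log.append((step, scope))
        return f"{step}: complete"


def dispatch(steps, network):
    """Send steps one at a time, waiting for each completion response."""
    responses = []
    for step, scope in steps:
        if scope not in ("one", "group", "all"):
            raise ValueError(f"unknown scope: {scope}")
        responses.append(network.send(step, scope))  # returns when complete
    return responses


net = FakeNetwork()
result = dispatch(
    [("lock", "all"), ("read", "all"), ("end transaction", "all")], net
)
print(result)
```

The sketch sends steps strictly in sequence; it does not model the parallel execution of independent steps.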
The Vnet or BYNET software controls the transmission of messages to and from the AMPs. See Figure 2-3, where 12 rows of a table are distributed among disks attached to four AMPs.
If a request is for a single row, the PE transmits steps to a single AMP, as shown at PE 1 in Figure 2-3. If the request is for many rows (an all-AMP request), the PE causes the Vnet or BYNET to broadcast the steps to all AMPs, as shown at PE 2 in Figure 2-3. To minimize system overhead, the PE can send a step to a subset of AMPs.
Figure 2-3 PE Routing of Teradata SQL Request Messages
As an example, consider the following two Teradata SQL statements against a table of checking account information:
1. SELECT * FROM Table_01 WHERE AcctNo = 129317 ;
2. SELECT * FROM Table_01 WHERE AcctBal > 1000 ;
In this example:
- PEs 1 and 2 receive requests 1 and 2, respectively.
- The data for account 129317 is contained in table row R9, stored on AMP 1.
- Information about all account balances is distributed evenly among the disks of all four AMPs.
The PE 1 Parser determines that its request is a primary-index retrieval, which calls for access and return of one specific row.
The Dispatcher in PE 1 then issues a message to the Vnet or BYNET containing an appropriate read step and R9/AMP 1 routing information. Once the desired record is received from AMP 1, PE 1 transmits the data back to the TDP.
The PE 2 Parser determines that this is an all-AMPs request, then issues a message to the Vnet or BYNET containing the appropriate read step to be broadcast to all four AMPs.
Once results are received from the AMPs, PE 2 transmits the data back to the TDP.
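The difference between the two requests can be sketched with a small in-memory model of four AMPs holding twelve rows. The hash function here is zlib.crc32, chosen only because it is deterministic; it is not the Teradata hashing algorithm, and the row values are invented for illustration.

```python
import zlib

NUM_AMPS = 4


def amp_for(primary_index_value):
    """Map a primary index value to an AMP number (stand-in hash)."""
    return zlib.crc32(str(primary_index_value).encode()) % NUM_AMPS


# Twelve hypothetical account rows, distributed by hashing AcctNo.
rows = [{"AcctNo": 129310 + i, "AcctBal": 250 * i} for i in range(1, 13)]
amps = {n: [] for n in range(NUM_AMPS)}
for row in rows:
    amps[amp_for(row["AcctNo"])].append(row)


def select_by_acctno(acct_no):
    """Primary-index retrieval: only the AMP that owns the row is asked."""
    amp = amp_for(acct_no)
    return [r for r in amps[amp] if r["AcctNo"] == acct_no], [amp]


def select_by_balance(minimum):
    """All-AMP request: every AMP scans its own rows."""
    hits = [r for n in amps for r in amps[n] if r["AcctBal"] > minimum]
    return hits, sorted(amps)


print(select_by_acctno(129317))
print(select_by_balance(1000))
```

The first function touches one AMP because the hash of the account number identifies where the row lives; the second has no such shortcut, so it must ask all four.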
To enhance system performance, the RDBMS executes steps in parallel whenever possible.
Parallel steps can work with multi-statement requests, macros, and single statements and can provide a significant improvement in response time.
For example, the response time of a multi-statement request consisting of four statements that can be independently executed may be cut in half.
Processing the Steps 2
The AMPs are responsible for obtaining the rows required to process the request.
The software on the AMPs does the following:
Processes AMP steps by performing select, insert, delete, and update operations on the data on the disks.
Performs other functions associated with AMP steps such as journaling, space accounting, index maintenance, and output data conversion.
Performs utilities to configure and reconfigure the RDBMS. (See Chapter 5, “Database Administration” for more information.)
Uses the file system software to perform primitive physical data block operations.
An AMP step can be sent in one of the following ways:
- Between one PE and one AMP using the hashing algorithm.
- Among a selected set of AMPs, called a dynamic Vnet or BYNET group.
- Among all AMPs in the system.
An AMP step is broadcast to all AMPs when a full-table scan is requested or when the operation uses nonunique secondary indexes (NUSIs).
When an operation uses a unique primary index (UPI), nonunique primary index (NUPI), or unique secondary index (USI), the message includes the row hash value, which is used by the Vnet or BYNET to route the message to the correct vproc.
Each AMP is associated with disks and uses its file system software to control the reading and writing of data on its disks.
The file system controls primitive physical data block reads, and translates AMP software row requests into physical data block requests.
The sequence of AMP step processing is as follows:
Step | Step Name | Function
1 | Lock | Ensures that users who are concurrently trying to update the same rows do not violate the consistency of the data. If the operation uses a UPI, NUPI, or USI, this step is incorporated into step 2.
2 | Operation | Performs the actual task required: select, delete, insert, update, sort. There may be many operation steps.
3 | End transaction | Required only for multiple AMP steps. If the request is for a UPI, no end transaction step is necessary. The end transaction step tells all AMPs that worked on the request that processing is complete.
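The lock, operation, and end transaction sequence can be expressed as a small planning function. This is one reading of the rules stated above, not the actual AMP logic: the function name, the access labels, and the interpretation of "multiple AMP steps" as "more than one step in the resulting sequence" are assumptions.

```python
HASHED_ACCESS = {"UPI", "NUPI", "USI"}


def amp_step_sequence(operations, access):
    """Return the ordered AMP steps for a request.

    operations: operation names such as "select" or "update"
    access: "UPI", "NUPI", "USI", "NUSI", or "FULL_SCAN"
    """
    if not operations:
        raise ValueError("at least one operation step is required")
    steps = []
    if access in HASHED_ACCESS:
        # The lock is folded into the first operation step.
        steps.append(f"{operations[0]} (with lock)")
        steps.extend(operations[1:])
    else:
        steps.append("lock")
        steps.extend(operations)
    # Assumed reading: end transaction is added only when more than one
    # AMP step was produced, and never for a UPI request.
    if len(steps) > 1 and access != "UPI":
        steps.append("end transaction")
    return steps


print(amp_step_sequence(["select"], "UPI"))
print(amp_step_sequence(["select"], "FULL_SCAN"))
```

Under these assumptions a single-row UPI retrieval collapses to one step, while a full-table scan needs all three.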
Teradata RDBMS Architecture Structured Query Language
Structured Query Language 2
This topic describes SQL, the Structured Query Language.
SQL is the only language the Teradata RDBMS understands. It is the ANSI standard language for relational database management.
Why SQL? 2
SQL has the advantage of being the most commonly used language for relational database management systems.
Because of this, both the data structures in the database and the commands for manipulating those structures are controlled using SQL. Additionally, all applications, whether written in a client language with embedded SQL, a macro, or an ad hoc SQL query, are written and executed using the same set of instructions and syntax.
Other database management systems use different sublanguages for data definition and data manipulation and do not permit ad hoc queries of the database. This means that you must use one language to define your data and yet another to query and update it. And you are restricted to running applications written by programmers. You have very little flexibility with nonrelational database management systems.
SQL Flagger 2
The Teradata RDBMS has an optional feature that detects non-ANSI SQL extensions (for entry level ANSI SQL92 only) and reports them back to the user (either to an embedded SQL program or to BTEQ) without terminating execution of the query.
SQL Lexicon 2
Like any language, SQL has its rules for writing statements.
The following table describes the SQL lexicon.
Lexical Component | Description
Word | A character string of from 1 to 30 characters derived from the following character set: Roman characters (both cases), digits, $, #, and _. Keywords are a special category of words that are reserved for use in SQL statements. You cannot use keywords as object names.
Delimiter | Special characters whose meaning depends on context. The Teradata SQL delimiters and their functions are as follows.
    Delimiter | Function
    , | Separates items in a list. Acts as a date separator.
    : | Prefixes a referenced parameter or client system variable. Acts as a date separator.
    . | Separates a database name from a table name. Separates a table name from a column name. Acts as the decimal point. Acts as a date separator.
    ; | Separates statements in a request. Terminates a request (BTEQ).
    ' | Defines boundaries of character string constants. Acts as a date separator.
    " | Defines the boundaries of nonstandard names.
    / | Used as a date separator.
    B | Blank. Used as a date separator.
    - | Used as a date separator.
Constant | Numerics, strings, and characters embedded in a statement.
Operator | A set of symbols used to express logical and arithmetic operations. Operators of the same precedence are evaluated from left to right. The following table shows the operators from highest to lowest precedence.
    Result Type | Operation
    numeric | numeric + numeric; numeric - numeric
    string | concatenation operator
    logical | value EQ value; value NE value; value GT value; value LE value; value LT value; value GE value; value IN set; value NOT IN set; value BETWEEN value AND value; charvalue LIKE charvalue
    logical | NOT logical
    logical | logical AND logical
    logical | logical OR logical
Lexical separator | A character string that can exist between words, constants, and delimiters without changing the meaning of a statement. Valid lexical separators are comments, blanks, and return characters (X'0D').
Statement separator | A character that separates each statement of a multistatement request. The Teradata SQL separator is the semicolon.
Request terminator | A character that terminates a request in the body of a macro or that is entered from BTEQ. The Teradata SQL request terminator is the End of Text character for macros or the semicolon for BTEQ.
Character Sets 2
The Teradata RDBMS supports multinational and multibyte character sets in several different environments.
Among the character sets supported are:
- Kanji
- Katakana
- Hiragana
- European languages with characters using the umlaut, tilde, or ring
The RDBMS provides multibyte support for the following operating systems:
- MVS
- VM/CMS
- UNIX
- DOS/V
Multibyte support exists for the following Teradata software:
- Server-based utilities
- Client-based utilities
- BTEQ
- Preprocessor2 (embedded SQL)
- TDP
- CLIv2
Users control the current character set and collation sequences using SQL statements.
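Two of the lexical rules in the SQL lexicon table, the definition of a word and the use of the semicolon as a statement separator, lend themselves to a short sketch. The keyword set below is a tiny illustrative subset rather than the real reserved word list, and the function names are hypothetical. The splitter treats apostrophes as string constant boundaries so that a semicolon inside a constant is not mistaken for a separator; it does not handle comments or quoted nonstandard names.

```python
import re

# 1 to 30 characters drawn from Roman letters, digits, $, #, and _.
WORD = re.compile(r"[A-Za-z0-9$#_]{1,30}")

# Tiny illustrative subset; the real keyword list is much longer.
SAMPLE_KEYWORDS = {"SELECT", "FROM", "WHERE"}


def is_valid_object_name(word):
    """Check the word rule and reject the sample keywords."""
    return (WORD.fullmatch(word) is not None
            and word.upper() not in SAMPLE_KEYWORDS)


def split_statements(request):
    """Split a request on semicolons that lie outside string constants."""
    statements, current, in_string = [], [], False
    for ch in request:
        if ch == "'":
            in_string = not in_string
        if ch == ";" and not in_string:
            text = "".join(current).strip()
            if text:
                statements.append(text)
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


print(is_valid_object_name("Table_01"))
print(split_statements("SELECT 'a;b' FROM T; SELECT 2;"))
```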
Teradata RDBMS Architecture Query Facilities
Query Facilities 2
The Teradata RDBMS supports several different facilities for making interactive or batch queries of the database from a terminal.
These include:
- Basic Teradata Query facility (BTEQ)
- Fourth generation languages
Because SQL is the only language the Teradata RDBMS understands, all application programming facilities ultimately make their queries against the database using the SQL language.
BTEQ 2
The Basic Teradata Query facility is a SQL formatter/report generator that allows you to create and perform SQL queries interactively or in batch mode from an interactive terminal.
BTEQ supports the following facilities:
- Multiple Teradata SQL statements per request
- Read from and write to client data files
- Manage multiple sessions per job
- Format output and write sophisticated reports
BTEQ is supported on the following platforms:
- Channel-attached client
- Network-attached client
- Teradata server
Teradata RDBMS Architecture The BYNET
The BYNET 2
This topic explains the concepts behind the interprocessor network technology used by the Teradata RDBMS: the BYNET.
BYNET Functions 2
At the most elementary level, you can look at the BYNET as a bus that loosely couples all the SMP nodes in a multinode system. This view does an injustice to the BYNET, however, because the capabilities of the network range far beyond those of a simple system bus.
The BYNET also possesses high speed logic arrays that provide bidirectional broadcast, multicast, and point-to-point communication and merge functions.
A multinode system has two BYNETs. This both creates a fault tolerant environment and provides for enhanced interprocessor communication. When BYNET traffic becomes particularly heavy, the two BYNETs can handle separate (rather than redundant) traffic. The machine provides load balancing software to optimize this process.
The total bandwidth for each network link to a processor node is 10 megabytes per second. Because there are two network links per node and because the bandwidth is linearly scalable, the total throughput available for each node is 20 megabytes per second.
For example, a 16-node 5100M system has 320 megabytes per second of bandwidth for point-to-point connections (16 nodes at 20 megabytes per second each).
Total available broadcast bandwidth for any size 5100M system is 20 megabytes per second.
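The bandwidth figures above follow from simple arithmetic, sketched here. The constants come from the text; the function names are hypothetical, and the figures are assumed to be rates per second.

```python
LINK_BANDWIDTH_MB = 10   # per network link, from the text
LINKS_PER_NODE = 2       # one link to each of the two BYNETs


def node_throughput():
    """Throughput available to a single node."""
    return LINK_BANDWIDTH_MB * LINKS_PER_NODE


def point_to_point_bandwidth(nodes):
    """Point-to-point bandwidth scales linearly with the node count."""
    return nodes * node_throughput()


def broadcast_bandwidth(nodes):
    """Broadcast bandwidth does not grow with the node count."""
    return node_throughput()


print(point_to_point_bandwidth(16))
print(broadcast_bandwidth(16))
```

Point-to-point traffic benefits from every added node, while a broadcast is limited to what a single node's two links can carry.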
The BYNET software provides a standard TCP/IP interface for communication among the SMP nodes.
Figure 2-4 illustrates how the BYNET connects individual SMP nodes to create an MPP system in the 5100M configuration.
Figure 2-4 How the BYNET connects individual SMP nodes
Virtual Processor Connectivity in Single Node Systems 2
Single node systems mimic the BYNET with a software emulation called the Vnet. Vnet stands for “virtual network.”
Teradata RDBMS Architecture The Access Module Process
The Access Module Process 2
Introduction 2
The Access Module Process (AMP) is the heart of the Teradata RDBMS. The Access Module Process is a virtual processor (vproc) that provides a BYNET interface and performs many database and file management tasks.
AMPs control the management of the Teradata RDBMS and also provide control over the disk subsystem, with each virtual AMP being assigned to a virtual disk.
AMP Functions 2
- BYNET (or Vnet) interface
- Database manager
  - Locking
  - Joins
  - Sorting
  - Aggregation
  - Output data conversion
  - Disk space management
  - Accounting
  - Journaling
- File system and disk management
Scalability and Performance 2
You can increase the performance of a Teradata RDBMS by adding SMP nodes to your system. Performance increases at a nearly linear rate with the addition of SMP nodes to a 5100M configuration.
The Disk Subsystem 2
Each AMP supports one virtual disk unit, using either RAID1 (mirroring) or RAID5 (parity striping) technology.
AMP Clusters 2
AMPs are grouped into logical clusters to enhance the fault tolerant capabilities of the Teradata RDBMS. This method of creating additional fault tolerance in your system is discussed further in Chapter 9, “Fault Tolerance.”
Teradata RDBMS Architecture Request Packaging and Unpackaging
Request Packaging and Unpackaging 2
Introduction 2
Any SQL statement must be packaged before being transmitted to the server-based database where it is executed. The returned response must then be unpackaged and presented to the requesting terminal or application program.
This topic discusses the mechanism for request handling used by the Teradata RDBMS.
Facilities for Packaging and Unpackaging SQL Requests and Results 2
The Call-Level Interface (CLI) is the primary mechanism the Teradata RDBMS uses to package and unpackage SQL requests and results. It is the principal API for the Teradata RDBMS.
The CLI packages queries into uniform blocks that are routed to the server by the Teradata Director Program (TDP) in IBM mainframe configurations or by the MTDP in other configurations.
Result tables returned to the requesting terminal or application are similarly routed by the TDP to the appropriate requester where they are unpackaged and presented as a results table.
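The general idea of packaging a request into a uniform block and unpackaging it at the other end can be sketched with a simple header-plus-body layout. This does not reproduce the actual CLI block or parcel format, which is not described here: the header fields, the kind code, and the function names are all hypothetical.

```python
import struct

# Hypothetical header: 2-byte kind code, 4-byte body length, big-endian.
HEADER = struct.Struct(">HI")
KIND_REQUEST = 1  # hypothetical kind code for an SQL request


def package(sql: str) -> bytes:
    """Wrap an SQL request in a header that states its kind and length."""
    body = sql.encode("ascii")
    return HEADER.pack(KIND_REQUEST, len(body)) + body


def unpackage(block: bytes):
    """Recover the kind code and text from a packaged block."""
    kind, length = HEADER.unpack_from(block)
    body = block[HEADER.size:HEADER.size + length]
    return kind, body.decode("ascii")


block = package("SELECT * FROM Table_01;")
print(unpackage(block))
```

A fixed header lets the receiving side know how many bytes to read before it looks at the content, which is what makes routing blocks through an intermediary such as the TDP practical.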
Personal computers running Microsoft Windows® can use the Windows CLI (WinCLI) package to access the Teradata RDBMS. WinCLI uses the Dynamic Data Exchange (DDE) protocol to communicate with application programs.
The industry-standard ODBC driver to the Teradata RDBMS is another API for packaging and unpackaging SQL requests.
Teradata RDBMS Architecture Data Communications Management in the Teradata RDBMS Environment
Data Communications Management in the Teradata RDBMS Environment 2
Introduction 2
This topic discusses the Teradata RDBMS component that handles all data communications management: the Teradata Director Program (TDP).
The TDP 2
SQL requests from a client-based user, whether made as an interactive query or from an application program, are transmitted in the form of CLI packet messages, as are the responses to the query.
These transmissions are managed by a data communications manager.
In the Teradata RDBMS, the data communications manager is called the Teradata Director Program, or TDP.
The TDP does all of the following:
- Establishes and manages session control
- Routes requests
- Routes logons
- Verifies users
- Initiates recovery and restart processing
- Monitors and controls security
The Teradata RDBMS also provides facilities to enable the TDP to communicate with client application services.
The Micro TDP 2
Workstation clients run a version of the TDP called the Micro TDP (MTDP) and an additional component called the Micro Operating System Interface (MOSI), which contains libraries of procedures to handle operating system-dependent and communications protocol-dependent services.
The MTDP calls MOSI routines for system services like:
- Interrupt processing
- I/O processing
- Network connection and processing
Teradata RDBMS Architecture Application Programming Facilities
Application Programming Facilities 2
This topic discusses the application programming facilities provided by the Teradata RDBMS software.
This software falls into several broad categories:
- Embedded SQL
- Call Level Interface
- ODBC
Because SQL is the only language the Teradata RDBMS understands, all application programming facilities ultimately make their queries against the database using the SQL language.
Embedded SQL 2
The Teradata RDBMS provides a preprocessing facility that enables you to include ANSI-compliant SQL statements in your application programs.
The SQL preprocessor parses your application code for SQL statements, converts them to CLI calls, and then comments out the SQL statements. After the application code has been preprocessed by the Teradata RDBMS Preprocessor2, you can submit it to your client application language compiler.
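The preprocessing pass described above can be sketched as a text transformation. This toy version assumes embedded statements are introduced by EXEC SQL and end at the next semicolon, and it replaces each one with a call to cli_request, a hypothetical placeholder rather than a real CLI routine. It is not how Preprocessor2 is implemented and ignores cases such as semicolons inside string constants or host variables.

```python
import re

# Matches an embedded statement from EXEC SQL up to its terminating semicolon.
EXEC_SQL = re.compile(r"EXEC\s+SQL\s+(.*?);", re.DOTALL | re.IGNORECASE)


def preprocess(source):
    """Comment out each embedded SQL statement and add a call in its place."""
    def rewrite(match):
        sql = " ".join(match.group(1).split())  # collapse whitespace
        escaped = sql.replace("\\", "\\\\").replace('"', '\\"')
        # cli_request is a hypothetical placeholder, not a real CLI routine.
        return f'/* {match.group(0)} */ cli_request("{escaped}");'
    return EXEC_SQL.sub(rewrite, source)


program = """int main(void) {
    EXEC SQL SELECT * FROM Table_01 WHERE AcctNo = 129317;
    return 0;
}"""
print(preprocess(program))
```

The output is ordinary C source: the original SQL survives only as a comment, so the client language compiler never sees it.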
Preprocessor2 supports the following client programming languages.
This programming language . . . | Is supported on this platform . . .
PL/I | IBM mainframe clients
C | IBM mainframe clients; UNIX clients
Call-Level Interface 2
The Call-Level Interface (CLI) is an application programming interface that provides facilities that enable any client application programming language that supports a CALL statement to query the Teradata RDBMS.
The CLI is also supported directly on NCR servers running the Teradata RDBMS.
A Windows®-based version of CLI, called WinCLI, is also available.
ODBC 2
Open Database Connectivity (ODBC) is an industry standard application programming interface you can use with Microsoft Windows®, Windows® NT, and Windows 95 to make SQL queries against a Teradata RDBMS database.
The ODBC Driver for Teradata RDBMS provides Core-level SQL and Extension-level 1 (with some Extension-level 2) function call capability using the Windows® Sockets (WinSock) TCP/IP communications software interface.
An additional software package, the Database Query Manager, permits the Teradata RDBMS to manage requests from applications running under Windows®, Windows® NT, and Windows® 95 using ODBC.
ODBC operates independently of CLI and WinCLI.
Teradata RDBMS Architecture Archiving and Data Loading Utilities
Archiving and Data Loading Utilities 2
Introduction 2
The Teradata RDBMS provides several utilities for archiving and restoring the database and for data loading. Data loading utilities are typically used in a decision support environment where the client machine gathers data during the day and dumps it to the server overnight. This topic briefly describes these utilities.
Archive and Restore Utility and ASF2 2
The Archive and Restore utility and the Archive Storage Facility (ASF2) support archiving of databases, individual tables, or permanent journals to any of the following media:
- 3500/4500/5100 tape (ASF2 only)
- Client tape
- Client file
The utility also restores databases from those archival media to the Teradata RDBMS.
Archive and Restore is supported in the MVS and VM environments only.
BulkLoad 2
The BulkLoad utility permits batch insert, update, and delete operations on an existing database. The program moves large quantities of data from a client to the Teradata RDBMS on the server.
BulkLoad is supported in the MVS and VM environments only.
FastLoad 2
The FastLoad utility permits you to load unpopulated tables only. The program is similar to BulkLoad except that it runs much faster and does not support update and delete operations.
FastLoad is supported in both the client and server environments.
MultiLoad 2
The MultiLoad utility loads large quantities of data into unpopulated tables. MultiLoad also supports bulk inserts, updates, and deletions against populated tables.
MultiLoad is supported in both the client and server environments.
FastExport 2
The FastExport utility exports large quantities of data from the Teradata RDBMS to a client and is the functional complement of the FastLoad and MultiLoad utilities.
Teradata RDBMS Architecture Administrative Workstation
Administrative Workstation 2
The Administrative Workstation (AWS) performs many of the functions of a system console for multinode Teradata RDBMS systems.
Single node systems do not have an AWS.
The AWS is an intelligent workstation attached to an SMP node. Its primary roles are to:
- Monitor system performance
- Provide an input mechanism for the system administrator
Teradata RDBMS Architecture Database Window
Database Window 2
Introduction 2
The console software for the Teradata RDBMS for UNIX is called the Database Window (DBW). It runs in the following X windows environments:
- System console
- Administrative workstation (AWS)
- Remote workstation or PC
The Database Window provides an interface to all the following windows:
Supervisor Database Message Application Windows (including any currently active support
utilities)
Workstation Types and Available Platforms
Some of the workstation types are available only on specific platforms.
The following table shows which workstations are appropriate for the different platforms and how they are connected to the node.
Type of Workstation              Platform  Description
System console                   SMP       Connected directly to the SMP node.
Administrative workstation       MPP       LAN-connected through an Ethernet card on the node. The AWS provides a single operational view of the multiple-node system.
Remote connection through LAN:   Both      LAN-connected through an Ethernet card on the node.
  UNIX workstation
  PC with X Windows server
Database Window Communication
The DBW communicates with the Teradata RDBMS through the console subsystem (CNS), which is part of the PDE. Because the DBW is managed by the CNS, you will occasionally see CNS messages in the DBW.
Functions Provided by the Database Window
The system console provides all of the following functions:
- Displays system status
- Displays the current system configuration
- Displays performance statistics
- Controls various AMP utilities
Supervisor Subwindow
The DBW has a main window and several subwindows. The principal subwindow, called the Supervisor Subwindow, permits an operator to run utilities and enter various commands.
Utilities Available from the Supervisor Subwindow
Many utilities used to control, monitor, and configure the RDBMS are available from the Supervisor Subwindow. A partial list of the utilities invoked from the DBW is provided in Chapter 12, “System Administration,” in the section “System Utility Software.”
Supervisor Commands Available from the Database Window
The following table lists the commands available from the Supervisor Subwindow of the Database Window.
Command             Function
CNSSET DBWTIMEOUT   Sets how often the CNS checks the connection between the CNS and the DBW.
CNSSET LINES        Sets the number of lines that are saved and available to you in the output display area after a reconnect to the CNS.
CNSSET STATEPOLL    Sets how often the CNS checks the RDBMS state and substate.
CNSSET TIMEOUT      Sets the interval between the time you type a request and the time the DBW rejects it because a program did not solicit the input.
DISABLE LOGONS      Prevents new sessions from logging on.
ENABLE LOGONS       Restores the ability of new sessions to log on.
GET CONFIG          Displays the current system configuration.
GET LOGTABLE        Displays the status of logging to the specified resource usage tables.
GET RESOURCE        Displays the resource collection and logging rates, and the memory clearing rate of a vproc or node.
GET TIME            Displays the current date and time.
GET VERSION         Displays the PDE and RDBMS version numbers.
LOG                 Logs the specified text into the errorlog.
QUERY STATE         Displays the current state of the RDBMS.
RESTART TPA         Restarts the RDBMS.
SET LOGTABLE        Enables or disables logging to the specified resource usage tables.
SET RESOURCE        Sets the resource collection and logging rates, and the memory buffer clearing rate of a vproc or node.
START               Starts an RDBMS utility in a DBW application subwindow.
STOP                Stops an RDBMS utility in a DBW application subwindow.
RDBMS Gateway
The RDBMS Gateway maps the external network protocols onto the internal database message protocols. It is a server program that provides a pathway for applications running on a network-connected client to access the Teradata server.
The RDBMS Gateway also permits clients running locally to communicate with the Teradata RDBMS.
There is one RDBMS Gateway per machine, controlling up to 600 sessions per node.
Database Utility Software
Database utilities are used to perform maintenance functions on the Teradata RDBMS.
They are invoked from the Database Window, with the following exceptions:
Utility Name  Runs under . . .
DIP           Control of BTEQ as well as the Database Window.
XPT           UNIX as an application.
xperfstate    UNIX as an application.
The system utilities include:
Utility Name  Function
Config        Specifies logical database configuration (AMPs and PEs).
XCTL          Displays and modifies the fields of the Control Parameters Globally Distributed Objects (GDO) of the Parallel Database Extensions (PDE) software. Accessed from an xterm window.
DBSControl    Specifies global runtime flags for database software.
VprocManager  Provides status for vprocs and permits manipulation of their attributes.
GtwGlobal     Manages LAN connections.
Ferret        Displays and sets various disk space utilization attributes without destroying the data for which the File System is responsible. For new attributes, Ferret reconfigures the stored data dynamically to match them. Utilities running under Ferret include Scandisk, Showspace, and Packdisk.
Filer         Displays information used to correct problems within the File System.
pdeconfig     Allocates PE and AMP vprocs to physical resources, including all of the following: configuring disk arrays, assigning logical units (LUNs) to the disks, allocating disks to AMPs, and allocating LANs and channels to PEs. Always run xmppconfig before running pdeconfig.
QryConfig     Displays the current database software logical configuration.
QrySessn      Displays session status information.
RcvManager    Displays recovery status.
Rebuild       Reconstructs tables from fallback copies (only works when fallback is used).
Reconfig      Redistributes disk data automatically whenever AMP vprocs are added or removed.
Showlocks     Displays host utility (HUT) locks on databases and tables.
SysInit       Initializes the Teradata system tables and all user tables.
xmppconfig    Sets up and updates configurations. Use this utility to specify the physical configuration before running pdeconfig. Must be run prior to pdeconfig for MPP systems.
DIP           Executes one or more of the standard DIP (Database Initialization Program) SQL scripts packaged with the RDBMS.
XPT           Installs multiple copies of the same software across all nodes of an MPP system.
xperfstate    Provides real-time display of PDE system performance, including system-wide CPU utilization and disk utilization.
Teradata Manager
Introduction
Teradata Manager is a PC-based package that provides easy access to resource usage information in the Teradata Data Dictionary. The PC supporting Teradata Manager must be running the Windows NT operating system.
Performance Analysis
The Teradata Manager Performance Monitor uses two commands to monitor the performance of the Teradata RDBMS:
- MONITOR CONFIG
- MONITOR SUMMARY
You can specify data sampling rates and durations, and Teradata Manager collects and analyzes the data for you. Results of data analyses can be displayed in a text window.
The Locking Logger feature permits you to determine whether an application mix is causing delays because of database lock contention.
Session Information
- Setting session rates
- Monitoring sessions
- Identifying sessions
- Aborting sessions
Statistical Information
Teradata Manager provides facilities for:
- Detecting which tables have statistics
- Creating statistics for columns and indexes
- Dropping statistics by table or column/index
- Refreshing statistics for:
  - Entire Teradata RDBMS
  - Database
  - Table
  - Column/Index
For More Information
For more information on the topics presented in this chapter, see the following Teradata RDBMS manuals.
IF you want to learn more about . . .     THEN see this manual . . .
System process flows                      Teradata RDBMS for UNIX Database Design and Administration
Teradata SQL                              Teradata RDBMS for UNIX SQL Reference
General Teradata software architecture
The TDP                                   Teradata TDP Reference
Preprocessor2                             Teradata Application Programming Using Embedded SQL
Embedded SQL                              Teradata RDBMS for UNIX SQL Reference; Teradata Application Programming Using Embedded SQL for C, COBOL, and PL/I
Teradata Manager                          Teradata Manager Reference Guide
ODBC                                      Teradata ODBC Driver for Windows Installation and User’s Guide
Chapter 3
The Relational Model
About This Chapter
Introduction
This chapter reviews the relational model for database management. It also covers related topics such as normalization, referential integrity, and macros.
The relational model for database management is based on concepts derived from the mathematical theory of sets. This chapter touches on the relational model from that viewpoint to establish its solid foundation in mathematics. By way of comparison, database management products based on the hierarchical, network, and object-oriented architectures are not based on rigorous theoretical foundations, so their behavior is not as predictable as that of relational products.
Database management systems based on the hierarchical, network, and object-oriented models use different languages to define and manipulate the database, and none provides the capability for making ad hoc queries.
The chapter describes the process of further normalization of a database, then describes macros in the Teradata environment.
What is a Relational Database?
A relational database is a database that its users perceive as a collection of tables and nothing but tables. This deceptively simple concept permits information to be created and maintained without anomalies and gives users a simple presentation of data that can, in turn, be manipulated with ease.
The freedom from anomalies comes from the fact that relational databases are based on the mathematics of set theory. Roughly speaking, set theory defines a table as a relation. Each row in a relation is called a tuple and each column is an attribute. The number of tuples is the cardinality of the relation and the number of attributes is its degree.
The following table presents these correspondences. Note that relational databases are a generalization of the mathematics of set theory relations and the correspondences between set theory and relational databases are not always direct.
Set theory term   Relational database term
Relation          Table
Tuple             Row
Attribute         Column
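As a rough illustration of the set theory vocabulary, the following Python sketch models a relation as a set of tuples over named attributes. The attribute names and sample rows are invented for the example, and the sketch says nothing about how the Teradata RDBMS actually stores tables.

```python
# A relation modeled as a heading (attribute names) plus a set of tuples.
# The attribute names and rows are hypothetical sample data.
attributes = ("emp_no", "name", "dept_no")

employee = {
    (1001, "Smith", 10),
    (1002, "Jones", 20),
    (1003, "Brown", 10),
}

# Because the body is a set, adding a duplicate tuple has no effect.
employee.add((1001, "Smith", 10))

cardinality = len(employee)   # number of tuples (rows)
degree = len(attributes)      # number of attributes (columns)

print(cardinality, degree)
```

Using a set for the body reflects the rule that a relation, in the mathematical sense, contains no duplicate tuples.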
Because the mathematical operations on relations are well-defined, any manipulation of a table in a relational database has a consistent, predictable outcome. This contrasts with all other database management systems, none of which is based on mathematical theory and none of which treats its data formally. Because the operations on relational databases are so well defined, users can perform ad hoc, interactive queries of the database. Other database management systems, by contrast, require a system programmer to predefine all links between files and all possible queries of the database.
Under the covers, the SQL optimizer uses relational algebra to build the optimal access path to the requested data. When the definition of the database changes, as it can from time to time, the optimizer readily adapts and reoptimizes access paths without programmer intervention.
Some Other Definitions
The following terms are defined now to make the discussion that follows easier to understand.
Term                        Definition
Primary key                 A unique identifier for a relation. In set theory (and in relational database theory), duplicate rows are not allowed. However, commercially available relational databases often allow duplicate rows in relations. In those cases, the relation does not have a primary key. Relations with a primary (or candidate) key do not permit duplicate rows. The Teradata RDBMS permits enforcement of the no duplicates rule even when no primary key is specified.
Candidate key               Any relation might have multiple unique identifiers. Each such unique identifier is called a candidate key. A candidate key must satisfy the properties of uniqueness and minimality. That is, no two rows of the table have the same value for the key and, if the key is composite, no component can be eliminated without destroying the uniqueness property.
Alternate key               Any candidate key not chosen as the primary key.
Foreign key                 A column (or group of columns) in the current relation whose values match the primary key of another relation. Foreign keys are used to join tables and may participate in the primary key.
Functional dependence       Attribute X is functionally dependent on attribute Y if and only if each Y value in the relation has associated with it exactly one X value.
Full functional dependence  Attribute X is fully functionally dependent on attribute Y if and only if it is functionally dependent on Y and not functionally dependent on any proper subset of Y.
Transitive dependence       A state in which an attribute is functionally dependent on another attribute, but only by means of an intermediate attribute. Transitive dependence is a state that normalization seeks to eliminate.
Determinant                 Any attribute on which some other attribute is fully functionally dependent.
Multivalued dependence      Given a relation with attributes X, Y, and Z, the multivalued dependence of Y on X holds if and only if the set of Y-values matching a given (X-value, Z-value) pair depends only on the X-value and is independent of the Z-value.
Join                        An operation in which data is retrieved from more than one table.
Join dependency             A relation satisfies a join dependency if and only if it is equal to the join of its projections on its component attributes.
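The definitions of functional dependence and candidate key can be made concrete with a small Python sketch. The function names and the shipment rows below are hypothetical. Note also that the sketch only tests the rows it is given, whereas a true dependency is a rule about every valid state of the relation.

```python
from itertools import combinations

def functionally_determines(rows, lhs, rhs):
    """Return True if each combination of lhs values maps to exactly one rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def is_unique(rows, attrs):
    """Return True if no two rows share the same values for attrs."""
    keys = [tuple(row[a] for a in attrs) for row in rows]
    return len(keys) == len(set(keys))

def is_candidate_key(rows, attrs):
    """Uniqueness plus minimality: no proper subset of attrs is itself unique."""
    if not is_unique(rows, attrs):
        return False
    for size in range(1, len(attrs)):
        for subset in combinations(attrs, size):
            if is_unique(rows, subset):
                return False
    return True

# Hypothetical sample rows: which supplier ships which part, and in what quantity.
shipments = [
    {"supplier": "S1", "part": "P1", "qty": 300, "city": "London"},
    {"supplier": "S1", "part": "P2", "qty": 200, "city": "London"},
    {"supplier": "S2", "part": "P1", "qty": 100, "city": "Paris"},
]

print(functionally_determines(shipments, ["supplier"], ["city"]))  # city depends on supplier
print(is_candidate_key(shipments, ("supplier", "part")))           # unique and minimal
```

In this sample, the pair (supplier, part) is a candidate key, while adding city to it breaks minimality because a component can be removed without losing uniqueness.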
Normalization
Introduction
The theory of normalization is at the root of the relational model of database management. Normalization theory is constructed around the concept of normal forms. These normal forms define a system of constraints. If a relation meets the constraints of a particular normal form, then it is said to be in that form.
You can think of the normal forms as an onion, with the outermost layer being the set of all relations, including unnormalized relations. The figure that follows illustrates this. As you work your way to the core of the onion, you must pass through each lower normal form. As a result, a relation that has achieved fifth normal form has also achieved first, second, third, and fourth normal forms.
Figure 3-1. Layers of normalization. From the outermost layer inward, the labeled layers are: all relations, 1NF relations, 2NF relations, 3NF relations, BCNF relations, and 5NF relations.
By definition, a relational database is always normalized because its field values are always atomic. But to simply leave it at that invites a number of problems with redundancy and potential update anomalies, and that is why the higher normal forms were developed. The next topics describe normal forms and how to achieve them.
First, Second, and Third Normal Forms
Introduction
This topic describes the first three normal forms: what they are, why they are needed, and how to achieve them.
The first three normal forms are stepping stones to Boyce-Codd normal form and, when appropriate, the higher normal forms.
The next topic describes Boyce-Codd (BCNF) and higher normal forms.
First Normal Form
First normal form (abbreviated 1NF) is definitive for a relational database. All relations in a relational database must be in first normal form by definition.
A relation is said to be in first normal form if all its fields (simple domains in mathematics) are atomic. This means that a field can contain one value and one value only. No hierarchies of data values are allowed. This concept is sometimes referred to as the elimination of repeating groups from a relation.
The formal definition is as follows: For a relation to be in first normal form, the relationship between the primary key of the relation and each of the other attributes must be one-to-one (in that direction). In other words, all underlying simple domains of the relation contain atomic values only.
The nonkey attributes are said to be functionally dependent on the key.
Note: a nonkey attribute is any attribute that is not part of the primary key for the relation.
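The elimination of repeating groups can be sketched in a few lines of Python. The employee records and the function name are hypothetical, chosen only to show a field that holds several values being flattened into atomic fields.

```python
# Hypothetical unnormalized records: "phones" holds a repeating group.
unnormalized = [
    {"emp_no": 1001, "name": "Smith", "phones": ["555-0100", "555-0101"]},
    {"emp_no": 1002, "name": "Jones", "phones": ["555-0199"]},
]

def to_first_normal_form(records):
    """Emit one row per (employee, phone) so that every field is atomic."""
    rows = []
    for rec in records:
        for phone in rec["phones"]:
            rows.append({"emp_no": rec["emp_no"], "name": rec["name"], "phone": phone})
    return rows

rows_1nf = to_first_normal_form(unnormalized)
for row in rows_1nf:
    print(row)
```

The flattened rows are identified by the composite key (emp_no, phone), and name now repeats for every phone number. That redundancy is the kind of problem the higher normal forms address.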
Second Normal Form
Second normal form (abbreviated 2NF) deals with the elimination of partial dependencies from a relation, that is, nonkey attributes that depend on only part of a composite primary key.
A relation is said to be in second normal form if it is in 1NF and every nonkey attribute is fully dependent on the entire primary key.
The formal definition is as follows: For a relation to be in second normal form, the relationship between any portion of the primary key of a relation and each of the other columns must not be one-to-one (in that direction). In other words, the nonkey columns are fully functionally dependent on the key.
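A minimal Python sketch of the second normal form test follows. It assumes the functional dependencies of the relation have already been written down by the designer; the order-line relation, its dependencies, and the function name are all hypothetical.

```python
def partial_dependencies(primary_key, fds):
    """List nonkey attributes determined by a proper subset of the primary key.

    fds maps a determinant (frozenset of attributes) to the set of
    attributes it determines.
    """
    key = frozenset(primary_key)
    found = []
    for determinant, dependents in fds.items():
        if determinant < key:  # proper subset of the key
            for attr in sorted(dependents - key):
                found.append((tuple(sorted(determinant)), attr))
    return found

# Hypothetical order-line relation with primary key (order_no, part_no).
primary_key = ("order_no", "part_no")
fds = {
    frozenset({"order_no", "part_no"}): {"qty"},
    frozenset({"part_no"}): {"part_desc"},   # depends on part of the key only
}

violations = partial_dependencies(primary_key, fds)
print(violations)   # each entry is (determinant, dependent attribute)
```

A relation with no such entries satisfies the 2NF condition. In this sample, moving part_desc into a separate relation keyed by part_no would remove the violation.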
Third Normal Form
Third normal form (abbreviated 3NF) deals with the elimination of nonkey attributes that do not describe the primary key.
The formal definition is as follows: For a relation to be in third normal form, the relationship between any two nonprimary key columns or groups of columns in a relation must not be one-to-one in either direction. In other words, the nonkey columns are nontransitively dependent upon each other and the key. No transitive dependencies implies no mutual dependencies.
Attributes are said to be mutually independent if none of them is functionally dependent on any combination of the others. This mutual independence ensures that individual attributes can be updated without any danger of affecting any other attribute in a row.
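The effect of removing a transitive dependency can be illustrated with projections. In the hypothetical rows below, emp_no determines dept_no and dept_no determines dept_name, so dept_name depends on the key only by way of dept_no. The sketch splits the relation into two projections and shows that joining them again loses nothing.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    seen = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in seen:
            seen.append(projected)
    return seen

# Hypothetical employee rows with a transitive dependency on dept_name.
employees = [
    {"emp_no": 1001, "dept_no": 10, "dept_name": "Sales"},
    {"emp_no": 1002, "dept_no": 20, "dept_name": "Support"},
    {"emp_no": 1003, "dept_no": 10, "dept_name": "Sales"},
]

emp = project(employees, ["emp_no", "dept_no"])
dept = project(employees, ["dept_no", "dept_name"])

# Rejoining the two projections on dept_no reproduces the original rows.
rejoined = [
    {**e, **d} for e in emp for d in dept if e["dept_no"] == d["dept_no"]
]

print(dept)       # each department name is now stored once
print(rejoined)
```

After the split, a department name can be updated in one place without touching any employee row.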
Boyce-Codd and Higher Normal Forms
Introduction
When the relational model of database management was originally proposed, it only addressed the first three normal forms. Later work with the model showed that 3NF required further refinement to ensure that update anomalies would never occur.
This topic describes Boyce-Codd normal form and briefly mentions fourth and fifth normal forms for completeness.
Boyce-Codd Normal Form
Third normal form does not handle situations in which a relation has multiple composite candidate keys with overlapping attributes. To eliminate these problems, Codd developed the so-called Boyce-Codd normal form (BCNF), which reduces to 3NF whenever the special situation that defines this problem does not apply.
A relation is in BCNF if and only if every determinant is a candidate key. This means that only candidate keys can be determinants.
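The BCNF rule can be checked mechanically once the functional dependencies are listed. The Python sketch below uses the common attribute closure technique and the equivalent superkey form of the test (every nontrivial dependency must have a determinant whose closure is the whole relation). The student, subject, and teacher relation is a hypothetical textbook-style example with two overlapping composite candidate keys.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """Return the nontrivial dependencies whose determinant is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        nontrivial = not set(rhs) <= set(lhs)
        if nontrivial and closure(lhs, fds) != set(all_attrs):
            violations.append((lhs, rhs))
    return violations

# Hypothetical relation: each student has one teacher per subject,
# and each teacher teaches exactly one subject.
attrs = ("student", "subject", "teacher")
fds = [
    (("student", "subject"), ("teacher",)),
    (("teacher",), ("subject",)),
]

print(bcnf_violations(attrs, fds))
```

Here teacher is a determinant but not a candidate key, so the relation is not in BCNF even though (student, subject) and (student, teacher) are both candidate keys.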
Fourth Normal Form
A relation is said to be in fourth normal form (4NF) if and only if whenever there is a multivalued dependency in the relation (for example, say X multiply determines Y) then all attributes of the relation are also functionally dependent on X.
In practice, the need for 4NF is rarely seen.
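For readers who want to see a multivalued dependency in data, the following Python sketch applies the definition directly: the Y-values seen with each (X, Z) pair must equal the Y-values seen with X alone. The course, teacher, and text rows are hypothetical, and the check applies only to the rows supplied.

```python
from collections import defaultdict

def multivalued_dependency_holds(rows, x, y, z):
    """Check X ->> Y on a set of rows with attributes x, y, and z."""
    y_by_x = defaultdict(set)
    y_by_xz = defaultdict(set)
    for row in rows:
        y_by_x[row[x]].add(row[y])
        y_by_xz[(row[x], row[z])].add(row[y])
    return all(ys == y_by_x[xv] for (xv, _), ys in y_by_xz.items())

# Hypothetical rows: every teacher of a course uses every text of that course.
course_rows = [
    {"course": "Physics", "teacher": "Green", "text": "Mechanics"},
    {"course": "Physics", "teacher": "Green", "text": "Optics"},
    {"course": "Physics", "teacher": "Brown", "text": "Mechanics"},
    {"course": "Physics", "teacher": "Brown", "text": "Optics"},
]

print(multivalued_dependency_holds(course_rows, "course", "teacher", "text"))
print(multivalued_dependency_holds(course_rows[:-1], "course", "teacher", "text"))
```

In the full sample, course multiply determines teacher, yet course does not functionally determine the other attributes, so the relation is not in 4NF. Splitting it into (course, teacher) and (course, text) removes the redundancy.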
Fifth Normal Form
So far it has been possible to normalize a relation by decomposing it into two of its projections. On rare occasions, a decomposition into two projections is not sufficient to normalize a relation. In these rare instances, fifth normal form (5NF) is used to decompose the relation into three or more projections of the original relation.
A relation is said to be in fifth normal form (5NF, sometimes called projection-join normal form, or PJ/NF) if and only if every join dependency in the relation is a consequence of the candidate keys of the relation.
This makes 5NF the final possible normal form to be achieved by taking projections and using joins. It is guaranteed to be free of all anomalies that can be removed by taking projections, but not necessarily of all possible anomalies.
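A join dependency that needs three projections can be demonstrated with a small Python sketch. The supplier, part, and project tuples are a hypothetical sample chosen so that joining any two of the projections used here produces a spurious tuple, while joining all three reproduces the original relation.

```python
def project(rel, idx):
    """Project a set of tuples onto the given column positions."""
    return {tuple(t[i] for i in idx) for t in rel}

# Hypothetical relation of (supplier, part, project) tuples.
spj = {
    ("S1", "P1", "J2"),
    ("S1", "P2", "J1"),
    ("S2", "P1", "J1"),
    ("S1", "P1", "J1"),
}

sp = project(spj, (0, 1))   # supplier, part
pj = project(spj, (1, 2))   # part, project
sj = project(spj, (0, 2))   # supplier, project

# Join of only two projections: contains a tuple that was never in spj.
sp_pj = {(s, p, j) for (s, p) in sp for (p2, j) in pj if p == p2}

# Joining the third projection as well removes the spurious tuple.
three_way = {t for t in sp_pj if (t[0], t[2]) in sj}

print(sorted(sp_pj - spj))
print(three_way == spj)
```

The sample relation therefore satisfies a join dependency on its three binary projections, which is the situation the definition of 5NF describes.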
Referential Integrity
Introduction
Referential integrity (RI) is a key concept for the relational model.
RI is defined by the Referential Integrity Rule, which states that a relational database cannot contain any unmatched foreign key values.
Enforcing RI in the Teradata RDBMS
To implement RI in the Teradata RDBMS, you have three choices:
- Use the referential constraint checks supplied by the database software.
- Write your own, site-specific macros.
- Enforce constraints through application code.
Primary and Foreign Keys
For review, a primary (parent) key is the candidate key selected to identify each tuple in a relation uniquely.
A foreign key is a (possibly composite) attribute of one relation whose values are required to match those of the primary key of some other relation.
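The Referential Integrity Rule amounts to a simple check: no foreign key value may lack a matching primary key value. The Python sketch below shows that check in application code. The table contents and the function name are hypothetical, and the decision to skip null foreign keys is an assumption of the sketch, not a statement about how the Teradata RDBMS treats nulls.

```python
def unmatched_foreign_keys(child_rows, fk, parent_rows, pk):
    """Return child rows whose foreign key value has no match in the parent.

    Rows with a null (None) foreign key are skipped; that treatment is
    an assumption of this sketch.
    """
    parent_keys = {row[pk] for row in parent_rows}
    return [
        row for row in child_rows
        if row[fk] is not None and row[fk] not in parent_keys
    ]

# Hypothetical parent and child tables.
departments = [
    {"dept_no": 10, "dept_name": "Sales"},
    {"dept_no": 20, "dept_name": "Support"},
]
employees = [
    {"emp_no": 1001, "dept_no": 10},
    {"emp_no": 1002, "dept_no": 30},    # no department 30 exists
    {"emp_no": 1003, "dept_no": None},
]

orphans = unmatched_foreign_keys(employees, "dept_no", departments, "dept_no")
print(orphans)
```

A database that satisfies the rule would return an empty list for every foreign key relationship.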
Indexes
An index is a special
To maintain the quality of our information products, we need your comments on the accuracy, clarity, organization, and value of this book. Please complete the User Feedback Form and mail or e-mail the form to:
[email protected]
Information Engineering NCR Corporation 100 North Sepulveda Boulevard El Segundo, CA 90245-4361 U.S.A.
Copyright © 1998 By NCR Corporation Dayton, Ohio U.S.A. All Rights Reserved Printed in U.S.A.
About This Book
Note: The name of the Teradata Database System (DBS) has been changed to the Teradata Relational Database Management System (RDBMS) to more accurately reflect the true nature of the product. This change will take place over a period of time in documentation, product names, and screen displays. In the meantime, all occurrences of “Teradata Database System,” “Teradata DBS,” or “DBS” should be read as referring to the “Teradata Relational Database Management System.”
Purpose
This book provides an introduction to the Teradata RDBMS for UNIX.
Audience
This book is intended for anybody who uses the Teradata RDBMS for UNIX.
How This Book Is Organized
This book contains thirteen chapters, one appendix and a glossary:
Chapter 1, “Overview,” introduces the Teradata RDBMS, including its design philosophy and goals, its shared information architecture, and its scalability.
Chapter 2, “Teradata RDBMS Architecture,” introduces the hardware and software architecture that supports the Teradata RDBMS, including both client and server software. System 3500, System 4500, and System 5100 (WorldMark) hardware is described.
Chapter 3, “The Relational Model,” presents an overview of the relational model for database management, including an introduction to normalization and a brief discussion of Teradata RDBMS macros.
Chapter 4, “Data Definition,” describes the data definition capabilities of Teradata SQL, the Structured Query Language, including how to create, change, and delete databases, tables, indexes, and macros.
Chapter 5, “Data Manipulation,” describes the data manipulation capabilities of Teradata SQL, including the SELECT, INSERT, UPDATE, and DELETE statements.
Chapter 6, “Views,” introduces the concept of the view, emphasizing that views are virtual, not base tables. The chapter also describes why views are the recommended means by which to present base table information to end users.
Chapter 7, “Data Dictionary,” describes the Data Dictionary (DD), the system catalog for the Teradata RDBMS. The DD includes definitions for the database objects, user characteristics, and much more.
Chapter 8, “Application Development,” introduces application development in the Teradata RDBMS environment, including the use of embedded SQL and CLI calls in client programming languages.
Chapter 9, “Fault Tolerance,” describes fault tolerance in the Teradata RDBMS, including both hardware and software elements.
Chapter 10, “Concurrency Control and Recovery,” introduces the topic of concurrency control and transactions. Object locking, serializability of transactions, and the two-phase commit protocol for distributed databases are among the subjects described.
Chapter 11, “Security and Integrity,” discusses security and integrity in the Teradata RDBMS environment.
Chapter 12, “System Administration,” introduces system administration of the Teradata RDBMS. Topics include user and space allocation, accounting, monitoring, and server-resident utilities.
Chapter 13, “Operating and Configuration Specifications,” describes the capacities of and requirements for the Teradata RDBMS.
Appendix A, “How the Teradata RDBMS for UNIX Differs from the Teradata RDBMS for TOS,” describes the differences between Version 1 and Version 2 Teradata database management systems.
The “Glossary” defines frequently used terms in the Teradata RDBMS environment.
Prerequisites
You should be familiar with basic computer technology, NCR system hardware, the Teradata RDBMS, the system console environment, and X Windows.
It may be helpful to review the following books:
- Introduction to Teradata RDBMS for UNIX
- Teradata RDBMS for UNIX Support Utilities Reference
Changes to This Book
Changes made to the Introduction to the Teradata RDBMS for UNIX are focused on DR maintenance and include:
Join Index
DR 37060
Join Index represents a new type of indexing structure. For introductory information on Join Index see page 4-10 and page 4-17.
For general information on Join Index, see the Teradata RDBMS for UNIX V2R2.1 Base System Release Definition and Transmittal Document. For usage information see the section on Join Index in the Teradata RDBMS for UNIX Database Design and Administration Manual.
RFC to provide ESCON mainframe channel connectivity
DCR 7030
This DR addresses changes to the mainframe physical connection to the Teradata server. Pages in this document that are impacted: page 2-3, page 2-4, page 2-5 and page 2-11.
Hash Join
DR 39131
Hash Join is an alternative join scheme and is introduced on page 5-13.
Decimal 18 Default is Regression Problem
DR 39789
The increase of the maximum Decimal value for TotalDigits from 15 to 18 has caused regression problems for some customer applications and third-party vendor processes. This DR is addressed on page 4-2.
Minor wording changes include:
DR 38139
Throughout this reference, there is frequent mention of the DATE parameter in a 2-digit year format ‘YY/MM/DD’.
Teradata RDBMS V2R2.1 introduces a system-wide default, the CenturyBreak parameter, which the RDBMS software uses internally to convert 2-digit years (‘YY’) to the correct 4-digit years (‘XXYY’). The parameter is a new general field in the DBS control record.
For more information on the CenturyBreak parameter see Chapter 14, “Setting Up, Creating, and Modifying the Database Structure,” of the Teradata RDBMS for UNIX Database Design and Administration Manual.
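As a rough sketch of how a century break value can resolve 2-digit years, the Python function below assumes one common pivot rule: a year below the break is placed in the 2000s and any other year in the 1900s. This rule and the function name are assumptions made for illustration only; the manual cited above is the authority on the actual behavior of the parameter.

```python
def expand_two_digit_year(yy, century_break):
    """Expand a 2-digit year using a pivot value.

    Assumed rule for this sketch: yy below century_break maps to 20yy,
    otherwise to 19yy.
    """
    if not 0 <= yy <= 99:
        raise ValueError("yy must be between 0 and 99")
    return 2000 + yy if yy < century_break else 1900 + yy

# With a hypothetical break value of 30:
print(expand_two_digit_year(5, 30))    # falls below the break
print(expand_two_digit_year(98, 30))   # falls at or above the break
```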
List of Acronyms
The following acronyms, listed in alphabetical order, are used in this book:
1NF First Normal Form
2NF Second Normal Form
API Application Programming Interface
ASF2 Archive Storage Facility 2
AWS Administrative Workstation
CMS Conversational Monitor System
FIPS Federal Information Processing Standards
I/O Input/Output
MOSI Micro Operating System Interface
MPP Massively Parallel Processing
MVS Multiple Virtual Storage
NUPI Nonunique Primary Index
NUSI Nonunique Secondary Index
ODBC Open Database Connectivity
OS/VS Operating System/Virtual Storage
PDE Parallel Database Extensions
RI Referential Integrity
SMP Symmetric Multi-Processing
TDP Teradata Director Program
TOS Teradata Operating System
TPA Trusted Parallel Application
TSO Time Sharing Option
UPI Unique Primary Index
USI Unique Secondary Index
VM/SP Virtual Machine/System Product
Teradata RDBMS for UNIX Library
Titles of publications in the Teradata RDBMS for UNIX library begin with Teradata RDBMS for UNIX. The following publications, listed in alphabetical order, apply to Teradata RDBMS for UNIX, Version 2 Release 2.1, and will be available May 29, 1998:
BD10-5060-B Database Window Reference
BD10-5061-E Field Support Guide
BD10-4956-A Master Index, Bibliography, and Glossary
BD10-5062-D Messages Reference
BD10-5013-A Performance Monitor Reference
BD10-5064-C Resource Usage Macros and Tables
BD10-5052-B Security Administration Guide
B035-1507-048B SQL Quick Reference
BD10-5067-D Utilities Reference
B035-1902-048D Teradata RDBMS for UNIX V2R2.1 and Client 9801 User Documentation CD-ROM
Electronic Versions of Teradata Publications
To obtain the latest version of Teradata RDBMS for UNIX publications, please visit our Internet site at:
http://www.info.ncr.com
Client Reference Library
The following publications, listed in alphabetical order, apply to Teradata Client 9801 products:
Product ID Publication Title
BD10-4971-B Robotic Library Manager Installation and User Guide
B035-3032-097B Robotic Library Manager Reference Card
BD10-4952-C Teradata Application Programming With Embedded SQL for C, COBOL, and PL/I
BD10-5069-C Teradata Archive/Recovery Reference for Channel-Attached Systems
BD10-5087-B Teradata Archive Storage Facility 2 (ASF2) Administration and Operations
BD10-5086-B Teradata Archive Storage Facility 2 (ASF2) Command Language Reference Manual
BD10-5091-C Teradata BTEQ Reference
B035-2401-038A Teradata Client Command Summary
BD10-5084-C Teradata Client for MVS Installation Guide
BD10-5095-C Teradata Client for NCR UNIX MP-RAS Installation Guide
BD10-5085-B Teradata Client for VM Installation Guide
BD10-5024-B Teradata Data Definition Language Processor Reference
B035-3027-107A Teradata Database Query Manager (DBQM) Administrator’s Guide
B035-3029-107A Teradata Database Query Manager (DBQM) Programmer’s Guide
BD10-5094-B Teradata Enhanced Call-Level Interface Reference
BD10-5079-C Teradata FastExport Reference
BD10-4954-D Teradata FastLoad Reference
BD10-5075-A Teradata ITEQ User’s Guide for Channel-Attached Systems
BST0-2122-30 Teradata ITEQ Keypad Template
BST0-2122-34 Teradata ITEQ Keypad Template (3270 PC)
BST0-2126-20 Teradata ITEQ Reference
BD10-5076-C Teradata MultiLoad Reference
BST0-2141-00 Teradata ODBC Driver for Windows Installation and User’s Guide
B035-3021-018A Teradata Parallel Data Pump (TPump) Reference
BD10-5062-D Teradata RDBMS for UNIX Messages Reference
BD10-4966-C Teradata TDP Reference
BD10-5083-B Teradata TS/API Installation Guide
BD10-5082-B Teradata TS/API System & Database Administration Guide
BD10-5081-B Teradata TS/API User’s Guide
BD10-5090-A Teradata WinCLI Application Developer’s Guide
BD10-5093-A Teradata WinCLI Installation Guide
B035-1902-048D Teradata RDBMS for UNIX V2R2.1 and Client 9801 User Documentation CD-ROM
How to Order Teradata Publications
You may always order Teradata publications through your NCR Sales Representative, or you may use one of the methods listed below.
Order Form To order Teradata publications, use the Information Products Order Form (form number IPP-WD02001).
Ordering Address Send orders to the following address:
Non-U.S. Orders
NCR IPP-BRUSSELS-OTC
Rue de la Fusee 50
B-1130 Brussels
Belgium
FAX: 32-2-727-95-50
PHONE: 32-2-727-95-49 or 32-2-727-95-71
E-MAIL: [email protected]
Contents
Preface
Chapter 2
About This Chapter
  Introduction
  Hardware
  System Configuration
  Client Software
  Server Software
Chapter 3
About This Chapter
  Introduction
  What is a Relational Database?
  Some Other Definitions
Chapter 4
  Introduction
  Dropping a Table
  Dropping an Index
Chapter 5
The SELECT Statement
  Introduction
  Relational Algebra
  Teradata SQL Expressions
  Arithmetic Operators
  Aggregate Operators
  Comparison Operators
  Logical Operators
  Partial String Matching Operator
  Set Operators
  Other Operators
  Arithmetic Functions
Using Fully Qualified Names to Reference Databases and Tables in Teradata SQL
  Introduction
  Fully Qualified Names
  Select Specific Rows
  Specifying Order in the Results Table
  Defining Groups
  Including Information from More Than
Using Teradata SQL in Application Programs
  Introduction
  Embedded SQL and Client Programming Languages
  Cursors
Chapter 6
Restrictions on DML Operations on Views
  Introduction
  Views with Aggregates
  Views with Joins
For More Information
Chapter 8
Using Macros as SQL Applications
  Introduction
  Creating a Macro
  Using a Macro
  Modifying a Macro
  Deleting a Macro
Using the EXPLAIN Statement As a Tool To Optimize Your SQL Code
  Introduction
  Using EXPLAIN: First Example
  Using EXPLAIN: Second Example
  Introduction
  TS/API Products
  Compatible Third Party Software Products
Chapter 9
For More Information
Chapter 10
About This Chapter
  Introduction
  Concurrency Control
  Recovery
Chapter 11
About This Chapter
  Introduction
  Definition of Security
  Definition of Integrity
  Tools for Enforcing System Security
  Tools for Enforcing System Integrity
Chapter 13
About This Chapter
  Introduction
For More Information
Appendix A How the Teradata RDBMS for UNIX Differs from the Teradata RDBMS for TOS
About This Appendix
Teradata RDBMS for UNIX Differences
  Improved Performance and Added Features
  Increased Number of Hash Buckets
  Enhanced Row Evaluation
  File System Improvements
  Automatic Detection of Cylinder Fragmentation
Additional General Improvements
How the Teradata RDBMS for UNIX Differs
Glossary
List of Figures
Chapter 1
RDBMS
Chapter 3
Chapter 9
Fault Tolerance
Function of Time
Revision Record
Date Description
May 29, 1998 Supports Teradata RDBMS for UNIX V2R2.1.0
Chapter 1
Overview
Introduction
This chapter presents an overview of the Teradata Relational Database Management System (RDBMS), including perspectives on its design and brief reviews of the hardware and software systems that make up the Teradata RDBMS.
Design Perspectives
The topic on design perspectives for the Teradata RDBMS includes descriptions of the following:
Research ideas leading to the eventual design
Design philosophy and goals
Scalability
Shared information architecture
Teradata Database Software
The topic on Teradata software includes descriptions of the following:
The structured query language (SQL) and its uses for application programming and interactive database queries
The Teradata database management system
The Teradata file system and disk handling system
Client Software
The topic on client software includes descriptions of the following:
The request handler (Call Level Interface, or CLI)
The data communications component (Teradata Director Program, or TDP)
Application development services, including:
  A SQL preprocessor
  CLI
  Third party query front ends, gateways, and fourth generation languages
Data loading utilities
The archive/restore utility
Design Perspectives
Introduction
This topic describes the considerations that went into the design of the original Teradata Database System. The topic also explains the overall perspectives behind the system.
Charter for the Teradata Database System
The original charter for development of the Teradata RDBMS included the following goals:
Large capacity database system with thousands of MIPS capable of storing terabytes of data and billions of rows
Fault tolerance to ensure data integrity
Network connectivity
Manageable growth
Relational database management system
Faster than other relational systems
Common access language
Single data store for multiple clients in a client/server architecture
Research Ideas Leading to the Design of the Teradata Database System
The hardware component of the first generation Teradata RDBMS was a database machine. The current generation machine is a general purpose massively parallel machine running the Teradata RDBMS as a trusted parallel application (TPA). The earliest database machines were built from specialized hardware components. These machines were very expensive to implement and did not provide improved performance.
The concept behind the Teradata RDBMS was to build an inexpensive system, using mostly off-the-shelf hardware components, that would use relational database management to meet or exceed the performance of conventional database management systems.
The design is a parallel, distributed architecture in which the distributed functions communicate by means of a fast interconnect structure. In the current architecture, this proprietary interconnect structure is known as the BYNET (for MPP systems) or the Vnet (for SMP systems).
Shared Information Architecture
One of the principal goals for the design of the Teradata RDBMS was to provide a single data store for any number of client architectures. This Shared Information Architecture (SIA) eliminates the need for maintaining duplicate databases on multiple platforms. With the SIA, most mainframe clients, workstations, and personal
Figure 1-1 Teradata RDBMS Shared Information Architecture (figure not reproduced; surviving labels: "Teradata RDBMS single data store", "Unisys A-series")
Teradata Database Software
Introduction
The Teradata Database Software is the foundation for the relational database server. Its purpose is to support SQL manipulations of the database.
The server software includes the following components:
Channel communications support
LAN gateway communications support
SQL parser
Request dispatcher
Session control
Database manager
File manager
Structured Query Language (SQL)
The structured query language (SQL) is a data sublanguage designed specifically for manipulating data in relational databases. SQL is the only language the Teradata RDBMS understands, so all database manipulations, whether embedded in an application program or resulting from an interactive query, use SQL and SQL only.
Figure 1-2 shows the process flow of a SQL statement through the Teradata RDBMS on a channel-attached system.
Process flow in a network-attached system is somewhat different (the Micro Operating System Interface (MOSI) and the Micro Teradata Director Program (MTDP) take the place of the TDP), but the basic idea is very similar.
Figure 1-2 Process Flow of a SQL Statement Through the Teradata RDBMS (figure not reproduced; surviving labels: "SQL query", "Results table")
The following table describes the process flow illustrated in Figure 1-2.
Stage | Process
1 | A user generates an SQL query on the channel-attached client. The query can come from a BTEQ session at an interactive terminal, from a compatible fourth generation language, or from an application program coded in a host language.
2 | The request/results packaging component, CLI, packages the request and sends it to the TDP for routing to the server.
3 | The TDP establishes a session, then routes the request across the communications channel to the parsing engine (PE).
4 | The parser component of the PE opens the request package, parses the SQL code, interprets it, checks its syntax, evaluates its semantics, and optimizes the access plan.
  IF the SQL source code parses . . . | THEN the . . .
  without errors | parser decodes the request into a series of work steps and passes them to the dispatcher.
  with errors | dispatcher receives the appropriate error message and returns it to the requester. Processing terminates.
  The dispatcher sequences the steps and passes them on to the BYNET (or Vnet) with instructions about whether the steps are for one Access Module Process (AMP), an AMP group, or all AMPs.
5 | The BYNET (or Vnet on a single node system) distributes the execution steps to the appropriate AMP for processing.
6 | The AMPs process the execution steps by performing select, insert, delete, and update operations on the database. The AMPs perform these operations by making calls to the file system. The AMPs also perform other functions such as journaling, space accounting, and index maintenance.
7 | The file system performs primitive physical data block operations by locating the data blocks to be manipulated and then passing control to the disk subsystem.
8 | The disk subsystem retrieves the requested blocks for the file system.
9 | The disk manager returns the requested blocks to the file system.
10 | The file system returns the requested data to the database manager.
11 | The database manager sends a message back to the dispatcher stating that the data is ready to be returned to the requesting user, then sorts and transmits the data to the interface engine over the BYNET.
12 | The BYNET (or Vnet on a single node system) merges the sorted response and returns it to the requesting interface engine for packaging.
13 | The dispatcher builds the response message and routes it to the communications channel driver for return to the requesting client system.
14 | The TDP receives and unpacks the response messages and makes them available to CLI.
15 | CLI passes the received data back to the requesting application in blocks.
16 | The requesting application receives the response data in the form of a relational table.
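Stages 11 and 12 of the table above can be illustrated with a minimal Python sketch: each AMP sorts only its own rows, and the interconnect merges the already-sorted streams into one ordered answer set. The function name `merge_amp_responses` and the sample balances are hypothetical; this shows a generic k-way merge under those assumptions, not the actual BYNET implementation.

```python
import heapq

def merge_amp_responses(amp_rows, key=None):
    """Merge per-AMP row lists into one sorted answer set.

    Each AMP sorts its own rows (stage 11); the merge then combines
    the sorted streams without re-sorting everything (stage 12).
    """
    sorted_streams = [sorted(rows, key=key) for rows in amp_rows]
    return list(heapq.merge(*sorted_streams, key=key))

# Hypothetical account balances held by four AMPs (one AMP has no rows).
amp_rows = [[1500, 1020], [9900, 1200], [], [4300, 1001, 2750]]
merged = merge_amp_responses(amp_rows)
print(merged)
```

Because every input stream is already ordered, the merge only ever compares the current head of each stream, which is why sorting can be left to the AMPs working in parallel.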
For More Information
For more information on the topics presented in this chapter, see the following Teradata RDBMS manuals.
IF you want to learn more about . . . | THEN see this manual . . .
Structured Query Language | Teradata RDBMS for UNIX SQL Reference
Data flows through the Teradata RDBMS | Teradata RDBMS for UNIX Database Design and Administration
General aspects of the Teradata RDBMS | Teradata RDBMS for UNIX Database Design and Administration
Chapter 2
Teradata RDBMS Architecture
About This Chapter
The hardware that supports the Teradata software is based on off-the-shelf microprocessor technology combined with a proprietary communications network connecting the microprocessor elements.
The purpose of this chapter is to briefly describe these hardware components and the software architecture they support. Details are provided in the appropriate reference manuals.
Hardware
This manual documents the basic hardware configurations for both the SMP and MPP hardware platforms.
Unlike earlier database server technology supporting the Teradata database management system, these machines do not have specialized hardware processors.
Instead, they run virtual processors (vprocs). These vprocs provide the parallel environment that enables the Teradata RDBMS to run on SMP and MPP systems.
The components of the SMP and MPP machines are:
Component | Description | Function
Node | Basic hardware processing unit for the SMP and MPP machines. | Symmetric multiprocessing (SMP) hardware unit with database software, client interface software, the UNIX operating system, multiprocessor shared-memory processors, RAID disk arrays, and failsafe power provisions.
BYNET | Interprocessor network to link nodes. Note: single node configurations use the Vnet instead of the BYNET. | Connects processors by broadcast, multicast, or point-to-point communication, depending on the situation. SMP and single-node MPP systems use a software emulation of the BYNET called Vnet.
System Configuration
Base and range limits for the SMP systems are described in the following table.
Note: Specifications are subject to change.
System Component | Minimum | Maximum
EDAC Memory | 256 megabytes |
Serial (ESCON) and/or parallel (Bus & Tag) channel connection | 1 | 64
CD-ROM drive | 1 |
5100S SMP Nodes | 1 | 1
Pentium CPU | 4 | 32
EDAC Memory | 256 megabytes |
Serial (ESCON) and/or parallel (Bus & Tag) channel connection | 1 | 64
CD-ROM drive | 1 |
5100M Per Node
EDAC Memory | 256 megabytes |
Serial (ESCON) and/or parallel (Bus & Tag) channel connection | 1 | 64
CD-ROM drive | 1 |
Client Software
The SMP and MPP hardware supports the Teradata RDBMS running both with and without a channel- or network-attached client.
The following table describes the available client software, recognizing that the “client” may be the 3500/4100/4500/5100 machine itself. These products can also be used to access a Teradata RDBMS for TOS running on an NCR 3600 or DBC/1012 platform.
Contact your NCR representative for information on supported platforms for each product and for custom ports to other platforms.
Software | Description | Supported Access
 | | All channel- and network-attached clients
C Preprocessor | Permits embedding SQL in C programs. | All channel- and network-attached clients
COBOL Preprocessor | | Channel-attached clients
 | | Channel-attached clients
 | Can be embedded in application programs using function calls. | All channel- and network-attached clients
TDP | Data communication management. Handles sessions, logging, recovery, restarts, physical I/O from the PEs, and security. | Channel-attached clients
 | Handles logging, recovery, restarts, and physical I/O from the PEs. Session and security management are handled by the Gateway software on the server. | Network-attached clients
Archive/Restore | Archives data to tape; restores taped data to Teradata RDBMS. | Channel-attached clients
 | Archives data to tape; restores taped data to Teradata RDBMS. | SMP and MPP platforms
FastExport | Extracts large volumes of data from the Teradata RDBMS. | All channel- and network-attached clients
FastLoad | Performs high performance data loading into empty tables. | All channel- and network-attached clients
MultiLoad | Performs high performance data loading, including inserts, updates, and deletions, against up to 5 existing tables. | All channel- and network-attached clients
Server Software
The server software includes all the following:
The Database Window
The RDBMS Gateway
A SQL parser and syntaxer
A request dispatcher
A session controller
Facilities to control load balancing over the communications network
The Teradata database management software
The Teradata file system
Teradata Parallel Database Extensions (PDE)
The UNIX operating system
A server may also contain data loading utilities such as MultiLoad and FastLoad, data export utilities like FastExport, and the SQL data access utility BTEQ.
Virtual Processors
Introduction
The versatility of the Teradata RDBMS is based on virtual processors (vprocs), which eliminate dependency on specialized physical processors.
This is made possible by the Parallel Database Extensions (PDE) for UNIX. The PDE is an interface layer between the Teradata RDBMS and the standard UNIX operating system that runs on the NCR server.
A vproc is a collection of tasks running under the multitasking environment of the UNIX operating system. The tasks in a vproc share resources with other tasks in the same vproc. Multiple vprocs can run on an SMP platform or a node.
The vprocs and the tasks running under them communicate using unique-address messaging, as if they were physically isolated from each other. This message communication is done using the Vnet software on single node platforms and using the BYNET and BYNET Driver Software on multinode platforms.
There are two types of vprocs:
Type | Description
PE | Performs session control and dispatching tasks as well as parsing functions.
AMP | Manages the distribution and retrieval of data on the virtual disks (vdisks), which are defined at system configuration time with the pdeconfig utility.
Each type of vproc is described in the following passages.
PEs
Each Parsing Engine (PE) executes the database software that manages sessions and decomposes SQL into parallel steps.
The software, as shown in Figure 2-1, consists of the following elements:
Parser (including the Optimizer)
Dispatcher
Session Control
The Parser decomposes the SQL into relational data management processing steps. The steps are passed to the Dispatcher, which sends them to the appropriate AMPs.
Session Control provides user session management such as establishing and terminating sessions.
Figure 2-1 PE Software Components
AMPs
Each Access Module Process (AMP) executes the database software that performs relational functions and data management.
Each AMP, as shown in Figure 2-2, is assigned a portion of the database to control.
Each AMP provides the following functions:
Data access
Concurrency control
Journaling
Cache management
Recovery functions
Each AMP maintains its portion of the database tables stored on disks.
Figure 2-2 AMP Software Components
The Parsing Engine
Introduction
The Parsing Engine is the processor that communicates with the client system on one side and with the AMPs (via the BYNET or Vnet) on the other.
Each PE executes the database software that manages sessions, decomposes SQL statements into parallel steps, and returns the answer rows to the requesting client.
The major components of the PE are:
Session Control
SQL Parser
Dispatcher
Client Interface
The client interface provides handshaking across the communications channel between the server and its client or clients.
For a mainframe link, the connection is made by means of one of the following channel types, implemented by means of the Teradata Channel Interface (TCI) protocol handler:
Serial (ESCON) channel
Parallel (Bus & Tag) channel
In the case of a network link, the connection is by means of a LAN connection using either:
TCP/IP
ISO/OSI protocols
Session Control
Session numbers are assigned by the TDP and communicated to the server.
The PE establishes a session only if it can validate the username, password, and user type (application program, interactive BTEQ terminal, or third party software product). All subsequent traffic for the session is identified by its host id, session number, and request number.
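The identification scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the class names `RequestId` and `SessionTable`, the in-memory credential dictionary, and the plain-text password comparison are all hypothetical simplifications, user-type validation is omitted, and the session number is passed in by the caller (playing the role of the TDP).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RequestId:
    """Identifies one request by host id, session number, and request number."""
    host_id: int
    session_no: int
    request_no: int

class SessionTable:
    def __init__(self, credentials):
        self._credentials = credentials  # hypothetical {username: password}
        self._sessions = {}              # (host_id, session_no) -> username
        self._next_request = {}          # (host_id, session_no) -> next number

    def logon(self, host_id, session_no, username, password):
        """Establish a session only if the credentials validate."""
        if self._credentials.get(username) != password:
            return False
        self._sessions[(host_id, session_no)] = username
        self._next_request[(host_id, session_no)] = 1
        return True

    def new_request(self, host_id, session_no):
        """Tag the next request on an established session."""
        key = (host_id, session_no)
        if key not in self._sessions:
            raise KeyError("no such session")
        request_no = self._next_request[key]
        self._next_request[key] += 1
        return RequestId(host_id, session_no, request_no)

table = SessionTable({"user1": "secret"})
accepted = table.logon(host_id=1, session_no=1001, username="user1", password="secret")
rejected = table.logon(host_id=1, session_no=1002, username="user1", password="wrong")
first_request = table.new_request(1, 1001)
second_request = table.new_request(1, 1001)
```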
Input Data Conversion
The Teradata RDBMS is an ASCII machine. The parsing engine converts EBCDIC (and other non-ASCII) input to ASCII before processing it.
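As a rough illustration of this kind of conversion, the Python sketch below decodes EBCDIC bytes and re-encodes them as ASCII. The choice of code page `cp037` (a common EBCDIC variant available in Python's standard codecs) and the function name `ebcdic_to_ascii` are assumptions made for the example; the source does not say which code pages or conversion tables the parsing engine uses.

```python
def ebcdic_to_ascii(raw: bytes, codepage: str = "cp037") -> bytes:
    """Convert EBCDIC-encoded input to ASCII bytes.

    Raises UnicodeEncodeError if the input contains a character
    that has no ASCII equivalent.
    """
    text = raw.decode(codepage)
    return text.encode("ascii")

# Simulate a request arriving from an EBCDIC client.
ebcdic_request = "SELECT * FROM Table_01;".encode("cp037")
ascii_request = ebcdic_to_ascii(ebcdic_request)
print(ascii_request)
```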
SQL Parser
The SQL parser handles all incoming SQL requests. It processes these requests as follows.
Stage | Process
1 | The Parser looks in the Request cache to determine if the request is already there.
  IF the request is . . . | THEN the Parser . . .
  in the Request cache | generates AMP steps and passes them to the gncApply software.
  not in the Request cache | begins processing the request with the Syntaxer.
2 | The Syntaxer checks the syntax of an incoming request.
  IF there are . . . | THEN the Syntaxer . . .
  no errors | converts the request to a parse tree and passes it to the Resolver.
  errors | passes an error message back to the requestor.
3 | The Resolver adds information from the Data Dictionary cache to convert database, table, view, and macro names to numeric identifiers, then produces lists of objects and access rights. The output is a Resolver tree, which the Resolver passes to a security checking mechanism.
4 | The Security module checks access rights in the Data Dictionary.
  IF the access rights are . . . | THEN the Security module . . .
  valid | passes the request to the Optimizer.
  not valid | aborts the request.
5 | The Optimizer determines the most effective way to access the data needed by the request.
6 | The Optimizer scans the request to determine where locks should be placed, then passes the optimized parse tree to the Generator.
7 | The Generator transforms the optimized parse tree into plastic steps and passes them to the gncApply software. Plastic steps are directives to the database management system that do not contain data values.
8 | gncApply takes the plastic steps produced by the Generator and transforms them into concrete steps. Concrete steps are directives to the database management system that contain user- and session-specific information as well as data parcels.
9 | gncApply passes the concrete steps to the Dispatcher.
The Dispatcher
The Dispatcher controls the sequence in which steps are executed. It also passes the steps to the BYNET (or Vnet on single node systems) to be distributed to the AMP database management software.
Note that AMP steps can be sent in any one of the following ways:
Between one PE and one AMP using the hashing algorithm
Among a selected group of AMPs (referred to as a dynamic BYNET (or Vnet) group)
Among all AMPs in the system
Stage | Process
1 | The Dispatcher receives concrete steps from gncApply.
2 | The Dispatcher places the first step on the BYNET (or Vnet), tells the BYNET whether the step is for one AMP, several AMPs, or all AMPs, and waits for a completion response. Whenever possible, the Teradata RDBMS performs steps in parallel to enhance performance.
3 | The Dispatcher receives a completion response from one or several AMPs and places the next step on the BYNET. It continues to do this until all the AMP steps associated with a request are done.
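The parser stages described above (Request cache lookup, plastic steps, and gncApply) can be sketched in Python as follows. This is a simplified model under stated assumptions: `ParserSketch`, `gnc_apply`, the step strings, and the `:acct` parameter marker are hypothetical names chosen for illustration, and the Syntaxer, Resolver, security check, Optimizer, and Generator are collapsed into one stand-in method.

```python
class ParserSketch:
    def __init__(self):
        self.request_cache = {}  # request text -> plastic steps
        self.full_parses = 0     # counts how often the full pipeline ran

    def _generate_plastic_steps(self, request_text):
        # Stand-in for Syntaxer, Resolver, security check, Optimizer,
        # and Generator. Plastic steps carry no data values.
        self.full_parses += 1
        return [f"lock for: {request_text}", f"execute: {request_text}"]

    def plastic_steps(self, request_text):
        """Return cached plastic steps, parsing only on a cache miss."""
        if request_text not in self.request_cache:
            self.request_cache[request_text] = self._generate_plastic_steps(request_text)
        return self.request_cache[request_text]

def gnc_apply(plastic_steps, session_no, data_values):
    """Bind session information and data values to make concrete steps."""
    return [
        {"directive": step, "session": session_no, "data": dict(data_values)}
        for step in plastic_steps
    ]

parser = ParserSketch()
request = "SELECT * FROM Table_01 WHERE AcctNo = :acct"
first = gnc_apply(parser.plastic_steps(request), session_no=1001, data_values={"acct": 129317})
second = gnc_apply(parser.plastic_steps(request), session_no=1002, data_values={"acct": 555000})
print(parser.full_parses)
```

Keeping data values out of the plastic steps is what makes the cached form reusable: the second request above differs only in its data and session, so it skips the full parse.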
Dispatching the Steps
The Dispatcher controls the sequence in which steps are executed and passes the steps onto the Vnet (single node systems) or BYNET (multinode systems). Once the steps are handed over to the Vnet or BYNET, they are referred to as AMP steps. The Dispatcher tells the Vnet or BYNET whether an AMP step is for one AMP, a group of AMPs, or all AMPs.
When the Dispatcher receives a completion response from an AMP or AMPs, the Dispatcher sends the next step via the Vnet or BYNET until all of the AMP steps associated with a request are complete.
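The send-then-wait sequencing described above can be sketched in Python as follows. The names `dispatch`, `fake_interconnect`, and the step dictionaries are hypothetical, the interconnect is replaced by a function that always reports completion, and the sketch handles strictly sequential steps only (it does not model parallel steps).

```python
def dispatch(steps, send_step):
    """Send steps one at a time, waiting for completion before the next."""
    completed = []
    for step in steps:
        # send_step stands in for the Vnet or BYNET; it returns only
        # when the targeted AMP or AMPs report back.
        response = send_step(step)
        if response != "complete":
            raise RuntimeError(f"step {step['name']} failed: {response}")
        completed.append(step["name"])
    return completed

sent_log = []

def fake_interconnect(step):
    # Record which step went to which target: "one", "group", or "all".
    sent_log.append((step["name"], step["target"]))
    return "complete"

steps = [
    {"name": "lock", "target": "all"},
    {"name": "read", "target": "one"},
    {"name": "end-transaction", "target": "all"},
]
finished = dispatch(steps, fake_interconnect)
print(finished)
```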
The Vnet or BYNET software controls the transmission of messages to and from the AMPs. See Figure 2-3, where 12 rows of a table are distributed among disks attached to four AMPs.
If a request is for a single row, the PE transmits steps to a single AMP, as shown at PE 1 in Figure 2-3. If the request is for many rows (an all-AMP request), the PE causes the Vnet or BYNET to broadcast the steps to all AMPs, as shown at PE 2 in Figure 2-3. To minimize system overhead, the PE can also send a step to a subset of AMPs.
Figure 2-3 PE Routing of Teradata SQL Request Messages
As an example, consider the following two Teradata SQL statements against a table of checking account information:
1. SELECT * FROM Table_01 WHERE AcctNo = 129317 ;
2. SELECT * FROM Table_01 WHERE AcctBal > 1000 ;
In this example:
PEs 1 and 2 receive requests 1 and 2.
The data for account 129317 is contained in table row R9 stored on AMP 1.
Information about all account balances is distributed evenly among the disks of all four AMPs.
The PE 1 Parser determines that its request is a primary-index retrieval, which calls for access and return of one specific row.
The Dispatcher in PE 1 then issues a message to the Vnet or BYNET containing an appropriate read step and R9/AMP 1 routing information. Once the desired record is received from AMP 1, PE 1 transmits the data back to the TDP.
The PE 2 Parser determines that this is an all-AMPs request, then issues a message to the Vnet or BYNET containing the appropriate read step to be broadcast to all four AMPs.
Once results are received from the AMPs, PE 2 transmits the data back to the TDP.
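The routing decision for the two example requests can be sketched as follows. This is a toy model, not the actual Parser or Dispatcher logic: `amp_for_key` and `route_request` are hypothetical names, AMPs are numbered from 0, and `zlib.crc32` is only a stand-in for the Teradata hashing algorithm, which this book does not describe.

```python
import zlib

NUM_AMPS = 4  # the example in Figure 2-3 uses four AMPs


def amp_for_key(primary_index_value, num_amps=NUM_AMPS):
    """Map a primary-index value to one AMP with a toy hash.

    Assumption: zlib.crc32 is a placeholder, NOT the Teradata hash.
    """
    row_hash = zlib.crc32(str(primary_index_value).encode("utf-8"))
    return row_hash % num_amps


def route_request(where_column, operator, value, primary_index="AcctNo"):
    """Return the AMP numbers that should receive the read step."""
    if where_column == primary_index and operator == "=":
        # Primary-index retrieval: one specific row on one AMP.
        return [amp_for_key(value)]
    # Otherwise, in this toy model, every AMP must examine its own rows.
    return list(range(NUM_AMPS))


# Request 1: SELECT * FROM Table_01 WHERE AcctNo = 129317 ;
single = route_request("AcctNo", "=", 129317)
# Request 2: SELECT * FROM Table_01 WHERE AcctBal > 1000 ;
broadcast = route_request("AcctBal", ">", 1000)
print(single, broadcast)
```

Request 1 yields a single AMP number and request 2 yields all four, which mirrors the PE 1 and PE 2 paths in the example. Which AMP the toy hash picks has no relation to the R9/AMP 1 placement in the figure.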
To enhance system performance, the RDBMS executes steps in parallel whenever possible.
Parallel steps can work with multi-statement requests, macros, and single statements and can provide a significant improvement in response time.
For example, the response time of a multi-statement request consisting of four statements that can be independently executed may be cut in half.
Processing the Steps 2
The AMPs are responsible for obtaining the rows required to process the request.
The software on the AMPs does the following:
- Processes AMP steps by performing select, insert, delete, and update operations on the data on the disks.
- Performs other functions associated with AMP steps such as journaling, space accounting, index maintenance, and output data conversion.
- Performs utilities to configure and reconfigure the RDBMS. (See Chapter 5, “Database Administration,” for more information.)
- Uses the file system software to perform primitive physical data block operations.
An AMP step can be sent in one of the following ways:
- Between one PE and one AMP, using a hashing algorithm.
- Among a selected set of AMPs, called a dynamic Vnet or BYNET group.
- Among all AMPs in the system.
An AMP step is broadcast to all AMPs when a full-table scan is requested or when the operation uses nonunique secondary indexes (NUSIs).
When an operation uses a unique primary index (UPI), nonunique primary index (NUPI), or unique secondary index (USI), the message includes the row hash value, which is used by the Vnet or BYNET to route the message to the correct vproc.
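As a rough illustration of the two rules above, the following sketch maps an access type to a delivery mode. The names `Access` and `delivery_for` are hypothetical, and the sketch leaves out the third delivery mode (a dynamic group of AMPs) because the text does not say which operations use it.

```python
from enum import Enum


class Access(Enum):
    UPI = "unique primary index"
    NUPI = "nonunique primary index"
    USI = "unique secondary index"
    NUSI = "nonunique secondary index"
    FULL_SCAN = "full-table scan"


# Access types whose messages carry a row hash value for routing.
HASH_ROUTED = {Access.UPI, Access.NUPI, Access.USI}


def delivery_for(access, row_hash=None):
    """Describe how an AMP step message would be delivered."""
    if access in HASH_ROUTED:
        if row_hash is None:
            raise ValueError("hash-routed access needs a row hash value")
        return {"mode": "point-to-point", "row_hash": row_hash}
    # NUSI access and full-table scans go to every AMP.
    return {"mode": "broadcast", "row_hash": None}


print(delivery_for(Access.UPI, row_hash=0x1A2B))
print(delivery_for(Access.NUSI))
```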
The sequence of AMP step processing is as follows:
| Step | Step Name | Function |
| --- | --- | --- |
| 1 | Lock | Ensures that users who are concurrently trying to update the same rows do not violate the consistency of the data. If the operation uses a UPI, NUPI, or USI, this step is incorporated into step 2. |
| 2 | Operation | Performs the actual task required: select, delete, insert, update, sort. There may be many operation steps. |
| 3 | End transaction | Required only for multiple AMP steps. If the request is for a UPI, no end transaction step is necessary. The end transaction step tells all AMPs that worked on the request that processing is complete. |
Each AMP is associated with disks and uses its file system software to control the reading and writing of data on its disks.
The file system controls primitive physical data block reads, and translates AMP software row requests into physical data block requests.
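The lock, operation, and end transaction sequence in the step table can be sketched as a small function that assembles the step list for a request. This is one reading of the table's conditions, not the actual AMP software: `build_amp_steps` and its parameters are hypothetical, and folding the lock into only the first operation step is an assumption.

```python
def build_amp_steps(access, operations, multiple_amps):
    """Assemble an ordered list of AMP step names.

    access: "UPI", "NUPI", "USI", "NUSI", or "FULL_SCAN"
    operations: operation names such as ["select"] or ["select", "sort"]
    multiple_amps: True if more than one AMP works on the request
    """
    steps = []
    # For UPI, NUPI, and USI access the lock is incorporated into step 2.
    lock_folded = access in {"UPI", "NUPI", "USI"}
    if not lock_folded:
        steps.append("lock")
    for position, operation in enumerate(operations):
        if lock_folded and position == 0:
            steps.append(f"{operation} (with lock)")
        else:
            steps.append(operation)
    # End transaction is needed only when multiple AMPs took part,
    # and never for a UPI request.
    if multiple_amps and access != "UPI":
        steps.append("end transaction")
    return steps


print(build_amp_steps("FULL_SCAN", ["select", "sort"], multiple_amps=True))
print(build_amp_steps("UPI", ["select"], multiple_amps=False))
```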
Teradata RDBMS Architecture Structured Query Language
Structured Query Language 2
This topic describes SQL, the Structured Query Language.
SQL is the only language the Teradata RDBMS understands. It is the ANSI standard language for relational database management.
Why SQL? 2
SQL has the advantage of being the most commonly used language for relational database management systems.
Because of this, both the data structures in the database and the commands for manipulating those structures are controlled using SQL. Additionally, all applications, whether written in a client language with embedded SQL, a macro, or an ad hoc SQL query, are written and executed using the same set of instructions and syntax.
Other database management systems use different sublanguages for data definition and data manipulation and do not permit ad hoc queries of the database. This means that you must use one language to define your data and yet another to query and update it. And you are restricted to running applications written by programmers. You have very little flexibility with nonrelational database management systems.
SQL Flagger 2
The Teradata RDBMS has an optional feature that detects non-ANSI SQL extensions (for entry level ANSI SQL92 only) and reports them back to the user (either to an embedded SQL program or to BTEQ) without terminating execution of the query.
SQL Lexicon 2
Like any language, SQL has its rules for writing statements.
The following table describes the SQL lexicon.
| Lexical Component | Description |
| --- | --- |
| Word | A character string of 1 to 30 characters drawn from the following character set: Roman characters (both cases), digits, $, #, and _. Keywords are a special category of words that are reserved for use in SQL statements. You cannot use keywords as object names. |
| Delimiter | Special characters whose meaning depends on context. The Teradata SQL delimiters and their functions are as follows. |

| Delimiter | Function |
| --- | --- |
| , | Separates items in a list. Acts as a date separator. |
| : | Prefixes a referenced parameter or client system variable. Acts as a date separator. |
| . | Separates a database name from a table name. Separates a table name from a column name. Acts as the decimal point. Acts as a date separator. |
| ; | Separates statements in a request. Terminates a request (BTEQ). |
| ' | Defines boundaries of character string constants. Acts as a date separator. |
| " | Defines the boundaries of nonstandard names. |
| / | Used as a date separator. |
| B (blank) | Used as a date separator. |
| - | Used as a date separator. |

| Constant | Numerics, strings, and characters embedded in a statement. |
| Operator | A set of symbols used to express logical and arithmetic operations. Operators of the same precedence are evaluated from left to right. The following table shows the operators from highest to lowest precedence. |

| Result Type | Operator |
| --- | --- |
| numeric | numeric + numeric; numeric - numeric |
| string | concatenation operator |
| logical | value EQ value; value NE value; value GT value; value LE value; value LT value; value GE value; value IN set; value NOT IN set; value BETWEEN value AND value; charvalue LIKE charvalue |
| logical | NOT logical |
| logical | logical AND logical |
| logical | logical OR logical |

| Lexical separator | A character string that can exist between words, constants, and delimiters without changing the meaning of a statement. Valid lexical separators are comments, blanks, and return characters (X'0D'). |
Character Sets 2
The Teradata RDBMS supports multinational and multibyte character sets in several different environments.
Among the character sets supported are:
- Kanji
- Katakana
- Hiragana
- European languages with characters using the umlaut, tilde, or ring
The RDBMS provides multibyte support for the following operating systems:
- MVS
- VM/CMS
- UNIX
- DOS/V
Multibyte support exists for the following Teradata software:
- Server-based utilities
- Client-based utilities
- BTEQ
- Preprocessor2 (embedded SQL)
- TDP
- CLIv2
Users control the current character set and collation sequences using SQL statements.
| Lexical Component | Description |
| --- | --- |
| Statement separator | A character that separates each statement of a multistatement request. The Teradata SQL separator is the semicolon. |
| Request terminator | A character that terminates a request in the body of a macro or that is entered from BTEQ. The Teradata SQL request terminator is the End of Text character for macros or the semicolon for BTEQ. |
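Two of the lexical rules from the SQL lexicon, the definition of a word and the role of the semicolon as statement separator, can be checked with a short sketch. It is illustrative only: `is_valid_object_name` and `split_statements` are hypothetical helpers, `SAMPLE_KEYWORDS` is a tiny made-up sample rather than the real reserved-word list, and the case-insensitive keyword check is an assumption. Any rules not stated in the lexicon (for example, restrictions on the first character) are not modeled.

```python
import re

# 1 to 30 characters from Roman letters (both cases), digits, $, #, and _.
WORD_RE = re.compile(r"[A-Za-z0-9$#_]{1,30}")

# Assumption: a tiny, incomplete sample; the real keyword list is longer.
SAMPLE_KEYWORDS = {"SELECT", "FROM", "WHERE"}


def is_valid_object_name(name):
    """True if name is a word and is not one of the sample keywords."""
    if WORD_RE.fullmatch(name) is None:
        return False
    return name.upper() not in SAMPLE_KEYWORDS


def split_statements(request):
    """Split a multistatement request on semicolons outside string constants."""
    statements, current, in_string = [], [], False
    for ch in request:
        if ch == "'":
            # Apostrophes define the boundaries of string constants.
            in_string = not in_string
        if ch == ";" and not in_string:
            text = "".join(current).strip()
            if text:
                statements.append(text)
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


print(is_valid_object_name("Table_01"))
print(split_statements("SELECT 'a;b' FROM t; SELECT 2;"))
```

Tracking whether the scanner is inside a string constant keeps a semicolon in quoted text from being treated as a separator.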
Teradata RDBMS Architecture Query Facilities
Query Facilities 2
The Teradata RDBMS supports several different facilities for making interactive or batch queries of the database from a terminal.
These include:
- Basic Teradata Query facility (BTEQ)
- Fourth generation languages
Because SQL is the only language the Teradata RDBMS understands, all application programming facilities ultimately make their queries against the database using the SQL language.
BTEQ 2
The Basic Teradata Query facility is a SQL formatter/report generator that allows you to create and perform SQL queries interactively or in batch mode from an interactive terminal.
BTEQ supports the following facilities:
- Multiple Teradata SQL statements per request
- Read from and write to client data files
- Manage multiple sessions per job
- Format output and write sophisticated reports
BTEQ is supported on the following platforms:
- Channel-attached client
- Network-attached client
- Teradata server
Teradata RDBMS Architecture The BYNET
The BYNET 2
This topic explains the concepts behind the interprocessor network technology used by the Teradata RDBMS: the BYNET.
BYNET Functions 2
At the most elementary level, you can look at the BYNET as a bus that loosely couples all the SMP nodes in a multinode system. This view does an injustice to the BYNET, however, because the capabilities of the network range far beyond those of a simple system bus.
The BYNET also possesses high speed logic arrays that provide bidirectional broadcast, multicast, and point-to-point communication and merge functions.
A multinode system has two BYNETs. This both creates a fault tolerant environment and provides for enhanced interprocessor communication. When BYNET traffic becomes particularly heavy, the two BYNETs can handle separate (rather than redundant) traffic. The machine provides load balancing software to optimize this process.
The total bandwidth for each network link to a processor node is 10 megabytes per second. Because there are two network links per node and because the bandwidth is linearly scalable, the total throughput available for each node is 20 megabytes per second.
For example, a 16-node 5100M system has 320 megabytes per second of bandwidth for point-to-point connections.
Total available broadcast bandwidth for any size 5100M system is 20 megabytes per second.
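The bandwidth arithmetic above can be reproduced in a few lines. The function names are hypothetical, and the figures are taken as megabytes per second, which is an assumption about the unit the text gives simply as megabytes.

```python
LINK_MB = 10        # bandwidth of each network link to a node
LINKS_PER_NODE = 2  # two BYNETs, so two links per node


def node_bandwidth():
    """Throughput available to a single node."""
    return LINK_MB * LINKS_PER_NODE


def point_to_point_bandwidth(num_nodes):
    """Point-to-point bandwidth scales linearly with the node count."""
    return num_nodes * node_bandwidth()


def broadcast_bandwidth(num_nodes):
    """Broadcast bandwidth stays fixed regardless of system size."""
    return node_bandwidth()


print(point_to_point_bandwidth(16), broadcast_bandwidth(16))
```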
The BYNET software provides a standard TCP/IP interface for communication among the SMP nodes.
Figure 2-4 illustrates how the BYNET connects individual SMP nodes to create an MPP system in the 5100M configuration.
Figure 2-4 How the BYNET connects individual SMP nodes
Virtual Processor Connectivity in Single Node Systems 2
Single node systems mimic the BYNET with a software emulation called the Vnet. Vnet stands for “virtual network.”
Teradata RDBMS Architecture The Access Module Process
The Access Module Process 2
Introduction 2
The Access Module Process (AMP) is the heart of the Teradata RDBMS. The Access Module Process is a virtual processor (vproc) that provides a BYNET interface and performs many database and file management tasks.
AMPs control the management of the Teradata RDBMS and also provide control over the disk subsystem, with each virtual AMP being assigned to a virtual disk.
AMP Functions 2
- BYNET (or Vnet) interface
- Database manager
  - Locking
  - Joins
  - Sorting
  - Aggregation
  - Output data conversion
  - Disk space management
  - Accounting
  - Journaling
- File system and disk management
Scalability and Performance 2
You can increase the performance of a Teradata RDBMS by adding SMP nodes to your system. Performance increases at a nearly linear rate with the addition of SMP nodes to a 5100M configuration.
The Disk Subsystem 2
Each AMP supports one virtual disk unit, using either RAID1 (mirroring) or RAID5 (parity striping) technology.
AMP Clusters 2
AMPs are grouped into logical clusters to enhance the fault tolerant capabilities of the Teradata RDBMS. This method of creating additional fault tolerance in your system is discussed further in Chapter 9, “Fault Tolerance.”
Teradata RDBMS Architecture Request Packaging and Unpackaging
Request Packaging and Unpackaging 2
Introduction 2
Any SQL statement must be packaged before being transmitted to the server-based database where it is executed. The returned response must then be unpackaged and presented to the requesting terminal or application program.
This topic discusses the mechanism for request handling used by the Teradata RDBMS.
Facilities for Packaging and Unpackaging SQL Requests and Results 2
The Call-Level Interface (CLI) is the primary mechanism the Teradata RDBMS uses to package and unpackage SQL requests and results. It is the principal API for the Teradata RDBMS.
The CLI packages queries into uniform blocks that are routed to the server by the Teradata Director Program (TDP) in IBM mainframe configurations or by the MTDP in other configurations.
Result tables returned to the requesting terminal or application are similarly routed by the TDP to the appropriate requester where they are unpackaged and presented as a results table.
Personal computers running Microsoft Windows® can use the Windows CLI (WinCLI) package to access the Teradata RDBMS. WinCLI uses the Dynamic Data Exchange (DDE) protocol to communicate with application programs.
The industry-standard ODBC driver to the Teradata RDBMS is another API for packaging and unpackaging SQL requests.
Teradata RDBMS Architecture Data Communications Management in the Teradata RDBMS Environment
Data Communications Management in the Teradata RDBMS Environment 2
Introduction 2
This topic discusses the Teradata RDBMS component that handles all data communications management: the Teradata Director Program (TDP).
The TDP 2
SQL requests from a client-based user, whether made as an interactive query or from an application program, are transmitted in the form of CLI packet messages, as are the responses to the query.
These transmissions are managed by a data communications manager.
In the Teradata RDBMS, the data communications manager is called the Teradata Director Program, or TDP.
The TDP does all of the following:
- Establishes and manages session control
- Routes requests
- Routes logons
- Verifies users
- Initiates recovery and restart processing
- Monitors and controls security
The Teradata RDBMS also provides facilities to enable the TDP to communicate with client application services.
The Micro TDP 2
Workstation clients run a version of the TDP called the Micro TDP (MTDP) and an additional component called the Micro Operating System Interface (MOSI), which contains libraries of procedures to handle operating system-dependent and communications protocol-dependent services.
The MTDP calls MOSI routines for system services like:
- Interrupt processing
- I/O processing
- Network connection and processing
Teradata RDBMS Architecture Application Programming Facilities
Application Programming Facilities 2
This topic discusses the application programming facilities provided by the Teradata RDBMS software.
This software falls into several broad categories:
- Embedded SQL
- Call Level Interface
- ODBC
Because SQL is the only language the Teradata RDBMS understands, all application programming facilities ultimately make their queries against the database using the SQL language.
Embedded SQL 2
The Teradata RDBMS provides a preprocessing facility that enables you to include ANSI-compliant SQL statements in your application programs.
The SQL preprocessor parses your application code for SQL statements, converts them to CLI calls, and then comments out the SQL statements. After the application code has been preprocessed by the Teradata RDBMS Preprocessor2, you can submit it to your client application language compiler.
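The preprocessing idea described above (find the SQL statements, replace them with calls, and leave the original statement behind as a comment) can be illustrated with a toy sketch. It is not how Preprocessor2 is implemented: it handles only single-line `EXEC SQL ... ;` statements in C-style source, and `hypothetical_cli_call` is a made-up placeholder, not a real CLI routine.

```python
import re

# Matches one complete embedded SQL statement on a single line.
EXEC_SQL_RE = re.compile(r"^(\s*)EXEC SQL\s+(.*?);\s*$")


def preprocess(source_lines):
    """Comment out embedded SQL and insert a placeholder call after it."""
    out = []
    for line in source_lines:
        match = EXEC_SQL_RE.match(line)
        if match is None:
            out.append(line)  # ordinary host-language line, unchanged
            continue
        indent, sql = match.groups()
        # Keep the original statement as a comment for the reader.
        out.append(f"{indent}/* EXEC SQL {sql}; */")
        # Escape the SQL text so it can sit inside a C string literal.
        escaped = sql.replace("\\", "\\\\").replace('"', '\\"')
        out.append(f'{indent}hypothetical_cli_call("{escaped}");')
    return out


program = [
    "int main(void) {",
    "    EXEC SQL DELETE FROM Table_01 WHERE AcctBal = 0;",
    "}",
]
for line in preprocess(program):
    print(line)
```

The output of such a pass is ordinary host-language source, which is why it can then be handed to the client language compiler.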
Preprocessor2 supports the following client programming languages.
| This programming language . . . | Is supported on this platform . . . |
| --- | --- |
| PL/I | IBM mainframe clients |
| C | IBM mainframe clients; UNIX clients |
Call-Level Interface 2
The Call-Level Interface (CLI) is an application programming interface that provides facilities that enable any client application programming language that supports a CALL statement to query the Teradata RDBMS.
The CLI is also supported directly on NCR servers running the Teradata RDBMS.
A Windows®-based version of CLI, called WinCLI, is also available.
ODBC 2
Open Database Connectivity (ODBC) is an industry standard application programming interface you can use with Microsoft Windows®, Windows® NT, and Windows 95 to make SQL queries against a Teradata RDBMS database.
The ODBC Driver for Teradata RDBMS provides Core-level SQL and Extension-level 1 (with some Extension-level 2) function call capability using the Windows® Sockets (WinSock) TCP/IP communications software interface.
An additional software package, the Database Query Manager, permits the Teradata RDBMS to manage requests from applications running under Windows®, Windows® NT, and Windows® 95 using ODBC.
ODBC operates independently of CLI and WinCLI.
Teradata RDBMS Architecture Archiving and Data Loading Utilities
Archiving and Data Loading Utilities 2
Introduction 2
The Teradata RDBMS provides several utilities for archiving and restoring the database and for data loading. Data loading utilities are typically used in a decision support environment where the client machine gathers data during the day and dumps it to the server overnight. This topic briefly describes these utilities.
Archive and Restore Utility and ASF2 2
The Archive and Restore utility and the Archive Storage Facility (ASF2) support archiving of databases, individual tables, or permanent journals to any of the following media:
- 3500/4500/5100 tape (ASF2 only)
- Client tape
- Client file
The utility also restores databases from those archival media to the Teradata RDBMS.
Archive and Restore is supported in the MVS and VM environments only.
BulkLoad 2
The BulkLoad utility permits batch insert, update, and delete operations on an existing database. The program moves large quantities of data from a client to the Teradata RDBMS on the server.
BulkLoad is supported in the MVS and VM environments only.
FastLoad 2
The FastLoad utility permits you to load unpopulated tables only. The program is similar to BulkLoad except that it runs much faster and does not support update and delete operations.
FastLoad is supported in both the client and server environments.
MultiLoad 2
The MultiLoad utility loads large quantities of data into unpopulated tables. MultiLoad also supports bulk inserts, updates, and deletions against populated tables.
MultiLoad is supported in both the client and server environments.
FastExport 2
The FastExport utility exports large quantities of data from the Teradata RDBMS to a client and is the functional complement of the FastLoad and MultiLoad utilities.
Teradata RDBMS Architecture Administrative Workstation
Administrative Workstation 2
The Administrative Workstation (AWS) performs many of the functions of a system console for multinode Teradata RDBMS systems.
Single node systems do not have an AWS.
The AWS is an intelligent workstation attached to an SMP node. Its primary roles are to:
- Monitor system performance
- Provide an input mechanism for the system administrator
Teradata RDBMS Architecture Database Window
Database Window 2
Introduction 2
The console software for the Teradata RDBMS for UNIX is called the Database Window (DBW). It runs in the following X windows environments:
- System console
- Administrative workstation (AWS)
- Remote workstation or PC
The Database Window provides an interface to all the following windows:
Supervisor Database Message Application Windows (including any currently active support
utilities)
Workstation Types and Available Platforms 2
Some of the workstation types are available only on specific platforms.
The following table shows which workstations are appropriate for the different platforms and how they are connected to the node.
| Type of Workstation | Platform | Description |
| --- | --- | --- |
| System console | SMP | Connected directly to the SMP node |
| Administrative workstation | MPP | LAN-connected through an Ethernet card on the node. The AWS provides a single operational view of the multiple-node system. |
| Remote connection through LAN: UNIX workstation; PC with X Windows server | Both | LAN-connected through an Ethernet card on the node. |
Database Window Communication 2
The DBW communicates with the Teradata RDBMS through the console subsystem (CNS), which is part of the PDE. Because the DBW is managed by the CNS, you will occasionally see CNS messages in the DBW.
Functions Provided by the Database Window 2
The system console provides all of the following functions:
- Displays system status
- Displays the current system configuration
- Displays performance statistics
- Controls various AMP utilities
Supervisor Subwindow 2
The DBW has a main window and several subwindows. The principal subwindow, called the Supervisor Subwindow, permits an operator to run utilities and enter various commands.
Utilities Available from the Supervisor Subwindow 2
Many utilities used to control, monitor, and configure the RDBMS are available from the Supervisor subwindow. A partial list of the utilities invoked from the DBW is provided in Chapter 12, “System Administration,” in the section “System Utility Software.”
Supervisor Commands Available from the Database Window 2
The following table lists the commands available from the Supervisor Subwindow of the Database Window.
| Command | Function |
| --- | --- |
| CNSSET DBWTIMEOUT | Sets how often the CNS checks the connection between the CNS and the DBW. |
| CNSSET LINES | Sets the number of lines that are saved and available to you in the output display area after a reconnect to the CNS. |
| CNSSET STATEPOLL | Sets how often the CNS checks the RDBMS state and substate. |
| CNSSET TIMEOUT | Sets the interval between the time you type a request and the time the DBW rejects it because a program did not solicit the input. |
| DISABLE LOGONS | Prevents new sessions from logging on. |
| ENABLE LOGONS | Restores the ability of new sessions to log on. |
| GET CONFIG | Displays the current system configuration. |
| GET LOGTABLE | Displays the status of logging to the specified resource usage tables. |
| GET RESOURCE | Displays the resource collection and logging rates, and the memory clearing rate of a vproc or node. |
| GET TIME | Displays the current date and time. |
| GET VERSION | Displays the PDE and RDBMS version numbers. |
| LOG | Logs the specified text into the errorlog. |
| QUERY STATE | Displays the current state of the RDBMS. |
| RESTART TPA | Restarts the RDBMS. |
| SET LOGTABLE | Enables or disables logging to the specified resource usage tables. |
| SET RESOURCE | Sets the resource collection and logging rates, and the memory buffer clearing rate of a vproc or node. |
| START | Starts an RDBMS utility in a DBW application subwindow. |
| STOP | Stops an RDBMS utility in a DBW application subwindow. |
Teradata RDBMS Architecture RDBMS Gateway
RDBMS Gateway 2
The RDBMS Gateway maps the external network protocols onto the internal database message protocols. It is a server program that provides a pathway for applications running on a network-connected client to access the Teradata server.
The RDBMS Gateway also permits clients running locally to communicate with the Teradata RDBMS.
There is one RDBMS Gateway per machine, controlling up to 600 sessions per node.
Teradata RDBMS Architecture Database Utility Software
Database Utility Software 2
Database utilities are used to perform maintenance functions on the Teradata RDBMS.
They are invoked from the Database Window with the following exceptions:
| Utility Name | Runs under . . . |
| --- | --- |
| DIP | control of BTEQ as well as the Database Window. |
| XPT | UNIX as an application. |
| xperfstate | UNIX as an application. |
The system utilities include:
| Utility Name | Function |
| --- | --- |
| Config | Specifies logical database configuration (AMPs and PEs). |
| XCTL | Displays and modifies the fields of the Control Parameters Globally Distributed Objects (GDO) of the Parallel Database Extension (PDE) software. Accessed from an xterm window. |
| DBSControl | Specifies global runtime flags for database software. |
| VprocManager | Provides status for vprocs and permits manipulation of their attributes. |
| GtwGlobal | Manages LAN connections. |
| Ferret | Displays and sets various disk space utilization attributes without destroying the data for which the File System is responsible. For new attributes, Ferret reconfigures the stored data dynamically to match them. Utilities running under Ferret include Scandisk, Showspace, and Packdisk. |
| Filer | Displays information used to correct problems within the File System. |
| pdeconfig | Allocates PE and AMP vprocs to physical resources, including all of the following: configuring disk arrays; assigning logical units (LUNs) to the disks; allocating disks to AMPs; allocating LANs and channels to PEs. Always run xmppconfig before running pdeconfig. |
| QryConfig | Displays the current database software logical configuration. |
| QrySessn | Displays session status information. |
| RcvManager | Displays recovery status. |
| Rebuild | Reconstructs tables from fallback copies (only works when fallback is used). |
| Reconfig | Redistributes disk data automatically whenever AMP vprocs are added or removed. |
| Showlocks | Displays host utility (HUT) locks on databases and tables. |
| SysInit | Initializes the Teradata system tables and all user tables. |
| xmppconfig | Sets up and updates configurations. Use this utility to specify the physical configuration before running pdeconfig. Must be run prior to pdeconfig for MPP systems. |
| DIP | Executes one or more of the standard DIP (Database Initialization Program) SQL scripts packaged with the RDBMS. |
| XPT | Installs multiple copies of the same software across all nodes of an MPP system. |
| xperfstate | Provides real time display of PDE system performance, including system-wide CPU utilization and disk utilization. |
Teradata RDBMS Architecture Teradata Manager
Teradata Manager 2
Introduction 2
Teradata Manager is a PC-based package that provides easy access to resource usage information in the Teradata Data Dictionary. The PC supporting Teradata Manager must be running the Windows NT operating system.
Performance Analysis 2
The Teradata Manager Performance Monitor uses two commands to monitor the performance of the Teradata RDBMS.
The commands are:
- MONITOR CONFIG
- MONITOR SUMMARY
You can specify data sampling rates and durations, and Teradata Manager collects and analyzes the data for you. Results of data analyses can be displayed in a text window.
The Locking Logger feature permits you to determine whether an application mix is causing delays because of database lock contention.
Session Information 2
- Setting session rates
- Monitoring sessions
- Identifying sessions
- Aborting sessions
Statistical Information 2
Teradata Manager provides facilities for:
- Detecting which tables have statistics
- Creating statistics for columns and indexes
- Dropping statistics by table or column/index
- Refreshing statistics for:
  - Entire Teradata RDBMS
  - Database
  - Table
  - Column/Index
Teradata RDBMS Architecture For More Information
For More Information 2
For more information on the topics presented in this chapter, see the following Teradata RDBMS manuals.
| IF you want to learn more about . . . | THEN see this manual . . . |
| --- | --- |
| System process flows | Teradata RDBMS for UNIX Database Design and Administration |
| Teradata SQL | Teradata RDBMS for UNIX SQL Reference |
| General Teradata software architecture | |
| The TDP | Teradata TDP Reference |
| Preprocessor2 | Teradata Application Programming Using Embedded SQL |
| Embedded SQL | Teradata RDBMS for UNIX SQL Reference; Teradata Application Programming Using Embedded SQL for C, COBOL, and PL/I |
| Teradata Manager | Teradata Manager Reference Guide |
| ODBC | Teradata ODBC Driver for Windows Installation and User’s Guide |
The Relational Model
Chapter 3
About This Chapter 3
Introduction 3
This chapter reviews the relational model for database management. The chapter also describes issues like normalization, referential integrity, and macros.
The relational model for database management is based on concepts derived from the mathematical theory of sets. This chapter touches on the relational model from that viewpoint to establish its solid foundation in mathematics. By way of comparison, database management products based on the hierarchical, network, and object-oriented architectures are not based on rigorous theoretical foundations, and so their behavior is not as predictable as that of relational products.
Database management systems based on the hierarchical, network, and object-oriented models use different languages to define and manipulate the database, and none provides the capability for making ad hoc queries.
The chapter describes the process of further normalization of a database, then describes macros in the Teradata environment.
What is a Relational Database? 3
A relational database is a database that is perceived by its users as a collection of tables and nothing but tables. This deceptively simple concept permits information to be created and maintained without any kind of anomalies as well as providing users with a simple presentation of data which can, in turn, be manipulated with ease.
The freedom from anomalies rests on the fact that relational databases are based on the mathematics of set theory. Roughly speaking, set theory defines a table as a relation. Each row in a relation is called a tuple and each column is an attribute. The number of tuples is the cardinality of the relation and the number of attributes is its degree.
The following table presents these correspondences. Note that relational databases are a generalization of the mathematics of set theory relations and the correspondences between set theory and relational databases are not always direct.
| Set theory term | Relational database term |
| --- | --- |
| Relation | Table |
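The terms tuple, attribute, cardinality, and degree can be made concrete with a small sketch that models a relation as a set of tuples. The `Relation` type and the account values are hypothetical and exist only for illustration.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Relation(NamedTuple):
    """A relation: named attributes plus a set of tuples (rows)."""

    attributes: Tuple[str, ...]
    tuples: FrozenSet[tuple]

    @property
    def degree(self):
        # Degree is the number of attributes (columns).
        return len(self.attributes)

    @property
    def cardinality(self):
        # Cardinality is the number of tuples (rows).
        return len(self.tuples)


accounts = Relation(
    attributes=("AcctNo", "AcctBal"),
    tuples=frozenset({
        (129317, 1500),
        (129318, 250),
        (129318, 250),  # a duplicate collapses, because a set has no duplicates
        (129319, 1000),
    }),
)
print(accounts.degree, accounts.cardinality)
```

Using a set for the tuples reflects the set-theory view, in which a relation cannot contain duplicate rows.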
The Relational Model About This Chapter
Because the mathematical operations on relations are well-defined, any manipulation of a table in a relational database has a consistent, predictable outcome. This contrasts with all other database management systems, none of which is based on mathematical theory and none of which treats its data formally. Because the operations on relational databases are so well defined, users can perform ad hoc, interactive queries of the database, unlike other database management systems that require a system programmer to predefine all links between files and all possible queries of the database.
Under the covers, the SQL optimizer uses relational algebra to build optimal access to the requested data. When the definition of the database changes, as it can from time to time, the optimizer readily adapts to the changes and reoptimizes access paths without programmer intervention.
Some Other Definitions 3
The following terms are defined now to make the discussion that follows easier to understand.
Term Definition
Primary key A unique identifier for a relation.
In set theory (and in relational database theory), duplicate rows are not allowed. However, commercially available relational databases often allow duplicate rows in relations. In those cases, the relation does not have a primary key.
Relations with a primary (or candidate) key do not permit duplicate rows.
The Teradata RDBMS permits enforcement of the no duplicates rule even when no primary key is specified.
Candidate key Any relation might have multiple unique identifiers. Each such unique identifier is called a candidate key.
A candidate key must satisfy the properties of uniqueness and minimality. That is, no two rows of the table have the same value for the key, and if the key is composite, no component can be eliminated without destroying the uniqueness property.
Alternate key Any candidate key not chosen as the primary key.
Foreign key A column or set of columns in the current relation whose values match the primary key of another relation. Foreign keys are used to join tables and may participate in the primary key.
Functional dependence
Attribute X is functionally dependent on attribute Y if and only if each Y value in the relation has associated with it exactly one X value.
Full functional dependence
Attribute X is fully functionally dependent on attribute Y if and only if it is functionally dependent on Y and not functionally dependent on any proper subset of Y.
Transitive dependence
A state in which an attribute is functionally dependent on another attribute only by means of an intermediate attribute. Transitive dependence is a state that normalization seeks to eliminate.
Determinant Any attribute on which some other attribute is fully functionally dependent.
Multivalued dependence
Given a relation with attributes X, Y, and Z, the multivalued dependence of Y on X (X multiply determines Y) holds if and only if the set of Y-values matching a given (X-value, Z-value) pair depends only on the X-value and is independent of the Z-value.
Join An operation in which data is retrieved from more than one table.
Join dependency
A relation satisfies join dependency if and only if it is equal to the join of its projections on its component attributes.
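The definitions of functional dependence and candidate key can be sketched as checks against sample data. This is an illustration only: the function names and the order data are hypothetical, and a sample can refute a dependency or a key but can never prove that it holds for every possible row.

```python
from itertools import combinations

def is_functionally_dependent(rows, x, y):
    """True if, in these rows, each value of attributes y maps to exactly one value of x."""
    seen = {}
    for row in rows:
        y_val = tuple(row[a] for a in y)
        x_val = tuple(row[a] for a in x)
        if seen.setdefault(y_val, x_val) != x_val:
            return False
    return True

def is_unique(rows, attrs):
    """True if no two rows share the same values for attrs."""
    return len({tuple(row[a] for a in attrs) for row in rows}) == len(rows)

def is_candidate_key(rows, attrs):
    """Uniqueness plus minimality: no proper subset of attrs is itself unique."""
    if not is_unique(rows, attrs):
        return False
    return not any(
        is_unique(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

# Hypothetical sample data.
orders = [
    {"order_no": 1, "line_no": 1, "part": "bolt"},
    {"order_no": 1, "line_no": 2, "part": "nut"},
    {"order_no": 2, "line_no": 1, "part": "bolt"},
]

key_ok = is_candidate_key(orders, ("order_no", "line_no"))
not_minimal = is_candidate_key(orders, ("order_no", "line_no", "part"))
part_on_key = is_functionally_dependent(orders, ("part",), ("order_no", "line_no"))
part_on_order = is_functionally_dependent(orders, ("part",), ("order_no",))
```

In this sample, part is functionally dependent on the pair (order_no, line_no) but not on order_no alone, so it is fully functionally dependent on the composite key.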
Normalization
Introduction
The theory of normalization is at the root of the relational model of database management. Normalization theory is constructed around the concept of normal forms. These normal forms define a system of constraints. If a relation meets the constraints of a particular normal form, then it is said to be in that form.
You can think of the normal forms as an onion, with the outermost layer being the set of all relations, including unnormalized relations. The figure that follows illustrates this. As you work your way to the core of the onion, you must pass through each lower normal form. As a result, a relation that has achieved fifth normal form has also achieved first, second, third, and fourth normal forms.
Figure 3-1 Layers of normalization. [Figure: nested layers labeled, from innermost to outermost, 5NF relations, BCNF relations, 3NF relations, 2NF relations, 1NF relations, and all relations.]
By definition, a relational database is always normalized to at least first normal form because its field values are always atomic. But simply leaving it at that invites a number of problems with redundancy and potential update anomalies, which is why the higher normal forms were developed. The next topics describe normal forms and how to achieve them.
First, Second, and Third Normal Forms
Introduction
This topic describes the first three normal forms: what they are, why they are needed, and how to achieve them.
The first three normal forms are stepping stones to Boyce-Codd normal form and, when appropriate, the higher normal forms.
The next topic describes Boyce-Codd (BCNF) and higher normal forms.
First Normal Form
First normal form (abbreviated 1NF) is definitive for a relational database. All relations in a relational database must be in first normal form by definition.
A relation is said to be in first normal form if all its fields (simple domains in mathematics) are atomic. This means that a field can contain one value and one value only. No hierarchies of data values are allowed. This concept is sometimes referred to as the elimination of repeating groups from a relation.
The formal definition is as follows: For a relation to be in first normal form, the relationship between the primary key of the relation and each of the other attributes must be one-to-one (in that direction). In other words, all underlying simple domains of the relation contain atomic values only.
The nonkey attributes are said to be functionally dependent on the key.
Note: a nonkey attribute is any attribute that is not part of the primary key for the relation.
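A minimal sketch of eliminating a repeating group follows. The employee records, the phones field, and the relation names are hypothetical; the point is only that every field ends up holding a single atomic value.

```python
# Hypothetical unnormalized records: "phones" holds a repeating group.
unnormalized = [
    {"emp_no": 1001, "name": "Smith", "phones": ["555-0100", "555-0101"]},
    {"emp_no": 1002, "name": "Jones", "phones": ["555-0199"]},
]

# 1NF: one atomic value per field. The repeating group moves to its own
# relation, with one row per (emp_no, phone) combination.
employee = [{"emp_no": r["emp_no"], "name": r["name"]} for r in unnormalized]
employee_phone = [
    {"emp_no": r["emp_no"], "phone": p}
    for r in unnormalized
    for p in r["phones"]
]

for row in employee_phone:
    print(row)
```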
Second Normal Form
Second normal form (abbreviated 2NF) deals with the elimination of partial dependencies from a relation, that is, dependencies of a nonkey attribute on only part of a composite primary key.
A relation is said to be in second normal form if it is in 1NF and every nonkey attribute is fully dependent on the entire primary key.
The formal definition is as follows: For a relation to be in second normal form, the relationship between any portion of the primary key of a relation and each of the other columns must not be one-to-one (in that direction). In other words, the nonkey columns are fully functionally dependent on the key.
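The following sketch illustrates a decomposition to second normal form under assumed data. In the hypothetical order_item relation, the primary key is (order_no, part_no), but part_desc depends on part_no alone, which is only a portion of the key. Projecting the relation into two relations removes that dependency, and joining the projections recovers the original rows.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate result rows."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result

# Hypothetical relation with key (order_no, part_no).
order_item = [
    {"order_no": 1, "part_no": "P1", "qty": 5, "part_desc": "bolt"},
    {"order_no": 1, "part_no": "P2", "qty": 2, "part_desc": "nut"},
    {"order_no": 2, "part_no": "P1", "qty": 7, "part_desc": "bolt"},
]

# qty depends on the whole key; part_desc depends only on part_no.
order_line = project(order_item, ("order_no", "part_no", "qty"))
part = project(order_item, ("part_no", "part_desc"))

# Joining the two projections on part_no reproduces the original relation.
rejoined = [
    {**line, **p}
    for line in order_line
    for p in part
    if line["part_no"] == p["part_no"]
]
```

After the decomposition, each part description is stored once rather than once per order line.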
Third Normal Form
Third normal form (abbreviated 3NF) deals with the elimination of nonkey attributes that do not describe the primary key.
The formal definition is as follows: For a relation to be in third normal form, the relationship between any two nonprimary key columns or groups of columns in a relation must not be one-to-one in either direction. In other words, the nonkey columns are nontransitively dependent upon each other and the key. No transitive dependencies implies no mutual dependencies.
Attributes are said to be mutually independent if none of them is functionally dependent on any combination of the others. This mutual independence ensures that individual attributes can be updated without any danger of affecting any other attribute in a row.
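As a sketch of why this matters, consider a hypothetical relation in which dept_name depends on emp_no only through dept_no. The names and data are assumptions made for the example. Moving dept_name into its own relation means that a department can be renamed with a single update.

```python
# Hypothetical relation with a transitive dependency:
# emp_no determines dept_no, and dept_no determines dept_name.
employee_dept = [
    {"emp_no": 1001, "dept_no": 10, "dept_name": "Sales"},
    {"emp_no": 1002, "dept_no": 20, "dept_name": "Support"},
    {"emp_no": 1003, "dept_no": 10, "dept_name": "Sales"},
]

# 3NF decomposition: dept_name moves to a relation keyed by dept_no.
employee = [{"emp_no": r["emp_no"], "dept_no": r["dept_no"]} for r in employee_dept]
department = {r["dept_no"]: r["dept_name"] for r in employee_dept}

# Renaming a department is now one update, with no risk of leaving
# some employee rows showing the old name.
department[10] = "Field Sales"
names = {r["emp_no"]: department[r["dept_no"]] for r in employee}
```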
Boyce-Codd and Higher Normal Forms
Introduction
When the relational model of database management was originally proposed, it only addressed the first three normal forms. Later work with the model showed that 3NF required further refinement to ensure that update anomalies would never occur.
This topic describes Boyce-Codd normal form and briefly mentions fourth and fifth normal forms for completeness.
Boyce-Codd Normal Form
Third normal form does not handle situations in which a relation has multiple composite candidate keys with overlapping attributes. To eliminate these problems, Codd developed the so-called Boyce-Codd normal form (BCNF), which reduces to 3NF whenever the special situation that defines this problem does not apply.
A relation is in BCNF if and only if every determinant is a candidate key. This means that only candidate keys are determinants.
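The BCNF test can be sketched with the standard attribute-closure computation over a set of declared functional dependencies. The function names and the student, subject, and teacher example are hypothetical. The check tests whether each determinant determines every attribute of the relation (that is, whether it contains a candidate key), which matches the definition above when each left-hand side is already minimal.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """Nontrivial dependencies whose determinant does not determine every attribute."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(all_attrs)
    ]

# Hypothetical relation: each student has one teacher per subject,
# and each teacher teaches exactly one subject.
attrs = {"student", "subject", "teacher"}
fds = [
    (frozenset({"student", "subject"}), frozenset({"teacher"})),
    (frozenset({"teacher"}), frozenset({"subject"})),
]

violations = bcnf_violations(attrs, fds)
```

Here the candidate keys (student, subject) and (student, teacher) overlap, and teacher is a determinant without being a candidate key, so the check reports that dependency.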
Fourth Normal Form
A relation is said to be in fourth normal form (4NF) if and only if whenever there is a multivalued dependency in the relation (for example, say X multiply determines Y) then all attributes of the relation are also functionally dependent on X.
In practice, the need for 4NF is rarely seen.
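A multivalued dependency can be checked against sample rows by following its definition directly: the set of Y-values seen for each (X-value, Z-value) pair must equal the set of Y-values seen for that X-value overall. The sketch below assumes a hypothetical relation of courses, teachers, and texts; as with any check on sample data, it can refute a dependency but cannot prove it in general.

```python
from collections import defaultdict

def mvd_holds(rows, x, y, z):
    """True if, in these rows, the Y-values for each (X, Z) pair depend only on X."""
    y_by_x = defaultdict(set)
    y_by_xz = defaultdict(set)
    for row in rows:
        y_by_x[row[x]].add(row[y])
        y_by_xz[(row[x], row[z])].add(row[y])
    return all(ys == y_by_x[x_val] for (x_val, _), ys in y_by_xz.items())

# Hypothetical data: teachers and texts for a course are independent of each other.
ctx = [
    {"course": "Physics", "teacher": "Green", "text": "Mechanics"},
    {"course": "Physics", "teacher": "Green", "text": "Optics"},
    {"course": "Physics", "teacher": "Brown", "text": "Mechanics"},
    {"course": "Physics", "teacher": "Brown", "text": "Optics"},
    {"course": "Math", "teacher": "Green", "text": "Algebra"},
]

holds = mvd_holds(ctx, "course", "teacher", "text")

# Dropping one row breaks the independence of teacher and text for Physics.
holds_after_delete = mvd_holds(ctx[:3] + ctx[4:], "course", "teacher", "text")
```

Because course multiply determines teacher but does not functionally determine the other attributes, this sample relation is not in 4NF; splitting it into (course, teacher) and (course, text) removes the redundancy.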
Fifth Normal Form
So far it has been possible to normalize relations by decomposing them into two of their projections. On rare occasions, a nonnormal relation cannot be decomposed into two projections without losing information. In these rare instances, fifth normal form (5NF) is used to decompose the relation into three or more projections of the original relation.
A relation is said to be in fifth normal form (5NF - sometimes called projection-join normal form, or PJ/NF) if and only if every join dependency in the relation is a consequence of the candidate keys of the relation.
This makes 5NF the final normal form that can be achieved by taking projections and using joins. A relation in 5NF is guaranteed to be free of all anomalies that can be removed by taking projections, but not necessarily of all possible anomalies.
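The following sketch shows a relation that cannot be recovered from any two of its binary projections but is recovered by joining all three. The supplier, part, and project values are hypothetical sample data of the kind commonly used to illustrate join dependencies.

```python
# Hypothetical relation of (supplier, part, project) tuples.
spj = {
    ("S1", "P1", "J2"),
    ("S1", "P2", "J1"),
    ("S2", "P1", "J1"),
    ("S1", "P1", "J1"),
}

# The three binary projections.
sp = {(s, p) for s, p, j in spj}
pj = {(p, j) for s, p, j in spj}
sj = {(s, j) for s, p, j in spj}

# Joining only two projections (on part) produces a spurious tuple.
two_way = {(s, p, j) for s, p in sp for p2, j in pj if p == p2}
spurious = two_way - spj

# Joining in the third projection removes it and restores the original relation.
three_way = {t for t in two_way if (t[0], t[2]) in sj}
```

The join dependency here is not a consequence of a candidate key (the only key is the combination of all three attributes), so the original relation is not in 5NF, while its three projections are.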
Referential Integrity
Introduction
Referential integrity (RI) is a key concept for the relational model.
RI is defined by the Referential Integrity Rule, which states that a relational database cannot contain any unmatched foreign key values.
Enforcing RI in the Teradata RDBMS
To implement RI in the Teradata RDBMS, you have three choices:
Use the referential constraint checks supplied by the database software.
Write your own, site-specific macros.
Enforce constraints through application code.
Primary and Foreign Keys
For review, a primary (parent) key is the candidate key selected to identify each tuple in a relation uniquely.
A foreign key is a (possibly composite) attribute of one relation whose values are required to match those of the primary key of some other relation.
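The Referential Integrity Rule can be sketched as a check for foreign key values that have no matching primary key value. This is an illustration in application code with hypothetical table and column names; it is not the constraint checking supplied by the database software.

```python
def unmatched_foreign_keys(child, fk, parent, pk):
    """Return child rows whose foreign key value matches no parent primary key value."""
    parent_keys = {row[pk] for row in parent}
    return [row for row in child if row[fk] not in parent_keys]

# Hypothetical parent and child relations.
department = [{"dept_no": 10}, {"dept_no": 20}]
employee = [
    {"emp_no": 1001, "dept_no": 10},
    {"emp_no": 1002, "dept_no": 30},  # no department 30 exists
    {"emp_no": 1003, "dept_no": 20},
]

violations = unmatched_foreign_keys(employee, "dept_no", department, "dept_no")
for row in violations:
    print("unmatched foreign key:", row)
```

A database that satisfies the rule would return an empty list from this check.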
Indexes
An index is a special