Transcript of TERADATA PPT
TERADATA OVERVIEW & UTILITIES
KNOWLEDGE SHARING ON TERADATA
Introduction to TERADATA
Flow through TERADATA UTILITIES with Lab Examples
- BTEQ
- FastLoad
- FastExport
- MultiLoad
- TPump
Comparative study of the Teradata loading utilities
Teradata is a Relational Database Management System (RDBMS) that drives a company's data warehouse.
Teradata is an open system, compliant with industry ANSI standards. It is currently available for the following operating systems:
- UNIX MP-RAS
- Windows 2000
The ability to manage terabytes of data is accomplished through parallelism: many individual processors perform smaller tasks concurrently to complete an operation against a huge repository of data. To date, only parallel architectures can handle databases of this size.
What is Teradata?
There are many reasons to choose Teradata as the preferred platform for enterprise data warehousing:
- Supports easy scalability from a small (10 GB) to a massive (100+ TB) database.
- Automatic and even data distribution eliminates complex indexing schemes and time-consuming reorganizations.
- Designed and built with parallelism from day one.
- Single operational view of the entire MPP system and a single point of control for the DBA (Teradata Manager).
- Teradata has been doing data warehousing longer than any other vendor.
Why Teradata?
Start Smaller and Grow: One Experience
200-300 users -> over 7,500 users
30 concurrent users -> over 2,000 concurrent users
300 GB disk space -> over 50 TB user data
1.7 billion-row table -> over 7.5 billion-row table
200 queries per day -> over 20,000 queries per day
30M-row batch per night -> over 500M-row batch per night
1 main application -> over 30 applications
BUT ONE THING REMAINS CONSTANT, i.e., throughput.
Scalability in a Production Environment
- Ease of setup and maintenance
- No reorganization of data needed
- Most robust utilities in the industry
- Low cost of disk-to-data ratio
- Ease in expanding the system
ADVANTAGE of TERADATA
- SMP & MPP platforms
- BYNET
- Disk Arrays
- Cliques
- Hot Standby nodes
- Virtual processors
- Request processing
- Teradata Database RASUI
TERADATA Architecture(Shared Nothing)
- Each Teradata node is made up of hardware and software.
- Each node has CPUs, a system disk, memory, and adapters.
- Each node runs a copy of the OS and the database software.
Node Architecture (Shared Nothing)
BYNET
- The BYNET performs the internal communication of the Teradata RDBMS.
- All communication between PEs and AMPs is done via the BYNET.
BYNET in TERADATA
Boardless BYNET
Single-node SMP systems use Boardless BYNET (or virtual BYNET) software to simulate the BYNET hardware driver.
Disk Arrays
A disk array is a configuration of disk drives that uses specialized controllers to manage and distribute data and parity across the disks while providing fast access and data integrity.
Clique
- A clique is a set of Teradata nodes that share a common set of disk arrays.
- In the event of a failure, all virtual processors can migrate to another available node in the clique.
- All nodes in the clique must have access to the same disk arrays.
Disk Arrays & Clique
Hot Standby Nodes
The hot standby node feature allows spare nodes to be incorporated into the production environment, so that Teradata Database can use the spare nodes to improve availability and maintain performance levels.
A hot standby node:
- Is a member of a clique.
- Does not normally participate in the trusted parallel application (TPA).
- Can be brought into the TPA to compensate for the loss of a node in the clique.
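The clique and hot standby behaviour described above can be pictured with a small Python sketch. This is a toy model under stated assumptions, not Teradata's actual migration logic: the node and vproc names are hypothetical, and the round-robin spreading of vprocs over the surviving nodes is an illustrative choice.

```python
def migrate_vprocs(clique, failed_node, standby=None):
    """Return a new node -> vprocs mapping after `failed_node` goes down.

    clique  : dict mapping each active node name to its list of vproc names
    standby : optional name of a hot standby node in the same clique
              (hypothetical; it holds no vprocs until a failure occurs)
    """
    survivors = {node: list(v) for node, v in clique.items() if node != failed_node}
    orphans = list(clique[failed_node])

    if standby is not None:
        # The standby joins the TPA and takes over the failed node's vprocs,
        # so the surviving nodes keep their original workload.
        survivors[standby] = orphans
        return survivors

    targets = sorted(survivors)
    if not targets:
        raise RuntimeError("no surviving node in the clique")
    # Assumption: spread the orphaned vprocs round-robin over the survivors.
    for i, vproc in enumerate(orphans):
        survivors[targets[i % len(targets)]].append(vproc)
    return survivors


clique = {
    "node1": ["amp0", "amp1"],
    "node2": ["amp2", "amp3"],
    "node3": ["amp4", "amp5"],
}
print(migrate_vprocs(clique, "node2"))
print(migrate_vprocs(clique, "node2", standby="node4"))
```

Without a standby, the surviving nodes each carry extra vprocs; with one, the per-node load stays the same, which is the availability and performance benefit the slide describes.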
The versatility of Teradata Database is based on virtual processors (vprocs), which eliminate the dependency on specialized physical processors. Vprocs are a set of software processes that run on a node under the Teradata Parallel Database Extensions (PDE), within the multitasking environment of the operating system.
The two types of vprocs:
- PE (Parsing Engine)
- AMP (Access Module Processor)
AMP
- The AMP is a virtual processor designed for and dedicated to managing a portion of the entire database.
- An AMP controls some portion of each table on the system.
- It performs all database management functions, such as sorting, aggregating, and formatting data.
- The AMP receives data from the PE, formats rows, and distributes them to the disk storage units it controls.
- The AMP also retrieves the rows requested by the PE.
PE
- A Parsing Engine (PE) is a virtual processor that manages the dialogue between the client application and the RDBMS.
- It interprets the SQL requests, receives input records, and passes data.
- It is made up of the following software components: Session Control, the Parser, the Optimizer, and the Dispatcher.
Vproc
Data Store on Disks
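How each AMP ends up with a portion of every table can be illustrated with a short Python sketch. It is only a stand-in for the idea of hash-based distribution: the MD5 hash and the modulo step are assumptions made for illustration and are not Teradata's hashing algorithm, and the `empno` key and four-AMP system are hypothetical.

```python
import hashlib


def amp_for(primary_index_value, num_amps):
    """Pick an AMP for a row from a hash of its primary index value.

    Assumption: MD5 plus modulo stands in for the real row-hash logic.
    """
    digest = hashlib.md5(str(primary_index_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def distribute(rows, key, num_amps):
    """Group rows by the AMP that would own them."""
    amps = {amp: [] for amp in range(num_amps)}
    for row in rows:
        amps[amp_for(row[key], num_amps)].append(row)
    return amps


rows = [{"empno": n, "empname": f"emp{n}"} for n in range(1, 1001)]
placement = distribute(rows, "empno", num_amps=4)
for amp, stored in placement.items():
    print(f"AMP {amp}: {len(stored)} rows")
```

Because the hash of a given key is always the same, the same row always lands on the same AMP, and a key with many distinct values spreads the table roughly evenly without any manual reorganization.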
Answer Set Response
The major Teradata utilities that assist in data warehouse management and maintenance, alongside the Teradata RDBMS, are:
- BTEQ
- FastLoad
- FastExport
- MultiLoad
- TPump
TERADATA UTILITIES INTRODUCTION
- General-purpose, command-based program that allows users on a workstation to communicate with one or more Teradata Database systems.
- Runs sets of SQL statements that insert, update, or delete rows in Teradata tables.
- Imports data into a Teradata database from a file.
- Exports data from a table, formats the results, and returns them to the screen, a file, or a designated printer.
- Reports errors as they occur but does not capture them in a log.
BTEQ - Basic Teradata Query
- Enter Teradata SQL statements to view, add, modify, and delete data.
- Enter operating system commands.
- Create and use Teradata stored procedures.
- BTEQ supports Teradata-specific SQL functions for complex analytical querying and data mining.
- All database requests in BTEQ are expressed in Teradata SQL.
- BTEQ also supports conditional logic (i.e., ".IF ... THEN ...") based on the activity count or the error code.
- It is useful for batch-mode export and import processing.
- Error handling is available in BTEQ: we can assign an error level to each error code and make decisions based on the level assigned.
Capabilities in BTEQ
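The idea of assigning an error level to each error code and then deciding on that level can be modelled in a few lines of Python. This is not BTEQ syntax, only a sketch of the decision logic; the error codes, the severity values, the default severity, and the abort threshold below are all placeholders chosen for illustration.

```python
DEFAULT_SEVERITY = 8  # assumption: unmapped error codes are treated as severe


def severity_for(error_code, levels):
    """Map an error code to its assigned level; 0 means the statement succeeded."""
    if error_code == 0:
        return 0
    return levels.get(error_code, DEFAULT_SEVERITY)


def run_steps(error_codes, levels, abort_at=8):
    """Walk through the error codes returned by successive statements.

    Stops at the first statement that pushes the highest level seen so far
    to `abort_at` or above. Returns (highest level, statements executed).
    """
    highest = 0
    executed = 0
    for code in error_codes:
        executed += 1
        highest = max(highest, severity_for(code, levels))
        if highest >= abort_at:
            break
    return highest, executed


levels = {3807: 4}  # placeholder: treat this code as a tolerable warning
print(run_steps([0, 3807, 0], levels))  # (4, 3): the script runs to the end
print(run_steps([0, 2801, 0], levels))  # (8, 2): unmapped code stops the script
```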
Interactive mode
You start a BTEQ session by entering bteq at the system prompt on your terminal, log on, and submit SQL commands to the database as needed. Format of the logon command:
bteq
.logon server_name/user_name,password
Batch mode
In batch mode, you prepare BTEQ scripts or macros and then submit them to BTEQ, from a scheduler or manually, for processing. A BTEQ script is a set of SQL statements and BTEQ commands saved in a file with the extension ".bteq". The script can be run from the command line in UNIX or Windows.
OPERATING MODES in BTEQ
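The slide does not show the batch-mode command itself. A commonly used form, assumed here, is to redirect the script into bteq on standard input and capture its output in a log file. The Python helper below only builds that command line for a POSIX shell; the function name and the file names are hypothetical.

```python
import shlex


def bteq_batch_command(script_path, log_path):
    """Build a shell command that feeds a BTEQ script to bteq on stdin.

    Assumption: `bteq < script > log` is the intended batch invocation.
    Paths are quoted for a POSIX shell; Windows quoting rules differ.
    """
    return "bteq < {} > {} 2>&1".format(
        shlex.quote(script_path), shlex.quote(log_path)
    )


print(bteq_batch_command("lab.bteq", "lab.log"))
# prints: bteq < lab.bteq > lab.log 2>&1
```

A scheduler can run the resulting string as its job command, which is what makes batch mode suitable for unattended processing.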
Export
By default, BTEQ responds to every SQL query with a status message and diagnostic information about the time taken to perform the query. If all of this is captured in a single output file, the mixed output is typically unsuitable for other purposes. The .EXPORT feature therefore lets you send the report or output data to a separate file. With .EXPORT in effect, the script's normal output contains only the messages and not the data; the data goes to the export file, which can then be used for other purposes.
Export types are .EXPORT RECORD, .EXPORT DATA, .EXPORT RESET, .EXPORT INDICDATA, and .EXPORT DIF.
EXPORT in BTEQ
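As a rough illustration of how an export script is laid out, the Python sketch below assembles one as text. The BTEQ commands used (.LOGON, .EXPORT REPORT FILE=, .IF ERRORCODE, .EXPORT RESET, .LOGOFF, .QUIT) follow the lab exercise in this deck; the helper name, the logon values, and the query are hypothetical placeholders.

```python
def build_export_script(tdpid, user, password, export_file, query):
    """Return the text of a minimal BTEQ script that exports one query result."""
    query = query.strip().rstrip(";")
    lines = [
        f".LOGON {tdpid}/{user},{password}",
        f".EXPORT REPORT FILE={export_file}",  # data rows go to this file
        f"{query};",
        ".IF ERRORCODE <> 0 THEN .EXIT 2",     # stop with return code 2 on failure
        ".EXPORT RESET",                       # stop redirecting data to the file
        ".LOGOFF",
        ".QUIT",
    ]
    return "\n".join(lines) + "\n"


print(build_export_script("tdprd", "username", "pwd", "lab.txt", "SELECT * FROM emp"))
```

Placing .EXPORT RESET after the query closes the export file, so anything the script does afterwards is reported only in the normal BTEQ output.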
Imports data from the host to Teradata as a series of inserts, updates, and deletes.
Import types supported are .IMPORT DATA, .IMPORT RECORD, and .IMPORT INDICDATA.
IMPORT in BTEQ
All BTEQ commands must be preceded by a dot (.) character; the terminating semicolon (;) is optional.
BTEQ commands are of four types:
- Session control
- File control
- Format control
- Sequence control
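The leading-dot rule is what separates BTEQ commands from SQL statements in a script. A minimal Python sketch of that distinction follows; it is a simplification that looks at one line at a time and ignores multi-line SQL statements and comments, and the sample script lines are hypothetical.

```python
def classify_line(line):
    """Label a script line as a BTEQ command, SQL text, or blank."""
    stripped = line.strip()
    if not stripped:
        return "blank"
    if stripped.startswith("."):
        return "bteq"  # dot-prefixed: handled by BTEQ itself
    return "sql"       # everything else is sent to the database


script = [
    ".logon tdprd/username,pwd",
    "select * from emp;",
    ".export reset;",
    ".logoff",
    ".quit",
]
for line in script:
    print(f"{classify_line(line):5} {line}")
```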
- Ad hoc query tool.
- Database administration.
- Best for small data volumes.
BTEQ Advantages
Lab.sh
#! /bin/sh
.logon tdprd/username,pwd;
.Export report File=lab.txt
.set record vartext "|";
.BEGIN LOADING emp ERRORFILES Error_1, Error_2;
DEFINE empno (VARCHAR(50)), empname (VARCHAR(50)), doj (VARCHAR(30)) FILE = /ngs/app/asrdedwp/SCRIPTS/emp.txt;
.Set Underline Off;
.Set Titledashes Off;
.Set Errorout Stdout;
.Set Width 4000;
select * from table_name where ...;
Delete from table_name where ...;
Insert into table_name values (...);
Update table_name set ... where ...;
Call macro, procedure etc.
.if errorcode <> 0 then .exit 2
.export reset
.logoff
.quit
Bteq Lab Exercise
FastLoad (also called FLoad or FL) is a multi-session parallel load utility for the initial bulk load of a table on a Teradata Database.
It is a command-driven utility that loads large volumes of data into an empty table with no secondary indexes on a Teradata RDBMS.
It uses multiple database sessions to load data.
- Full restart capability.
- Checkpoints are provided for restart.
- Checkpoints slow down FastLoad processing. Set the checkpoint interval large enough that a checkpoint is taken every 10 to 15 minutes.
- Two error tables and error limits, accessible using SQL.
- One error table receives the rows that failed because of constraint violations or translation errors. The other captures duplicate rows for UPIs.
- The error tables are loaded one row at a time, so errors slow down FastLoad performance.
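The 10-to-15-minute guideline can be turned into a row count with simple arithmetic. The Python sketch below assumes the checkpoint interval is expressed as a number of rows and that the load rate has been measured beforehand; the 5,000 rows-per-second figure is a made-up example, not a benchmark.

```python
def checkpoint_rows(rows_per_second, minutes):
    """Rows loaded in `minutes` at a steady `rows_per_second` load rate."""
    return int(rows_per_second * minutes * 60)


def checkpoint_range(rows_per_second, low_minutes=10, high_minutes=15):
    """Row-count window that keeps checkpoints 10 to 15 minutes apart."""
    return (
        checkpoint_rows(rows_per_second, low_minutes),
        checkpoint_rows(rows_per_second, high_minutes),
    )


print(checkpoint_range(5000))
# prints: (3000000, 4500000)
```

At the assumed rate, a checkpoint value between 3 and 4.5 million rows keeps restart points frequent enough to be useful without paying the checkpoint cost too often.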