1.1 Teradata Architecture

TERADATA Swapnil Mahalle (176191) [email protected]


Transcript of 1.1 Teradata Architecture

Page 1: 1.1 Teradata Architecture

TERADATA
Swapnil Mahalle (176191)
[email protected]

Page 2: 1.1 Teradata Architecture

What is Teradata?

Teradata is a relational database management system (RDBMS) that drives a company's data warehouse.

Teradata is an ideal foundation for many applications, including:

- Enterprise data warehousing
- Active data warehousing
- Customer relationship management
- Internet and E-Business
- Data marts

Page 3: 1.1 Teradata Architecture

Enterprise Data Warehousing

Teradata Database is ideal for enterprise data warehousing, which is commonly characterized by:

- Multiple subject areas
- Many concurrent users
- Many concurrent queries, including ad-hoc queries
- Large quantity of tables
- Hundreds of gigabytes (and terabytes) of detail data
- Historical data stored (months or years)

Page 4: 1.1 Teradata Architecture

Active Data Warehousing

Active data warehousing is the technical ability to capture transactions when they change, and integrate them into the warehouse.

An active data warehouse must deliver performance, scalability, availability, and data freshness.

The Teradata Warehouse supports active data warehousing with:

- Capability to handle thousands of additional users and mixed workloads.
- High availability and reliability to support mission-critical applications.
- Scalability to accommodate an increase in the amount of data, the number of data sources, and the number of applications supported in the data warehouse environment.

Page 5: 1.1 Teradata Architecture

Customer Relationship Management

CRM is the common term for managing prospects all the way through the entire sales process. CRM is often an entire data system that can be manipulated manually.

Teradata Database's detailed data and analysis capabilities help identify and optimize the business relationships with the highest potential for profitability and growth.

Teradata's CRM solution, Teradata Relationship Manager, consists of software, professional and customer services, and the Teradata Database to create, maintain, and enhance customer relationships.

Page 6: 1.1 Teradata Architecture

Internet and E-Business

The Teradata Database provides a single repository for customer information that helps E-Businesses build and maintain one-to-one customer relationships that are critical to their success on the Internet.

The Teradata Database allows E-Businesses to:

- Capture massive amounts of click-stream data.
- Enable multiple users to ask complex questions of the customers' click-stream data with near real-time response.
- Protect customers' privacy with consumer opt-in/opt-out preferences and the ability for consumers to check and revise their information stored on the Teradata Database through the Internet or a company call center.

Page 7: 1.1 Teradata Architecture

Data marts

A data mart is a special purpose subset of a company's enterprise data used by a particular department, function, or application.

Often, these single-subject area data marts contain data that was aggregated or transformed in some way to better handle the requests of a specific user community.

The Teradata Database is ideal for the logical data mart environment, where different user communities view subsets of a single repository of enterprise data.

Page 8: 1.1 Teradata Architecture

Unique Features

- Single data store
- Scalability
- Unconditional parallelism (parallel architecture)
- Ability to model the business
- Mature, parallel-aware Optimizer

Page 9: 1.1 Teradata Architecture

Single Data Store

Page 10: 1.1 Teradata Architecture

Scalability

"Linear scalability" means that as you add components to the system, the performance increase is linear.

Hardware
- SMP: Symmetric multiprocessing platform
- MPP: Massively parallel processing systems

Complexity
Teradata is adept at complex data models that satisfy the information needs throughout an enterprise. It can perform large aggregations at query run time and can perform up to 64 joins in a single query.

Concurrent Users
Teradata can handle a large number of concurrent users, who are often running multiple, complex queries.

Page 11: 1.1 Teradata Architecture

Unconditional Parallelism

Teradata provides exceptional performance to achieve a single answer faster than a non-parallel system. Parallelism uses multiple processors working together to accomplish a task quickly.
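The idea of several workers each handling a portion of one task can be sketched in a few lines. This is only an illustration of the divide, work, and combine pattern, not of how Teradata schedules work internally; the function names are hypothetical, and Python threads are used for structure rather than for real CPU speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, workers):
    """Split values into `workers` contiguous chunks of near-equal size."""
    size, remainder = divmod(len(values), workers)
    chunks, start = [], 0
    for i in range(workers):
        # The first `remainder` chunks take one extra element.
        end = start + size + (1 if i < remainder else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def parallel_sum(values, workers=4):
    """Sum each chunk in its own worker, then combine the partial sums."""
    chunks = split_into_chunks(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)
```

Each worker only ever sees its own chunk, which mirrors the way each unit of parallelism owns a share of the work and a final step merges the partial answers.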

Page 12: 1.1 Teradata Architecture

Ability To Model Business

A data warehouse built on a business model (truly normalized) contains information from across the enterprise. Individual departments can use their own assumptions and views of the data for analysis, yet these varying perspectives have a common basis for a "single version of the truth."

Page 13: 1.1 Teradata Architecture

Mature, Parallel-Aware Optimizer

Teradata's Optimizer is the most robust in the industry, able to handle:

- Multiple complex queries
- Multiple joins per query
- Unlimited ad-hoc processing

Page 14: 1.1 Teradata Architecture

DATA WAREHOUSING

Page 15: 1.1 Teradata Architecture

Evolution

Page 16: 1.1 Teradata Architecture

Various Stages of DW

1. Reporting: The initial stage typically focuses on reporting from a single source of truth to drive decision-making across functional and/or product boundaries.

2. Analyzing: Users perform ad-hoc analysis, slicing and dicing the data at a detail level, and are concerned with drilling down beneath the numbers on a report.

3. Predicting: Sophisticated analysts heavily utilize the system to leverage information to predict what will happen next in the business to proactively manage the organization's strategy. This stage requires data mining tools and building predictive models using historical detail.

4. Operationalizing: Providing access to information for immediate decision-making in the field enters the realm of active data warehousing. Stages 1 to 3 focus on strategic decision-making within an organization. Stage 4 focuses on tactical decision support. Tactical decision support is not focused on developing corporate strategy, but rather on supporting the people in the field who execute it.

5. Active Warehousing: The larger the role an ADW plays in the operational aspects of decision support, the more incentive the business has to automate the decision processes. As technology evolves, more and more decisions become executed with event-driven triggers to initiate fully automated decision processes.

Page 17: 1.1 Teradata Architecture

Evolution of Data Processing

An RDBMS is used in the following main processing environments:

- OLAP (online analytical processing)
- OLTP (online transaction processing)
- DSS (decision support systems)

Page 18: 1.1 Teradata Architecture

Environments

Page 19: 1.1 Teradata Architecture

Data Marts

A data mart is a special purpose subset of enterprise data used by a particular department, function or application. Data marts may have both summary and detail data for a particular use rather than for general use.

- Independent Data Marts
- Logical Data Marts
- Dependent Data Marts

Page 20: 1.1 Teradata Architecture

Data Marts

Page 21: 1.1 Teradata Architecture

A Teradata System

A Teradata system contains one or more nodes. A node is a term for a processing unit under the control of a single operating system.

Page 22: 1.1 Teradata Architecture

Node Components

Page 23: 1.1 Teradata Architecture

Software Components

A Teradata node requires three distinct pieces of software: TPA, PDE, OS.

Parallel Database Extensions (PDE)
The Parallel Database Extensions (PDE) software layer was added to the operating system by NCR to support the parallel software environment.

Trusted Parallel Application (TPA)
A Trusted Parallel Application (TPA) uses PDE to implement virtual processors (vprocs). The Teradata Database is classified as a TPA. The four components of the Teradata TPA are:

- AMP
- PE
- Channel Driver
- Teradata Gateway

Page 24: 1.1 Teradata Architecture

Parsing Engine

A Parsing Engine (PE) is a vproc that manages the dialogue between a client application and the Teradata Database. Each PE can support a maximum of 120 sessions.

- Session Control
- Parser
- Optimizer
- Dispatcher
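The 120-session limit per PE turns into simple sizing arithmetic. The sketch below assumes sessions are the only sizing constraint, which a real configuration would not; the function name is hypothetical.

```python
import math

MAX_SESSIONS_PER_PE = 120  # per-PE limit stated on the slide


def parsing_engines_needed(concurrent_sessions):
    """Return the minimum number of PEs that can hold the given sessions."""
    if concurrent_sessions < 0:
        raise ValueError("session count cannot be negative")
    return math.ceil(concurrent_sessions / MAX_SESSIONS_PER_PE)
```

For example, 121 concurrent sessions already need a second PE, because the first one is full at 120.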

Page 25: 1.1 Teradata Architecture

AMP

The AMP is a vproc that controls its portion of the data on the system. The AMPs work in parallel, each AMP managing the data rows stored on its vdisk.

Data Distribution
When data is loaded, inserted, and updated, the AMP:

- Receives incoming data from the PE.
- Formats rows and distributes them on its vdisk.

Data Access
- Returns responses over the BYNET to the Dispatcher.
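The slide does not say how a row ends up on a particular AMP. In Teradata the owning AMP is derived from a hash of the row's primary index value; the sketch below imitates that idea only. MD5 and the modulo step are stand-ins chosen for illustration (Teradata uses its own hashing and hash map), and all names are hypothetical.

```python
import hashlib
from collections import defaultdict


def amp_for_row(primary_index_value, amp_count):
    """Map a primary index value to an AMP number in [0, amp_count)."""
    # Stand-in hash: MD5 is NOT what Teradata uses; it is only deterministic.
    digest = hashlib.md5(str(primary_index_value).encode("utf-8")).digest()
    row_hash = int.from_bytes(digest[:4], "big")
    return row_hash % amp_count


def distribute(rows, key, amp_count):
    """Group rows by the AMP that would own them, keyed by AMP number."""
    vdisks = defaultdict(list)
    for row in rows:
        vdisks[amp_for_row(row[key], amp_count)].append(row)
    return dict(vdisks)
```

Because the mapping depends only on the key value, the same key always lands on the same AMP, which is what lets a later lookup go straight to the owning AMP.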

Page 26: 1.1 Teradata Architecture

BYNET

The BYNET (pronounced, "bye-net") is a high-speed interconnect (network) that enables multiple nodes in the system to communicate.

Features:

- Scalable
- High performance
- Fault tolerant
- Load balanced

Page 27: 1.1 Teradata Architecture

Communication Between Nodes

Page 28: 1.1 Teradata Architecture

Communication Between Vprocs

- Point-to-point
- Multicast
- Broadcast

Point-to-Point Messages

Page 29: 1.1 Teradata Architecture

Communication Between Vprocs

Multicast Messages

Broadcast Messages
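The three message types differ only in who receives the message: one vproc, a chosen group, or every vproc. A minimal sketch of that distinction, with a hypothetical function name and vproc labels, and no claim about how the BYNET actually routes traffic:

```python
def recipients(kind, vprocs, targets=()):
    """Return the vprocs that receive a message of the given kind."""
    if kind == "point-to-point":
        # Exactly one destination vproc.
        if len(targets) != 1:
            raise ValueError("point-to-point needs exactly one target")
        return [v for v in vprocs if v == targets[0]]
    if kind == "multicast":
        # A selected group of vprocs.
        wanted = set(targets)
        return [v for v in vprocs if v in wanted]
    if kind == "broadcast":
        # Every vproc in the system.
        return list(vprocs)
    raise ValueError(f"unknown message kind: {kind}")
```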

Page 30: 1.1 Teradata Architecture

Cliques

A clique (pronounced, "kleek") is a group of nodes that share access to the same disk arrays. Each multi-node system has at least one clique. The cabling determines which nodes are in which cliques: the nodes of a clique are connected to the disk array controllers of the same disk arrays.
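Since cabling decides clique membership, cliques can be derived from a description of which node is cabled to which disk arrays. The sketch below assumes every node in a clique is cabled to exactly the same set of arrays; the node and array names are hypothetical.

```python
from collections import defaultdict


def cliques_from_cabling(cabling):
    """Group nodes that are cabled to the same set of disk arrays.

    `cabling` maps a node name to the disk arrays it is connected to.
    """
    groups = defaultdict(list)
    for node, arrays in cabling.items():
        # Nodes with an identical set of arrays belong to one clique.
        groups[frozenset(arrays)].append(node)
    return sorted(sorted(nodes) for nodes in groups.values())
```

With `{"node1": {"A", "B"}, "node2": {"A", "B"}, "node3": {"C"}}` as input, `node1` and `node2` form one clique and `node3` forms another.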

Page 31: 1.1 Teradata Architecture

Database

In Teradata, a "database" provides a logical grouping of information.

- Databases
- Tables
- Views
- Macros
- Triggers
- Stored procedures

Page 32: 1.1 Teradata Architecture

USER

User: A Special Kind of Database

Page 33: 1.1 Teradata Architecture

Teradata Objects

Tables: A table in a relational database management system is a two-dimensional structure made up of columns and physical rows stored in data blocks on the disk drives.

Views: A view is like a "window" into tables that allows multiple users to look at portions of the same base data. A view may access one or more tables, and may show only a subset of columns from the table(s).

Macros: Macros are pre-defined, stored sets of one or more SQL commands and/or report-formatting (BTEQ) commands. Macros can also contain comments.

Triggers: A trigger is a set of SQL statements usually associated with a column or table that are programmed to be run (or "fired") when specified changes are made to the column or table. The pre-defined change is known as a triggering event, which causes the SQL statements to be processed.

Stored Procedures: A stored procedure is a pre-defined set of statements invoked through a single CALL statement in SQL. While a stored procedure may seem like a macro, it is different in that it can contain:

- Teradata SQL data manipulation statements (non-procedural)
- Procedural statements (in Teradata, referred to as Stored Procedure Language)
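The "window" behavior of a view can be tried out with any relational engine. The sketch below uses Python's built-in SQLite module purely as a stand-in, so the SQL dialect is SQLite's and not Teradata's; the table, view, and column names are invented for the example.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a relational engine.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (emp_id INTEGER, name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employee VALUES (1, 'Asha', 'Sales', 50000);
    INSERT INTO employee VALUES (2, 'Ravi', 'Finance', 65000);
    -- The view exposes only a subset of the base table's columns.
    CREATE VIEW employee_public AS
        SELECT emp_id, name, dept FROM employee;
    """
)


def view_columns(connection, view_name):
    """Return the column names visible through the given view."""
    cursor = connection.execute(f"SELECT * FROM {view_name}")
    return [column[0] for column in cursor.description]


def view_rows(connection, view_name):
    """Return all rows visible through the view, ordered by emp_id."""
    query = f"SELECT * FROM {view_name} ORDER BY emp_id"
    return connection.execute(query).fetchall()
```

Users who query `employee_public` see the same base rows as the table holds, but the `salary` column is not reachable through the view.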

Page 34: 1.1 Teradata Architecture

Creating Databases and Users

In Teradata, Databases (including the special category of Databases called Users) have attributes assigned to them:

- Access Rights
- Perm Space
- Spool Space
- Temp Space

Page 35: 1.1 Teradata Architecture

Space Management

Perm Space:

- This is where objects (databases, tables, users, macros) are created and physically stored.
- It is evenly distributed among all the AMPs to ensure reasonable data distribution.
- At the time of object creation, Teradata does not allocate space; instead, the assigned perm space acts as an upper limit that the objects consume dynamically.
- Once objects are dropped or data is deleted, the perm space is freed.
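The perm space rules above (a limit rather than a reservation, consumed on demand, released on delete) can be modeled as a simple quota. This is a toy model with hypothetical class and attribute names; it ignores the per-AMP split of the limit.

```python
class PermSpaceExceeded(Exception):
    """Raised when a write would push usage past the perm space limit."""


class Database:
    def __init__(self, name, perm_limit_bytes):
        self.name = name
        self.perm_limit_bytes = perm_limit_bytes
        # Nothing is allocated at creation time; usage starts at zero.
        self.current_perm_bytes = 0

    def store(self, nbytes):
        """Consume perm space dynamically, up to the assigned limit."""
        if self.current_perm_bytes + nbytes > self.perm_limit_bytes:
            raise PermSpaceExceeded(f"{self.name}: perm limit reached")
        self.current_perm_bytes += nbytes

    def free(self, nbytes):
        """Release perm space when data is deleted or an object is dropped."""
        self.current_perm_bytes = max(0, self.current_perm_bytes - nbytes)
```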

Page 36: 1.1 Teradata Architecture

Space Management

Spool Space:

- The amount of space on the system not allocated to any object, which is used to store intermediate results for further processing during Teradata query execution.
- Defining spool space is not required during object creation, but is recommended so that one query cannot consume all the available space.
- Once query processing completes, the spool space is freed.
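Spool space differs from perm space in its lifetime: it is held only while a query runs. A minimal sketch of that behavior follows, using a context manager so the space is released when the query ends, whether it succeeds or fails. The class and method names are hypothetical.

```python
from contextlib import contextmanager


class SpoolLimitExceeded(Exception):
    """Raised when intermediate results would exceed the spool limit."""


class SpoolAccount:
    def __init__(self, spool_limit_bytes):
        self.spool_limit_bytes = spool_limit_bytes
        self.in_use_bytes = 0

    @contextmanager
    def query(self):
        """Yield a function that records intermediate results for one query."""
        used_by_query = []

        def write_intermediate(nbytes):
            if self.in_use_bytes + nbytes > self.spool_limit_bytes:
                raise SpoolLimitExceeded("spool limit reached")
            self.in_use_bytes += nbytes
            used_by_query.append(nbytes)

        try:
            yield write_intermediate
        finally:
            # Spool is freed as soon as the query finishes or fails.
            self.in_use_bytes -= sum(used_by_query)
```

The limit is what stops a single runaway query: the write that would cross it fails, and everything that query had spooled is released.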

Page 37: 1.1 Teradata Architecture

Space Management

Temporary Space:

- The amount of space taken by global temporary tables during query processing.
- Perm space that is not yet occupied is used as temp space in Teradata.
- The result or inserted data remains available only for the duration of the session.
- Temp space is freed upon session completion.

Page 38: 1.1 Teradata Architecture

Data Dictionary

The Data Dictionary is a set of relational tables that contains information about the RDBMS and database objects within it. It is the metadata or "data about the data" for a Teradata Database. The Data Dictionary resides in Database DBC. Some of the major items it tracks are:

- Disk space
- Access authorizations
- Ownership
- Data definitions

Page 39: 1.1 Teradata Architecture

Data Protection

- LOCKS: Exclusive, Write, Read and Access
- RAID: Redundant Array of Inexpensive Disks
- FALLBACK
- JOURNALS: Permanent and Recovery

Page 40: 1.1 Teradata Architecture

Locks

- Database Locks: Apply to all tables and views in the database.
- Table Locks: Apply to all rows in the table.
- Row Hash Locks: Apply to a group of one or more rows in a table.

Types:

- Exclusive
- Write
- Read
- Access
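The four lock types differ in which other locks they can coexist with on the same object. The slide does not give the compatibility rules; the table below reflects the commonly documented Teradata behavior and should be read as an assumption of this sketch, as should the function name.

```python
# For each requested lock type, the held lock types it can coexist with.
# Assumed rules: Access is blocked only by Exclusive, Read tolerates
# Access and Read, Write tolerates only Access, Exclusive tolerates nothing.
COMPATIBLE_WITH = {
    "ACCESS": {"ACCESS", "READ", "WRITE"},
    "READ": {"ACCESS", "READ"},
    "WRITE": {"ACCESS"},
    "EXCLUSIVE": set(),
}


def can_grant(requested, held_locks):
    """Return True if `requested` is compatible with every lock already held."""
    allowed = COMPATIBLE_WITH[requested]
    return all(held in allowed for held in held_locks)
```

Under these rules a reader using an Access lock is not blocked by a writer, at the cost of possibly seeing data that is in the middle of being changed.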

Page 41: 1.1 Teradata Architecture

RAID 1 & RAID 5

Page 42: 1.1 Teradata Architecture

FALLBACK

Page 43: 1.1 Teradata Architecture

JOURNALS

Permanent Journals

- Optional, user specified, system maintained (rollback of transactions / recovery of DB).
- DBA/user intervention is required for recovery.

Recovery Journals

- An interrupted transaction (Transient Journal)
- An AMP failure (Down-AMP Recovery Journal)
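The Transient Journal handles an interrupted transaction by keeping a before-image of each changed row, so the changes can be undone. A minimal sketch of that mechanism over a plain dictionary follows; the class is hypothetical and leaves out everything about how the journal is stored or distributed.

```python
_MISSING = object()  # marks a key that did not exist before the transaction


class TransientJournal:
    def __init__(self, table):
        self.table = table
        self.before_images = []

    def update(self, key, new_value):
        """Record the before-image, then apply the change."""
        self.before_images.append((key, self.table.get(key, _MISSING)))
        self.table[key] = new_value

    def commit(self):
        """The transaction finished, so the before-images are discarded."""
        self.before_images.clear()

    def rollback(self):
        """Undo changes in reverse order using the saved before-images."""
        for key, old_value in reversed(self.before_images):
            if old_value is _MISSING:
                self.table.pop(key, None)
            else:
                self.table[key] = old_value
        self.before_images.clear()
```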

Page 44: 1.1 Teradata Architecture

THANK YOU
Swapnil Mahalle (176191)
[email protected]