Teradata - hu-berlin.de

Click here to load reader

  • date post

    11-Feb-2017
  • Category

    Documents

  • view

    266
  • download

    1

Embed Size (px)

Transcript of Teradata - hu-berlin.de

  • Teradata

    Architecture, Technology, Scalabilty, Performance and Vision for Active Enterprise Data Warehousing

    Dr. Barbara SchulmeisterTeradata a Division of NCR

    [email protected]

    28. 6. 2005

  • Agenda

    History Definitions Hardware Architecture Fault Tolerance and High Availability Coexistence Operational System Tools and Utilities Data Distribution SQL Parser Active Data Warehouse Scalability

  • Born to be parallel!

    1984 1985 1986 1987 1988 1989 1990 1991 1992 1994...19931979...

    Teradata Corp.

    Founded

    DBCModel 1:

    First MPP System!

    First 100GB

    System!

    Product of the Year Forbes

    First 500GB

    System!

    First 700GB

    System!

    DBC Model 3

    Fastest Growing Small

    Company INC

    Magazine

    Fastest Growing Electronic Company

    Electronic Business

    DBC Model 4

    Leader in Commercial

    Parallel Processing

    Gartner Group

    First Terabyte System!

    more

    3+ TBSystem!

    Teradata Timeline Overview

    First Beta system shipped Christmas to Wells Fargo Bank

    Joint Venture with NCR for next generation systems

    Initial public offering on Wall Street

  • Teradata Timeline (II)

    Over 500 Production

    Data Warehouses Worldwide!

    First Vendorto Publish1TB TPC-D

    Benchmark!

    Teradata V2 on

    WorldMark 4300

    DWI VLDB Best Practice Award w/ ATT BMD:

    Data Warehouse and

    the Web

    DB Expo RealwareAward w/

    Union Pacific: Data Warehouse

    Innovations

    Only Vendorto PublishMulti-user TPC-Ds!

    #1 in MPP IDC

    Survey in Computer-

    world

    Teradata Version 2on NCR

    3555 SMP

    24TBData Warehousein Production!

    Demonstrated Worlds Largest Data Warehouse

    Database at 11TB!

    Teradata V2on WorldMark

    5100SMP & MPP

    100GBTPC-D

    Benchmark Leader!

    ...only NCRs Teradata V2 RDBMS hasproven it can

    scale Gartner

    Group

    1996 1997 ......1995

    more

  • Teradata Timeline (III)

    Database Programming

    and Design Award

    IT Award of Excellence

    V2R5 Teradata

    TDWI Solution Provider Best Practices in Data Warehousing

    TDWI Leadership in Data Warehousing Award

    DM Review World-Class Solution Award for business Intelligence

    IT Times Award DM Review 100 Award DM Review Readership Award Intelligent Enterprise Real Ware Award

    the commitment continues

    64 bit Teradata

    Industry leading TPCH at 1TB and 3TB

    Largest Data Ware-house system (176 node, 130 TB disk)

    Industry leading TPC-D benchmark for all volumesTeradata V2

    ported to Microsoft

    Windows NTTeradata attains

    99.98% availability

    1999...1998 2000 2001 2002 2003

    V2R6 Teradata

    2004 2005

    Linux

  • Alternative Approaches to Enterprise Analytics

    Sources

    Users

    DW

    Sources

    Users

    DW

    Marts

    Sources

    Users

    Marts

    Sources

    Users

    Middleware

    Data Mart Centric

    Virtual,Distributed,Federated

    Hub-and-Spoke DataWarehouse

    EnterpriseData

    Warehouse

    Requires corporate leadership and vision

    Business Enterprise view challenging

    Redundant data costs High DBA and

    operational costs Data latency

    Only viable for low volume access

    Meta data issues Network bandwidth and

    join complexity issues Workload typically placed

    on workstation

    Business Enterprise view unavailable

    Redundant data costs High ETL costs High App costs High DBA and

    operational costs

    Cons

    Single Enterprise Business View

    Data reusability Consistency Low Cost of Ownership

    Allows easier customization of user interfaces & reports

    No need for ETL No need for separate

    platform

    Easy to Build Organizationally

    Limit Scope Easy to Build

    Technically

    Pros

    Centralized Integrated Data With

    Direct Access

    Hub-and-Spoke Data Warehouse

    Leave Data Where it Lies

    Independent Data Marts

  • A Spectrum of Data Warehouse Architectures

    Sources

    Users

    DW

    Sources

    Users

    DW

    Marts

    Sources

    Users

    Marts

    Sources

    Users

    Middleware

    Data Mart Centric

    Virtual,Distributed,Federated

    Hub-and-SpokeData

    Warehouse

    EnterpriseData

    Warehouse

    Teradatas Advocated Data Warehouse

    Approach for 20 years, Since 1984!

    The goal: Any question, on any data, at any time.

  • Most time consuming steps:

    ll Full scan of big tablesFull scan of big tablesll Complexe joinsComplexe joinsll AggregationAggregationll Sorting Sorting

    Frequency of steps OLTP or DSS

    OLTP DSS

    Diffentiating OLTP - DSS

  • NCR Server

    Provide customers with growth opportunities and investment protection> Coexistence is enabled

    across five generations NCR 5400E & 5400H

    Servers NCR 4980 & 5380 Servers NCR 4950 & 5350 Servers NCR 4900 & 5300 Servers NCR 485X & 525X Servers

    NCR Server Generations

    485X&

    525X

    4900&

    5300

    4950&

    5350

    4980&

    5380

    5400E&

    5400H

    BYNET V2 / V3

  • NCR 5400 Server SMP

    5400E > 1 - 4 nodes> BYNET V2> ESCON & FICON for 3 and

    4 node configurations> Field Upgradeable to

    5400H1

    3

    1

    3

    1

    3

    1

    3

    1

    3

    3rd Node

    4th Node

    Internal BYNET

    switches

    3GSM

    1st Node1st Node2nd Node2nd Node

    Ethernet Switches

    Three UPS Modules

    Up to 4 nodes within each cabinet

    Server Management

  • NCR 5400 Server MPP

    1

    3

    1

    3

    1

    3

    1

    3

    1

    3

    1

    3

    1

    3

    1

    3

    1

    3

    1

    3

    1

    3

    Ethernet Switches

    BYNET V3 Switches

    Five UPS Modules

    Up to 10 nodes within each cabinet

    Server Management

    Continued rapid adoption of latest Intel Technology> Dual Intel Pentium Xeon EM 64T 3.6

    GHz processors with Hyper-Threading(32-bit and 64-bit capability)

    > 800 MHz front side bus Industry Standard Form Factor

    > Up to 10 nodes per cabinet> Integrated BYNET V3 (provides the

    capability to physical separate systems between 300-600 meters)

    > Integrated Server Management> N+1 UPS> Dual AC

    Multi-Generation Coexistence > Investment protection

  • Industry CPU Performance per Core

    0

    500

    1000

    1500

    2000

    2500

    3000

    2004 2005 2006 2007

    Year

    Itanium 2 1.6 Ghz130nm

    Power 4+1.45Ghz130 nm

    Ultrasparc 3130 nm1.6Ghz

    Power 5~1.9Ghz130 nm

    Xeon3.0Ghz 1M130nm

    Xeon 3.6 Ghz 90nm

    54000Xeon 2M L2 3.6 Ghz 90nm

    Xeon2M L2>3.6 Ghz 90nm

    Dual Core65 nm

    Next Gen Arch. Dual Core65 nm

    Multi Core45 nm

    Itanium 29M130nm

    Power 5+~2.5Ghz90 nm

    Montecito90nm

    TukwillaCommon Platform65nm

    Rock90nm

    Power 6~3Ghz65nm

    Relative CPU Performance per Core

    Symmetric Multi Threading (Hyper Threading)

    Dual Core

    Xeon

    Itanium

    Power

    Sparc

    Multicore, Multithreaded

    Relative CPU Performance based on multi-threading and multi-core roadmap capabilities

    www.spec.org: benchmarks SPECint2000 and SPECint_rate2000

  • Gartner Product Ranking 2004 ASEM

    FUJITSU HP HP HP IBM NCR SUN

    P

    rim

    e-p

    ow

    er

    HP

    9000

    Inte

    gri

    ty

    Pro

    lian

    t

    pS

    erie

    s

    Ter

    adat

    a

    Su

    nfi

    re

    PRODUCT 43 45 46 29 45 54 40

    The Product category (which was called Technology in previous ASEM updates) focuses on the performance and reliability/availability aspects of each platform. In this category Teradata received a very strong 93.5% of total possible points and leads the IBM pSeries with 74.35% by 44 points or 19%.

    Source Gartner 2004 ASEM Report

  • NCR Enterprise Storage 6842

    NCR Enterprise Storage 6842 Features> Two array modules per cabinet> 56, 73GB, 15K drives

    greater than 8 Terabytes of spinning disk per cabinet

    > Dual Quad Fibre Channel Controllers per array for performance and availability

    > Typical configuration is 4 NCR 5400 Server nodes per 3 6842 arrays 1.2 Terabytes of database space per node (RAID 1)

    > Supports RAID 1 and RAID 5> Support for MP-RAS and Microsoft Windows Server 2003

    environments

  • 192

    MP-RAS and Windows

    RAID-1 Only

    MPP: supports 2, 3, or 4 nodes per cabinet

    73GB 15K RPM

    DMX 2000 M2

    RAID -1 OnlyRAID Options

    MP-RAS and WindowsOperating Environment

    MPP: supports 1 or 2 nodes per cabinetTeradata Use

    73GB 15K RPMDisks

    96Maximum Teradata disks

    DMX 1000 M2EMC Model

    EMC Symmetrix DMX

    Enterprise Fit Storage Standardization Extended storage life

    through Redeployment

  • Assumption: Compute and Storage Balance

    A balanced configuration is one where the storage I/O subsystem for each compute node is configured with enough disk spindles, disk controllers, and connectivity so that the disk subsystem can satisfy the CPU demand from that node.

    A supersaturated configuration also can satisfy the CPU demand from that node although the extra I/O may be underutilized.> This is useful for investment protection on certain

    upgrade paths.

    All system configurations discussed in this presentation are based on balanced or supersaturated compute nodes.

  • Node CPU and Storage I/O Balance

    Qu

    ery

    Resp

    on

    se T

    ime

    Best Query Response Time

    Effective NodeUtilization

    # of Disk Drives/Storage Capacity

    I/O

    Ban

    dw

    idth

    M

    B/

    sec

    OptimumNode/Storage

    Balance and

    Response Time

    Query R

    esponse Time

    Band

    widt

    h

    95% Node Utilization

    Industry wide, disk drive capacity is increasing at a faster rate than disk drive performance

  • Common Upgrades Applied

    Query Response Time Increases

    because you didnt add more compute power to support

    the additional raw data volume.

    SYSTEM with CURRENT Nodes

    Raw Data Volume

    Qu

    ery

    Resp

    on

    se T

    ime

    GROW RAW DATA VOLUME Performance more than adequate: Add more data to all nodes

  • LINEAR GROWTHMaintain Query Performance with more nodes

    Typical System Expansion

    SYSTEM with Current Nodes

    SYSTEM with More or Faster Nodes

    Query Response Time Remains Constant

    because you add proportionally more raw data volume as

    compute power.

    Scale out with Teradata by adding compute nodes,

    interconnect, storage arrays, and disks.

    aka horizontal scalability

    Raw Data Volume

    Qu

    ery

    Resp

    on

    se T

    ime

  • Common Upgrades Applied

    Query Response Time Decreases

    because you didnt add more raw data volume to

    offset the increase in compute power.

    Scale vertically with Teradata by increasing

    compute power.

    SYSTEM with Current Nodes

    SYSTEM with More or Faster Nodes

    Raw Data Volume

    Qu

    ery

    Resp

    on

    se T

    ime

    GROW QUERY PERFORMANCE Raw Data Volume adequate: Upgrade to faster CPUs

  • SYSTEM with Current Nodes

    SYSTEM with More or Faster Nodes

    Scale to Target query performance and data volume by increasing compute power

    and adding storage.

    Adjust Query Performance and Data Volume to match service level agreement

    Raw Data Volume

    Qu

    ery

    Resp

    on

    se T

    ime

    Combo: Upgrade Nodes and Increase Storage Per Node

  • Improve query performance and adjust data volume by

    reducing storage per node and adding more nodes.

    SYSTEM with Current Nodes

    SYSTEM with More Nodes

    Improve Query Performance and Adjust Data Volume to match service level agreement

    Raw Data Volume

    Qu

    ery

    Resp

    on

    se T

    ime

    Scaling by Reconfiguration and Expansion

  • CPU(s)

    CPU(s)

    CPU(s)

    CPU(s)

    CPU(s)

    CPU(s)

    CPU(s)

    CPU(s)

    Cache

    Cache

    Cache

    Cache

    Cache

    Cache

    Cache

    Cache

    Memory

    Memory

    Memory

    Memory

    Memory

    Memory

    Memory

    Memory

    Disk Storage

    Disk Storage

    Disk Storage

    Disk Storage

    Disk Storage

    Disk Storage

    Disk Storage

    Disk Storage

    BY

    NE

    T F

    ab

    rics

    Teradata Shared-Nothing MPP Designed for Slope of 1 Linear Scaling Optimized for very high data rates to/from disk Excellent performance and efficiency for data

    warehousing

    Architecture Determines Scalability

    Interconnect used only for database messages, no I/O or memory traffic

    All memory accesses are local

    CPU uses independent direct I/O path to Disk

  • Teradata MPP Architecture

    Nodes> Incrementally scalable to

    1024 nodes> Windows or Unix

    Storage> Independent I/O> Scales per node

    BYNET Interconnect> Fully scalable bandwidth

    Connectivity> Fully scalable> Channel ESCON/FICON> LAN, WAN

    Server Management> One console to view

    the entire system

    SMP Node1 SMP Node2 SMP Node3 SMP Node4

    Server Management

    CPU1 CPU2 CPU1 CPU2 CPU1 CPU2 CPU1 CPU2

    Dual BYNET Interconnects

    Memory Memory Memory Memory

  • Teradata Node SW Architecture (SMP) 4-Node MPP Clique

    Perfectly Tuned Nodes Working in Parallel for Scalability and Availability

    Node Software ArchitectureD

    isk A

    rray

    VP

    RO

    CS

    BYNET

    VPROCsAMP & PE

    VPROCsAMP & PE

    VPROCsAMP & PE

    VPROCsAMP & PE

    CommunicationInterfaces

    Access Module

    ProcessorVPROCS

    Parsing EngineVirtual

    Processors(VPROCS)

    UNIX, Windows 2000

    Parallel Database Extensions (PDE)

    PE2

    PE1 AMP1 AMP5

    AMP2 AMP6

    AMP3 AMP7

    AMP4 AMP8

    Channel Gateway

    LAN Gateway

    PEs recieve the queries and figure out the query planAMPs interact with the disk arrays and process the data

    Parser

    PE VProc

    Optimizer

    DispatcherSessionControl

    Relational Database Management

    File System / Data Management

    VAMP

  • Specifically designed for data warehousing workloads

    The Scalable BYNET Interconnect

    Multiple SimultaneousPoint-to-Point Messaging

    BroadcastMessaging

    Node

    Node

    Node

    Node

    Node

    Node

    Node

    Node

    Node

    The Teradata Optimizer

    chooses between Point-to-Pointand Broadcast

    Messaging to select the most effective communication.

    Bandwidth scales linearly to 1,024 nodes Redundant, fault tolerant network Guaranteed message delivery

    Node

  • Traditional Configuration

    X

    Built-In Integrated Fail Over

    Teradata provides built-in node failover.> Cost effective> Easy to deploy

    Work migrates to the remaining nodes in the cliques. System performance degradation up to 33%.

  • Node Node Node Node Node Node Node Node

    Disk Array Disk Array Disk Array Disk Array Disk Array Disk Array

    86% SystemPerformance

    Continuity

    Large Cliques

    Double the number of nodes in a clique up to 8. Work distributed across a greater number of nodes. Minimize system performance impacts may not be

    noticeable to end-users.

    XFibre Channel

    Switches

  • Node Node Node Hot Standby

    Disk Array Disk Array Disk Array

    100% System Performance

    Continuity

    Hot Standby Nodes

    Work re-directed to a Hot Standby Node. No system performance impacts. Teradata restart can be postponed to a maintenance

    window.

    X

  • Node Node Node Node Node NodeNode Hot Standby

    Disk Array Disk Array Disk Array Disk Array Disk Array Disk Array

    100% System Performance

    Continuity

    Large Clique + Hot Standby Node

    Same performance benefits of Hot Standby node. Reduced costs for larger system implementations.

    Fibre Channel Switches

    X

  • BYNET

    Server Nodes

    DiskArray Subsystem

    Clique

    High Availability

    Case Hardware Teradata

    Power FailureUPS (redundant),

    Dual ACNode Failure VPROC Migration (VAMP, PE)Bynet failure Redundant BYNET

    Disc failureRAID-1/-5/-S in Disc

    SubsystemMore than one Disc Failure Fallback-Option

    Clique Failure Fallback-Option

  • Coexistence Considerations

    VAMPs manage the same amount of data Coexistence enables the faster nodes to be realized by

    running more VAMPs per node

    1x 1.5x 2.0xPerformance Factor

    Generation xVAMPS

    AMP AMP AMPAMP

    AMP AMP AMPAMP

    Generation xVAMPS

    AMP AMP AMPAMP

    AMP AMP AMPAMP

    AMP AMP

    AMP AMP

    Generation xVAMPS

    AMP AMP AMPAMP

    AMP AMP AMPAMP

    AMP AMP AMPAMP

    AMP AMP AMPAMP

  • SMP Node1 SMP Node2 SMP Node3 SMP Node4

    AMP AMP AMP AMP AMP AMP AMP AMP AMP AMP AMP AMP

    System Expansion with Teradata Coexistence

    The utilization of multiple generations of hardware within a single Teradata MPP system

    SMP Node1 SMP Node2 SMP Node3 SMP Node4

    Server Management

    AMP AMP AMP AMP AMP AMP AMP AMP AMP AMP AMP AMP

    Dual BYNET Interconnects

    5380 - 9 AMPs per node 5400 - 12 AMPs per node

  • 11/2000Original Footprint

    8 Node 5250

    Customer Example (72 Nodes, 4 Generations)

    Generation A

    1/2001 Expansion 1

    12 Node 5250

    11/2001 Expansion 28 Node 5255

    6/2002 Expansion 312 Node 5300

    Generation B

    6/2003 Expansion 416 Node 5350

    Generation C

    6/2004 Expansion 5

    16 Node 5380

    Generation D

    2005Future 5400Expansion

    Generation E

  • Database and Operating System

    Database> Teradata V2R6> Support one Release Back V2R5.1.X (Current Exception in Place V2R5.0.3)

    Unix> MP-RAS 3.03 required for Teradata Database on 5400> MP-RAS 3.02 still supported on previous server generations

    Microsoft Windows> Microsoft Windows Server 2003 recommended for new and expanding 5400 motions> Microsoft Windows 2000 supported in 2Q 2005

    485x/525x 4900/5300 4950/5350 4980/5380 5400

    V2R5.0.3, V2R5.1.X and V2R6 on MP-RAS 3.02

    V2R6 on WS03 (2Q 2005)

    V2R5.0.3, V2R5.1.X, V2R6 on MP-RAS 3.03

    V2R5.0.3, V2R5.1.X, V2R6 on W2K (2Q 2005)

  • Analytic extensions such as Extended Windows Functions & Multiple Aggregate Distincts

    Random Stratified Sampling

    Join Elimination Extended Transitive

    Closure Partial Group By Early Group By Derived Table Rewrite Very Large SQL

    Tactical & Event-DrivenDecision Making Partial Covering Join Index Global Index Sparse Index Join Index Extensions ODS Workload Optimization Stored Procedures Enhancements

    Single Version ofthe Truth

    Cylinder Read Partitioned Tables (PPI) Value List Compression 2000 Columns, 64 Columns per Index Identity Column

    Index WizardStatistics Wizard & Collect Statistics Improvements

    Query LogExtreme Workload Management & Administration

    Roles and ProfilesSQL Assistant/Web Edition

    AvailabilityPerformance Dashboard & Reporting

    Trusted,Integrated Environment

    Strategic Decision Making Extended Grouping Inner Join Optimization Eliminate Unnecessary

    Outer Joins Hash Joins UDFs for Complex Analysis

    and Unstructured Data

    V2R5.0 Features

    Security enhancements (Encryption) DBQL enhancements Database Object Level Use Count ROLES enhancements Priority Scheduler enhancements TDQM enhancements No Auto Restart After Disk-Array Power

    Failure Cancel Rollback Incompatible Package Warning Disk I/O Integrity Check

    Enhancements to Triggers Extra FK-PK Joins in Join Index UDFs for XML Processing etc.

    Enhancement to Identity Column UTF16 Support PPI Dynamic Partition Elimination Large Objects (LOBs)

    + V2R5.1 Features

    Continuous Update Performance & Manageability

    Faster Join Index Update Join Update Performance Bulk Update Performance Teradata Warehouse Builder Full Functionality

    & Platform Support

    Continuous Update Performance & Manageability

    Faster Join Index Update Join Update Performance Bulk Update Performance Teradata Warehouse Builder Full Functionality

    & Platform Support

    Data FreshnessData Freshness

    UDFs for Data Transformation and Scoring

  • V2R6.0 Feature List

    Remove 1MB limit on plan cache size Increase response buffer to 1MB Table header expansion Improve Random AMP Sampling Top N Row Operation Recursive Queries

    Tactical & Event-Driven DecisionMaking Improve Primary Index Operations\ Improved IN-list processing External Stored Procedures Trigger calling a Stored Procedure Stored Procedure Internals Enhancements Queue tables

    Single View of Your Business

    Stored Procedure LOB support External Table Function Partition level BAR Eliminate indexed row IDs (PPI) PPI Join performance improvement DBS Information consolidation

    Trusted,Integrated Environment

    Strategic Decision Making

    Teradata Dynamic Workload Management

    Extensible User Authentication Directory Integration Global deadlock logging Faster Rollbacks

    Data FreshnessData Freshness

    Replication Services Array support Priority Scheduler enhancements Reduce restart time

  • Teradata Utility PakTeradata AdministratorTeradata SQL AssistantTeradata SQL Assistant/Web Edition BTEQODBCJDBCCLIOLE DB Provider

    Teradata Tools & Utilities (1)

    Teradata Database

    Any Query,Any Time

    .com

    Database ManagementTeradata ManagerTeradata Dynamic Query Manager Teradata System Emulation ToolTeradata Visual ExplainTeradata Index WizardTeradata Statistics Wizard

    MetadataTeradata Meta Data Services

    Load/UnloadTeradata Warehouse Builder FastLoad, MultiLoad & FastExportTeradata TPumpAccess Modules

    Mainframe ConnectivityMainframe Channel ConnectTS/API, CICS, HUTCNS & IMS/DC

  • Teradata Warehouse

    ParallelIn

    ParallelOut

    Technical Differentiator: Database Utilities

    Teradata Utilities Are Robust and Mature

    Teradata utilities are fully parallel. Teradata utilities have checkpoint restart capability. Data loads directly from the source into the database.

    > No manual data partitioning.> No file splitting.> No intermediary file transfers.> No separate data conversion step.

  • Teradata Tools & Utilities (2)

    Teradata Data Profiler Teradata CRM Teradata Warehouse Miner Teradata Demand Chain Management Teradata Supply Chain Management .....

    (LDM = Logical Data Model)

    Financial Solution LDM Retail LDM Communication LDM Insurance/Healthcare LDM Manufacturing LDM Government LDM Media and Entertainment LDM Travel/Transportation LDM

  • Two Basic Software Architecture ModelsTask Centric and Data Centric

    Uniform and shared accessto all platform resources(disk, etc) is REQUIRED

    Exclusive access to a subset of resources

    Data

    Data Data

    Data

    Parallel Unit Parallel Unit

    ParallelOptimizerTask

    Request

    DATA DATA

    Task

    Request

    SharedMemory

    Request Request

  • Data Centric Software: Teradata Virtual AMP

    Each virtual AMP has rows from every table Each virtual AMP works independently on its rows Goal: Database rows are equally distributed across

    multiple tables

    16151413121110987654321

    282726252423222120191817

    Table A Table B Table C

    2521171395

    26221814106

    27231915117

    28242016128

    1 2 3 4

    P

    DM

    P

    DM

    P

    DM

    P

    DM

    AMP 1 AMP 2 AMP 3 AMP 4

    Tables AMP - Balanced collection of three abstracted platform resources

    P

    DMMemory Disk

    Processor

  • Data distribution by Primary Index

    Node1 Node2 Node3 Node4 Node1 Node2 Node3 Node4 Node1 Node2 Node3 Node4 Node1 Node2 Node3 Node4

    Dual BYNET Interconnects

    Row Hash (32 bits)

    Primary Index value for a row

    Hashing Algorithm

    Hash Map

    Current configuration PrimaryCurrent configuration FallbackReconfiguration PrimaryReconfiguration Fallback

    Destination Selection Word(DSW) first 16 bits

  • Teradata Hashing

    7225

    32 bit Row Hash Remaining16 bitsDSW #

    Hashing Algorithm

    0000 0000 0001 1010 1100 0111 0101 1011

    AMP 1 AMP 2 AMP x

    7225 2 4/13 O

    AMP 3

    V V V V0 0 1 A

    (Hexadecimal)

    000 01 02 03 04 05 06 07 08 01 02 03 04 05 001 02 03 04 05 06 07 08 01 02 03 04 05 001 02 03 04 05 06 07 08 01 02 03 04 05 001 02 03 04 05 06 07 08 01 02 03 04 05 001 02 03 04 05 06 07 08 01 02 03 04 05 001 02 03 04 05 06 07 08 01 02 03 04 05 0

    HASH MAP

    0 1 2 3 4 5 6 7 8 9 A B C D E F

    001002003004005

    Bucket Number

    SELECT * FROM ORDERWHERE order_number = 7225;

    Table ORDERO r d e r

    N u m b e rC u s to m e r

    N u m b e rO r d e rD a te

    O r d e rS t a t u s

    P K

    U P I

    7 3 2 57 3 2 47 4 1 57 1 0 37 2 2 57 3 8 47 4 0 27 1 8 87 2 0 2

    231121312

    4 / 1 34 / 1 34 / 1 34 / 1 04 / 1 54 / 1 24 / 1 64 / 1 34 / 0 9

    OOCOCCCCC

  • Primary Index Choice Criteria

    ACCESSMaximize one_AMP operations: choose the column most frequently

    used for access

    DISTRIBUTIONOptimize parallel processing: choose a column that provides good

    distribution

    VolatilityReduce maintenance resource overhead (I/O): choose a column

    with stable data values

  • Data distribution by Primary Index - 2

    48 bit table ID 32 bit row hash value Index value

    Node1 Node2 Node3 Node4 Node1 Node2 Node3 Node4 Node1 Node2 Node3 Node4 Node1 Node2 Node3 Node4

    Dual BYNET Interconnects

    SQL requestParser algorithm

    Logical block identifier

  • SQL Parser Overview

    Request Parcel

    AMP Steps

    Statistics

    Data Parcel

    Generator

    GNCApply

    Resolver

    Optimizer

    Syntaxer

    Security

    Cached?

    DDDBase, AccRightsTVM, TVFieldsIndexes

    Costs

    Serial stepsParallel stepsIndividual and common steps (MSR)Additional:Triggers, check constraints, references, foreign keys, join indexescollected statistics or dynamic sampling

  • Statistics Summary

    Collect statistics on all non-unique indexes UPI of any table with less than x rows per AMP

    (dependent on available number of AMPs) All indexes of a join index any non-indexed column used for join constraints indexes of global temporary tables

    Collected statistics are not automatically updated by the system

    Refresh statistics when 5-10% of the table rows have changed

  • Database Workload Continuum

    EnterpriseData

    Warehouse

    Strategic Decision RepositoriesTransactional Repositories

    OLTPi OLTPnOLTP1 ODS1

    Tactical Decision Repositories

    ODS2

    Workload Continuum

    Transactional (OLTP)

    User Profiles Customers Clerks

    Services: Transactions Bookkeeping

    Access Profile: Frequent updates Occasional lookup

    Data: Current state data Limited history Narrow Scope

    Strategic (EDW)

    User Profiles Back Office Services Management Trading Partners

    Services: Strategic decisions Analytics (e.g. scoring)

    Access Profile: Bulk Inserts Some Updates Frequent complex analytics

    Data Model: Periodic state data Deep history Enterprise integrated view

    Tactical (ODS)

    User Profiles Front Line Services Customers - Indirectly

    Services: Lookups Tactical decisions Analytics (e.g. scoring)

    Access Profile: Continuous updates Frequent lookups

    Data Model: Current state data Recent history Integrated business areas

  • Database Workload Continuum

    ActiveData

    Warehouse

    Tactical and Strategic Decision RepositoriesTransactional Repositories

    OLTPi OLTPnOLTP1

    Workload Continuum

    Transactional (OLTP)

    User Profiles Customers Clerks

    Services: Transactions Bookkeeping

    Access Profile: Frequent updates Occasional lookup

    Data: Current state data Limited history Narrow Scope

    Strategic (EDW)

    User Profiles Back Office Services Management Trading Partners

    Services: Strategic decisions Analytics (e.g. scoring)

    Access Profile: Bulk Inserts Some Updates Frequent complex analytics

    Data Model: Periodic state data Deep history Enterprise integrated view

    Tactical (ODS)

    User Profiles Front Line Services Customers - Indirectly

    Services: Lookups Tactical decisions Analytics (e.g. scoring)

    Access Profile: Continuous updates Frequent lookups

    Data Model: Current state data Recent history Integrated business areas

  • Data Sophistication

    Continuous update and time-sensitive queries

    become important

    OPERATIONALIZINGWHAT IS happening?

    Event-based triggering takes hold

    ACTIVATINGMAKE it happen!

    Primarily batch andsome ad hoc reports

    Increase in ad hoc analysis

    ANALYZINGWHY

    did it happen?

    REPORTINGWHAT

    happened?Analytical modeling

    grows

    PREDICTINGWHAT WILL

    happen?

    Batch

    Ad Hoc

    Analytics

    Continuous Update/Short Queries

    Event-Based Triggering

    Data Warehouse Needs Will Evolve

    Query complexity grows Workload mixture grows Data volume grows Schema complexity grows Depth of history grows Number of users grows Expectations grow

    Single View of the Business Better, Faster Decisions Drive Business Growth

    Wo

    rklo

    ad

    Co

    mp

    lexit

    y

    SimultaneousWorkloads:

    Strategic, tactical, loading

    Increasing depth and breadth of

    users and queries

    Increasing depth and breadth of data

  • Data Sophistication

    Continuous update and time-sensitive queries

    become important

    OPERATIONALIZINGWHAT IS happening?

    Event-based triggering takes hold

    ACTIVATINGMAKE it happen!

    Primarily batch andsome ad hoc reports

    Increase in ad hoc analysis

    ANALYZINGWHY

    did it happen?

    REPORTINGWHAT

    happened?Analytical modeling

    grows

    PREDICTINGWHAT WILL

    happen?

    Batch

    Ad Hoc

    Analytics

    Continuous Update/Short Queries

    Event-Based Triggering

    Data Warehouse Needs Will Evolve

    Query complexity grows Workload mixture grows Data volume grows Schema complexity grows Depth of history grows Number of users grows Expectations grow

    Single View of the Business Better, Faster Decisions Drive Business Growth

    Wo

    rklo

    ad

    Co

    mp

    lexit

    y

    Measure

    Chasm from static to dynamic decision-

    making

    Optimize

    Execute

    Automate

    Understand

  • Database Requirement:Data Warehouse

    Foundation must handle multi-dimensional growth!

    Data Sophistication

    Continuous update and time-sensitive queries

    become important

    OPERATIONALIZINGWHAT IS happening?

    Event-based triggering takes hold

    ACTIVATINGMAKE it happen!

    Primarily batch andsome ad hoc reports

    Increase in ad hoc analysis

    ANALYZINGWHY

    did it happen?

    REPORTINGWHAT

    happened?Analytical modeling

    grows

    PREDICTINGWHAT WILL

    happen?

    Batch

    Ad Hoc

    Analytics

    Continuous Update/Short Queries

    Event-Based Triggering

    Data Warehouse Needs Will Evolve

    Query complexity grows Workload mixture grows Data volume grows Schema complexity grows Depth of history grows Number of users grows Expectations grow

    Single View of the Business Better, Faster Decisions Drive Business Growth

    Wo

    rklo

    ad

    Co

    mp

    lexit

    y

  • The Multi-Temperature Warehouse

    Customers desire deep historical data in the warehouse.> The access frequency or average temperature of data

    varies. HOT, WARM, COOL, dormant

    > Seamless management required.

    Teradata systems can address this need through a combination of technologies, such as:> Partitioned primary index (PPI).> Multi-value compression.> Priority scheduler.

  • Priority Scheduler

    Three Tiers of Workload Management

    ADW

    Database Query Log

    Teradata DynamicQuery Manager

    Pre-Execution

    Post-Execution

    Control what and how much is allowed to begin execution

    Manage the level of resources allocated to different prioritiesof executing work

    (prioritized queues)

    Analyse query performance and behavior after completion

  • Teradata Dynamic Query Manager

  • Indexes

    PI (UPI and NUPI) SI (USI and NUSI) Join Index

    single table indexmulti table indexaggregated indexsparse index (where clause used)partial covering global

    Materialized Views (join index)

  • An Integrated, Centralized Data Warehouse Solution Database Must Scale in Every Dimension

    Data Volume(Raw, User Data)

    SchemaSophistication

    QueryFreedom

    QueryComplexity

    DataFreshness

    Query Data Volume

    QueryConcurrency

    MixedWorkload

  • An Integrated, Centralized Data Warehouse Solution Database Must Scale in Every Dimension

    Data Volume(Raw, User Data)

    SchemaSophistication

    QueryFreedom

    QueryComplexity

    DataFreshness

    Query Data Volume

    QueryConcurrency

    MixedWorkload

    TheTeradata

    Difference

  • Data Volume(Raw, User Data)

    Customers Need to Evaluate Real Life

    Workloads

    SchemaSophistication

    QueryFreedom

    QueryComplexity

    DataFreshness

    Query Data Volume

    The Teradata DifferenceMulti-dimensional Scalability

    QueryConcurrency

    MixedWorkload

    Good Example?TPC-H

    Benchmark

  • Companies generating >80% of the industry revenue utilize Teradata Data Warehousing

    Teradata Experiencein the Communications Industry

  • Some of Teradatas Retail Customers Worldwide

  • 65% of Top Global Retailers

    60% of Top Most Admired

    Global Companies

    80% of Top Global Telco Firms

    70% of Top Global Airlines

    50% of the Top Transportation

    Logistic Firms

    Teradata is Well-Positioned in the Top Global 3000 Industries

    Leading industries> Banking > Government> Insurance & Healthcare> Manufacturing > Retail> Telecommunications> Transportation Logistics> Travel

    World class customer list> More than 750 customers> Over 1200 installations

    Global presence > Over 100 countries

    FORTUNE Global Rankings, April 2005

  • Industry Leaders Use Teradata

    Leading industries> Banking > Government> Insurance & Healthcare> Manufacturing > Retail> Telecommunications> Transportation Logistics> Travel

    World class customer list> More than 750 customers> Over 1200 installations

    Global presence > Over 100 countries

    Teradata Global 400 Customers

    32% ofFinancial Services

    Industry

    54% of Retailers

    50% ofTransportation

    Industry

    50% of Telco Industry

    19% of Manufacturers

  • Data Volume(Raw, User Data)

    SchemaSophistication

    QueryFreedom

    QueryComplexity

    DataFreshness

    Query Data Volume

    QueryConcurrency

    MixedWorkload

    TheTeradata

    Difference

    www.teradata.com