Teradata - Architecture of Teradata

  • Date posted: 14-Apr-2017
  • Category: Technology

Transcript of Teradata - Architecture of Teradata

  • Introduction To Teradata

    *NCR Teradata Internal Use only*

  • Teradata Company Highlights

    Founded 1979, West LA
    First product to market: 1984
    First terabyte system: 1987
    Acquired by AT&T and merged with the acquired NCR: 1992
    Tri-vested as part of NCR: 1997
    Teradata Corporation (re)launched: October 1, 2007
    Global leader in enterprise data warehousing: EDW/ADW database technology, analytic solutions
    Positioned in Gartner's Leaders Quadrant in data warehousing since 1999
    Top 10 U.S. publicly traded software company
    S&P 500 member
    Listed on NYSE: TDC
    2007 revenue: $1.7B

  • Continuous (R)evolution

    Hardware
    + Database
    + Consulting
    + Data models and reports
    + Analytic applications

  • Continuous (R)evolution

    Sell the HW, give everything else away
    Sell the SW with some HW to run on
    Sell solving business problems, and the technology to solve them
    Sell applications with consulting, SW, and HW inside

  • Continuous (R)evolution

    80286: 90% R&D, 10% integration
    i486: 70% R&D, 30% integration
    Pentium: 20% R&D, 80% integration
    Xeon Quad Core: 10% R&D, 90% integration

  • Scale

    Every dimension of the technology must scale to meet today's requirements: data, data model complexity, users, performance, queries, data loading.

    What is a big data warehouse?
      Total spinning disk: 2.5 petabytes
      Big table: 150 billion rows
      Number of tables: 300,000
      Inserts/updates per day: 5 billion records
      Identified users: 100,000
      Queries per day: 5 million
      Data turnover rate: 1 TB per 5 seconds
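To put the daily figures on the Scale slide on the same footing as the turnover rate, they can be converted to average per-second rates. This is a back-of-the-envelope sketch: it assumes the daily volume is spread evenly over 24 hours (real workloads peak well above the average) and uses decimal units (1 TB = 1,000 GB).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(count_per_day: float) -> float:
    """Average rate, assuming the daily volume is spread evenly over 24 hours."""
    return count_per_day / SECONDS_PER_DAY


inserts_updates_per_sec = per_second(5_000_000_000)  # 5 billion records/day
queries_per_sec = per_second(5_000_000)              # 5 million queries/day
turnover_gb_per_sec = 1_000 / 5                      # 1 TB (1,000 GB) every 5 seconds

print(f"Inserts/updates: ~{inserts_updates_per_sec:,.0f} per second")
print(f"Queries:         ~{queries_per_sec:,.0f} per second")
print(f"Data turnover:   {turnover_gb_per_sec:,.0f} GB per second")
```

On average that is roughly 58,000 row changes and 58 queries every second, running alongside 200 GB/s of data turnover.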

  • The Problem

    Operational systems: Accts. Payable, Accts. Receivable, Invoicing, Sales/Orders, Finance G/L, Customer Support, HR, Payroll, Purchasing, Order Fulfillment, Manufacturing, Inventory

    Decision makers: Marketing, Supply Chain, Finance, Risk Management, Maintenance, Sales, Operations, Inventory, Call Center

  • The EDW Solution

    Operational systems: Accts. Payable, Accts. Receivable, Invoicing, Sales/Orders, Finance G/L, Customer Support, HR, Payroll, Purchasing, Order Fulfillment, Manufacturing, Inventory

    Enterprise Data Warehouse (EDW)

    Decision makers: Marketing, Supply Chain, Finance, Risk Management, Maintenance, Sales, Operations, Inventory, Call Center
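One way to compare the Problem and EDW Solution slides is to count interfaces. The sketch below assumes, for illustration only, that without a warehouse every decision-maker group needs its own feed from every operational system (the worst case), while with the EDW each system loads the warehouse once and each group reads from it once. The system and group names are taken from the slides.

```python
operational_systems = [
    "Accts. Payable", "Accts. Receivable", "Invoicing", "Sales/Orders",
    "Finance G/L", "Customer Support", "HR", "Payroll", "Purchasing",
    "Order Fulfillment", "Manufacturing", "Inventory",
]
decision_makers = [
    "Marketing", "Supply Chain", "Finance", "Risk Management", "Maintenance",
    "Sales", "Operations", "Inventory", "Call Center",
]


def point_to_point_links(sources: list[str], consumers: list[str]) -> int:
    # Worst case: a dedicated feed from every source to every consumer.
    return len(sources) * len(consumers)


def hub_links(sources: list[str], consumers: list[str]) -> int:
    # Each source loads the EDW once; each consumer reads from the EDW once.
    return len(sources) + len(consumers)


print(point_to_point_links(operational_systems, decision_makers))  # 108
print(hub_links(operational_systems, decision_makers))             # 21
```

Under the same assumption, adding a thirteenth operational system adds nine new feeds in the point-to-point model but only one load into the EDW.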

  • Active Enterprise Intelligence: An Obvious Trend, More Speed, More Users

    [Figure: from days to seconds, from strategic intelligence to operational intelligence; layers: Enterprise Data Warehouse; BI tools & reports; analysis & visualization; predictive analytics; EDW enterprise integration; mixed workload management; SOA, BPMS, IDEs; portals/composite applications]

  • Active Enterprise Intelligence Enabled by an Active Data Warehouse

  • Active Enterprise Intelligence in Retail: Detecting Retail Fraud

    Situation: Thieves make copies of cash register receipts, walk into the store, pick up merchandise, and return the items for cash.
    Problem: Associates in the returns department could not retrieve historical POS receipts, so they could not check a receipt against previous returns or handle returns made without a receipt.
    Solution: Associates query Teradata to quickly check whether a return has already been made on that receipt number. Analysts also use the data to understand and prevent excessive returns.

    Impact (for a 500-store chain):
      100% ROI in 5 months
      Stopped a crime ring on the first day of rollout
      Cost savings have been huge

    Retail fraud is a $16B-a-year problem in the USA alone. With web receipts and better copying capabilities, thieves can make multiple copies of a single receipt and return items multiple times for cash or other merchandise. They can also bring back shoplifted items and try to exchange them for cash.

    The problem is that associates in the returns department often don't have access to past sales information and can't easily keep track of returned merchandise. This is especially problematic if the policy allows returns without receipts.

    So the solution is straightforward: connect the point-of-sale systems so that, within seconds, the Teradata data warehouse is updated with sales, return, exchange, and void data, and give the returns department the customer's entire purchase history, so they can ensure that a sold product can only be returned once.

    The impact? Huge, according to one Teradata customer that has already built this system. They stopped a crime ring on the first day of their rollout, a group that had defrauded the company of thousands of dollars. They saw a 100% payback on their investment in just 5 months, and continue to reap the benefits of this application of Active Enterprise Intelligence.
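The core rule in the retail-fraud example, that a sold item can only be returned once per receipt, can be sketched as a lookup against sales and returns history. In the deployment described, this check is a query against the Teradata warehouse kept current from the POS feed; the in-memory Python class below is only a stand-in, and its names (`ReturnsLedger`, `record_sale`, `try_return`) are hypothetical.

```python
from collections import defaultdict


class ReturnsLedger:
    """Hypothetical stand-in for sales/returns history fed from POS systems."""

    def __init__(self) -> None:
        self.sold: defaultdict[tuple[str, str], int] = defaultdict(int)      # (receipt, sku) -> qty sold
        self.returned: defaultdict[tuple[str, str], int] = defaultdict(int)  # (receipt, sku) -> qty returned

    def record_sale(self, receipt: str, sku: str, qty: int = 1) -> None:
        self.sold[(receipt, sku)] += qty

    def try_return(self, receipt: str, sku: str, qty: int = 1) -> bool:
        key = (receipt, sku)
        # Reject items never sold on this receipt, or already returned in full.
        if self.returned[key] + qty > self.sold[key]:
            return False
        self.returned[key] += qty
        return True


ledger = ReturnsLedger()
ledger.record_sale("R-1001", "SHOE-42")
print(ledger.try_return("R-1001", "SHOE-42"))  # True: first return on this receipt
print(ledger.try_return("R-1001", "SHOE-42"))  # False: copied receipt, already returned
print(ledger.try_return("R-9999", "SHOE-42"))  # False: no matching sale
```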

  • Active Enterprise Intelligence in Retail: Single View of the Customer Across All Channels

    Situation: Needed to add a web channel for selling shoes.
    Problem: Keeping multiple customer systems synchronized took too much time and cost. They realized they needed just one customer database, not yet another one for the web in addition to the call center and POS/store databases.
    Solution: Adopted an ADW strategy: moved all customer data to one Teradata system, revised data models to cover all channels, added the web channel for commerce, used web services, and added TASM to handle multiple workload types.

    Impact:
      1M tactical hits to the EDW per day from POS, call center, and web, with 0.11 s response time
      Runs simultaneously with back-office BI, reporting, and ETL workloads
      Eliminated all other customer data systems

  • What Is the Measure of a Great Architecture?

    Handling huge changes in underlying technologies and dependent components while continuing to deliver the key value proposition.

  • Processor Roadmap: CPU Power Radically Increasing

    [Figure: processor timeline, 2000 to 2011: 90nm, 65nm, 45nm, 32nm, and 22nm processes; Hyper-Threading, Dual Core, Multi Core; SPECint2000 performance, single-core vs. dual/multi-core, with a 5X marker]

  • What Does Shared Nothing Mean?

    1985: Every hardware part, every line of software: pure shared nothing
    1995: Multiple units of parallelism sharing CPU and memory
    2004: Multiple units of parallelism sharing multiple cores and memory
    2009: Multiple units of parallelism sharing the same physical spindles, but still not sharing data
    Future: Multiple units of parallelism in virtual machines/cloud, not even knowing what physical machine they are on or what they share

    Copyright Teradata 2007-2009. All rights reserved.

  • Teradata MPP Server Architecture

    Nodes: incrementally scalable to 1024 nodes
    Operating system: Linux, Windows, Unix
    Storage: independent I/O that scales per node
    BYNET interconnect: fully scalable bandwidth
    Connectivity: fully scalable; channel (ESCON/FICON), LAN, WAN
    Server management: one console to view the entire system

    [Figure: SMP Nodes 1 to 4 linked by dual BYNET interconnects, with server management]

  • Shared Nothing - Dividing the Work

    Virtual processors (vprocs) do the work
    Two types:
      AMP: owns and operates on the data
      PE: handles SQL and external interaction
    Configure multiple vprocs per hardware node
      Take full advantage of SMP CPU and memory
    Each vproc has many threads of execution
      Many operations executing concurrently
      Each thread can do work for any user or transaction
    Software is equivalent regardless of configuration
      No user changes as the system grows from a small SMP to a huge MPP
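The division of labor between PEs and AMPs can be illustrated with a toy model: a PE accepts a request and hands the same step to every AMP, each AMP works only on the rows it owns, and the PE combines the partial results. This is a simplified sketch; the class names, the row layout, and the use of a Python thread pool to stand in for concurrent vprocs are assumptions for illustration, not Teradata internals.

```python
from concurrent.futures import ThreadPoolExecutor


class Amp:
    """Toy AMP vproc: owns a private slice of rows and works only on those."""

    def __init__(self, amp_id: int):
        self.amp_id = amp_id
        self.rows: list[dict] = []

    def sum_column(self, column: str) -> int:
        # Scans only this AMP's own slice; nothing is shared with other AMPs.
        return sum(row[column] for row in self.rows)


class ParsingEngine:
    """Toy PE vproc: accepts the request and dispatches a step to every AMP."""

    def __init__(self, amps: list[Amp]):
        self.amps = amps

    def total(self, column: str) -> int:
        # Threads stand in for vprocs working concurrently on their own slices.
        with ThreadPoolExecutor(max_workers=len(self.amps)) as pool:
            partials = list(pool.map(lambda amp: amp.sum_column(column), self.amps))
        # The PE combines the per-AMP partial results into the final answer.
        return sum(partials)


amps = [Amp(i) for i in range(4)]
for n in range(1, 101):
    # Round-robin placement keeps the sketch short; Teradata hashes the primary index.
    amps[n % len(amps)].rows.append({"id": n, "amount": n})

pe = ParsingEngine(amps)
print(pe.total("amount"))  # 5050
```

Because the PE code does not depend on how many AMPs exist, the same request works unchanged with 4 or 400 of them, which mirrors the slide's point that software is equivalent regardless of configuration.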

  • Shared Nothing - Dividing the Work

    Basis of Teradata scalability
    Each AMP owns an equal slice of the disk
      Only that AMP reads that slice
    No single point of control for any operation
      I/O, buffers, locking, logging, dictionary
      Nothing centralized
    Exponential communication costs avoided

    [Figure: AMPs, each with its own logs, locks, buffers, and I/O; chart of coordination cost vs. number of nodes, with a Teradata series]

  • Teradata Data Distribution

    Rows automatically distributed evenly by hash partitioning
      Even distribution results in scalable performance
      Done in real time as data are loaded, appended, or changed
    Hash map defined and maintained by the system
      2^32 hash codes, 64K buckets distributed to AMPs
    Primary Index (PI) column(s) are hashed
      The hash is always the same for the same values
    No reorgs, repartitioning, or space management

    [Figure: Table A, Table B, Table C]
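The hash-based placement described on the Data Distribution slide can be sketched in a few lines. Assumptions, all for illustration: a truncated SHA-256 stands in for Teradata's own row-hash function (which is different), the bucket number is taken from the high-order 16 bits of the 32-bit hash to give the 64K buckets mentioned on the slide, and the hash map assigns buckets to AMPs round-robin. The helper names are hypothetical.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 2 ** 16  # 64K hash buckets


def row_hash(pi_value: str) -> int:
    """32-bit hash of a primary index value (truncated SHA-256; a stand-in)."""
    digest = hashlib.sha256(pi_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")  # 0 .. 2**32 - 1


def bucket_of(h: int) -> int:
    return h >> 16  # high-order 16 bits -> one of 64K buckets


def build_hash_map(num_amps: int) -> list[int]:
    # Hypothetical hash map: bucket b is owned by AMP (b mod num_amps).
    return [b % num_amps for b in range(NUM_BUCKETS)]


def amp_for(pi_value: str, hash_map: list[int]) -> int:
    return hash_map[bucket_of(row_hash(pi_value))]


hash_map = build_hash_map(num_amps=8)
counts = Counter(amp_for(f"customer-{i}", hash_map) for i in range(80_000))
print(sorted(counts.items()))  # roughly 10,000 rows per AMP

# Same value, same hash, same AMP: placement is deterministic.
assert amp_for("customer-42", hash_map) == amp_for("customer-42", hash_map)
```

Growing the system in this model means rewriting the bucket-to-AMP map rather than changing the hash function, so a given PI value always produces the same bucket.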

  • Disk Capacity Exploding with Little Increase in Performance

    In this chart, we have 3 different disk drive sizes, and you can see that per generation, disk drive bandwidth hasn't increased very much. As disk capacities get larger (36 GB, 73 GB, 146 GB), the performance per capacity ratio (capacity vs. disk bandwidth, on the right side of the chart) declines significantly.

    The key metric on this slide is performance per capacity (MB/s per GB).
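Performance per capacity is simply sustained drive bandwidth divided by drive capacity. The sketch below uses a single hypothetical bandwidth figure for all three drive sizes (the chart's actual values are not reproduced here) to show the effect the notes describe: when bandwidth stays roughly flat, each roughly twofold increase in capacity roughly halves the MB/s available per GB stored.

```python
HYPOTHETICAL_BANDWIDTH_MB_S = 80.0  # placeholder value, not a measured figure


def mb_per_sec_per_gb(bandwidth_mb_s: float, capacity_gb: float) -> float:
    """Performance per capacity: MB/s of drive bandwidth per GB stored."""
    return bandwidth_mb_s / capacity_gb


for capacity_gb in (36, 73, 146):
    ratio = mb_per_sec_per_gb(HYPOTHETICAL_BANDWIDTH_MB_S, capacity_gb)
    print(f"{capacity_gb:>3} GB drive: {ratio:.2f} MB/s per GB")
```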

    Look at this slide! Capacity is doubling, but throughput per GB is diminishing! If you fill all the drives up with data, you will not have enough I/O or bandwidth!

    Choosing twice as much storage capacity in a configuration, but not increasing the number of physical disks (to keep I/O constant), wi