Teradata On Microsoft Azure Innovating Together Teradata Vantage Teradata Viewpoint (Multiple Systems)


  • date post

    15-Mar-2020

  • 1

    Teradata On Microsoft Azure

    Innovating Together

    Analyze Anything, Anywhere

    Larry Quinn, Senior Solution Architect

    larry.quinn@teradata.com

  • 2

    • Ongoing joint engineering, collaboration, and alignment

    • 20 Engineer-Engineer Reviews

    • Best practices for Azure features

    • Security, data movement, hybrid, VNet interconnects

    • Heavy investment in training

    • >105 Azure certifications / accreditations

    • >1,000 additional Azure certifications / accreditations by Q2 2019

    Teradata associates are committed!

    Teradata is a Microsoft Gold Partner

  • 3

    Teradata Refresher

  • 4

    Teradata MPP Server Architecture

    • Nodes: incrementally scalable to 2048 nodes

    • Operating System: Linux (SUSE)

    • Storage: independent I/O; scales per node

    • BYNET Interconnect: fully scalable bandwidth

    • Connectivity: fully scalable; traffic spread across all nodes; Channel (ESCON/FICON); LAN, WAN

    • Server Management: one console to view the entire system

    © 2016 Teradata Property of Teradata

    [Diagram: Shared-Nothing MPP Architecture. Four nodes, each with its own CPUs (CPU1 … CPUn), memory, and operating system, are joined by dual BYNET interconnects, with server management and LAN/WAN/channel connectivity.]

  • 5

    • Even distribution results in scalable performance

    • Done in real time as data are loaded, appended, or changed

    • Hash map defined and maintained by the system

    • 2**32 hash codes, 1,048,576 buckets distributed to AMPs

    Teradata Data Management

    Rows automatically distributed evenly by hash partitioning

    • Primary Index (PI) column(s) are hashed

    • Hash is always the same for the same values

    • No reorgs, repartitioning, or space management

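    [Diagram: Data Management. Rows of Table A, Table B, and Table C are spread across AMP1, AMP2, AMP3, AMP4, … AMPn. The Primary Index passes through the Teradata parallel hash function to produce the RowHash (hash bucket), which is stored with the row's data fields.]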
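The distribution scheme on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Teradata's real hash function is not given in the deck, so `zlib.crc32` stands in for it, and the round-robin `build_hash_map` is a hypothetical substitute for the system-maintained hash map. The only figures taken from the slide are the 2**32 hash codes and the 1,048,576 buckets.

```python
import zlib
from collections import Counter

HASH_BITS = 32                               # 2**32 row-hash codes (from the slide)
NUM_BUCKETS = 1_048_576                      # 2**20 hash buckets (from the slide)
BUCKET_BITS = NUM_BUCKETS.bit_length() - 1   # 20


def row_hash(pi_values):
    """Stand-in 32-bit hash of the Primary Index column values.

    Assumption: CRC32 replaces Teradata's own hash function, which the
    deck does not describe.
    """
    key = "\x1f".join(str(v) for v in pi_values).encode("utf-8")
    return zlib.crc32(key) & 0xFFFFFFFF


def hash_bucket(rh):
    """Use the high-order 20 bits of the 32-bit row hash as the bucket number."""
    return rh >> (HASH_BITS - BUCKET_BITS)


def build_hash_map(num_amps):
    """Hypothetical hash map: assign buckets to AMPs round-robin."""
    return [bucket % num_amps for bucket in range(NUM_BUCKETS)]


def amp_for_row(pi_values, hash_map):
    """Same PI values always produce the same hash, bucket, and AMP."""
    return hash_map[hash_bucket(row_hash(pi_values))]


def distribution(num_rows, num_amps):
    """Count how many rows with PI values 0..num_rows-1 land on each AMP."""
    hash_map = build_hash_map(num_amps)
    return Counter(amp_for_row((pi,), hash_map) for pi in range(num_rows))


print(distribution(100_000, 4))
```

Because placement depends only on the hashed Primary Index value and the bucket-to-AMP map, no manual partitioning step is involved; adding AMPs in this sketch would amount to rebuilding the map.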

  • 6

    Query Execution: The Life of a Teradata Query

    [Diagram: An application connects through ODBC or JDBC over TCP/IP (Internet / Intranet) and sends a request/data parcel to the Gateway. The Parsing Engine (Parser and Dispatcher) turns the request into query steps and sends them across the BYNET to the AMPs, each of which has its own DBMS and I/O management. Every step, from Step 1 through Step N, runs on all AMPs, and each AMP returns a response (Resp 1 through Resp N). The final response parcel is returned to the application.]
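The step-by-step flow in this diagram can be approximated with a small Python sketch. Everything here is hypothetical and for illustration only: the `Amp` class, `dispatch`, and `run_query` are invented names, threads stand in for AMPs, and the two lambda steps stand in for a parsed query plan. The sketch shows only the pattern the slide depicts: each step goes to every AMP, all responses are collected, and then the next step is sent.

```python
from concurrent.futures import ThreadPoolExecutor


class Amp:
    """Hypothetical AMP that owns one slice of the table's rows."""

    def __init__(self, amp_id, rows):
        self.amp_id = amp_id
        self.rows = rows

    def run_step(self, step, data):
        # Apply one query step to this AMP's local data only.
        return step(data)


def dispatch(steps, amps):
    """Send each step to all AMPs in parallel; wait for every response
    before sending the next step."""
    states = [amp.rows for amp in amps]
    with ThreadPoolExecutor(max_workers=len(amps)) as pool:
        for step in steps:
            futures = [
                pool.submit(amp.run_step, step, state)
                for amp, state in zip(amps, states)
            ]
            states = [f.result() for f in futures]
    return states


def run_query(amounts, num_amps=4):
    """Roughly 'SELECT SUM(amount) WHERE amount > 100' over distributed rows."""
    amps = [
        Amp(i, [a for a in amounts if a % num_amps == i])
        for i in range(num_amps)
    ]
    steps = [
        lambda rows: [a for a in rows if a > 100],  # Step 1: local filter
        lambda rows: sum(rows),                     # Step 2: local partial sum
    ]
    partial_sums = dispatch(steps, amps)
    return sum(partial_sums)  # merge the AMP responses into one answer


print(run_query(range(1, 201)))
```

In this sketch the result does not depend on the number of AMPs; only the amount of work done by each one changes.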

  • 7

    We’re Ranked #1… (and #1… and #1…)

    #1 Logical Data Warehouse

    #1 Real-time Data Warehouse

    #1 Traditional Data Warehouse

    Source: 2018 Gartner Critical Capabilities

    https://www.gartner.com/doc/reprints?id=1-4P8TVEJ&ct=180119&st=sb

  • 8

    Teradata Vantage – Any Language, Any Tool, All Your Data

    [Diagram: Teradata Vantage platform, with components linked by a HIGH SPEED FABRIC]

    • DATA STORE: QueryGrid External Data Store Access

    • ENGINES: NewSQL, Machine Learning, Graph, QueryGrid External Analytic Engine Access

    • LANGUAGES: NewSQL, Python, R, SAS, Java

    • TOOLS

      • APP FRAMEWORK: AppCenter

      • NOTEBOOKS and IDEs: RStudio, Jupyter, Studio

      • BI and VISUALIZATION: IBM Cognos, MicroStrategy*, Oracle, Power BI, Qlik, Tableau, TIBCO Spotfire

      • ANALYTICS: Dataiku, SAS

    Planned ML/G availability for IntelliCloud: AWS = March 2019 | Azure = May 2019 | Teradata Cloud = Sept 2019

  • 9

    Machine Learning Engine Functions

    • Statistics: AdaBoost, Approximate Distinct Count, Approximate Percentile, CMAVG, ConfusionMatrix, ConfusionMatrixPlot, Correlation, CoxPH, CoxPredict, CoxSurvFit, Cross Validation, Distribution Matching, EMAVG, Enhanced Histogram, Fmeasure, GLM, GLMPredict, Hidden Markov Model, Histogram, KNN, LARS Functions, LinReg, LRTEST, Non-linear Kernel SVM, Percentile, Principal Component Analysis, Random Sample, ROC Curve, Sample, Shapley Value, SMAVG, Sparsesvm_predict, Sparsesvm_trainer, Sparse_model_printer, Support Vector Machines, VectorDistance, VWAP, WMAVG

    • Association: Basket_Generator, Cfilter, FPGrowth, KNN Recommender, WSRecommender

    • Text: Chinese Text Segmentation, LDA Functions, Levenshtein Distance, Named Entity Recognition (CRF Model), Named Entity Recognition (Max Entropy Model), nGram, PoSTagger, Sentenizer, Sentiment Extraction Functions, Text Classifier, Text_Parser, TextChunker, TextMorph, TextTagging, TextTokenizer, TF_IDF

    • Path, Pattern and Time Series: Arima, ArimaPredictor, Attribution, Burst, ChangePointDetection, Causality Detection, DTW, DWT, DWT2D, FrequentPaths, IDWT, IDWT2D, Interpolator, nPath, Path_Analyzer, Path_Generator, Path_Start, Path_Summarizer, SAX, SAX2, SeriesSplitter, Sessionization, Shapelets, TimeSeriesOrders, Unsupervised Shapelets, VARMAX

    • Data Transformation: Antiselect, Apache Log Parser, Fast Fourier Transform, FellegiSunterTrainer, FellegiSunterPredict, IdentityMatch, IpGeo, Inverse Fast Fourier Transform, JSONParser, Multicase, MurmurHash, Number as Categories, OutlierFilter, Pack, Pivot, PSTParserAFS, Scale Functions, StringSimilarity, Unpack, Unpivot, URIPack, URIUnpack, XMLParser, XMLRelation

    • Cluster: Canopy, Categorical Clustering, Gaussian Mixture Model, KMeans, KMeansPlot, Minhash

    • Decision Tree: XGBoost, Forest, Forest_Evaluate, Forest_Drive, Forest_Predict, Forest_Analyze, Single_Tree_Drive, Single_Tree_Predict

    • Naïve Bayes: naiveBayesMap, naiveBayesReduce, naiveBayes_text, naiveBayes_text_predict, naiveBayes_train, naiveBayes_predict

    • Location Analysis: LoadGeometry, PointinPolygon, GeometryOverlay

    • Visualization: CfilterViz, NpathViz

    • Deep Learning: Neural Networks

    • System Functions: nc_skew, nc_relationstats

    Graph Engine Functions

    • Graph: AllPairsShortestPath, Betweenness, Closeness, EigenvectorCentrality, gTree, LocalClusteringCoefficient, LoopyBeliefPropagation, Modularity, nTree, PageRank, PersonalizedSALSA, RandomWalkSample

    * Available with tdplyr 16.20 (See the Analytics Foundation Guide for a Complete Review of Analytic Operators http://docs.teradata.com/)

  • 10

    Teradata & Azure

  • 11

    The Demands Keep Rising

    • Stay on the latest release

    • Reduce technical debt

    • Focus on answers

    • Sleep at night

    • Move faster

    • Reduce CapEx

    • Be more responsive

    • Add capabilities more easily

    What to do??

  • 12

    • Accelerate time-to-value – Avoid lengthy procurement process

    • Shift to OpEx – Often preferred over CapEx

    • Reduce financial risk – Start small, only grow when needed

    • Save money – Belief that cloud = cost savings

    Cloud Deployment Can Help

  • 13

    They’ve Done It – and You Can, Too

  • 14

    Deploy Teradata Anywhere | Buy Any Way | Move Anytime

    Vantage subscription-based licenses are PORTABLE

    Teradata Infrastructure | Teradata Cloud | AWS & Azure