
Teradata 15.10 Enterprise Class Agile Analytics

2015

Teradata Database 15.10: Enterprise Class Agile Analytics

• Data Fabric Enabled by Teradata QueryGrid
– New QueryGrid support for architectural flexibility, analytic extensibility, and economical scaling
– Smart multi-system query planning for performance & efficiency

• In-Memory Optimization and Vectorization
– Smarter use of memory data temperature, process pipelining, and new vector processing deliver the highest performance and system efficiency

• Fast JSON Performance
– Native BSON support to process the Internet-of-Everything
– Agile data acquisition with fast late-binding query performance

• World’s Most Advanced Hybrid Row/Column Database
– Changing the rules on Columnar use for diverse workloads and Operational Intelligence
– Flexibility of design choices for Row, Column, and Hybrid Row/Column tables to match business performance needs
– Operational intelligence enhancements: partition-level locking, load isolation, reduced parsing time, advanced columnar

• Software-Defined Warehouse
– Securely manage multiple business units, divisions, or countries in one warehouse
– Secure zones and TASM enhancements enable multi-tenancy for consolidation and efficient operations with SLA performance across tenants

© 2015 Teradata


Teradata Database 15 – Teradata QueryGrid
Leverage analytic resources, reduce data movement

[Figure labels: Integrated Data Warehouse (Teradata Database); Data Platform (Hadoop)]

• Parallel Bi-directional data transfer
• Push-down processing
• Native Analytics on Target system
• Easy configuration of server connections
• Simplified Server Grammar
• Adaptive Optimizer


Data Fabric Vision Enabled by QueryGrid
Analytic Flexibility to meet your business needs

• Pick Your Best-of-Breed Technology:
– File systems
– Operating systems
– Data types
– Analytic engines
– Economic options

• With Different Characteristics:
– CPU centric
– I/O centric
– Data volume centric
– Workload characteristics and volume
– Availability/DR
– Service Level Agreements

Users direct their queries to a single cohesive data fabric

Focus on data and business questions, not integrating separate systems


Analytic Extensibility
Teradata QueryGrid: Teradata-Aster

Aster functions appear as extensions to the Teradata Database

[Figure labels: Teradata Aster Database; Teradata Database]

SELECT * FROM AsterExecute(on clickstream USING
    RemoteQuery(
        'SELECT * FROM SESSIONIZE(
            ON %s partition by("partition_id") order by("clicktime")
            timecolumn ("clicktime") timeout("3000") '
    )
    SUBSTITUTE(' "->''''')
    @sdll8340) AS DT;

A clickstream table on Teradata is exported to Aster where the SESSIONIZE SQL/MR is executed and the answer set returned to Teradata.

“Send, Execute, Return” in one step
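To make the SESSIONIZE step in the query above more concrete, here is a minimal Python sketch of sessionization. It is not Aster's implementation. It assumes each click is a (partition_id, clicktime) pair, that clicktime is numeric, and that a gap longer than the timeout starts a new session. The function name `sessionize` and the sample data are hypothetical.

```python
from collections import defaultdict

def sessionize(clicks, timeout):
    """Assign a session number to each click.

    clicks: iterable of (partition_id, clicktime) pairs (hypothetical layout).
    timeout: largest gap allowed between two clicks of the same session,
             in the same units as clicktime (an assumption of this sketch).
    """
    by_partition = defaultdict(list)
    for partition_id, clicktime in clicks:          # PARTITION BY partition_id
        by_partition[partition_id].append(clicktime)

    result = []
    for partition_id, times in by_partition.items():
        session_id = 0
        previous = None
        for clicktime in sorted(times):             # ORDER BY clicktime
            if previous is not None and clicktime - previous > timeout:
                session_id += 1                     # gap too long: new session
            result.append((partition_id, clicktime, session_id))
            previous = clicktime
    return result

rows = sessionize(
    [("u1", 0), ("u1", 1000), ("u1", 9000), ("u2", 500)],
    timeout=3000,
)
```

In this sample, the third click of `u1` arrives 8000 units after the second, so it opens a second session for that partition.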


Adaptive Optimizer
Incremental Planning & Execution of smaller query fragments

• Most efficient overall query plan derived from reliable statistics
– Statistics dynamically collected from foreign data
– Incremental query plans generated for single and multi-system queries
– Consistent Optimizer approach for queries within and between systems
– Teradata systems “transfer” query plans between systems

• A fully automatic optimizer feature – users don’t have to change anything

[Figure labels: Better Query Plan; Foreign and Sub-Queries]

Why?

Unreliable statistics can result in less-than-optimal query plans

Some analytic systems, like Hadoop, don’t keep data statistics

Statistics not designed for compatibility between databases

How?

Pulls out remote server requests and single-row and scalar non-correlated sub-queries from a main query

Plans and executes them

Plugs the results into the main query

Plans and executes the main query
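The four "How?" steps can be imitated by hand with any SQL engine. The sketch below uses Python's built-in sqlite3 module only as a stand-in; it does not show Teradata's optimizer. The `sales` table and the query are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (store_id INTEGER, amount INTEGER);
    INSERT INTO sales VALUES (1, 10), (1, 30), (2, 50), (3, 5);
""")

# Original request (hypothetical):
#   SELECT store_id FROM sales
#   WHERE amount > (SELECT AVG(amount) FROM sales)

# Steps 1 and 2: pull out the scalar, non-correlated sub-query and run it alone.
(avg_amount,) = conn.execute("SELECT AVG(amount) FROM sales").fetchone()

# Steps 3 and 4: plug the result into the main query as a plain value, so the
# main query is planned with a known constant instead of an unknown sub-query.
main_rows = conn.execute(
    "SELECT store_id FROM sales WHERE amount > ? ORDER BY store_id",
    (avg_amount,),
).fetchall()
```

The point of the split is that the second statement is planned with the actual value already known, rather than with an estimate of what the sub-query might return.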


Incremental Planning and Execution (IPE) 15.10

• Enhance IPE functionality to collect dynamic statistics on
– Remote/Foreign Table Operators
– Local Table Operators
– Table Functions

• Dynamic Statistics
– The required statistics summary is dynamically collected during the spool building stage and used to optimize the remainder of the query
- RowCount, HighModeFrequency, Unique Values

• Improves performance with remote/local table operators and table functions
– Optimizes QueryGrid workload

• Fully automatic


How IPE Works

• Recognizes the remote/local table operators and table functions in the query

• If there are opportunities to use dynamic statistics of the table operators and functions, an IPE plan is triggered
– Cost thresholds from static planning are not applicable

• Request fragments are generated with table operators and functions

• Plan fragments are generated and executed

• Dynamic statistics are collected when generating output rows

• Collected statistics are sent to the optimizer

• Optimizer applies the statistics feedback and plans the remainder of the query
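The feedback loop above can be sketched in a few lines of Python. The statistic names come from the IPE slide (RowCount, HighModeFrequency, Unique Values). Everything else is an assumption made for illustration: the helper names, the row layout, and the toy planning rule, which is not Teradata's cost model.

```python
from collections import Counter

def collect_dynamic_stats(rows, column):
    """Summarise one column of a fragment's output rows."""
    counts = Counter(row[column] for row in rows)
    return {
        "RowCount": len(rows),
        "UniqueValues": len(counts),
        "HighModeFrequency": max(counts.values()) if counts else 0,
    }

def choose_join(stats, small_table_limit=1000):
    """Toy planner rule (an assumption, not Teradata's cost model)."""
    if stats["RowCount"] <= small_table_limit:
        return "duplicate small result to all AMPs"
    return "redistribute both sides by join column"

# Rows returned by an already executed fragment, e.g. a remote table operator.
fragment_rows = [{"cust": "a"}, {"cust": "b"}, {"cust": "a"}]
stats = collect_dynamic_stats(fragment_rows, "cust")
plan = choose_join(stats)
```

The statistics are gathered from rows the fragment has actually produced, so the remainder of the query is planned from measured values rather than guesses.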


Teradata QueryGrid: Roadmap

Product                          One-Way            Bi-Directional     Push-Down
Teradata – Hortonworks Hadoop    ✓ (14.10)          ✓ (15.00)          ✓ (15.00)
Teradata – Cloudera Hadoop       1Q 2015 (14.10)    Planned            Planned
Teradata – MapR Hadoop           Planned            Planned            Planned
Teradata – Oracle                ✓ (14.10)          ✓ (15.00)          ✓ (15.00)
Teradata – Teradata              1Q 2015 (15.00)    1Q 2015 (15.00)    1Q 2015 (15.00)
Teradata – Aster                 1Q 2015 (15.00)    1Q 2015 (15.00)    1Q 2015 (15.00)
Teradata – MongoDB               1Q 2015 (15.00)    Planned            Planned
Aster – Teradata                 ✓                  ✓                  ✓
Aster – Hortonworks Hadoop       ✓                  Planned            Planned
Aster – Cloudera Hadoop          ✓                  Planned            Planned

All schedules subject to change. As of 1/2015.

Find the latest on the QueryGrid InfoHub: https://connections.teradata.com/community/infohub/teradata-querygrid-infohub

Data Fabric Enabled by QueryGrid
“Turn a best of breed solution environment into an Orchestrated Analytical Ecosystem”

• Teradata-Teradata: Scale Economically with Architecture Flexibility
– Blend multiple systems to match needs: data value, usage, CPU/IO, SLAs, economics
– Ease of administration: single view of cross-platform resource utilization (EXPLAIN), maintain security, common administration tools
– High performance operation: pass workload controls, system-specific Optimizer plans

• Teradata-Aster: Analytic Extensibility
– Seamlessly extends Aster functions to data in the Teradata Database
– Functions transparently “send and execute” data from Teradata to Aster for complex processing
– Tightly integrated to operate like a single system

• Adaptive Optimizer and Enterprise Fit
– Dynamic Statistics collected on data returned from external server
– Teradata Incremental Planning creates execution plan in phases
– Teradata security and Kerberos authentication protect remote data


In-Memory Optimization and Vectorization
“Advanced engineering accelerates innovation with Teradata Intelligent Memory”

• Pipelining & New In-Memory Structures Improve Efficiency and Performance
– Pipelining data between steps in memory without disk I/O
– New in-memory table structures improve efficiency and performance in data movement and processing
– Move sets of rows into memory and auto-compress repeated values

• Exploit CPU Instructions and Cache to Free Memory Bandwidth
– Apply row qualification in parallel instead of row at a time by using Vector Processing
- Optimized to take advantage of new instructions in Intel’s Haswell processor
– In-memory hash join using vector processing for performance
– Minimize data movement in and out of memory
- Data held in CPU cache during bulk qualification

• Data Temperature Measurement Aligned with Business Priorities
– New temperature weightings for Tactical and Strategic I/Os
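The slide does not say how repeated values are auto-compressed. One common technique for runs of repeated column values is run-length encoding, shown here purely as an illustration; it is an assumption, not a description of Teradata's in-memory format. The function names and sample column are hypothetical.

```python
from itertools import groupby

def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]

column = ["CA", "CA", "CA", "NY", "NY", "TX"]
encoded = run_length_encode(column)
restored = run_length_decode(encoded)
```

Six stored values shrink to three pairs here, and decoding gives back the original column unchanged.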


Why is In Memory Important?

• Technology – Processors are changing
• It is more than the Amount of Memory
• It is about improving memory bandwidth
• It is about improving processor cache effectiveness
• Increasing memory should reduce Disk I/O
• Improving the bandwidth and cache effectiveness should improve cost per instruction – throughput and response time


In-Memory Optimizations

• Configuring nodes with large memory does not guarantee processor performance
• Data locality must be improved
• Optimized data structures are needed
• Algorithms with better memory access patterns are needed


Data Temperature Measurement Aligned with Business Priorities

• New temperature weightings vary by workload
– Tactical and Strategic workloads have different impacts on data temperature

– Tactical workloads heat data faster

– New temperature weightings better align workloads and data temperature with business priorities
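A weighted temperature score could look like the sketch below. The slide gives no actual weights, so the 4:1 ratio, the workload names used as dictionary keys, and the `temperature` function are all hypothetical. The only idea taken from the slide is that tactical I/O heats data faster than strategic I/O.

```python
# Hypothetical weights: one tactical I/O heats data as much as four strategic I/Os.
WEIGHTS = {"tactical": 4.0, "strategic": 1.0}

def temperature(accesses):
    """accesses: list of workload-type names, one per I/O against a block."""
    return sum(WEIGHTS[kind] for kind in accesses)

block_a = temperature(["tactical", "tactical"])   # few I/Os, but tactical
block_b = temperature(["strategic"] * 5)          # more I/Os, but strategic
hottest = "block_a" if block_a > block_b else "block_b"
```

With these assumed weights, the block touched by two tactical I/Os ranks hotter than the block touched by five strategic I/Os.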


Pipelining & Advanced Use of Memory
Query Pipelining & New In-Memory Table Structures

[Figure: node and disk activity, Without Pipelining vs. With Pipelining]

• Improves Performance with fewer disk I/Os
• Optimizes Memory Bandwidth
• Improves CPU Throughput

New in-memory table structures hold data as column partitioned to reduce size and store data in the way the CPU accesses it
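Python generators give a rough analogy for pipelining between query steps. In this sketch, the unpipelined version builds a full intermediate list after every step (standing in for spool written between steps), while the pipelined version passes each row straight to the next step. The step names and table are hypothetical, and the analogy says nothing about how Teradata implements it.

```python
def scan(rows):
    for row in rows:
        yield row

def qualify(rows, minimum):
    for row in rows:
        if row["amount"] >= minimum:
            yield row

def project(rows, column):
    for row in rows:
        yield row[column]

table = [
    {"id": 1, "amount": 5},
    {"id": 2, "amount": 50},
    {"id": 3, "amount": 70},
]

# Without pipelining: each step materialises its full output first.
step1 = list(scan(table))
step2 = list(qualify(step1, 10))
unpipelined = list(project(step2, "id"))

# With pipelining: rows flow through all three steps one at a time.
pipelined = list(project(qualify(scan(table), 10), "id"))
```

Both versions return the same answer; the difference is that the pipelined one never holds a complete intermediate result.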


Exploit CPU Instructions and Cache
Bulk Qualification & Vectorization

[Figure: multiplying the input values 1, 3, 5, 7 by 5 to produce 5, 15, 25, 35, with data moving between memory and CPU registers R1, R2, R3]

• Row at a time: 12 operations (4 loads, 4 multiplications, 4 stores)
• Vectorized: only 3 operations (1 bulk load, 1 vector multiplication, 1 bulk store)
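The operation counts on this slide can be reproduced with a small model. The sketch below only counts conceptual load, multiply, and store steps; Python itself does not issue SIMD instructions here. The `lanes` parameter, meaning how many values one vector instruction handles, is an assumption chosen to match the four-value example.

```python
def multiply_row_at_a_time(values, factor):
    """Process one value per load, multiply, and store."""
    ops = 0
    result = []
    for v in values:
        ops += 1                      # load one value
        product = v * factor
        ops += 1                      # multiply
        result.append(product)
        ops += 1                      # store
    return result, ops

def multiply_vectorized(values, factor, lanes=4):
    """Model a vector unit that handles `lanes` values per instruction."""
    ops = 0
    result = []
    for start in range(0, len(values), lanes):
        chunk = values[start:start + lanes]
        ops += 1                      # bulk load
        products = [v * factor for v in chunk]
        ops += 1                      # vector multiply
        result.extend(products)
        ops += 1                      # bulk store
    return result, ops

scalar_result, scalar_ops = multiply_row_at_a_time([1, 3, 5, 7], 5)
vector_result, vector_ops = multiply_vectorized([1, 3, 5, 7], 5)
```

For the slide's input of 1, 3, 5, 7 multiplied by 5, the model gives 12 steps row at a time and 3 steps vectorized, with identical results.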


Major Home Improvement Retailer
Teradata Intelligent Memory

Goal: Improve performance by adding memory to minimize physical I/Os and increase overall system throughput

1st – Increase Memory
Double Memory 512GB per node
Results: 15% decrease in physical I/O

2nd – Intelligent Memory
TIM impact: 36% better than just memory alone
Overall: 46% decrease in physical I/Os

Impact

• With only 4% Memory:CDS ratio, 65% of I/Os served by memory
• 99.9% of tactical queries from Memory I/O
• 80% of TIM benefit was ELT (transformations)
• No negative impact to FSG Cache

[Chart labels: Doubling Memory: Reduce physical I/O by 15%; Intelligent Memory: Decrease physical I/O by 46%]
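One reading that makes the three percentages consistent is that the 36% figure is the further reduction relative to the memory-only configuration. That interpretation is an assumption; the slide does not spell it out. The arithmetic below checks it.

```python
after_doubling = 1.0 - 0.15   # share of physical I/O left after doubling memory
after_tim = 1.0 - 0.46        # share left with Intelligent Memory turned on

# Additional reduction attributable to Intelligent Memory, measured against
# the memory-only configuration rather than the original baseline.
tim_extra = 1.0 - after_tim / after_doubling

tim_extra_percent = round(tim_extra * 100)
```

Under that reading, going from 85% of the original physical I/O down to 54% is a further cut of about 36%, which matches the "36% better than just memory alone" line.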


A Leading Health Service Company Teradata Intelligent Memory

System was a CPU bound Enterprise Class Warehouse

Customer first doubled memory and saw some nominal improvements

After 2 months the customer turned on Teradata Intelligent Memory

[Chart labels: Physical I/O; CPU]

By ‘intelligently’ addressing memory Teradata was able to give them a substantial portion of their investment back

Same Workload, No Customer Intervention








JSON: Right Approach In Any Environment

Relational Model (Business User)

• Well understood data
• Relational integrity
• Storage efficiency

Schema On Read (IT Professional)

• Dynamic data
• Reduced coordination
• Human readable

Teradata 15 offers both


Teradata 15.0 JSON Integration

✓ JSON data type
✓ Schema on read, late binding
✓ SQL queries via JSONPath .dot notation
✓ Regular Expressions
✓ Statistics, JIs for performance
✓ Compression
✓ Geospatial conversion
✓ Publishing/shredding
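Schema on read with dot notation means the document's structure is interpreted only when a query asks for a value. The Python sketch below shows that idea with a hypothetical `get_path` helper. It is not Teradata's JSONPath syntax, and the sample document is an abbreviated version of the CompanyInfo example used in this deck.

```python
import json

def get_path(document, path):
    """Follow a dotted path such as 'a.b.0.c' through dicts and lists."""
    current = document
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]   # numeric parts index into arrays
        else:
            current = current[part]        # other parts are object keys
    return current

raw = (
    '{"CompanyInfo": {"company": "Teradata", "departments": '
    '[{"Sales": {"employees": [{"name": "John Young", "age": 24}]}}]}}'
)

doc = json.loads(raw)   # structure is only interpreted at query time
name = get_path(doc, "CompanyInfo.departments.0.Sales.employees.0.name")
company = get_path(doc, "CompanyInfo.company")
```

No schema was declared up front: a new attribute in the stored text becomes queryable simply by naming its path.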


What’s New in Teradata 15.10

• Binary JSON data storage formats
– Improved query performance
– BSON used by MongoDB
- Efficient data acquisition from MongoDB
– UBJSON
- More optimized for numeric storage

• Usability Enhancements
– New built-in functions to find characteristics and metadata of JSON data
– New conversion routines

• Same dynamic data support, database integration regardless of format
– JSON (text) storage format
– BSON binary storage format
– UBJSON binary storage format

CREATE TABLE MyJSONTable(
    Id INTEGER
   ,jsonCol JSON
   ,BsonCol JSON STORAGE FORMAT BSON
   ,UbjsonCol JSON STORAGE FORMAT UBJSON);
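To show what a binary storage format changes, the sketch below hand-encodes a one-field document following the public BSON layout: a length prefix, a type byte, a NUL-terminated key, and a length-prefixed value. It covers only string and 32-bit integer values and says nothing about how Teradata stores BSON internally. The function name `encode_bson` is hypothetical.

```python
import json
import struct

def encode_bson(document):
    """Encode a flat dict of str and int32 values (a tiny subset of BSON)."""
    body = b""
    for key, value in document.items():
        name = key.encode("utf-8") + b"\x00"
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # 0x02 = string; its length prefix includes the trailing NUL.
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        elif isinstance(value, int):
            # 0x10 = 32-bit integer, little-endian.
            body += b"\x10" + name + struct.pack("<i", value)
        else:
            raise TypeError("this sketch only handles str and int32")
    # Total length covers the 4-byte length field itself and the final NUL.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"

text = json.dumps({"hello": "world"})
binary = encode_bson({"hello": "world"})
```

Because every value carries its length, a reader can skip over fields it does not need without scanning text character by character, which is one general reason a binary layout can be faster to search than JSON text.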


JSON vs. BSON Format in Teradata
Load Time vs. Query Speed

JSON data: UNICODE, human readable. Faster loading.

{ "CompanyInfo" :
  { "company" : "Teradata",
    "departments" : [
      { "Sales" : { "employees" : [
          { "name" : "John Young", "age" : 24, "position" : "salesman" },
          { "name" : "Matthew Wright", "age" : 34, "position" : "salesman" },
          { "name" : "Martin Frank", "age" : 44, "position" : "manager" }
      ] } }
    ] }
}

BSON data: binary, structured for search, not human readable. Faster queries, and faster loading from a BSON source.

[Figure: the BSON side is drawn as a block of binary digits]

Store JSON Data for Performance

[Figure labels: Web browsers, Sensors, Internet of Things, MongoDB; JSON, BSON; Fast Ingest, Fast Query]

Teradata 14 Raised the Question of “When to Use Which?”

Columnar Store

• A subset of columns accessed in most queries
• Low data maintenance (i.e., updates/deletes) rate
• Compression desired
• I/O constrained system

Row Store

• Many columns referenced in each query
• Data frequently updated or deleted
• Compression not needed
• CPU constrained system
• Need a PI on the table


Teradata 14.0: When to Use Columnar?

• Queries access varying subsets of the columns of the table, or queries of the table are selective
– Best if both occur for queries

• Data can be loaded with large INSERT-SELECTs

• There is no or little update/delete maintenance

• Primary Index is not needed
– No primary key access via primary index
– No direct merge join via primary index
- Direct hash joins, product joins, and rowid joins only
- Otherwise, spool/sort
– No direct aggregation on primary index
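The first criterion, queries that touch only a subset of columns, is where a column layout saves I/O. The sketch below counts how many stored values each layout has to read for the same query. The table, the column names, and the cost functions are hypothetical simplifications, not a model of Teradata's column partitions.

```python
rows = [
    {"id": 1, "region": "east", "amount": 10, "note": "a"},
    {"id": 2, "region": "west", "amount": 20, "note": "b"},
    {"id": 3, "region": "east", "amount": 30, "note": "c"},
]

# Column store: one list per column (a simplification of column partitions).
columns = {name: [row[name] for row in rows] for name in rows[0]}

def values_read_row_store(rows):
    """Every column of every row is read, whatever the query references."""
    return sum(len(row) for row in rows)

def values_read_column_store(columns, needed):
    """Only the columns the query references are read."""
    return sum(len(columns[name]) for name in needed)

needed = ["region", "amount"]   # the query references 2 of the 4 columns
row_cost = values_read_row_store(rows)
col_cost = values_read_column_store(columns, needed)
```

Reading two of four columns halves the values touched in this toy case; the saving grows as tables get wider and queries stay narrow.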


What’s New

• Primary AMP (CPPA) on Columnar Tables
– New innovative internal access method of columnar tables
– Fast direct access to AMP with data during query execution
– Reduced overhead by not sorting on AMP
– Excellent for columnar table design
- Fast tuned operational queries
- Compact and efficient columnar data storage

• Primary Index on Columnar Tables
– PI is the most direct access to a data row during query execution
– Often used as qualification for very fast operational queries
– PI now available for fast operational queries on columnar tables

• Update in place and physical delete of column partitions
– Automatic with some columnar data
– Other columnar databases write entire new row for updates
– Evaluate design choices, requires sub-row columnar format
– Space automatically and immediately reclaimed when entire partition deleted

Teradata 15.10: Shifting the Balance

• Operational/tactical queries at full speed along with reduced I/O on strategic queries on columnar data

• Joins at full speed on columnar data

• Faster updates of columnar data

– Columnar databases generally have poor update performance

– “Update in place” for some Teradata Columnar data rather than writing new copy

• Reduced the trade-offs, let the benefits shine

– New features allow Teradata Columnar to be used in more situations

– Columnar benefit from reduced I/O while retaining other performance features

• New design use cases:

– Vertically partition frequently used and rarely used columns

– Finer granularity: columns frequently used in predicate or projection


Gartner Trends Software-Defined Anything

“Software Defined is a collective term that encapsulates the growing market momentum toward improved standards for infrastructure programmability and data center interoperability, which is driven by automation inherent to cloud computing and fast infrastructure provisioning. Software Defined includes focus on infrastructure, networking, storage, and data centers.”

The Top 10 Strategic Technology Trends for 2014 Gartner - 2014


Rapid Delivery of Warehouse Services for Multi-tenant and Virtual Private Cloud Deployment

Supports Multi-National & Regulatory Compliance Requirements

Improved Security and Controls for Data Mart Consolidation

Customer B2B data and analytic warehouse services

Software Defined Warehouse At the Push of a Button


Secure Zones

Access controls for user, database, and database objects within secure zones.

Workload Management

Allocates CPU, I/O system resources to virtual partitions, workloads, and users.

Building Blocks of the Software Defined Warehouse


Secure Zones Overview

Secure Zones supports grouping of user & database hierarchies into separate database partitions with restriction of user access to one or more partitions

• Isolation of tenants from each other as if they were running on physically segregated databases
• Users within a tenancy have no access or visibility to objects within other tenancies
• Consolidation of multiple tenants into one instance of a database system
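The isolation rule, that users see only objects in their own zones, can be expressed as a simple membership check. This Python sketch is a conceptual model only. The zone names, users, object names, and both functions are hypothetical, and it does not represent Teradata's Secure Zones syntax or catalog.

```python
# Hypothetical catalog: which zones each user belongs to, and each object's zone.
USER_ZONES = {
    "alice": {"zone1"},
    "bob": {"zone2"},
    "carol": {"zone1", "zone3"},   # access to more than one zone
}
OBJECT_ZONE = {
    "zone1.sales": "zone1",
    "zone2.sales": "zone2",
    "zone3.hr": "zone3",
}

def visible_objects(user):
    """Return the objects a user can see: only those in the user's zones."""
    zones = USER_ZONES.get(user, set())
    return sorted(name for name, zone in OBJECT_ZONE.items() if zone in zones)

def can_access(user, obj):
    """True only when the object's zone is one of the user's zones."""
    return OBJECT_ZONE.get(obj) in USER_ZONES.get(user, set())
```

For example, `visible_objects("alice")` lists only the zone 1 object, and `can_access("bob", "zone1.sales")` is false even though both tenants share one database system.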


Software Defined Warehouse

[Figure: two layouts. In the first, a System Admin (System DBA) sits above three subsidiary tenant zones (Zone 1, Zone 2, Zone 3), each with its own Zone DBA and users. In the second, a Corporate Zone with its own users is shown alongside the three tenant zones, each again with its own Zone DBA and users, plus a System Admin.]

Workload Management 15.10 Overview

• Delivering a more consistent user experience
– Minimum Response Time allows the system to manage to SLAs.
- No more and no less.

• Align queue with business critical workloads
– Prioritized Delay Queue allows workload definitions to affect queue order instead of just the traditional FIFO

• New AMP Worker Task controls
– AMP Worker Task resource limits cap the maximum AMP tasks by user or utility

• Optimized Backup and Restore
– More granular control of Data Stream Architecture (DSA) with separate rules and allocations to support multiple jobs, work limits, and priority.
– AMP worker tasks can now write straight to device, bypassing layers of protocols and improving write performance.
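The difference between a FIFO delay queue and a prioritized one can be sketched with a heap. In this illustration a lower priority number means a more important workload, and ties fall back to arrival order. Both choices, along with the class name and the sample queries, are assumptions and not TASM's actual rules.

```python
import heapq
from itertools import count

class PrioritizedDelayQueue:
    """Release delayed queries by priority, then by arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = count()   # tie-breaker that preserves FIFO within a priority

    def delay(self, query, priority):
        # Lower number means more important (an assumption of this sketch).
        heapq.heappush(self._heap, (priority, next(self._arrival), query))

    def release(self):
        priority, _, query = heapq.heappop(self._heap)
        return query

queue = PrioritizedDelayQueue()
queue.delay("nightly report", priority=3)
queue.delay("call-center lookup", priority=1)
queue.delay("ad hoc analysis", priority=3)
order = [queue.release() for _ in range(3)]
```

The call-center lookup arrived second but is released first; the two priority-3 queries keep their arrival order.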


Fine Grained Warehouse Controls

Warehouse Component    Software Control       Details
Security               Secure Zone            Isolation of users, databases, and database objects on the same hardware
Space                  Secure Zone            This is allocated when setting the perm/temp/spool space for the root DB of a Secure Zone
I/O & CPU              Workload Management    1% granularity to virtual partitions, workloads, & users
User Experience        Workload Management    Minimum Response Time; Prioritized Delay Queue


Benefits

• Enables faster data warehouse deployment with software controls of I/O, Space, CPU, & Security.

• Increased agility to react to business changes.

• Helps expand customer opportunities by addressing key cloud, multi-tenancy, multi-national, and compliance security requirements for privileged user access, data location, and data segregation.

• Improves TCO and reduces carbon footprint for multi-use data warehouses where data, users, applications or workloads must be completely segregated.

• Better align system performance with business priorities and SLA using finer-grained warehouse control.


Teradata 15.10 Supported Workload Specific Platforms

• Active EDW 5650, 6650, 6680, 6700, 6750+

• Data Warehouse Appliance 2650, 2690, 2700+

• Integrated Big Data Platform 1650, 1700+

• Data Mart Appliance 560, 670+

• Operating system releases
– SLES 11 SP1
– SLES 10 SP3


Big Data & Analytics

R Integration SAS: Access to table resident metadata

Support XSLT-based Shredding Definition JSON Binary Storage Format (BSON) Right, Left, and Reverse String Functions

Performance Improvements

Load Isolation Partition-level Locking

MLOADX PMPC Improvements

In Memory Optimizations Reduce Parsing Time and IPE

TIM Phase 2 Reduce Spool (Simple Pipelining)

Smart LOBs (LOB optimizations) New PI on access rights table – Phase II

Ecosystem

UDA for Teradata SQL

DSA Enhancement - Support >5000 objects Support POLR with cluster changes

Extend the Lead CP with Primary AMP Index or Primary Index TASM Enhancements

Add Default Queryband Profile Option

Hardware Enabling

Support of Intel Software compression

Quality

CI Segregation (MI Rebuild) Check Table Enhancement Separate Error Codes for Deadlocks

Cleanup Agg Enh Distinct Group By Audit Trail of DBS Control Studio Enhancements Generate TVI alert containing panic dump data

Industry Compatibility

Auto-provisioning of External Users Secure Zones TDGSS – Single Mechanism to Logon Teradata Directory Service (Phase II) SQL Equivalents of FERRET output

Gateway - Support for No-wait I/O Serialization of UDT w/in UDFs 1 GB Memory Buffers for SORT - Phase II Phase II Trusted Sessions

TD 15.10 - Content 2Q 2015 GCA
