Teradata 15assets.teradata.com/pdf/TUGS/Presentations/2015/... · –Smarter use of memory data...
Teradata 15.10 Enterprise Class Agile Analytics
2015
Teradata Database 15.10: Enterprise Class Agile Analytics
• Data Fabric Enabled by Teradata QueryGrid
– New QueryGrid support for architectural flexibility, analytic extensibility, and economical scaling
– Smart multi-system query planning for performance & efficiency
• In-Memory Optimization and Vectorization
– Smarter use of memory data temperature, process pipelining, and new vector processing deliver the highest performance and system efficiency
• Fast JSON Performance
– Native BSON support to process the Internet-of-Everything
– Agile data acquisition with fast late-binding query performance
• World’s Most Advanced Hybrid Row/Column Database
– Changing the rules on Columnar use for diverse workloads and Operational Intelligence
– Flexibility of design choices for Row, Column, and Hybrid Row/Column tables to match business performance needs
– Operational intelligence enhancements: partition-level locking, load isolation, reduced parsing time, advanced columnar
• Software-Defined Warehouse
– Securely manage multiple business units, divisions, or countries in one warehouse
– Secure zones and TASM enhancements enable multi-tenancy for consolidation and efficient operations with SLA performance across tenants
© 2015 Teradata
Teradata Database 15 – Teradata QueryGrid
Leverage analytic resources, reduce data movement
[Diagram: Integrated Data Warehouse (Teradata Database) and Data Platform (Hadoop)]
• Parallel Bi-directional data transfer
• Push-down processing
• Native Analytics on Target system
• Easy configuration of server connections
• Simplified Server Grammar
• Adaptive Optimizer
Data Fabric Vision Enabled by QueryGrid
Analytic Flexibility to meet your business needs
• Pick Your Best-of-Breed Technology:
– File systems
– Operating systems
– Data types
– Analytic engines
– Economic options
• With Different Characteristics:
– CPU centric
– I/O centric
– Data volume centric
– Workload characteristics and volume
– Availability/DR
– Service Level Agreements
Users direct their queries to a single cohesive data fabric
Focus on data and business questions, not integrating separate systems
Analytic Extensibility
Teradata QueryGrid: Teradata-Aster
Aster functions appear as extensions to the Teradata Database
[Diagram: Teradata Database and Teradata Aster Database]
SELECT * FROM AsterExecute(on clickstream USING
RemoteQuery(
'SELECT * FROM SESSIONIZE(
ON %s partition by("partition_id") order by("clicktime")
timecolumn ("clicktime") timeout("3000") '
)
SUBSTITUTE(' "->''''')
@sdll8340) AS DT;
A clickstream table on Teradata is exported to Aster, where the SESSIONIZE SQL/MR function is executed, and the answer set is returned to Teradata.
“Send, Execute, Return” in one step
Adaptive Optimizer
Incremental Planning & Execution of smaller query fragments
[Diagram labels: Better Query Plan; Foreign and Sub-Queries]
• Most efficient overall query plan derived from reliable statistics
– Statistics dynamically collected from foreign data
– Incremental query plans generated for single and multi-system queries
– Consistent Optimizer approach for queries within and between systems
– Teradata systems “transfer” query plans between systems
• A fully automatic optimizer feature – users don’t have to change anything
Why?
• Unreliable statistics can result in less-than-optimal query plans
• Some analytic systems, like Hadoop, don’t keep data statistics
• Statistics are not designed for compatibility between databases
How?
• Pulls out remote server requests and single-row and scalar non-correlated sub-queries from the main query
• Plans and executes them
• Plugs the results into the main query
• Plans and executes the main query
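The four "How?" steps of the Adaptive Optimizer slide can be sketched in a few lines of Python. This is only an illustration of the control flow, not Teradata's implementation: the `Fragment` class, the `{avg_amount}` placeholder syntax, and the sample query are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Fragment:
    """A sub-request that can be planned and run before the main query (hypothetical)."""
    placeholder: str            # name that marks this fragment inside the main query
    run: Callable[[], object]   # stands in for "plan and execute the fragment"


def execute_incrementally(main_query: str, fragments: List[Fragment]) -> str:
    """Run each fragment first, then substitute its result into the main query text."""
    results: Dict[str, object] = {}
    for frag in fragments:
        # Steps 1-2: the fragment has been pulled out; plan and execute it.
        results[frag.placeholder] = frag.run()
    for name, value in results.items():
        # Step 3: plug the concrete result into the main query.
        main_query = main_query.replace("{" + name + "}", repr(value))
    # Step 4: the caller would now plan and execute this query with real values.
    return main_query


query = "SELECT * FROM sales WHERE amount > {avg_amount}"
frags = [Fragment("avg_amount", lambda: 125)]
final_query = execute_incrementally(query, frags)
```

The point of the ordering is that the main query is planned only after the scalar value is known, so the planner works with an actual constant rather than an estimate.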
Incremental Planning and Execution (IPE) 15.10
• Enhances IPE functionality to collect dynamic statistics on
– Remote/Foreign Table Operators
– Local Table Operators
– Table Functions
• Dynamic Statistics
– A summary of the required statistics is collected dynamically during the spool building stage and used to optimize the remainder of the query
- RowCount, HighModeFrequency, Unique Values
• Improves performance with remote/local table operators and table functions
– Optimizes QueryGrid workload
• Fully automatic
How IPE Works
• Recognizes the remote/local table operators and table functions in the query
• If there are opportunities to use dynamic statistics from the table operators and functions, an IPE plan is triggered
– Cost thresholds from static planning are not applicable
• Request fragments are generated with table operators and functions
• Plan fragments are generated and executed
• Dynamic statistics are collected when generating output rows
• Collected statistics are sent to the optimizer
• Optimizer applies the statistics feedback and plans the remainder of the query
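A minimal Python sketch of the feedback loop in "How IPE Works": statistics are gathered as a side effect of producing a fragment's output rows, then used to plan the rest of the query. The statistic names come from the slides, reading HighModeFrequency as the frequency of the most common value. The `choose_join` rule and its 1000-row threshold are assumptions made up for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def collect_while_spooling(rows: Iterable[int], stats: Dict[str, int]) -> Iterator[int]:
    """Yield rows unchanged while accumulating summary statistics as a side effect."""
    counts: Counter = Counter()
    for value in rows:
        counts[value] += 1
        yield value
    # Filled in once the operator's output has been fully produced.
    stats["RowCount"] = sum(counts.values())
    stats["UniqueValues"] = len(counts)
    stats["HighModeFrequency"] = max(counts.values(), default=0)


def choose_join(stats: Dict[str, int], small_table_rows: int = 1000) -> str:
    """Toy planning rule (an assumption): duplicate small inputs, redistribute large ones."""
    return "duplicate" if stats["RowCount"] <= small_table_rows else "redistribute"


stats: Dict[str, int] = {}
spool: List[int] = list(collect_while_spooling([7, 7, 3, 9, 7], stats))
plan = choose_join(stats)
```

Because the statistics describe rows that were actually produced, the remainder of the plan does not depend on estimates for data the local system never saw.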
Teradata QueryGrid: Roadmap
Product                          One-Way            Bi-Directional     Push-Down
Teradata – Hortonworks Hadoop    ✓ (14.10)          ✓ (15.00)          ✓ (15.00)
Teradata – Cloudera Hadoop       1Q 2015 (14.10)    Planned            Planned
Teradata – MapR Hadoop           Planned            Planned            Planned
Teradata – Oracle                ✓ (14.10)          ✓ (15.00)          ✓ (15.00)
Teradata – Teradata              1Q 2015 (15.00)    1Q 2015 (15.00)    1Q 2015 (15.00)
Teradata – Aster                 1Q 2015 (15.00)    1Q 2015 (15.00)    1Q 2015 (15.00)
Teradata – MongoDB               1Q 2015 (15.00)    Planned            Planned
Aster – Teradata                 ✓                  ✓                  ✓
Aster – Hortonworks Hadoop       ✓                  Planned            Planned
Aster – Cloudera Hadoop          ✓                  Planned            Planned
(✓ = available; Teradata Database release in parentheses)
All schedules subject to change. As of 1/2015. Find the latest on the QueryGrid InfoHub.
Data Fabric Enabled by QueryGrid
“Turn a best of breed solution environment into an Orchestrated Analytical Ecosystem”
• Teradata-Teradata: Scale Economically with Architecture Flexibility
– Blend multiple systems to match needs: data value, usage, CPU/IO, SLAs, economics
– Ease of administration: single view of cross-platform resource utilization (EXPLAIN), maintain security, common administration tools
– High performance operation: pass workload controls; system-specific Optimizer plans
• Teradata-Aster: Analytic Extensibility
– Seamlessly extends Aster functions to data in the Teradata Database
– Functions transparently “send and execute” data from Teradata to Aster for complex processing
– Tightly integrated to operate like a single system
• Adaptive Optimizer and Enterprise Fit
– Dynamic Statistics collected on data returned from external server
– Teradata Incremental Planning creates execution plan in phases
– Teradata security and Kerberos authentication protect remote data
In-Memory Optimization and Vectorization
“Advanced engineering accelerates innovation with Teradata Intelligent Memory”
• Pipelining & New In-Memory Structures Improve Efficiency and Performance
– Pipelining data between steps in memory without disk I/O
– New in-memory table structures improve efficiency and performance in data movement and processing
– Move sets of rows into memory and auto-compress repeated values
• Exploit CPU Instructions and Cache to Free Memory Bandwidth
– Apply row qualification in parallel instead of a row at a time by using Vector Processing
- Optimized to take advantage of new instructions in Intel’s Haswell processor
– In-memory hash join using vector processing for performance
– Minimize data movement in and out of memory
- Data held in CPU cache during bulk qualification
• Data Temperature Measurement Aligned with Business Priorities
– New temperature weightings for Tactical and Strategic I/Os
Why is In-Memory Important?
• Technology – Processors are changing
• It is about more than the amount of memory
• It is about improving memory bandwidth
• It is about improving processor cache effectiveness
• Increasing memory should reduce disk I/O
• Improving the bandwidth and cache effectiveness should improve cost per instruction (throughput and response time)
In-Memory Optimizations
• Configuring nodes with large memory does not guarantee processor performance
• Data locality must be improved
• Optimized data structures are needed
• Algorithms with better memory access patterns are needed
Data Temperature Measurement Aligned with Business Priorities
• New temperature weightings vary by workloads – Tactical and Strategic workloads have different impacts on data temperature
– Tactical workloads heat data faster
– New temperature weightings better align workloads and data temperature with business priorities
Pipelining & Advanced Use of Memory
Query Pipelining & New In-Memory Table Structures
[Diagram: node and disk activity without pipelining vs. with pipelining]
With pipelining:
• Improves performance with fewer disk I/Os
• Optimizes memory bandwidth
• Improves CPU throughput
New in-memory table structures hold data as column partitioned to reduce size and store data in the way the CPU accesses it
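The difference between the two halves of the pipelining diagram can be illustrated with Python generators. This is a loose analogy under stated assumptions: `materialize` stands in for writing a step's full output to spool, `spool_writes` is a made-up counter for intermediate I/O, and the three query steps are invented for the example.

```python
from typing import Iterable, Iterator, List

spool_writes = 0  # rows written to intermediate storage (a stand-in for disk I/O)


def scan() -> Iterator[int]:
    yield from range(1, 11)


def materialize(rows: Iterable[int]) -> List[int]:
    """Without pipelining: a step's full output is spooled before the next step reads it."""
    global spool_writes
    out = list(rows)
    spool_writes += len(out)
    return out


def keep_even(rows: Iterable[int]) -> Iterator[int]:
    return (r for r in rows if r % 2 == 0)


def square(rows: Iterable[int]) -> Iterator[int]:
    return (r * r for r in rows)


# Without pipelining: every step's result is spooled.
unpipelined = materialize(square(materialize(keep_even(materialize(scan())))))
writes_without = spool_writes

# With pipelining: rows flow from step to step in memory; nothing is spooled in between.
spool_writes = 0
pipelined = list(square(keep_even(scan())))
writes_with = spool_writes
```

Both variants return the same rows; only the amount of intermediate storage traffic differs.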
Exploit CPU Instructions and Cache
Bulk Qualification & Vectorization
[Figure: the input values 1, 3, 5, 7 are multiplied by 5 to give 5, 15, 25, 35, moving between memory and CPU registers R1–R3]
• Row at a time: 12 operations (4 loads, 4 multiplications, 4 stores)
• Vectorized: only 3 operations (1 bulk load, 1 vector multiplication, 1 bulk store)
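The operation counts on the vectorization slide can be reproduced with a small Python model. It only counts conceptual loads, multiplies, and stores; it does not use real SIMD instructions, and the 4-value vector `width` is an assumption chosen to match the slide's example.

```python
from typing import List, Tuple


def scalar_multiply(values: List[int], factor: int) -> Tuple[List[int], int]:
    """One load, one multiply, and one store per value."""
    result, ops = [], 0
    for v in values:
        reg = v               # load
        reg = reg * factor    # multiply
        result.append(reg)    # store
        ops += 3
    return result, ops


def vector_multiply(values: List[int], factor: int, width: int = 4) -> Tuple[List[int], int]:
    """One bulk load, one vector multiply, and one bulk store per group of `width` values."""
    result, ops = [], 0
    for i in range(0, len(values), width):
        lane = values[i:i + width]           # bulk load
        lane = [v * factor for v in lane]    # vector multiply (modelled, not real SIMD)
        result.extend(lane)                  # bulk store
        ops += 3
    return result, ops


scalar_result, scalar_ops = scalar_multiply([1, 3, 5, 7], 5)
vector_result, vector_ops = vector_multiply([1, 3, 5, 7], 5)
```

For the four input values this gives 12 operations in the scalar model and 3 in the vector model, matching the counts on the slide.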
Major Home Improvement Retailer
Teradata Intelligent Memory
Goal: Improve performance by adding memory to minimize physical I/Os and increase overall system throughput
1st – Increase Memory
• Double memory: 512GB per node
• Results: 15% decrease in physical I/O
2nd – Intelligent Memory
• TIM impact: 36% better than just memory alone
• Overall: 46% decrease in physical I/Os
Impact
• With only a 4% Memory:CDS ratio, 65% of I/Os served by memory
• 99.9% of tactical queries from Memory I/O
• 80% of TIM benefit was ELT (transformations)
• No negative impact to FSG Cache
[Chart: Doubling Memory: reduce physical I/O by 15%. Intelligent Memory: decrease physical I/O by 46%]
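One way to read the retailer's three percentages is that the 36% TIM gain applies to the physical I/O that remained after doubling memory. The slide does not say how the figures were derived, so the compounding below is an assumption; it is simply an interpretation under which the numbers agree.

```python
def combined_reduction(first: float, second: float) -> float:
    """Overall reduction when `second` applies to what remains after `first`."""
    return 1 - (1 - first) * (1 - second)


# 15% from doubling memory, then 36% on top from Teradata Intelligent Memory.
overall = combined_reduction(0.15, 0.36)
overall_percent = round(overall * 100)
```

Under this reading, 0.85 × 0.64 = 0.544 of the original physical I/O remains, a reduction of about 46%.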
A Leading Health Service Company
Teradata Intelligent Memory
• System was a CPU-bound Enterprise Class Warehouse
• Customer first doubled memory and saw some nominal improvements
• After 2 months the customer turned on Teradata Intelligent Memory
[Chart: Physical I/O and CPU]
By ‘intelligently’ addressing memory, Teradata was able to give them a substantial portion of their investment back
Same Workload, No Customer Intervention
JSON: Right Approach In Any Environment
Relational Model (Business User)
• Well understood data
• Relational integrity
• Storage efficiency
Schema On Read (IT Professional)
• Dynamic data
• Reduced coordination
• Human readable
Teradata 15 offers both
Teradata 15.0 JSON Integration
✓ JSON data type
✓ Schema on read, late binding
✓ SQL queries via JSONPath .dot notation
✓ Regular Expressions
✓ Statistics, JIs for performance
✓ Compression
✓ Geospatial conversion
✓ Publishing/shredding
What’s New in Teradata 15.10
• Binary JSON data storage formats
– Improved query performance
– BSON used by MongoDB
- Efficient data acquisition from MongoDB
– UBJSON
- More optimized for numeric storage
• Usability Enhancements
– New built-in functions to find characteristics and metadata of JSON data
– New conversion routines
• Same dynamic data support, database integration regardless of format
– JSON (text) storage format
– BSON binary storage format
– UBJSON binary storage format
CREATE TABLE MyJSONTable(
  Id INTEGER
 ,jsonCol JSON
 ,BsonCol JSON STORAGE FORMAT BSON
 ,UbjsonCol JSON STORAGE FORMAT UBJSON);
JSON vs. BSON Format in Teradata
Load Time vs. Query Speed
JSON data: UNICODE, human readable
{ "CompanyInfo" :
  { "company" : "Teradata", "departments" : [
    { "Sales" : { "employees" : [
      { "name" : "John Young", "age" : 24, "position" : "salesman" },
      { "name" : "Matthew Wright", "age" : 34, "position" : "salesman" },
      { "name" : "Martin Frank", "age" : 44, "position" : "manager" } ] } }
  ] }
}
BSON data: binary, structured for search, not human readable
[Figure: the same document shown as a block of binary digits]
BSON: faster queries, and faster loading from a BSON source
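To make "binary, structured for search" concrete, here is a minimal Python sketch that encodes a flat document in the BSON wire format. It follows the public BSON layout (little-endian int32 document length, typed elements, trailing zero byte) but handles only strings and 32-bit integers. It says nothing about how Teradata stores BSON internally, and the `record` value is an example based on the slide's sample data.

```python
import json
import struct


def bson_element(name: str, value) -> bytes:
    """Encode one element; only UTF-8 strings and 32-bit integers are handled here."""
    key = name.encode("utf-8") + b"\x00"
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        return b"\x02" + key + struct.pack("<i", len(data)) + data
    if isinstance(value, int):
        return b"\x10" + key + struct.pack("<i", value)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def bson_document(doc: dict) -> bytes:
    """Encode a flat dict as: int32 total length, elements, trailing zero byte."""
    body = b"".join(bson_element(k, v) for k, v in doc.items())
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


record = {"name": "John Young", "age": 24}
as_text = json.dumps(record).encode("utf-8")   # human-readable JSON text
as_bson = bson_document(record)                # typed, length-prefixed binary
```

Each element carries a type byte and, for strings, an explicit length, so a reader can skip to the field it needs without parsing the text character by character.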
Store JSON Data for Performance
[Diagram: data sources (web browsers, sensors, Internet of Things, MongoDB) feeding JSON and BSON storage, labelled Fast Ingest and Fast Query]
Teradata 14 Raised the Question of “When to Use Which?”
Columnar Store
• A subset of columns accessed in most queries
• Low data maintenance (i.e., updates/deletes) rate
• Compression desired
• I/O constrained system
Row Store
• Many columns referenced in each query
• Data frequently updated or deleted
• Compression not needed
• CPU constrained system
• Need a PI on the table
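The first criterion in the columnar list (a subset of columns accessed in most queries) comes down to how many values a query has to touch. The Python sketch below counts them for a toy table. The table contents and the cost model (one unit per value read) are assumptions for illustration only.

```python
from typing import Dict, List

ROWS = [
    {"id": 1, "region": "east", "amount": 10, "notes": "a" * 50},
    {"id": 2, "region": "west", "amount": 20, "notes": "b" * 50},
    {"id": 3, "region": "east", "amount": 30, "notes": "c" * 50},
]


def to_columns(rows: List[dict]) -> Dict[str, list]:
    """Pivot row-oriented records into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def values_read_row_store(rows: List[dict], wanted: List[str]) -> int:
    """A row store reads whole rows, so `wanted` does not reduce the work."""
    return sum(len(row) for row in rows)


def values_read_column_store(columns: Dict[str, list], wanted: List[str]) -> int:
    """A column store reads only the requested columns."""
    return sum(len(columns[name]) for name in wanted)


COLUMNS = to_columns(ROWS)
row_cost = values_read_row_store(ROWS, ["amount"])
col_cost = values_read_column_store(COLUMNS, ["amount"])
total = sum(COLUMNS["amount"])
```

Summing one column of four touches 3 values in the columnar layout against 12 in the row layout. When a query references most of the columns, the advantage disappears.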
Teradata 14.0: When to Use Columnar?
• Queries access varying subsets of the columns of the table, or queries of the table are selective
– Best if both occur for queries
• Data can be loaded with large INSERT-SELECTs
• There is little or no update/delete maintenance
• Primary Index is not needed
– No primary key access via primary index
– No direct merge join via primary index
- Direct hash joins, product joins, and rowid joins only
- Otherwise, spool/sort
– No direct aggregation on primary index
What’s New
• Primary AMP (CPPA) on Columnar Tables
– New innovative internal access method for columnar tables
– Fast direct access to the AMP holding the data during query execution
– Reduced overhead by not sorting on the AMP
– Excellent for columnar table design
- Fast tuned operational queries
- Compact and efficient columnar data storage
• Primary Index on Columnar Tables
– PI is the most direct access to a data row during query execution
– Often used as qualification for very fast operational queries
– PI now available for fast operational queries on columnar tables
• Update in place and physical delete of column partitions
– Automatic with some columnar data
– Other columnar databases write an entire new row for updates
– Evaluate design choices: requires sub-row columnar format
– Space is reclaimed automatically and immediately when an entire partition is deleted
Teradata 15.10: Shifting the Balance
• Operational/tactical queries at full speed along with reduced I/O on strategic queries on columnar data
• Joins at full speed on columnar data
• Faster updates of columnar data
– Columnar databases generally have poor update performance
– “Update in place” for some Teradata Columnar data rather than writing a new copy
• Reduced the trade-offs, let the benefits shine
– New features allow Teradata Columnar to be used in more situations
– Columnar benefits from reduced I/O while retaining other performance features
• New design use cases:
– Vertically partition frequently used and rarely used columns
– Finer granularity: columns frequently used in predicate or projection
Gartner Trends: Software-Defined Anything
“Software Defined is a collective term that encapsulates the growing market momentum toward improved standards for infrastructure programmability and data center interoperability, which is driven by automation inherent to cloud computing and fast infrastructure provisioning. Software Defined includes focus on infrastructure, networking, storage, and data centers.”
Source: The Top 10 Strategic Technology Trends for 2014, Gartner, 2014
Software Defined Warehouse
At the Push of a Button
• Rapid Delivery of Warehouse Services for Multi-tenant and Virtual Private Cloud Deployment
• Supports Multi-National & Regulatory Compliance Requirements
• Improved Security and Controls for Data Mart Consolidation
• Customer B2B data and analytic warehouse services
Building Blocks of the Software Defined Warehouse
• Secure Zones: access controls for users, databases, and database objects within secure zones
• Workload Management: allocates CPU and I/O system resources to virtual partitions, workloads, and users
Secure Zones Overview
Secure Zones support grouping of user and database hierarchies into separate database partitions, with user access restricted to one or more partitions
• Isolation of tenants from each other as if they were running on physically segregated databases
• Users within a tenancy have no access or visibility to objects within other tenancies
• Consolidation of multiple tenants into one instance of a database system
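The isolation rule in the Secure Zones overview (no access to, and no visibility of, objects in other tenancies) can be modelled as a simple filter. The zone names, users, and object names below are hypothetical, and the sketch is not Teradata's access-control implementation.

```python
from typing import Dict, Set

# Hypothetical catalog: which zone each database object and each user belongs to.
OBJECT_ZONE: Dict[str, str] = {
    "sales_eu.orders": "zone1",
    "sales_us.orders": "zone2",
}
USER_ZONES: Dict[str, Set[str]] = {
    "alice": {"zone1"},
    "bob": {"zone2"},
    "carol": {"zone1", "zone2"},  # access granted to more than one zone
}


def visible_objects(user: str) -> Set[str]:
    """Objects outside the user's zones are not even listed, not just denied."""
    zones = USER_ZONES.get(user, set())
    return {name for name, zone in OBJECT_ZONE.items() if zone in zones}


def can_access(user: str, obj: str) -> bool:
    """Access is limited to what the user can see."""
    return obj in visible_objects(user)
```

Filtering at the visibility level, rather than rejecting individual requests, is what makes each tenant's view resemble a physically separate database.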
Software Defined Warehouse
[Diagram: two layouts of a multi-tenant warehouse. In both, subsidiary tenants occupy Zone 1, Zone 2, and Zone 3, each with its own zone DBA and users. The first layout shows a System Admin and System DBA; the second adds a Corporate Zone with its own users alongside the System Admin.]
Workload Management 15.10 Overview
• Delivering a more consistent user experience
– Minimum Response Time allows the system to manage to SLAs
- No more and no less
• Align queue with business critical workloads
– Prioritized Delay Queue allows workload definitions to affect queue order instead of just the traditional FIFO
• New AMP Worker Task controls
– AMP Worker Task resource limits cap the maximum AMP tasks by user or utility
• Optimized Backup and Restore
– More granular control of Data Stream Architecture (DSA) with separate rules and allocations to support multiple jobs, work limits, and priority
– AMP worker tasks can now write straight to the device, bypassing layers of protocols and improving write performance
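The contrast between a traditional FIFO delay queue and the Prioritized Delay Queue can be sketched with Python's `heapq`. The workload names and numeric priorities are invented for the example, and ordering equal priorities by arrival is an assumption rather than documented TASM behavior.

```python
import heapq
from collections import deque
from itertools import count
from typing import List, Tuple

# (workload name, priority) in arrival order; lower number = more business critical.
ARRIVALS: List[Tuple[str, int]] = [
    ("batch_report", 3),
    ("tactical_lookup", 1),
    ("adhoc_query", 2),
    ("tactical_update", 1),
]


def fifo_release(arrivals: List[Tuple[str, int]]) -> List[str]:
    """Traditional delay queue: release strictly in arrival order."""
    queue = deque(arrivals)
    return [queue.popleft()[0] for _ in range(len(queue))]


def prioritized_release(arrivals: List[Tuple[str, int]]) -> List[str]:
    """Release by priority; the arrival sequence number keeps FIFO order within a priority."""
    seq = count()
    heap = [(priority, next(seq), name) for name, priority in arrivals]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

With these arrivals, the FIFO queue releases `batch_report` first, while the prioritized queue releases both tactical requests ahead of it.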
Fine Grained Warehouse Controls
Warehouse Component    Software Control       Details
Security               Secure Zone            Isolation of users, databases, and database objects on the same hardware
Space                  Secure Zone            Allocated when setting the perm/temp/spool space for the root DB of a Secure Zone
I/O & CPU              Workload Management    1% granularity to virtual partitions, workloads, & users
User Experience        Workload Management    Minimum Response Time; Prioritized Delay Queue
Benefits
• Enables faster data warehouse deployment with software controls of I/O, Space, CPU, & Security.
• Increased agility to react to business changes.
• Helps expand customer opportunities by addressing key cloud, multi-tenancy, multi-national, and compliance security requirements for privileged user access, data location, and data segregation.
• Improves TCO and reduces carbon footprint for multi-use data warehouses where data, users, applications, or workloads must be completely segregated.
• Better aligns system performance with business priorities and SLAs using finer-grained warehouse control.
Teradata 15.10 Supported Workload Specific Platforms
• Active EDW 5650, 6650, 6680, 6700, 6750+
• Data Warehouse Appliance 2650, 2690, 2700+
• Integrated Big Data Platform 1650, 1700+
• Data Mart Appliance 560, 670+
• Operating system releases
– SLES 11 SP1
– SLES 10 SP3
Big Data & Analytics
R Integration SAS: Access to table resident metadata
Support XSLT-based Shredding Definition JSON Binary Storage Format (BSON) Right, Left, and Reverse String Functions
Performance Improvements
Load Isolation Partition-level Locking
MLOADX PMPC Improvements
In Memory Optimizations Reduce Parsing Time and IPE
TIM Phase 2 Reduce Spool (Simple Pipelining)
Smart LOBs (LOB optimizations) New PI on access rights table – Phase II
Ecosystem
UDA for Teradata SQL
DSA Enhancement - Support >5000 objects Support POLR with cluster changes
Extend the Lead CP with Primary AMP Index or Primary Index TASM Enhancements
Add Default Queryband Profile Option
Hardware Enabling
Support of Intel Software compression
Quality
CI Segregation (MI Rebuild) Check Table Enhancement Separate Error Codes for Deadlocks
Cleanup Agg Enh Distinct Group By Audit Trail of DBS Control Studio Enhancements Generate TVI alert containing panic dump data
Industry Compatibility
Auto-provisioning of External Users Secure Zones TDGSS – Single Mechanism to Logon Teradata Directory Service (Phase II) SQL Equivalents of FERRET output
Gateway - Support for No-wait I/O Serialization of UDT w/in UDFs 1 GB Memory Buffers for SORT - Phase II Phase II Trusted Sessions
TD 15.10 - Content (2Q 2015 GCA)