©2017 IBM Corp.
Autonomous Data Ingestion Tuning in Data Warehouse Accelerators
Knut Stolze, Felix Beier, Jens Müller
IBM Germany Research & Development
2017-03-08
Disclaimer
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.
IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
- CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR
- ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.
Agenda
• IDAA Overview
• Load Processing
• Autonomous Load Performance Tuning / Smart Load
• Preliminary Performance Evaluation
• Summary & Outlook
Loading Data from DB2 z/OS into IDAA
• Full Table Reload
• Partial Reload
• Incremental Update
[Figure: tables with columns Col A, Col B, ... illustrating full table refresh, partition update (partitions January through May), and incremental update]
Incremental Update
• Use replication to replay changes in accelerator
• Low latency
• Concurrent to queries; commit scope on transaction boundaries
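The replay idea can be sketched as follows. This is only an illustration under assumptions: the record layout (`txid`, `op`, `row`) and the function name `replay` are hypothetical and not taken from the product. Changes are buffered per source transaction and become visible in the accelerator copy only when that transaction commits.

```python
def replay(change_records, table):
    """Apply change records to `table` (a set of row tuples).

    Hypothetical record format: {"txid": ..., "op": "insert" | "delete" | "commit", "row": ...}.
    Changes are buffered per transaction and applied only on its commit record.
    """
    pending = {}  # txid -> list of (op, row) not yet committed
    for rec in change_records:
        txid, op = rec["txid"], rec["op"]
        if op == "commit":
            # Transaction boundary reached: apply all buffered changes at once.
            for kind, row in pending.pop(txid, []):
                if kind == "insert":
                    table.add(row)
                else:
                    table.discard(row)
        else:
            pending.setdefault(txid, []).append((op, rec["row"]))
    return table
```

Changes of a transaction that has not yet committed stay in the buffer, so queries running concurrently never see a partially applied transaction.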
[Figure: architecture of IBM DB2 for z/OS and the IBM DB2 Analytics Accelerator appliance. Labels: User Applications, Optimizer, Local Query Processor, IDAA Stored Procedures, Offloaded Query Execution, OLTP Query, OLAP Query, Query Results; Transaction Log, Log Reader, Log Records, Change Records, Incremental Update, Insert/Delete Tuples; tables Fact Table, Dim1, Dim2 shown on both sides]
Batch Update Scenarios
• Reload many tables fully or partially, e.g. in batch window
• Fully concurrent to queries; commit scope is LOAD operation
• Excellent throughput
• Tables processed sequentially
• Partitions of partitioned tables loaded concurrently
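The processing order described above (tables one after another, partitions of one table in parallel) can be sketched as follows. This is a minimal illustration, not the product's implementation: `load_partition`, `batch_load`, and `max_partition_workers` are hypothetical names, and the actual data transfer is replaced by a stand-in.

```python
from concurrent.futures import ThreadPoolExecutor


def load_partition(table, partition):
    # Hypothetical stand-in for transferring one partition to the accelerator.
    return f"{table}:{partition}"


def batch_load(tables, max_partition_workers=4):
    """Process tables sequentially; load the partitions of each table concurrently.

    `tables` is a list of (table_name, [partition_names]) pairs.
    """
    loaded = []
    for table, partitions in tables:  # tables: one after another
        with ThreadPoolExecutor(max_workers=max_partition_workers) as pool:
            # Partitions of the current table: loaded in parallel.
            results = list(pool.map(lambda p: load_partition(table, p), partitions))
        loaded.append((table, results))
    return loaded
```

With this scheme a non-partitioned table occupies a single worker, which is why many small tables gain nothing from partition-level parallelism.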
Challenges
• Load optimized for throughput of large tables only
• Many small tables are not handled very efficiently
• Parallelism must be managed manually by the customer, by starting multiple independent load processes
• Lack of resource control / workload balancing among load operations and between load and other operations (e.g. query)
Smart Load
• Automatically scales the number of concurrent load operations up and down
• Considers involved resources
- CPU consumption in Netezza backend
- CPU consumption on System z
- Network saturation
- I/O utilization in Netezza backend
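One way to picture the four monitored resources is as a single snapshot record from which the currently limiting resource can be read. The sketch below is an assumption for illustration only: the class `ResourceUsage`, its field names, and the convention of expressing each value as a fraction between 0 and 1 are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ResourceUsage:
    # All values are assumed to be utilization fractions in [0, 1].
    backend_cpu: float  # CPU consumption in Netezza backend
    host_cpu: float     # CPU consumption on System z
    network: float      # network saturation
    backend_io: float   # I/O utilization in Netezza backend


def bottleneck(usage):
    """Return (name, value) of the most heavily used resource."""
    pairs = ((f.name, getattr(usage, f.name)) for f in fields(usage))
    return max(pairs, key=lambda kv: kv[1])
```

For example, `bottleneck(ResourceUsage(0.4, 0.2, 0.9, 0.5))` identifies the network as the resource that limits further scale-up.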
Characteristics
• Oscillating behavior for scaling up/down possible, but not a problem
• Expected resource consumption for new load stream is estimated by averaging over history of prior load streams
• Increment step is only 1
• Already running load streams are not terminated prematurely
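The characteristics above can be sketched as a small decision function. This is a hedged illustration, not the actual scheduler: the function names, the dictionary-based resource representation, and the `limit` threshold are assumptions. The cost of a new stream is estimated as the average over prior streams, at most one stream is added per decision, and the function never asks for fewer streams than are running, so scale-down only happens as streams finish on their own.

```python
def estimate_stream_cost(history):
    """Average the per-resource consumption observed for prior load streams."""
    if not history:
        return {}
    return {name: sum(s[name] for s in history) / len(history) for name in history[0]}


def may_start_stream(current, history, limit):
    """True if adding the estimated cost keeps every resource within `limit`.

    With an empty history there is no estimate, and this sketch allows the start
    (an assumption made here to bootstrap the history).
    """
    estimate = estimate_stream_cost(history)
    return all(current[name] + cost <= limit for name, cost in estimate.items())


def next_stream_count(running, current, history, limit):
    """Increment step is 1; running streams are never terminated prematurely."""
    return running + 1 if may_start_stream(current, history, limit) else running
```

Because the estimate is only an average, the scheduler may alternate between adding a stream and holding back, which matches the oscillation noted in the first bullet.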
Preliminary Performance Evaluation
Improvements:
• Keeping load streams open for multiple tables or table partitions
• Inter-table parallelism very beneficial for non-partitioned tables
Scenario                     Sequential Load   Smart Load
50 tables with 1 row each    4min 16s          13s
TPC-H (30MB)                 42s               7s
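The speedup implied by these measurements is simple arithmetic; the helper name `speedup` is introduced here only for illustration.

```python
def speedup(sequential_seconds, smart_seconds):
    """Ratio of sequential load time to Smart Load time."""
    return sequential_seconds / smart_seconds


small_tables = speedup(4 * 60 + 16, 13)  # 256 s vs. 13 s, roughly 19.7x
tpch = speedup(42, 7)                    # 42 s vs. 7 s, exactly 6x
```

The larger gain for the 50 single-row tables is consistent with the per-table overhead dominating when tables are tiny.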
Summary & Outlook
• Introduction of Load Scheduler to consolidate management of concurrent load operations
• Noticeably improved throughput
• Significantly easier to use
Next Steps:
• Product integration (currently ongoing)
• Deeper integration with workload management in Netezza backend