
Autonomous Data Ingestion Tuning in Data Warehouse Accelerators

Knut Stolze, Felix Beier, Jens Müller
IBM Germany Research & Development
2017-03-08

©2017 IBM Corp.



Disclaimer

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.

WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.

IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:

- CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR

- ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.


Agenda

• IDAA Overview

• Load Processing

• Autonomous Load Performance Tuning / Smart Load

• Preliminary Performance Evaluation

• Summary & Outlook


IBM DB2 Analytics Accelerator


Loading Data from DB2 z/OS into IDAA

• Full Table Reload

• Partial Reload

• Incremental Update

[Figure: a table with columns Col A, Col B, ... partitioned by month (January to May), illustrating a full table refresh, a partition update, and an incremental update]


Incremental Update

• Uses replication to replay changes in the accelerator

• Low latency

• Concurrent with queries; commit scope on transaction boundaries
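The commit-scope property of incremental update can be sketched as follows. This is a minimal illustration under stated assumptions, not the IDAA implementation: the `ChangeRecord` shape, the `replay` function, and modeling an accelerator table as a Python set of rows are all hypothetical. It only shows that changes buffered per transaction become visible together once that transaction's commit record is replayed.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical change-record shape; the slides only state that changes read
# from the DB2 transaction log are replayed as inserts/deletes.
@dataclass(frozen=True)
class ChangeRecord:
    txn_id: int
    op: str            # "insert", "delete", or "commit"
    row: tuple = ()


def replay(records, table):
    """Apply change records to `table` (a set of rows). Each transaction's
    changes are buffered and applied only when its commit record arrives,
    so queries never see a partially applied transaction."""
    pending = defaultdict(list)            # txn_id -> buffered changes
    for rec in records:
        if rec.op == "commit":
            # Transaction boundary: apply all buffered changes together.
            for change in pending.pop(rec.txn_id, []):
                if change.op == "insert":
                    table.add(change.row)
                else:
                    table.discard(change.row)
        else:
            pending[rec.txn_id].append(rec)
    # Return the table and any still-uncommitted transactions.
    return table, dict(pending)


if __name__ == "__main__":
    log = [
        ChangeRecord(1, "insert", (1, "a")),
        ChangeRecord(2, "insert", (2, "b")),   # txn 2 never commits here
        ChangeRecord(1, "delete", (0, "x")),
        ChangeRecord(1, "commit"),
    ]
    print(replay(log, {(0, "x")}))
```

In this example only transaction 1 is applied; the insert from transaction 2 stays buffered because its commit record has not been seen yet.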


[Figure: incremental update architecture. In IBM DB2 for z/OS, user applications issue OLTP and OLAP queries; the optimizer runs OLTP queries in the local query processor and offloads OLAP queries via the IDAA stored procedures to the IBM DB2 Analytics Accelerator appliance. A log reader passes log records from the transaction log to the incremental update component, which applies the change records as insert/delete tuples to the accelerator copies of the tables (Fact Table, Dim1, Dim2).]


Batch Update Scenarios

• Reload many tables fully or partially, e.g. in a batch window

• Fully concurrent with queries; commit scope is the LOAD operation

• Excellent throughput

• Tables processed sequentially

• Partitions of partitioned tables loaded concurrently
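The batch load scheme above (tables one after another, partitions of one table in parallel) can be sketched with a thread pool. This is only an illustration under assumptions: `load_partition`, `batch_load`, and the `max_parallel` parameter are hypothetical names, and the "load" merely returns a label instead of transferring data to the accelerator.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def load_partition(table, partition):
    # Hypothetical stand-in for transferring one partition to the
    # accelerator; it only reports what was loaded.
    return f"{table}:{partition}"


def batch_load(tables, max_parallel=4):
    """Load tables sequentially; load the partitions of each table
    concurrently. `tables` maps a table name to its list of partitions."""
    loaded = []
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        for table, partitions in tables.items():        # tables: one at a time
            # Partitions of this table run concurrently; map() yields results
            # in input order, and consuming them here waits for the whole
            # table before the next one starts.
            loaded.extend(pool.map(partial(load_partition, table), partitions))
    return loaded


if __name__ == "__main__":
    print(batch_load({"SALES": ["2017-01", "2017-02"], "DIM": [0]}))
```

Note that a single-partition table keeps only one worker busy, which is why many small tables are loaded inefficiently in this scheme.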


Loading Data from DB2 z/OS into IDAA


Challenges

• Load optimized for throughput of large tables only

• Many small tables are not handled efficiently

• Parallelism must be managed manually by the customer, who starts multiple independent load processes

• Lack of control over resources and of workload balancing among load operations, and between load and other operations (e.g. queries)


Smart Load

• Automatically scales the number of concurrent load operations up and down

• Considers the involved resources:

- CPU consumption in Netezza backend

- CPU consumption on System z

- Network saturation

- I/O utilization in Netezza backend
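A resource check over the four resources listed above might look like the following sketch. The resource keys, the percentage units, the `can_add_stream` and `bottleneck` helpers, and the 90% cap are all hypothetical; the slides name the resources but not how their utilization is measured or thresholded.

```python
# Hypothetical keys for the four resources named on the slide; values are
# assumed to be utilization in percent of capacity.
RESOURCES = ("netezza_cpu", "systemz_cpu", "network", "netezza_io")


def can_add_stream(utilization, per_stream_cost, cap=90.0):
    """Return True if starting one more load stream is expected to keep
    every tracked resource at or below `cap` percent (assumed threshold)."""
    return all(utilization[r] + per_stream_cost[r] <= cap for r in RESOURCES)


def bottleneck(utilization):
    """Return the most utilized resource, i.e. the one limiting scale-up."""
    return max(RESOURCES, key=lambda r: utilization[r])


if __name__ == "__main__":
    util = {"netezza_cpu": 60, "systemz_cpu": 40, "network": 85, "netezza_io": 50}
    cost = {r: 5 for r in RESOURCES}
    print(can_add_stream(util, cost), bottleneck(util))
```

Checking all resources with `all(...)` means a single saturated resource, such as the network in this example, is enough to block a new stream.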


Characteristics

• Oscillation when scaling up/down is possible, but not a problem

• Expected resource consumption of a new load stream is estimated by averaging over the history of prior load streams

• The increment step is only 1

• Already running load streams are not terminated prematurely
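The characteristics above can be combined into a small scaling sketch. This is an assumption-laden illustration, not Smart Load itself: `StreamScaler`, its single scalar `capacity`/load metric, and treating an empty history as zero cost are hypothetical choices. It mirrors the three stated rules: the cost of a new stream is the average over prior streams, the target changes by at most one per decision, and lowering the target never stops running streams.

```python
from statistics import mean


class StreamScaler:
    """Hypothetical scaler; a single scalar stands in for resource load."""

    def __init__(self, capacity):
        self.capacity = capacity   # assumed total resource budget
        self.history = []          # observed cost of finished load streams
        self.target = 1            # desired number of concurrent streams

    def record_finished(self, cost):
        self.history.append(cost)

    def estimated_cost(self):
        # Expected cost of a new stream: average over prior streams.
        return mean(self.history) if self.history else 0.0

    def decide(self, current_load):
        """Move the target by at most one step per decision."""
        if current_load + self.estimated_cost() <= self.capacity:
            self.target += 1
        elif current_load > self.capacity and self.target > 1:
            self.target -= 1
        return self.target

    def streams_to_start(self, running):
        # Never negative: surplus streams finish normally rather than
        # being terminated when the target drops.
        return max(0, self.target - running)


if __name__ == "__main__":
    s = StreamScaler(capacity=100)
    s.record_finished(20)
    s.record_finished(30)
    print(s.decide(70), s.decide(110), s.streams_to_start(3))
```

Because each decision reacts to the load it just caused, the target can bounce between neighboring values; with a step of 1 this oscillation stays small.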


Preliminary Performance Evaluation

Improvements:

• Keeping load streams open for multiple tables or table partitions

• Inter-table parallelism very beneficial for non-partitioned tables
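Keeping load streams open across tables and loading several tables in parallel can be sketched as a fixed set of worker streams draining a shared work queue. This is a hypothetical illustration: `run_streams` and the recorded "opened" events stand in for whatever setup a real load stream needs. The point is that each stream is opened once and then serves many small tables or partitions, instead of paying the setup cost per table.

```python
import queue
import threading


def run_streams(work_items, num_streams):
    """Start `num_streams` streams that each open once, then load tables or
    partitions from a shared queue until it is empty."""
    q = queue.Queue()
    for item in work_items:
        q.put(item)
    opened, done = [], []          # stream openings and completed loads
    lock = threading.Lock()

    def stream(stream_id):
        with lock:
            opened.append(stream_id)   # stand-in for opening a load stream
        while True:
            try:
                item = q.get_nowait()
            except queue.Empty:
                return                 # no work left: close the stream
            with lock:
                done.append((stream_id, item))   # stand-in for loading `item`

    threads = [threading.Thread(target=stream, args=(i,)) for i in range(num_streams)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return opened, done


if __name__ == "__main__":
    opened, done = run_streams([f"T{i}" for i in range(50)], num_streams=4)
    print(len(opened), len(done))
```

With 50 single-row tables, only 4 stream openings occur, while the tables themselves are spread across the streams.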

Scenario                       Sequential Load   Smart Load
50 tables with 1 row each      4 min 16 s        13 s
TPC-H (30 MB)                  42 s              7 s
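The speedups implied by the measurements in the table can be computed directly; this short sketch only converts the reported times to seconds and divides. The variable names are illustrative.

```python
def to_seconds(minutes=0, seconds=0):
    return 60 * minutes + seconds


# (sequential load, Smart Load) times from the evaluation table, in seconds
results = {
    "50 tables with 1 row each": (to_seconds(4, 16), to_seconds(0, 13)),
    "TPC-H (30 MB)": (to_seconds(0, 42), to_seconds(0, 7)),
}
speedups = {name: seq / smart for name, (seq, smart) in results.items()}

for name, s in speedups.items():
    print(f"{name}: {s:.1f}x")
# prints:
# 50 tables with 1 row each: 19.7x
# TPC-H (30 MB): 6.0x
```

The larger gain for many tiny tables is consistent with the improvements listed above: per-table overhead dominates there, and reusing streams plus inter-table parallelism removes most of it.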


Summary & Outlook

• Introduction of a Load Scheduler to consolidate the management of concurrent load operations

• Noticeably improved throughput

• Significantly easier to use

Next Steps:

• Product integration (currently ongoing)

• Deeper integration with workload management in Netezza backend