Is IDAA Tuning Worth It?


Thomas Baumann, Swiss Mobiliar

Session Code: E10

Wednesday, 18 November 2015 08.30-09.30 AM

Platform: DB2 zOS

Objectives

Learn how to tune accelerated queries

Discover the impact of distribution keys and organizing keys

Be able to compare IDAA to Oracle Database In-Memory

Get hints and tips for IDAA capacity planning

Learn how to report on IDAA usage

Agenda

IDAA Highlights at Swiss Mobiliar and Company Profile

Today’s Usage of IDAA at Swiss Mobiliar

Oracle In-Memory Column Store Option Compared to IDAA

IDAA Tuning

A Review of the Oracle Challenge and Our Next Steps with IDAA

ANALYTICS ON OLTP DATA

IDAA Making Headlines at Swiss Mobiliar: New business insights due to real-time analytics

9 OUT OF 10 QUERIES DON’T NEED SQL TUNING

IDAA Making Headlines at Swiss Mobiliar: Massively reduced tuning efforts for SQL query texts

30% MAINFRAME CPU REDUCTION

IDAA Making Headlines at Swiss Mobiliar: Reduced CPU consumption during peak time

IDAA Making Headlines at Swiss Mobiliar: Most queries sent by business users

Which are the IDAA users (last 30 weekdays)?

[Chart: number of queries and response time (min); axis range 0 to 1800]

Swiss Mobiliar: Switzerland’s most personal insurer

13x continuously 2003-2015

• Legal form of a cooperative association (mutual company).
• Switzerland’s number one insurer for household contents, business and pure risk life insurance.
• Close to customers throughout the country thanks to around 80 general agencies at 160 locations.
• Over 1.7 million insured persons or firms.
• Over 4,400 employees and 325 trainees.

Insurance Market Growth in Switzerland: Overall market non-life

• Close to 2/3 of market growth goes to Swiss Mobiliar

[Chart: Growth Mobiliar vs. Market Growth, in million CHF. Source: Schweizerischer Versicherungsverband]

The Speaker: Thomas Baumann

Born in 1963

MSc from the Swiss Federal Institute of Technology (ETH Zurich)

Computer Sciences combined with statistics

These days, this mix is called “Big Data”

Has been focused on DBMS and performance since 1992

Internationally recognized DB2 expert and speaker at numerous conferences

“Minister of Performance” at Swiss Mobiliar


IDAA Scope at Mobiliar

[Diagram labels: OLTP; Decision Support; Business Intelligence; Data Mart; Data Warehouse; Cross Information Systems; Core Information Systems; Access Information Systems; Analytical (OLAP); Operational (OLTP); IDAA Scope]



IDAA (IBM DB2 Analytics Accelerator) Value Delivery

[Diagram labels: DB2 zOS; SQL Query (to DB2); Result Set; Netezza]

Automatic query re-routing of search-intensive queries 1) to a data copy on the Netezza appliance

1) For applications which don’t require transactionally consistent data and can accept data delayed by a few minutes

Design Patterns

• Column oriented data storage
• Data replication close to real time, based on log records, not transactionally consistent
• Query re-routing decided by the optimizer, transparent for the application
• No need for indexes
• Very high compression rate

Major Results

• Increased OLAP query performance: 100 times faster on average
• Faster inserts on DB2 and higher scalability, due to elimination of most indexes
• Short timeframe between data ingress and analysis

Performance Comparison: DB2 vs. IDAA on 13 sample reports (logarithmic scale)

[Chart: response time in sec (logarithmic axis, 1.00 to 100000.00) by report no. (1 to 13); series: DB2, IDAA. Environment: DB2 zOS V10, IDAA V3 (with Netezza NPS 7.0.2.13)]

Step by Step Usage of IDAA

[Timeline: 12 months]

First Business Applications

• Ad-hoc reports from business end users
• Improved end-of-month processing
• Log analysis based on DB2 tables for access pattern analytics
• Improved ETL flow

Streamline Mainframe for OLTP

• Eliminating indexes used for analytics only
• Eliminating MQTs and other auxiliary structures for analytics
• Reduced demand for reorg
• More efficient inserts


Oracle In-Memory Column Store Option

[Diagram labels: rows of data in memory (optimized for OLTP); columns of data in memory (optimized for analytics); near real-time replication]

Until now: data organized in rows, loaded into memory at first usage

New:

• data additionally stored in column format
• permanently kept in memory

IDAA vs. Oracle: Ready to Rumble…

… in the red corner… Oracle 12c

• Hitachi Unified Compute Platform
• 16 Cores
• 384 GB Main Memory
• OS: RedHat 6.4
• Oracle 12.1.0.2 Beta 3

… in the blue corner... IBM Netezza IDAA

• PureData System for Analytics N1001-002
• 24 Cores
• 72 GB Main Memory
• Netezza V 7.02
• IDAA V3




Performance Comparison: IDAA vs. Oracle In-Memory Column Store (logarithmic scale)

[Chart: logarithmic axis from 0.10 to 1000.00 by report no. (1 to 13); series: IDAA, Oracle InMemory Column Store. Environment: IDAA V3 (with Netezza NPS 7.0.2.13), Oracle 12c]


IDAA Behind the Scenes

[Diagram labels: Application; Application interface; DB2 runtime environment; DB2 zOS (row format); IBM Netezza (column format) with SMP Host and Memory; sample queries SELECT… WHERE CustNo=4711 and SELECT… WHERE Type=5]

Distribution Key

[Diagram labels: Application; Application interface; DB2 zOS; IBM Netezza with SMP Host; sample query SELECT… WHERE Type=5; column CustNo mapped by hash(CustNo) to data slices 1, 2, 3, 4]

Example with CustNo as distribution key

Distribution Key

1 Data Slice = Disk + Memory + FPGA + CPU

Example: N2002-010 data sheet:

• 1 cabinet (full rack)
• 7 S-Blades
• 112 processing units
• 240 data slices

Data Slice = individual element of parallelism

Distribution Key

Objective 1: Data evenly distributed among all slices (no data skew)

Objective 2: Processing evenly distributed among all slices (no processing skew)

Objective 3: Joins can be performed locally 1) (no data re-distribution)

1) Rows of the two tables that belong together are situated on the same slice, which means that they are co-located and can be joined locally

The default random distribution key is focused on objective 1 only.
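
To make objective 1 concrete, here is a minimal Python sketch of how a hashed distribution key spreads rows over data slices and how data skew could be measured. It is an illustration only: SHA-256 stands in for the appliance's internal hash function, the number of slices and rows is arbitrary, and the skew formula (excess of the fullest slice over the average) is an assumption, not the formula from the notes page.

```python
import hashlib
from collections import Counter

def slice_for(key, n_slices):
    """Map a distribution key value to a slice using a stable hash (SHA-256 as a stand-in)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_slices

def rows_per_slice(keys, n_slices):
    """Count how many rows land on each slice for the given key values."""
    counts = Counter(slice_for(k, n_slices) for k in keys)
    return [counts.get(s, 0) for s in range(n_slices)]

def data_skew(counts):
    """Assumed skew measure: relative excess of the fullest slice over the average."""
    average = sum(counts) / len(counts)
    return (max(counts) - average) / average

N_SLICES = 8       # illustrative, far fewer than a real appliance
N_ROWS = 80_000    # illustrative

# High-cardinality key: every row has its own customer number.
by_customer = rows_per_slice(range(N_ROWS), N_SLICES)
# Low-cardinality key: only 3 distinct values, so at most 3 slices receive rows.
by_type = rows_per_slice([i % 3 for i in range(N_ROWS)], N_SLICES)

print("skew by customer number:", round(data_skew(by_customer), 3))
print("skew by type:", round(data_skew(by_type), 3))
```

The low-cardinality key leaves most slices empty, which is why high-cardinality columns are usually the better candidates for even data distribution.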

Distribution Key Candidates

• Columns frequently used as join keys between large tables
  - Columns used as join keys between a large fact table and a small dimension table are not likely to be good candidates, see “broadcasting tables” on next slide
  - Combine a few columns only if you always join on all of them
• Columns frequently aggregated on
• Columns providing even data distribution
  - Usually columns with high cardinality
• Columns providing even processing distribution
  - Be careful with date or time based columns as distribution keys
• Columns not frequently restricted on

Broadcast Tables

Imagine a join between customer table and state table

• customer table: 200M rows
• state table: 26 rows

Frequent joins of customer#state on column state_code

customer table is also joined to other tables

• join often based on customer_no

state table should be distributed based on random key:

• Forces broadcasting of state table rows to all slices
• The blades send their individual records to the host
• The host returns them in full to all of the data slices
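
A rough back-of-the-envelope comparison shows why broadcasting the small table is the cheaper choice. The Python sketch below uses a deliberately simplified cost model (rows moved between slices) that is an assumption for illustration, not how the appliance costs a plan; the row counts come from the slide and the 240 slices from the N2002-010 data sheet figure.

```python
def rows_moved_by_broadcast(small_table_rows, n_slices):
    """Broadcast: every slice receives a full copy of the small table."""
    return small_table_rows * n_slices

def rows_moved_by_redistribution(large_table_rows, n_slices):
    """Re-hash on a new key: a row stays on its slice with probability 1/n_slices."""
    return large_table_rows * (n_slices - 1) // n_slices

CUSTOMER_ROWS = 200_000_000   # from the slide
STATE_ROWS = 26               # from the slide
N_SLICES = 240                # N2002-010 data sheet figure

broadcast = rows_moved_by_broadcast(STATE_ROWS, N_SLICES)
redistribute = rows_moved_by_redistribution(CUSTOMER_ROWS, N_SLICES)

print(f"broadcast state table: {broadcast:,} rows moved")
print(f"redistribute customer table: {redistribute:,} rows moved")
```

Under this model, keeping customer distributed on customer_no for its other joins and broadcasting state moves several orders of magnitude fewer rows than re-distributing customer on state_code.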

Organizing Keys (a.k.a. zone maps)

Organizing Keys

• Explicitly selected table columns to physically cluster rows of a table with the same key column values
• Performance advantage if predicates reference one or more organizing key columns
• Used to limit scan to relevant blocks only

“RUNSTATS”: during LOAD, INSERT, UPDATE, DELETE

• Data is loaded into extents of 3 MB each
• Each extent contains blocks of 128 KB
• MIN and MAX value of each column of each block are tracked
• These statistics are called “zone maps”
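
The effect of zone maps on clustered versus unorganized rows can be sketched in a few lines of Python. This is a toy model under stated assumptions: blocks hold a fixed number of rows (the slide gives block sizes in KB, not rows), a single integer column is tracked, and the function names are made up for the illustration.

```python
import random

BLOCK_ROWS = 100  # hypothetical rows per block

def build_zone_map(values, block_rows=BLOCK_ROWS):
    """Record (min, max) of the column for each block of consecutive rows."""
    return [
        (min(values[i:i + block_rows]), max(values[i:i + block_rows]))
        for i in range(0, len(values), block_rows)
    ]

def blocks_to_scan(zone_map, low, high):
    """Return indexes of blocks whose [min, max] range overlaps the predicate range."""
    return [i for i, (lo, hi) in enumerate(zone_map) if hi >= low and lo <= high]

order_days = list(range(10_000))          # one row per day number

clustered = sorted(order_days)            # rows physically ordered by the column
unorganized = order_days[:]
random.Random(42).shuffle(unorganized)    # same rows in arbitrary physical order

# Predicate: order_day BETWEEN 2000 AND 2099
hit_clustered = blocks_to_scan(build_zone_map(clustered), 2000, 2099)
hit_unorganized = blocks_to_scan(build_zone_map(unorganized), 2000, 2099)

print(len(hit_clustered), "of 100 blocks scanned when clustered")
print(len(hit_unorganized), "of 100 blocks scanned when unorganized")
```

With clustered rows the predicate range falls into very few blocks, while with unorganized rows almost every block has a min/max range that overlaps the predicate and must be scanned.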

Zone Maps and Organizing Keys

[Figure: “unorganized” rows vs. “clustered” rows]

Multi-dimensional clustering (up to 4 organizing keys)

Order of the columns defined as organizing keys does not matter

Organizing Key Candidates

• Tables with > 1 million rows only
• Columns frequently present in range or equality predicates
  - Predicates with a low filter factor benefit most (“restrictive predicates”)
• If the access pattern is not known, good candidates are
  - columns with high cardinality
  - columns with low cardinality
  - columns with date/time data types
• If columns with high cardinality are selected as organizing keys, the compression ratio will typically be reduced
• Incremental update performance benefits if one or more columns of the primary key are selected

Organizing Key Recommendation

Data Studio highlights organizing key candidates based on cardinality (both high and low cardinality columns).

Distribution Keys Use Case

[Diagram: tables TITLE, BOOKORDER, CUSTOMER, STORE, REGION]

Table counts:

BOOKORDER   9,899,818,991
CUSTOMER       13,872,410
TITLE           6,105,440
STORE                  26
REGION                231

Distribution Keys Use Case (continued)

BOOKORDER: ORDER_ID, ISBN, CUSTOMER_ID, STORE_ID, ORDER_DATE, TSTMP, QUANTITY, PRICE_PER_ITEM

TITLE: ISBN, TITLE, CATEGORY, AUTHOR, PUBLICATION_YEAR

CUSTOMER: CUSTOMER_ID, FIRST_NAME, LAST_NAME, ADDRESS, CITY, STATE, REGION

STORE: STORE_ID, ADDRESS, CITY, STATE, ZIP, REGION_ID

REGION: REGION_ID, NAME, LOCATION, REGION_MANAGER


Data Pattern

• All “_ID” columns are INTEGER
• Primary key columns are marked with an underscore
• ORDER_DATE and CALENDAR_DATE both have DATE as data type
• BOOKORDER column skew *):
  - ORDER_ID: 0%
  - ISBN: 5%
  - CUSTOMER_ID: 1%
  - STORE_ID: 20%
  - ORDER_DATE: 1%

*) see notes page for query text to calculate skew value


Access Pattern

• Most queries are GROUP BY TITLE.CATEGORY against BOOKORDER and TITLE tables, with minimal restriction against TITLE table columns
• CUSTOMER table is frequently restricted on various levels of geography. Join cardinality (number of records after join) to BOOKORDER table is smaller than for BOOKORDER#TITLE joins.
• STORE table is frequently restricted on various levels of geography as well.
• Restrictions on BOOKORDER occur only occasionally, and are limited to ORDER_DATE.


Questions concerning BOOKORDER Distribution Key

Comment on each of the following:

Distribution Candidate | Should be considered further? (Yes/No) | Comment
RANDOM                 | No                                     | > 100 million rows
Full Primary Key       |                                        |
ORDER_ID               |                                        |
ISBN                   |                                        |
CUSTOMER_ID            |                                        |
STORE_ID               |                                        |
ORDER_DATE             |                                        |
Combination of any     |                                        |


For those columns with “Should be considered further = Yes”:

Distribution Candidate | Data Skew (the lower the better) | Join Size (the larger the better) | Likelihood of predicate within join (the larger the better) | Column appears in WHERE clauses? (the fewer the better)
(STORE_ID)             | 20%                              | small                             | medium                                                      | few

After analysis of this table, which is the ideal candidate to be selected as distribution key for the BOOKORDER table?

Defining/Altering Distribution and Organizing Keys with Data Studio: OK for small number of changes

Defining/Altering Distribution and Organizing Keys with Batch JCL: OK for larger number of changes

Comprehensive JCL: see notes page

All columns (old and new) to be included

Usage of stored procedure “ACCEL_ALTER_TABLES”

XML document for table_alter_specification:

<?xml version="1.0" encoding="UTF-8"?>
<aqttables:tableSpecifications
    xmlns:aqttables="http://www.ibm.com/xmlns/prod/dwa/2011"
    version="1.0">
  <table name="TEO_TDSTAT2" schema="DB2PROD">
    <distributionKey>
      <column name="C63654"/>
      <column name="C32006"/>
    </distributionKey>
    <organizingKey name="C63654"/>
    <organizingKey name="C63655"/>
  </table>
</aqttables:tableSpecifications>
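
When many tables need new keys, a specification document like the one on this slide could be generated rather than typed by hand. The following Python sketch builds the same element layout with the standard library. The element and attribute names are copied from the slide and have not been checked against IBM's schema; the helper name build_alter_spec is hypothetical, and calling the stored procedure itself is not shown.

```python
import xml.etree.ElementTree as ET

NS = "http://www.ibm.com/xmlns/prod/dwa/2011"  # namespace URI copied from the slide

def build_alter_spec(schema, table, distribution_cols, organizing_cols):
    """Build the XML text describing one table's distribution and organizing keys."""
    ET.register_namespace("aqttables", NS)
    root = ET.Element(f"{{{NS}}}tableSpecifications", {"version": "1.0"})
    tbl = ET.SubElement(root, "table", {"name": table, "schema": schema})
    if distribution_cols:
        dist = ET.SubElement(tbl, "distributionKey")
        for col in distribution_cols:
            ET.SubElement(dist, "column", {"name": col})
    for col in organizing_cols:
        ET.SubElement(tbl, "organizingKey", {"name": col})
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# All key columns (old and new) are passed in, as the slide requires.
spec = build_alter_spec(
    "DB2PROD", "TEO_TDSTAT2",
    distribution_cols=["C63654", "C32006"],
    organizing_cols=["C63654", "C63655"],
)
print(spec)
```

Generating the document from a list of tables keeps the "all columns, old and new" rule in one place instead of relying on manual edits per table.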

IDAA Tuning Approaches

1st Approach

• Export queries with the GET_QUERIES and GET_QUERY_DETAILS stored procedures
• Run Explain on them (DB2 native, no IDAA)
• Analyze filter factors, join predicates etc.
  - DSN_PREDICAT_TABLE

2nd Approach

• Identify most resource-intensive IDAA queries in Data Studio
• Analyze IDAA access paths (access plan graph)
• Concentrate on BROADCAST, REDIST and TBSCAN
• See details on following pages

Access Plan Graphs

TBSCAN: Identify predicates, check organizing keys

REDIST: If high number of rows, try to avoid by using distribution keys

BROADCAST: If high number of rows, try to avoid by using distribution keys
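
The three rules above can be written down as a small triage routine. The Python sketch below is purely illustrative: the PlanNode structure, the row threshold for "high number of rows" and the sample plan are hypothetical and do not reflect any export format of Data Studio; only the table names and row counts are borrowed from the use case.

```python
from dataclasses import dataclass

@dataclass
class PlanNode:
    operator: str   # "TBSCAN", "REDIST" or "BROADCAST"
    table: str
    rows: int       # estimated rows processed by this operator

ROW_THRESHOLD = 1_000_000  # hypothetical cut-off for "high number of rows"

def tuning_hints(nodes, threshold=ROW_THRESHOLD):
    """Apply the slide's three rules and return human-readable hints."""
    hints = []
    for node in nodes:
        if node.operator == "TBSCAN":
            hints.append(f"{node.table}: identify predicates, check organizing keys")
        elif node.operator in ("REDIST", "BROADCAST") and node.rows >= threshold:
            hints.append(
                f"{node.table}: {node.operator} of {node.rows:,} rows, review distribution keys"
            )
    return hints

# Hypothetical plan using table names and row counts from the use case.
plan = [
    PlanNode("TBSCAN", "BOOKORDER", 9_899_818_991),
    PlanNode("REDIST", "CUSTOMER", 13_872_410),
    PlanNode("BROADCAST", "STORE", 26),
]

hints = tuning_hints(plan)
for hint in hints:
    print(hint)
```

A broadcast of a 26-row table is left alone, while the redistribution of millions of rows is flagged for a distribution key review.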

IDAA Tuning: A Real Life Case Study

Where to start?


Report Results and Comparison to DB2 native

[Chart: logarithmic axis from 0.10 to 100000.00 by report no. (1 to 13); series: DB2, IDAA, IDAA with tuning. Environment: DB2 zOS V10, IDAA V3 (with Netezza NPS 7.0.2.13)]

Further improvements (not yet reflected in the diagram)

• IDAA query no. 11: predicate selectivity estimation incorrect

Performance Comparison: DB2 IDAA vs. Oracle In-Memory Column Store

[Chart: response time in sec (axis ticks from 0.10 to 1000.00) by report no. (1 to 13); series: IDAA, Oracle InMemory Column Store. Environment: DB2 zOS V10 with IDAA V3 (NPS 7.0.2.13); Oracle 12c (12.1.0.2)]

IDAA Capacity Planning

• No individual CPU or disk enhancements
• If capacity is reached, replace by a larger accelerator model
  - disk capacity
  - CPU capacity
  - limit of concurrent users (counter represents “high water mark” since accelerator start)
• If the performance objective is not reached, replace by a larger accelerator model

Summary: A Paradigm Change

The Past: Separate OLTP from OLAP

• OLAP analytics was a wait-for-result proposition
• Data needed to be ETL’d before analysis
• Analytics on OLTP data impacted performance
• “Bring data to the analytics” paradigm

The Present: Integrated Approach

• “Bring analytics to the data” paradigm
• New DB technologies to allow
  - Analytics at the point of customer contact
  - Transactional and operational business analytics: real-time analytics

Business User’s Perception of IDAA

Like an additional access path: Best access path to take you from Dublin to Brussels?

[Figure labels: SQL Tuning; IDAA; “Beam me up, Scotty”; 13 hours; 1 hour 40 min]

Thomas Baumann, Swiss Mobiliar

[email protected]

Is IDAA Tuning Worth It? Session Code E10

Please fill out your session evaluation before leaving!
