Is IDAA Tuning Worth It?
Transcript of Is IDAA Tuning Worth It?
Is IDAA Tuning Worth It?
Thomas Baumann, Swiss Mobiliar
Session Code: E10
Wednesday, 18 November 2015 08.30-09.30 AM
Platform: DB2 zOS
Objectives
Learn how to tune accelerated queries
Discover the impact of distribution keys and organizing
keys
Be able to compare IDAA to Oracle Database In-Memory
Get hints and tips for IDAA capacity planning
Learn how to report on IDAA usage
2
Agenda
IDAA Highlights at Swiss Mobiliar and Company
Profile
Today‘s Usage of IDAA at Swiss Mobiliar
Oracle In-Memory Column Store Option Compared to
IDAA
IDAA Tuning
A Review of the Oracle Challenge and Our Next Steps with
IDAA
3
ANALYTICS ON OLTP DATA
IDAA Making Headlines at Swiss MobiliarNew business insights due to real-time analytics
4
9 OUT OF 10 QUERIES DON‘T NEED SQL TUNING
IDAA Making Headlines at Swiss MobiliarMassively reduced tuning efforts for SQL query texts
5
30% MAINFRAME CPU REDUCTION
IDAA Making Headlines at Swiss MobiliarReduced CPU consumption during peak time
6
Which are the IDAA users (last 30 weekdays)?
7
0
200
400
600
800
1000
1200
1400
1600
1800
Number of queries
Response time (min)
IDAA Making Headlines at Swiss MobiliarMost queries sent by business users
Swiss MobiliarSwitzerland‘s most personal insurer
13x continuously 2003-2015
8
• legal form of a cooperative
association (mutual company).
• Switzerland’s number one insurer for
household contents, business and
pure risk life insurance.
• close to customers throughout the
country thanks to around 80 general
agencies at 160 locations.
• over 1.7 million insured persons or
firms.
• over 4,400 employees and 325
trainees.
Insurance Market Growth in SwitzerlandOverall market non-life
9
• Close to 2/3 of Market Growth to Swiss
Mobiliar
Growth Mobiliar Market Growth
in Mio CHF. Source: Schweizerischer Versicherungsverband
The SpeakerThomas Baumann Born in 1963
MSc. from the Swiss Federal Institute of Technology (ETH Zurich)
Computer Sciences combined with statistics
These days, this mix is calles „Big Data“
Has been focused on DBMS and performance since 1992
Internationally recognized DB2 expert and speaker on numerous conferences
„Minister of Performance“ at Swiss Mobiliar
10
Agenda
Swiss Mobiliar
Today‘s Usage of IDAA at Swiss Mobiliar
Oracle In-Memory Column Store Option Compared to
IDAA
IDAA Tuning
A Review of the Oracle Challenge and Our Next Steps with
IDAA
11
OLTP
Decision SupportBusiness
Intelligence
Data Mart
Data
Warehouse
Cross
Information
Systems
Core
Information
Systems
Access
Information
Systems
Analytical
(OLAP)
Operational
(OLTP)IDAA
Scope
IDAA Scope at Mobiliar
12
IDAA (IBM DB2 Analytics Accelerator) Value Delivery
13
DB2 zOS
SQL Query
(to DB2)
Result
Set
Automatic query re-routing of
search-intensive queries1) to data
copy at Netezza appliance
1) For applications which don‘t require transactionally
consistent
data, and can accept data delayed by a few minutes
Netezza
Design Patterns
Column oriented data storage
Data replication close to real time
based on log records,
not transactionally consistent
Query Re-Routing decided by
optimizer, transparent for application
No need for indexes
Very high compression rate
Major Results
Increased OLAP query performance
100 times faster in average
Faster inserts on DB2 and higher
scalability
Due to elimination of most
indexes
Short timeframe between
data ingress and analysis
Performance ComparisonDB2 vs. IDAA on 13 sample reports(logarithmic scale)
14
response time in sec
report no.
DB2 zOS V10, IDAA V3 (with Netezza NPS 7.0.2.13)
1.00
10.00
100.00
1000.00
10000.00
100000.00
1 2 3 4 5 6 7 8 9 10 11 12 13
DB2
IDAA
First Business Applications
Ad-hoc reports from business end users
Improved end-of-month processing
Log analysis based on DB2 tables for access pattern
analytics
Improved ETL flow
Streamline Mainframe for OLTP
Eliminating indexes used for analytics only
Eliminate MQT and other auxiliary structures for
analytics
Reduced demand for reorg
More efficient inserts12 month
Step by Step Usage of IDAA
15
Agenda
Swiss Mobiliar
Today‘s Usage of IDAA at Swiss Mobiliar
Oracle In-Memory Column Store Option Compared to
IDAA
IDAA Tuning
A Review of the Oracle Challenge and Our Next Steps with
IDAA
16
Oracle In-Memory Column Store Option
Spalten
Memory
Daten
Memory
Zeilen
Daten
Until now: data organized in rows, loaded into memory at first usage
New:
data additionally stored in column format
permanently kept in memory
Optimized for
OLTP
Optimized for
Analytics
Near real-time replication
17
IDAA vs. Oracle: Ready to Rumble… in the red corner…
Oracle 12c • Hitachi Unified Compute
Platform
• 16 Cores
• 384 GB Main Memory
• OS: RedHat 6.4
• Oracle 12.1.0.2 Beta 3
… in the blue corner...
IBM Netezza IDAA • PureData System for Analytics
N1001-002
• 24 Cores
• 72 GB Main Memory
• Netezza V 7.02
• IDAA V3
18
Performance ComparisonIDAA vs. Oracle In Memory Column Store(logarithmic scale)
19
response time in sec
report no.
IDAA V3 (with Netezza NPS 7.0.2.13), Oracle 12c
0.10
1.00
10.00
100.00
1000.00
1 2 3 4 5 6 7 8 9 10 11 12 13
IDAA
Oracle InMemory
Column Store
Agenda
Swiss Mobiliar
Today‘s Usage of IDAA at Swiss Mobiliar
Oracle In-Memory Column Store Option Compared to
IDAA
IDAA Tuning
A Review of the Oracle Challenge and Our Next Steps with
IDAA
20
Application
Application
inter-
face
SM
P H
ost
Memory
DB2 runtime
environment
SELECT…
WHERE
CustNo
=4711
SELECT…
WHERE
Type=5
IBM NetezzaDB2 zOS
row format
co
lum
n
form
at
IDAA Behind the Scenes
21
Distribution Key
22
Application
Application
inter-
face
SM
P H
ost
SELECT…
WHERE
Type=5
IBM Netezza
DB2
zOS
CustNo
1
2
3
4
hash(CustNo)
example withCustNo as
distribution key
Distribution Key
1 Data Slice = Disk+Memory+FPGA+CPU.
Example: N2002-010 Data Sheet:
1 Cabinet (full rack)
7 S-Blades
112 Processing Units
240 Data Slices
Data Slice = Individual element of parallelism
23
Distribution Key
Objective 1:
Data evenly distributed among all slices
(no data skew)
Objective 2:
Processing evenly distributed among all slices
(no processing skew)
Objective 3:
Joins can be performed locally1)
(no data re-distribution).
1) Rows of the two tables that belong together are situated on the same slice, which
means that they are co-located and can be joined locally
The default random distribution key is focused on objective 1 only.
24
Distribution Key Candidates
Columns frequently used as join keys between large
tables
Columns used as join keys between a large fact and a
small
dimension table are not likely to be good candidates, see
“broadcasting tables“ on next slide
Combining a few columns together only if you always join
on all of them
Columns frequently aggregated on
Columns providing even data distribution
Usually columns with high cardinality
Columns providing even processing distribution
Be careful with date or time based columns as distribution
keys
Columns not frequently restricted on
25
Broadcasted tables
Imagine a join between customer table and state table
customer table: 200M Rows
state table: 26 Rows
Frequent joins of customer#state on column
state_code
customer table is also joined to other tables
join often based on customer_no
state table should be distributed based on random key:
Forces broadcasting of state table rows to all slices
The blades send their individual records to the host
The host returns them in full to all of the data slices
26
Organizing Keys (a.k.a. zone maps)
Organizing Keys
Explicitly selected table columns to physically cluster
rows
of a table with the same key column values
Performance advantage if predicates reference one or
more
organizing key columns
Used to limit scan to relevant blocks only
„RUNSTATS“: during LOAD, INSERT, UPDATE,
DELETE
Data is loaded into extents of 3MB each
Each extent contains blocks of 128 KB
MIN and MAX value of each column of each block
tracked
these statistics are called “zone maps“
27
Zone Maps and Organizing Keys
“unorganized“ rows “clustered“ rows
Multi-dimensional clustering (up to 4 organization keys)
Order of the columns defined as organizing keys does not
matter 28
Organizing Key Candidates
Tables with > 1mio rows only
Columns frequently present in range or equal predicates
low filter factor predicates benefit most (“restrictive
predicates“)
If access pattern is not known, good candidates are
columns with high cardinality
columns with low cardinality
columns with date/time data types
If columns with high cardinality are selected as organizing
keys,
compression ratio will typically be reduced
Incremental update performance benefits if one or
more columns of the primary key will be selected29
Organizing Key Recommendation
Data Studio highlights organizing key candidates based on
cardinality (both high and low cardinality columns).
30
Distribution Keys Use Case
TITLE
BOOKORDERCUSTOMERSTORE
REGION
table counts: BOOKORDER 9,899,818,991
CUSTOMER 13,872,410
TITLE 6,105,440
STORE 26
REGION 231
Distribution Keys Use Case (continued)
BOOKORDER
ORDER_ID, ISBN, CUSTOMER_ID, STORE_ID,
ORDER_DATE, TSTMP, QUANTITY,
PRICE_PER_ITEM TITLE
ISBN, TITLE, CATEGORY, AUTHOR,
PUBLICATION_YEAR CUSTOMER
CUSTOMER_ID, FIRST_NAME, LAST_NAME,
ADDRESS, CITY, STATE, REGIONSTORE
STORE_ID, ADDRESS, CITY, STATE, ZIP, REGION_ID REGION
REGION_ID, NAME, LOCATION, REGION_MANAGER32
Distribution Keys Use Case (continued)
Data Pattern All „_ID“ columns are INTEGER
Primary key columns are marked with an underscore ORDER_DATE and CALENDAR_DATE both have DATE as
data
type BOOKORDER column skew *) :
ORDER_ID: 0%
ISBN: 5%
CUSTOMER_ID: 1%
STORE_ID: 20%
ORDER_DATE: 1%
33
*) see notes page for query text to calcuate skew value
Distribution Keys Use Case (continued)
Access Pattern Most queries are GROUP BY TITLE.CATEGORY against
BOOKORDER and TITLE tables, with miminal restriction
against TITLE table columns
CUSTOMER table is frequently restricted on various levels
of
geography. Join cardinality (number of records after join)
to BOOKORDER table is smaller than BOOKORDER#TITLE
joins. STORE table is frequently restricted on various levels of
geography as well. Restrictions on BOOKORDER only occasionally, and limited
toORDER_DATE.34
Distribution Keys Use Case (continued)
Questions concerning BOOKORDER Distribution Key
Comment on each of the following:
Distribution
Candidate
Should be
considered
further? (Yes/No)
Comment
RANDOM No > 100 Mio Rows
Full Primary Key
ORDER_ID
ISBN
CUSTOMER_ID
STORE_ID
ORDER_DATE
Combination of
any
35
Distribution Keys Use Case (continued)
For those columns with “Should be considered further=YES“ :
After analysis of this table, which is the ideal candidate to beselected as distribution key for the BOOKORDER table?
Distribution
Candidate
Data
Skew
(the
lower
the
better)
Join Size
(the
larger
the
better)
Likelihood of
predicate
within join
(the larger
the better)
Column
appears in
WHERE
clauses?
(the fewer the
better)
(STORE_ID) 20% small medium few
36
Defining/Altering Distribution and OrganizingKeys with Data Studio: OK for small number of changes
37
Defining/Altering Distribution and OrganizingKeys with Batch JCL: OK for larger number of changes
38
<?xml version="1.0" encoding="UTF-8"?>
<aqttables:tableSpecifications
xmlns:aqttables="http://www.ibm.com/xmlns/prod/dwa/2011"
version="1.0">
<table name="TEO_TDSTAT2" schema="DB2PROD">
<distributionKey>
<column name="C63654"/>
<column name="C32006"/>
</distributionKey>
<organizingKey name="C63654"/>
<organizingKey name="C63655"/>
</table>
Comprehensive JCL see notes page
All columns (old and new) to be included
Usage of Stored Proc “ACCEL_ALTER_TABLES“
XML document for table_alter_specification:
IDAA Tuning Approaches
39
1st Approach
Export Queries by GET_QUERIES and
GET_QUERY_DETAILS stored procedures
Run Explain on them (DB2 native, no IDAA)
Analyze Filter Factors, Join Predicates etc.
DSN_PREDICAT_TABLE
2nd Approach
Identify most resource-intensive IDAA queries in Data
Studio
Analyze IDAA access paths (access plan graph)
Concentrate on BROADCAST, REDIST and TBSCAN
See details on following pages
Access Plan Graphs
40TBSCAN: Identify predicates, check organizing keys
REDIST: If high no of rows, try to avoid
by using distribution keys
BROADCAST:
If high no of rows, try to avoid
by using distribution keys
IDAA tuning: A Real Life Case Study
41
Where to start?
Agenda
Swiss Mobiliar
Today‘s Usage of IDAA at Swiss Mobiliar
Oracle In-Memory Column Store Option Compared to IDAA
IDAA Tuning
A Review of the Oracle Challenge and Our Next Steps
with IDAA
42
Report Results and Comparison to DB2 native
43
response time in sec
report no.
DB2 zOS V10, IDAA V3 (with Netezza NPS 7.0.2.13)
0.10
1.00
10.00
100.00
1000.00
10000.00
100000.00
1 2 3 4 5 6 7 8 9 10 11 12 13
DB2
IDAA
IDAA with tuning
Further improvements (not yet reflected in the diagram) IDAA Query No.11: Predicate selectivity estimation incorrect
Performance ComparisonDB2 IDAA vs. Oracle InMemory Column Store
44
sec
0.10
1.00
10.00
100.00
1000.00
1 2 3 4 5 6 7 8 9 10 11 12 13
IDAA
Oracle InMemory
Column Store
DB2 zOS V10 with IDAA V3 (NPS 7.0.2.13)
Oracle 12c.1.0.2
IDAA Capacity Planning
No individual CPU or disk enhancements
If capacity reached, replace by larger accelerator model
disk capacity
cpu capacity
limit of concurrent users
counter represents “high water mark“ since
accelerator
start
If performance objective not reached, replace by larger
accelerator model
45
The Past: Separate OLTP from OLAP
OLAP Analytics was a wait-for-result
proposition
Data needed to be ETL’d before analysis
Analytics on OLTP data impacted
performance
“Bring data to the analytics” paradigm
The Present: Integrated Approach
“Bring analytics to the data” paradigm
New DB technologies to allow
Analytics at the point of customer contact
Transactional and Operational Business
Analytics: Real-Time Analytics
Summary: A Paradigm Change
46
Like an additional access path:
Best access path to take you from
Dublin to Brussels?
SQL Tuning IDAA
Beam me up, Scotty13 hours 1hours 40 min
Business User‘s perception of IDAA
47
Thomas BaumannSwiss Mobiliar
Is IDAA Tuning Worth It?Session Code E10
Please fill out your session
evaluation before leaving!