Database Statistics - Good Practices Not Only for Experienced Administrators


Database Statistics - good practices not only for experienced administrators

18.03.2015


Asseco at a Glance

• Founded in 1991

• The largest IT company in CEE

• 6th largest software producer in Europe

• Traded on the WSE, included in the WIG30 Blue Chip index

• 17 000 employees worldwide

• Selling proprietary software and services

• Strong financials with a great track record

– 2013 revenue of PLN 5.9b (EUR 1.4b)

– CAGR +17.9% (2009-2013)

– 2013 EBIT of PLN 611m (EUR 145m)


Our Offices Worldwide

– The Asseco Group

– Asseco Poland S.A.

Presentation Objective

Question:

How do you maintain the everyday database statistics process in a complex environment?

What do I mean by the term "complex environment"?

Complex Environment

DB2 environment

• > 60 TB of production data (6 DB2 members in Data Sharing)
• DML 48K/s, GETPAGES 392K/s, IO RW 13K/s, IO R 12K/s
• > 475 GB data growth per month

Maintenance

• Weekly cost of maintenance for RUNSTATS - 240 MIPS
• 3 807 colgroup definitions
• 2 344 columns to verify
• > 500 active plans and over 12 000 active packages to BIND

Automation

• Largest objects - NPI INDEX > 230 GB, Tablespace 280 GB
• 400 000 objects to maintain
• Every year > 30K new objects

Presentation Plan

1 Introduction to database statistics

2 Tools for database statistics

3 Automation of the process of database statistics

4 Challenges in the project

5 Conclusions

Introduction to database statistics

Database Statistics – General Overview

• Usage of the RUNSTATS utility: running the RUNSTATS utility enables DB2® to choose efficient access paths by keeping the statistics accurate and up to date.

• The collected statistics about database objects are stored in the DB2 catalog.

• DB2 uses the collected statistics during the BIND process, when the most efficient access paths are determined.

RUNSTATS Utility 1/2

Functionalities    RUNSTATS TABLESPACE    RUNSTATS INDEX

gathers statistics on a tablespace X

gathers statistics on tables X

gathers statistics on indexes X

gathers statistics on columns x

* RUNSTATS does not collect statistics for clone tables, created global temporary tables (CGTT) or index spaces.

** Leading columns are collected by using RUNSTATS INDEX.

RUNSTATS Utility 2/2

RUNSTATS collects the following three types of distribution statistics:

01 Cardinality

The number of distinct values in the column or set of columns.

02 Histograms

Histogram statistics are gathered for the specified group of columns and describe how the values are spread across quantiles (value ranges).

03 Frequency

The percentage of rows in the table that contain a value for a column or combination of values for a set of columns.
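To make the three kinds of distribution statistics concrete, the sketch below computes them for a small in-memory column. It is plain Python, not DB2 code: the sample data, the function names and the equal-depth way of cutting the quantiles are assumptions chosen for illustration, and the sketch does not claim to reproduce how RUNSTATS works internally.

```python
from collections import Counter


def cardinality(values):
    """Number of distinct values in the column."""
    return len(set(values))


def frequencies(values, top=2):
    """Fraction of rows (0..1) held by each of the `top` most common values."""
    total = len(values)
    return [(value, count / total) for value, count in Counter(values).most_common(top)]


def histogram(values, quantiles=2):
    """Cut the sorted values into ranges holding a near-equal number of rows.

    Returns one (low value, high value, distinct values) tuple per range.
    """
    ordered = sorted(values)
    size = -(-len(ordered) // quantiles)  # ceiling division
    chunks = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    return [(chunk[0], chunk[-1], len(set(chunk))) for chunk in chunks]


# Hypothetical "status" column: one value dominates the table.
status = ["A", "A", "A", "A", "A", "A", "B", "C"]

print(cardinality(status))       # distinct values
print(frequencies(status, 1))    # most frequent value and its share of rows
print(histogram([1, 2, 3, 4]))   # two ranges of two rows each
```

A skewed column such as `status` is where frequency statistics matter most: the cardinality alone would suggest that each value matches about a third of the rows, while the frequency shows that one value matches three quarters of them.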

Selection of Statistics Stored in DB2 Catalog Used for Access Path Selection

• SYSIBM.SYSCOLDIST - CARDF, COLGROUPCOLNO, COLVALUE, FREQUENCYF, HIGHVALUE, LOWVALUE, NUMCOLUMNS, TYPE, QUANTILENO

• SYSIBM.SYSCOLSTATS - COLCARD, HIGHKEY, LOWKEY, PARTITION

• SYSIBM.SYSCOLUMNS - COLCARDF, HIGH2KEY, LOW2KEY

• SYSIBM.SYSINDEXES - CLUSTERING, CLUSTERRATIOF, FIRSTKEYCARDF, FULLKEYCARDF, NLEAF, NLEVELS, DATAREPEATFACTORF

• SYSIBM.SYSINDEXPART - LIMITKEY

• SYSIBM.SYSTABLES - CARDF, EDPROC, NPAGES, NPAGESF, PCTROWCOMP

• SYSIBM.SYSTABLESPACE - NACTIVEF

• SYSIBM.SYSTABSTATS - CARDF, NPAGES

Statistics Rules – General Recommendations 1/2

Running database statistics is recommended:

• after loading a table and before binding application plans and packages that access the table,

• after creating an index,

• after reorganizing a table space or an index,

• after running utilities such as RECOVER or REBUILD,

Statistics Rules – General Recommendations 2/2

Running database statistics is recommended:

• after heavy insert, update, and delete activity,

• against the DB2 catalog, to provide DB2 with more accurate information for access path selection of user queries to the catalog,

• before REORG or REBUILD, in order to determine which objects need reorganisation.

Other Factors Influencing the Access Path

Among other factors influencing access paths there are:

• DB2 hints,

• the amount of CPU and the buffer pool definitions.

In order to examine or improve an access path, it is worth considering:

• BIND with EXPLAIN(YES),

• virtual indexes,

• "what if" analysis.

For more information, check the EXPLAIN tables.

Invalidate Dynamic Statement Cache

• After database statistics change, the RUNSTATS utility can be run with the REPORT NO and UPDATE NONE options on the tablespace or index that the query depends on. This invalidates the dynamic statement cache entries that depend on that object.

RUNSTATS TABLESPACE BPG01.SPGOBJRT
    TABLE(ALL)
    SHRLEVEL CHANGE
    REPORT NO
    UPDATE NONE
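When the same invalidation has to be issued for many objects, the control statement can be generated rather than typed. The following is a minimal sketch in Python that only builds the text of the statement shown on the slide; the function name is hypothetical, and submitting the statement to DB2 (JCL, scheduler, DSNUTILU) is deliberately left out.

```python
def invalidation_statement(database, tablespace):
    """Build a RUNSTATS statement that updates nothing and reports nothing.

    The option list mirrors the statement on the slide; only the object
    name is substituted.
    """
    lines = [
        f"RUNSTATS TABLESPACE {database}.{tablespace}",
        "    TABLE(ALL)",
        "    SHRLEVEL CHANGE",
        "    REPORT NO",
        "    UPDATE NONE",
    ]
    return "\n".join(lines)


# Object name taken from the slide.
print(invalidation_statement("BPG01", "SPGOBJRT"))
```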


Database statistics - tools

DSNACCOX – Procedure 1/2

• DSNACCOX helps you determine which objects the RUNSTATS utility should be run on.

• Recommendations are based on the amount of changes rather than on the type of changes (distribution statistics).

DSNACCOX – Procedure 2/2

IBM Data Studio – Statistics Advisor

Types of RUNSTATS:

• COMPLETE RUNSTATS - for the query or workload.

• REPAIR RUNSTATS - repairs the immediate statistics problems.


Automation of the process

Maintenance Process

[Diagram labels: IBM TWS, ADT, BKP, REO, RTS]

Integration of the DB2 Utilities with IBM Tivoli Workload Scheduler – Automation

Characteristics of the implementation of the IBM TWS:

• universal pattern for every job,

• application limit – 255 programs per application,

• automatic restart in case of an error,

• one "steering wheel" for all maintenance tasks (dependencies between utilities).

RUNSTATS Process - How We Use It

Make decision reports, prioritise objects

Select candidate object from a control table

RUNSTATS

Verify and manually update / insert DB2 catalog statistics (only when applicable)

Statistics Rules – Reality of the Project

Database statistics are run in the discussed project when:

• objects do not have statistics, but are not empty,

• object growth exceeds 3 million rows since the last RUNSTATS, or

• 10% of the rows have changed,

• table activities reached:
  WHERE (STATSINSERTS + STATSDELETES + STATSUPDATES + STATSMASSDELETE) > 100 AND CARD = 0,

• maintenance occurred and:
  – REORGLASTTIME > STATSLASTTIME,
  – LOADRLASTTIME > STATSLASTTIME,
  – REBUILDLASTTIME > STATSLASTTIME,

• in any case, every 3 months.
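The rules above can be read as one decision function evaluated per object. The sketch below is a Python illustration only: the `ObjectActivity` record and the rule names are hypothetical, the field names merely echo the real-time statistics counters mentioned on the slide, and three details are assumptions not stated in the source (growth is taken as inserts minus deletes, the 10% threshold is measured against the current row count, and "3 months" is approximated as 90 days).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ObjectActivity:
    """Hypothetical per-object snapshot of counters since the last RUNSTATS."""
    total_rows: int = 0
    card: float = 0.0                      # cardinality recorded in the catalog
    stats_inserts: int = 0
    stats_deletes: int = 0
    stats_updates: int = 0
    stats_mass_delete: int = 0
    stats_last_time: Optional[datetime] = None
    reorg_last_time: Optional[datetime] = None
    load_last_time: Optional[datetime] = None
    rebuild_last_time: Optional[datetime] = None


def runstats_reason(obj, now):
    """Return the name of the first rule that fires, or None if none does."""
    changes = obj.stats_inserts + obj.stats_deletes + obj.stats_updates
    if obj.stats_last_time is None:
        # No statistics at all: collect them only if the object holds data.
        return "NO_STATS" if obj.total_rows > 0 else None
    if obj.stats_inserts - obj.stats_deletes > 3_000_000:   # assumed growth measure
        return "GROWTH"
    if obj.total_rows > 0 and changes / obj.total_rows >= 0.10:
        return "CHANGED_10_PCT"
    if changes + obj.stats_mass_delete > 100 and obj.card == 0:
        return "ACTIVITY_WITH_ZERO_CARD"
    for done in (obj.reorg_last_time, obj.load_last_time, obj.rebuild_last_time):
        if done is not None and done > obj.stats_last_time:
            return "MAINTENANCE"
    if now - obj.stats_last_time > timedelta(days=90):      # assumed "3 months"
        return "PERIODIC"
    return None


now = datetime(2015, 3, 18)
reorganised = ObjectActivity(total_rows=1000, card=1000,
                             stats_last_time=datetime(2015, 3, 1),
                             reorg_last_time=datetime(2015, 3, 10))
print(runstats_reason(reorganised, now))
```

Returning the rule name rather than a plain yes/no makes it possible to store the reason next to the candidate object, which is one way a decision report could be built.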

Control Table – Maintenance Processes

CREATE TABLE PG.MAINTENANCE_CONTROL_TABLE (
    ST_OBJECT
    ST_DATABASE
    ST_PARTITION
    ST_OBJECT_TYPE
    ST_PRIORITY
    ST_PLANNING_DATE
    ST_UPDATEPRIO_DATE
    ST_JOBID
    ST_SAMPLE
    ST_SQLID
    ST_RULE_NAME
    ST_NACTIVE
    ST_ONDEMAND
    AD_IF_COPY_FULL
    AD_IF_COPY_INC

[Diagram labels: RUNSTATS part, COPY part, Select candidate object for control table]


Control Tables - Content

Example of RUNSTATS Implementation – IBM TWS

Challenges in maintaining database statistics

Challenge #1 – Mass Update and Copy Statistics in a Production Environment

• Complexity of the process:
  – many correlations;

• Clone test statistics to the production environment;

• Recommendations coming from tests are distributed as a project product in order to:
  – indicate which release of the application they concern,
  – minimise the cost of RUNSTATS,
  – be installed before the first BIND;

• Every year new databases partitioned by year have to be prepared and statistics need to be copied:
  – update LOW2KEY, HIGH2KEY and COLVALUE.

Challenge #2 – Home-made Procedure for Updating Statistics

Home-made procedure for updating statistics ad hoc

• Requirements for the procedure:
  – security of transactions – procedure saves previous values,
  – easy to use,

• Columns allowed to be changed:
  – SYSIBM.SYSCOLUMNS.COLCARDF
  – SYSIBM.SYSINDEXES.CLUSTERRATIOF
  – SYSIBM.SYSINDEXES.NLEVELS
  – SYSIBM.SYSINDEXES.NLEAF
  – SYSIBM.SYSINDEXES.FIRSTKEYCARDF
  – SYSIBM.SYSINDEXES.FULLKEYCARDF
  – SYSIBM.SYSINDEXES.DATAREPEATFACTORF
  – SYSIBM.SYSCOLDIST (INSERT, UPDATE, DELETE)
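The two requirements, a fixed list of changeable columns and saving the previous value before every change, can be sketched as follows. This is a Python illustration against an in-memory dictionary standing in for the catalog, not the procedure used in the project: the function names, the dictionary layout and the index name `PG.IX1` are hypothetical, and the row-level INSERT / UPDATE / DELETE on SYSIBM.SYSCOLDIST is left out for brevity.

```python
# Columns the slide lists as changeable, keyed by catalog table.
ALLOWED = {
    "SYSCOLUMNS": {"COLCARDF"},
    "SYSINDEXES": {"CLUSTERRATIOF", "NLEVELS", "NLEAF",
                   "FIRSTKEYCARDF", "FULLKEYCARDF", "DATAREPEATFACTORF"},
}


def set_statistic(catalog, history, table, column, key, new_value):
    """Change one statistic, remembering the previous value first."""
    if column not in ALLOWED.get(table, set()):
        raise ValueError(f"{table}.{column} may not be changed")
    row = catalog[table][key]
    history.append((table, column, key, row[column]))  # save the old value
    row[column] = new_value


def undo_last(catalog, history):
    """Restore the value saved by the most recent change."""
    table, column, key, old_value = history.pop()
    catalog[table][key][column] = old_value


# Stand-in for the catalog: table -> object name -> column values.
catalog = {"SYSINDEXES": {"PG.IX1": {"NLEVELS": 3, "NLEAF": 1000}}}
history = []

set_statistic(catalog, history, "SYSINDEXES", "NLEVELS", "PG.IX1", 4)
undo_last(catalog, history)
print(catalog["SYSINDEXES"]["PG.IX1"]["NLEVELS"])
```

Keeping the saved values in a separate history makes an ad hoc change reversible, which is what makes manual catalog updates acceptable in a production environment.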

Challenge #3 – Statistics Conflicts

• Statistics on tablespaces older than on indexes
  – Try to run statistics on tablespaces and indexes at the same time.
  – Take care of your index statistics, especially the distribution of the first column of the index.

• DB2 v11 helps you discover which statistics conflict with each other.

See the presentation "Runstats Challenges for Optimal Query Performance" by Jase Alpers (IBM), Terry Purcell (IBM).

Challenge #4 – Excessive MIPS Consumption

Use inline statistics as much as possible to lower the maintenance cost. They:

• can be used in the LOAD, REORG TABLESPACE, REORG INDEX, and REBUILD INDEX utilities, for example:

  REORG TABLESPACE LIST REORG_TBSP DRAIN_WAIT 30 RETRY 4 RETRY_DELAY 10
      STATISTICS TABLE (ALL) SAMPLE 60 INDEX (ALL KEYCARD FREQVAL NUMCOLS 2 COUNT 15)

• cost less than using the RUNSTATS utility in a separate job,

• have some restrictions:
  – LOAD with inline statistics is only valid for the REPLACE or RESUME NO options,
  – be careful when collecting statistics during REORG SHRLEVEL CHANGE or REBUILD SHRLEVEL CHANGE.
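A job generator can apply the first restriction mechanically before adding a STATISTICS clause. The sketch below, in Python, encodes only what the slide states: which utilities accept inline statistics and the LOAD condition. The function and parameter names are hypothetical, and the SHRLEVEL CHANGE caveat is not modelled because the slide gives it as a warning rather than a rule.

```python
# Utilities the slide names as accepting inline statistics.
INLINE_CAPABLE = {"LOAD", "REORG TABLESPACE", "REORG INDEX", "REBUILD INDEX"}


def can_use_inline_statistics(utility, replace=False, resume="NO"):
    """Tell whether a STATISTICS clause may be added to the utility statement.

    `replace` and `resume` describe the LOAD options and are ignored for
    the other utilities.
    """
    if utility not in INLINE_CAPABLE:
        return False
    if utility == "LOAD":
        # Inline statistics with LOAD need REPLACE or RESUME NO.
        return replace or resume == "NO"
    return True


print(can_use_inline_statistics("LOAD", resume="YES"))
print(can_use_inline_statistics("REORG TABLESPACE"))
```

When the function returns False, the object would instead be left to a separate RUNSTATS job.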

Challenge #5 – Running RUNSTATS at the Right Time

• Be aware of your data (frequency statistics) on indexes with a "status" column.

• Running statistics at an inconvenient time influences access paths.

• Try to run statistics for indexes and tables at the same time (keep index statistics newer).

Challenge #6 – Building Access Paths by Using CGTT

• Update CGTT statistics in order to put this table first in the access path.

[Diagram labels: SELECT JOIN, SMALL TABLE, BIG TABLE, CGTT]

SELECT * FROM SMALL, BIG
WHERE SMALL.ID = BIG.ID

• Small = 1 000 rows

• Big = 2.5 billion rows

Update the cardinality of the table and of the columns used for joins.


Conclusions

To Sum Up…

• Inline statistics cost less.

• Automation and database statistics rules help you keep the statistics process under control.

• Copying statistics costs less than running RUNSTATS.

• Remember about correlations between statistics.

• DB2 v11 can find inconsistencies in statistics.

• Remember about TESTS!

Thank you!

Jacek Rafalak

[email protected]

Asseco Poland S.A.