Database Statistics - good practices not only
for experienced administrators
18.03.2015
Asseco at a Glance
• Founded in 1991
• The largest IT company in CEE
• 6th largest software producer in Europe
• Traded on the WSE, included in the WIG30 Blue Chip index
• 17 000 employees worldwide
• Selling proprietary software and services
• Strong financials with a great track record
– 2013 revenue of PLN 5.9bn (EUR 1.4bn)
– CAGR +17.9% (2009-2013)
– 2013 EBIT of PLN 611m (EUR 145m)
Our Offices Worldwide
– The Asseco Group
– Asseco Poland S.A.
Presentation Objective
Question:
How can the everyday database statistics process be maintained
in a complex environment?
What do I mean by the term
"complex environment"?
Complex Environment
• > 60 TB of production data (6 DB2 members in data sharing)
• DML 48K/s, GETPAGES 392K/s, I/O RW 13K/s, I/O R 12K/s
• > 475 GB data growth per month
• Weekly cost of maintenance for RUNSTATS: 240 MIPS
• 3 807 colgroup definitions
• 2 344 columns to verify
• > 500 active plans and over 12 000 active packages to BIND
• Largest objects: NPI index > 230 GB, tablespace 280 GB
• 400 000 objects to maintain
• Every year > 30K new objects
(Diagram areas: DB2 environment, Maintenance, Automation, Conclusions.)
Presentation Plan
1 Introduction to database statistics
2 Tools for database statistics
3 Automation of the process of database statistics
4 Challenges in the project
5 Conclusions
Introduction to Database Statistics
Database Statistics – General Overview
• Usage of the RUNSTATS utility:
Running the RUNSTATS utility keeps the statistics accurate and up to date, which enables DB2® to choose efficient access paths.
• The collected statistics on database objects are stored in the DB2 catalog.
• DB2 uses the collected statistics during the BIND process (and, for dynamic SQL, when a statement is prepared) to determine the most efficient access paths.
RUNSTATS Utility 1/2
Functionalities:
• RUNSTATS TABLESPACE - gathers statistics on a tablespace, on tables* and on columns,
• RUNSTATS INDEX - gathers statistics on indexes and on their leading columns**.
* RUNSTATS does not collect statistics for clone tables, created global temporary tables (CGTT) or index spaces.
** Leading columns are collected by using RUNSTATS INDEX.
RUNSTATS Utility 2/2
RUNSTATS collects the following three types of distribution statistics:
01 Cardinality: the number of distinct values in the column or set of columns.
02 Histograms: statistics for the specified group of columns, divided into quantiles (value ranges) that each contain roughly the same number of rows. For each quantile, the low and high values, the cardinality and the frequency are recorded.
03 Frequency: the percentage of rows in the table that contain a value for a column, or a combination of values for a set of columns.
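The following Python sketch makes the three distribution statistics concrete by computing them for an in-memory list of rows. It is a simplified illustration, not how RUNSTATS works internally. The column names STATUS and AMOUNT are hypothetical, and the dictionary keys only mirror SYSIBM.SYSCOLDIST column names for readability. Unlike DB2, this naive histogram may split one value across two quantiles.

```python
from collections import Counter


def cardinality(rows, cols):
    """Cardinality: number of distinct values (or value combinations) in `cols`."""
    return len({tuple(row[c] for c in cols) for row in rows})


def frequencies(rows, cols, count):
    """Frequency: the `count` most frequent values with the fraction of rows holding each."""
    total = len(rows)
    counter = Counter(tuple(row[c] for c in cols) for row in rows)
    return [(value, hits / total) for value, hits in counter.most_common(count)]


def histogram(rows, col, numquantiles):
    """Histogram: split the sorted values into quantiles holding roughly equal row counts."""
    values = sorted(row[col] for row in rows)
    total = len(values)
    quantiles = []
    for q in range(numquantiles):
        chunk = values[q * total // numquantiles:(q + 1) * total // numquantiles]
        if not chunk:  # more quantiles than rows: skip empty ones
            continue
        quantiles.append({
            "QUANTILENO": len(quantiles) + 1,
            "LOWVALUE": chunk[0],
            "HIGHVALUE": chunk[-1],
            "CARDF": len(set(chunk)),
            "FREQUENCYF": len(chunk) / total,
        })
    return quantiles
```

For ten rows with STATUS values A (6 rows), B (3 rows) and C (1 row), the cardinality is 3 and the two most frequent values are A (60%) and B (30%).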
Selection of Statistics Stored in the DB2 Catalog and Used for Access Path Selection
• SYSIBM.SYSCOLDIST - CARDF, COLGROUPCOLNO, COLVALUE,
FREQUENCYF, HIGHVALUE, LOWVALUE, NUMCOLUMNS, TYPE,
QUANTILENO
• SYSIBM.SYSCOLSTATS - COLCARD, HIGHKEY, LOWKEY, PARTITION
• SYSIBM.SYSCOLUMNS - COLCARDF, HIGH2KEY, LOW2KEY
• SYSIBM.SYSINDEXES - CLUSTERING, CLUSTERRATIOF, FIRSTKEYCARDF,
FULLKEYCARDF, NLEAF, NLEVELS, DATAREPEATFACTORF
• SYSIBM.SYSINDEXPART – LIMITKEY
• SYSIBM.SYSTABLES - CARDF, EDPROC, NPAGES, NPAGESF,
PCTROWCOMP
• SYSIBM.SYSTABLESPACE - NACTIVEF
• SYSIBM.SYSTABSTATS - CARDF, NPAGES
Statistics Rules – General Recommendations 1/2
Running RUNSTATS is recommended:
• after loading a table and before binding application
plans and packages that access the table,
• after creating an index,
• after reorganizing a table space or an index,
• after running utilities such as RECOVER or REBUILD,
Statistics Rules – General Recommendations 2/2
Running RUNSTATS is recommended:
• after heavy insert, update, and delete activity,
• against the DB2 catalog to provide DB2 with more
accurate information for access path selection of user
queries to the catalog,
• before REORG or REBUILD in order to determine
which objects need reorganisation.
Other Factors Influencing the Access Path
Other factors that influence access paths include:
• optimization hints (DB2 HINTS),
• CPU resources and buffer pool definitions.
To examine or improve an access path, consider:
• BIND EXPLAIN YES,
• virtual indexes,
• "what if" analysis.
For more information, check the EXPLAIN tables.
Invalidate Dynamic Statement Cache
• After database statistics have changed, the RUNSTATS utility can be run with the REPORT NO and UPDATE NONE options on the tablespace or index that the query depends on.
This invalidates the cached dynamic statements that depend on that object, so they are prepared again with the current statistics the next time they run.
RUNSTATS TABLESPACE BPG01.SPGOBJRT
TABLE(ALL)
SHRLEVEL CHANGE
REPORT NO
UPDATE NONE
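The effect of invalidation can be pictured with a toy model of a statement cache. This is a hypothetical Python sketch, not DB2 internals: a cached statement keeps its access path until an object it depends on is invalidated, and after that the next execution prepares it again. The access path names are made up for illustration. Only the object name BPG01.SPGOBJRT comes from the statement above.

```python
class StatementCache:
    """Toy dynamic statement cache: SQL text -> (access path, objects it depends on)."""

    def __init__(self):
        self.entries = {}
        self.prepares = 0  # number of full prepares (access path selections)

    def execute(self, sql, objects, choose_path):
        """Reuse the cached access path, or prepare the statement if it is not cached."""
        if sql not in self.entries:
            self.prepares += 1
            self.entries[sql] = (choose_path(), frozenset(objects))
        return self.entries[sql][0]

    def invalidate(self, obj):
        """Drop every cached statement that depends on `obj` (the effect of UPDATE NONE REPORT NO)."""
        self.entries = {sql: entry for sql, entry in self.entries.items() if obj not in entry[1]}
```

Until invalidate("BPG01.SPGOBJRT") is called, a statement keeps its old access path even if the statistics would now lead to a different one.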
Database statistics - tools
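DSNACCOX – Procedure 1/2
• DSNACCOX, a DB2-supplied stored procedure, helps you determine on which objects the RUNSTATS utility should be run.
• Its recommendations are based on the amount of change recorded in real-time statistics rather than on the type of change, so it does not tell you when distribution statistics need to be refreshed.
DSNACCOX – Procedure 2/2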
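IBM Data Studio – Statistics Advisor
Types of RUNSTATS recommendations:
• COMPLETE RUNSTATS - collects the complete set of statistics recommended for the query or workload,
• REPAIR RUNSTATS - collects only the statistics needed to repair the immediate statistics problems.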
Automation of the process
Maintenance Process
(Diagram: IBM TWS with the ADT, BKP, REO and RTS maintenance processes.)
Integration of the DB2 Utilities with IBM Tivoli Workload Scheduler – Automation
Characteristics of the IBM TWS implementation:
• a universal pattern for every job,
• an application limit of 255 programs per application,
• automatic restart in case of an error,
• one "steering wheel" for all maintenance tasks (dependencies between utilities).
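The following sketch shows one way to stay within the 255-programs-per-application limit. It assumes, hypothetically, that each object gets a REORG followed by a RUNSTATS and that such a chain must not be split across applications. It uses Python's standard graphlib module to list each application's jobs in dependency order and does not call any IBM TWS interface.

```python
from graphlib import TopologicalSorter

APP_LIMIT = 255  # programs per application, as stated on the slide


def job_chain(obj):
    """Hypothetical per-object chain: RUNSTATS depends on (runs after) REORG."""
    return {f"REORG {obj}": set(), f"RUNSTATS {obj}": {f"REORG {obj}"}}


def pack_applications(objects, limit=APP_LIMIT):
    """Pack whole per-object chains into applications of at most `limit` jobs.

    Each application is returned as a list of job names in dependency order.
    """
    apps, current = [], {}
    for obj in objects:
        chain = job_chain(obj)
        if current and len(current) + len(chain) > limit:
            apps.append(list(TopologicalSorter(current).static_order()))
            current = {}
        current.update(chain)
    if current:
        apps.append(list(TopologicalSorter(current).static_order()))
    return apps
```

With two jobs per object, 300 objects fit into three applications (127 + 127 + 46 objects).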
RUNSTATS Process - How We Use It
• Make decision reports and prioritise objects.
• Select candidate objects from a control table.
• Run RUNSTATS.
• Verify the DB2 catalog statistics and update / insert them manually (only where applicable).
Statistics Rules – Reality of the Project
In this project, RUNSTATS is run when:
• objects have no statistics but are not empty,
• the object has grown by more than 3 million rows since the last RUNSTATS, or 10% of its data has changed,
• table activity reached:
WHERE (STATSINSERTS + STATSDELETES + STATSUPDATES +
STATSMASSDELETE) > 100 and CARD = 0,
• maintenance occurred after the last RUNSTATS:
– REORGLASTTIME > STATSLASTTIME,
– LOADRLASTTIME > STATSLASTTIME,
– REBUILDLASTTIME > STATSLASTTIME,
• every 3 months.
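The project rules above can be expressed as a small decision function. This Python sketch relies on several interpretations that are not the project's actual code:
• the RtsSnapshot record is hypothetical, although its field names are borrowed from the real-time statistics columns on the slide,
• "growth" is read as STATSINSERTS - STATSDELETES,
• the 10% threshold is measured against TOTALROWS,
• "every 3 months" is taken as 90 days.
In DB2, REBUILDLASTTIME comes from the index space statistics, so the snapshot is assumed to merge table space and index values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class RtsSnapshot:
    """Hypothetical per-object snapshot of real-time statistics and catalog values."""
    totalrows: int
    statsinserts: int = 0
    statsdeletes: int = 0
    statsupdates: int = 0
    statsmassdelete: int = 0
    card: int = 0  # catalog cardinality
    statslasttime: Optional[datetime] = None
    reorglasttime: Optional[datetime] = None
    loadrlasttime: Optional[datetime] = None
    rebuildlasttime: Optional[datetime] = None


def _newer(t: Optional[datetime], ref: Optional[datetime]) -> bool:
    """True when both timestamps exist and t is later than ref."""
    return t is not None and ref is not None and t > ref


def runstats_reasons(s: RtsSnapshot, now: datetime) -> List[str]:
    """Return the names of the rules that fire for this object (empty list = skip it)."""
    reasons = []
    if s.statslasttime is None and s.totalrows > 0:
        reasons.append("NO_STATS")
    growth = s.statsinserts - s.statsdeletes  # assumed: net rows added since last RUNSTATS
    changes = s.statsinserts + s.statsdeletes + s.statsupdates
    if growth > 3_000_000 or (s.totalrows > 0 and changes * 10 >= s.totalrows):
        reasons.append("GROWTH_OR_CHANGES")
    if changes + s.statsmassdelete > 100 and s.card == 0:
        reasons.append("ACTIVITY_WITH_CARD_0")
    if any(_newer(t, s.statslasttime)
           for t in (s.reorglasttime, s.loadrlasttime, s.rebuildlasttime)):
        reasons.append("MAINTENANCE_AFTER_STATS")
    if s.statslasttime is not None and now - s.statslasttime >= timedelta(days=90):
        reasons.append("PERIODIC_3_MONTHS")
    return reasons
```

Returning the names of the rules that fired, rather than a plain yes/no, can also feed the decision reports used to prioritise objects.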
Control Table – Maintenance Processes
CREATE TABLE
PG.MAINTENANCE_CONTROL_TABLE (
ST_OBJECT
ST_DATABASE
ST_PARTITION
ST_OBJECT_TYPE
ST_PRIORITY
ST_PLANNING_DATE
ST_UPDATEPRIO_DATE
ST_JOBID
ST_SAMPLE
ST_SQLID
ST_RULE_NAME
ST_NACTIVE
ST_ONDEMAND
AD_IF_COPY_FULL
AD_IF_COPY_INC
(Diagram labels: COPY part, Select candidate object for control table, RUNSTATS part.)
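Selecting candidate objects from a control table could look like the following sketch. It uses Python's built-in sqlite3 module as a stand-in for DB2 and only a subset of the columns listed above. The data types, the meaning of ST_PRIORITY (lower = more urgent) and ST_ONDEMAND ('Y'/'N'), and the ordering rule are all assumptions.

```python
import sqlite3

# Simplified stand-in for PG.MAINTENANCE_CONTROL_TABLE (types are assumptions).
DDL = """
CREATE TABLE MAINTENANCE_CONTROL_TABLE (
    ST_DATABASE      TEXT NOT NULL,
    ST_OBJECT        TEXT NOT NULL,
    ST_PARTITION     INTEGER NOT NULL,
    ST_PRIORITY      INTEGER NOT NULL,
    ST_PLANNING_DATE TEXT NOT NULL,
    ST_ONDEMAND      TEXT NOT NULL DEFAULT 'N'
)
"""

# On-demand requests first, then by priority, then by planning date.
CANDIDATES_SQL = """
SELECT ST_DATABASE, ST_OBJECT, ST_PARTITION
FROM MAINTENANCE_CONTROL_TABLE
WHERE ST_PLANNING_DATE <= ?
ORDER BY ST_ONDEMAND DESC, ST_PRIORITY, ST_PLANNING_DATE
LIMIT ?
"""


def select_candidates(conn, run_date, limit):
    """Return up to `limit` objects planned on or before `run_date` (ISO date string)."""
    return conn.execute(CANDIDATES_SQL, (run_date, limit)).fetchall()


def demo_connection():
    """In-memory control table with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    conn.executemany(
        "INSERT INTO MAINTENANCE_CONTROL_TABLE VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("DB1", "TS1", 0, 2, "2015-03-10", "N"),
            ("DB1", "TS2", 0, 1, "2015-03-15", "N"),
            ("DB2", "TS3", 0, 5, "2015-03-01", "Y"),
            ("DB2", "TS4", 0, 1, "2015-04-01", "N"),  # planned for later: not selected yet
        ],
    )
    return conn
```

On 2015-03-18 the on-demand object TS3 is returned first, followed by TS2 and TS1 in priority order.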
Control Tables - Content
Example of RUNSTATS Implementation – IBM TWS
Challenges in maintaining database statistics
Challenge #1 – Mass Update and Copy Statistics in a Production Environment
• Complexity of the process:
– many correlations between statistics;
• Clone test statistics to the production environment;
• Statistics recommendations coming from tests are delivered as a project product in order to:
– indicate which application release they concern,
– minimise the cost of RUNSTATS,
– be installed before the first BIND;
• Every year, new databases partitioned by year have to be prepared and statistics need to be copied to them:
– update LOW2KEY, HIGH2KEY and COLVALUE.
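Copying statistics to the next year's objects means shifting the year-dependent values. The following Python sketch assumes a hypothetical in-memory layout in which LOW2KEY, HIGH2KEY and the COLVALUE entries are Python dates. In the real catalog these values are stored in DB2's internal format, so an actual procedure would need to decode and re-encode them.

```python
from datetime import date


def shift_year(value: date, years: int = 1) -> date:
    """Move a date by whole years; 29 February becomes 28 February in a non-leap year."""
    try:
        return value.replace(year=value.year + years)
    except ValueError:
        return value.replace(year=value.year + years, day=28)


def copy_stats_for_new_year(stats: dict) -> dict:
    """Return a copy of one year's statistics with the year-dependent values shifted.

    Expected keys (hypothetical layout): LOW2KEY and HIGH2KEY (dates) and
    COLVALUES, a list of (COLVALUE, FREQUENCYF) pairs taken from SYSCOLDIST.
    Other entries (for example COLCARDF) are copied unchanged.
    """
    new = dict(stats)  # shallow copy; the original statistics are not modified
    new["LOW2KEY"] = shift_year(stats["LOW2KEY"])
    new["HIGH2KEY"] = shift_year(stats["HIGH2KEY"])
    new["COLVALUES"] = [(shift_year(value), freq) for value, freq in stats["COLVALUES"]]
    return new
```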
Challenge #2 – Home-made Procedure for Updating Statistics
Home-made procedure for ad hoc statistics updates
• Requirements for the procedure:
– transaction safety: the procedure saves the previous values,
– ease of use;
• Columns allowed to be changed:
– SYSIBM.SYSCOLUMNS.COLCARDF
– SYSIBM.SYSINDEXES.CLUSTERRATIOF
– SYSIBM.SYSINDEXES.NLEVELS
– SYSIBM.SYSINDEXES.NLEAF
– SYSIBM.SYSINDEXES.FIRSTKEYCARDF
– SYSIBM.SYSINDEXES.FULLKEYCARDF
– SYSIBM.SYSINDEXES.DATAREPEATFACTORF
– SYSIBM.SYSCOLDIST (INSERT, UPDATE, DELETE)
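The requirements (saving previous values, applying an update completely or not at all) and the column allow-list can be sketched as a toy Python class. It works on an in-memory dictionary instead of the real catalog. StatsUpdater, its methods and the catalog layout are hypothetical, and SYSIBM.SYSCOLDIST inserts and deletes are not modelled.

```python
# Columns the procedure may change (SYSIBM.SYSCOLDIST rows are not modelled here).
ALLOWED = {
    ("SYSCOLUMNS", "COLCARDF"),
    ("SYSINDEXES", "CLUSTERRATIOF"),
    ("SYSINDEXES", "NLEVELS"),
    ("SYSINDEXES", "NLEAF"),
    ("SYSINDEXES", "FIRSTKEYCARDF"),
    ("SYSINDEXES", "FULLKEYCARDF"),
    ("SYSINDEXES", "DATAREPEATFACTORF"),
}


class StatsUpdater:
    """Toy model of a guarded catalog-statistics update procedure."""

    def __init__(self, catalog):
        # catalog: {(catalog_table, object_name): {column: value}}
        self.catalog = catalog
        self.history = []  # (catalog_table, object_name, column, previous value)

    def update(self, table, obj, changes):
        """Apply all changes or none; the previous values are saved before the update."""
        rejected = sorted(c for c in changes if (table, c) not in ALLOWED)
        if rejected:
            raise ValueError(f"{table}: columns not allowed: {rejected}")
        row = self.catalog[(table, obj)]  # a KeyError here leaves everything unchanged
        saved = [(table, obj, col, row[col]) for col in changes]
        self.history.extend(saved)
        row.update(changes)

    def restore_all(self):
        """Undo every change, newest first."""
        while self.history:
            table, obj, col, old = self.history.pop()
            self.catalog[(table, obj)][col] = old
```

Checking every requested column before touching the row keeps a rejected request from leaving a half-applied update behind.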
Challenge #3 – Statistics Conflicts
• Tablespace statistics older than index statistics:
– Try to run statistics on tablespaces and indexes at the same time.
– Take care of your index statistics, especially the distribution of the first column of the index.
• DB2 v11 helps you discover which statistics conflict with each other.
See the presentation "Runstats Challenges for Optimal Query Performance" by Jase Alpers (IBM) and Terry Purcell (IBM).
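Some conflicts can be caught with simple consistency checks. The following Python sketch is not DB2 v11's mechanism. It is a hypothetical checker over dictionaries of catalog-style values, and the one-day tolerance between table and index STATSTIME is an arbitrary assumption. The COLCARDF/FIRSTKEYCARDF comparison is a heuristic: a mismatch usually means the two were collected at different times or with sampling.

```python
from datetime import datetime, timedelta


def find_conflicts(table, index, leading_col, max_age_gap=timedelta(days=1)):
    """Flag common inconsistencies between table, index and column statistics.

    table:       {"CARDF": float, "STATSTIME": datetime}
    index:       {"FIRSTKEYCARDF": float, "FULLKEYCARDF": float, "STATSTIME": datetime}
    leading_col: {"COLCARDF": float} for the first column of the index
    """
    issues = []
    if index["FIRSTKEYCARDF"] > index["FULLKEYCARDF"]:
        issues.append("FIRSTKEYCARDF > FULLKEYCARDF")
    if index["FULLKEYCARDF"] > table["CARDF"]:
        issues.append("FULLKEYCARDF > table CARDF")
    if leading_col["COLCARDF"] != index["FIRSTKEYCARDF"]:
        issues.append("leading column COLCARDF != FIRSTKEYCARDF")
    if abs(table["STATSTIME"] - index["STATSTIME"]) > max_age_gap:
        issues.append("table and index statistics collected at different times")
    return issues
```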
Challenge #4 – Excessive MIPS Consumption
Use inline statistics as much as possible to lower the maintenance cost. Inline statistics:
• can be used in the LOAD, REORG TABLESPACE, REORG INDEX and REBUILD INDEX utilities, for example:
REORG TABLESPACE LIST REORG_TBSP DRAIN_WAIT 30 RETRY 4 RETRY_DELAY 10
STATISTICS TABLE (ALL) SAMPLE 60 INDEX (ALL KEYCARD FREQVAL NUMCOLS 2 COUNT 15)
• cost less than running the RUNSTATS utility in a separate job, because the utility is already reading the data,
• have some restrictions:
– LOAD with inline statistics is only valid with the REPLACE or RESUME NO options,
– be careful when collecting statistics during REORG SHRLEVEL CHANGE or REBUILD SHRLEVEL CHANGE.
Challenge #5 – Running RUNSTATS at the Right Time
• Be aware of your data distribution (frequency statistics) on indexes with a "status" column.
• Running statistics at an inconvenient time affects access paths, because the statistics reflect the data distribution only at the moment they are collected.
• Try to run statistics for indexes and tables at the same time (keep index statistics newer).
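Frequency statistics are a snapshot, which is why timing matters for a status column. In this Python sketch, the same predicate STATUS = 'NEW' is estimated from statistics collected at two different moments. The data is hypothetical and the estimate is simplified; it is not DB2's filter-factor formula.

```python
from collections import Counter


def frequency_stats(statuses):
    """Fraction of rows holding each STATUS value at the moment statistics are collected."""
    total = len(statuses)
    return {value: hits / total for value, hits in Counter(statuses).items()}


def estimated_rows(freq, value, table_rows):
    """Simplified estimate for STATUS = value: collected frequency times table size."""
    return round(freq.get(value, 0.0) * table_rows)


night = ["DONE"] * 95 + ["NEW"] * 5          # hypothetical: collected after batch processing
during_batch = ["DONE"] * 40 + ["NEW"] * 60  # hypothetical: collected in the middle of a batch
```

For a 1 000 000-row table, statistics collected mid-batch estimate 600 000 NEW rows instead of 50 000. This can push the optimizer toward a scan for a query that normally finds only a few rows.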
Challenge #6 – Building Access Paths by Using CGTT
• Update CGTT statistics so that DB2 places this table first in the access path (RUNSTATS does not collect statistics for CGTTs, so they have to be set manually).
(Diagram: SELECT with a JOIN of SMALL TABLE and BIG TABLE, with a CGTT.)
SELECT * FROM SMALL, BIG
WHERE SMALL.ID = BIG.ID
• SMALL = 1 000 rows
• BIG = 2.5 billion rows
Update the table cardinality and the cardinality of the columns used for joins.
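A toy nested-loop cost model shows why a low CARDF on the CGTT moves it to the front of the join. This Python sketch is not DB2's optimizer formula. It simply charges each outer row one probe of the inner table's index (NLEVELS index pages plus one data page), and the NLEVELS values are hypothetical.

```python
def nlj_cost(outer_card, inner_nlevels):
    """Toy nested-loop join cost: one index probe (NLEVELS pages + 1 data page) per outer row."""
    return outer_card * (inner_nlevels + 1)


def best_join_order(tables):
    """tables: {name: (CARDF, NLEVELS of its join index)} for exactly two tables.

    Returns (outer, inner, cost) for the cheaper of the two join orders.
    """
    (a, (card_a, levels_a)), (b, (card_b, levels_b)) = tables.items()
    options = [
        (a, b, nlj_cost(card_a, levels_b)),
        (b, a, nlj_cost(card_b, levels_a)),
    ]
    return min(options, key=lambda option: option[2])
```

With CARDF = 1 000 the CGTT becomes the outer (first) table. If its statistics claimed 10 billion rows instead, the big table would be chosen first.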
Conclusions
To Sum Up…
• Inline statistics cost less.
• Automation and database statistics rules help you keep the statistics process under control.
• Copying statistics costs less than running RUNSTATS.
• Remember the correlations between statistics.
• DB2 v11 can find inconsistencies in statistics.
• Remember to TEST!