To REORG or not to REORG That is the Question
Kevin Baker, BMC Software
2
Objectives
› Identify I/O performance trends for DB2 pagesets
› Correlate reorganization benefits to I/O performance trends
› Understand the methods for collecting I/O performance data
› Identify object and application metrics that help identify objects in need of reorganization
› Establish a process for identifying application pagesets for analysis
3
Why Pageset Organization matters - the basics
› It's all about your data and the I/O required to access it.
› DB2 computes the best access path for an SQL statement to minimize I/O wait time.
› The access path chosen is primarily influenced by catalog statistics about the referenced tables and indexes.
› DB2’s primary mechanism for avoiding I/O wait time is asynchronous I/O via PREFETCH.
› PREFETCH is most effective in situations involving sequential processing, even if in small bursts, and is significantly affected by the physical ordering of the pages on disk.
4
About Access Paths - Prefetch
› Given viable indexes, the DB2 Optimizer will try to use PREFETCH to read in the data pages asynchronously.
› If it can determine that sequential processing is reasonable when the access path is determined, it will call for SEQUENTIAL PREFETCH.
› If it identifies the need for a lot of specific records that are not sequentially located, it may call for RID-based LIST PREFETCH.
› Even if the access path calls for random access (which involves synchronous I/O), runtime monitoring may invoke DYNAMIC PREFETCH if the pages requested appear to be even loosely sequential.
5
About Access Paths - BIND
› For STATIC SQL, the access path is determined at the time the program containing the SQL goes through the BIND process.
› This means that the table and index statistics in the catalog at the time of the BIND determine the access path, and it remains fixed until the next time a BIND is done.
› For DYNAMIC SQL, the access path is set by the PREPARE process each time the statement is executed, unless a previously prepared copy is reused from the dynamic statement cache.
› This means that the access path can change from run to run if the relevant catalog statistics are updated.
6
About Access Paths - Statistics
› There are several dozen statistics, kept in various catalog tables, that are used by the DB2 optimizer to select access paths.
› Many of these influence the choice of indexes, order of joins, etc.
› For the purpose of this discussion, we are interested in just a few that indicate organization level of the table or index.
› We will cover some of the statistics traditionally used to recommend REORGS later in the presentation.
7
About Access Paths - Statistics
› Tables (TABLESPACES/PARTITIONS)
– CARD (Cardinality) – the number of rows in the tablespace or partition
– FARINDREF – the number of rows relocated far from their original page
› Indexes (INDEXSPACES)
– CLUSTERRATIO – the percentage of rows in clustering order
– LEAFFAR – the number of leaf pages physically located far from the previous leaf page accessed in an index scan
8
Impact on SQL Performance
› To explore the impact on SQL performance we set up some special tables, indexes, and SQL workloads.
› To minimize variables:
– We used test DB2 subsystems with stable configurations throughout the testing.
– We isolated measured pagesets to their own buffer pools that were large, and cleared prior to each run.
– We avoided workloads that would cause disorganization to an extent that would drastically affect the access path.
› Factors explored were inserts, updates that relocate rows, and free space.
9
Case 1: Dynamic Sequential after Updates
› DB2 version 8
› 100k row table, no freespace, clustering index
› DYNAMIC SELECT workload; returns 10k rows
– access path using index and sequential prefetch
› UPDATE workload that updates 5k random rows in such a way that the rows have to be relocated.
› RUNSTATS for the table and index done after each update workload and key statistics captured.
› Access performance statistics for the SELECT workloads were gathered for both the table and index.
10
Case 1: Dynamic Sequential after Updates
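TS = Thread Statistics (tablespace); CARD through LEAFFAR = Runstats Statistics; IX = Thread Statistics (indexspace).
Run | TS GP/SIO | TS GP | TS SIO | TS Pref req | TS Pref pgs | CARD | FARINDREF | CLUSTERRATIO | LEAFFAR | IX GP/SIO | IX GP | IX SIO | IX Pref req | IX Pref pgs
Initial Run | 350 | 699 | 2 | 23 | 733 | 100000 | 0 | 100 | 0 | 25 | 125 | 5 | 5 | 155
Run1 | 10 | 879 | 84 | 28 | 812 | 100000 | 1696 | 100 | 2286 | 2 | 242 | 112 | 12 | 338
Run2 | 7 | 1037 | 144 | 32 | 867 | 100000 | 3231 | 100 | 2372 | 2 | 242 | 112 | 12 | 370
Run3 | 5 | 1176 | 222 | 38 | 924 | 100000 | 4688 | 100 | 2380 | 2 | 244 | 114 | 12 | 368
Run4 | 5 | 1320 | 283 | 41 | 955 | 100000 | 6142 | 100 | 2380 | 2 | 244 | 114 | 12 | 368
Run5 | 4 | 1456 | 332 | 44 | 1030 | 100000 | 7524 | 100 | 2380 | 2 | 244 | 114 | 12 | 368
Run6 | 4 | 1604 | 407 | 46 | 1063 | 100000 | 8887 | 100 | 2380 | 2 | 244 | 114 | 12 | 368
Run7 | 4 | 1757 | 469 | 49 | 1106 | 100000 | 10265 | 100 | 2380 | 2 | 244 | 114 | 12 | 368
Run8 | 4 | 1920 | 533 | 54 | 1196 | 100000 | 11629 | 100 | 2380 | 2 | 244 | 114 | 12 | 368
Run9 | 3 | 2058 | 602 | 60 | 1203 | 100000 | 12989 | 100 | 2380 | 2 | 244 | 114 | 12 | 368
Run10 | 3 | 2202 | 661 | 65 | 1266 | 100000 | 14335 | 100 | 2380 | 2 | 244 | 114 | 12 | 368
REORG
Final Run | 414 | 828 | 2 | 27 | 861 | 100000 | 0 | 100 | 0 | 28 | 142 | 5 | 6 | 187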
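The GP/SIO column is the getpage count divided by the sync I/O count, so its decline can be recomputed from the GP and SIO columns. A minimal Python sketch, using a few of the Case 1 tablespace values from the table above; the function and variable names are illustrative only and not part of the presentation:

```python
def getpages_per_sync_io(getpages: int, sync_ios: int) -> float:
    """Return getpages per synchronous I/O (the GP/SIO column)."""
    if sync_ios == 0:
        return float("inf")  # no synchronous reads were needed
    return getpages / sync_ios


# (run label, getpages, sync I/Os) for the tablespace in Case 1.
case1_tablespace = [
    ("Initial Run", 699, 2),
    ("Run1", 879, 84),
    ("Run5", 1456, 332),
    ("Run10", 2202, 661),
    ("Final Run", 828, 2),
]

ratios = {label: getpages_per_sync_io(gp, sio) for label, gp, sio in case1_tablespace}

for label, ratio in ratios.items():
    print(f"{label:12s} GP/SIO = {ratio:.1f}")
```

The ratio collapses after the first update run and recovers only after the REORG, which is the pattern the later trigger slides look for.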
11
Case 2: Dynamic Random after Updates
› DB2 version 8
› 100k row table, no freespace
› DYNAMIC SELECT workload; returns 10k rows
– access path completely random
› UPDATE workload that updates 5k random rows in such a way that the rows have to be relocated.
› RUNSTATS for the table and index done after each update workload.
12
Case 2: Dynamic Random after Updates
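TS = Thread Statistics (tablespace); CARD through LEAFFAR = Runstats Statistics; IX = Thread Statistics (indexspace).
Run | TS GP/SIO | TS GP | TS SIO | TS Pref req | TS Pref pgs | CARD | FARINDREF | CLUSTERRATIO | LEAFFAR | IX GP/SIO | IX GP | IX SIO | IX Pref req | IX Pref pgs
Initial Run | 1 | 1003 | 966 | 0 | 0 | 100000 | 0 | 100 | 0 | 4 | 3027 | 714 | 0 | 0
Run1 | 1 | 1030 | 990 | 0 | 0 | 100000 | 1696 | 100 | 2286 | 3 | 3048 | 886 | 0 | 0
Run2 | 1 | 1081 | 1037 | 0 | 0 | 100000 | 3231 | 100 | 2372 | 3 | 3045 | 878 | 0 | 0
Run3 | 1 | 1109 | 1068 | 0 | 0 | 100000 | 4688 | 100 | 2380 | 3 | 3055 | 912 | 0 | 0
Run4 | 1 | 1130 | 1078 | 0 | 0 | 100000 | 6142 | 100 | 2380 | 3 | 3052 | 894 | 0 | 0
Run5 | 1 | 1157 | 1105 | 0 | 0 | 100000 | 7524 | 100 | 2380 | 3 | 3060 | 908 | 0 | 0
Run6 | 1 | 1178 | 1128 | 0 | 0 | 100000 | 8887 | 100 | 2380 | 3 | 3054 | 911 | 0 | 0
Run7 | 1 | 1231 | 1182 | 0 | 0 | 100000 | 10265 | 100 | 2380 | 3 | 3052 | 887 | 0 | 0
Run8 | 1 | 1250 | 1193 | 0 | 0 | 100000 | 11629 | 100 | 2380 | 3 | 3052 | 888 | 0 | 0
Run9 | 1 | 1279 | 1227 | 0 | 0 | 100000 | 12989 | 100 | 2380 | 3 | 3051 | 900 | 0 | 0
Run10 | 1 | 1296 | 1229 | 0 | 0 | 100000 | 14335 | 100 | 2380 | 3 | 3046 | 886 | 0 | 0
REORG
Final Run | 1 | 1003 | 981 | 0 | 0 | 100000 | 0 | 100 | 0 | 4 | 3023 | 747 | 0 | 0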
13
Case 3: Dynamic Sequential after Inserts
› DB2 version 8
› 100k row table, no freespace, clustering index
› DYNAMIC SELECT workload; returns 10k rows
– access path using index and sequential prefetch
› Insert workload that inserts 100 random rows each run.
› RUNSTATS for the table and index done after each insert workload.
14
Case 3: Dynamic Sequential after Inserts
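TS = Thread Statistics (tablespace); CARD through LEAFFAR = Runstats Statistics; IX = Thread Statistics (indexspace).
Run | TS GP/SIO | TS GP | TS SIO | TS Pref req | TS Pref pgs | CARD | FARINDREF | CLUSTERRATIO | LEAFFAR | IX GP/SIO | IX GP | IX SIO | IX Pref req | IX Pref pgs
Initial Run | 349 | 698 | 2 | 23 | 733 | 100000 | 0 | 100 | 0 | 25 | 125 | 5 | 5 | 155
Run1 | 283 | 848 | 3 | 30 | 733 | 100100 | 0 | 99 | 142 | 3 | 195 | 75 | 7 | 196
Run2 | 114 | 1024 | 9 | 35 | 731 | 100200 | 0 | 99 | 196 | 2 | 219 | 100 | 8 | 202
Run3 | 86 | 1201 | 14 | 34 | 731 | 100300 | 0 | 99 | 234 | 2 | 235 | 112 | 8 | 189
Run4 | 68 | 1363 | 20 | 53 | 731 | 100400 | 0 | 99 | 238 | 2 | 235 | 113 | 7 | 187
Run5 | 62 | 1544 | 25 | 54 | 731 | 100500 | 0 | 99 | 238 | 2 | 232 | 113 | 7 | 187
Run6 | 55 | 1706 | 31 | 58 | 731 | 100600 | 0 | 99 | 238 | 2 | 230 | 112 | 7 | 187
Run7 | 51 | 1875 | 37 | 61 | 731 | 100700 | 0 | 99 | 238 | 2 | 228 | 111 | 6 | 186
Run8 | 47 | 2028 | 43 | 62 | 699 | 100800 | 0 | 99 | 238 | 2 | 225 | 109 | 6 | 186
Run9 | 46 | 2177 | 47 | 81 | 730 | 100900 | 0 | 99 | 238 | 2 | 223 | 108 | 6 | 186
Run10 | 46 | 2324 | 51 | 85 | 726 | 101000 | 0 | 99 | 238 | 2 | 221 | 107 | 6 | 186
REORG
Final Run | 341 | 681 | 2 | 23 | 733 | 101000 | 0 | 100 | 0 | 24 | 121 | 5 | 5 | 155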
15
Case 4: Dynamic Random after Inserts
› DB2 version 8
› 100k row table, no freespace
› DYNAMIC SELECT workload; returns 10k rows
– access path completely random
› Insert workload that inserts 100 random rows each run.
› RUNSTATS for the table and index done after each Insert workload.
16
Case 4: Dynamic Random after Inserts
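TS = Thread Statistics (tablespace); CARD through LEAFFAR = Runstats Statistics; IX = Thread Statistics (indexspace).
Run | TS GP/SIO | TS GP | TS SIO | TS Pref req | TS Pref pgs | CARD | FARINDREF | CLUSTERRATIO | LEAFFAR | IX GP/SIO | IX GP | IX SIO | IX Pref req | IX Pref pgs
Initial Run | 2 | 1003 | 546 | 0 | 0 | 100000 | 0 | 100 | 0 | 25 | 3022 | 123 | 0 | 0
Run1 | 2 | 1011 | 549 | 0 | 0 | 100100 | 0 | 99 | 142 | 16 | 3042 | 194 | 0 | 0
Run2 | 2 | 1024 | 548 | 0 | 0 | 100200 | 0 | 99 | 196 | 14 | 3045 | 221 | 0 | 0
Run3 | 2 | 1018 | 541 | 0 | 0 | 100300 | 0 | 99 | 234 | 13 | 3047 | 234 | 0 | 0
Run4 | 2 | 1049 | 561 | 0 | 0 | 100400 | 0 | 99 | 238 | 13 | 3046 | 242 | 0 | 0
Run5 | 2 | 1042 | 561 | 0 | 0 | 100500 | 0 | 99 | 238 | 13 | 3049 | 241 | 0 | 0
Run6 | 2 | 1064 | 559 | 0 | 0 | 100600 | 0 | 99 | 238 | 13 | 3055 | 236 | 0 | 0
Run7 | 2 | 1073 | 570 | 0 | 0 | 100700 | 0 | 99 | 238 | 13 | 3057 | 239 | 0 | 0
Run8 | 2 | 1080 | 569 | 0 | 0 | 100800 | 0 | 99 | 238 | 13 | 3066 | 241 | 0 | 0
Run9 | 2 | 1078 | 573 | 0 | 0 | 100900 | 0 | 99 | 238 | 13 | 3060 | 240 | 0 | 0
Run10 | 2 | 1113 | 591 | 0 | 0 | 101000 | 0 | 99 | 238 | 13 | 3033 | 240 | 0 | 0
REORG
Final Run | 2 | 1013 | 587 | 0 | 0 | 101000 | 0 | 100 | 0 | 23 | 3020 | 132 | 0 | 0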
17
Case 5: Dynamic Sequential after Updates without RUNSTATS
› DB2 version 8
› 100k row table, no freespace, clustering index
› DYNAMIC SELECT workload; returns 10k rows
– access path using index and sequential prefetch
› UPDATE workload that updates 5k random rows in such a way that the rows have to be relocated.
› No RUNSTATS between runs, so the catalog statistics are not being updated.
18
Case 5: Dynamic Sequential after Updates without RUNSTATS
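TS = Thread Statistics (tablespace); CARD through LEAFFAR = Runstats Statistics; IX = Thread Statistics (indexspace).
Run | TS GP/SIO | TS GP | TS SIO | TS Pref req | TS Pref pgs | CARD | FARINDREF | CLUSTERRATIO | LEAFFAR | IX GP/SIO | IX GP | IX SIO | IX Pref req | IX Pref pgs
Initial Run | 349 | 698 | 2 | 23 | 733 | 100000 | 0 | 100 | 0 | 25 | 125 | 5 | 5 | 155
Run1 | 10 | 879 | 84 | 28 | 812 | 100000 | 0 | 100 | 0 | 2 | 242 | 112 | 12 | 338
Run2 | 7 | 1037 | 144 | 32 | 867 | 100000 | 0 | 100 | 0 | 2 | 242 | 112 | 12 | 370
Run3 | 5 | 1176 | 222 | 38 | 924 | 100000 | 0 | 100 | 0 | 2 | 244 | 114 | 12 | 368
Run4 | 5 | 1320 | 283 | 41 | 955 | 100000 | 0 | 100 | 0 | 2 | 244 | 114 | 12 | 368
Run5 | 4 | 1456 | 332 | 44 | 1030 | 100000 | 0 | 100 | 0 | 2 | 244 | 144 | 12 | 368
Run6 | 4 | 1604 | 407 | 46 | 1063 | 100000 | 0 | 100 | 0 | 2 | 244 | 114 | 12 | 368
Run7 | 4 | 1757 | 469 | 49 | 1106 | 100000 | 0 | 100 | 0 | 2 | 244 | 114 | 12 | 368
Run8 | 4 | 1920 | 533 | 54 | 1196 | 100000 | 0 | 100 | 0 | 2 | 244 | 114 | 12 | 368
Run9 | 3 | 2058 | 602 | 60 | 1203 | 100000 | 0 | 100 | 0 | 2 | 244 | 114 | 12 | 368
Run10 | 3 | 2202 | 661 | 65 | 1266 | 100000 | 0 | 100 | 0 | 2 | 244 | 114 | 12 | 368
REORG
Final Run | 414 | 828 | 2 | 27 | 861 | 100000 | 0 | 100 | 0 | 28 | 142 | 5 | 6 | 187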
19
Case 6: Static Sequential after Updates without Rebind or RUNSTATS
› DB2 version 8
› 100k row table, no freespace, clustering index
› STATIC SELECT workload; returns 10k rows
– access path using index and sequential prefetch
› UPDATE workload that updates 5k random rows in such a way that the rows have to be relocated.
› No RUNSTATS and no REBIND between runs
› Final runs after REORG, and then after RUNSTATS and REBIND
20
Case 6: Static Sequential after Updates without Rebind or RUNSTATS
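TS = Thread Statistics (tablespace); CARD through LEAFFAR = Runstats Statistics; IX = Thread Statistics (indexspace).
Run | TS GP/SIO | TS GP | TS SIO | TS Pref req | TS Pref pgs | CARD | FARINDREF | CLUSTERRATIO | LEAFFAR | IX GP/SIO | IX GP | IX SIO | IX Pref req | IX Pref pgs
Initial Run | 140 | 698 | 5 | 23 | 704 | 100000 | 0 | 100 | 0 | 16 | 125 | 8 | 5 | 160
Run1 | 8 | 879 | 107 | 23 | 736 | 100000 | 0 | 100 | 0 | 1 | 242 | 240 | 0 | 0
Run2 | 5 | 1037 | 204 | 23 | 736 | 100000 | 0 | 100 | 0 | 1 | 242 | 240 | 0 | 0
Run3 | 4 | 1176 | 286 | 23 | 736 | 100000 | 0 | 100 | 0 | 1 | 244 | 242 | 0 | 0
Run4 | 4 | 1320 | 370 | 23 | 736 | 100000 | 0 | 100 | 0 | 1 | 244 | 242 | 0 | 0
Run5 | 3 | 1456 | 450 | 23 | 736 | 100000 | 0 | 100 | 0 | 1 | 244 | 242 | 0 | 0
Run6 | 3 | 1604 | 537 | 23 | 736 | 100000 | 0 | 100 | 0 | 1 | 244 | 242 | 0 | 0
Run7 | 3 | 1757 | 625 | 23 | 736 | 100000 | 0 | 100 | 0 | 1 | 244 | 242 | 0 | 0
Run8 | 3 | 1920 | 724 | 23 | 736 | 100000 | 0 | 100 | 0 | 1 | 244 | 242 | 0 | 0
Run9 | 3 | 2058 | 812 | 23 | 736 | 100000 | 0 | 100 | 0 | 1 | 244 | 242 | 0 | 0
Run10 | 2 | 2202 | 904 | 23 | 736 | 100000 | 0 | 100 | 0 | 1 | 244 | 242 | 0 | 0
Final run after REORG | 166 | 828 | 5 | 27 | 832 | 100000 | 0 | 100 | 0 | 18 | 142 | 8 | 6 | 160
After REORG, STATS, REBIND | 414 | 828 | 2 | 27 | 861 | 100000 | 0 | 100 | 0 | 28 | 142 | 5 | 6 | 187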
21
Case 7: Dynamic Sequential after Updates, no RUNSTATS, with FREESPACE
› DB2 version 8
› 100k row table
– table PCTFREE = 10; index PCTFREE = 5
› Clustering index
› Dynamic SELECT workload; returns 10k rows
– access path using index and sequential prefetch
› UPDATE workload that updates 5k random rows in such a way that the rows have to be relocated.
› No RUNSTATS between runs
22
Case 7: Dynamic Sequential after Updates, with FREESPACE
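TS = Thread Statistics (tablespace); CARD through LEAFFAR = Runstats Statistics; IX = Thread Statistics (indexspace).
Run | TS GP/SIO | TS GP | TS SIO | TS Pref req | TS Pref pgs | CARD | FARINDREF | CLUSTERRATIO | LEAFFAR | IX GP/SIO | IX GP | IX SIO | IX Pref req | IX Pref pgs
Initial Run | 389 | 778 | 2 | 26 | 829 | 100000 | 0 | 100 | 0 | 27 | 133 | 5 | 6 | 187
Run1 | 390 | 780 | 2 | 26 | 829 | 100000 | 1696 | 100 | 2286 | 17 | 136 | 8 | 6 | 187
Run2 | 395 | 790 | 2 | 27 | 829 | 100000 | 3231 | 100 | 2372 | 6 | 156 | 28 | 6 | 187
Run3 | 407 | 814 | 2 | 27 | 829 | 100000 | 4688 | 100 | 2380 | 3 | 182 | 54 | 6 | 187
Run4 | 431 | 862 | 2 | 28 | 829 | 100000 | 6142 | 100 | 2380 | 3 | 306 | 78 | 7 | 214
Run5 | 309 | 928 | 3 | 35 | 829 | 100000 | 7524 | 100 | 2380 | 2 | 233 | 100 | 9 | 276
Run6 | 49 | 1028 | 21 | 38 | 829 | 100000 | 8887 | 100 | 2380 | 2 | 253 | 120 | 9 | 276
Run7 | 13 | 1190 | 89 | 46 | 902 | 100000 | 10265 | 100 | 2380 | 2 | 259 | 126 | 10 | 276
Run8 | 9 | 1338 | 149 | 50 | 952 | 100000 | 11629 | 100 | 2380 | 2 | 259 | 126 | 10 | 276
Run9 | 7 | 1471 | 209 | 52 | 967 | 100000 | 12989 | 100 | 2380 | 2 | 259 | 126 | 10 | 284
Run10 | 6 | 1616 | 275 | 57 | 1027 | 100000 | 14335 | 100 | 2380 | 2 | 259 | 126 | 10 | 284
REORG
Final Run | 463 | 926 | 2 | 30 | 957 | 100000 | 0 | 100 | 0 | 30 | 149 | 5 | 6 | 187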
23
Chart of Case1: Sequential after Updates without Freespace
24
Chart of Case7: Sequential after Updates with Freespace
25
Conclusions about SQL Performance and Pageset Organization
› Relocating update activity reduces organization and significantly affects sequential performance
– can cause some increases in workload due to increased row sizes
› Insert activity reduces organization and significantly affects sequential performance
– can also increase workload (rows fetched)
› Random table access will not be significantly affected by reduced organization or improved by REORGs
– Index access can be degraded, although this is usually a lesser impact
› Freespace delays the worst performance impacts
– at the cost of unused disk space
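The delaying effect of freespace can be read off the tablespace sync I/O counts reported for Case 1 (no freespace) and Case 7 (table PCTFREE 10, index PCTFREE 5). A small Python sketch that finds the first run at which each case crosses a cut-off; the threshold of 10 sync I/Os and the helper name are assumptions chosen for illustration, not values from the presentation:

```python
from typing import Optional, Sequence


def first_run_over(sync_ios: Sequence[int], threshold: int) -> Optional[int]:
    """Return the first run number (0 = Initial Run) whose sync I/O count
    exceeds the threshold, or None if no run does."""
    for run, sio in enumerate(sync_ios):
        if sio > threshold:
            return run
    return None


# Tablespace sync I/O counts; index 0 is the Initial Run, 1..10 are Run1..Run10.
CASE1_NO_FREESPACE = [2, 84, 144, 222, 283, 332, 407, 469, 533, 602, 661]
CASE7_FREESPACE = [2, 2, 2, 2, 2, 3, 21, 89, 149, 209, 275]

THRESHOLD = 10  # illustrative cut-off, not taken from the presentation

case1_first = first_run_over(CASE1_NO_FREESPACE, THRESHOLD)
case7_first = first_run_over(CASE7_FREESPACE, THRESHOLD)
print(case1_first, case7_first)
```

Without freespace the cut-off is crossed on the first update run; with freespace it is not crossed until the sixth.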
26
Conclusions about SQL Performance and Pageset Organization
› Typical performance impacts include
– Increased getpage activity
• Can also be caused by increased workloads
– Increased sync I/Os, increased sync I/Os per getpage
• Can be masked by buffer pool tuning
› Updated statistics can help the optimizer compensate
– Dynamic SQL, Static SQL if rebound
– Can also cause unexpected access path changes; could make things much worse
– RUNSTATS causes statement invalidation in the dynamic statement cache.
27
Case 8: DB2 9 - Dynamic Sequential after Updates, no FREESPACE
› DB2 version 9
› 100k row table, no freespace, clustering index
› DYNAMIC SELECT workload; returns 10k rows
– access path using index and sequential prefetch
› UPDATE workload that updates 5k random rows in such a way that the rows have to be relocated.
› RUNSTATS for the table and index done after each update workload.
28
Case 8: DB2 9 - Dynamic Sequential after Updates, no FREESPACE
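TS = Thread Statistics (tablespace); CARD through LEAFFAR = Runstats Statistics; IX = Thread Statistics (indexspace).
Run | TS GP/SIO | TS GP | TS SIO | TS Pref req | TS Pref pgs | CARD | FARINDREF | CLUSTERRATIO | LEAFFAR | IX GP/SIO | IX GP | IX SIO | IX Pref req | IX Pref pgs
Initial Run | 140 | 698 | 5 | 23 | 736 | 100000 | 0 | 100 | 0 | 24 | 195 | 8 | 7 | 224
Run1 | 8 | 876 | 113 | 23 | 736 | 100000 | 1707 | 100 | 1562 | 3 | 267 | 81 | 7 | 224
Run2 | 5 | 1029 | 210 | 23 | 736 | 100000 | 3223 | 100 | 2508 | 1 | 316 | 282 | 1 | 32
Run3 | 4 | 1165 | 297 | 23 | 736 | 100000 | 4637 | 100 | 3062 | 1 | 348 | 346 | 0 | 0
Run4 | 3 | 1302 | 389 | 23 | 736 | 100000 | 6057 | 100 | 3382 | 1 | 368 | 366 | 0 | 0
Run5 | 3 | 1436 | 479 | 23 | 736 | 100000 | 7417 | 100 | 3552 | 1 | 377 | 375 | 0 | 0
Run6 | 3 | 1579 | 569 | 23 | 736 | 100000 | 8749 | 100 | 3658 | 1 | 382 | 380 | 0 | 0
Run7 | 3 | 1726 | 658 | 23 | 736 | 100000 | 10103 | 100 | 3720 | 1 | 383 | 381 | 0 | 0
Run8 | 3 | 1887 | 754 | 23 | 736 | 100000 | 11452 | 100 | 3742 | 1 | 384 | 382 | 0 | 0
Run9 | 2 | 2018 | 841 | 23 | 736 | 100000 | 12782 | 100 | 3754 | 1 | 384 | 382 | 0 | 0
Run10 | 2 | 2158 | 932 | 23 | 736 | 100000 | 14111 | 100 | 3768 | 1 | 384 | 382 | 0 | 0
REORG
Final Run | 166 | 828 | 5 | 27 | 864 | 100000 | 0 | 100 | 0 | 24 | 195 | 8 | 7 | 224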
29
Case 9: DB2 9 - Dynamic Random after Updates, no FREESPACE
› DB2 version 9
› 100k row table, no freespace
› DYNAMIC SELECT workload; returns 10k rows
– access path completely random
› UPDATE workload that updates 5k random rows in such a way that the rows have to be relocated.
› RUNSTATS for the table and index done after each update workload.
30
Case 9: DB2 9 - Dynamic Random after Updates, no FREESPACE
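TS = Thread Statistics (tablespace); CARD through LEAFFAR = Runstats Statistics; IX = Thread Statistics (indexspace).
Run | TS GP/SIO | TS GP | TS SIO | TS Pref req | TS Pref pgs | CARD | FARINDREF | CLUSTERRATIO | LEAFFAR | IX GP/SIO | IX GP | IX SIO | IX Pref req | IX Pref pgs
Initial Run | 1 | 1003 | 973 | 0 | 0 | 100000 | 0 | 100 | 0 | 4 | 3020 | 820 | 0 | 0
Run1 | 1 | 1040 | 1004 | 0 | 0 | 100000 | 1707 | 100 | 1562 | 3 | 3030 | 881 | 0 | 0
Run2 | 1 | 1067 | 1028 | 0 | 0 | 100000 | 3223 | 100 | 2508 | 3 | 3028 | 932 | 0 | 0
Run3 | 1 | 1080 | 1047 | 0 | 0 | 100000 | 4637 | 100 | 3062 | 3 | 3042 | 945 | 0 | 0
Run4 | 1 | 1121 | 1082 | 0 | 0 | 100000 | 6057 | 100 | 3382 | 3 | 3040 | 939 | 0 | 0
Run5 | 1 | 1131 | 1093 | 0 | 0 | 100000 | 7417 | 100 | 3552 | 3 | 3042 | 949 | 0 | 0
Run6 | 1 | 1186 | 1139 | 0 | 0 | 100000 | 8749 | 100 | 3658 | 3 | 3037 | 938 | 0 | 0
Run7 | 1 | 1214 | 1157 | 0 | 0 | 100000 | 10103 | 100 | 3720 | 3 | 3038 | 955 | 0 | 0
Run8 | 1 | 1261 | 1198 | 0 | 0 | 100000 | 11452 | 100 | 3742 | 3 | 3032 | 955 | 0 | 0
Run9 | 1 | 1272 | 1210 | 0 | 0 | 100000 | 12782 | 100 | 3754 | 3 | 3039 | 951 | 0 | 0
Run10 | 1 | 1281 | 1228 | 0 | 0 | 100000 | 14111 | 100 | 3768 | 3 | 3054 | 968 | 0 | 0
REORG
Final Run | 1 | 1003 | 978 | 0 | 0 | 100000 | 0 | 100 | 0 | 4 | 3026 | 824 | 0 | 0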
31
DB2 9 and Real-Time Statistics (RTS)
› DB2 version 9 has added the ability to dynamically maintain pageset statistics in separate catalog tables.
– SYSIBM.SYSTABLESPACESTATS
– SYSIBM.SYSINDEXSPACESTATS
› These values are maintained without RUNSTATS
› The Optimizer does not use them for access path analysis
› Many of the statistics are relative to the last REORG
› With an understanding of when they get updated, they are basically always available.
› These are available in v8 too (even v7!). You just have to do some work to set them up.
32
Some RTS Statistics
› The RTS tables have the basic statistics we have been looking at:
– SYSIBM.SYSTABLESPACESTATS
• REORGLASTTIME – timestamp of last REORG
• REORGNEARINDREF – number of rows relocated since REORG but near the original page
• REORGFARINDREF – number of rows relocated since REORG far from the original page
– SYSIBM.SYSINDEXSPACESTATS
• NLEVELS – number of index levels
• REORGLEAFNEAR – number of leaf pages relocated but near their previous logical leaf page
• REORGLEAFFAR – number of leaf pages relocated far from their previous logical leaf page
33
Methods for deciding when to REORG
› Typical methods include
– Automatically on fixed schedule
– When certain catalog statistics breach a threshold
• Change in cardinality and degraded cluster ratio
• Degraded page ordering (FARINDREF)
• Degraded leaf page distribution, ordering, levels
› We can save resources used for REORGs if we
– Correlate those statistics with performance data
– Only REORG if and when needed
34
Collecting Performance Data
› DB2 IFCID 199 contains key statistics for each pageset with at least 1 I/O per second average in a stat cycle:
– DBID, PSID, partition
– Getpage counts
– Sync I/Os
– Async I/Os
– Async pages read
› If activated, these records are produced on the regular DB2 statistics cycle.
35
Collecting Performance Data
› A custom program, SAS procedure, or vendor tool can be used to
– collect these records periodically (e.g. daily),
– summarize the numbers by pageset,
– add them to a performance statistics table.
› This performance statistics table can have columns for date-time, DBID, PSID, Partition, getpages, sync I/Os, async I/Os, async pages read.
› It’s also necessary to know when REORGS are done
– add a column to indicate a REORG event
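The "summarize the numbers by pageset" step can be sketched in Python as follows. This assumes the IFCID 199 records have already been decoded into dictionaries, one per pageset per statistics interval, with the hypothetical keys shown; decoding the actual trace records is outside the scope of this sketch:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

COUNTERS = ("getpages", "sync_io", "async_io", "async_pages")
PagesetKey = Tuple[int, int, int]  # (DBID, PSID, partition)


def summarize_by_pageset(records: Iterable[dict]) -> Dict[PagesetKey, Dict[str, int]]:
    """Sum the interval counters of already-decoded records per pageset."""
    totals: Dict[PagesetKey, Dict[str, int]] = defaultdict(
        lambda: {name: 0 for name in COUNTERS}
    )
    for rec in records:
        key = (rec["dbid"], rec["psid"], rec["partition"])
        for name in COUNTERS:
            totals[key][name] += rec[name]
    return dict(totals)


# Made-up sample records for two pagesets across two intervals.
sample = [
    {"dbid": 260, "psid": 2, "partition": 0, "getpages": 700, "sync_io": 2, "async_io": 23, "async_pages": 733},
    {"dbid": 260, "psid": 2, "partition": 0, "getpages": 880, "sync_io": 84, "async_io": 28, "async_pages": 812},
    {"dbid": 260, "psid": 4, "partition": 0, "getpages": 125, "sync_io": 5, "async_io": 5, "async_pages": 155},
]

daily = summarize_by_pageset(sample)
print(daily[(260, 2, 0)])
```

Each summarized entry maps naturally onto one row of the performance statistics table, with the collection timestamp added by the daily job.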
36
Real-Time Statistics
› With the availability of the RTS tables,
– it is now possible to capture key organization statistics,
– on a daily basis,
– easily correlate with performance statistics.
– Could capture them at the same time the performance statistics are summarized for the day.
– Could keep them in the same table.
– Solves the problem of capturing last REORG time.
› Now possible to do percentage change calculations on catalog statistics as well as performance statistics.
37
Collecting Performance Data - sample table layout
CREATE TABLE PAGESET_PERFORMANCE_TABLE
  (COLLECT_TIME TIMESTAMP,
   DBID SMALLINT,
   PSID SMALLINT,
   PART SMALLINT,
   BPID SMALLINT,
   DBNAME CHAR(8),
   PSNAME CHAR(8),
   TYPE CHAR(1),
   REORGLASTTIME TIMESTAMP,
   TABLE_FARINDREF INTEGER,
   TABLE_NEARINDREF INTEGER,
   TABLE_REORGUNCLUSTINS INTEGER,
   TABLE_TOTALROWS INTEGER,
   INDEX_REORGLEAFFAR INTEGER,
   INDEX_REORGLEAFNEAR INTEGER,
   INDEX_NLEAF INTEGER,
   INDEX_TOTALENTRIES INTEGER,
   GETPAGES INTEGER,
   SYNC_IO INTEGER,
   GPPERSIO INTEGER,
   ASYNC_IO INTEGER,
   ASYNC_PAGES INTEGER);
38
Developing REORG Triggers
› Performance data analysis can be used either
– to recommend REORGS as needed, or
– to study the results of REORGS and use the information to adjust fixed schedules.
› Either way, the analysis usually depends on a “trigger”, which is a metric or formula threshold that is used to decide whether a REORG is needed or not.
› The threshold part of these triggers often has to be tailored to the needs of each application.
› Note that physical state and storage use triggers, such as extents, percentage of dropped table rows, etc. are indicators that represent issues unrelated to performance.
39
Developing REORG Triggers
› Recommendations from the Administration Guide
– Table spaces, REORG if
• More than 10% of rows relocated far
• If clustering index, then CLUSTERRATIO < 90%
• Else number of referenced rows far from optimal position > 10%
– Index spaces, REORG if
• More than 10% of active leaf pages are far from optimal position
• The average distance between consecutive leaf pages exceeds 2
• More than a designated percentage of rows have been inserted or deleted
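The catalog-statistics rules above can be expressed as simple predicates. A minimal Python sketch covering only the "relocated far", CLUSTERRATIO, and "leaf pages far" rules; the function names and the use of a total leaf page count as the denominator are assumptions for illustration, and the remaining rules are omitted:

```python
from typing import Optional


def tablespace_reorg_candidate(card: int, farindref: int,
                               clusterratio: Optional[int] = None) -> bool:
    """True if more than 10% of rows are relocated far, or, when a
    clustering index exists, CLUSTERRATIO is below 90%."""
    if card > 0 and 100.0 * farindref / card > 10.0:
        return True
    if clusterratio is not None and clusterratio < 90:
        return True
    return False


def indexspace_reorg_candidate(nleaf: int, leaffar: int) -> bool:
    """True if more than 10% of leaf pages are far from their optimal position."""
    return nleaf > 0 and 100.0 * leaffar / nleaf > 10.0


# Case 1 RUNSTATS values: CARD = 100000, CLUSTERRATIO = 100 throughout.
run5 = tablespace_reorg_candidate(100000, 7524, 100)    # 7.5% relocated far
run7 = tablespace_reorg_candidate(100000, 10265, 100)   # 10.3% relocated far
print(run5, run7)
```

Note that in Case 1 this rule alone does not fire until Run7, although sequential performance had already degraded sharply by Run1.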
40
Developing REORG Triggers
› Factoring in degraded performance…
› Table spaces or index spaces, REORG if
– Baseline prefetch pages is > 2 x sync I/Os
– And sync I/Os have increased 20% since baseline
– And getpages per sync I/O have fallen 20%
› Any pageset nominated for REORG by the performance triggers and the catalog statistics triggers is a good candidate for REORG.
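One possible reading of the performance trigger, sketched in Python. It assumes the "2 x sync I/Os" comparison uses the baseline sync I/O count, and that "baseline" means the counters captured just after the last REORG; the class and function names are illustrative only:

```python
from dataclasses import dataclass


@dataclass
class PagesetCounters:
    getpages: int
    sync_ios: int
    prefetch_pages: int


def gp_per_sio(c: PagesetCounters) -> float:
    """Getpages per sync I/O; infinite when no sync I/O occurred."""
    return c.getpages / c.sync_ios if c.sync_ios else float("inf")


def performance_trigger(baseline: PagesetCounters, current: PagesetCounters) -> bool:
    """True when all three performance conditions hold."""
    if current.sync_ios == 0:
        return False  # no synchronous waits, nothing for a REORG to improve
    mostly_prefetch = baseline.prefetch_pages > 2 * baseline.sync_ios
    sio_up_20 = current.sync_ios >= 1.2 * baseline.sync_ios
    ratio_down_20 = gp_per_sio(current) <= 0.8 * gp_per_sio(baseline)
    return mostly_prefetch and sio_up_20 and ratio_down_20


# Case 1 tablespace (sequential): Initial Run vs. Run1.
seq = performance_trigger(PagesetCounters(699, 2, 733), PagesetCounters(879, 84, 812))
# Case 2 tablespace (random, no prefetch): Initial Run vs. Run10.
rnd = performance_trigger(PagesetCounters(1003, 966, 0), PagesetCounters(1296, 1229, 0))
print(seq, rnd)
```

The prefetch condition keeps randomly accessed pagesets, which the test cases show gain little from a REORG, from being nominated.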
41
Developing REORG Triggers
› Post REORG analysis
– If using regularly scheduled REORGS, analysis of the performance data a day after the reorganization can indicate the degree of improvement.
› Interesting data points: (getpages, sync I/O, async pages, and getpages per sync I/O)
– Before the REORG, % increase in metrics since the last REORG (baseline).
– After the latest REORG, % decrease in metrics.
– Small values indicate REORG was too soon or not needed at all.
– Large values mean the REORG was too late.
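The two percentages can be computed as below. This is a minimal Python sketch with illustrative function names, applied to the Case 1 tablespace getpage counts (699 at baseline, 2202 before the REORG, 828 after it); what counts as a "small" or "large" value is left to the analyst:

```python
def pct_increase(baseline: float, before_reorg: float) -> float:
    """Percentage growth of a metric from the baseline to just before the REORG."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (before_reorg - baseline) / baseline * 100.0


def pct_decrease(before_reorg: float, after_reorg: float) -> float:
    """Percentage drop of a metric from just before the REORG to just after it."""
    if before_reorg == 0:
        raise ValueError("before_reorg must be non-zero")
    return (before_reorg - after_reorg) / before_reorg * 100.0


# Case 1 tablespace getpages: Initial Run, Run10, Final Run.
growth = pct_increase(699, 2202)
recovery = pct_decrease(2202, 828)
print(f"getpages grew {growth:.0f}% before the REORG and fell {recovery:.0f}% after it")
```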
42
Putting it all Together
› Activate Real-Time Statistics in DB2
– (pre v9, do setup work)
› Activate DB2 statistics class 8 and begin recording IFCID 199 data
› Create the pageset performance table.
› Set up a daily job to summarize 199 data, collect RTS data, and populate the pageset performance table.
› Use the data to
– adjust REORG schedules,
– directly trigger REORGS, or
– perform post REORG analysis on benefits.
43
Conclusion
› With a little work it is possible to set up a process to capture pageset performance statistics and real-time object statistics.
› With this data, better triggers can be developed that only recommend REORGS when performance has been degraded.
› Post REORG analysis of this data can help to refine trigger thresholds or adjust schedules to balance performance versus costs.
44
Kevin Baker, BMC Software