Ugif 10 2012 ppt0000002
User Group Informix France
Update Statistics
Olivier [email protected]
Wednesday, 3 October 2012
Overview
• Brief Review and History
• What’s changed?
  – 11.10, 11.50
  – 11.70: “Smart Statistics”
• 11.70 FAQs
  – Do I need to do anything different?
  – Did the update statistics update any stats?
  – Update statistics and reoptimization
Why are statistics important?
• Choosing the right QUERY PATH determines how fast you get your results.
• Choosing the wrong path can be like going around the world to get to your neighbor’s.
  – Expensive to go around the world.
  – Takes too long.
Query Optimization Process
• Examine all tables (table A, table B, table C)
  – Examine selectivity of every filter (where clauses)
  – Determine if indexes can be used for filters, order by, group by
  – Find the best way to scan a table: sequentially or by an index
• Identify Join Pairs (AB, AC, BA, BC, CA, CB)
  – Find best join method (nested loop, hash, or sort merge)
  – Decide which indexes are best for the join
  – Calculate the cost of the join
• Repeat for each additional table (ABC, ACB, BAC, ...)
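The enumeration of join pairs and join orders can be illustrated with a short Python sketch. It only lists the candidates named on the slide; the function names are hypothetical, and a real optimizer prunes candidates rather than costing every permutation.

```python
from itertools import permutations

def join_pairs(tables):
    """Every ordered pair of distinct tables (AB and BA are both candidates)."""
    return ["".join(pair) for pair in permutations(tables, 2)]

def join_orders(tables):
    """Every full ordering of the tables (ABC, ACB, BAC, ...)."""
    return ["".join(order) for order in permutations(tables)]

pairs = join_pairs(["A", "B", "C"])
orders = join_orders(["A", "B", "C"])
print(pairs)
print(orders)
```

The number of full orderings grows as n!, which is why the quality of the cost estimates for each candidate matters so much.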
Estimating costs: need data!
• Find the cheapest/lowest cost path.
  – Cost = I/O cost + Weight * (CPU cost)
  – I/O: disk access
  – CPU: rows processed
• Estimate costs
  – Filters: which indexes to use?
  – Joins: nested loop, hash, or sort merge?
  – Eliminate redundant pairs?
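A minimal Python sketch of the cost formula above. The candidate paths, their I/O and CPU figures, and the weight of 0.03 are made-up illustration values, not numbers taken from the server.

```python
def path_cost(io_cost, cpu_cost, weight):
    """Cost = I/O cost + Weight * (CPU cost), as stated on the slide."""
    return io_cost + weight * cpu_cost

def cheapest(paths, weight):
    """Return the (name, io_cost, cpu_cost) candidate with the lowest cost."""
    return min(paths, key=lambda p: path_cost(p[1], p[2], weight))

# Hypothetical candidates: (name, pages read, rows processed)
candidates = [
    ("sequential scan", 2048.0, 28672.0),
    ("index scan", 40.0, 1024.0),
]
best = cheapest(candidates, weight=0.03)
print(best[0], path_cost(best[1], best[2], 0.03))
```

Both inputs to the formula (pages read and rows processed) come from the statistics, so stale statistics lead directly to wrong cost comparisons.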
Filter selectivity
• Selectivity is the fraction of rows selected as a result of a filter (a number between 0 and 1)
Expression                     Filter Selectivity
indexed_col = literal value    F = 1/(number of distinct keys in index)
indexed_col > literal value    F = (2nd max - literal value)/(2nd max - 2nd min)
NOT expression                 F = 1 - F(expression)
expr1 AND expr2                F = F(expr1) x F(expr2)
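The equality, NOT, and AND rules from the table can be sketched in Python as follows. The function names are hypothetical; the distinct-key counts 28 and 23 are chosen because 1/28 and 1/23 match the sel values that appear in the xtrace example later in the deck.

```python
def sel_equal(distinct_keys):
    """indexed_col = literal: F = 1 / (number of distinct keys in index)."""
    return 1.0 / distinct_keys

def sel_not(f):
    """NOT expression: F = 1 - F(expression)."""
    return 1.0 - f

def sel_and(f1, f2):
    """expr1 AND expr2: F = F(expr1) x F(expr2)."""
    return f1 * f2

f_zip = sel_equal(28)
f_other = sel_equal(23)
print(round(f_zip, 7), round(f_other, 7))
print(sel_and(f_zip, sel_not(f_other)))
```

Multiplying selectivities for AND assumes the two filters are independent, which is one reason estimates can drift from reality on correlated columns.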
How do we influence Query Optimization?
• OPTCOMPIND
• Optimizer directives, Optimization Goals
• Update Statistics
  – Collect information for the optimizer
  – Table nrows, npused; Index Statistics: LOW
  – Data Distributions: MEDIUM & HIGH
  – Compile Stored Procedures
Where are the stats stored?
• systables (Low)
  – nrows, npused
• sysindices (Low)
  – leaves, levels, nunique, clust
• syscolumns (Low)
  – colmin, colmax
• sysfragments (Low)
  – nrows, npused
  – For index partitions: levels, clust
• sysdistrib (Medium or High)
  – Can view with dbschema -hd
View Query Path
• Set explain on
  – Can be set in session
• Explain Directive
  – Can be embedded in the query
• xtrace Debug
  – Support may ask you to turn this on
FOREACH SELECT {+EXPLAIN} order_num INTO p_num FROM orders WHERE customer_num = 104 ORDER BY order_num
Debugging with xtrace
• To “see” the statistics information being used for query optimization
Example:
xtrace heavy -c XTF_OPTMZR -f XTF_DEBUG
xtrace size 10000
xtrace on
Use “xtrace fview” or “xtrace view” to view traces.
“xtrace fview” includes timestamps.
Use “xtrace info” to display current xtrace settings.
Use “xtrace --” for xtrace usage info.
Xtrace: example
Before
f1 31310 16 get_distrib(): distrib not found for table c col zipcode
f1 7401 16 selec1: op = 46(OP_EQ), defsel = 0.1 sel = 0.0434783
……
f2 1207 16 oprowspages(tab = c, nrows = 28, npages = 2)
f2 13217 16 opmix_iscancost(numrows=1.21739,npages=2,pagesread=1.13988)
f2 13225 16 opmix_iscancost(scancost=1.1764,indexcost=1.08, …, iscancost=2.2564)
After Update Statistics
f1 31310 18 get_distrib(): distrib found for table c col zipcode
f1 7401 18 selec1: op = 46(OP_EQ), defsel = 0.1 sel = 0.0357143
……
f2 1207 18 oprowspages(tab = c, nrows = 28672, npages = 2048)
…
f2 2237 18 dpages = 24576 lpages = 84 nlevels = 2
f2 1871 18 dcost = 33.72 seek 0 keyonly = TRUE
f2 1896 18 iscancost(c, zip_ix) cost = 35.72
f2 13217 18 opmix_iscancost(numrows=1024,npages=2048,pagesread=805.977)
f2 13225 18 opmix_iscancost(scancost=836.697,indexcost=35.72, …, iscancost=872.417)
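The numbers in the two traces are consistent with two simple relationships: numrows equals nrows times sel, and iscancost equals scancost plus indexcost. The Python check below only reproduces that arithmetic from the trace values; that the optimizer computes them exactly this way is an assumption, and the function names are hypothetical.

```python
def estimated_rows(nrows, selectivity):
    """Rows expected to pass the filter: nrows * sel."""
    return nrows * selectivity

def index_scan_cost(scancost, indexcost):
    """Total index scan cost as the sum of the two trace components."""
    return scancost + indexcost

rows_before = estimated_rows(28, 0.0434783)      # trace: numrows=1.21739
rows_after = estimated_rows(28672, 0.0357143)    # trace: numrows=1024
cost_before = index_scan_cost(1.1764, 1.08)      # trace: iscancost=2.2564
cost_after = index_scan_cost(836.697, 35.72)     # trace: iscancost=872.417
print(rows_before, rows_after, cost_before, cost_after)
```

With the old statistics the optimizer believed the filter returned about one row from a 28-row table; with current statistics the same filter is estimated at about 1024 rows, which changes the preferred plan.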
Xtrace (after ... cont’d)
…
f2 1207 18 oprowspages(tab = c, nrows = 28672, npages = 2048)
f2 1320 18 opscantabcost(c) npages = 2048, nrows = 28672, cost = 2909.16
f2 1527 18 opcartcost(c) cost = 2909.16 initcost = 0
f2 1988 18 index_info(): index 100_1 fullness 0.75 recs_per_node 128 keylen 4
…
f2 2237 18 dpages = 2048 lpages = 187 nlevels = 3
f2 10863 18 idxtree_travcost s 3.48772e-05 nlevels 3 lpages .. dpages .. mempages 512
f2 14448 18 seek_factor 6 clust 2048 clust_scale 0 seek 0
…
f2 1727 18 opidxcost(c, 100_1) = 0.745763
f1 16094 18 index 100_1 considered, icost 0.745763, istart 0.0078125, fltragg 0
f1 16324 18 indexp(): best index path: idx 100_1 icost = 0.745763 idx_flags2
f3 3462 18 idx cost = 0.745763 initcost = 0.0078125 totalcost = 17.1526
f3 3465 18 outer size = 23 join size = 1
f3 8468 18 build inner table, init cost is 13.5745, join cost is 4.24268
f3 8568 18 build outer table, init cost is 4.24268, join cost is 13.5745
sqexplain.out (before)
select c.city, c.state, o.ship_date from customer c , orders o where c.customer_num = o.customer_num and c.state = ? and c.zipcode = ?
Estimated Cost: 3
Estimated # of Rows Returned: 1
1) informix.c: INDEX PATH
    Filters: informix.c.state = 'AZ'
    (1) Index Name: informix.zip_ix
        Index Keys: zipcode (Serial, fragments: ALL)
        Lower Index Filter: informix.c.zipcode = '85016'
2) informix.o: INDEX PATH
    (1) Index Name: informix. 102_4
        Index Keys: customer_num (Serial, fragments: ALL)
        Lower Index Filter: informix.c.customer_num = informix.o.customer_num
NESTED LOOP JOIN
sqexplain.out (after)
select c.city, c.state, o.ship_date from customer c , orders o where c.customer_num = o.customer_num and c.state = ? and c.zipcode = ?
Estimated Cost: 19
Estimated # of Rows Returned: 1
1) informix.o: SEQUENTIAL SCAN
2) informix.c: INDEX PATH
    Filters: (informix.c.zipcode = '85016' AND informix.c.state = 'AZ' )
    (1) Index Name: informix. 100_1
        Index Keys: customer_num (Serial, fragments: ALL)
        Lower Index Filter: informix.c.customer_num = informix.o.customer_num
NESTED LOOP JOIN
Customer has 28672 rows.
Orders has 23 rows.
Before 11.x
• Before 11.x
  – Update statistics low
  – Update statistics medium, high
    • Resolution, Confidence
  – Update statistics distributions only
  – Update statistics drop distributions
  – Update statistics for table, for procedure
  – Lots of guidelines
    • What to run update statistics on
    • Which update statistics to run
    • How to run update statistics
• Scripts
• Cron jobs
Guidelines
• Update statistics medium distributions only for all columns that do not have an index
• Update statistics high for columns that are the first key in an index
• Update statistics low for all columns in multicolumn indexes
• Run with PDQ for better performance (for table ONLY)
• Do not run with PDQ for update statistics for procedure
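The first three guidelines can be turned into a small command generator. This Python sketch is an illustration only: the function name, the way the schema is passed in, and the sample table layout are assumptions, and it is not the logic used by any IBM script or by dostats.

```python
def guideline_commands(table, columns, indexes):
    """Build UPDATE STATISTICS commands following the three column guidelines.

    indexes is a list of column-name lists, one per index, in key order.
    """
    indexed = {col for idx in indexes for col in idx}
    unindexed = [c for c in columns if c not in indexed]
    first_keys, multi = [], []
    for idx in indexes:
        if idx[0] not in first_keys:
            first_keys.append(idx[0])
        if len(idx) > 1:
            multi.extend(c for c in idx if c not in multi)
    cmds = []
    if unindexed:  # MEDIUM DISTRIBUTIONS ONLY for columns without an index
        cols = ", ".join(unindexed)
        cmds.append(f"UPDATE STATISTICS MEDIUM FOR TABLE {table} ({cols}) DISTRIBUTIONS ONLY;")
    for col in first_keys:  # HIGH for the leading key of each index
        cmds.append(f"UPDATE STATISTICS HIGH FOR TABLE {table} ({col});")
    if multi:  # LOW for all columns in multicolumn indexes
        cols = ", ".join(multi)
        cmds.append(f"UPDATE STATISTICS LOW FOR TABLE {table} ({cols});")
    return cmds

# Hypothetical table layout used only for the demonstration
cmds = guideline_commands(
    "customer",
    ["customer_num", "fname", "lname", "zipcode"],
    [["customer_num"], ["lname", "fname"]],
)
for cmd in cmds:
    print(cmd)
```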
Issues (before 11.x)
• Difficult to know when update statistics was run last
• Guidelines weren’t always well-understood
• People weren’t sure how to run update statistics
  – Accidentally over-wrote statistics by running HIGH first, then MEDIUM
  – Accidentally compiled stored procedures with PDQ
  – Ran Update Stats LOW twice (performance issue)
Update statistics LOW for table tab1;
Update statistics HIGH for table tab1 (col1, col2);
What might be considered “missing” here?
11.10 Features
• 11.10 Enhancements
  – Create index creates initial stats and distribution information for the leading column of the index
  – Enhance catalog information
    • What time was update statistics Low run?
    • What time were the distributions created?
    • How many rows were sampled for the distributions?
  – New “Sampling Size” option
  – Update statistics drop distributions ONLY
  – Auto Update Statistics Scheduler tasks
Help with Guidelines
• Use scheduler task “Auto Update Statistics Evaluation”
  – Scheduler task can be run “on-demand” using exectask()
  – Execute function exectask('Auto Update Statistics Evaluation')
• Use script in Informix Technote (swg21137764)
  – UPDATE STATISTICS commands to allow the optimizer to work its best
  – http://www-01.ibm.com/support/docview.wss?uid=swg21137764
• Use Art Kagel’s dostats (from IIUG)
AUS History
• First introduced in 11.10
  – Scheduler task “Auto Update Statistics Evaluation”
  – Scheduler task “Auto Update Statistics Refresh”
  – Uses the guidelines to determine the update statistics commands to run
• Enhancement to work with non-English Locales in 11.50.xC6
AUS Scheduler Tasks
• Runs Update Statistics FOR TABLE commands
• Runs with PDQ set to AUS_PDQ in sysadmin:ph_threshold
UPDATE STATISTICS LOW FOR TABLE stores7:customer
UPDATE STATISTICS HIGH FOR TABLE stores7:customer ( customer_num, zipcode ) RESOLUTION 0.500 DISTRIBUTIONS ONLY
> select * from ph_threshold where name = "AUS_PDQ" ;
id           30
name         AUS_PDQ
task_name    Auto Update Statistics Refresh
value        10
value_type   NUMERIC
description  Update statistics executes with this PDQ priority.
AUS Parameters
Parameter          Task                Description
AUS_AGE            aus_evaluator       The statistics are rebuilt after the specified number of days.
AUS_CHANGE         aus_evaluator       The statistics are rebuilt after the specified percentage of data has changed.
AUS_AUTO_RULES     aus_evaluator       1 or 0; if “off”, only evaluates tables that already have statistics.
AUS_SMALL_TABLES   aus_evaluator       Tables containing fewer than this number of rows will always have their statistics rebuilt.
AUS_PDQ            aus_refresh_stats   Run Update Statistics with this PDQ setting.
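One way to read the four evaluator parameters together is sketched below in Python. This is an interpretation of the slide descriptions, not the actual aus_evaluator logic: the function name, the default values, the use of "greater than or equal" comparisons, and the way the rules are combined with OR are all assumptions.

```python
def needs_refresh(has_stats, nrows, age_days, pct_changed,
                  aus_age=30, aus_change=10,
                  aus_small_tables=100, aus_auto_rules=1):
    """Decide whether a table's statistics would be rebuilt (illustrative only)."""
    if not has_stats:
        # With AUS_AUTO_RULES off, tables without statistics are not evaluated.
        return aus_auto_rules == 1
    if nrows < aus_small_tables:
        # Small tables always have their statistics rebuilt.
        return True
    # Otherwise rebuild when the statistics are old enough or enough data changed.
    return age_days >= aus_age or pct_changed >= aus_change

print(needs_refresh(True, 1000, 5, 2))    # recent stats, little change
print(needs_refresh(True, 1000, 31, 2))   # stats older than AUS_AGE
print(needs_refresh(True, 50, 0, 0))      # small table
```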
11.70 Features
• Smart Statistics
  – Default: AUTO_STAT_MODE 1
  – Default: STATCHANGE 10
  – Update Statistics command, when run, is not executed for index statistics and for table distribution if the STATCHANGE threshold has not been met
• Fragment-level Statistics
  – Not on by default
  – Not discussed in this presentation
11.70 Statistics Updated?
• Update Statistics info in database catalog tables
  – Look at ustlowts in systables
    • Updated when systables' nrows and npused are updated; this is done whenever update statistics command is run (STATCHANGE threshold is not looked at)
  – Look at ustlowts in sysindices
    • Updated when index statistics are rebuilt/updated
  – Look at constr_time in sysdistrib
    • Updated when distribution statistics are rebuilt/updated
Example
$ dbaccessdemo7 stores7 -nots
> select idxname, levels, leaves, nrows, nupdates, ndeletes, ninserts, ustlowts
> from sysindices
> where tabid = 100 and idxname = "zip_ix";
idxname   zip_ix
levels    1
leaves    1.000000000000
nrows     28.00000000000
nupdates  0.00
ndeletes  0.00
ninserts  28.00000000000
ustlowts  2012-04-03 22:54:56.00000
zip_ix is the index on customer(zipcode).
nupdates, ndeletes, ninserts: UDI counters for this index at the time of the update statistics low run.
> select * from sysdistrib where tabid = 100;
No rows found.
dbaccessdemo7 did not create table distributions for customer table.
Example (cont’d)
> load from customer.unl insert into customer;
199863 row(s) loaded.
> select idxname, levels, leaves, nrows, nupdates, ndeletes, ninserts,
> ustlowts from sysindices
> where tabid = 100 and idxname = "zip_ix";
idxname   zip_ix
levels    1
leaves    1.000000000000
nrows     28.00000000000
nupdates  0.00
ndeletes  0.00
ninserts  28.00000000000
ustlowts  2012-04-03 22:54:56.00000
Index statistics for zip_ix unchanged after 199,863 rows inserted into the customer table.
No update statistics command has been run.
Example (cont’d)
After inserting 199,863 rows into the customer table, create index state_ix on customer(state). No update statistics command has been run.
idxname   zip_ix
levels    1
leaves    1.000000000000
nrows     28.00000000000
nupdates  0.00
ndeletes  0.00
ninserts  28.00000000000
ustlowts  2012-04-03 22:54:56.00000
> create index state_ix on customer(state);
idxname   state_ix
levels    3
leaves    556.0000000000
nrows
nupdates  0.00
ndeletes  0.00
ninserts  0.00
ustlowts  2012-04-03 23:04:33.00000
Example (cont’d)
> select tabid, colno, mode, smplsize, rowssmpld, constr_time,
> ustnrows, ustbuildduration, nupdates, ndeletes, ninserts
> from sysdistrib where tabid = 100;
tabid             100
colno             8
mode              H
smplsize          199891.0000000
rowssmpld         199891.0000000
constr_time       2012-04-03 23:04:33.00000
ustnrows          199891.0000000
ustbuildduration  0:00:00.00000
nupdates          0.00
ndeletes          0.00
ninserts          199891.0000000
Distribution information for column state (colno 8) in customer table
Example (cont’d)
> select partnum, nupdates, ndeletes, ninserts from sysmaster:sysptnhdr
> where partnum in (select partn from sysfragments
> where fragtype = "I" and indexname in ('state_ix', 'zip_ix'));
            partnum   nupdates  ndeletes  ninserts
zip_ix      1049092   0         0         199891
state_ix    1049100   0         0         0
> select partnum, nupdates, ndeletes, ninserts from sysmaster:sysptnhdr
> where partnum = (select partnum from systables where tabid = 100);
            partnum   nupdates  ndeletes  ninserts
customer    1049069   0         0         199891
Actual partition page info, showing the UDI counters for the partition since the partition was created. This is not the same as the UDI info in the catalogs, which is updated when statistics are updated.
OAT view of Statistics
OAT view (cont’d)
For customer table:
• Index zip_ix has exceeded STATCHANGE.
• Index state_ix has not.
Example (cont’d)
zip_ix index
BEFORE
idxname   zip_ix
levels    1
leaves    1.000000000000
nrows     28.00000000000
nupdates  0.00
ndeletes  0.00
ninserts  28.00000000000
ustlowts  2012-04-03 22:54:56.00000
> update statistics low for table customer;
AFTER
idxname   zip_ix
levels    3
leaves    505.0000000000
nrows     199891.0000000
nupdates  0.00
ndeletes  0.00
ninserts  199891.0000000
ustlowts  2012-04-04 00:36:53.00000
• Index statistics updated.
• Catalog UDI values updated.
• sysindices ustlowts updated.
Example (cont’d)
state_ix index
BEFORE
idxname   state_ix
levels    3
leaves    556.0000000000
nrows
nupdates  0.00
ndeletes  0.00
ninserts  0.00
ustlowts  2012-04-03 23:04:33.00000
> update statistics low for table customer;
AFTER
idxname   state_ix
levels    3
leaves    556.0000000000
nrows     199891.0000000
nupdates  0.00
ndeletes  0.00
ninserts  0.00
ustlowts  2012-04-03 23:04:33.00000
• Index statistics unchanged.
• Catalog UDI values unchanged.
• sysindices ustlowts unchanged.
Example (cont’d)
> select tabname, tabid, nrows, created, ustlowts
> from systables where tabid = 100;
tabname   customer
tabid     100
nrows     199891.0000000
created   04/03/2012
ustlowts  2012-04-04 00:36:53.00000
The systables information is always updated when update statistics for table is run, regardless of STATCHANGE.
Example
Update Statistics LOW for table tab1;
Update Statistics HIGH for table tab1 (col1, col2);
• Before 11.70
  – You should put “Distributions Only” in the Update Statistics HIGH command to avoid collecting index statistics again
• After 11.70
  – Doesn’t matter since index statistics will only be updated if STATCHANGE has been met for the index
Sysmaster query for %change
SELECT colname as name, 'Column' as type,
  constr_time::datetime year to second as build_date,
  rowssmpld::bigint as sample, d.ustnrows::bigint as nrows,
  case when d.mode = 'M' then 'Medium' when d.mode = 'H' then 'High' end as mode,
  resolution, confidence, ustbuildduration as build_duration,
  (table_counter.udi_counter - d.ninserts - d.nupdates - d.ndeletes) as udi_counter,
  CASE
    WHEN d.ustnrows=0 and (table_counter.udi_counter - d.ninserts - d.nupdates - d.ndeletes) = 0 THEN 0.00
    WHEN d.ustnrows=0 and (table_counter.udi_counter - d.ninserts - d.nupdates - d.ndeletes) != 0 THEN -1
    ELSE ROUND((table_counter.udi_counter - d.ninserts - d.nupdates - d.ndeletes)/d.ustnrows * 100,2)
  END as change
FROM sysdistrib d, syscolumns c,
  ( select SUM(nupdates + ndeletes + ninserts) as udi_counter
    from sysmaster:sysptnhdr
    where partnum in (select partn from sysfragments where tabid = 100 and fragtype='T'
                      union
                      select partnum as partn from systables where tabid = 100) ) as table_counter
WHERE d.tabid=100 and c.tabid=100 and d.colno = c.colno and d.seqno = 1
UNION
SELECT idxname as name, MIN('Index') as type,
  MIN(ustlowts)::datetime year to second as build_date,
  MIN(0) as sample, SUM(f.nrows)::bigint as nrows, MIN('Low') as mode,
  MIN(0) as resolution, MIN(0) as confidence,
  SUM(i.ustbuildduration) as build_duration,
  SUM(NVL(p.ninserts,0) + NVL(p.nupdates,0) + NVL(p.ndeletes,0))
    - SUM(NVL(f.ninserts,0) + NVL(f.nupdates,0) + NVL(f.ndeletes,0)) as udi_counter,
  CASE
    WHEN SUM(f.nrows)=0 and (SUM(NVL(p.ninserts,0) + NVL(p.nupdates,0) + NVL(p.ndeletes,0))
         - SUM(NVL(f.ninserts,0) + NVL(f.nupdates,0) + NVL(f.ndeletes,0))) = 0 THEN 0.00
    WHEN SUM(f.nrows)=0 and (SUM(NVL(p.ninserts,0) + NVL(p.nupdates,0) + NVL(p.ndeletes,0))
         - SUM(NVL(f.ninserts,0) + NVL(f.nupdates,0) + NVL(f.ndeletes,0))) != 0 THEN -1
    ELSE ROUND((SUM(NVL(p.ninserts,0) + NVL(p.nupdates,0) + NVL(p.ndeletes,0))
         - SUM(NVL(f.ninserts,0) + NVL(f.nupdates,0) + NVL(f.ndeletes,0)))/SUM(f.nrows) * 100,2)
  END as change
FROM sysindices i, sysmaster:sysptnhdr p, sysfragments f
WHERE i.idxname = f.indexname
  AND i.tabid = 100 AND i.tabid = f.tabid AND f.partn = p.partnum
GROUP BY i.idxname
ORDER BY change DESC
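The CASE expression in the query reduces to a small function: take the live UDI counter from the partition page, subtract the UDI counter saved in the catalog when statistics were last built, and express the difference as a percentage of the row count recorded at that time. The Python sketch below mirrors that logic; the function and argument names are hypothetical, and the sample call uses the zip_ix counters from the earlier example (28 recorded in the catalog, 199891 on the partition) on the assumption that the catalog row count was also 28.

```python
def pct_change(current_udi, udi_at_stats, nrows_at_stats):
    """Percent of rows changed since statistics were built.

    Returns 0.0 when nothing changed on an empty table and -1 when an
    empty table has since received changes, as in the query's CASE.
    """
    delta = current_udi - udi_at_stats
    if nrows_at_stats == 0:
        return 0.0 if delta == 0 else -1
    return round(delta / nrows_at_stats * 100, 2)

zip_ix_change = pct_change(199891, 28, 28)
print(zip_ix_change)
print(pct_change(0, 0, 0), pct_change(5, 0, 0))
```

The -1 result is a sentinel for "changed, but no baseline row count", so a caller comparing against STATCHANGE has to treat it as a special case rather than as a small percentage.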
Table STATCHANGE value
• Default STATCHANGE applies if not set for table
• Can be set at session level using set environment
  – Set environment statchange '5';
• Can set STATCHANGE when creating table
• Can alter table to set STATCHANGE
  – Alter table customer statchange 5;
select tabname, NVL ( statchange, (select cf_effective from sysmaster:sysconfig where cf_name = 'STATCHANGE') ) as statchange from systables where tabname = "customer";
FORCE option
• Can add “FORCE” to any update statistics command to ignore STATCHANGE
• When you upgrade to 11.70
  – Existing partition pages will have UDI counters added (UDI values are 0)
  – Catalog tables sysfragments (for indexes) and sysdistrib (for table column data distributions) will have UDI counters added (values are 0)
  – What does this mean for Update Statistics?
    • FORCE -> Execute even if NO change
    • STATCHANGE 0 -> Execute if any amount of change (non-zero)
FORCE option (cont’d)
• Add “FORCE” to end of update statistics command to get legacy behavior (ignore STATCHANGE)
• FORCE
  – Execute even if NO change
  – Sets sysdistrib nupdates, ndeletes, ninserts to 0; the same behavior isn’t seen with sysfragments nupdates, ndeletes, ninserts
• STATCHANGE 0
  – Execute if non-zero amount of change
  – Set environment STATCHANGE '0'
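The difference between FORCE and STATCHANGE 0 can be summarized in a few lines of Python. This is a sketch of the behavior as described on these slides, not server code: the function name is hypothetical, and treating the threshold as met when the change is greater than or equal to STATCHANGE is an assumption.

```python
def should_refresh(pct_changed, statchange=10, force=False):
    """Would index or distribution statistics be rebuilt for this object?"""
    if force:
        # FORCE: execute even if there was no change at all.
        return True
    if statchange == 0:
        # STATCHANGE 0: execute only if there is some non-zero change.
        return pct_changed != 0
    # Otherwise rebuild only once the change threshold has been met.
    return pct_changed >= statchange

print(should_refresh(0, force=True))       # FORCE ignores the threshold
print(should_refresh(0, statchange=0))     # nothing changed, nothing rebuilt
print(should_refresh(0.5, statchange=0))   # any change triggers a rebuild
print(should_refresh(4, statchange=10))    # below the default threshold
```

This matters right after an upgrade to 11.70: the new UDI counters start at 0, so STATCHANGE 0 would skip objects that have not changed since the upgrade, while FORCE rebuilds them regardless.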
Stored Procedures
• Not affected by STATCHANGE: Update statistics FOR PROCEDURE
• SQL statements in SPL are optimized
  – When SPL is created or on first execution
  – When dependent table or indexes are altered
  – When statistics of dependent tables change
In 11.70, this means reoptimization happens every time update statistics is run to update a table, because systables’ npused, nrows, and ustlowts are updated (even if index statistics or distribution statistics are not updated due to STATCHANGE not having been met).
Update Statistics Low - Summary
• Update statistics low performance improvement feature takes effect when:
  – USTLOW_SAMPLE is set to 1
  – the index has 100,000 or more leaf pages
  – Detached index
• USTLOW_SAMPLE
  – New ONCONFIG parameter, documented in 11.70.xC4
  – Controls use of sampling (new feature) to collect index statistics during update statistics
  – 0 or 1 (on); default value is 0 (off)
  – Can be updated with onmode -wm/wf
  – Can be set at session level using SET ENVIRONMENT
    • Set Environment USTLOW_SAMPLE '0' / '1' / 'on' / 'off'
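The three conditions for sampling can be expressed as a single predicate. The Python sketch below assumes all three must hold at once, which is how the slide reads; the function and argument names are hypothetical.

```python
def sampling_applies(ustlow_sample, leaf_pages, detached):
    """True when update statistics low would sample the index (illustrative)."""
    return ustlow_sample == 1 and leaf_pages >= 100_000 and detached

print(sampling_applies(1, 250_000, True))   # large detached index, sampling on
print(sampling_applies(1, 505, True))       # index too small to be sampled
print(sampling_applies(0, 250_000, True))   # USTLOW_SAMPLE left at its default
```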
Update Statistics Low: Why?
• Update Statistics LOW takes too long when gathering statistics for large indexes
  – Entire index is read in sequence
  – Each leaf page of an index must be read individually (separate I/O)
  – Some customers do not run the command because it does not fit in the maintenance window
  – On a single large table (billions of rows and many indexes), command can take over 3 days
• New Feature Solution: USTLOW_SAMPLE
  – Use sampling to reduce time required to gather index statistics
  – Many samples are taken, and index statistics are calculated based on statistics from the samples
Update Statistics Low - Details
• Update statistics low gathers the following index statistics
  – number of index levels
  – number of index leaf pages
  – number of unique values for index lead key
  – clustering factor
  – 2nd lowest and 2nd highest value for index lead key
• Index statistics saved in database catalog
  – Sysindices (levels, leaves, nunique, clust)
  – Syscolumns (colmin, colmax)
  – Sysfragments (levels, clust) for fragtype = “I”
• When Update Statistics Med or High is run, index statistics are also collected, unless “Distributions Only” is used
Update Statistics Low - Details (cont’d)
• Instead of reading the entire index in sequence, the new feature:
  – Uses sampling
  – Each sample will go from index root page to index leaf page, reading one or more index leaf pages
  – Sampling is “dynamic”: number of samples is not pre-determined
  – Number of samples is determined by the quality of the samples
    • Fewer samples needed if data is evenly distributed
    • More samples needed if data distribution is skewed
    • Standard deviation among the samples is used as the measure of “quality”
  – Time for update statistics is not predictable up-front
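The idea of dynamic sampling, where the spread of the samples decides when to stop, can be illustrated with a generic stopping rule. The Python sketch below is not the Informix algorithm: the relative standard error criterion, the thresholds, and the sample streams are all assumptions chosen to show why even data needs fewer samples than skewed data.

```python
from itertools import cycle
from math import sqrt
from statistics import mean, stdev

def samples_needed(sample_stream, min_samples=5, max_samples=1000, target=0.1):
    """Consume samples until their relative standard error drops below target.

    Each value stands for one statistic measured by one root-to-leaf probe
    (for example, keys seen on the leaf page). Values must be positive.
    """
    values = []
    for value in sample_stream:
        values.append(value)
        n = len(values)
        if n >= max_samples:
            break  # give up refining; cap the work done
        if n < min_samples:
            continue  # too few samples to judge the spread
        rel_err = stdev(values) / (sqrt(n) * mean(values))
        if rel_err <= target:
            break
    return len(values)

even = samples_needed(cycle([100]))        # every probe sees the same value
skewed = samples_needed(cycle([1, 1000]))  # probes alternate between extremes
print(even, skewed)
```

Because the stopping point depends on the data, the run time cannot be known in advance, which matches the last bullet above.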
Update Statistics Low - Example
• Example based on internal traces
Update Statistics Low - Notes
• Review of Update statistics feature
  – 11.70.xC1 “Smart Statistics” Feature Review
    • Default: AUTO_STAT_MODE 1
    • Default: STATCHANGE 10
    • Update Statistics command, when run, is not executed for index statistics and for table distribution if the STATCHANGE threshold has not been met
  – Update Statistics info in database catalog tables
    • Look at ustlowts in systables: updated when systables' nrows and npused are updated; this is done whenever update statistics command is run (STATCHANGE threshold is not looked at)
    • Look at ustlowts in sysindices: updated when index statistics are rebuilt/updated
    • Look at constr_time in sysdistrib: updated when distribution statistics are rebuilt/updated
• Remember, 11.10 Feature: statistics are collected when index is created
Catalog for smarter Statistics
Legend: 11.70 / Existing
systables          sysfragments
statchange         nupdates
statlevel          ndeletes
ustlowts           ninserts
sysindices         sysdistrib         sysfragdist
nupdates           nupdates           nupdates
ndeletes           ndeletes           ndeletes
ninserts           ninserts           ninserts
ustbuildduration   ustbuildduration   ustbuildduration
ustlowts           constr_time        constr_time
Questions?
Thank you