DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or...

41
DB2 10 Hash Access: Access Path or Collision Course? Donna Di Carlo BMC Software Session Code: B06 Wed, May 16, 2012 9:45 | Platform: DB2 for z/OS - II

Transcript of DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or...

Page 1: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

DB2 10 Hash Access: Access Path or

Collision Course?

Donna Di Carlo

BMC Software

Session Code: B06

Wed, May 16, 2012 9:45 | Platform: DB2 for z/OS - II

Page 2: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Objectives

• An introduction to the new DB2 10 Hash Access

• See how a hashed table and index are physically

organized

• Determine if Hash Access is suitable for your application

• Some rules of thumb on how to define hashed tables and

indexes

• What happens to performance when hashed objects are

defined correctly; and worse, what happens when they

are defined incorrectly

2

Page 3: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

DB2 Access Paths

• DB2 has several different methods to

access data

• DB2 10 has added hash access to its

bag of tricks

• Single row access

• Unique non-updateable key

• Fixed hash space size

3

Page 4: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Hash Organization

PARTITION BY GROWTH

OVERFLOW

HASH AREAOVERFLOW

INDEX

PART 2

OVERFLOW

OVERFLOW

INDEX

PART 1

OVERFLOW

OVERFLOW

HASH AREAOVERFLOW

INDEX

PART 3

PARTITION BY RANGE

OVERFLOW

OVERFLOW

HASH AREA

OVERFLOW

INDEX

DSN1

DSN2

HASH AREA

HASH AREA

4

Page 5: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Implementing Hash Access

• Create a new table with hash organization

• Need to specify hash space size

• Unique key can be comprised of multiple columns, but the length

must be less than 256

• Entire hash area is allocated, plus 1% - 3% for overflow

• Recommendation

• Set ZPARM IMPDSDEF=YES so the overflow index is instantiated

during the CREATE – avoids penalty for first insert into the overflow

5

Page 6: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Implementing Hash Access (cont.)

• Alter existing table to hash organization

• Criteria

• Must be UTS

• Must contain a unique non-updatable key

• Must have Reordered Row Format

• Table size should be relatively static

• Can’t be MEMBER CLUSTER

• Can’t have clustering index

6

Page 7: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Implementing Hash Access (cont.)

• Alter existing table to hash organization (cont.)

• Table space set in AREOR status

• Overflow index is created and set in RBDP status

• Updates and deletes allowed

• Keys cannot be updated

• Inserts allowed after overflow index is rebuilt

• Not a sparse index at this time

• Must REORG table space to instantiate hash access

• REORG cannot be by partition

• REBIND

7

Page 8: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Implementing Hash Access (cont.)

• REORG

• SHRLEVEL NONE is not supported

• AUTOESTSPACE(YES/NO)

• Estimate the hash space size

• Targets about 8% - 15% overflow

• Catalog value HASHSPACE remains unchanged

• Catalog value HASHDATAPAGES updated with new allocation

• FREEPAGE is ignored

• PCTFREE is ignored for AUTOESTSPACE(NO)

• PCTFREE has a new meaning for AUTOESTSPACE(YES)

• Determines target hash space, then increases it by PCTFREE

• PCTFREE is ignored for the overflow area

8

Page 9: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Hash Organization (cont.)

• REORG (cont.)

• Recommendation

• Even though AUTOESTSPACE(YES)

makes sizing the hash area easy, it’s

not fool proof

• Overflow estimate is less accurate for

varying length rows

• If DSSIZE is not large enough to

accommodate the hash area and

overflow, SQL and utilities can fail for

a PBR

• It’s expensive for overflows when the

hash and overflow are in different data

sets of a PBG

9

Page 10: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Implementing Hash Access (cont.)

• LOAD

• Does not sort input

• Hashes each row

• FREEPAGE and PCTFREE are ignored

• SLOWWWWWWWW

• Recommendation

• Create non-hashed table

• LOAD

• ALTER ADD HASHED

• REORG

• Instead of creating a new table, you can drop hash

organization on an existing table, but…UH OH!

10

Page 11: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Implementing Hash Access (cont.)

• ALTER TABLE DROP HASH

• Overflow index is immediately dropped

• Table space is placed in REORP status – must REORG

• Cannot recover to a point prior to the DROP

• Recommendation

• When defining the unique key think carefully - changing it requires a

DROP

• If you drop hash altogether, consider adding an index

11

Page 12: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Birds Eye View of Index Access - SELECT

12

Page 13: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Bird’s Eye View of Hashing - SELECT

13

HASH ALGORITHM(KEY)HASH ALGORITHM(KEY)

Page 14: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Bird’s Eye View of Hashing – SELECT (cont.)

• Recommendation

• Tables with multi-level indexes are good candidates for Hash

Access

• Keep the overflow index small – it will increase the odds of finding

the index pages in the buffer pool

14

Page 15: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

A Bird’s Eye View of Hashing - INSERT

• Unique key and hash algorithm determine page number

and page location

• If another row is at this location, row is added to the

collision chain, also known as the Anchor ID Map

• There can be several collisions on a single chain

• An overflow does not occur until the page is full

• The row is stored in the overflow area and the overflow index is

updated

15

Page 16: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

A Bird’s Eye View of Hashing – INSERT (cont.)

16

00

00

01

06

05

04

03

02

06

03

05 01

02

03

00

A

N

C

H

O

R

I

D

M

A

P

00

00

04

01

06

05

04

03

02

06

03

05 01

02

03

00

A

N

C

H

O

R

I

D

M

A

P

Page 17: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

A Bird’s Eye View of Hashing – INSERT (cont.)

• Recommendation

• Tables with high insert activity may not be good candidates for

Hash Access

17

Page 18: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Bird’s Eye View of Hashing - UPDATE

• Unique key and hash algorithm determines page number

and page location

• If row does not exist on page, access the overflow index

• If row does exist on page

• If row length decreased or stayed the same, update in place

• If row length increased, move row to another place on the page

• If row length increased, but no more room on page

• Replace row with a pointer record

• Move indirect reference to overflow

• No need to index indirect reference

18

Page 19: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Bird’s Eye View of Hashing – UPDATE (cont.)

• Recommendation

• Tables with high update activity and varying length rows may not

be good candidates for Hash Activity

• VARCHAR, VARGRAPHIC…

• Compression

19

Page 20: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Bird’s Eye View of Hashing – Range Predicate

20

LIST PREFETCH

Page 21: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Bird’s Eye View of Hashed Data – Range Predicate (cont.)

• Recommendation

• Tables that have queries with range predicates are poor choices

for hashed tables

• What used to be clustered will now be random

• If you must use a query with a range predicate, consider creating

an index to access the data

• You may be able to force index access, but the query may still require

a RID sort

21

Page 22: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Hash Organization

• Header page

• HPGHASHMOD – number of pages in the

Hash Area

• Rounded down to the nearest prime number

• HPGHASHFIXIDMAP – number of hash

Anchor ID Map entries

• More on that later

• HPGLASTHASHPG_P - page number of

last page in hash area

22

HEADER

SPACEMAP

DICTIONARY (reserved)

SYSTEM PAGES (reserved)

HASH AREA

OVERFLOW

Page 23: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Hash Organization (cont.)

• Space Map

• Hashed inserts do not have to use the

space map to determine if a page has room

• Hash Area segment entries are not chained

- SEGNEXT always zero

• Free space bit indicators are still maintained

• During creation, all space maps are

formatted for the Hash Area

• There will always be a dictionary and

system page segments, even if the space is

not compressed or versioned

• If more system pages are needed, extra

segments will be allocated in the overflow

23

HEADER

SPACEMAP

DICTIONARY (reserved)

SYSTEM PAGES (reserved)

HASH AREA

OVERFLOW

Page 24: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Hash Organization (cont.)

• Space Map (cont.)

• Recommendation

• If you are concerned with wasted space, use a small segment size.

24

Page 25: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Data/Overflow Page

• The first x14 bytes

contain header

information

• Page number, free space

management, number of

rows

• The RID map is at the

bottom of the page

• Each RID position

contains an offset to the

row in the page

25

x00

x10

DATA PAGE HEADER

DATA ROWS

0107 06 05 04 03 02

RID MAPRID MAP

Page 26: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Hash Home Page

• PGCOMB contains hash

home page bit

• Hash header

• PGAPBEGIN - pointer to

beginning of data rows

• Anchor ID Map contains

collision chains

• The number of anchors,

HPGHASHFIXIDMAP, is

stored in the header page

• The number of IDs vary

from table to table based on

page and row size

26

40

x00

x30

x20

x10

DATA PAGE HEADER

HASH HEADER

ANCHOR

ID MAP

DATA ROWS

0107 06 05 04 03 02

RID MAP

x40

Page 27: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Non-Hash Data Row • RID Map

• Contains offset of row location

• Can have up to 255 RID

values on one page

• Row header

• PGSFLAGS

• Pointer

• Indirection

• Compressed

• Versioned

• PGSLTH – row length

• PGSOBD - OBID of table

• PGSID - RID map ID or

version number

27

ROW DATA

P

G

S

F

L

A

G

S

P

G

S

L

T

H

P

G

S

O

B

D

P

G

S

I

D

0107 06 05 04 03 02

00B50118017B0052024102A401DE

017B 0063 0003 0300

Page 28: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Hash Row • PGSOBD is not needed for UTS

• One table per table space

• Hash home pages reuse this value to point to the hash anchor ID

map or the next row in the collision chain

• The first bit in the second byte indicates if this is the last entry in

the chain

28

ROW DATA

0107 06 05 04 03 02

00B50118017B0052024102A401DE

+017B 0063 0680 0300

01 08070605040302

03000000 000000000000000000000000

Page 29: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Hash Algorithm

• Key is passed to the hash algorithm

• Returns a double word integer - HASHEDINT

• Page Number = HASHEDINT % HPGHASHMOD

• Anchor ID = HASHEDINT % HPGHASHFIXIDMAP

• If a row already exists at this anchor, the new row is chained

29

Page 30: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Accessing a Hashed Row

0107 06 05 04 03 02

00B50118017B0052024102A401DE

03

01 08070605040302

04000000 000000000000000000000000

ROW DATA+017B 0063 0680 0300

ROW DATA+0052 0063 0300 0400

Anchor ID

Map

RID Map

30

Page 31: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Anchor ID Map and Overflow

• Second byte of anchor ID is updated when overflow

occurs

• It doesn’t necessarily mean there was a collision

31

01 08070605040302

04010000 000000000000000000010000Anchor ID

Map

Page 32: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Real Time Statistics (RTS)

• SYSTABLESPACESTATS.DATASIZE

• Total number of data bytes in the table

• SYSTABLESPACESTATS.TOTALROWS

• Total number of rows in the table

• SYSINDEXSPACESTATS .TOTALENTRIES

• Total number of overflow rows

32

Page 33: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Real Time Statistics (RTS)

• SYSTABLESPACESTATS. HASHLASTUSED

• Last time the table was accessed by hash

• SYSTABLESPACESTATS. REORGHASHACCESS

• Number of hash accesses

• SYSTABLESPACESTATS. REORGSCANACCESS

• Number of table space scans

33

Page 34: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

RUNSTATS

• No new statistics collected

• Columns for sparse index not valid (contains -2)

• SYSIBM.SYSINDEXPART

• LEAFDIST

• FAROFFPOSF

• NEAROFFPOSF

• SYSIBM.SYSCOLUMNS

• COLCARDF

• HIGH2KEY

• SYSIBM.INDEXES

• CLUSTERRATIOF

• LEAFNEAR

• LEAFFAR

34

• LOW2KEY

• LEAFDIST

Page 35: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Explain & Plan Table

• ACCESSTYPE

• H – hash access

• HN – hash access using an IN predicate

• Some SQL may be converted by DB2 to IN predicate

• MH – multiple index scan using the hash overflow index

35

Page 36: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Sizing Your Hash Area

• Delicate balance between storage utilization and

GETPAGE activity

• Slow response and high CPU if more than one GETPAGE is

needed

• How much are you willing to pay for storage to get excellent

response?

• All of this can be measured!

36

Page 37: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Hash Storage Efficiency

37

0%

10%

20%

30%

40%

50%

60%

% Overflow% Unused Hash

Page 38: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Hash Access GETPAGE Counts

38

0

10,000

20,000

30,000

40,000

50,000

60,000

70,000

80,000

Index GETPAGE Count

Table GETPAGE Count

Page 39: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Hash vs. Non-hash Access GETPAGE Counts

• Range predicates on a hashed table can be disastrous

• SELECT *

FROM table

WHERE COL1_CHAR8 LIKE 'ABC%';

• Non-hash table space with clustering unique key

• 3 table GETPAGE requests

• 2 index GETPAGE requests

• Hash table space

• 349,114 table GETPAGE requests

• Added index on hash table space with COL1_CHAR8

• 42 table GETPAGE requests

• 7 index GETPAGE requests

39

Page 40: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Summary

• Choose hash candidates carefully

• Tables with mostly random access

• Tables with multi-level unique indexes

• Tables with low insert activity

• Tables with update activity should have static length rows

• Monitor to ensure continued efficiency

• As overflow area grows, so does GETPAGE counts

• RTS TOTALENTRIES

• Use REORG AUTOESTSPACE(YES)

• Beware of table space scans

• RTS REORGSCANACCESS

40

Page 41: DB2 10 Hash Access: Access Path or Collision Course? · DB2 10 Hash Access: Access Path or Collision ... • Tables with high insert activity may not be good ... mean there was a

Donna Di Carlo

BMC Software

[email protected] Session

DB2 10 Hash Access: Access Path or Collision Course?