OpenVMS Distributed Lock Manager Performance


Page 1: OpenVMS Distributed Lock Manager Performance

OpenVMS Distributed Lock Manager Performance, Session ES-09-U

Keith Parris, HPQ

Page 2: OpenVMS Distributed Lock Manager Performance

Background

VMS system managers have traditionally looked at performance in 3 areas: CPU, Memory, and I/O

But in VMS clusters, what may appear to be an I/O bottleneck can actually be a lock-related issue

Page 3: OpenVMS Distributed Lock Manager Performance

Overview

VMS keeps some lock activity data that no existing performance management tools look at

Locking statistics and lock-related symptoms can provide valuable clues in detecting disk, adapter, or interconnect saturation problems

Page 4: OpenVMS Distributed Lock Manager Performance

Overview

The VMS Lock Manager does an excellent job under a wide variety of conditions of optimizing locking activity and minimizing overhead, but:
In clusters with identical nodes running the same applications, remastering can sometimes happen too often
In extremely large clusters, nodes can “gang up” on lock master nodes and overload them
Locking activity can contribute to:
CPU 0 saturation in Interrupt State
Spinlock contention (Multi-Processor Synchronization time)

We’ll look at methods of detecting, and solutions to, these types of problems

Page 5: OpenVMS Distributed Lock Manager Performance

Topics

Available monitoring tools for the Lock Manager

How to map VMS symbolic lock resource names to real physical entities

Lock request latencies
How to measure lock rates

Page 6: OpenVMS Distributed Lock Manager Performance

Topics

Lock mastership, and why one might care about it

Dynamic lock remastering
How to detect and prevent lock mastership thrashing
How to find the lock master node for a given resource tree
How to force lock mastership of a given resource tree to a specific node

Page 7: OpenVMS Distributed Lock Manager Performance

Topics

Lock queues, their causes, and how to detect them

Examples of problem locking scenarios
How to measure pent-up remastering demand

Page 8: OpenVMS Distributed Lock Manager Performance

Monitoring tools

MONITOR utility:
MONITOR LOCK
MONITOR DLOCK
MONITOR RLOCK (in VMS 7.3 and above; not 7.2-2)
MONITOR CLUSTER
MONITOR SCS

SHOW CLUSTER /CONTINUOUS
DECamds / Availability Manager
DECps (Computer Associates’ Unicenter Performance Management for OpenVMS, earlier Advise/IT)

Page 9: OpenVMS Distributed Lock Manager Performance

Monitoring tools

ANALYZE/SYSTEM
New SHOW LOCK qualifiers for VMS 7.2 and above:
/WAITING displays only the waiting lock requests (those blocked by other locks)
/SUMMARY displays summary data and performance counters

New SHOW RESOURCE qualifier for VMS 7.2 and above:
/CONTENTION displays resources which are under contention

Page 10: OpenVMS Distributed Lock Manager Performance

Monitoring tools

ANALYZE/SYSTEM
New SDA extension LCK for lock tracing in VMS 7.2-2 and above:

SDA> LCK              ! Shows help text with command summary

Can display various additional lock manager statistics:

SDA> LCK STATISTIC    ! Shows lock manager statistics

Can show the busiest resource trees by lock activity rate:

SDA> LCK SHOW ACTIVE  ! Shows lock activity

Can trace lock requests:

SDA> LCK LOAD         ! Load the debug execlet
SDA> LCK START TRACE  ! Start tracing lock requests
SDA> LCK STOP TRACE   ! Stop tracing
SDA> LCK SHOW TRACE   ! Display contents of trace buffer

Can even trigger remaster operations:

SDA> LCK REMASTER     ! Trigger a remaster operation

Page 11: OpenVMS Distributed Lock Manager Performance

Mapping symbolic lock resource names to real entities

Techniques for mapping resource names to lock types
Common prefixes:
SYS$ for the VMS executive
F11B$ for the XQP (file system)
RMS$ for Record Management Services
See Appendix H in the Alpha V1.5 IDSM, or Appendix A in the Alpha V7.0 version

Page 12: OpenVMS Distributed Lock Manager Performance

Resource names

Example: XQP File Serialization Lock
Resource name format is “F11B$s” {Lock Basis}
Parent lock is the Volume Allocation Lock: “F11B$v” {Lock Volume Name}
Calculate the File ID from the Lock Basis
The Lock Basis is the RVN and File Number from the File ID (ignoring the Sequence Number), packed into 1 longword
Identify the disk volume from the parent resource name
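
Below is a minimal sketch in C of this decoding (it is not one of the presentation’s example programs). Assumptions: the file number, including the NMX extension byte, sits in the low 24 bits of the Lock Basis with the RVN in the high byte, using the little-endian byte order visible in the deck’s hex dumps; verify against the IDSM appendix cited above. The sample bytes correspond to the [328,*,0] example that appears later in the deck.

/* Sketch: decode an XQP File Serialization Lock resource name,
 * "F11B$s" followed by a longword Lock Basis. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Resource name bytes as displayed by SDA or a lock-activity report */
    unsigned char resnam[] = { 'F','1','1','B','$','s', 0x48, 0x01, 0x00, 0x00 };

    if (memcmp(resnam, "F11B$s", 6) != 0) return 1;

    unsigned int basis = resnam[6] | (resnam[7] << 8) |
                         (resnam[8] << 16) | ((unsigned)resnam[9] << 24);
    unsigned int file_number = basis & 0x00FFFFFF;   /* FID_NUM plus NMX byte */
    unsigned int rvn         = basis >> 24;          /* Relative Volume Number */

    /* The sequence number is not in the lock basis; it must come from INDEXF.SYS */
    printf("File ID [%u,*,%u] on the volume named in the parent F11B$v lock\n",
           file_number, rvn);
    return 0;
}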

Page 13: OpenVMS Distributed Lock Manager Performance

Resource names

Identifying the file from the File ID
Look at file headers in the Index File to get the filespec:
Can use the DUMP utility to display the file header (from the Index File):

$ DUMP /HEADER /IDENTIFIER=(file_id) /BLOCK=COUNT=0 disk:[000000]INDEXF.SYS

Follow directory backlinks to determine the directory path
See example procedure FILE_ID_TO_NAME.COM
(or use the LIB$FID_TO_NAME routine to do all this, if the sequence number can be obtained)

Page 14: OpenVMS Distributed Lock Manager Performance

Resource names

Example: RMS lock tree for an RMS indexed file
Resource name format is “RMS$” {File ID} {Flags byte} {Lock Volume Name}
Identify the filespec using the File ID
The flags byte indicates a shared or private disk mount
Pick up the disk volume name (this is the label as of the time the disk was mounted)
Sub-locks are used for buckets and records within the file
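
A minimal C sketch of parsing an RMS$ root resource name (an illustration, not the LCKACT/FILE_ID_TO_NAME tools referenced in this deck). It assumes the byte layout visible in the Lock Activity examples later in the deck: file number and sequence number as little-endian words, then RVN, NMX, the flags byte, and the volume label.

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Raw resource name bytes, as displayed by SDA or a lock-activity report */
    unsigned char r[] = { 'R','M','S','$',
                          0x46,0x00,      /* file number 70          */
                          0x71,0x4C,      /* sequence number 19569   */
                          0x00,           /* relative volume number  */
                          0x00,           /* file number extension (NMX) */
                          0x02,           /* flags byte: shared/private mount */
                          'S','S','1',' ',' ',' ',' ',' ',' ',' ',' ',' ' };

    if (memcmp(r, "RMS$", 4) != 0) return 1;

    unsigned num = r[4] | (r[5] << 8);
    unsigned seq = r[6] | (r[7] << 8);
    unsigned rvn = r[8], nmx = r[9], flags = r[10];

    printf("RMS lock tree for file [%u,%u,%u] on volume %.12s (flags %02X)\n",
           num + (nmx << 16), seq, rvn, (char *)&r[11], flags);
    return 0;
}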

Page 15: OpenVMS Distributed Lock Manager Performance

Internal Structure of an RMS Indexed File

{Diagram: the Root Index Bucket at the top of the structure, with Level 1 Index Buckets beneath it, Level 2 Index Buckets beneath those, and Data Buckets at the bottom.}

Page 16: OpenVMS Distributed Lock Manager Performance

RMS Data Bucket Contents

{Diagram: a Data Bucket containing multiple Data Records.}

Page 17: OpenVMS Distributed Lock Manager Performance

RMS Indexed File: Bucket and Record Locks

These are sub-locks of the RMS File Lock; you have to look at the parent lock to identify the file
Bucket lock: 4 bytes: VBN of the first block of the bucket
Record lock: 8 bytes (6 on VAX): Record File Address (RFA) of the record
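
A companion sketch for decoding the sub-lock names described above. Assumptions: little-endian byte order, and on Alpha the 6-byte RFA (VBN longword plus record ID word) padded out to 8 bytes; the sample values are hypothetical.

#include <stdio.h>

/* Bucket lock: 4 bytes = VBN of the first block of the bucket */
static unsigned bucket_vbn(const unsigned char n[4])
{
    return n[0] | (n[1] << 8) | (n[2] << 16) | ((unsigned)n[3] << 24);
}

/* Record lock: Record File Address = VBN (longword) + record ID (word) */
static void record_rfa(const unsigned char n[8], unsigned *vbn, unsigned *id)
{
    *vbn = n[0] | (n[1] << 8) | (n[2] << 16) | ((unsigned)n[3] << 24);
    *id  = n[4] | (n[5] << 8);
}

int main(void)
{
    unsigned char bkt[4] = { 0x19, 0x00, 0x00, 0x00 };               /* hypothetical */
    unsigned char rec[8] = { 0x19, 0x00, 0x00, 0x00, 0x03, 0x00, 0, 0 };
    unsigned vbn, id;

    printf("Bucket lock: VBN %u\n", bucket_vbn(bkt));
    record_rfa(rec, &vbn, &id);
    printf("Record lock: RFA (%u,%u)\n", vbn, id);
    return 0;
}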

Page 18: OpenVMS Distributed Lock Manager Performance

Locks and File I/O

Lock requests and data transfers for a typical RMS indexed file I/O (prior to 7.2-1H1):
1) Lock & get the root index bucket
2) Lock & get index buckets for any additional index levels
3) Lock & get the data bucket containing the record
4) Lock the record
5) For writes: write the data bucket containing the record
Note: Most data reads may be avoided thanks to the RMS global buffer cache

Page 19: OpenVMS Distributed Lock Manager Performance

Locks and File I/O

Since all indexed I/Os access Root Index Bucket, contention on lock for Root Index Bucket of hot file can be a bottleneck

Lookup by Record File Address (RFA) avoids index lookup on 2nd and subsequent accesses to a record

Page 20: OpenVMS Distributed Lock Manager Performance

Lock Request Latencies

Latency depends on several things:
Directory lookup needed or not
Local or remote directory node
$ENQ or $DEQ operation
Local or remote lock master
If remote, the type of interconnect

Page 21: OpenVMS Distributed Lock Manager Performance

Directory Lookups

This is how VMS finds out which node is the lock master
Only needed for the 1st lock request on a particular resource tree on a given node; the Resource Block (RSB) remembers the master node’s CSID
Basic conceptual algorithm: hash the resource name and index into the lock directory vector, which has been created based on LOCKDIRWT values
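
The directory-vector idea can be illustrated with a toy model in C. This is conceptual only: the hash function, vector sizing, and node names below are invented for illustration and do not reflect VMS’s actual algorithm.

#include <stdio.h>
#include <string.h>

#define MAX_SLOTS 64

struct node { const char *name; int lockdirwt; };

int main(void)
{
    /* Hypothetical cluster: a LOCKDIRWT of 0 keeps a node out of the vector */
    struct node nodes[] = { {"XYZB12", 2}, {"XYZB13", 1}, {"XYZB14", 0} };
    const char *vector[MAX_SLOTS];
    int slots = 0;

    /* Nodes appear in the directory vector in proportion to their LOCKDIRWT */
    for (unsigned i = 0; i < sizeof nodes / sizeof nodes[0]; i++)
        for (int w = 0; w < nodes[i].lockdirwt && slots < MAX_SLOTS; w++)
            vector[slots++] = nodes[i].name;

    /* Toy hash of the resource name (a stand-in for the real algorithm) */
    const char *resnam = "RMS$...example...";
    unsigned hash = 5381;
    for (const char *p = resnam; *p; p++)
        hash = hash * 33 + (unsigned char)*p;

    printf("Directory node for %s: %s\n", resnam, vector[hash % slots]);
    return 0;
}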

Page 22: OpenVMS Distributed Lock Manager Performance

Lock Request Latencies

Local requests are fastest
Remote requests are significantly slower:
The code path is ~20 times longer
The interconnect also contributes latency
Total latency can be up to 2 orders of magnitude higher than for local requests

Page 23: OpenVMS Distributed Lock Manager Performance

Lock Request Latency

Client process on the same node: 4-6 microseconds

{Diagram: client process and lock master on the same node.}

Page 24: OpenVMS Distributed Lock Manager Performance

Lock Request Latency

Client across a CI star coupler: 440 microseconds

{Diagram: client node and lock master node connected through a CI star coupler, with shared storage.}

Page 25: OpenVMS Distributed Lock Manager Performance

Lock Request Latencies

{Bar chart: lock request latency by interconnect, in microseconds}

Local node            4
Galaxy SMCI          94
MC 2                120
Gigabit Ethernet    230
FDDI GS-FDDI-GS     270
FDDI GS-ATM-GS      285
DSSI                333
CI                  440

Page 26: OpenVMS Distributed Lock Manager Performance

How to measure lock rates

VMS keeps counters of lock activity for each resource tree, but not for each of the sub-resources
So you can see the lock rate for an RMS indexed file, for example, but not for individual buckets or records within that file
The SDA extension LCK can trace all lock requests if needed

Page 27: OpenVMS Distributed Lock Manager Performance

Identifying busiest lock trees in the cluster with a program

Measure lock rates based on RSB data:
Follow the chain of root RSBs from the LCK$GQ_RRSFL listhead via RSB$Q_RRSFL links
Root RSBs contain counters:
RSB$W_OACT: Old activity field (average lock rate per 8-second interval); divide by 8 to get the per-second average
RSB$W_NACT: New activity (locks so far within the current 8-second interval); a transient value, so not as useful
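
As a trivial worked example of the OACT conversion (the counter value below is hypothetical, chosen to produce a rate like those in the Lock Activity examples that follow):

#include <stdio.h>

int main(void)
{
    unsigned short oact = 51640;                 /* hypothetical RSB$W_OACT value */
    printf("~%u lock requests per second\n", oact / 8);   /* prints ~6455 */
    return 0;
}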

Page 28: OpenVMS Distributed Lock Manager Performance

Identifying busiest lock trees in the cluster with a program

Look for non-zero OACT values:
Gather the resource name, master node CSID, and old-activity field
Do this on each node
Summarize the data across the cluster
See example procedure LOCK_ACTV.COM and program LCKACT.MAR
Or, for VMS 7.2-2 and above:
SDA> LCK SHOW ACTIVE
Note: Per-node data, not a cluster-wide summary

Page 29: OpenVMS Distributed Lock Manager Performance

Lock Activity Program Example

0000002020202020202020203153530200004C71004624534D52 RMS$F.qL...SS1 ...
RMS lock tree for file [70,19569,0] on volume SS1
File specification: DISK$SS1:[DATA8]PDATA.IDX;1
Total: 11523
*XYZB12  6455
XYZB11    746
XYZB14    611
XYZB15    602
XYZB23    564
XYZB13    540
XYZB19    532
XYZB16    523
XYZB20    415
XYZB22    284
XYZB18    127
XYZB21    125

* Lock Master Node for the resource

{This is a fairly hot file. Here the lock master node is optimal.}

Page 30: OpenVMS Distributed Lock Manager Performance

Lock Activity Program Example

0000002020202032454C494653595302000000D3000C24534D52 RMS$.......SYSFILE2 ...
RMS lock tree for file [12,211,0] on volume SYSFILE2
File specification: DISK$SYSFILE2:[SYSFILE2]SYSUAF.DAT;5
Total: 184
XYZB16     75
XYZB20     48
XYZB23     41
XYZB21     16
XYZB19      2
*XYZB15     1
XYZB13      1
XYZB14      0
XYZB12      0

{This reflects user logins, process creations, password changes, and such. Note the poor lock master node selection here (XYZB16 would be optimal).}

Page 31: OpenVMS Distributed Lock Manager Performance

Example: Application (re)opens file frequently

Symptom: High lock rate on File Access Arbitration Lock for application data file

Cause: BASIC program re-executing OPEN command for a file; BASIC dutifully closes and then re-opens file

Fix: Modify BASIC program to execute OPEN statement only once at image startup time

Page 32: OpenVMS Distributed Lock Manager Performance

Lock Activity Program Example

00000016202020202020202031505041612442313146 F11B$aAPP1 ....
Files-11 File Access Arbitration lock for file [22,*,0] on volume APP1
File specification: DISK$APP1:[DATA]XDATA.IDX;1
Total: 50
*XYZB15    8
XYZB21     7
XYZB16     7
XYZB19     6
XYZB20     6
XYZB23     6
XYZB18     5
XYZB13     3
XYZB12     1
XYZB22     1
XYZB14     1

{This shows that the application is apparently opening (or re-opening) this particular file 50 times per second.}

Page 33: OpenVMS Distributed Lock Manager Performance

Lock Mastership (Resource Mastership) concept

One lock master node is selected by VMS for a given resource tree at a given time

Different resource trees may have different lock master nodes

Page 34: OpenVMS Distributed Lock Manager Performance

Lock Mastership (Resource Mastership) concept

Lock master remembers all locks on a given resource tree for the entire cluster

Each node holding locks also remembers the locks it is holding on resources, to allow recovery if lock master node dies

Page 35: OpenVMS Distributed Lock Manager Performance

Lock Mastership

The lock mastership node may change for various reasons:
Lock master node goes down -- a new master must be elected
VMS may move lock mastership to a “better” node for performance reasons:
A LOCKDIRWT imbalance is found, or
Activity-based Dynamic Lock Remastering, or
The lock master node no longer has interest

Page 36: OpenVMS Distributed Lock Manager Performance

Lock Remastering

Circumstances under which remastering occurs, and does not:
LOCKDIRWT values: VMS tends to remaster to a node with a higher LOCKDIRWT value, never to a node with a lower LOCKDIRWT
Shifting initiated based on activity counters in the root RSB; a non-zero PE1 parameter can prevent movement or place a threshold on lock tree size
Shift if the existing lock master loses interest

Page 37: OpenVMS Distributed Lock Manager Performance

Lock Remastering

VMS rules for the dynamic remastering decision based on activity levels (assuming equal LOCKDIRWT values):

1) Must meet a general threshold of 80 lock requests so far (LCK$GL_SYS_THRSH)

2) The new potential master node must have at least 10 more requests per second than the current master (LCK$GL_ACT_THRSH)

Page 38: OpenVMS Distributed Lock Manager Performance

Lock Remastering

VMS rules for dynamic remastering (continued):

3) The estimated cost to move (based on the size of the lock tree) must be less than the estimated savings (based on the lock rate), except that if the new master meets criterion (2) for 3 consecutive 8-second intervals, the cost is ignored

4) No more than 5 remastering operations can be going on at once on a node (LCK$GL_RM_QUOTA)

Page 39: OpenVMS Distributed Lock Manager Performance

Lock Remastering

VMS rules for dynamic remastering (continued):

5) If PE1 on the current master has a negative value, remastering trees off the node is disabled

6) If PE1 has a positive, non-zero value on the current master, the tree must be smaller than PE1 in size or it will not be remastered
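
The rules on the last three slides can be summarized as a single predicate. The C below is a conceptual model of the documented behavior, not VMS source; everything except the cited LCK$GL_* thresholds, LCK$GL_RM_QUOTA, and the RSB$B_SAME_CNT idea is invented for illustration, and the numbers in main are hypothetical.

#include <stdbool.h>
#include <stdio.h>

struct tree_state {
    unsigned total_requests;      /* requests so far on this tree                 */
    unsigned current_rate;        /* current master's requests per second         */
    unsigned candidate_rate;      /* candidate master's requests per second       */
    unsigned same_count;          /* consecutive 8-second intervals the candidate
                                     has met rule 2 (cf. RSB$B_SAME_CNT)          */
    unsigned tree_size;           /* number of locks in the tree                  */
    unsigned est_cost, est_savings;   /* hypothetical units                       */
    unsigned active_remasters;    /* remaster operations in progress on the node  */
    int      pe1;                 /* PE1 value on the current master              */
};

static bool should_remaster(const struct tree_state *t)
{
    if (t->total_requests < 80) return false;                   /* rule 1 */
    if (t->candidate_rate < t->current_rate + 10) return false; /* rule 2 */
    if (t->same_count < 3 && t->est_cost >= t->est_savings)     /* rule 3 */
        return false;
    if (t->active_remasters >= 5) return false;                 /* rule 4 */
    if (t->pe1 < 0) return false;                               /* rule 5 */
    if (t->pe1 > 0 && t->tree_size >= (unsigned)t->pe1)         /* rule 6 */
        return false;
    return true;
}

int main(void)
{
    struct tree_state t = { 11523, 30, 6455, 0, 15000, 1000, 6455, 0, 0 };
    printf("remaster? %s\n", should_remaster(&t) ? "yes" : "no");
    return 0;
}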

Page 40: OpenVMS Distributed Lock Manager Performance

Lock Remastering

Implications of the dynamic remastering rules:
LOCKDIRWT values must be equal for lock activity levels to control the choice of lock master node
PE1 can be used to control movement of lock trees OFF of a node, but not ONTO a node
The RSB stores the lock activity counts, so even high activity counts can be lost if the last lock is DEQueued on a given node and the RSB thus gets deallocated

Page 41: OpenVMS Distributed Lock Manager Performance

Lock Remastering

Implications of the dynamic remastering rules:
With two or more large CPUs of equal size running the same application, lock mastership “thrashing” is not uncommon: 10 more lock requests per second is not much of a difference when you may be doing 100s or 1,000s of lock requests per second
Whichever new node becomes lock master may then see its own lock rate slow somewhat due to the remote lock request workload

Page 42: OpenVMS Distributed Lock Manager Performance

Lock Remastering

Lock mastership thrashing results in user-visible delays
Lock operations on a tree are stalled during a remaster operation
Locks and Resources were sent over at 1 per SCS message, so remastering large lock trees could take a long time
e.g. 10 to 50 seconds for a 15K-lock tree, prior to 7.2-2
An improvement in VMS 7.2-2 and above gives a very significant performance gain by using 64-Kbyte block data transfers instead of sending 1 SCS message per RSB or LKB

Page 43: OpenVMS Distributed Lock Manager Performance

How to Detect Lock Mastership Thrashing

Detection of remastering activity:
MONITOR RLOCK in 7.3 and above (not 7.2-2)
SDA> SHOW LOCK/SUMMARY in 7.2 and above
Change of mastership node for a given resource
Check message counters under SDA:

SDA> EXAMINE PMS$GL_RM_RBLD_SENT
SDA> EXAMINE PMS$GL_RM_RBLD_RCVD

Counts which increase suddenly by a large amount indicate remastering of large tree(s)
SENT: off of this node; RCVD: onto this node
See example procedures WATCH_RBLD.COM and RBLD.COM

Page 44: OpenVMS Distributed Lock Manager Performance

How to Prevent Lock Mastership Thrashing

Unbalanced node power
Unequal workloads
Unequal values of LOCKDIRWT
Non-zero values of PE1

Page 45: OpenVMS Distributed Lock Manager Performance

How to find the lock master node for a given resource tree

1) Take out a Null lock on the root resource using $ENQ; VMS does the directory lookup and finds out the master node

2) Use $GETLKI to identify the current lock master node’s CSID and the lock count
If the local node is the lock master, and the lock count is 1 (i.e. only our NL lock), there’s no interest in the resource now

Page 46: OpenVMS Distributed Lock Manager Performance

How to find the lock master node for a given resource tree

3) $DEQ to release the lock

4) Use $GETSYI to translate the CSID to an SCS nodename

See example procedure FINDMASTER_FILE.COM and program FINDMASTER.MAR, which can find the lock master node for RMS file resource trees
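
A minimal C sketch of steps 1-4 (this is not FINDMASTER.MAR). Assumptions: the LKI$_MSTCSID and SYI$_NODENAME item codes, the LCK$M_SYSTEM flag with SYSLCK privilege for a system-namespace resource, default 32-bit pointer size, and a hypothetical resource name; check the exact $GETLKI item codes for your VMS version.

#include <starlet.h>
#include <descrip.h>
#include <lckdef.h>
#include <lkidef.h>
#include <syidef.h>
#include <ssdef.h>
#include <stdio.h>

struct lksb { unsigned short status, reserved; unsigned int lkid; };
struct item { unsigned short buflen, itmcod; void *bufadr; unsigned short *retlen; };

int main(void)
{
    /* Hypothetical root resource name; a real one comes from SDA or LCKACT output */
    $DESCRIPTOR(resnam, "RMS$....example-root-resource....");
    struct lksb lksb;
    unsigned int st, mst_csid = 0;
    char nodename[16] = "";
    unsigned short nodelen = 0;

    struct { struct item i; unsigned int end; } lki = {
        { sizeof mst_csid, LKI$_MSTCSID, &mst_csid, 0 }, 0 };
    struct { struct item i; unsigned int end; } syi = {
        { sizeof nodename - 1, SYI$_NODENAME, nodename, &nodelen }, 0 };

    /* 1) Null lock on the root resource (system namespace; needs SYSLCK) */
    st = sys$enqw(0, LCK$K_NLMODE, &lksb, LCK$M_SYSTEM,
                  &resnam, 0, 0, 0, 0, 0, 0, 0);
    if (!(st & 1)) return st;

    /* 2) Ask which node masters this resource */
    st = sys$getlkiw(0, &lksb.lkid, &lki, 0, 0, 0, 0);

    /* 3) Release the Null lock */
    sys$deq(lksb.lkid, 0, 0, 0);

    /* 4) Translate the CSID into an SCS node name */
    if (st & 1)
        st = sys$getsyiw(0, &mst_csid, 0, &syi, 0, 0, 0);

    if (st & 1)
        printf("Lock master: %.*s (CSID %08X)\n", nodelen, nodename, mst_csid);
    return st;
}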

Page 47: OpenVMS Distributed Lock Manager Performance

Controlling Lock Mastership

Lock Remastering is a good thing:
It maximizes the number of lock requests which are local (and thus fastest) by trying to move lock mastership of a tree to the node with the most activity on that tree

So why would you want to wrest control of lock mastership away from VMS?
To spread the lock mastership workload more evenly across nodes, to help avoid saturation of any single lock master node
To provide the best performance for a specific job by guaranteeing local locking for its files

Page 48: OpenVMS Distributed Lock Manager Performance

How to force lock mastership of a resource tree to a specific node

3 ways to induce VMS to move a lock tree:
1) Generate a lot of I/Os
For example, run several copies of a program that rapidly accesses the file
2) Generate a lot of lock requests, without the associated I/O operations
3) Generate the effect of a lot of lock requests without actually doing them, by modifying VMS’s data structures

Page 49: OpenVMS Distributed Lock Manager Performance

How to force lock mastership of a resource tree to a specific node

We’ll examine:
1) A method using documented features, and thus fully supported
2) A method modifying VMS data structures

Page 50: OpenVMS Distributed Lock Manager Performance

Controlling Lock Mastership Using Supported Methods

To move a lock tree to a particular node (non-invasive method), assuming PE1 is non-zero on all nodes to start with:

1) Set PE1 to 0 on the existing lock master node to allow dynamic lock remastering of the tree off that node

2) Set PE1 to a negative value (or a small positive value) on the target node to prevent the lock tree from moving off of it afterward

Page 51: OpenVMS Distributed Lock Manager Performance

Controlling Lock Mastership Using Supported Methods

3) On the target node, take out a Null lock on the root resource

4) Take out a sub-lock of the parent Null lock, and then repeatedly convert it between Null and some other mode
Check periodically to see if the tree has moved yet (using $GETLKI)

5) Once the tree has moved, free the locks

6) Set PE1 back to its original value on the former master node
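
A minimal C sketch of steps 3-5 on the target node (this is not LOTSALOX.MAR). Assumptions: a system-namespace root resource requiring SYSLCK, a hypothetical sub-resource name, and conversion between NL and PW modes; in practice you would interleave the $GETLKI check shown in the earlier sketch instead of running a fixed-count loop.

#include <starlet.h>
#include <descrip.h>
#include <lckdef.h>
#include <ssdef.h>

struct lksb { unsigned short status, reserved; unsigned int lkid; };

int main(void)
{
    $DESCRIPTOR(root, "RMS$....example-root-resource....");  /* hypothetical */
    $DESCRIPTOR(sub,  "DUMMY_SUBLOCK");                      /* hypothetical */
    struct lksb rootlksb, sublksb;
    unsigned int st;

    /* 3) Null lock on the root resource, in the system-wide namespace */
    st = sys$enqw(0, LCK$K_NLMODE, &rootlksb, LCK$M_SYSTEM,
                  &root, 0, 0, 0, 0, 0, 0, 0);
    if (!(st & 1)) return st;

    /* 4) Sub-lock of that parent, then convert it repeatedly NL <-> PW */
    st = sys$enqw(0, LCK$K_NLMODE, &sublksb, 0,
                  &sub, rootlksb.lkid, 0, 0, 0, 0, 0, 0);
    if (!(st & 1)) return st;

    for (int i = 0; i < 100000; i++) {   /* check $GETLKI periodically here */
        sys$enqw(0, LCK$K_PWMODE, &sublksb, LCK$M_CONVERT,
                 0, 0, 0, 0, 0, 0, 0, 0);
        sys$enqw(0, LCK$K_NLMODE, &sublksb, LCK$M_CONVERT,
                 0, 0, 0, 0, 0, 0, 0, 0);
    }

    /* 5) Once the tree has moved, free the locks (sub-lock first) */
    sys$deq(sublksb.lkid, 0, 0, 0);
    sys$deq(rootlksb.lkid, 0, 0, 0);
    return SS$_NORMAL;
}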

Page 52: OpenVMS Distributed Lock Manager Performance

Controlling Lock Mastership Using Supported Methods

Pros:
Uses only supported interfaces to VMS

Cons:
Generates significant load on the existing lock master, from which you may have been trying to off-load work. In some cases, the node may thus be saturated and unable to initiate lock remastering
Programs running locally on the existing lock master can generate so many requests that the tree won’t move, because you can’t generate nearly as many lock requests remotely

See example program LOTSALOX.MAR

Page 53: OpenVMS Distributed Lock Manager Performance

Controlling Lock Mastership By Modifying VMS Data Structures

Goal: Reproduce effect of lots of lock requests without the overhead of the lock requests actually occurring

General Method: Modify activity-related counts and remastering-related fields and flags in root RSB to persuade VMS to remaster the resource tree

Page 54: OpenVMS Distributed Lock Manager Performance

Controlling Lock Mastership By Modifying VMS Data Structures

1) Run the program on the node which is presently the lock master

2) Use $GETSYI to get the CSID of the desired target node, given its nodename

3) Lock down code and data

4) $CMKRNL, raise IPL, grab the LCKMGR spinlock

Page 55: OpenVMS Distributed Lock Manager Performance

Controlling Lock Mastership By Modifying VMS Data Structures

5) Starting at LCK$GQ_RRSFL listhead, follow chain of root RSBs via RSB$Q_RRSFL links

6) Search for root RSB with matching resource name, access mode, and group (0=System)

Page 56: OpenVMS Distributed Lock Manager Performance

Controlling Lock Mastership By Modifying VMS Data Structures

7) Set up to trigger the remaster operation:
Set RSB$L_RM_CSID to the target node’s CSID
Set RSB$B_LSTCSID_IDX to the low byte of the target node’s CSID
Set RSB$B_SAME_CNT to 3 or more so remastering occurs regardless of cost

Page 57: OpenVMS Distributed Lock Manager Performance

Controlling Lock Mastership By Modifying VMS Data Structures

Zero our activity counts RSB$W_OACT and RSB$W_NACT so the local lock rate seems low
Set the new-master activity count RSB$W_NMACT to the maximum possible (hex FFFF) to simulate tons of locking activity
Set the RSB$M_RM_PEND flag in the RSB$L_STATUS field to indicate that a remaster operation is now pending

8) Release the LCKMGR spinlock, lower IPL, and let VMS do its job
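
Step 7 restated as C assignments against a stand-in structure. This is schematic only: the real RSB layout comes from VMS’s private $RSBDEF definitions, the struct and bit mask below are placeholders, and REMASTER.MAR does this in kernel mode at elevated IPL while holding the LCKMGR spinlock.

/* Stand-in structure; NOT the real RSB layout */
typedef struct stand_in_rsb {
    unsigned int   rsb_l_status;
    unsigned int   rsb_l_rm_csid;
    unsigned short rsb_w_oact, rsb_w_nact, rsb_w_nmact;
    unsigned char  rsb_b_lstcsid_idx, rsb_b_same_cnt;
} rsb_t;

#define RSB_M_RM_PEND 0x1   /* placeholder for the real RSB$M_RM_PEND bit */

static void trigger_remaster(rsb_t *rsb, unsigned int target_csid)
{
    rsb->rsb_l_rm_csid     = target_csid;        /* desired new master node   */
    rsb->rsb_b_lstcsid_idx = target_csid & 0xFF; /* low byte of the CSID      */
    rsb->rsb_b_same_cnt    = 3;                  /* so the move cost is ignored */
    rsb->rsb_w_oact        = 0;                  /* local activity looks low  */
    rsb->rsb_w_nact        = 0;
    rsb->rsb_w_nmact       = 0xFFFF;             /* new-master activity looks huge */
    rsb->rsb_l_status     |= RSB_M_RM_PEND;      /* remaster is now pending   */
}

int main(void)
{
    rsb_t fake = {0};
    trigger_remaster(&fake, 0x00010031);  /* CSID value borrowed from the LCKRM example */
    return 0;
}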

Page 58: OpenVMS Distributed Lock Manager Performance

Controlling Lock Mastership By Modifying VMS Data Structures

Problem (for all methods): Once PE1 is set to zero to allow the desired lock tree to migrate, other lock trees may also migrate, unwanted

Solution: To prevent this, in all other resource trees mastered on this node:
Clear the RM_PEND flag in L_STATUS if set, and
Set W_OACT and W_NACT to the maximum (hex FFFF)
Zero W_NMACT, L_RM_CSID, B_LSTCSID_IDX, and B_SAME_CNT

Page 59: OpenVMS Distributed Lock Manager Performance

Controlling Lock Mastership By Modifying VMS Data Structures

Pros:
Does the job reliably
Can avoid other resource trees “escaping”

Cons:
High-IPL code presents some level of risk of crashing a system

See example program REMASTER.MAR
One might instead use (in 7.2-2 & above):
SDA> LCK REMASTER

Page 60: OpenVMS Distributed Lock Manager Performance

Causes of lock queues

Program bug (e.g. not freeing a record lock)
I/O or interconnect saturation
“Deadman” locks

Page 61: OpenVMS Distributed Lock Manager Performance

How to detect lock queues

Using DECamds / Availability Manager
Using SDA
Using other methods

Page 62: OpenVMS Distributed Lock Manager Performance

Lock contention & DECamds

DECamds can identify lock contention if a lock blocks others for 15 seconds

AMDS$LOCK_LOG.LOG file in AMDS$SYSTEM: contains a log of occurrences of suspected contention

Resource name decoding techniques shown earlier can sometimes be used to identify the file involved

Deadman locks can be filtered out

Page 63: OpenVMS Distributed Lock Manager Performance

Detecting Lock Queues with ANALYZE/SYSTEM (SDA)

A new qualifier was added to the SHOW RESOURCE command in SDA for 7.2 and above:
SHOW RESOURCE/CONTENTION shows blocking and blocked lock requests

A new qualifier was added to the SHOW LOCK command in SDA for 7.2 and above:
SHOW LOCK/WAITING displays blocked lock requests (but then you must determine what’s blocking them)

Page 64: OpenVMS Distributed Lock Manager Performance

Detecting Lock Queues with a program

Traverse lock database starting with LCK$GQ_RRSFL listhead and following chain of root RSBs via RSB$Q_RRSFL links

Within each resource tree, follow RSB$Q_SRSFL chain to examine all sub-resources, recursively

Page 65: OpenVMS Distributed Lock Manager Performance

Detecting Lock Queues with a program

Check the Wait Queue (RSB$Q_WTQFL and RSB$Q_WTQBL)
Check the Convert Queue (RSB$Q_CVTQFL and RSB$Q_CVTQBL)

If queues are found, display:
Queue length(s)
Resource name
Resource names for all parent locks, up to the root lock

See example DCL procedure LCKQUE.COM and program LCKQUE.MAR
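
A user-mode alternative sketch for spot-checking one known resource (this is not the kernel-mode RSB traversal that LCKQUE.MAR performs). Assumptions: the LKI$_WAITCOUNT and LKI$_CVTCOUNT $GETLKI item codes, a system-namespace resource with SYSLCK privilege, and a hypothetical resource name; confirm the item codes against the System Services documentation for your version.

#include <starlet.h>
#include <descrip.h>
#include <lckdef.h>
#include <lkidef.h>
#include <ssdef.h>
#include <stdio.h>

struct lksb { unsigned short status, reserved; unsigned int lkid; };
struct item { unsigned short buflen, itmcod; void *bufadr; unsigned short *retlen; };

int main(void)
{
    $DESCRIPTOR(resnam, "F11B$s....example....");   /* hypothetical resource */
    struct lksb lksb;
    unsigned int waitcnt = 0, cvtcnt = 0, st;

    struct { struct item i[2]; unsigned int end; } itmlst = {
        { { sizeof waitcnt, LKI$_WAITCOUNT, &waitcnt, 0 },
          { sizeof cvtcnt,  LKI$_CVTCOUNT,  &cvtcnt,  0 } }, 0 };

    /* An NL lock makes the resource visible to $GETLKI without blocking anyone */
    st = sys$enqw(0, LCK$K_NLMODE, &lksb, LCK$M_SYSTEM,
                  &resnam, 0, 0, 0, 0, 0, 0, 0);
    if (!(st & 1)) return st;

    st = sys$getlkiw(0, &lksb.lkid, &itmlst, 0, 0, 0, 0);
    sys$deq(lksb.lkid, 0, 0, 0);

    printf("Wait queue: %u, Convert queue: %u\n", waitcnt, cvtcnt);
    return st;
}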

Page 66: OpenVMS Distributed Lock Manager Performance

Example: Directory File Grows Large

Symptom: High queue length on the file serialization lock for a .DIR file

Cause: The directory file has grown to over 127 blocks (VMS version 7.1-2 or earlier; 7.2 and later are much less sensitive to this problem)

Fix: Delete or rename files out of the directory

Page 67: OpenVMS Distributed Lock Manager Performance

Lock Queue Program Example

Here are examples where a directory file got very large under 7.1-2:

'F11B$vAPP2 ' 202020202020202032505041762442313146
Files-11 Volume Allocation lock for volume APP2
'F11B$sH...' 00000148732442313146
Files-11 File Serialization lock for file [328,*,0] on volume APP2
File specification: DISK$APP2:[]DATA.DIR;1
Convert queue: 0, Wait queue: 95

'F11B$vLOGFILE ' 2020202020454C4946474F4C762442313146
Files-11 Volume Allocation lock for volume LOGFILE
'F11B$s....' 00000A2E732442313146
Files-11 File Serialization lock for file [2606,*,0] on volume LOGFILE
File specification: DISK$LOGFILE:[000000]LOGS.DIR;1
Convert queue: 0, Wait queue: 3891

Page 68: OpenVMS Distributed Lock Manager Performance

Example: Fragmented File Header

Symptom: High queue length on File Serialization Lock for application data file

Cause: CONVERTs onto disk without sufficient contiguous space resulted in highly-fragmented files, increasing I/O load on disk array. File was so fragmented it had 3 extension file headers

Fix: Defragment disk, or do an /IMAGE Backup/Restore

Page 69: OpenVMS Distributed Lock Manager Performance

Lock Queue Program Example

Here's an example of the result of reorganizing RMS indexed files with $CONVERTs over a weekend without enough contiguous free space available, causing a lot of file fragmentation and dramatically increasing the I/O load on a RAID array on the next busy day (we had to fix this with a backup/restore cycle soon after). The file shown here had gotten so fragmented as to have 3 extension file headers. The lock we're queueing on here is the file serialization lock for this RMS indexed file:

'F11B$s....' 0000000E732442313146
Files-11 File Serialization lock for file [14,*,0] on volume THDATA
File specification: DISK$THDATA:[TH]OT.IDX;1
Convert queue: 0, Wait queue: 28

Page 70: OpenVMS Distributed Lock Manager Performance

Future Directions for this Investigation Work

Concern: Locking down remastering with PE1 (to avoid lock mastership thrashing) can result in sub-optimal lock master node selections over time

Page 71: OpenVMS Distributed Lock Manager Performance

Future Directions for this Investigation Work

Possible ways of mitigating the side-effects of preventing remastering using PE1:
Adjust the PE1 value as high as you can without producing noticeable delays
Upgrade to 7.2-2 or above for more-efficient remastering
Set PE1 to 0 for short periods, periodically
Raise the fixed threshold values in the VMS data cells LCK$GL_SYS_THRSH and particularly LCK$GL_ACT_THRSH
More-invasive automatic monitoring and control of remastering activity
Enhancements to VMS itself

Page 72: OpenVMS Distributed Lock Manager Performance

How to measure pent-up remastering demand

While PE1 is set to prevent remastering, sub-optimal lock mastership may result: VMS will “want” to move some lock trees but cannot

See example procedure LCKRM.COM and program LCKRM.MAR, which measure pent-up remastering demand

Page 73: OpenVMS Distributed Lock Manager Performance

How to measure pent-up remastering demand

LCKRM example:

Time: 16:19

----- XYZB12: -----

'RMS$..I....SS1 ...' 000000202020202020202020315353020000084900B424534D52
RMS lock tree for file [180,2121,0] on volume SS1
File specification: DISK$SS1:[PDATA]PDATA.IDX;1
Pent-up demand for remaster operation is pending to node XYZB18 (CSID 00010031)
Last CSID Index: 34, Same-count: 0
Average lock rates: Local 44, Remote 512
Status bits: RM_PEND

Page 74: OpenVMS Distributed Lock Manager Performance

Interrupt-state/stack saturation

Too much lock mastership workload can saturate primary CPU on a node

See with MONITOR MODES/CPU=0/ALL

Page 75: OpenVMS Distributed Lock Manager Performance

Interrupt-state/stack saturation

FAST_PATH:
Can shift interrupt-state workload off the primary CPU in SMP systems
An IO_PREFER_CPUS value of an even number disables CPU 0 use
Consider limiting interrupts to a subset of non-primary CPUs
FAST_PATH for CI since 7.0
FAST_PATH for MC: “never”
FAST_PATH for SCSI and FC is in 7.3 and above
FAST_PATH for LANs (e.g. FDDI & Ethernet) slated for 7.3-1
Even with FAST_PATH enabled, CPU 0 still receives the device interrupt, but hands it off immediately via an inter-processor interrupt
7.3-1 is slated to allow FAST_PATH interrupts to bypass CPU 0 entirely and go directly to a non-primary CPU

Page 76: OpenVMS Distributed Lock Manager Performance

Dedicated-CPU Lock Manager

With 7.2-2 and above, you can choose to dedicate a CPU to lock management work. This may help reduce MP_SYNC time.

LCKMGR_MODE parameter:
0 = Disabled
>1 = Enable if at least this many CPUs are running

LCKMGR_CPUID parameter specifies which CPU to dedicate to the LCKMGR_SERVER process

Page 77: OpenVMS Distributed Lock Manager Performance

Example programs

Programs referenced herein may be found:
On the VMS Freeware V5 CD, under directories [KP_LOCKTOOLS] or [KP_CLUSTERTOOLS]
Or on the web at:
http://www.openvms.compaq.com/freeware/freeware50/kp_clustertools/
http://www.openvms.compaq.com/freeware/freeware50/kp_locktools/

New additions & corrections may be found at:
http://encompasserve.org/~parris/

Page 78: OpenVMS Distributed Lock Manager Performance

Example programs

Copies of this presentation (and others) may be found at: http://www.geocities.com/keithparris/

Page 79: OpenVMS Distributed Lock Manager Performance

Questions?

Page 80: OpenVMS Distributed Lock Manager Performance

Speaker Contact Info:

Keith Parris
E-mail: [email protected] or [email protected] or [email protected]
Web: http://encompasserve.org/~parris/ and http://www.geocities.com/keithparris/