OpenVMS Distributed Lock Manager Performance

Transcript of the presentation "OpenVMS Distributed Lock Manager Performance"

Session ES-09-U
Keith Parris, HP

Background

VMS system managers have traditionally looked at performance in three areas:

- CPU
- Memory
- I/O

But in VMS clusters, what may appear to be an I/O bottleneck can actually be a lock-related issue.

Overview

- VMS keeps some lock activity data that no existing performance management tools look at.
- Locking statistics and lock-related symptoms can provide valuable clues in detecting disk, adapter, or interconnect saturation problems.

Overview

The VMS Lock Manager does an excellent job, under a wide variety of conditions, of optimizing locking activity and minimizing overhead, but:

- In clusters with identical nodes running the same applications, remastering can sometimes happen too often.
- In extremely large clusters, nodes can "gang up" on lock master nodes and overload them.
- Locking activity can contribute to:
  - CPU 0 saturation in Interrupt State
  - Spinlock contention (Multi-Processor Synchronization time)

We'll look at methods of detecting, and solutions to, these types of problems.

Topics

- Available monitoring tools for the Lock Manager
- How to map VMS symbolic lock resource names to real physical entities
- Lock request latencies
- How to measure lock rates

Topics

- Lock mastership, and why one might care about it
- Dynamic lock remastering
- How to detect and prevent lock mastership thrashing
- How to find the lock master node for a given resource tree
- How to force lock mastership of a given resource tree to a specific node

Topics

- Lock queues, their causes, and how to detect them
- Examples of problem locking scenarios
- How to measure pent-up remastering demand

Monitoring tools

MONITOR utility:

- MONITOR LOCK
- MONITOR DLOCK
- MONITOR RLOCK (in VMS 7.3 and above; not 7.2-2)
- MONITOR CLUSTER
- MONITOR SCS

Other tools:

- SHOW CLUSTER /CONTINUOUS
- DECamds / Availability Manager
- DECps (Computer Associates' Unicenter Performance Management for OpenVMS, formerly Advise/IT)

Monitoring tools

ANALYZE/SYSTEM (SDA). New SHOW LOCK qualifiers for VMS 7.2 and above:

- /WAITING: displays only the waiting lock requests (those blocked by other locks)
- /SUMMARY: displays summary data and performance counters

New SHOW RESOURCE qualifier for VMS 7.2 and above:

- /CONTENTION: displays resources which are under contention

Monitoring tools

ANALYZE/SYSTEM: a new SDA extension, LCK, for lock tracing in VMS 7.2-2 and above.

    SDA> LCK                 ! Shows help text with command summary

It can display various additional lock manager statistics:

    SDA> LCK STATISTIC       ! Shows lock manager statistics

It can show the busiest resource trees by lock activity rate:

    SDA> LCK SHOW ACTIVE     ! Shows lock activity

It can trace lock requests:

    SDA> LCK LOAD            ! Load the debug execlet
    SDA> LCK START TRACE     ! Start tracing lock requests
    SDA> LCK STOP TRACE      ! Stop tracing
    SDA> LCK SHOW TRACE      ! Display contents of trace buffer

It can even trigger remaster operations:

    SDA> LCK REMASTER        ! Trigger a remaster operation

Mapping symbolic lock resource names to real entities

Techniques for mapping resource names to lock types. Common prefixes:

- SYS$ for the VMS executive
- F11B$ for the XQP (file system)
- RMS$ for Record Management Services

See Appendix H in the Alpha V1.5 IDSM, or Appendix A in the Alpha V7.0 version.

Resource names

Example: the XQP File Serialization Lock. The resource name format is "F11B$s" {Lock Basis}; the parent lock is the Volume Allocation Lock, "F11B$v" {Lock Volume Name}.

Calculate the File ID from the Lock Basis: the Lock Basis is the RVN and File Number from the File ID (ignoring the Sequence Number), packed into one longword.

Identify the disk volume from the parent resource name.
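The packing described above can be sketched in C. The exact bit layout (RVN in the high-order byte, 24-bit file number in the low three bytes) is an assumption, chosen to be consistent with the lock-basis values in the program output shown later in this transcript; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the RVN and 24-bit file number (the File ID minus its sequence
   number) into one longword, as the lock basis for an F11B$s resource.
   Assumed layout: RVN in the high byte, file number in the low 24 bits. */
uint32_t lock_basis(uint8_t rvn, uint32_t file_number)
{
    return ((uint32_t)rvn << 24) | (file_number & 0x00FFFFFFu);
}

/* Recover the File ID components (minus the sequence number, which the
   lock basis does not carry) from a lock basis longword. */
void basis_to_fid(uint32_t basis, uint8_t *rvn, uint32_t *file_number)
{
    *rvn = (uint8_t)(basis >> 24);
    *file_number = basis & 0x00FFFFFFu;
}
```

For example, file [328,*,0] from the lock queue example later in this transcript yields a lock basis of 00000148 hex.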

Resource names

Identifying the file from the File ID: look at the file headers in the Index File to get the filespec. You can use the DUMP utility to display a file header (from the Index File):

    $ DUMP /HEADER /IDENTIFIER=(file_id) /BLOCK=COUNT=0 disk:[000000]INDEXF.SYS

Follow the directory backlinks to determine the directory path. See the example procedure FILE_ID_TO_NAME.COM (or use the LIB$FID_TO_NAME routine to do all this, if the sequence number can be obtained).

Resource names

Example: the RMS lock tree for an RMS indexed file. The resource name format is "RMS$" {File ID} {Flags byte} {Lock Volume Name}.

- Identify the filespec using the File ID.
- The Flags byte indicates a shared or private disk mount.
- Pick up the disk volume name; this is the label as of the time the disk was mounted.
- Sub-locks are used for buckets and records within the file.

Internal Structure of an RMS Indexed File

    Root Index Bucket
    |-- Level 1 Index Bucket
    |     |-- Level 2 Index Bucket -> Data Buckets
    |     `-- Level 2 Index Bucket -> Data Buckets
    `-- Level 1 Index Bucket
          |-- Level 2 Index Bucket -> Data Buckets
          `-- Level 2 Index Bucket -> Data Buckets

The root index bucket points to Level 1 index buckets, each Level 1 bucket points to Level 2 index buckets, and each Level 2 bucket points to data buckets.

RMS Data Bucket Contents

Each data bucket contains a series of data records.

RMS Indexed File: Bucket and Record Locks

Bucket and record locks are sub-locks of the RMS file lock; you have to look at the parent lock to identify the file.

- Bucket lock (4 bytes): the VBN of the first block of the bucket
- Record lock (8 bytes; 6 on VAX): the Record File Address (RFA) of the record

Locks and File I/O

Lock requests and data transfers for a typical RMS indexed file I/O (prior to 7.2-1H1):

1) Lock and get the root index bucket
2) Lock and get the index buckets for any additional index levels
3) Lock and get the data bucket containing the record
4) Lock the record
5) For writes: write the data bucket containing the record

Note: most data reads may be avoided thanks to the RMS global buffer cache.
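Counting the lock operations in the sequence above: one lock per index level (the root included), one for the data bucket, and one for the record. A minimal sketch (the function name is ours, not RMS's):

```c
#include <assert.h>

/* Lock requests for one keyed access to an RMS indexed file, per the
   pre-7.2-1H1 sequence: one lock per index level (including the root),
   plus one for the data bucket and one for the record itself. */
unsigned lock_ops_per_keyed_access(unsigned index_levels)
{
    return index_levels + 2;   /* index buckets + data bucket + record */
}
```

So a file with a root, one Level 1, and one Level 2 index (three index levels) costs five lock operations per keyed access, which is why the RFA-based shortcut described next matters.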

Locks and File I/O

Since all indexed I/Os access the Root Index Bucket, contention on the lock for the Root Index Bucket of a hot file can be a bottleneck.

Lookup by Record File Address (RFA) avoids the index lookup on second and subsequent accesses to a record.

Lock Request Latencies

Latency depends on several things:

- Whether a directory lookup is needed or not
- Local or remote directory node
- $ENQ or $DEQ operation
- Local or remote lock master
- If remote, the type of interconnect

Directory Lookups

This is how VMS finds out which node is the lock master. A lookup is only needed for the first lock request on a particular resource tree on a given node; after that, the Resource Block (RSB) remembers the master node's CSID.

Basic conceptual algorithm: hash the resource name and index into the lock directory vector, which has been created based on LOCKDIRWT values.
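The conceptual algorithm can be sketched as follows. This is purely illustrative: the hash function below is not the one VMS uses, and the vector-building helper only captures the idea that each node's CSID appears in the directory vector in proportion to its LOCKDIRWT (so a node with LOCKDIRWT 0 is never a directory node).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative hash over the resource name bytes (not VMS's actual hash). */
static uint32_t hash_name(const char *name, size_t len)
{
    uint32_t h = 0;
    for (size_t i = 0; i < len; i++)
        h = h * 31u + (uint8_t)name[i];
    return h;
}

/* Build the directory vector: node i's CSID appears lockdirwt[i] times. */
size_t build_dir_vector(const uint32_t *csid, const int *lockdirwt,
                        size_t nodes, uint32_t *vec, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < nodes; i++)
        for (int j = 0; j < lockdirwt[i] && n < max; j++)
            vec[n++] = csid[i];
    return n;
}

/* Directory lookup: hash the name and index into the vector to pick
   the CSID of the directory node for this resource. */
uint32_t directory_node(const char *name, size_t len,
                        const uint32_t *vec, size_t entries)
{
    return vec[hash_name(name, len) % entries];
}
```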

Lock Request Latencies

Local requests are fastest. Remote requests are significantly slower:

- The code path is roughly 20 times longer
- The interconnect also contributes latency
- Total latency can be up to two orders of magnitude higher than for local requests

Lock Request Latency

Client process on the same node as the lock master: 4-6 microseconds.

Lock Request Latency

Client across a CI star coupler from the lock master node: 440 microseconds.

Lock Request Latencies

Latency (microseconds) by interconnect:

| Interconnect     | Latency (microseconds) |
|------------------|------------------------|
| Local node       | 4                      |
| Galaxy SMCI      | 94                     |
| MC 2             | 120                    |
| Gigabit Ethernet | 230                    |
| FDDI GS-FDDI-GS  | 270                    |
| FDDI GS-ATM-GS   | 285                    |
| DSSI             | 333                    |
| CI               | 440                    |

How to measure lock rates

VMS keeps counters of lock activity for each resource tree, but not for each of the sub-resources. So you can see the lock rate for an RMS indexed file, for example, but not for individual buckets or records within that file.

The SDA extension LCK can trace all lock requests if needed.

Identifying busiest lock trees in the cluster with a program

Measure lock rates based on RSB data: follow the chain of root RSBs from the LCK$GQ_RRSFL listhead via the RSB$Q_RRSFL links.

Root RSBs contain counters:

- RSB$W_OACT: the old-activity field (average lock rate per 8-second interval); divide by 8 to get the per-second average
- RSB$W_NACT: new activity (locks so far within the current 8-second interval); a transient value, so not as useful

Identifying busiest lock trees in the cluster with a program

Look for non-zero OACT values:

- Gather the resource name, master node CSID, and old-activity field
- Do this on each node
- Summarize the data across the cluster

See the example procedure LOCK_ACTV.COM and program LCKACT.MAR. Or, for VMS 7.2-2 and above:

    SDA> LCK SHOW ACTIVE

Note: that gives per-node data, not a cluster-wide summary.
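The core of such a program reduces to a small computation over the per-tree samples. A minimal sketch, assuming samples have already been gathered by walking the root-RSB chain (the struct and function names are ours; only the field semantics come from the slides):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One per-tree sample, as collected from a root RSB. */
typedef struct {
    const char *resname;  /* resource name of the tree's root */
    uint16_t    oact;     /* RSB$W_OACT: locks in the prior 8-second interval */
} tree_sample;

/* OACT covers an 8-second interval, so divide by 8 for locks per second. */
double oact_per_second(uint16_t oact)
{
    return oact / 8.0;
}

/* Index of the busiest tree in a set of samples (e.g. merged cluster-wide). */
size_t busiest_tree(const tree_sample *t, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (t[i].oact > t[best].oact)
            best = i;
    return best;
}
```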

Lock Activity Program Example

    0000002020202020202020203153530200004C71004624534D52 RMS$F.qL...SS1 ...
      RMS lock tree for file [70,19569,0] on volume SS1
      File specification: DISK$SS1:[DATA8]PDATA.IDX;1
      Total: 11523
        *XYZB12  6455
         XYZB11   746
         XYZB14   611
         XYZB15   602
         XYZB23   564
         XYZB13   540
         XYZB19   532
         XYZB16   523
         XYZB20   415
         XYZB22   284
         XYZB18   127
         XYZB21   125

    * Lock Master Node for the resource

{This is a fairly hot file. Here the lock master node is optimal.}

Lock Activity Program Example

    0000002020202032454C494653595302000000D3000C24534D52 RMS$.......SYSFILE2 ...
      RMS lock tree for file [12,211,0] on volume SYSFILE2
      File specification: DISK$SYSFILE2:[SYSFILE2]SYSUAF.DAT;5
      Total: 184
         XYZB16    75
         XYZB20    48
         XYZB23    41
         XYZB21    16
         XYZB19     2
        *XYZB15     1
         XYZB13     1
         XYZB14     0
         XYZB12     0

{This reflects user logins, process creations, password changes, and such. Note the poor lock master node selection here: XYZB16 would be optimal.}

Example: Application (re)opens file frequently

- Symptom: a high lock rate on the File Access Arbitration Lock for an application data file
- Cause: a BASIC program re-executing the OPEN statement for a file; BASIC dutifully closes and then re-opens the file
- Fix: modify the BASIC program to execute the OPEN statement only once, at image startup time

Lock Activity Program Example

    00000016202020202020202031505041612442313146 F11B$aAPP1 ....
      Files-11 File Access Arbitration lock for file [22,*,0] on volume APP1
      File specification: DISK$APP1:[DATA]XDATA.IDX;1
      Total: 50
        *XYZB15     8
         XYZB21     7
         XYZB16     7
         XYZB19     6
         XYZB20     6
         XYZB23     6
         XYZB18     5
         XYZB13     3
         XYZB12     1
         XYZB22     1
         XYZB14     1

{This shows the application apparently opening (or re-opening) this particular file 50 times per second.}

Lock Mastership (Resource Mastership) concept

- One lock master node is selected by VMS for a given resource tree at a given time.
- Different resource trees may have different lock master nodes.

Lock Mastership (Resource Mastership) concept

- The lock master remembers all locks on a given resource tree for the entire cluster.
- Each node holding locks also remembers the locks it is holding on resources, to allow recovery if the lock master node dies.

Lock Mastership

The lock mastership node may change for various reasons:

- The lock master node goes down, so a new master must be elected
- VMS may move lock mastership to a "better" node for performance reasons:
  - A LOCKDIRWT imbalance is found, or
  - Activity-based dynamic lock remastering, or
  - The lock master node no longer has interest in the tree

Lock Remastering

Circumstances under which remastering occurs, and does not:

- LOCKDIRWT values: VMS tends to remaster to a node with a higher LOCKDIRWT value, and never to a node with a lower LOCKDIRWT
- Shifting is initiated based on the activity counters in the root RSB
- A non-zero PE1 parameter can prevent movement, or place a threshold on lock tree size
- A shift occurs if the existing lock master loses interest

Lock Remastering

VMS rules for the dynamic remastering decision based on activity levels (assuming equal LOCKDIRWT values):

1) The tree must meet a general threshold of 80 lock requests so far (LCK$GL_SYS_THRSH)
2) The new potential master node must have at least 10 more requests per second than the current master (LCK$GL_ACT_THRSH)

Lock Remastering

VMS rules for dynamic remastering (continued):

3) The estimated cost to move (based on the size of the lock tree) must be less than the estimated savings (based on the lock rate); however, if the new master meets criterion (2) for 3 consecutive 8-second intervals, the cost is ignored
4) No more than 5 remastering operations can be going on at once on a node (LCK$GL_RM_QUOTA)

Lock Remastering

VMS rules for dynamic remastering (continued):

5) If PE1 on the current master has a negative value, remastering trees off the node is disabled
6) If PE1 has a positive, non-zero value on the current master, the tree must be smaller than PE1 in size or it will not be remastered

Lock Remastering

Implications of the dynamic remastering rules:

- LOCKDIRWT values must be equal for lock activity levels to control the choice of lock master node
- PE1 can be used to control movement of lock trees OFF of a node, but not ONTO a node
- The RSB stores the lock activity counts, so even high activity counts can be lost if the last lock is dequeued on a given node and the RSB is thus deallocated

Lock Remastering

Implications of the dynamic remastering rules:

- With two or more large CPUs of equal size running the same application, lock mastership "thrashing" is not uncommon: 10 more lock requests per second is not much of a difference when you may be doing hundreds or thousands of lock requests per second
- Whichever new node becomes lock master may then see its own lock rate slow somewhat due to the remote lock request workload

Lock Remastering

Lock mastership thrashing results in user-visible delays: lock operations on a tree are stalled during a remaster operation.

Locks and resources were originally sent one per SCS message, so remastering large lock trees could take a long time: for example, 10 to 50 seconds for a 15K-lock tree, prior to 7.2-2.

An improvement in VMS 7.2-2 and above gives a very significant performance gain by using 64-kilobyte block data transfers instead of sending one SCS message per RSB or LKB.

How to Detect Lock Mastership Thrashing

Detection of remastering activity:

- MONITOR RLOCK in 7.3 and above (not 7.2-2)
- SDA> SHOW LOCK/SUMMARY in 7.2 and above
- A change of mastership node for a given resource
- Check the message counters under SDA:

      SDA> EXAMINE PMS$GL_RM_RBLD_SENT
      SDA> EXAMINE PMS$GL_RM_RBLD_RCVD

Counts which increase suddenly by a large amount indicate remastering of large tree(s). SENT counts trees moved off of this node; RCVD counts trees moved onto this node.

See the example procedures WATCH_RBLD.COM and RBLD.COM.

How to Prevent Lock Mastership Thrashing

Conditions and settings that prevent thrashing:

- Unbalanced node power
- Unequal workloads
- Unequal values of LOCKDIRWT
- Non-zero values of PE1

How to find the lock master node for a given resource tree

1) Take out a Null lock on the root resource using $ENQ; VMS does the directory lookup and finds out the master node
2) Use $GETLKI to identify the current lock master node's CSID and the lock count; if the local node is the lock master and the lock count is 1 (i.e. only our NL lock), there is no interest in the resource right now

How to find the lock master node for a given resource tree

3) $DEQ to release the lock
4) Use $GETSYI to translate the CSID to an SCS node name

See the example procedure FINDMASTER_FILE.COM and program FINDMASTER.MAR, which can find the lock master node for RMS file resource trees.

Controlling Lock Mastership

Lock remastering is a good thing: it maximizes the number of lock requests which are local (and thus fastest) by trying to move lock mastership of a tree to the node with the most activity on that tree.

So why would you want to wrest control of lock mastership away from VMS?

- To spread the lock mastership workload more evenly across nodes, to help avoid saturation of any single lock master node
- To provide the best performance for a specific job by guaranteeing local locking for its files

How to force lock mastership of a resource tree to a specific node

There are 3 ways to induce VMS to move a lock tree:

1) Generate a lot of I/Os; for example, run several copies of a program that rapidly accesses the file
2) Generate a lot of lock requests, without the associated I/O operations
3) Generate the effect of a lot of lock requests without actually doing them, by modifying VMS's data structures

How to force lock mastership of a resource tree to a specific node

We'll examine:

1) A method using documented features, and thus fully supported
2) A method modifying VMS data structures

Controlling Lock Mastership Using Supported Methods

To move a lock tree to a particular node (the non-invasive method), assuming PE1 is non-zero on all nodes to start with:

1) Set PE1 to 0 on the existing lock master node, to allow dynamic lock remastering of the tree off that node
2) Set PE1 to a negative value (or a small positive value) on the target node, to prevent the lock tree from moving off of it afterward

Controlling Lock Mastership Using Supported Methods

3) On the target node, take out a Null lock on the root resource
4) Take out a sub-lock of the parent Null lock, and then repeatedly convert it between Null and some other mode, checking periodically (using $GETLKI) to see whether the tree has moved yet
5) Once the tree has moved, free the locks
6) Set PE1 back to its original value on the former master node

Controlling Lock Mastership Using Supported Methods

Pros:

- Uses only supported interfaces to VMS

Cons:

- Generates a significant load on the existing lock master, from which you may have been trying to off-load work; in some cases, the node may thus be saturated and unable to initiate lock remastering
- Programs running locally on the existing lock master can generate so many requests that the tree won't move, because you can't generate nearly as many lock requests remotely

See the example program LOTSALOX.MAR.

Controlling Lock Mastership By Modifying VMS Data Structures

Goal: reproduce the effect of lots of lock requests without the overhead of the lock requests actually occurring.

General method: modify the activity-related counts and the remastering-related fields and flags in the root RSB, to persuade VMS to remaster the resource tree.

Controlling Lock Mastership By Modifying VMS Data Structures

1) Run the program on the node which is presently the lock master
2) Use $GETSYI to get the CSID of the desired target node, given its node name
3) Lock down code and data
4) $CMKRNL, raise IPL, and grab the LCKMGR spinlock

Controlling Lock Mastership By Modifying VMS Data Structures

5) Starting at the LCK$GQ_RRSFL listhead, follow the chain of root RSBs via the RSB$Q_RRSFL links
6) Search for the root RSB with the matching resource name, access mode, and group (0 = System)

Controlling Lock Mastership By Modifying VMS Data Structures

7) Set up to trigger the remaster operation:

- Set RSB$L_RM_CSID to the target node's CSID
- Set RSB$B_LSTCSID_IDX to the low byte of the target node's CSID
- Set RSB$B_SAME_CNT to 3 or more, so remastering occurs regardless of cost

Controlling Lock Mastership By Modifying VMS Data Structures

- Zero our activity counts RSB$W_OACT and RSB$W_NACT, so the local lock rate seems low
- Set the new-master activity count RSB$W_NMACT to the maximum possible (hex FFFF), to simulate tons of locking activity
- Set the RSB$M_RM_PEND flag in the RSB$L_STATUS field, to indicate a remaster operation is now pending

8) Release the LCKMGR spinlock, lower IPL, and let VMS do its job
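Step 7 amounts to a handful of field assignments. The sketch below uses a mock struct whose member names follow the slides, but whose layout, widths, and the RM_PEND bit position are illustrative, not the real RSB definition; a real implementation would use the system's structure definitions and run under the LCKMGR spinlock as described above.

```c
#include <assert.h>
#include <stdint.h>

#define RSB_M_RM_PEND 0x1u  /* assumed bit position for the flag */

/* Mock of the root-RSB fields touched in step 7 (layout illustrative). */
typedef struct {
    uint32_t rm_csid;      /* RSB$L_RM_CSID */
    uint8_t  lstcsid_idx;  /* RSB$B_LSTCSID_IDX */
    uint8_t  same_cnt;     /* RSB$B_SAME_CNT */
    uint16_t oact;         /* RSB$W_OACT */
    uint16_t nact;         /* RSB$W_NACT */
    uint16_t nmact;        /* RSB$W_NMACT */
    uint32_t status;       /* RSB$L_STATUS */
} root_rsb;

/* Make the target node look overwhelmingly busier than this node,
   and mark a remaster operation as pending. */
void trigger_remaster(root_rsb *rsb, uint32_t target_csid)
{
    rsb->rm_csid     = target_csid;
    rsb->lstcsid_idx = (uint8_t)target_csid;  /* low byte of the CSID    */
    rsb->same_cnt    = 3;                     /* bypass the cost check   */
    rsb->oact        = 0;                     /* our rate looks low      */
    rsb->nact        = 0;
    rsb->nmact       = 0xFFFF;                /* target looks very busy  */
    rsb->status     |= RSB_M_RM_PEND;         /* remaster now pending    */
}
```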

Controlling Lock Mastership By Modifying VMS Data Structures

Problem (for all methods): once PE1 is set to zero to allow the desired lock tree to migrate, other lock trees may also migrate, unwanted.

Solution: to prevent this, in all other resource trees mastered on this node:

- Clear the RM_PEND flag in L_STATUS, if set
- Set W_OACT and W_NACT to the maximum (hex FFFF)
- Zero W_NMACT, L_RM_CSID, B_LSTCSID_IDX, and B_SAME_CNT

Controlling Lock Mastership By Modifying VMS Data Structures

Pros:

- Does the job reliably
- Can avoid other resource trees "escaping"

Cons:

- High-IPL code presents some level of risk of crashing a system

See the example program REMASTER.MAR. One might instead use (in 7.2-2 and above):

    SDA> LCK REMASTER

Causes of lock queues

- Program bug (e.g. not freeing a record lock)
- I/O or interconnect saturation
- "Deadman" locks

How to detect lock queues

- Using DECamds / Availability Manager
- Using SDA
- Using other methods

Lock contention & DECamds

- DECamds can identify lock contention if a lock blocks others for 15 seconds
- The AMDS$LOCK_LOG.LOG file in AMDS$SYSTEM: contains a log of occurrences of suspected contention
- The resource-name decoding techniques shown earlier can sometimes be used to identify the file involved
- Deadman locks can be filtered out

Detecting Lock Queues with ANALYZE/SYSTEM (SDA)

A new qualifier was added to the SHOW RESOURCE command in SDA for 7.2 and above:

- SHOW RESOURCE/CONTENTION shows blocking and blocked lock requests

A new qualifier was added to the SHOW LOCK command in SDA for 7.2 and above:

- SHOW LOCK/WAITING displays blocked lock requests (but then you must determine what's blocking them)

Detecting Lock Queues with a program

- Traverse the lock database, starting with the LCK$GQ_RRSFL listhead and following the chain of root RSBs via the RSB$Q_RRSFL links
- Within each resource tree, follow the RSB$Q_SRSFL chain to examine all sub-resources, recursively

Detecting Lock Queues with a program

- Check the Wait Queue (RSB$Q_WTQFL and RSB$Q_WTQBL)
- Check the Convert Queue (RSB$Q_CVTQFL and RSB$Q_CVTQBL)
- If queues are found, display:
  - Queue length(s)
  - Resource name
  - Resource names for all parent locks, up to the root lock

See the example DCL procedure LCKQUE.COM and program LCKQUE.MAR.
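The traversal described above can be sketched with mock structures: singly-linked lists stand in for VMS's doubly-linked RSB$Q_WTQFL / RSB$Q_CVTQFL queues and the sub-resource chain, and the reporting is reduced to a count. Everything here is illustrative; the real program walks the live lock database at elevated IPL.

```c
#include <assert.h>
#include <stddef.h>

/* Mock lock block: just a link in a wait or convert queue. */
typedef struct lkb { struct lkb *next; } lkb;

/* Mock resource block: name, two lock queues, sub-resources, siblings. */
typedef struct rsb {
    const char *resnam;
    lkb        *wtqfl;   /* wait queue (RSB$Q_WTQFL stand-in)    */
    lkb        *cvtqfl;  /* convert queue (RSB$Q_CVTQFL stand-in) */
    struct rsb *srsfl;   /* first sub-resource (RSB$Q_SRSFL)      */
    struct rsb *next;    /* next sibling / next root RSB          */
} rsb;

static size_t qlen(const lkb *l)
{
    size_t n = 0;
    for (; l != NULL; l = l->next) n++;
    return n;
}

/* Recursively count resources whose combined wait + convert queue length
   reaches the threshold. A real tool would also print the resource name
   and the names of all parent locks up to the root here. */
size_t report_queues(const rsb *r, size_t threshold)
{
    size_t found = 0;
    for (; r != NULL; r = r->next) {
        if (qlen(r->wtqfl) + qlen(r->cvtqfl) >= threshold)
            found++;
        found += report_queues(r->srsfl, threshold);
    }
    return found;
}
```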

Example: Directory File Grows Large

- Symptom: a high queue length on the file serialization lock for a .DIR file
- Cause: the directory file has grown to over 127 blocks (VMS version 7.1-2 or earlier; 7.2 and later are much less sensitive to this problem)
- Fix: delete or rename files out of the directory

Lock Queue Program Example

Here are examples where a directory file got very large under 7.1-2:

    'F11B$vAPP2        ' 202020202020202032505041762442313146
      Files-11 Volume Allocation lock for volume APP2
    'F11B$sH...' 00000148732442313146
      Files-11 File Serialization lock for file [328,*,0] on volume APP2
      File specification: DISK$APP2:[]DATA.DIR;1
      Convert queue: 0, Wait queue: 95

    'F11B$vLOGFILE     ' 2020202020454C4946474F4C762442313146
      Files-11 Volume Allocation lock for volume LOGFILE
    'F11B$s....' 00000A2E732442313146
      Files-11 File Serialization lock for file [2606,*,0] on volume LOGFILE
      File specification: DISK$LOGFILE:[000000]LOGS.DIR;1
      Convert queue: 0, Wait queue: 3891
![Page 68: OpenVMS Distributed Lock Manager Performance](https://reader035.fdocuments.us/reader035/viewer/2022062217/56815747550346895dc4ea9b/html5/thumbnails/68.jpg)
Example: Fragmented File Header
Symptom: High queue length on File Serialization Lock for application data file
Cause: RMS $CONVERTs run on a disk without sufficient contiguous free space produced highly fragmented files, increasing the I/O load on the disk array. One file was so fragmented it had 3 extension file headers
Fix: Defragment disk, or do an /IMAGE Backup/Restore
![Page 69: OpenVMS Distributed Lock Manager Performance](https://reader035.fdocuments.us/reader035/viewer/2022062217/56815747550346895dc4ea9b/html5/thumbnails/69.jpg)
Lock Queue Program Example
Here's an example of the result of reorganizing RMS indexed files with $CONVERTs over a weekend without enough contiguous free space available, causing a lot of file fragmentation and dramatically increasing the I/O load on a RAID array on the next busy day (we had to fix this with a backup/restore cycle soon after). The file shown here had gotten so fragmented as to have 3 extension file headers. The lock we're queueing on here is the file serialization lock for this RMS indexed file:
'F11B$s....' 0000000E732442313146
 Files-11 File Serialization lock for file [14,*,0] on volume THDATA
 File specification: DISK$THDATA:[TH]OT.IDX;1
 Convert queue: 0, Wait queue: 28
![Page 70: OpenVMS Distributed Lock Manager Performance](https://reader035.fdocuments.us/reader035/viewer/2022062217/56815747550346895dc4ea9b/html5/thumbnails/70.jpg)
Future Directions for this Investigation Work
Concern: Locking down remastering with PE1 (to avoid lock mastership thrashing) can result in sub-optimal lock master node selections over time
![Page 71: OpenVMS Distributed Lock Manager Performance](https://reader035.fdocuments.us/reader035/viewer/2022062217/56815747550346895dc4ea9b/html5/thumbnails/71.jpg)
Future Directions for this Investigation Work
Possible ways of mitigating side-effects of preventing remastering using PE1:
 Adjust the PE1 value as high as you can without producing noticeable delays
 Upgrade to 7.2-2 or above for more-efficient remastering
 Set PE1 to 0 for short periods, periodically
 Raise the fixed threshold values in the VMS data cells LCK$GL_SYS_THRSH and particularly LCK$GL_ACT_THRSH
More-invasive automatic monitoring and control of remastering activity
Enhancements to VMS itself
![Page 72: OpenVMS Distributed Lock Manager Performance](https://reader035.fdocuments.us/reader035/viewer/2022062217/56815747550346895dc4ea9b/html5/thumbnails/72.jpg)
How to measure pent-up remastering demand
While PE1 is set to prevent remastering, sub-optimal lock mastership may result: VMS will “want” to move some lock trees but cannot
See the example DCL procedure LCKRM.COM and program LCKRM.MAR, which measure pent-up remastering demand
![Page 73: OpenVMS Distributed Lock Manager Performance](https://reader035.fdocuments.us/reader035/viewer/2022062217/56815747550346895dc4ea9b/html5/thumbnails/73.jpg)
How to measure pent-up remastering demand
LCKRM example:
Time: 16:19
----- XYZB12: -----
'RMS$..I....SS1 ...' 000000202020202020202020315353020000084900B424534D52
 RMS lock tree for file [180,2121,0] on volume SS1
 File specification: DISK$SS1:[PDATA]PDATA.IDX;1
 Pent-up demand for remaster operation is pending to node XYZB18 (CSID 00010031)
 Last CSID Index: 34, Same-count: 0
 Average lock rates: Local 44, Remote 512
 Status bits: RM_PEND
![Page 74: OpenVMS Distributed Lock Manager Performance](https://reader035.fdocuments.us/reader035/viewer/2022062217/56815747550346895dc4ea9b/html5/thumbnails/74.jpg)
Interrupt-state/stack saturation
Too much lock mastership workload can saturate the primary CPU (CPU 0) on a node
Detect this with MONITOR MODES/CPU=0/ALL
![Page 75: OpenVMS Distributed Lock Manager Performance](https://reader035.fdocuments.us/reader035/viewer/2022062217/56815747550346895dc4ea9b/html5/thumbnails/75.jpg)
Interrupt-state/stack saturation
FAST_PATH:
 Can shift interrupt-state workload off the primary CPU in SMP systems
 An even IO_PREFER_CPUS value (bit 0 clear) disables CPU 0 use; consider limiting interrupts to a subset of the non-primary CPUs
 FAST_PATH for CI: since 7.0
 FAST_PATH for Memory Channel (MC): “never”
 FAST_PATH for SCSI and Fibre Channel: 7.3 and above
 FAST_PATH for LANs (e.g. FDDI & Ethernet): slated for 7.3-1
 Even with FAST_PATH enabled, CPU 0 still receives the device interrupt, but hands it off immediately via an inter-processor interrupt
 7.3-1 is slated to allow FAST_PATH interrupts to bypass CPU 0 entirely and go directly to a non-primary CPU
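The IO_PREFER_CPUS parameter is a bitmask of CPUs eligible for Fast Path device assignment; bit 0 corresponds to CPU 0, which is why any even value excludes the primary CPU. A minimal sketch of that bitmask interpretation (the function name is ours, for illustration):

```python
# IO_PREFER_CPUS as a bitmask: bit N set means CPU N may handle
# Fast Path device interrupts.  An even value has bit 0 clear,
# so CPU 0 (the primary) is excluded.
def eligible_cpus(io_prefer_cpus: int, max_cpus: int = 32):
    return [cpu for cpu in range(max_cpus) if io_prefer_cpus & (1 << cpu)]

print(eligible_cpus(0b1110))   # even value 14: CPUs 1-3, CPU 0 excluded
```

For example, setting the mask to 14 (binary 1110) steers Fast Path work to CPUs 1 through 3 and keeps the primary CPU free for the work that must stay there.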
![Page 76: OpenVMS Distributed Lock Manager Performance](https://reader035.fdocuments.us/reader035/viewer/2022062217/56815747550346895dc4ea9b/html5/thumbnails/76.jpg)
Dedicated-CPU Lock Manager
With 7.2-2 and above, you can choose to dedicate a CPU to lock management work. This may help reduce MP_SYNC time.
LCKMGR_MODE parameter:
 0 = Disabled
 >1 = Enabled if at least this many CPUs are running
LCKMGR_CPUID parameter specifies which CPU to dedicate to the LCKMGR_SERVER process
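The enabling rule above can be stated as a one-line predicate. This is a hedged sketch of the semantics as described on the slide (a dedicated lock-manager CPU only when LCKMGR_MODE exceeds 1 and at least that many CPUs are active), not VMS source logic:

```python
# Sketch of the LCKMGR_MODE enabling rule described above:
# 0 disables the dedicated-CPU lock manager; a value > 1 enables it
# only when at least that many CPUs are currently running.
def lckmgr_dedicated(lckmgr_mode: int, active_cpus: int) -> bool:
    return lckmgr_mode > 1 and active_cpus >= lckmgr_mode

print(lckmgr_dedicated(4, 8))   # True: 8 CPUs running, threshold 4
print(lckmgr_dedicated(4, 2))   # False: too few CPUs running
```

Tying the feature to a CPU count means a node that drops below the threshold (e.g. after a CPU failure) reverts to normal lock handling rather than wasting a scarce CPU.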
![Page 77: OpenVMS Distributed Lock Manager Performance](https://reader035.fdocuments.us/reader035/viewer/2022062217/56815747550346895dc4ea9b/html5/thumbnails/77.jpg)
Example programs
Programs referenced herein may be found:
 On the VMS Freeware V5 CD, under directories [KP_LOCKTOOLS] or [KP_CLUSTERTOOLS]
 Or on the web at:
 http://www.openvms.compaq.com/freeware/freeware50/kp_clustertools/
 http://www.openvms.compaq.com/freeware/freeware50/kp_locktools/
New additions & corrections may be found at: http://encompasserve.org/~parris/
![Page 78: OpenVMS Distributed Lock Manager Performance](https://reader035.fdocuments.us/reader035/viewer/2022062217/56815747550346895dc4ea9b/html5/thumbnails/78.jpg)
Example programs
Copies of this presentation (and others) may be found at:
 http://www.geocities.com/keithparris/
![Page 79: OpenVMS Distributed Lock Manager Performance](https://reader035.fdocuments.us/reader035/viewer/2022062217/56815747550346895dc4ea9b/html5/thumbnails/79.jpg)
Questions?
![Page 80: OpenVMS Distributed Lock Manager Performance](https://reader035.fdocuments.us/reader035/viewer/2022062217/56815747550346895dc4ea9b/html5/thumbnails/80.jpg)
Speaker Contact Info:
Keith Parris
E-mail: [email protected], [email protected], [email protected]
Web: http://encompasserve.org/~parris/ and http://www.geocities.com/keithparris/