Agenda
● What’s DLM?
● Where is DLM in HA cluster?
● How to configure DLM?
● DLM in userspace
● DLM in kernel
● Interaction between userspace and kernel
What is DLM?
The Distributed Lock Manager (DLM) coordinates how cluster nodes access shared resources.
● Node-level locking
● Not only for file locks
– Protects the whole filesystem
– Supports some filesystem semantics
Brief history
● 1982: VAX (3.0), developed by Digital Equipment Corporation (DEC)
● 1992: OpenVMS (5.1), acquired by Compaq
● 2001: Red Hat developed OpenDLM for GFS after giving up working on IBM’s DLM
● 2005: David Teigland pushed DLM into mainline
● After 2012: DLM became stable and has not been in heavy development since
Some highlights of DLM
● Availability: recovers from failures by keeping a duplicated cluster-wide lock database
● Performance: achieves excellent performance by increasing the likelihood of local processing
● Elimination of bottlenecks: distributes the lock server workload among all nodes in the cluster, eliminating memory, CPU and network bottlenecks
● Kernel implementation: the main users of DLM, OCFS2 and GFS2, are kernel filesystems
Where is DLM in HA cluster? (1/2)
● dlm_controld
– An agent daemon for the DLM kernel code: membership and user interface
– Based on the services provided by corosync, i.e. closed process group (CPG), quorum and configuration
● Resource agent
– Provided by pacemaker to control the DLM daemon
● libdlm
– Passes locking operations from ocfs2 tools, clvm, etc. to the kernel through the DLM devices
● dlm_tool
– Admin and debug
● DLM kernel module
– DLM core, used by ocfs2, gfs2, clvm and clusterMD
Some Resources for DLM
● Source code
– DLM userspace code: https://git.fedorahosted.org/cgit/dlm.git
– DLM kernel module: fs/dlm/*
● RPM packages
– dlm-kmp-default-4.4.19-60.1.x86_64
– libdlm-4.0.4-15.2.x86_64
– libdlm3-4.0.4-15.2.x86_64
● DLM RA
– /usr/lib/ocf/resource.d/pacemaker/controld
Configure DLM
0. We assume that pacemaker and sbd are already properly configured.
1. Add the DLM resource
# crm configure primitive dlm ocf:pacemaker:controld \
    op start interval=0 timeout=90 \
    op stop interval=0 timeout=100 \
    op monitor interval=20 timeout=600
2. Put the DLM resource into a group
# crm configure group base-group dlm
3. Clone the group
# crm configure clone base-clone base-group
4. Check
# crm status full
DLM tools – usage overview
# dlm_tool -h
Usage:
dlm_tool [command] [options] [name]
Commands:
ls, status, dump, dump_config, fence_ack
log_plock, plocks
join, leave, lockdebug
DLM tools – Dump dlm_controld daemon state
# dlm_tool status
cluster nodeid 1084783247 quorate 1 ring seq 28 28
daemon now 348817 fence_pid 0
node 1084783118 M add 3523 rem 0 fail 0 fence 0 at 0 0
node 1084783247 M add 3523 rem 0 fail 0 fence 0 at 0 0
● seq x y
– x: cluster_ringid – ring ID from the corosync quorum service
– y: daemon_ringid – indicates whether dlm_controld is in sync with the quorum change
● Role
– M: member, U: starting-up node, X: others
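To make these fields concrete, here is a minimal Python sketch that parses the `dlm_tool status` text shown on this slide. The function names `parse_status` and `ipv4_to_nodeid` are hypothetical helpers, not part of any DLM tooling. Two assumptions are made: that equal ring values mean dlm_controld has caught up with the quorum change, and that the node IDs are derived from the IPv4 ring address with the top bit cleared, which is at least consistent with the addresses 192.168.122.14 and 192.168.122.143 that appear in the debug buffer dump.

```python
import ipaddress

# Sample text copied from the slide above.
STATUS_TEXT = """\
cluster nodeid 1084783247 quorate 1 ring seq 28 28
daemon now 348817 fence_pid 0
node 1084783118 M add 3523 rem 0 fail 0 fence 0 at 0 0
node 1084783247 M add 3523 rem 0 fail 0 fence 0 at 0 0
"""

# Role letters as listed on the slide.
ROLES = {"M": "member", "U": "starting up", "X": "other"}


def parse_status(text):
    """Extract the local nodeid, quorum flag, ring IDs and node roles."""
    info = {"nodes": {}}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "cluster":
            info["nodeid"] = int(fields[2])
            info["quorate"] = fields[4] == "1"
            info["cluster_ringid"] = int(fields[7])   # "x" on the slide
            info["daemon_ringid"] = int(fields[8])    # "y" on the slide
            # Assumption: equal values mean the daemon is in sync.
            info["in_sync"] = info["cluster_ringid"] == info["daemon_ringid"]
        elif fields[0] == "node":
            info["nodes"][int(fields[1])] = ROLES.get(fields[2], "unknown")
    return info


def ipv4_to_nodeid(addr):
    """Assumed derivation: 32-bit IPv4 address with the high bit cleared."""
    return int(ipaddress.IPv4Address(addr)) & 0x7FFFFFFF


if __name__ == "__main__":
    print(parse_status(STATUS_TEXT))
    print(ipv4_to_nodeid("192.168.122.143"))
```

On a live node the same parser could be fed the captured output of `dlm_tool status`; the exact column layout may differ between dlm versions, so treat the field indices as specific to this sample.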
DLM tools – List lockspace
# dlm_tool ls
dlm lockspaces
name          2E76BB09DD314581A62C032436F58344
id            0xc0dc6d2a
flags         0x00000000
change        member 1 joined 1 remove 0 failed 0 seq 1,1
Members       1084783247
● seq x,y
– x: the most recent completed change sequence number
– y: the most recent change sequence number
DLM tools – Dump dlm_controld debug buffer (1/3)
# dlm_tool dump
3523 dlm_controld 4.0.4 started
3523 our_nodeid 1084783247
…
3523 cmap totem.cluster_name = 'cluster'
3523 set cluster_name cluster
…
3523 cluster quorum 1 seq 28 nodes 2
3523 cluster node 1084783118 added seq 28
3523 set_configfs_node 1084783118 192.168.122.14 local 0
3523 cluster node 1084783247 added seq 28
3523 set_configfs_node 1084783247 192.168.122.143 local 1
3523 cpg_join dlm:controld …...
3523 daemon joined 1084783247
…
3523 daemon joined 1084783118
…
DLM tools – Dump dlm_controld debug buffer(2/3)
342259 uevent: add@/kernel/dlm/2E76BB09DD314581A62C032436F58344
342259 kernel: add@ 2E76BB09DD314581A62C032436F58344
342259 uevent: online@/kernel/dlm/2E76BB09DD314581A62C032436F58344
342259 kernel: online@ 2E76BB09DD314581A62C032436F58344
342259 2E76BB09DD314581A62C032436F58344 cpg_join dlm:ls:2E76BB09DD314581A62C032436F58344 ...
…
342259 2E76BB09DD314581A62C032436F58344 start_kernel cg 1 member_count 1
342259 write "3235671338" to "/sys/kernel/dlm/2E76BB09DD314581A62C032436F58344/id"
342259 set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/2E76BB09DD314581A62C032436F58344/nodes/1084783247"
342259 write "1" to "/sys/kernel/dlm/2E76BB09DD314581A62C032436F58344/control"
342259 write "0" to "/sys/kernel/dlm/2E76BB09DD314581A62C032436F58344/event_done"
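The dump lines above show dlm_controld driving the kernel through sysfs writes. As a small illustration, the Python sketch below extracts those writes from the three lines shown on this slide and checks that the value written to `id` is the decimal form of the `id 0xc0dc6d2a` reported by `dlm_tool ls`. The helper name `sysfs_writes` and the regular expression are hypothetical and only match the line shape seen here.

```python
import re

# The three "write" lines from the slide above.
DUMP_LINES = [
    '342259 write "3235671338" to "/sys/kernel/dlm/2E76BB09DD314581A62C032436F58344/id"',
    '342259 write "1" to "/sys/kernel/dlm/2E76BB09DD314581A62C032436F58344/control"',
    '342259 write "0" to "/sys/kernel/dlm/2E76BB09DD314581A62C032436F58344/event_done"',
]

# Hypothetical pattern: timestamp, quoted value, quoted sysfs path.
WRITE_RE = re.compile(r'^(\d+) write "([^"]*)" to "([^"]*)"$')


def sysfs_writes(lines):
    """Return (attribute, value) pairs for every sysfs write in the dump."""
    writes = []
    for line in lines:
        match = WRITE_RE.match(line)
        if match:
            value, path = match.group(2), match.group(3)
            writes.append((path.rsplit("/", 1)[1], value))
    return writes


if __name__ == "__main__":
    for attr, value in sysfs_writes(DUMP_LINES):
        print(attr, value)
    # The lockspace id from "dlm_tool ls", converted from hex to decimal.
    print(int("0xc0dc6d2a", 16))
```

Reading the writes in order gives the join sequence for a new lockspace as it appears in this dump: set the id, start the kernel side through `control`, then acknowledge the uevent through `event_done`.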
DLM tools – Dump dlm_controld debug buffer (3/3)
344456 2E76BB09DD314581A62C032436F58344 add_change cg 2 counts member 2 joined 1 remove 0 failed 0
344456 2E76BB09DD314581A62C032436F58344 stop_kernel cg 2
344456 write "0" to "/sys/kernel/dlm/2E76BB09DD314581A62C032436F58344/control"
…
344456 2E76BB09DD314581A62C032436F58344 check_fencing done
...
344456 2E76BB09DD314581A62C032436F58344 match_change 1084783118:1 matches cg 2
344456 2E76BB09DD314581A62C032436F58344 wait_messages cg 2 need 1 of 2
344456 2E76BB09DD314581A62C032436F58344 receive_start 1084783247:2 len 80
344456 2E76BB09DD314581A62C032436F58344 match_change 1084783247:2 matches cg 2
344456 2E76BB09DD314581A62C032436F58344 wait_messages cg 2 got all 2
344456 2E76BB09DD314581A62C032436F58344 start_kernel cg 2 member_count 2
344456 dir_member 1084783247
344456 set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/2E76BB09DD314581A62C032436F58344/nodes/1084783118"
344456 write "1" to "/sys/kernel/dlm/2E76BB09DD314581A62C032436F58344/control"
DLM tools – display of locks from the lockspace(1/3)
#dlm_tool lockdebug 2E76BB09DD314581A62C032436F58344
Resource len 12 "version_lock"Master LVB len 64 seq 101 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00Granted00000001 PR…Resource len 31 "P000000000000000000000000000000"Master Granted00000007 EX
21
DLM tools – display of locks from the lockspace(2/3)
● On node 1084783247:
Resource len 31 "M0000000000000000000005b1a11150"
Master
LVB len 64 seq 9
05 00 00 01 00 00 00 00 00 00 00 00 00 00 00 15 f8 96 97 92 f1 32 7b 15 f8 96 95 9a f4 7e ff 15 f8 96 95 9a f4 7e ff 00 00 00 00 00 00 0f 38 41 ed 00 03 00 00 00 00 b1 a1 11 50 00 00 00 00
Granted
00000011 EX Remote: 1084783118 00000005
00000006 NL
DLM tools – display of locks from the lockspace(3/3)
● On node 1084783118
Resource len 31 "M0000000000000000000005b1a11150"
Master:1084783247
Granted
00000005 EX Master: 1084783247 00000011
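The 31-character resource names in these dumps are chosen by the filesystem, not by DLM. As a hedged illustration, the Python sketch below splits such a name under the assumption that it follows the OCFS2 layout of one lock-type character, six padding characters, a 16-digit hexadecimal block number and an 8-digit hexadecimal generation. That layout is an assumption taken from OCFS2's lock naming, not something the slides state, and `decode_ocfs2_lock_name` is a hypothetical helper.

```python
def decode_ocfs2_lock_name(name):
    """Split a 31-character lock name into type, block number and generation.

    Assumed layout (not stated in the slides):
      1 char type + 6 chars padding + 16 hex digits blkno + 8 hex digits generation
    """
    if len(name) != 31:
        raise ValueError("expected a 31-character lock name")
    return {
        "type": name[0],
        "blkno": int(name[7:23], 16),
        "generation": int(name[23:31], 16),
    }


if __name__ == "__main__":
    # Build the name from its assumed parts rather than retyping it.
    name = "M" + "0" * 6 + format(5, "016x") + "b1a11150"
    print(name, decode_ocfs2_lock_name(name))
```

Under that assumption the "M..." resource above refers to block 5 with generation 0xb1a11150. The same four bytes, `b1 a1 11 50`, also appear near the end of the LVB printed on node 1084783247, which fits this reading.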
DLM library
DLM library is mainly used by ocfs2-tools, gfs2-utils and clvm code.
[1] Programming Locking Applications:
http://opendlm.sourceforge.net/cvsmirror/opendlm/docs/dlmbook_final.pdf
core concepts (1/2)
● Lockspace
– Independent namespaces for different DLM service instances
● Resource block (RSB)
– Represents a shared resource
– Master RSB and copy RSB
● Resource directory
– Stores the mapping of resource names to master node ID
● Lock block (LKB)
– Represents a lock request
– LKB modes: NL, PR and EX
– Lock queues: granted queue, convert queue and waiting queue
● (Blocking) asynchronous system trap (AST/BAST)
– Supports non-blocking (asynchronous) locking
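The interplay of lock modes, queues and BASTs listed above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the kernel's algorithm: it covers only the three modes named on the slide, uses the usual compatibility rule (NL is compatible with everything, PR with NL and PR, EX only with NL), leaves out the convert queue, and grants waiters strictly in FIFO order. The class `Resource` and its methods are hypothetical names.

```python
from collections import deque

# For each requested mode, the set of already-granted modes it can coexist with.
COMPAT = {
    "NL": {"NL", "PR", "EX"},
    "PR": {"NL", "PR"},
    "EX": {"NL"},
}


class Resource:
    """Toy stand-in for an RSB with a granted queue and a waiting queue."""

    def __init__(self, name):
        self.name = name
        self.granted = []        # list of (owner, mode)
        self.waiting = deque()   # FIFO of (owner, mode)

    def _grantable(self, mode):
        return all(held in COMPAT[mode] for _, held in self.granted)

    def request(self, owner, mode):
        """Grant immediately if possible, otherwise queue the request.

        Returns (state, blockers); the blockers are the holders that would
        receive a blocking AST (BAST) in a real DLM.
        """
        if not self.waiting and self._grantable(mode):
            self.granted.append((owner, mode))
            return "granted", []
        self.waiting.append((owner, mode))
        blockers = [o for o, held in self.granted if held not in COMPAT[mode]]
        return "waiting", blockers

    def release(self, owner):
        """Drop the owner's lock and grant waiters in order; returns new holders."""
        self.granted = [(o, m) for o, m in self.granted if o != owner]
        newly_granted = []
        while self.waiting and self._grantable(self.waiting[0][1]):
            entry = self.waiting.popleft()
            self.granted.append(entry)
            newly_granted.append(entry[0])   # these would receive a completion AST
        return newly_granted


if __name__ == "__main__":
    res = Resource("example")
    print(res.request("node1", "PR"))
    print(res.request("node2", "EX"))
    print(res.release("node1"))
```

In this model the second request is queued behind the PR holder, which is the point at which a real DLM would deliver a BAST to that holder; once it releases, the EX request moves to the granted queue.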
core concepts (2/2)
● Lock status block (LKSB)
– Used by a DLM client to exchange lock status with DLM
● Lock value block (LVB)
– A 32- or 64-byte memory block
– Always kept up to date among lock holders
LVB (2/2)
Node  Operation type   Mode          Copying
1     initial request  any           A → B
2     initial request  any           A → C, C → E, E → D
1     conversion       NL to higher  A → B
1     conversion       EX to lower   B → A
2     conversion       NL to higher  A → C, C → E, E → D
2     conversion       EX to lower   D → F, F → C, C → A
1     unlock           EX            B → A
2     unlock           EX            D → F, F → C, C → A
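The node 1 rows of this table can be sketched as a small Python model. It rests on an interpretation that the table itself does not spell out: that "A" is the LVB kept with the master copy of the resource and "B" is the LVB in the caller's lock. The network hops for node 2 (buffers C to F) are left out, mode compatibility is not enforced, and the class names and the 64-byte length (taken from "LVB len 64" in the lockdebug output) are assumptions.

```python
LVB_LEN = 64                          # assumed, matches "LVB len 64" in lockdebug
RANK = {"NL": 0, "PR": 1, "EX": 2}    # only the modes named in these slides


class Resource:
    def __init__(self):
        self.lvb = bytearray(LVB_LEN)          # buffer "A" (assumed meaning)


class Lock:
    def __init__(self, resource, mode="NL"):
        self.resource = resource
        self.mode = mode
        self.lvb = bytearray(resource.lvb)     # initial request: A -> B

    def convert(self, new_mode):
        if self.mode == "NL" and RANK[new_mode] > RANK["NL"]:
            self.lvb[:] = self.resource.lvb    # NL to higher: A -> B
        elif self.mode == "EX" and RANK[new_mode] < RANK["EX"]:
            self.resource.lvb[:] = self.lvb    # EX to lower: B -> A
        self.mode = new_mode

    def unlock(self):
        if self.mode == "EX":
            self.resource.lvb[:] = self.lvb    # unlock from EX: B -> A
        self.mode = None


if __name__ == "__main__":
    res = Resource()
    writer, reader = Lock(res), Lock(res)
    writer.convert("EX")
    writer.lvb[0] = 0x2A      # update the value while holding EX
    writer.convert("NL")      # publishes the value to the resource
    reader.convert("PR")      # picks the value up on the up-conversion
    print(reader.lvb[0])
```

The design point the table makes is that a value only travels from a lock to the resource when an EX holder gives up exclusivity, and only travels to a lock when it moves up from NL.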
DLM operations - Lockspace Operations
1. int dlm_new_lockspace(const char *name, const char *cluster,
uint32_t flags, int lvblen,
const struct dlm_lockspace_ops *ops, void *ops_arg,
int *ops_result, dlm_lockspace_t **lockspace);
2. int dlm_release_lockspace(void *lockspace, int force);
DLM operations - acquiring or converting a Lock
1. int dlm_lock(dlm_lockspace_t *lockspace, int mode,
struct dlm_lksb *lksb, uint32_t flags,
void *name, unsigned int namelen, uint32_t parent_lkid,
void (*ast) (void *astarg), void *astarg,
void (*bast) (void *astarg, int mode));
2. int dlm_unlock(dlm_lockspace_t *lockspace, uint32_t lkid,
uint32_t flags, struct dlm_lksb *lksb, void *astarg);
DLM operations - posix lock operations
1. int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file, int cmd, struct file_lock *fl);
2. int dlm_posix_unlock(dlm_lockspace_t *lockspace, u64 number, struct file *file, struct file_lock *fl);
3. int dlm_posix_get(dlm_lockspace_t *lockspace, u64 number, struct file *file, struct file_lock *fl);
configfs
# tree /sys/kernel/config/dlm
dlm
└── cluster
    ├── buffer_size
    ├── cluster_name
    ├── comms
    │   ├── 1084783118
    │   │   ├── addr
    │   │   ├── addr_list
    │   │   ├── local
    │   │   └── nodeid
    │   └── 1084783247
    ...
    ├── log_debug
    ├── log_info
    ├── new_rsb_count
    ├── protocol
    ├── recover_callbacks
    ├── recover_timer
    ├── rsbtbl_size
    ├── scan_secs
    ├── spaces
    │   ├── 2E76BB09DD314581A62C032436F58344
    │   │   └── nodes
    │   │       └── 1084783247
    │   │           ├── nodeid
    │   │           └── weight
    │   └── clvmd
    ….
    ├── tcp_port
    ├── timewarn_cs
    ├── toss_secs
    └── waitwarn_us
debugfs
# tree /sys/kernel/debug/dlm/
/sys/kernel/debug/dlm/
├── BC65B6D742274FEDA223B6E605EF962C
├── BC65B6D742274FEDA223B6E605EF962C_all
├── BC65B6D742274FEDA223B6E605EF962C_locks
├── BC65B6D742274FEDA223B6E605EF962C_toss
├── BC65B6D742274FEDA223B6E605EF962C_waiters
├── clvmd
├── clvmd_all
├── clvmd_locks
├── clvmd_toss
└── clvmd_waiters
sysfs
# tree /sys/kernel/dlm/
/sys/kernel/dlm/
├── BC65B6D742274FEDA223B6E605EF962C
│   ├── control
│   ├── event_done
│   ├── id
│   ├── nodir
│   ├── recover_nodeid
│   └── recover_status
└── clvmd
    ├── control
    ├── event_done
    ├── id
    ├── nodir
    ├── recover_nodeid
    └── recover_status
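As a rough illustration of how a script could inspect this sysfs layout, the Python sketch below walks a directory shaped like the tree above and collects each lockspace's attributes. The function name `read_lockspaces` is hypothetical. It assumes that some attributes may not be readable (for example if they are write-only for dlm_controld), in which case it records `None` instead of failing.

```python
from pathlib import Path


def read_lockspaces(root="/sys/kernel/dlm"):
    """Return {lockspace: {attribute: value or None}} for a sysfs-like tree."""
    result = {}
    base = Path(root)
    if not base.is_dir():
        return result            # DLM module not loaded, or wrong path
    for ls_dir in sorted(base.iterdir()):
        if not ls_dir.is_dir():
            continue
        attrs = {}
        for attr in sorted(ls_dir.iterdir()):
            try:
                attrs[attr.name] = attr.read_text().strip()
            except OSError:
                attrs[attr.name] = None   # unreadable attribute
        result[ls_dir.name] = attrs
    return result


if __name__ == "__main__":
    for name, attrs in read_lockspaces().items():
        print(name, attrs)
```

Passing a different `root` makes it possible to try the function against a scratch directory instead of a live cluster node.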
dlm device and udev
# tree /dev/misc/
/dev/misc/
├── dlm_BC65B6D742274FEDA223B6E605EF962C -> ../dlm_BC65B6D742274FEDA223B6E605EF962C
├── dlm_clvmd -> ../dlm_clvmd
├── dlm-control -> ../dlm-control
├── dlm-monitor -> ../dlm-monitor
├── dlm_plock -> ../dlm_plock
# cat /usr/lib/udev/rules.d/51-dlm.rules
KERNEL=="dlm-control", MODE="0666", SYMLINK+="misc/dlm-control"
KERNEL=="dlm-monitor", MODE="0666", SYMLINK+="misc/dlm-monitor"
KERNEL=="dlm_plock", MODE="0666", SYMLINK+="misc/dlm_plock"
KERNEL=="dlm_*", MODE="0660", SYMLINK+="misc/%k"