SLURM Migration Experience at
Lawrence Berkeley National Laboratory
Presented by
Jacqueline Scoggins
System Analyst, HPCS Group
Slurm Conference September 22-23, 2014
Lugano, Switzerland
Who are we?
A National Laboratory Bringing Science Solutions to the World
HPCS Group
High Performance Computing Services
# We began providing HPCS in 2003, supporting 10 departmental clusters.
# Today we manage, throughout the LBNL campus and a subset of the UC Berkeley campus:
  over 30 clusters, ranging from 4-node to 240-node clusters
  over 2000 user accounts
  over 140 projects
OUR ENVIRONMENT
# We use the term "SuperCluster":
  Single master node - provisioning node and scheduler system
  Multiple interactive login nodes
  Multiple clusters
# Institutional and departmental purchased nodes
# Global home filesystems
# Flattened network topology, so everything communicates with the management/interactive nodes
[Figure: SuperCluster layout - Global Filesystem, Interactive User Nodes (OTP access), Management & Scheduler Node, Compute Nodes]
OUR SETUP
# Nodes are stateless and provisioned using Warewulf (interactive, compute and storage nodes)
# Jobs could not span across clusters
# All user accounts exist on the interactive nodes
# All user accounts exist on the scheduler node, but with nologin access.
# User accounts are only added to the compute nodes they are allowed to run on.
Moab Configuration (cont'd.)
# ACLs on users/groups and the resources they could use, set in the Moab CLASSCFG and related CFG entries; they were also configured within TORQUE for each queue
# Limits/Policies
Moab Configuration (cont'd.)
# Account Allocations - Gold banking for charging CPU cycles used, and setting up rates for node types/features
# Backfill
# Standing Reservations
# Condo Computing
What made us migrate?
# Too many unpredictable scheduler issues
# Existing support contract was up for renewal
# Slurm had the features and capabilities we needed
# Slurm architectural design was less complex
What did we need to migrate?
# Single scheduler capable of managing department owned and institutional owned nodes
# Scheduler needed to be able to separate users' access from nodes they do not have accounts on
# Scheduler needed to be able to avoid scheduling conflicts with users from other groups
What did we need to migrate?
# Scheduler needed to be able to have limits on jobs/partitions/nodes independently
# Needed an allocation management scheme to be able to continue charging users
# Needed to be able to use a Node Health Check method for managing unhealthy nodes (see the sketch below)
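Slurm has hooks that such a health-check method can plug into. The fragment below is only a hypothetical sketch: the script path and interval are examples, not our production values.

  # Hypothetical slurm.conf fragment wiring a node-health-check script into slurmd
  HealthCheckProgram=/usr/sbin/nhc    # assumed install path of the health-check script
  HealthCheckInterval=300             # run the check on every node every 5 minutes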
What were some of our Challenges?
# Migrating hundreds of users and more than ten clusters within a 4-month period to a new system
# Implementation design:
  How will we configure the queues?
  How will we get fairshare and backfill to work?
  How will we implement our Accounting/Charging model? (Condo vs non-condo usage on the institutional cluster)
  Can we control user access to the nodes?
How did we configure Slurm?
# Jobs run on nodes exclusively or shared
# Multiple partitions to separate the clusters from each other
# QOSs to set up the limits for the jobs/partitions, and which ones a user can use within a partition
# Backfill
# Fairshare
# Priority job scheduling
# Network topology configured to group the nodes and switches to best place the jobs
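A rough slurm.conf sketch of how those pieces can fit together; the partition names, node ranges, groups and priority weights below are invented for illustration and are not our actual settings.

  # Backfill scheduling with multifactor (fairshare/age) job priority
  SchedulerType=sched/backfill
  PriorityType=priority/multifactor
  PriorityWeightFairshare=100000
  PriorityWeightAge=1000
  # Enforce the associations, limits and QOSs stored in the accounting database
  AccountingStorageEnforce=associations,limits,qos
  # Describe the switch layout so jobs are placed on nearby nodes
  TopologyPlugin=topology/tree
  # One partition per cluster; AllowGroups keeps other clusters' users off,
  # Shared controls whether jobs get whole nodes or may share them
  PartitionName=lr2  Nodes=n[0000-0127]  AllowGroups=lr2users  Shared=EXCLUSIVE State=UP
  PartitionName=mako Nodes=mako[000-063] AllowGroups=makousers Shared=YES       State=UP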
How did we migrate?
We ran both schedulers simultaneously during the 4-month process
We migrated either a set of clusters or a single cluster over to Slurm at a time
We created scripts for adding accounts and users (a sketch follows below)
We had to educate our users
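Those account/user scripts were essentially wrappers around sacctmgr. A minimal sketch of the idea, assuming a colon-separated username:project input file (the file name, format and helper are invented; the real scripts did more):

  #!/usr/bin/env python
  # Sketch: bulk-load projects and users into the Slurm accounting database.
  # Assumed input format, one record per line:  username:project
  import csv
  import subprocess

  def sacctmgr(*args):
      # "-i" (immediate) answers the commit prompt so the script runs unattended
      subprocess.call(["sacctmgr", "-i"] + list(args))

  with open("users.txt") as fh:
      for row in csv.reader(fh, delimiter=":"):
          if len(row) < 2:
              continue                      # skip blank/malformed lines
          username, project = row[0], row[1]
          sacctmgr("add", "account", project)                     # harmless if it already exists
          sacctmgr("add", "user", username, "Account=" + project)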
Problems Encountered
Accounting: we had to create a separate mechanism for managing charging. We used our existing user data text file, adding a few fields to identify a user with all of the projects they were associated with. We need the key values (username, project id and account name) for billing the users, and Slurm does not have that association in its DB.
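A sketch of the kind of join that mechanism performs; the file name, field layout and date range below are assumptions, not our actual billing code.

  #!/usr/bin/env python
  # Sketch: join a site user-data file with Slurm usage for billing.
  # Assumed file layout, one record per line:  username:project_id:billing_account
  import subprocess
  from collections import defaultdict

  billing = {}
  with open("/etc/hpcs/userdata.txt") as fh:
      for line in fh:
          user, project, account = line.strip().split(":")[:3]
          billing[user] = (project, account)

  # Per-job CPU seconds for the billing period (parsable output, no header)
  out = subprocess.check_output(
      ["sacct", "-a", "-X", "-n", "-P",
       "--starttime=2014-08-01", "--endtime=2014-09-01",
       "--format=User,CPUTimeRAW"], universal_newlines=True)

  usage = defaultdict(int)
  for line in out.splitlines():
      user, cpu_seconds = line.split("|")[:2]
      if user and cpu_seconds.isdigit():
          usage[user] += int(cpu_seconds)

  for user, seconds in sorted(usage.items()):
      project, account = billing.get(user, ("unknown", "unknown"))
      print("%s %s %s %.1f core-hours" % (user, project, account, seconds / 3600.0))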
Problems Encountered
Backfill was not properly picking up all of the queued jobs that could fit into the schedule. We had to change some of the parameters:
TreeWidth = 4096
SchedulerParameters = bf_continue
Default/min job limits: the ability to set default or minimum limits is not available. It would be good to have at least a default setting for wallclock limits, in case a user forgets to request one.
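For reference, a hypothetical slurm.conf fragment with parameters of the kind mentioned above (the bf_max_job_test value and the partition line are examples, not our settings); a partition-level DefaultTime is one way to supply a fallback wallclock limit when a user forgets to request one, though it does not cover every case.

  # bf_continue lets the backfill pass resume where it left off after releasing locks;
  # bf_max_job_test raises how many queued jobs each backfill cycle considers
  SchedulerParameters=bf_continue,bf_max_job_test=1000
  # Fan-out of the slurmd communication tree
  TreeWidth=4096
  # Fallback and ceiling for job wallclock limits on this partition
  PartitionName=lr3 Nodes=n[0000-0511] DefaultTime=01:00:00 MaxTime=72:00:00 State=UP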
Condo vs Non-Condo Computing
Condo Computing - a PI purchases nodes and we incorporate them into the compute node environment. The users of the condo may run for free, but only on their contributed nodes. If they run outside of their contributed condo resources, they are subject to the same recharge rates as all other users.
Non-condo Computing - institutional compute nodes, plus condo compute nodes not in use by the condo users. Users are charged a recharge rate of $0.01 per Service Unit (SU).
NOTE: We reduce the charge rate by 25% for each generation of nodes:
lr1 - 0.50 SU per core-CPU hour ($0.005 per core/hr)
mako and lr2 - 0.75 SU per core-CPU hour ($0.0075 per core/hr)
lr3 - 1 SU per core-CPU hour ($0.01 per core/hr)
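As a worked example of these rates (the job size and runtime are invented): a 16-core job that runs for 10 hours on lr2 uses 16 × 10 = 160 core-CPU hours; at 0.75 SU per core-CPU hour that is 160 × 0.75 = 120 SU, and at $0.01 per SU the recharge comes to $1.20.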
Plus side of SLURM
# Capabilities for separating users' resource pools via accounting associations
# Capabilities for setting up fairshare from a parent level: the children inherit equal shares and individually get lower priority if they use the resources over a window of time
# So far it has been simple and easy to use
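A hypothetical sacctmgr sequence illustrating that parent/child fairshare setup (the account and user names are invented):

  # Give the parent account its share of the machine
  sacctmgr -i add account chem_dept Fairshare=100
  # Users under it inherit equal shares from the parent
  sacctmgr -i add user alice Account=chem_dept Fairshare=parent
  sacctmgr -i add user bob   Account=chem_dept Fairshare=parent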