gompute


Transcript of gompute

  • 8/11/2019 gompute

    1/42

    Introduction to Gompute


    Table of Contents

    Introduction
    Commonly used commands
        qstat
        qrsh
        qmon
        qmod
        qalter
        qhold
        qrls
    Frequent Operations
        Disable a node
        Enable a node
        Increase job priority
        Decrease job priority
        Suspend a job
        Resume a suspended job
    Additional information
    Additional support provided by GomputeServer
    gclicenses
        Setup
        Configuration
    GCSGEFlexLMD
        Introduction
        Setup
        Configuring the interface
        Optional transformation commands
        Developer Information
    Modularized Environment
        Where is GomputeServer located?
        Standard components in GomputeServer
        Sun Grid Engine
        Modules
        Java
    Quick Tips
    Configuration
        Breakdown
    Module
        Definition
        Constructor
        Resource Calculations
        Prolog Generation
        Application Command Generation

    Gridcore AB Orgnr: 556629-7866 Ashebergsgatan 46 Telephone: 0 ! !8 2! 604!! Göteborg


    Introduction

    GomputeServer is Gridcore's pre-packaged application distribution.

    GomputeServer ranges from basic mathematical libraries, CAD, meshing and visualization software to self-contained continuum mechanics solvers. All GomputeServer software components are bundled with a compliant environment, which is necessary for proper execution of the software. GomputeServer complements the operating systems supported by the software providers, currently SUSE Linux Enterprise Server and Red Hat Enterprise Linux. Applications supported by GomputeServer include all Ansys products (e.g. Fluent, Ansys FEM, CFX), Abaqus, LS-DYNA, OpenFOAM and MSC products.


    Queue System

    Introduction

    GomputeServer includes Grid Engine as the queue system for load management and optimal utilization of resources. Grid Engine itself is an open source product and provides a lot of flexibility in policy management. This section of the GomputeServer documentation highlights a few commonly used Grid Engine commands; users are strongly encouraged to read the man pages included with the GomputeServer installation to gain a better understanding of the queue system.

    Commonly used commands

    qstat

    qstat can be used to display the status of jobs and queues.

    qstat              Displays your jobs
    qstat -u "*"       Displays all jobs
    qstat -f           Full listing of queues
    man qstat          Exhaustive information

    qrsh

    qrsh can be used to obtain an interactive shell scheduled through the queue system.

    man qrsh           Exhaustive information

    qmon

    Graphical administration tool for Grid Engine.

    qmod

    Modify queue states.

    qmod -d all.q      Disables all.q
    qmod -e all.q      Enables all.q
    man qmod           Exhaustive information

    qalter

    qalter can be used to alter job requests and priorities for jobs already in the queue. Note that ordinary users can only decrease the priority of their own jobs, while managers can increase or decrease priorities for all jobs.

    man qalter         Exhaustive information

    qhold

    qhold is used to put a queued job on hold. Holding a job in the queue prevents the scheduler from attempting to schedule it.

    This is useful when you have many jobs queued and waiting to run on the cluster, and you realize that you want a job at the back of the queue to be scheduled before jobs at the head of the queue. Every user can perform this operation on his


    qalter -p -100 146

    Suspend a job

    Suspend a job with ID 146:

    qmod -s 146

    Sends a suspend signal to the job, i.e. similar to kill -STOP. The effect of suspending a job depends heavily on how the application copes with being stopped by kill -STOP.

    Resume a suspended job

    Resume a job with ID 146:

    qmod -us 146

    Resumes a previously suspended job, i.e. sends kill -CONT to the job. As with job suspension, the behavior of the application depends on how it handles these signals.
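    To see what the default suspend and resume amount to at the process level, the minimal Python 3 sketch below stops and continues a local child process with SIGSTOP and SIGCONT, the signals behind kill -STOP and kill -CONT. It only illustrates the signals and does not talk to Grid Engine; the stand-in "job" (a sleeping Python child process) and the function names are assumptions for the demonstration.

```python
import os
import signal
import subprocess
import sys


def suspend(pid):
    """Stop a process, comparable to kill -STOP."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid):
    """Continue a stopped process, comparable to kill -CONT."""
    os.kill(pid, signal.SIGCONT)


def demo():
    """Return (was_stopped, was_continued) for a short-lived stand-in job."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        suspend(child.pid)
        # WUNTRACED makes waitpid report a child that has been stopped.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        was_stopped = os.WIFSTOPPED(status)

        resume(child.pid)
        # WCONTINUED makes waitpid report a stopped child that was resumed.
        _, status = os.waitpid(child.pid, os.WCONTINUED)
        was_continued = os.WIFCONTINUED(status)
        return was_stopped, was_continued
    finally:
        child.kill()
        child.wait()
```

    SIGSTOP cannot be caught or ignored, so the stopped program gets no chance to run code at the moment it is paused.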

    Additional information

    Additional information on Grid Engine can be found at http://wikis.sun.com/display/GridEngine/Home

    Additional support provided by GomputeServer

    Even though the queue system sends the relevant kill signals to the jobs, it is up to each job to handle these signals so that suspension and resumption work efficiently. It is, however, possible to invoke custom commands during the suspension and resumption procedure to trigger custom suspension, resumption and termination procedures. To do this, place scripts named terminate, suspend or resume in the working directory of the job. These scripts, if present, will be invoked with the job's PID passed to them as an argument.

    The default procedures that are invoked are located in the directory /opt/gcdistro/gcportal/sbin.
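    The hook mechanism can be sketched in Python 3. The code below is a hypothetical terminate hook: it takes the job's PID as its only argument, as described above, sends SIGTERM so the application can shut down cleanly, and falls back to SIGKILL after a grace period. The grace period, the escalation policy and the function names are assumptions for illustration; they are not the behavior of the default procedures in /opt/gcdistro/gcportal/sbin.

```python
import os
import signal
import sys
import time


def process_exists(pid):
    """Return True if a process with this PID exists (signal 0 sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def terminate(pid, grace_seconds=10.0, poll_seconds=0.2):
    """Send SIGTERM, wait up to grace_seconds, then SIGKILL if still present."""
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if not process_exists(pid):
            return
        time.sleep(poll_seconds)
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # it exited just after the last check


def main(argv):
    """Entry point; argv holds the arguments after the script name."""
    if len(argv) != 1 or not argv[0].isdigit():
        print("usage: terminate <pid>", file=sys.stderr)
        return 2
    terminate(int(argv[0]))
    return 0
```

    To act as the hook, such a file would be saved as terminate in the job's working directory, made executable with a Python 3 shebang line, and end with a guard such as if __name__ == "__main__": sys.exit(main(sys.argv[1:])).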


    License Integration

    The GomputeServer license integration consists of two main parts:

    gclicenses is responsible for starting the license server.

    GCSGEFlexLMD integrates the queue system with the license servers.


    gclicenses

    Setup

    gclicenses is a generic init script provided with GomputeServer to configure, start


            lmreread -c /opt/gcdistro/login02/app/cdadapco/license.dat'
            # Remember status and be verbose
            rc_status -v
        fi
        ;;

    ## Status section for ANSYS and CD-adapco license managers
    status)
        echo "Status of gclicenses: "

        if [ ! "$2" -o "$2" == "ansys" ]; then
            echo -n " ansys "
            /sbin/checkproc /opt/gcdistro/app/ansys/licensing/ansyslmd
            # NOTE: rc_status knows that we called this init script with
            # "status" option and adapts its messages accordingly.
            rc_status -v
        fi
        if [ ! "$2" -o "$2" == "star" ]; then
            echo -n " star "
            /sbin/checkproc /opt/gcdistro/app/cdadapco/FLEXlm/11.…/bin/cdlmd
            # NOTE: rc_status knows that we called this init script with
            # "status" option and adapts its messages accordingly.
            rc_status -v
        fi
        ;;

    ## Restart section for ANSYS and CD-adapco license managers
    restart)
        ## Stop the service and regardless of whether it was
        ## running or not, start it again.
        $0 stop $2
        sleep 5
        $0 start $2
        # Remember status and be quiet
        rc_status
        ;;


    GCSGEFlexLMD

    Introduction

    GCSGEFlexLMD is the interface between GomputeServer and FlexLM-based license servers. This interface is responsible for making sure that the count of available and free licenses is maintained within the queue system. The license features are configured automatically as consumables within the queue system. Users submitting jobs to the queue system can request these consumables to make sure that their jobs are started only when sufficient licenses are available. These consumables can be requested using the standard Grid Engine complex request flags; see man 5 complex for more details.

    Setup

    The gcsgeflexlmd program is located in the following directory:

    Executable
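    As an illustration of requesting license consumables, the Python 3 sketch below only builds a qsub argument list using the standard -pe and -l flags; it does not submit anything. The feature name acfd_fluent_solver and the smp parallel environment appear elsewhere in this document, while the helper name and the script name run.sh are hypothetical. Because Grid Engine multiplies consumable requests by the number of slots of a parallel job (see the Fluent module walkthrough), the amounts passed in are per slot.

```python
def qsub_command(script, consumables, pe=None, slots=None):
    """Build (but do not run) a qsub argv that requests license consumables.

    consumables maps a complex name, such as a license feature, to the
    amount requested per slot.
    """
    argv = ["qsub"]
    if pe is not None and slots is not None:
        argv += ["-pe", pe, str(slots)]
    if consumables:
        request = ",".join("%s=%s" % (name, consumables[name])
                           for name in sorted(consumables))
        argv += ["-l", request]
    argv.append(script)
    return argv
```

    On a submit host the resulting list could be passed to subprocess.run.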


    Optional transformation commands

    For certain applications, such as Nastran and ANSYS Fluent, different sets of license features can be used to run the application. For example, ANSYS Fluent can use either the acfd_solver or the acfd_fluent_solver feature to start the solver. In these cases it is convenient to present both features under the same feature name in the queue system. The transformation options in the gcsgeflexlmd configuration are intended for this purpose.

    To combine acfd_solver and acfd_fluent_solver and present both as acfd_fluent_solver in the queue system, an entry like the one below needs to be made in the gcsgeflexlmd configuration:

    FEATURES_1="acfd_fluent_solver anshpc_pack"      # FlexLM features to monitor
    LM_LICENSE_FILE_1="1055@licserv"                 # Details of license server
    XFRM_1="acfd_fluent:acfd_fluent_solver"          # Optional transformation commands

    Developer Information

    FEATURES_1
        A space-separated list of license features to track.
    LM_LICENSE_FILE_1
        A colon-separated list of FLEXlm servers to query.
    XFRM_1
        A space-separated list of feature name transformations of the form oldname:newname.

    FEATURES_2
        A space-separated list of license features to track.
    LM_LICENSE_FILE_2
        A colon-separated list of FLEXlm servers to query.
    XFRM_2
        A space-separated list of feature name transformations of the form oldname:newname.

    The second set of variables is used if the features reported are incompatible between some license servers.

    LMSTAT
        The lmstat command to use while querying servers.

    PRODUCTS_1
        A space-separated list of products to track.
    RLM_LICENSE_1
        A colon-separated list of RLM servers to query.
    XFRM_P_1
        A space-separated list of product name transformations of the form oldname:newname.

    PRODUCTS_2
        A space-separated list of products to track.
    RLM_LICENSE_2
        A colon-separated list of RLM servers to query.
    XFRM_P_2
        A space-separated list of product name transformations of the form oldname:newname.

    The second set of variables is used if the products reported are incompatible between some license servers.

    RLMSTAT
        The rlmstat command to use while querying servers.

    SLEEP
        The interval to sleep between updates of the feature status.
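    The oldname:newname format can be illustrated with a short Python 3 sketch. It parses a space-separated transformation list and merges license counts so that transformed features are reported under a single name. Summing the counts of merged features is an assumption about what "combine and present" means here; the sketch is not taken from gcsgeflexlmd, and the counts in the example are made up.

```python
def parse_transformations(spec):
    """Parse space-separated oldname:newname pairs into a dict."""
    mapping = {}
    for pair in spec.split():
        old, sep, new = pair.partition(":")
        if not sep or not old or not new:
            raise ValueError("expected oldname:newname, got %r" % pair)
        mapping[old] = new
    return mapping


def apply_transformations(counts, mapping):
    """Report every feature under its new name, summing merged counts."""
    merged = {}
    for feature, count in counts.items():
        name = mapping.get(feature, feature)
        merged[name] = merged.get(name, 0) + count
    return merged
```

    With acfd_solver:acfd_fluent_solver, for example, 3 free acfd_solver licenses and 2 free acfd_fluent_solver licenses would appear in the queue system as 5 acfd_fluent_solver licenses.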


    Sometimes non-commercial software can also be installed here, but always under its own subdirectory.

    As an example, the Fluent installation on GomputeServer has /opt/gcdistro/app/fluent set as the FLUENT_INC directory.

    /opt/gcdistro/sge

    This is the directory where Sun Grid Engine is installed; it is set as SGE_ROOT in the environment settings.

    /opt/gcdistro/modules

    The directory under which Modules is installed. The version installed is always the latest stable version available at the time your GomputeServer was installed.

    /opt/gcdistro/modules/modulefiles

    The directory which contains the modulefiles generated for each application installed in GomputeServer. Note that application modulefiles must always be placed in their own subdirectory, which makes it easier to add new modulefiles for new versions of the same application. As an example, the modulefiles for Fluent could look something like:

    /opt/gcdistro/modules/modulefiles/fluent/6.3.26
    /opt/gcdistro/modules/modulefiles/fluent/6.2.16
    /opt/gcdistro/modules/modulefiles/fluent/12
    /opt/gcdistro/modules/modulefiles/fluent/.version

    Each of the numbered files above contains the environment settings for the respective version of Fluent, and the default version is specified in the .version file.
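    For illustration, the Python 3 sketch below lists the versions in one application's modulefile directory and reads the default version from its .version file. It assumes the Environment Modules convention in which .version contains a line such as set ModulesVersion "12"; the function name is hypothetical, and module avail remains the supported way to see this information.

```python
import os
import re

# Matches e.g.:  set ModulesVersion "12"
_VERSION_RE = re.compile(r'^\s*set\s+ModulesVersion\s+"?([^"\s]+)"?')


def module_versions(app_dir):
    """Return (sorted version names, default version or None) for app_dir."""
    versions = sorted(
        name for name in os.listdir(app_dir)
        if not name.startswith(".")
        and os.path.isfile(os.path.join(app_dir, name))
    )
    default = None
    version_file = os.path.join(app_dir, ".version")
    if os.path.isfile(version_file):
        with open(version_file) as handle:
            for line in handle:
                match = _VERSION_RE.match(line)
                if match:
                    default = match.group(1)
    return versions, default
```

    Note that the versions are sorted as plain strings, so 12 sorts before 6.2.16.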

    /opt/gcdistro/packages

    This is the directory where all the software installation files for the different applications installed on GomputeServer are placed.

    Using Modules

    Information on modules and on writing new modulefiles can be found at http://modules.sourceforge.net

    man module, man 4 modulefile

    Some examples using modules:

    module avail

    Displays the list of all available modules and their versions.

    module list

    Displays the list of currently loaded modules.

    module load <module_name>

    Updates the shell environment with the values described in the modulefile called <module_name>.

    Example:

    module load fluent/12

    Updates the environment with the settings required for Fluent version 12.

    module unload <module_name>

    Updates the shell environment and removes the values described in the modulefile called <module_name>.

    Example:

    module unload fluent/12

    Clears the environment of the settings required for Fluent version 12.

    How to Install a New Application in GomputeServer

    Follow the steps below to install a new application on GomputeServer. As far as possible, perform the steps as the user gcadmin. If you have the root password for your cluster, you can become gcadmin by running:

    su - gcadmin

    1. Create a subdirectory for the application you want to install under /opt/gcdistro/app. Name the directory so that it is easy to tell which application is installed there. For example, if you are installing Fluent, you would call the directory /opt/gcdistro/app/fluent.

    2. Run the installer of the application and of any associated programs, and specify the directory you created in the step above as the installation directory. For example, for a Fluent installation you would specify Fluent.Inc as /opt/gcdistro/app/fluent.

    3. Never place any application-specific environment variables directly in your .bashrc, .profile or similar preference files. All application-related environment settings must be made in modulefiles. Each installed application must have a subdirectory under /opt/gcdistro/modules/modulefiles to hold its modulefiles, and under this directory you must have a version-specific modulefile for each installed version of the application. For samples, look at the modulefiles in /opt/gcdistro/modules/modulefiles/fluent, which contains the modulefiles for the Fluent application.

    4. Any license server settings for the installed application must be made in the modulefiles directory /opt/gcdistro/modules/modulefiles/<app_name>.
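    Steps 3 and 4 can be sketched in Python 3: the helper below writes a version-specific modulefile into the application's own subdirectory of the modulefiles tree, optionally including a license server setting. The template content, the bin subdirectory, the application name myapp and the helper name are assumptions for illustration; compare with the Fluent modulefiles in /opt/gcdistro/modules/modulefiles/fluent before relying on anything like it.

```python
import os

# Minimal Tcl modulefile; %% is a literal % in Python %-formatting.
TEMPLATE = """#%%Module1.0
proc ModulesHelp { } {
    puts stderr "\\t%(app)s %(version)s"
}
module-whatis "%(app)s %(version)s"
prepend-path PATH %(prefix)s/bin
"""


def write_modulefile(modulefiles_root, app, version, prefix, license_file=None):
    """Create <modulefiles_root>/<app>/<version> and return its path."""
    app_dir = os.path.join(modulefiles_root, app)
    os.makedirs(app_dir, exist_ok=True)
    text = TEMPLATE % {"app": app, "version": version, "prefix": prefix}
    if license_file:
        # Step 4: license server settings belong in the modulefile.
        text += "setenv LM_LICENSE_FILE %s\n" % license_file
    path = os.path.join(app_dir, version)
    with open(path, "w") as handle:
        handle.write(text)
    return path
```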


    ## modules modulefile
    ##
    proc ModulesHelp { } {
        global version modroot
        puts stderr "\tFluent 6.3.26."
        puts stderr "\n\tVersion $version\n"
    }

    module-whatis "Fluent 6.3.26."

    # for Tcl script use only
    set version 3.2.6
    set modroot /opt/gcdistro/modules
    setenv FLUENT_ARCH lnamd64
    prepend-path PATH /opt/gcdistro/app/fluent/Fluent.…
    setenv LM_LICENSE_FILE 7241@master
    prepend-path PATH /opt/gcdistro/app/fluent/Fluent.…


    GSub

    GSub is the Gompute Application Integration framework, aimed at providing a comprehensive and extensible framework for integrating applications with the queue system.

    Quick Start

    The command line structure for GSub is:

    gsub [options] application [application options]

    Start a single-core interactive Fluent job:

    gsub -i fluent 3d

    Start an 8-core batch Fluent job:

    gsub -n 8 fluent 3d -i test.jou
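    The command-line structure above can be illustrated with a small Python 3 sketch that splits a gsub argument list into GSub options, the application name and the application's own options. It assumes that the first argument which does not start with a dash (and is not consumed as an option value) is the application; the set of value-taking options is a hypothetical, possibly incomplete reconstruction from the option list in this section, not GSub's actual parser.

```python
# Options that take a value (assumed from the documented option list).
OPTIONS_WITH_VALUE = {
    "-r", "--release", "-n", "--processes", "-N", "--name",
    "--vgl-version", "-p", "--project-name", "-q", "--queue",
    "--wait-for-jobs", "--logfile", "--post-operations", "--config",
    "--wd", "--ar", "-t", "--time", "-m", "--mail", "--resources",
    "--desktop",
}


def split_gsub_command(argv):
    """Split argv into (gsub options, application, application options)."""
    gsub_options = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if not arg.startswith("-"):
            # Assumption: the first bare word is the application name.
            return gsub_options, arg, argv[i + 1:]
        gsub_options.append(arg)
        if arg in OPTIONS_WITH_VALUE:
            if i + 1 >= len(argv):
                raise ValueError("option %s needs a value" % arg)
            gsub_options.append(argv[i + 1])
            i += 1
        i += 1
    return gsub_options, None, []
```

    In the 8-core example, -i after fluent 3d belongs to Fluent (journal file), whereas -i before the application name would be GSub's interactive flag; position is what tells them apart.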

    Options

    --version, -h, --help, -r RELEASE, --release RELEASE, -n PROCESSES, --processes PROCESSES, -N NAME, --name NAME, -i, --interactive, -b, --batch, -d, --debug, --dev, --develop, -g, --graphics, …, --no-graphics-acceleration, --nga, --graphics-spoiling, --gs, --no-graphics-spoiling, --ngs, --vgl-version VGL_VERSION, --list-applications, --list-versions, -p PROJECT_NAME, --project-name PROJECT_NAME, --list-projects, -q QUEUE, --queue QUEUE, --exclusive-scheduling, --wait-for-jobs WAIT_FOR_JOBS, --smp, --logfile LOGFILE, --post-operations POST_OPERATIONS, --skip-post-operations, --high-priority, --config CONFIG, --xml, --terse, --wd WORKING_DIRECTORY, --scratch, --ar RESERVATION, -t TIME, --time TIME, -m MAIL, --mail MAIL, --get-application-name, --get-input-file, --resources RESOURCES, --extra-slot, --application-options, --desktop DESKTOP.

    --version

    Print the GSub version and exit.

    gsub --version

    -h, --help

    Print the help and exit.

    gsub -h

    -r RELEASE, --release RELEASE

    Request a specific version of the application.

    gsub -r 2011a matlab

    -n PROCESSES, --processes PROCESSES

    Number of processes to request.

    gsub -n 8 fluent 3d -i test.jou

    -N NAME, --name NAME

    The job name as shown by qstat.

    gsub -N test_job wrf.exe

    -i, --interactive

    Run the job in interactive mode. This means that the current terminal is used for I


    --no-graphics-spoiling, --ngs

    Disables graphics spoiling.

    gsub --ngs -g glxgears

    --vgl-version VGL_VERSION

    Selects which version of VirtualGL to use. This can, for example, be used for testing a newer version than the system default.

    gsub -g --vgl-version 2.2.1 glxgears

    --list-applications

    List the applications the current GSub installation has support for. This only includes applications that have a GSub module present or applications with a section present in the configuration files. You might still be able to use GSub to launch other applications.

    gsub --list-applications

    --list-versions

    List the installed versions of the given application. This option uses the Modules system to determine the versions present in the system.

    gsub --list-versions lsdyna

    -p PROJECT_NAME, --project-name PROJECT_NAME

    Specify the project that you want to couple the job to. This is used by the accounting engine in Gompute MD.

    gsub -p TEST_PROJECT date

    --list-projects

    List the different projects that you have access to.

    gsub --list-projects

    -q QUEUE, --queue QUEUE

    Specify the queue in which you want to run your job. This can include wildcards and name specific nodes.

    The following example targets the queue called all.q on any node whose name starts with node.

    gsub -q all.q@node* date

    --exclusive-scheduling

    Request exclusive scheduling. This makes sure your job is scheduled alone on a machine and does not share it with any other job.

    gsub --exclusive-scheduling fluent

    --wait-for-jobs WAIT_FOR_JOBS

    Lets you specify jobs that this job depends on. If specified, this job will not start until the specified jobs have exited the queue. The jobs are identified either by ID or by name.

    gsub --wait-for-jobs 12345 date


    --smp

    Request to run the job as an SMP job. This schedules the job on one machine as opposed to spreading it over multiple machines.

    gsub --smp -n 8 fluent

    --logfile LOGFILE

    Specifies the logfile that output will be redirected to. The default is $JOB_ID.log.

    gsub --logfile date.log date

    --post-operations POST_OPERATIONS

    Allows the user to specify commands that will run after the job command has completed. This can be used to perform actions such as copying result files or removing temporary files created by the job.

    gsub --post-operations "rm *.tmp" fluent

    --skip-post-operations

    This option tells GSub to skip the default post operations. The application integration might include post operations that are added to all jobs by default.

    gsub --skip-post-operations lsdyna

    --high-priority

    Application integrations can choose to submit jobs with decreased priority by default. By specifying this option, the job will be submitted with the system default priority instead.

    gsub --high-priority gambit

    --config CONFIG

    Specify a custom directory for the GSub configuration files. It is intended to be used by system administrators and developers.

    gsub --config ~/gsub_cfg date

    --xml

    Format the GSub output as XML where applicable. This is intended to be used when calling GSub from an application or script.

    gsub --xml -h

    --terse

    Enable terse printout. This is intended to be used when calling GSub from an application or script.

    gsub --terse date

    --wd WORKING_DIRECTORY

    Specify the working directory to use for the application. The default is to use the current directory.

    gsub --wd /home/gdpt/gompute/shared/job/ date

    --scratch

    Enables the use of the local scratch disk where applicable. The default is to use the shared disk, which can be slow when running applications that perform a lot of scratch I/O.

    gsub --scratch abaqus

    --ar RESERVATION

    Specify an advance reservation to use. See man qrsub for details on advance reservations.

    gsub --ar 123 date

    -t TIME, --time TIME

    Set a time limit on the job, after which the job will be killed by the queue system. The format is HH:MM:SS.

    gsub -t 1:0:0 lsdyna
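    As a quick check of the HH:MM:SS format, the Python 3 sketch below converts a limit such as the 1:0:0 in the example into seconds (one hour). The helper is hypothetical; Grid Engine does its own parsing of time limits.

```python
def time_limit_seconds(spec):
    """Convert an HH:MM:SS time limit, as given to gsub -t, into seconds."""
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError("expected HH:MM:SS, got %r" % spec)
    hours, minutes, seconds = (int(part) for part in parts)
    if hours < 0 or not 0 <= minutes < 60 or not 0 <= seconds < 60:
        raise ValueError("time limit out of range: %r" % spec)
    return hours * 3600 + minutes * 60 + seconds
```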

    -m MAIL, --mail MAIL

    Specify the mail address(es) that the queue system will send status updates to. The default configuration is to send updates on job start, end, abort and suspend.

    gsub -m user@server.com date

    --get-application-name

    Prints the application name for the given command line. This is intended to be used when calling GSub from an application or script.

    gsub --get-application-name lsdyna

    --get-input-file

    Prints the application input file from the given command line. This is intended to be used when calling GSub from an application or script.

    gsub --get-input-file lsdyna i=infile.in

    --resources RESOURCES

    Comma-separated list of resources to request from the queue system in addition to those added by the configuration file and application module.

    gsub --resources infiniband mppdyna

    --extra-slot

    Request an additional slot from the queue system to use for the control process. This can be useful for applications where a simulation using, for example, 16 cores launches one control process and 16 workers. By allocating one additional slot for the control process, we prevent the system from being overloaded, which would slow down the simulation.

    Not all modules are aware of this option; those that are not will use the extra slot as if the user had requested it using -n PROCESSES, --processes PROCESSES.

    gsub -n 10 --extra-slot starccm+

    --application-options

    Prints a subset of the options that the application supports. This is intended to be used when calling GSub from an application or script.

    gsub --application-options fluent

    --desktop DESKTOP

    Specify the X11 desktop to use for graphical output. The default is to use the desktop from which GSub is launched.

    gsub --desktop login01:13 xterm


    Extending GSub

    Introduction

    GSub is designed to be extendable; as a result, almost every part of the job submission process can be modified. GSub is developed in Python 2.4, so when extending GSub you should, if possible, make sure that your additions work with Python 2.4. Extending GSub requires knowledge of Python development and Grid Engine job submission.

    The Quick Start contains a walk-through of the Fluent module. This can be used as a starting point for your own module.

    The Developer Guide page contains detailed information on how to extend GSub.


    Quick Start

    Introduction

    We will give a walk-through of the Fluent GSub configuration and module. You should have basic knowledge of Grid Engine and Python for this.

    Quick Tips

    GSub provides two options that are useful when developing a module: -d and --dev. The -d option enables debug mode, which prints warnings and errors to the console for easy debugging. The --dev option enables developer mode, which runs the module as normal but, instead of submitting the wrapper script to Grid Engine, prints the submission command and the script to the console.

    The typical test command line when developing the Fluent module in serial mode would be:

    gsub -d --dev fluent 3ddp -i test.jou

    Or for parallel:

    gsub -d --dev -n 8 fluent 3ddp -i test.jou

    Configuration

    This is the complete configuration for the Fluent module. We will provide a breakdown of the lines and explain what they do.

    [fluent]
    peSMP: smp
    peMPI: FLUENT
    logfile: fluent_$JOB_ID.out
    appName: fluent

    Breakdown

    peSMP: smp

    This configures GSub to use the parallel environment smp for single-machine jobs.

    peMPI

    The log file Grid Engine will use. Batch jobs will redirect stdout and stderr here.

    appName: fluent

    The application name. This is used for accounting purposes.
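    The configuration uses an INI-like "key: value" layout under a [fluent] section header. The Python 3 sketch below reads part of that section with the standard configparser module; that GSub parses its files this way is an assumption (GSub targets Python 2.4, whose ConfigParser module accepts the same syntax), and the helper name is hypothetical.

```python
import configparser

FLUENT_CONFIG = """
[fluent]
peSMP: smp
appName: fluent
"""


def read_section(text, section):
    """Return one section of an INI-style configuration as a dict."""
    # interpolation=None keeps any '%' characters in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys such as peSMP case-sensitive
    parser.read_string(text)
    return dict(parser.items(section))
```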

    Module

    Here we provide a step-by-step breakdown of the complete script.

    Definition

    import lib
    base = lib.getGSub()

    Retrieve the base class that we should extend. This is needed since the base GSub module can be overridden by site and

    Resource Calculations

        def resourceCalculation(self):
            base.GSub.resourceCalculation(self)

    Start by calling the base implementation. This will add resource requests for graphics acceleration etc.

            # License resource calculation
            features = {}
            # We always need one solver license
            features["acfd_fluent_solver"] = 1.0

            # For parallel runs we also need HPC packs
            if self.numProcesses > 1:
                features["anshpc_pack"] = 1.0
            if self.numProcesses > 8:
                features["anshpc_pack"] += 1.0
            if self.numProcesses > 32:
                features["anshpc_pack"] += 1.0
            if self.numProcesses > 128:
                features["anshpc_pack"] += 1.0
            if self.numProcesses > 512:
                features["anshpc_pack"] += 1.0

    Calculates the total number of licenses we will need to run.

            # Normalise by the number of CPUs for SGE
            for feature in features.keys():
                features[feature] /= self.numProcesses

    Grid Engine allocates the number of resources requested multiplied by the number of cores requested. To make it request the right number of licenses, we therefore divide the total number of licenses needed by the number of cores requested.

            for feature in features.keys():
                self.resources.append("%s=%.8s" % (feature, features[feature]))

    Add the needed resources to the resource list. These will be added to the wrapper script by the base implementation.
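    To make the arithmetic concrete, here is a standalone Python 3 restatement of the calculation above (not the GSub class itself): it counts ANSYS HPC packs with the same thresholds, divides every total by the number of processes, and formats the per-slot amounts with the same %.8s pattern. The function name is hypothetical.

```python
# One HPC pack above 1 process, one more above each of 8, 32, 128 and 512.
HPC_PACK_THRESHOLDS = (1, 8, 32, 128, 512)


def license_requests(num_processes):
    """Return Grid Engine resource strings of the form feature=per-slot amount."""
    features = {"acfd_fluent_solver": 1.0}  # always one solver license
    packs = sum(1 for limit in HPC_PACK_THRESHOLDS if num_processes > limit)
    if packs:
        features["anshpc_pack"] = float(packs)
    # Grid Engine multiplies each request by the slot count, so divide here.
    return ["%s=%.8s" % (name, total / num_processes)
            for name, total in sorted(features.items())]
```

    For 16 processes, for example, the job needs one solver license and two HPC packs, so each of the 16 slots requests 0.0625 and 0.125 respectively. Note that %.8s truncates the decimal text, so with 3 processes each slot requests 0.333333, which multiplies back to slightly less than one license.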

    Prolog Generation

        def prologGeneration(self):
            base.GSub.prologGeneration(self)

    Call the base implementation first.

            self.wrapperLines.append("export SG…

    This method doesn't call the base implementation, since we want to have full control of the command line.

            command = self.vglCommand

    Add the VGL start command if applicable. This is usually something like vglrun -c proxy -sp.

            command += self.args[0]

    Add the binary.

            command += " -r%s" % self.fluentRelease

    Add the -r parameter with the application version.

            if self.graphicsAcceleration:
                command += " -driver opengl"

    Tell Fluent to use OpenGL if we have requested accelerated graphics.

            if not self.is


    import sys
    import lib

    Base = lib.getGSub()


    class GSub(Base.GSub):

        NET_MAP = {
            '6.1.22': '-pnmpi',
            '6.2.16': '',
            '6.3.26': '-pib.ofed',
            '6.3.35': '-pib.ofed',
            '12.0.7': '-pinfiniband',
            '12.0.16': '-pinfiniband',
            '12.1.2': '-pinfiniband',
        }

        def __init__(self, options, args, config, section):
            Base.GSub.__init__(self, options, args, config, section)

            if 1 < self.numProcesses and self.graphicsAcceleration:
                print("You cannot combine a parallel launch with a request for a graphics node")
                sys.exit(1)

            if 2048 < self.numProcesses:
                print("You cannot run on more than 2048 cores using ANSYS HPC Packs.")
                sys.exit(1)

            self.fluentRelease = self.getRelease()

            # By default we will try our luck with no network specified.
            self.pversion = ""
            if self.fluentRelease in self.NET_MAP:
                self.pversion = " " + self.NET_MAP[self.fluentRelease]

        def resourceCalculation(self):
            Base.GSub.resourceCalculation(self)

            # License resource calculation
            features = {}
            # We always need one solver license
            features["acfd_fluent_solver"] = 1.0

            # For parallel runs we also need HPC packs
            if self.numProcesses > 1:
                features["anshpc_pack"] = 1.0
            if self.numProcesses > 8:
                features["anshpc_pack"] += 1.0
            if self.numProcesses > 32:
                features["anshpc_pack"] += 1.0
            if self.numProcesses > 128:
                features["anshpc_pack"] += 1.0
            if self.numProcesses > 512:
                features["anshpc_pack"] += 1.0

            # Normalise by the number of CPUs for SGE
            for feature in features.keys():
                features[feature] /= self.numProcesses

            for feature in features.keys():
                self.resources.append("%s=%.8s" % (feature, features[feature]))

        def prologGeneration(self):
            Base.GSub.prologGeneration(self)
            self.wrapperLines.append("export …

        def applicationCommandGeneration(self):
            command = self.vglCommand
            command += self.args[0]
            command += " -r%s" % self.fluentRelease
            if self.graphicsAcceleration:
                command += " -driver opengl"
            if not self.is

    Developer Guide

    Advanced

    We will start by explaining the different concepts we use in GSub and then show how to write a custom application module.

    There are four main parts that together define the behavior of GSub:

    - Configuration files
    - Boot module
    - Base module
    - Application module

    The steps involved are:

    1. Parse the command line. Done by the boot module.
    2. Find the module to load.
    3. Read from the configuration files.
    4. Generate a wrapper script and submit it to Grid Engine. Done by the base and application module.

    When we talk about a module we mean a Python module containing a class named GSubBoot or GSub, depending on the module type. By extending a module we mean that your class (GSubBoot or GSub) extends the corresponding class in the module you extend.

    Configuration Files

    The default configuration directory where GSub loads the configuration files is /opt/gcdistro/etc/gsub.

    This folder contains two configuration files: gsub.cfg and mapping.cfg.

    In addition to these, GSub will try to load files with the same name from a subdirectory named after the user's current Linux group. E.g. if a user in the group gdpt_gompute runs gsub, it will load the configuration files found in <configDir>
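The lookup order described in this section can be sketched with Python's `configparser`, whose `read()` skips missing files and lets values from later files override earlier ones. This is a minimal sketch, not GSub's code: the helper names are hypothetical, and since the sentence above is cut off, the `<configDir>/<group>/<file>` layout is an assumption.

```python
import configparser
import grp
import os

CONFIG_DIR = "/opt/gcdistro/etc/gsub"  # default directory given in the text


def config_paths(config_dir, group, filename):
    """Base file first, then the per-group file (assumed layout: <configDir>/<group>/<file>)."""
    return [os.path.join(config_dir, filename),
            os.path.join(config_dir, group, filename)]


def load_config(filename, config_dir=CONFIG_DIR, group=None):
    """Read filename from config_dir, letting the group's copy override it."""
    if group is None:
        # Name of the current Linux group of the running process
        group = grp.getgrgid(os.getgid()).gr_name
    parser = configparser.ConfigParser()
    # read() ignores files that do not exist; values read later win
    parser.read(config_paths(config_dir, group, filename))
    return parser
```

Reading both files into one parser keeps the override logic in `configparser` itself rather than merging dictionaries by hand.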

    Application Mapping

    The first unknown command line argument is assumed to be the binary the user wants to launch. If you want the same application module to handle different binaries, or for some other reason want to name your module something other than the binary name, you define mappings between binary names and application modules in mapping.cfg.
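One way to find "the first unknown argument" is to let `argparse` collect everything from the first positional argument onwards into the user's command. This is a sketch only: `-i` (assumed here to mean interactive, as in `gsub -i matlab -desktop`) and `--release` stand in for GSub's real option set, which this section does not list.

```python
import argparse


def parse_command_line(argv):
    """Split argv into GSub options and the user's command (binary plus its arguments)."""
    parser = argparse.ArgumentParser(prog="gsub")
    # Hypothetical subset of GSub's options, for illustration only
    parser.add_argument("-i", dest="interactive", action="store_true")
    parser.add_argument("--release")
    # Everything from the first positional argument onwards belongs to the user's command
    parser.add_argument("command", nargs=argparse.REMAINDER)
    options = parser.parse_args(argv)
    binary = options.command[0] if options.command else None
    return options, binary


options, binary = parse_command_line(["-i", "matlab", "-desktop"])
print(binary, options.command)  # matlab ['matlab', '-desktop']
```

The binary name found this way is what would then be looked up in mapping.cfg.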

    Base Module

    If no application module is found, GSub will load the base module. This module will generate a basic wrapper script that will launch the specified binary.

    The base module consists of a number of methods, each responsible for a different part of the job submission. This includes methods for the different parts of the wrapper script, methods for sending the script to Grid Engine and utility methods used by the module itself.

    Application Module

    Many applications will be able to run using the base module and the configuration file. Applications that use a start command (e.g. mpirun) or have a custom resource calculation formula will require an application module in order to work properly.

    Wrapper Script

    The wrapper script contains all the information Grid Engine needs in order to run the user's command. We will walk through the script that is generated when running the command gsub.

    Sections

    The wrapper script consists of a number of distinct sections. Here we give a short description of the different sections and what our sample script contains.

    Header

    The script header defines the interpreter the system should use. The default behavior for this section should not be modified.

    #!/bin/bash --login

    Grid Engine

    The Grid Engine section contains options that Grid Engine will interpret. This is where custom Grid Engine options should be added.

    # SGE parameters
    #$ -S /bin/sh
    #$ -cwd
    #$ -j y
    #$ -o ….log
    #$ -N gsub
    #$ -q all.q
    #$ -p 0
    #$ -A …=unknown

    Prolog

    The prolog contains commands that will run before the user's command. The default is to make backups of the wrapper script, the working directory and the machines used for the job.

    This is where environment initialization should be done. This includes loading modules, setting environment variables etc.

    # Prolog
    mv /home/gcadmin/gsub…

    Wrapper Script Methods

    These are the methods involved in creating the wrapper script.

    Instead of writing directly to the wrapper file, the different methods should add their lines to the list of strings self.wrapperLines.

    resourceCalculation
        Calculates the resources that the command needs in order to run. This usually means the number of CPU cores we are going to use and, in the case of license integration, the licenses the application will request. The variable self.resources contains the resources that will be requested, as a list of strings. The format of each resource string is the same as in Grid Engine.

        Example: request one test license per CPU core

        self.resources.append("test_license=1")

    headerGeneration
        Adds the script header to self.wrapperLines.

    sgeGeneration
        Adds the Grid Engine section to self.wrapperLines. This method will add Grid Engine resource requests if self.resources contains anything.

    prologGeneration
        Adds the prolog to self.wrapperLines.

    applicationCommandGeneration
        Frames the command and adds the required lines to the list of strings self.commandLines.

    epilogGeneration
        Adds the epilog to the script.
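The division of labour between these methods can be illustrated with a small stand-in class. This is a hypothetical sketch, not the base module: the class name, constructor and `build()` driver are invented, and appending `commandLines` to the script right after the prolog is an assumption about where the base module places them.

```python
class WrapperSketch:
    """Hypothetical stand-in that mimics the order of the base module's methods."""

    def __init__(self, command, queue="all.q", resources=None):
        self.command = command
        self.queue = queue
        self.resources = list(resources or [])
        self.wrapperLines = []
        self.commandLines = []

    def headerGeneration(self):
        self.wrapperLines.append("#!/bin/bash --login")

    def sgeGeneration(self):
        self.wrapperLines.append("#$ -cwd")
        self.wrapperLines.append("#$ -q %s" % self.queue)
        if self.resources:
            # qsub accepts several resource requests as one comma-separated -l list
            self.wrapperLines.append("#$ -l %s" % ",".join(self.resources))

    def prologGeneration(self):
        self.wrapperLines.append("# Prolog")

    def applicationCommandGeneration(self):
        self.commandLines.append(self.command)

    def epilogGeneration(self):
        self.wrapperLines.append("# Epilog")

    def build(self):
        """Run the generation methods in order and return the script text."""
        self.headerGeneration()
        self.sgeGeneration()
        self.prologGeneration()
        self.applicationCommandGeneration()
        self.wrapperLines.extend(self.commandLines)  # assumed placement
        self.epilogGeneration()
        return "\n".join(self.wrapperLines) + "\n"


print(WrapperSketch("myapp input.dat", resources=["test_license=1"]).build())
```

Collecting lines in lists and writing the file once keeps every method free of file handling, which matches the advice to append to self.wrapperLines instead of writing directly.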

    System Methods

    These are the methods setting up the system for submission and performing cleanup.

    generateWrapperFile
        Creates the wrapper file and stores the path in self.wrapperPath. This only creates the file; the content is written by preSubmissionOperations.

    preSubmissionOperations
        Called before the script is submitted to Grid Engine. This is where the wrapper script is written to disk.

    postSubmissionOperations
        Called after the script is submitted to Grid Engine. This is where you would perform cleanup etc.

    Helper Methods

    escapeString(string): string
        Escapes a string for use in the wrapper script.

    getRelease(): string
        If there are multiple versions of an application installed, the user can select the release using the --release option. This method will return the release that should be loaded.
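Sketches of what these two helpers could look like. `shlex.quote` is a swapped-in technique for POSIX shell quoting, not necessarily what `escapeString` does, and `get_release` (with its `installed` and `default` arguments) is a hypothetical simplification of `getRelease`.

```python
import shlex


def escape_string(value):
    """Quote value so a POSIX shell treats it as one literal word."""
    return shlex.quote(value)


def get_release(requested, installed, default):
    """Return the --release value if given (and installed), otherwise the default."""
    if requested is None:
        return default
    if requested not in installed:
        raise ValueError("release %s is not installed" % requested)
    return requested


print(escape_string("my input.dat"))  # 'my input.dat'
```

`shlex.quote` leaves safe strings untouched and wraps anything else in single quotes, so file names with spaces or quotes survive being pasted into the wrapper script.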

    Base Module Variables

    The base module defines a number of variables that can be used by application modules.

    appName: string
        The name of the application. This is read from the configuration file.

    args: string[]
        The command line of the application the user wants to run.

    backupDir: string
        The directory where we store the backups.

    commandLines: string[]
        The command lines used to start the application.

    config: ConfigParser
        The config parser used to read the configuration file.

    debug: boolean
        Set to true if we are running in debug mode.

    departmentId: string
        The current user's department.

    develop: boolean
        Set to true if we are running in developer mode.

    display: string
        The display that will be passed to the command environment. This is read from the environment by the base module. The user can specify a different value with --display.

    extraSlot: boolean
        Set to true if the application should use one extra slot to compensate for a control process etc.

    graphicsAcceleration: boolean
        Set to true if the application should use accelerated graphics.

    graphicsSpoiling: boolean
        Set to true if graphics spoiling is enabled.

    hostname: string
        The hostname of the machine where GSub is started.

    inputFile: string
        The name of the input file. This should be set by the application module if possible. It is used by --get-input-file.

    isInteractive: boolean
        Set to true if this is an interactive job.

    jobName: string
        The name of the job.

    loadDefaultModule: boolean
        Set to true if the default environment module should be loaded for the application.

    logfile: string
        The log file to which stdout and stderr are redirected.

    mailEvents: string
        The events on which Grid Engine will send a mail.

    module: string
        The module we are currently running as.

    numProcesses: int
        The number of cores the job is going to request.

    options: OptionValues
        The parsed command line options.

    priority: int
        The priority the job will request from Grid Engine.

    queue: string
        The queue the job will run in.

    resources: string[]
        Resources to be requested from Grid Engine.

    scratchDir: string
        A path to a directory that can be used as a scratch directory.

    useOldWD: boolean
        Set to true on old Grid Engine installations.

    userName: string
        The current user's username.

    vglCommand: string
        The VirtualGL command that should be used to start the application.

    vglVersion: string
        The version of VirtualGL that will be loaded.

    wrapperLines: string[]
        The lines that will be written to the wrapper script.

    wrapperPath: string
        Path of the wrapper script.

    GSub Configuration

    Configure GSub

    The GSub configuration consists of two files, mapping.cfg and gsub.cfg. mapping.cfg maps commands to GSub modules; gsub.cfg contains the settings to use for a specific module.

    The name of the binary is read from the command line and used as the module name. GSub then checks mapping.cfg for an entry with that name and, if there is one, uses the new value as the module name. GSub then uses the module name to determine which section in gsub.cfg to read.

    mapping.cfg

    The mapping file specifies mappings between commands and GSub modules. This can be used when an application has multiple commands or if different versions have different names.

    For Abaqus:

    abq674: abaqus
    abq683: abaqus
    abq692: abaqus
    abq69ef2: abaqus

    For LS-Dyna:

    lsdyna_s: lsdyna
    lsdyna_d: lsdyna
    mppdyna_s: lsdyna
    mppdyna_d: lsdyna

    gsub.cfg

    gsub.cfg contains a section called DEFAULT with all the system default values. Each module-specific section inherits the values from the DEFAULT section. Many of the options can be overridden by command line options. See Options for a listing of the available options.
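A sketch of the lookup described in this section: resolve the binary through mapping.cfg, then read that module's section of gsub.cfg, where `configparser` merges in the DEFAULT values automatically. Parsing mapping.cfg line by line as `command: module` pairs is an assumption based on the examples above, and the function names are hypothetical.

```python
import configparser


def parse_mapping(text):
    """Parse 'command: module' lines (assumed mapping.cfg layout) into a dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";", "[")):
            continue  # skip blanks, comments and any section header
        command, _, module = line.partition(":")
        mapping[command.strip()] = module.strip()
    return mapping


def resolve_module(binary, mapping):
    """Use the mapped module name if there is one, else the binary name itself."""
    return mapping.get(binary, binary)


def module_settings(gsub_cfg_text, module):
    """Settings for one module; configparser fills in values from [DEFAULT]."""
    parser = configparser.ConfigParser()
    parser.read_string(gsub_cfg_text)
    if parser.has_section(module):
        return dict(parser.items(module))
    return dict(parser.defaults())


MAPPING = "lsdyna_s: lsdyna\nmppdyna_d: lsdyna\n"
GSUB_CFG = "[DEFAULT]\nqueue = all.q\npriority = 0\n\n[lsdyna]\nappName = LS-DYNA\npriority = -100\n"

module = resolve_module("mppdyna_d", parse_mapping(MAPPING))
print(module, module_settings(GSUB_CFG, module))
```

Note that `configparser` lower-cases option names by default, so `appName` comes back as `appname`; a module without its own section simply gets the DEFAULT values.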

    Values

    backupDir
        The path where backups are written. This should normally not be changed in a module section.

    scratchDir
        The directory to use for scratch files. This should normally not be changed in a module section.

    projectDir
        The directory containing the project files. This should normally not be changed in a module section.

    vglVersion
        The version of VirtualGL to load.

    graphicsAcceleration
        Controls whether we request graphics acceleration by default.

    graphicsSpoiling
        Controls whether we use graphics spoiling by default.

    loadDefaultModule
        If this is set to true, the following line will be added to the wrapper prologue:

        module load MODULE_NAME

    peSMP
        The parallel environment to use for single machine jobs.

    queue
        The queue to submit the job to.

    priority
        The priority given to Grid Engine for the submitted job.

    resources
        Resources to request from Grid Engine. Multiple resources can be specified as a comma-separated list.

    mailEvents
        Tells Grid Engine when to send mail.

    interactive
        This can be used to make an application start in interactive mode by default.

    forceSMP
        Forces single machine mode. This can be used for applications that can't run distributed over multiple machines.

    smpMaxSize
        The largest number of cores a single machine job can request.

    extraSlot
        Controls whether we should allocate an extra slot. See --extra-slot.

    appName
        The application name as it will appear in accounting etc.

    jobName
        The default name of the job as it will appear in the job listing etc.

    logfile
        The name of the log file to which stdout and stderr will be written.

    postOperations
        Commands that will be executed after the application has finished.

    useOldWD
        Enables a workaround for older Grid Engine installations.

    Linux tips

    Links

    Linux cheat sheet: http://files.fosswire.com/2007/08/fwunixref.pdf

    A more comprehensive guide: http://cb.vu/unixtoolbox.xhtml

    Working with archives

    Zip

    Create a compressed archive foo.zip containing the directory foo and its content.

    zip -r foo.zip foo

    Unpack foo.zip

    unzip foo.zip

    Tar

    Create an uncompressed archive foo.tar containing the directory foo and its content.

    tar cf foo.tar foo

    Unpack foo.tar

    tar xf foo.tar

    Tar.gz

    Create a compressed archive foo.tgz (.tar.gz can be used as well) containing the directory foo and its content.

    tar czf foo.tgz foo

    Unpack foo.tgz

    tar xzf foo.tgz
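If you need the same operations from a Python script rather than the shell, the standard library's `shutil.make_archive` and `shutil.unpack_archive` cover all three formats. This is a sketch; the `pack` and `unpack` names are hypothetical, and note that the "gztar" format writes `foo.tar.gz` rather than `foo.tgz` (unpacking accepts both).

```python
import os
import shutil


def pack(directory, fmt):
    """Archive directory and its content next to it.

    fmt is "zip" (like zip -r), "tar" (like tar cf) or "gztar" (like tar czf).
    """
    parent, name = os.path.split(os.path.abspath(directory))
    # root_dir/base_dir keep "foo/..." as the top-level path inside the archive
    return shutil.make_archive(os.path.join(parent, name), fmt,
                               root_dir=parent, base_dir=name)


def unpack(archive, dest="."):
    """Unpack a .zip, .tar or .tgz/.tar.gz archive into dest (like unzip, tar xf, tar xzf)."""
    shutil.unpack_archive(archive, dest)
```

Both functions pick the format from the argument or file extension, so one helper covers zip, tar and tar.gz.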

    Applications

    MATLAB

    MEX files
    Precompilation
    From MATLAB
    Standalone
    Using the Binary
    Compile in Job
    Known Problems
    Plotting Crashes the VNC Session

    MEX files

    When working from a Windows workstation with a Linux cluster, you need to make sure your MEX files are compiled to the correct binary format. Files compiled on Windows will not work on Linux and vice versa.

    You can choose to either precompile the MEX files or compile them from within a batch job.

    Precompilation

    When you precompile a MEX file, you copy the source to the cluster and compile it manually.

    Compilation on the cluster is done in the same way as on a Windows workstation. You either do it from within MATLAB or use the standalone command mex.

    From MATLAB

    The following command will start MATLAB with the graphical interface.

    gsub -i matlab -desktop

    Compiling a C method:

    mex yprime.c

    Compiling a Fortran method:

    mex yprimef.F yprimefg.F

    Standalone

    Using the standalone mex command from MATLAB R2011a:

    module load matlab/2011a
    mex yprime.c

    Using the Binary

    Once you have the binary, you can either copy it back to your workstation and include it in the jobs you send to the cluster, or you can tell the jobs to read it from a folder on the cluster.

    Include the binary as a part of the job:

    job = batch('script', 'FileDependencies', {'yprime.mexa64'});

    Use the binary on the cluster:

    job = batch('script', 'PathDependencies', {'<FolderContainingBinaries>'});

    Compile in Job

    When compiling as a part of a job, you need to add the source files as file dependencies. The dependent files will not be in the working directory on the cluster, so you will need to do a cd before you can access them.

    A script that compiles a MEX file and calls the compiled method:

    function compile_test()
    % Directory where the file dependencies were copied on the worker
    tmp = getFileDependencyDir();
    % Compile there, then return to the original directory
    wd = cd(tmp);
    mex yprime.c;
    cd(wd);
    yprime(1, 1:4)

    The script to submit the job:

    job = batch('compile_test', 'FileDependencies', {'yprime.c'}, 'CurrentDirectory', '.');
    wait(job);
    diary(job);
    destroy(job);

    Known Problems

    Plotting Crashes the VNC Session

    The system Mesa library can cause MATLAB to crash the VNC server. A workaround for this is to tell MATLAB to use its internal Mesa version. You can do this by including the following line in your scripts:

    opengl software

    More information can be found here:
    http://www.mathworks.com/matlabcentral/newsreader/view_thread/!58572
    http://www.mathworks.com/help/techdoc/ref/opengl.html

    Troubleshooting

    Introduction

    Here are some tips that can be used to troubleshoot when there is a problem with the cluster.

    Check the connection

    The first thing to check is that you can connect to the cluster. The easiest way is to either use Gompute Explorer or SSH directly. You should either be able to connect or get some error message.

    There are three different types of error messages:

    Network error
        If the error is that the host is unreachable, the connection timed out or similar, it's most likely something wrong with your network and you should talk to your IT support.

    Connection refused
        If the message says that the connection is refused, there could be something wrong with the cluster and you should contact the Gridcore help desk.

    Authentication error
        If you get an error message saying something about authentication failure, it means that the connection is working but you most likely have the wrong password. Try resetting your password. This can be done through www.gompute.com if your cluster is connected to the Gompute authentication server. Otherwise you should contact the cluster administrator for further help.
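The first two cases can be told apart with a plain TCP connection to the SSH port; an authentication error only shows up once SSH itself runs, so it is outside what this check can see. A minimal sketch, assuming the login node listens on the standard SSH port 22; the function names are hypothetical.

```python
import socket


def classify_connection_error(exc):
    """Map a connection exception to the categories described above."""
    # ConnectionRefusedError is a subclass of OSError, so check it first
    if isinstance(exc, ConnectionRefusedError):
        return "connection refused"
    # Timeouts, unreachable hosts and DNS failures are all OSError subclasses
    if isinstance(exc, OSError):
        return "network error"
    return "unknown"


def check_ssh_port(host, port=22, timeout=5.0):
    """Return 'ok' if a TCP connection to host:port succeeds, else an error category."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        return classify_connection_error(exc)
```

A result of "ok" means the network path and the SSH service are up, so a failing login points to the authentication case.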

    Check the job

    The first thing we usually do if there is a specific job that the user experiences some problem with is to check the queue state. Running qstat will show you a list of all your jobs in the queue. If you are troubleshooting another user's job, you need to run either qstat -u <user> to show that user's jobs, or qstat -u "*" to show all users' jobs.

    If there are jobs in error state (the state column contains an E) or in queued state, you can run qstat -j <jobID> to get information on the job, including the scheduling information and possible errors.

    If everything looks OK here, you can go to the job's working directory. For a running job you can find it by looking at the cwd line of the qstat -j printout. For finished jobs submitted with GSub you can run cat /opt/gcdistro/gsub/backups/cwds/<jobID>; this requires that you are either that user or root.

    Once in the directory you can look for log files, i.e. files ending with .log or similar.

    Check the system

    You can check the status of the computing nodes by running qstat -f and looking at the usage, load and state columns.

    The load should not be higher than the number of used slots on a machine. If it is higher, you can run qstat -F to get information on which jobs should be running on that machine. Use SSH or the Ganglia graphs to look at the memory usage and load of the machines involved in the run.
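The rule of thumb above can be scripted against `qstat -f` output. This sketch assumes the SGE 6.x layout where each queue instance line reads `queue@host  qtype  resv/used/tot  load_avg  arch  [states]`; other Grid Engine versions format this differently. The function name is hypothetical, and the `margin` parameter (to ignore small background load on idle nodes) is an addition for this sketch, not part of the rule.

```python
def overloaded_hosts(qstat_f_output, margin=0.5):
    """Return (queue instance, used slots, load) where load exceeds used slots + margin.

    Assumes SGE 6.x `qstat -f` lines such as:
    all.q@node01  BIP  0/4/8  3.90  lx24-amd64
    """
    flagged = []
    for line in qstat_f_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or "@" not in fields[0]:
            continue  # skip the header, separator lines and job lines
        slots = fields[2].split("/")  # resv/used/tot (older versions: used/tot)
        try:
            used = int(slots[-2])
            load = float(fields[3])
        except (ValueError, IndexError):
            continue  # e.g. load shown as -NA- for a host that is not reporting
        if load > used + margin:
            flagged.append((fields[0], used, load))
    return flagged
```

Hosts that are flagged are the ones worth inspecting with qstat -F and the Ganglia graphs.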

    Other causes can be that the file system is running slow or, for a distributed job, that there is some problem with InfiniBand.

    Job not starting

    A job might be in the qw state and never start. This can be caused by a lack of licenses, no free slots etc. Run qstat -j <jobID> to get detailed scheduling information.

    Job in error state

    Sometimes jobs might end up in the error state. Details on why should be printed by qstat -j <jobID>. This can be caused by problems with the file system.

    Job crashing

    Look at the working directory and see if you can find logs explaining why the application has exited. A job that died without any log messages may in some cases have been killed by the system's out-of-memory killer. You can often determine if this is the case by looking at the Ganglia memory graphs.

    How to Get Help

    Reporting Problems and Feedback

    Problems, bugs, improvement requests and general feedback can be sent to helpdesk@gridcore.se.