Speed up your research: How to get 40 computers to do your work for you
Bingbing Yuan
June 19, 2008
• barra: 4 GB RAM
• LSF (Load Sharing Facility) Cluster
  – 36 machines (+ 42 lab-specific machines)
    • 34: 4 GB RAM per machine
    • 2: 8 GB RAM per machine
[Diagram: user jobs submitted from barra; cluster nodes shown include ncc001, ncc005, ncc64003, ncc64010, ncc64026]
bsub – submit jobs
• bsub myscript
• Send notification to specified email
  – bsub -u [email protected] myscript
• Send error and standard output to files
  – bsub -e error_file -o std_file myscript
  – bsub -e error_file -o std_file "myscript >result"
• Send job to a specific queue
  – bsub -q sq32hp myscript
• Send job to a host
  – bsub -m ncc64022 myscript
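The options above can be combined to send one job per input file, which is how many machines end up working at once. The sketch below only prints the bsub commands instead of running them. It assumes that myscript takes an input file as its argument; the file names and the helper name build_bsub_cmd are made up for illustration.

```shell
#!/bin/sh
# Print (do not run) one bsub command per input file.
# Assumptions: "myscript" accepts an input file argument; the
# sample*.fa names are placeholders, not real files.
build_bsub_cmd() {
    input=$1
    queue=$2
    # LSF replaces %J in -o/-e file names with the job ID,
    # so each job writes to its own output and error file.
    printf 'bsub -q %s -o %s.%%J.out -e %s.%%J.err "myscript %s"\n' \
        "$queue" "$input" "$input" "$input"
}

for f in sample1.fa sample2.fa sample3.fa; do
    build_bsub_cmd "$f" sq32hp
done
```

Once the printed lines look right, they can be piped to sh on the submission host to submit the jobs for real.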
Check the job status
• bjobs: pending, running and suspended jobs
bjobs
• Show all the running jobs
bjobs -a
• Also show finished jobs
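When many jobs are in flight, a per-state count is easier to read than the full listing. The sketch below tallies the STAT column (the third column of default bjobs output). The sample listing is invented so the example runs without a cluster; the user name, job names and times in it are placeholders.

```shell
#!/bin/sh
# Tally jobs by state from bjobs-style output read on stdin.
count_by_state() {
    # Skip the header line, then count values in column 3 (STAT).
    awk 'NR > 1 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Made-up sample in the default bjobs layout (not real cluster output).
sample_bjobs() {
    cat <<'EOF'
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
124047  user1   RUN   sq32hp     barra       ncc001      myscript   Jun 19 10:01
124048  user1   RUN   sq32hp     barra       ncc005      myscript   Jun 19 10:01
124049  user1   PEND  sq32hp     barra                   myscript   Jun 19 10:02
EOF
}

sample_bjobs | count_by_state
# prints:
# PEND 1
# RUN 2
```

On the cluster the same function would be used as: bjobs | count_by_state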
bkill – kill jobs
• bkill JOBID
  – bkill 124047
• Kill all of your jobs
  – bkill 0
• Kill all of your jobs in the 'normal' queue
  – bkill -q normal 0
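Because job ID 0 means "all of your jobs", a mistyped ID can be costly. One possible guard, sketched below, builds the bkill command only for a plain numeric, non-zero job ID. The function name safe_bkill_cmd is hypothetical, and the sketch prints the command rather than running it.

```shell
#!/bin/sh
# Print a bkill command for a single job, refusing the "kill everything" ID.
safe_bkill_cmd() {
    case $1 in
        0)
            echo "refusing: job ID 0 means all of your jobs" >&2
            return 1 ;;
        ''|*[!0-9]*)
            # Empty or non-numeric input is rejected.
            echo "not a numeric job ID: $1" >&2
            return 1 ;;
        *)
            echo "bkill $1" ;;
    esac
}

safe_bkill_cmd 124047
# prints: bkill 124047
```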
• bpeek – peek at the stdout and stderr output of an unfinished job
  – bpeek JOBID
    • bpeek 124047
• bstop – suspends unfinished jobs
  – bstop 124047
• bresume – resumes suspended jobs
  – bresume 124047
LSF commands are shown in black, job states in blue, job events in red, and host resource information in purple.
Picture from Los Alamos National Laboratory ( http://asci-training.lanl.gov/LSF/ )
LSF selects which job to run next based on:
• Resource requirements of the applications
  – queue
  – job requirement
• Current load conditions
• How important you are
bqueues – queue information
Annotated columns of the bqueues output:
• NJOBS: total job slots
• PEND: job slots pending
• RUN: job slots running
• SUSP: job slots suspended
Queues in the cluster
• 32-bit nodes
  – High priority: sq32hp (<20 min), lq32hp (>20 min)
  – Medium priority: sq32mp (<20 min), lq32mp (>20 min)
  – Low priority: sq32lp (<20 min)
• 64-bit nodes
  – High priority: sq64hp (<20 min), lq64hp (>20 min)
  – Low priority: lq64lp (>20 min)
• normal: default queue
  – priority: 10
  – jobs are restarted in lq64lp after 30 min
• Queue priority values: high: 50, medium: 40, low: 20-30
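The queue names encode job length (s or l), node type (32 or 64) and priority (hp, mp or lp). As a rough sketch, a lookup like the one below picks a queue from those three inputs and accepts only names that appear on the slide. The function name choose_queue is hypothetical, and treating a job of exactly 20 minutes as long is an assumption; the slide only says <20 and >20.

```shell
#!/bin/sh
# Pick a queue name from node type, priority and expected run time.
# Assumption: a job of exactly 20 minutes counts as "long".
choose_queue() {
    arch=$1      # 32 or 64
    priority=$2  # hp, mp or lp
    minutes=$3   # expected run time in minutes
    if [ "$minutes" -lt 20 ]; then len=s; else len=l; fi
    q="${len}q${arch}${priority}"
    # Accept only the queues listed on the slide.
    case $q in
        sq32hp|lq32hp|sq32mp|lq32mp|sq32lp|sq64hp|lq64hp|lq64lp)
            echo "$q" ;;
        *)
            echo "queue not listed on the slide: $q" >&2
            return 1 ;;
    esac
}

choose_queue 32 hp 5
# prints: sq32hp
```

The result can be passed straight to bsub -q. Checking against the listed names avoids submitting to a combination, such as a short 64-bit medium-priority queue, that the slide does not show.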
Only available on 32-bit nodes
• RepeatMasker
  – Mask repetitive DNA
• EMBOSS applications
  – A suite for sequence analysis
  – http://iona.wi.mit.edu/bio/tools/emboss/
LSF selects which job to run next based on:
• Resource requirements of the applications
  – queue: bsub -q sq32hp myscript
  – job requirement
• Current load conditions
• How important you are
Request 1 GB of memory
  – bsub -R "rusage[mem=1000]" myscript
Standard out from previous job……
……
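The memory request can be generated from a size in GB. The sketch below follows the slide's convention of 1000 per GB, which assumes the cluster interprets mem in MB (this depends on the LSF configuration). The function name bsub_mem_cmd is hypothetical, and the command is printed rather than run.

```shell
#!/bin/sh
# Print a bsub command that reserves the given amount of memory.
# Assumption: rusage[mem=...] is in MB, with 1 GB written as 1000
# as on the slide.
bsub_mem_cmd() {
    gb=$1
    shift
    mb=$((gb * 1000))
    # Remaining arguments are the command to submit.
    printf 'bsub -R "rusage[mem=%s]" %s\n' "$mb" "$*"
}

bsub_mem_cmd 1 myscript
# prints: bsub -R "rusage[mem=1000]" myscript
```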
lshosts – static resource information for the machines
Annotated columns of the lshosts output:
• cpuf: CPU factor
• ncpus: number of processors
lsload – current dynamic load activity
Annotated columns of the lsload output:
• status: accept job?
• Load indices
  – ut: CPU utilization
  – mem: available RAM
  – swp: available swap space
  – tmp: free space in /tmp
bhosts – static and dynamic resources
Annotated columns of the bhosts output:
• STATUS: accept job?
• JL/U: max job slots per user
• MAX: max job slots per host
• NJOBS: job slots started
User priority
• bqueues -l normal
…
…
Commands we have learned
• bsub
• bjobs
• bpeek
• bstop
• bresume
• bkill
• bqueues
• lshosts
• lsload
• bhosts
References
• Platform LSF Reference
  – Descriptions of all commands
• Running Jobs with Platform LSF
  – Introduction to basic concepts of LSF software to run and monitor jobs
• http://iona.wi.mit.edu/bio/bioinfo/docs/LSF_help.html