Cluster Computing Applications for Bioinformatics


Page 1: Cluster Computing Applications for Bioinformatics

Cluster Computing Applications for Bioinformatics

• Thurs., Sept. 20, 2007

• process management

• shell scripting

• Sun Grid Engine

• running parallel programs

Page 2: Cluster Computing Applications for Bioinformatics

Accessing the Cluster

• ssh username@server

– -X to enable X forwarding

• ssh compute-#-# to access specific node

• qrsh to access the least busy node

• cluster-fork command to run on every node

Page 3: Cluster Computing Applications for Bioinformatics

Managing Processes

• ps – list your running processes

– -f : full-format listing (user, PID, parent PID, start time, command)

– -e : list every process on the system

• top – current top processes by CPU and memory use

• kill – terminate a process by its process ID (PID)

– killall to kill by program name

• command & – run in the background

– jobs : list background tasks

– bg : resume a stopped job in the background

• nice / renice – set priority
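The commands on this slide can be tried together in one short session. The following is a minimal sketch, not part of the original slides: `sleep 60` stands in for a real long-running program, and the nice value of 10 is an arbitrary choice.

```shell
#!/bin/bash
# Start a long-running command in the background.
# "sleep 60" is a placeholder for a real CPU-intensive program.
sleep 60 &
pid=$!                    # PID of the most recent background job

jobs                      # list this shell's background jobs
ps -f -p "$pid"           # full-format listing for that one process
renice -n 10 -p "$pid"    # lower its priority (a higher nice value)

kill "$pid"               # send the default TERM signal by PID
# Reap the process; ignore the non-zero status caused by the signal.
wait "$pid" 2>/dev/null || true
```

Using `$!` avoids having to look up the PID by hand in the `ps` or `top` output.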

Page 4: Cluster Computing Applications for Bioinformatics

The Shell

• Unix command interpreter

• bash – Bourne Again Shell

• .bashrc and .bash_profile

– settings for your shell environment

cd ~

ls -a

vi .bash_profile

echo $PATH
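A common setting to place in .bash_profile is an extended PATH, so that personal scripts can be run by name. The lines below are an illustrative sketch only. They assume scripts are kept in ~/bin, and they are not taken from the cluster's actual configuration.

```shell
# Lines of the kind that could be added to ~/.bash_profile.
# Prepend ~/bin to PATH unless it is already listed.
case ":$PATH:" in
    *":$HOME/bin:"*) ;;                  # already present, nothing to do
    *) PATH="$HOME/bin:$PATH" ;;
esac
export PATH

echo "$PATH"                             # confirm the new search path
```

The `case` test keeps PATH from growing each time the file is re-read.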

Page 5: Cluster Computing Applications for Bioinformatics

Shell Scripting

• Automate common tasks

– create directory structure required for sequence assembly

mkdir ~/bin

cd /share/bio/examples/

cp makeseqdir ~/bin

cd TFL

makeseqdir
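The contents of makeseqdir are not shown in the slides. As a hedged sketch of what such a script might do, the example below creates a directory layout in one step. The function name `make_seq_dirs` is hypothetical, and the directory names assume the phred/phrap/consed convention, which may differ from what the real script creates.

```shell
#!/bin/bash
# Hypothetical stand-in for makeseqdir; the real script may differ.
# Directory names assume the phred/phrap/consed layout.
make_seq_dirs() {
    local project_dir="${1:-.}"          # default: current directory
    mkdir -p "$project_dir/chromat_dir" \
             "$project_dir/phd_dir" \
             "$project_dir/edit_dir"
}

# Demonstrate in a scratch directory so no real project is touched.
demo_dir=$(mktemp -d)
make_seq_dirs "$demo_dir"
ls "$demo_dir"
```

Because `mkdir -p` does not fail on existing directories, the function is safe to re-run in a project that is already set up.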

Page 6: Cluster Computing Applications for Bioinformatics

Distributed Shell Scripts

• Preface CPU-intensive commands with qrsh -cwd

• qtcsh

– shell that does this automatically based on the ~/.qtask file

– Does not work

cd /share/bio/examples/

cp assemble ~/bin

assemble
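The assemble script itself is not reproduced in the slides. The sketch below only illustrates the pattern of prefacing an expensive step with `qrsh -cwd`. The helper `run_heavy` is a hypothetical name, not an SGE command, and `uname -n` stands in for a real CPU-intensive program.

```shell
#!/bin/bash
# run_heavy: hypothetical helper, not part of SGE.
# Sends a command through qrsh when SGE is available.
run_heavy() {
    if command -v qrsh >/dev/null 2>&1; then
        qrsh -cwd "$@"    # run on a node chosen by SGE, in the current directory
    else
        "$@"              # no SGE on this machine: run locally instead
    fi
}

# Cheap steps run directly in the script; only the expensive step is
# wrapped. "uname -n" is a placeholder that reports which host ran it.
node=$(run_heavy uname -n)
echo "heavy step ran on: $node"
```

Note that `-cwd` is only useful when the current directory is on a filesystem shared with the compute nodes.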

Page 7: Cluster Computing Applications for Bioinformatics

Sun Grid Engine - SGE

• Job queue and load balancing

• commands:

– qrsh / qtcsh

– qstat -f : show status of jobs / queues

– qdel : delete a job from the queue

– qmon : graphical interface

– qsub : submit job

Page 8: Cluster Computing Applications for Bioinformatics

Running Parallel Programs

• MPI – Message Passing Interface

• must be launched with mpirun or as a script with qsub

• mpiblast – parallel version of BLAST

– modify ~/.ncbirc

– first run mpiformatdb --nfrags=n

cd /share/bio/examples

cp .ncbirc ~

cp mpiblast.sh ~

cd ~

qsub -pe mpich 8 mpiblast.sh
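The contents of mpiblast.sh are not shown in the slides. The sketch below writes out a job script of the general kind that `qsub -pe mpich 8` could submit. Everything specific in it is an assumption: the file name `mpiblast_sketch.sh`, the database `example_db`, the query and output file names, and the machine-file location `$TMPDIR/machines`, which depends on how the mpich parallel environment is configured on a given cluster.

```shell
#!/bin/bash
# Write a hypothetical SGE job script; the real mpiblast.sh may differ.
job_script=mpiblast_sketch.sh

cat > "$job_script" <<'EOF'
#!/bin/bash
# Lines starting with "#$" are read by qsub as submission options.
# Use bash, run in the submission directory, merge stderr into stdout.
#$ -S /bin/bash
#$ -cwd
#$ -j y

# The database is assumed to have been formatted beforehand, e.g.:
#   mpiformatdb --nfrags=8 -i example_db -p F
# NSLOTS is set by SGE to the number of slots granted to the job.
# The machine-file path is an assumption about the mpich PE setup.
mpirun -np "$NSLOTS" -machinefile "$TMPDIR/machines" \
    mpiblast -p blastn -d example_db -i query.fasta -o results.txt
EOF

echo "Submit with: qsub -pe mpich 8 $job_script"
```

Taking the process count from `$NSLOTS` rather than hard-coding 8 keeps the script consistent with whatever slot count is requested on the `qsub` command line.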