X-connection to Corona
• possibility to use graphical interfaces
• requires locally installed X-emulator
• needs more bandwidth
Can I log into Corona if I don’t know Unix well?
• you can only delete your own files
• you need to be an expert to cause big damage
• Security is important! Keep your password fresh and safe.
Directories:
$HOME (/mnt/mds/univX/group/username) - permanent, size limit 200 MB
$METAWRK (/mnt/mds/metawrk/username) - storage time 1 month, no size limit
$WRK (/wrk/username) - storage time 1 week, no size limit
$TMP (/tmp/username) - storage time 1 day, no size limit
$ARCHIVE (/mnt/fs/archive/univX/group/username) - permanent, no size limit, only for storage
Project directory - a special large area of permanent disk space for the common usage of the group (needs an application)
[Figure: Directories of Kalle Käyttäjä (kkayttaj). A directory tree starting from the root (/) that shows where $HOME, $ARCHIVE, $METAWRK, $WRKDIR and $TMPDIR of the user kkayttaj are located, with example contents such as Fasta_results, data.dat, proj04.tar, Gradu.tar, own_programs, report.txt, test.rubbish and run.tmp.]
Commands for directories:
cd change directory
ls list the contents of a directory
pwd print (=show) working directory
mkdir make directory
rmdir remove directory
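The directory commands above can be tried safely in a scratch directory. The following is a minimal sketch in POSIX shell syntax (the commands themselves are the same in tcsh); the directory name `project` is made up for illustration, and `mktemp -d` is used only so that no real files are touched.

```shell
#!/bin/sh
# Work in a throw-away directory so nothing in $HOME is touched.
DEMO=$(mktemp -d)
cd "$DEMO"

mkdir project        # make a new directory (hypothetical name)
cd project           # change into it
pwd                  # print the working directory
cd ..                # move one level up
LISTING=$(ls)        # list the contents: only "project" is there
echo "$LISTING"
rmdir project        # remove the (empty) directory
AFTER=$(ls)          # the listing is now empty
```

Note that rmdir only removes empty directories.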
Commands for files:
cat print file to screen
cp copy
less view text file
rm remove
mv move/rename a file
head show beginning of a file
tail show end of a file
grep find lines containing given text
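As an illustration of the file commands, here is a small sketch in POSIX shell syntax. The file names and the file content are invented for the example, and everything happens in a temporary directory. The interactive viewer less is left out because it needs a terminal.

```shell
#!/bin/sh
DEMO=$(mktemp -d)
cd "$DEMO"

# Create a small example file (hypothetical content).
printf 'alpha\nbeta\ngamma\ndelta\n' > data.txt

cat data.txt                     # print the whole file to the screen
cp data.txt copy.txt             # copy the file
mv copy.txt renamed.txt          # rename the copy
FIRST=$(head -n 1 renamed.txt)   # beginning of the file: first line only
LAST=$(tail -n 1 renamed.txt)    # end of the file: last line only
MATCH=$(grep 'mm' renamed.txt)   # lines containing the text "mm"
echo "$FIRST $LAST $MATCH"
rm renamed.txt                   # remove the renamed copy
```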
Examples of using files and directories
$HOME
    thesis.txt
    directory1
        casein.fasta
        structures
    directory2
        bunnies.txt
        casein.phy
cd  Go back to home directory from anywhere.
cd ..  Move one level up in the directory hierarchy. (cd .. in the “structures” directory moves you to the directory “directory1”.)
cp thesis.txt directory1/structures  Copies the file “thesis.txt” to the subdirectory “structures”.
cp casein.phy ../directory1/  Copies the file “casein.phy” to the directory “directory1”.
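The examples on this slide can be replayed in a scratch copy of the directory tree. This is only a sketch in POSIX shell syntax: a temporary directory stands in for $HOME, the files are created empty, and the assumption is that “structures” lies inside “directory1” and that casein.phy is copied while working in “directory2”.

```shell
#!/bin/sh
# Rebuild the example tree under a scratch directory standing in for $HOME.
TOP=$(mktemp -d)
cd "$TOP"
mkdir -p directory1/structures directory2
touch thesis.txt directory1/casein.fasta directory2/bunnies.txt directory2/casein.phy

# From the top level: copy thesis.txt into the subdirectory "structures".
cp thesis.txt directory1/structures

# From directory2: copy casein.phy into the sibling directory1.
cd directory2
cp casein.phy ../directory1/

# From structures, "cd .." leads to directory1.
cd ../directory1/structures
cd ..
HERE=$(basename "$(pwd)")
echo "now in: $HERE"
```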
Use command “less” to view text files
less filename
return (next line)
space (next screen)
b (previous screen)
h (show help for less)
q (quit)
/string (search for string in the file)
ls -la | less (pipe ls output to less)
Nano (or pico) text editor
nano filename
ctrl-c (line number)
ctrl-g (help menu)
ctrl-k (cut a line)
ctrl-o (save)
ctrl-r (read a file)
ctrl-v (next page)
ctrl-w (find a word)
ctrl-x (exit)
ctrl-y (previous page)
Use eog and ggv for displaying images
Eog can display e.g. jpg, tiff, gif and png files.
eog filename.png
Ggv can display ps and pdf files
ggv filename.ps
ps2pdf converts a PostScript file into a pdf-file
ps2pdf filename.ps
Note: eog and ggv require X connection
You can use Scientist’s Interface too (Settings: show)
General features:
arrow keys    browse previous commands
tabulator     auto-fill commands or file names
manual pages  man command
control-c     stops the currently running program (or process)
Special characters:
* (asterisk), wild card, means any text
ls *.fasta
| (pipe) sends the output of one command to the input of another command
ls *.fasta | less
> Writes output to a new file
ls > files_of_the_directory.txt
~ (tilde) means your home directory as does $HOME
cp test.txt ~/file.txt
cp text.txt $HOME
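The special characters can be combined freely. The sketch below, in POSIX shell syntax, tries the wildcard, the pipe and the redirection in a temporary directory; the file names are invented, and `wc -l` and `tr` are used here only to count the matching files.

```shell
#!/bin/sh
DEMO=$(mktemp -d)
cd "$DEMO"
touch a.fasta b.fasta notes.txt               # hypothetical file names

ls *.fasta                                    # wildcard: lists a.fasta and b.fasta
COUNT=$(ls *.fasta | wc -l | tr -d ' ')       # pipe: count the matching files
echo "fasta files: $COUNT"

ls > files_of_the_directory.txt               # redirection: write the listing to a new file
cat files_of_the_directory.txt

echo ~                                        # tilde and $HOME normally print the same path
echo "$HOME"
```

Because the output file is created before ls runs, files_of_the_directory.txt also appears in its own listing.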
Batch queue jobs at CSC
Batch queues in Corona and Sepeli
• maximum time limit for interactive jobs is 2 h (CPU h)
• longer jobs must be submitted through the batch queue system
• even rather small jobs can overload the front node of Sepeli
Queue systems aim to optimize the usage of the computing resources
- the customer defines how much computing time, memory and how many processors the job needs
- the queue system starts the job when suitable resources are available
- during the execution the job can effectively utilize the reserved resources
N1 grid engine in Corona and Sepeli
• Both Corona and Sepeli use N1 grid engine queue system
• The maximum time and memory limits are different in Sepeli and Corona
         Max. time         Max. mem      Max. proc
Corona   168 h (7 days)    192 GB        32
Sepeli   240 h (10 days)   4 GB/subjob   128
N1 grid engine in Corona and Sepeli
At minimum, a batch job script must include a computing time estimate and all the commands needed to run the program:
#!/bin/tcsh
#$ -l h_rt=24:00:00
raxml -n test1 -s ratite.phy -m HKY85
The script file is submitted with command:
qsub batch_job.file
The job can be followed with commands
qstat
qstat -u username
N1 grid engine in Corona and Sepeli
Structure of a batch queue file
#!/bin/tcsh   “shebang” tells what command shell to use
The lines containing the batch queue definitions start with #$
Most common definitions:
#$ -l h_rt=h:min:sec        reserved time
#$ -l v_mem=max_mem(M,G)    maximum memory size
#$ -pe cre n_proc           number of processors
#$ -o run.log               output file
#$ -e error.log             error file
#$ -cwd                     run job in the directory where it was submitted (works only in Corona)
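Several of these definitions can be combined in one job file. The sketch below, written in POSIX shell syntax, creates such a file in a temporary directory and submits it only if the qsub command is found. The one-hour time limit and the echo line used as the job's payload are placeholders, not recommendations.

```shell
#!/bin/sh
DEMO=$(mktemp -d)
cd "$DEMO"

# Write an example job file; the quoted 'EOF' keeps the text literal.
JOBFILE=batch_job.file
cat > "$JOBFILE" <<'EOF'
#!/bin/tcsh
#$ -l h_rt=01:00:00
#$ -o run.log
#$ -e error.log
#$ -cwd
echo "job running on `hostname`"
EOF

# Count the batch queue definition lines (those starting with #$).
NDEFS=$(grep -c '^#\$' "$JOBFILE")
echo "$JOBFILE contains $NDEFS queue definitions"

# Submit only where the queue system is available.
if command -v qsub >/dev/null 2>&1; then
    qsub "$JOBFILE"
else
    echo "qsub not found; job file written but not submitted"
fi
```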
N1 grid engine in Corona and Sepeli
Note that batch jobs start from the home directory with the same settings as the user has just after login.
In the batch job file you must take care of:
• Moving to the right directory (cd $METAWRK/ or -cwd)
• Setting up the program environment (use emboss etc.)
• Giving all the parameters that the execution of the commands needs
#!/bin/tcsh
#$ -l h_rt=24:00:00
#$ -o ratite_run.log
#$ -e ratite_run.log
cd $METAWRK/birds/
raxml -n test1 -s ratite.phy -m HKY85
N1 grid engine in Corona and Sepeli
For “interactive” programs you can use the <<EOF structure: the lines between <<EOF and EOF are given to the program as if they were typed in
#!/bin/tcsh
#$ -l h_rt=24:00:00
#$ -o mrbayes_run.log
#$ -e mrbayes_run.log
cd $METAWRK/birds/
mrbayes64 <<EOF
log start filename=data.log
execute rat1.nxs
mcmc
no
sump
sumt
quit
EOF
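The <<EOF mechanism can be tried without MrBayes. In this minimal sketch (POSIX shell syntax) the standard sort command is a stand-in for a program that reads its input from the keyboard; the three input lines are arbitrary words.

```shell
#!/bin/sh
# sort reads the three lines below as if they had been typed in,
# and the sorted result is captured in the variable SORTED.
SORTED=$(sort <<EOF
sumt
mcmc
quit
EOF
)
echo "$SORTED"
```

The closing EOF must be alone on its line, otherwise the shell keeps reading input for the program.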
N1 grid engine in Corona and Sepeli
Note that in Sepeli batch jobs can only use files that are located in the $WRKDIR directory. ($WRKDIR is the “home directory” in computing nodes.)
For short or interactive jobs you can use interactive batch jobs
qrsh -l h_rt=4:00:00
Qrsh opens an interactive session on one computing node. The maximum length of the session is defined by -l h_rt.
More information about CSC Unix environment
Unix operating system:
http://www.csc.fi/metacomputer/neuvonta.html.en
http://www.csc.fi/oppaat/metakone/
Text editors:
http://www.csc.fi/cschelp/kaytto/editorit.html.en
Advantages of unix EMBOSS
- more programs (e.g. Vienna, hmmer, meme)
- possibility to use list files
- big analysis tasks
- you can analyze the same data with other unix programs (Clustal, Phylip, BLAST, FASTA, etc.)
EMBOSS in Corona
• use emboss - initializes EMBOSS
• showdb - displays the databases linked to EMBOSS
• wossname term - finds programs related to a given term
• wossname - lists descriptions of all EMBOSS programs
EMBOSS in Corona
• you can start a program by typing its name
• you can give parameters interactively
corona > seqret
Reads and writes (returns) sequences
Input sequence(s): swiss:P12067
Output sequence [lyc1_pig.fasta]:
• or you can give parameters in command line (you can often feed in more parameters in command line)
corona ~> seqret swiss:P12067
Reads and writes (returns) sequences
Output sequence [lyc1_pig.fasta]:
EMBOSS file formats
• EMBOSS uses USA (Uniform Sequence Address) description for sequence files.
format::database:name (e.g. fasta::swiss:CAS1_human)
• EMBOSS reads and writes several sequence formats including fasta, gcg, staden, swiss, text, clustal. The default format is fasta. One file can include several sequences.
• EMBOSS can use list files, which contain sequence names in USA format. A list file is passed to a program by prefixing its name with the @ character (seqret @list.txt).
• short sequences can be fed in command line using asis::sequence
seqret asis::TGCAGCTGCTGCAGCTGCTGC
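A list file is an ordinary text file with one USA per line. The sketch below (POSIX shell syntax) writes such a file in a temporary directory, using the two addresses that appear as examples in this material, and runs seqret on it only if EMBOSS is available. The -auto option is used so that seqret does not stop to ask questions; whether both entries can actually be retrieved depends on the local database setup, which is assumed rather than known here.

```shell
#!/bin/sh
DEMO=$(mktemp -d)
cd "$DEMO"

# One USA per line; both entries are taken from the examples in the text.
cat > list.txt <<'EOF'
swiss:P12067
fasta::swiss:CAS1_human
EOF

NENTRIES=$(wc -l < list.txt | tr -d ' ')
echo "list.txt holds $NENTRIES sequence addresses"

# Run seqret on the list only if EMBOSS is installed and initialized.
if command -v seqret >/dev/null 2>&1; then
    seqret @list.txt -auto || echo "seqret could not read the entries"
else
    echo "seqret not found; run 'use emboss' on Corona first"
fi
```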
EMBOSS results
• results are stored to a new file (either text file or image)
• text files can be viewed with the less and pico programs
• images can be viewed through X-term connections or stored as a postscript file
• Use Scientist’s interface to transport data between your machine and Corona
EMBOSS command options
-help short command help
-opt ask more parameters interactively
-auto use default parameters
corona ~> seqret -help
   Mandatory qualifiers:
  [-sequence]   seqall      Sequence database USA
  [-outseq]     seqoutall   Output sequence(s) USA
   Optional qualifiers: (none)
   Advanced qualifiers:
   -firstonly   bool        Read one sequence and stop
   General qualifiers:
   -help        bool        report command line options. More information on associated and general qualifiers can be found with -help -verbose
EMBOSS general options
Many EMBOSS programs use general options that are not included in the help information. For example:
-sbegin starting point in the sequence
-send ending point in the sequence
-sreverse use reverse sequence
-sask ask -sbegin, -send and -sreverse parameters interactively
-osname name of the output file
-ossingle write sequences into separate files
Image output
• EMBOSS program asks for image format:
Graphics device [x11]:
• x11 = show on the screen (requires X-term connection)
• ps = write the image into a PostScript file
• data = write a data file instead of an image