MCB3895-004 Lecture #3 Sept 2/14


Page 1: MCB3895-004 Lecture  #3 Sept 2/14

MCB3895-004 Lecture #3 Sept 2/14

Intro to UNIX terminal

Page 2: MCB3895-004 Lecture  #3 Sept 2/14

Introduction to UNIX

• Nearly all bioinformatics software runs on UNIX and its derivatives (e.g., LINUX and Mac OS)

• Very little bioinformatics software runs on Windows

• Bioinformatics is very strongly tied to the open-source software movement

• Lots of help available on-line

• Most programs are free

• Windows is not very open-source friendly

Page 3: MCB3895-004 Lecture  #3 Sept 2/14

Windows users:

• Option 1: Do all of your work connected to the Biotechnology Cluster server. Download sshclient (ftp://ftp.uconn.edu/restricted/ssh/)

• Option 2: Install LINUX to run in parallel with Windows (e.g., Biolinux http://nebc.nerc.ac.uk/tools/bio-linux)

Page 4: MCB3895-004 Lecture  #3 Sept 2/14

Terminal

• The terminal is the primary way to do computational biology

• Mac: Applications/Utilities/Terminal

• Linux: Applications/Accessories/Terminal

• Windows: sshclient

Page 5: MCB3895-004 Lecture  #3 Sept 2/14

Assignment

• A handy resource to learn the basics of UNIX is the “Unix and Perl Primer for Biologists”, which can be found here: http://korflab.ucdavis.edu/Unix_and_Perl/unix_and_perl_v3.1.1.pdf

• The commands they demonstrate mainly involve creating, removing and moving around files and directories

• Once you learn them, these commands will take you far beyond what you can do with a more familiar GUI like Mac Finder or Windows Explorer

Page 6: MCB3895-004 Lecture  #3 Sept 2/14

Worthy of special comment

1. Directory trees

2. Using tab to autocomplete

3. Wildcard characters like * to perform the same operation to multiple files (this is insanely useful once you get the hang of it!)

4. Using nano as a very basic text editor. Never, ever, ever use Word for this!

5. Use underscores “_” not spaces in your filenames
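A minimal sketch of points 3 and 5, assuming a scratch directory and made-up file names (sample_1.fasta and so on are illustrations, not course files):

```shell
# Work in a scratch directory so nothing real is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Create a few empty files; underscores instead of spaces keep names easy to type.
touch sample_1.fasta sample_2.fasta notes_day1.txt

# "*" matches any run of characters, so this lists only the two FASTA files.
ls *.fasta

# The same pattern works with other commands, e.g. moving every FASTA file at once.
mkdir sequences
mv *.fasta sequences/
ls sequences
```

A file name containing a space would have to be quoted or escaped every time it is typed, which is why underscores are recommended.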

Page 7: MCB3895-004 Lecture  #3 Sept 2/14

Directory trees

• All computer files are organized hierarchically

• Each folder has an address

/Users/Jonathan/Laptop_backup/Desktop/e-Books

Page 8: MCB3895-004 Lecture  #3 Sept 2/14
Page 9: MCB3895-004 Lecture  #3 Sept 2/14

A quick reference to where you are in UNIX

• “/” - root

• “~” - your user home directory

• “.” - “here”, the directory you are in now

• “../” - one level up in the directory tree
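These shortcuts can be tried out with cd, pwd and ls. The sketch below assumes a small scratch tree with hypothetical directory names (project, data):

```shell
# Build a small directory tree in a scratch location.
top=$(mktemp -d)
mkdir -p "$top/project/data"

cd "$top/project/data"
pwd            # prints the full address of the directory you are in

cd ..          # ".." moves one level up, into "project"
here=$(pwd)    # remember where we are now

ls .           # "." is the current directory, so this lists "data"

cd ~           # "~" is your user home directory
cd /           # "/" is the root of the whole tree
```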

Page 10: MCB3895-004 Lecture  #3 Sept 2/14

More UNIX tricks

• “>” (greater than) redirects the output of a command into a new file

e.g., ls * > list

• A list of the files in this directory is now stored in the file “list”
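A small sketch of redirection, assuming a scratch directory with two made-up file names. The ">>" operator is not on the slide; it is the standard companion to ">" that appends instead of overwriting:

```shell
work=$(mktemp -d)
cd "$work"
touch genome_a.fasta genome_b.fasta

# ">" writes the command's output to "list", creating or overwriting the file.
ls *.fasta > list
cat list

# ">>" appends to the end of an existing file instead of replacing it.
echo "end of listing" >> list
cat list
```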

Page 11: MCB3895-004 Lecture  #3 Sept 2/14

More UNIX tricks

• cat joins multiple files together

e.g., cat file1 file2 > file3

• file3 contains file1 and file2 joined together

• file1 and file2 still exist as they were
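The same example, run end to end in a scratch directory (the file contents are invented for illustration):

```shell
work=$(mktemp -d)
cd "$work"

# Two small input files.
printf 'line from file1\n' > file1
printf 'line from file2\n' > file2

# Join them in order; the inputs are read, not modified.
cat file1 file2 > file3

# With a single file and no redirect, cat simply prints the file to the screen.
cat file3
```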

Page 12: MCB3895-004 Lecture  #3 Sept 2/14

More UNIX tricks

• grep extracts all lines containing a particular pattern from a file

e.g., grep "NP_" file1

• Prints every line that contains the pattern “NP_” to the screen
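A sketch of grep on a made-up FASTA-style file (the accession numbers are invented). The -c flag, not shown on the slide, is a standard grep option that prints the number of matching lines instead of the lines themselves:

```shell
work=$(mktemp -d)
cd "$work"

# A tiny FASTA-like file with three invented sequence headers.
printf '>NP_000001.1 example protein A\nMKT\n' >  file1
printf '>XP_000002.1 example protein B\nMSA\n' >> file1
printf '>NP_000003.1 example protein C\nMLL\n' >> file1

# Print every line containing "NP_" (straight quotes, not curly ones).
grep "NP_" file1

# Count the matching lines instead of printing them.
grep -c "NP_" file1
```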

Page 13: MCB3895-004 Lecture  #3 Sept 2/14

More UNIX tricks

• wc counts the newlines, words and bytes in a file

e.g., wc file1

• Prints an output like this:

10602 18921 752002 file1

newlines words bytes filename
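A small sketch of wc on an invented two-line file. The -l, -w and -c flags are standard wc options that restrict the output to one of the three counts:

```shell
work=$(mktemp -d)
cd "$work"

# Two lines, three words, fourteen bytes (newline characters included).
printf 'one two\nthree\n' > file1

wc file1       # newlines, words, bytes, then the file name

wc -l file1    # newlines only
wc -w file1    # words only
wc -c file1    # bytes only
```

Exact column spacing differs between systems, so do not rely on it when reading the output.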

Page 14: MCB3895-004 Lecture  #3 Sept 2/14

More UNIX tricks

• “|” (pipe) directs the output of one command into another

e.g., grep "NP_" file1 | wc

• Sends the output of the grep command into wc. Because grep extracts lines from a file, this can be used to count the number of lines matching the grep expression

e.g., grep "NP_" file1 | less

• Displays the grep result as a list you can scroll through
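Pipes can be chained as many times as needed. A sketch on an invented FASTA-style file (grep -v, a standard option, keeps the lines that do not match):

```shell
work=$(mktemp -d)
cd "$work"
printf '>NP_1 a\nAAA\n>NP_2 b\nCCC\n>XP_3 c\nGGG\n' > file1

# Count the lines that contain "NP_".
grep "NP_" file1 | wc -l

# Three commands in a row: keep header lines, drop the "NP_" ones, count what is left.
grep ">" file1 | grep -v "NP_" | wc -l
```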

Page 15: MCB3895-004 Lecture  #3 Sept 2/14

More UNIX tricks

• gzip/gunzip: single file compression

e.g., gunzip file.txt.gz

• Decompresses file.txt

e.g., gzip file.txt

• Creates compressed file file.txt.gz, removes file.txt
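A round trip in a scratch directory, with an invented one-line file. The -c flag is a standard gzip option (not on the slide) that writes to standard output and leaves the original file in place:

```shell
work=$(mktemp -d)
cd "$work"
printf 'ACGT\n' > file.txt

gzip file.txt          # creates file.txt.gz and removes file.txt
ls

gunzip file.txt.gz     # restores file.txt and removes file.txt.gz
ls

# Keep the original: write the compressed data to a new file instead.
gzip -c file.txt > copy.txt.gz
ls
```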

Page 16: MCB3895-004 Lecture  #3 Sept 2/14

More UNIX tricks

• tar: file archive management

e.g., tar -cf all.tar *

• Creates tar archive all.tar containing all files in that directory, individual files unchanged

e.g., tar -xf all.tar

• Extracts all files from tar archive all.tar to the current directory, all.tar not deleted

• tar is very commonly used before gzip - “tarballs”
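A sketch of making and unpacking a tarball, assuming a scratch directory and made-up file names. The -t (list), -z (gzip) and -C (target directory) flags are standard tar options that go beyond what the slide shows:

```shell
work=$(mktemp -d)
cd "$work"

# A small directory to archive.
mkdir reads
printf 'a\n' > reads/r1.txt
printf 'b\n' > reads/r2.txt

tar -cf all.tar reads     # create the archive; "reads" is left unchanged
tar -tf all.tar           # list the archive contents without extracting

gzip all.tar              # compress the archive: all.tar becomes all.tar.gz

# Unpack the tarball into a separate directory in one step.
mkdir unpacked
tar -xzf all.tar.gz -C unpacked
ls unpacked/reads
```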

Page 17: MCB3895-004 Lecture  #3 Sept 2/14

Connecting to the Bioinformatics facility server

• UNIX command ssh

• e.g., ssh -l jlklassen bbcsrv3.biotech.uconn.edu

• Will ask for a password

• If the first time connecting, will want you to authenticate an RSA key (security feature)

• Your terminal now controls the bioinformatics facility server, not your own machine

• You can have multiple terminals open at the same time

Page 18: MCB3895-004 Lecture  #3 Sept 2/14

Transferring files to the Bioinformatics facility server

• Method 1: Filezilla (https://filezilla-project.org/)

• Nice GUI

• Works on all platforms

• Install the client, not the server

Page 19: MCB3895-004 Lecture  #3 Sept 2/14

Transferring files to the Bioinformatics facility server

• Method 2: UNIX command scp

• e.g., scp all.tar [email protected]:all.tar

• Copy all.tar from my computer to the biotech server

• e.g., scp -r [email protected]:dir/ .

• Copy the directory “dir” from the biotech server to the current working directory

• “-r” flag indicates “recursive”, needed for directories

Page 20: MCB3895-004 Lecture  #3 Sept 2/14

Text editors

• Using nano works, but can be cumbersome for complex tasks

• Word is always bad! Adds layers you don’t see.

• Mac and LINUX have TextEdit and Gedit as default text editors, both work well

• Windows: Notepad and Wordpad are insufficient. I suggest downloading Gedit for Windows (https://wiki.gnome.org/Apps/Gedit)

• Other options exist for all platforms

Page 21: MCB3895-004 Lecture  #3 Sept 2/14

Assignment

• See instructions posted on the website at http://wp.mcb3895.mcb.uconn.edu

• Part 1: work through Korf manual sections U1-U27 (some commands require external files, ignore these but understand what they do)

• Part 2: log on to the Biotech server, download a genome from NCBI and answer the questions given

• The assignment is due at the start of class 1 week from today

Page 22: MCB3895-004 Lecture  #3 Sept 2/14

Command line power!

• The simplest way to download these data is to use the terminal command wget

$ wget -r --no-directories --retr-symlinks -P Acaricomes_phytoseiuli/ ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/Acaricomes_phytoseiuli/latest_assembly_versions/GCF_000376245.1_ASM37624v1/

• Deconstructed:

• -r: “recursive”, i.e., download everything in this directory

• --no-directories: does not create the entire ftp directory structure

• --retr-symlinks: NCBI uses a fancy file structure using something called “symbolic links”, where a file points to another file somewhere else. “--retr-symlinks” gets the actual files, not just the links

• -P Acaricomes_phytoseiuli/: where to put the output