MCB3895-004 Lecture #3 Sept 2/14 Intro to UNIX terminal.
Introduction to UNIX
• Nearly all bioinformatics software runs on UNIX and UNIX-like systems (e.g., Linux and Mac OS)
• Very little bioinformatics software runs on Windows
• Bioinformatics is very strongly tied to the open-source software movement
  • Lots of help available online
  • Most programs are free
  • Windows is not very open-source friendly
Windows users:
• Option 1: Do all of your work while connected to the Biotechnology Cluster server. Download sshclient (ftp://ftp.uconn.edu/restricted/ssh/)
• Option 2: Install Linux to run alongside Windows (e.g., Bio-Linux http://nebc.nerc.ac.uk/tools/bio-linux)
Terminal
• The terminal is the primary way to do computational biology
• Mac: Applications/Utilities/Terminal
• Linux: Applications/Accessories/Terminal
• Windows: sshclient
Assignment
• A handy resource to learn the basics of UNIX is the “Unix and Perl Primer for Biologists”, which can be found here: http://korflab.ucdavis.edu/Unix_and_Perl/unix_and_perl_v3.1.1.pdf
• The commands they demonstrate mainly involve creating, removing and moving around files and directories
• Once you learn them, these commands will take you far beyond what you can do with a more familiar GUI like Mac Finder or Windows Explorer
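As a rough sketch of the kind of commands the primer covers, the following creates, copies, moves and removes files and directories. It works inside a throwaway directory made with mktemp; the file and directory names (genomes, notes.txt and so on) are made up for illustration.

```shell
# Work in a throwaway directory so nothing important is touched
cd "$(mktemp -d)"

mkdir genomes                     # create a directory
touch genomes/notes.txt           # create an empty file inside it
cp genomes/notes.txt backup.txt   # copy a file
mv backup.txt old_notes.txt       # move (here: rename) a file
ls -l                             # list what is here now
rm old_notes.txt                  # remove a file (there is no undo!)
rm genomes/notes.txt
rmdir genomes                     # remove a directory; it must already be empty
ls                                # nothing left
```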
Worthy of special comment
1. Directory trees
2. Using tab to autocomplete
3. Wildcard characters like * to perform the same operation on multiple files (this is insanely useful once you get the hang of it!)
4. Using nano as a very basic text editor. Never, ever, ever use Word for this!
5. Use underscores (“_”), not spaces, in your filenames
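Points 3 and 5 can be tried out together. In this sketch (hypothetical file names, in a throwaway directory), one wildcard command moves several files at once, and a filename containing a space shows why underscores are easier: the shell splits unquoted words on spaces, so such names need quotes every time.

```shell
cd "$(mktemp -d)"
# Hypothetical sequence files plus one unrelated file
touch strain_A.fasta strain_B.fasta strain_C.fasta readme.txt

ls *.fasta               # * matches any run of characters: lists the three .fasta files
mkdir fasta_files
mv *.fasta fasta_files/  # one command moves all three files
ls fasta_files

# A space in a filename: quotes are needed just to create it
touch "my notes.txt"
# ls my notes.txt        would fail: ls would look for two files, "my" and "notes.txt"
ls "my notes.txt"        # works, but only with the quotes
mv "my notes.txt" my_notes.txt   # an underscore avoids the problem entirely
```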
Directory trees
• All computer files are organized hierarchically, in a tree of folders (directories)
• Each folder has an address (its path), e.g.:
/Users/Jonathan/Laptop_backup/Desktop/e-Books
A quick reference to where you are in UNIX
• “/” - root, the top of the directory tree
• “~” - your user home directory
• “.” - “here”, the directory you are in now
• “../” - one level up in the directory tree
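A minimal sketch of moving around a directory tree with these shortcuts. It builds a small practice tree in a temporary directory; the folder names are hypothetical, modeled on the example path from the directory-tree slide. pwd prints the address of the directory you are currently in.

```shell
# Build a small practice tree in a temporary directory
base=$(mktemp -d)
mkdir -p "$base/Laptop_backup/Desktop/e-Books"

cd "$base/Laptop_backup/Desktop/e-Books"
pwd            # print the address of the directory you are in
cd ..          # "..": one level up, into .../Laptop_backup/Desktop
pwd
cd ./e-Books   # ".": start from "here" and go back down into e-Books
pwd
cd ~           # "~": your home directory
pwd
cd /           # "/": the root, top of the whole tree
pwd

rm -r "$base"  # clean up the practice tree
```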
More UNIX tricks
• “>” (greater than) redirects the output of a command into a file, creating the file or overwriting it if it already exists
e.g., ls * > list
  • a list of the files in this directory is now stored in the file “list”
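A small sketch of redirection, using hypothetical file names in a throwaway directory. Nothing is printed by the ls line because its output goes into the file instead of the screen; the last two lines show that a second “>” into the same file replaces the old contents.

```shell
cd "$(mktemp -d)"
touch sample_A.fasta sample_B.fasta notes.txt   # hypothetical files

ls * > list              # nothing appears on screen; the output went into "list"
cat list                 # print the saved list

echo "replaced" > list   # ">" overwrites: the old list is gone
cat list
```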
More UNIX tricks
• cat joins multiple files together
e.g., cat file1 file2 > file3
  • file3 contains file1 and file2 joined together
  • file1 and file2 still exist as they were
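A sketch of joining files with cat, assuming two tiny made-up FASTA files. The files are joined in the order they are listed, and the originals are left alone.

```shell
cd "$(mktemp -d)"
# Two small hypothetical FASTA files
printf '>seq1\nACGT\n' > file1
printf '>seq2\nGGCC\n' > file2

cat file1 file2 > file3   # join them, in order, into file3
cat file3                 # prints >seq1, ACGT, >seq2, GGCC on four lines
ls                        # file1 and file2 are still there
```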
More UNIX tricks
• grep extracts all lines containing a particular pattern from a file
e.g., grep "NP_" file1
  • Prints every line that contains the pattern “NP_” to the screen
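A sketch of grep on a small made-up table (the accession-like IDs and protein names are hypothetical). When typing the command, use plain straight quotes: curly quotes copied from a slide are treated by the shell as ordinary characters that become part of the pattern.

```shell
cd "$(mktemp -d)"
# Hypothetical table of accession-like IDs and protein names
printf 'NP_000001.1 proteinA\nXP_000002.1 proteinB\nNP_000003.1 proteinC\n' > file1

grep "NP_" file1    # prints the two lines that contain NP_
```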
More UNIX tricks
• wc counts the newlines, words and bytes in a file
e.g., wc file1
  • Prints an output like this:
    10602 18921 752002 file1
    (newlines, words, bytes, filename)
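A sketch with a tiny hypothetical file whose counts are easy to check by hand: 3 lines, 5 words (">seq1", "example", "protein", "MKVLA", "GGT") and 32 bytes, counting each newline as one byte. The exact column spacing of wc output differs between Mac and Linux. wc -l, which prints only the line count, is shown as an extra.

```shell
cd "$(mktemp -d)"
# A tiny hypothetical FASTA-style file
printf '>seq1 example protein\nMKVLA\nGGT\n' > file1

wc file1      # newlines, words, bytes, filename: 3, 5, 32, file1
wc -l file1   # just the newline (line) count
```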
More UNIX tricks
• “|” (pipe) sends the output of one command into another command as its input
e.g., grep "NP_" file1 | wc
  • Sends the output of the grep command into wc; because grep extracts lines from a file, this can be used to count the number of lines matching the grep pattern
e.g., grep "NP_" file1 | less
  • Displays the grep result as a list you can scroll through
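A sketch of counting grep matches with a pipe, reusing a hypothetical three-line table in which two lines contain NP_. With plain wc, the first number printed is the line count; wc -l prints only that number.

```shell
cd "$(mktemp -d)"
# Hypothetical table: two of the three lines contain NP_
printf 'NP_000001.1 proteinA\nXP_000002.1 proteinB\nNP_000003.1 proteinC\n' > file1

grep "NP_" file1 | wc      # first number = matching lines (here 2)
grep "NP_" file1 | wc -l   # print only the line count
```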
More UNIX tricks
• gzip/gunzip: single-file compression
e.g., gunzip file.txt.gz
  • Decompresses file.txt.gz into file.txt, removes file.txt.gz
e.g., gzip file.txt
  • Creates compressed file file.txt.gz, removes file.txt
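A round-trip sketch of gzip and gunzip on a made-up text file (the numbers 1 to 10000, written by seq). Compare the ls -l listings: after gzip only the smaller file.txt.gz remains, and after gunzip only file.txt.

```shell
cd "$(mktemp -d)"
seq 1 10000 > file.txt   # a hypothetical text file to compress

gzip file.txt            # creates file.txt.gz and removes file.txt
ls -l
gunzip file.txt.gz       # restores file.txt and removes file.txt.gz
ls -l
```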
More UNIX tricks
• tar: file archive management
e.g., tar -cf all.tar *
  • Creates tar archive all.tar containing all files in that directory; the individual files are unchanged
e.g., tar -xf all.tar
  • Extracts all files from tar archive all.tar to the current directory; all.tar is not deleted
• tar is very commonly used before gzip, giving compressed archives (e.g., all.tar.gz) known as “tarballs”
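A sketch of the tar-then-gzip workflow, using two hypothetical files in a temporary directory. It creates a tarball, lists its contents with tar -tf, and then unpacks a copy of it in a second directory.

```shell
# Practice in a throwaway directory (file names are hypothetical)
work=$(mktemp -d)
mkdir "$work/project" && cd "$work/project"
printf 'alpha\n' > file1
printf 'beta\n'  > file2

tar -cf all.tar *       # archive every file here; file1 and file2 are unchanged
tar -tf all.tar         # list what is inside the archive
gzip all.tar            # compress it: all.tar becomes the tarball all.tar.gz

# Unpack the tarball somewhere else
mkdir "$work/restore" && cd "$work/restore"
cp "$work/project/all.tar.gz" .
gunzip all.tar.gz       # back to all.tar
tar -xf all.tar         # extract file1 and file2 here; all.tar is kept
ls
```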
Connecting to the Bioinformatics facility server
• UNIX command ssh
• e.g., ssh -l jlklassen bbcsrv3.biotech.uconn.edu (-l gives the login name; use your own username in place of jlklassen)
• Will ask for a password
• If this is the first time connecting, it will ask you to confirm the server's RSA key (a security feature)
• Your terminal now controls the bioinformatics facility server, not your own machine
• You can have multiple terminals open at the same time
Transferring files to the Bioinformatics facility server
• Method 1: FileZilla (https://filezilla-project.org/)
• Nice GUI
• Works on all platforms
• Install the client, not the server
Transferring files to the Bioinformatics facility server
• Method 2: UNIX command scp
  • e.g., scp all.tar [email protected]:all.tar
  • Copy all.tar from my computer to the biotech server
  • e.g., scp -r [email protected]:dir/ .
  • Copy the directory “dir” from the biotech server to the current working directory (“.”)
  • The “-r” flag means “recursive” and is needed for directories
Text editors
• Using nano works, but can be cumbersome for complex tasks
• Word is always bad! It adds hidden formatting layers you don’t see.
• Mac and Linux have TextEdit and Gedit, respectively, as default text editors; both work well (in TextEdit, switch to plain text with Format > Make Plain Text)
• Windows: Notepad and WordPad are insufficient. I suggest downloading Gedit for Windows (https://wiki.gnome.org/Apps/Gedit)
• Other options exist for all platforms
Assignment
• See instructions posted on the website at http://wp.mcb3895.mcb.uconn.edu
• Part 1: work through Korf manual sections U1-U27 (some commands require external files; skip these, but understand what they do)
• Part 2: log on to the Biotech server, download a genome from NCBI and answer the questions given
• The assignment is due at the start of class 1 week from today
Command line power!
• The simplest way to download these data is to use the terminal command wget
$ wget -r --no-directories --retr-symlinks -P Acaricomes_phytoseiuli/ ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/Acaricomes_phytoseiuli/latest_assembly_versions/GCF_000376245.1_ASM37624v1/
• Deconstructed:
  • -r: “recursive”, i.e., download everything in this directory
  • --no-directories: does not recreate the entire FTP directory structure locally
  • --retr-symlinks: NCBI uses a fancy file structure with something called “symbolic links”, where a file points to another file somewhere else; “--retr-symlinks” gets the actual files, not just the links
  • -P Acaricomes_phytoseiuli/: where to put the output
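To see what a symbolic link is, you can make one locally with ln -s. The sketch below uses a hypothetical genome.fna in a temporary directory. It shows that the link is only a pointer to another path, while reading through it gives the real file's contents. This is the distinction --retr-symlinks deals with on the NCBI server.

```shell
cd "$(mktemp -d)"
mkdir real_files
printf '>contig1\nACGTACGT\n' > real_files/genome.fna   # hypothetical file

ln -s real_files/genome.fna link.fna   # link.fna points at the real file
ls -l link.fna       # shows "link.fna -> real_files/genome.fna"
readlink link.fna    # prints the target path the link points to
cat link.fna         # reading the link reads the real file's contents
```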