INTRODUCTION TO UNIX

Vivek Krishnakumar

JCVI Genomic Science and Leadership Workshop

Presented on: 05/26/2016

Overview

• What is Unix? Where is it used?

• Bioinformatics and Unix

• Paradigm of Local Computers vs Remote Servers

• Unix Command Line Interface (CLI)

o Issuing commands

• Brief overview of Directory structures

• Brief overview of Unix File permissions

• General Unix rules

• FIN

• Skim through Supplementary slides (will serve as reference material for hands-on)

WHAT IS UNIX?

• Portable, multi-tasking, multi-user, time-sharing operating system

• It is the operating system of choice for servers and supercomputers.

• More than 90% of the 500 fastest computers in the world run Unix or a Unix-like system.

• Available for free (Open Source) or nearly free: Unix-like OSs such as Linux, Minix, BSD, Android

• Mac computers are related to Unix because their operating system (Mac OS X) is itself based on Unix

• Depending on the purpose of the Unix machine, it may or may not have a Desktop environment like the one we are familiar with on our personal computers.

• Unix systems typically use the X Window System to provide the Desktop environment.

Source: http://www.jcsystemsconsulting.com/pictures

BIOINFORMATICS AND UNIX

• Many core bioinformatics tools are written for use with Unix

o BLAST, CLUSTALW, PHRAP, etc.

• Many web applications are also supported on web servers hosted on Unix-based machines

• Unix supports development and use of software in many different programming languages (Python, Perl, Java, R, C, C++).

• Multiple users can log in at the same time

o A user logging in over the network can do just about anything a user sitting in front of the computer can do.

o The system can therefore multitask (run many processes in parallel).

APPLICATIONS AND SERVERS

[Diagram: a Personal/Local Computer (Mac/Windows/Unix) and a Server/Remote Computer (Unix). Labels: Terminal Window, SSH, SCP, Data files, Web App, Web Service, DB, IGV, BLAST.]

Thanks to Manpreet Katari (NYU) for slide(s)

UNIX COMMAND LINE INTERFACE (SHELL)

• Users communicate with the OS via the shell

• The shell interprets the commands typed by the user on the keyboard

• Different shells are available for Unix systems

o sh - Bourne shell

o csh - C shell

o bash - Bourne-again shell

o zsh - Z shell

• You can type the commands directly at the shell or build scripts to accomplish certain tasks

* During the workshop hands-on sessions, we will be working with the bash shell, one of the most popular Unix shells.

ISSUING COMMANDS

• The command prompt (signified by a $ symbol) is where you type a command, followed by its arguments (if necessary)

• If the program produces output, it is usually printed on the screen, which is often referred to as standard output (stdout)
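A minimal sketch of the command/argument/stdout pattern described above. The choice of basename as the example command is ours, not the slide's; the path reuses the train01 account name that appears later in these notes, and the file does not need to exist.

```shell
# A command with no arguments: print the present working directory.
pwd

# A command followed by one argument: strip the directory part of a path.
basename /home/train01/file.txt          # prints: file.txt

# The same command with a second argument, which changes its behaviour:
# the given suffix is removed as well.
basename /home/train01/file.txt .txt     # prints: file

# Output normally goes to the screen (stdout); $( ) captures it instead.
name=$(basename /home/train01/file.txt .txt)
echo "captured: $name"
```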

DIRECTORY STRUCTURES IN UNIX

• Directories are used to organize files. Analogous to folders in Windows and Mac OS.

• Important directories:

o / - 'root'

o /home - Personal user directories

o /bin - Common programs, shared by the system and users

o /lib - Library files required by the programs

o /usr - Programs, libraries, documentation etc. for all user-related programs

o /mnt - Mount point for external file systems

o /tmp - Temporary space; cleaned up periodically and on reboot

Reference: http://www.redhat.com/mirrors/LDP/LDP/intro-linux/html/sect_03_01.html

GRANULAR FILE PERMISSIONS

READ (R), WRITE (W), EXECUTE (X)

Learn more: http://ccn.ucla.edu/wiki/index.php/UNIX_Permissions, http://sc.tamu.edu/help/general/unix/unix.html
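The slide's illustration is not preserved in these notes, so here is a small sketch of how read/write/execute permissions can be inspected and changed. In the string printed by ls -l, the nine characters after the file type are three r/w/x triplets for the owner, the group and everyone else. The file name run.sh and the scratch directory are illustrative assumptions.

```shell
# Work in a scratch directory so nothing of yours is touched.
workdir=$(mktemp -d)
cd "$workdir"
printf '#!/bin/sh\necho "hello from run.sh"\n' > run.sh

# 644 = rw- for the owner, r-- for the group, r-- for others.
chmod 644 run.sh
before=$(ls -l run.sh | cut -c1-10)   # keep only the permission string

# u+x adds execute (x) permission for the owner (u) only.
chmod u+x run.sh
after=$(ls -l run.sh | cut -c1-10)

echo "$before -> $after"              # -rw-r--r-- -> -rwxr--r--
```

Once the x bit is set for the owner, the script can be started as ./run.sh.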

GENERAL UNIX RULES

• Unix is CaSe sEnSiTiVe

o file.txt and File.txt are not one and the same

o This applies to the commands as well (e.g.: ls is not Ls)

o In any directory, you can have only one file with a given name

• Filenames should only contain letters [A-Za-z], numbers [0-9], underscores [_], dots [.] and hyphens [-].

Absolutely no-spAc3s_what_so.ever

• File extensions are optional but recommended because most programs follow some standards. Examples:

o .txt - File containing plain text data

o .sh - File containing commands to be executed

o .pl/.py - Perl/Python scripts
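A quick way to see the case-sensitivity rule in action. This sketch assumes a case-sensitive file system, as found on a typical Linux server; a Mac with default settings may treat the two names as the same file.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Two names that differ only in case are two different files.
echo "lower" > file.txt
echo "upper" > File.txt

ls                      # lists both File.txt and file.txt
count=$(ls | wc -l)     # number of entries in the directory
echo "files in directory: $count"

# Each file keeps its own contents.
lower_contents=$(cat file.txt)
upper_contents=$(cat File.txt)
```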

Following slides are for your reference

To be used in the Hands-On Session(s)

GETTING HELP IN UNIX

• Unix is not a very user-friendly system

• The whatis database provides a short description for each command:

$ whatis COMMAND

• If you don't know what command to use, search by keyword:

$ apropos KEYWORD

• Similar to a 'User Manual' which you get with household appliances and electronic devices, Unix also offers help in the form of "manual" (man) pages for every command

• These man pages describe all available command line options and how each option modifies the command's behavior

• For help at the shell, type man followed by the name of the command:

$ man ls

AUTO-COMPLETION & HISTORY

• Unix offers file/directory name auto-completion

o When typing out a file/directory name partially, hit the <tab> key for possible matches

• The Unix shell tracks the command history (up to a certain limit; the size can be controlled by environment variables).

o Get the last 5 executed commands

$ history | tail -n5

o Re-execute the 25th command in the history (the shell prints the command it is re-running, here cd /export)

$ !25

cd /export

NAVIGATING THRU DIRECTORIES

• pwd - Print the present working directory (sometimes your pwd will be visible at the prompt)

• cd DIR - Change to specified directory

• cd .. - Change directory to one level up

• cd or cd ~ - Change directory to the user home

. (current directory)

.. (parent directory)

~ (home directory)

• mkdir DIR - Make a directory

• rmdir DIR - Remove a directory (only if it is empty)
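The navigation commands above can be tried together in a throwaway directory. The directory name project is an illustrative assumption.

```shell
start=$(mktemp -d)      # scratch area for the exercise
cd "$start"

mkdir project           # make a directory
cd project              # change into it
pwd                     # full path, ending in /project
here=$(basename "$(pwd)")

cd ..                   # go one level up, back to the scratch area
rmdir project           # succeeds because the directory is empty

cd ~                    # return to the home directory
```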

VIEWING FILE CONTENTS

• less FILE - Page through a file (alternative to more). less allows you to navigate up/down with the arrow keys on the keyboard. Press space to page down and q to exit

• cat FILE - Dump the entire file contents to standard output (stdout)

• wc FILE - Perform a word count on the file(s) (line, word, byte count)

• head FILE - Show first 10 lines of a file (-n to control the number of lines)

• tail FILE - Show last 10 lines of a file (-n to control the number of lines)
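A short sketch of these viewing commands on a generated 25-line file. The file name lines.txt and its contents are made up for the example; less is left out because it is interactive.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Build a 25-line test file: "line 1" ... "line 25".
i=1
while [ "$i" -le 25 ]; do
    echo "line $i"
    i=$((i + 1))
done > lines.txt

head -n 3 lines.txt             # first 3 lines
tail -n 2 lines.txt             # last 2 lines
wc lines.txt                    # line, word and byte counts

first=$(head -n 1 lines.txt)
last=$(tail -n 1 lines.txt)
total=$(wc -l < lines.txt)      # -l prints the line count only
```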

PIPING & FILE REDIRECTIONS

Unix allows chaining commands using the pipe (|) operator and redirection of the standard input/output streams (>, >> and <)

• | - Pipe the stdout of one command into the stdin of the next

• > - Redirect standard output (stdout) to file

• >> - Append stdout to file

• < - Redirect standard input (stdin) to command

• cat - Concatenate to stdout

• Example task: retrieve lines 15 through 20 of the file
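The slide's example command for retrieving lines 15 through 20 is not preserved in these notes. One possible way to do it with a pipe, shown here as a sketch on a generated file (lines.txt and subset.txt are assumed names):

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Build a 25-line test file: "line 1" ... "line 25".
i=1
while [ "$i" -le 25 ]; do
    echo "line $i"
    i=$((i + 1))
done > lines.txt

# head keeps lines 1-20; its stdout is piped to tail, which keeps the
# last 6 of those (15-20); > writes the result to a new file.
head -n 20 lines.txt | tail -n 6 > subset.txt

# >> appends to the file instead of overwriting it.
echo "extra line" >> subset.txt

# < feeds the file to the command's standard input.
count=$(wc -l < subset.txt)
first=$(head -n 1 subset.txt)
```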

PATTERN SEARCHING (GREP)

• grep searches through a file or standard input (stdin) stream for patterns

• All matched lines are returned to standard output (stdout)

• Syntax: grep [-options] <pattern> <file name>

• Options:

o -c - Count the number of matching lines

o -i - Make search case-insensitive

o -v - Invert-match (return the lines that do not match)

o Providing search context:

-A4 - extract 4 lines after match

-B3 - extract 3 lines before match

-C4 - extract 4 lines before and after match

• Pattern (Plaintext string or Regular Expressions)

o ^ - Specify the beginning of a line

o $ - Specify the end of a line

Nice introductory article to using regular expressions with grep: http://www.cyberciti.biz/faq/grep-regular-expressions/
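A few of the grep options in action on a small made-up feature table (features.txt and its contents are assumptions for illustration only):

```shell
workdir=$(mktemp -d)
cd "$workdir"

printf '%s\n' \
    'chr1 gene 100 900' \
    'chr1 exon 100 300' \
    'chr1 Exon 500 900' \
    'chr2 gene 50 400' \
    'chr2 exon 50 400' > features.txt

grep 'exon' features.txt                     # lines containing "exon"
exact=$(grep -c 'exon' features.txt)         # count of matching lines
anycase=$(grep -c -i 'exon' features.txt)    # also counts "Exon"
genes=$(grep -v -i 'exon' features.txt)      # lines that do NOT match
on_chr2=$(grep -c '^chr2' features.txt)      # lines starting with chr2
ends_400=$(grep -c ' 400$' features.txt)     # lines ending in " 400"

grep -A1 'Exon' features.txt                 # the match plus 1 line after it
```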

CUTTING & PASTING

cut - Extract specific columns from a multi-column delimited file

• Syntax: cut [-options] FILE

• Available options:

o -f1,2,3,6 - Specify the index of the columns to extract

o -d"," - Specify the input column delimiter (default is tab)

o --output-delimiter=$'\t' - Modify the output delimiter (GNU cut; in bash, $'\t' is a tab character, whereas "\t" is passed as a literal backslash and t)

paste - Join multiple files side by side, in the order given

• Syntax: paste FILE1 FILE2 ...

• It writes lines consisting of the sequentially corresponding lines from each input FILE, separated by tabs
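A small sketch of cut and paste working on two made-up files (genes.csv and functions.txt are illustrative names and contents):

```shell
workdir=$(mktemp -d)
cd "$workdir"

printf 'gene1,chr1,100,+\ngene2,chr2,250,-\n' > genes.csv
printf 'kinase\ntransporter\n' > functions.txt

# Extract columns 1 and 3 from the comma-delimited file.
cut -d"," -f1,3 genes.csv
ids_and_starts=$(cut -d"," -f1,3 genes.csv)

# Join the two files line by line; paste separates them with a tab.
paste genes.csv functions.txt
first_joined=$(paste genes.csv functions.txt | head -n 1)
```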

SORTING & UNIQ-ING

• sort is a program used to sort the lines of a file or of standard input

• Syntax: sort [-options] FILE

• Several options available (investigate using man or invoke --help):

o -k2,2 - sort file by 2nd column

o -n - sort by numeric order

o -r - reverse the sort order

o -t';' - specify an alternative field separator (quote characters such as ; that are special to the shell)

o -u - print only unique (uniq) lines

• uniq is a program used to discard all but one of successive identical lines from the input (sort the input first so that duplicates are adjacent)

• Syntax: uniq [-options] FILE

• Available options:

o -c - prefix line with count of number of occurrences

o -d - print out only duplicate lines
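A sketch combining sort and uniq on a small semicolon-delimited file (hits.txt and its contents are made up). Note that uniq only compares neighbouring lines, which is why its input is sorted first.

```shell
workdir=$(mktemp -d)
cd "$workdir"

printf '%s\n' 'chr2;30' 'chr1;200' 'chr10;5' 'chr1;200' > hits.txt

# Sort numerically on the 2nd column, using ; as the field separator.
sort -t';' -k2,2 -n hits.txt
smallest=$(sort -t';' -k2,2 -n hits.txt | head -n 1)

# Same sort, largest value first.
largest=$(sort -t';' -k2,2 -n -r hits.txt | head -n 1)

# Count how often each line occurs.
sort hits.txt | uniq -c

# Show only the lines that occur more than once.
dups=$(sort hits.txt | uniq -d)

# Number of distinct lines.
distinct=$(sort -u hits.txt | wc -l)
```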

STRING SUBSTITUTION USING SED

• sed (stream editor) is used to modify input streams (stdin or file contents)

• Syntax: sed [-options] COMMAND FILE

• Example command:

$ sed s/exon/CDS/ old.gff > new.gff

There are four parts to this substitute command:

s - Substitute command

/../../ - Delimiter

exon - Search pattern

CDS - Replacement string

Changes are made only to the first occurrence of the pattern on each line

• Some more examples:

$ sed -i s/exon/CDS/ old.gff # in-place change (GNU sed)

$ sed -i s/exon/CDS/g old.gff # change globally (every occurrence on each line)

$ sed '1,50 s/exon/CDS/' old.gff # modify first 50 lines

• Enclose the substitute command within quotes when dealing with complex patterns

Comprehensive sed manual: http://www.grymoire.com/Unix/Sed.html
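The difference between a plain substitution, a global one and one restricted to a line range is easiest to see side by side. This sketch uses a three-line stand-in for old.gff (the contents are made up) and writes to new files rather than editing in place.

```shell
workdir=$(mktemp -d)
cd "$workdir"

printf '%s\n' \
    'chr1 exon 100 exon_a' \
    'chr1 exon 200 exon_b' \
    'chr1 exon 300 exon_c' > old.gff

# First occurrence on each line only.
sed 's/exon/CDS/' old.gff > first_only.gff

# Every occurrence on each line (g = global).
sed 's/exon/CDS/g' old.gff > global.gff

# First occurrence, but only on lines 1 through 2.
sed '1,2 s/exon/CDS/' old.gff > first_two.gff

head -n 1 first_only.gff    # chr1 CDS 100 exon_a
head -n 1 global.gff        # chr1 CDS 100 CDS_a
tail -n 1 first_two.gff     # chr1 exon 300 exon_c
```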

FINDING/LOCATING FILES

• Syntax: find PATH EXP

• Searches recursively through all subfolders

• Options:

o -name specify file/folder name

o -iname for case insensitive search

o -type f finds only files and -type d only folders

o -print will print out the path of the file(s) found

o -exec allows you to execute a command on the files found

• Examples

$ find /home/train01 -name "file.txt"

$ find . -type f -iname "*.sh"

$ find . -name "rc.conf" -print

$ find . -name "*.sh" -exec chmod +x '{}' \;
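The find examples can be tried safely on a small made-up directory tree (all file and directory names here are illustrative assumptions):

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p scripts/old
touch scripts/run.sh scripts/old/Backup.SH notes.txt

# Case-sensitive name match: finds run.sh but not Backup.SH.
find . -type f -name "*.sh"
exact=$(find . -type f -name "*.sh")

# Case-insensitive match: finds both scripts.
both=$(find . -type f -iname "*.sh" | wc -l)

# Directories only: ".", "./scripts" and "./scripts/old".
dirs=$(find . -type d | wc -l)

# Run a command on every match; '{}' stands for each path found.
find . -name "*.sh" -exec chmod +x '{}' \;
```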

USEFUL KEYBOARD SHORTCUTS

Manipulate the current command

Ctrl + A Go to the beginning of the command prompt line

Ctrl + E Go to the end of the command prompt line

Ctrl + L Clears the screen, similar to the ‘clear’ command

Ctrl + U Clears the line before the cursor position

Ctrl + K Clears the line after the cursor position

Ctrl + W Deletes the word before the cursor position

Search

Ctrl + R Lets you search through previously used commands

Job Control

Ctrl + C Kills the currently running job

Ctrl + Z Puts the current job into a suspended state

Ctrl + D Exits the current shell

JOB CONTROL

• The jobs command shows you all the jobs running in the current terminal (with status info)

$ jobs

[1]- Stopped vi run.sh

[2]+ Stopped less file.txt

• Each job is given a number. They can be run in the background or foreground:

$ bg 2 # Resume job 2 in the background; you keep control of the shell

$ fg 2 # Bring job 2 to the foreground

• Launch a job in the background directly like so:

$ ./run.sh &

• List all running processes (filter by user if necessary):

$ ps -u train01

PID TTY TIME CMD

19231 pts/21 00:00:00 vi

19233 pts/21 00:00:00 less

• Kill any job (by PID, job number or name)

$ kill 19233

$ kill %2

$ killall less
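A script-friendly sketch of launching a background job and stopping it by PID. The sleep command stands in for a long-running program, and $! (the PID of the most recent background job) is used instead of reading the PID from ps output; both are choices made for this example.

```shell
# Start a long-running command in the background; the shell returns at once.
sleep 30 &
pid=$!                       # PID of the most recent background job

# kill -0 sends no signal; it only checks that the process exists.
running=no
kill -0 "$pid" && running=yes
echo "job $pid running: $running"

# Send the default signal (SIGTERM) to stop the job.
kill "$pid"

# wait collects the job's exit status: 128 + signal number (15 for SIGTERM).
status=0
wait "$pid" || status=$?
echo "exit status: $status"  # 143
```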

QUESTIONS?