Applied Bioinformatics: Introduction to Linux and R
Bing Zhang
Department of Biomedical Informatics, Vanderbilt University
Slide 2
Quick summary of the introduced Linux commands

Command               Meaning
rsh                   Remote shell
passwd                Modify a user's password
exit                  Exit the shell
pwd                   Display the path of the current directory
ls                    List files and directories
ls -a                 List all files and directories, including those whose names start with "."
ls -a -l              List all files and directories in a long listing format
mkdir <directory>     Make a directory
cd <directory>        Change to the named directory
cd                    Change to the home directory
cd ~                  Change to the home directory
cd ..                 Change to the parent directory
rmdir <directory>     Remove an (empty) directory
more <file>           View the contents of a file
cp <file1> <file2>    Copy file1 and name the copy file2
mv <file1> <file2>    Move or rename file1 to file2
rm <file>             Remove a file
man <command>         Display the manual pages for a command
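To see several of these commands working together, here is a minimal practice session. It is a sketch under stated assumptions: it works inside a temporary directory created with mktemp -d (not part of the original slides) rather than in your home directory, it uses touch (also not in the summary) to create an empty file, and the names practice_dir, file1, file2 and file3 are hypothetical.

```shell
#!/bin/sh
# Practice session for the commands summarized above.
# Assumption: we work in a fresh temporary directory (mktemp -d) so that
# nothing in your home directory is touched; all names below are hypothetical.
set -e

work=$(mktemp -d)         # create a scratch directory and remember its path
cd "$work"
mkdir practice_dir        # make a directory
cd practice_dir           # change to the named directory
pwd                       # display the path of the current directory
touch file1               # create an empty file to work with
cp file1 file2            # copy file1 and name the copy file2
mv file2 file3            # rename file2 to file3
ls -a -l                  # long listing, including entries starting with "."
rm file1 file3            # remove both files
cd ..                     # back to the parent directory ($work)
rmdir practice_dir        # rmdir only removes an empty directory
ls -a                     # practice_dir is no longer listed
```

Note the order at the end: rmdir fails on a directory that still contains files, which is why the files are removed first.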
Slide 3
Getting help
man <command>: display the manual pages for a command
    space bar: show the next page
    up and down arrows: move up and down
    q: exit
Slide 4
Exercise
Go to your home directory
    cd
Display the manual pages for the command ls
    man ls
List the contents of the current directory
    ls
List the contents of the current directory, including entries starting with ".", in a long listing format
    ls -a -l
Create a test directory (skip this step if you already have one)
    mkdir test
Go to the test directory
    cd test
Copy the file sample_data.txt from the directory /home/igptest to the current directory, keeping the same name
    cp /home/igptest/sample_data.txt .
View the contents of the copied file
    more sample_data.txt
Make a copy of the file
    cp sample_data.txt sample_data_copy.txt
View the contents of the new copy
    more sample_data_copy.txt
List the contents of the current directory
    ls
Remove the new copy
    rm sample_data_copy.txt
List the contents of the current directory again
    ls
Slide 5
Data manipulation with filters
Filters are programs that accept textual data and transform it in a particular way, for example head, tail, cut, sort, uniq and sed.

View the contents of a file
    more sample_data.txt
Get the first 10 lines of the file
    head sample_data.txt
Get the first 5 lines of the file
    head -n 5 sample_data.txt
Get all but the last 5 lines of the file
    head -n -5 sample_data.txt
Get the last 10 lines of the file
    tail sample_data.txt
Get the last 5 lines of the file
    tail -n 5 sample_data.txt
Get all lines starting from line 5
    tail -n +5 sample_data.txt
Get the first three columns of the file (cut separates columns at tab characters by default)
    cut -f 1-3 sample_data.txt
Get selected columns of the file (here columns 1, 3 and 5)
    cut -f 1,3,5 sample_data.txt
Sort all lines by the numerical values in the second column (non-numeric entries are treated as zero)
    sort -k 2 -n sample_data.txt
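Because the contents of sample_data.txt are not shown here, the sketch below generates a small hypothetical tab-separated file, demo_data.txt (8 lines; columns: an ID, a number, a group label), and runs the filters from the table on it. Two assumptions are worth stating: cut -f splits columns on tab characters by default, and negative counts such as head -n -5 are a GNU coreutils feature (as found on typical Linux systems) that BSD/macOS head does not accept.

```shell
#!/bin/sh
# Filters on a small, made-up stand-in for sample_data.txt.
# demo_data.txt is hypothetical: 8 tab-separated lines (id, value, group).
set -e
cd "$(mktemp -d)"
# Create the demo file (">" saves output to a file; see Slide 6)
printf 'g1\t5\tA\ng2\t12\tB\ng3\t-3\tA\ng4\t7\tC\ng5\t0\tB\ng6\t21\tA\ng7\t2\tC\ng8\t9\tB\n' > demo_data.txt

head -n 3 demo_data.txt          # first 3 lines (g1, g2, g3)
tail -n 2 demo_data.txt          # last 2 lines (g7, g8)
tail -n +7 demo_data.txt         # from line 7 to the end (g7, g8)
cut -f 1,3 demo_data.txt         # columns 1 and 3 only
cut -f 2- demo_data.txt          # column 2 through the last column
sort -k 2,2 -n demo_data.txt     # numeric sort on column 2 only; g3 (-3) comes first
```

The sketch uses -k 2,2, which limits the sort key to column 2, whereas -k 2 on the slide uses column 2 through the end of the line. With -n only the leading number of the key is read, so for this file both give the same order.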
Slide 6
Data manipulation with piping and redirection
Piping (|): send the output of one program to another program as its input.
Redirection: send the output of a program to a file.
    >   save output to a file (an existing file with that name is overwritten)
    >>  append output to the end of a file

Get the first 10 lines of the file and then get the first three columns
    head sample_data.txt | cut -f 1-3
Get the first 10 lines of the file, then get the first three columns of these lines, and then redirect the result to a new file
    head sample_data.txt | cut -f 1-3 > sample_data_subset.txt
View the new file
    more sample_data_subset.txt
Append the last 10 lines of the original file to the end of the new file
    tail sample_data.txt >> sample_data_subset.txt
View the new file
    more sample_data_subset.txt
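The following sketch repeats the piping and redirection steps on a hypothetical 20-line, 5-column file, demo_data.txt, generated by a small loop (the file and its values are made up). It shows that the resulting subset file holds 10 three-column lines followed by 10 five-column lines, because the output of tail is appended as-is and never passes through cut.

```shell
#!/bin/sh
# Piping and redirection on a hypothetical 20-line, 5-column file.
set -e
cd "$(mktemp -d)"

# Build demo_data.txt: row01..row20, tab-separated (made-up values)
i=1
while [ "$i" -le 20 ]; do
    printf 'row%02d\t%d\tx\ty\tz\n' "$i" "$((i * 3))"
    i=$((i + 1))
done > demo_data.txt

# |  : the 10 lines from head become the input of cut
# >  : create (or overwrite) demo_subset.txt with cut's output
head demo_data.txt | cut -f 1-3 > demo_subset.txt

# >> : append to demo_subset.txt instead of overwriting it
tail demo_data.txt >> demo_subset.txt

cat demo_subset.txt
```

Running the head line a second time would overwrite demo_subset.txt, while repeating the tail line would append another 10 lines to it.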
Slide 7
Editing files with nano
nano is a user-friendly text editor. A quick tutorial:
http://staffwww.fullcoll.edu/sedwards/Nano/IntroToNano.html

Open sample_data.txt for editing
    nano sample_data.txt
Delete the text Line_01 and the space after it, save the file, and then exit
    In nano: ^O (Ctrl+O) to save, ^X (Ctrl+X) to exit
View the edited file
    more sample_data.txt
View the contents of the .bashrc file, which is located in your home directory. The file contains commands that bash runs each time it starts an interactive shell.
    more ~/.bashrc
Open the .bashrc file in your home directory for editing
    nano ~/.bashrc
Add the line setpkgs -a R to the end of this file. This makes the R environment installed on the ACCRE system available for statistical computing. Then save and exit.
    In nano: ^O (Ctrl+O) to save, ^X (Ctrl+X) to exit
View the edited .bashrc file
    more ~/.bashrc
Run the .bashrc file in the current shell, so the change takes effect without logging in again
    source ~/.bashrc
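nano is interactive, so the edit above cannot be scripted as shown. As a hedged alternative (not part of the original exercise), the same deletion can be done with sed, one of the filters listed on Slide 5. The file demo_edit.txt and its contents are hypothetical, and the pattern assumes Line_01 appears at the very start of a line.

```shell
#!/bin/sh
# Non-interactive version of the nano edit: remove "Line_01 " at the start of a line.
# demo_edit.txt and its lines are hypothetical.
set -e
cd "$(mktemp -d)"
printf 'Line_01 alpha\t1\nLine_02 beta\t2\n' > demo_edit.txt

# s/^Line_01 // replaces "Line_01 " at the beginning of a line (^) with nothing.
# Write to a new file first, then replace the original only if sed succeeded.
sed 's/^Line_01 //' demo_edit.txt > demo_edit.tmp
mv demo_edit.tmp demo_edit.txt

cat demo_edit.txt    # print the edited file (more would page it)
```

sed can also edit a file in place with -i, but the option's syntax differs between GNU sed (-i) and BSD/macOS sed (-i ''), so writing to a temporary file and renaming it is the portable choice.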
Slide 8
What is R?
R is a free software environment for statistical computing and graphics. It includes:
- an effective data handling and storage facility
- a suite of operators for calculations on arrays, in particular matrices
- a large, coherent, integrated collection of intermediate tools for data analysis
- graphical facilities for data analysis and display, either on-screen or on hardcopy
- a well-developed, simple and effective programming language that includes conditionals, loops, user-defined recursive functions, and input and output facilities
Slide 9
R installation and tutorial
Download and install R: http://www.r-project.org/
    Choose a CRAN (Comprehensive R Archive Network) mirror
    Binary distributions of the base system and contributed packages:
        Windows version
        Mac OS X version
        Linux version (already installed on the ACCRE cluster; used in this module)
Tutorial
    An Introduction to R: http://cran.r-project.org/doc/manuals/r-release/R-intro.html
Slide 10
R interface
Command-line R (Linux/OS X): type R in your Linux shell to start R; type q() at the R prompt to quit.
R GUI on OS X (the Windows GUI is similar): download and install it on your laptop.
RStudio: a powerful and user-friendly interface for R, excellent for both beginners and developers (http://www.rstudio.com/).
Slide 11
Install and load packages
CRAN packages: http://cran.r-project.org/web/packages/ (>6000 packages)
BioConductor packages: http://www.bioconductor.org/ (~1000 packages for the analysis of high-throughput genomics data)

Install a CRAN package
    install.packages("package_name")
Install a BioConductor package
    source("http://www.bioconductor.org/biocLite.R")
    biocLite("package_name")
Load a package/library
    library(package_name)

Note: since Bioconductor 3.8, biocLite() has been replaced by the BiocManager package: run install.packages("BiocManager") once, then BiocManager::install("package_name").
Slide 12
Basic R syntax
Object
Operators and calculations
    Comparison operators: ==, !=, <, >, <=, >=
    Logical operators: & (AND), | (OR), ! (NOT)
    Calculations
        Arithmetic operators: +, -, *, /, ^
        Arithmetic functions: log, exp, sqrt, mean, var, sd, sum, etc.

Comparisons
    3==5
    3!=5
    3…0 & y>0
Calculations
    (4+2^2)/(2*2)
    x