IBS 574 - Computational Biology & Bioinformatics
Spring 2018, Tuesday (02/01), 2:00-4:00 PM
Linux shell & shell scripting - II
Ashok R. Dinasarapu, Ph.D
Scientist, Bioinformatics
Dept. of Human Genetics, Emory University, Atlanta
02/01/2018, 2-4 PM
What is Shell scripting?
• A script is a collection of commands stored in a file.
• Shell is a command-line interpreter.
• To execute a shell script, e.g. "hello.sh":
    ./hello.sh
    ../hello.sh
    /home/user_name/script/hello.sh
    bash hello.sh (or sh hello.sh)
Easiest way to do this is…
Start typing!
Connecting to a Server via SSH
• SSH allows you to connect to your server securely and perform Linux command-line operations.
  Alternatively, https://blnx1.emory.edu:22443/
    user_name@blnx1:~$
• ~ means your home directory (/home/user_name)
• Windows: download PuTTY
  https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html
"mkdir" & directory structure
• Create directories from your home directory (i.e. /home/user_name)
Usage: mkdir -p project/{data,script,out,log}
    project/data
    project/script
    project/out
    project/log
Usage: cd project/script
How To: Create/Edit text files
Choose a text editor: emacs, Vim
Usage: vi hello.sh
INSERT mode: press a key such as i or a and start typing.
    "i" will let you insert text just before the cursor.
    "I" inserts text at the beginning of the current line.
    "a" will let you insert text just after the cursor.
    "A" will let you type at the end of the current line.
Type the following text:
    #!/bin/sh
    # My first script
    echo "Hello World!"
"#!/bin/sh" is a special clue given to the shell indicating what program is used to interpret the script.
Other interpreters can be named in the same way:
    #!/bin/bash
    #!/usr/bin/perl
    #!/usr/bin/Rscript
    #!/usr/bin/env python
View the symbolic link using: ls -la /bin/sh
How To: Save file in Vi Text Editor
To save or quit: press the Esc key, then type
    :q!  to quit without saving, OR
    :x   to save all typed content.
Alternatively, use :wq for write and quit (save and exit).
"chmod"
• Change the permissions of files
• Read (r), write (w), and execute (x)
• 3 types of users (user, group & other)
Usage: ls -l
"make file executable"
• To make it executable
    Usage: chmod 755 hello.sh    (755 = rwxr-xr-x)
    Usage: chmod +x hello.sh
    Usage: ./hello.sh
• To make it un-executable
    Usage: chmod -x hello.sh
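The steps so far (write the file, make it executable, run it) can be sketched end to end as below. This is only a minimal sketch: it assumes a throwaway directory created with mktemp so nothing in your home directory is touched, and the variable names workdir and output are illustrative, not from the slides.

```shell
#!/bin/sh
# Work in a temporary directory (an assumption made for this sketch).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Write the script from the slides into hello.sh.
printf '%s\n' '#!/bin/sh' '# My first script' 'echo "Hello World!"' > hello.sh

# Add execute permission, then run the script by its relative path.
chmod +x hello.sh
output=$(./hello.sh)
echo "$output"

# Remove the execute permission again.
chmod -x hello.sh
```

After the final chmod -x, the file can still be run with "sh hello.sh", because the interpreter is then started explicitly.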
Variables in Linux/Shell
    #!/bin/bash
    # My first script
    i=22
    echo "Hello, I am " $i
    # echo "Hello, I am $i"
Usage: vi hello.sh
Usage: ./hello.sh
Variables in Linux/Shell
    #!/bin/bash
    # My first script
    i=22
    j="Hello, I am"
    echo $j $i
Usage: vi hello.sh
Usage: ./hello.sh
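Variable expansion in Linux/Shell
    #!/bin/bash
    # My first script
    i=22
    j=${i}0
    k=${i}1
    echo $i $j $k
Usage: vi hello.sh
Usage: ./hello.sh
The {} in ${} are useful when the variable is followed directly by other characters: ${i}0 expands i and appends 0, whereas $i0 would refer to a variable named i0.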
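The effect of the braces can be checked with a short sketch. The variable i0 and its value are made up here purely to show which name the shell looks up; they are not part of the slides.

```shell
#!/bin/sh
i=22
i0="a different variable"   # hypothetical variable, defined only for the demo

# With braces the shell expands i and then appends the literal 0.
j=${i}0

# Without braces the shell reads the longest valid name, which is i0.
k=$i0

echo "with braces:    $j"
echo "without braces: $k"
```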
Conditions in Linux/Shell
    #!/bin/bash
    # My first script
    PASS="test1234"
    if [ "$PASS" == "test1234" ]; then
        echo "Correct password!!"
    fi
Remember that the spacing is very important in the if statement.
Usage: vi hello.sh
Usage: ./hello.sh
Conditions in Linux/Shell
    #!/bin/bash
    # My first script
    PASS="test123"
    if [ "$PASS" == "test1234" ]; then
        echo "Correct password!!"
    else
        echo "enter correct password!!"
    fi
Usage: vi hello.sh
Usage: ./hello.sh
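The if/else test can be tried with several inputs by wrapping it in a function. The sketch below makes three assumptions of its own: check_password is a hypothetical helper name, the variable is quoted so that an empty value does not break the test, and the portable "=" is used in place of "==" so the script also runs under plain sh.

```shell
#!/bin/sh
# check_password is a hypothetical helper, not part of the slides.
check_password() {
    # "[" is a command, so it needs spaces after "[" and before "]".
    if [ "$1" = "test1234" ]; then
        echo "Correct password!!"
    else
        echo "enter correct password!!"
    fi
}

good=$(check_password "test1234")
bad=$(check_password "test123")
empty=$(check_password "")

echo "$good"
echo "$bad"
echo "$empty"
```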
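"for" loop in Linux/Shell
    #!/bin/bash
    # My first script
    for i in {1..10}; do
        echo $i
        # echo ${i}a
    done
Usage: vi hello.sh
Usage: ./hello.sh
Output: 1 2 3 4 5 6 7 8 9 10 (one number per line)
Alternatively, loop over $(seq 0 1 10) or $(seq 0 2 10); the arguments of seq are first, step and last.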
"Array" in Linux/Shell
    #!/bin/bash
    arr=('A' 'B' 'C' 'D' 'E')
    for i in {0..4}; do
        echo $i
        echo ${arr[$i]}
    done
Usage: vi hello.sh
Usage: ./hello.sh
"for" loop in Linux/Shell
    #!/bin/sh
    # My first script
    for i in 1 2 3 x y z
    do
        echo $i
    done
Usage: vi hello.sh
Usage: ./hello.sh
Loop indices don't have to be just numbers!
bash also supports: {1..3} x y z
"while" loop in Linux/Shell
    #!/bin/bash
    # My first script
    i=0
    while [ $i -le 5 ]; do
        # echo "before $i"
        i=$(($i+1))
        echo "after $i"
    done
Usage: vi hello.sh
Usage: ./hello.sh
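How many times the loop body runs is easy to misjudge, because the test uses -le and the counter is incremented before it is printed. The sketch below collects the values in a string so the result can be inspected; the variable seen is an illustrative addition, not part of the slides.

```shell
#!/bin/sh
i=0
seen=""

# The test is checked before every pass; i is 0..5 at test time,
# so the body runs six times.
while [ "$i" -le 5 ]; do
    i=$((i + 1))
    seen="$seen $i"
done

echo "values after increment:$seen"
echo "final value of i: $i"
```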
Functions in Linux/Shell
    #!/bin/bash
    # function definition
    add_user() {
        USER=$1
        PASS=$2
        echo "Passwd $PASS created for $USER on $(date)"
    }
    # function call
    echo $(add_user bob letmein)
Usage: vi hello.sh
Usage: ./hello.sh
Functions in Linux/Shell
    #!/bin/bash
    # function definition
    add_user() {
        # USER=$1
        # PASS=$2
        echo "Passwd $2 created for $1 on $3"
    }
    # function call
    echo $(add_user bob letmein "$(date)")
Usage: vi hello.sh
Usage: ./hello.sh
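Positional parameters are easy to mix up, so a function can check how many arguments it received before using them. The following is a sketch under stated assumptions: the argument check, the usage message and the fixed date string "2018-02-01" are illustrative additions, and this add_user only formats a message; it does not create an account.

```shell
#!/bin/sh
add_user() {
    # $# is the number of arguments passed to the function.
    if [ "$#" -ne 3 ]; then
        echo "usage: add_user NAME PASSWORD DATE" >&2
        return 1
    fi
    # $1 is the user name, $2 the password, $3 the date.
    echo "Passwd $2 created for $1 on $3"
}

msg=$(add_user bob letmein "2018-02-01")
echo "$msg"

# With too few arguments the function returns a non-zero status.
if add_user bob 2>/dev/null; then
    status=0
else
    status=1
fi
echo "status with one argument: $status"
```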
Download a fastq file
Usage: cd project/data
Usage: wget https://github.com/CGATOxford/UMI-tools/releases/download/v0.2.3/example.fastq.gz
• View
    Usage: zcat example.fastq.gz | head
• Check file size
    Usage: ls -lh example.fastq.gz
• Count number of sequences in a fastq file
    Usage: zcat example.fastq.gz | grep -c "^@"
    (grep has to read the decompressed text; a quality line can also begin with "@", so this count can be too high)
• A FASTQ file normally uses four lines per sequence
zcat example.fastq.gz | head
    …
    +SRR2057595.7
    1=DFFFFHHHHHJJJFGIJIJJIJ
    @SRR2057595.9        <- Sequence ID
    TTGGTTCAATCTGATGCCCTCTTCTGGTGCATCTGAAGACAGCTACAGTGTACTTAGATATAATAAATAAATCTT        <- Sequence
    +SRR2057595.9
    4=DFDBDHHFHHIGGEHJGGIHGHGGCAFCHGIGEHIJJJJIJJJIHIIIIIIJIIIIIGHIIGGIJGIIJIIJ@
    @SRR2057595.14
    TGGGTTAATGCGGCCCCGGGTTCCTCCCGGGGCTACGCCTGTCTGAGCGTCGCT…
"Calculate the length of reads"
Create the following executable file at project/script
Usage: vi fastq.sh
    #!/bin/bash
    # cd project/log (to run this script from log directory)
    zcat ../data/example.fastq.gz | \
    awk '{if(NR%4==2) print length($1)}' > ../out/length.txt
Usage: ../script/fastq.sh
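The NR%4==2 selection can be tried without downloading anything. The sketch below is a minimal example that assumes a tiny uncompressed FASTQ file written to a temporary location; the two records (IDs, bases and qualities) are invented for illustration.

```shell
#!/bin/sh
# Two made-up FASTQ records, four lines each.
fastq=$(mktemp)
printf '%s\n' \
    '@read1' 'ACGTACGT' '+' 'IIIIIIII' \
    '@read2' 'ACGTA'    '+' 'IIIII' > "$fastq"

# Line 2 of every 4-line record holds the sequence, so NR%4==2 selects it.
lengths=$(awk 'NR % 4 == 2 { print length($0) }' "$fastq")
echo "$lengths"

# The number of reads is the number of lines divided by 4.
reads=$(awk 'END { print NR / 4 }' "$fastq")
echo "reads: $reads"

rm -f "$fastq"
```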
"Plot - Histogram"
Create the following executable file at project/script
Usage: vi hist.R
    #!/usr/bin/Rscript
    # cd project/log (to run this script from log dir)
    t.dat <- read.table('../out/length.txt')
    jpeg('../out/rplot.jpg')
    hist(t.dat[,1])
    dev.off()
Usage: ../script/hist.R
"Calculate the length of reads"
Create the following executable file at project/script
Usage: vi fastq.sh
    #!/bin/bash
    # cd project/log (to run this script from log directory)
    zcat ../data/example.fastq.gz | \
    awk '{if(NR%4==2) print length($1)}' > ../out/length.txt
    # include R script and run
    ../script/hist.R
Usage: ../script/fastq.sh 2> error.log
Redirect error output to a file: "2> error.log" sends standard error (file descriptor 2) to error.log.
Open/View Image
• Login in interactive mode
    Usage: [email protected]
    ssh options: -X enables X11 forwarding; -Y enables trusted X11 forwarding. X11 is the X Window System.
    Usage: xdg-open rplot.jpg
To download from server:
    [email protected]:~/project/out/rplot.jpg /Users/adinasarapu/
AWK
• AWK is an interpreted programming language designed for text processing.
  The word "AWK" is derived from the initials of the language's three developers: A. Aho, B. W. Kernighan and P. Weinberger.
• Following are the variants of AWK
    • AWK - the (very old) original from AT&T
    • NAWK - a newer, improved version from AT&T
    • GAWK - GNU AWK, the Free Software Foundation's version
Basic Structure of AWK
    pattern { action }
• The pattern specifies when the action is performed.
• By default, AWK executes commands on every line.
• AWK reads a line from the input stream (file, pipe, or STDIN).
• This process repeats until the file reaches its end.
Basic Structure of AWK
    pattern { action }
• Two other important patterns are specified by the keywords "BEGIN" and "END"
    BEGIN { print "START" }
    pattern { print }
    END { print "STOP" }
• The BEGIN block executes only once, before any input is read. This is a good place to initialize variables.
• The body block applies AWK commands on every input line.
• The END block executes at the end of the program, after the last input line.
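The order in which the three kinds of block run can be seen in a small sketch. It assumes five tab-separated sample records (serial number, name, subject, marks) fed through a pipe; the counter n and the shell variables marks and result are illustrative names.

```shell
#!/bin/sh
# Five sample records, tab-separated (the layout is an assumption).
marks=$(printf '1\tAmit\tPhysics\t88\n2\tRalf\tMaths\t90\n3\tShyam\tBiology\t87\n4\tKedar\tEnglish\t85\n5\tAshok\tHistory\t89\n')

result=$(printf '%s\n' "$marks" | awk '
    BEGIN { print "START"; n = 0 }   # runs once, before any input is read
    /A/   { n += 1; print $2 }       # runs for every line containing A
    END   { print "STOP", n }        # runs once, after the last line
')
echo "$result"
```

Only the records for Amit and Ashok contain an uppercase "A", so the pattern rule fires twice.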
File processing / Printing records
Choose a text editor: emacs, Vim
Usage: vi marks.txt
    1    Amit     Physics    88
    2    Ralf     Maths      90
    3    Shyam    Biology    87
    4    Kedar    English    85
    5    Ashok    History    89
awk '{print}' < marks.txt
awk 'BEGIN{printf "SNo\tName\tSub\tMarks\n"} {print}' < marks.txt
Printing Fields (Columns)
awk '{print}' < marks.txt
awk '{print $0}' < marks.txt
awk '{print $1, $2, $3, $4}' < marks.txt    (space)
awk '{print $1 "\t" $2 "\t" $3 "\t" $4}' < marks.txt    (tab)
Printing Columns by Pattern
awk '/E/ {print}' < marks.txt
awk '/E/' < marks.txt
awk '/A/ {print $2 "\t" $4}' < marks.txt    (tab)
Count by Pattern match
awk 'BEGIN{i=0} /A/{i+=1} END{print i}' < marks.txt
An awk program is often written as a one-liner; the same program can be split over several lines inside the quotes:
    awk '
    BEGIN{i=0}
    /A/{i+=1}
    END{print i}
    ' < marks.txt
Count by Pattern match
    #!/bin/bash
    awk 'BEGIN{i=0} /A/{i+=1} END{print i}' < marks.txt
Usage: ./marks.sh
Count by Pattern match
    #!/usr/bin/awk -f
    BEGIN{i=0}
    /A/{i+=1}
    END{print i}
Usage: awk -f marks.awk < marks.txt
Standard AWK variables
env
    XDG_SESSION_ID=38
    TERM=xterm-256color
    SHELL=/bin/bash
    SSH_CLIENT=10.110.52.30 28952 22
    SSH_TTY=/dev/pts/8
    USER=adinasarapu
    …
awk 'BEGIN{print ENVIRON["USER"]}'
awk 'BEGIN{print ENVIRON["SHELL"]}'
awk 'BEGIN{print ENVIRON["HOSTNAME"]}'
awk 'END{print FILENAME}' marks.txt
(Please note that FILENAME is undefined in the BEGIN block)
Standard AWK variables
FS, OFS, RS, ORS, NR and NF
FS: Input Field Separator (default is whitespace: runs of spaces and tabs)
OFS: Output Field Separator (default is a single space)
env | awk 'BEGIN{FS="=";} {print $1}'
env | awk 'BEGIN{FS="="; OFS=":"} {print $1, $2}'
Standard AWK variables
FS, OFS, RS, ORS, NR and NF
FS: Input Field Separator (default is space)
OFS: Output Field Separator (default is space)
awk '{print $1, $2, $3, $4}' < marks.txt    (space)
awk '{print $1 "\t" $2 "\t" $3 "\t" $4}' < marks.txt    (tab)
awk 'BEGIN{FS="\t"; OFS="\t"} {print $1, $2, $3, $4}' < marks.txt
awk -F '\t' 'BEGIN{OFS="\t"} {print $1, $2, $3, $4}' < marks.txt
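OFS only takes effect where print separates its arguments with a comma, which the sketch below tries to make visible. It assumes two made-up NAME=value lines standing in for the output of env; the variable names input, with_ofs and no_comma are illustrative.

```shell
#!/bin/sh
# Two made-up NAME=value lines standing in for the output of env.
input=$(printf 'USER=alice\nSHELL=/bin/bash\n')

# FS="=" splits each record at "="; OFS=":" is inserted wherever
# print separates its arguments with a comma.
with_ofs=$(printf '%s\n' "$input" | awk 'BEGIN { FS = "="; OFS = ":" } { print $1, $2 }')

# Without a comma the two fields are concatenated and OFS is not used.
no_comma=$(printf '%s\n' "$input" | awk 'BEGIN { FS = "="; OFS = ":" } { print $1 $2 }')

echo "$with_ofs"
echo "$no_comma"
```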
Standard AWK variables
FS, OFS, RS, ORS, NR and NF
RS: Input Record Separator (default is newline)
ORS: Output Record Separator (default is newline)
awk 'BEGIN{OFS=":"; ORS=";"} {print $2, $4} END{print "\n"}' < marks.txt
awk 'BEGIN{FS="\t"; OFS="\t"} {print $2, $4}' < marks.txt
Standard AWK variables
FS, OFS, RS, ORS, NR and NF
NR: Record number
NF: Number of Fields in a Record
cat marks.txt | awk '{print NR, "->", NF}'
cat marks.txt | awk 'NR<4'    (get first 3 records)
cat marks.txt | awk '{print NR}' | tail -1    (number of records)
cat marks.txt | awk '{total=total+NF} END{print total}'    (total number of fields)
cat marks.txt | wc -l    (number of lines/rows)
cat marks.txt | wc -w    (number of words)
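NR and NF can be inspected on a few records without a file on disk. This is a minimal sketch assuming three tab-separated sample records; the shell variable names are illustrative.

```shell
#!/bin/sh
# Three sample records, tab-separated (the layout is an assumption).
marks=$(printf '1\tAmit\tPhysics\t88\n2\tRalf\tMaths\t90\n3\tShyam\tBiology\t87\n')

# NR is the current record number, NF the number of fields in that record.
per_line=$(printf '%s\n' "$marks" | awk '{ print NR, "->", NF }')

# In the END block NR holds the total number of records read.
total=$(printf '%s\n' "$marks" | awk 'END { print NR }')

# $NF is the last field of each record, whatever its position.
last=$(printf '%s\n' "$marks" | awk '{ print $NF }')

echo "$per_line"
echo "records: $total"
echo "$last"
```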
AWK - Control Flow
if or if else
• AWK provides conditional statements to control the flow of a program
    #!/bin/bash
    awk 'BEGIN {
        num = 10;
        if (num % 2 == 0) printf "%d is even number.\n", num;
        else printf "%d is odd number.\n", num
    }'
AWK - Control Flow
for loop
    #!/bin/bash
    awk 'BEGIN {
        arr[0] = 11; arr[1] = 22; arr[2] = 33;
        for (i in arr) printf "%d is a number.\n", arr[i]
    }'
The same loop as a one-liner:
    awk 'BEGIN{arr[0]=11; arr[1]=22; for (i in arr) printf "%d is a number.\n", arr[i]}'
"in" is used while accessing array elements: for (i in arr) visits every index of arr, in no guaranteed order.
AWK - Control Flow
• A FASTQ file normally uses four lines per sequence
    Line 1 begins with a '@' character and is followed by a sequence identifier
    Line 2 is the raw sequence letters
    Line 3 begins with a '+' character and is optionally followed by the same sequence identifier
    Line 4 encodes the quality values for the sequence in Line 2
zcat example.fastq.gz | head
    …
    +SRR2057595.7
    1=DFFFFHHHHHJJJFGIJIJJIJ
    @SRR2057595.9        <- Sequence ID
    TTGGTTCAATCTGATGCCCTCTTCTGGTGCATCTGAAGACAGCTACAGTGTACTTAGATATAATAAATAAATCTT        <- Sequence
    +SRR2057595.9
    4=DFDBDHHFHHIGGEHJGGIHGHGGCAFCHGIGEHIJJJJIJJJIHIIIIIIJIIIIIGHIIGGIJGIIJIIJ@
    @SRR2057595.14
    TGGGTTAATGCGGCCCCGGGTTCCTCCCGGGGCTACGCCTGTCTGAGCGTCGCT…
AWK - Control Flow
• A FASTQ file normally uses four lines per sequence
zcat example.fastq.gz | awk '{if(NR%4==2) print}' | head -20
zcat example.fastq.gz | awk '{if(NR%4==2) print $1}' | head -20
zcat example.fastq.gz | awk '{if(NR%4==2) {print substr($0,1,10)}}' | head -20
zcat example.fastq.gz | awk 'END{print NR/4}'
AWK - Control Flow
    #!/bin/bash
    # Separate (filter) FASTQ reads based on their length
    zcat example.fastq.gz | \
    awk 'NR%4 == 1 {ID=$0}
         NR%4 == 2 {SEQ=$0}
         NR%4 == 3 {PLUS=$0}
         NR%4 == 0 {
           QUAL=$0
           seqlen=length(SEQ)
           qualen=length(QUAL)
           # test only after the 4th line, when the record is complete
           if (seqlen==qualen && seqlen>=75)
             print ID "\n" SEQ "\n" PLUS "\n" QUAL | "gzip > filtered.fastq.gz"
         }'
NR%4 cycles through 1, 2, 3, 0 as the line number grows:
    echo $((1%4))    1
    echo $((2%4))    2
    echo $((3%4))    3
    echo $((4%4))    0
    echo $((5%4))    1
    echo $((6%4))    2
    echo $((7%4))    3
    echo $((8%4))    0
    …
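The length filter can be exercised on a few hand-written records. The sketch below is not the script from the slide: it assumes an uncompressed temporary file, prints to standard output instead of piping to gzip, and uses a hypothetical threshold min_len of 8 so that the made-up reads are short enough to read by eye.

```shell
#!/bin/sh
# Three made-up records: 8 bases, 5 bases, and one whose quality string
# is shorter than its sequence.
fastq=$(mktemp)
printf '%s\n' \
    '@r1' 'ACGTACGT' '+' 'IIIIIIII' \
    '@r2' 'ACGTA'    '+' 'IIIII' \
    '@r3' 'ACGTACGT' '+' 'IIII' > "$fastq"

# min_len is a hypothetical threshold; the slide uses 75.
min_len=8

kept=$(awk -v min="$min_len" '
    NR % 4 == 1 { id = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
        # The record is complete only after its fourth line.
        if (length(seq) == length($0) && length(seq) >= min)
            print id "\n" seq "\n" plus "\n" $0
    }
' "$fastq")
echo "$kept"

rm -f "$fastq"
```

Only the first record passes: the second is too short and the third has mismatched sequence and quality lengths.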
Thank you
Practice Makes Perfect
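String Comparison Operators
Operator    Description                           Example
= or ==     Is Equal To                           if [ "$1" == "$2" ]
!=          Is Not Equal To                       if [ "$1" != "$2" ]
>           Is Greater Than (ASCII comparison)    if [ "$1" \> "$2" ]
>=          Is Greater Than Or Equal To           if [ ! "$1" \< "$2" ]
<           Is Less Than                          if [ "$1" \< "$2" ]
<=          Is Less Than Or Equal To              if [ ! "$1" \> "$2" ]
-n          Is Not Null                           if [ -n "$1" ]
-z          Is Null (Zero Length String)          if [ -z "$1" ]
Note: inside [ ], > and < must be escaped, otherwise the shell treats them as redirection. There is no >= or <= string operator, so the opposite test is negated.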
Integer Comparison Operators
Operator    Description                    Example
-eq         Is Equal To                    if [ $1 -eq 200 ]
-ne         Is Not Equal To                if [ $1 -ne 1 ]
-gt         Is Greater Than                if [ $1 -gt 15 ]
-ge         Is Greater Than Or Equal To    if [ $1 -ge 10 ]
-lt         Is Less Than                   if [ $1 -lt 5 ]
-le         Is Less Than Or Equal To       if [ $1 -le 0 ]
==          Is Equal To                    if (( $1 == $2 ))
!=          Is Not Equal To                if (( $1 != $2 ))
<           Is Less Than                   if (( $1 < $2 ))
<=          Is Less Than Or Equal To       if (( $1 <= $2 ))
>           Is Greater Than                if (( $1 > $2 ))
>=          Is Greater Than Or Equal To    if (( $1 >= $2 ))
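The difference between the string and integer operators shows up when two values are numerically equal but spelled differently. A minimal sketch, with the values "007" and "7" chosen purely for illustration:

```shell
#!/bin/sh
a="007"
b="7"

# -eq compares the values as integers, so 007 and 7 are equal.
if [ "$a" -eq "$b" ]; then int_equal=yes; else int_equal=no; fi

# = compares character by character, so "007" and "7" differ.
if [ "$a" = "$b" ]; then str_equal=yes; else str_equal=no; fi

echo "integer comparison: $int_equal"
echo "string comparison:  $str_equal"
```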