python for biologists
Transcript of python for biologists
-
8/19/2019 python for biologists
1/227
Copyright © 2013 Dr. Martin Jones
This work is licensed under a Creative Commons Attribution-NonCommercial-
ShareAlike 3.0 Unported License.
For more information, visit http://pythonforbiologists.com
Set in PT Serif and Source Code Pro
http://creativecommons.org/licenses/by-nc-sa/3.0/deed.en_GB
About the author
Martin started his programming career by learning Perl during the course of his PhD in evolutionary biology, and started teaching other people to program soon after. Since then he has taught introductory programming to hundreds of biologists, from undergraduates to PIs, and has maintained a philosophy that programming courses must be friendly, approachable, and practical.
Martin has taught introductory programming as part of the Bioinformatics MSc course at Edinburgh University for the past five years, and is currently Lecturer in Bioinformatics.
Preface
Welcome to Python for Biologists.
Before you read any further, make sure that this is the most recent version of the book. Python for Biologists is being continually updated and improved to take into account corrections, amendments and changes to Python itself, so it's important that you are reading the most up-to-date version.
This file is revision number 189. The number of the most recent revision can always be found at:
http://pythonforbiologists.com/index.php/version/
If the revision number listed at the URL is higher than the one in bold, then this is an out-of-date copy, and you need to download the latest version from http://pythonforbiologists.com
You'll notice from the copyright page that the contents of this book are licensed under a Creative Commons Attribution ShareAlike license. This means that you're free to do what you like with it (copy it, email it to your friends, wallpaper your lab with it) as long as you keep the attribution. You can also modify it, as long as you license your modification under the same terms. The only thing that the license doesn't allow is commercial use: if you'd like to use the contents of this course for commercial purposes, get in touch with me at
martin@pythonforbiologists.com
Happy programming!
Table of Contents
About the author
Preface
1: Introduction and environment
  Why have a programming book for biologists?
  Why Python?
  How to use this book
  Exercises and solutions
  Getting in touch
  Setting up your environment
  Text editors
  Reading the documentation
2: Printing and manipulating text
  Why are we so interested in working with text?
  Printing a message to the screen
  Quotes are important
  Use comments to annotate your code
  Error messages and debugging
  Printing special characters
  Storing strings in variables
  Tools for manipulating strings
  Recap
  Exercises
  Solutions
3: Reading and writing files
  Why are we so interested in working with files?
  Reading text from a file
  Files, contents and file names
  Dealing with newlines
  Missing files
  Writing text to files
  Closing files
  Paths and folders
  Recap
  Exercises
  Solutions
4: Lists and loops
  Why do we need lists and loops?
  Creating lists and retrieving elements
  Working with list elements
  Writing a loop
  Indentation errors
  Using a string as a list
  Splitting a string to make a list
  Iterating over lines in a file
  Looping with ranges
  Recap
  Exercises
  Solutions
5: Writing our own functions
  Why do we want to write our own functions?
  Defining a function
  Calling and improving our function
  Encapsulation with functions
  Functions don't always have to take an argument
  Functions don't always have to return a value
  Functions can be called with named arguments
  Function arguments can have defaults
  Testing functions
  Recap
  Exercises
  Solutions
6: Conditional tests
  Programs need to make decisions
  Conditions, True and False
  if statements
  else statements
  elif statements
  while loops
  Building up complex conditions
  Writing true/false functions
  Recap
  Exercises
  Solutions
7: Regular expressions
  The importance of patterns in biology
  Modules in Python
  Raw strings
  Searching for a pattern in a string
  Extracting the part of the string that matched
  Getting the position of a match
  Splitting a string using a regular expression
  Finding multiple matches
  Recap
  Exercises
  Solutions
8: Dictionaries
  Storing paired data
  Creating a dictionary
  Iterating over a dictionary
  Recap
  Exercises
  Solutions
9: Files, programs, and user input
  File contents and manipulation
  Basic file manipulation
  Deleting files and folders
  Listing folder contents
  Running external programs
  Running a program
  Saving program output
  User input makes our programs more flexible
  Interactive user input
  Command line arguments
  Recap
  Exercises
  Solutions
1: Introduction and environment
Why have a programming book for biologists?
If you're reading this book, then you probably don't need to be convinced that programming is becoming an increasingly essential part of the tool kit for biologists of all types. You might, however, need to be convinced that a book like this one, developed especially for biologists, can do a better job of teaching you to program than a general-purpose introductory programming book. Here are a few of the reasons why I think that is the case.
A biology-specific programming book allows us to use examples and exercises that use biological problems. This serves two important purposes: firstly, it provides motivation and demonstrates the types of problems that programming can help to solve. Experience has shown that beginners make much better progress when they are motivated by the thought of how the programs they write will make their life easier! Secondly, by using biological examples, the code and exercises throughout the book can form a library of useful code snippets, which we can refer back to when we want to solve real-life problems. In biology, as in all fields of programming, the same problems tend to recur time and time again, so it's very useful to have this collection of examples to act as a reference, something that's not possible with a general-purpose programming book.
A biology-specific programming book can also concentrate on the features of the language that are most useful to biologists. A language like Python has many features and in the course of learning it we inevitably have to concentrate on some and miss others out. The set of features which are important to us in biology are slightly different to those which are most useful for general-purpose programming: for example, we are much more interested in manipulating text (including things like DNA and protein se
programming book, but which are very useful to biologists (for example, regular expressions and subprocesses). Having a biology-specific textbook allows us to include these features, along with explanations of why they are particularly useful to us.
A related point is that a textbook written just for biologists allows us to introduce features in a way that allows us to start writing useful programs right away. We can do this by taking into account the sorts of problems that repeatedly crop up in biology, and prioritising the features that are best at solving them. This book has been designed so that you should be able to start writing small but useful programs using only the tools in the first couple of chapters.
Why Python?
Let me start this section with the following statement: programming languages are overrated. What I mean by that is that people who are new to programming tend to worry far too much about what language to learn. The choice of programming language does matter, of course, but it matters far less than people think it does. To put it another way, choosing the "wrong" programming language is very unlikely to mean the difference between failure and success when learning. Other factors (motivation, having time to devote to learning, helpful colleagues) are far more important, yet receive less attention.
The reason that people place so much weight on the "what language should I learn?"
using multiple languages. Partly this is just down to the simple constraints of various languages: if you want to write a web application you'll probably do it in JavaScript, if you want to write a graphical user interface you'll probably use something like Java, and if you want to write low-level algorithms you'll probably use C.
Secondly, learning a first programming language gets you 90% of the way towards learning a second, third, and fourth one. Learning to think like a programmer in the way that you break down complex tasks into simple ones is a skill that cuts across all languages, so if you spend a few months learning Python and then discover that you really need to write in C, your time won't have been wasted as you'll be able to pick it up much
• Its use of indentation, while annoying to people who aren't used to it, is great for beginners as it enforces a certain amount of readability
Python also has a couple of points to recommend it to biologists and scientists specifically:
• It's widely used in the scientific community
• It has a couple of very well-designed libraries for doing complex scientific computing (although we won't encounter them in this book)
• It lends itself well to being integrated with other, existing tools
• It has features which make it easy to manipulate strings of characters (for example, strings of DNA bases and protein amino acid residues, which we as biologists are particularly fond of)
Python vs. Perl
For biologists, the
How to use this book
Programming books generally fall into two categories: reference-type books, which are designed for looking up specific bits of information, and tutorial-type books, which are designed to be read cover-to-cover. This book is an example of the latter: code samples in later chapters often use material from previous ones, so you need to make sure you read the chapters in order. Exercises or examples from one chapter are sometimes used to illustrate the need for features that are introduced in the next.
There are a number of fundamental programming concepts that are relevant to material in multiple different chapters. In this book, rather than introduce these concepts all in one go, I've tried to explain them as they become necessary. This results in a tendency for earlier chapters to be longer than later ones, as they involve the introduction of more new concepts.
A certain amount of jargon is necessary if we want to talk about programs and programming concepts. I've tried to define each new technical term at the point where it's introduced, and then use it thereafter with occasional reminders of the meaning.
Chapters tend to follow a predictable structure. They generally start with a few paragraphs outlining the motivation behind the features that they will cover: why do they exist, what problems do they allow us to solve, and why are they useful in biology specifically? These are followed by the main body of the chapter in which we discuss the relevant features and how to use them. The length of the chapters varies
footnotes[1] to provide additional information that is interesting to know but not crucial to understanding, or to give links to web pages.
Example code is highlighted with a solid border:
Some example code goes here
and example output (i.e. what we see on the screen when we run the code) is highlighted with a dotted border:
Some output goes here
Often we want to look at the code and the output it produces together. In these situations, you'll see a red-bordered code block followed immediately by a blue-bordered output block.
Sometimes it's necessary to refer in the text to individual lines of code or output, in which case I've used line numberings on the left:
1 first line
2 second line
3 third line
Other blocks of text (usually file contents or typed command lines) don't have any kind of border and look like this:
contents of a file
[1] Like this.
Exercises and solutions
The final part of each chapter is a set of exercises and solutions. The number and complexity of exercises differ greatly between chapters depending on the nature of the material. As a rule, early chapters have a large number of simple exercises, while later chapters have a small number of more complex ones. Many of the exercise problems are written in a deliberately vague manner and the exact details of how the solutions work are up to you (very much like real-life programming!). You can always look at the solutions to see one possible way of tackling the problem, but there are often multiple valid approaches.
I strongly recommend that you try tackling the exercises yourself before reading the solutions; there really is no substitute for practical experience when learning to program. I also encourage you to adopt an attitude of curious experimentation when working on the exercises: if you find yourself wondering if a particular variation on a problem is solvable, or if you recognize a closely-related problem from your own work, try solving it! Continuous experimentation is a key part of developing as a programmer, and the
Getting in touch
One of the most convincing arguments for presenting a course like this one in the form of an ebook is that it can be continually updated and tweaked based on reader feedback. So, if you find anything that is hard to understand, or you think may contain an error, please get in touch: just drop me an email at martin@pythonforbiologists.com and I promise to get back to you.
Setting up your environment
All that you need in order to follow the examples and exercises in this book is a standard Python installation and a text editor. All the code in this book will run on either Linux, Mac or Windows machines. The slight differences between operating systems are explained in the text (mostly in chapter 9). If you have a choice of operating systems on which to learn Python, I recommend Linux, Mac OS X and Windows in that order, simply because the UNIX-based operating systems (Linux and OS X) are more amenable to programming in general.
Installing Python
The process of installing Python depends on the type of computer you're running on. If you're running a mainstream Linux distribution like Ubuntu, Python is probably already installed. To find out, open a terminal and type
python
If you see some output along these lines:
Python 2.7.3 (default, Apr 10 2013, 05:13:16)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
Then you are ready to go. If your Linux installation doesn't already have Python installed, try installing it with your package manager (the command will probably be either sudo apt-get install python or sudo yum install python). If this doesn't work, then download the package from the Python download page[1].
The official Python website has installation instructions for Mac[2] and Windows[3] computers as well; these are likely to be the most up-to-date instructions, so follow them closely.
Running Python programs
A Python program is just a normal text file that contains Python code. To run it we must first open up a command line. On Linux and Mac computers, the application to do this will be called something along the lines of "terminal". On Windows, it is known as "command prompt".
To run a Python program, we just type the path to the Python executable followed by the name of the file that contains the code we want to run[4]. On a Linux or Mac machine, the path will be something like:
/usr/local/bin/python
On Windows, it will be something like:
c:\Python27\python
[1] http://www.python.org/getit/
[2] http://www.python.org/getit/mac/
[3] http://www.python.org/getit/windows/
[4] When we refer to "a Python program" in this book, we are usually talking about the text file that holds the code.
To run a Python program, it's generally easiest to be in the same folder as it. By convention, Python programs are given the extension .py, so to run a program called test.py, we just type:
/usr/local/bin/python test.py
There are a couple of tricks that can be useful when experimenting with programs[1].
Firstly, you can run Python in an interactive (or "shell") mode by running it without the name of a program file. This allows you to type individual statements and see the result straight away.
Secondly, you can run Python with the -i option, which will cause it to run your program and then enter interactive mode. This can be handy if you want to examine the state of variables after your code has run.
Python 2 vs. Python 3
As will
If you're going to use Python 2, there is just one thing that you have to do in order to make some of the code examples work: include this line at the start of all your programs:
from __future__ import division
We won't go into the explanation behind this line, except to say that it's necessary in order to correct a small
fixes the problem by making it effectively impossible for you to type a tab character.
The feature that is nice to have is syntax highlighting. This will apply different colours to different parts of your Python code, and can help you spot errors more easily.
Recommended text editors are Notepad++ for Windows[1], TextWrangler for Mac OS X[2], and gedit for Linux[3], all of which are freely available.
On the web and elsewhere you may see references to Python IDEs. IDE stands for Integrated Development Environment, and they typically combine a text editor with a collection of other useful programming tools. While they can speed up development for experienced programmers, they're not a good idea for beginners as they complicate things, so I don't recommend you use them.
Reading the documentation
Part of the teaching philosophy that I've used in writing this book is that it's better to introduce a few useful features and functions rather than overwhelm you with a comprehensive list. The best place to go when you do want a complete list of the options available in Python is the official documentation[4] which, compared to many languages, is very readable.
[1] http://notepad-plus-plus.org/
[2] http://www.barebones.com/products/TextWrangler/
[3] https://projects.gnome.org/gedit/
[4] http://www.python.org/doc/
2: Printing and manipulating text
Why are we so interested in working with text?
Open the first page of a book about learning Python[1], and the chances are that the first examples of code you'll see involve numbers. There's a good reason for that: numbers are generally simpler to work with than text. There are not too many things you can do with them (once you've got basic arithmetic out of the way) and so they lend themselves well to examples that are easy to understand. It's also a pretty safe bet that the average person reading a programming book is doing so because they need to do some number-crunching.
So what makes this book different? Why is this first chapter about text rather than numbers? The answer is that, as biologists, we have a particular interest in dealing with text rather than numbers (though of course, we'll need to learn how to manipulate numbers too). Specifically, we're interested in particular types of text that we call sequences: the DNA and protein se
the word we use to refer to a bit of text in a computer program (it just means a string of characters). From this point on we'll use the word string when we're talking about computer code, and we'll reserve the word sequence for when we're discussing biological se
The arguments tell Python what we want to do more specifically. In this case, the argument tells Python exactly what it is we want to print: a friendly greeting.
Assuming you've followed the instructions in chapter 1 and set up your Python environment, type the line of code above into your favourite text editor, save it, and run it. You should see a single line of output like this:
Hello world
Quotes are important
In normal writing, we only surround a bit of text in
print("She said, 'Hello world'")
print('He said, "Hello world"')
The above code will give the following output:
She said, 'Hello world'
He said, "Hello world"
Be careful when writing and reading code that involves
• Because the comments are part of the source code, they can never get mixed up or separated. In other words, if you are looking at the source code for a particular program, then you automatically have the documentation as well. In contrast, if you keep the documentation in a separate file, it can easily become separated from the code.
• Having the comments right next to the code acts as a reminder to update the documentation whenever you change the code. The only thing worse than undocumented code is code with old documentation that is no longer accurate!
Don't make the mistake, by the way, of thinking that comments are only useful if you are planning on showing your code to somebody else. When you start writing your own code, you will be amazed at how
Error messages and debugging
It may seem depressing early in the book to be talking about errors! However, it's worth pointing out at this early stage that computer programs almost never work correctly the first time. Programming languages are not like natural languages: they have a very strict set of rules, and if you break any of them, the computer will not attempt to guess what you intended, but instead will stop running and present you with an error message. You're going to be seeing a lot of these error messages in your programming career, so let's get used to them as soon as possible.
Forgetting quotes
Here's one possible error we can make when printing a line of output: we can forget to include the
Referring to the line numbers on the left we can see that the name of the Python file is error.py (line 1) and that the error occurs on the first line of the file (line 2). Python's best guess at the location of the error is just before the close parentheses (line 3). Depending on the type of error, this can be wrong by
This time, Python doesn't try to show us where on the line the error occurred, it just shows us the whole line (line 4). The error message tells us which word Python doesn't understand (line 5), so in this case, it's
Hello" on one line and then the word "world" on the next line, like this:
Hello
world
We might try putting a new line in the middle of our string like this:
print("Hello
world")
but that won't work and we'll get the following error message:
$ python error.py
  File "error.py", line 1
    print("Hello
               ^
SyntaxError: EOL while scanning string literal
Python finds the error when it gets to the end of the first line of code (line 2 in the output). The error message (line 5) is a bit more cryptic than the others. EOL stands for End Of Line, and string literal means a string in
Printing special characters
The reason that the code above didn't work is that Python got confused about whether the new line was part of the string (which is what we wanted) or part of the source code (which is how it was actually interpreted). What we need is a way to include a new line as part of a string, and luckily for us, Python has just such a tool built in. To include a new line, we write a backslash followed by the letter n. Python knows that this is a special character and will interpret it accordingly. Here's the code which prints "Hello world" across two lines:
# how to include a new line in the middle of a string
print("Hello\nworld")
Notice that there's no need for a space before or after the new line.
There are a few other useful special characters as well, all of which consist of a backslash followed by a letter. The only ones which you are likely to need for the exercises in this book are the tab character (\t) and the carriage return character (\r). The tab character can sometimes be useful when writing a program that will produce a lot of output. The carriage return character works a bit like a new line in that it puts the cursor back to the start of the line, but doesn't actually start a new line, so you can use it to overwrite output. This is sometimes useful for long-running programs.
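As a rough illustration of the special characters described above, the short sketch below builds three strings and prints them. The variable names and the text inside the strings are made up for this example and are not taken from the book. How the carriage return is displayed depends on where the output is shown, so treat the last comment as an assumption about a typical terminal.

```python
# Illustrative strings only: the names and contents are invented for this sketch

# \n starts a new line in the middle of the string
two_lines = "Hello\nworld"
print(two_lines)

# \t inserts a tab, which is handy for lining up columns of output
columns = "name\tlength"
print(columns)

# \r moves the cursor back to the start of the line without starting a new one,
# so in a typical terminal the second message overwrites the first
progress = "processed 1 of 2\rprocessed 2 of 2"
print(progress)
```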
Storing strings in variables
OK, we've been playing around with the print function for a while; let's introduce something new. We can take a string and assign a name to it using an e
The variable my_dna now points to the string ATGCGTA. We call this assigning a variable, and once we've done it, we can use the variable name instead of the string itself. For example, we can use it in a print statement[1]:
# store a short D
Tools for manipulating strings
Now we know how to store and print strings, we can take a look at a few of the facilities that Python has for manipulating them. Python has many built-in tools for carrying out common operations, and in this next section we'll take a look at them one-by-one. In the exercises at the end of this chapter, we'll look at how we can use multiple different tools together in order to carry out more complex operations.
Concatenation
We can concatenate (stick together) two strings using the + symbol[1]. This symbol will join together the string on the left with the string on the right:
my_dna = "AATT" + "GGCC"
print(my_dna)
Let's take a look at the output:
AATTGGCC
In the above example, the things being concatenated were strings, but we can also use variables that point to strings:
upstream = "AAA"
my_dna = upstream + "ATGC"
# my_dna is now "AAAATGC"
[1] We call this the concatenation operator.
We can even join multiple strings together in one go:
upstream = "AAA"
downstream = "GGG"
my_dna = upstream + "ATGC" + downstream
# my_dna is now "AAAATGCGGG"
It's important to realize that the result of concatenating two strings together is itself a string. So it's perfectly OK to use a concatenation inside a print statement:
print("Hello" + " " + "world")
As we'll see in the rest of the book, using one tool inside another is
dna_length = len("AGTC")
print(dna_length)
There's another interesting thing about the len function: the result (or return value) is not a string, it's a number. This is a very important idea so I'm going to write it out in bold: Python treats strings and numbers differently.
We can see that this is the case if we try to concatenate together a number and a string. Consider this short program which calculates the length of a DNA se
Happily, Python has a built-in solution: a function called str which turns a number[1] into a string so that we can print it. Here's how we can modify our program to use it. I've removed the comments from this version to make it a bit more compact:
my_dna = "ATGCGAGT"
dna_length = len(my_dna)
print("The length of the D
Python language, it belongs to a particular type. The method we are talking about here is called lower, and we say that it belongs to the string type. Here's how we use it:
my_dna = "ATGC"
# print my_dna in lower case
print(my_dna.lower())
Notice how using a method looks different to using a function. When we use a function like print or len, we write the function name first and the arguments go in parentheses:
print("ATGC")
len(my_dna)
When we use a method, we write the name of the variable first, followed by a period, then the name of the method, then the method arguments in parentheses. For the example we're looking at here, lower, there is no argument, so the opening and closing parentheses are right next to each other.
It's important to notice that the lower method does not actually change the variable; instead it returns a copy of the variable in lower case. We can prove that it works this way by printing the variable before and after running lower. Here's the code to do so:
my_dna = "ATGC"
# print the variable
print("before: " + my_dna)
# run the lower method and store the result
lowercase_dna = my_dna.lower()
# print the variable again
print("after: " + my_dna)
and here's the output we get:
before: ATGC
after: ATGC
Just like the len function, in order to actually do anything useful with the lower method, we need to store the result (or print it right away).
Because the lower method belongs to the string type, we can only use it on variables that are strings. If we try to use it on a number:
my_number = len("AGTC")
# my_number is 4
print(my_number.lower())
we will get an error that looks like this:
AttributeError: 'int' object has no attribute 'lower'
The error message is a bit cryptic, but hopefully you can grasp the meaning: something that is a number (an int, or integer) does not have a lower method. This is a good example of the importance of types in Python code: we can only use methods on the type that they belong to.
Before we move on, let's just mention that there is another method that belongs to the string type called upper. You can probably guess what it does!
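Here is a minimal sketch of upper, on the assumption that it mirrors the behaviour of lower described above: it returns an upper-case copy and leaves the original variable alone. The variable names are chosen just for illustration.

```python
my_dna = "atgc"
# run the upper method and store the result
uppercase_dna = my_dna.upper()
# the stored copy is in upper case
print("upper case: " + uppercase_dna)
# the original variable is not affected
print("original: " + my_dna)
```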
Replacement
Here's another example of a useful method that belongs to the string type: replace. replace is slightly different from anything we've seen before: it takes two arguments (both strings) and returns a copy of the variable where all occurrences of the first string are replaced by the second string. That's
protein = "vlspadktnv"
# replace valine with tyrosine
print(protein.replace("v", "y"))
# we can replace more than one character
print(protein.replace("vls", "ymt"))
# the original variable is not affected
print(protein)
And this is the output we get:
ylspadktny
ymtpadktnv
vlspadktnv
We'll take a look at more tools for carrying out string replacement in chapter 7.
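One place where replace could come in handy is turning a DNA sequence into the matching RNA sequence by swapping every T for a U. The sketch below illustrates that idea; it is not an example from the book, and the sequence and variable names are made up.

```python
# a made-up DNA sequence for illustration
my_dna = "ATGCGTA"
# replace every T with U and store the copy that replace returns
my_rna = my_dna.replace("T", "U")
print(my_rna)
# the original variable is not affected
print(my_dna)
```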
Extracting part of a string
What do we do if we have a long string, but we only want a short portion of it? This is known as taking a substring, and it has its own notation in Python. To get a substring, we follow the variable name with a pair of s
pa
vlspad
vlspadktnv
There are two important things to notice here. Firstly, we actually start counting
from position zero, rather than one - in other words, position 3 is actually the
fourth character. This explains why the first character of the first line of output is
p and not s as you might think. Secondly, the positions are inclusive at the start
but exclusive at the stop. In other words, the expression protein[3:5] gives us
everything starting at position three, and stopping just before position five
(i.e. positions three and four).
If we just give a single number in the square brackets, we get a single character.
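The rules above can be tried out with a minimal sketch using the same protein sequence. The variable names for the results are our own.

```python
protein = "vlspadktnv"

# positions 3 and 4: the stop position is excluded
two_characters = protein[3:5]
print(two_characters)      # pa

# counting starts at zero, so this is the first six characters
first_six = protein[0:6]
print(first_six)           # vlspad

# a single number gives a single character
first_character = protein[0]
print(first_character)     # v
```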
Let's use our protein sequence ...
Remember that in Python we start counting from zero rather than one, so position
0 is the first character, position 4 is the fifth character, etc. A couple of examples:
protein = "vlspadktnv"
print(str(protein.find('p')))
print(str(protein.find('kt')))
print(str(protein.find('w')))
And the output:
3
6
-1
Notice the behaviour of find when we ask it to locate a substring that doesn't exist
- we get back the answer -1.
Both count and find have a pretty serious limitation: you can only search for
exact substrings. If you need to count the number of occurrences of a variable
protein motif, or find the position of a variable transcription factor binding site,
they will not help you. The whole of chapter 7 is devoted to tools that can do those
kinds of jobs.
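As a quick illustration of exact-substring searching, here is a small sketch that uses count and find on the same protein sequence. The result variable names are our own.

```python
protein = "vlspadktnv"

# count gives the number of times a substring occurs
valine_count = protein.count("v")

# find gives the position of the first match, counting from zero
p_position = protein.find("p")
kt_position = protein.find("kt")

# find gives -1 when the substring is not present
missing_position = protein.find("w")

print(valine_count, p_position, kt_position, missing_position)
```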
Of the tools we've discussed in this section, three - replace, count and find -
require ...
Splitting up a string into multiple bits
An obvious question is: how do we split a string (e.g. a DNA sequence) into
multiple pieces? That's a common job in biology, but unfortunately we can't do it
yet using the tools from this chapter. We'll talk about various different ways of
splitting strings in chapter 4. I mention it here just to reassure you that we will
learn how to do it eventually!
Recap
We started this chapter talking about strings and how to work with them, but along
the way we had to take a lot of diversions, all of which were necessary to
understand how the different string tools work. Thankfully, that means that we've
covered most of the nuts and bolts of the Python language, which will make future
chapters go much more smoothly.
We've learned about some general features of the Python programming language
like
• the difference between functions, statements and arguments
• the importance of comments and how to use them
• how to use Python's error messages to fix bugs in our programs
• how to store values in variables
• the way that types work, and the importance of understanding them
• the difference between functions and methods, and how to use them both
And we've encountered some tools that are specifically for working with strings:
• concatenation
• different types of quotes
• changing the case of a string
• finding and counting substrings
• replacing bits of a string with something new
• extracting bits of a string to make a new string
Many of the above topics will crop up again in future chapters, and will be
discussed in more detail, but you can always return to this chapter if you want to
brush up on the basics. The exercises for this chapter will allow you to practice
using the string manipulation tools and to become familiar with them. They'll also
give you the chance to practice building bigger programs by using the individual
tools as building blocks.
Exercises
Reminder: the descriptions of the exercises are deliberately terse and may be
somewhat ambiguous (just like requirements ...
Restriction fragment lengths
Here's a short DNA sequence ...
Splicing out introns, part three
Using the data from part one, write a program that will print out the original
genomic DNA sequence with coding bases in uppercase and non-coding bases in
lowercase.
Solutions
Calculating AT content
This exercise is going to involve a mixture of strings and numbers. Let's remind
ourselves of the formula for calculating AT content:
$$\text{AT content} = \frac{A + T}{\text{length}}$$
There are three numbers we need to figure out: the number of As, the number of Ts
and the length of the sequence.
my_dna = "ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT"
length = len(my_dna)
a_count = my_dna.count('A')
t_count = my_dna.count('T')
print("length: " + str(length))
print("A count: " + str(a_count))
print("T count: " + str(t_count))
Let's take a look at the output from this program:
length: 54
A count: 16
T count: 21
That looks about right, but how do we know if it's exactly right? We could go
through the sequence ...
length: 4
A count: 1
T count: 1
Everything looks OK - we can probably go ahead and run the code on the long
sequence ...
To fix it, all we need to do is add some parentheses around the addition, so that the
line becomes:
at_content = (a_count + t_count) / length
Now we get the correct output for the test sequence ...
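To see why the parentheses matter, here is a minimal sketch using a four-base test sequence. It assumes Python 3, where / always gives a floating point result; the names test_dna and wrong are our own.

```python
test_dna = "ATGC"
length = len(test_dna)
a_count = test_dna.count("A")
t_count = test_dna.count("T")

# without parentheses the division is carried out before the addition
wrong = a_count + t_count / length

# with parentheses the two counts are added together first
at_content = (a_count + t_count) / length

print(wrong)        # 1.25
print(at_content)   # 0.5
```

Running the calculation on a sequence short enough to check by hand is a cheap way to catch this kind of mistake before trusting the answer for a long sequence.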
my_dna = "ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT"
# replace A with T
replacement1 = my_dna.replace('A', 'T')
# replace T with A
replacement2 = replacement1.replace('T', 'A')
# replace C with G
replacement3 = replacement2.replace('C', 'G')
# replace G with C
replacement4 = replacement3.replace('G', 'C')
# print the result of the final replacement
print(replacement4)
When we take a look at the output, however, something seems wrong:
ACACAACCAAAACCAAAACAAAAACCAAACAAACAAAAAAAACCAACCCAACAA
We can see just by looking at the original sequence ...
TCTGTTCGTTTTCGTTTTGTTTTTGCTTTCTTTCTTTTTTTTCGTTGCGTTCTT
ACAGAACGAAAACGAAAAGAAAAAGCAAACAAACAAAAAAAACGAAGCGAACAA
AGAGAAGGAAAAGGAAAAGAAAAAGGAAAGAAAGAAAAAAAAGGAAGGGAAGAA
ACACAACCAAAACCAAAACAAAAACCAAACAAACAAAAAAAACCAACCCAACAA
The first replacement (the result of which is shown in the first line of the output)
works fine - all the As have been replaced with Ts (for example, look at the first
character - it's A in the original sequence ...
case. Then, once all the replacements have been carried out, we can simply call
upper and change the whole sequence ...
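The idea described here (make each replacement in lower case so that later replacements cannot undo earlier ones, then call upper at the end) could be sketched like this. The variable names follow the earlier attempt; complement is our own name for the final result.

```python
my_dna = "ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT"

# replace each base with the lower case version of its complement
replacement1 = my_dna.replace("A", "t")
replacement2 = replacement1.replace("T", "a")
replacement3 = replacement2.replace("C", "g")
replacement4 = replacement3.replace("G", "c")

# change the whole sequence back to upper case
complement = replacement4.upper()
print(complement)
```

Because the replacement characters are lower case, the second step can no longer turn the newly made Ts back into As.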
Restriction fragment lengths
Let's start this exercise by solving the problem manually. If we look through the
DNA sequence ...
If we wanted to run the same program using a different restriction enzyme, we'd
have to change both the string that we used in the find method call, and the
number that we add in order to take account of the cut site.
It's worth noting that this program assumes that the DNA sequence ...
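The general shape of such a program might look like the sketch below. The sequence here is made up for illustration, and we assume the enzyme is EcoRI, which recognises GAATTC and cuts after the first base; neither is taken from the exercise data.

```python
# hypothetical sequence containing a single EcoRI site (GAATTC)
my_dna = "ACTGATCGATTACGTATAGTAGAATTCTATCATACATATATATCGATGCGTTCAT"

# find gives the position where the motif starts; the enzyme cuts
# after the first base of the motif, so we add one
cut_position = my_dna.find("GAATTC") + 1

# the two fragments are everything before and after the cut
fragment_one_length = cut_position
fragment_two_length = len(my_dna) - cut_position

print("length of fragment one is " + str(fragment_one_length))
print("length of fragment two is " + str(fragment_two_length))
```

Both the motif string and the number added to the find result would need to change for a different enzyme. If the motif were absent, find would give -1 and the lengths would be meaningless.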
my_dna = "ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCGATCGATCGATCGATCGATCGATCGATCATGCTATCATCGATCGATATCGATGCATCGACTACTAT"
exon1 = my_dna[1:63]
exon2 = my_dna[91:10000]
print(exon1 + exon2)
The output from this code looks vaguely right:
TCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCGATCGATCATCGATCGATATCGATGCATCGACTACTAT
but when we look more closely we can see that something is not right. The printed
coding sequence ...
Splicing out introns, part two
This is a straightforward piece of number-crunching. There are a couple of ways to
go about it. We could use the exon start/stop coordinates to calculate the length of
the coding portion of the sequence ...
77.23577235772358
although we probably don't really require ...
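One way to arrive at a figure like this is sketched below. It assumes Python 3 and the exon positions used elsewhere in these solutions (0 to 62 and 90 onwards); the names coding_length and percent_coding are our own.

```python
my_dna = "ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCGATCGATCGATCGATCGATCGATCGATCATGCTATCATCGATCGATATCGATGCATCGACTACTAT"

# extract the two exons and join them to get the coding sequence
exon1 = my_dna[0:62]
exon2 = my_dna[90:10000]
coding_length = len(exon1 + exon2)

# the percentage of the sequence that is coding
percent_coding = coding_length / len(my_dna) * 100
print(percent_coding)
```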
Or we could avoid using variables for the introns and exons altogether, and do
everything in one big print statement:
print(my_dna[0:62] + my_dna[62:90].lower() + my_dna[90:10000])
This last option is very concise, but a bit harder to read than the more verbose way.
As the exercises in this book get longer, you'll notice that there are more and more
different ways to write the code - you may end up with solutions that look very
different to the example solutions. When trying to choose between different ways
to write a program, always favour the solution that is clearest in intent and easiest
to read.
3: Reading and writing files
Why are we so interested in working with files?
As we start this chapter, we find ourselves once again doing things in a slightly
different order to most programming books. The majority of introductory
programming books won't consider working with external files until much further
along, so why are we introducing it now?
The answer, as was the case in the last chapter, lies in the particular jobs that we
want to use Python for. The data that we as biologists work with is stored in files, so
if we're going to write useful programs we need a way to get the data out of files
and into our programs (and vice versa). As you were going through the exercises in
the previous chapter, it may have occurred to you that copying and pasting a DNA
sequence ...
Another reason for our interest in file input/output is the need for our Python
programs to work as part of a pipeline or work flow involving other, existing tools.
When it comes to using Python in the real world, we often want Python to either
accept data from, or provide data to, another program. Often the easiest way to do
this is to have Python read, or write, files in a format that the other program
already understands.
Reading text from a file
Firstly, a ...
• compressed files (e.g. ZIP files)
If you're not sure whether a particular file is text or binary, there's a very simple
way to tell - just open it up in a text editor. If the file displays without any problem
then it's text (regardless of whether you can make sense of it or not). If you get an
error or a warning from your text editor, or the file displays as a collection of
indecipherable characters, then it's binary.
The examples and exercises in this chapter are a little different from those in the
previous one, because they rely on the existence of the files that we are going to
manipulate. If you want to try running the examples in this chapter, you'll need to
make sure that there is a file in your working directory called dna.txt which has a
single line containing a DNA sequence ...
The way that we use file objects is a bit different to strings and numbers as well. If
you glance back at the examples from the previous chapter you'll see that most of
the time when we want to use a variable containing a string or number we just use
the variable name:
my_string = 'abcdefg'
print(my_string)
my_number = 42
print(my_number + 1)
In contrast, when we're working with file objects most of our interaction will be
through methods. This style of programming will seem unusual at first, but as we'll
see in this chapter, the file type has a well-thought-out set of methods which let us
do lots of useful things.
The first thing we need to be able to do is to read the contents of the file. The file
type has a read method which does this. It doesn't take any arguments, and the
return value is a string, which we can store in a variable. Once we've read the file
contents into a variable, we can treat them just like any other string - for example,
we can print them:
my_file = open("dna.txt")
file_contents = my_file.read()
print(file_contents)
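If you don't have a dna.txt file to hand, the following self-contained sketch creates one in a temporary folder first and then reads it back. The set-up lines (tempfile, os.path.join, and the write and close calls) are our own additions so that the example runs anywhere; only the last few lines illustrate the read method.

```python
import os
import tempfile

# set-up: create a small dna.txt file in a temporary folder
folder = tempfile.mkdtemp()
path = os.path.join(folder, "dna.txt")
setup_file = open(path, "w")
setup_file.write("ACTGTACGTGCACTGATC\n")
setup_file.close()

# open the file, then read its contents into a string
my_file = open(path)
file_contents = my_file.read()
my_file.close()

# the contents behave just like any other string
print(file_contents)
print(len(file_contents))
```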
Files, contents and file names
When learning to work with files it's very easy to get confused between a file object,
a file name, and the contents of a file. Take a look at the following bit of code:
my_file_name = "dna.txt"
my_file = open(my_file_name)
my_file_contents = my_file.read()
What's going on here? On line 1, we store the string dna.txt in the variable
my_file_name. On line 2, we use the variable my_file_name as the argument
to the open function, and store the resulting file object in the variable my_file.
On line 3, we call the read method on the variable my_file, and store the
resulting string in the variable my_file_contents.
The important thing to understand about this code is that there are three separate
variables which have different types and which are storing three very different
things. my_file_name is a string, and it stores the name of a file on disk.
my_file is a file object, and it represents the file itself. my_file_contents is a
string, and it stores the text that is in the file.
Remember that variable names are arbitrary - the computer doesn't care what you
call your variables. So this piece of code is exactly the same as the previous
example:
apple = "dna.txt"
banana = open(apple)
grape = banana.read()
except it is harder to read! In contrast, the file name (dna.txt) is not arbitrary - it
must correspond to the name of a file on the hard drive of your computer.
A common error is to try to use the read method on the wrong thing. Recall that
read is a method that only works on file objects. If we try to use the read method
on the file name:
my_file_name = "dna.txt"
my_contents = my_file_name.read()
we get an AttributeError - Python will complain that strings don't have a
read method:
AttributeError: 'str' object has no attribute 'read'
Another common error is to use the file object when we meant to use the file
contents. If we try to print the file object:
my_file_name = "dna.txt"
my_file = open(my_file_name)
print(my_file)
we won't get an error, but we'll get an odd-looking line of output:
<open file 'dna.txt', mode 'r' at 0x7fcff7784b0>
We won't discuss the meaning of this line now: just remember that if you try to
print the contents of a file but instead you get some output that looks like the
above, you have almost definitely printed the file object rather than the file
contents.
Dealing with newlines
Let's take a look at the output we get when we try to print some information from a
file. We'll use the dna.txt file from the chapter_3 exercises folder. This file contains
a single line with a short DNA sequence ...
from this chapter, and the material we saw in the previous chapter, we get the
following code:
# open the file
my_file = open("dna.txt")
# read the contents
my_dna = my_file.read()
# calculate the length
dna_length = len(my_dna)
# print the output
print("sequence is " + my_dna + " and length is " + str(dna_length))
When we look at the output, we can see that the program is working almost
perfectly - but there is something strange: the output has been split over two lines:
sequence is ACTGTACGTGCACTGATC
 and length is 19
The explanation is simple once you know it: Python has included the new line
character at the end of the dna.txt file as part of the contents. In other words, the
variable my_dna has a new line character at the end of it. If we could view the
my_dna variable directly, we would see that it looks like this:
'ACTGTACGTGCACTGATC\n'
The solution is also simple. Because this is such a common problem, strings have a
method for removing new lines from the end of them. The method is called
rstrip, and it takes one string argument which is the character that you want to
remove. In this case, we want to remove the newline character (\n). Here's a
modified version of the code - note that the argument to rstrip is itself a string
so needs to be enclosed in quotes:
my_file = open("dna.txt")
my_file_contents = my_file.read()
# remove the newline from the end of the file contents
my_dna = my_file_contents.rstrip("\n")
dna_length = len(my_dna)
print("sequence is " + my_dna + " and length is " + str(dna_length))
and now the output looks just as we expected:
sequence is ACTGTACGTGCACTGATC and length is 18
In the code above, we first read the file contents and then removed the newline, in
two separate steps:
my_file_contents = my_file.read()
my_dna = my_file_contents.rstrip("\n")
but it's more common to read the contents and remove the newline all in one go,
like this:
my_dna = my_file.read().rstrip("\n")
This is a bit tricky to read at first as we are using two different methods (read and
rstrip) in the same statement. The key is to read it from left to right - we take the
my_file variable and use the read method on it, then we take the output of that
method (which we know is a string) and use the rstrip method on it. The result
of the rstrip method is then stored in the my_dna variable.
If you find it difficult to write the whole thing as one statement like this, just break
it up and do the two things separately - your programs will run just as well.
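The effect of rstrip can be checked without a file at all. In this sketch the string my_file_contents simply stands in for what read would return; length_with_newline is our own name.

```python
# the text as it would come back from my_file.read()
my_file_contents = "ACTGTACGTGCACTGATC\n"

# the new line character counts towards the length
length_with_newline = len(my_file_contents)

# rstrip returns a copy with the new line removed from the end
my_dna = my_file_contents.rstrip("\n")
dna_length = len(my_dna)

print("sequence is " + my_dna + " and length is " + str(dna_length))
```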
Missing files
What happens if we try to read a file that doesn't exist?
my_file = open("nonexistent.txt")
We get a new type of error that we've not seen before:
IOError: [Errno 2] No such file or directory: 'nonexistent.txt'
Most importantly, terminal output vanishes when you close your terminal program.
For small programs like the examples in this book, that's not a problem - if you
want to see the output again you can just rerun the program. If you have a
program that requires ...
... "r" for reading.
The difference between "w" and "a" is subtle, but important. If we open a file that
already exists using the mode "w", then we will overwrite the current contents with
whatever data we write to it. If we open an existing file with the mode "a", it will
add new data onto the end of the file, but will not remove any existing content. If
there doesn't already exist a file with the specified name, then "w" and "a" behave
identically - they will both create a new file to hold the output.
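The difference between the two modes can be seen in a small self-contained sketch. It writes to a file in a temporary folder (our own choice, so that nothing important is overwritten) and uses the file write and close methods to put text into the file.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "out.txt")

# "w" creates the file because it does not exist yet
first = open(path, "w")
first.write("one")
first.close()

# opening with "w" again overwrites the current contents
second = open(path, "w")
second.write("two")
second.close()

# "a" keeps the existing contents and adds to the end
third = open(path, "a")
third.write("three")
third.close()

check = open(path)
contents = check.read()
check.close()
print(contents)   # twothree
```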
Quite a lot of Python functions and methods have these optional arguments. For
the purposes of this book, we will only mention them when they are directly
relevant to what we're doing. If you want to see all the optional arguments for a
particular method or function, the best place to look is the official Python
documentation - see chapter 1 for details.
1 We call this the mode of the file.
2 These are the most commonly-used options - there are a few others.
Once we've opened a file for writing, we can use the file write method to write
some text to it. write works a lot like print - it takes a single string argument -
but instead of printing the string to the screen it writes it to the file.
Here's how we use open with a second argument to open a file and write a single
line of text to it:
my_file = open("out.txt", "w")
my_file.write("Hello world")
Because the output is being written to the file in this example, you won't see any
output on the screen if you run it. To check that the code has worked, you'll have to
run it, then open up the file out.txt in your text editor and check that its contents
are what you expect1.
Remember that with write, just like with print, we can use any string as the
argument. This also means that we can use any method or function that returns a
string. The following are all perfectly OK:
# write "abcdef"
my_file.write("abc" + "def")
# write "8"
my_file.write(str(len('AGTGCTAG')))
# write "TTGC"
my_file.write("ATGC".replace('A', 'T'))
# write "atgc"
my_file.write("ATGC".lower())
# write contents of my_variable
my_file.write(my_variable)
1 .txt is the standard file name extension for a plain text file. Later in this book, when we generate output
files with a particular format, we'll use different file name extensions.
Closing files
There's one more important file method to look at before we finish this chapter -
close. Unsurprisingly, this is the opposite of open (but note that it's a method,
whereas open is a function). We should call close after we're done reading or
writing to a file - we won't go into the details here, but it's a good habit to get into
as it avoids some types of bugs that can be tricky to track down. close is an
unusual method as it takes no arguments (so it's called with an empty pair of
parentheses) and doesn't return any useful value:
my_file = open("out.txt", "w")
my_file.write("Hello world")
# remember to close the file
my_file.close()
Paths and folders
So far, we have only dealt with opening files in the same folder that we are running
our program. What if we want to open a file from a different part of the file system?
The open function is ...
my_file = open(r"c:\windows\Desktop\myfolder\myfile.txt")
and if you're on a Mac, like this:
my_file = open("/Users/martin/Desktop/myfolder/myfile.txt")
Recap
We've taken a whole chapter to introduce the various ways of reading and writing
to files, because it's such an important part of building programs that are useful in
biology. We've seen how working with file contents is always a two-step process -
we must open a file before reading or writing - and looked at several common
pitfalls. We'll return to the theme of file manipulation in later chapters where we'll
address some of the shortcomings of the techniques ...
Exercises
Splitting genomic DNA
Look in the chapter_3 folder for a file called genomic_dna.txt - it contains the same
piece of genomic DNA that we were using in the final exercise from chapter 2. Write
a program that will split the genomic DNA into coding and non-coding parts, and
write these sequences ...
Write a program that will create a FASTA file for the following three sequences ...
Solutions
Splitting genomic DNA
We have a head-start on this problem, because we have already tackled a similar
problem in the previous chapter. Let's remind ourselves of the solution we ended
up with for that exercise:
my_dna = "ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCGATCGATCGATCGATCGATCGATCGATCATGCTATCATCGATCGATATCGATGCATCGACTACTAT"
exon1 = my_dna[0:62]
intron = my_dna[62:90]
exon2 = my_dna[90:10000]
print(exon1 + intron.lower() + exon2)
What changes do we need to make? Firstly, we need to read the DNA sequence ...
Let's put it all together, with some blank lines to separate out the different parts of
the program:
# open the file and read its contents
dna_file = open("genomic_dna.txt")
my_dna = dna_file.read()

# extract the different bits of DNA ...
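A complete program along these lines might look like the sketch below. Several details are our own assumptions rather than part of the listing above: the set-up lines that create genomic_dna.txt in a temporary folder, the rstrip call that removes the trailing newline, and the output file names coding_dna.txt and noncoding_dna.txt.

```python
import os
import tempfile

# set-up: write the genomic DNA from chapter 2 to a file
folder = tempfile.mkdtemp()
dna_path = os.path.join(folder, "genomic_dna.txt")
setup_file = open(dna_path, "w")
setup_file.write("ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCGATCGATCGATCGATCGATCGATCGATCATGCTATCATCGATCGATATCGATGCATCGACTACTAT\n")
setup_file.close()

# open the file and read its contents
dna_file = open(dna_path)
my_dna = dna_file.read().rstrip("\n")
dna_file.close()

# extract the different bits of DNA sequence
exon1 = my_dna[0:62]
intron = my_dna[62:90]
exon2 = my_dna[90:10000]

# write the coding and non-coding parts to two separate files
coding_file = open(os.path.join(folder, "coding_dna.txt"), "w")
coding_file.write(exon1 + exon2)
coding_file.close()

noncoding_file = open(os.path.join(folder, "noncoding_dna.txt"), "w")
noncoding_file.write(intron)
noncoding_file.close()
```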
... that will make it easier to see the output right away. Once we've got it working,
we'll switch over to file output. Here are a few lines which will print data to the
screen:
print(header_1)
print(seq_1)
print(header_2)
print(seq_2)
print(header_3)
print(seq_3)
and here's what the output looks like:
ABC123
ATCGTACGATCGATCGATCGCTAGACGTATCG
DEF456
actgatcgacgatcgatcgatcacgact
HIJ789
ACTGAC-ACTGT--ACTGTA----CATGTG
Not far off - the lines are in the right order, but we forgot to include the greater-
than symbol at the start of the header. Also, we don't really need to print the
header and the sequence ...
>ABC123
ATCGTACGATCGATCGATCGCTAGACGTATCG
>DEF456
actgatcgacgatcgatcgatcacgact
>HIJ789
ACTGAC-ACTGT--ACTGTA----CATGTG
Next, let's tackle the problems with the sequences ...
output = open("sequences.fasta", "w")
output.write('>' + header_1 + '\n' + seq_1)
output.write('>' + header_2 + '\n' + seq_2.upper())
output.write('>' + header_3 + '\n' + seq_3.replace('-', ''))
After making these changes the code doesn't produce any output on the screen, so
to see what's happened we need to take a look at the sequences.fasta file:
>ABC123
ATCGTACGATCGATCGATCGCTAGACGTATCG>DEF456
ACTGATCGACGATCGATCGATCACGACT>HIJ789
ACTGACACTGTACTGTACATGTG
This doesn't look right - the second and third lines have been joined together, as
have the fourth and fifth. What has happened?
It looks like we've uncovered a difference between the print function and the
write method. print automatically puts a new line at the end of the string,
whereas write doesn't. This means we've got to be careful when switching
between them! The fix is to add a new line character to the end of each string that
we write.
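The behaviour of write can be seen without creating a real file. This sketch uses io.StringIO from the standard library as a stand-in for an output file (our own choice, for convenience); the write method behaves in the same way.

```python
import io

# two writes with no new line characters run together
output = io.StringIO()
output.write(">ABC123")
output.write("ATCG")
joined = output.getvalue()

# adding "\n" to the end of each string keeps the lines separate
fixed_output = io.StringIO()
fixed_output.write(">ABC123" + "\n")
fixed_output.write("ATCG" + "\n")
separated = fixed_output.getvalue()

print(joined)
print(separated)
```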
Here's the final code, including the variable definition at the beginning, with blank
lines and comments:
# set the values of all the header variables
header_1 = "ABC123"
header_2 = "DEF456"
header_3 = "HIJ789"

# set the values of all the sequence variables
seq_1 = "ATCGTACGATCGATCGATCGCTAGACGTATCG"
seq_2 = "actgatcgacgatcgatcgatcacgact"
seq_3 = "ACTGAC-ACTGT--ACTGTA----CATGTG"

# make a new file to hold the output
output = open("sequences.fasta", "w")

# write the header and sequence for seq1
output.write('>' + header_1 + '\n' + seq_1 + '\n')

# write the header and uppercase sequences for seq2
output.write('>' + header_2 + '\n' + seq_2.upper() + '\n')

# write the header and sequence for seq3 with hyphens removed
output.write('>' + header_3 + '\n' + seq_3.replace('-', '') + '\n')
Writing multiple FASTA files
We can solve this problem with a slight modification of our solution to the previous
exercise. We need to create three new files to hold the output, and we'll construct
the name of each file by using string concatenation:
output_1 = open(header_1 + ".fasta", "w")
output_2 = open(header_2 + ".fasta", "w")
output_3 = open(header_3 + ".fasta", "w")
Remember, the first argument to open is a string, so it's fine to use a concatenation
because we know that the result of concatenating two strings is also a string.
We'll also change the write statements so that we have one for each of the output
files. We need to be careful with the number here in order to make sure that we get
the right sequence ...
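Putting these pieces together for the first two sequences gives something like the sketch below. Writing into a temporary folder and closing each file at the end are our own additions; the third sequence would follow the same pattern.

```python
import os
import tempfile

folder = tempfile.mkdtemp()

header_1 = "ABC123"
seq_1 = "ATCGTACGATCGATCGATCGCTAGACGTATCG"
header_2 = "DEF456"
seq_2 = "actgatcgacgatcgatcgatcacgact"

# make one output file per sequence, named after the header
output_1 = open(os.path.join(folder, header_1 + ".fasta"), "w")
output_2 = open(os.path.join(folder, header_2 + ".fasta"), "w")

# each write statement uses the matching header, sequence and file
output_1.write(">" + header_1 + "\n" + seq_1 + "\n")
output_2.write(">" + header_2 + "\n" + seq_2.upper() + "\n")

output_1.close()
output_2.close()
```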
4: Lists and loops
Why do we need lists and loops?
Think back over the exercises that we've seen in the previous two chapters - they've
all involved dealing with one bit of information at a time. In chapter 2, we used
string manipulation tools to process single sequences ...
The limitations of this approach became clear ...
Creating lists and retrieving elements
To make a new list, we put several strings or numbers inside square brackets,
separated by commas.
What if we want to get more than one element from a list? We can give a start and
stop position, separated by a colon, to specify a range of elements:
ranks = ["kingdom", "phylum", "class", "order", "family"]
lower_ranks = ranks[2:5]
# lower ranks are class, order and family
Does this look familiar? It's the exact same notation that we used to get substrings
back in chapter 2, and it works in exactly the same way - numbers are inclusive at
the start and exclusive at the end. The fact that we use the same notation for
strings and lists hints at a deeper relationship between the two types. In fact, what
we were doing when extracting substrings in chapter 2 was treating a string as
though it were a list of characters. This idea - that we can treat a variable as
though it were a list when it's not - is a powerful one in Python and we'll come back
to it later in this chapter.
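The parallel between lists and strings can be seen by putting the two side by side. In this sketch the variable names other than ranks and lower_ranks are our own.

```python
ranks = ["kingdom", "phylum", "class", "order", "family"]

# a single position gives a single element, counting from zero
first_rank = ranks[0]

# a start and stop position gives a list of elements
lower_ranks = ranks[2:5]

# the same notation on a string gives characters and substrings
word = "kingdom"
first_letter = word[0]
first_four = word[0:4]

print(first_rank, lower_ranks, first_letter, first_four)
```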
Working with list elements
To add another element onto the end of an existing list, we can use the append
method:
apes = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"]
apes.append("Pan paniscus")
append is an interesting method because it actually changes the variable on which
it's used - in the above example, the apes list goes from having three elements to
having four. We can get the length of a list by using the len function, just like we
did for strings:
apes = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"]
print("There are " + str(len(apes)) + " apes")
apes.append("Pan paniscus")
print("Now there are " + str(len(apes)) + " apes")
the list. If we want to print out a list to see how this works, we need to use str
(just as we did when printing out numbers):
ranks = ["kingdom", "phylum", "class", "order", "family"]
print("at the start : " + str(ranks))
ranks.reverse()
print("after reversing : " + str(ranks))
ranks.sort()
print("after sorting : " + str(ranks))
If we take a look at the output, we can see how the order of the elements in the list
is changed by these two methods:
at the start : ['kingdom', 'phylum', 'class', 'order', 'family']
after reversing : ['family', 'order', 'class', 'phylum', 'kingdom']
after sorting : ['class', 'family', 'kingdom', 'order', 'phylum']
By default, Python sorts strings in alphabetical order and numbers in ascending
numerical order1.
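One detail worth checking for yourself is that sort changes the list itself and does not hand back a new list. The following sketch shows this, along with the default ordering for numbers; the variable names result and numbers are our own.

```python
ranks = ["kingdom", "phylum", "class", "order", "family"]

# sort changes the list in place and returns None,
# so there is no point in storing its result
result = ranks.sort()
print(result)
print(ranks)

# numbers are sorted in ascending numerical order
numbers = [3, 10, 1]
numbers.sort()
print(numbers)
```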
Writing a loop
Imagine we wanted to take our list of apes:
apes = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"]
and print out each element on a separate line, like this:
Homo sapiens is an ape
Pan troglodytes is an ape
Gorilla gorilla is an ape
1 We can sort in other ways too, but that's beyond the scope of this book
One way to do it would be to just print each element separately:
print(apes[0] + " is an ape")
print(apes[1] + " is an ape")
print(apes[2] + " is an ape")
but this is very repetitive and relies on us knowing the number of elements in the
list. What we need is a way to say something along the lines of "for each element in
the list of apes, print out the element, followed by the words ' is an ape'". Python's loop
syntax allows us to express those instructions like this:
for ape in apes:
    print(ape + " is an ape")
Let's take a moment to look at the different parts of this loop. We start by writing
for x in y, where y is the name of the list we want to process and x is the name
we want to use for the current element each time round the loop.
x is just a variable name (so it follows all the rules that we've already learned about
variable names), but it behaves slightly differently to all the other variables we've
seen so far. In all previous examples, we create a variable and store something in it,
and then the value of that variable doesn't change unless we change it ourselves. In
contrast, when we create a variable to be used in a loop, we don't set its value - the
value of the variable will be automatically set to each element of the list in turn,
and it will be different each time round the loop.
Importantly, the loop variable x is given a new value at the start of each loop
iteration. Strictly speaking it does not disappear when the loop ends - in Python it
keeps the value of the last element once the loop has finished running - but it is
best to treat it as belonging to the loop and not to rely on it afterwards. The region
of code in which a variable can be used is called the variable's scope - we will see
several examples later in the book.
This first line of the loop ends with a colon, and all the subsequent ...
Homo sapiens is an ape. Its name starts with H
Its name has 12 letters
Pan troglodytes is an ape. Its name starts with P
Its name has 15 letters
Gorilla gorilla is an ape. Its name starts with G
Its name has 15 letters
Why is the above approach better than printing out these six lines in six separate
statements? Well, for one thing, there's much less redundancy - here we only
needed to write two print statements. This also means that if we need to make a
change to the code, we only have to make it once rather than three separate times.
Another benefit of using a loop here is that if we want to add some elements to the
list, we don't have to touch the loop code at all. Consequently ...
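A loop that produces output of this kind might look like the following sketch. The variable names name_length and first_letter are our own; the important point is that every indented line belongs to the body of the loop and is run once for each element.

```python
apes = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"]

for ape in apes:
    # every indented line is run once for each element of the list
    name_length = len(ape)
    first_letter = ape[0]
    print(ape + " is an ape. Its name starts with " + first_letter)
    print("Its name has " + str(name_length) + " letters")
```

Adding a fourth species to the apes list would produce two more lines of output without any change to the loop itself.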
    IndentationError: unindent does not match any outer indentation level

When you encounter an IndentationError, go back to your code and double-check that
all the lines in the block match up. Also double-check that you are using
either tabs or spaces for indentation, not both. The easiest way to do this, as
mentioned in chapter 1, is to enable tab emulation in your text editor.
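To make the error concrete, the sketch below builds a tiny program in which one line is indented by an amount that matches no enclosing block. It uses compile and try/except, which have not been introduced in this chapter and are used here only so that the error can be inspected safely; the names bad_code and error_message are illustrative.

```python
# Source code held in a string. The last line is indented by four
# spaces, which matches neither the loop body (eight spaces) nor
# the outer level (no spaces at all).
bad_code = (
    "for base in ['A', 'T']:\n"
    "        print(base)\n"
    "    print('done')\n"
)

try:
    compile(bad_code, "<example>", "exec")
    error_message = None
except IndentationError as error:
    # msg holds the description without the file name and line number
    error_message = error.msg

print(error_message)
```

Changing the last line so that it is indented by either eight spaces (inside the loop) or none (after the loop) makes the problem go away.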
Using a string as a list

We've already seen how a string can pretend to be a list: we can use list index
notation to get individual characters or substrings from inside a string. Can we also
use loop notation to process a string as though it were a list? Yes: if we write a
loop statement with a string in the position where we'd normally find a list, Python
treats each character in the string as a separate element. This allows us to very
easily process a string one character at a time:
    name = "martin"
    for character in name:
        print("one character is " + character)

In this case, we're just printing each individual character:

    one character is m
    one character is a
    one character is r
    one character is t
    one character is i
    one character is n
The process of repeating a set of instructions for each element of a list (or
character in a string) is called iteration, and we often talk about iterating over a list
or string.
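Iterating over a string is useful for more than printing. As a small sketch, the loop below builds a reversed copy of a sequence one character at a time; the sequence and the names dna and reversed_dna are made up for illustration.

```python
dna = "ATGCGCGATA"  # made-up sequence

# Start with an empty string and put each new character at the front,
# so the first character of dna ends up last
reversed_dna = ""
for base in dna:
    reversed_dna = base + reversed_dna

print(reversed_dna)
```

The loop body runs once per character, so a ten-character string gives ten iterations.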
Splitting a string to make a list

So far in this chapter, all our lists have been written manually. However, there are
plenty of functions and methods in Python that produce lists as their output. One
such method that is particularly interesting to biologists is the split method,
which works on strings. split takes a single argument, called the delimiter, and
splits the original string wherever it sees the delimiter, producing a list. Here's an
example:
    names = "melanogaster,simulans,yakuba,ananassae"
    species = names.split(",")
    print(str(species))

We can see from the output that the string has been split wherever there was a
comma, leaving us with a list of strings:

    ['melanogaster', 'simulans', 'yakuba', 'ananassae']
Of course, once we've created a list in this way we can iterate over it using a loop,
just like any other list.
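Putting the two ideas together, a short sketch of splitting a string and then looping over the result might look like this. The name_lengths list and the wording of the printed message are additions for illustration.

```python
names = "melanogaster,simulans,yakuba,ananassae"
species = names.split(",")

# species is an ordinary list, so it can be used in a for loop
name_lengths = []
for name in species:
    print(name + " has " + str(len(name)) + " letters")
    name_lengths.append(len(name))
```

The delimiter itself is thrown away by split, so none of the names in the list contains a comma.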
Iterating over lines in a file

Another very useful thing that we can iterate over is a file. Just as a string can
pretend to be a list for the purposes of looping, a file object can do the same trick [1].
When we treat a string as a list, each character becomes an individual element, but
when we treat a file as a list, each line becomes an individual element. This makes
processing a file line-by-line very easy:

[1] If you're interested in how this "pretending" actually works, look up the Python documentation for iterators,
but be prepared to do [...]
    file = open("some_input.txt")
    for line in file:
        # do something with the line
        pass

[...]
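As a runnable sketch of the same pattern, the example below first writes a small throwaway file and then loops over its lines. The folder, the file contents and the names path and line_count are assumptions made so that the example is self-contained.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained;
# the contents are made up for illustration
folder = tempfile.mkdtemp()
path = os.path.join(folder, "some_input.txt")
output = open(path, "w")
output.write("ATGC\nGGCC\nTTAA\n")
output.close()

line_count = 0
file = open(path)
for line in file:
    # each line still ends with a newline character, so a line
    # holding four bases is five characters long
    print("line is " + str(len(line)) + " characters long")
    line_count = line_count + 1
file.close()
```

Note that the newline at the end of each line is part of the string that the loop variable holds.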
[...] that would need to change is the stop position in the substring. But what are we
going to iterate over? We can't just iterate over the protein string, because that will
give us individual residues, which is not what we want. We can manually assemble a
list of stop positions, and loop over that:

    stop_positions = [3, 4, 5, 6, 7, 8, 9, 10]
    for stop in stop_positions:
        substring = protein[0:stop]
        print(substring)

but this seems cumbersome, and only works if we know the length of the protein
sequence [...]
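The list of stop positions does not have to be typed by hand. As a sketch of one alternative, the range function can generate the numbers, and len can supply the upper limit so that the code works for a protein of any length. The protein sequence below is an assumed example value, and the substrings list is added only so that the results can be checked.

```python
protein = "vlspadktnv"  # assumed example sequence, ten residues long

# range stops just before its second number, so we add one to the
# length to make the final substring cover the whole protein
substrings = []
for stop in range(3, len(protein) + 1):
    substring = protein[0:stop]
    substrings.append(substring)
    print(substring)
```

For a ten-residue protein this gives the same stop positions as the hand-written list, 3 through 10.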
With two numbers, range will count up from the first number (inclusive [1]) to the
second (exclusive):

    for number in range(3, 8):
        print(number)

    3
    4
    5
    6
    7

With three numbers, range will count up from the first to the second with the step
size given by the third:

    for number in range(2, 14, 4):
        print(number)

    2
    6
    10
Recap

In this chapter we've seen several tools that work together to allow our programs to
deal elegantly with multiple pieces of data. Lists let us store many elements in a
single variable, and loops let us process those elements one by one. In learning
about loops, we've also been introduced to the block syntax and the importance of
indentation in Python.
[1] The rules for ranges are the same as for array notation (inclusive on the low end, exclusive on the high end),
so you only have to memorize them once!
We've also seen several useful ways in which we can use the notation we've learned
for working with lists with other types of data. Depending on the circumstances, we
can use strings, files, and ranges as if they were lists. This is a very helpful feature of
Python, because once we've become familiar with the syntax for working with lists,
we can use it in many different places. Learning about these tools has also helped us
make sense of some interesting behaviour that we observed in earlier chapters.
Exercises

Note: all the files mentioned in these exercises can be found in the chapter_4 folder
of the exercises download.

Processing DNA in a file

The file input.txt contains a number of DNA sequences [...]
Solutions

Processing DNA in a file

This seems a bit more complicated than previous exercises (we are being asked to
write a program that does two things at once!), so let's tackle it one step at a time.
First, we write a program that simply reads each sequence [...]
Here's what the code looks like with the substring part added:

    file = open("input.txt")
    for dna in file:
        last_character_position = len(dna)
        trimmed_dna = dna[14:last_character_position]
        print(trimmed_dna)

As before, we are simply printing the trimmed DNA sequence [...]
Opening up the trimmed.txt file, we can see that the result looks good. It didn't
matter that we never removed the newlines, because they appear in the correct
place in the output file anyway:

    TCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATC
    ACTGATCGATCGATCGATCGATCGATGCTATCGTCGT
    ATCGATCACGATCTATCGTACGTATGCATATCGATATCGATCGTAGTC
    ACTATCGATGATCTAGCTACGATCGTAGCTGTA
    ACTAGCTAGTCTCGATGCATGATCAGCTTAGCTGATGATGCTATGCA

Now the final step, printing the lengths to the screen, requires [...]
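The final version of the program is not shown here. As a rough sketch of how the pieces could fit together, the example below trims each sequence, writes it to an output file and prints its length. Everything about the data is an assumption: the throwaway folder, the two made-up sequences and their 14-character prefix of N stand in for the real input.txt, and the names lengths and trimmed_length and the wording of the message are illustrative. It also uses dna[14:], which is a shorter way of slicing to the end of the string than spelling out the stop position.

```python
import os
import tempfile

# Stand-in for input.txt: two made-up sequences that share a
# 14-character prefix (hypothetical data, not the real exercise file)
folder = tempfile.mkdtemp()
input_path = os.path.join(folder, "input.txt")
output_path = os.path.join(folder, "trimmed.txt")
example = open(input_path, "w")
example.write("N" * 14 + "TCGATCGA\n")
example.write("N" * 14 + "GGCC\n")
example.close()

lengths = []
file = open(input_path)
output = open(output_path, "w")
for dna in file:
    # skip the first 14 characters; the newline at the end is kept
    trimmed_dna = dna[14:]
    output.write(trimmed_dna)
    # subtract one so that the newline is not counted as a base
    trimmed_length = len(trimmed_dna) - 1
    lengths.append(trimmed_length)
    print("processed sequence with length " + str(trimmed_length))
file.close()
output.close()
```

Subtracting one from the length assumes that every line, including the last, ends with a newline; if the last line of a file does not, its length would be reported as one too short.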
Multiple exons from genomic DNA

This is very similar to the exercises from the previous two chapters, and so our
solution to it is going to look very similar. Let's concentrate on the new bit of the
problem first: reading the file of exon locations. As before, we can start by opening
up the file and printing each line to the screen:

    exon_locations = open("exons.txt")
    for line in exon_locations:
        print(line)

This gives us a loop in which we are dealing with a different exon each time round.
If we look at the output, we can see that we still have a newline at the end of each
line, but we won't worry about that for now:

    5,58
    72,133
    190,276
    340,398
Now we have to split up each line into a start and stop position. The split method
is probably a good choice for this job. Let's see what happens when we split each
line using a comma as the delimiter:

    exon_locations = open("exons.txt")
    for line in exon_locations:
        positions = line.split(',')
        print(positions)

The output shows that each line, when split, turns into a list of two elements: [...]
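As a small sketch of what the splitting step produces, the example below uses two strings in place of lines read from the file. The values and the names lines and all_positions are chosen for illustration.

```python
# Two lines as they would arrive from a file, newline included
# (values chosen for illustration)
lines = ["72,133\n", "10,20\n"]

all_positions = []
for line in lines:
    positions = line.split(",")
    all_positions.append(positions)
    print(positions)
```

Each list has two elements, both of them strings, and the second one still carries the newline from the end of the line.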
    genomic_dna = open("genomic_dna.txt").read()
    exon_locations = open("exons.txt")
    for line in exon_locations:
        positions = line.split(',')
        start = positions[0]
        stop = positions[1]
        exon = genomic_dna[start:stop]
        print("exon is: " + exon)

Unfortunately, when we run this code we get an error at line 7:

      File "multiple_exons_from_genomic_dna.py", line 7, in <module>
        exon = genomic_dna[start:stop]
    TypeError: slice indices must be integers or None or have an __index__ method
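The error arises because split gives us strings, and strings cannot be used as slice positions. One way to deal with this, sketched below, is to convert each piece to a number with the int function before slicing. The sequence and exon location here are made-up stand-ins for the exercise files, and the try/except is included only to show that the unconverted version fails.

```python
genomic_dna = "ACGTACGTACGTACGTACGT"  # made-up stand-in sequence
line = "4,12\n"                       # made-up exon location

positions = line.split(",")

# The pieces are strings, so using them directly as slice
# positions raises a TypeError
try:
    genomic_dna[positions[0]:positions[1]]
    failed = False
except TypeError:
    failed = True

# int turns each string into a number; surrounding whitespace,
# including the newline, is ignored
start = int(positions[0])
stop = int(positions[1])
exon = genomic_dna[start:stop]
print("exon is: " + exon)
```

Because int ignores the trailing newline, there is no need to remove it from the second element first.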
[...] but instead we have a single exon variable that stores one exon at a time. Here's
one way to get the complete coding sequence [...]
    coding sequence is : CGTACCGTCGACGATGCTACGATCGTCGATCGTAGTCGATCATCGATCGATCG
    coding sequence is : CGTACCGTCGACGATGCTACGATCGTCGATCGTAGTCGATCATCGATCGATCGCGATCGATCGATATCGATCGATATCATCGATGCATCGATCATCGATCGATCGATCGATCGA
    coding sequence is : CGTACCGTCGACGATGCTACGATCGTCGATCGTAGTCGATCATCGATCGATCGCGATCGATCGATATCGATCGATATCATCGATGCATCGATCATCGATCGATCGATCGATCGACGATCGATCGATCGTAGCTAGCTAGCTAGATCGATCATCATCGTAGCTAGCTCGACTAGCTACGTACGATCGATGCATCGATCGTA
    coding sequence is : CGTACCGTCGACGATGCTACGATCGTCGATCGTAGTCGATCATCGATCGATCGCGATCGATCGATATCGATCGATATCATCGATGCATCGATCATCGATCGATCGATCGATCGACGATCGATCGATCGTAGCTAGCTAGCTAGATCGATCATCATCGTAGCTAGCTCGACTAGCTACGTACGATCGATGCATCGATCGTACGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGTAGCa
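Output like this, where the sequence grows by one exon on every line, is what we would expect when the print statement sits inside the loop. A minimal sketch of that pattern is shown below; the sequence, the exon locations and the name exon_lines are made-up stand-ins for the exercise files.

```python
genomic_dna = "AAACCCGGGTTTAAACCC"          # made-up stand-in sequence
exon_lines = ["0,3\n", "6,9\n", "12,15\n"]  # made-up exon locations

# start with an empty string and add one exon per iteration
coding_sequence = ""
for line in exon_lines:
    positions = line.split(",")
    start = int(positions[0])
    stop = int(positions[1])
    exon = genomic_dna[start:stop]
    coding_sequence = coding_sequence + exon
    # printing inside the loop shows the sequence after every exon
    print("coding sequence is : " + coding_sequence)
```

Moving the print line out of the loop, by removing its indentation, would print the coding sequence just once, after the last exon has been added.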