PYTHON Discussion for 674


Python  (version  3.x  with  bash  shell  syntax)        Introduction  

References  

http://docs.python.org/3/tutorial/  

 

Dive  into  Python  3  

  Free  online  –  find  the  website!  

How to Think Like a Computer Scientist

  Free  online  –  check  sources  at  dive  into  python  site  

  Probably  only  Python  2  version  available  

 

Why  Python?      

Public  websites  offer  many  bioinformatics  tools.    Many  are  quite  sophisticated.    However,  there  will  be  times  when  you  will  either  have  a  ton  of  data  or  you  will  need  additional  analysis.    You  will  thus  need  to  customize  a  programming  tool.    Once  you  are  comfortable  with  a  programming  tool  it  will  be  fairly  easy  to  manipulate  your  data  in  the  manner  best  suited  for  your  research.      

Python is a high-level programming language. It is fairly intuitive (compared to C, for example). Python was first released in 1991 by Guido van Rossum (a fan of Monty Python's Flying Circus). Guido worked for Google for a time and later joined Dropbox. He served as the BDFL (Benevolent Dictator For Life) of Python until stepping down from that role in 2018.

   

Python is used by many companies and is incorporated into many applications. Users can write functions and modules to be incorporated into an existing application. It has shipped with Mac OS X and is easily downloaded for all Unix platforms and nearly any other operating system available. The code is free and open source. Python releases are stable and its development continues.

Coding in Python is similar to writing pseudocode, so it is easy to learn. The code is readable: blocks are marked by consistent indentation rather than braces. The python.org website is excellent for both the new learner and the advanced programmer. Python's power lies in its vast number of libraries, which cover nearly any application.

 

Starting  Python  

First, log in to the computer and bring up a terminal window.

In  the  terminal  type:  python3.x [return]

 

NOTE: If the executable file python3.x is not in your path, you will need to find the executable and type its full path name. You may also add the directory containing the executable to your PATH environment variable. The best way to do this is in your .bash_profile or .bashrc file, using either of these forms:

export PATH="/opt/local/bin:"$PATH
export PATH="/opt/local/bin:${PATH}"

 

Note  the  output  on  the  screen.    Type    >>> license()

 

What  do  you  learn  about  the  software  you  are  using?  

 

To exit the python command interpreter, type ctrl-D or quit()

 

Help  

The  help  function,  help(),  can  be  used  to  remind  yourself  of  available  functions,  usages  and  definitions.    Type:  

>>> help() # and follow instructions to search keywords

You can also type, for example: >>> help("finally")

 

Your  first  program  in  Python  

Guess  what  it  will  be?  

Here  are  three  ways  to  run  a  program  with  python.  

1.  Interactive  python  session  

Type the following at the Python command line prompt:

>>> print("Hello World!") [return]
Hello World!

This  runs  python  in  the  interactive  mode.        

 

2.  At  the  terminal  command  line  using  a  python  (*.py)  file.  

Open a vi session (vi hello.py) and edit the file to contain:

print("Hello World!")

Close the file and type: python3.x hello.py

The output should be: Hello World!

3.  As  a  standalone  executable  

You can also add the interpreter path as the first line of the script (the "shebang" line) so that the script can be run directly.

Start a vi session (vi hello.py) and edit the file to contain the following:

#!/usr/bin/python3.x
#hello.py
#class header:
#
print("Hello World!")

Be sure the path to the interpreter in the first line is correct and that the file is executable (for example, chmod +x hello.py). Also, don't forget to add the class headers to all programs you write.

Close the file and type: ./hello.py

The output should be: Hello World!

     

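Additional options to launch python:

>>> exec(open('dir_list.py').read())

The exec function can be called within an interactive python session. It runs the contents of the file, and any variables the file defines are then available in the interactive session.

python3.x -i script.py 10 100 1000

The -i option puts the user into interactive mode after running script.py.

The arguments 10 100 1000 can be accessed within python from sys.argv, after importing the sys module. Each argument is stored as a string:

>>> import sys
>>> sys.argv        # the full list: ['script.py', '10', '100', '1000']
>>> sys.argv[1:]    # the arguments only: ['10', '100', '1000']
>>> sys.argv[0]     # the script name: 'script.py'
>>> sys.argv[2]     # the second argument: '100'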
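As a rough sketch of how a script might use these arguments, the helper below converts everything after the script name to integers. The name to_numbers and the sample list are made up for illustration; in a real script you would import sys and pass sys.argv instead of the sample list.

```python
def to_numbers(argv):
    """Convert every argument after the script name to an int."""
    return [int(arg) for arg in argv[1:]]

# Stand-in for sys.argv after running: python3.x -i script.py 10 100 1000
sample_argv = ["script.py", "10", "100", "1000"]

numbers = to_numbers(sample_argv)
print(numbers)         # [10, 100, 1000]
print(sum(numbers))    # 1110
```

Because the arguments arrive as strings, the int conversion is what makes arithmetic on them possible.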

 

Syntax  errors  

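Correct syntax produces no errors:

>>> print("Hello World!")
Hello World!

If you type incorrect syntax, a syntax error is reported before the statement is run:

>>> print "Hello World!"
  File "<stdin>", line 1
    print "Hello World!"
          ^
SyntaxError: Missing parentheses in call to 'print'

(The exact wording of the message and the position of the caret vary between Python 3 releases.)

In the above case a print statement written with Python version 2.x syntax produces a syntax error in python version 3.x.

Errors raised while a program is running are called run time errors, or "exceptions". A SyntaxError is reported through the same mechanism, but it is detected when the code is read, before any of it executes.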
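For contrast with a syntax error, here is a minimal sketch of a true run time error. The code is valid Python, but dividing by zero raises an exception while it runs. The function name safe_divide is hypothetical, and try/except is shown only as one possible way to respond to an exception.

```python
def safe_divide(a, b):
    """Divide a by b, returning None if b is zero."""
    try:
        return a / b
    except ZeroDivisionError as err:
        # The exception is raised only when this line actually runs
        print("caught a run time error:", err)
        return None

result_ok = safe_divide(6, 3)
result_bad = safe_divide(6, 0)
print(result_ok, result_bad)    # 2.0 None
```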

Semantic  errors  

Correct  syntax  produces  an  incorrect  answer.    Program  completes  without  producing  a  run  time  error:  

>>> print("Hello Word!")
Hello Word!

Here  the  phrase  is  merely  misspelled.  

Values  

A value can be text (a string) or a number.

>>> print("Hello World!")
Hello World!
>>> print(4)
4

Type  

The type function tells whether a value is an integer, a string, or some other type. Try the following:

Type str, string: >>> type("Hello World!")

Type int, integer: >>> type(4)

Type float, floating point number: >>> type(3.14)

Syntax of numbers. Make a variable, m1, to be an integer of 1 million (1,000,000). Try the following:

>>> m1=1,000,000
>>> m2=1000000

Try printing the variables m1 and m2. Which one worked? The one with commas made m1 a variable of type tuple, with the value (1, 0, 0).

Find the type of each variable.

A tuple is an ordered sequence, similar to a list but unchangeable once created. We will discuss this type shortly.
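The following sketch repeats the experiment as a script so the types can be compared side by side. The variable m3 is an addition to the notes: it uses underscores as digit separators, which assumes Python 3.6 or later.

```python
m1 = 1,000,000    # the commas build a tuple of three integers
m2 = 1000000      # a plain integer
m3 = 1_000_000    # underscores group digits without changing the value (Python 3.6+)

print(type(m1), m1)    # <class 'tuple'> (1, 0, 0)
print(type(m2), m2)    # <class 'int'> 1000000
print(m2 == m3)        # True
```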

 

Quotes  

Quotes mark strings. You can use either double or single quotes. If you wish to use a double quote inside a string that is itself defined with double quotes, you must "escape" it with a backslash (\"). Similarly for single quotes (\').

Text after a # character is a comment, similar to the use of the # sign in bash. Text between triple quotes is really a string that can span several lines. Standing alone as a statement it has no effect on the program, so it is often used as a multi-line comment.

>>> #You can place COMMENTS after a pound (#) sign
>>> """Or you can place COMMENTS between triple quotes
... if you have more than one line of text. """
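A short sketch of the quoting rules, with made-up example strings. It shows that escaping a quote and switching to the other quote style give the same string.

```python
escaped_double = "She said \"hello\""    # double quotes escaped inside double quotes
mixed_double = 'She said "hello"'        # no escape needed inside single quotes
escaped_single = 'It\'s here'            # single quote escaped inside single quotes
mixed_single = "It's here"               # no escape needed inside double quotes

# A triple-quoted string keeps its line break
two_lines = """first line
second line"""

print(escaped_double == mixed_double)    # True
print(escaped_single == mixed_single)    # True
print(two_lines.split("\n"))             # ['first line', 'second line']
```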

 

Variables  

Names  of  variables  must  follow  these  rules:  

Any  length  

Letters  or  numbers  (and  underscore)  

First character must be a letter or an underscore

In  general  only  lowercase  (variables  are  case  sensitive)  

Cannot  use  python  reserved  names  (keywords  for  example)  

 

OK                        Error

x=26                      123abc = "First three"    (starts with a digit)

abc123="First three"      num_#=26                  (# is not allowed in a name)

                          finally                   (reserved keyword)
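The naming rules can also be checked from within python. The sketch below uses the string method isidentifier and the standard keyword module. The helper name is_valid_name is hypothetical and is not part of the original notes.

```python
import keyword

def is_valid_name(name):
    """True if name follows the identifier rules and is not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["x", "abc123", "123abc", "num_#", "finally"]:
    print(candidate, is_valid_name(candidate))
```

The first two candidates are reported as valid and the last three as invalid, matching the OK and Error columns.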

To list the current variables in a session type:

>>> dir()
['__builtins__', '__doc__', '__name__', '__package__']
>>> x=2
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__', 'x']

This command returns a list of strings naming the objects (variables) in the current session, in alphabetical order. The exact entries vary between Python versions. Given an argument, as in dir(x), it instead lists the attributes reachable from that object.

 

Statements  

Statements are executed and may or may not display a result.

A call to print writes the value of a variable to stdout. (In Python 3 print is a function. In Python 2 it was a statement.)

>>> print(x)
26

An assignment statement (assigning a value to a variable) is executed but no result is printed.

>>> x=4

Expressions  

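Expressions are similar to those you use in mathematics. An expression is evaluated to produce a value:

>>> 20-4
>>> 3
>>> x
>>> x-3

The value of an expression can be assigned to a variable:

>>> greeting="Hello World!"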

 

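Operators

Operators act on values called operands. The operators below are listed in order of increasing precedence:

+     addition

-     subtraction

*     multiplication

/     division

**     power

()     parentheses

Addition and subtraction share one precedence level, as do multiplication and division. Operators of equal precedence are evaluated left to right, except **, which is evaluated right to left.

>>> 3*2**3
24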

Additional operators

/=     adding an "=" sign to an operator resets the variable to the result of the operation

//     floor divide

abs(x)     absolute value (a built-in function rather than an operator)

>>> x=23
>>> x/=2 #variable reset
>>> x
11.5
>>> x//2 #floor divide
5.0
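The precedence rules can be tried out with a few small expressions. This is only an illustrative sketch, and the variable names a through d are arbitrary.

```python
a = 3 * 2 ** 3      # power first: 3 * (2 ** 3)
b = (3 * 2) ** 3    # parentheses force the multiplication first
c = 2 ** 3 ** 2     # ** groups right to left: 2 ** (3 ** 2)
d = 10 - 4 - 3      # - groups left to right: (10 - 4) - 3
print(a, b, c, d)   # 24 216 512 3

x = 23
x /= 2              # same as x = x / 2
print(x, x // 2, abs(-x))    # 11.5 5.0 11.5
```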

     

Strings as operands

Strings can also be used as operands.

>>> s="abc"
>>> s+s # concatenation
'abcabc'
>>> 3*s # repetition
'abcabcabc'

 

Print  function  

You will find the print function is used often. You should have the variable "greeting" already defined in your session. If not, type it again:

>>> greeting="Hello World!"
>>> greeting
'Hello World!'

Notice the result of typing greeting is given in single quotes. Single quotes are used in the same manner as double quotes. They are shown when typing greeting because the interpreter displays a string value the way it would be written in code. When you type:

>>> print(greeting)
Hello World!

The quotes are left out because print writes only the characters of the string.

What happens when you type:

>>> print("greeting")

What happened?

Multiple variables and text can be printed using comma separation. print places a single space between the items.

>>> x=2
>>> y=5
>>> print("values: ",x," ",y," ",x/y)

   

String  formatting  

Strings can be formatted and used in print statements.

>>> pi = 3.14159265358979323
>>> print("pi is %f "% pi ,"gives: ", "pi is %.2f" % pi)
pi is 3.141593  gives:  pi is 3.14
>>> print("pi is %e" % pi)
pi is 3.141593e+00
>>> print("{0} is a {1}".format('this', 'test'))
this is a test
>>> print("{pos1} is {pos2}".format(pos1 = 'this', pos2 = 'another test'))
this is another test
>>> print("{pos1} is {pos2} of pi: {0}".format(pi,pos1 = 'this', pos2 = 'another test'))
this is another test of pi: 3.141592653589793

 

In the above examples 'f' gives fixed point notation and 'e' exponential notation. The position of each value in the string is marked either by a '%' specifier or by a pair of braces.

>>> x,y=12,4.2
>>> ("%.2f" % (x/y))
'2.86'
>>> ('{0:0.2f}'.format(x/y))
'2.86'
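Both styles also accept a field width, which helps when lining up columns. The sketch below formats the same values with each style and collects the results for comparison. The widths (5 and 8) and the list names are arbitrary choices made for illustration.

```python
x, y = 12, 4.2
rows = [("x", x), ("y", y), ("x/y", x / y)]

percent_lines = []
format_lines = []
for name, value in rows:
    # %-5s left-justifies in 5 columns; %8.2f right-justifies in 8 with 2 decimals
    percent_lines.append("%-5s%8.2f" % (name, value))
    # {0:<5} and {1:>8.2f} request the same layout with format()
    format_lines.append("{0:<5}{1:>8.2f}".format(name, value))

for line in format_lines:
    print(line)

print(percent_lines == format_lines)    # True
```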

 

For  additional  formatting  options  see:  

https://docs.python.org/3/library/string.html

Input  function  

The input function can be used for making a script interactive with the user.

>>> input("Input data here: ")
Input data here:      #this is the user prompt for data entry

#if you type 100 here the result will be returned as: '100'

The result can be assigned to a variable as well.

>>> xx=input("Input data here: ")

Note that input takes a prompt string, which can be formatted as discussed above.

Type  conversion  

What is the type of the variable xx above? In Python 3 all results from the input function are of type str. Since the input you are asking for is sometimes a number, you must convert the string to a number using int (or float):

>>> x1=int(input("Input data here: "))
>>> str1=str(input("Input string here: "))

The str call in the second line is not strictly needed, since input already returns a str.
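Here is a small sketch of why the conversion matters and of one way to accept either whole or decimal numbers. Fixed strings stand in for what a user might type, and the helper name to_number is hypothetical.

```python
def to_number(text):
    """Return an int if the text is a whole number, otherwise a float."""
    try:
        return int(text)
    except ValueError:
        # int() rejects text such as "3.14", so fall back to float()
        return float(text)

typed = "100"                    # what input() would return if the user typed 100
print(typed + typed)             # 100100 (string concatenation)
print(to_number(typed) * 2)      # 200 (arithmetic after conversion)
print(to_number("3.14"))         # 3.14
```

Text that is not a number at all, such as "abc", still raises a ValueError from float, so a real script may want to handle that case too.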

   

Functions    

Functions  can  be  declared  to  isolate  steps  and  simplify  the  main  code.    The  following  is  a  sample  program  showing  the  general  format  for  a  python  script.    You  should  insert  the  text  into  a  file  and  test  it.  

#!/opt/local/python3.x
#
#fitch:20120305:test1.py: python example fnc code
#Usage: ./test1.py

def square(inp1):
    """ what the function does """
    out1=inp1**2
    return(out1)

def cube(inp2):
    """ what the function does """
    out2=inp2**3
    return(out2)

#Input data
str1=input("Please input a number: ")
num_str=float(str1)

#calculation
ans1=square(num_str)
ans2=cube(num_str)

print("The value squared is: ",ans1)
print("The value cubed is: ",ans2)

         

Import  a  module  

One  of  the  powers  of  python  is  the  simplicity  of  adding  modules  and  functions.    A  module  consists  of  a  set  of  related  functions  defined  in  a  *.py  file.    For  example  a  math  module  you  can  imagine  would  consist  of  several  basic  math  functions  (log,  cosine,  sine,  square,  exponential,  etc).    Another  module  might  consist  of  statistics  (number  of  points,  maximum,  minimum,  average,  standard  deviation,  etc).  

Modules  have  the  feature  they  can  be  added  to  the  available  built  in  functions  as  needed.    The  way  to  do  this  is  through  import.  

>>> x=4
>>> log(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'log' is not defined

What happened? The function log is not a built-in command. It is part of the math module. You must import the math module before you can use its functions.

>>> import math
>>> log(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'log' is not defined

What happened this time? Importing math makes the module available, but not the bare name log. You must tell python which module the function belongs to by prefixing the module name.

>>> math.log(x)
1.3862943611198906

math.log(x) returns the natural logarithm (base "e"). Base 10 logarithms can be calculated in two ways.

Try to find the two different ways using the help function:

>>> help()    # then type math, then search on log
>>> help("math")
>>> help("math.log")

There are two other ways you can import functions from a module.

>>> from math import *

This syntax makes all functions within the math module accessible via the function name only. To find a logarithm you do not need to type math.log. You can now just type:

>>> log(x)
1.3862943611198906

A single function from the math module can also be imported. In this case type:

>>> from math import log

This statement makes only the log function from the math module available.

NOTE: If a function name is defined in more than one place with different algorithms, you should tread cautiously. For example, the power function, pow, is a built-in and also has a counterpart defined in the math module. The built-in version works on integers without conversion to floats. The math module version first converts its arguments to floats and then calculates the power using a different algorithm, so it always returns a float. The conversion adds a step, so if you are just calculating powers of integers the built-in version may be faster, and it returns an exact integer.

If you type import math you will have both pow and math.pow available. If you type from math import * you will replace the built-in function pow with the math library version in your session, something you may or may not wish to do.
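The difference between the two pow functions shows up in the type of the result. The sketch below is only an illustration (the exponent values are arbitrary) and uses import math so that both versions stay available.

```python
import math

int_result = pow(2, 10)            # built-in: integers in, integer out
float_result = math.pow(2, 10)     # math version: always returns a float
print(int_result, type(int_result))        # 1024 <class 'int'>
print(float_result, type(float_result))    # 1024.0 <class 'float'>

# For large powers the built-in stays exact, while the float result is rounded
big_exact = pow(3, 40)
big_float = math.pow(3, 40)
print(big_exact)                      # 12157665459056928801
print(int(big_float) == big_exact)    # False
```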

 

Additional  modules  

Two other modules that you will find useful are the sys and os modules. The sys module gives access to interpreter settings and data (the module search path sys.path and the command line arguments sys.argv, for instance). The os module gives you quick and easy file manipulation ability within python, as well as access to environment variables such as PATH through os.environ.

>>> import os
>>> print(os.getcwd())
/Users/fitch/CODE/COMP_LAB_PYTHON

Other  functions  can  be  found  using  the  help()  function.  

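How might you list the contents of a directory? Search the help: type help("os") and search for "directory".

Try:

>>> seq=['G', 'G', 'C', 'C', 'T', 'T', 'C', 'T', 'C', 'G', 'A', 'A', 'T', 'G', 'A', 'A', 'T', 'C']
>>> sep=''
>>> sep.join(seq)
'GGCCTTCTCGAATGAATC'

Avoid naming a variable str, since that would hide the built-in str function for the rest of the session.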

For  loop  

For  loops  are  used  in  the  same  manner  as  any  other  language.    In  python  a  for  loop  is  implemented  with  the  syntax  

>>> for i in list:

In  the  os  module  a  function  listdir  will  return  a  list  consisting  of  the  filenames  in  the  directory  argument.    Type  the  following  at  the  python  command  line:  

>>> f=os.listdir(os.getcwd())
>>> print(f)
['.dir_list.py.swp', '__pycache__', 'dir_list.py', 'fnc.py', 'humansize.py', 'humansize_inp.py', 'quad.py', 't.py', 'test.py', 'test2.py']
>>> for f in os.listdir(os.getcwd()):
...     print(f)    # be certain to indent here !
...
.dir_list.py.swp    # current vi session
__pycache__    # python storage of compiled python scripts (binary) for cross platform use
dir_list.py
fnc.py
humansize.py
humansize_inp.py
quad.py
t.py
test.py
test2.py

 

In this case the for loop visits each entry of the list returned by os.listdir in turn, and print writes one entry per line. Thus, only the entries are printed, without the brackets and quotes of the list display.
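A for loop over os.listdir can also pick out particular files. The sketch below builds a temporary directory so that it runs anywhere. The file names are invented for the example, and sorted is used because os.listdir does not promise any particular order.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a few empty files to list
    for name in ["quad.py", "fnc.py", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()

    all_names = sorted(os.listdir(tmp))

    py_files = []
    for name in all_names:
        if name.endswith(".py"):    # keep only the python scripts
            py_files.append(name)

print(all_names)    # ['fnc.py', 'notes.txt', 'quad.py']
print(py_files)     # ['fnc.py', 'quad.py']
```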

 

TRY  IT  

Using a for loop, print the individual variables within your current python session.

>>> for i in dir():
...     print(i)

EXTEND LATER: to not list attributes (i.e. if the first two characters are __, don't print)
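One possible way to approach the extension is sketched below: skip any name whose first two characters are underscores. The variables x and greeting are defined only so there is something to find, and the list name visible is arbitrary.

```python
x = 2
greeting = "Hello World!"

visible = []
for name in dir():
    if name[:2] != "__":    # skip attributes such as __name__ and __doc__
        visible.append(name)

print(visible)
```

The printed list depends on what else is defined in your session, but it should include x and greeting and none of the double-underscore names.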

Recursion  

A  recursive  algorithm  calls  itself.      

def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)

A  recursive  algorithm  must  have  a  termination  condition  n == 0

And  a  reduction  step  where  the  function  calls  itself  factorial(n - 1)

 

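Another example:

def recursive(string, num):
    print("#%s - %s" % (string, num))
    recursive(string, num+1)

This function has no termination condition, so it keeps calling itself until python stops it with an error (RecursionError in recent versions).

Also worth noting, python by default has a limit to the depth of recursion available, to keep runaway recursion from exhausting the computer's resources. On my computer this is 1000. I don't know if this changes depending on hardware, etc. To see yours:

>>> import sys
>>> sys.getrecursionlimit()

and to set it (the new limit is passed as an argument; 2000 here is only an example value):

>>> import sys #(if you haven't already)
>>> sys.setrecursionlimit(2000)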
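The sketch below puts the two cases side by side: a recursive function with a termination condition, and one without that runs into the recursion limit. The function count_forever is a made-up stand-in for the second example above with the printing removed, and catching RecursionError assumes Python 3.5 or later.

```python
import sys

def factorial(n):
    """Recursion with a termination condition (n == 0)."""
    if n == 0:
        return 1
    return n * factorial(n - 1)

def count_forever(num):
    """Recursion with no termination condition."""
    return count_forever(num + 1)

print(factorial(5))                # 120
print(sys.getrecursionlimit())     # often 1000, but check your own system

try:
    count_forever(0)
    hit_limit = False
except RecursionError:
    # python stops the runaway recursion once the limit is reached
    hit_limit = True

print(hit_limit)                   # True
```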