PYTHON Discussion for 674
Python (version 3.x with bash shell syntax) Introduction
References
http://docs.python.org/3/tutorial/
Dive into Python 3
Free online – find the website!
Think like a scientist
Free online – check sources at dive into python site
Probably only Python 2 version available
Why Python?
Public websites offer many bioinformatics tools. Many are quite sophisticated. However, there will be times when you will either have a ton of data or you will need additional analysis. You will thus need to customize a programming tool. Once you are comfortable with a programming tool it will be fairly easy to manipulate your data in the manner best suited for your research.
Python is a high-level programming language. It is fairly intuitive (compared to C, for example). Python was first released in 1991 by Guido van Rossum (a fan of Monty Python's Flying Circus). Guido worked for Google for a time and now is involved in Dropbox. He remains the BDFL (Benevolent Dictator For Life) of Python.
Python is used by many companies and is incorporated into many applications. Users can write functions and modules to be incorporated into an existing application. It ships with Mac OS X and is easily downloaded for all Unix platforms and nearly any other operating system available. The code is free and open source. Python releases are stable and its development continues.
Coding in Python is similar to writing pseudocode, so it is easy to learn. The code is readable, containing no braces and requiring consistent indentation. The python.org website is excellent for the new learner and the advanced programmer. Python's power lies in its vast number of libraries, which cover nearly any application.
Starting Python
First, log in to the computer and bring up a terminal window.
In the terminal type: python3.x [return]
NOTE: If you do not have the executable file python3.x in your path you will need to find the executable and type its full path name. You may also add the path to the executable to your PATH environment variable. The best way to do this is within your .bash_profile or .bashrc file.
export PATH="/opt/local/bin:"$PATH
OR
export PATH="/opt/local/bin:${PATH}"
Note the output on the screen. Type >>> license()
What do you learn about the software you are using?
To exit the python command interpreter: ctrl-D or quit()
Help
The help function, help(), can be used to remind yourself of available functions, usages and definitions. Type:
>>> help() # and follow instructions to search keywords
You can also type for example: >>> help("finally")
Your first program in Python
Guess what it will be?
Here are three ways to run a program with python.
1. Interactive python session
Type the following at the Python command line prompt:
>>> print("Hello World!") [return]
Hello World!
This runs python in the interactive mode.
2. At the terminal command line using a python (*.py) file.
Open a vi session (vi hello.py) and edit the file to contain print("Hello World!")
Close the file and type: python3.x hello.py
The output should be: Hello World!
3. As a standalone executable
You can also add the 1st line program path option to run the script.
Start a vi session (vi hello.py) and edit the file to contain the following:
#!/usr/bin/python3.x
#read_file.py
#class header:
#
print("Hello World!")
Be sure the path to the interpreter in the first line is correct and that the file is executable (for example, chmod +x hello.py). Also, don't forget to add the class headers to all programs you write.
Close the file and type: ./hello.py
The output should be: Hello World!
Additional options to launch python:
exec(open('dir_list.py').read())
The exec command can be executed within an interactive python session.
Variables are then available in the interactive session.
python3.x -i script.py 10 100 1000
The -i option puts the user into interactive mode after running script.py.
The arguments 10 100 1000 can be accessed within python from sys.argv (import sys first if script.py has not already done so):
>>> import sys
>>> sys.argv
>>> sys.argv[1:]
>>> sys.argv[0]
>>> sys.argv[2]
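The entries of sys.argv are always strings, so numeric arguments need converting before use. The following is a minimal sketch of that step. The helper name parse_numbers and the simulated argument list are illustrative assumptions, not part of the original script.py.

```python
import sys

def parse_numbers(argv):
    """Convert command-line arguments (strings) to integers.

    argv[0] is the script name, so conversion starts at argv[1].
    """
    return [int(arg) for arg in argv[1:]]

# Simulated argument list, as sys.argv would look after:
#   python3.x script.py 10 100 1000
example_argv = ["script.py", "10", "100", "1000"]

numbers = parse_numbers(example_argv)
total = sum(numbers)
print("numbers:", numbers)
print("sum:", total)

# In a real script you would call parse_numbers(sys.argv) instead.
```

A non-numeric argument makes int() raise a ValueError, so a real script may want to catch that case.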
Syntax errors
Correct syntax produces no errors:
>>> print("Hello World!")
Hello World!
If you type incorrect syntax a syntax error is produced:
>>> print "Hello World!"
  File "<stdin>", line 1
    print "Hello World!"
          ^
SyntaxError: invalid syntax
In the above case a print statement written with Python version 2.x syntax produces a syntax error in python version 3.x. Newer 3.x releases word the message differently (SyntaxError: Missing parentheses in call to 'print').
Syntax errors are reported before any code runs. Errors raised while a program is running are run time errors, also called "exceptions".
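The two kinds of error can be seen side by side in a short sketch. It uses the built-in compile function to check a line of code without running it, and a division by zero as a stand-in for a typical run time exception. Both choices are illustrative assumptions rather than part of the original notes.

```python
# A syntax error is detected when the code is compiled, before anything runs.
try:
    compile('print "Hello World!"', "<example>", "exec")
    syntax_ok = True
except SyntaxError as err:
    syntax_ok = False
    print("SyntaxError:", err.msg)

# A run time error (exception) happens while syntactically valid code executes.
try:
    result = 1 / 0
except ZeroDivisionError as err:
    result = None
    print("ZeroDivisionError:", err)
```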
Semantic errors
The syntax is correct but the program produces an incorrect answer. The program completes without producing a run time error:
>>> print("Hello Word!")
Hello Word!
Here the phrase is merely misspelled.
Values
A value can be text (a string) or a number.
>>> print("Hello World!")
Hello World!
>>> print(4)
4
Type
The type function tells whether a value is an integer, a string, or another type. Try the following:
Type str, string: >>> type("Hello World!")
Type int, integer: >>> type(4)
Type float, floating point number: >>> type(3.14)
Syntax of numbers. Make a variable, m1, to be an integer of 1 million (1,000,000). Try the following:
>>> m1=1,000,000
>>> m2=1000000
Try printing the variables m1 and m2. Which one worked? The one with commas made m1 a variable of type tuple, (1, 0, 0), because the commas separate three values.
Find the type of each variable.
A tuple is an immutable sequence, similar to a list. We will discuss this type shortly.
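The comma pitfall can be checked directly. This is a small sketch; the variable m3 and the underscore digit separator (available from Python 3.6 on) are additions for illustration and do not appear in the original exercise.

```python
m1 = 1,000,000     # the commas build a tuple of three integers
m2 = 1000000       # a plain integer
m3 = 1_000_000     # underscores are allowed as digit separators (Python 3.6+)

print(m1, type(m1))
print(m2, type(m2))
print(m3 == m2)
```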
Quotes
Quotes mark strings. You can also use single quotes. If you wish to use a double quote in your string and you are defining the string with double quotes then you must “escape” the double quote. Similarly for single quotes.
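Escaping can be sketched as follows. The sample sentences are made up for illustration. Mixing the two quote styles often avoids the need for a backslash altogether.

```python
# Escape a double quote inside a double-quoted string with a backslash.
s1 = "She said \"hello\""
# The same text without escapes, using single quotes on the outside.
s2 = 'She said "hello"'

# Escape a single quote inside a single-quoted string.
s3 = 'It\'s here'
# The same text without escapes, using double quotes on the outside.
s4 = "It's here"

print(s1)
print(s3)
```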
Text after a # character is a comment, similar to the use of the # sign in bash. Text between triple quotes is a string that may span several lines. When it is not assigned to a variable it can serve as a multi-line comment.
>>> #You can place COMMENTS after a pound (#) sign
>>> """Or you can place COMMENTS between triple quotes
... if you have more than one line of text. """
Variables
Names of variables must follow these rules:
Any length
Letters, numbers, and underscores only
First character must be a letter or an underscore (not a number)
By convention only lowercase (variables are case sensitive)
Cannot use python reserved names (keywords for example)
OK                      Error
x=26                    123abc = "First three"
abc123="First three"    num_#=26
                        finally
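The naming rules can be tested programmatically. This is a sketch, assuming the standard keyword module and the string method isidentifier(). The helper name is_valid_name is hypothetical.

```python
import keyword

def is_valid_name(name):
    """Return True if name can be used as a variable name.

    isidentifier() checks the character rules (letters, digits, underscore,
    no leading digit); keyword.iskeyword() rejects reserved words.
    """
    return name.isidentifier() and not keyword.iskeyword(name)

for name in ["x", "abc123", "123abc", "num_#", "finally"]:
    print(name, is_valid_name(name))
```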
To list the current variables in a session type:
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__']
>>> x=2
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__', 'x']
This command returns a list of strings naming the objects (variables) in the current session. The list is in alphabetical order. Given an argument, as in dir(x), it lists the attributes of that object instead.
Statements
Statements are executed and return a result (or not).
print is a function. Calling it writes the given value of a variable to stdout.
>>> print(x)
26
An assignment statement (assigning a value to a variable) is executed but the result is not printed.
>>> x=4
Expressions
Expressions are similar to those you use in mathematics.
>>> 20-4
>>> 3
>>> x
>>> x-3
>>> greeting="Hello World!"
Operators (Operands)
Operators in order of increasing precedence (+ and - share one level, as do * and /):
+ addition
- subtraction
* multiplication
/ division
** power
() parentheses
Operators of equal precedence are evaluated left to right, except ** which groups right to left. Higher precedence is applied first:
>>> 3*2**3
24
Additional operators
/= adding an "=" sign to an operator resets the variable to the result
// floor divide
abs(x) absolute value
>>> x=23
>>> x/=2 #variable reset
>>> x
11.5
>>> x//2 #floor divide
5.0
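The precedence rules can be confirmed with a few expressions. This is a short sketch. The variable names a through f are arbitrary and chosen only for this illustration.

```python
a = 3 * 2 ** 3       # ** binds tighter than *: 3 * 8
b = (3 * 2) ** 3     # parentheses force the multiplication first: 6 ** 3
c = 2 ** 3 ** 2      # ** groups right to left: 2 ** 9
d = 10 - 4 - 3       # equal precedence runs left to right: (10 - 4) - 3

x = 23
x /= 2               # same as x = x / 2; true division gives a float
e = x // 2           # floor division of a float still returns a float
f = abs(-7)          # absolute value

print(a, b, c, d, x, e, f)
```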
Strings as operands
Strings can also be used as operands.
>>> s="abc"
>>> s+s # concatenation
'abcabc'
>>> 3*s # repetition
'abcabcabc'
Print function
You will find the print function is used often. You should have the variable "greeting" already defined in your session. If not type it again:
>>> greeting="Hello World!"
>>> greeting
'Hello World!'
Notice the result of typing greeting is given in single quotes. Single quotes are used in the same manner as double quotes. They are printed when typing greeting because the variable is a string. When you type:
>>> print(greeting)
Hello World!
The quotes are stripped out as part of the function print.
What happens when you type: >>> print("greeting")
What happened?
Multiple variables and text can be printed using comma separation.
>>> x=2
>>> y=5
>>> print("values: ",x," ",y," ",x/y)
String formatting
Strings can be formatted and used in print statements.
>>> pi = 3.14159265358979323
>>> print("pi is %f "% pi ,"gives: ", "pi is %.2f" % pi)
pi is 3.141593  gives:  pi is 3.14
>>> print("pi is %e" % pi)
pi is 3.141593e+00
>>> print("{0} is a {1}".format('this', 'test'))
this is a test
>>> print("{pos1} is a {pos2}".format(pos1 = 'this', pos2 = 'another test'))
this is a another test
>>> print("{pos1} is a {pos2}i of pi: {0}".format(pi,pos1 = 'this', pos2 = 'another test'))
this is a another testi of pi: 3.141592653589793
In the above examples 'f' gives fixed point notation and 'e' exponential notation. The positions of the values to print are marked either by a '%' sign or by a pair of braces.
>>> x,y=12,4.2
>>> ("%.2f" % (x/y))
'2.86'
>>> ('{0:0.2f}'.format(x/y))
'2.86'
For additional formatting options see:
http://docs.python.org/3/library/string.html
Input function
The input function can be used for making a script interactive with the user.
>>> input("Input data here: ")
Input data here: #this is the user prompt for data entry
#if you type 100 here the result will be returned as: '100'
The result can be assigned to a variable as well.
>>> xx=input("Input data here: ")
Note that the prompt passed to input is a string, which can be formatted as discussed above.
Type conversion
What is the type of the variable xx above? In Python 3 all results from the input function are type str. Since the input you are asking for is sometimes a number you must convert the string to a number using:
>>> x1=int(input("Input data here: "))
>>> str1=str(input("Input string here: "))
Functions
Functions can be declared to isolate steps and simplify the main code. The following is a sample program showing the general format for a python script. You should insert the text into a file and test it.
#!/opt/local/python3.x
#
#fitch:20120305:test1.py: python example fnc code
#Usage: ./test1.py

def square(inp1):
    """ what the function does """
    out1=inp1**2
    return(out1)

def cube(inp2):
    """ what the function does """
    out2=inp2**3
    return(out2)

#Input data
str1=input("Please input a number: ")
num_str=float(str1)
#calculation
ans1=square(num_str)
ans2=cube(num_str)
print("The value squared is: ",ans1)
print("The value cubed is: ",ans2)
Import a module
One of the powers of python is the simplicity of adding modules and functions. A module consists of a set of related functions defined in a *.py file. For example a math module you can imagine would consist of several basic math functions (log, cosine, sine, square, exponential, etc). Another module might consist of statistics (number of points, maximum, minimum, average, standard deviation, etc).
Modules can be added to the available built-in functions as needed. The way to do this is through import.
>>> x=4
>>> log(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'log' is not defined
What happened? The function log is not a built-in command. It is part of the math module. You must import the math module before you can use its functions.
>>> import math
>>> log(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'log' is not defined
What happened? Importing the module does not make log available by its bare name. To tell this to python you must specify the module the function belongs to.
>>> math.log(x)
1.3862943611198906
math.log(x) returns the natural logarithm (base "e"). Base 10 logarithms can be calculated in two ways.
Try to find the two different ways using the help function. Type help(), then type math and search on log, or type directly:
help("math")
help("math.log")
There are two other ways you can import functions from a module.
from math import *
This syntax allows all functions within the math module to be accessible via the function name only. To find a logarithm you do not need to type math.log. You can now just type:
>>> log(x)
1.3862943611198906
A single function from the math module can also be imported. In this case type:
from math import log
This statement will only make the log function available from the math module.
NOTE: If functions in different modules share a name but use different algorithms you should tread cautiously. For example, the power function, pow, is a built-in as well as having a counterpart defined in the math module. The built-in version will work on integers without conversion to floats. The math module version first converts integers to floats and then calculates the power expression using a different algorithm. The conversion adds time to the function, so if you are just calculating powers of integers the built-in version will be faster.
If you type import math you will have both pow and math.pow available. If you type from math import * you will overwrite the built-in function pow with the math library version. This is something you may or may not wish to do.
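The difference between the two power functions shows up in the type of the result. Below is a minimal sketch using import math, so that both versions remain available. The variable names are illustrative only.

```python
import math

builtin_result = pow(2, 10)        # built-in: integers in, integer out
math_result = math.pow(2, 10)      # math version: always returns a float

# The built-in also accepts a third argument (modulus); math.pow does not.
modular_result = pow(2, 10, 1000)  # (2 ** 10) % 1000

print(builtin_result, type(builtin_result))
print(math_result, type(math_result))
print(modular_result)
```

Because math.pow works in floating point, very large integer powers lose precision there, while the built-in keeps exact integers.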
Additional modules
Two other modules that you will find useful are the sys and os modules. The sys module gives access to interpreter settings and data (the module search path, sys.path, for instance). The os module gives you quick and easy file manipulation ability within python, as well as access to environment variables such as PATH through os.environ.
>>> import os
>>> print(os.getcwd())
/Users/fitch/CODE/COMP_LAB_PYTHON
Other functions can be found using the help() function.
How might you list the contents of a directory? Search … help("os") search directory
Try:
>>> seq=['G', 'G', 'C', 'C', 'T', 'T', 'C', 'T', 'C', 'G', 'A', 'A', 'T', 'G', 'A', 'A', 'T', 'C']
>>> sep='' # empty separator; avoid the name str, which is a built-in
>>> sep.join(seq)
'GGCCTTCTCGAATGAATC'
For loop
For loops are used in the same manner as in other languages. In python a for loop is implemented with the syntax
>>> for i in list:
In the os module a function listdir will return a list consisting of the filenames in the directory argument. Type the following at the python command line:
>>> f=os.listdir(os.getcwd())
>>> print(f)
['.dir_list.py.swp', '__pycache__', 'dir_list.py', 'fnc.py', 'humansize.py', 'humansize_inp.py', 'quad.py', 't.py', 'test.py', 'test2.py']
>>> for f in os.listdir(os.getcwd()):
...     print(f) # be certain to indent here !
...
.dir_list.py.swp # current vi session
__pycache__ # python storage of compiled python scripts (binary) for cross platform use
dir_list.py
fnc.py
humansize.py
humansize_inp.py
quad.py
t.py
test.py
test2.py
In this case os.listdir returns a standard list, so the for loop visits one entry at a time and print writes each entry on its own line.
TRY IT
Using a for loop, print the individual variables within your current python session.
>>> for i in dir():
...     print(i)
...
EXTEND LATER: to not list attributes (i.e. if the first two characters are __ don't print)
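One possible way to handle the EXTEND LATER note is to test each name with the string method startswith before printing. This is a sketch only. The variables x and greeting are defined here just so the loop has something to show.

```python
x = 2
greeting = "Hello World!"

# Collect the session names, skipping attributes that begin with two underscores.
visible = []
for name in dir():
    if not name.startswith("__"):
        visible.append(name)

for name in visible:
    print(name)
```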
Recursion
A recursive algorithm calls itself.
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)
A recursive algorithm must have a termination condition: n == 0
and a reduction step where the function calls itself: factorial(n - 1)
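To watch the termination condition and the reduction step at work, the calls can be traced. The variant below is an illustrative sketch. The name factorial_traced and the depth parameter are additions for display purposes, not part of the original function.

```python
def factorial_traced(n, depth=0):
    """Compute n! recursively, printing each call and return value."""
    indent = "  " * depth
    print(indent + "factorial(%d) called" % n)
    if n == 0:                       # termination condition
        result = 1
    else:                            # reduction step: a smaller problem
        result = n * factorial_traced(n - 1, depth + 1)
    print(indent + "factorial(%d) returns %d" % (n, result))
    return result

answer = factorial_traced(5)
print(answer)
```

The indentation in the output grows until n reaches 0, then unwinds as each pending multiplication completes.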
Another example:
def recursive(string, num):
    print("#%s - %s" % (string, num))
    recursive(string, num+1)
This function has no termination condition, so it keeps calling itself until python stops it.
Also worth noting, python by default has a limit to the depth of recursion available, to avoid absorbing all of the computer's memory. On my computer this is 1000. I don't know if this changes depending on hardware, etc. To see yours:
import sys
sys.getrecursionlimit()
and to set it:
import sys #(if you haven't already)
sys.setrecursionlimit(limit) #limit is the new maximum depth, an integer
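The effect of the limit can be demonstrated with a function that recurses to a chosen depth. This is a sketch under stated assumptions: the helper count_down and the values 1000, 2000 and 3000 are arbitrary choices for illustration, and RecursionError is the exception name used from Python 3.5 on.

```python
import sys

def count_down(n):
    """Recurse n levels deep and return the depth reached."""
    if n == 0:                        # termination condition
        return 0
    return 1 + count_down(n - 1)      # reduction step

original_limit = sys.getrecursionlimit()

sys.setrecursionlimit(1000)           # a commonly seen default value
try:
    count_down(2000)                  # deeper than the limit allows
    hit_limit = False
except RecursionError:                # raised when the limit is exceeded
    hit_limit = True

sys.setrecursionlimit(3000)           # the new limit is a required argument
depth_reached = count_down(2000)      # now fits within the limit

sys.setrecursionlimit(original_limit) # restore the previous setting
print(hit_limit, depth_reached)
```

Raising the limit far beyond the default can crash the interpreter instead of raising an exception, so it is usually better to rewrite very deep recursion as a loop.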