1
Bioinformatics Algorithms: Lecture 1
© Jeff Parker, 2009
It is always advisable to perceive clearly our ignorance. Charles Darwin
2
Outline
What is this course about?
What do I need to know?
What will I learn?
What tools will I be using?
What is our first task?
3
Outline
Introduce an interesting problem from Biology
Apply some Computer Science techniques
Introduce some biological background
Explain the motivation for our problem
Look at exact pattern match
Find a faster algorithm
Look at approximate pattern match
Find a much faster algorithm that uses Dynamic Programming
Algorithm used in tools such as Basic Local Alignment Search Tool (BLAST)
http://www.ncbi.nlm.nih.gov/BLAST/
If time permits, we will consider searching a text for multiple patterns
4
What is Bioinformatics?
The use of techniques from mathematics, statistics, and computer science to solve biological problems
Many activities of the cell can be interpreted as manipulation of strings from a small alphabet
Things we will not be studying
How to use cells to perform computation
How the cells perform the computation
Instead, we will be studying computations that can help us identify
Genes that are similar - pattern matching
Retracing evolutionary history - phylogenetic trees
How some reactions are facilitated - protein folding
5
Pattern Matching
We are interested in exact match and inexact match
6
Phylogenetic Trees
7
Protein Folding
Proteins are defined by a sequence, but their use depends upon their three dimensional shape
Evolution has selected proteins that reliably assume the same shape
8
Biotechnology @ Extension
The Extension school has an ALM in Biotechnology Program
This course serves as one of the Information Technology courses
Prerequisite: CS 119 (Data Structures)
Comfort reading and writing algorithms
Comfort evaluating their running time
Will run as a standard Lecture course with problem sets
Hope we can have more interactions in class than typical lecture
Edward Freedman, course TF
Broad Institute at MIT
Chair of the Boston chapter of the ACM
The course is new: we expect to hear about your interests
9
What do I need to know?
Enough Biology to understand the central Dogma
Enough Programming to read and write algorithms
A willingness to explore regions that we don’t understand fully
10
What will I learn?
An understanding of how some Bioinformatics tools work
The datasets are huge and many of the problems are intractable (NP-complete)
Thus most algorithms are heuristics (algorithms that may not yield an optimal solution, but find one quickly)
An appreciation for the strengths and weaknesses of certain approaches
An introduction to a wide range of computer algorithms
An introduction to an important new field
11
What tools
Biologists use a number of tools, such as BLAST
Our prime interest will be in understanding how these algorithms work
We will be using a computer language to express algorithms
12
What is our first task?
We will begin with a simple problem to introduce the major ideas
Pattern Matching
Understand the problem
Write some algorithms
Look at the odds
First we need to review some basic Biology
We will not get through all of my notes tonight
13
Central Dogma of Biology
To understand Life, we must understand
DNA - holds information on how the cell works
RNA - is used to transfer information from DNA and to build proteins
Proteins - which form enzymes, are used to signal and regulate all activity, and build key components
All three can be viewed as a string of symbols from a small alphabet
DNA - 4 characters: A G T C
Adenine, Guanine, Thymine, Cytosine (A-T, C-G)
RNA - Like DNA, replacing Thymine with Uracil
Protein - 20 amino acids - Glycine, Alanine, Valine, etc.
14
Central Dogma of Biology
All living organisms are described by 4 letter strings of DNA
A-T and G-C form complementary pairs as shown above
We are watching replication above; more realistic images later
15
DNA
DNA is a double helix made up of
Sugar Molecule
Phosphate Group
a base that holds the information
The two sides are not symmetric
The sugar molecule has 5 carbons
Note special role of carbons 3' and 5'
DNA rebuilding proceeds naturally only at the 3' end.
16
Replication and Transcription
We duplicate all of a DNA strand
We transcribe a gene (a section of the strand) to mRNA, which is translated
news.bbc.co.uk/2/shared/spl/hi/sci_nat/03/dna50/how_dna_works/html/default.stm
17
Translation
Triplets of RNA (called codons) describe the 20 Amino Acids that are used to build up Proteins.
Sample Amino Acids
Leucine, Proline, …
There is redundancy in encoding
$4^3 = 64 \gg 20$
Different codons may yield the same amino acid
ACT, ACC, ACA, ACG (ACU, ACC, ACA, ACG when written as RNA)
all yield Threonine C4H9NO3
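The redundancy can be checked directly. The following is a minimal sketch for Python 3 (the slide listings use Python 2); `PARTIAL_CODE` is a hand-picked excerpt of the standard genetic code, not the full table, and `translate` is a hypothetical helper introduced only for illustration.

```python
from itertools import product

# Count every possible triplet over the four RNA letters.
codons = ["".join(c) for c in product("ACGU", repeat=3)]
print(len(codons))  # 4**3 triplets

# A small excerpt of the standard genetic code (assumption: illustration only).
PARTIAL_CODE = {
    "ACU": "Thr", "ACC": "Thr", "ACA": "Thr", "ACG": "Thr",
    "CCU": "Pro", "CCC": "Pro", "CCA": "Pro", "CCG": "Pro",
    "AUG": "Met",
}

def translate(rna):
    """Translate an RNA string codon by codon; '?' marks codons outside the excerpt."""
    return [PARTIAL_CODE.get(rna[i:i + 3], "?") for i in range(0, len(rna) - 2, 3)]

print(translate("AUGACUACG"))
```

Two different codons in the input (ACU and ACG) come out as the same amino acid, which is the redundancy the slide describes.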
18
Pattern Match
A known gene may help us understand an unknown gene with similar structure
Proteins with similar makeup may act similarly.
Locating similar genes in different organisms can help us trace lineage.
Our Goal
Want to be able to find approximate matches for a gene or protein.
Model this as a search for a pattern in a text.
A related problem is looking at similarities between strings.
Parallel solution
Problem is hard because
Strings are very long
The set of possible matches is large
We start with a simpler problem: exact match
19
Exact Pattern Matching
The basis for most exact pattern matching algorithms follows
Algorithm
Line up text and pattern
Compare the two
If they match
Report the position of match
Else
Slide pattern to right and try again
Text
Pattern
20
Python Pattern Match
def find(text, pattern):
    """Look for pattern in the string text."""
    # Stop early enough that text[x+y] never runs off the end of text.
    for x in range(len(text) - len(pattern) + 1):
        for y in range(len(pattern)):
            if (text[x+y] != pattern[y]):
                break
            if (y == len(pattern) - 1):
                return x
    return -1

print find("This is my wish", "is")
21
Python Pattern Match
The line def find(text, pattern): defines a function named find that takes two arguments
22
Python Pattern Match
def simpleSearch(text, pattern):
    """Look for pattern in the string text."""
    # Stop early enough that text[x+y] never runs off the end of text.
    for x in range(len(text) - len(pattern) + 1):
        for y in range(len(pattern)):
            if (text[x+y] != pattern[y]):
                break
            if (y == len(pattern) - 1):
                return x
    return -1

print simpleSearch("This is my wish", "is")

>>> print range(4)
[0, 1, 2, 3]
Indentation is used rather than { }
Note use of ":"
23
Using Python Slice
def simpleSearch2(text, pattern):
    """Find the pattern in the string text."""
    for x in range(len(text)):
        if (text[x:x+len(pattern)] == pattern):
            return x
    return -1

print simpleSearch2("This is my wish", "is")
print simpleSearch2("This is my wish", "if")

>>> print text[1:3]
hi
24
Analysis
This algorithm behaves well in practice
The worst case is bad
For pattern of length N
Text of length M
Worst case is O(NM)
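To see the gap between typical and worst-case behavior, one can count character comparisons. This is a rough sketch for Python 3; `count_comparisons` is a hypothetical helper that scans the whole text instead of stopping at the first match, and the two sample texts are invented for illustration.

```python
def count_comparisons(text, pattern):
    """Run the simple search over every starting spot, counting character comparisons."""
    comparisons = 0
    for x in range(len(text) - len(pattern) + 1):
        for y in range(len(pattern)):
            comparisons += 1
            if text[x + y] != pattern[y]:
                break
    return comparisons

# Worst case: every starting spot matches until the very last pattern letter.
worst = count_comparisons("A" * 20, "AAAB")
# More typical text: most starting spots fail on the first or second letter.
typical = count_comparisons("ACGTTGCAGGCTTACGATCA", "AAAB")
print(worst, typical)
```

In the worst case each of the M - N + 1 starting spots costs N comparisons, which is where the O(NM) bound comes from.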
25
Odds of a match
A priori odds of matching two characters: 1/P, where P is the size of the alphabet
A posteriori: we base the odds on measurement.
Say there are 100,000 distinct last names in the Boston Phone Book.
What are the odds that two people selected at random have the same name?
Higher than 1/100,000: some names are common, such as Smith or Parker
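A small calculation shows why common names raise the odds. This sketch (Python 3) assumes two people are drawn independently, and the name counts are made up for illustration; they are not taken from any phone book.

```python
def same_name_probability(counts):
    """Chance that two independently chosen people share a name, given a count per name."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)

# Hypothetical data: 1000 distinct names.
uniform = same_name_probability([1] * 1000)         # every name equally common
skewed = same_name_probability([500] + [1] * 999)   # one very common name
print(uniform, skewed)
```

With equally common names the answer is exactly one over the number of names; any skew in the counts pushes it higher.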
26
What are the odds of a match?
What are the odds that a pattern of length N matches at arbitrary spot?
$(1/P)^N$
What are the odds that there is no match at a given spot?
$1 - (1/P)^N$
Odds of no match first two spots? (Must fail in both spots.)
$(1 - (1/P)^N)(1 - (1/P)^N)$
Odds of no match in text of length M? Have M - N + 1 starting spots
$(1 - (1/P)^N)^{M-N+1} \approx 1 - (M-N+1)(1/P)^N + \dots$
27
Odds of match somewhere
Odds of no match in text of length M. Have M - N + 1 starting spots
$(1 - (1/P)^N)^{M-N+1} \approx 1 - (M-N+1)(1/P)^N + \dots$
But the odds that there is a match at the next spot are not independent of the outcome at a previous position.
A posteriori, we need to look at the pattern and what we have learned
We will often be sloppy and use a priori reasoning.
One theme that we will encounter multiple times is that there is information contained in the work we do to find a partial match
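The a priori estimate is easy to evaluate numerically. The sketch below (Python 3) treats the starting spots as independent, which the slide notes is a simplification; the values P = 4, N = 10, M = 1000 are arbitrary choices for illustration.

```python
def prob_no_match(P, N, M):
    """A priori estimate, treating the M - N + 1 starting spots as independent."""
    p_spot = (1.0 / P) ** N
    return (1.0 - p_spot) ** (M - N + 1)

def first_order(P, N, M):
    """Leading terms of the expansion: 1 - (M - N + 1)(1/P)^N."""
    return 1.0 - (M - N + 1) * (1.0 / P) ** N

exact = prob_no_match(4, 10, 1000)
approx = first_order(4, 10, 1000)
print(exact, approx)
```

For a DNA-sized alphabet and a pattern of length 10, the two values agree closely, so the first-order term is usually all we need.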
28
What are the frequencies?
Let's count the frequency of letters in a real gene
"""Frequency - count the frequency of each letter in DNA sequence"""
text = input("Enter the quoted text: ")
print "Saw ", text
symbolCounts = {}    # Empty Dictionary
# Go over all letters in the sequence
for x in range(len(text)):
    ch = text[x]
    # Increment count
    if (ch in symbolCounts):
        symbolCounts[ch] = symbolCounts[ch] + 1
    else:
        symbolCounts[ch] = 1
29
What are the frequencies?
Let's count the frequency of letters in a real gene
# Rough print
print symbolCounts

symbols = ['A', 'G', 'C', 'T']

# Pretty Print
for ch in symbols:
    if (ch in symbolCounts):
        print ch, symbolCounts[ch]
    else:
        print ch, 0
30
What are the frequencies?
% python freq.py
Enter the quoted text: AGCT
Traceback (most recent call last):
  File "freq.py", line 3, in <module>
    text = input("Enter the quoted text: ")
  File "<string>", line 1, in <module>
NameError: name 'AGCT' is not defined

% python freq.py
Enter the quoted text: "AGCT"
Saw AGCT
{'A': 1, 'C': 1, 'T': 1, 'G': 1}
A 1
G 1
C 1
T 1
31
What are the frequencies?
% wc temp
 1 1 1272 temp
% python freq.py < temp
Enter the quoted text: Saw
ATGAAAGCTTCCTGGGCCTCCTTCCCCATCCTTGCACCTGTAGCCACCGTCAGTGGTGTTTGGAGGCTACAGCTGTTCCGACTGATGCTCATAGGACTCATACATGGTATGTCATCTGTATTCGTGGTGAAAAATGGCTACTGAACAACTTGCACAATGGAAGTCTACTCAAGCTGCCTCCTTGTCAAATTAACATACTAACAGCAGTGATAAAAATGTGACCTTCAACCTGCCCTGTAATTTAGAAGTACTAAATAACAAATGTCGTGGTCAAGGAAATGCT…
{'A': 413, 'C': 270, 'T': 347, 'G': 239}
A 413
G 239
C 270
T 347
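For comparison, the same count can be written more compactly. This is a sketch for Python 3 using the standard library's `collections.Counter`; `letter_counts` is a hypothetical name, and the short input is just the first twelve letters of the sequence above.

```python
from collections import Counter

def letter_counts(sequence, symbols="AGCT"):
    """Return the count of each symbol, reporting 0 for symbols that never appear."""
    counts = Counter(sequence)
    return {ch: counts[ch] for ch in symbols}

print(letter_counts("ATGAAAGCTTCC"))
```

A `Counter` returns 0 for a missing key, so the pretty-print step needs no special case for letters that do not occur.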
32
Better Pattern Matching
The Boyer Moore algorithm uses the same basic idea as simple search
Algorithm Line up text and pattern
Compare the two
If they match
Report the position of match
Else
Slide pattern to right and try again
33
Insight
We get the most mileage by looking at right edge
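Match text above last letter in pattern
Only need to call function compare here
[Figure: skip table. Depending on which character the text has above the last letter of the pattern, the entries are Skip 1, Skip 3, Skip 2, Skip -4]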
34
Data Structures for Boyer Moore
Boyer-Moore preprocesses the pattern and keeps a "skip table"
If you see this character in the text, skip this many places
If the text holds Blue, slide 3
Now Blue in pattern is below Blue in text
The skip table has an entry for each element of the "alphabet" - the 4 nucleotides in our case.
If the character matches the last char of pattern, we compare full string
35
Data Structures for Boyer Moore
# Create a dictionary to hold skip table.
# Insert skip for last letter.
ln = len(pattern)
letter = pattern[ln-1]    # Last letter
d = { letter:(ln) }
ln = ln - 1

# Iterate over the pattern, filling out skip table.
for x in xrange(len(pattern) - 1):
    d[pattern[x]] = ln
    ln = ln - 1

# The last character is special.
d[letter] = -1 * d[letter]
36
Match
def boyerMooreSearch(text, pattern, d):
    ln = len(pattern)
    x = ln - 1
    while (x < len(text)):
        if (text[x] in d):
            skip = d[text[x]]
        else:
            skip = ln
        if (skip < 0):    # Match last char
            start = x - ln + 1
            if (text[start: x + 1] == pattern):
                return start    # Found it!
            x = x - skip    # Not a match: skip
        else:
            x = x + skip    # Ordinary skip.
    return -1    # Never found it
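Putting the skip table and the search loop together gives something that can be run end to end. The following is a sketch for Python 3 that follows the two slide listings closely; the function names `build_skip_table` and `boyer_moore_search` are mine, and the code assumes a non-empty pattern.

```python
def build_skip_table(pattern):
    """Skip table as on the slides: how far to slide for each pattern letter.

    The entry for the last letter is stored as a negative number, which
    signals "compare the whole pattern here".
    """
    ln = len(pattern)
    last = pattern[-1]
    table = {last: ln}
    for offset, ch in enumerate(pattern[:-1]):
        table[ch] = ln - 1 - offset
    table[last] = -table[last]
    return table


def boyer_moore_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    ln = len(pattern)
    table = build_skip_table(pattern)
    x = ln - 1                      # index in text above the pattern's last letter
    while x < len(text):
        skip = table.get(text[x], ln)
        if skip < 0:                # text letter equals the pattern's last letter
            start = x - ln + 1
            if text[start:x + 1] == pattern:
                return start
            x -= skip               # skip is negative, so this moves right
        else:
            x += skip
    return -1


print(build_skip_table("wiki"))
print(boyer_moore_search("This is my wish", "is"))
```

For the pattern "wiki" the letter i appears twice, so its entry records the shorter slide (2) and carries the minus sign that marks the last letter.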
37
Input
pattern = input("Enter the pattern in quotes: ")
...

% python BoyerMoore.py
Enter the pattern in quotes: wiki
Traceback (most recent call last):
  File "BoyerMoore.py", line 21, in <module>
    pattern = input("Enter the pattern in quotes: ")
  File "<string>", line 1, in <module>
NameError: name 'wiki' is not defined

% python BoyerMoore.py
Enter the pattern in quotes: "wiki"
38
Analysis of Boyer Moore
Much of the time, we are just looking up an entry in the skip table and sliding
Has a modest setup time to build the skip table
For some reasonable assumptions, Boyer Moore is sublinear in text length
Does not need to even look at many characters in the text
In the example below, we inspect only 4 items in text
Does better with large alphabets
Not much use for approximate matches
39
Odds in Boyer Moore
What can we expect from Boyer-Moore?
With a large alphabet, when we have a mismatch, we hope to slide a long way
With a small alphabet, the average length of slide decreases
We can have a long slide with a small alphabet, just not very likely
In the example below, skip table entry for blue is 15.
Typical measure used is the expected length of a slide
40
Rumination on Boyer Moore
Is it worth the effort to preprocess the pattern?
If we are searching a long text, and it speeds up the search, it is worthwhile
Knuth-Morris-Pratt is another algorithm that preprocesses pattern
Looks for repeats in pattern. If we have matched the first instance of a repeat, we don't have to check it again
In the example below, when pattern and text fail, we know that the first two symbols in the pattern will match 3 spaces to the right.
http://www.ics.uci.edu/~goodrich/dsa/11strings/demos/pattern/
Later we will see some algorithms that preprocess the text
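The repeat-finding step in Knuth-Morris-Pratt can be sketched briefly. This is a standard textbook formulation in Python 3, not code from the lecture; the names `prefix_table` and `kmp_search` are assumptions made for illustration.

```python
def prefix_table(pattern):
    """table[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    table = prefix_table(pattern)
    k = 0                                   # number of pattern letters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]                # reuse the repeat instead of re-checking it
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


print(prefix_table("ATATG"))
print(kmp_search("ATATATG", "ATATG"))
```

When the search fails at the G of ATATG, the table says that the AT already matched can be reused, so the text pointer never moves backwards.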
41
Mutations
DNA is constantly being transcribed and replicated
Sometimes there are copying errors which lead to mutations
Three types
The Good: Mutation in sickle cell gene provides resistance to malaria
The Bad: Huntington’s disease, a degenerative disease of nervous system
The Silent: may cause no difference
May result in same Amino Acid
May be part of junk DNA
ATCTAG
ATCGAG
42
Cystic Fibrosis
Cystic Fibrosis (CF) is a chronic and frequently fatal genetic disease of the body’s mucus glands. CF primarily affects the lungs of children.
In early 1980s biologists hypothesized that CF is caused by mutations in some gene.
ATP binding proteins are present on cell membranes and act as a transport channel.
In 1989 biologists found similarity between CF Gene and ATP binding proteins.
A mutation was found in 70% of CF patients.
Those with this mutation are missing a single amino acid.
43
Cystic Fibrosis
44
Browse CFTR Gene
Go to www.ensembl.org
Select human
Click on Human Genome
Click on Chromosome 7
In Search box on page (not browser) enter CFTR
Select map element (NM_001104950.1)
Click on NM_001104950.1
Takes you to http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?list_uids=157279742
List of publications about the gene, with PubMed links
Scroll down to see /translation and ORIGIN
Gene is over 4K base pairs long.
45
Approximate pattern match
In biology, we are often looking for an approximate match
Changes can be viewed as one of three forms
Add a character to pattern
Remove a character from pattern
Alter a character
ATCGGA
ATG-GA
46
Distance
How can we define how far apart two sequences are?
We can take two sequences, and count the number of places they differ
ATCGGT
ATGGGA
We speak of the Hamming distance. Measures number of substitutions to get from one string to the other
Since mutations can also lead to dropped or added terms, we will also use Levenshtein distance or edit distance
ATCGG-
AT-GGA
Smallest number of insertions, deletions, and substitutions required
Finding the edit distance is a problem in itself that we will address soon
Either satisfies properties for a metric, including the triangle inequality
$D(a, c) \le D(a, b) + D(b, c)$
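Hamming distance is a one-line computation. A minimal sketch in Python 3, applied to the pair of sequences on this slide; the function name `hamming` is my own.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(1 for x, y in zip(a, b) if x != y)

print(hamming("ATCGGT", "ATGGGA"))
```

The length check matters: Hamming distance is only defined for strings of equal length, which is one reason edit distance is needed once insertions and deletions are allowed.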
47
Approximate pattern match
Example: find pattern ATGGA in text ATCGGA
Here are two choices
Can change G in pattern to C and add G to pattern
We illustrate add or delete with “-” in text or pattern
ATCGGA
ATG-GA
Or we could add a C to pattern
ATCGGA
AT-GGA
Second version should be cheaper
Define distance between two strings as the sum of the costs of the operations needed to make them the same
We assume today that each operation has cost 1. Methods extend to other pricing schemes
48
Recursive Solution
To find a match between ATGGA and ATCGGA
Try all possible actions on first characters, then compare the rest
Match or replace first characters of each string
Drop first char of text
Drop first char of pattern
Try to match the remainder using recursion
At each step, at least one string is shorter.
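ATGGA / ATCGGA (the full problem)
TGGA / TCGGA (first characters matched or replaced)
TGGA / ATCGGA (first character of the first string dropped)
ATGGA / TCGGA (first character of the second string dropped)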
49
Backtracking Solution
def approxMatch(text, pattern):
    """Find the pattern in the string text."""
    print "Looking for", pattern, "in text ", text
    tlen = len(text)
    plen = len(pattern)
    if (tlen == 0):
        return plen
    if (plen == 0):
        return tlen

    match = approxMatch(text[1:tlen], pattern[1:plen])
    if (text[0] != pattern[0]):
        match = match + 1

    delt = 1 + approxMatch(text[1:tlen], pattern)
    delp = 1 + approxMatch(text, pattern[1:plen])

    return min(match, min(delt, delp))

print approxMatch("ATGGA", "ATCGGA")
50
Backtracking Solution
51
Problems with backtracking
This finds right answer, but spends time recomputing values
Example "ATGGA" to "ATCGGA"
Worse at next level
If we can organize previous results, we can use Dynamic Programming to build a solution from the ground up
ATGGA / ATCGGA
    TGGA / TCGGA
        GGA / CGGA
        GGA / TCGGA
        TGGA / CGGA
    TGGA / ATCGGA
        GGA / TCGGA
        GGA / ATCGGA
        TGGA / TCGGA
    ATGGA / TCGGA
        TGGA / CGGA
        TGGA / TCGGA
        ATGGA / CGGA
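The amount of recomputation can be measured by counting calls. The sketch below (Python 3) repeats the backtracking recursion with a counter, then adds a cached variant using the standard library's `functools.lru_cache`; the caching step is my illustration of "organizing previous results", not the table-based method developed later in the lecture.

```python
from functools import lru_cache

calls = {"plain": 0, "memo": 0}

def edit_distance_plain(text, pattern):
    """The backtracking recursion from the slides, with a call counter added."""
    calls["plain"] += 1
    if not text:
        return len(pattern)
    if not pattern:
        return len(text)
    match = edit_distance_plain(text[1:], pattern[1:]) + (text[0] != pattern[0])
    delt = 1 + edit_distance_plain(text[1:], pattern)
    delp = 1 + edit_distance_plain(text, pattern[1:])
    return min(match, delt, delp)

@lru_cache(maxsize=None)
def edit_distance_memo(text, pattern):
    """Same recursion, but each (text, pattern) pair is solved only once."""
    calls["memo"] += 1
    if not text:
        return len(pattern)
    if not pattern:
        return len(text)
    match = edit_distance_memo(text[1:], pattern[1:]) + (text[0] != pattern[0])
    delt = 1 + edit_distance_memo(text[1:], pattern)
    delp = 1 + edit_distance_memo(text, pattern[1:])
    return min(match, delt, delp)

plain = edit_distance_plain("ATGGA", "ATCGGA")
memo = edit_distance_memo("ATGGA", "ATCGGA")
print(plain, memo, calls)
```

There are only (5 + 1) x (6 + 1) = 42 distinct pairs of suffixes for these two strings, so the cached version is bounded by that number of calls while the plain version revisits the same pairs many times.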
52
Dots
List the pattern and text as row and column headings.
Place a dot in each cell where row heading and column heading match.
We will use this idea in other ways…
53
Dots
def dots(text, pattern):
    tlen = len(text)
    plen = len(pattern)

    # Print the text.
    print " ",
    for col in xrange(tlen):
        print text[col],
    print ""

    for row in xrange(plen):
        print(pattern[row]),
        for col in xrange(tlen):
            if (text[col] == pattern[row]):
                print "*",
            else:
                print " ",
        print ""

dots("ATGGA", "ATCGGA")

  A T G G A
A *       *
T   *
C
G     * *
G     * *
A *       *
54
Dynamic Programming
2 5 1 6 7 3 2
3 2 6 8 2 9 3
1 7 6 8 5 3 8
8 6 8 3 4 2 1
2 6 3 8 2 3 4
6 7 5 6 8 4 2
6 3 4 6 8 3 6
Our first example of Dynamic Programming is a puzzle
Given an array, pick the path that goes from top to bottom that maximizes the values hit.
Path must descend with every step: cannot meander back up.
55
Puzzle path
Pick a path that goes from top to bottom, and maximizes the sum
Every step must descend.
Here is a sample path, with value
7 + 8 + 8 + 4 + 3 + 8 + 3
This isn't the best we can do. (Tweak the tail of the path to select 8 rather than 3)
What other changes do you see?
How can we find the best? Too many choices to try them all.
2 5 1 6 7 3 2
3 2 6 8 2 9 3
1 7 6 8 5 3 8
8 6 8 3 4 2 1
2 6 3 8 2 3 4
6 7 5 6 8 4 2
6 3 4 6 8 3 6
56
Dynamic Programming
It is easy to find the best path in a two level puzzle
For each new row
For each element of the row
Look at the three neighbors in the row above: pick the best of them
Store the running total for following round
For each square, we remember
where the path came from (lines)
2 5 1 6 7 3 2
3 2 6 8 2 9 3
1 7 6 8 5 3 8
Running totals after row 2:
8 7 12 15 9 16 6
57
Iteration
At each stage, we build on the previous results.
Note that some squares are never selected (the 1 and 2s in first row)
Note that some paths are started, and then dropped (3 to 3)
These will never be used again
Input to each new round: contents of current row, and the running totals from previous row. We don't care about prior path yet.
For solution: select the largest total in last row, and follow path back.
2 5 1 6 7 3 2
3 2 6 8 2 9 3
1 7 6 8 5 3 8
Running totals after row 2: 8 7 12 15 9 16 6
Running totals after row 3: 9 19 21 23 21 19 24
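The row-by-row procedure can be written out for the full puzzle grid. This is a sketch in Python 3 under the assumption that the "three neighbors" are the squares directly above and diagonally above; `best_path` is a hypothetical name.

```python
GRID = [
    [2, 5, 1, 6, 7, 3, 2],
    [3, 2, 6, 8, 2, 9, 3],
    [1, 7, 6, 8, 5, 3, 8],
    [8, 6, 8, 3, 4, 2, 1],
    [2, 6, 3, 8, 2, 3, 4],
    [6, 7, 5, 6, 8, 4, 2],
    [6, 3, 4, 6, 8, 3, 6],
]

def best_path(grid):
    """Return (best total, column chosen in each row) for a top-to-bottom path."""
    width = len(grid[0])
    totals = list(grid[0])
    came_from = []                      # one list of back-links per row after the first
    for row in grid[1:]:
        new_totals, links = [], []
        for c in range(width):
            # The (up to) three neighbors in the row above.
            nbrs = range(max(0, c - 1), min(width, c + 2))
            best = max(nbrs, key=lambda k: totals[k])
            new_totals.append(row[c] + totals[best])
            links.append(best)
        totals = new_totals
        came_from.append(links)
    # Follow the links back from the largest total in the last row.
    col = max(range(width), key=lambda k: totals[k])
    path = [col]
    for links in reversed(came_from):
        col = links[col]
        path.append(col)
    path.reverse()
    return max(totals), path

total, path = best_path(GRID)
print(total, path)
```

Only the previous row's running totals are needed to fill in the next row; the back-links are kept solely so that the winning path can be recovered at the end.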
58
Rock Pile Game
Two player game
Have two piles of rocks
Players take turns.
Must take a rock from a pile
Can take a rock from each pile
If you take the last rock, you win the game.
Is there a winning strategy for the game?
Assume we start with two piles of 8 rocks each
59
Rock Pile Game
Must take a rock from a pile
Can take a rock from each pile
Represent situation as ordered pair, (x, y)
If player has (1, 0) left, can win by taking rock
60
Rock Pile Game
Must take a rock from a pile
Can take a rock from each pile
If player has (1, 0) left, can win by taking rock
If player has (1, 1) left, wins by taking both.
61
Rock Pile Game
Must take a rock from a pile
Can take a rock from each pile
If player has (2, 0) left, has no winning move
Must take one, leaving (1, 0), which is a winning position for the opponent
62
Rock Pile Game
Must take a rock from a pile
Can take a rock from each pile
If player has (2, 1) left, can take one and leave (2, 0)
63
Rock Pile Game
Strategy: try to leave an even number of rocks in each pile
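The case-by-case reasoning on these slides is itself a small dynamic program over positions (x, y). A sketch in Python 3, assuming the rules exactly as stated (take one rock from one pile, or one rock from each); `winning_table` is a hypothetical name.

```python
def winning_table(max_x, max_y):
    """win[x][y] is True when the player to move from piles (x, y) can force a win."""
    win = [[False] * (max_y + 1) for _ in range(max_x + 1)]
    for x in range(max_x + 1):
        for y in range(max_y + 1):
            moves = []
            if x > 0:
                moves.append((x - 1, y))        # take one rock from the first pile
            if y > 0:
                moves.append((x, y - 1))        # take one rock from the second pile
            if x > 0 and y > 0:
                moves.append((x - 1, y - 1))    # take one rock from each pile
            # A position wins if some move leaves the opponent in a losing position.
            win[x][y] = any(not win[a][b] for a, b in moves)
    return win

win = winning_table(8, 8)
losing = [(x, y) for x in range(9) for y in range(9) if not win[x][y]]
print(losing)
```

Every entry depends only on entries already filled in, which is the same build-from-the-ground-up idea used in the path puzzle.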
64
Approximate pattern match
Example: find pattern CTAG in text CCTG
Here are two choices
Can change T in pattern to C and A in pattern to T
CTAG
CCTG
Or we could add a C to pattern and an A to the text
C_TAG
CCT_G
65
DP Approximate Pattern Match
Keep a table that stores the best match to substrings
Use stored values to compute next value
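Each cell shows the text alignment / pattern alignment, followed by its score. Columns are letters of the text CCTG; rows are letters of the pattern CTAG. Cells not yet filled in are left blank.

             (start)       C             C             T             G
(start)      0             0             0             0             0
C            -/C 1         C/C 0         C/C 0         T/C 1         G/C 1
T            --/CT 2       C-/CT 1       CC/CT 1       CT/CT 0       CTG/CT- 1
A            ---/CTA 3     C--/CTA 2     C-C/CTA 2     CT-/CTA 1
G            ----/CTAG 4   C---/CTAG 3   C--C/CTAG 3

The cell in row A, column T (CT-/CTA, score 1) represents the best we can do matching pattern CTA in text CCT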
66
How we build the table
Consider filling in the blank spot in pink
We have three choices
Build on pair above, deleting char T in pattern
T- / CT    Cost: 1 + 1 = 2
Build on pair on left, inserting char T from text
CCT / CT-    Cost: 1 + 1 = 2
Match or replace, using pair from upper left
CT / CT    Cost: 0 + 0 (since the T’s in text and pattern match)
We only display the winner
[Figure: the four cells involved, shown as text alignment / pattern alignment and score]
        C            T
C       C/C 0        T/C 1
T       CC/CT 1      CT/CT 0
67
Key Idea
To compute the best match ending at location [i,j] we compute the three values below, pick minimal value, and store it in d[i][j]
insertCost = d[i-1][j] + 1;
deleteCost = d[i][j-1] + 1;
if (pattern[i] == text[j])
    matchCost = d[i-1][j-1];
else
    reviseCost = d[i-1][j-1] + 1;
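The recurrence can be turned into a complete table-filling routine. This is a sketch in Python 3, assuming unit costs and a top row of zeros so that a match may begin anywhere in the text; note that `d` here has one extra row and column, so its indices are shifted by one relative to `pattern` and `text`. The name `match_table` is mine.

```python
def match_table(text, pattern):
    """d[i][j]: fewest edits to match pattern[:i] against some stretch of text ending at j."""
    rows, cols = len(pattern) + 1, len(text) + 1
    d = [[0] * cols for _ in range(rows)]       # top row stays 0: a match may start anywhere
    for i in range(1, rows):
        d[i][0] = i                             # pattern[:i] against an empty text
        for j in range(1, cols):
            diagonal = d[i - 1][j - 1] + (pattern[i - 1] != text[j - 1])
            from_above = d[i - 1][j] + 1        # pattern letter has no partner in the text
            from_left = d[i][j - 1] + 1         # text letter has no partner in the pattern
            d[i][j] = min(diagonal, from_above, from_left)
    return d

table = match_table("GATCGCCTGACGG", "CTAG")
for row in table:
    print(row)
print(min(table[-1]))
```

The smallest entry in the bottom row is the score of the best approximate match, and its column marks where that match ends in the text.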
[Figure: fragment of the score table around the cell being filled in]
68
Compare to Backtracking
if (pattern[i] == text[j])
    matchCost = d[i-1][j-1];
else
    reviseCost = d[i-1][j-1] + 1;
insertCost = d[i-1][j] + 1;
deleteCost = d[i][j-1] + 1;

match = approxMatch(text[1:tlen], pattern[1:plen])
if (text[0] != pattern[0]):
    match = match + 1
delt = 1 + approxMatch(text[1:tlen], pattern)
delp = 1 + approxMatch(text, pattern[1:plen])
69
What do we need to store?Possible to compute one row at a time - only need to store prior row
Better to run down columns: they are shorter since the pattern is shorter than the text
To find best approximate match for pattern of length N in text of length M takes O(NM)
Same as worst case for simple match
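Keeping only one column at a time might look like the following. This is a sketch in Python 3 with unit costs; `best_match_score` is a hypothetical name, and it returns only the best score, not the alignment, since the back-links are discarded along with the old columns.

```python
def best_match_score(text, pattern):
    """Best approximate-match score, keeping only one column of the table in memory."""
    column = list(range(len(pattern) + 1))      # column for the empty text prefix
    best = column[-1]
    for t in text:
        new = [0]                               # top row is always 0
        for i, p in enumerate(pattern, start=1):
            new.append(min(column[i - 1] + (p != t),   # match or replace
                           new[i - 1] + 1,             # step down the new column
                           column[i] + 1))             # step across from the previous column
        column = new
        best = min(best, column[-1])
    return best

print(best_match_score("GATCGCCTGACGG", "CTAG"))
```

The running time is still O(NM), but the memory needed drops from O(NM) to O(N), the length of the pattern.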
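Each cell shows the text alignment / pattern alignment, followed by its score. Columns are letters of the text CCTG; rows are letters of the pattern CTAG.

             (start)       C             C             T             G
(start)      0             0             0             0             0
C            -/C 1         C/C 0         C/C 0         T/C 1         G/C 1
T            --/CT 2       C-/CT 1       CC/CT 1       CT/CT 0       CTG/CT- 1
A            ---/CTA 3     C--/CTA 2     C-C/CTA 2     CT-/CTA 1     CTG/CTA 1
G            ----/CTAG 4   C---/CTAG 3   C--C/CTAG 3   CT--/CTAG 2   CT-G/CTAG 1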
70
Larger Sample
We don’t need the strings: implicit from the shape of the path
Only need to store the scores
      G  A  T  C  G  C  C  T  G  A  C  G  G
   0  0  0  0  0  0  0  0  0  0  0  0  0  0
C  1  1  1  1  0  1  0  0  1  1  1  0  1  1
T  2  2  2  1  1  1  1  1  0  1  2  1  1  2
A  3  3  2  2  2  2  2  2  1  1  1  2  2  2
G  4  3  3  3  3  2  3  3  2  1  2  2  2  2
Sometimes we have multiple choices that yield the same score
ATCG
CTAG
and
C--G
CTAG
71
Odds
What are the odds of a match between two strings of length k, if we can tolerate one replacement error?
AGCT vs AGTT
Clearly the odds are better than $(1/P)^k$, where P is the size of the alphabet
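Under the a priori model (each letter independent and uniform over an alphabet of size P), the odds can be worked out and checked by brute force. This is a sketch in Python 3; both function names are hypothetical, and the closed form adds the chance of an exact match to the chance of exactly one mismatch in any of the k positions.

```python
from itertools import product

def prob_at_most_one_mismatch(P, k):
    """A priori chance that two random length-k strings differ in at most one place."""
    exact = (1 / P) ** k
    one_off = k * (1 - 1 / P) * (1 / P) ** (k - 1)
    return exact + one_off

def brute_force(alphabet, k):
    """Check the formula by enumerating every pair of strings (only feasible for tiny k)."""
    strings = list(product(alphabet, repeat=k))
    hits = sum(1 for a in strings for b in strings
               if sum(x != y for x, y in zip(a, b)) <= 1)
    return hits / len(strings) ** 2

formula = prob_at_most_one_mismatch(4, 3)
checked = brute_force("ACGT", 3)
print(formula, checked)
```

Even for short strings, tolerating a single replacement makes a chance match many times more likely than an exact one.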
72
Other Pricing Schemes
We may decide that alternative pricing models are better
One common assumption is that the first deletion is rare (expensive) but it is much cheaper to continue to delete
ATC AT- --T GGT GTT
Our basic algorithm can deal with this change
Modify the cost of a delete when we are in a cut
Use an Affine Gap Function
73
Problem for next week
The human Genome includes many repetitions
Some of this reflects history
Some reflects motifs
The book uses finding motifs as an important example
Our problem: take a string, and look for longest repetition
Come up with as many ideas as you can, and implement some
You may assume that the string is DNA
74
Problem for next week
Write a program that takes a DNA string and counts the frequency of each 2 letter sequence
Online references for Python
The Python Tutorial at python.org (poke around)
Dive Into Python
% python freq.py
Enter the text: "ACGGTCG"
Saw ACGGTCG
{'GG': 1, 'AC': 1, 'GT': 1, 'CG': 2, 'TC': 1}
  A G C T
A 0 0 1 0
G 0 1 0 1
C 0 2 0 0
T 0 0 1 0
75
References
Boyer R.S., Moore J.S., “A fast string searching algorithm”, Communications of the ACM, 20(10):762-772, 1977
Knuth D.E., Morris J.H., Pratt V.R., “Fast Pattern Matching in Strings”, SIAM Journal on Computing, 6(2):323-350, 1977
Wagner R.A., Fischer M.J., “The String-to-String Correction Problem”, Journal of the ACM, 21(1):168-178, 1974 (source of the approximate match algorithm)
Baase S., Van Gelder A., Computer Algorithms, Addison-Wesley (a good general reference)
76
Summary
There is a world of interesting problems in Biology
There is great interest in finding solutions
Computer Science can help
Crucial to keep in touch with Biologists about solutions
Not all simplifications are equally valid
Not all matches are meaningful
Many Biologists use the new tools in their research
There is a need for those who understand the algorithms the tools use