M.M. Dalkilic, PhD Monday, September 08, 2008 Class II Indiana University, Bloomington, IN Sequence...



M.M. Dalkilic, PhD
Monday, September 08, 2008, Class II
Indiana University, Bloomington, IN
Sequence Homology

Sequence Similarity (Computation), M.M. Dalkilic, PhD, SoI, Indiana University, Bloomington, IN, 2008

Outline
o New Due Dates for Programs
o New Reading Posted on Website: T-Coffee
o Readings: [Mount] Chap 3, [R] Chaps 3-4
o Most Important Aspect of Bioinformatics: homology search through sequence similarity (contd)

Computation (review)
Algorithm: “process or rules for (esp. machine) calculations. The execution of an algorithm must not include any subjective decisions, nor must it require the use of intuition or creativity” [Brassard & Bratley]
[Figure labels: constant; upper bound starts; upper bound]

Computation (Next Lecture)
Divide and Conquer gives rise to Dynamic Programming, the approach used in sequence comparison.

General Technique of Divide and Conquer
o General approach: to work on more, smaller pieces
o Key point: data is not shared among processes
o The cost of breaking down, solving, then reassembling the solution is less than working on the solution itself
[Figure label: constant work]

General Technique of Dynamic Programming
But what if data needs to be shared or the cost of redundancy is too high?
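To make the cost of redundancy concrete, here is a minimal sketch in Python. The Fibonacci recurrence is an assumption chosen purely for illustration (the slides do not name an example), and `fib_naive` and `calls` are hypothetical names. The function splits each problem into two smaller ones without sharing any results between them, and a counter records how often each argument is re-solved.

```python
def fib_naive(n, counter):
    """Plain recursion with no sharing: each call solves its subproblems afresh.

    `counter` is a dict that tallies how many times each argument is evaluated.
    """
    counter[n] = counter.get(n, 0) + 1
    if n < 2:
        return n
    return fib_naive(n - 1, counter) + fib_naive(n - 2, counter)


calls = {}
value = fib_naive(20, calls)
total_calls = sum(calls.values())

print("fib(20) =", value)
print("distinct arguments:", len(calls))
print("total recursive calls:", total_calls)
```

Only 21 distinct arguments (0 through 20) ever occur, yet the total number of calls is in the tens of thousands, because the two branches keep recomputing the same subproblems. That repeated work is the redundancy the question above refers to.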
Rethink computation: Dynamic Programming or Recursive Optimization. Reduce the cost of sharing, thereby reducing the cost of recursion.

Dynamic programming reduces the running time of a recursive function to be at most the time required to evaluate the function for all arguments less than or equal to the given argument, treating the cost of a recursive call as a constant [Sedgewick]
o Top-down DP
o Bottom-up DP

o Top-down DP: create a dictionary of new input-output values as they are encountered. Each time the recursion is called, we look up the entry; if it's blank, we add it; otherwise, we continue.
[Figure: Top-down DP, new input-output pairs encountered]
o Bottom-up DP: simply pre-compute all input-output pairs sequentially.

o Top-down DP is generally easier
o Memory isn't so much of an issue
o We might not need every entry in the dictionary

o DP has state variables that keep information about the current state
o DP has decision variables that are used for making choices
o DP has a return function that is optimized
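The two variants described above can be sketched side by side. This is a minimal illustration, not the lecture's own code: the Fibonacci recurrence is an assumed example, and `fib_top_down`, `fib_bottom_up`, `memo`, and `table` are hypothetical names.

```python
def fib_top_down(n, memo=None):
    """Top-down DP: keep a dictionary of input-output pairs as they are encountered."""
    if memo is None:
        memo = {}
    if n in memo:              # entry already filled in: reuse it
        return memo[n]
    if n < 2:
        result = n
    else:
        result = fib_top_down(n - 1, memo) + fib_top_down(n - 2, memo)
    memo[n] = result           # entry was blank: add it
    return result


def fib_bottom_up(n):
    """Bottom-up DP: pre-compute every input-output pair in increasing order."""
    table = [0, 1]
    for i in range(2, n + 1):
        table.append(table[i - 1] + table[i - 2])
    return table[n]


memo = {}
print(fib_top_down(30, memo), "with", len(memo), "dictionary entries")
print(fib_bottom_up(30))
```

Both versions evaluate each argument once. The top-down version only fills the entries that the recursion actually reaches, while the bottom-up version fills every entry up to `n` whether or not it is needed.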
Edit Substitution to Sequence Alignment
Given that "eine", "one", and "bir" all mean "1" in different languages, based on edit distance (sequence similarity), which two words are more related?

All that remains is to prove that edit distance is essentially sequence alignment.

A sequence alignment is a grid of cells that contain either a single symbol, "-", or blank. A sequence alignment looks much like a spreadsheet.

A scientist can then use sequence alignment and be assured that it is nothing more than window dressing on edit distance, which itself is a kind of distance between sequences.

Next class: the algorithm for sequence alignments.
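As a rough preview of how edit distance and alignment relate, the sketch below computes the edit distance between the three words with a bottom-up DP table and then traces back through the table to write out a two-row alignment. It assumes unit costs (1 per substitution, insertion, or deletion), which the slides do not specify; `edit_distance_table` and `align` are hypothetical names, and this is not necessarily the algorithm presented in the next class.

```python
def edit_distance_table(a, b):
    """Bottom-up DP: table[i][j] is the edit distance between a[:i] and b[:j]."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i                      # delete the first i symbols of a
    for j in range(cols):
        table[0][j] = j                      # insert the first j symbols of b
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = table[i - 1][j - 1] + (a[i - 1] != b[j - 1])
            deletion = table[i - 1][j] + 1
            insertion = table[i][j - 1] + 1
            table[i][j] = min(substitution, deletion, insertion)
    return table


def align(a, b):
    """Trace back through the table to recover one optimal alignment."""
    table = edit_distance_table(a, b)
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and table[i][j] == table[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            top.append(a[i - 1])             # match or substitution column
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and table[i][j] == table[i - 1][j] + 1:
            top.append(a[i - 1])             # symbol of a aligned to a gap
            bottom.append("-")
            i -= 1
        else:
            top.append("-")                  # gap aligned to a symbol of b
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), table[len(a)][len(b)]


words = ["eine", "one", "bir"]
for index, x in enumerate(words):
    for y in words[index + 1:]:
        top, bottom, distance = align(x, y)
        print(f"{x} vs {y}: distance {distance}")
        print("  " + top)
        print("  " + bottom)
```

Under these unit costs, "eine" and "one" are the closest pair (distance 2, against 3 for the other two pairs). In each printed alignment, the number of columns whose two cells differ equals the edit distance, which is the correspondence between the two notions that the slides point to.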