![Page 1: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/1.jpg)
CS 312: Algorithm Design & Analysis
Lecture #24: Optimality,
Gene Sequence Alignment
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
Slides by: Eric Ringger, with contributions from Mike Jones, Eric Mercer, Sean Warnick
![Page 2: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/2.jpg)
Announcements
Homework #15 due now
Project #5: Gene Sequence Alignment
  Kick-off: today
  Read directions now
  Whiteboard experience: due Monday
  Early: Monday after mid-term exam
  Due: Wednesday after mid-term exam
Mid-term Exam
  Start preparing your one page of notes
  Must be prepared by you. No cutting and pasting.
![Page 3: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/3.jpg)
Objectives
Revisit the main ideas behind Dynamic Programming
Define the optimality property for DP
Develop the algorithm for gene sequence alignment (or at least begin)
Prepare for Project #5
![Page 4: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/4.jpg)
Dynamic Programming
The six steps:
1. Ask: am I solving an optimization problem?
2. Devise a minimal description (address) for any problem instance and sub-problem.
3. Divide problems into sub-problems: define the recurrence to specify the relationship of problems to sub-problems.
4. Check that the optimality property holds: an optimal solution to a problem is built from optimal solutions to sub-problems.
5. Store results – typically in a table – and re-use the solutions to sub-problems in the table as you build up to the overall solution.
6. Back-trace / analyze the table to extract the composition of the final solution.
![Page 5: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/5.jpg)
Optimality Property
An optimal solution to a problem is built from optimal solutions to sub-problems.
The optimality property is a necessary condition for solving an optimization problem by DP!
It allows us to store and re-use optimal solutions to sub-problems.
![Page 6: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/6.jpg)
Optimality
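[Figure: tree of problems and sub-problems with root A and descendant nodes B through K]

$$
\text{optimalsolution}(\text{parent}) = \min\ (\text{or}\ \max)
\begin{cases}
f_1(\text{optimalsolution}(\text{child}_1)) \\
f_2(\text{optimalsolution}(\text{child}_2)) \\
\quad\vdots \\
f_n(\text{optimalsolution}(\text{child}_n))
\end{cases}
$$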
![Page 7: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/7.jpg)
Shortest Path
[Figure: road graph connecting American Fork, Orem, Provo, Sundance, and Geneva, with edge weights 20, 10, 12, 3, 15, 18, 10, 12]
Goal: the shortest path from AF to Provo.
Does this problem exhibit the optimality property? Pair up. Discuss
![Page 8: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/8.jpg)
Questions
Q. In general, do you know which sub-problem solutions to use in advance?
A. No. So a very greedy algorithm is not an option. (But Dijkstra’s is.)
Q. How does having a table of intermediate shortest path results help find the shortest path from AF to Provo?
A. Reuse those results for intermediate destinations as you try different routes.
Q. Do you have to reconsider alternative sub-optimal solutions for the intermediate destinations?
A. No.
Thus, the Optimality Property holds.
Therefore, the shortest path problem can be solved by DP.
[Figure: road graph connecting American Fork, Orem, Provo, Sundance, and Geneva, with edge weights 20, 10, 12, 3, 15, 18, 10, 12]
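One way to see how a table of intermediate results helps is the minimal Python sketch below. The endpoints of the figure's edges did not survive in this transcript, so the `ROADS` map is a hypothetical assignment of the listed weights to roads, and `shortest` is an illustrative helper name, not part of the lecture. The sketch also assumes the roads are one-way and acyclic, so plain memoized recursion terminates.

```python
from functools import lru_cache

# Hypothetical one-way road map: the city names and weights come from the
# slide, but which weight belongs to which road is assumed for illustration.
ROADS = {
    "American Fork": {"Geneva": 10, "Orem": 20, "Sundance": 18},
    "Geneva": {"Orem": 3, "Provo": 15},
    "Sundance": {"Orem": 12, "Provo": 12},
    "Orem": {"Provo": 10},
    "Provo": {},
}


@lru_cache(maxsize=None)  # the cache plays the role of the DP table
def shortest(city, goal="Provo"):
    """Return (cost, path) of a cheapest route from city to goal."""
    if city == goal:
        return 0, (city,)
    best = (float("inf"), ())
    for next_city, weight in ROADS[city].items():
        # Reuse the optimal result for the intermediate destination.
        cost, path = shortest(next_city, goal)
        if weight + cost < best[0]:
            best = (weight + cost, (city,) + path)
    return best


cost, path = shortest("American Fork")
print(cost, " -> ".join(path))
```

Under these assumed roads, the result for Orem is computed once and then reused by every route that passes through Orem; no sub-optimal Orem-to-Provo route ever needs to be reconsidered.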
![Page 9: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/9.jpg)
Optimality in Driving
The shortest route from American Fork to Provo passes through Orem.
Assume we have found this route.
Then what can we say about the shortest route from AF to Orem?
It follows the optimal route from AF to Provo: it is the AF-to-Orem portion of that route.
Could it be otherwise?
![Page 10: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/10.jpg)
A related problem
Now suppose you drive from AF to Orem as fast as you can on your way to Provo,
but you are limited by the gas in your tank.
![Page 11: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/11.jpg)
Does the Optimality Property Hold?
Goal: get to Provo in as little time as possible. No refueling.
Does this problem (formulation) satisfy the optimality property or not? Why?
[Figure: AF → Orem → Provo, with three route options on each leg, labeled minutes/gallons: 5/9, 10/5, and 20/1. For example, 20/1 “takes 20 minutes using 1 gallon of gas”. Start with 10 gallons.]
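The question can be probed by brute force. The Python sketch below assumes that each of the two legs (AF to Orem, Orem to Provo) offers the same three minutes/gallons options shown in the figure, and that a sub-problem is identified by location alone. The names `LEG_OPTIONS`, `TANK`, and `fastest_trip` are made up for illustration.

```python
from itertools import product

# (minutes, gallons) choices available on each leg, as read from the figure.
LEG_OPTIONS = [(5, 9), (10, 5), (20, 1)]
TANK = 10  # gallons at the start; no refueling


def fastest_trip(num_legs=2, tank=TANK):
    """Try every combination of leg options and keep the fastest feasible one."""
    feasible = [
        plan
        for plan in product(LEG_OPTIONS, repeat=num_legs)
        if sum(gallons for _, gallons in plan) <= tank
    ]
    return min(feasible, key=lambda plan: sum(minutes for minutes, _ in plan))


best_plan = fastest_trip()
best_minutes = sum(minutes for minutes, _ in best_plan)

# Fastest way to reach Orem when that leg is considered on its own.
fastest_first_leg = min(LEG_OPTIONS, key=lambda option: option[0])

print(best_plan, best_minutes, fastest_first_leg)
```

Under these assumptions the fastest feasible trip uses the 10/5 option on both legs (20 minutes, 10 gallons), while the fastest way to reach Orem on its own is the 5/9 option, which leaves only enough gas for the 20-minute second leg. The best overall plan is therefore not built from the best AF-to-Orem plan, which suggests that this formulation, addressed by location only, lacks the optimality property.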
![Page 12: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/12.jpg)
Problem Solving Advice
Start by asking: which sub-problems should be solved?
If you know how to choose in advance using local information only, then greedy might work.
Else if sub-problems don’t overlap, then divide and conquer would be a good choice.
Else if the optimality property holds, then DP is a good choice.
Else the optimality property does NOT hold, so apply another strategy.
(Stay tuned for more guidance)
Important!
![Page 13: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/13.jpg)
x=ACGCTGA y=ACTGT
Gene Sequence Alignment
![Page 14: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/14.jpg)
Virtually Identical Problems
Edit Distance, aka Levenshtein Distance
Sequence Alignment, e.g., Gene Sequence Alignment
Fundamentally the same thing!
We’re focusing on gene sequence alignment.
![Page 15: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/15.jpg)
Edit Distance / Sequence Alignment Problem
Given: 2 strings, x and y
Return: Smallest cost to transform string x into string y (or vice versa)
Another perspective: smallest cost of aligning x to y (or vice versa)
Contrast the 2 perspectives.
![Page 16: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/16.jpg)
Edit Distance / Sequence Alignment Problem
Alignment Example:
x: ACGCT-C
y: A--CTGT
The ‘-’ is a “gap”
![Page 17: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/17.jpg)
Edit Distance / Sequence Alignment Problem
x: ACGCT-C
y: A--CTGT
Divide into Pairs
![Page 18: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/18.jpg)
Edit Distance / Sequence Alignment Problem
Cost:
  Type: Match; Cost = c_match
x: ACGCT-C
y: A--CTGT
Each pair has a type and a cost
![Page 19: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/19.jpg)
x: ACGCT-C
y: A--CTGT
Edit Distance / Sequence Alignment Problem
Cost:
  Match: c_match
  Type: Insertion into x (= deletion from y), aka “indel”; Cost = c_indel
![Page 20: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/20.jpg)
x: ACGCT-C
y: A--CTGT
Edit Distance / Sequence Alignment Problem
Cost:
  Match: c_match
  Insertion into x (= deletion from y): c_indel
  Insertion into y (= deletion from x): c_indel
![Page 21: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/21.jpg)
Edit Distance / Sequence Alignment Problem
Cost:
  Match: c_match
  Insertion into x (= deletion from y): c_indel
  Insertion into y (= deletion from x): c_indel
  Type: Substitution of a character of x for a character of y (or vice versa); Cost = c_sub
x: ACGCT-C
y: A--CTGT
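The pair types and costs above can be made concrete with a small Python sketch that scores a given alignment. The lecture does not state numeric costs, so the defaults `c_match=0`, `c_indel=1`, and `c_sub=1` are placeholder assumptions, and `score_alignment` is a hypothetical helper name.

```python
def score_alignment(aligned_x, aligned_y, c_match=0, c_indel=1, c_sub=1):
    """Return (total_cost, pair_types) for two equal-length gapped strings.

    The default costs are placeholders, not values given in the lecture.
    """
    if len(aligned_x) != len(aligned_y):
        raise ValueError("aligned strings must have the same length")
    costs = {"match": c_match, "indel": c_indel, "sub": c_sub}
    pair_types = []
    for a, b in zip(aligned_x, aligned_y):
        if a == "-" and b == "-":
            raise ValueError("a gap cannot be paired with a gap")
        if a == "-" or b == "-":
            pair_types.append("indel")  # gap in one string
        elif a == b:
            pair_types.append("match")
        else:
            pair_types.append("sub")    # two different characters
    total = sum(costs[kind] for kind in pair_types)
    return total, pair_types


total, pair_types = score_alignment("ACGCT-C", "A--CTGT")
print(total, pair_types)
```

For the slide's example alignment this classifies the seven pairs as three matches, three indels, and one substitution, so the total depends only on the three cost parameters.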
![Page 22: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/22.jpg)
![Page 23: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/23.jpg)
Edit Distance / Sequence Alignment Problem
How would you solve this problem?
![Page 24: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/24.jpg)
Solution Ideas
Enumerate all and score
  Pro: Easy to code
  Pro: Optimal
  Con: Exponential
Greedy: work from left to right, gobbling up matches and inserting gaps or allowing substitutions as necessary
  Pro: Easy
  Pro: Linear = fast / efficient
  Con: Not optimal
DP
  Pre-req: optimality property
  Pre-req: define addressable sub-problems
  Pre-req: determine relationship between problem and sub-problems
  Pro: Optimal
  Con: ?
Divide and Conquer?
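To get a feel for the "exponential" con of enumeration, the Python sketch below counts how many distinct alignments exist for strings of lengths n and m. It treats an alignment as a sequence of columns, each of which is either a pair of characters or a character opposite a gap. `count_alignments` is a hypothetical helper, not something from the lecture.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(n, m):
    """Count alignments of a length-n string with a length-m string."""
    if n == 0 or m == 0:
        # Only gaps remain possible: exactly one way to finish.
        return 1
    return (
        count_alignments(n - 1, m - 1)  # last column pairs two characters
        + count_alignments(n - 1, m)    # last column: character of x over a gap
        + count_alignments(n, m - 1)    # last column: gap over a character of y
    )


for size in (1, 2, 3, 5, 10):
    print(size, count_alignments(size, size))
```

Even for short sequences the count grows quickly, which is why scoring every alignment does not scale to gene-length inputs.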
![Page 25: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/25.jpg)
Designing the DP Algorithm for Gene Sequence Alignment
![Page 26: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/26.jpg)
DP?
Define each sub-problem to be the best score for aligning the first i bases of sequence x with the first j bases of sequence y.
Does that suffice as a minimal description?
In those terms, what is our objective function? Minimize the cost of aligning all of x with all of y.
Can we divide this problem into sub-problems? How many?
Hint: how many sub-problems are one step away from sub-problem (i, j)?
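The lecture defers the actual algorithm to Lecture #25, so the following Python code is only a sketch of where this design might lead, written under stated assumptions: sub-problem (i, j) is the best cost of aligning the first i bases of x with the first j bases of y; three sub-problems sit one step away, namely (i-1, j-1), (i-1, j), and (i, j-1); and the default costs are placeholder values rather than the project's. The name `alignment_cost` is hypothetical.

```python
def alignment_cost(x, y, c_match=0, c_indel=1, c_sub=1):
    """Fill a table where table[i][j] is the best cost of aligning x[:i] with y[:j].

    The cost defaults are assumptions chosen for illustration only.
    """
    n, m = len(x), len(y)
    table = [[0] * (m + 1) for _ in range(n + 1)]

    # Aligning a prefix with the empty string takes one indel per base.
    for i in range(1, n + 1):
        table[i][0] = i * c_indel
    for j in range(1, m + 1):
        table[0][j] = j * c_indel

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair_cost = c_match if x[i - 1] == y[j - 1] else c_sub
            table[i][j] = min(
                table[i - 1][j - 1] + pair_cost,  # match or substitution
                table[i - 1][j] + c_indel,        # base of x opposite a gap
                table[i][j - 1] + c_indel,        # gap opposite a base of y
            )
    return table[n][m]


print(alignment_cost("ACGCTGA", "ACTGT"))
```

With these unit costs the result is the Levenshtein distance between the two strings. Extracting the alignment itself would additionally require the back-trace described in step 6 of the six steps.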
![Page 27: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/27.jpg)
Example: Sub-problems
x=ACGCTGA y=ACTGT
![Page 28: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/28.jpg)
![Page 29: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/29.jpg)
To be continued in Lecture #25
![Page 30: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike.](https://reader036.fdocuments.us/reader036/viewer/2022062517/56649f095503460f94c1dabb/html5/thumbnails/30.jpg)
Assignment
HW #16
Read Section 6.3, if you haven’t done so already.
Thursday: Screencast & Quiz