Longest common subsequence lcs

45
LONGEST COMMON SUBSEQUENCE (LCS) Group Name:

Transcript of Longest common subsequence lcs

Page 1: Longest common subsequence  lcs

LONGEST COMMON SUBSEQUENCE (LCS)

Group Name:

Page 2: Longest common subsequence  lcs

LONGEST COMMON SUBSEQUENCE

What is Longest common subsequence ?

The longest common subsequence (LCS) problem is the problem of finding the longest subsequence common to all sequences in a set of sequences (often just two sequences).

Page 3: Longest common subsequence  lcs

LONGEST COMMON SUBSEQUENCE

Suppose you have a sequenceX = < A,B,C,D,E,F,G>

of elements over a finite set S.

A sequence Y = <B,C,E,G > over S is called a subsequence of X if and only if it can be obtained

from X by deleting elements.

WHAT IS SUBSEQUENCES ?

Page 4: Longest common subsequence  lcs

LONGEST COMMON SUBSEQUENCE WHAT IS COMMON SUBSEQUENCES ?

Suppose that X and Y are two sequences over a set S. If , A=<A,B,C,E,D,G,F,H,K>

B=<A,B,D,F,H,K>then a common subsequence of X and Y could be

Z=<A,F,K>

We say that Z is a common subsequence of X and Y if and only if Z is a subsequence of X Z is a subsequence of Y

Page 5: Longest common subsequence  lcs

LONGEST COMMON SUBSEQUENCE

THE LONGEST COMMON SUBSEQUENCE PROBLEM

Given two sequences X and Y over a set S, the longest common subsequence problem asks to find a common subsequence of X

and Y that is of maximal length.

Page 6: Longest common subsequence  lcs

LONGEST COMMON SUBSEQUENCE

NAÏVE SOLUTION

Let X be a sequence of length m,and Y a sequence of length n.

Check for every subsequence of X whether it is a subsequence of Y, and return the longest common subsequence found.

There are 2m subsequences of X. Testing a sequences whether or not it is a subsequence of Y takes O(n) time. Thus, the naïve algorithm would take O(n2m) time.

Page 7: Longest common subsequence  lcs

FACTS OF LCS

INPUT: two strings

OUTPUT: longest common subsequence

ACTGAACTCTGTGCACT

TGACTCAGCACAAAAAC

Page 8: Longest common subsequence  lcs

FACTS OF LCS

INPUT: two strings

OUTPUT: longest common subsequence

ACTGAACTCTGTGCACT

TGACTCAGCACAAAAAC

Page 9: Longest common subsequence  lcs

FACTS OF LCSBrute Force

X= ABCBDABY= BDCABA

Elements of X is m=7Elements of Y is n=6So, the complexity will calculate by O (n)

Page 10: Longest common subsequence  lcs

FACTS OF LCSBrute Force

Strength

Wide applicability, simplicity Reasonable algorithms for some important

problems such as searching, string matching, and matrix multiplication

Standard algorithms for simple computational tasks such as sum and product of n numbers, and finding maximum or minimum in a list

Page 11: Longest common subsequence  lcs

FACTS OF LCSBrute ForceWeakness

Brute Force approach rarely yields efficient algorithms

Some brute force algorithms are unacceptably slow

Brute Force approach is neither as constructive nor creative as some other design techniques

Page 12: Longest common subsequence  lcs

Facts OF LCS Dynamic programming

a

b

b

a

=

A = a x b matrix

How many operations to compute AB ?

Page 13: Longest common subsequence  lcs

Facts OF LCS Dynamic programming

a

b

b

c

=

Page 14: Longest common subsequence  lcs

Facts OF LCS Dynamic programming

a

b

b

a

=

Need to compute = O (ab)

Page 15: Longest common subsequence  lcs

Work Examples

Page 16: Longest common subsequence  lcs

To Compare DNA of two (or more ) Different organisms

Page 17: Longest common subsequence  lcs

EXAMPLEAssume two DNA sequence

X = {ATGCTTC}Y = {GCTCA}

Page 18: Longest common subsequence  lcs

LCS EXAMPLE X = {ATGCTTC}Y = {GCTCA}

A T G C T T C

GCTCA

1 2 3 45 6 7

1

2

3

4

5

Yj

Xi0

0

Page 19: Longest common subsequence  lcs

LCS EXAMPLE

A T G C T T C0 0 0 0 0 0 0 0

G 0C 0T 0C 0A 0

X = {ATGCTTC}Y = {GCTCA} 1 2 3 4

5 6 7

1

2

3

4

5

Yj

Xi0

0

Z[j,i]

Here I = 1, j = 1

Z[1,1]

Page 20: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0C 0T 0C 0A 0

X = {ATGCTTC}Y = {GCTCA}

Yj

XiX Y

A G

Not Match

1 2 3 45 6 7

0

0

Z[1,1]

Z[j-1, i]=Z[1-1, 1]= Z[0,1]

Z[j, i-1]=Z[1, 1-1]= Z[1,0]

Maximum of two boxz[J-1, i] and [J, i-1]

1

2

3

4

5

Page 21: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0 0C 0T 0C 0A 0

X = {ATGCTTC}Y = {GCTCA}

Yj

XiX Y

A G

Not Match

Lets Take from Upper one

Arrow indicate from where you Take the maximum.

1 2 3 45 6 7

1

2

3

4

5

0

0

Page 22: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0 0 0C 0T 0C 0A 0

X = {ATGCTTC}Y = {GCTCA}

Yj

XiX Y Max

T G 0

Not Match

Lets Take from left one

Arrow indicate from where you Take the maximum.

arrow

1 2 3 45 6 7

1

2

3

4

5

0

0

Page 23: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0 0 0C 0T 0C 0A 0

X = {ATGCTTC}Y = {GCTCA}

Yj

XiX Y Max

G G

Match

arrow

When match arrow will be diagonal because we will increment the value of this cell Z[i-1, j-1] + 10 = 1

1 2 3 45 6 7

1

2

3

4

5

0

0

Page 24: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0 0 0 1C 0T 0C 0A 0

X = {ATGCTTC}Y = {GCTCA}

Yj

XiX Y Max

G G

Match

arrow

Incremented value X[i-1] Y[j-1]

1 2 3 45 6 7

1

2

3

4

5

0

0

Z[I,j] = Z[3,1]

Page 25: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0 0 0 1 1C 0T 0C 0A 0

X = {ATGCTTC}Y = {GCTCA}

Yj

XiX Y Max

C G 1

Not Match

Lets Take from left one

arrow

1 2 3 45 6 7

1

2

3

4

5

0

0

Page 26: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0 0 0 1 1 1C 0T 0C 0A 0

X = {ATGCTTC}Y = {GCTCA}

Yj

XiX Y Max

T G 1

Not Match

Lets Take from left one

arrow

0

0

1 2 3 45 6 7

1

2

3

4

5

Page 27: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0 0 0 1 1 1 1C 0T 0C 0A 0

X = {ATGCTTC}Y = {GCTCA}

Yj

XiX Y Max

T G 1

Not Match

Lets Take from left one

arrow

0

0

1 2 3 45 6 7

1

2

3

4

5

Page 28: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0 0 0 1 1 1 1 1C 0T 0C 0A 0

X = {ATGCTTC}Y = {GCTCA}

Yj

XiX Y Max

C G 1

Not Match

Lets Take from left one

arrow

1 2 3 45 6 7

1

2

3

4

5

0

0

Page 29: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0 0 0 1 1 1 1 1C 0 0T 0C 0A 0

X = {ATGCTTC}Y = {GCTCA}

Yj

XiX Y Max

A C 0

Not Match

Lets Take from left one

arrow

1 2 3 45 6 7

1

2

3

4

5

0

0

Page 30: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0 0 0 1 1 1 1 1C 0 0 0T 0C 0A 0

X = {ATGCTTC}Y = {GCTCA}

Yj

XiX Y Max

A C 0

Not Match

Lets Take from Upper one

arrow

0

0

1 2 3 45 6 7

1

2

3

4

5

Page 31: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0 0 0 1 1 1 1 1C 0 0 0 1T 0C 0A 0

X = {ATGCTTC}Y = {GCTCA}

Yj

XiX Y Max

G C 1

Not Match

Lets Take from left one

arrow

1 2 3 45 6 7

1

2

3

4

5

0

0

Page 32: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0 0 0 1 1 1 1 1C 0 0 0 1 2T 0C 0A 0

X = {ATGCTTC}Y = {GCTCA}

Yj

XiX Y Max

C C

Match

arrow

Increment Z[i-1,j-1]

1 2 3 45 6 7

1

2

3

4

5

0

0

Page 33: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0 0 0 1 1 1 1 1C 0 0 0 1 2 2T 0C 0A 0

X = {ATGCTTC}Y = {GCTCA}

Yj

XiX Y Max

T C 2

Not Match

Lets Take from left one

arrow

1 2 3 45 6 7

1

2

3

4

5

0

0

Page 34: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0 0 0 1 1 1 1 1C 0 0 0 1 2 2 2 2T 0 0 1 1 2 3 3 3C 0 0 1 1 2 3 3 4A 0 1 1 1 2 3 3 4

X = {ATGCTTC}Y = {GCTCA}

Yj

XiX Y Max

T G 1

Not Match

Lets Take from left one

arrow

In the same way…

1 2 3 45 6 7

1

2

3

4

5

0

0

Page 35: Longest common subsequence  lcs

Traceback Approach

Page 36: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0 0 0 1 1 1 1 1C 0 0 0 1 2 2 2 2T 0 0 1 1 2 3 3 3C 0 0 1 1 2 3 3 4A 0 1 1 1 2 3 3 4

X = {ATGCTTC}Y = {GCTCA}

Yj

XiFirstly have to point out highest value

For left and upper arrow we will follow the direction

For diagonal arrow we will point out the character for this cell.

1 2 3 45 6 7

1

2

3

4

5

0

0

Page 37: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0 0 0 1 1 1 1 1C 0 0 0 1 2 2 2 2T 0 0 1 1 2 3 3 3C 0 0 1 1 2 3 3 4A 0 1 1 1 2 3 3 4

X = {ATGCTTC}Y = {GCTCA}

Yj

Xi

LCS Z= G

1 2 3 45 6 7

1

2

3

4

5

0

0

Page 38: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0 0 0 1 1 1 1 1C 0 0 0 1 2 2 2 2T 0 0 1 1 2 3 3 3C 0 0 1 1 2 3 3 4A 0 1 1 1 2 3 3 4

X = {ATGCTTC}Y = {GCTCA}

Yj

Xi

LCS Z= GC

1 2 3 45 6 7

1

2

3

4

5

0

0

Page 39: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0 0 0 1 1 1 1 1C 0 0 0 1 2 2 2 2T 0 0 1 1 2 3 3 3C 0 0 1 1 2 3 3 4A 0 1 1 1 2 3 3 4

X = {ATGCTTC}Y = {GCTCA}

Yj

Xi

LCS Z= GCT

1 2 3 45 6 7

1

2

3

4

5

0

0

Page 40: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0 0 0 1 1 1 1 1C 0 0 0 1 2 2 2 2T 0 0 1 1 2 3 3 3C 0 0 1 1 2 3 3 4A 0 1 1 1 2 3 3 4

X = {ATGCTTC}Y = {GCTCA}

Yj

Xi

LCS Z= {GCTC}

1 2 3 45 6 7

1

2

3

4

5

0

0

Page 41: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0 0 0 1 1 1 1 1C 0 0 0 1 2 2 2 2T 0 0 1 1 2 3 3 3C 0 0 1 1 2 3 3 4A 0 1 1 1 2 3 3 4

X = {ATGCTTC}Y = {GCTCA}

Yj

XiFirstly have to point out highest value

For left and upper arrow we will follow the direction

For diagonal arrow we will point out the character for this cell.

1 2 3 45 6 7

1

2

3

4

5

0

0

Page 42: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0 0 0 1 1 1 1 1C 0 0 0 1 2 2 2 2T 0 0 1 1 2 3 3 3C 0 0 1 1 2 3 3 4A 0 1 1 1 2 3 3 4

X = {ATGCTTC}Y = {GCTCA}

Yj

Xi

LCS Z= C

1 2 3 45 6 7

1

2

3

4

5

0

0

Page 43: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0 0 0 1 1 1 1 1C 0 0 0 1 2 2 2 2T 0 0 1 1 2 3 3 3C 0 0 1 1 2 3 3 4A 0 1 1 1 2 3 3 4

X = {ATGCTTC}Y = {GCTCA}

Yj

Xi

LCS Z= TC

1 2 3 45 6 7

1

2

3

4

5

0

0

Page 44: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0 0 0 1 1 1 1 1C 0 0 0 1 2 2 2 2T 0 0 1 1 2 3 3 3C 0 0 1 1 2 3 3 4A 0 1 1 1 2 3 3 4

X = {ATGCTTC}Y = {GCTCA}

Yj

Xi

LCS Z= CTC

1 2 3 45 6 7

1

2

3

4

5

0

0

Page 45: Longest common subsequence  lcs

LCS EXAMPLE

Xi A T G C T T CYJ 0 0 0 0 0 0 0 0G 0 0 0 1 1 1 1 1C 0 0 0 1 2 2 2 2T 0 0 1 1 2 3 3 3C 0 0 1 1 2 3 3 4A 0 1 1 1 2 3 3 4

X = {ATGCTTC}Y = {GCTCA}

Yj

Xi

LCS Z= {GCTC}

1 2 3 45 6 7

1

2

3

4

5

0

0