Dynamic Programming New Diff

Ideas on a new diff tool based on dynamic programming techniques from gene sequencing

Transcript of Dynamic Programming New Diff

Page 1: Dynamic Programming New Diff
Page 2: Dynamic Programming New Diff

DNA -> RNA -> Protein

One gene = one protein

Four bases: A, T, C, G

3 bases = 1 amino acid

N amino acids = one protein

ATTTACAGATTACCC

ATT TAC AGA TTA CCC

Ile Tyr Arg Leu Pro
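The split into triplets and the lookup of each triplet can be sketched in a few lines of Python. This is only an illustration: the table below is an assumption that lists just the five codons from the slide (written as coding-strand DNA, standard genetic code), and the names `CODON_TABLE`, `split_codons` and `translate` are made up for the example.

```python
# Minimal codon lookup: only the five codons that appear on the slide.
# A complete table would have 64 entries.
CODON_TABLE = {
    "ATT": "Ile",
    "TAC": "Tyr",
    "AGA": "Arg",
    "TTA": "Leu",
    "CCC": "Pro",
}


def split_codons(dna):
    """Cut a DNA string into consecutive, non-overlapping triplets."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def translate(dna):
    """Map each codon to its amino acid, or 'Xxx' if it is not in the table."""
    return [CODON_TABLE.get(codon, "Xxx") for codon in split_codons(dna)]


print(" ".join(split_codons("ATTTACAGATTACCC")))  # ATT TAC AGA TTA CCC
print(" ".join(translate("ATTTACAGATTACCC")))     # Ile Tyr Arg Leu Pro
```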

Page 3: Dynamic Programming New Diff

ATCGTATACCCGAAT Human genome

AACGTATTCCCAT Fruit fly genome

ATCGTATACCCGAAT

AACGTATTCCC--AT Gaps and Mismatches
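Once two sequences are written one above the other with gap characters, every column is a match, a mismatch or a gap. A small Python sketch that counts the three cases for the aligned pair above (the function name `alignment_stats` is hypothetical):

```python
def alignment_stats(top, bottom, gap="-"):
    """Count (matches, mismatches, gaps) column by column."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    matches = mismatches = gaps = 0
    for a, b in zip(top, bottom):
        if a == gap or b == gap:
            gaps += 1          # one side has no counterpart
        elif a == b:
            matches += 1       # same base in both sequences
        else:
            mismatches += 1    # different bases in the same column
    return matches, mismatches, gaps


print(alignment_stats("ATCGTATACCCGAAT", "AACGTATTCCC--AT"))  # (11, 2, 2)
```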

Page 4: Dynamic Programming New Diff

Needleman-Wunsch Algorithm

[Figure: empty alignment matrix with ACTTTCA along one axis and AGTTCAG along the other]

Scoring: Match +m, Mismatch -s, Gap -d
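With these three parameters, the usual Needleman-Wunsch recurrence with a linear gap penalty can be written as follows. The symbols F (score matrix), x and y (the two sequences) and sigma (the substitution score) are notation chosen for this note, not taken from the slide.

```latex
F(i,0) = -d\,i, \qquad F(0,j) = -d\,j

F(i,j) = \max
\begin{cases}
  F(i-1,\,j-1) + \sigma(x_i, y_j) \\
  F(i-1,\,j) - d \\
  F(i,\,j-1) - d
\end{cases}
\qquad
\sigma(a,b) =
\begin{cases}
  +m & \text{if } a = b \\
  -s & \text{otherwise}
\end{cases}
```

The score of the best global alignment is the value in the last cell, F(|x|, |y|).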

Page 5: Dynamic Programming New Diff

Needleman-Wunsch Algorithm

[Figure: alignment matrix for ACTTTCA and AGTTCAG with the boundary row and column initialised to 0, -1, ..., -7 and the first cells filled in]

Scoring: Match +4, Mismatch -1, Gap -1
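A minimal Python sketch of the fill step, assuming a linear gap penalty and the scores on this slide (Match +4, Mismatch -1, Gap -1). The name `nw_matrix` is illustrative. The sketch fills from the top-left corner, whereas the slide appears to place the 0 cell at the lower right; the final score is the same either way.

```python
MATCH, MISMATCH, GAP = 4, -1, -1  # scoring values from the slide


def nw_matrix(a, b, match=MATCH, mismatch=MISMATCH, gap=GAP):
    """Return the (len(a)+1) x (len(b)+1) Needleman-Wunsch score matrix."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Boundary cells: aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Every other cell takes the best of: diagonal (match/mismatch), up (gap), left (gap).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    return score


matrix = nw_matrix("AGTTCAG", "ACTTTCA")
print(matrix[0])       # [0, -1, -2, -3, -4, -5, -6, -7]
print(matrix[-1][-1])  # 17, the best global alignment score
```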

Page 6: Dynamic Programming New Diff

Needleman-Wunsch Algorithm

[Figure: completed alignment matrix for ACTTTCA and AGTTCAG]

Scoring: Match +4, Mismatch -1, Gap -1

Page 7: Dynamic Programming New Diff

Page 8: Dynamic Programming New Diff

Needleman-Wunsch Algorithm

[Figure: completed alignment matrix for ACTTTCA and AGTTCAG]

Scoring: Match +4, Mismatch -1, Gap -1

ACTTTCA-
AGTT-CAG

ACTTTCA-
AGTTC-AG
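An alignment is read off the completed matrix by walking back from the last cell to the first and checking which neighbour produced each score. The Python sketch below is one way to do this under the slide's scoring; `nw_align` is a hypothetical name. Several alignments can share the best score, and which one comes out depends on the order in which the three moves are tested.

```python
def nw_align(a, b, match=4, mismatch=-1, gap=-1):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback: start at the last cell and re-derive the move behind each score.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])   # diagonal: pair the two characters
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])   # up: character of a against a gap
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")        # left: character of b against a gap
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = nw_align("ACTTTCA", "AGTTCAG")
print(score)   # 17
print(top)
print(bottom)
```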

Page 9: Dynamic Programming New Diff

Genetics                     Source Code Revisions
Sequence of nucleic acids    Sequence of source lines
Matching acids               Unmodified lines
Mismatch                     Added or modified lines
Gaps                         Added or deleted lines

Double Needleman-Wunsch

1st for line alignments

2nd for detecting modifications

Page 10: Dynamic Programming New Diff

rev. 1.1

1 int i;

2 char * ch;

3 for (i=1;i<10;i++)

Rev. 1.2

1 int i;

2 int N;

3 char * ch;

4 for (i=1;i<10;i++)

rev. 1.1

1 int i;

2 char * ch;

3 for (i=1;i<10;i++)

Rev. 1.2

1 int i;

2 int N;                    (added)

3 char * ch;

4 for (i=1;i<10;i++)

rev. 1.1

1 int i;

2 char * ch;

3 for (i=1;i<10;i++)

Rev. 1.2

1 int i;

2 int N;

3 char * ch;

4 for (i=1;i<10;i++)

modified

added

Page 11: Dynamic Programming New Diff

(sp = space character)

         f   o   r   (   i   n   t  sp   j   =   0   ;   j   <   7
------------------------------------------------------------------
 |   0  -1  -2  -3  -4  -5  -6  -7  -8  -9 -10 -11 -12 -13 -14 -15
f|  -1   4   3   2   1   0  -1  -2  -3  -4  -5  -6  -7  -8  -9 -10
o|  -2   3   8   7   6   5   4   3   2   1   0  -1  -2  -3  -4  -5
r|  -3   2   7  12  11  10   9   8   7   6   5   4   3   2   1   0
(|  -4   1   6  11  16  15  14  13  12  11  10   9   8   7   6   5
j|  -5   0   5  10  15  15  14  13  12  16  15  14  13  12  11  10
=|  -6  -1   4   9  14  14  14  13  12  15  20  19  18  17  16  15
0|  -7  -2   3   8  13  13  13  13  12  14  19  24  23  22  21  20
;|  -8  -3   2   7  12  12  12  12  12  13  18  23  28  27  26  25
j|  -9  -4   1   6  11  11  11  11  11  16  17  22  27  32  31  30
<| -10  -5   0   5  10  10  10  10  10  15  16  21  26  31  36  35
7| -11  -6  -1   4   9   9   9   9   9  14  15  20  25  30  35  40

Actual score: 40, Theoretical maximum: 40, Solution goodness: 1.00

for(int j=0;j<7
for(____j=0;j<7

Solution: S2 is a subset of S1

Page 12: Dynamic Programming New Diff

(sp = space character)

         f   o   r   (   i   n   t  sp   j   =   0   ;   j   <   7
------------------------------------------------------------------
 |   0  -1  -2  -3  -4  -5  -6  -7  -8  -9 -10 -11 -12 -13 -14 -15
f|  -1   4   3   2   1   0  -1  -2  -3  -4  -5  -6  -7  -8  -9 -10
o|  -2   3   8   7   6   5   4   3   2   1   0  -1  -2  -3  -4  -5
r|  -3   2   7  12  11  10   9   8   7   6   5   4   3   2   1   0
(|  -4   1   6  11  16  15  14  13  12  11  10   9   8   7   6   5
j|  -5   0   5  10  15  15  14  13  12  16  15  14  13  12  11  10
=|  -6  -1   4   9  14  14  14  13  12  15  20  19  18  17  16  15
0|  -7  -2   3   8  13  13  13  13  12  14  19  24  23  22  21  20
;|  -8  -3   2   7  12  12  12  12  12  13  18  23  28  27  26  25
j|  -9  -4   1   6  11  11  11  11  11  16  17  22  27  32  31  30
<| -10  -5   0   5  10  10  10  10  10  15  16  21  26  31  36  35
=| -11  -6  -1   4   9   9   9   9   9  14  19  20  25  30  35  35
5| -12  -7  -2   3   8   8   8   8   8  13  18  19  24  29  34  34

Actual score: 34, Theoretical maximum: 45, Solution goodness: 0.76

for(int j=0;j<_7
for(____j=0;j<=5

Gaps in both lines AND mismatch
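The slides do not spell out how the theoretical maximum is obtained. Both examples are consistent with "every character of the shorter line matches and the length difference is filled with gaps" (4 x 11 - 4 = 40 and 4 x 12 - 3 = 45), so the Python sketch below assumes that formula. The function names are hypothetical, and the guard for a non-positive maximum is an addition for very unequal line lengths.

```python
def nw_score(a, b, match=4, mismatch=-1, gap=-1):
    """Needleman-Wunsch score of a and b, keeping only two rows in memory."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def theoretical_max(a, b, match=4, gap=-1):
    """Assumed best case: the shorter line matches fully, the rest are gaps."""
    shorter, longer = sorted((len(a), len(b)))
    return match * shorter + gap * (longer - shorter)


def goodness(a, b):
    """Actual score divided by the theoretical maximum."""
    best = theoretical_max(a, b)
    if best <= 0:  # lengths too different for the ratio to be meaningful
        return 0.0
    return nw_score(a, b) / best


s1 = "for(int j=0;j<7"
print(nw_score("for(j=0;j<7", s1), theoretical_max("for(j=0;j<7", s1))    # 40 40
print(f"{goodness('for(j=0;j<7', s1):.2f}")                               # 1.00
print(nw_score("for(j=0;j<=5", s1), theoretical_max("for(j=0;j<=5", s1))  # 34 45
print(f"{goodness('for(j=0;j<=5', s1):.2f}")                              # 0.76
```

A goodness of 1.00 does not mean the lines are identical: it is also reached when one line is the other with characters inserted, as in the first example.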

Page 13: Dynamic Programming New Diff

rev. A
1 char * ch;
2 for (i=1;i<10;i++)

rev. B
1 int i;                    (added)
2 char * ch;
3 for (i=1;i<10;i++)

rev. A
1 int i;                    (deleted)
2 char * ch;
3 for (i=1;i<10;i++)

rev. B
1 char * ch;
2 for (i=1;i<10;i++)

rev. A
1 char * ch;
2 for (int i=1;i<10;i++)

rev. B
1 int i;                    (added)
2 char * ch;
3 for (i=1;i<10;i++)        (modified)
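The three cases above can be reproduced with a two-pass ("double") Needleman-Wunsch: a character-level pass rates how similar two lines are, and a line-level pass aligns the revisions using that rating. The Python sketch below is one possible reading of the idea, not the author's implementation. `THRESHOLD`, `LINE_GAP` and `LINE_MISMATCH` are assumed values chosen for the illustration, and all function names are hypothetical.

```python
MATCH, MISMATCH, GAP = 4, -1, -1  # character-level scores from the slides
THRESHOLD = 0.5       # assumed: minimum goodness for two lines to be paired
LINE_GAP = -0.5       # assumed: cost of an added or deleted line
LINE_MISMATCH = -2.0  # assumed: worse than two gaps, so unlike lines never pair


def char_score(a, b):
    """First pass: character-level Needleman-Wunsch score of two lines."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * GAP]
        for j in range(1, len(b) + 1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            curr.append(max(prev[j - 1] + sub, prev[j] + GAP, curr[j - 1] + GAP))
        prev = curr
    return prev[-1]


def goodness(a, b):
    """Score divided by the best score two lines of these lengths could reach."""
    if a == b:
        return 1.0
    shorter, longer = sorted((len(a), len(b)))
    best = MATCH * shorter + GAP * (longer - shorter)
    return char_score(a, b) / best if best > 0 else 0.0


def diff_lines(old, new):
    """Second pass: align whole lines, then label each aligned position."""
    rows, cols = len(old) + 1, len(new) + 1
    score = [[0.0] * cols for _ in range(rows)]
    move = [[None] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0], move[i][0] = i * LINE_GAP, "up"
    for j in range(1, cols):
        score[0][j], move[0][j] = j * LINE_GAP, "left"
    for i in range(1, rows):
        for j in range(1, cols):
            g = goodness(old[i - 1], new[j - 1])
            pair = g if g >= THRESHOLD else LINE_MISMATCH
            options = [(score[i - 1][j - 1] + pair, "diag"),
                       (score[i - 1][j] + LINE_GAP, "up"),
                       (score[i][j - 1] + LINE_GAP, "left")]
            score[i][j], move[i][j] = max(options, key=lambda o: o[0])

    # Follow the stored moves back from the last cell and label each step.
    result = []
    i, j = len(old), len(new)
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            label = "unmodified" if old[i - 1] == new[j - 1] else "modified"
            result.append((label, old[i - 1], new[j - 1]))
            i, j = i - 1, j - 1
        elif step == "up":
            result.append(("deleted", old[i - 1], None))
            i -= 1
        else:
            result.append(("added", None, new[j - 1]))
            j -= 1
    return result[::-1]


rev_a = ["char * ch;", "for (int i=1;i<10;i++)"]
rev_b = ["int i;", "char * ch;", "for (i=1;i<10;i++)"]
for label, old_line, new_line in diff_lines(rev_a, rev_b):
    print(label, old_line, new_line, sep=" | ")
```

For the third example this prints one added line, one unmodified line and one modified line, in that order. Paired lines are labelled by string equality rather than by goodness, because a line with characters merely inserted still reaches a goodness of 1.00.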

Page 14: Dynamic Programming New Diff
Page 15: Dynamic Programming New Diff
Page 16: Dynamic Programming New Diff

E. Bair et al., Computational Genomics, Stanford University

W. H. Press et al., Numerical Recipes in C++, Cambridge University Press

J. Neider et al., OpenGL Programming Guide (Red Book), Addison-Wesley

X.y, OpenGL Tutorial, www.videotutorialsrock.com
