DNA Structure Notation Operations

Vincenzo Manca, Dipartimento di Informatica, Università di Verona


Page 1: DNA  Structure  Notation Operations


10 Years of Molecular Computing

- 1994 Adleman’s Experiment *
- 1995 Lipton’s Model *
- 1996 Int. Conf. on Math. Linguistics (Marcus)
- 1997 Mangalia (Paun, Head)
- 1998 MFCS Brno (Molecular Computing)
- 1999 (Paun’s WMC)
- 2000 DNA6 Leiden *
- 2001 DNA7 Tampa (FL): 3-SAT
- 2002 DNA8 Sapporo: DNA Duplication
- 2004 DNA10 Milan: XPCR Extraction
- 2005 DNA11 Ontario: XPCR Recombination

DNA Computing Motto

Problem: data and requirements
Algorithm: solutions

- Encode data by DNA strands
- Encode algorithms by biotech procedures
- Decode final strands as solutions

A General Schema of Combinatorial Problems

A set of requirements for “assignments”, that is, 0/1 sequences of some length n

The space of possible solutions has $2^n$ elements, but only some of them satisfy the requirements

Encode assignments by DNA strands

Encode requirements as biotech protocols that filter the strands encoding the true solutions
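The generate-and-filter schema can be mimicked in silico. The sketch below is an illustrative Python model, not a DNA protocol: the solution space is the list of all $2^n$ tuples, and each requirement is a hypothetical predicate applied as one filtering step, in the way a biotech protocol would discard the strands that do not encode solutions.

```python
from itertools import product
from typing import Callable, List, Tuple

Assignment = Tuple[int, ...]
Requirement = Callable[[Assignment], bool]

def solution_space(n: int) -> List[Assignment]:
    """All 2**n assignments, i.e. 0/1 sequences of length n."""
    return list(product((0, 1), repeat=n))

def filter_solutions(n: int, requirements: List[Requirement]) -> List[Assignment]:
    """Generate the whole space, then apply each requirement as one filtering step."""
    pool = solution_space(n)
    for requirement in requirements:
        pool = [a for a in pool if requirement(a)]
    return pool

# Hypothetical requirements for n = 3: (x1 or x2) and (not x3).
reqs: List[Requirement] = [lambda a: a[0] == 1 or a[1] == 1, lambda a: a[2] == 0]
print(filter_solutions(3, reqs))  # → [(0, 1, 0), (1, 0, 0), (1, 1, 0)]
```

On a computer the space grows exponentially; the point of the DNA approach is that all $2^n$ strands are processed in parallel by each filtering step.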

Space generation in linear time

Solution extraction in linear time

!!!

New Trends in DNA Computing (DNAC)

- DNA self-assembly (Seeman, Winfree, …)
- DNA automata (Shapiro)
- DNA algorithms ==> new biotech protocols

Biotech Protocols

Algorithms

DNA Computing

Computing DNA

A change of perspective

In the search for ways to implement algorithms on DNA, general algorithmic principles are discovered in fundamental biomolecular processes.

[Figure: nucleotide structure, with sugar carbons numbered 1′ to 5′, base B, phosphate P, and CH2 / CH2OH groups]

Nucleotides: ~330 daltons each

$1\ \text{Da} \approx 1.66 \times 10^{-24}$ g

1 g of hydrogen $\approx 6.02 \times 10^{23}$ atoms

Distance 1′ to 1′ ≈ 1 nm

A few grams of DNA could hold as much information as all the electronic information stored in the world
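The storage claim can be checked with a back-of-the-envelope calculation. This sketch assumes single-stranded DNA, 2 bits per nucleotide (four symbols), ~330 Da per nucleotide as stated above, and ideal packing with no redundancy or addressing overhead, so it gives an idealized upper bound rather than a practical capacity.

```python
DALTON_G = 1.66054e-24      # grams per dalton (unified atomic mass unit)
NUCLEOTIDE_DA = 330         # approximate mass of one nucleotide, as on the slide
BITS_PER_NUCLEOTIDE = 2     # log2(4): one symbol over {A, T, C, G}

grams_per_nucleotide = NUCLEOTIDE_DA * DALTON_G
nucleotides_per_gram = 1 / grams_per_nucleotide
bytes_per_gram = nucleotides_per_gram * BITS_PER_NUCLEOTIDE / 8

print(f"{nucleotides_per_gram:.2e} nucleotides per gram")
print(f"{bytes_per_gram:.2e} bytes per gram (single-stranded, ideal packing)")
```

This gives roughly $1.8 \times 10^{21}$ nucleotides, or about $4.6 \times 10^{20}$ bytes, per gram, which is why a few grams are on the order of the world's stored electronic data.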

Strings

Strings over an alphabet are sequences of symbols of the alphabet: abbabbba

On strings, an associative concatenation operation is defined:

$(\alpha\beta)\gamma = \alpha(\beta\gamma)$

A language L is a set of strings

DNA Sequences Are Mobile Double Strings

$B = \{A, T, C, G\}$

$B^*$ = strings over $B$

$\alpha[i,j]$: the substring of $\alpha$ from position $i$ to position $j$

$|\alpha|$: the length of $\alpha$

$s$ is an $\alpha$-strand, or $s : \alpha$, or $\mathrm{type}(s) = \alpha$; $\alpha : n$, or $\mathrm{mult}(\alpha) = n$

Complementation $c$ (involutive)

Reverse $\mathrm{rev}$ (involutive)

Mirror $\mathrm{mir}$ (involutive): $\mathrm{mir}(\alpha) = \mathrm{rev}(c(\alpha))$

Reverse and complementation commute

Hybridization

Pairing
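The three string operations can be written directly on strings over $B$. In this Python sketch, $c$ is assumed to be Watson-Crick complementation symbol by symbol (A with T, C with G), and the assertions check the properties listed above: all three operations are involutive, and rev and $c$ commute.

```python
COMPLEMENT = str.maketrans("ATCG", "TAGC")

def c(alpha: str) -> str:
    """Watson-Crick complementation, symbol by symbol (A<->T, C<->G)."""
    return alpha.translate(COMPLEMENT)

def rev(alpha: str) -> str:
    """Reverse: read the string right to left."""
    return alpha[::-1]

def mir(alpha: str) -> str:
    """Mirror (reverse complement): mir(alpha) = rev(c(alpha))."""
    return rev(c(alpha))

alpha = "ATGCCA"
assert c(c(alpha)) == rev(rev(alpha)) == mir(mir(alpha)) == alpha  # involutive
assert rev(c(alpha)) == c(rev(alpha))                              # rev and c commute
print(mir(alpha))  # → TGGCAT
```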

$B = \{A, T, C, G\}$; BB* = strings over BB, in fraction notation

Axiom: $\mathrm{rev}(\alpha\beta) = \mathrm{rev}(\beta)\,\mathrm{rev}(\alpha)$

ext

Overlap

Overlapping concatenation Z (→ up, ← down)

Bilinearity, Complementarity, Antiparallelism

The marvelous form

[Figure: the double strand, with its 5′ and 3′ ends marked]

Hybridization: a strand $\alpha$ hybridizes with its mirror, $\alpha \,\|\, \mathrm{mir}(\alpha)$

Pairing: hybridized strands form a double string, written in fraction notation by means of rev

Notation

Single strands are marked with arrows (→, ←)

$\alpha/\mathrm{mir}(\alpha) = \langle\alpha\rangle$

$\langle\alpha\rangle = \langle\mathrm{mir}(\alpha)\rangle$

BB* is the set of DNA strings, $BB^* \subseteq B^* \times B^*$


A pool P of DNA molecules is a multiset of strands:

i) a set of strands typed by strings

ii) a set of strings with multiplicities

$P = \{s_1 : \alpha_1,\ s_2 : \alpha_2, \ldots\}$

$P = \{\alpha_1 : n_1,\ \alpha_2 : n_2, \ldots\}$

$\mathrm{mult}_P(\alpha_1) = n_1,\ \mathrm{mult}_P(\alpha_2) = n_2$

$s \in P$, $\alpha \in P$

Types of DNA pools are languages of BB*:

$\mathrm{Type}(T) = \{\alpha \in BB^* \mid s : \alpha,\ s \in T\}$
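Both views of a pool (strands typed by strings, or strings with multiplicities) map naturally onto a multiset. The sketch below is a simplification: it models a pool as a Python `Counter` from types to multiplicities, and it treats types as plain strings over $B$, ignoring the double-string structure.

```python
from collections import Counter
from typing import Iterable, Set

def make_pool(strands: Iterable[str]) -> Counter:
    """A pool as a multiset: each type (string) mapped to its multiplicity."""
    return Counter(strands)

def mult(pool: Counter, alpha: str) -> int:
    """mult_P(alpha): how many strands of type alpha the pool contains (0 if none)."""
    return pool[alpha]

def types(pool: Counter) -> Set[str]:
    """Type(P): the language of the strings that type at least one strand of P."""
    return {alpha for alpha, n in pool.items() if n > 0}

P = make_pool(["ACGT", "ACGT", "TTGA", "ACGT"])
print(mult(P, "ACGT"), mult(P, "GGGG"), sorted(types(P)))  # → 3 0 ['ACGT', 'TTGA']
```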

Test Tube Operations in DNAC

- Denature (melting)
- Renature (hybridization, annealing)
- Mix
- Split
- Fish (by affinity)
- Remove
- Length
- Separate (gel electrophoresis)
- Ligate (ligase)
- Extend (polymerase)
- Synthesize (oligos)
- Infix

Strand Hybridization

Polymerase Extension

DNA Ligase

Ligase joins a 5′ phosphate to a 3′ hydroxyl.

Ligase Catenation

Gel Electrophoresis: Separation of DNA Fragments

[Figure: gel in buffer between two electrodes, with samples loaded in wells; longer fragments migrate more slowly]

More Complex Operations

- Amplification (PCR)
- Sequencing
- Restriction (restriction enzymes)
- Cloning (plasmid transfection)

PCR: Polymerase Chain Reaction

PCR with 3′ Sticky End

[Figure: exponential and linear amplification, of short and long products respectively]
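The split between exponentially amplified short products and linearly growing long products can be illustrated with the textbook idealized model of PCR (one double-stranded template, 100% efficiency per cycle). This is a standard model swapped in for illustration, not the sticky-end protocol of the slide: each cycle, every single strand serves once as a template.

```python
from typing import Dict

def pcr_counts(cycles: int) -> Dict[str, int]:
    """Idealized PCR from one double-stranded template, 100% efficiency.

    Single strands are classified as:
      original: the two template strands (copied every cycle, never consumed)
      long:     start at a primer and run to wherever the template ended
      short:    bounded by both primers (the target)
    Each cycle: original -> new long, long -> new short, short -> new short.
    """
    original, long_, short = 2, 0, 0
    for _ in range(cycles):
        # right-hand side uses the counts from the previous cycle
        long_, short = long_ + original, short + long_ + short
    return {"original": original, "long": long_, "short": short}

print(pcr_counts(10))  # → {'original': 2, 'long': 20, 'short': 2026}
```

In this model the long products grow linearly ($2n$ after $n$ cycles), while the short products grow exponentially ($2^{n+1} - 2 - 2n$).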

PCR Lemma

Given a pool P of a single type and two primers that hybridize with the two single strands of that type, respectively.

If the extensions e1 and e2 of the two primers along the corresponding single strands overlap, then an exponential amplification takes place of strands with the blunt form:

<e1 Z ext e2>

which appears within the first two steps.
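A string-level view of what the two primers delimit can help. The sketch below is a simplified in-silico PCR, assuming exact primer matches and first occurrences only; the function name `pcr_product` is hypothetical. The forward primer must occur in the template, and the reverse primer must hybridize with it, so its mirror must occur downstream; the blunt product is the segment between the two sites.

```python
from typing import Optional

def mir(alpha: str) -> str:
    """Reverse complement: mir(alpha) = rev(c(alpha))."""
    return alpha.translate(str.maketrans("ATCG", "TAGC"))[::-1]

def pcr_product(template: str, forward: str, reverse: str) -> Optional[str]:
    """Idealized in-silico PCR on the upper strand `template` (5'->3').

    `forward` must occur in the template; `reverse` must hybridize with it,
    i.e. mir(reverse) must occur at or after the forward site.
    Exact matches and first occurrences only. Returns the blunt product.
    """
    start = template.find(forward)
    if start < 0:
        return None
    site = template.find(mir(reverse), start)
    if site < 0:
        return None
    return template[start : site + len(reverse)]

template = "AAAGGCTTTTTCCAGAA"
print(pcr_product(template, "GGC", "CTGG"))  # → GGCTTTTTCCAG
```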

[Diagram: a test tube T of type L → operation → a test tube T′ of type L′]

Test Tube Operations, Mathematically

Type(T) = L means that the types of the strands of T constitute the language L.

Given some test tubes with some types as arguments, a test tube operation provides as results test tubes with other types.
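As a rough model, each test tube operation is a function from pools to pools, and therefore from languages to languages. The sketch below assumes pools are `Counter` multisets and idealizes the chemistry: `separate` keeps exactly one length, `extract` keeps every strand containing a given substring, and `split` sends each molecule to one of two tubes at random. The names follow the slides, but the semantics are simplifying assumptions.

```python
import random
from collections import Counter
from typing import Tuple

Pool = Counter  # type (string) -> multiplicity

def mix(p: Pool, q: Pool) -> Pool:
    """Pour two tubes together: multiset union (multiplicities add)."""
    return p + q

def separate(p: Pool, n: int) -> Pool:
    """Gel electrophoresis, idealized: keep the strands of length exactly n."""
    return Counter({a: k for a, k in p.items() if len(a) == n})

def extract(p: Pool, gamma: str) -> Pool:
    """Keep the strands that contain gamma as a substring."""
    return Counter({a: k for a, k in p.items() if gamma in a})

def split(p: Pool, rng: random.Random) -> Tuple[Pool, Pool]:
    """Divide a tube in two: each molecule lands in one half at random."""
    a, b = Counter(), Counter()
    for alpha, k in p.items():
        for _ in range(k):
            (a if rng.random() < 0.5 else b)[alpha] += 1
    return a, b

T = Counter({"ACGT": 2, "ACGTAA": 1, "TTT": 3})
print(sorted(extract(T, "CG").items()))  # → [('ACGT', 2), ('ACGTAA', 1)]
```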

DNA Test Tube Machine

Register machines where:

- registers are test tubes (multisets of strands instead of numbers)
- operations are DNA test tube operations (instead of arithmetic operations)
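A test tube machine can be sketched as an interpreter whose registers hold multisets. The program format below, a list of (target register, operation, source registers) triples, is a hypothetical choice for illustration, as is the `extract_cg` operation; the example runs the first two steps of an extract loop, T1 := extract(T) and T := T − T1.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Tube = Counter  # register contents: type (string) -> multiplicity
Instruction = Tuple[str, Callable[..., Tube], Tuple[str, ...]]  # (target, op, sources)

def run(program: List[Instruction], registers: Dict[str, Tube]) -> Dict[str, Tube]:
    """Execute instructions in order: registers[target] := op(*registers[sources])."""
    for target, op, sources in program:
        registers[target] = op(*(registers[s] for s in sources))
    return registers

def extract_cg(t: Tube) -> Tube:
    """Hypothetical operation: keep the strands containing the substring 'CG'."""
    return Counter({a: k for a, k in t.items() if "CG" in a})

program: List[Instruction] = [
    ("T1", extract_cg, ("T",)),                 # T1 := extract(T, CG)
    ("T", lambda t, t1: t - t1, ("T", "T1")),   # T := T - T1 (multiset difference)
]
regs = run(program, {"T": Counter({"ACGT": 2, "TTAA": 1})})
print(regs["T1"], regs["T"])  # → Counter({'ACGT': 2}) Counter({'TTAA': 1})
```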

Adleman’s Problem

Given a graph (of seven nodes), find (if there are any) the paths from node 0 to node 6 that pass exactly once through every node (Hamiltonian paths).

Adleman-Lipton Extract Model in Combinatorial Problems

- Generation of all possible solutions in linear time
- Extraction of the true solutions in linear time

Extraction is performed in a number of sub-steps, each of which selects all the strands that include a sub-strand of a given type.

Adleman’s Graph

Adleman’s Encoding

Node i = $\alpha_i\beta_i$

Arc ij = $\mathrm{mir}(\beta_i\alpha_j)$

$|\alpha_i| = |\beta_i| = 10$, $i, j = 1, \ldots, 7$

[Figure: node strands $\alpha_i\beta_i$ and $\alpha_j\beta_j$ held together by the arc strand]
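The encoding can be checked at the string level: if a path is the concatenation of its node strands, each arc strand $\mathrm{mir}(\beta_i\alpha_j)$ is complementary to the 20 bases straddling the junction between node $i$ and node $j$, so it can act as a splint for ligation. The random 10-mers below are hypothetical; a real design would also avoid unwanted cross-hybridization.

```python
import random
from typing import Dict, List, Tuple

Codes = Dict[int, Tuple[str, str]]  # node i -> (alpha_i, beta_i)

def mir(s: str) -> str:
    """Reverse complement."""
    return s.translate(str.maketrans("ATCG", "TAGC"))[::-1]

def encode_nodes(n: int, seed: int = 1) -> Codes:
    """Hypothetical random encoding: node i -> (alpha_i, beta_i), each a 10-mer."""
    rng = random.Random(seed)
    oligo = lambda: "".join(rng.choice("ATCG") for _ in range(10))
    return {i: (oligo(), oligo()) for i in range(1, n + 1)}

def node(codes: Codes, i: int) -> str:
    """Node i = alpha_i beta_i (20 bases)."""
    return codes[i][0] + codes[i][1]

def arc(codes: Codes, i: int, j: int) -> str:
    """Arc ij = mir(beta_i alpha_j)."""
    return mir(codes[i][1] + codes[j][0])

def path_strand(codes: Codes, path: List[int]) -> str:
    """The strand expected for a path after hybridization and ligation."""
    return "".join(node(codes, i) for i in path)

codes = encode_nodes(7)
path = [1, 4, 2, 7]
strand = path_strand(codes, path)
for k, (i, j) in enumerate(zip(path, path[1:])):
    junction = strand[20 * k + 10 : 20 * k + 30]   # beta_i followed by alpha_j
    assert mir(arc(codes, i, j)) == junction       # the arc pairs with the junction
print(len(strand))  # → 80 (4 nodes x 20 bases)
```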

Adleman’s Algorithm

Generation of Hamiltonian paths from v1 to v7

Generate paths of G (hybridization/ligation)
Perform PCR with primers $\alpha_0$, $\mathrm{mir}(\beta_6)$
Separate paths of length 140 (7 × 20)
for j := 1 to 7 do
    select the strands where node j occurs
output the remaining strands
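The algorithm can be replayed in silico on a small graph. This sketch uses a hypothetical 7-node directed graph, not the graph of Adleman’s experiment, and replaces random hybridization/ligation with exhaustive enumeration of walks; each filter then mirrors one laboratory step.

```python
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]

# Hypothetical 7-node directed graph, NOT the graph of Adleman's experiment.
GRAPH: Graph = {0: [1, 3], 1: [2, 3], 2: [3, 4], 3: [4], 4: [5, 1], 5: [6, 2], 6: []}

def all_walks(graph: Graph, max_nodes: int) -> List[List[int]]:
    """Stand-in for hybridization/ligation: every walk along arcs with up to max_nodes nodes."""
    frontier = [[v] for v in graph]
    walks = list(frontier)
    for _ in range(max_nodes - 1):
        frontier = [w + [u] for w in frontier for u in graph[w[-1]]]
        walks.extend(frontier)
    return walks

def adleman(graph: Graph, start: int, end: int) -> List[Tuple[int, ...]]:
    n = len(graph)
    pool = all_walks(graph, n)
    pool = [w for w in pool if w[0] == start and w[-1] == end]  # PCR with start/end primers
    pool = [w for w in pool if len(w) == n]                      # gel: exactly n nodes (n x 20 bp)
    for v in graph:                                              # affinity: node v must occur
        pool = [w for w in pool if v in w]
    return [tuple(w) for w in pool]

print(adleman(GRAPH, 0, 6))  # → [(0, 1, 2, 3, 4, 5, 6)]
```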

Mix and Split Method

Generation of the solution space of N variables

Merge X1 and X̄1 in a tube T
Split T into A and B
for j := 2 to N do
    extend the strands of A with Xj
    extend the strands of B with X̄j
    merge A and B into T
    split T into A and B
Merge A and B
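The method doubles the number of types at each step, so N − 1 rounds produce all $2^N$ assignments. In this sketch a strand is a tuple with 1 standing for Xj and 0 for its negated variant X̄j; the split is idealized as giving both halves every type, which is what very large multiplicities achieve in practice.

```python
from typing import Set, Tuple

Strand = Tuple[int, ...]  # position j: 1 means X_j, 0 means the negated variant

def split(tube: Set[Strand]) -> Tuple[Set[Strand], Set[Strand]]:
    """Idealized split: with very many copies of each type, both halves contain every type."""
    return set(tube), set(tube)

def mix_and_split(n: int) -> Set[Strand]:
    t = {(1,), (0,)}                  # merge X1 and its negation in T
    a, b = split(t)
    for _ in range(2, n + 1):
        a = {s + (1,) for s in a}     # extend the strands of A with X_j
        b = {s + (0,) for s in b}     # extend the strands of B with the negated X_j
        t = a | b                     # merge A and B into T
        a, b = split(t)
    return a | b                      # final merge

print(len(mix_and_split(4)))  # → 16
```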

Lipton’s Algorithm 3-SAT(N, M)

Generate the N-space of solutions in T
for j := 1 to M do
    T1 := Extract[T, L(1, j)]
    T := T − T1
    T2 := Extract[T, L(2, j)]
    T := T − T2
    T3 := Extract[T, L(3, j)]
    T := Merge(T1, T2)
    T := Merge(T, T3)
Detect T
if T ≠ ∅ then take a clone and sequence it (solution)
else “Unsolvable Problem”
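The loop above can be simulated with sets of assignments. This sketch assumes that L(i, j) denotes the i-th literal of the j-th clause, encoded here as a (variable index, required value) pair; since removing T1 and T2 before the later extractions keeps the three parts disjoint, their merge is exactly the set of assignments satisfying the clause.

```python
from itertools import product
from typing import List, Set, Tuple

Assignment = Tuple[int, ...]
Literal = Tuple[int, int]                  # (variable index, required value)
Clause = Tuple[Literal, Literal, Literal]

def extract(tube: Set[Assignment], literal: Literal) -> Set[Assignment]:
    """Keep the strands whose assignment makes `literal` true."""
    var, value = literal
    return {s for s in tube if s[var] == value}

def lipton_3sat(n: int, clauses: List[Clause]) -> Set[Assignment]:
    tube = set(product((0, 1), repeat=n))  # generate the N-space of solutions in T
    for l1, l2, l3 in clauses:
        t1 = extract(tube, l1)
        tube = tube - t1
        t2 = extract(tube, l2)
        tube = tube - t2
        t3 = extract(tube, l3)
        tube = t1 | t2 | t3                # T := Merge(Merge(T1, T2), T3)
    return tube                            # empty means "Unsolvable Problem"

# (x1 or x2 or not x3) and (not x1 or not x2 or x3), with 0-based variable indices
clauses = [((0, 1), (1, 1), (2, 0)), ((0, 0), (1, 0), (2, 1))]
print(sorted(lipton_3sat(3, clauses)))
# → [(0, 0, 0), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 1)]
```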

DNA Extraction

Strands of type α are called α-strands (or instances of α).

An α-strand with α including γ as a substring is called a γ-superstrand (α is a γ-superstring).

Problem: extract all the γ-superstrands of a pool P.

A Formulation of the DNA Extraction Problem

Given an input pool P of heterogeneous DNA strands with the same length and with the same prefix and suffix, and given a string γ, provide an output pool P[γ] such that all and only the types of the γ-superstrands of P are represented in P[γ].

In other words, extraction of the γ-superstrands of P means providing a pool P[γ] such that, for any two strings φ, ψ:

$\varphi\gamma\psi \in P \iff \varphi\gamma\psi \in P[\gamma]$

i.e., the strings represented in P[γ] are all and only the γ-superstrings belonging to P.
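The “all and only” condition splits into two checks, which match the correctness and completeness criteria used later in the experimental check. This sketch works on the set of types represented in a pool; the pool strings and the choice γ = CG are hypothetical.

```python
from typing import Iterable, Set

def superstrings(pool: Iterable[str], gamma: str) -> Set[str]:
    """The gamma-superstrings represented in the pool: types containing gamma."""
    return {alpha for alpha in pool if gamma in alpha}

def is_correct(pool: Iterable[str], gamma: str, output: Iterable[str]) -> bool:
    """'Only': every type in the output is a gamma-superstring of P."""
    return set(output) <= superstrings(pool, gamma)

def is_complete(pool: Iterable[str], gamma: str, output: Iterable[str]) -> bool:
    """'All': every gamma-superstring of P is represented in the output."""
    return superstrings(pool, gamma) <= set(output)

# Hypothetical pool: same length, same prefix AAA and suffix TTT.
P = ["AAACCGTTT", "AAAGGCTTT", "AAACGGTTT"]
print(sorted(superstrings(P, "CG")))  # → ['AAACCGTTT', 'AAACGGTTT']
```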

Cross Pairing PCR

XPCR, for short

XPCR provides an efficient method for affix concatenation of double strands (Head’s null context splicing rule).

N.B. Genome sequencing is related to affix concatenation closure.
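At the string level, affix concatenation of two strands sharing a segment yields one strand in which the shared segment appears once, so the product length is the sum of the two lengths minus the overlap. That is the same bookkeeping as in the RhoA experiment reported in this deck (582 + 253 − 229 = 606 bp). The function `affix_concat` is a hypothetical, idealized model that ignores the melting, annealing, and extension steps.

```python
def affix_concat(left: str, right: str, overlap: int) -> str:
    """Idealized XPCR recombination: `left` ends with the same `overlap` bases that
    `right` starts with (the shared segment); the product contains that segment once."""
    if not 0 < overlap <= min(len(left), len(right)):
        raise ValueError("overlap must be between 1 and the length of the shorter strand")
    if left[-overlap:] != right[:overlap]:
        raise ValueError("the strands do not share the given overlap")
    return left + right[overlap:]

product_strand = affix_concat("ACGTTGCA", "TGCAGGAT", 4)
print(product_strand, len(product_strand))  # → ACGTTGCAGGAT 12  (8 + 8 - 4)
```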

[Figures (two slides): melting + hybridization, then polymerase extension]

[Figure: two linear amplifications followed by an exponential amplification]

XPCR was tested in many different situations, in pools generated by recombination of 22 strands with lengths between 10 and 20.

RhoA XPCR

[Figure: electrophoresis gel]

Lane 2: RhoA of 582 bp
Lane 3: … of 253 bp
Lane 4: XPCR of 582 + 253 − 229 = 606 bp (starts at position −229 of RhoA)

XPCR DNA Extraction

XPCR-Extract(P, γ)
    L := length(P), R1 := ∅, R2 := ∅
    for each n ∈ L do
        Q := separate(P, n)
        P := infix(Q, α, β)
        (P1, P2) := split(P)
        P1 := PCR(P1, α, mir(γ))
        for each m < n do R1 := R1 + separate(P1, m)
        P2 := PCR(P2, γ, mir(β))
        for each m < n do R2 := R2 + separate(P2, m)
        Q := mix(R1, R2)
        Q := PCR(Q, α, mir(β))
        Q := separate(Q, n + |α| + |β|)
    output Q

Experimental Check

Consider a pool P of …-strands that are either γ-superstrands or γ′-superstrands, and where all γ-superstrands are either γ1-superstrands, γ2-superstrands, or γ3-superstrands … (γ, γ′, γ1, γ2, γ3, … are 15 bp long).

Experimental Check

Our extraction is correct and complete in the sense that:

1. XPCR extraction selected only γ-superstrands;
2. XPCR extraction selected all kinds of γ-superstrands (γ1-, γ2-, γ3-, … superstrands).

Gamma Extraction

[Figure: electrophoresis gel]

Lane 2: … strands of 120 bp (γ of 15 bp)
Lane 3: … of 45 bp
Lane 4: XPCR … and … 150 bp
Lane 5: PCR(γ, a.s.) (γ at −45)
Lane 6: PCR(γ′, a.s.)
Lane 7: PCR(γ1, a.s.) (γ1 at −125)
Lane 8: PCR(γ2, a.s.) (γ2 at −75)

Applications

- XPCR in generation of solution spaces
- XPCR in in vitro mutagenesis
- XPCR in gene extraction

XPCR Mutagenesis

XPCR-Mutagenesis(P, , ) =
1. let P : {<>}
2. input Q : {<[−20, −1] [1, 20]>}
3. (P1, P2) := split(P)
4. P1 := PCR(P1, [1, 20], mir([−18, −1]))
5. P2 := PCR(P2, [1, 20], mir([−20, −1]))
6. P1 := separate(P1, | |)
7. P2 := separate(P2, | |)
8. P1 := mix(P1, Q)
9. P1 := PCR(P1, [1, 18], mir([1, 20]))
10. P1 := separate(P1, | | + | | + 20)
11. P := mix(P1, P2)
12. P := PCR(P, [1, 20], mir([−20, −1]))
13. P := separate(P, | | + | | + | |)
14. output P

XPCR Mutagenesis

Figure 10: Electrophoresis results

Lane 1: molecular size marker ladder (100 bp)
Lane 2: amplification of strand … (230 bp)
Lane 3: amplification of strand … (229 bp)
Lane 4: amplification of strand … [−18, −1] … [1, 20] (188 bp)
Lane 5: cross pairing amplification of … and … [−18, −1] … [1, 20] (400 bp)
Lane 6: cross pairing amplification of … and … [1, 20] (609 bp)
Lane 7: RhoAwt (582 bp)
Lane 8: positive control by PCR(…, … [−20, −1]) (354 bp)

Ongoing Research

- XPCR cloning
- Dry DNA computing