DNA Structure Notation Operations

1

DNA Structure Notation Operations

Vincenzo Manca

Dipartimento di Informatica

Università di Verona

2

10 Years of Molecular Computing

1994 Adleman’s Experiment *
1995 Lipton’s Model *
1996 Int. Conf. on Math. Linguistics (Marcus)
1997 Mangalia (Paun, Head)
1998 MFCS Brno (Molecular Computing)
1999 (Paun’s WMC)
2000 DNA6 Leiden *
2001 DNA7 Tampa (FL): 3-SAT
2002 DNA8 Sapporo: DNA Duplication
2004 DNA10 Milan: XPCR Extraction
2005 DNA11 Ontario: XPCR Recombination

3

DNA Computing Motto

Problem: Data and Requirements
Algorithm: Solutions

Encode data by DNA strands
Encode algorithms by biotech procedures
Decode final strands as solutions

4

A General Schema of Combinatorial Problems

A set of Requirements for “assignments”, that is, 0/1 sequences of some length n

The Space of possible solutions has 2^n elements, but only some of them satisfy the requirements

Encode assignments by DNA strands

Encode requirements as biotech protocols that filter the strands encoding the true solutions
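The schema above can be mimicked in software as a generate-and-filter loop. The following is only a sketch under stated assumptions: assignments are Python tuples instead of DNA strands, and the two requirements are hypothetical examples, not taken from the slides.

```python
from itertools import product

def solution_space(n):
    """All 0/1 assignments of length n; there are 2**n of them."""
    return list(product((0, 1), repeat=n))

def filter_space(space, requirements):
    """Apply one filtering pass per requirement, keeping the assignments that pass."""
    for requirement in requirements:
        space = [a for a in space if requirement(a)]
    return space

# Hypothetical requirements on assignments (x0, x1, x2)
requirements = [
    lambda a: bool(a[0] or a[1]),       # x0 OR x1
    lambda a: not (a[1] and a[2]),      # NOT (x1 AND x2)
]

space = solution_space(3)
solutions = filter_space(space, requirements)
```

In the DNA setting each filtering pass would be a laboratory protocol acting on the whole pool at once, which is where the parallelism comes from.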

5

Space Generation: in linear time

Solution Extraction: in linear time

!!!

6

New Trends in DNAC

o DNA Self Assembly (Seeman, Winfree, …)

o DNA Automata (Shapiro)

o DNA Algorithms ==> new biotech protocols

7

Biotech Protocols

Algorithms

DNA Computing / Computing DNA

A change of perspective

8

In the search for implementing algorithms on DNA, general algorithmic principles are discovered in fundamental biomolecular processes.

9

[Figure: chemical structure of nucleotides, with sugar carbons labeled 1’ to 5’, phosphate group (P), and base (B)]

Nucleotides: ~330 Dalton

1 Dalton ≈ 1.66 × 10^-24 g

1 g of H ≈ 6.02 × 10^23 atoms

1’ to 1’ distance ≈ 1 nm

A few grams of DNA = the amount of all electronic information stored in all the world
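As a rough worked check of the orders of magnitude involved, the sketch below counts how many nucleotides fit in one gram. It assumes 1.66 × 10^-24 g per Dalton and, as a simplifying assumption not stated on the slide, 2 bits of information per base (four possible bases).

```python
DALTON_IN_GRAMS = 1.66e-24   # grams per Dalton
NUCLEOTIDE_MASS_DA = 330     # approximate nucleotide mass, from the slide
BITS_PER_BASE = 2            # assumption: 4 bases -> 2 bits each

# Number of nucleotides in one gram of DNA
nucleotides_per_gram = 1 / (NUCLEOTIDE_MASS_DA * DALTON_IN_GRAMS)

# Raw information capacity of one gram, ignoring any redundancy or overhead
bytes_per_gram = nucleotides_per_gram * BITS_PER_BASE / 8
```

Under these assumptions one gram holds on the order of 10^21 nucleotides, that is, a few hundred exabytes of raw capacity.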

10

Strings

Strings over an alphabet are sequences of symbols of the alphabet:

abbabbba

On strings an associative concatenation operation is defined:

α(βγ) = (αβ)γ = αβγ

A language L is a set of strings

11

DNA Sequences are Mobile Double Strings

B = {A, T, C, G}

B* = strings over B

α[i,j] (substring of α from position i to position j)

|α| (length of α)

s is an α-strand, or s : α, or type(s) = α

α : n, or mult(α) = n

12

Complementation c (involutive)
Reverse rev (involutive)
Mirror mir (involutive)

mir(α) = rev(c(α))

Reverse and Complementation commute

Hybridization (notation: ||, ] [)

Pairing
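The three operations on single strings can be sketched directly on Python strings. This is an illustrative sketch: the function `hybridizes` is a hypothetical helper that models only exact, full-length hybridization of a strand with its mirror.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def c(s):
    """Complementation: swap A<->T and C<->G, position by position."""
    return "".join(COMPLEMENT[base] for base in s)

def rev(s):
    """Reverse: read the string backwards."""
    return s[::-1]

def mir(s):
    """Mirror: reverse of the complement (the reverse complement)."""
    return rev(c(s))

def hybridizes(a, b):
    """Hypothetical helper: exact full-length hybridization, b is the mirror of a."""
    return b == mir(a)
```

For example, `mir("AACG")` gives `"CGTT"`, and applying any of the three operations twice returns the original string.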

13

B = {A, T, C, G}

BB* = double strings over B (fraction notation)

Axiom: = rev() rev()

ext

Overlap

Overlapping concatenation Z

-> up, <- down; -> ->/ = ->/

14

Bilinearity
Complementarity
Antiparallelism

The marvelous form

[Figure: strand ends labeled 5’ and 3’]

15

Hybridization: α || mir(α)

] [ <==> , mir()

] [ <==> ] [ for some

Pairing: ] [ ==> / rev()

16

Notation

/ = = ->

α / mir(α) = <α>

/ = rev() = <-

==> <α> = <mir(α)>

BB* is the set of DNA strings, BB* ⊇ B*

A pool P of DNA molecules is a multiset of strands

i) Set of strands typed by strings

ii) Set of strings with multiplicities

P = {s1 : α1, s2 : α2, …}

P = {α1 : n1, α2 : n2, …}

mult_P(α1) = n1, mult_P(α2) = n2

s ∈ P

α ∈ P

19

Types of DNA Pools are Languages of BB*

Type(T) = {α ∈ BB* | s : α, s ∈ T}
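A pool viewed as a set of strings with multiplicities maps naturally onto a counting dictionary. The sketch below is a simplification that handles single strings only (B*, not double strings); the example strands and the helper names `mult`, `type_of`, `mix` and `separate` are illustrative assumptions.

```python
from collections import Counter

# Hypothetical pool: each key is a strand type, each value its multiplicity
pool = Counter({"ACGT": 3, "TTGA": 1, "ACGTAC": 2})

def mult(pool, alpha):
    """Multiplicity of type alpha in the pool (0 if absent)."""
    return pool[alpha]

def type_of(pool):
    """Type of the pool: the language of strand types present in it."""
    return {alpha for alpha, n in pool.items() if n > 0}

def mix(p, q):
    """Mix two pools: multiplicities add up."""
    return p + q

def separate(pool, n):
    """Keep only strands of length n (an idealized gel electrophoresis)."""
    return Counter({a: k for a, k in pool.items() if len(a) == n})
```

With this representation, an operation on test tubes becomes a function from pools to pools, and its effect on types can be read off with `type_of`.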

20

Test Tube Operations in DNAC

Denature (Melting)
Renature (Hybridization, Annealing)
Mix
Split
Fish (by Affinity)
Remove
Length
Separate (Gel Electrophoresis)
Ligate (Ligase)
Extend (Polymerase)
Synthesize (Oligos)
Infix

21

STRAND HYBRIDIZATION

24

Polymerase Extension

25

DNA Ligase

Ligase joins 5' phosphate to 3' hydroxyl

26

Ligase Catenation

27

GEL ELECTROPHORESIS: Separation of DNA fragments

[Figure labels: Buffer, Gel, Electrodes, Samples, Slower]

28

More Complex Operations

Amplification (PCR)

Sequencing

Restriction (R. Enzymes)

Cloning (Plasmid Transfection)

29

PCR: Polymerase Chain Reaction

30

PCR with 3’ sticky end

[Figure labels: Exponential, Linear; long, short]

31

PCR Lemma

Given a pool P of type {} and two primers , that hybridize with and respectively ( ] [ ).

If the extensions e1 and e2 of the two primers with the relative single strands overlap, then an exponential amplification of strands happens which has the blunt form:

<e1 Z ext e2>

which appears within the first two steps.

32

T of type L --> Operation --> T’ of type L’

33

Mathematically, Test Tube Operations

Type(T) = L means that the types of the strands of T constitute the language L

Given some test tubes as arguments with some types, operations provide as results test tubes with other types

35

DNA Test Tube Machine

Register Machines where:

- Registers are Test Tubes (multisets of strands instead of numbers)

- DNA Test Tube operations (instead of arithmetic operations)

36

Adleman’s Problem

Given a graph (of seven nodes)

Find (if there are any) the paths between two given nodes (0, 6)

passing exactly once through every node (Hamiltonian paths)

37

Adleman-Lipton’s Extract Model in Combinatorial Problems

The Generation of all possible solutions in linear time

The Extraction of true solutions in linear time

Extraction is performed in a number of sub-steps, each of which selects all the strands that include a sub-strand of a given type

38

Adleman’s Graph

39

Adleman’s Encoding

Node i = α_i β_i

Arc ij = mir(β_i α_j)

|α_i| = |β_i| = 10,  i, j = 1, …, 7

[Figure labels: Ai, Bi, Bj, Bj’, Ai’; ic, jc]

40

Adleman’s Algorithm

Generation of Hamiltonian paths from v1 to v7

Generate paths of G (hybridization/ligation)
Perform PCR of primers 0, mir(6)
Separate paths of length 140 (7 x 20)
for j := 1 to 7 do
    Select strands where j occurs
output remaining strands
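The filtering steps of the algorithm can be imitated on lists of node indices. This is a sketch under stated assumptions: paths are Python lists instead of ligated strands, the random hybridization/ligation step is replaced by exhaustive enumeration of walks, and the four-node graph in the usage lines is a hypothetical example, not Adleman’s graph.

```python
def adleman(edges, n, start, end):
    """Return the Hamiltonian paths from start to end in a directed graph on nodes 0..n-1."""
    # Step 1: generate walks of G with up to n nodes
    # (stand-in for random hybridization and ligation).
    frontier = [[v] for v in range(n)]
    walks = list(frontier)
    for _ in range(n - 1):
        frontier = [w + [j] for w in frontier for (i, j) in edges if i == w[-1]]
        walks += frontier
    # Step 2: keep walks with the right endpoints (stand-in for PCR with the two primers).
    walks = [w for w in walks if w[0] == start and w[-1] == end]
    # Step 3: keep walks with exactly n nodes (stand-in for separation by length).
    walks = [w for w in walks if len(w) == n]
    # Step 4: for each node, select the walks in which it occurs.
    for v in range(n):
        walks = [w for w in walks if v in w]
    return walks

# Hypothetical example graph with 4 nodes
example_edges = [(0, 1), (1, 2), (2, 3), (0, 2), (2, 1), (1, 3)]
example_paths = adleman(example_edges, 4, 0, 3)
```

A walk with n nodes in which every node occurs must visit each node exactly once, which is why steps 3 and 4 together leave only Hamiltonian paths.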

41

Mix and Split Method

Generation of the solution space of N variables

Merge X1 and ¬X1 in a tube T
Split T into A and B
For J := 2 To N
    Extend strands of A with XJ
    Extend strands of B with ¬XJ
    Merge A and B into T
    Split T into A and B
Merge A and B
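A small simulation shows why N - 1 rounds of extend, merge and split are enough to produce all 2^N assignments. It is an idealized sketch: a split is modeled as each half receiving a copy of every strand type, and the labels `X1`, `~X1` (with `~` standing for negation) are illustrative names.

```python
def mix_and_split(n):
    """Generate all 2**n assignments of n variables as lists of literal labels."""
    # Tube T initially holds the two one-variable strands X1 and ~X1.
    tube = [["X1"], ["~X1"]]
    for j in range(2, n + 1):
        # Split: idealized so that both halves contain every strand type of T.
        half_a, half_b = list(tube), list(tube)
        # Extend the strands of A with Xj and those of B with ~Xj.
        half_a = [strand + [f"X{j}"] for strand in half_a]
        half_b = [strand + [f"~X{j}"] for strand in half_b]
        # Merge A and B back into T.
        tube = half_a + half_b
    return tube

space3 = mix_and_split(3)
```

The number of laboratory rounds grows linearly with the number of variables, while the number of distinct strand types doubles at every round.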

42

Lipton’s Algorithm 3-Sat(N, M)

o Generate N-space solutions in T
o For J = 1 To M
    T1 := Extract[T, L(1,J)]
    T := T - T1
    T2 := Extract[T, L(2,J)]
    T := T - T2
    T3 := Extract[T, L(3,J)]
    T := Merge(T1, T2)
    T := Merge(T, T3)
o Detect T
o if T ≠ ∅, then take a clone and sequence it (Solution)
o else “Unsolvable Problem”
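The extract/merge loop can be sketched on tuples of truth values. The representation is an assumption made for illustration: a literal L(k,J) is a pair `(variable_index, required_value)`, a clause is a triple of literals, and the tube holds one copy of each assignment.

```python
from itertools import product

def extract(tube, literal):
    """Select the assignments in which the given literal is true."""
    variable, value = literal
    return [a for a in tube if a[variable] == value]

def three_sat(n, clauses):
    """Return the assignments of n variables that satisfy every 3-literal clause."""
    tube = list(product((False, True), repeat=n))   # N-space of solutions
    for clause in clauses:
        kept, rest = [], tube
        for literal in clause:
            hit = extract(rest, literal)             # Tk := Extract[T, L(k,J)]
            rest = [a for a in rest if a not in hit] # T := T - Tk
            kept += hit                              # merged at the end of the clause
        tube = kept
    return tube

# (x0 OR x1 OR x2) AND (NOT x0 OR NOT x1 OR NOT x2)
satisfiable = three_sat(3, [
    ((0, True), (1, True), (2, True)),
    ((0, False), (1, False), (2, False)),
])
# x0 AND NOT x0, written as two degenerate clauses
unsatisfiable = three_sat(1, [((0, True),) * 3, ((0, False),) * 3])
```

An empty final tube corresponds to the “Unsolvable Problem” branch; any surviving assignment is a solution.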

DNA Extraction

Strands of type α are called α-strands (or instances of α)

An α-strand with α including γ as substring is called a γ-superstrand (α is a γ-superstring)

Problem: Extract all the γ-superstrands of a pool P

A Formulation of the DNA Extraction Problem

Given an input pool P of heterogeneous DNA strands with the same length and with the same prefix and suffix, and given a string γ

Provide an output pool P[γ] such that all and only the types of γ-superstrands of P are represented in P[γ].

In other words, extraction of γ-superstrands of P means

To provide a pool P[γ] such that for any two strings φ, ψ:

φγψ ∈ P <==> φγψ ∈ P[γ]

i.e. the strings represented in P[γ] are all and only the γ-superstrings belonging to P.
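The specification of extraction, as opposed to the laboratory procedure that realizes it, fits in a few lines. This sketch assumes single strings only; the example pool and the name `extract_superstrands` are hypothetical.

```python
from collections import Counter

def extract_superstrands(pool, gamma):
    """Return the sub-pool of strands whose type contains gamma as a substring."""
    return Counter({alpha: n for alpha, n in pool.items() if gamma in alpha})

# Hypothetical pool: same length, same prefix "AA" and suffix "TT"
example_pool = Counter({"AAGGCTT": 2, "AACCCTT": 5, "AAGGGTT": 1})
extracted = extract_superstrands(example_pool, "GG")
```

Only the set of types matters for the specification: an actual protocol based on amplification would change the multiplicities.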

46

Cross Pairing PCR

In short: XPCR

47

XPCR provides an efficient method for affix concatenation of double strands (Head’s null context splicing rule)

N.B. Genome Sequencing is related to Affix Concatenation Closure
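At the level of strings, the effect of the rule can be sketched as follows. The reading used here is an assumption: from two strings xγy and zγw that share the substring γ, the rule produces xγw. The function names are illustrative and the code ignores double-strand structure.

```python
def null_context_splice(u, v, gamma):
    """All strings x + gamma + w such that u = x gamma y and v = z gamma w."""
    results = set()
    i = u.find(gamma)
    while i != -1:
        j = v.find(gamma)
        while j != -1:
            # Prefix of u before gamma, then v from its occurrence of gamma onward
            results.add(u[:i] + v[j:])
            j = v.find(gamma, j + 1)
        i = u.find(gamma, i + 1)
    return results

def overlap_concatenate(left, right, gamma):
    """Join a string ending in gamma with one starting with gamma, sharing gamma once."""
    if left.endswith(gamma) and right.startswith(gamma):
        return left + right[len(gamma):]
    return None
```

The length of an overlap concatenation is the sum of the two lengths minus the length of the shared part, so `overlap_concatenate("AAGG", "GGAC", "GG")` has length 4 + 4 - 2 = 6.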

[Figure: XPCR steps: melting + hybridization; polymerase extension; linear amplification; exponential amplification]

53

XPCR was tested in many different situations

in pools generated by recombination of 22 strands of lengths between 10 - 20

RhoA XPCR

Lane 2: RhoA of 582 bp
Lane 3: of 253 bp
Lane 4: XPCR of 582 + 253 - 229 = 606 bp
Starts at position -229 of RhoA

55

XPCR DNA Extraction

XPCR-Extract(P, γ)
L := length(P), R1 := ∅, R2 := ∅
For each n ∈ L do
    Q := separate(P, n)
    P := infix(Q, , )
    (P1, P2) := split(P)
    P1 := PCR(P1, , )
    For each m < n do R1 := R1 + separate(P1, m)
    P2 := PCR(P2, , mir())
    For each m < n do R2 := R2 + separate(P2, m)
    Q := mix(R1, R2)
    Q := PCR(Q, , mir())
    Q := separate(Q, n + || + ||)
Output Q

56

Experimental Check

Consider a pool P of …-strands that are either γ-superstrands or γ’-superstrands, and where all γ-superstrands are either 1-superstrands, 2-superstrands, or 3-superstrands … (γ, γ’, 1, 2, 3 …: 15 bp).

57

Experimental Check

Our extraction is correct and complete in the sense that:

1. XPCR-Extraction selected only γ-superstrands
2. XPCR-Extraction selected all kinds of γ-superstrands (1, 2, 3 … -superstrands).

58

Gamma Extraction

Lane 2: … strands of 120 bp ( 15 bp)
Lane 3: … of 45 bp
Lane 4: XPCR … and … 150 bp
Lane 5: PCR(γ, a.s.) (γ at -45)
Lane 6: PCR(γ’, a.s.)
Lane 7: PCR(1, a.s.) (1 at -125)
Lane 8: PCR(2, a.s.) (2 at -75)

59

Applications

o XPCR in generation of space solutions

o XPCR in in vitro mutagenesis

o XPCR in gene extraction

61

XPCR Mutagenesis

XPCR-Mutagenesis(P, , ) =
1. let P : {<>}
2. input Q : {<[−20,−1] [1, 20]>}
3. (P1, P2) := split(P)
4. P1 := PCR(P1, [1, 20], mir([−18,−1]))
5. P2 := PCR(P2, [1, 20], mir([−20,−1]))
6. P1 := separate(P1, | |)
7. P2 := separate(P2, | |)
8. P1 := mix(P1, Q)
9. P1 := PCR(P1, [1, 18], mir([1, 20]))
10. P1 := separate(P1, || + | | + 20)
11. P := mix(P1, P2)
12. P := PCR(P, [1, 20], mir([−20,−1]))
13. P := separate(P, || + || + ||)
14. output P

62

XPCR Mutagenesis

Figure 10: Electrophoresis results
Lane 1: molecular size marker ladder (100 bp)
Lane 2: amplification of strand (230 bp)
Lane 3: amplification of strand (229 bp)
Lane 4: amplification of strand [-18, -1] [1,20] (188 bp)
Lane 5: cross pairing amplification of and [-18, -1] [1,20] (400 bp)
Lane 6: cross pairing amplification of and [1,20] (609 bp)
Lane 7: RhoAwt (582 bp)
Lane 8: positive control by PCR( , [-20, -1]) (354 bp)

63

Ongoing Research

XPCR Cloning

Dry DNA Computing
