DNA Structure Notation Operations
1
DNA Structure Notation Operations
Vincenzo Manca
Dipartimento di Informatica
Università di Verona
2
10 Years of Molecular Computing
1994 Adleman’s Experiment *
1995 Lipton’s Model *
1996 Int. Conf. on Math. Linguistics (Marcus)
1997 Mangalia (Paun, Head)
1998 MFCS Brno (Molecular Computing)
1999 (Paun’s WMC)
2000 DNA6 Leiden *
2001 DNA7 Tampa (FL): 3-SAT
2002 DNA8 Sapporo: DNA Duplication
2004 DNA10 Milan: XPCR Extraction
2005 DNA11 Ontario: XPCR Recombination
3
DNA Computing Motto
Problem: Data and Requirements
Algorithm: Solutions
Encode data by DNA strands
Encode algorithms by biotech procedures
Decode final strands as solutions
4
A General Schema of Combinatorial Problems
A set of requirements on "assignments", that is, 0/1 sequences of some length n
The space of possible solutions has 2^n elements, but only some of them satisfy the requirements
Encode assignments by DNA strands
Encode requirements as biotech protocols that filter the strands encoding the true solutions
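As a purely in-silico analogue of this schema (a sketch, not part of the original slides), the Python below enumerates all 2^n assignments and filters them through requirements modeled as Boolean predicates; the names `solution_space` and `filter_solutions` are hypothetical. On a computer both stages cost time exponential in n, which is what the DNA approach tries to trade for a linear number of lab steps.

```python
from itertools import product
from typing import Callable, List, Tuple

Assignment = Tuple[int, ...]
Requirement = Callable[[Assignment], bool]


def solution_space(n: int) -> List[Assignment]:
    """All 2**n sequences over {0, 1} of length n."""
    return list(product((0, 1), repeat=n))


def filter_solutions(n: int, requirements: List[Requirement]) -> List[Assignment]:
    """Keep only the assignments that satisfy every requirement."""
    return [a for a in solution_space(n) if all(req(a) for req in requirements)]


if __name__ == "__main__":
    # Hypothetical requirements: first bit is 1 and exactly two bits are 1.
    reqs = [lambda a: a[0] == 1, lambda a: sum(a) == 2]
    print(filter_solutions(3, reqs))  # [(1, 0, 1), (1, 1, 0)]
```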
5
Space Generation: in linear time
Solution Extraction: in linear time
!!!
6
New Trends in DNAC
- DNA Self-Assembly (Seeman, Winfree, …)
- DNA Automata (Shapiro)
- DNA Algorithms ==> new biotech protocols
7
Biotech Protocols
Algorithms
DNA Computing / Computing DNA
A change of perspective
8
In searching for ways to implement algorithms on DNA, one discovers general algorithmic principles in fundamental biomolecular processes.
9
[Figure: structure of nucleotides: sugar carbons 1' to 5', phosphate group P, base B]
Nucleotides: ~330 Dalton
1 Dalton ≈ 1.66 × 10⁻²⁴ g
1 g of H ≈ 6.02 × 10²³ atoms
1'-1' distance ≈ 1 nm
A few grams of DNA could store as much information as all the electronic information stored in the world
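A back-of-the-envelope check of the storage claim, assuming an idealized 2 bits per nucleotide (four bases), the ~330 Da per nucleotide quoted above, and no overhead for indexing or error correction; the figures are rough upper bounds, not measured capacities.

```python
DALTON_G = 1.66054e-24   # grams per dalton
NUCLEOTIDE_DA = 330      # approximate nucleotide mass in daltons (slide value)
BITS_PER_NUCLEOTIDE = 2  # idealized: 4 bases encode log2(4) = 2 bits


def nucleotides_per_gram() -> float:
    """Number of nucleotides in one gram of DNA."""
    return 1.0 / (NUCLEOTIDE_DA * DALTON_G)


def bytes_per_gram() -> float:
    """Idealized information content of one gram, in bytes."""
    return nucleotides_per_gram() * BITS_PER_NUCLEOTIDE / 8


if __name__ == "__main__":
    print(f"{nucleotides_per_gram():.2e} nucleotides per gram")  # 1.82e+21
    print(f"{bytes_per_gram() / 1e18:.0f} exabytes per gram")    # 456
```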
10
Strings
Strings over an alphabet are sequences of symbols of the alphabet:
abbabbba
On strings an associative concatenation operation is defined:
(αβ)γ = α(βγ) = αβγ
A language L is a set of strings
11
DNA Sequences are Mobile Double Strings
B = {A, T, C, G}
B* = strings over B
α[i, j]: substring of α from position i to position j
|α|: length of α
s is an α-strand, or s : α, or type(s) = α; α : n, or mult(α) = n
12
Complementation c (involutive)
Reverse rev (involutive)
Mirror mir (involutive)
mir(α) = rev(c(α))
Reverse and complementation commute: rev(c(α)) = c(rev(α))
Hybridization: α || β, ]α β[
Pairing
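The three operations can be sketched on strings over B as follows (hypothetical helper names `c`, `rev`, `mir`); the checks in the main block confirm involutivity and that reverse and complementation commute on a sample string.

```python
COMPLEMENT = str.maketrans("ATCG", "TAGC")


def c(alpha: str) -> str:
    """Watson-Crick complementation, letter by letter (A<->T, C<->G)."""
    return alpha.translate(COMPLEMENT)


def rev(alpha: str) -> str:
    """Reverse of a string."""
    return alpha[::-1]


def mir(alpha: str) -> str:
    """Mirror: mir(alpha) = rev(c(alpha)), the reverse complement."""
    return rev(c(alpha))


if __name__ == "__main__":
    alpha = "AACGT"
    print(c(alpha), rev(alpha), mir(alpha))  # TTGCA TGCAA ACGTT
    assert all(f(f(alpha)) == alpha for f in (c, rev, mir))  # involutive
    assert rev(c(alpha)) == c(rev(alpha))                     # commute
```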
13
B = {A, T, C, G}; BB* = double strings over B, in fraction notation α/β
Axiom: α/β = rev(β)/rev(α)
ext
Overlap: ----xx----
Overlapping concatenation Z (→ up strand, ← down strand)
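The diagram for Z did not survive extraction. Assuming the usual reading of overlapping concatenation, where a string ending with γ is joined to a string starting with γ so that γ appears once (δγ Z γη = δγη), a minimal sketch is below. Choosing the longest overlap and the `min_overlap` threshold are assumptions of this sketch, not part of the slide.

```python
from typing import Optional


def overlap_concat(alpha: str, beta: str, min_overlap: int = 1) -> Optional[str]:
    """Join alpha = delta+gamma and beta = gamma+eta into delta+gamma+eta.

    gamma is taken to be the longest suffix of alpha that is also a prefix
    of beta (an assumption of this sketch); returns None if that overlap is
    shorter than min_overlap.
    """
    for k in range(min(len(alpha), len(beta)), min_overlap - 1, -1):
        if alpha.endswith(beta[:k]):
            return alpha + beta[k:]
    return None


if __name__ == "__main__":
    print(overlap_concat("AACGT", "CGTTA"))  # AACGTTA (overlap "CGT")
```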
14
Bilinearity, Complementarity, Antiparallelism
The marvelous form
[Figure: double strand with 5' and 3' ends marked]
15
Hybridization: α || β ⟺ β = mir(α)
]α β[ ⟺ γ is a substring of α and mir(γ) is a substring of β, for some γ
Pairing: ]α β[ ⟹ α/rev(β)
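A string-level sketch of these relations, assuming perfect Watson-Crick matching; the threshold `k` on the length of the shared stretch γ is a hypothetical parameter (real hybridization also depends on temperature and sequence composition, which this ignores). Double strings are returned as (upper, lower) pairs.

```python
from typing import Tuple


def c(s: str) -> str:
    return s.translate(str.maketrans("ATCG", "TAGC"))


def mir(s: str) -> str:
    return c(s)[::-1]


def hybridize(alpha: str, beta: str) -> bool:
    """alpha || beta: full hybridization, i.e. beta = mir(alpha)."""
    return beta == mir(alpha)


def partially_pair(alpha: str, beta: str, k: int) -> bool:
    """]alpha beta[ (sketch): some gamma of length k occurs in alpha
    while mir(gamma) occurs in beta."""
    return any(mir(alpha[i:i + k]) in beta for i in range(len(alpha) - k + 1))


def pair(alpha: str, beta: str) -> Tuple[str, str]:
    """Pairing ]alpha beta[ ==> alpha/rev(beta), as (upper, lower)."""
    return alpha, beta[::-1]


if __name__ == "__main__":
    print(hybridize("AACGT", "ACGTT"), pair("AACGT", "ACGTT"))
```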
16
Notation (λ: empty string)
α/λ = α = →α
α/c(α) = <α> (its lower strand, read 5'→3', is mir(α))
λ/α = rev(α) = ←α
<α> = <mir(α)>
BB* is the set of DNA strings (double strings α/β with α, β ∈ B*)
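To make the identity <α> = <mir(α)> checkable, the sketch below represents a double string α/β as an (upper, lower) tuple (an assumption of this sketch) and picks a canonical representative under the axiom α/β = rev(β)/rev(α). The helper names are hypothetical.

```python
from typing import Tuple

DoubleString = Tuple[str, str]  # (upper, lower), lower written under upper


def c(s: str) -> str:
    return s.translate(str.maketrans("ATCG", "TAGC"))


def mir(s: str) -> str:
    return c(s)[::-1]


def flip(d: DoubleString) -> DoubleString:
    """Axiom alpha/beta = rev(beta)/rev(alpha): the same molecule turned around."""
    upper, lower = d
    return lower[::-1], upper[::-1]


def canonical(d: DoubleString) -> DoubleString:
    """Pick one representative of {d, flip(d)} so equal molecules compare equal."""
    return min(d, flip(d))


def blunt(alpha: str) -> DoubleString:
    """<alpha> = alpha/c(alpha): fully paired double string with upper strand alpha."""
    return alpha, c(alpha)


if __name__ == "__main__":
    print(canonical(blunt("AACGT")) == canonical(blunt(mir("AACGT"))))  # True
```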
17
A pool P of DNA molecules is a multiset of strands:
i) a set of strands typed by strings;
ii) a set of strings with multiplicities.
P = {s1 : α1, s2 : α2, …}
P = {α1 : n1, α2 : n2, …}
mult_P(α1) = n1, mult_P(α2) = n2
s ∈ P, α ∈ P
19
Types of DNA Pools are Languages of BB*
Type(T) = {α ∈ BB* | s : α, s ∈ T}
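A pool can be sketched in Python as a `collections.Counter` mapping strand types to multiplicities; `make_pool`, `mult` and `type_of` are hypothetical helpers mirroring mult_P and Type.

```python
from collections import Counter
from typing import Set

Pool = Counter  # strand type (string) -> multiplicity


def make_pool(*strands: str) -> Pool:
    """Build a pool from individual strands; repeats raise the multiplicity."""
    return Counter(strands)


def mult(pool: Pool, alpha: str) -> int:
    """mult_P(alpha): number of alpha-strands in P (0 if absent)."""
    return pool[alpha]


def type_of(pool: Pool) -> Set[str]:
    """Type(P): the language of strings with at least one strand in P."""
    return {alpha for alpha, n in pool.items() if n > 0}


if __name__ == "__main__":
    P = make_pool("ACG", "ACG", "TTA")
    print(mult(P, "ACG"), sorted(type_of(P)))  # 2 ['ACG', 'TTA']
```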
20
Test Tube Operations in DNAC
- Denature (Melting)
- Renature (Hybridization, Annealing)
- Mix
- Split
- Fish (by Affinity)
- Remove
- Length
- Separate (Gel Electrophoresis)
- Ligate (Ligase)
- Extend (Polymerase)
- Synthesize (Oligos)
- Infix
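Several of these operations can be idealized as functions on multisets of strands. The sketch below (hypothetical names; pools as `collections.Counter`) treats Split as exact halving of every multiplicity, Separate as exact selection by length, and Fish as selection of strands containing a given substring; all three are simplifications of the real wet-lab steps.

```python
from collections import Counter
from typing import Tuple

Pool = Counter  # strand type -> multiplicity


def mix(p: Pool, q: Pool) -> Pool:
    """Pour two tubes together: multiplicities add up."""
    return p + q


def split(p: Pool) -> Tuple[Pool, Pool]:
    """Idealized split into two tubes, halving every multiplicity."""
    a = Counter({s: n // 2 for s, n in p.items() if n // 2 > 0})
    b = Counter({s: n - n // 2 for s, n in p.items() if n - n // 2 > 0})
    return a, b


def separate(p: Pool, length: int) -> Pool:
    """Gel electrophoresis idealized as exact selection by length."""
    return Counter({s: n for s, n in p.items() if len(s) == length})


def fish(p: Pool, gamma: str) -> Pool:
    """Affinity capture idealized: in the lab a probe mir(gamma) binds
    strands containing gamma; here we simply test for the substring."""
    return Counter({s: n for s, n in p.items() if gamma in s})
```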
21
Strand Hybridization
22
23
24
Polymerase Extension
25
DNA Ligase
Ligase joins a 5' phosphate to a 3' hydroxyl
26
Ligase Catenation
27
[Figure: gel electrophoresis setup: buffer, gel, two electrodes, samples; longer fragments migrate slower]
Gel Electrophoresis – Separation of DNA Fragments
28
More Complex Operations
- Amplification (PCR)
- Sequencing
- Restriction (Restriction Enzymes)
- Cloning (Plasmid Transfection)
29
PCR: Polymerase Chain Reaction
30
[Figure: PCR cycles with exponential and linear amplification; long and short products]
PCR with 3' Sticky End
31
PCR Lemma
Given a pool P of type {α} and two primers β, γ that hybridize with … and … respectively.
If the extensions e1 and e2 of the two primers along the respective single strands overlap, then an exponential amplification of … strands takes place, with the blunt form:
<e1 Z ext e2>
which appears within the first two steps.
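A simplified in-silico reading of PCR, assuming one template type, exact primer matches, and perfect doubling per cycle: the blunt product starts at the first primer and ends where the mirror of the second primer occurs on the template. This is a sketch, not the lemma's proof; `pcr_product` and `copies_after` are hypothetical names.

```python
from typing import Optional


def mir(s: str) -> str:
    return s.translate(str.maketrans("ATCG", "TAGC"))[::-1]


def pcr_product(template: str, fwd: str, rev_primer: str) -> Optional[str]:
    """Blunt product amplified from `template` with primers fwd and rev_primer.

    fwd matches the template itself; rev_primer anneals to the opposite
    strand, so mir(rev_primer) must occur on the template after fwd.
    """
    start = template.find(fwd)
    if start < 0:
        return None
    end = template.find(mir(rev_primer), start + len(fwd))
    if end < 0:
        return None
    return template[start:end + len(rev_primer)]


def copies_after(cycles: int, initial: int = 1) -> int:
    """Idealized exponential amplification: every cycle doubles the product."""
    return initial * 2 ** cycles


if __name__ == "__main__":
    print(pcr_product("TTACGTAAACCCGGGTT", "ACGT", "CGGG"))  # ACGTAAACCCG
```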
32
[Figure: test tube T of type L → Operation → test tube T' of type L']
33
Test Tube Operations, Mathematically
Type(T) = L means that the types of the strands of T constitute the language L.
Given some test tubes with some types as arguments, test tube operations provide as results test tubes with other types.
34
35
DNA Test Tube Machine
Register machines where:
- registers are test tubes (multisets of strands instead of numbers)
- DNA test tube operations are used (instead of arithmetic operations)
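The register-machine view can be sketched as a tiny interpreter in which registers hold `Counter` pools and each instruction applies one tube operation. The instruction format `(operation, input registers, output register)` and the helper names are assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Pool = Counter
Registers = Dict[str, Pool]
# An instruction: (operation, names of input registers, name of output register)
Instruction = Tuple[Callable[..., Pool], List[str], str]


def run(program: List[Instruction], registers: Registers) -> Registers:
    """Execute tube instructions in order; each writes one output register."""
    for op, inputs, output in program:
        registers[output] = op(*(registers[name] for name in inputs))
    return registers


def mix(p: Pool, q: Pool) -> Pool:
    return p + q


def keep_containing(gamma: str) -> Callable[[Pool], Pool]:
    """Build a one-tube operation selecting strands that contain gamma."""
    return lambda p: Counter({s: n for s, n in p.items() if gamma in s})


if __name__ == "__main__":
    regs = {"A": Counter({"ACGT": 2}), "B": Counter({"TTTT": 1, "ACGT": 1})}
    program = [(mix, ["A", "B"], "T"), (keep_containing("CG"), ["T"], "T")]
    print(run(program, regs)["T"])  # Counter({'ACGT': 3})
```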
36
Adleman’s Problem
Given a graph (of seven nodes),
find (if there are any) the paths from node 0 to node 6
passing exactly once through every node (Hamiltonian paths).
37
Adleman-Lipton Extract Model in Combinatorial Problems
The generation of all possible solutions in linear time
The extraction of true solutions in linear time
Extraction is performed in a number of sub-steps, each of which selects all the strands that include a substrand of a given type.
38
Adleman’s Graph
39
Adleman’s Encoding
Node i = A_i B_i
Arc ij = mir(B_i A_j)
|A_i| = |B_i| = 10, i, j = 1, …, 7
[Figure: arc strand mir(B_i A_j) joining node strands A_i B_i and A_j B_j]
40
Adleman’s Algorithm
Generation of Hamiltonian paths from v1 to v7
Generate paths of G (hybridization/ligation)
Perform PCR with primers …0 and mir(…6)
Separate paths of length 140 (7 × 20)
for J := 1 to 7 do
  Select strands where node J occurs
output remaining strands
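An in-silico sketch of the same pipeline on a hypothetical small directed graph: random hybridization/ligation is replaced by enumerating all walks, PCR by keeping walks with the right endpoints, gel separation by a length check, and affinity selection by node-membership tests. It runs in exponential time and only illustrates the filtering logic; the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]  # node -> successors; every node must be a key
Path = Tuple[int, ...]


def all_walks(graph: Graph, max_nodes: int) -> List[Path]:
    """Stand-in for hybridization/ligation: every walk of up to max_nodes nodes."""
    walks: List[Path] = [(v,) for v in graph]
    frontier = list(walks)
    for _ in range(max_nodes - 1):
        frontier = [w + (u,) for w in frontier for u in graph[w[-1]]]
        walks.extend(frontier)
    return walks


def adleman(graph: Graph, start: int, end: int) -> List[Path]:
    n = len(graph)
    pool = all_walks(graph, n)
    # PCR with start/end primers: keep walks start -> ... -> end.
    pool = [w for w in pool if w[0] == start and w[-1] == end]
    # Gel separation: keep walks with exactly n nodes (n x 20 bp).
    pool = [w for w in pool if len(w) == n]
    # Affinity selection: for each node j, keep walks where j occurs.
    for j in graph:
        pool = [w for w in pool if j in w]
    return sorted(set(pool))


if __name__ == "__main__":
    g = {0: {1, 2}, 1: {2}, 2: {3}, 3: set()}  # hypothetical 4-node graph
    print(adleman(g, 0, 3))  # [(0, 1, 2, 3)]
```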
41
Mix and Split Method
Generation of the solution space of N variables
Merge X1 and ¬X1 in a tube T
Split T into A and B
For J := 2 To N
  Extend strands of A with XJ
  Extend strands of B with ¬XJ
  Merge A and B into T
  Split T into A and B
Merge A and B
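The same protocol at string level, assuming '1' stands for a strand segment encoding XJ and '0' for its negation, and idealizing Split so that both tubes receive every strand type (`mix_and_split` is a hypothetical name; valid for N ≥ 1).

```python
from typing import List


def mix_and_split(n: int) -> List[str]:
    """Generate all 2**n assignments following the mix-and-split steps."""
    tube = ["1", "0"]                   # merge X1 and not-X1 in tube T
    for _ in range(2, n + 1):
        a, b = list(tube), list(tube)   # split T into A and B (idealized copies)
        a = [s + "1" for s in a]        # extend strands of A with XJ
        b = [s + "0" for s in b]        # extend strands of B with not-XJ
        tube = a + b                    # merge A and B into T
    return sorted(tube)


if __name__ == "__main__":
    print(mix_and_split(2))  # ['00', '01', '10', '11']
```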
42
Lipton’s Algorithm 3-SAT(N, M)
- Generate the N-variable solution space in T
- For J = 1 To M
    T1 := Extract[T, L(1,J)]
    T := T - T1
    T2 := Extract[T, L(2,J)]
    T := T - T2
    T3 := Extract[T, L(3,J)]
    T := Merge(T1, T2)
    T := Merge(T, T3)
- Detect T
- if T ≠ ∅ then take a clone and sequence it (Solution)
- else "Unsolvable Problem"
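A sketch of the same extraction loop on bit tuples, assuming clauses are given as triples of DIMACS-style literals (k for XK, -k for its negation); Extract becomes a list filter and "take a clone and sequence it" becomes returning one surviving assignment. The helper names are hypothetical.

```python
from itertools import product
from typing import List, Optional, Tuple

Clause = Tuple[int, int, int]  # literal k means XK, -k means not XK


def holds(assignment: Tuple[int, ...], literal: int) -> bool:
    value = assignment[abs(literal) - 1]
    return value == 1 if literal > 0 else value == 0


def lipton_3sat(n: int, clauses: List[Clause]) -> Optional[Tuple[int, ...]]:
    tube = list(product((0, 1), repeat=n))      # generate N-space solutions in T
    for clause in clauses:                      # for J = 1 to M
        kept = []
        for literal in clause:                  # Ti := Extract[T, L(i,J)]; T := T - Ti
            kept += [a for a in tube if holds(a, literal)]
            tube = [a for a in tube if not holds(a, literal)]
        tube = kept                             # T := Merge(T1, T2, T3)
    return tube[0] if tube else None            # detect T; "sequence" one strand


if __name__ == "__main__":
    print(lipton_3sat(3, [(1, 2, 3), (-1, -2, 3)]))
```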
DNA Extraction
Strands of type α are called α-strands (or instances of α).
An α-strand with α including γ as a substring is called a γ-superstrand (α is a γ-superstring).
Problem:
Extract all the γ-superstrands of a pool P.
A Formulation of the DNA Extraction Problem
Given an input pool P of heterogeneous DNA strands with the same length and with the same prefix and suffix, and given a string γ,
provide an output pool P[γ] such that all and only the types of γ-superstrands of P are represented in P[γ].
In other words, extraction of the γ-superstrands of P means providing a pool P[γ] such that, for any two strings δ, η:
δγη ∈ P ⟺ δγη ∈ P[γ]
i.e. the strings represented in P[γ] are all and only the γ-superstrings belonging to P.
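The definition can be turned into a check on idealized pools (`Counter` from types to multiplicities). In the sketch below, `extract` copies multiplicities from P, which the definition leaves open, and `is_valid_extraction` tests the "all and only" condition on types; both names are hypothetical.

```python
from collections import Counter


def extract(pool: Counter, gamma: str) -> Counter:
    """An idealized P[gamma]: keep exactly the gamma-superstrings of P."""
    return Counter({s: n for s, n in pool.items() if gamma in s})


def is_valid_extraction(pool: Counter, gamma: str, result: Counter) -> bool:
    """Types of result must be all and only the gamma-superstrings of pool."""
    wanted = {s for s, n in pool.items() if n > 0 and gamma in s}
    got = {s for s, n in result.items() if n > 0}
    return wanted == got


if __name__ == "__main__":
    P = Counter({"AAGGCTT": 3, "AATTCTT": 2})
    print(extract(P, "GGC"), is_valid_extraction(P, "GGC", extract(P, "GGC")))
```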
46
Cross Pairing PCR
Shortly: XPCR
47
XPCR provides an efficient method for affix concatenation of double strands (Head’s null context splicing rule)
N.B. Genome Sequencing is related to Affix Concatenation Closure
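At string level, Head's null-context splicing recombines two strings at a common site x: from u x v and u' x v' it yields u x v' and u' x v. A minimal sketch follows, trying every occurrence of x in both strings; it models only the string rule, not the XPCR wet-lab protocol, and `null_context_splice` is a hypothetical name.

```python
from typing import Set


def null_context_splice(s1: str, s2: str, x: str) -> Set[str]:
    """Recombine s1 = u+x+v and s2 = u'+x+v' at every common site x."""
    results = set()
    for i in range(len(s1) - len(x) + 1):
        if s1[i:i + len(x)] != x:
            continue
        for j in range(len(s2) - len(x) + 1):
            if s2[j:j + len(x)] == x:
                results.add(s1[:i] + s2[j:])   # u x v'
                results.add(s2[:j] + s1[i:])   # u' x v
    return results


if __name__ == "__main__":
    print(sorted(null_context_splice("AAAGCTTT", "CCCGCGGG", "GC")))
```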
[Figure: XPCR steps: melting + hybridization, then polymerase extension, repeated; products h(…)]
50
[Figure: linear amplification and exponential amplification of the products h(…)]
52
53
XPCR was tested in many different situations,
in pools generated by recombination of 22 strands of lengths between 10 and 20
RhoA XPCR
Lane 2: RhoA of 582 bp
Lane 3: … of 253 bp
Lane 4: XPCR of 582 + 253 - 229 = 606 bp
Starts at position -229 of RhoA
55
XPCR DNA Extraction
XPCR-Extract(P, γ)
  L := length(P), R1 := ∅, R2 := ∅
  for each n ∈ L do
    Q := separate(P, n)
    P := infix(Q, …)
    (P1, P2) := split(P)
    P1 := PCR(P1, …, …)
    for each m < n do R1 := R1 + separate(P1, m)
    P2 := PCR(P2, …, mir(…))
    for each m < n do R2 := R2 + separate(P2, m)
    Q := mix(R1, R2)
    Q := PCR(Q, …, mir(…))
    Q := separate(Q, n + |…| + |…|)
  output Q
56
Consider a pool P of …-strands that are either γ-superstrands or γ'-superstrands, where all γ-superstrands are either γ1-superstrands, γ2-superstrands, or γ3-superstrands … (γ, γ', γ1, γ2, γ3, … of 15 bp).
Experimental Check
57
Experimental Check
Our extraction is correct and complete in the sense that:
1. XPCR-Extraction selected only γ-superstrands;
2. XPCR-Extraction selected all kinds of γ-superstrands (γ1-, γ2-, γ3-, … superstrands).
58
Gamma ExtractionGamma Extraction
Lane 2: … strands of 120 bp (… 15 bp)
Lane 3: … of 45 bp
Lane 4: XPCR … and … 150 bp
Lane 5: PCR(γ, a.s.) (γ at -45)
Lane 6: PCR(γ', a.s.)
Lane 7: PCR(γ1, a.s.) (γ1 at -125)
Lane 8: PCR(γ2, a.s.) (γ2 at -75)
59
Applications
- XPCR in generation of solution spaces
- XPCR in in vitro mutagenesis
- XPCR in gene extraction
60
61
XPCR-Mutagenesis(P, …, …) =
1. let P : {<…>}
2. input Q : {<…[−20,−1] …[1, 20]>}
3. (P1, P2) := split(P)
4. P1 := PCR(P1, …[1, 20], mir(…[−18,−1]))
5. P2 := PCR(P2, …[1, 20], mir(…[−20,−1]))
6. P1 := separate(P1, |…|)
7. P2 := separate(P2, |…|)
8. P1 := mix(P1, Q)
9. P1 := PCR(P1, …[1, 18], mir(…[1, 20]))
10. P1 := separate(P1, |…| + |…| + 20)
11. P := mix(P1, P2)
12. P := PCR(P, …[1, 20], mir(…[−20,−1]))
13. P := separate(P, |…| + |…| + |…|)
14. output P
XPCR Mutagenesis
62
XPCR Mutagenesis
Figure 10: Electrophoresis results
Lane 1: molecular size marker ladder (100 bp)
Lane 2: amplification of strand … (230 bp)
Lane 3: amplification of strand … (229 bp)
Lane 4: amplification of strand …[-18, -1] …[1, 20] (188 bp)
Lane 5: cross pairing amplification of … and …[-18, -1] …[1, 20] (400 bp)
Lane 6: cross pairing amplification of … and …[1, 20] (609 bp)
Lane 7: RhoAwt (582 bp)
Lane 8: positive control by PCR(…, …[-20, -1]) (354 bp)
63
Ongoing Research
XPCR Cloning
Dry DNA Computing