Math 8803/4803, Spring 2008: Discrete Mathematical Biology
Prof. Christine Heitsch
School of Mathematics
Georgia Institute of Technology
Lecture 6 – January 18, 2008
Designing DNA codewords
Problem: Minimize all possible mishybridizations for {Wi}, {Ci}.
Also satisfy biochemical constraints on melting temperature and free energy.
Complements – Wi and Cj for i ≠ j
Reverse-complements – Wi and Wj (or Ci and Cj) for i ≠ j
Inverted repeats – Wi with itself
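The three mishybridization types listed above can be screened crudely by asking how long a stretch two strands could pair over. The following is a minimal sketch, not the method used in the lecture: it assumes Ci is the reverse complement of Wi, and it uses the longest shared substring as a stand-in for binding strength (a real design would rely on thermodynamic predictions). The function names are hypothetical.

```python
def reverse_complement(seq):
    """Watson-Crick reverse complement of a lowercase DNA string."""
    comp = {"a": "t", "c": "g", "g": "c", "t": "a"}
    return "".join(comp[base] for base in reversed(seq))


def longest_common_substring(x, y):
    """Length of the longest substring shared by x and y (dynamic programming)."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def mishybridization_scores(words):
    """Longest potential pairing stretch for each mishybridization type.

    Assumption: C_j = reverse_complement(W_j), so W_i can pair with C_j
    wherever W_i and W_j share a substring.
    """
    scores = {}
    for i, wi in enumerate(words):
        # Inverted repeat: W_i pairing with itself.
        scores[("inverted", i, i)] = longest_common_substring(wi, reverse_complement(wi))
        for j in range(i + 1, len(words)):
            wj = words[j]
            # Complement: W_i pairing with C_j.
            scores[("complement", i, j)] = longest_common_substring(wi, wj)
            # Reverse-complement: W_i pairing with W_j.
            scores[("reverse-complement", i, j)] = longest_common_substring(wi, reverse_complement(wj))
    return scores


if __name__ == "__main__":
    for key, value in mishybridization_scores(["aacgt", "ttacg"]).items():
        print(key, value)
```

Lower scores mean shorter stretches available for unwanted pairing; the threshold that counts as acceptable is a design choice that this sketch does not fix.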
Previous approaches were either “top-down” or “bottom-up.”
C. E. Heitsch, GA Tech 1
Previous Solution Approaches
• Top-down:
– Hamming distance and coding theory [2, 5, 4, 3].
– De Bruijn sequences and the 2-4 rule [1].
• Bottom-up:
– Culling according to biochemical constraints [6, 7].
– Stochastic local search [9, 8].
A “mixed” solution strategy
Top-down: Consider the set of $(4!)^{4^{3-1}} \cdot 4^{-3} \approx 1.89 \times 10^{20}$ distinct
De Bruijn sequences with n = 4 and k = 3 having length $4^3 = 64$.
[Figure: De Bruijn graph with vertices 00, 01, 10, 11 and edges labeled 0 or 1; example De Bruijn sequence 00010111.]
Bottom-up: Select B(4, 3) sequences (uniformly at random) that minimize
other mishybridizations and satisfy the required biochemical constraints.
Recall our graph theory definitions
An Euler circuit is a walk in a graph G which begins and ends at
the same vertex and uses every edge exactly once.
A Hamiltonian circuit is a walk in a graph G which begins and
ends at the same vertex and visits every vertex (except the
endpoints) exactly once.
An arborescence is a rooted directed tree with all the edges
pointing in the direction of the root.
A spanning tree of a connected graph G is a subgraph which
contains all the vertices and is a tree.
Examples from our De Bruijn graph
Vertices v0, v1, v2, v3 where vertex
vi is labeled with integer i in binary.
An Euler circuit:
v3 → v2 → v0 → v0 → v1
→ v2 → v1 → v3 → v3
A Hamiltonian circuit:
v3 → v2 → v0 → v1 → v3
A spanning arborescence rooted at v3:
E = {(v0, v1), (v2, v1), (v1, v3)}
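The three examples above can be checked mechanically. The sketch below assumes the De Bruijn graph in question has an edge from vertex u to vertex (2u + b) mod 4 for each bit b, which matches labeling vertex vi with i in binary; the helper names are hypothetical.

```python
def de_bruijn_edges(n=2, k=3):
    """Edges of the De Bruijn graph on n**(k-1) vertices: u -> (n*u + b) mod n**(k-1)."""
    m = n ** (k - 1)
    return [(u, (n * u + b) % m) for u in range(m) for b in range(n)]


def is_euler_circuit(walk, edges):
    """Closed walk that uses every edge exactly once."""
    steps = list(zip(walk, walk[1:]))
    return walk[0] == walk[-1] and sorted(steps) == sorted(edges)


def is_hamiltonian_circuit(walk, edges):
    """Closed walk along edges that visits every vertex exactly once."""
    steps = list(zip(walk, walk[1:]))
    vertices = {v for edge in edges for v in edge}
    return (walk[0] == walk[-1]
            and all(step in edges for step in steps)
            and sorted(walk[:-1]) == sorted(vertices))


def is_spanning_arborescence(tree_edges, root, edges):
    """Every non-root vertex has one outgoing tree edge, and all paths lead to the root."""
    vertices = {v for edge in edges for v in edge}
    out = dict(tree_edges)
    if len(out) != len(tree_edges) or set(out) != vertices - {root}:
        return False
    if any(edge not in edges for edge in tree_edges):
        return False
    for v in vertices:
        seen = set()
        while v != root:
            if v in seen:  # a cycle never reaches the root
                return False
            seen.add(v)
            v = out[v]
    return True


EDGES = de_bruijn_edges(2, 3)
EULER = [3, 2, 0, 0, 1, 2, 1, 3, 3]
HAMILTON = [3, 2, 0, 1, 3]
ARBORESCENCE = [(0, 1), (2, 1), (1, 3)]

if __name__ == "__main__":
    print(is_euler_circuit(EULER, EDGES))
    print(is_hamiltonian_circuit(HAMILTON, EDGES))
    print(is_spanning_arborescence(ARBORESCENCE, 3, EDGES))
    # Reading off the low bit of each vertex entered spells a De Bruijn sequence.
    print("".join(str(v % 2) for v in EULER[1:]))
```

Reading the last bit of each vertex entered along the Euler circuit gives the binary sequence associated with that circuit.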
[Figure: De Bruijn graph with vertices 00, 01, 10, 11 and edges labeled 0 or 1; example De Bruijn sequence 00010111.]
Combinatorial explosion
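Number of distinct De Bruijn sequences B(n, k), with n across the columns and k down the rows:
k \ n   2                               3                               4                               5                               6
2       1                               24                              20,736                          995,328,000                     $\approx 3.87 \times 10^{15}$
3       2                               373,248                         $\approx 1.89 \times 10^{20}$   $\approx 7.63 \times 10^{49}$   ·
4       16                              $\approx 1.26 \times 10^{19}$   ·                               ·                               ·
5       2048                            ·                               ·                               ·                               ·
6       67,108,864                      ·                               ·                               ·                               ·
7       $\approx 1.44 \times 10^{17}$   ·                               ·                               ·                               ·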
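The counts follow the standard formula $(n!)^{n^{k-1}} / n^k$ for the number of distinct De Bruijn sequences, where n is the alphabet size and k the word length (the same convention as "n = 4 and k = 3" earlier). A small sketch that evaluates it exactly with Python integers; the function name `count_de_bruijn` is hypothetical.

```python
from math import factorial


def count_de_bruijn(n, k):
    """Number of distinct cyclic De Bruijn sequences B(n, k): (n!)^(n^(k-1)) / n^k."""
    return factorial(n) ** (n ** (k - 1)) // n ** k


if __name__ == "__main__":
    for k in range(2, 8):
        row = [count_de_bruijn(n, k) for n in range(2, 7)]
        # Print small values exactly and large ones in scientific notation.
        print(k, [str(c) if c < 10 ** 12 else f"{c:.2e}" for c in row])
```

The growth in both n and k is what makes exhaustive enumeration impractical and motivates random sampling.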
How to generate a (uniformly) random B(n, k) De Bruijn sequence?
Random walks on the De Bruijn graph
[Figure: De Bruijn graph with vertices 00, 01, 10, 11 and edges labeled 0 or 1; example De Bruijn sequence 00010111.]
Uniformly random De Bruijn sequences
Total of $(4!)^{4^{3-1}} \cdot 4^{-3} \approx 1.89 \times 10^{20}$ distinct B(4, 3) with length $4^3 = 64$.
Randomized Algorithm.
• For each vertex vi, i = 0, ..., 15, choose a random permutation pi of {0, 1, 2, 3}.
• Beginning at v1, build a sequence by walking around G(4, 3) according to pi.
• Stop when a permutation is exhausted. Accept if the sequence has length 64.
The sequence is De Bruijn with probability 1/64;
about 64 trials on average will generate a B(4, 3).
(Can be improved slightly to 4/81 and about 20 trials on average.)
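The randomized algorithm above can be sketched as follows. This is one possible reading of the steps, not necessarily the implementation behind the slides: vertices are taken to be the 16 two-letter words over {a, c, g, t}, "v1" is taken to be the second vertex in lexicographic order, and a permutation is "exhausted" when the walk returns to a vertex whose four outgoing edges have all been used. The acceptance rate observed with this sketch depends on those choices, so it is not claimed to reproduce the probabilities quoted above. All function names are hypothetical.

```python
import random
from itertools import product


def attempt(alphabet="acgt", k=3, rng=random):
    """One trial of the walk; returns a sequence of length n**k or None if rejected."""
    n = len(alphabet)
    vertices = ["".join(p) for p in product(alphabet, repeat=k - 1)]
    # Independent random ordering of the outgoing edges at every vertex.
    order = {v: rng.sample(list(alphabet), n) for v in vertices}
    used = dict.fromkeys(vertices, 0)
    v = vertices[1]  # assumption: this plays the role of v1
    letters = []
    while used[v] < n:  # stop once the current vertex's permutation is exhausted
        letter = order[v][used[v]]
        used[v] += 1
        letters.append(letter)
        v = (v + letter)[1:]  # follow the edge labeled `letter`
    return "".join(letters) if len(letters) == n ** k else None


def random_de_bruijn(alphabet="acgt", k=3, rng=random):
    """Repeat trials until one is accepted; returns (sequence, number of trials)."""
    trials = 0
    while True:
        trials += 1
        seq = attempt(alphabet, k, rng)
        if seq is not None:
            return seq, trials


def is_de_bruijn(seq, alphabet="acgt", k=3):
    """True if every k-letter word occurs exactly once in seq read cyclically."""
    n = len(alphabet)
    if len(seq) != n ** k or not set(seq) <= set(alphabet):
        return False
    wrapped = seq + seq[:k - 1]
    return len({wrapped[i:i + k] for i in range(len(seq))}) == n ** k


if __name__ == "__main__":
    sequence, trials = random_de_bruijn()
    print(sequence, trials, is_de_bruijn(sequence))
```

The check in `is_de_bruijn` is independent of how the sequence was produced, so it can also be applied to sequences obtained elsewhere.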
Can we weave a net of DNA strands?
Woman Arranging a Fishing Net – Zanzibar, Tanzania, September 2000
By Martin Wierzbicki from http://photosbymartin.com/.
Biochemical constraints
Thermodynamic predictions calculated by mfold.
$\Delta G = \Delta H - T \Delta S$
$T_m = \dfrac{1000\,\Delta H}{A + \Delta S + R \ln(C/4)} - 273.15 + 16.6 \log [\mathrm{Na}^+]$
Free energy G: chemical potential of a substance
Enthalpy H: heat content of a substance
Entropy S: disorder of a substance
Melting temperature Tm: 50% duplex
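As a worked illustration of the two formulas above, the sketch below evaluates them numerically. Several things are assumptions, since the slide does not define them: ΔH is taken in kcal/mol and ΔS in cal/(mol·K) (suggested by the factor 1000), R is the gas constant 1.987 cal/(mol·K), C is a total strand concentration in mol/L, A is a helix-initiation correction in the same units as ΔS, and "log" is base 10. The function names and the input values in the usage section are hypothetical, not mfold output.

```python
import math

R = 1.987  # gas constant in cal/(mol K); assumed units


def delta_g(delta_h, delta_s, temp_celsius=37.0):
    """Free energy in kcal/mol from dH (kcal/mol) and dS (cal/(mol K))."""
    temp_kelvin = temp_celsius + 273.15
    return delta_h - temp_kelvin * delta_s / 1000.0


def melting_temperature(delta_h, delta_s, conc, sodium=1.0, a=0.0):
    """Melting temperature in degrees Celsius.

    conc   -- strand concentration C in mol/L (assumed meaning)
    sodium -- [Na+] in mol/L; the salt term vanishes at 1 M
    a      -- helix-initiation constant A (assumed meaning)
    """
    kelvin = 1000.0 * delta_h / (a + delta_s + R * math.log(conc / 4.0))
    return kelvin - 273.15 + 16.6 * math.log10(sodium)


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise the functions.
    print(delta_g(-120.0, -330.0))
    print(melting_temperature(-120.0, -330.0, conc=1e-6, sodium=0.1))
```

Lowering the salt concentration below 1 M makes the last term negative, which lowers the computed melting temperature.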
Predicted properties
The DNA word W from a random De Bruijn sequence B(4, 3):
aatgctggagcaaccactcctttcatcgcgtattgttagggtgaaagtctacggcccgacagat
For RCA, there should be no (significant) secondary structures.
Secondary structure predictions:
Sequence       ∆G (kcal/mol)    Tm (°C)
Linear W       -7.6             49.9
Circular W     0.48             23.7
RCA average    -6.26/repeat     46.7
Rolling Circle Amplification (RCA)
[Figure: two panels. Circular DNA construction: enzymatic ligation of the circular DNA template (64 bp) with a guide DNA (20 bp). Sequence amplification: DNA polymerase extends a cDNA primer around the circular template to give the amplified sequence.]
RCA product for random De Bruijn sequence
[Figure: agarose gel, lanes 1 to 8. Marker band labels: 1 kb, 2 kb, 10 kb (lane 1); 23 kb, 9.4 kb, 6.6 kb, 4.4 kb, 2.3 kb, 564 bp (lane 8).]
Lane 1: 1 kb marker
Lane 2: no circular DNA
Lane 3: RCA 5 min
Lane 4: RCA 20 min
Lane 5: RCA 35 min
Lane 6: RCA 1 hr
Lane 7: RCA overnight
Lane 8: Lambda marker
Observed on 0.8% agarose gel: significant amplification at 30 °C.
Predicted surface array hybridization
The DNA word W from a random De Bruijn sequence B(4, 3):
aatgctggagcaaccactcctttcatcgcgtattgttagggtgaaagtctacggcccgacagat
Parse W into overlapping 16mer targets Ti for i = 1, . . . , 64.
Predict binding of complement Pi with Ti and difference W \ Ti.
Predicted hybridization with Pi:
Sequence    ∆G (kcal/mol)    Tm (°C)
Ti          ≤ −18.6          ≥ 65.8
W \ Ti      ≥ −3.8           ≤ 18.0
Also check hybridization with a second B(4, 3) DNA sequence W2.
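The parsing step can be sketched as below. Two assumptions are made that the slide does not state explicitly: W is read circularly (which is what yields 64 overlapping 16mers from a 64-letter word), and the complement Pi is the reverse complement of Ti. Indexing here is 0-based and the function names are hypothetical; the thermodynamic prediction itself is not reproduced.

```python
W = "aatgctggagcaaccactcctttcatcgcgtattgttagggtgaaagtctacggcccgacagat"


def reverse_complement(seq):
    """Watson-Crick reverse complement of a lowercase DNA string."""
    comp = {"a": "t", "c": "g", "g": "c", "t": "a"}
    return "".join(comp[base] for base in reversed(seq))


def circular_targets(word, length=16):
    """All overlapping windows of the given length, wrapping around the end of word."""
    wrapped = word + word[:length - 1]
    return [wrapped[i:i + length] for i in range(len(word))]


targets = circular_targets(W)
probes = [reverse_complement(t) for t in targets]

if __name__ == "__main__":
    for i, (target, probe) in enumerate(zip(targets, probes)):
        print(i, target, probe)
```

Because every 3-letter word occurs only once in W when read circularly, the 64 windows are pairwise distinct.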
Minimal free energy
Melting temperature
Surface Plasmon Resonance (SPR) imaging
Verification of complementary binding
W = aatgctggagcaaccactcctttcatcgcgtattgttagggtgaaagtctacggcccgacagat
and another random B(4, 3):
W2 = gctgccctaaagagtcggcagcgatcttccaacgtgacactcatttgtagggttatggaatacc
Probes:
P1 = ctaacaatacgcgatg
P2 = agtgtcacgttggaag
P3 = gtgtatccgacatgtg
Targets:
W
G = gggctctctattacgacctc
H = cttccaacgtgacact
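Which probe has a perfect 16-base complementary site in which target can be read off directly from the sequences listed above. The sketch below does only that string-level check, treating a probe as binding a target when the probe's reverse complement occurs inside the target; it says nothing about partial matches or about the measured SPR signal. The names `binds` and `perfect_matches` are hypothetical.

```python
PROBES = {
    "P1": "ctaacaatacgcgatg",
    "P2": "agtgtcacgttggaag",
    "P3": "gtgtatccgacatgtg",
}
TARGETS = {
    "W": "aatgctggagcaaccactcctttcatcgcgtattgttagggtgaaagtctacggcccgacagat",
    "G": "gggctctctattacgacctc",
    "H": "cttccaacgtgacact",
}
W2 = "gctgccctaaagagtcggcagcgatcttccaacgtgacactcatttgtagggttatggaatacc"


def reverse_complement(seq):
    """Watson-Crick reverse complement of a lowercase DNA string."""
    comp = {"a": "t", "c": "g", "g": "c", "t": "a"}
    return "".join(comp[base] for base in reversed(seq))


def binds(probe, target):
    """True if the full probe has a perfectly complementary site in the target."""
    return reverse_complement(probe) in target


def perfect_matches(probes, targets):
    """Set of (probe name, target name) pairs with a perfect complementary site."""
    return {(p, t) for p, pseq in probes.items()
            for t, tseq in targets.items() if binds(pseq, tseq)}


if __name__ == "__main__":
    print(sorted(perfect_matches(PROBES, TARGETS)))
    print(binds(PROBES["P2"], W2))
```

At this level P1 pairs with a site in W and P2 is the exact complement of H, which is itself a 16mer of W2; P3 has no perfect site in any of the three targets.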
SPR imaging result of complementary binding
[Figure: three SPR images of the array spots P1, P2, P3, showing hybridization of W, of G, and of H onto the DNA array.]
Proof of principle
W1 = ctaacaatacgcgatg
W2 = agtgtcacgttggaag
W3 = gtgtatccgacatgtg
C2 = cttccaacgtgacact
[Figure: SPR image of the array spots P1, P2, P3 showing hybridization of H onto the DNA array.]
“Proof of principle” experimental results support using
random De Bruijn sequences to design DNA codewords.
Acknowledgments
• Ming Li for his DNA9 PowerPoint slides for “Attomole Detection of DNA in an Array Format with SPR Imaging Using RCA for the Programmable Nanoscale DNA Biosensors.”
• Ming Li, Prof. Rob Corn, and the 2000 – 2002 Corn Research Group at the University of Wisconsin – Madison. Graduate Students: Greta Wegner, Terry Goodrich, Ming Li, Shiping Fang, Yuan Li, Heesuk Kim, Johanna Kwok. Senior Staff Scientist: Hye Jin Lee. Postdoctoral Associates: Dr. Alastair Wark, Dr. Eric Codner. Visitors: Dr. Tomonori Saeki from Hitachi, Ltd. Undergraduates: Misha Wolfson, Takeyoshi Goto.
References
[1] A. Ben-Dor, R. M. Karp, B. Schwikowski, and Z. Yakhini. Universal DNA tag systems: a
combinatorial design scheme. In RECOMB, pages 65–75, 2000.
[2] A. G. Frutos, Q. Liu, A. J. Thiel, A. M. W. Sanner, A. E. Condon, L. M. Smith, and R. M.
Corn. Demonstration of a word design strategy for DNA computing on surfaces. Nucleic
Acids Res., 25(23):4748–4757, 1997.
[3] P. Gaborit and O. D. King. Linear constructions for DNA codes. Theoret. Comput. Sci.,
334(1-3):99–113, 2005.
[4] O. D. King. Bounds for DNA codes with constant GC-content. Electron. J. Combin.,
10:Research Paper 33, 13 pp. (electronic), 2003.
[5] M. Li, H. J. Lee, A. E. Condon, and R. M. Corn. DNA word design strategy for creating sets
of non-interacting oligonucleotides for DNA microarrays. Langmuir, 18(3):805–812,
2002.
[6] M. R. Shortreed, S. B. Chang, D. Hong, M. Phillips, B. Campion, D. C. Tulpan,
M. Andronescu, A. Condon, H. H. Hoos, and L. M. Smith. A thermodynamic approach to
designing structure-free combinatorial DNA word sets. Nucleic Acids Res.,
33(15):4965–4977, 2005.
[7] D. Tulpan, M. Andronescu, S. B. Chang, M. R. Shortreed, A. Condon, H. H. Hoos, and
L. M. Smith. Thermodynamically based DNA strand design. Nucleic Acids Res.,
33(15):4951–4964, 2005.
[8] D. Tulpan and H. Hoos. Hybrid randomised neighbourhoods improve stochastic local search
for DNA code design, 2003.
[9] D. Tulpan, H. Hoos, and A. Condon. Stochastic local search algorithms for DNA word design,
2002.