Math 8803/4803, Spring 2008: Discrete Mathematical Biology

Prof. Christine Heitsch

School of Mathematics

Georgia Institute of Technology

Lecture 6 – January 18, 2008

Designing DNA codewords

Problem: Minimize all possible mishybridizations for {Wi}, {Ci}. Also satisfy biochemical constraints on melting temperature and free energy.

Complements – Wi and Cj for i ≠ j

Reverse-complements – Wi and Wj (or Ci and Cj) for i ≠ j

Inverted repeats – Wi with itself

Previous approaches were either “top-down” or “bottom-up.”

C. E. Heitsch, GA Tech 1

Previous Solution Approaches

• Top-down:

– Hamming distance and coding theory [2, 5, 4, 3].

– De Bruijn sequences and the 2-4 rule [1].

• Bottom-up:

– Culling according to biochemical constraints [6, 7].

– Stochastic local search [9, 8].

A “mixed” solution strategy

Top-down: Consider the set of $(4!)^{4^{3-1}} / 4^{3} \approx 1.89 \times 10^{20}$ distinct De Bruijn sequences with n = 4 and k = 3 having length $4^3$.

[Figure: De Bruijn graph on the vertices 00, 01, 10, 11 with edges labeled 0 and 1, shown with the binary De Bruijn sequence 00010111.]

Bottom-up: Select B(4, 3) (uniformly at random) which minimize other mishybridizations and satisfy required biochemical constraints.

Recall our graph theory definitions

An Euler circuit is a walk in a graph G which begins and ends at the same vertex and uses every edge exactly once.

A Hamiltonian circuit is a walk in a graph G which begins and ends at the same vertex and visits every vertex (except the endpoints) exactly once.

An arborescence is a rooted directed tree with all the edges pointing in the direction of the root.

A spanning tree of a connected graph G is a subgraph which contains all the vertices and is a tree.

Examples from our De Bruijn graph

Vertices v0, v1, v2, v3 where vertex vi is labeled with integer i in binary.

An Euler circuit:

v3 → v2 → v0 → v0 → v1 → v2 → v1 → v3 → v3

A Hamiltonian circuit:

v3 → v2 → v0 → v1 → v3

A spanning arborescence rooted at v3:

E = {(v0, v1), (v2, v1), (v1, v3)}
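As a quick check of the three examples above, the following Python sketch rebuilds the 2-bit De Bruijn graph and tests each example against the definitions. The function names are illustrative and not from the lecture, and the edge rule (shift the vertex label left and append a bit) is the standard construction, assumed here.

```python
def de_bruijn_edges(bits=2):
    """Edges (tail, head) of the binary De Bruijn graph on `bits`-bit vertices."""
    size = 1 << bits
    return [(v, ((v << 1) % size) | b) for v in range(size) for b in (0, 1)]


def is_euler_circuit(walk, edges):
    """Closed walk that uses every edge exactly once."""
    steps = list(zip(walk, walk[1:]))
    return walk[0] == walk[-1] and sorted(steps) == sorted(edges)


def is_hamiltonian_circuit(walk, edges):
    """Closed walk along edges that visits every vertex exactly once."""
    steps = list(zip(walk, walk[1:]))
    vertices = sorted({v for e in edges for v in e})
    return (walk[0] == walk[-1]
            and all(s in edges for s in steps)
            and sorted(walk[:-1]) == vertices)


def is_arborescence_into(root, tree_edges, edges):
    """Spanning tree whose edges all point toward `root`."""
    vertices = {v for e in edges for v in e}
    if not all(e in edges for e in tree_edges):
        return False
    parent = dict(tree_edges)  # the single outgoing tree edge of each vertex
    if len(parent) != len(tree_edges) or set(parent) != vertices - {root}:
        return False
    for v in parent:
        seen = set()
        while v != root:       # following tree edges must reach the root
            if v in seen:
                return False   # found a cycle instead
            seen.add(v)
            v = parent[v]
    return True


edges = de_bruijn_edges(2)
print(is_euler_circuit([3, 2, 0, 0, 1, 2, 1, 3, 3], edges))
print(is_hamiltonian_circuit([3, 2, 0, 1, 3], edges))
print(is_arborescence_into(3, [(0, 1), (2, 1), (1, 3)], edges))
```

Vertices are plain integers here, so v3 is 3, which is 11 in binary.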

[Figure: De Bruijn graph on the vertices 00, 01, 10, 11 with edges labeled 0 and 1, shown with the binary De Bruijn sequence 00010111.]

Combinatorial explosion

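Number of distinct De Bruijn sequences B(n, k), with n across the columns and k down the rows:

k \ n   2              3                4                5                6
2       1              24               20,736           995,328,000      ≈ 3.87 × 10^15
3       2              373,248          ≈ 1.89 × 10^20   ≈ 7.63 × 10^49   ·
4       16             ≈ 1.26 × 10^19   ·                ·                ·
5       2048           ·                ·                ·                ·
6       67,108,864     ·                ·                ·                ·
7       ≈ 1.44 × 10^17 ·                ·                ·                ·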

How to generate a (uniformly) random B(n, k) De Bruijn sequence?

Random walks on the De Bruijn graph

[Figure: De Bruijn graph on the vertices 00, 01, 10, 11 with edges labeled 0 and 1, shown with the binary De Bruijn sequence 00010111.]

Uniformly random De Bruijn sequences

Total of $(4!)^{4^{3-1}} / 4^{3} \approx 1.89 \times 10^{20}$ distinct B(4, 3) with length $4^3$.

Randomized Algorithm.

• For vertex vi, i = 0, . . . , 15, choose a random permutation pi of {0, 1, 2, 3}.

• Beginning at v1, build a sequence by walking around G(4, 3) according to pi.

• Stop when a permutation is exhausted. Accept if sequence has length 64.

The sequence is De Bruijn with probability 1/64; about 64 trials on average will generate a B(4, 3). (Can be improved slightly to 4/81 and about 20 trials on average.)
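The three steps of the randomized algorithm can be sketched in Python as follows. This is one reading of the slide, not the author's code: the letter order "acgt", the labeling of vertices by 2-letter strings, and the choice of "ac" as v1 are all assumptions, and the function names are illustrative. The sketch does not implement the improved variant mentioned in parentheses.

```python
import random
from itertools import product

ALPHABET = "acgt"  # assumed letter order; the slides do not fix one


def attempt(rng, k=3):
    """One trial: walk G(4, k) by per-vertex random permutations.

    Returns the sequence of letters read along the walk if every
    edge was used, otherwise None.
    """
    vertices = ["".join(p) for p in product(ALPHABET, repeat=k - 1)]
    # An independent random permutation of the outgoing edges at each vertex.
    remaining = {v: rng.sample(ALPHABET, len(ALPHABET)) for v in vertices}
    v = vertices[1]  # assumed to play the role of v1
    letters = []
    while remaining[v]:          # stop when this vertex's permutation is exhausted
        c = remaining[v].pop()
        letters.append(c)
        v = v[1:] + c            # move along the edge labeled c
    if len(letters) == len(ALPHABET) ** k:
        return "".join(letters)  # accept: the walk used all edges
    return None


def random_de_bruijn(rng, k=3):
    """Repeat trials until one is accepted; also report the trial count."""
    trials = 0
    while True:
        trials += 1
        seq = attempt(rng, k)
        if seq is not None:
            return seq, trials


seq, trials = random_de_bruijn(random.Random(2008))
print(seq, trials)
```

An accepted walk has used all 64 edges, so it is an Euler circuit of G(4, 3), and the letters read along it form a cyclic sequence in which every 3-letter word appears exactly once.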

Can we weave a net of DNA strands?

Woman Arranging a Fishing Net – Zanzibar, Tanzania, September 2000

By Martin Wierzbicki from http://photosbymartin.com/.

Biochemical constraints

Thermodynamic predictions calculated by mfold.

$\Delta G = \Delta H - T \Delta S$

$T_m = \dfrac{1000\,\Delta H}{A + \Delta S + R \ln(C/4)} - 273.15 + 16.6 \log_{10}[\mathrm{Na}^+]$

Free energy (G): chemical potential of a substance

Enthalpy (H): heat content of a substance

Entropy (S): disorder of a substance

Melting temperature (Tm): temperature at which 50% of the strands are in duplex form
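To make the two formulas concrete, here is a minimal Python sketch that evaluates them directly. It is not mfold and does not derive ∆H or ∆S from a sequence; those must be supplied. The units are an assumption read off the factor of 1000 in the Tm formula: ∆H in kcal/mol, ∆S and A in cal/(K·mol), C in mol/L, and temperatures in °C. The function names are illustrative.

```python
import math

R = 1.987  # gas constant in cal/(K*mol)


def free_energy(dH, dS, temp_c):
    """Delta G in kcal/mol at temp_c (deg C), from dG = dH - T*dS.

    Assumes dH in kcal/mol and dS in cal/(K*mol), hence the /1000.
    """
    return dH - (temp_c + 273.15) * dS / 1000.0


def melting_temperature(dH, dS, A, conc, sodium):
    """Tm in deg C from the formula on the slide.

    A is the constant from the formula, conc is the strand
    concentration C, and sodium is [Na+] in mol/L.
    """
    return (1000.0 * dH / (A + dS + R * math.log(conc / 4.0))
            - 273.15 + 16.6 * math.log10(sodium))


# Hypothetical inputs chosen only so the arithmetic is easy to follow;
# they are not values from the lecture.
tm = melting_temperature(dH=-100.0, dS=-300.0, A=0.0, conc=4.0, sodium=1.0)
print(round(tm, 2), round(free_energy(-100.0, -300.0, tm), 6))
```

With these inputs the concentration and salt terms vanish, so Tm reduces to ∆H/∆S converted to °C, and ∆G evaluated at that temperature is zero.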

Predicted properties

The DNA word W from a random De Bruijn sequence B(4, 3):

aatgctggagcaaccactcctttcatcgcgtattgttagggtgaaagtctacggcccgacagat
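The De Bruijn property of W can be checked directly: read cyclically, each of the 64 possible 3-letter words should occur exactly once. A short Python sketch, with the illustrative helper name `cyclic_kmer_counts`:

```python
from collections import Counter

W = "aatgctggagcaaccactcctttcatcgcgtattgttagggtgaaagtctacggcccgacagat"


def cyclic_kmer_counts(seq, k):
    """Count every length-k window of seq, wrapping around the end."""
    doubled = seq + seq[:k - 1]
    return Counter(doubled[i:i + k] for i in range(len(seq)))


counts = cyclic_kmer_counts(W, 3)
print(len(W), len(counts), set(counts.values()))  # 64 64 {1}
```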

For RCA, there should be no (significant) secondary structures.

Predicted secondary structure:

Sequence       ∆G (kcal/mole)    Tm (°C)
Linear W       −7.6              49.9
Circular W     0.48              23.7
RCA average    −6.26/repeat      46.7

Rolling Circle Amplification (RCA)

[Figure: RCA schematic in two panels. Circular DNA construction: enzymatic ligation, guide DNA (20bp), circular DNA template (64bp). Sequence amplification: circular template, cDNA primer, DNA polymerase, amplified sequence.]

RCA product for random De Bruijn sequence

[Figure: agarose gel, lanes 1 to 8, with marker bands labeled 564bp, 1kb, 2kb, 2.3kb, 4.4kb, 6.6kb, 9.4kb, 10kb, and 23kb.]

Lane 1: 1 kb marker

Lane 2: no circular DNA

Lane 3: RCA 5min

Lane 4: RCA 20min

Lane 5: RCA 35min

Lane 6: RCA 1hr

Lane 7: RCA overnight

Lane 8: Lambda marker

Observed on 0.8% agarose gel: significant amplification at 30 °C.

Predicted surface array hybridization

The DNA word W from a random De Bruijn sequence B(4, 3):

aatgctggagcaaccactcctttcatcgcgtattgttagggtgaaagtctacggcccgacagat

Parse W into overlapping 16mer targets Ti for i = 1, . . . , 64.

Predict binding of complement Pi with Ti and difference W \ Ti.

Predicted hybridization with Pi:

Sequence    ∆G (kcal/mole)    Tm (°C)
Ti          ≤ −18.6           ≥ 65.8
W \ Ti      ≥ −3.8            ≤ 18.0

Also check hybridization with a second B(4, 3) DNA sequence W2.
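The parsing step can be sketched in Python as follows. Two assumptions are made here: the 64 targets are taken cyclically (which is what gives 64 overlapping 16mers from a 64-letter word), and the complement Pi is written as the reverse complement of Ti. The helper names are illustrative, and the sketch covers only the sequence bookkeeping, not the ∆G and Tm predictions.

```python
W = "aatgctggagcaaccactcctttcatcgcgtattgttagggtgaaagtctacggcccgacagat"

COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def reverse_complement(seq):
    """Watson-Crick complement of seq, read in the opposite direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def cyclic_targets(word, length=16):
    """Overlapping targets T_1..T_n, one starting at each position of word."""
    doubled = word + word[:length - 1]
    return [doubled[i:i + length] for i in range(len(word))]


def remainder(word, start, length=16):
    """The difference W \\ T_i: the rest of the cyclic word after the target."""
    doubled = word + word
    return doubled[start + length:start + len(word)]


targets = cyclic_targets(W)
probes = [reverse_complement(t) for t in targets]
print(len(targets), targets[0], probes[0], len(remainder(W, 0)))
```

Each remainder has 48 letters, and a target followed by its remainder is a rotation of W.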

Minimal free energy

Melting temperature

Surface Plasmon Resonance (SPR) imaging

Verification of complementary binding

W = aatgctggagcaaccactcctttcatcgcgtattgttagggtgaaagtctacggcccgacagat

and another random B(4, 3):

W2 = gctgccctaaagagtcggcagcgatcttccaacgtgacactcatttgtagggttatggaatacc

Probes:

P1 = ctaacaatacgcgatg

P2 = agtgtcacgttggaag

P3 = gtgtatccgacatgtg

Targets:

W

G = gggctctctattacgacctc

H = cttccaacgtgacact
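At the sequence level, the probes and targets listed above can be compared by asking whether the reverse complement of a probe occurs as a substring of a target. The Python sketch below does only that; it is a perfect-match check on the linear strings, an assumption made for illustration, and says nothing about partial binding or the thermodynamic predictions. The names `perfect_match` and `reverse_complement` are illustrative.

```python
W = "aatgctggagcaaccactcctttcatcgcgtattgttagggtgaaagtctacggcccgacagat"
W2 = "gctgccctaaagagtcggcagcgatcttccaacgtgacactcatttgtagggttatggaatacc"

PROBES = {
    "P1": "ctaacaatacgcgatg",
    "P2": "agtgtcacgttggaag",
    "P3": "gtgtatccgacatgtg",
}
TARGETS = {
    "W": W,
    "G": "gggctctctattacgacctc",
    "H": "cttccaacgtgacact",
}

COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def reverse_complement(seq):
    """Watson-Crick complement of seq, read in the opposite direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def perfect_match(probe, target):
    """True if the target contains the exact complement of the probe."""
    return reverse_complement(probe) in target


for t_name, target in TARGETS.items():
    hits = [p for p, probe in PROBES.items() if perfect_match(probe, target)]
    print(t_name, hits)

# H is also a 16mer of the second sequence W2.
print(TARGETS["H"] in W2)
```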

SPR imaging result of complementary binding

[Figure: SPR images of the DNA array with probe spots P1, P2, P3, in three panels: hybridization of W, of G, and of H onto the DNA array.]

Proof of principle

W1 = ctaacaatacgcgatg

W2 = agtgtcacgttggaag

W3 = gtgtatccgacatgtg

C2 = cttccaacgtgacact

[Figure: SPR image of the DNA array with probe spots P1, P2, P3, showing hybridization of H onto the DNA array.]

“Proof of principle” experimental results support using random De Bruijn sequences to design DNA codewords.

Acknowledgments

• Ming Li for his DNA9 PowerPoint slides for “Attomole Detection of DNA in an Array Format with SPR Imaging Using RCA for the Programmable Nanoscale DNA Biosensors.”

• Ming Li, Prof. Rob Corn, and the 2000 – 2002 Corn Research Group at the University of Wisconsin – Madison. Graduate Students: Greta Wegner, Terry Goodrich, Ming Li, Shiping Fang, Yuan Li, Heesuk Kim, Johanna Kwok. Senior Staff Scientist: Hye Jin Lee. Postdoctoral Associates: Dr. Alastair Wark, Dr. Eric Codner. Visitors: Dr. Tomonori Saeki from Hitachi, Ltd. Undergraduates: Misha Wolfson, Takeyoshi Goto.

References

[1] A. Ben-Dor, R. M. Karp, B. Schwikowski, and Z. Yakhini. Universal DNA tag systems: a combinatorial design scheme. In RECOMB, pages 65–75, 2000.

[2] A. G. Frutos, Q. Liu, A. J. Thiel, A. M. W. Sanner, A. E. Condon, L. M. Smith, and R. M. Corn. Demonstration of a word design strategy for DNA computing on surfaces. Nucleic Acids Res., 25(23):4748–4757, 1997.

[3] P. Gaborit and O. D. King. Linear constructions for DNA codes. Theoret. Comput. Sci., 334(1-3):99–113, 2005.

[4] O. D. King. Bounds for DNA codes with constant GC-content. Electron. J. Combin., 10:Research Paper 33, 13 pp. (electronic), 2003.

[5] M. Li, H. J. Lee, A. E. Condon, and R. M. Corn. DNA word design strategy for creating sets of non-interacting oligonucleotides for DNA microarrays. Langmuir, 18(3):805–812, 2002.

[6] M. R. Shortreed, S. B. Chang, D. Hong, M. Phillips, B. Campion, D. C. Tulpan, M. Andronescu, A. Condon, H. H. Hoos, and L. M. Smith. A thermodynamic approach to designing structure-free combinatorial DNA word sets. Nucleic Acids Res., 33(15):4965–4977, 2005.

[7] D. Tulpan, M. Andronescu, S. B. Chang, M. R. Shortreed, A. Condon, H. H. Hoos, and L. M. Smith. Thermodynamically based DNA strand design. Nucleic Acids Res., 33(15):4951–4964, 2005.

[8] D. Tulpan and H. Hoos. Hybrid randomised neighbourhoods improve stochastic local search for DNA code design, 2003.

[9] D. Tulpan, H. Hoos, and A. Condon. Stochastic local search algorithms for DNA word design, 2002.