Math 8803/4803, Spring 2008: Discrete Mathematical Biology Prof. Christine Heitsch School of Mathematics Georgia Institute of Technology Lecture 6 – January 18, 2008


people.math.gatech.edu/~heitsch/Teaching/Sp08/Lectures/lect6-slid… · Lecture 6 – January 18, 2008. Designing DNA codewords



Designing DNA codewords

Problem: Minimize all possible mishybridizations for {Wi}, {Ci}. Also satisfy biochemical constraints on melting temperature and free energy.

Complements – Wi and Cj for i ≠ j

Reverse-complements – Wi and Wj (or Ci and Cj) for i ≠ j

Inverted repeats – Wi with itself

Previous approaches were either “top-down” or “bottom-up.”

C. E. Heitsch, GA Tech 1


Previous Solution Approaches

• Top-down:

– Hamming distance and coding theory [2, 5, 4, 3].

– De Bruijn sequences and the 2-4 rule [1].

• Bottom-up:

– Culling according to biochemical constraints [6, 7].

– Stochastic local search [9, 8].


A “mixed” solution strategy

Top-down: Consider the set of $(4!)^{4^{3-1}} \cdot 4^{-3} \approx 1.89 \times 10^{20}$ distinct De Bruijn sequences with n = 4 and k = 3 having length $4^3$.

[Figure: De Bruijn graph on the binary alphabet, with vertices 00, 01, 10, 11, edges labeled 0 or 1, and the De Bruijn sequence 00010111.]

Bottom-up: Select B(4, 3) (uniformly at random) which minimize other mishybridizations and satisfy required biochemical constraints.


Recall our graph theory definitions

An Euler circuit is a walk in a graph G which begins and ends at the same vertex and uses every edge exactly once.

A Hamiltonian circuit is a walk in a graph G which begins and ends at the same vertex and visits every vertex (except the endpoints) exactly once.

An arborescence is a rooted directed tree with all the edges pointing in the direction of the root.

A spanning tree of a connected graph G is a subgraph which contains all the vertices and is a tree.


Examples from our De Bruijn graph

Vertices v0, v1, v2, v3 where vertex vi is labeled with integer i in binary.

An Euler circuit: v3 → v2 → v0 → v0 → v1 → v2 → v1 → v3 → v3

A Hamiltonian circuit: v3 → v2 → v0 → v1 → v3

A spanning arborescence rooted at v3: E = {(v0, v1), (v2, v1), (v1, v3)}
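The three examples on this slide can be checked mechanically. The following is a minimal Python sketch, assuming the graph in question is the binary De Bruijn graph whose vertices are the 2-bit strings 00, 01, 10, 11 and whose edges are the 3-bit strings; the function names are illustrative and do not come from the lecture.

```python
from itertools import product

def de_bruijn_edges(alphabet="01", k=3):
    """Edge set of the De Bruijn graph: each k-mer w is an edge w[:-1] -> w[1:]."""
    return {(w[:-1], w[1:]) for w in map("".join, product(alphabet, repeat=k))}

def vertices_of(edges):
    return {u for u, _ in edges} | {v for _, v in edges}

def is_euler_circuit(walk, edges):
    """Closed walk that uses every edge exactly once."""
    steps = list(zip(walk, walk[1:]))
    return walk[0] == walk[-1] and len(steps) == len(edges) and set(steps) == edges

def is_hamiltonian_circuit(walk, edges):
    """Closed walk along edges that visits every vertex exactly once."""
    steps = list(zip(walk, walk[1:]))
    return (walk[0] == walk[-1]
            and all(step in edges for step in steps)
            and sorted(walk[:-1]) == sorted(vertices_of(edges)))

def is_arborescence_into(root, tree, edges):
    """Spanning tree whose edges all point in the direction of the root."""
    parent = dict(tree)  # one outgoing tree edge per non-root vertex
    if len(parent) != len(tree) or set(parent) != vertices_of(edges) - {root}:
        return False
    if not set(tree) <= edges:
        return False
    for start in parent:
        seen, v = set(), start
        while v != root:  # follow tree edges until the root is reached
            if v in seen:
                return False
            seen.add(v)
            v = parent[v]
    return True

def v(i):
    """Label of vertex v_i: the integer i written with two binary digits."""
    return format(i, "02b")

edges = de_bruijn_edges()
euler = [v(i) for i in (3, 2, 0, 0, 1, 2, 1, 3, 3)]
hamilton = [v(i) for i in (3, 2, 0, 1, 3)]
tree = [(v(0), v(1)), (v(2), v(1)), (v(1), v(3))]

print(is_euler_circuit(euler, edges))
print(is_hamiltonian_circuit(hamilton, edges))
print(is_arborescence_into(v(3), tree, edges))
```

Under these assumptions all three checks print True.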

[Figure: De Bruijn graph on the binary alphabet, with vertices 00, 01, 10, 11, edges labeled 0 or 1, and the De Bruijn sequence 00010111.]


Combinatorial explosion

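Number of distinct De Bruijn sequences B(n, k), by alphabet size n (columns) and word length k (rows):

k \ n   2                 3                 4                 5                 6
2       1                 24                20,736            995,328,000       ≈ 3.87 × 10^15
3       2                 373,248           ≈ 1.89 × 10^20    ≈ 7.63 × 10^49    ·
4       16                ≈ 1.26 × 10^19    ·                 ·                 ·
5       2048              ·                 ·                 ·                 ·
6       67,108,864        ·                 ·                 ·                 ·
7       ≈ 1.44 × 10^17    ·                 ·                 ·                 ·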

How to generate a (uniformly) random B(n, k) De Bruijn sequence?
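The entries in the table follow from the counting formula used earlier for n = 4 and k = 3, namely (n!)^(n^(k-1)) / n^k. A short Python sketch, assuming that general form of the formula (the function name is illustrative), reproduces them with exact integer arithmetic:

```python
from math import factorial

def count_de_bruijn(n, k):
    """Number of distinct De Bruijn sequences B(n, k): (n!)^(n^(k-1)) / n^k."""
    # The division is exact, so integer floor division loses nothing.
    return factorial(n) ** (n ** (k - 1)) // n ** k

# Print the first rows of the table (k = 2, 3, 4) for alphabet sizes n = 2, 3, 4.
for k in range(2, 5):
    print(k, [count_de_bruijn(n, k) for n in range(2, 5)])

# The case used for DNA codewords: 4 letters, words of length 3.
print(f"{count_de_bruijn(4, 3):.3e}")
```

The rapid growth is the reason enumeration is not an option and a random sampling procedure is needed instead.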


Random walks on the De Bruijn graph

[Figure: De Bruijn graph on the binary alphabet, with vertices 00, 01, 10, 11, edges labeled 0 or 1, and the De Bruijn sequence 00010111.]




Uniformly random De Bruijn sequences

Total of $(4!)^{4^{3-1}} \cdot 4^{-3} \approx 1.89 \times 10^{20}$ distinct B(4, 3) with length $4^3$.

Randomized Algorithm.

• For vertex vi, i = 0, . . . , 15, choose a random permutation pi of {0, 1, 2, 3}.

• Beginning at v1, build a sequence by walking around G(4, 3) according to pi.

• Stop when a permutation is exhausted. Accept if sequence has length 64.

The sequence is De Bruijn with probability 1/64; about 64 trials on average will generate a B(4, 3). (Can be improved slightly to 4/81 and about 20 trials on average.)
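The three steps above can be sketched in Python as follows. This is one possible reading of the slide, not the lecture's own code: vertices are taken to be the 16 pairs of symbols from {0, 1, 2, 3}, the permutation at a vertex gives the order in which its outgoing edges are used, and the walk stops on reaching a vertex whose permutation is used up. The mapping of 0, 1, 2, 3 to a, c, g, t and all function names are assumptions made for illustration.

```python
import random
from itertools import product

def random_walk_sequence(n=4, k=3, start_index=1, rng=random):
    """One trial: walk G(n, k) using a random permutation at each vertex."""
    vertices = list(product(range(n), repeat=k - 1))
    # Random order in which the n outgoing edges of each vertex are used.
    order = {vertex: rng.sample(range(n), n) for vertex in vertices}
    v = vertices[start_index]  # start_index=1 is the vertex labeled 1 in base n
    seq = []
    while order[v]:            # stop when the current permutation is exhausted
        s = order[v].pop()
        seq.append(s)
        v = v[1:] + (s,)       # follow the edge labeled s
    return seq

def random_de_bruijn(n=4, k=3, rng=random, max_trials=100_000):
    """Repeat trials until a walk of full length n^k is produced."""
    for trial in range(1, max_trials + 1):
        seq = random_walk_sequence(n, k, rng=rng)
        if len(seq) == n ** k:  # every edge used once: accept
            return seq, trial
    raise RuntimeError("no full-length walk found within max_trials")

def to_dna(seq, letters="acgt"):
    """Assumed symbol-to-base mapping, for illustration only."""
    return "".join(letters[s] for s in seq)

seq, trials = random_de_bruijn()
print(to_dna(seq), "after", trials, "trial(s)")
```

An accepted walk has used all 64 edges exactly once, so read cyclically the output contains every 3-letter word exactly once. The number of trials varies from run to run; the sketch does not attempt the improved variant mentioned in parentheses on the slide.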


Can we weave a net of DNA strands?

Woman Arranging a Fishing Net – Zanzibar, Tanzania, September 2000

By Martin Wierzbicki from http://photosbymartin.com/.


Biochemical constraints

Thermodynamic predictions calculated by mfold.

$$\Delta G = \Delta H - T \Delta S$$

$$T_m = \frac{1000 \, \Delta H}{A + \Delta S + R \ln(C/4)} - 273.15 + 16.6 \log [\mathrm{Na}^+]$$

Free energy           G    chemical potential of a substance
Enthalpy              H    heat content of a substance
Entropy               S    disorder of a substance
Melting temperature   Tm   50% duplex
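The two formulas translate directly into code. The sketch below is only an illustration of the arithmetic and is not how mfold computes its predictions. It assumes ∆H in kcal/mol and ∆S in cal/(K·mol), which is what the factor of 1000 suggests, takes R as 1.987 cal/(K·mol), reads "log" as base 10, and leaves A and C as plain parameters because the slide does not define them.

```python
import math

GAS_CONSTANT = 1.987  # R in cal/(K*mol); assumed unit system

def free_energy(delta_h, delta_s, temp_kelvin):
    """Delta G = Delta H - T * Delta S, in kcal/mol.

    delta_h is assumed to be in kcal/mol and delta_s in cal/(K*mol),
    hence the division by 1000.
    """
    return delta_h - temp_kelvin * delta_s / 1000.0

def melting_temperature(delta_h, delta_s, a, c, sodium):
    """Tm in degrees Celsius from the formula on the slide.

    a and c stand for the symbols A and C, which the slide leaves undefined;
    sodium is the concentration [Na+].
    """
    denominator = a + delta_s + GAS_CONSTANT * math.log(c / 4.0)
    return 1000.0 * delta_h / denominator - 273.15 + 16.6 * math.log10(sodium)

# Hypothetical input values, chosen only to exercise the functions.
print(free_energy(-100.0, -270.0, 310.15))
print(melting_temperature(-100.0, -270.0, 0.0, 4.0, 1.0))
```

With C = 4 and [Na+] = 1 both logarithmic terms vanish, which makes the second call easy to check by hand.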


Predicted properties

The DNA word W from a random De Bruijn sequence B(4, 3):

aatgctggagcaaccactcctttcatcgcgtattgttagggtgaaagtctacggcccgacagat

For RCA, there should be no (significant) secondary structures.

Sequence       Secondary structure
               ∆G (kcal/mole)    Tm (°C)
Linear W       -7.6              49.9
Circular W     0.48              23.7
RCA average    -6.26/repeat      46.7
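That W really is a De Bruijn sequence B(4, 3) can be confirmed by listing its 64 cyclic windows of length 3 and checking that no word repeats. A minimal Python sketch, with an illustrative function name:

```python
def is_de_bruijn(seq, alphabet="acgt", k=3):
    """True if every k-letter word over the alphabet occurs exactly once in seq,
    reading seq cyclically."""
    n = len(seq)
    if n != len(alphabet) ** k or set(seq) - set(alphabet):
        return False
    wrapped = seq + seq[:k - 1]  # append the start so windows can wrap around
    return len({wrapped[i:i + k] for i in range(n)}) == n

W = "aatgctggagcaaccactcctttcatcgcgtattgttagggtgaaagtctacggcccgacagat"

print(len(W), is_de_bruijn(W))
print(is_de_bruijn("00010111", alphabet="01", k=3))
```

Both checks succeed: W has length 64 and contains each of the 64 possible 3-letter DNA words once, and the binary example 00010111 passes the same test for n = 2.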


Rolling Circle Amplification (RCA)

[Figure: two panels. Circular DNA construction: enzymatic ligation, circular DNA template (64bp), guide DNA (20bp). Sequence amplification: circular template, cDNA primer, DNA polymerase, amplified sequence.]


RCA product for random De Bruijn sequence

[Figure: agarose gel with lanes 1 to 8; marker size labels 1kb, 2kb, 10kb, 23kb, 9.4kb, 6.6kb, 4.4kb, 2.3kb, 564bp.]

Lane 1: 1Kb marker
Lane 2: no circular DNA
Lane 3: RCA 5min
Lane 4: RCA 20min
Lane 5: RCA 35min
Lane 6: RCA 1hr
Lane 7: RCA overnight
Lane 8: Lambda marker

Observed on 0.8% agarose gel: significant amplification at 30 °C.


Predicted surface array hybridization

The DNA word W from a random De Bruijn sequence B(4, 3):

aatgctggagcaaccactcctttcatcgcgtattgttagggtgaaagtctacggcccgacagat

Parse W into overlapping 16mer targets Ti for i = 1, . . . , 64.

Predict binding of complement Pi with Ti and difference W \ Ti.

Sequence    Hybridization with Pi
            ∆G (kcal/mole)    Tm (°C)
Ti          ≤ −18.6           ≥ 65.8
W \ Ti      ≥ −3.8            ≤ 18.0

Also check hybridization with a second B(4, 3) DNA sequence W2.
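The parsing step can be sketched in Python. Two assumptions are made here that the slide does not spell out: the windows wrap around the end of W (64 overlapping 16mers from a 64-letter word are only possible if W is read as a circle), and the "complement" Pi is the Watson-Crick reverse complement of Ti. The thermodynamic predictions themselves are not reproduced; this only builds the sequences that would be passed to such a tool.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")

def reverse_complement(seq):
    """Watson-Crick partner of seq, written 5' to 3' (assumed meaning of 'complement')."""
    return seq.translate(COMPLEMENT)[::-1]

def cyclic_windows(word, size=16):
    """All overlapping windows of the given size, reading word as a circle."""
    wrapped = word + word[:size - 1]
    return [wrapped[i:i + size] for i in range(len(word))]

W = "aatgctggagcaaccactcctttcatcgcgtattgttagggtgaaagtctacggcccgacagat"

targets = cyclic_windows(W)                       # T_1, ..., T_64
probes = [reverse_complement(t) for t in targets]  # P_1, ..., P_64

print(len(targets), len(set(targets)))
print(targets[0], probes[0])
```

Because W contains every 3-letter word only once, the 64 targets are automatically distinct, which is what makes a probe specific to its own window.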


Minimal free energy


Melting temperature


Surface Plasmon Resonance (SPR) imaging


Verification of complementary binding

W = aatgctggagcaaccactcctttcatcgcgtattgttagggtgaaagtctacggcccgacagat

and another random B(4, 3):

W2 = gctgccctaaagagtcggcagcgatcttccaacgtgacactcatttgtagggttatggaatacc

Probes:

P1 = ctaacaatacgcgatg

P2 = agtgtcacgttggaag

P3 = gtgtatccgacatgtg

Targets:

W

G = gggctctctattacgacctc

H = cttccaacgtgacact
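Which probe is a perfect Watson-Crick partner of which target can be read off the sequences themselves. The Python sketch below uses an exact-match criterion only: a probe is paired with a target when the probe's reverse complement occurs in the target as a substring. This is a simplification chosen for illustration; it ignores partial hybridization and says nothing about the measured SPR signal.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

W = "aatgctggagcaaccactcctttcatcgcgtattgttagggtgaaagtctacggcccgacagat"
W2 = "gctgccctaaagagtcggcagcgatcttccaacgtgacactcatttgtagggttatggaatacc"

probes = {
    "P1": "ctaacaatacgcgatg",
    "P2": "agtgtcacgttggaag",
    "P3": "gtgtatccgacatgtg",
}
targets = {
    "W": W,
    "G": "gggctctctattacgacctc",
    "H": "cttccaacgtgacact",
}

def exact_partners(target):
    """Names of the probes whose reverse complement appears in the target."""
    return [name for name, p in probes.items() if reverse_complement(p) in target]

pairing = {name: exact_partners(seq) for name, seq in targets.items()}
print(pairing)                                  # {'W': ['P1'], 'G': [], 'H': ['P2']}
print(reverse_complement(probes["P2"]) == targets["H"], targets["H"] in W2)
```

Under this criterion P1 pairs with a 16mer inside W, P2 pairs with H (which is itself a 16mer taken from W2), and P3 has no exact partner among the three targets.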


SPR imaging result of complementary binding

[Figure: three SPR images of the DNA array with spots P1, P2, P3. Panels: Hybridization of W onto DNA array; Hybridization of G onto DNA array; Hybridization of H onto DNA array.]


Proof of principle

W1 = ctaacaatacgcgatg
W2 = agtgtcacgttggaag
W3 = gtgtatccgacatgtg
C2 = cttccaacgtgacact

[Figure: SPR image of the DNA array with spots P1, P2, P3. Hybridization of H onto DNA array.]

“Proof of principle” experimental results support using random De Bruijn sequences to design DNA codewords.


Acknowledgments

• Ming Li for his DNA9 PowerPoint slides for “Attomole Detection of DNA in an Array Format with SPR Imaging Using RCA for the Programmable Nanoscale DNA Biosensors.”

• Ming Li, Prof. Rob Corn, and the 2000 – 2002 Corn Research Group at the University of Wisconsin – Madison. Graduate Students: Greta Wegner, Terry Goodrich, Ming Li, Shiping Fang, Yuan Li, Heesuk Kim, Johanna Kwok. Senior Staff Scientist: Hye Jin Lee. Postdoctoral Associates: Dr. Alastair Wark, Dr. Eric Codner. Visitors: Dr. Tomonori Saeki from Hitachi, Ltd. Undergraduates: Misha Wolfson, Takeyoshi Goto.


References

[1] A. Ben-Dor, R. M. Karp, B. Schwikowski, and Z. Yakhini. Universal DNA tag systems: a combinatorial design scheme. In RECOMB, pages 65–75, 2000.

[2] A. G. Frutos, Q. Liu, A. J. Thiel, A. M. W. Sanner, A. E. Condon, L. M. Smith, and R. M. Corn. Demonstration of a word design strategy for DNA computing on surfaces. Nucleic Acids Res., 25(23):4748–4757, 1997.

[3] P. Gaborit and O. D. King. Linear constructions for DNA codes. Theoret. Comput. Sci., 334(1-3):99–113, 2005.

[4] O. D. King. Bounds for DNA codes with constant GC-content. Electron. J. Combin., 10:Research Paper 33, 13 pp. (electronic), 2003.

[5] M. Li, H. J. Lee, A. E. Condon, and R. M. Corn. DNA word design strategy for creating sets of non-interacting oligonucleotides for DNA microarrays. Langmuir, 18(3):805–812, 2002.

[6] M. R. Shortreed, S. B. Chang, D. Hong, M. Phillips, B. Campion, D. C. Tulpan, M. Andronescu, A. Condon, H. H. Hoos, and L. M. Smith. A thermodynamic approach to designing structure-free combinatorial DNA word sets. Nucleic Acids Res., 33(15):4965–4977, 2005.

[7] D. Tulpan, M. Andronescu, S. B. Chang, M. R. Shortreed, A. Condon, H. H. Hoos, and L. M. Smith. Thermodynamically based DNA strand design. Nucleic Acids Res., 33(15):4951–4964, 2005.


[8] D. Tulpan and H. Hoos. Hybrid randomised neighbourhoods improve stochastic local search for DNA code design, 2003.

[9] D. Tulpan, H. Hoos, and A. Condon. Stochastic local search algorithms for DNA word design, 2002.
