Greedy Algorithms in the Libraries of Biology

Is biology optimal? Present 26720 km/h 4500m pm-Mm 3 °K 2000 yr. Human Past Locomotion 50 km/h Ocean depth 75m

Page 1: Greedy Algorithms in the Libraries of Biology

17-Apr-2008 3:30-3:45 PM, Avogadro-Scale Computing, MIT Bartos E15

Thanks to:

Greedy Algorithms in the Libraries of Biology

PGP

Page 3: Greedy Algorithms in the Libraries of Biology

3 Exponential technologies: 1 to 18 month doubling times

[Figure: log-scale vertical axis from 1E-4 to 1E+14; horizontal axis, years 1840 to 2020. Series: Daltons synth, Bits/sec, Seq bp/$. Labels on the plot: urea, B12, tRNA; telegraph; Computation & Communication; Analytic; tRNA; Synthetic chemistry; human; Gb chips.]

Shendure J, Mitra R, Varma C, Church GM, 2004. Carlson 2003; Kurzweil 2002; Moore 1965.
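As a rough illustration of what a 1 to 18 month doubling time implies, the sketch below computes a doubling time from two points on an exponential curve and the fold-growth over a period. The function names and the sample numbers are hypothetical and are not read off the figure.

```python
import math

def doubling_time(t0, v0, t1, v1):
    """Doubling time (in the units of t) for exponential growth through two points."""
    return (t1 - t0) * math.log(2) / math.log(v1 / v0)

def growth_factor(years, doubling_months):
    """Fold-increase after `years` at a fixed doubling time given in months."""
    return 2 ** (years * 12 / doubling_months)

# Hypothetical data points: a 1024-fold improvement over 10 years.
print(f"doubling time: {doubling_time(2000, 1.0, 2010, 1024.0):.2f} years")

# Fold-growth over one decade at the two ends of the slide's range.
for months in (1, 18):
    print(f"{months:>2}-month doubling: {growth_factor(10, months):.3g}x per decade")
```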

Page 4: Greedy Algorithms in the Libraries of Biology

Avogadro scale, >> Yottaflops (from CMOS to sea moss)

Ultra-parallel: 10^38 units (lab libraries: 10^8 to 10^15 25mers)

Adaptable: Evolution (years), Immune (days), Neural (seconds)

Thermodynamic limit: 2x10^19 op/J (irreversible); 3x10^20 for polymerase (10^10 for current computers)

Memory density: Neural: (10^12 op/s & 10^6 bits)/mm^3; DNA: (10^3 op/s & 1 bit)/nm^3

Error rate: DNA: 10^-9; RNA/protein: 10^-4

Biofuel: 4x10^7 J/kg (~=$). Adleman 1994
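For orientation, the order of magnitude of an irreversible thermodynamic limit can be reproduced from the Landauer bound of k_B T ln 2 joules per bit erased. The sketch assumes T = 300 K, which the slide does not state; it gives roughly 3.5x10^20 operations per joule, the same order as the figures quoted on this slide, and compares that with the 10^10 op/J quoted for current computers.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def landauer_ops_per_joule(temperature_k=300.0):
    """Upper bound on irreversible bit operations per joule at temperature T."""
    energy_per_op = K_B * temperature_k * math.log(2)  # joules per bit erased
    return 1.0 / energy_per_op

limit = landauer_ops_per_joule()   # assumes 300 K (not from the slide)
current_computers = 1e10           # op/J, the value quoted on the slide
print(f"Landauer bound at 300 K: {limit:.2e} op/J")
print(f"Headroom over current computers: {limit / current_computers:.1e}x")
```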

Page 5: Greedy Algorithms in the Libraries of Biology

DNA error rates

Ellis et al. PNAS 2001; Costantino & Court. PNAS 2003

DNA Replication Fork

1. Incorporation 5' to 3'

2. Proofreading exonuclease 3' to 5'

3. Mismatch repair
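The three stages on this slide act in series, so their error fractions multiply. The sketch below shows how three modest filters can combine to an overall rate near the 10^-9 quoted for DNA elsewhere in the talk. The per-stage values are round illustrative assumptions chosen for the arithmetic, not numbers from the slide.

```python
import math

# Illustrative per-stage error fractions (assumptions, not from the slide).
STAGES = {
    "incorporation": 1e-5,    # errors per base made by the polymerase
    "proofreading": 1e-2,     # fraction of those errors that escape the exonuclease
    "mismatch_repair": 1e-2,  # fraction that then escape mismatch repair
}

def overall_error_rate(stages):
    """Overall error rate when each stage independently filters the previous one."""
    return math.prod(stages.values())  # math.prod requires Python 3.8+

print(f"all three stages: {overall_error_rate(STAGES):.0e} errors per base")

# Dropping mismatch repair (as in the repair-deficient strains used later in
# the talk) raises the rate by the inverse of that stage's escape fraction.
no_mmr = {k: v for k, v in STAGES.items() if k != "mismatch_repair"}
print(f"without mismatch repair: {overall_error_rate(no_mmr):.0e} errors per base")
```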

Page 6: Greedy Algorithms in the Libraries of Biology

Bionano – Inorganic-microfab interfaces

• Metal-oxide-semiconductors (sponge silicateins for Ti & Ga oxides)

• Magnetic components (magnetosomes in magnetotactic bacteria)

• Optical fibers & lenses (e.g. Venus basket sponge)

• Bacterial reduction of salts to metals (e.g. Se, Au, Ag)

• Reading and writing DNA

Page 7: Greedy Algorithms in the Libraries of Biology

Reading DNA: Open-source hardware, software, wetware. Polonator G007

~10 to $400/Gbp 1E-6 @ >3X redundancy

Page 8: Greedy Algorithms in the Libraries of Biology

Synthetic Biology: augmentation & combinatorics (not minimization)

1. Synthetic DNA: 1Mbp per month (Codon Devices)

2. New polymers in vitro – affinity selection (Vanderbilt)

3. Hydrocarbon & other chemical syntheses in E. coli (LS9)

4. Bacterial & stem cell therapies (SynBERC & MGH)

5. New codes: Virus-resistant cells & new amino acids (MIT)

6. Synthetic Ecosystems – Evolve secretion & signaling

7. Interfaces of Genomics & Society

Hierarchical, modular, evolvable

Page 9: Greedy Algorithms in the Libraries of Biology

DNA origami -- highly predictable 3D nanostructures

DNA-nanotube-induced alignment of membrane proteins for NMR structure determination

Rothemund, Nature '06

Douglas et al., PNAS '07

Page 10: Greedy Algorithms in the Libraries of Biology

10 Mbp of DNA / $300 chip

Spatially patterned chemistry:

8K: Atactic/Xeotron/Invitrogen, photo-generated acid

12K: Combimatrix, electrolytic

44K: Agilent, ink-jet, standard reagents

380K: Nimblegen/GA, photolabile 5' protection

Amplify pools of 50mers using flanking universal PCR primers & 3 paths to 10X error correction

Tian et al. Nature 432:1050; Carr & Jacobson 2004 NAR; Smith & Modrich 1997 PNAS

Page 11: Greedy Algorithms in the Libraries of Biology

Mirror world: resistant to enzymes, parasites, predators

Mirror aptamers, ribozymes, etc. require mirror polymerases

352 amino acid long Dpo4 (Sulfolobus DNA polymerase IV): 347 peptide bonds done; 4 to go.

L-amino acids, D-nucleotides (current biosphere)

D-amino acids, L-nucleotides (mirror biopolymers)

Page 12: Greedy Algorithms in the Libraries of Biology

Why synthesize (minimal) in vitro self-replication?

• Molecular Biology Central Dogma: DNA > RNA > Protein. PCR, T7 RNA pol, in vitro translation.

• Production of devices larger than or toxic to cells.

• Directed evolution of drugs & affinity agents.

• Mirror-image proteins

Tony Forster (Vanderbilt)

Duhee Bang (HMS)

Page 13: Greedy Algorithms in the Libraries of Biology

113 kbp DNA, 151 genes

Pure in vitro translating & replicating system

Ideal for comprehensive atomic, ODE & stochastic models

Forster & Church, MSB '05, Genome Res. '06; Shimizu, Ueda et al. '01

Page 14: Greedy Algorithms in the Libraries of Biology

Genome engineering CAD

[Figure: assembly scale running 70b, 15Kb, 5Mb, 250 Mb. Labels: Chemical Synthesis; Polymerase in vitro; Recombination in vivo E. coli; Recombination in human cells; Bacterial (Artificial) Chromosomes, BACs; Human (Artificial) Chromosomes, HACs. Error rates shown: Chemical Synthesis 1E-2; Error Correction MutS 1E-4; Sequencing 1E-7.]

Isaacs, Carr, Emig, Gong, Tian, Reppas, Jacobson, Church

Page 15: Greedy Algorithms in the Libraries of Biology

Native DNA computing: Lab Evolution

Reppas/Lin: Trp/Tyr exchange

Tolonen: Ethanol resistance

Lenski: Citrate utilization

Palsson: Glycerol utilization

Edwards: Radiation resistance

Ingram: Lactate production

Marliere: Thermotolerance

J&J: Diarylquinoline resistance (TB)

DuPont: 1,3-propanediol production

About 3 serial additive changes per 30 days vs 2^30 exhaustive search
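The contrast on this slide, a few serial additive changes versus a 2^30 exhaustive search, is the usual greedy-versus-brute-force trade-off. A minimal sketch, assuming 30 independent binary sites with purely additive fitness effects (a hypothetical landscape with random effect sizes, not data from any of the experiments listed): a greedy search that keeps the single best change per round needs at most 31 rounds of 30 trials, while exhaustive enumeration would need 2^30 genotypes.

```python
import random

N_SITES = 30  # binary sites, so 2**30 possible genotypes

def make_additive_fitness(n_sites, seed=0):
    """Hypothetical landscape: each site adds a fixed random effect when mutated."""
    rng = random.Random(seed)
    effects = [rng.uniform(-1.0, 1.0) for _ in range(n_sites)]

    def fitness(genotype):
        return sum(e for e, g in zip(effects, genotype) if g)

    return fitness, effects

def greedy_search(fitness, n_sites):
    """Each round, try every single-site change and keep only the best one."""
    genotype = [0] * n_sites
    current = fitness(genotype)
    evaluations = 1
    while True:
        best_site, best_value = None, current
        for site in range(n_sites):
            trial = genotype.copy()
            trial[site] ^= 1          # flip one site
            value = fitness(trial)
            evaluations += 1
            if value > best_value:
                best_site, best_value = site, value
        if best_site is None:         # no single change helps: stop
            return genotype, current, evaluations
        genotype[best_site] ^= 1
        current = best_value

fitness, effects = make_additive_fitness(N_SITES)
best, score, n_evals = greedy_search(fitness, N_SITES)
print(f"greedy: {n_evals} evaluations, exhaustive: {2 ** N_SITES} genotypes")
```

On a rugged (non-additive) landscape the same greedy walk can stall at a local optimum, which is the price paid for avoiding the exhaustive search.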

Page 16: Greedy Algorithms in the Libraries of Biology

rE.coli Strategy #3: ss-Oligonucleotide Repair

Obtain 25% recombination efficiency in E. coli strains lacking mismatch repair genes (mutH, mutL, mutS, uvrD, dam)

Ellis et al. PNAS 2001; Costantino & Court. PNAS 2003

DNA Replication Fork

Improved Recombination Frequency: 10^-4 to 0.25 (> 3 log increase!)
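The "> 3 log" figure follows directly from the two frequencies on this slide. A small check, with a hypothetical helper name:

```python
import math

def log10_fold_increase(before, after):
    """Number of orders of magnitude gained going from `before` to `after`."""
    return math.log10(after / before)

# Frequencies quoted on the slide: 1e-4 before, 0.25 after.
print(f"fold increase: {0.25 / 1e-4:.0f}x")
print(f"log10 increase: {log10_fold_increase(1e-4, 0.25):.2f}")
```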

Page 17: Greedy Algorithms in the Libraries of Biology

Multiplex Automated Genome Engineering (MAGE)

Wash with water & DNA pool (50)

Concentrate, electroporate

Resuspend, bubble, select

O-ring membrane

Concentrate

Wang, Isaacs, Terry

Page 18: Greedy Algorithms in the Libraries of Biology

GEMASS Prototype

H. Wang, Church Lab, Harvard, 2008

Page 19: Greedy Algorithms in the Libraries of Biology

Recombination-Cycling for Combinatorial Accelerated Evolution

[Figure: histograms of # mutations/clone (horizontal axis, ticks 0 to 7) against frequency (vertical axis, ticks 0 to 25). Panels: "Mutation Distribution: 11 oligos, 15 cycles" and "Mutation Distribution: 54 oligos, 45 cycles".]

Oligo pool   # cycles   Best clone (98th %tile)   Fraction of mutated sites   Time*
11           15         7                         7/11                        3 days
54           45         23                        23/54                       9 days

* Continuous cycling
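The two rows of the table can be put on a common footing with some simple arithmetic. The sketch below only re-derives ratios from the numbers shown; the field names are hypothetical.

```python
# Rows transcribed from the table on this slide.
RUNS = [
    {"oligos": 11, "cycles": 15, "best_clone_mutations": 7, "days": 3},
    {"oligos": 54, "cycles": 45, "best_clone_mutations": 23, "days": 9},
]

def summarize(run):
    """Derived ratios for one oligo pool."""
    return {
        "fraction_mutated": run["best_clone_mutations"] / run["oligos"],
        "mutations_per_cycle": run["best_clone_mutations"] / run["cycles"],
        "mutations_per_day": run["best_clone_mutations"] / run["days"],
        "cycles_per_day": run["cycles"] / run["days"],
    }

for run in RUNS:
    s = summarize(run)
    print(
        f"{run['oligos']:>2} oligos: "
        f"{s['fraction_mutated']:.2f} of sites mutated, "
        f"{s['mutations_per_day']:.2f} mutations/day, "
        f"{s['cycles_per_day']:.0f} cycles/day"
    )
```

Both rows work out to 5 cycles per day of continuous cycling.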

Scaling & Automation Increase Efficiency of Recombination

Wang, Isaacs, Carr, Jacobson, Church

Page 20: Greedy Algorithms in the Libraries of Biology

Avogadro scale, >> Yottaflops (from CMOS to sea moss)

Ultra-parallel: 10^38 units (lab libraries: 10^8 to 10^15 25mers)

Adaptable: Evolution (years), Immune (days), Neural (seconds)

Thermodynamic limit: 2x10^19 op/J (irreversible); 3x10^20 for polymerase (10^10 for current computers)

Memory density: Neural: (10^12 op/s & 10^6 bits)/mm^3; DNA: (10^3 op/s & 1 bit)/nm^3

Error rate: DNA: 10^-9; RNA/protein: 10^-4

Biofuel: 4x10^7 J/kg (~=$). Adleman 1994

Page 22: Greedy Algorithms in the Libraries of Biology

Multiplex Automated Genome Engineering (MAGE)

syringe pump

electrically actuated valves

electroporation cuvette w/ membrane filter

OD sensor

data acquisition system / computer communication

Wang, Isaacs, Terry

Page 23: Greedy Algorithms in the Libraries of Biology

Fab vs. Bio-fab

Fab                                    Bio-fab
+ Plays well with digital computers    - No habla C++
- Doesn’t get DNA                      + DNA is its native digital media
- Needs us to replicate                + We need them
- Needs expensive Fab (e.g. ICs)       + Simple or complex inputs
- Intelligent Design                   + Evolution

Page 24: Greedy Algorithms in the Libraries of Biology

Cross-feeding symbiotic systems: aphids & Buchnera

• obligate mutualism

• nutritional interactions: amino acids & vitamins

• established 200-250 million years ago

• close relative of E. coli with tiny genome (618~641 kb)

Aphids

http://buchnera.gsc.riken.go.jp

MILKFTWV HR

Page 25: Greedy Algorithms in the Libraries of Biology

Shigenobu et al. Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp.APS. Nature 407, 81-86 (2000).

Pink = enzymes apparently missing in Buchnera

Page 26: Greedy Algorithms in the Libraries of Biology

trp/tyrA pair of genomes shows best co-growth

Reppas, Lin et al.; Accurate Multiplex Polony Sequencing of an Evolved Bacterial Genome. Science 2005

First Passage

Second Passage

Synthetic genome pair evolution

Page 27: Greedy Algorithms in the Libraries of Biology

Co-evolution of mutual biosensors/biosynthesis, sequenced across time & within each time-point

Independent lines of Trp & Tyr co-culture

5 OmpF: (pore: large, hydrophilic > small) 42 R->G,L,C; 113 D->V; 117 E->A

2 Promoter: (cis-regulator) -12 A->C, -35 C->A

5 Lrp: (trans-regulator) 1b, 9b, 8b, IS2 insert, R->L in DBD.

Heterogeneity within each time-point.

At late times Tyr- becomes prototroph!

Reppas, Shendure, Porreca

Page 28: Greedy Algorithms in the Libraries of Biology

Reducing costs of open-source hardware & wetware

Factor

• 30 Equipment speed: from 1 up to 30 Mpixels/sec camera

• 4 Equipment cost: from $500K down to $150K (Danaher Inc)

• 36 Parallelism: 36 flow-cells per camera, 2 billion beads

------------------

• 75 Flow cell volume: 1.5 mm down to 0.0085 mm thin

• 40 Kit costs: $2000 down to $50 at standard enzyme costs

• 10 Enzymes: $4000/mg down to <$400 (Enzymatics Inc)

• 50 Genomic subset (Exome – 1% genome)
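If the listed factors are treated as independent and multiplicative (an assumption; the slide does not say how they combine, and kit and enzyme costs may overlap), the combined reduction can be tallied as below. The grouping into two sets follows the separator line on the slide; the group names are hypothetical.

```python
import math

# Factors as listed on the slide, split at the separator line.
HARDWARE = {"equipment_speed": 30, "equipment_cost": 4, "parallelism": 36}
CONSUMABLES = {"flow_cell_volume": 75, "kit_costs": 40, "enzymes": 10, "genomic_subset": 50}

def combined(factors):
    """Product of the factors, assuming they compound independently."""
    return math.prod(factors.values())  # math.prod requires Python 3.8+

hardware = combined(HARDWARE)
consumables = combined(CONSUMABLES)
total = hardware * consumables
print(f"hardware group: {hardware:,}x")
print(f"consumables group: {consumables:,}x")
print(f"both groups, if fully multiplicative: {total:.2e}x")
```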