Tom Knight Ginkgo Bioworks Life with four billion atoms.
Tom Knight, Ginkgo Bioworks
Life with four billion atoms
Energy in the 1800s
The steam engine has given more to science than science has given to the steam engine. --- Lord Kelvin
Information in the 1900s
Anabolism / Catabolism
Natural complexity (food)
Specified complexity (organisms)
Core of simple universal parts (central metabolites)
(Energy carriers)
Design information (genome)
Harold Morowitz 1962 & 1984
Some history…
• Confusion over what PPLO/Mycoplasma were
“The Microbe of pleuropneumonia” Nocard 1896
• 1932 isolation of “PPLO” Koch postulates.
• 1958 Klieneberger-Nobel: free living bacterial species
• Morowitz 1962 SciAm: “the smallest living cell”
• 1980 Gilbert effort to sequence M. capricolum
• 1982 Morowitz “complete understanding of life”
• 1996 Fraser et al. M. genitalium sequence
• 1999 Hutchison et al. Minimal genome set for M. genitalium
• 2009 Gibson & Lartigue: Genome transplantation
• 2012 Serrano et al. Comprehensive model
Complexity, minimality & simplicity
• Complex systems have many parts
• Reducing the part count generally leads to a simpler system (minimal part count)
• Extreme part count reduction leads to shared parts
These systems are ironically less simple
• There is an optimal part count for modular design
• Conservation of complexity
• Stratified design
Structure function
Complexity Reduction
User
Application software
Operating system, user interface
Programming language
Instruction set architecture
Virtual machine
Computer hardware design
Functional computing units
Logic synthesis
Logic gates
Circuit design
Transistors
Mask geometry
Fabrication technologies
Semiconductor physics
Quantum physics
100’s of OS calls
100 statements
100’s of instructions
10’s of units
10’s of gate types
4 types of transistors
15 mask layers
6 materials
Complexity Reduction
• Good News:
Biology is modular and abstract
Evolution needs modular design as much as we do
We can discover the modular designs, modify them, and use them
Engineered Simple Organisms
• modular
• understood
• malleable
• low complexity
• Start with a simple existing organism
• Remove structure until failure
• Rationalize the infrastructure
• Learn new biology along the way
The chassis and power supply for our computing
Relative Complexity
[Figure: genome sizes on a log axis labeled “Log Genome Size, base pairs”, with ticks from 100 to 100G. Labeled points: Gene; Plasmid; T7 Phage (36 kB); Mycoplasma genitalium (580 kB); Mesoplasma florum (793 kB); E. coli (4.6 MB); S. cerevisiae (12 MB); Human (3.3 GB); Lily (12 GB). Regions marked: Alive; Autotroph.]
Choosing an organism: Mesoplasma florum
• Isolation from the flower of a lemon tree, Florida (McCoy84)
• Safe BSL-1 organism -- an insect commensal
• Not a human or plant or animal pathogen
• No growth at 37C
• Fast growing
40 minute doubling time vs. six hours for M. genitalium
• Convenient to work with
Facultative anaerobe
• Small genome:
793,244 bp
682 coding regions
Tomographic EM of Me. florum
Grant Jensen – Caltech
3-D TEM image of Mesoplasma florum
Reconstructed from angled TEM images
300 x 400 nm
6 nm membranes
5 nm ribosomes
False colored DNA
How many atoms?
• Cell diameter is about 400 nm
• Approximately 2000 atoms in diameter
• About four billion atoms
70% of these are in water molecules
1.2 billion atoms in biomolecules
DNA is about 40 million atoms, 3%
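The estimate can be checked with a few lines of arithmetic. This is a rough sketch that assumes a spherical cell and an average atomic spacing of 0.2 nm; the spacing is not stated on the slide and is simply what 400 nm across 2000 atoms implies.

```python
import math

CELL_DIAMETER_NM = 400.0   # from the slide
ATOM_SPACING_NM = 0.2      # assumption: 400 nm / 2000 atoms
WATER_FRACTION = 0.70      # from the slide
DNA_ATOMS = 40e6           # from the slide

atoms_across = CELL_DIAMETER_NM / ATOM_SPACING_NM
# Volume of a sphere counted in atom-sized cubes: (pi / 6) * d^3
total_atoms = math.pi / 6 * atoms_across ** 3
biomolecule_atoms = (1 - WATER_FRACTION) * total_atoms
dna_share = DNA_ATOMS / biomolecule_atoms

print(f"atoms across:      {atoms_across:.0f}")
print(f"total atoms:       {total_atoms:.2e}")
print(f"biomolecule atoms: {biomolecule_atoms:.2e}")
print(f"DNA share:         {dna_share:.1%}")
```

Under these assumptions the sphere holds roughly four billion atoms, about 1.2 to 1.3 billion of them outside water, and DNA comes out near 3% of the biomolecule atoms, in line with the slide.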
Genome characteristics
• 793281 base pairs
• 26.52% G + C
• 682 protein coding regions
UGA for tryptophan
No CGG codon or corresponding tRNA
Classic circular genome
oriC, terminator region, gene orientation
• 39 stable RNAs
29 tRNAs
2x 16S, 23S, 5S
RNAse-P, tmRNA, SRP
• One inactive insertion sequence
• Gene direction largely oriented with replication fork
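G + C content is the fraction of bases that are G or C. A minimal sketch of that calculation follows; `gc_fraction` is a hypothetical helper name and the example string is a made-up AT-rich fragment, not M. florum sequence.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)

# Made-up AT-rich fragment, for illustration only.
example = "ATGAAATTAGCATTTAAAGATTGA"
print(f"{gc_fraction(example):.2%}")
```

Run over the full genome sequence, the same function would be expected to return a value close to the 26.52% quoted above.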
Understand the metabolism
• Identify major metabolic pathways by finding critical genes coding for known enzymes
• Predict necessary enzymes which may not have been found
• Evaluate the list of unknown function genes for candidates
• Build the major metabolic pathway map of the organism
• Consider elimination of entire pathways
[Figure: Identified Metabolic Pathways in Mesoplasma florum (G. Fournier, 02/23/04). A whole-cell pathway map in which each pathway, complex, and transporter is annotated with its Mfl gene identifiers.
Sugar uptake: ribose ABC transporter; PTS II System for glucose, sucrose, trehalose, xylose, fructose, beta-glucoside, and an unknown substrate; sn-glycerol-3-phosphate ABC transporter; chitin degradation.
Central metabolism: Glycolysis; Pentose-Phosphate Pathway; glucose-6-phosphate; ribose-5-phosphate; glyceraldehyde-3-phosphate; PRPP; acetyl-CoA; L-lactate, acetate; ATP Synthase Complex (ATP, ADP).
Nucleotides and information flow: Purine/Pyrimidine Salvage; xanthine/uracil permease; competence/DNA transport; DNA Polymerase; RNA Polymerase; degradation; ribosomal RNA, transfer RNA, messenger RNA; tRNA aminoacylation; met-tRNA formylation; Ribosome; proteins; degradation.
Protein export: protein translocation complex (Sec); protein secretion (ftsY); Signal Recognition Particle (SRP); Export.
Amino acid transport: spermidine/putrescine ABC transporter; unknown amino acid ABC transporter; glutamine ABC transporter; oligopeptide ABC transporter; arginine/ornithine antiporter; lysine APC transporter; alanine/Na+ symporter; glutamate/Na+ symporter; putrescine/ornithine APC transporter; intraconversion?
Lipids and membrane: Lipid Synthesis; fatty acid/lipid transporter; cardiolipin/phospholipids; membrane synthesis; phospholipid membrane; variable surface lipoproteins; hypothetical lipoproteins; x57 hypothetical transmembrane proteins; an “x22” count near the lipoprotein labels.
Cofactors: Formyl-THF Synthesis (THF?); Pyridine Nucleotide Cycling; Electron Carrier Pathways (NAD+, NADH, NADP, NADPH); Flavin Synthesis (riboflavin?, FMN, FAD); niacin?
Other transporters: formate/nitrate transporter; phosphate ABC transporter; phosphonate ABC transporter; metal ion transporter; cobalt ABC transporter; K+, Na+ transporter; malate transporter?; x13+ unknown substrate transporters.]
How Simple is this?
• Missing cell wall, outer membrane
• Missing TCA cycle
• Missing amino acid synthesis
• Missing fatty acid synthesis
• One sigma factor
• Small number of DNA binding proteins
• One insertion sequence, probably not active
• One restriction system (Sau3AI-like)
• CTG/CAG methylation (function?)
• Evidence for shared protein function
MDH/LDH (Pollack 97 Crit Rev Microbiol 23:269)
Proteome
• Collaboration with Steve Tannenbaum / Yingwu Wang
• 2-D gels + MS spot ID
• LC/LC/MS/MS ID of trypsin digests
Proteome Results
• 180 spots picked and analyzed
• MudPIT LC/LC/MS/MS also carried out
• 369 proteins identified by trypsin digestion and mass spec out of 682 annotated coding regions
• Transcription of 16S ribosomal RNA
• Stops don’t always stop
Instead they cause frame shifts into other frames
Transposome insertions
• Engineered tetM tetracycline resistance gene
• Promoter from Tn4001 tetM gene
• Outward directed primers for insertion site verification
• Unique I-SceI cut site
• In vitro binding of Tn5 transposase
• Electroporation of Tn5 transposome
• Selection with tetracycline
• Genomic DNA prep
• Cut with MboI frequent cutter & religate
• PCR with outward directed primers
• Sequence to identify insertion site
• Locate disrupted genes
• Alternatively: directly sequence from genomic DNA
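The last steps of the procedure (sequence the insertion site, then locate the disrupted gene) amount to an interval lookup against the genome annotation. A minimal sketch, assuming non-overlapping genes sorted by start coordinate; the gene names and coordinates are hypothetical placeholders, not real M. florum annotation, and `disrupted_gene` is an illustrative helper.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional

class Gene(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

# Hypothetical annotation, sorted by start; not real M. florum coordinates.
GENES = [
    Gene("geneA", 100, 1200),
    Gene("geneB", 1350, 2100),
    Gene("geneC", 2400, 3900),
]
STARTS = [g.start for g in GENES]

def disrupted_gene(position: int) -> Optional[Gene]:
    """Return the gene containing an insertion position, or None if intergenic."""
    i = bisect_right(STARTS, position) - 1  # last gene starting at or before position
    if i >= 0 and GENES[i].start <= position <= GENES[i].end:
        return GENES[i]
    return None

for pos in (150, 1300, 3900):
    hit = disrupted_gene(pos)
    print(pos, hit.name if hit else "intergenic")
```

Genes that never show up as hits across a large insertion library are the candidates for essentiality.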
Custom Transposon
[Figure: map of tetM-recoded-transposon3, 2165 bp. Features: tetM; ME End (two); stop codons (two sets); SP6 promoter; T3 promoter; T7 promoter; I-SceI site; M13-forward. Restriction sites: PvuII (4), EcoRI (78), XbaI (117), SpeI (2046), NotI (2054), PstI (2064), AvrII (2139), PvuII (2163).]
Tn5 Transposomes
• Transposon design issues
• Codon usage
• Promoter design
• Restriction site avoidance
• Cell transformation
• Electroporation voltage
• Selection medium
Transposome insertion events
• 2700 currently picked, saved, and sequenced
• 337 Essential Genes + 29 tRNA + 7 essential RNA genes
• Most are unsurprising surface lipoproteins and “unknown function”
• Some surprises: inessential ftsZ, mreB, many ribosomal & tRNA modification proteins
But the Sau3AI homologous restriction system appears essential
About 80 unknown function genes (many GTPases) are essential
• Compare with
Dybvig08 results on M. arthritidis
French08 results on M. pulmonis
Glass06 results on M. genitalium
• Ordered library of cells with 330 inactivated genes
Functional Categories of Essential Genes
Category Number
DNA Replication 22
Cell Division 5
Transcription 12
Nucleotide Transport & metabolism 20
Protein translation 112
Post translational modification 6
Protein secretion 7
Lipid metabolism 7
Coenzyme metabolism 7
Energy production 11
Transporters 35
Transposome insertion events
Essential is not absolute
• Multi-copy genes are not identified as essential
NADH oxidase
Acyl carrier protein
• Essentiality is defined by the culture conditions
• Genes with stability and reliability function are marked as dispensable
DNA repair
Chaperones
Some RNA modification enzymes
• This is a much more important effect in larger genomes
Next in Analysis and Tools
• Genome re-engineering with knock-in/knock-out
• Resequencing
• Whole cell metabolic models
• Plug and play modules for additional function
• Biosafety issues
Genome re-engineering tools
• Plasmid: S. citri pSci2 PE protein based (Breton08)
J70302 registry part, under test now
• recET recombination system (S. citri recT gene)
J70007 recT part, DNA available, being mutated
Chloramphenicol resistance gene cassette
PheS mutant gene cassette
• Phase 1:
Turn on recombination
Insert PheS/cat cassette in the target location
Select with chloramphenicol
• Phase 2:
Turn on recombination
Insert final modification
Select with p-chlorophenylalanine
• Result: seamless editing of the chromosome
• Result: seamless editing of the chromosome
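The two-phase logic can be sketched as a toy model with the chromosome as a string. This is an illustration of the selection and counter-selection idea only, not of the recombination itself: the cassette token, function names, and sequences are all hypothetical, and it assumes the usual rationale that cat confers chloramphenicol resistance while the mutant PheS makes cells sensitive to p-chlorophenylalanine.

```python
CASSETTE = "[pheS*-cat]"  # placeholder token for the PheS mutant / cat cassette

def phase1_insert_cassette(chromosome: str, target: str) -> str:
    """Phase 1: replace the target region with the selection cassette."""
    if chromosome.count(target) != 1:
        raise ValueError("target must occur exactly once")
    return chromosome.replace(target, CASSETTE)

def phase2_insert_final(chromosome: str, final: str) -> str:
    """Phase 2: replace the cassette with the final modification."""
    if CASSETTE not in chromosome:
        raise ValueError("cassette not present")
    return chromosome.replace(CASSETTE, final)

def survives_chloramphenicol(chromosome: str) -> bool:
    # Assumption: only cells carrying cat grow on chloramphenicol.
    return CASSETTE in chromosome

def survives_p_chlorophenylalanine(chromosome: str) -> bool:
    # Assumption: mutant PheS is lethal on p-chlorophenylalanine,
    # so only cells that lost the cassette grow.
    return CASSETTE not in chromosome

wild_type = "AAAA" + "TARGET" + "CCCC"
step1 = phase1_insert_cassette(wild_type, "TARGET")
step2 = phase2_insert_final(step1, "EDIT")
print(step1)
print(step2)
```

Because the second selection removes every cell that still carries the cassette, the final chromosome contains only the intended edit and no marker, which is what makes the result seamless.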
Resequencing
• Illumina sequencing is cheap and very high throughput
• Relatively straightforward with a pre-existing scaffold sequence
• We get millions of reads of limited length (35-70 bp)
Paired ends, 250-500 bp fragments
• Bar coded samples can multiplex the sequencing effort
Allows many samples to be sequenced in a single run
• Resequence the Mesoplasma florum genome
• De novo sequence for sixteen additional strains
Collection of Robert Whitcomb
• De novo sequencing for several closely related species
Mesoplasma entomophilum
Mesoplasma lactucae
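To see why a small genome multiplexes so well, mean depth of coverage can be estimated as total sequenced bases divided by genome length. A minimal sketch; the read count of one million per sample is a hypothetical figure chosen for illustration, and `mean_coverage` is an illustrative helper.

```python
GENOME_BP = 793_281  # genome size quoted in the slides

def mean_coverage(n_reads: int, read_len_bp: int, genome_bp: int = GENOME_BP) -> float:
    """Expected mean depth of coverage: total sequenced bases / genome length."""
    return n_reads * read_len_bp / genome_bp

# Hypothetical sample: one million reads at each end of the quoted 35-70 bp range.
for read_len in (35, 70):
    print(read_len, round(mean_coverage(1_000_000, read_len), 1))
```

With these assumed numbers a single sample already reaches tens-fold coverage, so a run producing many millions of reads could plausibly be split across many bar coded strains.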
Whole cell modeling
• Approximately 2000 chemical reactions
• About 300 small molecule species
• Faster implementations of stochastic models
• Faster computers
• Comparison against reality
Mass spec quantitation of metabolites
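The slide does not say which stochastic method is meant. As one common choice, here is a minimal sketch of the Gillespie direct method on a toy reversible reaction A <-> B; the species, rate constants, and function names are hypothetical, and a whole cell model would apply the same loop to thousands of reactions.

```python
import random

def gillespie(x0, reactions, t_end, rng):
    """Gillespie direct method.

    x0: dict mapping species name -> copy number
    reactions: list of (propensity_function, state_change_dict)
    Returns a list of (time, state) snapshots, one per reaction event.
    """
    t, x = 0.0, dict(x0)
    trajectory = [(t, dict(x))]
    while True:
        props = [rate(x) for rate, _ in reactions]
        total = sum(props)
        if total <= 0.0:
            break  # nothing can fire any more
        t += rng.expovariate(total)  # exponential waiting time to the next event
        if t > t_end:
            break
        # Pick one reaction with probability proportional to its propensity.
        _, change = rng.choices(reactions, weights=props)[0]
        for species, delta in change.items():
            x[species] += delta
        trajectory.append((t, dict(x)))
    return trajectory

k_f, k_r = 1.0, 0.5  # hypothetical rate constants
reactions = [
    (lambda x: k_f * x["A"], {"A": -1, "B": +1}),  # A -> B
    (lambda x: k_r * x["B"], {"A": +1, "B": -1}),  # B -> A
]
traj = gillespie({"A": 100, "B": 0}, reactions, t_end=10.0, rng=random.Random(0))
print(len(traj), traj[-1][1])
```

Each event costs work proportional to the number of reactions in this naive form, which is one reason faster implementations and faster computers both matter at whole cell scale.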
Open Cell Modules
• Energy sources
Arginine vs. glucose
Photosynthesis pathway
Citric acid cycle (reverse?)
• Amino acid synthesis
Add unnatural AAs
• Nucleotide synthesis
• Lipid synthesis
• Cofactor synthesis
• Measurement structures
• Environmental niche
Halobacterium
Sulfur reducer
Temperature optimum
• Membrane export / import
• Membrane structure
• Sensing of chemical environment
• Flagellar motion
• Light sensing
• Light production
• Cell cycle control (sporulation)
• Biosafety modules
Biosafety Barriers
• Codon isolation
CGG containing genes are unusable inside
TGA containing genes are unusable outside
Extend this idea with more codons
• Pairs of required essential nutrients
Reduces likelihood of gradual evolution of workarounds
• Explicit “kill” switches
Otherwise benign chemicals lethal to this organism
Shared function with critical metabolism reduces drift
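The codon isolation rule can be expressed as a simple check on a coding sequence. A minimal sketch, assuming the host reads TGA as tryptophan and has no tRNA for CGG (as described for Mesoplasma florum), while an outside organism reads TGA as stop; the function names and the two example sequences are hypothetical.

```python
def codons(cds: str):
    """Split a coding sequence into in-frame codons."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]

def usable_inside(cds: str) -> bool:
    """True if the gene has no in-frame CGG, which the host cannot translate."""
    return "CGG" not in codons(cds)

def usable_outside(cds: str) -> bool:
    """True if the gene has no internal TGA, which a standard-code host reads as stop."""
    return "TGA" not in codons(cds)[:-1]  # the final codon may be a genuine stop

host_gene = "ATGTGAAAATAA"     # made-up: uses TGA internally (tryptophan in the host)
foreign_gene = "ATGCGGAAATAA"  # made-up: uses CGG for arginine

for name, cds in (("host_gene", host_gene), ("foreign_gene", foreign_gene)):
    print(name, "inside:", usable_inside(cds), "outside:", usable_outside(cds))
```

A gene that fails one of these checks is blocked in that direction, so the barrier is two-way when host genes use internal TGA and incoming genes contain CGG.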
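[Figure: engineered organism vs. natural organism, with transfer blocked (X) in both directions, labeled “No transfer: UGA not translated” and “No transfer: CGG not translated”. Natural phage and engineered phage are shown blocked (X) in the same way.]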
Recoding the genome of entire organisms
Kit Part the genome
• Make Biobrick parts from each gene, tRNA, promoter, other part-like genome element
• Develop techniques for recombining parts into coherent modules
YAC editing and assembly, e.g.
Lambda RED or RecET recombination
• Enable the bootstrapping of cells based on the redesigned genome
Liposome fusion, e.g.
• Learn the design rules for chromosomes
Thanks to…
• Harold Morowitz
• Greg Fournier
• Gail Gasparich
• Bob Whitcomb
• Eric Lander
• Bruce Birren
• Nicole Stange-Thomann
• George Church
• Roger Brent
• Grant Jensen
• Yingwu Wang
• Samantha Burke
• PJ Steiner
• Nick Papadakis
• Ron Weiss
• Drew Endy
• Randy Rettberg
• Austin Che
• Reshma Shetty
• MIT Synthetic biology working group
• DARPA, NTT, NSF, Microsoft
• Colleagues at Ginkgo Bioworks
Thank you for your attention
Our Plan
• Completely understand a simple organism
• Build excellent models and predictive tools
• Simplify the organism further
Remove inessential genes
Replace dual function genes with single function equivalents
• Abstract useful modules from other living systems
• Understand and create good models for these modules
• Selectively add these modules to the existing simple cell
The code’s 4 billion years old; it’s time for a rewrite
The Mollicute Bibliome
Complete collection of mycoplasma related papers:
• 6,411 and counting
• Books and book chapters also
• Endnote file: mycoplasmas.enl
• Downloaded .pdfs for articles > 1995
• Scanned articles and books, OCR
• Collaboration for “shallow semantic” understanding
• people.csail.mit.edu/tk/mfpapers/ user=meso, pass=meso