From Genome Sequences to Regulatory Network Phenotypes
Bioinformatic functional genomics:
• Study the systematic operation of genes and their products in whole-genome, whole-cell contexts.
• Discover the effect of every gene on growth, expression, and interaction.
• Test quantitative network models.
Growth, Expression, & Interaction
Harvard Center for Computational Genetics: John Aach, Tim Chen, George Church, Jason Hughes, Jason Johnson, Abby McGuire, Jong Park, Fritz Roth
Affymetrix: David Lockhart, Eric Gentalen
NCBI: Andrew Neuwald
HMS Genetics: Andy Link, Doug Selinger, Pete Estep, Michael Ching, Martha Bulyk, Sonali Bose, Martin Steffen, Saeed Tavazoie, Annie Chan, Dereth Phillips, Chris Harbison
UCSD: Bernhard Palsson
DOE, DARPA, Lipper, NIST, HMR
Sequenced genomes
Organism | # Genes | % Unknown function
S. cerevisiae | 6034 | 49%
E. coli | 4288 | 38%
B. subtilis | 4000 | 42%
Synechocystis sp. | 3168 | 56%
A. fulgidus | 2471 | 52%
H. influenzae | 1740 | 42%
M. thermoautotrophicum | 1855 | 56%
H. pylori | 1590 | 43%
M. jannaschii | 1692 | 54%
B. burgdorferi | 863 | 42%
M. pneumoniae | 677 | 51%
M. genitalium | 470 | 31%
Total | 28848 | 47%
FUNs (genes of unknown function): Science 277: 1433 (1997)
Choice of Cells
Small genome size: Mycoplasma, Haemophilus, Methanococcus
Energy relevance: Methanobacterium, Synechocystis
Major pathogens: Mycobacterium, Escherichia, Helicobacter
Biotech production: Escherichia, Saccharomyces, Homo
Recombinant protein production, in vivo combinatorial chemistry, BACs, gene delivery, etc.
15 going on 40 complete genomes. 30,000 going on 150,000 complete genes (& intergenic regions).
Smith, et al. (1997) J. Bacteriol. 179:7135-55. Methanobacterium
Blattner, et al. (1997) Science 277:1453-74. Escherichia
Goffeau, et al. (1996) Science 274:563-7. Saccharomyces
Metabolic & regulatory databases
4288 / 4909 E. coli orfs / genes
587 - 804 enzymes
720 - 988 metabolic reactions
436 / 1303 metabolites / compounds
Varma & Palsson (1994) Appl. Env. Micro. 60:3724.
Karp et al. (1998) NAR 26:50. EcoCyc
Selkov, et al. (1997) NAR 25:37. WIT
Robison and Church http://arep.med.harvard.edu
Conceptual Data Model
Project: TBEID1
Model: TBEID
Author: John Aach. Version: 1.04, 7/7/97
[Figure: entity-relationship diagram. Relationship labels: has, exhibits, used in, described by, input to. Some entities carry submodel tags: (P), (S), (C), (S,N).]
Submodel cross-references: * = main model, C = Condition Set Entities, D = DNA and Protein Elements, N = Names, P = Protein Preparation Entities, S = Strain and Strain Mix Entities

Entities and their attributes:
• Experiment Info: Experiment Number, Experiment Type, Experimenter Name, Description, Comment, Start Time, End Time, Outcome Comment, Success Code, Sample Size, OpenInd
• Experiment Measures Set: Expt Measures Set No, Time of Measurement, Expt Measures Set Type, Description, Comment, Raw Data Sets Descrip, Data Transform Descrip, Outcome Comment, Success Code, Date Recorded, Sample Size, OpenInd
• Results Selection: Results Selection Code, Expt Measures Set Type, Results Selection Description
• Condition Set: Condition Set Number, Description, Comment
• Strain: Strain Number, ProgenitorInd, Description, Comment
• Strain Mix: Strain Mix Number, Strain Mix Name, Description, Preparation Comments
• Strain Phenotype Expt: Starting Cell Count, Starting Cell Density
• Competition Phenotype Expt: Starting Cell Count, Starting Cell Density
• Growth: Rel Growth Mutant, Std dev Rel Growth Mutant, Winner Mutant Ind, Rel Growth All, Std dev Rel Growth All, Winner All Ind
• mRNA Expression: mRNA Expression Level, Std dev Express Level
• Protein Expression: Cell Fraction, Protein State Exp Level, Std Dev Prot State Level
• Protein Preparation Set: Prot Prep Set Number, Description, Comment
• DNA Protein Binding Expt
• DNA Seq Binding: DNA Seq Bind Const Num, DNA Sequence, Binding Constant, Std Dev Binding Constant
• Non Specific DNA Binding: Non Specific Binding Const, Std Dev Non Spec Bind Const
• Footprint: Fraction Occupancy, St Dev Frac Occupancy
• Protein Protein Binding Expt
• Protein Protein Binding: Binding Level, Std Dev Binding Level
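As a rough illustration of how a few of these entities might map onto record types, here is a minimal Python sketch. The class names and attribute names follow the data model above; the field types, the defaults, and the use of experiment and strain numbers as linking keys are assumptions made for the example, not part of the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Strain:
    strain_number: int
    progenitor_ind: bool          # assumed boolean indicator
    description: str = ""
    comment: str = ""


@dataclass
class ExperimentInfo:
    experiment_number: int
    experiment_type: str
    experimenter_name: str
    description: str = ""
    success_code: Optional[str] = None   # assumed free-text code
    sample_size: Optional[int] = None


@dataclass
class Growth:
    experiment_number: int        # assumed link to ExperimentInfo
    strain_number: int            # assumed link to Strain
    rel_growth_mutant: float
    std_dev_rel_growth_mutant: float
    winner_mutant_ind: bool


# Hypothetical records, for illustration only.
strain = Strain(strain_number=1, progenitor_ind=False, description="in-frame deletion")
expt = ExperimentInfo(1, "competition", "experimenter")
growth = Growth(expt.experiment_number, strain.strain_number, 0.95, 0.02, False)
print(growth)
```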
BIGED: Biomolecule Interaction, Growth, & Expression Database
John Aach, Harvard Center for Computational Genetics
Functional Genomics: Growth, Expression, & Interaction
Why?
• Sampled sequence vs. completed genomes
• Random vs. engineered mutations & environments
• Evolutionary models vs. high-throughput assays
Pure comparative genomics challenge:
• 15% amino acid identity: globins retain heme & oxygen binding functions.
• 100% amino acid identity: enolase functions vary from enzymatic to major vertebrate lens structural component.
Escherichia coli & Saccharomyces cerevisiae Regulatory and Metabolic Networks
[Figure: network diagram connecting Environments, Metabolites, Growth rate, Expression (DNA, RNA, Protein), and Interactions, with rate constants kD, kR, kP, kI, kc for the steps Initiate, Elongate, Terminate, Fold, Modify, Localize, Degrade.]
Translating successful strategies: Metrics (physics envy & killer applications)
• X-ray diffraction (automated 1960): data quality = resolution < 0.2 nm; model quality = |o-c|/o, R < 0.2; similarity search = DALI
• Sequence (automated 1988): data quality = discrepancy < 0.01% (bp); model quality = conserved proteins; similarity search = BLAST
• Function (automated 1999): data quality = completion; model quality = DNAgibbs; similarity search = CorFun (growth, expression, & interaction; CorEnvironment)
Ratio of strains, R, over environments e, times $t_e$, and selection coefficients $s_e$: $R = R_0 \exp[-s_e t_e]$
80% of 34 random yeast insertions have s < -0.3% or s > 0.3% at t = 160 generations, e = 1 (rich media); ~50% for t = 15, e = 7.
Should allow comparisons with population allele models.
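The exponential relation between strain ratio and selection coefficient can be inverted to estimate s from a measured ratio. The sketch below assumes t is counted in generations and s is per generation; the function names and the numbers in the usage lines are hypothetical.

```python
import math


def expected_ratio(r0, s, t):
    """Strain ratio after t generations: R = R0 * exp(-s * t)."""
    return r0 * math.exp(-s * t)


def selection_coefficient(r0, r, t):
    """Invert R = R0 * exp(-s * t) to recover s from the two ratios."""
    if r0 <= 0 or r <= 0 or t <= 0:
        raise ValueError("ratios and elapsed time must be positive")
    return -math.log(r / r0) / t


# Hypothetical strain with s = 0.3% per generation followed for 160 generations.
r_final = expected_ratio(1.0, 0.003, 160)
s_back = selection_coefficient(1.0, r_final, 160)
print(r_final, s_back)
```

Under this relation a selection coefficient of 0.3% only becomes a large change in ratio after many generations, which is why the longer competition (t = 160) resolves more insertions than the shorter one (t = 15).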
Multiplex: Tag (Mix) > Process > Decode
Internal standards, identical conditions, microscale
Church GM, Kieffer-Higgins S. (1988) Multiplex DNA sequencing. Science 240:185.
Evans GA, Lewis KA. (1989) Physical mapping of complex genomes by cosmid multiplex analysis. PNAS 86:5030.
Fodor SP, et al. (1993) Multiplexed biochemical assays with biological chips. Nature 364:555.
Lashkari DA, et al. (1995) An automated multiplex oligonucleotide synthesizer. PNAS 92(17):7912.
Other multiplex competitive growth experiments:
Thatcher, et al. (1998) PNAS 95:253.
Link AJ (1994) thesis; (1997) J Bacteriol 179:6228.
Smith V, et al. (1995) PNAS 92:6479.
Shoemaker D, et al. (1996) Nat Genet 14:450.
Multiplex Competitive Growth Experiments
[Figure: in-frame mutants plus wild-type are pooled at t=0, selected under different conditions (40°, pH 5, NaCl, complex), and read out by multiplex PCR size-tag or chip.]

107 Environments (so far)
Media and additives: minimal media, yeast extract, synthetic rich, low N, low P, NaCl, urine, pancreatin, bile cholate, Triton X-100, 2 acetate, 4 butyrate, 6 hexanoate, homoserine lactone
Combinatorial pools:
• a, H, F, Q, t
• g, L, Y, N, S
• C, I, W, u, E
• M, K, T, D, dap
• V, P, R, G, thiamine
• a, g, C, M, thiamine
• H, L, I, K, V
• F, Y, W, T, P
• Q, N, u, D, R
• t, S, E, dap, G
• pyridoxin, nicotinate, biotin, pantothenate, A
pH: 5, 6, 7, 8, 9
Temperature (°C): 25, 30, 37, 45
Genome Engineering
Challenges: Construct any mutant in any background, including multiple mutants, while minimizing hitchhiking mutants.
Avoid the undesired residual activities and neomorphic effects on adjacent genes found in most deletion, insertion, nonsense, or antisense alleles.
Use full in-frame replacements; computationally track gene overlaps and primer & genomic repeats.
Link, et al. (1997) J. Bacteriol. 179:6228-6237 (pKO3). http://arep.med.harvard.edu
Crossover PCR in-frame deletions / tag substitutions
[Figure: the gene of interest (ATG to TAA) beside a nearby gene; a primer with a NotI site, a primer with a Bam site, and tag / c-tag primers join the ATG and TAA flanks around the tag.]
pKO3: in-frame tagged deletions
[Figure: plasmid pKO3 (repAts, camR, sacB, M13 ori) carrying the tagged deletion; selection at 43° with Cam, then resolving the cointegrant at 30° on sucrose gives 1 = wild type or 2 = mutant (tag).]
Primer design for size-tagged PCR
[Figure: a universal tag primer paired with size-tagged primers of different length; 3% agarose gel with sizes 789, 518, 348, 266, 194, 141, 106 and deleted ORFs yiaU, yhcS, ydhB, yfiE, pssR, ygfX, ygoX.]
Competitive Growth Rate Tag Readout
[Figure: gel readout of tags for ygfX, yiaU, ydhB, yfiE, ygoX, pssR, yhcS, with lanes labeled 1 and 2 for rich, P- minimal, and N- minimal media.]
Effects of pH in rich media
[Figure: bar chart of % change from inoculate (y-axis, -200 to 700) for pssR, farR, nhaR, ydhB, yhcS, yidP, yhiF, yidL, uw6519; series r' pH5, r' pH6, r' pH7, r' pH8, r' pH9.]
Genome Engineering
Current status
5 Highly Expressed Genes (Link)
46 Putative regulatory FUNs (Phillips)
24 Highly conserved FUNs (Loferer)
20 Flux Balance Predictions (in prep.)
Flux balance model with max growth objective:
$S \cdot v = b$
S = stoichiometric matrix (m x n)
v = vector of n fluxes
b = I/O rate vector
n = 720 metabolic fluxes
m = 436 metabolites
Predict major flux changes: zwf-, zwf- pnt-
& synthetic lethals: zwf- pgi-
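Written out as a linear program, the flux balance problem might look like the sketch below. The slide gives only the constraint S · v = b and the maximum-growth objective; the weight vector c (which picks out the growth flux) and the flux bounds $v^{\min}$, $v^{\max}$ are assumptions added here to make the problem well-posed.

```latex
\begin{aligned}
\max_{v \in \mathbb{R}^{n}} \quad & c^{\mathsf{T}} v \\
\text{subject to} \quad & S\,v = b, \\
                        & v^{\min} \le v \le v^{\max},
\end{aligned}
\qquad S \in \mathbb{R}^{m \times n},\quad m = 436,\quad n = 720 .
```

In this formulation a deletion such as zwf- would presumably be modeled by fixing the flux through the deleted enzyme's reaction to zero and re-solving. A synthetic lethal pair would then be one where the optimal growth flux stays positive for each single deletion but drops to zero when both are applied.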
[Figure: flux map of central metabolism (glycolysis, pentose phosphate pathway, TCA cycle), annotated with three predicted flux values per reaction. Metabolite labels: Glucose, G6P, F6P, FDP, DHAP, GA3P, DPG, 3PG, 2PG, PEP, Pyr, AcCoA, Ac, For, 6PGA, 6PG, Ru5P, R5P, X5P, E4P, S7P, Cit, ICit, KG, SuccCoA, Succ, Fum, Mal, OAA. Cofactor labels: QH2, FADH, NADH, NADPH, ATP, H+.]
Non-coding regions: E. coli: 11%; Yeast: 25%; Human: 95%
Similarity searching for environments, growth, expression, & interaction data, and then the challenges of DNA sequence motifs: short motifs & limited alphabet (4).
[Figure: correlation matrix of gene sites (Yggn, pspA, o85, YiaK, carAB, f214, hrsA, f105, ppiA, o184, mtlA 5', mtlA 3', rspA, YidX, kdgT) showing positive and negative correlation, with regions labeled A to F. Annotations: catabolite repression (glucose & Crp regulated); log vs. stationary-phase regulated.]
$\mathrm{CorFun} = Z_g Z_g^{T} / n$
n = number of environments + genotypes
g = gene sites
(switching n & g gives CorEnv)
Data: growth, expression, &/or interaction
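A small Python sketch of the CorFun calculation follows. It assumes that Z holds each gene site's measurements standardized across the n conditions (z-scores using the population standard deviation), in which case Z Z^T / n is the matrix of Pearson correlations between gene sites. The function names and the toy data are hypothetical.

```python
import math


def zscore_rows(matrix):
    """Standardize each row to mean 0 and population standard deviation 1."""
    out = []
    for row in matrix:
        n = len(row)
        mean = sum(row) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in row) / n)
        if sd == 0:
            raise ValueError("row has zero variance and cannot be standardized")
        out.append([(x - mean) / sd for x in row])
    return out


def cor_fun(matrix):
    """Row-by-row correlation matrix, Z Z^T / n, for a genes x conditions table."""
    z = zscore_rows(matrix)
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def cor_env(matrix):
    """Swap the roles of genes and conditions by transposing before correlating."""
    transposed = [list(col) for col in zip(*matrix)]
    return cor_fun(transposed)


# Toy data: 3 gene sites measured in 4 conditions.
data = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
c = cor_fun(data)
e = cor_env(data)
print(c[0][1], c[0][2], len(e))
```

In the toy data the first two rows rise together and the third falls, so the first pair correlates positively and the third row correlates negatively with both.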
Expression data from four cultures allow three comparisons:
• glucose, 30°C, mating type a
• galactose, 30°C, mating type a
• glucose, 30°C, mating type α
• glucose, 30°C -> 39°C shock, mating type a
Expression Quantitation Options
1) n-dimensional cDNA or protein displays
2) Computer-selected oligomer arrays (photolithographic or piezoelectric deposition)
3) Gridded microarrays from clones
4) Counting 13-bp cDNA tags (SAGE) (20,000 tags means <800 RNAs have S/N>4)
Lockhart, et al. (1997) Nature Biotechnology 15:1359.
DeRisi, et al. (1997) Science 278:680.
Velculescu, et al. (1997) Cell 88:243.
Galactose Regulatory Network
[Figure: network diagram. Nodes: GALACTOSE, Gal3p, Gal1p, Gal4p-Gal80p inactive complex, Gal4p-Gal80p active complex, GAL4, GAL80, GAL3, GCY1, and the structural genes for galactose metabolism (GAL1, MEL1, GAL7, PGM2, GAL2, GAL10). One connection is marked "?".]
GAL3: Fold Change in Expression between Growth in Galactose and Growth in Glucose (Median Fold Change is 3.1)
[Figure: bar chart of fold change (y-axis, 0 to 25) by probe number (x-axis, 1 to 19).]
ORF/gene:chip | #probes | medFC | consFC | thrshld / missingMM? | expr ratio | log expr ratio | BINS (log expr ratio) | FREQ
YBR020w/GAL1:A | 21 | 64.81 | 24.57 | 2 | 64.81 | 1.81164202 | -2 | 0
YBR018c/GAL7:A | 21 | 41.91 | 10.58 | 2 | 41.91 | 1.62231766 | -1.95 | 0
YBR019c/GAL10:A | 20 | 37.8 | 13.03 | 2 | 37.8 | 1.5774918 | -1.9 | 0
YDR345c/HXT3:A | 20 | -25.05 | -13.58 | - | 0.03992016 | -1.39880773 | -1.85 | 0
YOR120W/GCY1:D | 20 | 12.31 | 7.81 | 2 | 12.31 | 1.09025805 | -1.8 | 0
YLR081w/GAL2:C | 21 | 8.19 | 3.56 | 2 | 8.19 | 0.9132839 | -1.75 | 0
YGL189C/RPS26A:B | 19 | -7.82 | -0.45 | - | 0.12787724 | -0.89320675 | -1.7 | 0
YPL066W/VPS28:D | 20 | 6.35 | 2.75 | 2 | 6.35 | 0.80277373 | -1.65 | 0
YHR094c/HXT1:B | 20 | -6.26 | -2.38 | 1 | 0.15974441 | -0.79657433 | -1.6 | 0
YOL154W/:D | 21 | -6.04 | -3.27 | - | 0.16556291 | -0.78103694 | -1.55 | 0
YPL067C/:D | 21 | 5.95 | 3.13 | 2 | 5.95 | 0.77451697 | -1.5 | 0
YGL030W/RPL32_ex1:B | 21 | -5.32 | -3.11 | - | 0.18796992 | -0.72591163 | -1.45 | 0
YFL045C/SEC53:B | 21 | -5.17 | -2.73 | - | 0.1934236 | -0.71349054 | -1.4 | 0
YBR106w/:A | 21 | -5.03 | -2.66 | 1 | 0.19880716 | -0.70156799 | -1.35 | 1
YER190w/_f:B | 20 | -4.9 | -2.48 | 1 | 0.20408163 | -0.69019608 | -1.3 | 0
YMR318C/:D | 20 | 4.02 | 2.36 | - | 4.02 | 0.60422605 | -1.25 | 0
YNL015W/PBI2:D | 20 | 3.89 | 2.3 | 2 | 3.89 | 0.5899496 | -1.2 | 0
YBR011c/IPP1:A | 20 | -3.73 | -1.75 | - | 0.26809651 | -0.57170883 | -1.15 | 0
YER178w/PDA1:B | 20 | -3.46 | -2.22 | - | 0.28901734 | -0.5390761 | -1.1 | 0
YOL058W/ARG1:D | 20 | 3.36 | 2.24 | - | 3.36 | 0.52633928 | -1.05 | 0
YCR005c/CIT2:A | 20 | -3.3 | -2.15 | - | 0.3030303 | -0.51851394 | -1 | 0
YHR092c/HXT4:B | 20 | -3.27 | -1.52 | 1 | 0.3058104 | -0.51454775 | -0.95 | 0
25srRnaa:A::25srRnaa:B::25srRnaa:C::25srRnaa:D | 84 | -3.27 | -1.49 | - | 0.3058104 | -0.51454775 | -0.9 | 0
YGL055W/OLE1:B | 20 | 3.21 | 1.98 | - | 3.21 | 0.50650503 | -0.85 | 1
YFR024C/_r:B | 20 | -3.21 | -1.43 | 1 | 0.31152648 | -0.50650503 | -0.8 | 0
YHR033W/:B | 20 | 3.15 | 1.52 | - | 3.15 | 0.49831055 | -0.75 | 2
YDR009W/GAL3:A | 20 | 3.08 | 1.38 | 2 | 3.08 | 0.48855072 | -0.7 | 3
YGR244C/:B | 20 | 2.99 | 1.55 | 2 | 2.99 | 0.47567119 | -0.65 | 1
YKL096W/CWP1:C | 21 | -2.97 | -1.78 | - | 0.33670034 | -0.47275645 | -0.6 | 0
YNL052W/COX5A:D | 20 | 2.94 | 1.96 | - | 2.94 | 0.46834733 | -0.55 | 1
YJR073C/OPI3:C | 20 | -2.92 | -1.52 | - | 0.34246575 | -0.46538285 | -0.5 | 5
YMR256c/COX7:D | 21 | 2.84 | 1.64 | - | 2.84 | 0.45331834 | -0.45 | 3
The thrshld and missingMM? columns are merged because the transcript does not preserve which column each flag value belongs to; "-" marks an empty cell. BINS and FREQ are histogram bins for the log expression ratio and are independent of the gene on each row.
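The expr ratio and log expr ratio columns appear to follow from the signed median fold change: a positive fold change x gives a ratio of x, a negative one gives 1/|x|, and the log is base 10. The Python sketch below reproduces that conversion and a simple 0.05-wide binning. The function names are hypothetical, and the binning rule (floor to the lower bin edge) is an assumption that may differ from the one used for the BINS and FREQ columns.

```python
import math
from collections import Counter


def expression_ratio(fold_change):
    """Signed fold change to ratio: +x maps to x, -x maps to 1/x."""
    if fold_change == 0:
        raise ValueError("a fold change of zero is undefined")
    return fold_change if fold_change > 0 else 1.0 / abs(fold_change)


def log_expression_ratio(fold_change):
    """Base-10 log of the expression ratio."""
    return math.log10(expression_ratio(fold_change))


def histogram(values, width=0.05):
    """Count values per bin of the given width; keys are lower bin edges."""
    counts = Counter(math.floor(v / width) for v in values)
    return {round(k * width, 2): n for k, n in sorted(counts.items())}


# Median fold changes for GAL1 and HXT3, taken from the table.
gal1 = log_expression_ratio(64.81)
hxt3 = log_expression_ratio(-25.05)
print(gal1, hxt3)
print(histogram([0.01, 0.02, 0.07, -0.01]))
```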
Relative expression of all genes: Galactose vs. Glucose
[Figure: histogram of number of genes (y-axis, log scale, 0.1 to 10000) against log of fold change (x-axis, -2.0 to 2.0).]
To analyze the most induced genes, we...
• Extracted the intergenic DNA sequence upstream of each translation start using the Saccharomyces Genome Database.
• Used an algorithm for multiple sequence alignment to look for sequence motifs conserved among the most induced (or repressed) genes.
• Looked at the intersection of genes that both matched a conserved motif and were induced (or repressed).
Gibbs Motif Sampling Strategy
1 Initialize the alignment by choosing a random subset of all possible sites as the ‘site’ alignment, and use all remaining sequences to give a ‘non-site’ alignment.
2 Select a potential site from among all possible sites.
3 If the site is in the alignment, take it out.
4 Calculate the relative likelihood that the potential site belongs with the site alignment rather than the ‘non-site’ alignment, based on a Bayesian multinomial distribution model.
5 Randomly choose whether or not to add the site, weighted by this relative likelihood.
6 Repeat from Step 2.
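The six steps can be sketched in a few lines of Python. This is a simplified illustration, not the DNAGibbs program: it assumes uppercase A/C/G/T sequences longer than the motif width, estimates the 'non-site' model once from all sequence rather than from the remaining sequence, uses pseudocount-smoothed multinomial frequencies as a stand-in for the Bayesian model, allows overlapping sites, and reads one strand only. The function name, `prior_odds`, `pseudo`, and the example sequences are all hypothetical.

```python
import random

BASES = "ACGT"


def gibbs_site_sampler(seqs, width, n_iter=2000, prior_odds=0.01, pseudo=0.5, seed=0):
    """Sample a set of (sequence index, offset) sites that share a motif."""
    rng = random.Random(seed)
    # Every window of `width` bases in every sequence is a possible site.
    sites = [(i, j) for i, s in enumerate(seqs) for j in range(len(s) - width + 1)]
    # Step 1: a random subset of the possible sites starts as the 'site' alignment.
    aligned = set(rng.sample(sites, max(1, len(sites) // 20)))
    # Simplified 'non-site' model: smoothed base frequencies over all sequences.
    total = sum(len(s) for s in seqs)
    background = {b: (sum(s.count(b) for s in seqs) + pseudo) / (total + 4 * pseudo)
                  for b in BASES}
    for _ in range(n_iter):
        site = rng.choice(sites)      # Step 2: pick a potential site.
        aligned.discard(site)         # Step 3: take it out if it is aligned.
        # Step 4: odds that the window belongs with the site alignment.
        i, j = site
        odds = prior_odds
        for k, base in enumerate(seqs[i][j:j + width]):
            matches = sum(1 for a, b in aligned if seqs[a][b + k] == base)
            p_site = (matches + pseudo) / (len(aligned) + 4 * pseudo)
            odds *= p_site / background[base]
        # Step 5: add the site with probability odds / (1 + odds).
        if rng.random() < odds / (1.0 + odds):
            aligned.add(site)
        # Step 6: the loop repeats from Step 2.
    return aligned


# Hypothetical input sequences, for illustration only.
seqs = ["ACGTTGCATGACCGGATTTCCGGA",
        "TTCGGAGCACTGTCCTCCGAACGT",
        "GGATCCGGAGTACTGTCCTCCGTT"]
found = gibbs_site_sampler(seqs, width=6, n_iter=500)
for i, j in sorted(found):
    print(i, j, seqs[i][j:j + 6])
```

Fixing the random seed makes a run repeatable. In practice several runs from different seeds would be compared, since a single run can settle on a weak alignment.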
‘DNAGibbs’: A Modified Gibbs Motif Sampler Optimized for DNA searches.
• Either forward or reverse strand of a potential site -- but not both -- may be added to the alignment.
• Near-optimum sampling method was improved so that it is faster and tends to result in higher scoring alignments.
• Simultaneous multiple motif searching was replaced with a more efficient iterative masking approach.
• The model for base frequencies of non-site sequence was fixed using the average nucleotide frequencies of S. cerevisiae.
• Now runs on DEC Unix and Windows platforms, in addition to the formerly supported SGI and Sun Unix platforms.
Finally, exclude motifs with:
• DNAGibbs scores (maximum log a posteriori likelihood ratio) less than 5.
• Good matches (Z < 3 sd below the mean of the aligned positive motifs) to more than 10% of all yeast genes (ORFs).
*O.G. Berg & P.H. von Hippel, J. Mol. Biol., 193: 723-750 (1987)
Using the top 10 genes induced in galactose, DNAGibbs found UASG, the site recognized by Gal4p
[Figure: sequence logo of the motif found; y-axis: Information (bits).]
Previous UASG consensus: CGYTCGGA-GA-AGT---CCGA
Sequence logos were developed by T.D. Schneider & R.M. Stephens, Nucleic Acids Res., 18: 6097-6100 (1990).
Genes that changed between galactose and glucose by more than 2-fold and have strong matches to the UASG motif
Gene | Fold Change | Best Z-Score | # of Sites
GAL1 | >65 | -1.4 | 5
GAL7 | >42 | -0.7 | 2
GAL10 | >38 | -1.4 | 5
GCY1 | >12 | 0.5 | 1
GAL2 | >8 | 0.4 | 4
YPL066W | >6 | -1.1 | 1
YPL067C | >6 | -1.1 | 1
YMR318C | 4 | 1.1 | 1
GAL3 | >3 | 2 | 2
Galactose Regulatory Network
[Figure: the galactose network diagram again (GALACTOSE, Gal3p, Gal1p, Gal4p-Gal80p inactive and active complexes, GAL4, GAL80, GAL3, GCY1, and the structural genes for galactose metabolism: GAL1, MEL1, GAL7, PGM2, GAL2, GAL10), now with YPL067C, YPL066W, and YMR318C added. Two connections are marked "?".]
DNAGibbs and mating type
Motif | Score | %ORF | Consensus | Similarity
mt-1 (A) | 8.9 | 0.11 | ttcctarttng | P Box
mta-1 (B) | 8.5 | 0.05 | anwncwnkmaananantcwtbwtnw | -
mta-2 (C) | 5.0 | 0.10 | aaaycawmawnanwa | -
mta-3 (D) | 28.1 | 0.31 | grnawktacayg | 2-bind, mt-mta-1
mt-mta-1 (E) | 20.7 | 0.34 | crtgtanntwyc | 2-bind, mta-3
mt-mta-2 (F) | 5.3 | 0.13 | kwtnywnnnknnntgtttsa | PRE, mt-mta-2
mt-mta-3 (G) | 8.6 | 0.27 | tgamaywwtnaama | PRE, mt-mta-1
mt-mta-4 (H) | 5.3 | 0.31 | rmtgmcngcma | Q Box

Expect | DNABP | Consensus
P Box | Mcm1p | tttcctaattaggnan
Q Box | Mat1p | tcaatgacag
2-bind | Mat2p | crtgtaawt
PRE | Ste12p | tgaaaca
Ref: Herskowitz, et al., in Gene Expression, E. W. Jones, et al., Eds. (CSHL Press, NY, 1992), vol. 2: pp. 583-656.
Calibration of 60 E. coli binding site matrices
[Figure: the 60 matrices arranged along a Z-score axis from 0 to 10. Matrices, in the order they appear: rpoD15, rpoD17, rpoD16, rpoD18, ompR, hns, lrp, rpoD19, malT, rpoS, crp, dnaA, fis, narL, farR, glpR, trpR, soxS, ihf, oxyR, metR, tyrR, argR, cytR, fur, metJ, phoB, fruR, cspA, torR, nagC, fadR, purR, arcA, pdhR, lexA, gcvA, fnr, galR, ntrC, rhaS, iclR, fhlA, cynR, ada, deoR, carP, lacI, marR, rpoH14, ilvY, rpoH13, araC, tus, hipB, flhCD, rpoE, melR, cysB, rpoN.]
Interaction Quantitation Options
Over-expression:
• Yeast two-hybrid screens (in vivo complexity)
• In vitro chip assays (Martha Bulyk, David Lockhart, Erik Gentalen)
Natural levels, environmental regulation:
• Subcellular fractionation (unstable)
• In vivo footprinting (partners unknown)
• In vivo crosslinking
Combinatorial ds-DNA Chips (chemical, photo & enzymatic synthesis)
[Figure: oligonucleotides built on SiO2 by masked synthesis, each with a specific 16-mer, spacer, n-mer, and primer site; polymerase extends the primer to make the 2nd strand. A second panel plots the sequence at the half-sites (GTA, GTAA, GTAC, GTAG, GTAT, GTC, GTCA, GTCC, GTCG, GTCT, GTG, GTGA, GTGC, GTGG, GTGT) against the length of the spacer between half-sites (0 to 14).]
RsaI Digestion of a Fixed Density Double-Stranded DNA Chip with a Variable Spacer Length of 0 to 14 bp Between the Half-Sites
[Figure: zoomed-in chip images before and after RsaI digestion; RsaI site GTAC / CA*TG.]
Conclusion: Loss of signal intensity corresponds to cleavage of dsDNA by RsaI.
Significance:
1) Double-stranded DNA is created by primer extension of ssDNA chips.
2) Double-stranded DNA on the surface of the chip is accessible for interaction with a DNA-binding protein.
Interaction Quantitation Options
Over-expression:
• Yeast two-hybrid screens (in vivo complexity)
• In vitro chip assays
Natural levels, environmental regulation:
• Subcellular fractionation (unstable)
• In vivo footprinting (partners unknown)
• In vivo crosslinking (Martin Steffen, Andy Link)
Isolate in vivo crosslinked complexes
• by nucleic acid: CsCl (or hybridization)
• by protein: epitope tag
Analyze protein by DNase, 2D gel, trypsin-LC-ESI-MS/MS
Analyze DNA/RNA by chip
Link et al. (1997) Electrophoresis 18:1259 & 1314

Rich media log-phase, in vivo crosslink, DNaseI digest
[Figure: 2D gel with axes pH (4 to 7) and kDa (10 to 100). Labeled spots: lacI, fur, grpE, dps, hns, efp, purE, sspA, ihfB, ssb.]
In vivo crosslinking & footprinting summary
11% of the E. coli genome is non-coding.
About 340 / 4328 proteins are likely DNA-binding proteins (2 of the top 380 proteins).
24/25 footprinted GATC sites are non-coding. Odds = $10^{-27}$.
2/3 crosslinked DNA molecules are likely regulatory binding sites. Odds = 0.04.
8/11 top DNA-crosslinked proteins are known DNA-binding proteins. Odds = $10^{-16}$.
Thoughts on chips for crosslinked epitope selections (& generally).
An easy 10-fold enrichment, but with 40,000 fragments, means an expensive 1:4000 signal:noise if sequencing (or SAGE) were used.
However, spread over a chip, 1:10.
E. coli oligonucleotide chip challenges:
#1) Closely spaced transcripts, e.g. carAB (intergenic 25-mers overlap, starting 6 bp apart on average):
P1 (pyrimidine) ... 48 bp ... P2 (arginine)
gggtaagcaaatttgcattgcttcatactgactgaatgaattaatatgcaaataaagtg
#2) Repeats, e.g. tufA & tufB DNA.
[Alignment: tufA vs. tufB, with mismatches marked * and matches marked .; the mismatches are sparse and separated by long runs of identical bases.]
Summary
Expression: Cell-type & condition clustering plus the DNAGibbs algorithm extracts intergenic binding motifs for yeast Gal-Glc, Matα-Mata, & 30°C-39°C comparisons.
Interaction: Strong enrichment for low-abundance wild-type & mutant in vivo E. coli DNA-protein contacts establishes mechanistically anchored intergenic elements.
Growth: Multiplex competitive growth of in-frame replacements for novel E. coli regulatory genes defines cellular system integration & environments.
Population Selection, Flux Balance, & Gibbs
[Figure: the Escherichia coli & Saccharomyces cerevisiae regulatory and metabolic networks diagram, repeated from earlier in the talk.]