Thanks to the Lipper Center for Computational Genetics

Array quantitation for modeling mutations affecting RNA, protein interactions & cell proliferation. CHI Macroresults through Microarrays 3. George Church, 1-May-02. Thanks to the Lipper Center for Computational Genetics. Government and private grant agencies: NHLBI, NSF, ONR, DOE, DARPA, HHMI, Armenise. Corporate collaborators & sponsors: Affymetrix, GTC, Mosaic, Aventis.


Page 1: Thanks to the Lipper Center for Computational Genetics

Thanks to the Lipper Center for Computational Genetics

Government and private grant agencies: NHLBI, NSF, ONR, DOE, DARPA, HHMI, Armenise

Corporate collaborators & sponsors:

Affymetrix, GTC, Mosaic, Aventis, Dupont, Cistran

CHI Macroresults through Microarrays 3

George Church 1-May-02

Array quantitation for modeling mutations affecting RNA, protein interactions & cell proliferation.

Page 2: Thanks to the Lipper Center for Computational Genetics

gggatttagctcagttgggagagcgccagactgaa gatttg gaggtcctgtgttcgatccacagaattcgcacca

Post-300 genomes & 3D structures

Page 3: Thanks to the Lipper Center for Computational Genetics

DNA RNA Protein: in vivo & in vitro interactions

Metabolites

Replication rate

Environment

Biosystems Measures & Models

Microbes; cancer & stem cells; Darwinian in vitro replication; small multicellular organisms

RNAi, insertions, SNPs

Page 4: Thanks to the Lipper Center for Computational Genetics

Functional Genomics Challenges

• Systems dynamics and optimality modeling.
• Multiple genetic domains per gene: high density readout of whole genome mutant phenotypes.
• Multiple RNAs & regulatory proteins per gene.
• Many causative genes & haplotypes per disease.

• Polony RNA exon-typing
• Multiplex in situ RNA & protein analyses
• Automated differentiation
• Homologous recombination genome engineering

Page 5: Thanks to the Lipper Center for Computational Genetics

Human Red Blood Cell ODE model: 200 measured parameters

[Figure: metabolic network diagram. Labels include glycolysis from GLCe/GLCi through G6P, F6P, FDP, GA3P, DHAP, 1,3 DPG, 2,3 DPG, 3PG, 2PG, PEP and PYR to LACi/LACe; the pentose phosphate intermediates GL6P, GO6P, RU5P, R5P, X5P, S7P and E4P; cofactors NADP/NADPH, NAD/NADH, ADP/ATP and GSH/GSSG; nucleotide metabolism (ADO, INO, AMP, IMP, ADE, HYPX, PRPP, R1P); and K+, Na+, Cl-, HCO3- and pH.]

Jamshidi, Edwards, Fahland, Church, Palsson, B.O. (2001) Bioinformatics 17: 286. (http://atlas.med.harvard.edu/gmc/rbc.html)
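The slide names an ODE model without showing its equations. As a minimal sketch of what a kinetic ODE model looks like, the example below integrates a hypothetical two-step pathway S -> I -> P with mass-action rates and forward Euler. The species, rate constants and step size are illustrative assumptions; this is not the published red blood cell model.

```python
def simulate_pathway(k1, k2, s0, dt, steps):
    """Forward-Euler integration of a hypothetical linear pathway S -> I -> P.

    k1, k2: assumed mass-action rate constants (illustrative only)
    s0:     initial amount of substrate S
    Returns a list of (time, S, I, P) tuples.
    """
    s, i, p = s0, 0.0, 0.0
    trajectory = [(0.0, s, i, p)]
    for n in range(1, steps + 1):
        v1 = k1 * s          # flux S -> I
        v2 = k2 * i          # flux I -> P
        s += -v1 * dt        # dS/dt = -v1
        i += (v1 - v2) * dt  # dI/dt = v1 - v2
        p += v2 * dt         # dP/dt = v2
        trajectory.append((n * dt, s, i, p))
    return trajectory


trajectory = simulate_pathway(k1=1.0, k2=0.5, s0=5.0, dt=0.01, steps=1000)
t_end, s_end, i_end, p_end = trajectory[-1]
print(f"t={t_end:.1f}  S={s_end:.4f}  I={i_end:.4f}  P={p_end:.4f}")
```

A model with hundreds of measured parameters would normally use a stiff ODE solver rather than fixed-step Euler; the fixed step here only keeps the sketch dependency-free.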

Page 6: Thanks to the Lipper Center for Computational Genetics

Modeling suboptimality:

Segre, Edwards, Vitkup

Page 7: Thanks to the Lipper Center for Computational Genetics

Calculated & Observed Fluxes in wt

Sauer data and FBA fluxes comparison. Wild type, C 0.4-limited, CC=0.97

[Figure: scatter plot of calculated flux against observed fluxes in wt, both axes 0 to 200, points numbered 1 to 18; legend: Sauer wild type, LP wt.]
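The slide reports agreement between calculated and observed fluxes as a correlation coefficient (CC=0.97). Assuming CC is the usual Pearson coefficient, which the slide does not state, it can be sketched as below. The flux values in the example are made-up placeholders, not the Sauer or FBA data.

```python
import math


def pearson_cc(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical fluxes for illustration only.
observed_flux = [100.0, 70.0, 15.0, 60.0, 30.0]
calculated_flux = [100.0, 80.0, 10.0, 55.0, 35.0]
print(round(pearson_cc(observed_flux, calculated_flux), 3))
```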

Page 8: Thanks to the Lipper Center for Computational Genetics

Replication rate of a whole-genome set of mutants

Badarinarayana, et al. (2001) Nature Biotech. 19: 1060

Page 9: Thanks to the Lipper Center for Computational Genetics

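Replication rate challenge met: multiple homologous domains

[Figure: probes along the homologous genes thrA (domains 1 2 3), metL (domains 1 2 3) and lysC (domains 1 2), annotated with selective disadvantage in minimal media; values shown: 1.1, 6.7; 1.8, 1.8; 10.4.]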

Page 10: Thanks to the Lipper Center for Computational Genetics

Multiple mutations per gene

Correlation between two selection experiments

Badarinarayana, et al. (2001) Nature Biotech. 19: 1060

Page 11: Thanks to the Lipper Center for Computational Genetics

Comparison of selection data with Flux Balance Optimization predictions on 488 genes

predictions            number of genes   negatively selected   not negatively selected
essential                    143                  80                      63
reduced growth rate           46                  24                      22
non essential                299                 119                     180

P-value Chi Square = 0.004

Novel duplicates?

Position effects, toxin accumulation, non-opt?
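The reported P-value can be checked from the counts on the slide. The sketch below assumes the test is a standard chi-square test of independence on the 3 x 2 table (prediction class by selection outcome); the slide does not say which variant was used. With two degrees of freedom the chi-square tail probability has the closed form exp(-chi2 / 2), so no statistics library is needed.

```python
import math

# Rows: essential, reduced growth rate, non essential.
# Columns: negatively selected, not negatively selected.
TABLE = [
    [80, 63],
    [24, 22],
    [119, 180],
]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


def p_value_dof2(chi2):
    """Upper-tail probability of a chi-square variable with exactly 2 degrees of freedom."""
    return math.exp(-chi2 / 2.0)


chi2, dof = chi_square_independence(TABLE)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value_dof2(chi2):.4f}")
```

Under these assumptions the statistic comes out near 11.0 with 2 degrees of freedom, giving p of about 0.004, in line with the slide.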

Page 12: Thanks to the Lipper Center for Computational Genetics

DNA RNA Protein: in vivo & in vitro interactions

Metabolites

Replication rate

Environment

Biosystems Measures & Models

Microbes; cancer & stem cells; in vitro replication; small multicellular organisms

RNAi, insertions, SNPs

Page 13: Thanks to the Lipper Center for Computational Genetics

RNA quantitation issues

Small fold changes in RNA are important. Example: 1.5-fold in trisomies.

Cross-hybridizing RNAs. Alternative RNAs, gene families.

Mixed tissues. In situ hybridization has low multiplex.

Page 14: Thanks to the Lipper Center for Computational Genetics

Gene Expression database Aach, Rindone, Church, (2000) Genome Research 10: 431-445.

• Microarrays (1): R/G ratios; R, G values; quality indicators (experiment vs. control, per ORF)
• Affymetrix (2): averaged PM-MM; “presence”; feature statistics; 25-mers (PM and MM probes per ORF)
• Lynx-MPSS (3), SAGE (4): counts of 14-mer sequence tags for each ORF (tag shown on slide: agactagcag)

1 DeRisi, et al., Science 278:680-686 (1997)
2 Lockhart, et al., Nat Biotech 14:1675-1680 (1996)
3 Brenner et al., Massively Parallel Signature Sequencing, Nat Biotechnol. 18:630-4 (2000)
4 Velculescu, et al., Serial Analysis of Gene Expression, Science 270:484-487 (1995)
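Two of the quantities listed on this slide are simple to compute: the red/green (R/G) ratio of a two-color microarray spot and the averaged PM-MM difference across an Affymetrix probe set. The sketch below shows the arithmetic only. The function names and intensity values are hypothetical, and real pipelines add background correction and normalization that are not shown here.

```python
import math


def log2_ratio(red, green):
    """log2 of the experiment/control (R/G) intensity ratio for one spot."""
    return math.log2(red / green)


def average_difference(pm, mm):
    """Mean of (perfect match - mismatch) intensities across one probe set."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and the same length")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)


# A 1.5-fold change, as in the trisomy example, is a log2 ratio of about 0.58.
print(round(log2_ratio(3.0, 2.0), 3))

# Hypothetical probe-set intensities.
print(average_difference([120.0, 150.0, 90.0], [40.0, 60.0, 30.0]))
```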

Page 15: Thanks to the Lipper Center for Computational Genetics

RNA Cluster Analyses: Cell Cycle

[Figure: cluster panel "Replication & DNA synthesis (2)" with expression plotted as s.d. from mean; motif panels for MCB and SCB showing number of ORFs per cluster (clusters 1 to 30) and number of sites by distance from ATG (b.p.).]

MIPS functional category (total ORFs)    ORFs within functional category (k)    -Log10 P-value
DNA synthesis and replication (82)                      23                            16
Cell cycle control and mitosis (312)                    30                             8
Recombination and DNA repair (84)                       11                             5
Nuclear organization (720)                              40                             4

N = 186

Tavazoie, et al. 1999 Nature Genetics 22:281.
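Enrichment P-values of this kind are commonly computed as a hypergeometric tail: the chance of seeing at least k category members in a cluster of N ORFs drawn from the whole genome. The sketch below assumes that is the test behind the table. The genome-wide ORF count is not given on the slide, so `ASSUMED_GENOME_ORFS` is a round placeholder and the printed value should not be expected to reproduce the table exactly.

```python
from math import comb, log10

ASSUMED_GENOME_ORFS = 6000  # placeholder; not taken from the slide


def enrichment_p_value(k, category_total, cluster_size, genome_total):
    """P(X >= k) when cluster_size ORFs are drawn without replacement from
    genome_total ORFs, of which category_total belong to the category."""
    upper = min(category_total, cluster_size)
    numerator = sum(
        comb(category_total, i) * comb(genome_total - category_total, cluster_size - i)
        for i in range(k, upper + 1)
    )
    # Python integer division stays exact until the final conversion to float.
    return numerator / comb(genome_total, cluster_size)


# DNA synthesis and replication: 82 ORFs in the category, 23 of them in a cluster of N = 186.
p = enrichment_p_value(23, 82, 186, ASSUMED_GENOME_ORFS)
print(f"-log10(P) = {-log10(p):.1f}")
```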

Page 16: Thanks to the Lipper Center for Computational Genetics

Combining mouse knockouts with RNA array analysis

(homeobox gene Crx-/-)

Livesey, Furukawa, Steffen, Church, Cepko (2000) Current Biol. 10:301.

Page 17: Thanks to the Lipper Center for Computational Genetics

DNA RNA Protein: in vivo & in vitro interactions

Metabolites

Replication rate

Environment

Biosystems Measures & Models

Microbes; cancer & stem cells; in vitro replication; small multicellular organisms

RNAi, insertions, SNPs

Page 18: Thanks to the Lipper Center for Computational Genetics

Combinatorial arrays for binding constants

Human/Mouse EGR1

ds-DNA array

HMS: Martha Bulyk, Xiaohua Wang, Martin Steffen

MRC: Yen Choo

Page 19: Thanks to the Lipper Center for Computational Genetics

Combinatorial arrays for binding constants

[Figure labels: Combinatorial DNA-binding protein domains; ds-DNA array; Phage; pVIII; pIII; Antibodies]

Page 20: Thanks to the Lipper Center for Computational Genetics

Combinatorial arrays for binding constants

[Figure labels: Phycoerythrin - 2º IgG; Combinatorial DNA-binding protein domains; ds-DNA array; Phage]

Martha Bulyk et al

Page 21: Thanks to the Lipper Center for Computational Genetics

Interactions of Adjacent Basepairs in EGR1 Zinc Finger DNA Recognition

Isalan et al., Biochemistry (1998) 37:12026-12033

Page 22: Thanks to the Lipper Center for Computational Genetics

Wildtype EGR1 Microarray

[Figure labels: high [DNA] (+) ctrl sequence for wt binding; alignment oligos; etc.]

Page 23: Thanks to the Lipper Center for Computational Genetics

Wildtype RSDHLTT

RGPDLAR

REDVLIR

LRHNLET

TGG 2.8 nM

GCG 16 nM

2.5 nM

TAT 5.7 nM

AAA,AAT,ACT,AGA,AGC,AGT,CAT,CCT,CGA,CTT,TTC,TTT

AAT 240 nM

KASNLVS

Motifs weight all 64 Kaapp

Page 24: Thanks to the Lipper Center for Computational Genetics

DNA RNA Protein: in vivo & in vitro interactions

Metabolites

Replication rate

Environment

Biosystems Measures & Models

Microbes; cancer & stem cells; in vitro replication; small multicellular organisms

RNAi, insertions, SNPs

Page 25: Thanks to the Lipper Center for Computational Genetics

Common diseases: billions of “new” alleles plus millions of balanced polymorphisms

• 60 new mutations per generation * 5,000 generations since the major bottleneck(s) that set up the linkage patterns (= 300,000 per genome)

• Each of the 3 Gbp in the genome exists in all SNP forms (A, C, G, T), with 600,000 of each SNP on earth (spread over the common haplotypes). The population frequency will be <0.01%. (Aach et al., 2001 Nature 409: 856)

• Functional genomics (FG) may provide better leads for therapies & diagnostics. (Accuracy goal 1 ppb?)

Page 26: Thanks to the Lipper Center for Computational Genetics

Projected costs affect our view of what is possible.

In 1985, the dawn of the genome project, $10 per bp would have been $30B per genome.

In 2002, Perlegen or Lynx: $3M (10^3 bits/$, 4 logs)

In 2001, the cost of video data collection? 10^13 bits/$

Genotyping & functional genomics demand will probably be as high as permitted by costs.
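The cost figures on this slide follow from simple arithmetic, sketched below using only the numbers quoted above (3 Gbp genome, $10 per bp in 1985, $3M per genome in 2002). The variable names are illustrative. The slide quotes throughput in bits per dollar; the sketch computes base pairs per dollar, which is the same order of magnitude (a base carries at most 2 bits).

```python
import math

GENOME_BP = 3_000_000_000      # 3 Gbp
COST_PER_BP_1985 = 10          # dollars per base pair
GENOME_COST_2002 = 3_000_000   # dollars per genome

genome_cost_1985 = COST_PER_BP_1985 * GENOME_BP                      # dollars per genome in 1985
bp_per_dollar_2002 = GENOME_BP / GENOME_COST_2002                    # throughput per dollar in 2002
logs_improvement = math.log10(genome_cost_1985 / GENOME_COST_2002)   # orders of magnitude gained

print(f"1985 genome cost: ${genome_cost_1985:,}")
print(f"2002 throughput:  {bp_per_dollar_2002:,.0f} bp per dollar")
print(f"Improvement:      {logs_improvement:.0f} logs")
```

This reproduces the $30B figure, roughly 10^3 per dollar in 2002, and the 4-log drop in cost.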

Page 27: Thanks to the Lipper Center for Computational Genetics

Why lower-cost, high quality “sequencing”?

Environmental, food, & biodiversity monitoring
Human genome haplotyping
RNA splicing & editing
Immune B & T cell receptor spectra

& How?

Femtoliter (10^-15) scale & low-cost scanners
Polymerase DNA colonies (polonies)
Fluorescent in situ sequencing (FISSEQ)

Mitra & Church Nucleic Acids Res. 27: e34

Page 28: Thanks to the Lipper Center for Computational Genetics

[Figure: polony amplification schematic with primers A, A’ and B. Panels: Single Molecule From Library; 1st Round of PCR; Primer is Extended by Polymerase.]

Primer A has 5’ immobilizing (Acrydite) modification.

Page 29: Thanks to the Lipper Center for Computational Genetics
Page 30: Thanks to the Lipper Center for Computational Genetics

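Sequence polonies by sequential, fluorescent single-base extensions

1. Remove 1 strand of DNA.
2. Hybridize Universal Primer.
3. Add Red (Cy3) dTTP.
4. Wash; Scan Red Channel

[Figure: two polonies with primer B hybridized to B’ (3’ to 5’). The template reading AGT.. incorporates T; the template reading GCG.. shows no incorporation.]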

Page 31: Thanks to the Lipper Center for Computational Genetics

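Sequence polonies by sequential, fluorescent single-base extensions

5. Add Green (FITC) dCTP
6. Wash; Scan Green Channel

[Figure: two polonies with primer B hybridized to B’ (3’ to 5’). The template reading AGT.. now carries the extension TC; the template reading GCG.. incorporates C.]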
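The extension cycles on these two slides can be sketched as a small simulation. Each cycle adds one labeled dNTP; a polony lights up only if that base complements the next template position. The function name, the polony names and the T, C, A, G addition order are illustrative, and the homopolymer handling (a run of identical template bases extends within one cycle) is an assumption about unblocked nucleotides rather than something the slides state.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def single_base_extension(templates, dntp_order, n_cycles):
    """Simulate sequential single-base extension on several polonies.

    templates:  dict of polony name -> template bases, in the order the
                polymerase reads them
    dntp_order: bases added one per cycle, repeated cyclically
    Returns (reads, signals): the extended strand per polony, and for each
    cycle the base added plus the number of incorporations per polony.
    """
    position = {name: 0 for name in templates}
    reads = {name: "" for name in templates}
    signals = []
    for cycle in range(n_cycles):
        base = dntp_order[cycle % len(dntp_order)]
        incorporated = {}
        for name, template in templates.items():
            count = 0
            # Assumed: extension continues through a homopolymer run.
            while (position[name] < len(template)
                   and COMPLEMENT[template[position[name]]] == base):
                position[name] += 1
                count += 1
            reads[name] += base * count
            incorporated[name] = count
        signals.append((base, incorporated))
    return reads, signals


# The two polonies drawn on the slides: templates beginning AGT and GCG.
reads, signals = single_base_extension(
    {"polony_1": "AGT", "polony_2": "GCG"}, dntp_order="TCAG", n_cycles=4
)
for base, incorporated in signals:
    print(base, incorporated)
print(reads)
```

In the first cycle (dTTP) only the AGT polony gives a signal, and in the second (dCTP) both do, matching the two slides.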

Page 32: Thanks to the Lipper Center for Computational Genetics

Polony Template

3’ P’

P5’ A ATA CAA TTCACACAGGAAACAGCTATGA CATT CTATTGTTAAAGTGTGTCCTTTGTCGATACTGGTA…5’

Mean Intensity (FITC (C), CY3 (T)): 58, 0.5; 40, 6.5; 0.3, 48; 0.4, 43

Primer Extension 26 cycles, 34 Nucleotides

Page 33: Thanks to the Lipper Center for Computational Genetics

Why lower-cost, high quality “sequencing”?

Environmental, food, & biodiversity monitoring
• Human genome haplotyping
RNA splicing & editing
Immune B & T cell receptor spectra

& How?

Femtoliter (10^-15) scale & low-cost scanners
Polymerase DNA colonies (polonies)
Fluorescent in situ sequencing (FISSEQ)

Mitra & Church Nucleic Acids Res. 27: e34

Page 34: Thanks to the Lipper Center for Computational Genetics
Page 35: Thanks to the Lipper Center for Computational Genetics

Why lower-cost, high quality “sequencing”?

Environmental, food, & biodiversity monitoring
Human genome haplotyping
• RNA splicing & editing
Immune B & T cell receptor spectra

& How?

Femtoliter (10^-15) scale & low-cost scanners
Polymerase DNA colonies (polonies)
Fluorescent in situ sequencing (FISSEQ)

Mitra & Church Nucleic Acids Res. 27: e34

Page 36: Thanks to the Lipper Center for Computational Genetics

RNA Exon typing

•Single molecules of RNA dispersed.

•Multiplex polonies spanning all likely variable exons

•Sequential probing of each exon.

Page 37: Thanks to the Lipper Center for Computational Genetics

Functional Genomics Challenges

• Systems dynamics and optimality modeling.
• Multiple genetic domains per gene: high density readout of whole genome mutant phenotypes.
• Multiple RNAs & regulatory proteins per gene.
• Many causative genes & haplotypes per disease.

• Polony RNA exon-typing
• Multiplex in situ RNA & protein analyses
• Automated differentiation
• Homologous recombination genome engineering

Page 38: Thanks to the Lipper Center for Computational Genetics

For more information:

arep.med.harvard.edu