Post on 01-Feb-2016
Translating the Cell’s “Instruction Manual”
A Biophysicist’s Approach to Understanding Gene Regulation
Rachel Patton McCord, Bulyk Lab
Harvard University Biophysics Program, 3/20/08
“Knobloch lives?” What are characteristics of “life”?
Response to environment
Take in nutrients and produce waste
Reproduction
….
Biological Signal Processing
[Figure labels: inputs (oxygen, ethanol), nucleus, transcription factor, mRNA, protein, outputs]
Biological Signal Processing
Regulation of Gene Expression
Transcription Factor (TF) recognizes DNA bases (ACGT)
Promotes gene expression: transcription of mRNA (output)
[Figure labels: RNA, sequence-specific TFs, RNA polymerase]
Organisms
Ideal: understand gene regulation in humans
Problems: large genome size, diverse cell types, likely complicated gene regulation “rules”
Begin with a model system: the single-celled organism Saccharomyces cerevisiae (yeast)
[Figure labels: a few hundred kb; a few hundred bp]
Goals: Find DNA sequences bound by TFs
Predict how TFs function in the cell
Look for biophysical links between TF structure and function
Use quantitative approaches to maintain a physically realistic view of biology.
Mukherjee, Berger, et al., Nature Genetics (2004), 36:1331-1339.
Protein Binding Microarray (PBM) Technology
TF-DNA Sequence Recognition
[Figure: TFs bound to dsDNA on a microarray slide, detected with a fluorophore-labeled antibody; laser (488 nm) and detector]
Universal Array Design
Interested in sequences of 8-10 bases
CTATCTACACACAACTATGCGGTCGCCATGGAAATGGTCTGTGTTCCGTTGTCCGTGCTG5’
3’
CTATCTACACA TATCTACACAC ATCTACACACA TCTACACACAA
4^10 ≈ 1,000,000 total 10-mers
24 nt fixed sequence
36 nt variable sequence
27 10-mers per spot
Philippakis, Qureshi et al., RECOMB (2007).
Berger, Philippakis et al., Nature Biotechnology (2006), 24:1429-1435.
4^10 / 27 ≈ 40,000 total spots
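The spot-count arithmetic above can be checked with a few lines of Python. This is only a sketch of the counting argument: the helper name `kmers` is made up for illustration, and the 36-nt string is the variable region of the example probe shown on this slide.

```python
def kmers(seq, k):
    """Return every overlapping k-mer in seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

# Variable region (first 36 nt) of the example probe shown on the slide.
variable = "CTATCTACACACAACTATGCGGTCGCCATGGAAATG"

per_spot = len(kmers(variable, 10))        # 36 - 10 + 1 overlapping 10-mers
total_10mers = 4 ** 10                     # every possible 10-mer over A, C, G, T
min_spots = -(-total_10mers // per_spot)   # ceiling division

print(per_spot, total_10mers, min_spots)
```

The minimum only holds if no 10-mer is repeated between spots, which is what the de Bruijn construction is meant to achieve.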
Universal Array Design
Use an idea from cryptography:
“de Bruijn” sequence: contains all sequence variants of length k in the shortest sequence possible
Anthony Philippakis, Mike Berger
AAA AAC AAG AAT ACA ACC ACG ACT AGA AGC AGG AGT ATA ATC ATG ATT
CAA CAC CAG CAT CCA CCC CCG CCT CGA CGC CGG CGT CTA CTC CTG CTT
GAA GAC GAG GAT GCA GCC GCG GCT GGA GGC GGG GGT GTA GTC GTG GTT
TAA TAC TAG TAT TCA TCC TCG TCT TGA TGC TGG TGT TTA TTC TTG TTT
All possible 3-mers
Length = 4^3 = 64 bp
de Bruijn sequence
[Figure: de Bruijn sequence laid out base by base. Letters as extracted, in two runs: TCGATTGCGTGACAGGGTAAAACAAGACCCTGACCATGGCAGTGT and TCGATTGCGTGACAGGGTAGTCCGGGTTCTTTGCGCTCACTATAC]
Fixed sequence (24 bp)
Test sequence (36 bp)
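To make the de Bruijn idea concrete, here is a minimal sketch that builds a cyclic de Bruijn sequence of order k over A, C, G, T with the standard Lyndon-word recursion. It is a textbook construction chosen for illustration, not necessarily the one used to design the actual arrays, and the function name `de_bruijn` is an assumption.

```python
def de_bruijn(alphabet, k):
    """Cyclic de Bruijn sequence of order k (standard Lyndon-word recursion)."""
    n = len(alphabet)
    a = [0] * (n * k)
    out = []

    def db(t, p):
        if t > k:
            if k % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in out)

seq = de_bruijn("ACGT", 3)

# The sequence is cyclic: wrap the first k - 1 bases around to read every window.
wrapped = seq + seq[:2]
found = {wrapped[i:i + 3] for i in range(len(seq))}

print(len(seq), len(found))
```

Both numbers printed are 64: the sequence is 4^3 bases long and every 3-mer appears exactly once when read cyclically. Cutting such a sequence into overlapping 36-nt windows and appending the 24-nt fixed sequence would give probes of the kind listed on the next slide.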
Deriving Binding Strength at each Sequence
CCGTCAGCAGTCATGGAAAGCTGGTAGAAGTTCTGGGTCTGTGTTCCGTTGTCCGTGCTG
TTATACCATGGAAAGACAAACGTAGCATGTTGGAGTGTCTGTGTTCCGTTGTCCGTGCTG
CCATGGAAATGTGTCCCTAAGGGTGGTAACAAAATAGTCTGTGTTCCGTTGTCCGTGCTG
CACTACGCAAGTGCGGTGCATGGAAAGGGTTCTGGAGTCTGTGTTCCGTTGTCCGTGCTG
ATCTCATGGAAAAGACTCATAACGATCAACAGTCGGGTCTGTGTTCCGTTGTCCGTGCTG
ACAACAGAGCACCGATGGCATGGAAACTTGCGTAGAGTCTGTGTTCCGTTGTCCGTGCTG
GTGGAGAAAGGGGTCAAACATGGAAACGCATCGACAGTCTGTGTTCCGTTGTCCGTGCTG
GCCCGGGATCCCATCCATGGAAAATGTCGCTTACATGTCTGTGTTCCGTTGTCCGTGCTG
CAGAAGTGTCCTACGTAACATCCACATGGAAAGTACGTCTGTGTTCCGTTGTCCGTGCTG
GTTGCATACACGCATGGAAATAACAATCGAACTCCAGTCTGTGTTCCGTTGTCCGTGCTG
TCATGTGCTGGGCTTGATTCAGCATGGAAAACCAGTGTCTGTGTTCCGTTGTCCGTGCTG
TATTCTTCTCTTCATGGAAACAGTAAAAAATCGGACGTCTGTGTTCCGTTGTCCGTGCTG
CTATCTACACACAACTATGCGGTCGCCATGGAAATGGTCTGTGTTCCGTTGTCCGTGCTG
CCTGGGGACATGGAAAAATGAAGTCACCCATGGTGCGTCTGTGTTCCGTTGTCCGTGCTG
ATCATCCTTACATTACATGGAAATCGTGTGCCAATAGTCTGTGTTCCGTTGTCCGTGCTG
AAGGCCCATGGAAACCACGTCATATTCACAACTAACGTCTGTGTTCCGTTGTCCGTGCTG
Example: CATGGAAA
Every 8-mer is represented 16 times
Take median over intensities of all spots containing this 8-mer
8-mer      Rev. Comp.  Median Signal
GTCACGTG   CACGCGAC    108178
GCACGTGC   GCACGTGC    95854
CACGTGCC   GGCACGTG    89203
GCACGTGA   TCACGTGC    74295
TCACGTGA   TCACGTGA    69377
ACACGTGA   TCACGTGT    68733
ATCACGTG   CACGTGAT    58874
CACGTGTA   TACACGTG    58656
CCACGTGA   TCACGTGG    47900
ACACGTGG   CCACGTGT    47240
CACGTGAG   CTCACGTG    42887
AGCACGTG   CACGTGCT    41755
ACACGTGC   GCACGTGT    36764
CACGTGTC   GACACGTG    36463
ACCACGTG   CACGTGGT    36380
CACGTGCG   CGCACGTG    35515
CACGTGCA   TGCACGTG    32370
AACACGTG   CACGTGTT    28948
CCACGTGC   GCACGTGG    22983
CACGTGGC   GCCACGTG    19315
...        ...         ...
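The median step can be sketched as follows. The probe sequences and intensities here are invented toy values, the function names are hypothetical, and merging each 8-mer with its reverse complement is an assumption suggested by the paired columns in the table.

```python
from collections import defaultdict
from statistics import median

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def median_signal_by_kmer(probes, k=8):
    """probes: iterable of (sequence, intensity) pairs.

    Returns {canonical k-mer: median intensity over spots containing it},
    where the canonical form is the alphabetically smaller of a k-mer
    and its reverse complement.
    """
    hits = defaultdict(list)
    for seq, intensity in probes:
        seen = set()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            seen.add(min(kmer, revcomp(kmer)))
        for kmer in seen:              # count each spot once per k-mer
            hits[kmer].append(intensity)
    return {kmer: median(vals) for kmer, vals in hits.items()}

# Toy spots (made-up intensities); three of them contain CATGGAAA.
toy_probes = [
    ("CCATGGAAATGT", 900.0),
    ("ACATGGAAAGAC", 1100.0),
    ("TTCATGGAAACA", 400.0),
    ("GGGTCTGTGTTC", 50.0),
]
signal = median_signal_by_kmer(toy_probes)
print(signal["CATGGAAA"])
```

Using the median rather than the mean keeps one unusually bright or dim spot from dominating the score for an 8-mer.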
Deriving Binding Strength at each Sequence
$$[\mathrm{TF}] + [\mathrm{DNA}] \underset{k_d}{\overset{k_a}{\rightleftharpoons}} [\mathrm{TF\text{-}DNA}]$$
Affinity vs. PBM Signal (Cbf1)
[Plot axes: log(PBM median signal) and log(K_D^-1)]
Maerkl and Quake. Science (2007); 315:233-237.
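For reference, the plotted K_D relates to the rate constants in the usual way. This restatement assumes simple 1:1 binding at equilibrium, which the slide does not spell out:

```latex
K_D = \frac{k_d}{k_a} = \frac{[\mathrm{TF}]\,[\mathrm{DNA}]}{[\mathrm{TF\text{-}DNA}]},
\qquad
\log\left(K_D^{-1}\right) = \log k_a - \log k_d
```

A larger K_D^-1 means tighter binding, so the plot compares a direct affinity measurement with the PBM median signal for the same sequences.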
Goals: Find DNA sequences bound by TFs
PBMs
Predict how TFs function in the cell
Look for biophysical links between TF structure and function
Use quantitative approaches to maintain a physically realistic view of biology.
Predicting TF Cellular Functions
Use known/measurable inputs and outputs:
Heat shock
Gene Deletion
Gene expression
mRNA
Gene Expression Data
1327 publicly available microarray datasets
[Figure: mRNA compared between Condition 1 and Condition 2]
Predicting Cellular Functions of Components
Basic model/assumptions
TF binding near genes causes change in expression
Similar TF binding probability + similar expression = active regulation
[Figure: TF1 shown near Genes 1-5; PBM data set beside expression data]
Physically Realistic Binding Probability
Simple (and often used) view:
GGCACGTGGCTGCATGAGCGGAGTCACGTGGGAAAATACAACAGTCACCCACGTGCCGTGCACCGACGTACTCGCCTCAGTGCACCCTTTTATGTTGTCAGTGGGTGCAC
Cbf1
Gene
Promoter region is BOUND:
Gene is ON
GGCACGTGGCTGCATGAGCGGAGGCTCGCGGGAAAATACAACAGTCACCCACGTGCCGTGCACCGACGTACTCGCCTCCGTGCGCCCTTTTATGTTGTCAGTGGGTGCAC
Gene
Promoter region is NOT BOUND:
Gene is OFF
Physically Realistic Binding Probability
Physical reality:
Energy landscape of potential TF binding
TF occupancy probability = integration of binding potential across sequence near gene
Dictates likelihood of recruiting RNA polymerase and thus level of mRNA transcription
GGCACGTGGCTGCATGAGCGGAGTCACGTGGGAAAATACAACAGTCACCCACGTGCCGTGCACCGACGTACTCGCCTCAGTGCACCCTTTTATGTTGTCAGTGGGTGCAC
Cbf1
Gene
Physically Realistic Binding Probability
Physical reality:
Energy landscape of potential binding
Sum median intensity data across all possible 8-mers in sequence near gene
GGCACGTGGCTGCATGAGCGGAGTCACGTGGGAAAATACAACAGTCACCCACGTGCCGTGCACCGACGTACTCGCCTCAGTGCACCCTTTTATGTTGTCAGTGGGTGCAC
Cbf1
Gene
GGCACGTGGCTGCATGAGCGGAGTCACGTGGGAAAATACAACAGTCACCCACGTGCCGTGCACCGACGTACTCGCCTCAGTGCACCCTTTTATGTTGTCAGTGGGTGCAC
Gene
Intensity = 117651 Intensity = 215352
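The summation on this slide can be sketched in a few lines. The signal table below holds only two rows copied from the 8-mer table earlier in the talk, so the total it prints is not the intensity shown on the slide. The function names, the reverse-complement merging, and the zero default for unlisted 8-mers are all assumptions made for illustration.

```python
def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def canonical(kmer):
    """Alphabetically smaller of a k-mer and its reverse complement."""
    return min(kmer, revcomp(kmer))

def promoter_score(promoter, kmer_signal, k=8, default=0.0):
    """Sum the median PBM signal over every k-mer window in the promoter."""
    total = 0.0
    for i in range(len(promoter) - k + 1):
        total += kmer_signal.get(canonical(promoter[i:i + k]), default)
    return total

# Two rows from the Cbf1 8-mer table; all other 8-mers default to 0 here.
kmer_signal = {
    canonical("GTCACGTG"): 108178.0,
    canonical("GGCACGTG"): 89203.0,
}

# Top strand of the example promoter shown on the slide.
promoter = "GGCACGTGGCTGCATGAGCGGAGTCACGTGGGAAAATACAACAGTCACCCACGTGCC"
score = promoter_score(promoter, kmer_signal)
print(score)
```

Unlike the bound/not-bound cartoon, every window contributes, so several weak sites can add up to a score comparable to one strong site.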
Goals of New Analysis Method
Combine binding probability with expression data to predict TF function and condition-specific binding site usage
[Figure: PBM data combined with gene expression in Conditions A-D for target genes 1-6 to infer TF function]
Goals of New Analysis Method
Consider all data rather than drawing arbitrary cutoffs
Low-affinity binding as well as minor expression changes may be biologically relevant (Tanay, 2006; Foat et al., 2006)
[Figure: binding probability scale marked with “?”]
CRACR
“Combination Rank-order Analysis of Condition-specific Regulation”
Basics of CRACR Approach
Order genes by expression in condition of interest
Assign ranks based on PBM-derived binding probability for TF

Gene (most induced to most repressed)   TF binding rank (PBM p-value rank)
YGR043C                                 2
YAR014C                                 3
YAR029W                                 6
YGR087C                                 9
YAR018W                                 1
YAL003C                                 8
YAR003W                                 5
YGR088W                                 10
YAR044W                                 4
YER130C                                 7
YPL054W                                 11
Basics of Analysis Approach
Select: similarly expressed foreground genes, background set
[Figure: genes ordered from most induced to most repressed (YGR043C ... YPL054W), marked as foreground or background]
Basics of Analysis Approach
Slide window along ordered expression
Calculate an area statistic for enrichment of PBM targets within each window vs. background
$$\text{area} = \frac{1}{B + F}\left[\frac{\rho_B}{B} - \frac{\rho_F}{F}\right]$$
ρ = rank sum, F = foreground, B = background
[Figure: window sliding along the expression-ordered gene list (YGR043C ... YPL054W) with PBM p-value ranks 2 3 6 9 1 8 5 10 4 7 11]
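A small sketch of the sliding-window calculation, using the eleven ranks from the slide. It reads the area statistic as (ρB/B − ρF/F)/(B + F), with ρ the sum of binding ranks in each set and B, F the set sizes. Treating every gene outside the window as background is an assumption, as are the function names.

```python
def area_statistic(fg_ranks, bg_ranks):
    """(mean background rank - mean foreground rank) / total number of genes."""
    F, B = len(fg_ranks), len(bg_ranks)
    return (sum(bg_ranks) / B - sum(fg_ranks) / F) / (B + F)

def sliding_area(ranks, window):
    """Area statistic for each window along the expression-ordered ranks."""
    areas = []
    for start in range(len(ranks) - window + 1):
        fg = ranks[start:start + window]
        bg = ranks[:start] + ranks[start + window:]   # assumed: all other genes
        areas.append(area_statistic(fg, bg))
    return areas

# PBM binding ranks, ordered from most induced to most repressed gene.
ranks = [2, 3, 6, 9, 1, 8, 5, 10, 4, 7, 11]

areas = sliding_area(ranks, window=3)
print([round(a, 3) for a in areas])
```

A positive value means the window holds better-ranked (more strongly bound) genes than the background. With one foreground gene holding rank 1, the statistic reaches its maximum of 0.5.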
Predicting TF Function
Plot area statistic (ranges -0.5 to 0.5) at each window
Determine condition significance by permutation test-derived threshold (gray line: p < 0.001)
[Plot: area statistic vs. expression fold change (induced >8.0 to repressed <-8). Glucose added: Mig1 targets repressed]
[Diagram labels: glucose, Mig1, mRNA, metabolism enzyme, metabolism switch]
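One way to derive such a threshold is sketched below. The talk does not describe the permutation scheme, so shuffling the binding ranks across genes and taking the largest absolute window statistic per shuffle are assumptions, as are all function names and parameter defaults.

```python
import random

def area_statistic(fg_ranks, bg_ranks):
    """(mean background rank - mean foreground rank) / total number of genes."""
    F, B = len(fg_ranks), len(bg_ranks)
    return (sum(bg_ranks) / B - sum(fg_ranks) / F) / (B + F)

def max_abs_area(ranks, window):
    """Largest |area| over all windows along the ordered gene list."""
    best = 0.0
    for start in range(len(ranks) - window + 1):
        fg = ranks[start:start + window]
        bg = ranks[:start] + ranks[start + window:]
        best = max(best, abs(area_statistic(fg, bg)))
    return best

def permutation_threshold(ranks, window, n_perm=2000, alpha=0.001, seed=0):
    """Empirical (1 - alpha) quantile of max |area| under shuffled ranks."""
    rng = random.Random(seed)
    shuffled = list(ranks)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(max_abs_area(shuffled, window))
    null.sort()
    return null[min(n_perm - 1, int((1 - alpha) * n_perm))]

ranks = [2, 3, 6, 9, 1, 8, 5, 10, 4, 7, 11]
threshold = permutation_threshold(ranks, window=3)
observed = max_abs_area(ranks, window=3)
print(observed, threshold, observed >= threshold)
```

With only eleven genes this toy case has little power. The point is the shape of the procedure, where the gray line in the plot plays the role of `threshold`.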
Predicting TF Function
Determine which individual genes are repressed by Mig1
[Plot: area statistic vs. expression fold change (induced >8.0 to repressed <-8). Glucose added: Mig1 targets repressed]
Group of genes repressed by Mig1: YHR005C, YER130C, YBL054W
Prediction of General TF Function
Conditions for which there is significant enrichment of PBM targets: Effect
Cell Cycle: Expression in response to Clb2p (set 1, 40 min) induced
Expression during the cell cycle (alpha factor arrest and release)(16) induced
Expression during the cell cycle (cdc15 arrest and release)(8) induced
Expression during the cell cycle (cdc28)(7) induced
Expression in response to 50 nM alpha-factor: 120 min induced
Expression in ckb2 deletion mutant induced
Expression in dig1, dig2 deletion mutant induced
Expression in swi6 (haploid) deletion mutant induced
Expression in tec1 (haploid) deletion mutant induced
Expression in yel044w deletion mutant induced
Expression in sir2 deletion mutant repressed
Expression in snf2 mutant cells in minimal medium repressed
Expression in response to 50 nM alpha-factor in bni1 mutant: 60 min repressed
Selected Mcm1 significant conditions
Find all (of 1327) expression conditions where a TF is predicted to be active
Look for enrichment of general biological functions in this set
Prediction: Mcm1 involved in cell cycle and mating
[Figure labels: “a” cell, “alpha” cell, alpha factor]
Prediction of TF function
After PBM experiments, CRACR has been used to predict functions of 90 yeast TFs (paper in process)
Binding Site Affinity Effects
$$[\mathrm{TF}] + [\mathrm{DNA}] \underset{k_d}{\overset{k_a}{\rightleftharpoons}} [\mathrm{TF\text{-}DNA}]$$
[Figure: TF binding at Gene 1, Gene 2, and Gene 3 along a binding affinity axis (high, medium, low affinity), shown at low, medium, and high TF concentration]
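The concentration dependence on this slide follows from the binding equilibrium. Here is a minimal sketch, assuming simple 1:1 binding with the TF in excess over its sites, so that free TF is roughly equal to total TF. The K_D and concentration values are invented for illustration and are not measurements from the talk.

```python
def occupancy(tf_conc, kd):
    """Equilibrium fraction of a site bound by TF: [TF] / ([TF] + K_D)."""
    return tf_conc / (tf_conc + kd)

# Hypothetical dissociation constants (nM) for three classes of site.
sites = {"High affinity": 1.0, "Medium affinity": 10.0, "Low affinity": 100.0}

# Hypothetical TF concentrations (nM): low, medium, high.
concentrations = [1.0, 10.0, 100.0]

for name, kd in sites.items():
    row = [round(occupancy(c, kd), 2) for c in concentrations]
    print(name, row)
```

Under these assumptions a high-affinity site is already half occupied at the lowest concentration, while a low-affinity site only approaches half occupancy at the highest. This is one way sites of different affinity could respond to different conditions.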
Demonstrating Effects of Binding site affinity
Low vs. high affinity binding sites may have different biological functions
Expression after oxidative stress vs. Rap1 binding affinity
Highest binding affinity……………Lowest binding affinity
[Plot: ALD4, predicted conditional target. Occupancy units vs. time after diamide treatment (0, 20, 30 min)]
[Plot: MCR1, predicted conditional target. Occupancy units vs. time after diamide treatment (0, 20, 30 min)]
Experimentally Validated
Goals: Find DNA sequences bound by TFs
PBMs
Predict how TFs function in the cell
CRACR
Look for biophysical links between TF structure and function
Use quantitative approaches to maintain a physically realistic view of biology.
Reasons for Different Functions: TF structure?
Goal: Consider biophysical TF structure instead of cartoon “TF blob”
[Figure labels: Mig1, cyc8, tup1]
TF Structure and Function
Are certain TFs structurally suited for certain types of biological processes?
Case Study:
Lower Information Content Motif
GAL4 (Zn2Cys6)
CST6 (bZIP)
Regulatory hub; many target genes
Higher Information Content Motif
More specific, fewer target genes
metabolism of specific nutrients
cell fate, cell cycle
Future Directions
Completion of functional predictions and study of yeast gene regulation
Toward predictive model in humans
Experiments for understanding gene regulation rules
Acknowledgements
Martha Bulyk
Mike Berger
Anthony Philippakis
Cong Zhu
Kelsey Byers
Trevor Siggers
Vicky Zhou
Cherelle Walls
Jason Warner
Jaime Chapoy
Other Bulyk Lab Members
NSF graduate research fellowship
NIH/NHGRI R01
GO CATS!!
Advantages and Challenges of Interdisciplinary Work
Insight gained by quantitative reasoning in biology, combining of different perspectives
“Physicists and mathematicians choose projects in biology that are fun, but not necessarily important”
Important not to get caught up in what “counts” as “true biology” or “true physics”