Translating the Cell’s “Instruction Manual”
A Biophysicist’s Approach to Understanding Gene Regulation
Rachel Patton McCord
Bulyk Lab, Harvard University Biophysics Program
3/20/08

Knobloch lives?
What are characteristics of life?
Response to environment
Take in nutrients and produce waste
Reproduction

Biological Signal Processing
[Figure labels: oxygen; ethanol; inputs; nucleus; transcription factor; mRNA; protein; outputs]

Biological Signal Processing: Regulation of Gene Expression
Transcription factor (TF) recognizes DNA bases (ACGT)
Promotes gene expression: transcription of mRNA (output)
[Figure labels: RNA; sequence-specific TFs; RNA polymerase]

Organisms
Ideal: understand gene regulation in human
Problems: large genome size, diverse cell types, likely complicated gene regulation rules
Begin with a model system: the single-celled organism Saccharomyces cerevisiae (yeast)
[Figure labels: a few hundred kb; a few hundred bp]

Goals:
Find DNA sequences bound by TFs
Predict how TFs function in the cell
Look for biophysical links between TF structure and function
Use quantitative approaches to maintain a physically realistic view of biology.

Protein Binding Microarray (PBM) Technology
TF-DNA Sequence Recognition
[Figure labels: TF; microarray slide; fluorophore-labeled antibody; dsDNA]
Mukherjee, Berger, et al., Nature Genetics (2004), 36:

Protein Binding Microarray (PBM) Technology
TF-DNA Sequence Recognition
[Figure labels: detector; laser (488 nm)]

Universal Array Design
Interested in sequences of 8-10 bases
5' CTATCTACACACAACTATGCGGTCGCCATGGAAATGGTCTGTGTTCCGTTGTCCGTGCTG 3'
CTATCTACACA, TATCTACACAC, ATCTACACACA, TCTACACACAA
4^10 ≈ 1,000,000 total 10-mers
24 nt fixed sequence, 36 nt variable sequence
27 10-mers per spot
Philippakis, Qureshi et al., RECOMB (2007).
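The window arithmetic on this slide can be checked with a short sketch. It assumes, from the probe shown, that the first 36 nt are the variable region and the last 24 nt are the fixed sequence. The function name `kmers` and the variable names are illustrative, not taken from the talk. A 36 nt region holds 36 - 10 + 1 = 27 overlapping 10-mers, so covering all 4^10 10-mers needs at least 4^10 / 27 spots.

```python
def kmers(seq: str, k: int) -> list[str]:
    """Return every overlapping window of length k, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Probe copied from the slide. Assumption: variable region first, fixed region last.
PROBE = "CTATCTACACACAACTATGCGGTCGCCATGGAAATGGTCTGTGTTCCGTTGTCCGTGCTG"
VARIABLE_LEN = 36

variable, fixed = PROBE[:VARIABLE_LEN], PROBE[VARIABLE_LEN:]

# All 10-mers contained in the variable region of one spot.
windows = kmers(variable, 10)

# Number of distinct DNA 10-mers (4 bases, 10 positions).
total_10mers = 4 ** 10

# Lower bound on spot count if every spot contributed only new 10-mers
# (ceiling division written with integer arithmetic).
min_spots = -(-total_10mers // len(windows))

print(len(windows), total_10mers, min_spots)
```

The lower bound treats a 10-mer and its reverse complement as distinct and ignores windows that overlap the fixed sequence, so it is a rough count rather than the actual array layout.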
Berger, Philippakis et al., Nature Biotechnology (2006), 24:
1,000,000 total 10-mers: 4^10 / 27 ≈ 40,000 total spots

Universal Array Design
Use an idea from cryptography: a de Bruijn sequence contains all sequence variants of length k in the shortest sequence possible
Anthony Philippakis, Mike Berger
All possible 3-mers:
AAA AAC AAG AAT ACA ACC ACG ACT AGA AGC AGG AGT ATA ATC ATG ATT
CAA CAC CAG CAT CCA CCC CCG CCT CGA CGC CGG CGT CTA CTC CTG CTT
GAA GAC GAG GAT GCA GCC GCG GCT GGA GGC GGG GGT GTA GTC GTG GTT
TAA TAC TAG TAT TCA TCC TCG TCT TGA TGC TGG TGT TTA TTC TTG TTT
Length = 4^3 = 64 bp
de Bruijn sequence: TCGATTGCGTGACAGGGTAAAACAAGACCCTGACCATGGCAGTGTTCGATTGCGTGACAGGGTAGTCCGGGTTCTTTGCGCTCACTATAC
Fixed sequence (24 bp), test sequence (36 bp)

Deriving Binding Strength at each Sequence
CCGTCAGCAGT CATGGAAA GCTGGTAGAAGTTCTGGGTCTGTGTTCCGTTGTCCGTGCTG
TTATAC CATGGAAA GACAAACGTAGCATGTTGGAGTGTCTGTGTTCCGTTGTCCGTGCTG
C CATGGAAA TGTGTCCCTAAGGGTGGTAACAAAATAGTCTGTGTTCCGTTGTCCGTGCTG
CACTACGCAAGTGCGGTG CATGGAAA GGGTTCTGGAGTCTGTGTTCCGTTGTCCGTGCTG
ATCT CATGGAAA AGACTCATAACGATCAACAGTCGGGTCTGTGTTCCGTTGTCCGTGCTG
ACAACAGAGCACCGATGG CATGGAAA CTTGCGTAGAGTCTGTGTTCCGTTGTCCGTGCTG
GTGGAGAAAGGGGTCAAA CATGGAAA CGCATCGACAGTCTGTGTTCCGTTGTCCGTGCTG
GCCCGGGATCCCATC CATGGAAA ATGTCGCTTACATGTCTGTGTTCCGTTGTCCGTGCTG
CAGAAGTGTCCTACGTAACATCCA CATGGAAA GTACGTCTGTGTTCCGTTGTCCGTGCTG
GTTGCATACACG CATGGAAA TAACAATCGAACTCCAGTCTGTGTTCCGTTGTCCGTGCTG
TCATGTGCTGGGCTTGATTCAG CATGGAAA ACCAGTGTCTGTGTTCCGTTGTCCGTGCTG
TATTCTTCTCTT CATGGAAA CAGTAAAAAATCGGACGTCTGTGTTCCGTTGTCCGTGCTG
CTATCTACACACAACTATGCGGTCGC CATGGAAA TGGTCTGTGTTCCGTTGTCCGTGCTG
CCTGGGGA CATGGAAA AATGAAGTCACCCATGGTGCGTCTGTGTTCCGTTGTCCGTGCTG
ATCATCCTTACATTA CATGGAAA TCGTGTGCCAATAGTCTGTGTTCCGTTGTCCGTGCTG
AAGGCC CATGGAAA CCACGTCATATTCACAACTAACGTCTGTGTTCCGTTGTCCGTGCTG
Example: CATGGAAA
Every 8-mer is represented 16 times
Take the median over intensities of all spots containing this 8-mer

[Table columns: 8-mer; Rev. Comp.; Median Signal (signal values missing)]
GTCACGTG  CACGCGAC
GCACGTGC  GCACGTGC
CACGTGCC  GGCACGTG
GCACGTGA  TCACGTGC
TCACGTGA  TCACGTGA
ACACGTGA  TCACGTGT
ATCACGTG  CACGTGAT
CACGTGTA  TACACGTG
CCACGTGA  TCACGTGG
ACACGTGG  CCACGTGT
CACGTGAG  CTCACGTG
AGCACGTG  CACGTGCT
ACACGTGC  GCACGTGT
CACGTGTC  GACACGTG
ACCACGTG  CACGTGGT
CACGTGCG  CGCACGTG
CACGTGCA  TGCACGTG
AACACGTG  CACGTGTT
CCACGTGC  GCACGTGG
CACGTGGC  GCCACGTG

Deriving Binding Strength at each Sequence
[TF] + [DNA] ⇌ [TF-DNA] (forward rate k_a, reverse rate k_d)
[Figure: Affinity vs. PBM Signal (Cbf1); axes: log(PBM median signal), log(K_D^-1)]
Maerkl and Quake, Science (2007); 315:

Goals:
Find DNA sequences bound by TFs (PBMs)
Predict how TFs function in the cell
Look for biophysical links between TF structure and function
Use quantitative approaches to maintain a physically realistic view of biology.

Predicting TF Cellular Functions
Use known/measurable inputs and outputs:
Heat shock
Gene deletion
Gene expression (mRNA)

Gene Expression Data
1327 publicly available microarray datasets
[Figure labels: Condition 1; Condition 2; mRNA]

Predicting Cellular Functions of Components
Basic model/assumptions:
TF binding near genes causes change in expression
Similar TF binding probability + similar expression = active regulation
[Figure labels: TF1; Gene 1; Gene 2; Gene 3; Gene 4; Gene 5; PBM data; expression data]

Physically Realistic Binding Probability
Simple (and often used) view:
GGCACGTGGCTGCATGAGCGGAGTCACGTGGGAAAATACAACAGTCACCCACGTG
CCGTGCACCGACGTACTCGCCTCAGTGCACCCTTTTATGTTGTCAGTGGGTGCAC
[Figure labels: Cbf1; gene]
Promoter region is BOUND: gene is ON
GGCACGTGGCTGCATGAGCGGAGGCTCGCGGGAAAATACAACAGTCACCCACGTG
CCGTGCACCGACGTACTCGCCTCCGTGCGCCCTTTTATGTTGTCAGTGGGTGCAC
[Figure labels: gene]
Promoter region is NOT BOUND: gene is OFF

Physically Realistic Binding Probability
Physical reality: energy landscape of potential TF binding
TF occupancy probability = integration of binding potential across the sequence near the gene
Dictates likelihood of recruiting RNA polymerase and thus level of mRNA transcription
GGCACGTGGCTGCATGAGCGGAGTCACGTGGGAAAATACAACAGTCACCCACGTG
CCGTGCACCGACGTACTCGCCTCAGTGCACCCTTTTATGTTGTCAGTGGGTGCAC
[Figure labels: Cbf1; gene]

Physically Realistic Binding Probability
Physical reality: energy landscape of potential binding
Sum median intensity data across all possible 8-mers in the sequence near the gene
GGCACGTGGCTGCATGAGCGGAGTCACGTGGGAAAATACAACAGTCACCCACGTG
CCGTGCACCGACGTACTCGCCTCAGTGCACCCTTTTATGTTGTCAGTGGGTGCAC
[Figure labels: Cbf1; gene]
GGCACGTGGCTGCATGAGCGGAGTCACGTGGGAAAATACAACAGTCACCCACGTG
CCGTGCACCGACGTACTCGCCTCAGTGCACCCTTTTATGTTGTCAGTGGGTGCAC
[Figure labels: gene]
Intensity =
Intensity =

Goals of New Analysis Method
Combine binding probability with expression data to predict TF function and condition-specific binding site usage
[Figure labels: gene expression; PBM data; Condition A; Condition B; Condition C; Condition D; target gene; TF function]

Goals of New Analysis Method
Consider all data rather than drawing arbitrary cutoffs
Low-affinity binding as well as minor expression changes may be biologically relevant
Tanay, 2006; Foat et al., 2006
[Figure labels: binding probability; ?]

CRACR
Combination Rank-order Analysis of Condition-specific Regulation

Basics of CRACR Approach
Order genes by expression in condition of interest
Assign ranks based on PBM-derived binding probability for TF
[Figure: genes ordered from most induced to most repressed]
YGR043C YAR014C YAR029W YGR087C YAR018W YAL003C YAR003W YGR088W YAR044W YER130C YPL054W
[Figure labels: TF binding rank; PBM p-value rank]

Basics of Analysis Approach
[Figure: genes ordered from most induced to most repressed]
YGR043C YAR014C YAR029W YGR087C YAR018W YAL003C YAR003W YGR088W YAR044W YER130C YPL054W
Select: similarly expressed foreground genes, background set
[Figure labels: foreground; background]

Basics of Analysis Approach
[Figure: genes ordered from most induced to most repressed]
YGR043C YAR014C YAR029W YGR087C YAR018W YAL003C YAR003W YGR088W YAR044W YER130C YPL054W
Slide window along ordered expression
Calculate an area statistic for enrichment of PBM targets within each window vs. background
[Figure labels: PBM p-value rank]
area = 1/(B + F) × [ρ_B/B − ρ_F/F]
ρ = rank sum, F = foreground, B = background

Predicting TF Function
Plot area statistic (ranges -0.5 to 0.5) at each window
Determine condition significance by permutation test-derived threshold (gray line: p < 0.001)
[Figure: Glucose added: Mig1 targets repressed; y-axis: area statistic; x-axis: induced to repressed]
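The sliding-window enrichment can be sketched in a few lines. This is a minimal illustration, not the CRACR implementation: it assumes the statistic has the form area = 1/(B + F) × (ρ_B/B − ρ_F/F), where ρ_F and ρ_B are the sums of binding ranks in the foreground and background, and it assumes the foreground is the set of genes inside the window and the background is every other gene. The gene names and binding ranks are hypothetical, and the permutation-derived significance threshold is not modeled.

```python
from typing import Sequence


def area_statistic(fg_ranks: Sequence[int], bg_ranks: Sequence[int]) -> float:
    """Assumed form: 1/(B+F) * (rho_B/B - rho_F/F), with rho = rank sum."""
    F, B = len(fg_ranks), len(bg_ranks)
    if F == 0 or B == 0:
        raise ValueError("foreground and background must both be non-empty")
    return (sum(bg_ranks) / B - sum(fg_ranks) / F) / (B + F)


def sliding_area(ordered_genes: Sequence[str],
                 binding_rank: dict[str, int],
                 window: int) -> list[float]:
    """Slide a window along expression-ordered genes and score each position."""
    scores = []
    for start in range(len(ordered_genes) - window + 1):
        fg = ordered_genes[start:start + window]
        bg = list(ordered_genes[:start]) + list(ordered_genes[start + window:])
        scores.append(area_statistic([binding_rank[g] for g in fg],
                                     [binding_rank[g] for g in bg]))
    return scores


# Hypothetical data: 11 genes ordered from most induced to most repressed.
genes = [f"g{i}" for i in range(1, 12)]
# Hypothetical binding ranks (1 = highest predicted TF binding).
rank_values = [9, 7, 11, 8, 10, 6, 5, 4, 2, 1, 3]
binding_rank = dict(zip(genes, rank_values))

scores = sliding_area(genes, binding_rank, window=3)
for start, score in enumerate(scores):
    print(genes[start], "to", genes[start + 2], round(score, 3))
```

In this toy input the three best-bound genes sit at the repressed end, so the last window reaches the maximum value of 0.5, which matches the stated range of -0.5 to 0.5.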