Differential analysis of ChIP-seq data

Differential analysis of ChIP-seq data. Rory Stark, Principal Bioinformatics Analyst, 19 July 2013

Transcript of Differential analysis of ChIP-seq data

Page 1: Differential analysis of ChIP-seq data (master.bioconductor.org/help/course-materials/2013/BioC...)

Differential analysis of ChIP-seq data
Rory Stark, Principal Bioinformatics Analyst
19 July 2013

Page 2

Link to zipped copy of working directory: http://goo.gl/b4pMP

Thursday, 25 July 2013


Page 3

Analysis of ChIP-seq data

EXPERIMENTAL DESIGN
–  Controls and replicates

QC/READ PROCESSING
–  Library QC
–  Alignment and filtering
–  QC measures and assessment

PEAK CALLING
–  Peak callers

DIFFERENTIAL BINDING ANALYSIS
–  Occupancy-based analysis
–  Affinity-based analysis

VALIDATION AND DOWNSTREAM ANALYSIS
–  Motif analysis
–  Annotation
–  Integrating binding and expression data

Page 4

Differential Binding Analysis

Page 5

Differential binding analysis: Observations

–  ChIP-seq variability
    •  Biological
    •  Experimental
    •  Technical
–  Peak calling is noisy
    •  Profusion of peak callers
    •  Highly parametric
    •  Callers have low agreement on marginal peaks
–  Many samples involved
    •  Conditions and treatments (contrasts)
    •  Factors, marks, antibodies
    •  Replicates required to capture variance

[Figure labels: 68,000 peaks; 24,000 peaks]

Page 6

Differential binding analysis: Goals

–  Be robust to noise
    •  Noisy experiments
    •  Noisy peak calling
–  Determine DB without requiring global binding maps for each ChIP
–  Exploit quantitative affinity (read scores) beyond binary occupancy (peak calls)
–  Functionally link differential regulatory events (DB) with differential mRNA levels (DE)

Page 7

Types of differential binding analysis

–  Overlap analysis (peaks/site occupancy)
–  Quantitative analysis
    •  Binding site count density (ChIPDiff, DiffBind)
    •  Binding profile (MMDiff)
    •  PCA of multiple factors (dPCA)

Page 8

Occupancy Analysis

Page 9

Peak Overlap Analysis

Ross-Innes et al., Genes and Development 2010
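As a rough illustration of what a peak overlap (occupancy) analysis computes, the sketch below splits one peakset into peaks shared with a second peakset and peaks unique to the first. The function name, the `(chrom, start, end)` tuple representation and the toy coordinates are assumptions made for this example. They are not taken from the slides or from any particular peak caller.

```python
def overlapping_peaks(peaks_a, peaks_b):
    """Return the peaks in peaks_a that overlap at least one peak in peaks_b.

    Each peak is a (chrom, start, end) tuple with half-open coordinates.
    """
    # Group the second peakset by chromosome for quicker lookup.
    by_chrom = {}
    for chrom, start, end in peaks_b:
        by_chrom.setdefault(chrom, []).append((start, end))

    shared = []
    for chrom, start, end in peaks_a:
        for b_start, b_end in by_chrom.get(chrom, []):
            # Two half-open intervals overlap if each starts before the other ends.
            if start < b_end and b_start < end:
                shared.append((chrom, start, end))
                break
    return shared


# Toy peaksets (hypothetical coordinates).
responsive = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
resistant = [("chr1", 150, 250), ("chr2", 300, 400)]

shared = overlapping_peaks(responsive, resistant)
unique_to_responsive = [p for p in responsive if p not in shared]
print(len(shared), "shared;", len(unique_to_responsive), "unique to responsive")
```

The nested loop is quadratic per chromosome, which is fine for a toy example. Real peaksets would normally be handled with sorted intervals or an interval tree.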

Page 10

Example domain: Tamoxifen resistance in breast cancer cell lines

Page 11

6,920 events

104,051 ER binding events

[Figure: "Occupancy Clustering" correlation heatmap with colour key and histogram (Correlation vs. Count). Samples are labelled Tissue:Condition:Replicate: ZR75:Responsive:1-2, T47D:Responsive:1-2, BT474:Resistant:1-2, MCF7:Resistant:1-2, MCF7:Responsive:1-3.]

[Figure: "PCA: Condition", Principal Component #1 [39%] vs. Principal Component #2 [20%].]

Page 12

Occupancy Analysis

Observations:
•  Overall loss of ER binding activity in resistant cell lines
•  Using only status-unique sites:
    •  Does not cluster by status
    •  Separable in second principal component

Page 13

Quantitative Analysis

Page 14

Binding affinity matrix

–  Rows: decide interval (binding site) "universe"
    •  Peak callers -> occupancy/overlaps
        –  High-confidence sites (stringent)
        –  All potential sites (lenient)
    •  Genomic intervals
        –  Promoters
        –  Windows
–  Columns: count and normalize reads for all samples in all intervals
    •  Duplicate reads
    •  Controls
    •  Normalization
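A minimal sketch of the matrix described above, assuming a single chromosome, reads reduced to one position each, and an RPKM-style score. It counts reads per interval, subtracts the matched control counts (floored at zero here, which is a choice made for this example), and scales by interval length and library size. All names and the toy data are hypothetical, and duplicate-read handling is left out.

```python
def count_reads(intervals, read_positions):
    """Count reads whose position falls inside each half-open (start, end) interval."""
    return [
        sum(1 for pos in read_positions if start <= pos < end)
        for start, end in intervals
    ]


def affinity_matrix(intervals, chip_reads, control_reads):
    """Build a matrix with one row per interval and one column per sample.

    chip_reads and control_reads map sample name -> list of read positions.
    Scores are control-subtracted reads per kilobase per million ChIP reads.
    """
    sample_names = sorted(chip_reads)
    columns = []
    for name in sample_names:
        chip = count_reads(intervals, chip_reads[name])
        ctrl = count_reads(intervals, control_reads[name])
        library_size = len(chip_reads[name])
        column = []
        for (start, end), c, k in zip(intervals, chip, ctrl):
            net = max(c - k, 0)  # assumption: negative values are floored at zero
            column.append(net * 1e9 / ((end - start) * library_size))
        columns.append(column)
    # Transpose so that rows are intervals and columns are samples.
    return sample_names, [list(row) for row in zip(*columns)]


# Toy data (hypothetical): two 1 kb intervals, two samples.
intervals = [(0, 1000), (2000, 3000)]
chip_reads = {"s1": [10, 20, 30, 2500], "s2": [2100, 2200]}
control_reads = {"s1": [15], "s2": []}

names, matrix = affinity_matrix(intervals, chip_reads, control_reads)
for interval, row in zip(intervals, matrix):
    print(interval, row)
```

The library size used here is the total number of ChIP reads in the sample. Using only the reads that fall inside the intervals would be the other option the slides call "effective" library size.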

Page 15

Affinity (count) analysis

[Figure panels: Counts for all peaks; Counts for unique peaks only]

Page 16

•  Histone marks
    •  H3K4me3
    •  H3K36me3
    •  H3K9me2
    •  H3K9me3
    •  H3K27me3
•  Conditions:
    •  Growing vs. Senescent
•  Treatment:
    •  WT vs. treated
•  Replicates:
    •  1-3 for each mark/condition/treatment
•  "Peaks":
    •  Windows around TSSs (-1000, +4000)

Data from Chandra et al., Molecular Cell 2012
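For the window-based "peaks" on this slide, the interval around each TSS can be derived as in the sketch below. It assumes the (-1000, +4000) offsets are measured relative to the direction of transcription, so the window is mirrored for minus-strand genes. The slide does not state this, and the function name and coordinates are hypothetical.

```python
def tss_window(tss, strand, upstream=1000, downstream=4000):
    """Return a (start, end) window around a transcription start site.

    Assumption: offsets follow the direction of transcription, so the
    window is mirrored for genes on the minus strand.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clip at the start of the chromosome.
    return max(start, 0), end


print(tss_window(10000, "+"))
print(tss_window(10000, "-"))
```

Counting reads in these fixed windows replaces peak calling as the way to define the rows of the binding matrix.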

Page 17

Differential binding analysis

–  Determine contrasts
    •  Single-factor
    •  Multi-factor (GLM/blocking)
        –  Matched tumour-normal
        –  Common tissue
        –  Replicate groups (batch)
–  Run RNA-Seq DE package
    •  edgeR, DESeq, etc.
    •  Fit negative binomial distribution
    •  Exact test
    •  Multiple testing correction (B&H FDR)
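The last step in the list, Benjamini and Hochberg FDR correction, can be sketched on its own. The example takes per-site p-values as given and does not attempt the negative binomial fit or exact test, which edgeR and DESeq perform. The p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR), in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-site p-values from a differential binding test.
p_values = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(p_values)
significant = [i for i, q in enumerate(fdr) if q < 0.10]
print(fdr, significant)
```

Sites are then reported at a chosen FDR threshold, such as the 0.10 cut-off used for `significant` here.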

Page 18

13,901 FDR < 0.10

1,277 FDR < 0.005

Page 19

Differential binding analysis: Occupancy vs. Affinity

[Figure panels: All Sites; Differentially Bound Sites]

Page 20

Differential binding signature can be isolated even amongst sites common to all samples

Page 21

DiffBind  

Page 22

R/Bioconductor package -- DiffBind

Page 23

DiffBind Workflow

1.  Reading in peaksets
    •  Sample sheets
    •  Metadata
    •  Peaksets from peak callers
    •  data(tamoxifen_peaks)

2.  Occupancy analysis
    •  Overlap venns
    •  Overlap rate
    •  Consensus peaksets

3.  Read counting
    •  BAM/SAM/BED
    •  Scores (RPKM)
    •  Filtering
    •  data(tamoxifen_counts)

4.  DBA
    •  Contrasts
        •  GLMs
        •  Multi-factor designs (paired, blocking)
    •  Normalisation
        •  Subtract control reads
        •  Library size: full vs. effective
        •  e.g. TMM (edgeR)
    •  DE Method (edgeR, DESeq)
    •  data(tamoxifen_analysis)

5.  Plotting and reporting
    •  Retrieving DB sites, stats, counts
    •  MA plots
    •  Heatmaps (correlation, affinity), PCA, boxplots
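The "overlap rate" and "consensus peaksets" items in the occupancy step can be pictured with a small occupancy matrix, where rows are merged candidate sites and columns are samples. The sketch below is an illustration in plain Python under that assumption. It is not DiffBind's own interface, and the names and data are hypothetical.

```python
def overlap_rate(occupancy):
    """For k = 1..n_samples, count the sites called in at least k samples."""
    n_samples = len(occupancy[0])
    support = [sum(row) for row in occupancy]
    return [sum(1 for s in support if s >= k) for k in range(1, n_samples + 1)]


def consensus_sites(occupancy, min_overlap=2):
    """Return row indices of sites called in at least min_overlap samples."""
    return [i for i, row in enumerate(occupancy) if sum(row) >= min_overlap]


# Rows are merged sites, columns are samples; 1 means a peak was called.
occupancy = [
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
]

print(overlap_rate(occupancy))
print(consensus_sites(occupancy, min_overlap=2))
```

How quickly the overlap rate falls as k grows gives a sense of how reproducible the peak calls are, and helps choose the minimum overlap for a consensus peakset.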

Page 24

Acknowledgements

•  CRUK-CI Bioinformatics Core
    -  Matthew Eldridge
    -  Tom Carroll
    -  Suraj Menon
•  Gordon Brown (DiffBind)
•  Jason Carroll and his laboratory
    -  Caryn Ross-Innes
    -  Vasiliki Theodorou

Page 25

NOW LET’S GET OUR HANDS DIRTY...

Rory Stark [email protected]

www.cruk.cam.ac.uk