MSPC: Joint analysis of ChIP-seq replicates

POLITECNICO DI MILANO, Department of Electronics, Information and Bioengineering. July 20, 2015. Using combined evidence from replicates to evaluate ChIP-seq peaks. Vahid Jalili ([email protected]), Matteo Matteucci ([email protected]), Marco Masseroli ([email protected]), Marco Morelli ([email protected]). Website: https://mspc.codeplex.com

Transcript of MSPC: Joint analysis of ChIP-seq replicates

Page 1: MSPC: Joint analysis of ChIP-seq replicates

POLITECNICO DI MILANO

Department of Electronics, Information and Bioengineering

July 20, 2015

Using combined evidence from replicates to evaluate ChIP-seq peaks

Vahid Jalili

Vahid Jalili ([email protected])

Matteo Matteucci ([email protected])

Marco Masseroli ([email protected])

Marco Morelli ([email protected])

Website: https://mspc.codeplex.com

Page 2: MSPC: Joint analysis of ChIP-seq replicates

Politecnico di Milano, Vahid Jalili, July 20, 2015, slide 2 / 35

Motivation

[Figure: tag count along the genomic DNA for a ChIP-seq sample, with signal and background marked. A stringent threshold and a permissive threshold are shown, and regions are labeled true positive, false positive, false negative, or true negative.]

Page 3: MSPC: Joint analysis of ChIP-seq replicates

Motivation

Benefit from Replicates

Use replicates to discriminate sub-threshold binding from truly non-bound regions.

[Figure: tag count along the genomic DNA for Replicate 1 and Replicate 2, with signal and background marked]

Page 4: MSPC: Joint analysis of ChIP-seq replicates


Motivation

Benefit from Replicates

Page 5: MSPC: Joint analysis of ChIP-seq replicates

Method

Notations

• Strong threshold: $\mathcal{T}_s$

• Weak threshold: $\mathcal{T}_w$

• Strong peak: $p\text{-value} \le \mathcal{T}_s$

• Weak peak: $\mathcal{T}_s < p\text{-value} \le \mathcal{T}_w$
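The two thresholds above partition peaks by p-value. A minimal sketch of that classification follows; the function name, the threshold values, and the "background" label for p-values above the weak threshold are assumptions made for illustration and do not come from the slides.

```python
def classify_peak(p_value: float, strong_threshold: float, weak_threshold: float) -> str:
    """Classify a peak by its p-value using the strong/weak notation of the slide."""
    if strong_threshold > weak_threshold:
        raise ValueError("the strong threshold must not exceed the weak threshold")
    if p_value <= strong_threshold:
        return "strong"          # p-value <= T_s
    if p_value <= weak_threshold:
        return "weak"            # T_s < p-value <= T_w
    return "background"          # hypothetical label: above the weak threshold


# Hypothetical thresholds, chosen only for the example.
T_S, T_W = 1e-8, 1e-4
labels = [classify_peak(p, T_S, T_W) for p in (1e-10, 1e-6, 1e-2)]
```

With these example thresholds, the three p-values fall into the strong, weak, and background categories respectively.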

Page 6: MSPC: Joint analysis of ChIP-seq replicates

Method

Combining Evidence

How to combine evidence?

Fisher’s combined probability test:

$$X^2_{2k} = -2 \sum_{i=1}^{k} \ln p_i$$

$X^2_{2k}$ follows a $\chi^2$ distribution with $2k$ degrees of freedom.

• Confirm: $X^2_{2k} \ge \text{threshold}$

• Discard: $X^2_{2k} < \text{threshold}$

Alternatives for combining test statistics:

• Liptak’s method (Liptak, 1958)

• Mudholkar and George (Mudholkar & George, 1979)

• Wilkinson’s method (Wilkinson, 1951)

• Truncated product method (Zaykin, Zhivotovsky, Westfall, & Weir, 2002)
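Fisher’s combined probability test can be sketched in a few lines. The sketch below is an illustration, not the MSPC implementation: the function names and the significance level `alpha` are assumptions. Instead of comparing the statistic with a threshold directly, it compares the combined p-value with `alpha`, which is equivalent when the threshold is the corresponding quantile of the chi-square distribution. For an even number of degrees of freedom (2k) the chi-square upper tail has a closed form, so no external library is needed.

```python
import math
from typing import Sequence, Tuple


def fisher_statistic(p_values: Sequence[float]) -> float:
    """X^2_{2k} = -2 * sum(ln p_i) over the k combined p-values."""
    return -2.0 * sum(math.log(p) for p in p_values)


def chi2_upper_tail_even_dof(x: float, k: int) -> float:
    """P(X >= x) for a chi-square variable with 2k degrees of freedom.

    Closed form: exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    """
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def combine(p_values: Sequence[float], alpha: float) -> Tuple[float, float, str]:
    """Return (statistic, combined p-value, decision); alpha is a hypothetical cutoff."""
    x2 = fisher_statistic(p_values)
    combined_p = chi2_upper_tail_even_dof(x2, len(p_values))
    decision = "confirm" if combined_p <= alpha else "discard"
    return x2, combined_p, decision


# Two replicates that each report p = 0.05 for overlapping peaks.
x2, combined_p, decision = combine([0.05, 0.05], alpha=0.02)
```

Neither p-value alone passes the 0.02 cutoff in this example, but their combination does, which is the effect the method relies on for weak peaks.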

Page 7: MSPC: Joint analysis of ChIP-seq replicates

Method

Combining Evidence

Which evidence to combine?

[Figure: Replicate 1, Replicate 2, Replicate 3, Replicate 4]


Page 11: MSPC: Joint analysis of ChIP-seq replicates

Method

Intersection Determination

The challenge: an optimal method for finding the intersections.

Some possible methods and their complexity:

• Sorted lists: $O(mn)$

• Naïve method: $O(n^m)$

• Hashing based: $O\left(\frac{n \log^2 w}{w} + mr\right)$

• Interval trees: $O(n \log_2 n)$

where:

• $n$: average peak count per sample

• $m$: sample count

• $w$: number of bits in a machine word

• $r$: intersection size
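To make the sorted-lists option concrete, here is a minimal sketch for two samples. It assumes each sample is a list of closed intervals sorted by start position and that the intervals within one sample do not overlap each other; the function name and the example coordinates are hypothetical. Under those assumptions a single two-pointer sweep finds every overlapping pair.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # closed interval (start, end)


def overlapping_pairs(a: List[Interval], b: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Sweep two sorted, internally non-overlapping interval lists in one pass."""
    i = j = 0
    pairs = []
    while i < len(a) and j < len(b):
        (a_lo, a_hi), (b_lo, b_hi) = a[i], b[j]
        if a_lo <= b_hi and b_lo <= a_hi:
            pairs.append((a[i], b[j]))
        # Advance whichever interval ends first; it cannot overlap anything later.
        if a_hi < b_hi:
            i += 1
        else:
            j += 1
    return pairs


sample_1 = [(0, 3), (5, 8), (15, 23)]
sample_2 = [(2, 6), (20, 30)]
pairs = overlapping_pairs(sample_1, sample_2)
```

Each pointer only moves forward, so the sweep is linear in the total number of peaks of the two samples once they are sorted.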

Page 12: MSPC: Joint analysis of ChIP-seq replicates

Method

Intersection Determination

Interval Trees

[Figure: interval tree over positions 0 to 31. Each node stores an interval and its data: [16, 21], [8, 9], [25, 30], [17, 19], [26, 27], [19, 20], [15, 23], [5, 8], [6, 10], [0, 3]]
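A minimal sketch of an interval tree built from the intervals shown on this slide follows. It is an assumption-laden illustration rather than the MSPC data structure: it uses an unbalanced binary search tree keyed on the interval start, where every node also records the largest end point in its subtree so that whole subtrees can be skipped during a query. A production version would keep the tree balanced to guarantee logarithmic depth. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # closed interval (start, end)


@dataclass
class Node:
    interval: Interval
    max_end: int                     # largest end point in this subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], interval: Interval) -> Node:
    """Insert by start position and update max_end along the path."""
    if root is None:
        return Node(interval, interval[1])
    if interval[0] < root.interval[0]:
        root.left = insert(root.left, interval)
    else:
        root.right = insert(root.right, interval)
    root.max_end = max(root.max_end, interval[1])
    return root


def find_overlaps(root: Optional[Node], query: Interval,
                  found: Optional[List[Interval]] = None) -> List[Interval]:
    """Collect every stored interval that overlaps the query interval."""
    if found is None:
        found = []
    if root is None or root.max_end < query[0]:
        return found                 # nothing in this subtree reaches the query
    find_overlaps(root.left, query, found)
    lo, hi = root.interval
    if lo <= query[1] and query[0] <= hi:
        found.append(root.interval)
    if lo <= query[1]:               # the right subtree only holds starts >= lo
        find_overlaps(root.right, query, found)
    return found


tree = None
for iv in [(16, 21), (8, 9), (25, 30), (17, 19), (26, 27),
           (19, 20), (15, 23), (5, 8), (6, 10), (0, 3)]:
    tree = insert(tree, iv)

hits = find_overlaps(tree, (22, 25))
```

For the query interval (22, 25) only the stored intervals (15, 23) and (25, 30) overlap, and the `max_end` annotation lets the search skip the subtrees that end before position 22.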

Page 13: MSPC: Joint analysis of ChIP-seq replicates


Method

Algorithm

Page 14: MSPC: Joint analysis of ChIP-seq replicates


Method

Algorithm

Page 15: MSPC: Joint analysis of ChIP-seq replicates


Method

Algorithm

Page 16: MSPC: Joint analysis of ChIP-seq replicates

Method

Algorithm

Algorithm … an example

[Figure: Replicate 1, Replicate 2, and Replicate 3 with regions R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region)]

Page 17: MSPC: Joint analysis of ChIP-seq replicates

Method

Algorithm

Algorithm … an example

• Determine intersecting regions across all samples

[Figure: Replicate 1, Replicate 2, and Replicate 3 with regions R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region)]

Page 18: MSPC: Joint analysis of ChIP-seq replicates

Method

Algorithm

Algorithm … an example

• Determine intersecting regions across all samples

• If multiple regions on a sample are found to intersect, choose the strongest one

[Figure: Replicate 1, Replicate 2, and Replicate 3 with regions R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region)]

Page 19: MSPC: Joint analysis of ChIP-seq replicates

Method

Algorithm

Algorithm … an example

• Determine intersecting regions across all samples

• If multiple regions on a sample are found to intersect, choose the strongest one

• Combine test statistics using Fisher’s method

[Figure: Replicate 1, Replicate 2, and Replicate 3 with regions R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region)]

Page 20: MSPC: Joint analysis of ChIP-seq replicates

Method

Algorithm

Algorithm … an example

• Determine intersecting regions across all samples

• If multiple regions on a sample are found to intersect, choose the strongest one

• Combine test statistics using Fisher’s method

• $X^2 \ge \text{threshold}$? No!

[Figure: Replicate 1, Replicate 2, and Replicate 3 with regions R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region)]
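The steps listed on this slide can be sketched for a single peak as follows. Everything concrete here is an assumption made for illustration: the `Peak` type, the function names, the coordinates, the p-values, the placement of regions on replicates, and the fixed cutoff of 40 on the statistic are all invented, and "strongest" is taken to mean the lowest p-value. The sketch is not the MSPC code.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple


class Peak(NamedTuple):
    name: str
    start: int
    end: int
    p_value: float


def intersects(a: Peak, b: Peak) -> bool:
    """Closed intervals overlap when each starts before the other ends."""
    return a.start <= b.end and b.start <= a.end


def fisher_statistic(p_values: Sequence[float]) -> float:
    return -2.0 * sum(math.log(p) for p in p_values)


def evaluate_peak(peak: Peak, other_samples: List[List[Peak]],
                  threshold: float) -> Tuple[str, List[Peak]]:
    """Step 1: find intersecting regions in every other sample.
    Step 2: keep only the strongest (lowest p-value) region per sample.
    Step 3: combine the p-values with Fisher's method and compare to the cutoff."""
    supporting = []
    for sample in other_samples:
        hits = [q for q in sample if intersects(peak, q)]
        if hits:
            supporting.append(min(hits, key=lambda q: q.p_value))
    x2 = fisher_statistic([peak.p_value] + [q.p_value for q in supporting])
    return ("confirm" if x2 >= threshold else "discard"), supporting


# Invented layout: R1 on replicate 1, R2 and R3 on replicate 2, R4 on replicate 3.
r1 = Peak("R1", 100, 200, 1e-3)
r2 = Peak("R2", 150, 250, 1e-3)
r3 = Peak("R3", 190, 300, 1e-2)
r4 = Peak("R4", 280, 400, 1e-12)

decision_r1, support_r1 = evaluate_peak(r1, [[r2, r3], [r4]], threshold=40.0)
decision_r3, support_r3 = evaluate_peak(r3, [[r1], [r4]], threshold=40.0)
```

In this invented layout the test for R1 uses only R2 and falls below the cutoff, while the test for R3 also draws on the strong region R4 and passes it.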

Page 21: MSPC: Joint analysis of ChIP-seq replicates

Method

Algorithm

Algorithm … an example

Intermediate Sets

[Figure: intermediate sets for Replicate 1, Replicate 2, and Replicate 3, containing R1 and R2. Legend: Confirmed Peaks Set, Discarded Peaks Set]

Page 22: MSPC: Joint analysis of ChIP-seq replicates

Method

Algorithm

Algorithm … an example

• Determine intersecting regions across all samples

• Since R2 intersects only with R1, and the R1-R2 test has already been performed, no further processing is done

[Figure: Replicate 1, Replicate 2, and Replicate 3 with regions R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region)]

Page 23: MSPC: Joint analysis of ChIP-seq replicates

Method

Algorithm

Algorithm … an example

• Determine intersecting regions across all samples

• Combine test statistics using Fisher’s method

• $X^2 \ge \text{threshold}$? Yes!

[Figure: Replicate 1, Replicate 2, and Replicate 3 with regions R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region)]

Page 24: MSPC: Joint analysis of ChIP-seq replicates

Method

Algorithm

Algorithm … an example

Intermediate Sets

[Figure: intermediate sets for Replicate 1, Replicate 2, and Replicate 3, containing R1, R2, R3, and R4. Legend: Confirmed Peaks Set, Discarded Peaks Set]

Page 25: MSPC: Joint analysis of ChIP-seq replicates

Method

Algorithm

Algorithm … an example

• Determine intersecting regions across all samples

• Combine test statistics using Fisher’s method

• $X^2 \ge \text{threshold}$? Yes!

[Figure: Replicate 1, Replicate 2, and Replicate 3 with regions R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region)]

Page 26: MSPC: Joint analysis of ChIP-seq replicates

Method

Algorithm

Algorithm … an example

Intermediate Sets

[Figure: intermediate sets for Replicate 1, Replicate 2, and Replicate 3, containing R1, R2, R3, and R4. Legend: Confirmed Peaks Set, Discarded Peaks Set]

Page 27: MSPC: Joint analysis of ChIP-seq replicates

Method

Algorithm

Algorithm … an example

Intermediate Sets

[Figure: intermediate sets for Replicate 1, Replicate 2, and Replicate 3, containing R1, R2, R3, and R4. Legend: Confirmed Peaks Set, Discarded Peaks Set, Output Set]

Output Sets

Page 28: MSPC: Joint analysis of ChIP-seq replicates

Method

Algorithm

Algorithm … an example

[Figure: Replicate 1, Replicate 2, and Replicate 3 with regions R1 (weak peak), R2 (weak peak), R3 (weak peak), and R4 (strong region)]

Page 29: MSPC: Joint analysis of ChIP-seq replicates

Results

[Figure: plots for Myc2_1, Myc2_2, Myc3_1, and Myc3_2; axis ticks run from 0e+00 to 1e+05 and from 0 to 30000]

Abbreviation   File name
Myc2_1         wgEncodeSydhTfbsK562CmycIggrabAlnRep1
Myc2_2         wgEncodeSydhTfbsK562CmycIggrabAlnRep2
Myc3_1         wgEncodeSydhTfbsK562CmycStdAlnRep1
Myc3_2         wgEncodeSydhTfbsK562CmycStdAlnRep2

Category                  Abbreviation   Color legend
Input (source BED file)   In             Strong; Weak
Analysis Results          Re             Strong Confirmed; Weak Confirmed; Weak Discarded

[Figure: In and Re columns for each sample, grouped into Set 1, Set 2, and Set 3]

Page 30: MSPC: Joint analysis of ChIP-seq replicates

Results

Presence of E-box

[Figure legend: motif was enriched in the sequence defined by peaks; motif was NOT enriched in the sequence defined by peaks]

Page 31: MSPC: Joint analysis of ChIP-seq replicates

Implementation

Performance

[Figure: Running Time. x-axis: Peaks Count (× 10000), 0 to 45; y-axis: Time (seconds), 0 to 50; series: 2-Replicates, 4-Replicates, 6-Replicates]

Demo

Page 32: MSPC: Joint analysis of ChIP-seq replicates

Questions

Questions are welcome at: https://mspc.codeplex.com/discussions