www.cmmt.ubc.ca MOTIF ENRICHMENT ANALYSIS IN CO-EXPRESSED GENE SETS AND HIGH-THROUGHPUT SEQUENCE SETS Wyeth Wasserman Jan. 18, 2012 opossum.cisreg.ca/oPOSSUM3


www.cmmt.ubc.ca

MOTIF ENRICHMENT ANALYSIS IN CO-EXPRESSED GENE SETS AND HIGH-THROUGHPUT SEQUENCE SETS

Wyeth Wasserman

Jan. 18, 2012

opossum.cisreg.ca/oPOSSUM3


Welcome

• If you encounter any technical difficulties during the webinar
  – Type a report using the chat option

• Slide presentation: ~20 min

• Questions will be compiled as they are submitted and answered during the final Q&A/discussion period

• During the discussion session, the audience will be able to speak


Webinar Format

• Introduction
• Walk-Through
• Summary
• Q&A


INTRODUCTION


Overview

• Given co-expressed gene sets, what are the key mediators of co-expression?
  – Focus on TFs

• Web-based software system for motif enrichment analysis
  – Co-expressed genes or sequences
  – Multiple sets of analysis methods
  – Available for human, mouse, fly, worm, yeast


Motif Enrichment Analysis

[Figure: bar chart of the proportion of genes containing each TFBS (y-axis, 0 to 1) in the Background and Target sets, for TFBS1, TFBS2 and TFBS3, annotated p=0.04, p=0.55 and p=0.66]

Finds over-represented TFBS in co-expressed gene sets


What do we need?

• Region selection
  – Where to look for enriched binding sites
  – Use conservation filter to restrict search space

• TFBS profiles to search for
  – Need a pool of validated profiles

• Scoring metrics for enrichment
  – How to measure motif over-representation


Conserved Region Selection

[Figure: phastCons score plotted against genomic position along a gene, with a threshold line and conserved regions CR1 to CR4 marked]
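The thresholding idea in the figure can be sketched as follows. This is a minimal illustration only: it assumes one conservation score per position, and the function name `conserved_regions`, the `min_length` parameter and the example scores are all hypothetical. The slides do not describe how oPOSSUM actually derives its conserved regions from phastCons scores.

```python
def conserved_regions(scores, threshold, min_length=1):
    """Return half-open (start, end) index pairs where scores stay >= threshold.

    Hypothetical helper: per-position scores and min_length are assumptions,
    not the oPOSSUM implementation.
    """
    regions = []
    start = None  # start index of the run currently above threshold, if any
    for pos, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = pos
        elif start is not None:
            # The run just ended at pos; keep it if it is long enough.
            if pos - start >= min_length:
                regions.append((start, pos))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores)))
    return regions


# Made-up scores along a short stretch of sequence.
scores = [0.1, 0.9, 0.8, 0.2, 0.95, 0.95, 0.1]
regions = conserved_regions(scores, threshold=0.5)
print(regions)
```

Raising `threshold` or `min_length` shrinks the search space, which is the trade-off the conservation filter controls.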


TFBS Profiles

• JASPAR 2010: Portales-Casamar et al. Nucleic Acids Research 2009.

• Expanded collection of TFBS profiles
  – 130 vertebrate profiles
  – 105 insect profiles
  – 5 nematode profiles
  – 177 yeast profiles
  – PBM (104), PBM_HOMEO (176), PBM_BHLH (19)

• Standardized 2-level TF classification (class, family)


Scoring Metrics

• Z scores
  – Based on the number of occurrences of the TFBS relative to background
  – Normalized for sequence length
  – Simple binomial distribution model

• Fisher scores
  – Fisher exact probability test
    • Fisher score = -log(Fisher p-value)
  – Based on the number of genes containing the TFBS relative to background
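A rough sketch of a Z score in this spirit uses the normal approximation to a binomial model over nucleotides. The slides do not give the exact formula, so the function name, the argument names and the absence of a continuity correction are assumptions, and the numbers are invented for illustration.

```python
import math


def z_score(target_site_nt, target_total_nt, background_site_nt, background_total_nt):
    """Normal approximation to a binomial model of TFBS nucleotide occurrence.

    Assumed form, not taken from the slides: the background rate p is the
    fraction of background nucleotides covered by sites, and the target count
    is compared with its binomial expectation n * p.
    """
    p = background_site_nt / background_total_nt       # background nucleotide rate
    expected = target_total_nt * p                     # expected site nucleotides in target
    sigma = math.sqrt(target_total_nt * p * (1.0 - p)) # binomial standard deviation
    return (target_site_nt - expected) / sigma


# Invented counts: 3% of target nucleotides fall in sites versus 1% in background.
z = z_score(
    target_site_nt=600,
    target_total_nt=20_000,
    background_site_nt=30_000,
    background_total_nt=3_000_000,
)
print(round(z, 2))
```

Working with nucleotide rates rather than raw site counts is one way to normalize for sequence length, since a longer target set is expected to contain proportionally more site nucleotides.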


Additional Metric for Seq-Based

• KS scores
  – Kolmogorov-Smirnov test
  – Compares the empirical distribution of the distances of the binding sites from the maximum point of confidence (MPC) to the background
  – Expect real binding sites to be centered around the MPC

[Figure: Foreground and Background distributions of binding-site distance from the MPC]

KS score = -log(KS test p-value)
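The KS score can be illustrated with a small two-sample Kolmogorov-Smirnov sketch. It assumes each site is reduced to its absolute distance from the MPC, uses the standard asymptotic series for the p-value, and takes the natural logarithm. None of these choices is stated in the slides, and the function names and sample data are hypothetical.

```python
import math


def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical cumulative distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both ECDFs past every value <= x before comparing them.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_p_value(d, n_a, n_b):
    """Asymptotic two-sample p-value (alternating Kolmogorov series)."""
    n_eff = n_a * n_b / (n_a + n_b)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.05:
        return 1.0  # distributions are indistinguishable at this sample size
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * total))


def ks_score(foreground, background):
    """-log(p) of the KS test; natural log is an assumption."""
    d = ks_statistic(foreground, background)
    p = ks_p_value(d, len(foreground), len(background))
    return math.inf if p == 0.0 else -math.log(p)


# Invented distances (bp) from the MPC: foreground sites cluster near it,
# background sites are spread evenly over the window.
foreground = [i % 20 for i in range(200)]
background = list(range(200))
score = ks_score(foreground, background)
print(score > 10)
```

A high score here means the foreground distances are distributed differently from the background, which is what one would expect if real sites sit close to the MPC.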


Analysis Methods


WALK-THROUGH


http://opossum.cisreg.ca/oPOSSUM3


Human SSA - Input


Human SSA - Results


TF: HNF1A
JASPAR ID: MA0046.1
Class: Helix-Turn-Helix
Family: Homeo
Tax Group: Vertebrates
IC: 15.548
GC Content: 0.259


Target Gene Hits: 19
Target Gene Non-Hits: 36
Background Gene Hits: 1113
Background Gene Non-Hits: 3887
Target TFBS Hits: 41
Target TFBS Nucleotide Rate: 0.0269
Background TFBS Hits: 2127
Background TFBS Nucleotide Rate: 0.009


Z-score: 15.134
Fisher score: 3.646
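As a sanity check, the gene hit counts reported for HNF1A can be fed through a one-sided Fisher exact test. The sketch below assumes the target and background gene sets are disjoint, that the test is one-sided (over-representation only), and that the logarithm is natural; the slides state none of these. Under those assumptions the result should land in the neighbourhood of the reported Fisher score, but it should not be read as a reproduction of the oPOSSUM computation. The function name is hypothetical.

```python
import math


def fisher_score(target_hits, target_non_hits, background_hits, background_non_hits):
    """One-sided Fisher exact test via the hypergeometric upper tail.

    Returns (-ln(p), p). Sidedness and log base are assumptions.
    """
    n = target_hits + target_non_hits              # genes in the target set
    k_total = target_hits + background_hits        # genes with a hit overall
    n_total = n + background_hits + background_non_hits
    denominator = math.comb(n_total, n)
    # P(X >= target_hits) when drawing n genes without replacement.
    tail = sum(
        math.comb(k_total, k) * math.comb(n_total - k_total, n - k)
        for k in range(target_hits, min(n, k_total) + 1)
    )
    p_value = tail / denominator
    return -math.log(p_value), p_value


# Counts taken from the HNF1A result shown above.
score, p_value = fisher_score(19, 36, 1113, 3887)
print(round(score, 3), round(p_value, 4))
```

The Z-score is not recomputed here because it depends on the total number of nucleotides searched, which the result page summarizes only as rounded rates.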


oPOSSUM methods


Human aCSA - Input


Human aCSA - Input


Human aCSA - Input


Human aCSA - Results


TFBS Cluster Analysis

[Figure labels: TFBS Profile, Cluster]


TFBS Cluster Analysis (TCA)

[Figure: TFBSs predicted in conserved regions CR1 to CR4 of a gene are grouped into TFBS cluster hits and merged]

Overrepresentation analysis based on merged TFBS cluster hits
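The merge step can be pictured as collapsing overlapping hits from profiles in the same cluster into single intervals before counting. The sketch below is one plausible reading of the figure, not the documented oPOSSUM procedure: the interval representation (inclusive start and end), the overlap rule and the name `merge_cluster_hits` are all assumptions.

```python
def merge_cluster_hits(hits):
    """Merge overlapping (start, end) hits into non-overlapping intervals.

    Hypothetical helper: hits are inclusive coordinates for sites predicted
    by any profile belonging to one TFBS cluster.
    """
    merged = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it instead of adding a new hit.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Invented hits from several profiles of the same cluster on one sequence.
hits = [(40, 50), (10, 20), (15, 25), (24, 30)]
merged = merge_cluster_hits(hits)
print(merged)
```

Merging keeps near-identical profiles in a cluster from inflating the hit count, so the enrichment statistics are computed on one hit per bound location.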


Human TCA – TFBS cluster selection


Human TCA - Results


TFCluster Info Page


Seq SSA - Input


Seq SSA - Input


Seq SSA - Results


KS score


Seq TCA - Input


SUMMARY


oPOSSUM-3

• Web-based system for motif enrichment analysis in co-expressed gene sets and sequences from high-throughput experiments

• Important functionalities
  – Gene-based vs. Sequence-based
  – Single site vs. Anchored combination site
  – Individual vs. clusters of TFBS profiles
  – Human, mouse, fly, worm and yeast


Development Team

Version 1
• Ho Sui, SJ
• Mortimer, JR
• Arenillas, DJ
• Brumm, J
• Walsh, CJ
• Kennedy, BP
• Wasserman, WW

CSA
• Huang, S
• Fulton, DL
• Arenillas, DJ
• Perco, P
• Ho Sui, SJ
• Mortimer, JR
• Wasserman, WW

Version 2
• Ho Sui, SJ
• Fulton, DL
• Arenillas, DJ
• Kwon, AT
• Wasserman, WW

Version 3
• Kwon, AT
• Arenillas, DJ
• Worsley Hunt, R
• Wasserman, WW


QUESTIONS & ANSWERS

Please take a moment to type questions/comments into the chat box.

The questions will be answered shortly.
