DETECTION OF REGULATORY MOTIFS BASED ON COEXPRESSION AND PHYLOGENETIC FOOTPRINTING


Page 1

PhD presentation, Valerie Storms, March 29th, 2011

Promoters:
Prof. Dr. Ir. Kathleen Marchal
Prof. Dr. Ir. Bart De Moor

Page 2

Overview

1. Introduction to transcriptional regulation

2. The effect of orthology and coregulation on detecting regulatory motifs

3. PhyloMotifWeb: workflow for motif discovery in eukaryotes

4. De novo motif discovery in vitamin D3 regulated genes

Page 3

All living organisms consist of one or more cells. E.g. humans:

– Built of multiple cell types such as nerve cells, muscle cells and skin cells
– Every cell contains identical genetic information

Genetic information
• Stored as DNA (deoxyribonucleic acid)
• Double helix with a sugar-phosphate backbone
• 4 building blocks = "bases":

– A: adenine
– C: cytosine
– G: guanine
– T: thymine (replaced by U: uracil in RNA)

• Complementary base pairing (A-T, G-C) through hydrogen bonds
• Representation as a string of bases: ACCTGCTAG….ATTGACGGAC

[Figure: Genetic information. DNA double helix with the sugar-phosphate backbone and A-T / G-C base pairs; the strand GCGATCGTAGGTAT pairs base by base with CGCTAGCATCCATA]
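The complementary pairing described above can be made concrete in a few lines of Python. This is a minimal sketch: it assumes the top strand is written 5' to 3' and contains only A, C, G and T, and the function names are our own. The reverse complement matters in practice because TF binding sites can sit on either strand, so motif tools usually search both.

```python
# Watson-Crick pairing rules: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base complement, kept in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """The opposite strand read in its own 5'->3' direction."""
    return complement(strand)[::-1]


top = "GCGATCGTAGGTAT"
print(complement(top))          # CGCTAGCATCCATA
print(reverse_complement(top))  # ATACCTACGATCGC
```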

Page 4

Central dogma

DNA contains genes: specific sequences of bases that encode the instructions for making proteins, the work units of a cell

[Figure: DNA (gene) -> TRANSCRIPTION -> mRNA -> TRANSLATION -> protein, with transcriptional regulation acting on the transcription step; an example stretch of genomic sequence illustrates the gene. Caption: GENE EXPRESSION, different levels of regulation]

Page 5

Main players in Transcriptional regulation

1. Recruitment of the RNA POLYMERASE COMPLEX to the promoter region of the target gene

[Figure: the RNA polymerase complex is recruited to the promoter region of the target gene, near the transcription start site (TSS), together with a TF and a co-activator]

This process can be activated or repressed by:
• Transcription factors (TFs), i.e. activators and repressors: bind DNA directly by recognizing specific regions
• Co-activators and co-repressors: recruited by protein-protein interactions

Page 6

Main players in Transcriptional regulation

Eukaryotic cells
• Nucleus
• Linear DNA molecules organized into chromosomes
• Chromatin = complex of DNA and proteins

2. Chromatin structure influences transcriptional regulation

[Figure: linear DNA molecule wrapped around histones; heterochromatin versus euchromatin, with a TF]

Page 7

Main players in Transcriptional regulation

[Figure: chromatin remodeling complex, TF, co-activator and RNA polymerase complex at the TSS of a target gene]

• TFs bind specific non-coding sequences in the DNA, the TF binding sites, to control the expression of their target genes

• All genes regulated by the same TF contain a similar TF binding site in their promoter region

• A REGULATORY MOTIF models the TF-DNA binding specificity and captures the variability of TF binding sites

[Figure: TF-DNA interaction at an example binding site (ATTGCCAT), summarized as the TF's regulatory motif]

Chromatin structure can be modified through:
- DNA methylation
- Histone modifications such as methylation and acetylation

Page 8

Regulatory motif

TF -> REGULATORY MOTIF

1. Alignment of TF binding sites:
GTGACG
GTGACC
GAGACG
GTGTCG
GTCAGG

2. Construction of a frequency matrix: one row per base (A, C, G, T) and one column per motif position (p1 p2 p3 …. pn), holding the probability of each base at each position (e.g. 0.97 for the dominant base at a conserved position, 0.01 for the others)

3. Motif logo
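The three steps above (align the sites, count bases per position, summarize) can be sketched in Python for the five example sites. Assumptions: a pseudocount of 0.25 per base, so that unseen bases never get probability 0, and a uniform background. The slide's exact values come from its own counting scheme, which this sketch does not try to reproduce. The information content per column, in bits, is what sets the stack height in a motif logo.

```python
import math

BASES = "ACGT"


def frequency_matrix(sites, pseudocount=0.25):
    """Column-wise base probabilities from equal-width aligned sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all aligned sites must have the same width")
    total = len(sites) + 4 * pseudocount
    matrix = []
    for pos in range(width):
        counts = {b: 0 for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        matrix.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return matrix


def information_content(column, background=0.25):
    """Relative entropy (bits) of one column against a uniform background."""
    return sum(p * math.log2(p / background) for p in column.values() if p > 0)


def consensus(matrix):
    """Most probable base at each position."""
    return "".join(max(column, key=column.get) for column in matrix)


sites = ["GTGACG", "GTGACC", "GAGACG", "GTGTCG", "GTCAGG"]
pfm = frequency_matrix(sites)
print(consensus(pfm))  # GTGACG
for pos, column in enumerate(pfm, start=1):
    probs = " ".join(f"{b}={column[b]:.2f}" for b in BASES)
    print(f"p{pos}: {probs}  IC={information_content(column):.2f} bits")
```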

Page 9

Computational motif discovery

Two different computational approaches to predict TF binding sites:

1. Motif scanning: search with a known motif model of the TF

2. De novo motif discovery: search for novel, uncharacterized motifs

De novo motif discovery algorithms are classified based on the information sources they use:
- Coregulation information
- Orthology information
- Co-localization of different TF binding sites
- Chromatin structure
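Motif scanning (approach 1) can be illustrated with a log-odds weight matrix slid along a sequence. In this sketch the motif model is hypothetical, using 0.97 for the consensus base and 0.01 otherwise to echo the values on the frequency-matrix slide; the background is uniform, only the given strand is scanned, and both thresholds are arbitrary choices.

```python
import math

BASES = "ACGT"


def toy_model(consensus, p_match=0.97, p_other=0.01):
    """Hypothetical frequency matrix: p_match for the consensus base, p_other elsewhere."""
    return [{b: (p_match if b == c else p_other) for b in BASES} for c in consensus]


def log_odds(pfm, background=0.25):
    """Per-position scores log2(p / background) against a uniform background."""
    return [{b: math.log2(col[b] / background) for b in BASES} for col in pfm]


def scan(sequence, lom, threshold):
    """Score every window on the given strand and keep those at or above threshold."""
    width = len(lom)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(lom[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits


lom = log_odds(toy_model("GTGACG"))
sequence = "AAAAGTGACGTTTTGTGTCGAA"
print(scan(sequence, lom, threshold=8.0))  # only the exact site at position 4
print(scan(sequence, lom, threshold=5.0))  # also the one-mismatch site at 14
```

Lowering the threshold trades specificity for sensitivity: the one-mismatch site GTGTCG only appears at the lower cut-off.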

Page 10

Overview

1. Introduction to transcriptional regulation

2. The effect of orthology and coregulation on detecting regulatory motifs

3. PhyloMotifWeb: workflow for motif discovery in eukaryotes

4. De novo motif discovery in vitamin D3 regulated genes

Page 11

Different information spaces

The next generation of motif discovery tools integrates orthology with coregulation information:

1. Coregulation space

2. Orthologous space

3. Combined coregulation-orthology space
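One way to picture the three information spaces is as different slices of a gene × species table of non-coding sequences. The sketch assumes such a table keyed by (gene, species); all identifiers and sequences are made up for illustration.

```python
from typing import Dict, List, Tuple

Table = Dict[Tuple[str, str], str]


def coregulation_space(table: Table, genes: List[str], species: str) -> List[str]:
    """Promoters of coregulated genes from a single (reference) species."""
    return [table[(g, species)] for g in genes if (g, species) in table]


def orthologous_space(table: Table, gene: str, species: List[str]) -> List[str]:
    """Promoters of one gene and its orthologs across several species."""
    return [table[(gene, s)] for s in species if (gene, s) in table]


def combined_space(table: Table, genes: List[str], species: List[str]) -> Dict[str, List[str]]:
    """One orthologous group per coregulated gene."""
    return {g: orthologous_space(table, g, species) for g in genes}


# Hypothetical toy data: 3 coregulated genes, 2 species, one missing ortholog.
table: Table = {
    ("g1", "ref"): "ACGTTGACCA", ("g1", "sp1"): "ACGTTGACTA",
    ("g2", "ref"): "TTGACCGGTA", ("g2", "sp1"): "TTGACCGATA",
    ("g3", "ref"): "GGCATTGACC",
}
genes, species = ["g1", "g2", "g3"], ["ref", "sp1"]
print(len(coregulation_space(table, genes, "ref")))  # 3
print(len(orthologous_space(table, "g1", species)))  # 2
print({g: len(s) for g, s in combined_space(table, genes, species).items()})
```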

Page 12

Study

Research goal:
– Extent of information in the coregulation or orthologous space
– Conditions under which complementing both spaces improves motif detection

Method:
– Synthetic and real benchmark datasets
– Select motif detection tools flexible enough to perform in each of the three spaces:
  - Phylogibbs (Siddharthan et al., 2005)
  - Phylogenetic sampler (Newberg et al., 2007)
  - MEME (Bailey and Elkan, 1994)

Page 13

Theoretical comparison: Overview

Phylogibbs | Phylogenetic sampler | MEME

- Search strategy: simulated annealing + tracking => global optimum (= MAP solution) | Gibbs sampler => local optimum => ensemble centroid solution | Expectation Maximization => local optimum

- Running time: short | long (multiple re-initializations) | short

- Phylogenetic relatedness between the orthologous sequences: tree-based evolutionary model (Phylogibbs, Phylogenetic sampler) | no evolutionary model (MEME)

- Input: alignment of the orthologous sequences needed (Phylogibbs, Phylogenetic sampler) | unaligned sequences (MEME)

Page 14

Theoretical comparison: Assignment and scoring of motif sites

Phylogibbs and the Phylogenetic sampler both use a tree-based evolutionary model (F81).

- Unaligned input: single independent motif sites

- Prealigned input: multiple orthologous motif sites
  - Phylogibbs: window principle -> more flexible in case of a bad prealignment
  - Phylogenetic sampler: block principle -> very sensitive to bad prealignments -> leave out phylogenetically distant orthologs

Page 15

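Performance assessment: Construction of synthetic datasets

Use a phylogenetic tree and an evolutionary model to create the orthologs for different species:
- Ancestor species: Seq 1, Seq 2, … Seq 10, built from background sequences with embedded motif WMs of different information content (IC)
- Each ancestral sequence is evolved along the tree to obtain its orthologs in the reference species and species 1-4

[Figure: ancestral sequences evolved along a phylogenetic tree (nodes 1-5) into the reference species and species 1-4; the resulting sequences are arranged into coregulation, orthologous and combined datasets]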
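The construction above can be sketched as follows: an ancestral sequence with an embedded motif is evolved along a tree under an F81-style model, in which a site either keeps its base or redraws one from its equilibrium frequencies. At motif positions those frequencies are the motif WM column, in line with the adapted F81 idea on the evolutionary-model slide that fixation depends on the WM; background positions use a uniform distribution. The tree, the rate u, the branch lengths and the WM are hypothetical, and this is not the generator used in the study.

```python
import math
import random

BASES = "ACGT"
UNIFORM = {b: 0.25 for b in BASES}


def evolve(sequence, t, u, site_freqs, rng):
    """F81-style step along one branch of length t.

    Each site keeps its base with probability exp(-u*t); otherwise a base is
    redrawn from that site's equilibrium frequencies (possibly the same base).
    """
    keep = math.exp(-u * t)
    out = []
    for base, freqs in zip(sequence, site_freqs):
        if rng.random() < keep:
            out.append(base)
        else:
            out.append(rng.choices(BASES, weights=[freqs[b] for b in BASES])[0])
    return "".join(out)


def evolve_tree(sequence, node, u, site_freqs, rng):
    """node = (name, branch_length, children); returns {leaf name: sequence}."""
    name, t, children = node
    evolved = evolve(sequence, t, u, site_freqs, rng)
    if not children:
        return {name: evolved}
    leaves = {}
    for child in children:
        leaves.update(evolve_tree(evolved, child, u, site_freqs, rng))
    return leaves


rng = random.Random(1)
length, start, motif = 30, 10, "GTGACG"
# Hypothetical motif WM: 0.97 for the consensus base, 0.01 for the others.
wm = [{b: (0.97 if b == c else 0.01) for b in BASES} for c in motif]
ancestor = rng.choices(BASES, k=length)
ancestor[start:start + len(motif)] = motif
ancestor = "".join(ancestor)
site_freqs = [UNIFORM] * start + wm + [UNIFORM] * (length - start - len(motif))

# Hypothetical tree: (name, branch length, children).
tree = ("root", 0.0, [("species1", 0.1, []),
                      ("anc", 0.3, [("species2", 0.2, []), ("species3", 0.4, [])])])
orthologs = evolve_tree(ancestor, tree, u=1.0, site_freqs=site_freqs, rng=rng)
for name, seq in orthologs.items():
    print(name, seq)
```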

Page 16

Performance assessment: Construction of real datasets

Biological datasets:

1. Prokaryotic data -> Gamma-proteobacteria (TyrR, LexA)

2. Eukaryotic data -> yeast species (Urs1H, Rap1)

Page 17

Performance assessment: Results (1)

COREGULATION SPACE

Does adding orthologs improve the performance for the LOW IC motif?

=> Depends on the degeneracy of the embedded motif

Page 18

Performance assessment: Results (2)

COMBINED SPACE

1. Evolutionary distance between the added orthologs

……

Page 19

Performance assessment: Results (3)

2. Phylogenetic tree

=> Tree based on neutral evolution rate

3. The number of added orthologs and the topology of the tree

=> low impact

4. Noise
=> Orthologous direction: performance drop depends on the species distance and the algorithm characteristics

Page 20

Performance assessment: Results (4)

ORTHOLOGOUS SPACE

Room for improvement!

- Number of added orthologs: larger effect than in the combined space

- PS: almost no output when orthologs are prealigned (no centroid solution)

Page 21

Conclusions

• Quality of the predicted motifs depends on the correctness of the prealignments. Challenge: accounting for phylogenetic relatedness independently of a prealignment

• Ensemble centroid strategy (Phylogenetic sampler): useful with a low signal/noise ratio, but computationally limiting

• Phylogenetic tools may perform better than the more basic MEME tool, BUT they have more parameters to tune, and their performance strongly depends on the prealignment quality, the phylogenetic tree, the relationship between the orthologs, etc.

Page 22

Overview

1. Introduction to transcriptional regulation

2. The effect of orthology and coregulation on detecting regulatory motifs

3. PhyloMotifWeb: workflow for motif discovery in eukaryotes

4. De novo motif discovery in vitamin D3 regulated genes

Page 23

PhyloMotifWeb

• Motif finders with different algorithmic backgrounds -> performance diversity -> ensemble strategy: combine the results of multiple algorithms, with an automatic parameter sweep

• Progress of experimental technologies -> epigenetic information (chromatin structure information) -> easy reduction of the search space

• Growing number of sequenced genomes -> orthology information -> create ortholog alignments and a phylogenetic tree

=> Ensemble of phylogenetic motif finders

Page 24

PhyloMotifWeb – Ensemble strategy

• Three motif finders: Phylogibbs, Phylogenetic sampler and MEME

• Run each motif finder across multiple parameter settings (e.g. different motif numbers, motif widths, etc.) -> large collection of output matrices

• FuzzyClustering algorithm: summarizes all these output matrices into a set of non-redundant ensemble motifs
  – Works at the TF binding site level rather than at the matrix level
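A parameter sweep is the Cartesian product of the settings of each tool; the sketch below enumerates one with `itertools.product`. The parameter names and values are hypothetical. The second function is only a crude stand-in for site-level aggregation: it counts in how many runs each binding site was predicted, whereas the FuzzyClustering algorithm is considerably more elaborate and is not reproduced here.

```python
from collections import Counter
from itertools import product

# Hypothetical parameter grid; the names and values are illustrative only.
GRID = {
    "tool": ["MEME", "Phylogibbs", "PhylogeneticSampler"],
    "motif_width": [8, 10, 12],
    "n_motifs": [1, 3],
}


def parameter_sweep(grid):
    """Yield one settings dict per combination of parameter values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def site_support(predictions):
    """Count in how many runs each (sequence id, start) site was predicted."""
    return Counter(site for run in predictions for site in set(run))


runs = list(parameter_sweep(GRID))
print(len(runs))  # 18
print(runs[0])    # {'tool': 'MEME', 'motif_width': 8, 'n_motifs': 1}

# Hypothetical site predictions from three runs.
predictions = [[("seq1", 12), ("seq2", 40)], [("seq1", 12)], [("seq1", 13)]]
print(site_support(predictions).most_common(1))  # [(('seq1', 12), 2)]
```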

Page 25

PhyloMotifWeb

• Motif finders with different algorithmic backgrounds -> performance diversity -> ensemble strategy: combine the results of multiple algorithms, with an automatic parameter sweep

• Progress of experimental technologies -> epigenetic information (chromatin structure information) -> easy reduction of the search space

• Growing number of sequenced genomes -> orthology information -> create ortholog alignments and a phylogenetic tree

=> Ensemble of phylogenetic motif finders

Important for motif discovery in eukaryotes!

Page 26

PhyloMotifWeb - Eukaryotes

Restrict the search space to regions with higher regulatory potential, based on epigenetic information such as chromatin structure

BUT: this information is tissue- and condition-dependent!

Annotation of regulatory regions -> Regulatory Build pipeline of Ensembl:

• Multi-cell type:
  – DNase hypersensitivity -> open chromatin
  – CTCF binding sites -> enhancer/insulator marker
  – Binding sites of other TFs

• Cell-type specific:
  – Histone modifications
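Restricting the search space amounts to intersecting a candidate non-coding region with annotated regulatory features. A minimal sketch with half-open coordinates; the feature intervals are hypothetical stand-ins for, e.g., DNase hypersensitive sites or CTCF sites taken from a regulatory annotation, and no Ensembl API is used.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def intersect(region: Interval, features: List[Interval]) -> List[Interval]:
    """Parts of `region` covered by regulatory features, sorted and merged."""
    start, end = region
    # Clip every overlapping feature to the region.
    pieces = sorted((max(start, s), min(end, e)) for s, e in features if s < end and e > start)
    merged: List[Interval] = []
    for s, e in pieces:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))  # overlapping: extend
        else:
            merged.append((s, e))
    return merged


# Hypothetical coordinates: a 4 kb window and four annotated features.
window = (10_000, 14_000)
features = [(9_500, 10_300), (12_000, 12_600), (12_400, 13_100), (20_000, 21_000)]
print(intersect(window, features))  # [(10000, 10300), (12000, 13100)]
```

Because cell-type-specific marks such as histone modifications differ between tissues, the feature list would have to match the cell type under study.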

Page 27

PhyloMotifWeb – Webserver

Page 28

PhyloMotifWeb – Webserver

Page 29

PhyloMotifWeb – Webserver

Page 30

Results page

- Motif logo

- Individual binding sites of the ensemble solution

- p-value for the overrepresentation of the ensemble motif in the sequence set versus random sequence sets

- Comparison with database motifs
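The overrepresentation p-value can be estimated empirically by comparing the motif's score in the real sequence set with its scores in random sequence sets. This sketch simplifies heavily: the score is the number of sequences with an exact motif match, random sets are uniform i.i.d. DNA, and the add-one estimator (k + 1) / (N + 1) is used so the p-value is never zero. The statistic used by the webserver may well differ.

```python
import random


def count_hits(sequences, motif):
    """Number of sequences with at least one exact occurrence of the motif."""
    return sum(motif in seq for seq in sequences)


def random_set(n, length, rng):
    """n uniformly random DNA sequences of the given length."""
    return ["".join(rng.choices("ACGT", k=length)) for _ in range(n)]


def empirical_p_value(observed, null_scores):
    """Add-one estimate of P(score >= observed) under the random sets."""
    exceed = sum(score >= observed for score in null_scores)
    return (exceed + 1) / (len(null_scores) + 1)


rng = random.Random(7)
motif = "GTGACG"
# Hypothetical cluster: 10 random promoters, the motif planted in 8 of them.
cluster = random_set(10, 200, rng)
cluster = [s[:100] + motif + s[106:] if i < 8 else s for i, s in enumerate(cluster)]
observed = count_hits(cluster, motif)
null = [count_hits(random_set(10, 200, rng), motif) for _ in range(499)]
print(observed, empirical_p_value(observed, null))
```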

Page 31

Overview

1. Introduction to transcriptional regulation

2. The effect of orthology and coregulation on detecting regulatory motifs

3. PhyloMotifWeb: workflow for motif discovery in eukaryotes

4. De novo motif discovery in vitamin D3 regulated genes

Page 32

Vitamin D3 - metabolism

• Source: diet, and production in the skin upon exposure to sunlight

• Role in regulating many physiological and cellular processes:

- Bone health

- Prevention of autoimmune diseases

- Anti-proliferative effect on different cell types like cancer cells

Page 33

Vitamin D3 - mode of action

1. Vitamin D3 enters the cell and binds to the vitamin D receptor (VDR), which dimerizes with RXR

2. Ligand-activated VDR/RXR binds the DNA at vitamin D response elements (VDREs)

3. Recruitment of co-activators and chromatin remodelers: open chromatin structure

4. Transcription of the VDR target gene

[Figure: VitD3-bound VDR/RXR heterodimer on a VDRE, recruiting a chromatin remodeling complex, a co-activator complex and DRIP/transcription machinery to the target gene]

Page 34

Vitamin D3 - dataset

GOAL: gain insight into the molecular mechanism underlying the anti-proliferative effect of vitD3

- Human and mouse cell lines treated with vitD3 versus no vitD3 (control)

- Measured the expression of all genes in the human and mouse cells with microarrays, for both conditions, over different time points

- Select differentially expressed genes (vitD3 versus control) -> phenotype

- Group, per species, all genes with similar behavior into coexpression clusters

Focus on genes with a conserved coexpression behavior across human and mouse: interesting for the common anti-proliferative phenotype

[Figure: human breast cancer cells and mouse bone cells, VitD3 versus control (Ctr); VitD3-bound RXR/VDR on the VDRE of a target gene -> ANTI-PROLIFERATIVE PHENOTYPE]
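Selecting genes with conserved coexpression behavior can be sketched as intersecting a human cluster with the matching mouse cluster through an ortholog map. The gene identifiers and the one-to-one ortholog mapping are hypothetical simplifications.

```python
def conserved_cluster(human_cluster, mouse_cluster, orthologs):
    """Human genes whose mouse ortholog lies in the matching mouse cluster."""
    mouse = set(mouse_cluster)
    return sorted(g for g in human_cluster if orthologs.get(g) in mouse)


# Hypothetical identifiers; real analyses would use curated ortholog tables.
human_up = ["H1", "H2", "H3", "H4"]
mouse_up = ["M1", "M3", "M9"]
orthologs = {"H1": "M1", "H2": "M2", "H3": "M3"}  # H4 has no mouse ortholog here
print(conserved_cluster(human_up, mouse_up, orthologs))  # ['H1', 'H3']
```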

Page 35

Vitamin D3 - Dataset

Conserved coexpression cluster:
- 10 genes
- Upregulated after vitD3

Assume: conserved transcriptional regulation
-> Conserved regulatory motifs responsible for the expression behavior
-> De novo strategy
-> Screening: co-localization of TF binding sites

Page 36

Vitamin D3 - de novo motifs

METHOD: PhyloMotifWeb

RESULTS:

1. Very common motifs
• Low specificity for the coexpressed cluster
• Match with TFs involved in cell cycle regulation
  – Well-conserved TF binding sites, present in many genes!
  – e.g. SP1, ZF5, NRF1
• TF involved in B-cell differentiation
  – EBF

Page 37

Vitamin D3 - de novo motifs

2. Motifs specific for the conserved coexpression cluster -> higher overrepresentation in the cluster compared to the genome
-> match with the following TFs:

ZEB1
- Transcriptional activator of the VDR protein
- Role in cancer metastasis

VDR
- Putative direct regulation by VDR
- VDRE hard to discover de novo: only one conserved half-site!
  • Two conserved half-sites with a variable spacer
  • Diverse configurations [DR, IR, ER]
  • Located far up-/downstream of the TSS
- NHR-scan: specific for nuclear hormone receptor binding sites

[Figure panels: C1, C2]
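The VDRE properties listed above (two half-sites, a variable spacer, DR/IR/ER configurations) can be expressed as a family of regular expressions. The half-site consensus RGKTCA (IUPAC: R = A/G, K = G/T) and the spacer range 0-6 are assumptions chosen for illustration; this exact-pattern search is far cruder than a dedicated tool such as NHR-scan.

```python
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "R": "Y", "Y": "R",
              "K": "M", "M": "K", "N": "N"}
HALF_SITE = "RGKTCA"  # assumed nuclear-receptor half-site consensus


def to_regex(iupac):
    """Translate an IUPAC string into a regular-expression fragment."""
    return "".join(IUPAC[c] for c in iupac)


def revcomp(iupac):
    """Reverse complement of an IUPAC string."""
    return "".join(COMPLEMENT[c] for c in reversed(iupac))


def vdre_patterns(half=HALF_SITE, spacers=range(0, 7)):
    """Direct (DR), inverted (IR) and everted (ER) repeats of a half-site."""
    fwd, rev = to_regex(half), to_regex(revcomp(half))
    patterns = {}
    for n in spacers:
        gap = "[ACGT]{%d}" % n
        patterns[f"DR{n}"] = fwd + gap + fwd
        patterns[f"IR{n}"] = fwd + gap + rev
        patterns[f"ER{n}"] = rev + gap + fwd
    # Lookahead groups let overlapping candidates be reported.
    return {name: re.compile(f"(?=({p}))") for name, p in patterns.items()}


def find_repeats(sequence, patterns):
    """All (start, configuration, matched text) hits, sorted by position."""
    hits = []
    for name, pattern in patterns.items():
        for m in pattern.finditer(sequence):
            hits.append((m.start(), name, m.group(1)))
    return sorted(hits)


seq = "CCCCAGGTCAGCAAGGTCACCCC"
print(find_repeats(seq, vdre_patterns()))  # [(4, 'DR3', 'AGGTCAGCAAGGTCA')]
```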

Page 38

Vitamin D3 – Cis-regulatory modules

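[Figure: binding sites of TF1 and TF2 co-localized in the regulatory regions of several genes]

Higher eukaryotes:

-> TFs act in cooperation to modulate gene expression

-> Find co-localized binding sites for the de novo predicted motifs => cis-regulatory modules (CRMs)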

Page 39

Vitamin D3 – Cis-regulatory modules

METHOD: CPModule

INPUT:
• De novo predicted motif models
• Constraint: module size between 150 bp and 400 bp

RESULTS:
• 3 CRMs highly specific for the coexpressed genes (p-value < 0.001):
  – SP1-EBF: 7 genes
  – NRF1-EBF: 7 genes
  – VDR-ZEB1-EBF: 10 genes
• Each CRM contains the EBF motif -> degenerate -> many hits -> a motif-specific score threshold is used
• Most interesting is the ZEB1-VDR module
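The co-localization step can be sketched as a search for windows that contain at least one hit of every motif in a module. Assumptions: hits are given as (position, motif name) on one sequence, site widths are ignored, only the 400 bp upper bound is enforced, and no statistics are computed. CPModule itself is not reproduced here.

```python
def find_modules(hits, motifs, max_size=400):
    """Windows of at most max_size bp that contain a hit for every motif.

    hits: (position, motif name) pairs on one sequence; site widths are ignored.
    Returns (first position, last position) of the shortest such window that
    starts at each hit, where one exists.
    """
    required = set(motifs)
    hits = sorted(hits)
    modules = []
    for i, (start, _) in enumerate(hits):
        seen = set()
        for pos, name in hits[i:]:
            if pos - start > max_size:
                break
            seen.add(name)
            if seen >= required:
                modules.append((start, pos))
                break
    return modules


# Hypothetical hits for the three motifs of one module on one promoter.
hits = [(100, "VDR"), (180, "ZEB1"), (350, "EBF"), (900, "VDR"), (1500, "EBF")]
print(find_modules(hits, ["VDR", "ZEB1", "EBF"]))  # [(100, 350)]
```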

Page 40

Vitamin D3 - perspectives

• Motifs predicted for the conserved coexpression cluster -> investigate their presence in larger species-specific clusters or even in the full genome

• The availability of cell-type-specific epigenetic information can help to retrieve the functional binding sites

• Besides a transcriptome analysis -> integrate additional omics data such as ChIP-seq and protein profiling to reconstruct the regulatory network of vitD3

Page 41

Acknowledgements

ESAT-Bioi: Prof. Dr. Bart De Moor, Prof. Dr. Yves Moreau, Wouter Van Delm

LEGENDO: Dr. Lieve Verlinden, Prof. Dr. Mieke Verstuyf, Dr. Guy Eelen, Els Vanoirbeek

CMPG-Bioi: Prof. Dr. Kathleen Marchal, Dr. Pieter Monsieurs, Marleen Claeys, Carolina Fierro, Aminael Sanchez, Hong Sun

CMPG: Prof. Dr. Jan Michiels

Page 42

Extra slides

Page 43

Theoretical comparison: Phylogibbs algorithm (1)

Procedure:
1. Start with a random configuration C, based on prior information on the number of motif sites/TFs
2. Construct the set of all possible configurations C' that differ from C by one single move (designed move set)
3. Calculate the posterior probability score of each C'
4. Sample a new configuration from this score distribution

This procedure is repeated in two phases:
1. Simulated annealing: iterate towards the configuration C* with the highest posterior probability (= MAP), controlled by the temperature parameter β
2. Tracking: posterior probabilities are assigned to the windows in C*

-> One initialization is sufficient
-> Very short running time (minutes/hours)
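Step 4 and the role of β can be sketched as sampling a candidate configuration with probability proportional to its posterior raised to the power β. We assume here that β behaves as an inverse temperature that is increased during annealing, so that sampling becomes greedier over time; the log posterior scores are hypothetical and the move set itself is not modeled.

```python
import math
import random


def sample_move(log_posteriors, beta, rng):
    """Pick a candidate index with probability proportional to P(C'|S)**beta."""
    top = max(log_posteriors)  # subtract the maximum for numerical stability
    weights = [math.exp(beta * (lp - top)) for lp in log_posteriors]
    return rng.choices(range(len(log_posteriors)), weights=weights)[0]


rng = random.Random(3)
# Hypothetical log posterior scores of three candidate configurations C'.
scores = [-5.0, -1.0, -3.0]
for beta in (0.0, 1.0, 10.0, 100.0):  # a rising annealing schedule
    picks = [sample_move(scores, beta, rng) for _ in range(1000)]
    print(beta, [picks.count(i) for i in range(3)])
```

At β = 0 all candidates are equally likely; at large β the sampler almost always picks the highest-scoring configuration, which is how annealing drives the search towards the MAP solution.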

Page 44

Theoretical comparison: Phylogibbs algorithm (2)

3. Calculate the posterior probability score P(C|S)

Bayes' theorem: P(C|S) ∝ P(S|C) P(C)

P(S|C) = probability that the motif sites of C are drawn from the motif WM and that the background sequence is drawn from the background model (EVOLUTIONARY MODEL)

The motif WM is unknown -> integrate over all possible WMs:
P(S|C) = ∫ P(S|C, WM) P(WM) dWM

with the prior P(WM) modeled by a Dirichlet prior distribution Dir(γ)

The approximation used to solve this integral requires that the tree topologies are reduced to collections of star topologies
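For the simplest case, where motif sites are treated as independent draws from the WM (no phylogeny), the integral over WMs has a closed form per motif column: Γ(Σγ) / Γ(n + Σγ) × Π_b Γ(n_b + γ_b) / Γ(γ_b), with n_b the count of base b in that column and n the number of sites. The sketch evaluates it in log space with `math.lgamma` and a uniform Dir(1, 1, 1, 1) prior, which is an assumption. Phylogibbs additionally couples orthologous sites through the tree, which is where the star-topology approximation enters; that part is not shown.

```python
from math import exp, lgamma

BASES = "ACGT"


def log_marginal_column(counts, gamma):
    """log of the integral over w of prod_b w_b**n_b * Dir(w | gamma)."""
    n, g = sum(counts), sum(gamma)
    return (lgamma(g) - lgamma(n + g)
            + sum(lgamma(c + a) - lgamma(a) for c, a in zip(counts, gamma)))


def log_marginal_sites(sites, gamma=(1.0, 1.0, 1.0, 1.0)):
    """Sum of per-column log marginals for equal-width aligned sites."""
    total = 0.0
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        counts = [column.count(b) for b in BASES]
        total += log_marginal_column(counts, gamma)
    return total


print(exp(log_marginal_column([1, 0, 0, 0], [1, 1, 1, 1])))  # about 0.25
print(log_marginal_sites(["GTGACG"] * 5))
print(log_marginal_sites(["GTGACG", "ACTTGA", "CAGCTT", "TTACAC", "GCCATG"]))
```

Consistent sites receive a much higher marginal likelihood than unrelated ones, which is what makes P(S|C) useful for scoring configurations.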

Page 45

Theoretical comparison: Phylogenetic sampler algorithm (1)

Procedure:
1. Start with a random positioning of blocks (based on prior information on the expected number of motif sites/TFs and the maximum number of motif sites per sequence)
2. Update the motif model based on the current blocks (<-> PG)
3. Scoring: leave out the blocks of one sequence (<-> PG) and calculate the conditional probability score of each possible block
4. First sample the number of motif sites for that sequence, then sample this number of blocks from the score distribution of step 3

This iteration procedure is repeated in two phases:
1. Burn-in phase: converge to a local optimum
2. Sampling phase: keep track of all sampled blocks to construct the centroid afterwards

-> Multiple initializations (seeds) recommended to avoid getting trapped in a local maximum
-> Long running time (hours/days)

Page 46

Theoretical comparison: Phylogenetic sampler algorithm (2)

2. Update the motif model
-> Sample a new motif model from a Dirichlet distribution Dir(β + c), with c the phylogenetically weighted counts (based on the phylogenetic tree)
-> Accept the new motif with probability min(1, Metropolis-Hastings ratio)

3. Calculate the conditional probability score
=> proportional to the probability that the block is drawn from the (inferred) motif model divided by the probability that the block is drawn from the background model (EVOLUTIONARY MODEL)

The Felsenstein tree-likelihood algorithm is used to handle all tree topologies (<-> PG)
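The Felsenstein pruning algorithm computes the likelihood of one alignment column on any tree by passing partial likelihoods from the leaves up to the root. A minimal sketch with the F81 transition probability P(b | a, t) = exp(-ut)·[a = b] + (1 - exp(-ut))·π_b, where π would be the motif WM column for a motif position or the background frequencies otherwise. The nested-list tree encoding, the leaf names and u = 1 are our own assumptions.

```python
import math

BASES = "ACGT"
UNIFORM = {b: 0.25 for b in BASES}


def transition(a, b, t, pi, u=1.0):
    """F81: probability that base a is base b after a branch of length t."""
    stay = math.exp(-u * t)
    return stay * (a == b) + (1.0 - stay) * pi[b]


def partial_likelihood(node, column, pi):
    """Return {a: P(observed leaves below node | node carries base a)}.

    A node is a leaf name (str) or a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {a: 1.0 if column[node] == a else 0.0 for a in BASES}
    result = {a: 1.0 for a in BASES}
    for child, t in node:
        below = partial_likelihood(child, column, pi)
        for a in BASES:
            result[a] *= sum(transition(a, b, t, pi) * below[b] for b in BASES)
    return result


def column_likelihood(tree, column, pi):
    """Likelihood of one alignment column, with the root base drawn from pi."""
    root = partial_likelihood(tree, column, pi)
    return sum(pi[a] * root[a] for a in BASES)


# Hypothetical three-species tree: ((human, chimp), mouse).
tree = [([("human", 0.1), ("chimp", 0.1)], 0.2), ("mouse", 0.5)]
print(column_likelihood(tree, {"human": "G", "chimp": "G", "mouse": "G"}, UNIFORM))
print(column_likelihood(tree, {"human": "G", "chimp": "A", "mouse": "T"}, UNIFORM))
```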

Page 47

Theoretical comparison: Solution

(Figure from Newberg et al., 2007)

Phylogibbs: maximum a posteriori (MAP) solution
-> set of motif sites (configuration) with the highest posterior probability

Phylogenetic sampler: centroid solution
-> keeps track of all motif sites sampled during the sampling iterations to calculate posterior probabilities
-> reports all motif sites that appear in at least half of the sampling iterations
-> does not take into account joint occurrences of the motif sites
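The centroid rule described above can be sketched by counting, for each motif site, the fraction of sampling iterations in which it occurs. Each site is judged on its own frequency only, which is exactly why joint occurrences are not taken into account. The sampled sites are hypothetical (sequence id, start) pairs.

```python
from collections import Counter


def centroid_sites(sampled_configurations):
    """Sites present in at least half of the sampling iterations, with their frequency."""
    n_iter = len(sampled_configurations)
    counts = Counter(site for config in sampled_configurations for site in set(config))
    return {site: c / n_iter for site, c in sorted(counts.items()) if 2 * c >= n_iter}


# Hypothetical samples: each iteration is a set of (sequence id, start) sites.
samples = [{("s1", 10), ("s2", 5)}, {("s1", 10)}, {("s1", 10), ("s2", 40)}, {("s2", 5)}]
print(centroid_sites(samples))  # {('s1', 10): 0.75, ('s2', 5): 0.5}
```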

Page 48

Theoretical comparison: Evolutionary model

Adapted Felsenstein (F81) model
-> Describes the substitution process at the nucleotide level
-> Assumes that all positions evolve independently and at equal rates (u)
-> The probability that a is mutated to b depends on the time (t)
-> Fixation of b depends on its frequency in the motif WM

Phylogibbs: proximity q = exp(-ut) = probability that no substitution took place at a site

Phylogenetic sampler: branch length b = ut, AND a different normalization of the branch lengths (k)

Convert proximities to branch lengths: b = -(3/4) ln(q)
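The conversion between the two parameterizations is a one-liner in each direction. Under F81 with uniform base frequencies, the expected number of observable substitutions per site is (3/4)ut, so b = -(3/4) ln(q) = (3/4)ut; reading this 3/4 factor as the normalization difference mentioned above is our interpretation.

```python
import math


def proximity_to_branch_length(q):
    """Phylogibbs proximity q = exp(-u t) -> branch length b = -(3/4) ln(q)."""
    if not 0.0 < q <= 1.0:
        raise ValueError("a proximity must lie in (0, 1]")
    return 0.75 * math.log(1.0 / q)


def branch_length_to_proximity(b):
    """Inverse conversion: q = exp(-(4/3) b)."""
    return math.exp(-b / 0.75)


for q in (1.0, 0.9, 0.5, 0.1):
    b = proximity_to_branch_length(q)
    print(f"q = {q:.2f} -> b = {b:.3f} -> q again = {branch_length_to_proximity(b):.2f}")
```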

Page 49

Page 50

Introduction

Page 51

Main players in Transcriptional regulation

Prokaryotic cells (bacteria):
• No nucleus, circular 'naked' DNA molecule

Eukaryotic cells:
• Linear DNA molecules organized into chromosomes
• Chromatin = complex of DNA and proteins (histones)

Chromatin function:
– Storage of long DNA molecules in the nucleus
– Role in transcriptional regulation: euchromatin and heterochromatin

[Figure: nucleus, chromosome, chromatin and nucleosome (DNA wrapped around histone proteins)]

Page 52

Main players in Transcriptional regulation

2. Chromatin structure (eukaryotes)

[Figure: chromatin remodeling complex, TF, co-activator and RNA polymerase complex at the promoter region and TSS of a target gene]

Page 53

Theoretical comparison: Input format

Tools: Phylogibbs (PG), Phylogenetic sampler (PS), MEME

- COREGULATION space: non-coding regions for a set of coregulated genes from one species -> unaligned sequences

- ORTHOLOGOUS space: non-coding regions for a set of orthologous genes from multiple species -> prealigned orthologs (PG => Dialign, PS => ClustalW) and a phylogenetic tree

- COMBINED space: combination of both

Page 54

Theoretical comparison: Assignment and scoring of motif sites

Phylogibbs and the Phylogenetic sampler both use a tree-based evolutionary model (F81).

- Unaligned input: single independent motif sites

- Prealigned input: multiple orthologous motif sites
  - Phylogibbs: window principle -> more flexible in case of a bad prealignment
  - Phylogenetic sampler: block principle -> very sensitive to bad prealignments -> leave out phylogenetically distant orthologs

Page 55

Performance assessment: Results (3)

2. Phylogenetic tree
=> Tree based on neutral evolution rate

3. The number of added orthologs and the topology of the tree

4. Noise
=> Orthologous direction: performance drop depends on the species distance and the algorithm characteristics
   - Phylogibbs ↓
   - Phylogenetic sampler ↓
   - Weighting scheme
   - Block principle

[Figure panels: Spec 3, Spec M]

Page 56

PhyloMotifWeb - webserver

PHYLO-MOTIF-WEB

STEP 1: Select the non-coding regions

STEP 2: Additional information sources

STEP 3: Motif discovery using an ensemble strategy (MEME, Phylogibbs, Phylogenetic sampler)

STEP 4: Post-processing of the predicted ensemble motif matrices

Resources shown in the workflow diagram (legend: external database, external software): ENSEMBL CORE; ENSEMBL COMPARA AND REGULATORY BUILD; UCSC GENOME BROWSER; TRANSFAC and JASPAR; motif comparison; Clover; multi-species alignments; DNA features like chromatin structure; masking of repeats

Page 57

PhyloMotifWeb - Webserver

Page 58

Vitamin D3 - de novo motifs

RESULTS:

1. Very common motifs -> low overrepresentation in the cluster compared to the genome -> match with the following TFs:

SP1 (MEME)
- Involved in the vitD3 response -> regulation of genes without a VDRE binding site
- Regulator of TFs involved in cell cycle regulation

ZF5 (MEME, PG)
- TF particularly abundant in differentiated tissues with low proliferation
- Growth-suppressive activity

NRF1 (MEME, PG)
- Involved in cell proliferation

EBF (PS)
- B-cell differentiation

SP1, ZF5 and NRF1 are cell cycle regulators -> well-conserved binding sites, present in many genes!

Page 59

PhyloMotifWeb – Ensemble strategy

• Three motif finders: Phylogibbs, Phylogenetic sampler and MEME

• Run each motif finder across multiple parameter settings (e.g. different motif numbers, motif widths, etc.) -> large collection of output matrices

• FuzzyClustering algorithm -> summarizes all these output matrices into a set of non-redundant ensemble motifs
  - Works at the TF binding site level -> fine-tuning of sensitivity/specificity
  - Integration of the TF binding site scores assigned by the original motif finder
  - Trace back the different motif finders that contributed to the final solution

Page 60

Vitamin D3 - de novo motifs

METHOD: PhyloMotifWeb

- 4000 bp centered around the TSS

- Restrict to regions with regulatory potential

- Use evolutionary conservation information: human-mouse pairwise alignment, six-species alignment

- Use Phylogibbs, Phylogenetic sampler and MEME => ensemble solution

- Predicted ensemble motifs were compared to database motifs from TRANSFAC and JASPAR to retrieve TFs potentially involved in the coexpression behavior
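Extracting the 4000 bp window around the TSS is simple coordinate arithmetic, but the strand decides which side is upstream. A sketch with 1-based inclusive coordinates, 2000 bp upstream and 2000 bp downstream with the TSS counted as the first downstream base; these conventions are assumptions, and the window is clipped at the chromosome ends.

```python
def promoter_window(tss, strand, upstream=2000, downstream=2000, chrom_length=None):
    """1-based inclusive (start, end) of the region around a TSS.

    The TSS counts as the first downstream base, so the window spans
    upstream + downstream bases before clipping at the chromosome ends.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream - 1
    elif strand == "-":
        start, end = tss - downstream + 1, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    start = max(1, start)
    if chrom_length is not None:
        end = min(end, chrom_length)
    return start, end


print(promoter_window(10_000, "+"))  # (8000, 11999)
print(promoter_window(10_000, "-"))  # (8001, 12000)
print(promoter_window(500, "+"))     # (1, 2499)
```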

Page 61

Vitamin D3 - dataset

GOAL: gain insight into the molecular mechanism underlying the anti-proliferative effect of vitD3

- Human and mouse cell lines treated with vitD3 versus no vitD3 (control)

- Measured the expression of all genes in the human and mouse cells with microarrays, for both conditions, over different time points

- Select differentially expressed genes (vitD3 versus control) -> phenotype

- Group, per species, all genes with similar behavior into coexpression clusters

Focus on the similarity between human and mouse cells, which is interesting for the COMMON anti-proliferative phenotype

[Figure: human breast cancer cells and mouse bone cells, VitD3 versus control (Ctr); VitD3-bound RXR/VDR on the VDRE of a target gene -> ANTI-PROLIFERATIVE PHENOTYPE]

Page 62

General perspectives

Integration of multiple information sources to improve de novo motif discovery

• Orthology information
  – Ortholog alignments, evolutionary models
  – Evolution in how algorithms exploit this information source

• New information sources, such as epigenetic information, become available
  – How to exploit this new information?
  – More knowledge on which chromatin modifications co-locate with transcriptionally active regions like promoters, enhancers or TF binding sites will improve usability