Defining Gene Clusters: 24 Ways of Looking at Mount Fuji

Anne Bergeron, UQAM

Dublin, September 19, 2005

7. Mt Fuji from the Foot

"It struck me that it would be good to take one thing in life and regard it from many viewpoints, ... " Roger Zelazny

The basic problem

[Figure: Genome A, Genome B, Genome C]

We start with a set of genomes, labeled by gene names, domains, or synteny blocks, and a similarity relation on those labels.

Highlighting a gene means selecting all labels that are similar.

Genes, or other types of signals, can appear in multiple copies in a genome, or even be missing. In this talk, the similarity relation is "given" and is an equivalence relation.
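As a minimal sketch of what highlighting amounts to, assume the equivalence relation is stored as a map from each label to a family name. The labels, the family names and the helper highlight below are made up for illustration; they are not from the talk.

```python
# Hypothetical data: the similarity relation is given as label -> family,
# so two labels are similar exactly when they map to the same family.
family = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}

genomes = {
    "Genome A": ["a1", "b1", "c1", "a2"],  # family A appears in two copies
    "Genome B": ["c1", "a2", "b1"],
    "Genome C": ["b1", "c1"],              # family A is missing here
}

def highlight(genome, label):
    """Return the positions in `genome` whose label is similar to `label`."""
    target = family[label]
    return [i for i, other in enumerate(genome) if family[other] == target]

# Highlight the gene "a1" in every genome.
highlighted = {name: highlight(g, "a1") for name, g in genomes.items()}
print(highlighted)
```

Because the relation is an equivalence, highlighting "a1" or "a2" selects the same positions.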

The basic problem

[Figure: Genome A, Genome B, Genome C]

We are interested in what happens when a set of genes is highlighted.

A set of genes: { }

Boring...

The basic problem

[Figure: Genome A, Genome B, Genome C]

Another set of genes: { }

Interesting?

Measures of surprise are studied by Durand, Haque, Hoberman, Sankoff, Raghupathy, etc.

The basic problem

Goal: Given a (big) set of genomes, automatically identify all potentially interesting sets of genes.

1. Mount Fuji from Owari

Towards formal models

Towards formal models

What do labels stand for?

How many labels and genomes do we want to compare?

What do we want to do with the resulting clusters?

Towards formal models: Example 1

From: Eichler and Sankoff, Science (301:793-797), 2003

Definition of labels and similarity: Large homology segments disrupted only by local micro-rearrangements.

A total of 281 synteny blocks, colored in the human genome by their mouse chromosome number.

Interesting features: Chromosome X, Chromosome 17, Chromosome 20

Application: Genome evolution

Towards formal models: Example 2

Definition of labels and similarity: Gene annotations of chloroplasts.

[Figure: Trachelium, Campanula, Adenophora, Symphyandra, Wahlenbergia, Merciera]

Interesting features: Rearrangements

Application: Phylogeny

Towards formal models: Example 3

From: Pasek et al, Genome Research (15:867-874), 2005

Definition of labels and similarity: PFAM domain numbers labeling four bacterial genomes.

Interesting features: Duplications, insertions, rearrangements

Application: Operon identification

Towards formal models: Example 4

From: Pasek et al, Genome Research (15:867-874), 2005

Definition of labels and similarity: PFAM domain numbers labeling four bacterial genomes.

Application: Identification of orthologs and/or duplicate segments.

With such a high E-value, the potential duplicate would have been missed by a comparison based on sequence similarity.

Towards formal models: Example 5

Definition of labels and similarity: Large homology segments disrupted only by local micro-rearrangements.

Comparing 16 segments of the mouse and rat chromosome X.

Application: Reconstructing ancestors

From: Bérard et al, WABI 2005

Mouse = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Rat = -4 -3 -2 1 -13 -15 14 -16 8 9 10 -11 12 5 6 7

2. Mt Fuji from a Teahouse at Yoshida

Down to earth details

Down to earth details

Do we allow gaps?

Do we allow rearrangements?

Do we allow duplicates and missing genes?

Do we allow multiple genomes or self-comparison?

How about "extensions"?

Down to earth details: Model 1

[Figure: Genome A, Genome B, Genome C]

A set of genes: { }

No gaps, no duplications, any rearrangement.

Down to earth details: Model 1

[Figure: Genome A, Genome B, Genome C]

A set of genes: { }

No gaps, no duplications, any rearrangement.

What about this gene? Should we add it?

Down to earth details: Model 1, Extension

[Figure: Genome A, Genome B, Genome C]

A set of genes: { }

No gaps, no duplications, any rearrangement.

What about this gene? Should we add it?

Down to earth details: Model 2

[Figure: Genome A, Genome B, Genome C; legend: genes not in the set]

A set of genes: { }

No gaps, duplications, any rearrangement.

Down to earth details: Model 3

[Figure: Genome A, Genome B, Genome C]

A set of genes: { }

Gaps, no duplications, any rearrangement.

Down to earth details: Model 4

[Figure: Genome A, Genome B, Genome C]

A set of genes: { }

Gaps, missing/inserted genes, any rearrangement.

Down to earth details: Model 5

[Figure: Genome A, Genome B, Genome C]

A set of genes: { }

Gaps, missing genes, duplications, any rearrangement.

With gap size = 1, we get 4 occurrences.

Reducing the number of genes...

Down to earth details: Model 5

[Figure: Genome A, Genome B, Genome C]

A smaller set of genes: { }

... yields 5 occurrences.

24. Mount Fuji in a Summer Storm

A general framework

A general framework

Given a gap g, an occurrence of S is a maximal run of genes of S, separated by gaps of at most g genes not in S, and that contains at least one of each gene of S.

A set S of genes: { }

A set of genes S is an extension of a set T, included in S, if each occurrence of T is contained in an occurrence of S.

S = { } is an extension of T = { }

[Figure: a chromosome showing Occurrence #1 and Occurrence #2, with gaps labeled > g, > g, > g and ≤ g]
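The two definitions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a chromosome is a plain sequence of labels, an occurrence is reported as a (start, end) pair of positions, and the function names occurrences and is_extension, as well as the toy chromosomes, are hypothetical.

```python
def occurrences(chromosome, genes, g):
    """Occurrences of the gene set `genes`, as (start, end) position pairs.

    A run is grown greedily: two consecutive genes of the set stay in the
    same run when at most g genes not in the set lie between them. A run
    counts as an occurrence only if it holds at least one of each gene.
    """
    genes = set(genes)
    runs = []
    for i, label in enumerate(chromosome):
        if label not in genes:
            continue
        if runs and i - runs[-1][-1] - 1 <= g:
            runs[-1].append(i)   # gap small enough: extend the current run
        else:
            runs.append([i])     # gap too large: start a new run
    return [(run[0], run[-1]) for run in runs
            if {chromosome[i] for i in run} == genes]

def is_extension(chromosome, s, t, g):
    """True if T is included in S and every occurrence of T lies inside
    some occurrence of S."""
    if not set(t) <= set(s):
        return False
    occ_s = occurrences(chromosome, s, g)
    return all(any(a <= c and d <= b for a, b in occ_s)
               for c, d in occurrences(chromosome, t, g))

# Toy chromosomes (made up): x, y and z are genes outside both sets.
print(occurrences("abxcyyybca", "abc", 1))         # occurrences of S = {a, b, c}
print(is_extension("abxcyyybca", "abc", "ab", 1))  # every {a, b} run sits in an S run
print(is_extension("abxcyyybaz", "abc", "ab", 1))  # the second {a, b} run has no c nearby
```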

A general framework

Choices

• g = 0 or g > 0

When g = 0, the number of candidates is polynomial in the number of genes.

When g > 0, the number of candidates can be exponential in the number of genes.

Even with g = 1, there are problems. For example, with g = 0, the sequence of genes:

a b c d e f

produces one potential cluster that contains both a and f. But with g = 1, there are 8 of them:

a b c d e f
a b c d f
a b c e f
a b d e f
a c d e f
a c e f
a b d f
a c d f

The number of these sequences grows in a Fibonacci progression!
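The count can be checked by brute force. The sketch below enumerates every subsequence that keeps the first and last gene and skips at most g genes between two consecutive kept genes; the function name candidates is an assumption made for this illustration.

```python
def candidates(genes, g):
    """Subsequences of `genes` that keep the first and the last gene and
    skip at most g genes between two consecutive kept genes."""
    n = len(genes)
    found = []

    def extend(path):
        last = path[-1]
        if last == n - 1:                      # reached the last gene
            found.append("".join(genes[i] for i in path))
            return
        # The next kept gene is at most g + 1 positions further along.
        for nxt in range(last + 1, min(last + g + 1, n - 1) + 1):
            extend(path + [nxt])

    extend([0])
    return found

print(candidates("abcdef", 0))       # a single candidate
print(len(candidates("abcdef", 1)))  # the 8 candidates listed above

# Growth of the count with the number of genes, for g = 1.
counts = [len(candidates("abcdefgh"[:n], 1)) for n in range(2, 9)]
print(counts)
```

With g = 1 each step moves one or two positions forward, which is why the counts follow the Fibonacci recurrence.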

A general framework

Choices

• g = 0 or g > 0

• Duplications or no duplications

Duplications usually mean an exponential number of candidates but, most of the time, are unavoidable.

Models without duplications are, nevertheless, useful in many situations.

A general framework

Choices

• g = 0 or g > 0

• Duplications or no duplications

• Three ways of filtering candidates

Filtering is mostly based on the properties of the extension relation.

If the number of candidates is low, filtering is not necessary, but it can be relevant.

For models with a huge number of candidates, filtering is a must.

A general framework

Choices

• g = 0 or g > 0

• Duplications or no duplications

• Three ways of filtering candidates

• Formal or heuristic

Formal models have inherent computational problems when applied to real data.

Heuristics will always be useful.

A general framework

Choices

• g = 0 or g > 0

• Duplications or no duplications

• Three ways of filtering candidates

• Formal or heuristic

2 x 2 x 3 x 2 = 24

How convenient!
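The 24 combinations can be listed mechanically. In this sketch the three filtering options are named after the "Choices" lists of the later slides (no filtering, filtering, heavy filtering); whether those are exactly the talk's "three ways of filtering" is an assumption.

```python
from itertools import product

gaps = ["g = 0", "g > 0"]
duplications = ["no duplications", "duplications"]
# Assumed labels for the three filtering options (see the lead-in).
filtering = ["no filtering", "filtering", "heavy filtering"]
approach = ["formal", "heuristic"]

# Every model is one choice along each of the four axes.
models = list(product(gaps, duplications, filtering, approach))
print(len(models))

for model in models[:3]:
    print(", ".join(model))
```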

20. Mount Fuji from Inume Pass

Common intervals: Voluntary simplicity*

*Voluntary simplicity is a lifestyle considered by its adherents to be a sustainable, ecologically sensitive alternative to the typical, western consumerist lifestyle. [Ref. Wikipedia]

Common intervals: Voluntary simplicity*

*Voluntary simplicity is a lifestyle considered by its adherents to be a sustainable, ecologically sensitive alternative to the typical, western consumerist lifestyle. [Ref. Wikipedia]

A (partial) list of credits:

Uno and Yagiura (2000)
Heber and Stoye (2001)
Bergeron, Heber and Stoye (2002)
Didier (2003)
Schmidt and Stoye (2004)
Figeac and Varré (2004)
Bérard, Bergeron and Chauve (2004)
Blin, Chauve and Fertin (2005)
Landau, Parida and Weizman (2005)
Tannier and Sagot (2005)
Bérard, Bergeron, Chauve and Paul (2005)
Bergeron, Chauve, de Montgolfier and Raffinot (2005)

Common intervals

Choices

• g = 0

• No duplications

• No filtering

• Formal

[Figure: Genome A, Genome B, Genome C]

The basic model of common intervals often yields a large number of 'uninteresting clusters'. However, filtering provides unusual information on whole genome organization.
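For this model (g = 0, no duplications), a common interval is a set of genes that is contiguous in every genome. A brute-force sketch, assuming every gene appears exactly once in each genome; the function name and the two toy genomes are made up, and the algorithms in the credited papers are far more efficient than this cubic scan.

```python
def common_intervals(genomes):
    """Gene sets of size >= 2 that are contiguous in every genome."""
    reference = genomes[0]
    # Position of every gene in every genome.
    positions = [{gene: i for i, gene in enumerate(g)} for g in genomes]
    found = []
    for i in range(len(reference)):
        for j in range(i + 1, len(reference)):
            genes = reference[i:j + 1]
            # j - i + 1 genes are contiguous iff their positions span j - i.
            if all(max(p[x] for x in genes) - min(p[x] for x in genes) == j - i
                   for p in positions):
                found.append(frozenset(genes))
    return found

genome_a = [1, 2, 3, 4, 5]
genome_b = [2, 1, 3, 5, 4]
intervals = common_intervals([genome_a, genome_b])
for interval in intervals:
    print(sorted(interval))
```

Even on this tiny example, five of the ten candidate intervals of the first genome are common, which hints at why the unfiltered model tends to report many clusters.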

Common intervals -> Strong Intervals

Choices

• g = 0

• No duplications

• Filtering

• Formal

[Figure: Genome A, Genome B]

Common intervals: s, t, u, v

Both t and u are two different extensions of the common interval s: Remove them.

Strong intervals: s, v
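One common way to state this filter, assumed here to match the slide, is that a common interval is strong when it does not overlap any other common interval, where two sets overlap if they intersect and neither contains the other. The sketch applies that rule to the five common intervals of two toy genomes, 1 2 3 4 5 and 2 1 3 5 4 (an invented example, not the s, t, u, v of the figure).

```python
def overlaps(s, t):
    """Two gene sets overlap if they share genes but neither contains the other."""
    return bool(s & t) and not (s <= t or t <= s)

def strong_intervals(intervals):
    """Keep the intervals that overlap no other interval in the list."""
    return [s for s in intervals if not any(overlaps(s, t) for t in intervals)]

# Common intervals of the toy genomes 1 2 3 4 5 and 2 1 3 5 4.
common = [
    frozenset({1, 2}),
    frozenset({1, 2, 3}),
    frozenset({1, 2, 3, 4, 5}),
    frozenset({3, 4, 5}),
    frozenset({4, 5}),
]

strong = strong_intervals(common)
for interval in strong:
    print(sorted(interval))
```

Here {1, 2, 3} and {3, 4, 5} overlap each other and are both removed. The surviving sets are nested or disjoint, so they can be arranged in a tree.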

Strong Intervals

From: Bérard et al, WABI 2005

Mouse = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Rat = -4 -3 -2 1 -13 -15 14 -16 8 9 10 -11 12 5 6 7

This tree displays the strong intervals between the synteny blocks of the mouse and rat chromosomes X.

This kind of tree is known as a PQ-tree. Strong intervals possess a rich combinatorial structure that can be exploited from both the biological and the computational perspective.

Strong Intervals: transforming a rat into a mouse

[Figure: tree of strong intervals over the rat order 4 3 2 1 13 15 14 16 8 9 10 11 12 5 6 7]

This tree provides guidelines to possible rearrangement scenarios that transform the rat chromosome into a mouse chromosome. These scenarios preserve all common intervals.

Strong Intervals: transforming a rat into a mouse

[Figure: tree of strong intervals over the rat order 4 3 2 1 13 15 14 16 8 9 10 11 12 5 6 7]

Intervals are first labeled (in red) with respect to their relative orientation.

Strong Intervals: transforming a rat into a mouse

Then all strong intervals that disagree with their parent are inverted: 1

[Figure: tree of strong intervals; resulting order 4 3 2 1 13 15 14 16 8 9 10 11 12 5 6 7]

Strong Intervals: transforming a rat into a mouse

Then all strong intervals that disagree with their parent are inverted: 4 3 2 1

[Figure: tree of strong intervals; resulting order 1 2 3 4 13 15 14 16 8 9 10 11 12 5 6 7]

Strong Intervals: transforming a rat into a mouse

Then all strong intervals that disagree with their parent are inverted: 13

[Figure: tree of strong intervals; resulting order 1 2 3 4 13 15 14 16 8 9 10 11 12 5 6 7]

Strong Intervals: transforming a rat into a mouse

Then all strong intervals that disagree with their parent are inverted: 14

[Figure: tree of strong intervals; resulting order 1 2 3 4 13 15 14 16 8 9 10 11 12 5 6 7]

Strong Intervals: transforming a rat into a mouse

Then all strong intervals that disagree with their parent are inverted: 16

[Figure: tree of strong intervals; resulting order 1 2 3 4 13 15 14 16 8 9 10 11 12 5 6 7]

Strong Intervals: transforming a rat into a mouse

Then all strong intervals that disagree with their parent are inverted: 14 15

[Figure: tree of strong intervals; resulting order 1 2 3 4 13 14 15 16 8 9 10 11 12 5 6 7]

Strong Intervals: transforming a rat into a mouse

Then all strong intervals that disagree with their parent are inverted: 13 14 15 16

[Figure: tree of strong intervals; resulting order 1 2 3 4 16 15 14 13 8 9 10 11 12 5 6 7]

Strong Intervals: transforming a rat into a mouse

Then all strong intervals that disagree with their parent are inverted: 11

[Figure: tree of strong intervals; resulting order 1 2 3 4 16 15 14 13 8 9 10 11 12 5 6 7]

Strong Intervals: transforming a rat into a mouse

Then all strong intervals that disagree with their parent are inverted: 8 9 10 11 12

[Figure: tree of strong intervals; resulting order 1 2 3 4 16 15 14 13 12 11 10 9 8 5 6 7]

Strong Intervals: transforming a rat into a mouse

Then all strong intervals that disagree with their parent are inverted: 5 6 7

[Figure: tree of strong intervals; resulting order 1 2 3 4 16 15 14 13 12 11 10 9 8 7 6 5]

Strong Intervals: transforming a rat into a mouse

Then all strong intervals that disagree with their parent are inverted: 5 6 7 ... 14 15 16

[Figure: tree of strong intervals; resulting order 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16]
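The whole scenario can be replayed on the signed rat permutation. The sketch below assumes the usual convention that a negative sign marks a reversed block and that an inversion reverses a contiguous segment and flips every sign in it. The list of inverted intervals is read off the slide captions, in the order they appear; the helper invert is hypothetical.

```python
def invert(perm, genes):
    """Reverse the contiguous segment made of `genes` and flip all its signs."""
    genes = set(genes)
    where = [i for i, x in enumerate(perm) if abs(x) in genes]
    lo, hi = where[0], where[-1]
    if hi - lo + 1 != len(genes):
        raise ValueError("the genes do not form a contiguous segment")
    return perm[:lo] + [-x for x in reversed(perm[lo:hi + 1])] + perm[hi + 1:]

rat = [-4, -3, -2, 1, -13, -15, 14, -16, 8, 9, 10, -11, 12, 5, 6, 7]

# Intervals inverted on the successive slides.
steps = [
    [1], [4, 3, 2, 1],
    [13], [14], [16], [14, 15], [13, 14, 15, 16],
    [11], [8, 9, 10, 11, 12],
    [5, 6, 7],
    list(range(5, 17)),
]

current = rat
for genes in steps:
    current = invert(current, genes)
    print(" ".join(str(x) for x in current))
```

After the eleventh inversion the permutation is the mouse order 1 2 ... 16 with all signs positive. Every inverted segment is a strong interval or a single block, which is how the scenario avoids breaking any common interval.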

18. Mt Fuji from the Offing in Kanagawa

Domain Teams: The 'eXtreme' model

Domain Teams: The 'eXtreme' model

A (partial) list of credits:

Bergeron, Corteel and Raffinot (2002)
Luc, Risler, Bergeron and Raffinot (2003)
He and Goldwasser (2004)
Béal, Bergeron, Corteel and Raffinot (2004)
Pasek, Bergeron, Risler, Louis, Ollivier and Raffinot (2005)
Blin, Chauve and Fertin (2005)

Domain Teams

Choices

• g > 0

• Duplications

• Heavy filtering

• Formal

[Figure: Genome A, Genome B; four candidate teams, each marked "has an extension."]

Remove them all!

Surviving teams:

Domain Teams : Example

67591 Domains

50078 Proteins

16 Chromosomes

Maximum gap: 3

16713 Domain Teams

Domain Teams : Example

From: Pasek et al, Genome Research (15:867-874), 2005

12. Mt Fuji from Lake Kawaguchi

The combinatorial beauty of nature

The combinatorial beauty of nature

Does nature allow all possible rearrangements?

Six domains can theoretically form 63 potential teams. If they are labelled as {a, b, c, d, e, f}, the possible teams with more than one member are:

{a, b}, {a, c}, {a, d}, {a, e}, {a, f}, {b, c}, ...
{a, b, c}, {a, b, d}, {a, b, e}, ...
...
{a, b, c, d, e, f}

For 6 domains, of the 63 possibilities, we found 35 teams that had at least two occurrences and no extension.
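As a quick check of the counting, the sketch below enumerates the subsets of six labels. The figure of 63 is the number of non-empty subsets (2^6 - 1); of these, 57 have more than one member. The variable names are illustrative only.

```python
from itertools import combinations

domains = "abcdef"

# All non-empty subsets of the six domain labels.
subsets = [set(c)
           for size in range(1, len(domains) + 1)
           for c in combinations(domains, size)]

# Subsets with more than one member.
multi = [s for s in subsets if len(s) > 1]

print(len(subsets), len(multi))
```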

The combinatorial beauty of nature

Promiscuous domains

Who are they?

PF00005 ABC transporter
PF00072 Response regulator receiver domain
PF00486 Transcriptional regulatory protein
PF00512 His Kinase A
PF00528 Binding-protein-dependent transport system inner membrane
PF00672 HAMP domain

21. Mount Fuji from the Totomi Mountains

The need for heuristics

The need for heuristics

Choices

• g > 0

• Duplications

• No filtering

• Heuristic

From: St-Onge, et al. Poster RECOMB CG 2005

Very reasonable approximations of the general model can be obtained efficiently (in a few minutes) in the case of very large scale comparisons.

The need for heuristics

An uncertainty principle

With the general model of gene clusters, it is impossible to predict simultaneously the computing time AND the properties of the output.

Credits

Marie-Pierre Béal, Informatique, Marne-la-Vallée
Séverine Bérard, INRA, Toulouse
Mathieu Blanchette, McGill University
Sylvie Corteel, PRiSM, Versailles
Steffen Heber, Raleigh, USA
Hokusai Katsushika: 1760-1849
Nicolas Luc, Génome et informatique, Evry
Fabien de Montgolfier, LIAFA, Paris
Christophe Paul, LIRMM, Montpellier
Sophie Pasek, Génome et informatique, Evry
Jean-Loup Risler, Génome et informatique, Evry
Mathieu Raffinot, Laboratoire Poncelet, Moscow
Jens Stoye, Technische Fakultät, Bielefeld

Cedric Chauve
Annie Chateau
Olivier Gingras
Yannick Gingras
André Levasseur
Jacqueline Rwirangira
Karine St-Onge
