Bi-correlation clustering algorithm for determining a set of co-regulated genes BIOINFORMATICS vol. 25 no.21 2009 Anindya Bhattacharya and Rajat K. De

Outline

• Introduction
• Bi-correlation clustering algorithm (BCCA)
• Results
• Conclusion


Introduction

• Biclustering
  • Performs simultaneous grouping of the genes and conditions of a dataset to determine subgroups of genes that exhibit similar behavior over a subset of experimental conditions.
• A new correlation-based biclustering algorithm, called the bi-correlation clustering algorithm (BCCA)
  • Produces a diverse set of biclusters of co-regulated genes.
  • All the genes in a bicluster have a similar change of expression pattern over the subset of samples.


Introduction

• Cluster analysis
  • Most cluster analysis methods try to find groups of genes that remain co-expressed through all experimental conditions.
  • In reality, genes tend to be co-regulated, and thus co-expressed, under only a few experimental conditions.


Bi-correlation clustering algorithm

• Notation
  • A set of n genes: $X = \{g_1, g_2, \ldots, g_n\}$
  • Each gene has m expression values. For each gene $g_i$ there is an m-dimensional vector $\mathbf{x}_i$, where $x_{ij}$ is the j-th expression value of $g_i$.
  • A set of m microarray experiments (measurements): $Y = \{e_1, e_2, \ldots, e_m\}$
  • The n genes have to be grouped into K overlapping biclusters: $\{C_1, C_2, \ldots, C_K\}$


Bi-correlation clustering algorithm

• Bicluster:
  • A bicluster can be defined as a subset of genes possessing a similar behavior over a subset of experiments.
  • Represented as $C_k = (I_k, J_k)$.
  • A bicluster $C_k$ contains a subset of genes $I_k$ ($I_k \subseteq X$) and a subset of experiments $J_k$ ($J_k \subseteq Y$), where each gene in $I_k$ is correlated, with a correlation value greater than or equal to a specified threshold ($\theta$), with all other genes in $I_k$ over the measurements in $J_k$.
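The bicluster definition above can be checked directly on a candidate gene set and measurement set. The following is a minimal sketch, not the authors' implementation: the function names, the dictionary layout of the data, the parameter name `theta` for the threshold, and the choice to treat an undefined correlation (a constant profile) as 0 are all assumptions made for illustration.

```python
import math
from itertools import combinations


def pearson_corr(a, b):
    """Pearson correlation of two equal-length lists of expression values."""
    m = len(a)
    mean_a = sum(a) / m
    mean_b = sum(b) / m
    num = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    den = math.sqrt(sum((p - mean_a) ** 2 for p in a)) * math.sqrt(
        sum((q - mean_b) ** 2 for q in b)
    )
    # Assumption: a constant profile has no defined correlation; report 0.
    return num / den if den != 0.0 else 0.0


def is_bicluster(data, genes, measurements, theta):
    """Return True if every pair of genes correlates >= theta over the measurements.

    data: dict mapping gene name -> list of m expression values (hypothetical layout)
    genes: the gene subset I_k
    measurements: column indices forming the measurement subset J_k
    """
    for g_a, g_b in combinations(genes, 2):
        a = [data[g_a][c] for c in measurements]
        b = [data[g_b][c] for c in measurements]
        if pearson_corr(a, b) < theta:
            return False
    return True


data = {
    "g1": [1, 2, 3, 10],
    "g2": [2, 4, 6, 0],
    "g3": [3, 2, 1, 5],
}
print(is_bicluster(data, ["g1", "g2"], [0, 1, 2], 0.9))
print(is_bicluster(data, ["g1", "g2"], [0, 1, 2, 3], 0.9))
```

In this toy data, g1 and g2 agree over the first three measurements only, so the same gene pair passes or fails depending on which measurements are included.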


Bi-correlation clustering algorithm

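• BCCA
  • Uses the Pearson correlation coefficient for measuring similarity between the expression patterns of two genes $g_i$ and $g_j$:

$$\mathrm{Corr}(\mathbf{x}_i, \mathbf{x}_j) = \frac{\sum_{l=1}^{m} (x_{il} - \bar{x}_i)(x_{jl} - \bar{x}_j)}{\sqrt{\sum_{l=1}^{m} (x_{il} - \bar{x}_i)^2}\,\sqrt{\sum_{l=1}^{m} (x_{jl} - \bar{x}_j)^2}}$$

  where $\bar{x}_i$ and $\bar{x}_j$ are the mean expression values of $g_i$ and $g_j$ over the m measurements.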
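As an illustration of the similarity measure, the Pearson correlation coefficient between two expression vectors can be computed with the standard library alone. This is a small sketch; the function name `pearson_corr` and the decision to raise an error for a constant vector are assumptions, not part of the slides.

```python
import math


def pearson_corr(x_i, x_j):
    """Pearson correlation coefficient between two expression vectors."""
    if len(x_i) != len(x_j):
        raise ValueError("expression vectors must have the same length")
    m = len(x_i)
    mean_i = sum(x_i) / m
    mean_j = sum(x_j) / m
    # Numerator: sum of products of deviations from the means.
    num = sum((a - mean_i) * (b - mean_j) for a, b in zip(x_i, x_j))
    # Denominator: product of the square roots of the summed squared deviations.
    den = math.sqrt(sum((a - mean_i) ** 2 for a in x_i)) * math.sqrt(
        sum((b - mean_j) ** 2 for b in x_j)
    )
    if den == 0.0:
        # Assumption: a constant vector has no defined correlation.
        raise ValueError("correlation is undefined for a constant vector")
    return num / den


print(pearson_corr([1, 2, 3], [2, 4, 6]))
print(pearson_corr([1, 2, 3], [3, 2, 1]))
```

Two genes whose profiles rise and fall together score close to 1 even when their absolute expression levels differ, which is why correlation rather than distance is used to capture a similar change of expression pattern.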


Bi-correlation clustering algorithm

• Step 1:
  • The set of biclusters S is initialized to NULL and the number of biclusters Bicount is initialized to 0.
• Step 2A:
  • BCCA generates a bicluster (C) for each pair of genes in a dataset under a set of conditions.
  • For each pair of genes $g_i, g_j$ ($i \neq j$), BCCA creates a bicluster $C = (I, J)$, where $I = \{g_i, g_j\}$ and $J = Y$.


Bi-correlation clustering algorithm

• In step 2C:
  • For a pair of genes in C, if $\mathrm{Corr}(\mathbf{x}_i, \mathbf{x}_j) < \theta$, then a sample is detected from C, deletion of which causes the maximum increase in the correlation value between $g_i$ and $g_j$.
  • If the number of measurements in $J$, $m' = |J|$, is at least $r$ ($r \geq 3$ being a threshold), the sample is deleted from $J$; otherwise, C is discarded.
  • Deletion of a measurement for which the genes differ in expression value the most will result in the highest increase in correlation value.
  • BCCA deletes one measurement at a time from $J$.
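The measurement-deletion loop of step 2C can be sketched as follows. This is a brute-force illustration under stated assumptions, not the authors' code: it tries every single deletion and keeps the one that yields the highest correlation, it discards the pair once fewer than `r` measurements would remain (the exact comparison against `r` is an assumption), and the names `prune_measurements`, `corr_over`, `theta`, and `r` are hypothetical.

```python
import math


def corr_over(x_i, x_j, cols):
    """Pearson correlation of two genes restricted to the measurement indices in cols."""
    a = [x_i[c] for c in cols]
    b = [x_j[c] for c in cols]
    m = len(cols)
    mean_a = sum(a) / m
    mean_b = sum(b) / m
    num = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    den = math.sqrt(sum((p - mean_a) ** 2 for p in a)) * math.sqrt(
        sum((q - mean_b) ** 2 for q in b)
    )
    # Assumption: an undefined correlation (constant profile) is treated as 0.
    return num / den if den != 0.0 else 0.0


def prune_measurements(x_i, x_j, theta, r=3):
    """Delete one measurement at a time until the pair correlates >= theta.

    Returns the retained measurement indices, or None if the pair is
    discarded because fewer than r measurements are left.
    """
    J = list(range(len(x_i)))
    while len(J) >= r:
        if corr_over(x_i, x_j, J) >= theta:
            return J
        # Pick the measurement whose removal gives the highest correlation.
        best = max(J, key=lambda c: corr_over(x_i, x_j, [k for k in J if k != c]))
        J.remove(best)
    return None


kept = prune_measurements([1, 2, 3, 4, 10], [2, 4, 6, 8, 0], theta=0.9)
print(kept)
```

In the example, the last measurement is the only one where the two genes disagree, so it is the one removed. Trying every deletion costs one correlation per remaining measurement in each round; the slides' observation that the measurement where the genes differ most gives the highest increase suggests a cheaper selection rule.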


Bi-correlation clustering algorithm

• In step 2D(a):
  • Other genes from $X \setminus I$ that satisfy the definition of a bicluster are included in C for its augmentation.
• In step 2D(b):
  • Check whether the present bicluster C has already been found. If so, C is not included; otherwise, C is considered as a new bicluster.
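Steps 2D(a) and 2D(b) can be sketched as a gene-augmentation pass followed by a duplicate check. The sketch below is one possible reading, not the published procedure: candidates are tested in dictionary order against all current members, biclusters are stored as pairs of frozen sets so that ordering does not matter, and the names `augment`, `add_if_new`, and `theta` are hypothetical.

```python
import math


def corr_over(a_vals, b_vals, cols):
    """Pearson correlation of two genes restricted to the measurement indices in cols."""
    a = [a_vals[c] for c in cols]
    b = [b_vals[c] for c in cols]
    m = len(cols)
    mean_a = sum(a) / m
    mean_b = sum(b) / m
    num = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    den = math.sqrt(sum((p - mean_a) ** 2 for p in a)) * math.sqrt(
        sum((q - mean_b) ** 2 for q in b)
    )
    # Assumption: an undefined correlation (constant profile) is treated as 0.
    return num / den if den != 0.0 else 0.0


def augment(data, I, J, theta):
    """Step 2D(a): add genes outside I that correlate >= theta with every member over J."""
    members = list(I)
    for g in data:
        if g in members:
            continue
        if all(corr_over(data[g], data[h], J) >= theta for h in members):
            members.append(g)
    return members


def add_if_new(S, I, J):
    """Step 2D(b): store the bicluster only if the same (I, J) was not found before."""
    key = (frozenset(I), frozenset(J))
    if key in S:
        return False
    S.add(key)
    return True


data = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [1.1, 2.0, 3.2, 3.9],
    "g4": [4, 3, 2, 1],
}
S = set()
I = augment(data, ["g1", "g2"], [0, 1, 2, 3], theta=0.9)
print(I, add_if_new(S, I, [0, 1, 2, 3]))
```

Because each candidate is compared with every gene already in the bicluster, including genes added earlier in the same pass, the augmented set still satisfies the pairwise definition of a bicluster.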


Results

• Datasets
  • We demonstrate the effectiveness of BCCA in determining a set of co-regulated genes (i.e. the genes having common transcription factors) and functionally enriched clusters (and attributes) on five datasets.


Results

• Variation with respect to threshold
  • Plot for the YCCD dataset: average number of functionally enriched attributes (computed using P-values) versus correlation threshold value.


Results

• The threshold value follows a guideline from a previous study by Allocco et al. (2004), which concluded that if two genes have a correlation between their expression profiles >0.84, then there is a >50% chance of their being bound by a common transcription factor.


Results

• By locating common transcription factors
  • At first, we only consider those biclusters that have less than or equal to 50 genes.
  • Use the software TOUCAN 2 (Aerts et al., 2005) for performance comparison by extracting information on the number of transcription factors present in the proximal promoters of all the genes in a single bicluster.
  • The presence of common transcription factors in the promoter regions of a set of genes is good evidence of co-regulation.


• Sequences of all five genes found in a bicluster generated by BCCA from the SPTD dataset.
• Any transcription factor may be present at more than one location in the upstream region.


Results

• Functional enrichment: P-value
  • The functional enrichment of each GO category in each of the biclusters was evaluated using the software FuncAssociate (Berriz et al., 2003).
  • The P-value represents the probability of observing, by chance, the number of genes from a specific GO functional category within each cluster.
  • A low P-value indicates that the genes belonging to the enriched functional categories are biologically significant in the corresponding clusters.


Results

• P-value of a functional category
  • Suppose we have a total population of N genes, in which M have a particular annotation.
  • If we observe x genes with that annotation in a sample of n genes, then we can calculate the probability of that observation:

$$P = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}$$

  • The probability of seeing x or more genes with an annotation, out of n, given that M in the population of N have that annotation:

$$P\text{-value} = \sum_{j=x}^{n} \frac{\binom{M}{j}\binom{N-M}{n-j}}{\binom{N}{n}}$$
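The two hypergeometric expressions can be evaluated exactly with integer binomial coefficients. The sketch below uses `math.comb` from the Python standard library (Python 3.8 or later); the function names are hypothetical, and the numbers in the example are made up purely to show the arithmetic.

```python
from math import comb


def hypergeom_prob(x, N, M, n):
    """Probability of exactly x annotated genes in a sample of n,
    drawn from N genes of which M carry the annotation."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


def enrichment_p_value(x, N, M, n):
    """Probability of x or more annotated genes in a sample of n (upper tail)."""
    # comb(M, j) is 0 when j > M, so terms beyond the feasible range vanish.
    total = sum(comb(M, j) * comb(N - M, n - j) for j in range(x, n + 1))
    return total / comb(N, n)


# Toy numbers: 10 genes in total, 4 annotated, a cluster of 3 genes, 2 of them annotated.
print(hypergeom_prob(2, N=10, M=4, n=3))
print(enrichment_p_value(2, N=10, M=4, n=3))
```

For the toy numbers, the tail sum is (36 + 4) / 120 = 1/3, which is far too large to count as enrichment; a small cluster drawn from a small population cannot produce a very low P-value.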


Results

• Only functional categories with $P < 5.0 \times 10^{-7}$ are reported.
• In the analysis of the 10 biclusters obtained for the YCCD dataset, the highly enriched category in Bicluster1 is 'ribosome', with a P-value of $4.2 \times 10^{-17}$.


Conclusion

• BCCA is able to find a group of genes that show a similar pattern of variation in their expression profiles over a subset of measurements.
• Better than other biclustering algorithms:
  • Finds a higher number of common transcription factors for the set of genes in a bicluster
  • More functionally enriched