RedundancyMiner A novel method of clustering in genomic studies Barry Zeeberg, NCI Hongfang Liu, NCI and GU
Posted: 21-Dec-2015

RedundancyMiner: A novel method of clustering in genomic studies

Barry Zeeberg, NCI

Hongfang Liu, NCI and GU

Gene Ontology (GO) AmiGO browser: hierarchical organization of categories and mapped genes

High-Throughput GoMiner (HTGM)

Typical HTGM result: clustered image map (CIM)

Redundancy problem

• Because of the hierarchical nature of the GO structure, parent and child categories may contain partially redundant gene mappings

• This can “inflate” the number of categories in the CIM

• The inflation obscures the core information content of the CIM

• The redundancy itself can be studied to reveal fine-detail, nuanced associations among category clusters

RedundancyMiner (RM) is an attempt to solve that problem

• Remove the redundancy from the CIM
  – Redundancy can cause the CIM to be inflated by, e.g., 3-fold

• Place the redundancy into a META CIM
  – Study the redundancy as nuanced themes of association among groups of GO categories

RM paradigm

• The similarity metric is a probabilistic value based on the number of genes mapped in common to two GO categories

• Groups in the META CIM follow a “complete linkage” criterion at a selected p-value threshold: every pair of categories within a group must meet the threshold
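
The slides do not say which probabilistic test RM uses. As a minimal sketch, assuming a one-sided hypergeometric test on the overlap between two categories, a pairwise p value could be computed as follows. The function name `overlap_p_value` and the `universe_size` parameter (the total number of genes under consideration) are hypothetical and chosen only for illustration.

```python
from math import comb


def overlap_p_value(genes_a, genes_b, universe_size):
    """Probability of seeing at least the observed overlap by chance.

    Assumption (not stated in the slides): the overlap follows a
    hypergeometric distribution when genes_b is drawn at random from a
    universe of `universe_size` genes that contains genes_a.
    """
    a, b = set(genes_a), set(genes_b)
    k = len(a & b)                      # genes mapped to both categories
    n_a, n_b = len(a), len(b)
    total = comb(universe_size, n_b)    # all ways to draw n_b genes
    # Sum the upper tail P(X = i) for i = k .. min(n_a, n_b).
    tail = sum(
        comb(n_a, i) * comb(universe_size - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return tail / total
```

A small p value means the two categories share more genes than expected by chance, so they would be candidates for the same redundancy group.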

RM overcomes two problems of traditional hierarchical clustering

• All objects are put into one cluster or another, even if the object truly is an outlier

• Each object can appear in only one cluster, even though it may be related to several clusters
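
One way to obtain groups with both properties is to treat each pair of objects that meets the threshold as an edge and to enumerate maximal cliques, since every pair inside a clique satisfies the complete linkage criterion. The sketch below uses the Bron-Kerbosch algorithm for this. It is an illustration under that assumption, not necessarily how RM builds its groups, and the names `complete_linkage_groups` and `p_values` are hypothetical.

```python
def complete_linkage_groups(items, p_values, threshold):
    """Return all maximal groups in which every pair meets the threshold.

    p_values maps a pair (a, b) to its p value. Pairs that are missing
    are treated as not significant. Groups may overlap, and items with
    no significant partner are left out entirely.
    """
    neighbors = {x: set() for x in items}
    for (a, b), p in p_values.items():
        if p <= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    groups = []

    def expand(current, candidates, excluded):
        # Basic Bron-Kerbosch recursion without pivoting.
        if not candidates and not excluded:
            if len(current) >= 2:       # skip singletons (outliers)
                groups.append(sorted(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & neighbors[v], excluded & neighbors[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(items), set())
    return sorted(groups)


# Toy example: C belongs to two groups, E is an outlier in no group.
toy_p = {("A", "B"): 0.001, ("A", "C"): 0.002, ("B", "C"): 0.001,
         ("C", "D"): 0.003, ("D", "E"): 0.9}
toy_groups = complete_linkage_groups("ABCDE", toy_p, threshold=0.01)
```

In the toy data, category C is a member of two groups and category E is a member of none, which is exactly what a single hierarchical tree cannot express.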

CIM after RM

META CIM

Additional example: gene expression in NCI-60 cell lines

• NCI-60 is a set of 60 well-studied cancer cell lines

• Composed of about 5 or 6 cell lines from each of about 8 or 9 different cancer types

Problem

• The full CIM of 60 cell lines x 20,000 gene expression values is too dense to allow meaningful viewing

• The solution is to select a sub-portion of the CIM based on RM analysis

NCI-60 META CIM based on correlation threshold = 0.20

Sub-CIM of highest correlating genes from group 33

Gene expression values are adjusted z-scores

Red = positive z-score

Green = negative z-score
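
The slides do not define the adjustment applied to the z-scores. As a rough sketch, assuming a plain per-gene z-score across cell lines with the sample standard deviation, the values and the color coding above could be produced as follows. The helper names `z_scores` and `cim_color`, and the "neutral" label for a z-score of exactly zero, are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize one gene's expression values across cell lines.

    Assumption: an unadjusted z-score using the sample standard
    deviation; the adjustment mentioned in the slides is not specified.
    """
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def cim_color(z):
    """Map a z-score to the color scheme used in the sub-CIMs."""
    if z > 0:
        return "red"
    if z < 0:
        return "green"
    return "neutral"    # hypothetical label for z == 0


colors = [cim_color(z) for z in z_scores([1.0, 2.0, 3.0])]
```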

Sub-CIM of highest correlating genes from group 32

Conclusions

• RM can remove redundancy from the primary CIM

• RM can display the nuanced themes of redundancy structure in the META CIM

• The META CIM can be used as the basis of further investigation