Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam Systems and Modeling Research Unit University of Liège 14 December 2012

Page 1

Robustness and accuracy of functional modules in integrated network analysis

Gunnar W. Klau
Life Sciences Group
CWI Amsterdam

Systems and Modeling Research Unit
University of Liège

14 December 2012

Page 2

LYMPHOMA

Lymph cancer
• ≈ 1,000 diagnoses/day
• ≈ 20% Hodgkin’s, ≈ 80% non-Hodgkin’s

Lymphatic system
• filters fluids around cells and tissue
• transports lymphocytes

Page 3

LYMPHOMA

Question
• focus on two non-Hodgkin’s subtypes (ABC and GCB)
• what makes the difference?

Page 4

INTEGRATIVE APPROACH

Gene expression
“Lympho-chip”, 3583 tumor-related genes (ABC: 82, GCB: 112)

Page 5

INTEGRATIVE APPROACH

[Image: page 2863 of an article with the running head "Role of COX-2 in US28-Induced Tumor Formation", Cancer Res 2009; 69(7), April 1, 2009, DOI:10.1158/0008-5472.CAN-08-2487, including its Figure 1 (characterization and microarray analysis of US28-expressing NIH-3T3 cells, with a heatmap of the 35 most up-regulated and down-regulated probe sets)]

Page 6

INTEGRATIVE APPROACH

+

PPI data
intersection of HPRD and Lympho-chip

Page 8

INTEGRATIVE APPROACH

Integrate

Goal: find functional modules
• → disease mechanisms → new drugs
• → robust signatures → classify and predict

[Ideker et al., Bioinformatics, 2002]
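The slides describe the PPI data as the intersection of HPRD and the Lympho-chip. A minimal sketch of that restriction step, assuming the interaction network is given as a list of gene pairs. The gene identifiers and the function name are hypothetical, and the actual preprocessing behind the talk may differ.

```python
def restrict_network(ppi_edges, chip_genes):
    """Keep only interactions whose two endpoints are both measured on the chip."""
    chip = set(chip_genes)
    return [(u, v) for u, v in ppi_edges if u in chip and v in chip]


# Hypothetical toy data: five genes in the interaction database, four on the chip.
ppi_edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5")]
chip_genes = ["g1", "g2", "g3", "g5"]

sub_edges = restrict_network(ppi_edges, chip_genes)

# Genes that still take part in at least one interaction after the restriction.
nodes = sorted({gene for edge in sub_edges for gene in edge})
```

In this toy input, g5 is on the chip but loses its only interaction partner (g4), so it drops out of the integrated network.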

Page 9

heinz

BIOINFORMATICS Vol. 24 ISMB 2008, pages i223–i231
doi:10.1093/bioinformatics/btn161

Identifying functional modules in protein–protein interaction networks: an integrated exact approach

Marcus T. Dittrich1,2,*,†, Gunnar W. Klau3,4,*,†, Andreas Rosenwald5, Thomas Dandekar1 and Tobias Müller1,*

1Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, 97074 Würzburg, 2Institute of Clinical Biochemistry, University of Würzburg, Josef-Schneider-Str. 2, 97080 Würzburg, 3Mathematics in Life Sciences Group, Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 3, 14195 Berlin, 4DFG Research Center MATHEON, Berlin and 5Institute of Pathology, University of Würzburg, Josef-Schneider-Str. 2, 97080 Würzburg, Germany

*To whom correspondence should be addressed.
†The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

ABSTRACT

Motivation: With the exponential growth of expression and protein–protein interaction (PPI) data, the frontier of research in systems biology shifts more and more to the integrated analysis of these large datasets. Of particular interest is the identification of functional modules in PPI networks, sharing common cellular function beyond the scope of classical pathways, by means of detecting differentially expressed regions in PPI networks. This requires on the one hand an adequate scoring of the nodes in the network to be identified and on the other hand the availability of an effective algorithm to find the maximally scoring network regions. Various heuristic approaches have been proposed in the literature.

Results: Here we present the first exact solution for this problem, which is based on integer-linear programming and its connection to the well-known prize-collecting Steiner tree problem from Operations Research. Despite the NP-hardness of the underlying combinatorial problem, our method typically computes provably optimal subnetworks in large PPI networks in a few minutes. An essential ingredient of our approach is a scoring function defined on network nodes. We propose a new additive score with two desirable properties: (i) it is scalable by a statistically interpretable parameter and (ii) it allows a smooth integration of data from various sources.

We apply our method to a well-established lymphoma microarray dataset in combination with associated survival data and the large interaction network of HPRD to identify functional modules by computing optimal-scoring subnetworks. In particular, we find a functional interaction module associated with proliferation over-expressed in the aggressive ABC subtype as well as modules derived from non-malignant by-stander cells.

Availability: Our software is available freely for non-commercial purposes at http://www.planet-lisa.net.

Contact: [email protected]

1 INTRODUCTION

Construction and analysis of large biological networks have become major research topics in systems biology (Aittokallio and Schwikowski, 2006). Various aspects have been analyzed including the inference of cellular networks from gene expression (Friedman, 2004), network alignments (Flannick et al., 2006; Kelley et al., 2003; Sharan and Ideker, 2006) and other related strategies as reviewed by Srinivasan et al. (2007). At the same time, well-established microarray technologies provide a wealth of information on gene expression in various tissues and under diverse experimental conditions. Integrating protein–protein interaction (PPI) and gene-expression data generates a meaningful biological context in terms of functional association for differentially expressed genes.

Frequently, large scale expression profiling studies investigate many experimental conditions simultaneously, thereby generating multiple P-values. Especially in tumor biology expression profiling has become a well-established tool for the classification of different tumors and tumor subtypes. Furthermore, in the clinical context, various patient-associated data are available that, in conjunction with expression data, provide valuable information of the influence of specific genes on disease-specific pathophysiology. In particular the analysis of survival data allows to establish gene expression signatures to make predictions about the prognosis and to assess the disease relevance of certain genes. However, the cellular function of an individual gene cannot be understood on the level of isolated components alone, but needs to be studied in the context of its interplay with other gene products. The combined analysis of expression profiles and PPI data thus allows the detection of previously unknown dysregulated modules in interaction networks not recognizable by the analysis of a priori defined pathways.

Ideker et al. (2002) have proposed to identify interaction modules in this setting by devising firstly an adequate scoring function on networks and secondly an algorithm to find the high-scoring subnetworks. The underlying combinatorial problem has been proven to be NP-hard for additive score functions defined on the nodes of the network. The authors proposed a heuristic strategy based on simulated annealing and developed a score to measure the significance of a subnetwork that includes the integration of multivariate P-values. This score has been extended by Rajagopalan and Agarwal (2005) to incorporate an adjustment parameter in order to obtain smaller subgraphs in conjunction with a greedy search algorithm. This approach however, excludes the possibility to combine multiple P-values. Variants of greedy search strategies have also been used by Nacu et al. (2007) and Sohler et al. (2004). Subsequently Cabusora et al. (2005) proposed an edge score by adapting the scoring concept of Ideker et al. (2002).

© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

(heavy induced subgraphs)
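The module search described in the abstract asks for a connected subnetwork whose additive node scores sum to a maximum. The sketch below illustrates that problem statement on a toy graph by brute-force enumeration of node subsets. It is not the integer-linear programming approach of the paper and only works for a handful of nodes; the node names, scores, and function names are invented for illustration.

```python
from itertools import combinations


def is_connected(nodes, adjacency):
    """Check whether the subgraph induced by `nodes` is connected (depth-first search)."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in adjacency[v]:
            if w in nodes and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes


def best_connected_subgraph(scores, edges):
    """Enumerate all non-empty node subsets and return the best-scoring connected one."""
    adjacency = {v: set() for v in scores}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    best_nodes, best_score = frozenset(), float("-inf")
    for k in range(1, len(scores) + 1):
        for subset in combinations(sorted(scores), k):
            if is_connected(subset, adjacency):
                total = sum(scores[v] for v in subset)
                if total > best_score:
                    best_nodes, best_score = frozenset(subset), total
    return best_nodes, best_score


# Toy path graph A - B - C - D - E with positive (signal) and negative (noise) scores.
scores = {"A": 2.0, "B": -1.0, "C": 3.0, "D": -4.0, "E": 1.5}
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]

module, module_score = best_connected_subgraph(scores, edges)
```

On this input the best module is {A, B, C}: the negatively scored node B is included because it connects two positive nodes, while D is too costly to pay for reaching E.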

Page 10

OVERVIEW

Page 11

OVERVIEW

Statistics
gene-wise signal score

Page 12

STATISTICS

\[
\text{signal} \sim B(a,1), \qquad \text{noise} \sim \mathrm{uniform}(0,1) \equiv B(1,1)
\]

\[
\begin{aligned}
f_{\mathrm{mix}}(x) &= \lambda\,\mathrm{noise}(x) + (1-\lambda)\,\mathrm{signal}(x) \\
&= \lambda\,B(1,1)(x) + (1-\lambda)\,B(a,1)(x) \\
&= \lambda + (1-\lambda)\,a\,x^{a-1}
\end{aligned}
\]

[Figure: density of p-values (second order statistics); x-axis: p-value (0.0 to 1.0), y-axis: density (0 to 14); labels "signal + noise" and "noise"]

[Figure: observed p-values (second order statistics) plotted against quantiles of expected p-values under the mixture model; both axes from 0.0 to 1.0]

[Pounds, Morris, Bioinformatics, 2003]
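The mixture density on this slide has two free parameters, λ and a. A minimal sketch of how they could be estimated from a list of p-values: simulate p-values from the mixture, then pick the (λ, a) pair with the highest log-likelihood on a coarse grid. The grid search, the simulated data, the sample size, and all function names are assumptions for illustration, not the fitting procedure used in the talk or by Pounds and Morris.

```python
import math
import random


def bum_density(x, lam, a):
    """Mixture density f_mix(x) = lam + (1 - lam) * a * x**(a - 1)."""
    return lam + (1.0 - lam) * a * x ** (a - 1.0)


def log_likelihood(pvalues, lam, a):
    """Log-likelihood of the p-values under the mixture model."""
    return sum(math.log(bum_density(x, lam, a)) for x in pvalues)


def sample_bum(n, lam, a, rng):
    """Draw n p-values: uniform noise with probability lam, B(a,1) signal otherwise."""
    out = []
    for _ in range(n):
        u = 1.0 - rng.random()  # lies in (0, 1], so x is never exactly 0
        if rng.random() < lam:
            out.append(u)  # noise component
        else:
            out.append(u ** (1.0 / a))  # inverse CDF of B(a,1), whose CDF is x**a
    return out


def fit_bum_grid(pvalues, steps=20):
    """Return (log-likelihood, lam, a) maximizing the likelihood on a regular grid."""
    grid = [i / steps for i in range(1, steps)]
    best = None
    for lam in grid:
        for a in grid:
            ll = log_likelihood(pvalues, lam, a)
            if best is None or ll > best[0]:
                best = (ll, lam, a)
    return best


rng = random.Random(0)
pvalues = sample_bum(2000, 0.5, 0.3, rng)  # simulated truth: lam = 0.5, a = 0.3
best_ll, lam_hat, a_hat = fit_bum_grid(pvalues)
```

A grid keeps the sketch dependency-free and mirrors the idea of inspecting the likelihood over the (λ, a) plane; a numerical optimizer would be the more usual choice for real data.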

Page 14: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

STATISTICS

signal ∼ B(a,1)

noise ∼ uniform(0,1) ≡ B(1,1)

\[
f_{\mathrm{mix}}(x) = \lambda \, \mathrm{noise}(x) + (1-\lambda) \, \mathrm{signal}(x)
= \lambda \, B(1,1)(x) + (1-\lambda) \, B(a,1)(x)
= \lambda + (1-\lambda) \, a x^{a-1}
\]

λ = 0.536

a = 0.276

[Figure: plot over λ and a, both axes 0.1 to 0.9]

[Figure: histogram of p-values (second order statistics), x-axis 0.0 to 1.0, y-axis density 0 to 14, with legend "signal + noise" and "noise"]

[Figure: plot of observed p-values (second order statistics) against quantiles of expected p-values under the mixture model, both axes 0.0 to 1.0]

[Pounds, Morris, Bioinformatics, 2003]
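A minimal Python sketch of the mixture density on this slide, evaluated with the fitted values λ = 0.536 and a = 0.276 shown there. The function names and the closed-form CDF (obtained by integrating the density) are illustrative additions, not part of the talk.

```python
# Fitted parameters as shown on the slide.
LAMBDA_HAT = 0.536  # mixing weight of the uniform (noise) component
A_HAT = 0.276       # shape parameter of the B(a, 1) signal component


def mixture_density(x, lam=LAMBDA_HAT, a=A_HAT):
    """f_mix(x) = lam + (1 - lam) * a * x**(a - 1), for x in (0, 1]."""
    return lam + (1.0 - lam) * a * x ** (a - 1.0)


def mixture_cdf(x, lam=LAMBDA_HAT, a=A_HAT):
    """Integral of f_mix from 0 to x: lam * x + (1 - lam) * x**a."""
    return lam * x + (1.0 - lam) * x ** a


if __name__ == "__main__":
    for p in (0.001, 0.01, 0.1, 0.5, 1.0):
        print(p, mixture_density(p), mixture_cdf(p))
```

The density is unbounded near 0 because a < 1, which is why small p-values are far more frequent under the mixture than under pure noise.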

Page 15: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

STATISTICS

signal ∼ B(a,1)

noise ∼ uniform(0,1) ≡ B(1,1)

\[
f_{\mathrm{mix}}(x) = \lambda \, \mathrm{noise}(x) + (1-\lambda) \, \mathrm{signal}(x)
= \lambda \, B(1,1)(x) + (1-\lambda) \, B(a,1)(x)
= \lambda + (1-\lambda) \, a x^{a-1}
\]

\[
S(x) = \log\left(\frac{B(a,1)(x)}{B(1,1)(x)}\right)
= \log\frac{a x^{a-1}}{1}
= \log a + (a-1) \log x
\]

λ = 0.536

a = 0.276

[Figure: plot over λ and a, both axes 0.1 to 0.9]

[Figure: histogram of p-values (second order statistics), x-axis 0.0 to 1.0, y-axis density 0 to 14, with legend "signal + noise" and "noise"]

[Figure: plot of observed p-values (second order statistics) against quantiles of expected p-values under the mixture model, both axes 0.0 to 1.0]

[Pounds, Morris, Bioinformatics, 2003]
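The score S(x) = log a + (a − 1) log x can be sketched in a few lines of Python. The function names are hypothetical, and the zero-crossing formula is derived here by solving S(x) = 0; it is not stated on the slide. Only the unadjusted score from the slide is shown.

```python
import math

A_HAT = 0.276  # fitted shape parameter from the slide


def signal_noise_score(p, a=A_HAT):
    """Log-likelihood ratio of signal B(a, 1) versus noise B(1, 1) at p-value p."""
    return math.log(a) + (a - 1.0) * math.log(p)


def zero_score_threshold(a=A_HAT):
    """p-value at which the score changes sign: solve log a + (a - 1) log p = 0."""
    return a ** (1.0 / (1.0 - a))


if __name__ == "__main__":
    tau = zero_score_threshold()
    print("score is positive for p below", tau)
    for p in (1e-4, 0.05, 0.5, 1.0):
        print(p, signal_noise_score(p))
```

Genes with p-values below the threshold get positive scores and the rest get negative scores, which is what makes the later search for a maximum-scoring connected subnetwork non-trivial.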

Page 16: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

OVERVIEW

Statistics: gene-wise signal score

Page 17: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

OVERVIEW

Statistics: gene-wise signal score

Combinatorial optimization: best functional modules

Page 18: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

DISTRICT HEATING

Page 20: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

TRANSFORMATION TO PCST

Page 21: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

TRANSFORMATION TO PCST

• w′ := smallest weight

Page 22: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

TRANSFORMATION TO PCST

• w′ := smallest weight

• Set profits p(v) = w(v) − w′ for all nodes v

Page 23: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

TRANSFORMATION TO PCST

• w′ := smallest weight

• Set profits p(v) = w(v) − w′ for all nodes v

• Set all edge costs to −w′

Page 25: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

TRANSFORMATION TO PCST

• w′ := smallest weight

• Set profits p(v) = w(v) − w′ for all nodes v

• Set all edge costs to −w′

Theorem: Tree G′ with profit P(G′) in transformed PCST instance is solution in original instance with weight

S(G′) = P(G′) + w′
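The transformation and the theorem can be checked on a toy instance. This is a sketch under stated assumptions: the graph, the node weights and the function names are made up for illustration, and the smallest weight w′ is assumed to be negative so that the edge costs −w′ are positive.

```python
def mwcs_to_pcst(weights, edges):
    """Turn node weights into PCST profits and uniform edge costs."""
    w_min = min(weights.values())                      # w' := smallest weight
    profits = {v: w - w_min for v, w in weights.items()}  # p(v) = w(v) - w'
    costs = {e: -w_min for e in edges}                 # every edge costs -w'
    return profits, costs, w_min


def tree_profit(profits, costs, tree_nodes, tree_edges):
    """P(G') = sum of node profits minus sum of edge costs of the tree."""
    return sum(profits[v] for v in tree_nodes) - sum(costs[e] for e in tree_edges)


# Hypothetical path graph a - b - c - d with made-up node weights.
weights = {"a": 2.0, "b": -1.0, "c": 3.0, "d": -4.0}
edges = [("a", "b"), ("b", "c"), ("c", "d")]
profits, costs, w_min = mwcs_to_pcst(weights, edges)

# Candidate module: the subtree on a, b, c.
tree_nodes = ["a", "b", "c"]
tree_edges = [("a", "b"), ("b", "c")]
P = tree_profit(profits, costs, tree_nodes, tree_edges)
S = sum(weights[v] for v in tree_nodes)

print(P, S, P + w_min)  # the theorem states S = P + w'
```

The identity holds because a tree on k nodes has k − 1 edges: subtracting w′ from k node weights and paying −w′ on k − 1 edges leaves a net offset of exactly −w′.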

Page 26: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

EXACT ALGORITHM FOR PCST

[Figure: example PCST instance, graph with numeric node and edge labels]

Page 27: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

EXACT ALGORITHM FOR PCST

• Preprocessing

[Figure: example PCST instance before and after preprocessing, graph with numeric node and edge labels]

Page 28: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

EXACT ALGORITHM FOR PCST

• Preprocessing

• Transform into directed instance (V ∪{r}, A)

[Figure: example PCST instance, graph with numeric node and edge labels]

Page 29: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

EXACT ALGORITHM FOR PCST

• Preprocessing

• Transform into directed instance (V ∪{r}, A)

• Introduce variables

Optimization problem: How to solve PCST to optimality

1 Preprocess instance

2 Transform into directed problem [Fischetti, 1991]

3 Introduce variables

\[
x_v = \begin{cases} 1 & v \in G' \\ 0 & \text{otherwise} \end{cases}
\qquad
y_{uv} = \begin{cases} 1 & uv \in G' \\ 0 & \text{otherwise} \end{cases}
\]

[Figure: example PCST instance as undirected graph and as directed graph with artificial root r, numeric node and edge labels]

[Ljubic, Weiskircher, Pferschy, K., Mutzel, Fischetti. Mathematical Programming, 2006]

Page 30: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

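EXACT ALGORITHM FOR PCST

Optimization problem: Integer linear program (ILP) for PCST

\[
\begin{aligned}
\max\ & \sum_{v \in V \setminus \{r\}} p(v)\, x_v - \sum_{uv \in E} c(uv)\, y_{uv} \\
\text{s.t.}\ & \sum_{rv \in A} y_{rv} \le 1 \\
& \sum_{uv \in A} y_{uv} = x_v && \forall v \in V \setminus \{r\} \\
& \sum_{u \notin S,\, v \in S} y_{uv} \ge x_w && \forall w \in S,\ r \notin S,\ \forall S \subset V \\
& x_v, y_{uv} \in \{0,1\} && \forall uv \in E,\ \forall v \in V \setminus \{r\}
\end{aligned}
\]

[Steiner arborescence formulation, Ljubic et al., 2006]

[Figure: directed example instance with artificial root r and arc set A, numeric node and edge labels]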

[Ljubic, Weiskircher, Pferschy, K., Mutzel, Fischetti. Mathematical Programming, 2006]
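To make the constraints concrete, here is a small Python sketch that checks a given 0/1 assignment against the ILP and evaluates its objective. It is not the branch-and-cut algorithm of the cited paper. Assumptions made for illustration: the toy instance, the function names, zero cost on arcs leaving the artificial root r, and the reading of the root constraint as "at most one arc leaves r". For 0/1 solutions the exponentially many cut constraints are equivalent to every selected node being reachable from r over selected arcs, which is what the check uses.

```python
from collections import deque


def objective(profits, costs, x, y):
    """Sum of selected node profits minus sum of selected arc costs."""
    gain = sum(profits[v] * x[v] for v in x)
    paid = sum(costs.get(arc, 0.0) * y[arc] for arc in y)
    return gain - paid


def is_feasible(nodes, arcs, root, x, y):
    """Check root, in-degree and cut constraints for a 0/1 assignment."""
    # Root constraint: at most one selected arc leaves r.
    if sum(y[(u, v)] for (u, v) in arcs if u == root) > 1:
        return False
    # In-degree constraint: selected incoming arcs of v sum to x_v.
    for v in nodes:
        if sum(y[(u, t)] for (u, t) in arcs if t == v) != x[v]:
            return False
    # Cut constraints, checked as reachability from r over selected arcs.
    reached, queue = {root}, deque([root])
    while queue:
        u = queue.popleft()
        for (s, t) in arcs:
            if s == u and y[(s, t)] == 1 and t not in reached:
                reached.add(t)
                queue.append(t)
    return all(v in reached for v in nodes if x[v] == 1)


# Hypothetical instance: path a - b - c, bidirected, plus root arcs.
nodes = ["a", "b", "c"]
arcs = [("r", "a"), ("r", "b"), ("r", "c"),
        ("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")]
profits = {"a": 6.0, "b": 3.0, "c": 7.0}
costs = {arc: (0.0 if arc[0] == "r" else 4.0) for arc in arcs}
x = {v: 1 for v in nodes}

# Arborescence r -> a -> b -> c.
path = {arc: 0 for arc in arcs}
path[("r", "a")] = path[("a", "b")] = path[("b", "c")] = 1

# r -> a plus a directed 2-cycle between b and c, detached from r.
cycle = {arc: 0 for arc in arcs}
cycle[("r", "a")] = cycle[("b", "c")] = cycle[("c", "b")] = 1

print(is_feasible(nodes, arcs, "r", x, path), objective(profits, costs, x, path))
print(is_feasible(nodes, arcs, "r", x, cycle))
```

The second assignment satisfies the root and in-degree constraints but not the cut constraints, which illustrates why the cut constraints are needed to exclude cycles that are disconnected from the root.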

Page 37: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

consensus heinz

Robustness and accuracy of functional modules in integrated network analysis

Daniela Beisser 1, Stefan Brunkhorst 1, Thomas Dandekar 1, Gunnar W. Klau 2,3, Marcus T. Dittrich 1* and Tobias Müller 1*

1 Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, 97074 Würzburg, Germany. 2 Life Sciences Group, Centrum Wiskunde & Informatica (CWI), Science Park 123, 1098 XG Amsterdam, The Netherlands. 3 Netherlands Institute for Systems Biology.

* To whom correspondence should be addressed

ABSTRACT

Motivation: High-throughput molecular data provide a wealth of information that can be integrated into network analysis. Several approaches exist that identify functional modules in the context of integrated biological networks. The objective of this study is twofold: first to assess the accuracy and variability of identified modules and second to develop an algorithm for deriving highly robust and accurate solutions.

Results: In a comparative simulation study accuracy and robustness of the proposed and established methodologies are validated, considering various sources of variation in the data. To assess this variation, we propose a jackknife resampling procedure resulting in an ensemble of optimal modules. A consensus approach summarizes the ensemble into one final module containing maximally robust nodes and edges. The resulting consensus module identifies and visualizes robust and variable regions by assigning support values to nodes and edges. Finally, the proposed approach is exemplified on two large gene expression studies: diffuse large B-cell lymphoma (DLBCL) and acute lymphoblastic leukemia (ALL).

Contact: [email protected]@biozentrum.uni-wuerzburg.de

Supplementary information: Supplementary data is available at Bioinformatics online.

1 INTRODUCTION

Multiple genome-scale data sets nowadays allow to model the cell as an intricate network of molecular interactions. Research in systems biology has changed accordingly, now focusing on network analysis of high-throughput genome-, transcriptome- and proteome data. Reaching beyond the analysis of mere topological questions, integrated network analysis incorporates additional molecular data into a network. For gene expression data integrated approaches are used to search for pathways, functional modules or gene signatures containing differentially expressed genes in the context of gene networks or protein-protein interaction (PPI) networks (Ideker et al., 2002; Scott et al., 2006; Ulitsky and Shamir, 2007; Dittrich et al., 2008). Given the integrated gene expression data, the objective is to find the maximal significantly deregulated (i.e. differentially expressed) set of interconnected genes in the cellular network. We refer to the resulting connected subnetwork as a functional module, which is also denoted as active or perturbed module (Ideker et al., 2002). Please note, that this is in contrast to other fields in biology (e.g. proteomics), where functional modules denote protein complexes (Pu et al., 2007).

Various methods have been proposed to identify functional modules in an integrated network. In this study we focus on the popular approaches proposed by Ideker et al. (2002), Ulitsky and Shamir (2007) and Dittrich et al. (2008). While these algorithms differ in many important aspects, conceptually they all aim at identifying connected subnetworks that contain significantly deregulated genes. Ideker et al. (2002) introduced the problem and proposed a simulated annealing approach to identify subnetworks. Due to the heuristic nature of such sampling approaches, the resulting modules are not optimal in general. In an alternative approach Ulitsky et al. (2010) propose the algorithm DEGAS (DysrEgulated Gene set Analysis via Subnetworks), based on a greedy approximation to identify subnetworks of dysregulated genes. In contrast to the above mentioned approaches, the algorithm of Dittrich et al. (2008) identifies optimally scoring subnetworks using an exact algorithm based on integer linear programming (ILP).

Besides the accuracy of a module identification method, the robustness of obtained solutions is of particular importance. A natural question is: How variable are the provided solutions (given the method)? A highly variable method produces largely differing solutions in different runs or on slightly perturbed input data and is thus less reliable. Clearly, well designed algorithms should ideally show both: high accuracy as well as high robustness. Here we investigate the accuracy and robustness of the three prominent module detection algorithms regarding (i) the integrated gene expression data and (ii) the network structure of the PPI network itself.

As a consequence of the investigation we propose a novel method to calculate accurate as well as robust modules in which robust parts are indicated by support values, introducing the new concept of consensus modules. In phylogeny, Felsenstein (1985) introduced resampling approaches (e.g., bootstrap, jackknife) to define a confidence measure for splits in a phylogenetic tree and to calculate consensus trees. Similarly, resampling procedures can be used to assess the robustness of functional modules in integrated network

© The Author (2012). Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]

Associate Editor: Dr. Trey Ideker

Bioinformatics Advance Access published May 11, 2012

Page 38: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

QUALITY AND ROBUSTNESS

■ How good are the modules?

Page 39: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

QUALITY AND ROBUSTNESS

Noise in high-throughput data

Noise in (some) methods

Noise in networks

■ How good are the modules?

Page 40: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

QUALITY AND ROBUSTNESS

Noise in high-throughput data

Noise in (some) methods

Noise in networks

■ How robust are the modules?

Optimality versus robustness, ideally both

■ → Compare jactiveModules, DEGAS and heinz

■ How good are the modules?

Page 41: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

EVALUATE ROBUSTNESS

■ Experiment

• Take PPI network, simulate module M and 20 replicates of gene expression data

• Task: find M back
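One way to quantify how well a method finds M back is to compare node sets. The sketch below uses precision, recall and F1, which are common choices but an assumption here: the slide does not say which measure was used, and the gene names are made up.

```python
def recovery_scores(true_module, found_module):
    """Precision, recall and F1 of a found node set against the simulated module M."""
    true_nodes, found_nodes = set(true_module), set(found_module)
    overlap = len(true_nodes & found_nodes)
    precision = overlap / len(found_nodes) if found_nodes else 0.0
    recall = overlap / len(true_nodes) if true_nodes else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


# Hypothetical simulated module M and one method's result on one replicate.
M = {"g1", "g2", "g3", "g4"}
found = {"g3", "g4", "g5"}
print(recovery_scores(M, found))
```

Repeating this over the 20 replicates gives both an average quality and a spread, which is the robustness question the next slides address.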

Page 43: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

IDEA: RESAMPLING

Page 44: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

CONSENSUS NETWORK

■ J jackknife replicates: Modules (V_i, E_i), i = 1, ..., J

■ Re-score nodes and edges with

\[
S(v) = \left( \sum_{i=1}^{J} |\{v\} \cap V_i| \right) - \rho \quad \forall v \in V
\]

\[
S(e) = \left( \sum_{i=1}^{J} |\{e\} \cap E_i| \right) - \rho \quad \forall e \in E
\]

with \rho \in [0, J], e.g., \rho = J/2

■ This is again an instance of (a variant of) MWCS

■ Solution = consensus module
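The re-scoring step can be sketched in Python as follows. The replicate modules and function name are hypothetical, and edges are treated as undirected. Only nodes and edges seen in at least one replicate are returned; anything never selected would implicitly score −ρ.

```python
from collections import Counter


def consensus_scores(modules, rho=None):
    """Support count over J replicate modules minus threshold rho (default J/2)."""
    J = len(modules)
    if rho is None:
        rho = J / 2
    node_support, edge_support = Counter(), Counter()
    for nodes, edges in modules:
        node_support.update(set(nodes))
        # frozenset makes (u, v) and (v, u) count as the same undirected edge
        edge_support.update({frozenset(e) for e in edges})
    node_scores = {v: n - rho for v, n in node_support.items()}
    edge_scores = {e: n - rho for e, n in edge_support.items()}
    return node_scores, edge_scores


# Four hypothetical jackknife replicates (V_i, E_i).
replicates = [
    ({"a", "b"}, {("a", "b")}),
    ({"a", "b", "c"}, {("a", "b"), ("b", "c")}),
    ({"a"}, set()),
    ({"a", "b"}, {("b", "a")}),
]
node_scores, edge_scores = consensus_scores(replicates)
print(node_scores)
print(edge_scores)
```

With ρ = J/2, elements present in more than half of the replicates get positive scores, so the consensus module favors the parts of the network that are selected consistently.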

Page 45: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

APPLICATION: ALL

• PINA network: 11354 proteins, 68257 interactions

• 359 samples on Affymetrix hgu133plus2 gene chips (54675 probesets, 19738 genes)

Page 46: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

SUMMARY

Active subnetworks hint at (malfunctioning) molecular mechanisms

The Heinz method achieves this by

■ integrating heterogeneous data by aggregation of p-values

■ data-driven, adjustable, additive network score (based on signal/noise decomposition)

■ provable optimality (connection to classical OR problem)

■ Better and faster (!) than heuristics

consensus Heinz: + robustness

Page 47: Robustness and accuracy of functional modules in …...Robustness and accuracy of functional modules in integrated network analysis Gunnar W. Klau Life Sciences Group CWI Amsterdam

CURRENT AND FUTURE WORK

■ prediction

■ application: viral GPCRs, breast cancer (NKI), twins, ...

■ algorithm: get rid of PCST deviation, suboptimal modules, treewidth-based, ...

■ extensions: modules over time, conserved modules

A Critical Evaluation of Network and Pathway-Based Classifiers for Outcome Prediction in Breast Cancer

Christine Staiger1,2*, Sidney Cadot2, Raul Kooter3, Marcus Dittrich4, Tobias Müller4, Gunnar W. Klau1,5*", Lodewyk F. A. Wessels2,3,6*"

1 Centrum Wiskunde & Informatica, Life Sciences Group, The Netherlands, 2 Bioinformatics and Statistics, The Netherlands Cancer Institute, Amsterdam, The Netherlands, 3 Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft, The Netherlands, 4 Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany, 5 Netherlands Institute for Systems Biology, Amsterdam, The Netherlands, 6 Cancer Systems Biology Center, The Netherlands Cancer Institute, Amsterdam, The Netherlands

Abstract

Recently, several classifiers that combine primary tumor data, like gene expression data, and secondary data sources, such as protein-protein interaction networks, have been proposed for predicting outcome in breast cancer. In these approaches, new composite features are typically constructed by aggregating the expression levels of several genes. The secondary data sources are employed to guide this aggregation. Although many studies claim that these approaches improve classification performance over single genes classifiers, the gain in performance is difficult to assess. This stems mainly from the fact that different breast cancer data sets and validation procedures are employed to assess the performance. Here we address these issues by employing a large cohort of six breast cancer data sets as benchmark set and by performing an unbiased evaluation of the classification accuracies of the different approaches. Contrary to previous claims, we find that composite feature classifiers do not outperform simple single genes classifiers. We investigate the effect of (1) the number of selected features; (2) the specific gene set from which features are selected; (3) the size of the training set and (4) the heterogeneity of the data set on the performance of composite feature and single genes classifiers. Strikingly, we find that randomization of secondary data sources, which destroys all biological information in these sources, does not result in a deterioration in performance of composite feature classifiers. Finally, we show that when a proper correction for gene set size is performed, the stability of single genes sets is similar to the stability of composite feature sets. Based on these results there is currently no reason to prefer prognostic classifiers based on composite features over single genes classifiers for predicting outcome in breast cancer.

Citation: Staiger C, Cadot S, Kooter R, Dittrich M, Muller T, et al. (2012) A Critical Evaluation of Network and Pathway-Based Classifiers for Outcome Prediction in Breast Cancer. PLoS ONE 7(4): e34796. doi:10.1371/journal.pone.0034796

Editor: Joaquín Dopazo, Centro de Investigación Príncipe Felipe, Spain

Received October 14, 2011; Accepted March 9, 2012; Published April 27, 2012

Copyright: © 2012 Staiger et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: No current external funding sources for this study.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected] (CS); [email protected] (GWK); [email protected] (LFAW)

" These authors are joint last authors on this work.

Introduction

Modern high-throughput methods provide the means to observe genome wide changes in gene expression patterns in breast cancer samples. Gene expression signatures have been proposed [1,2] to predict prognosis in breast cancer patients, but were shown to vary substantially between data sets. One possible explanation for this effect is that the data sets on which the predictors are trained are typically poorly dimensioned, consisting of many more genes than samples. Integrating secondary data sources like protein-protein interaction (PPI) networks, co-expression networks or pathways from databases such as KEGG, has recently been proposed to overcome variability of prognostic signatures and to increase their prognostic performance [3–7]. Many of these studies claim that combining gene expression data with secondary data sources to construct composite features results in higher accuracy in outcome prediction and higher stability of the obtained signatures. In addition, inclusion of the secondary sources raises the hope that the obtained signatures will be more interpretable and thus provide more insight into the molecular mechanisms governing survival in breast cancer.

The underlying idea of these methods is that genes do not act in isolation, and that complex diseases such as cancer are actually caused by the deregulation of complete processes or pathways, representing ‘hallmarks of cancer’ [8]. This is unlikely to happen due to an aberration in a single gene, and often multiple genes need to be perturbed to disable a process. This leads to the notion that aggregating gene expression of functionally linked genes smooths out noise and provides more power to detect deregulation of complete functional units and hence to obtain a clearer picture of the biological process underlying tumorigenesis and disease outcome.

The observed improvement in classification accuracy achieved by the approaches employing secondary data is hard to assess since it is dependent on many factors such as the specific data sets and evaluation protocol employed. To shed more light on this issue we performed an extensive comparison of a simple, single genes based classifier with three of the most popular approaches that include secondary data sources in the construction of the classifier. More

PLoS ONE | www.plosone.org 1 April 2012 | Volume 7 | Issue 4 | e34796