Joint analysis of regulatory networks and expression profiles

Ron Shamir, School of Computer Science, Tel Aviv University. April 2013.


Sources:
Igor Ulitsky and Ron Shamir. Identification of Functional Modules using Network Topology and High-Throughput Data. BMC Systems Biology 1:8 (2007).
Igor Ulitsky and Ron Shamir. Identifying functional modules using expression profiles and confidence-scored protein interactions. Bioinformatics 25(9):1158-1164 (2009).

Outline
  Background
  Joint network and expression profiles
  MATISSE
  CEZANNE

Background

DNA -> RNA -> protein (transcription, then translation)
  DNA: the hard disk
  RNA: one program
  Protein: its output

DNA Microarrays / RNA-seq
  Simultaneous measurement of expression levels of all genes / transcripts
  Perform $10^5$-$10^9$ measurements in one experiment
  Allow a global view of cellular processes
  Among the most important biotechnological breakthroughs of the last / current decade

Figure source: http://www.biomedcentral.com/1471-2105/12/323/figure/F2

The Raw Data
  Matrix axes: genes, experiments
  Entries of the Raw Data matrix: expression levels (ratios / absolute values / ...)
  Expression pattern for each gene
  Profile for each experiment / condition / sample / chip
  Needs normalization!

EXPANDER: EXPression ANalyzer and DisplayER
http://acgt.cs.tau.ac.il/expander
  Clustering: identify clusters of co-expressed genes (CLICK, KMeans, SOM, hierarchical)
  Functional enrichment: GO, TANGO
  Visualization
  Promoter analysis: analyze TF binding sites of co-regulated genes (PRIMA)
  Biclustering: identify homogeneous submatrices (SAMBA)
  microRNA function inference: FAME
A. Maron, R. Sharan Bioinformatics 03
A. Maron-Katz, A. Tanay, C. Linhart, I. Steinfeld, R. Sharan, Y. Shiloh, R. Elkon BMC Bioinformatics 05

Ulitsky et al. Nature Protocols 10

Networks of protein-protein interactions (PPIs)
  Large, readily available resource
  Representation: network with nodes = proteins/genes, edges = interactions
  Analysis methods:
    Global properties
    Motif content analysis
    Complex extraction
    Cross-species comparison
  The hairball syndrome

Potential inroad into pathways and function
Can the network help to improve the analysis?

Analysis of gene expression profiles + a network

Goal
  Challenge: detect active functional modules, i.e. connected subnetworks of proteins whose genes are co-expressed
  Where is the action in the network in a particular experiment?
Ron Shamir, RNA Antalia, April 08

MATISSE: Modular Analysis for Topology of Interactions and Similarity SEts
Ulitsky & Shamir, BMC Systems Biology 07
http://acgt.cs.tau.ac.il/matisse
  Input: expression data and a PPI network
  Output: a collection of modules
    Connected PPI subnetworks
    Correlated expression profiles
  Figure legend: Interaction; High expression similarity
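To make the "high expression similarity" input concrete, here is a minimal Python sketch that turns expression profiles into pairwise similarity values. Pearson correlation is an assumption chosen for illustration (the slides only say that any similarity data can be used), and the names `pearson` and `similarity_matrix` are hypothetical, not part of MATISSE.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant profile carries no similarity signal.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def similarity_matrix(profiles):
    """Map each sorted gene pair (g, h) to the similarity of their profiles.

    `profiles` maps a gene name to its list of expression values.
    """
    genes = sorted(profiles)
    return {
        (g, h): pearson(profiles[g], profiles[h])
        for idx, g in enumerate(genes)
        for h in genes[idx + 1:]
    }
```

For example, `similarity_matrix({"a": [1, 2, 3], "b": [2, 4, 6]})` yields a single entry for the pair `("a", "b")` whose value is the correlation of the two toy profiles.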

Probabilistic model
  Event $M_{ij}$: genes $i, j$ are mates = highly co-expressed
  $P(S_{ij} \mid M_{ij}) \sim N(\mu_m, \sigma_m^2)$
  $P(S_{ij} \mid \neg M_{ij}) \sim N(\mu_n, \sigma_n^2)$
  $H_0$: $U$ is a set of unrelated genes
  $H_1$: $U$ is a module = connected subnetwork with high internal similarity
  $R_i$: gene $i$ is transcriptionally regulated
  $\beta_m$: fraction of mates out of module gene pairs that are transcriptionally regulated, $\beta_m = P(M_{ij} \mid R_i \wedge R_j, H_1)$
  $p_m$: fraction of mates out of all gene pairs that are transcriptionally regulated

Probabilistic model (2)
  Is a connected gene set $U$ a module? Assuming pair independence:
  Define $m_{ij} = \beta_m P(R_i) P(R_j)$
  Define $n_{ij} = p_m P(R_i) P(R_j)$
  Likelihood ratio: $\Pr(\text{Data} \mid H_1) / \Pr(\text{Data} \mid H_0)$
  Taking log: a sum of per-pair terms over $i, j$
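The formulas on this slide did not survive in the transcript. One form consistent with the definitions above, assuming that $m_{ij}$ is the probability that the pair $i, j$ are mates under $H_1$ and $n_{ij}$ the same probability under $H_0$, would be the following. The symbol $w_{ij}$ is introduced here only for illustration.

```latex
\frac{\Pr(\text{Data} \mid H_1)}{\Pr(\text{Data} \mid H_0)}
  = \prod_{\{i,j\} \subseteq U}
    \frac{m_{ij}\, P(S_{ij} \mid M_{ij}) + (1 - m_{ij})\, P(S_{ij} \mid \neg M_{ij})}
         {n_{ij}\, P(S_{ij} \mid M_{ij}) + (1 - n_{ij})\, P(S_{ij} \mid \neg M_{ij})}

\log \frac{\Pr(\text{Data} \mid H_1)}{\Pr(\text{Data} \mid H_0)}
  = \sum_{\{i,j\} \subseteq U} w_{ij},
\qquad
w_{ij} = \log
    \frac{m_{ij}\, P(S_{ij} \mid M_{ij}) + (1 - m_{ij})\, P(S_{ij} \mid \neg M_{ij})}
         {n_{ij}\, P(S_{ij} \mid M_{ij}) + (1 - n_{ij})\, P(S_{ij} \mid \neg M_{ij})}
```

The product becomes a sum because the pairs are assumed independent, so each gene pair contributes one additive weight to the score of $U$.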

Probabilistic model - summary
  Similarities: mixture of two Gaussians
  For a candidate group $U$, compute the likelihood ratio of originating from a module versus from the background
  Module score = gene group log-likelihood ratio = sum over all the gene pairs
  Find connected subgraphs $U$ with high $W_U$
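The scoring idea summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Gaussian parameters, the mate fractions `beta_m` and `p_m`, and the function names are all hypothetical placeholders, and the per-pair term assumes the mixture form implied by the definitions of $m_{ij}$ and $n_{ij}$.

```python
import math
from itertools import combinations


def normal_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def pair_weight(s_ij, m_ij, n_ij, mates=(0.8, 0.1), non_mates=(0.0, 0.3)):
    """Log-likelihood ratio contributed by one gene pair.

    `mates` and `non_mates` are (mean, std) of the two Gaussians;
    the default values are illustrative assumptions only.
    """
    f_mates = normal_pdf(s_ij, *mates)
    f_non = normal_pdf(s_ij, *non_mates)
    under_h1 = m_ij * f_mates + (1.0 - m_ij) * f_non
    under_h0 = n_ij * f_mates + (1.0 - n_ij) * f_non
    return math.log(under_h1 / under_h0)


def module_score(genes, similarity, p_regulated, beta_m=0.9, p_m=0.1):
    """Sum of pair weights over all gene pairs in the candidate set.

    `similarity` maps sorted gene pairs (g, h) to S_gh and
    `p_regulated` maps each gene to P(R_g).
    """
    total = 0.0
    for g, h in combinations(sorted(genes), 2):
        r = p_regulated[g] * p_regulated[h]
        total += pair_weight(similarity[(g, h)], beta_m * r, p_m * r)
    return total
```

With these placeholder parameters, a gene set whose pairwise similarities sit near the mates mean gets a positive score and a set with similarities near zero gets a negative one, which is the behavior the search for high-scoring connected subgraphs relies on.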

Complexity
  Finding the heaviest connected subgraph: NP-hard, even without connectivity constraints (+/- edge weights)
  Devised a heuristic for the problem

MATISSE workflow
  Seed generation
  Greedy optimization
  Significance filtering

Finding seeds
  Three seeding alternatives tested
  All alternatives build a seed and delete it from the network
  Building small seeds around single nodes:
    Best neighbors
    All neighbors
  Approximating the heaviest subgraph:
    Delete low-degree nodes and record the heaviest subnetwork found

Greedy optimization
  Simultaneous optimization of all the seeds
  The following steps are considered:
    Node addition
    Node removal
    Assignment change
    Module merge

Front vs. Back nodes
  Only a fraction of the genes (front nodes) have meaningful similarity values
  MATISSE can link them using other genes (back nodes)
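The greedy optimization step can be illustrated with a simplified Python sketch. It grows a single seed using only node addition and node removal while keeping the module connected. The simultaneous optimization of all seeds, assignment change, and module merge are deliberately left out, so this is not the MATISSE implementation. The function names, the `frozenset`-keyed weight map, and the choice to treat missing pair weights as 0 are assumptions made for the example.

```python
from collections import deque


def is_connected(nodes, adj):
    """Check by BFS that `nodes` induce a connected subgraph of `adj`."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes


def gain(node, module, weight):
    """Total pair weight between `node` and the other module members.

    Pairs without a weight count as 0 (an assumption of this sketch).
    """
    return sum(
        weight.get(frozenset((node, other)), 0.0)
        for other in module
        if other != node
    )


def greedy_module(seed, adj, weight):
    """Improve one seed by node addition and removal until no move helps."""
    module = set(seed)
    improved = True
    while improved:
        improved = False
        # Node addition: best-scoring network neighbor of the module.
        frontier = {v for u in module for v in adj[u]} - module
        best = max(sorted(frontier),
                   key=lambda v: gain(v, module, weight), default=None)
        if best is not None and gain(best, module, weight) > 0:
            module.add(best)
            improved = True
            continue
        # Node removal: drop a member that lowers the score,
        # provided the rest of the module stays connected.
        for u in sorted(module):
            if (len(module) > 1 and gain(u, module, weight) < 0
                    and is_connected(module - {u}, adj)):
                module.remove(u)
                improved = True
                break
    return module
```

Every accepted move strictly increases the summed pair weight of the module, so the loop terminates. Because the search is greedy, it can stop at a local optimum, which is expected for a heuristic on an NP-hard problem.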

Back nodes correspond to:
  Unmeasured transcripts
  Post-translational regulation
  Partially regulated pathways

Advantages of MATISSE
  No p-values needed for measurements
  Works when only a fraction of the genes' expression patterns are informative
  Can handle any similarity data
  No prespecified number of modules

Test case: Yeast osmotic shock
  Network: 65,990 PPIs and protein-DNA interactions among 6,246 genes
  Expression: 133 experimental conditions, response of perturbed strains to osmotic shock (O'Rourke & Herskowitz 04)
  Front nodes: 2,000 genes with the highest variance
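Choosing front nodes by variance, as in the test case above, can be sketched in Python as follows. The function name `select_front_nodes` and the use of population variance are assumptions for illustration. The slides do not say how variance was computed or how ties were handled.

```python
from statistics import pvariance


def select_front_nodes(expression, k):
    """Return the k genes whose expression profiles have the highest variance.

    `expression` maps a gene name to its list of expression values
    across conditions. Genes are sorted by name first so that ties
    are broken deterministically.
    """
    ranked = sorted(
        sorted(expression),
        key=lambda gene: pvariance(expression[gene]),
        reverse=True,
    )
    return set(ranked[:k])
```

In the setting of the slide, `k` would be 2,000 and the remaining genes in the network would be treated as back nodes.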

Pheromone response subnetwork
  Figure legend: Back, Front

Performance comparison

% of modules with category enrichment at $p < 10^{-3}$

% annotations enriched at p