Inference of Directed Acyclic Graphs Using Spectral Clustering · Functional similarity: gene...
Transcript of Inference of Directed Acyclic Graphs Using Spectral Clustering · Functional similarity: gene...
![Page 1: Inference of Directed Acyclic Graphs Using Spectral Clustering · Functional similarity: gene expression Physical similarity . Current Method Bottom-up algorithm using maximal cliques](https://reader035.fdocuments.us/reader035/viewer/2022081614/5fd02c819f857867ae4427f1/html5/thumbnails/1.jpg)
Inference of Directed Acyclic Graphs Using Spectral Clustering
Allison Paul
Fifth Annual MIT PRIMES Conference
May 17, 2015
![Page 2: Inference of Directed Acyclic Graphs Using Spectral Clustering · Functional similarity: gene expression Physical similarity . Current Method Bottom-up algorithm using maximal cliques](https://reader035.fdocuments.us/reader035/viewer/2022081614/5fd02c819f857867ae4427f1/html5/thumbnails/2.jpg)
Genes A and B are involved in the same process
Gene A
Gene B
Introduction
![Page 3: Inference of Directed Acyclic Graphs Using Spectral Clustering · Functional similarity: gene expression Physical similarity . Current Method Bottom-up algorithm using maximal cliques](https://reader035.fdocuments.us/reader035/viewer/2022081614/5fd02c819f857867ae4427f1/html5/thumbnails/3.jpg)
Gene Ontology (GO)
Gene Ontology Terms
Genes
Examples of Gene Ontology Terms: oxygen binding, response to x-ray, sympathetic nervous system development
This type of network is a directed acyclic graph (DAG)
![Page 4: Inference of Directed Acyclic Graphs Using Spectral Clustering · Functional similarity: gene expression Physical similarity . Current Method Bottom-up algorithm using maximal cliques](https://reader035.fdocuments.us/reader035/viewer/2022081614/5fd02c819f857867ae4427f1/html5/thumbnails/4.jpg)
Problem Statement: Given a gene similarity matrix, find the directed acyclic graph Inferring such a graph using a gene similarity matrix is NP-hard in general.
Goal: Infer this graph using gene similarities What is gene similarity?
Functional similarity: gene expression Physical similarity
![Page 5: Inference of Directed Acyclic Graphs Using Spectral Clustering · Functional similarity: gene expression Physical similarity . Current Method Bottom-up algorithm using maximal cliques](https://reader035.fdocuments.us/reader035/viewer/2022081614/5fd02c819f857867ae4427f1/html5/thumbnails/5.jpg)
Current Method
Bottom-up algorithm using maximal cliques (Kramer et al. 2014)
Computational complexity:
Clique: a subset of nodes in which each pair of nodes is connected by an edge
![Page 6: Inference of Directed Acyclic Graphs Using Spectral Clustering · Functional similarity: gene expression Physical similarity . Current Method Bottom-up algorithm using maximal cliques](https://reader035.fdocuments.us/reader035/viewer/2022081614/5fd02c819f857867ae4427f1/html5/thumbnails/6.jpg)
We propose an approximate algorithm that finds quasi-cliques among the genes
Our Approach
Top-Down Algorithm: we infer nodes at layer l using nodes at layer l - 1
Layer 0
Layer 1
Layer 2
Layer 3
![Page 7: Inference of Directed Acyclic Graphs Using Spectral Clustering · Functional similarity: gene expression Physical similarity . Current Method Bottom-up algorithm using maximal cliques](https://reader035.fdocuments.us/reader035/viewer/2022081614/5fd02c819f857867ae4427f1/html5/thumbnails/7.jpg)
Second Eigenvector
Firs
t Ei
gen
vect
or
Spectral Clustering We analyze the top k-1 eigenvectors of the similarity matrix
![Page 8: Inference of Directed Acyclic Graphs Using Spectral Clustering · Functional similarity: gene expression Physical similarity . Current Method Bottom-up algorithm using maximal cliques](https://reader035.fdocuments.us/reader035/viewer/2022081614/5fd02c819f857867ae4427f1/html5/thumbnails/8.jpg)
K-Means Algorithm
Greedy algorithm that identifies clusters among points in Rn
![Page 9: Inference of Directed Acyclic Graphs Using Spectral Clustering · Functional similarity: gene expression Physical similarity . Current Method Bottom-up algorithm using maximal cliques](https://reader035.fdocuments.us/reader035/viewer/2022081614/5fd02c819f857867ae4427f1/html5/thumbnails/9.jpg)
The original problem can be thus simplified to the inference problem of overlapping clusters in a network.
Overlapping Clusters
![Page 10: Inference of Directed Acyclic Graphs Using Spectral Clustering · Functional similarity: gene expression Physical similarity . Current Method Bottom-up algorithm using maximal cliques](https://reader035.fdocuments.us/reader035/viewer/2022081614/5fd02c819f857867ae4427f1/html5/thumbnails/10.jpg)
Use spectral clustering methods to partition network into k clusters
Spectral Clustering
![Page 11: Inference of Directed Acyclic Graphs Using Spectral Clustering · Functional similarity: gene expression Physical similarity . Current Method Bottom-up algorithm using maximal cliques](https://reader035.fdocuments.us/reader035/viewer/2022081614/5fd02c819f857867ae4427f1/html5/thumbnails/11.jpg)
Metric for combining clusters
W(CA, CB) = density(CAUCB) –
average(density(CA), density(CB))
W(C1, C2) = - 0.03 W(C1, C3) = - 0.2
![Page 12: Inference of Directed Acyclic Graphs Using Spectral Clustering · Functional similarity: gene expression Physical similarity . Current Method Bottom-up algorithm using maximal cliques](https://reader035.fdocuments.us/reader035/viewer/2022081614/5fd02c819f857867ae4427f1/html5/thumbnails/12.jpg)
Cluster Similarity Matrix
1 2 3 4 5 6 7 8 9
1 0 -.02 -.172 -.20 -.082 -.273 -.122 -.321 -.273
2 -.02 0 -.031 -.019 -.091 -.304 -.14 -.102 -.177
3 -.172 -.031 0 -.041 -.155 -.203 -.37 -.088 -.209
4 -.20 -.019 -.041 0 -.027 -.012 -.221 -.298 -.078
5 -.082 -.091 -.155 -.027 0 -.034 -.098 -.120 -.192
6 -.273 -.304 -.203 -.012 -.034 0 -.017 -.038 -.232
7 -.122 -.14 -.37 -.221 -.098 -.017 0 -.044 -.311
8 -.321 -.102 -.088 -.298 -.120 -.038 -.044 0 -.029
9 -.273 -.177 -.209 -.078 -.192 -.232 -.311 -.029 0
> threshold
Mi,j = W(Ci, Cj)
![Page 13: Inference of Directed Acyclic Graphs Using Spectral Clustering · Functional similarity: gene expression Physical similarity . Current Method Bottom-up algorithm using maximal cliques](https://reader035.fdocuments.us/reader035/viewer/2022081614/5fd02c819f857867ae4427f1/html5/thumbnails/13.jpg)
Finding Maximal Cliques
We are left with the same problem as before: identifying overlapping clusters.
Except, we have greatly reduced the dimension of the problem!
![Page 14: Inference of Directed Acyclic Graphs Using Spectral Clustering · Functional similarity: gene expression Physical similarity . Current Method Bottom-up algorithm using maximal cliques](https://reader035.fdocuments.us/reader035/viewer/2022081614/5fd02c819f857867ae4427f1/html5/thumbnails/14.jpg)
Use the maximal cliques to combine clusters
![Page 15: Inference of Directed Acyclic Graphs Using Spectral Clustering · Functional similarity: gene expression Physical similarity . Current Method Bottom-up algorithm using maximal cliques](https://reader035.fdocuments.us/reader035/viewer/2022081614/5fd02c819f857867ae4427f1/html5/thumbnails/15.jpg)
Average density of clusters vs. number of clusters (k = 1,2,…,10)
![Page 16: Inference of Directed Acyclic Graphs Using Spectral Clustering · Functional similarity: gene expression Physical similarity . Current Method Bottom-up algorithm using maximal cliques](https://reader035.fdocuments.us/reader035/viewer/2022081614/5fd02c819f857867ae4427f1/html5/thumbnails/16.jpg)
The clusters found using the algorithm correspond to the GO terms in the DAG
1-200 200- 300
300- 400
400- 500
800-900
900-1100 700-800
600-700
500-600
Genes:
![Page 17: Inference of Directed Acyclic Graphs Using Spectral Clustering · Functional similarity: gene expression Physical similarity . Current Method Bottom-up algorithm using maximal cliques](https://reader035.fdocuments.us/reader035/viewer/2022081614/5fd02c819f857867ae4427f1/html5/thumbnails/17.jpg)
Next Steps
Applying this algorithm successively to a real gene similarity matrix to infer the entire DAG
![Page 18: Inference of Directed Acyclic Graphs Using Spectral Clustering · Functional similarity: gene expression Physical similarity . Current Method Bottom-up algorithm using maximal cliques](https://reader035.fdocuments.us/reader035/viewer/2022081614/5fd02c819f857867ae4427f1/html5/thumbnails/18.jpg)
Acknowledgements
I would like to thank my mentor, Soheil Feizi, for all his help!
Also, thank you PRIMES for this great experience!