Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering...
Transcript of Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering...
![Page 1: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/1.jpg)
BiclusteringBiclustering Algorithms for Algorithms for Biological Data AnalysisBiological Data Analysis
Sara C. Madeiraand Arlindo L. Oliveira
Presentation byMatthew Hibbs
![Page 2: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/2.jpg)
2
What is Biclustering?
• Given an n x m matrix, A, find a set of submatrices, Bk, such that the contents of each Bk follow a desired pattern
• Row/Column order need not be consistent between different Bks
![Page 3: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/3.jpg)
3
Why Bicluster?
• Genes not regulated under all conditions• Genes regulated by multiple
factors/processes concurrently
• Key to determine function of genes• Key to determine classification of
conditions
![Page 4: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/4.jpg)
4
Simple “Bicluster”
Unclustered ClusteredDataset from Garber et al.
![Page 5: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/5.jpg)
5
Formal Definitions
• A contains rows X and columns Y• X = {x1, …, xn} Y = {y1, …, yn}• I ⊆ X, J ⊆ Y, AIJ = (I,J) = submatrix• I = {i1, …, ik} J = {j1, …, js}
![Page 6: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/6.jpg)
6
Bipartite Graph
• Matrix can be thought of as a Graph• Rows are one set of vertices L, Columns
are another set R• Edges are weighted by the corresponding
entries in the matrix• If all weights are binary, biclustering
becomes biclique finding
![Page 7: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/7.jpg)
7
NP-complete
GenesConditions
![Page 8: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/8.jpg)
8
NP-complete
GenesConditions
![Page 9: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/9.jpg)
9
Ordering an Algorithm
• Bicluster Type• Bicluster Structure• Algorithmic Approach
![Page 10: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/10.jpg)
10
Ordering an Algorithm
• Bicluster Type• Bicluster Structure• Algorithmic Approach
![Page 11: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/11.jpg)
11
Bicluster Types
• Constant values• Constant values on rows or columns• Coherent values• Coherent evolutions
![Page 12: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/12.jpg)
12
Constant ValuesPerfect: aij = µ for all i ∈ I and j ∈ JBlock Clustering, minimize variance:
![Page 13: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/13.jpg)
13
Constant Values on Rows/Cols
Perfect constant Rows: Perfect constant Columns:
![Page 14: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/14.jpg)
14
Constant Values on Rows/Cols
• Normalize rows/cols → constant value• Noise makes perfect biclusters rare
– δ-valid ks-patterns (Califano et al.)
![Page 15: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/15.jpg)
15
Coherent valuesPerfect additive model: Perfect multiplicative model:
![Page 16: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/16.jpg)
16
Coherent valuesPerfect δ-bicluster:
(Cheng & Church) Residue:
Mean squared residue:
![Page 17: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/17.jpg)
17
Overlapping Coherent ValuesPlaid Model or General Additive Model (Lazzeroni and Owen)
=
![Page 18: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/18.jpg)
18
Overlapping Coherent ValuesGaussian Distribution with variance depending on
row, column, and bicluster variance
![Page 19: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/19.jpg)
19
Coherent Evolutions• Order-Preserving Submatrix (OPSM, Ben-Dor et al.)• Order-Preserving Cluster (OP-Cluster, Liu and Wang)• Rank-based approaches
![Page 20: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/20.jpg)
20
Coherent Evolutions• Quantized expression levels into states• Maximize conserved rows/cols (Murali and Kasif)
![Page 21: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/21.jpg)
21
Coherent Evolutions• Identify “significant conditions” with respect to normal levels• Construct a bipartite graph containing only these edges• bicluster is biclique (simple SAMBA, Tanay et al.)
![Page 22: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/22.jpg)
22
Coherent Evolutions• Label “significant conditions” as up or down regulated• Select columns that have the same or opposite effect• (refined SAMBA, Tanay et al.)
![Page 23: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/23.jpg)
23
Ordering an Algorithm
• Bicluster Type• Bicluster Structure• Algorithmic Approach
![Page 24: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/24.jpg)
24
Bicluster Structure
Single Bicluster
![Page 25: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/25.jpg)
25
Bicluster Structure
Exclusive row and column bicluster
![Page 26: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/26.jpg)
26
Bicluster Structure
Non-overlapping checkerboard biclusters
![Page 27: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/27.jpg)
27
Bicluster Structure
Exclusive-rows biclusters
![Page 28: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/28.jpg)
28
Bicluster Structure
Exclusive-columns biclusters
![Page 29: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/29.jpg)
29
Bicluster Structure
Non-overlapping tree structured biclusters
![Page 30: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/30.jpg)
30
Bicluster Structure
Non-overlapping non-exclusive biclusters
![Page 31: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/31.jpg)
31
Bicluster Structure
Overlapping hierarchically-structured biclusters
![Page 32: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/32.jpg)
32
Bicluster Structure
Arbitrary overlapping biclusters
![Page 33: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/33.jpg)
33
Ordering an Algorithm
• Bicluster Type• Bicluster Structure• Algorithmic Approach
![Page 34: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/34.jpg)
34
Algorithmic Approaches
• Iterative row and column clustering combo• Divide and conquer• Greedy iterative search• Exhaustive bicluster enumeration• Distribution parameter identification
![Page 35: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/35.jpg)
35
Iterative Row and Column
• Coupled Two-Way Clustering (CTWC)– Keep sets of row and column clusters– Start with all rows and all columns– Hierarchically cluster based on pairs of
row/column clusters– Identify new “stable” row/column clusters– Repeat until a desired resolution
![Page 36: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/36.jpg)
36
Iterative Row and Column
• Interrelated Two-Way Clustering (ITWC)– Cluster the rows into K groups– Cluster columns into two groups based on
each row group– Combine row and column clusterings– Identify row/column cluster pairs that are very
different from each other– Keep the best 1/3 rows in the heterogeneous
pairs– Repeat
![Page 37: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/37.jpg)
37
Divide and Conquer
• Block Clustering (Hartigan)– Sort by row or column mean– Find best row or column split to reduce “within
block” variance– Continue, alternating row or column splits– Stop when arrive at desired K blocks
• Very fast, but likely to miss good biclustersdue to early splits
![Page 38: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/38.jpg)
38
Greedy Iterative Search
• δ-biclusters (Cheung & Church)– Find biclusters with mean squared residue < δ– Iterative procedure
• Remove the row/col that reduces H the most• Add rows/cols that do not increase H
– Stop when H < δ– Mask bicluster with random values– Repeat to find next bicluster
![Page 39: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/39.jpg)
39
Exhaustive Bicluster Enumeration
• Statistical-Algorithmic Method for BiclusterAnalysis (SAMBA, Tanay et al.)– Conversion to bipartite graph– Equivalent to selection of heaviest subgraphs
• Assumes rows have d-bounded degree– Report the K heaviest bicliques
![Page 40: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/40.jpg)
40
Distribution Parameter Identification
• Plaid Model (Lazzeroni and Owen)– Given K-1 biclusters, select the Kth bicluster
that minimizes sum of squared errors
– Sort of like EM, determine θ from ρ and κ, then ρ from θ and κ, then κ from θ and ρ
– “Fuzzy” membership until end (ρ, κ ∈ [0,1])
![Page 41: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/41.jpg)
41
Ordering an Algorithm
• Bicluster Type• Bicluster Structure• Algorithmic Approach
![Page 42: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/42.jpg)
42
Comparison of Biclustering Algorithms
![Page 43: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/43.jpg)
43
Other Applications
• Information Retrieval– Targeted marketing– Text mining
• Dimensionality Reduction– automatic subspace clustering
• Electoral Data Analysis• much more…
![Page 44: Biclustering Algorithms for Biological Data Analysis€¢ If all weights are binary, biclustering becomes biclique finding. 7 NP-complete Genes Conditions. 8 ... • bicluster is biclique](https://reader034.fdocuments.us/reader034/viewer/2022051801/5ae374fe7f8b9a495c8d3eb7/html5/thumbnails/44.jpg)
44
Conclusions
• Biclustering is useful for BioInformaticsand many other research areas
• NP-Complete• Choice of bicluster type and structure with
an algorithmic approach → Algorithm