Association Mining via Co-clustering of Sparse Matrices
Brian Thompson*, Linda Ness†, David Shallcross†, Devasis Bassu†
* †
Definitions
Let $M$ be an $m \times n$ matrix. A bicluster of $M$ is a subset of matrix entries formed by the intersection of a set of rows $I$ and a set of columns $J$, and is denoted by $M_{I,J}$.
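As a minimal sketch of this definition, the snippet below stores a sparse binary matrix as the set of its nonzero positions and picks out the bicluster for a given row set and column set. The helper names `bicluster_entries` and `density`, the example matrix, and the reading of "density" as the fraction of nonzero entries are illustrative assumptions, not taken from the talk.

```python
def bicluster_entries(nonzeros, rows, cols):
    """Nonzero entries that fall in the intersection of `rows` and `cols`."""
    rows, cols = set(rows), set(cols)
    return {(i, j) for (i, j) in nonzeros if i in rows and j in cols}


def density(nonzeros, rows, cols):
    """Fraction of the bicluster's entries that are nonzero (assumed definition)."""
    size = len(set(rows)) * len(set(cols))
    return len(bicluster_entries(nonzeros, rows, cols)) / size


# Hypothetical sparse binary 4 x 5 matrix, stored as its nonzero positions.
M = {(0, 0), (0, 1), (1, 0), (1, 1), (1, 4), (2, 2), (2, 3), (3, 2)}

print(density(M, [0, 1], [0, 1]))  # fully dense block: 1.0
print(density(M, [2, 3], [2, 3]))  # three of four entries set: 0.75
print(density(M, [0, 1], [2, 3]))  # empty region: 0.0
```

Storing only the nonzero positions keeps the cost proportional to the number of nonzeros, which is the natural choice when the matrix is sparse.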
Motivation
Matrices can represent: binary relations, objects and attributes, terms and documents, gene expression, recommender systems, ...
Dense biclusters indicate strong associations
Co-Clustering
Co-clustering: Given a matrix, cluster the rows and columns to form large, dense biclusters
Challenges:
• Don’t know the number or sizes of clusters a priori
• Want solution to be efficient and scalable
• Matrix may be sparse
[Figure: matrix with row clusters R1, R2, R3 and column clusters C1, C2, C3]
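To make the idea of a co-clustering concrete, here is a small sketch: a row clustering and a column clustering together split the matrix into a grid of biclusters, and each one can be scored by its density. The function name `block_densities`, the label dictionaries, and the example data are hypothetical, and density is again assumed to mean the fraction of nonzero entries.

```python
from collections import Counter


def block_densities(nonzeros, row_label, col_label):
    """Density of every bicluster induced by a row and a column clustering."""
    row_sizes = Counter(row_label.values())
    col_sizes = Counter(col_label.values())
    # Count the nonzeros that land in each (row cluster, column cluster) pair.
    counts = Counter((row_label[i], col_label[j]) for (i, j) in nonzeros)
    return {
        (r, c): counts[(r, c)] / (row_sizes[r] * col_sizes[c])
        for r in row_sizes
        for c in col_sizes
    }


# Hypothetical sparse binary matrix and cluster assignments.
M = {(0, 0), (0, 1), (1, 0), (1, 1), (1, 4), (2, 2), (2, 3), (3, 2)}
row_label = {0: "R1", 1: "R1", 2: "R2", 3: "R2"}
col_label = {0: "C1", 1: "C1", 2: "C2", 3: "C2", 4: "C3"}

for block, d in sorted(block_densities(M, row_label, col_label).items()):
    print(block, d)
```

A good co-clustering in the sense described above would concentrate the nonzeros in a few large blocks whose density is close to 1.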
Our Approach
We propose a two-step approach:
1. Define a quality metric for bicluster partitions
We consider metrics of the form
(Motivation for this choice is in the 15-minute version of the talk...)
2. Find a co-clustering that maximizes the value of the metric
We propose the CC-MACS algorithm
(Co-Clustering via Maximal Anti-Chain Search)
The CC-MACS Algorithm
1. Build randomized k-d trees on rows (), cols ()
2. Populate for via DP
3. Initialize MACs and heaps ;
4. While at least one of and is non-empty:
• WLOG let
• Update data structures and variables:, , , for
• If , add to
5. Return co-clustering formed by
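The following sketch illustrates only the notion of a maximal anti-chain in a tree, not the CC-MACS search itself. In a rooted tree ordered by ancestry, a set of nodes is a maximal anti-chain exactly when every root-to-leaf path passes through one and only one chosen node, so the chosen nodes partition the leaves into clusters. The `Node` class, the function names, and the example tree are hypothetical.

```python
class Node:
    def __init__(self, name, children=(), items=()):
        self.name = name
        self.children = list(children)
        # A leaf carries row (or column) indices; an internal node
        # covers everything below it.
        if self.children:
            self.items = [x for child in self.children for x in child.items]
        else:
            self.items = list(items)


def is_maximal_antichain(root, chosen):
    """True if every root-to-leaf path contains exactly one chosen node."""
    def walk(node, hits):
        hits += node.name in chosen
        if not node.children:
            return hits == 1
        return all(walk(child, hits) for child in node.children)
    return walk(root, 0)


def clusters_from_antichain(root, chosen):
    """One cluster of leaf items per chosen node, in left-to-right order."""
    found = []

    def walk(node):
        if node.name in chosen:
            found.append(sorted(node.items))
        else:
            for child in node.children:
                walk(child)

    walk(root)
    return found


# Hypothetical tree over six rows.
a = Node("a", items=[0, 1])
b = Node("b", items=[2])
c = Node("c", items=[3, 4])
d = Node("d", items=[5])
left = Node("left", [a, b])
right = Node("right", [c, d])
root = Node("root", [left, right])

chosen = {"left", "c", "d"}
print(is_maximal_antichain(root, chosen))     # True
print(clusters_from_antichain(root, chosen))  # [[0, 1, 2], [3, 4], [5]]
```

Choosing one such set of nodes in the row tree and one in the column tree yields a row clustering and a column clustering, which together define a co-clustering.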
Experiments: Synthetic Data
• Generate matrix with biclusters of size selected randomly from ; non-bicluster entries are , each bicluster entry is a with probability
• Want co-clustering output to match ground truth
• Compare via -score:
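A planted-bicluster generator of this kind might be sketched as follows. Everything specific here is an assumption made for illustration: the function name `planted_biclusters`, the parameters `k`, `size_range` and `p`, the use of 0 for background entries and 1 for bicluster entries, and the layout of disjoint blocks along the diagonal (chosen purely for simplicity).

```python
import random


def planted_biclusters(k, size_range, p, seed=0):
    """Generate a sparse binary matrix with k planted biclusters.

    Returns (nonzeros, blocks, shape). Entries outside every block stay
    zero; each entry inside a block is set to one with probability p.
    """
    rng = random.Random(seed)
    lo, hi = size_range
    nonzeros, blocks = set(), []
    row_start = col_start = 0
    for _ in range(k):
        height = rng.randint(lo, hi)
        width = rng.randint(lo, hi)
        rows = range(row_start, row_start + height)
        cols = range(col_start, col_start + width)
        blocks.append((rows, cols))
        for i in rows:
            for j in cols:
                if rng.random() < p:
                    nonzeros.add((i, j))
        row_start += height
        col_start += width
    return nonzeros, blocks, (row_start, col_start)


nonzeros, blocks, shape = planted_biclusters(k=3, size_range=(4, 8), p=0.7)
print(shape, len(nonzeros))
```

The returned `blocks` list serves as the ground truth against which a co-clustering output can be compared.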
Experiments: Real-World Data
• Matrices from domains of finite element modeling and quantum chemistry [src: NIST Matrix Market repository]
[Table of figures, columns: Dataset; Original Matrix; Cross-Association; CC-MACS (); CC-MACS (); CC-MACS ()]
Concluding Thoughts
• The CC-MACS algorithm runs in time.
• Our approach compared favorably to state-of-the-art and baseline methods for a classification task on synthetic data.
• Choice of metric can affect quality and granularity of results; different metrics may be appropriate for different applications.
• The CC-MACS algorithm effectively identified large, dense biclusters in the datasets evaluated.
Acknowledgements/Disclaimer
This research was supported by the Intelligence Advanced Research Projects Activity (IARPA) via Air Force Research Laboratory (AFRL) contract number FA8650-10-C-706. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, AFRL, or the U.S. Government.
Any misinformation, mistakes, or misunderstanding resulting from this talk are solely the fault of the speaker.
Questions?
Example Matrices
Spectral methods, which try to rearrange rows and columns to form a diagonal block matrix, would not perform well on this matrix.
The dashed lines suggest a good co-clustering.