Neighboring Feature Clustering


Page 1: Neighboring Feature Clustering

Authors: Z. Wang, W. Zheng, Y. Wang, J. Ford, F. Makedon, J. Pearlman

Presenter: Prof. Fillia Makedon

Dartmouth College

Page 2: What is Neighboring Feature Clustering?

Given an m × n matrix M, where m denotes m samples and n denotes n (ordered) features, the goal is to find an intrinsic partition of the features, based on their characteristics, such that each cluster is a contiguous run of features.

We assume there is a natural ordering of the features that is relevant to the problem being solved.

– E.g., in spectral datasets, such characteristics could be correlations.
– For example, if we decide that features 1 and 10 belong to a cluster, then features 2 to 9 should also belong to that cluster (see the sketch below).
– ZHIFENG: Please improve this slide; provide an intuitive diagram.
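As a minimal sketch of the contiguity constraint (my own illustration, not from the slides): a valid NFC clustering can be described entirely by its cut points, so every cluster is a contiguous index range and can never skip features.

```python
def clusters_from_cuts(n, cuts):
    """n ordered features, cuts = sorted interior boundaries.
    Every cluster is a contiguous index range, never a scattered subset."""
    edges = [0] + list(cuts) + [n]
    return [list(range(a, b)) for a, b in zip(edges, edges[1:])]

# Features 0-2, 3-9, and 10-11 each form one cluster; a cluster that
# contained features 1 and 10 would have to contain 2..9 as well.
print(clusters_from_cuts(12, [3, 10]))
# [[0, 1, 2], [3, 4, 5, 6, 7, 8, 9], [10, 11]]
```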

Page 3: MR Spectral Features and DNA Copy Number???

MR spectral features are highly redundant, suggesting that the data lie in some low-dimensional space. (ZHIFENG: What do you mean by low-dimensional space? Clarify.)

Neighboring spectral features of MR spectra are highly correlated.

Using NFC, we can partition the features into clusters. A cluster can then be represented by a single feature, hence reducing the dimensionality. This idea can also be applied to DNA copy number analysis.

Zhifeng: Yuhang said these two are not related! Please explain how they are related.

Page 4: Using the MDL Method to Solve NFC

Reduce NFC to a one-dimensional piecewise linear approximation problem: given a sequence of n one-dimensional points ⟨x1, ..., xn⟩, find the optimal step-function-like line segments that can be fitted to the points.

Fig. 1 Piecewise linear approximation [3][4] is usually 2D; here we use the concept in a 1D setting.

We use the minimum description length (MDL) method [2] to solve this reduced problem.

Zhifeng: define and explain MDL.

Page 5: Minimum Description Length (MDL)

Zhifeng: please provide a slide defining MDL, and explain how the transformation is done (as in [1]) to give the 1D piecewise linear approximation.

Represent all the points by two line segments.

There is a trade-off between approximation accuracy and the number of line segments; a compromise can be made using MDL.

Zhifeng: it is all very cryptic; pieces of the explanation are missing!
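To make the trade-off concrete, here is a minimal sketch of MDL-style step-function fitting (my own illustration, not the authors' formulation): each segment pays a fixed number of model bits for its boundary and level plus data bits for its residuals under an assumed Gaussian noise model, and the segmentation with the shortest total code wins. The constants `param_bits` and `sigma` are illustrative.

```python
import numpy as np

def segment_bits(points, sigma=1.0, param_bits=32.0):
    """Code length of one segment: model bits (boundary + level) plus
    data bits for the residuals under N(0, sigma^2)."""
    x = np.asarray(points, dtype=float)
    resid = x - x.mean()                     # the segment's own level
    data_bits = np.sum(0.5 * np.log2(2 * np.pi * sigma**2) +
                       resid**2 / (2 * sigma**2 * np.log(2)))
    return param_bits + data_bits

def one_or_two_segments(points):
    """Compare one segment against the best two-segment split; the
    shorter total description wins (None = keep a single segment)."""
    best_bits, best_split = segment_bits(points), None
    for k in range(1, len(points)):
        bits = segment_bits(points[:k]) + segment_bits(points[k:])
        if bits < best_bits:
            best_bits, best_split = bits, k
    return best_split, best_bits
```

Adding a segment only pays off when the saving in data bits exceeds the extra model bits, which is the compromise the slide refers to.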

Page 6: Outline

The problem
– Spectral data
– The abstract problem

Related work
– HMM based, partial-sum based, maximum-likelihood based

Our approach
– Problem reduced to 1D linear approximation
– MDL approach

Page 7: Reducing NFC to a 1D Piecewise Linear Approximation Problem

Let the correlation coefficient matrix of M be denoted C, and let C* be the strictly upper triangular matrix derived from 1 − |C| (entries near 0 imply high correlation between the corresponding two features).

For features i to j (1 ≤ i ≤ j ≤ n), the submatrix C*_{i:j, i:j} captures their pairwise correlations. We use its entries (excluding the lower-triangular and diagonal entries) as the points to be explained by one line segment in the 1D piecewise linear approximation problem.

The objective is to find the optimal piecewise line segments that fit the points created this way.

Points near 0 mean high correlation. We want to force high correlation within each cluster, so the points are always approximated by the level 0.
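A small sketch of this construction (assuming M is an m × n NumPy array of samples by ordered features; indices here are 0-based):

```python
import numpy as np

def cluster_points(M, i, j):
    """Entries of the strictly upper triangle of 1 - |C| restricted to
    features i..j (inclusive); these are the 1D points that the
    candidate cluster [i, j] must explain with the level 0."""
    C = np.corrcoef(M, rowvar=False)       # n x n feature correlations
    C_star = 1.0 - np.abs(C)               # near 0  <=>  highly correlated
    sub = C_star[i:j + 1, i:j + 1]
    return sub[np.triu_indices_from(sub, k=1)]
```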

Page 8: Example

For example, suppose we have a set of points all around 0.3. In ordinary piecewise linear approximation, it is better to use 0.3 as the approximation level.

In NFC, however, we want to penalize points that stray from 0, so we still use 0 as the approximation (see the numeric sketch below).

Unlike the usual 1D piecewise linear approximation problem, the reduced problem has dynamic points, because they are created on the fly for each candidate cluster.

Zhifeng: provide a figure to illustrate the above example.
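A tiny numeric sketch (illustrative values only): with the level fixed at 0, points near 0.3 pay a much larger data cost than they would under their own mean, which is exactly the penalty that keeps loosely correlated features from being merged.

```python
import numpy as np

pts = np.array([0.28, 0.31, 0.30, 0.29, 0.32])   # points around 0.3
sigma = 0.1                                      # illustrative noise scale

def data_bits(points, level):
    resid = points - level
    return np.sum(0.5 * np.log2(2 * np.pi * sigma**2) +
                  resid**2 / (2 * sigma**2 * np.log(2)))

print(data_bits(pts, pts.mean()))   # cheap: level fitted to the data
print(data_bits(pts, 0.0))          # expensive: NFC forces the level 0
```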

Page 9: Spectral Data

MR spectral data
– High-dimensional data points
– Spectral features are highly redundant (highly correlated)
– Find neighboring features with high correlation in a spectral dataset, such as an MR spectral dataset

Fig. 1 High-dimensional data points (axes: frequency vs. intensity)
Fig. 2 Correlation coefficient matrix (both axes index the features, i.e., the dimensions)

Page 10: Problem

Finding a low-dimensional space
– Zhifeng: define low-dimensional space
– Curse of dimensionality

We extract an abstract problem: Neighboring Feature Clustering (NFC)
– Features are ordered; each cluster contains only neighboring features.
– Find an optimal clustering according to certain criteria.

Page 11: Another Application (with Variation)

Array Comparative Genomic Hybridization (aCGH) to detect copy number alterations.

aCGH data are noisy
– Smoothing
– Segmentation

Fig. 3 aCGH technology
Fig. 4 aCGH data (smoothed). The X axis is the log ratio.
Fig. 5 aCGH data (segmented). The X axis is the log ratio.

Page 12: Related Work

An algorithm addressing a similar problem:
– Baumgartner et al., "Unsupervised feature dimension reduction for classification of MR spectra", Magnetic Resonance Imaging, 22:251-256, 2004.

An extensive literature on the reduced problem:
– Teh et al., "On the detection of dominant points on digital curves", IEEE PAMI, 11(8):859-872, 1989.
– Statistical methods…

Fig. 6 1D piecewise approximation

Page 13: Related Work: Statistical Methods

HMM based
– Fridlyand et al., "Hidden Markov models approach to the analysis of array CGH data", J. Multivariate Anal., 90:132-153, 2004.

Partial sum based
– Lipson et al., "Efficient Calculation of Interval Scores for DNA Copy Number Data Analysis", RECOMB 2005.

Maximum likelihood based
– Picard et al., "A statistical approach for array CGH data analysis", BMC Bioinformatics, 6:27, 2005.

Page 14: Framework of the Proposed Method

Fig. 7 Our approach:
1. Correlation coefficient matrix (computed from the raw spectra; axes of the raw data: frequency vs. intensity)
2. For each pair of features
3. MDL code length (revised)
4. Code length matrix
5. Shortest path (dynamic programming)

The pairwise values form an upper triangular matrix

$$C = \begin{bmatrix} C_{1,1} & C_{1,2} & \cdots & C_{1,n} \\ & C_{2,2} & \cdots & C_{2,n} \\ & & \ddots & \vdots \\ & & & C_{n,n} \end{bmatrix}$$

which can equivalently be drawn as arcs C_{1,2}, C_{2,3}, ..., C_{1,n-1}, C_{2,n-1}, C_{3,n-1}, ..., C_{n-1,n}, C_{2,n}, C_{3,n} over the ordered feature positions 1, 2, 3, ..., n−1, n.
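Steps 1-4 can be sketched compactly as follows (my own illustration under the Gaussian coding assumption used earlier, not the authors' code); step 5, the shortest path, is sketched on Page 17. `sigma` and `param_bits` are illustrative constants.

```python
import numpy as np

def code_length_matrix(M, sigma=0.1, param_bits=32.0):
    """Steps 1-4 of Fig. 7. Returns L, where L[i, j] is the code length
    of making features i..j (0-based, inclusive) a single cluster."""
    C_star = 1.0 - np.abs(np.corrcoef(M, rowvar=False))   # step 1
    n = C_star.shape[0]
    L = np.full((n, n), np.inf)
    for i in range(n):                                     # step 2: each pair
        for j in range(i, n):
            sub = C_star[i:j + 1, i:j + 1]
            pts = sub[np.triu_indices_from(sub, k=1)]
            # step 3: revised MDL code length, level forced to 0
            data_bits = np.sum(0.5 * np.log2(2 * np.pi * sigma**2) +
                               pts**2 / (2 * sigma**2 * np.log(2)))
            L[i, j] = param_bits + data_bits               # step 4
    return L
```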

Page 15: Minimum Description Length

Information criterion
– A model selection scheme
– Common information criteria are the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and Minimum Description Length (MDL)
– MDL encodes the model and the data given the model; the balance is expressed in terms of code length:

$$\mathrm{C}(D, M) = -\log p(D, M) = -\log p(D \mid M) - \log p(M)$$

Fig. 6 1D piecewise approximation

Page 16: Encoding the Model and the Data Given the Model

For each pair of features (n(n−1)/2 pairs in total):
– Encoding the model: cluster boundary and Gaussian parameter (standard deviation)
– Encoding the data given the model (a sketch follows below)

Fig. 8 Encoding the data given the model for each feature pair
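One such per-pair code length might be computed as below; the exact bit costs assigned to the boundary and to the standard deviation are assumptions for illustration, not the authors' formulation.

```python
import numpy as np

def pair_code_length(points, n, sigma, sigma_bits=32.0):
    """Model bits: the cluster boundary (log2 n bits to name one of n
    positions) and the Gaussian standard deviation (fixed-precision
    float). Data bits: the points encoded under N(0, sigma^2)."""
    boundary_bits = np.log2(n)
    pts = np.asarray(points, dtype=float)
    data_bits = np.sum(0.5 * np.log2(2 * np.pi * sigma**2) +
                       pts**2 / (2 * sigma**2 * np.log(2)))
    return boundary_bits + sigma_bits + data_bits
```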

Page 17: Minimize the Code Length

Code length matrix

Shortest path
– Recursive function
– Dynamic programming (sketched below)

$$C = \begin{bmatrix} C_{1,1} & C_{1,2} & \cdots & C_{1,n} \\ & C_{2,2} & \cdots & C_{2,n} \\ & & \ddots & \vdots \\ & & & C_{n,n} \end{bmatrix}$$

Fig. 9 Alternative representation of matrix C: arcs C_{1,2}, C_{2,3}, ..., C_{1,n-1}, C_{2,n-1}, C_{3,n-1}, ..., C_{n-1,n}, C_{2,n}, C_{3,n} over the ordered positions 1, 2, 3, ..., n−1, n

Fig. 10 Recursive function for the shortest path
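The shortest path itself can be sketched as a simple dynamic program (my own rendering of the recursion, not Fig. 10 verbatim): best[j] is the minimum total code length for clustering the first j features, and the last cluster i..j−1 contributes one entry of the code length matrix.

```python
import numpy as np

def shortest_path(L):
    """L[i, j]: code length of the cluster covering features i..j
    (0-based, inclusive). Returns the optimal contiguous clusters and
    their total code length."""
    n = L.shape[0]
    best = np.full(n + 1, np.inf)
    best[0] = 0.0
    prev = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(j):                       # last cluster is i..j-1
            cost = best[i] + L[i, j - 1]
            if cost < best[j]:
                best[j], prev[j] = cost, i
    clusters, j = [], n
    while j > 0:                                 # backtrack the cut points
        clusters.append((prev[j], j - 1))
        j = prev[j]
    return clusters[::-1], best[n]
```

With the earlier sketch, `shortest_path(code_length_matrix(M))` returns the contiguous clustering that minimizes the total description length.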

Page 18: Results

We test on simulated data.

Fig. 11 The revised correlation matrix and the computed code length matrix