A Nonlinear Mapping for Data Structure Analysis

Intelligent Database Systems Lab, 國立雲林科技大學 National Yunlin University of Science and Technology. A Nonlinear Mapping for Data Structure Analysis. John W. Sammon, Jr., IEEE Transactions on Computers, Vol. C-18, No. 5, 1969, pp. 401-409. Presenter: Wei-Shen Tai. Advisor: Professor Chung-Chian Hsu. 2007/4/4


Transcript of A Nonlinear Mapping for Data Structure Analysis

Page 1: A Nonlinear Mapping for Data Structure Analysis

Intelligent Database Systems Lab

國立雲林科技大學 National Yunlin University of Science and Technology

A Nonlinear Mapping for Data Structure Analysis

John W. Sammon, Jr., IEEE Transactions on Computers, Vol. C-18, No. 5, 1969, pp. 401-409.

Presenter: Wei-Shen Tai
Advisor: Professor Chung-Chian Hsu

2007/4/4

Page 2: A Nonlinear Mapping for Data Structure Analysis

N.Y.U.S.T.

I. M.

Intelligent Database Systems Lab

Outline

Introduction
Nonlinear mapping
Some computer results
Relationship of NLM to other structure analysis algorithms
Limitations and extensions
Comments
MDS

Page 3: A Nonlinear Mapping for Data Structure Analysis

Motivation

Data structure visualization
  Provide a highly effective visualization method in the analysis of multivariate data.
  Data structure refers to the geometric relationships among subsets of the data vectors in the L-space.

Page 4: A Nonlinear Mapping for Data Structure Analysis

Objective

Nonlinear mapping algorithm (NLM)
  Based upon a point mapping of the N L-dimensional vectors from the L-space to a lower-dimensional space, such that the inherent structure of the data is approximately preserved under the mapping.

Page 5: A Nonlinear Mapping for Data Structure Analysis

Nonlinear mapping

Given N vectors in an L-space, designated Xi, i = 1, …, N, we define N corresponding vectors in a d-space (d = 2 or 3), designated Yi, i = 1, …, N.

Let the distance between the vectors Xi and Xj in the L-space be defined by dij* = dist[Xi, Xj], and the distance between the corresponding vectors Yi and Yj in the d-space be defined by dij = dist[Yi, Yj].

A steepest descent procedure is used to search for a minimum of the error E.

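The d-space configuration is written as

$$
Y_1 = \begin{bmatrix} y_{11} \\ \vdots \\ y_{1d} \end{bmatrix},\quad
Y_2 = \begin{bmatrix} y_{21} \\ \vdots \\ y_{2d} \end{bmatrix},\quad \ldots,\quad
Y_N = \begin{bmatrix} y_{N1} \\ \vdots \\ y_{Nd} \end{bmatrix}
$$

and the mapping error is

$$
E = \frac{1}{\sum_{i<j} d_{ij}^{*}} \sum_{i<j}^{N} \frac{\left[ d_{ij}^{*} - d_{ij} \right]^{2}}{d_{ij}^{*}}
$$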
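To make the error E and its minimization concrete, here is a minimal Python sketch. It computes E for a given configuration and reduces it by plain gradient descent with a simple accept/reject step-size rule. That rule is an assumption chosen for brevity, not the update used in the paper. The function names (`dist`, `sammon_error`, `sammon_gradient`, `sammon_map`) and all parameter defaults are hypothetical, and the sketch assumes all input vectors are distinct so that no dij* is zero.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def sammon_error(X, Y):
    """Mapping error E for L-space vectors X and d-space vectors Y."""
    n = len(X)
    total = 0.0  # sum of d*_ij over i < j (normalizing constant)
    err = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d_star = dist(X[i], X[j])
            total += d_star
            err += (d_star - dist(Y[i], Y[j])) ** 2 / d_star
    return err / total

def sammon_gradient(X, Y):
    """Partial derivatives of E with respect to every coordinate of Y."""
    n, d = len(Y), len(Y[0])
    c = sum(dist(X[i], X[j]) for i in range(n) for j in range(i + 1, n))
    grad = [[0.0] * d for _ in range(n)]
    for p in range(n):
        for j in range(n):
            if j == p:
                continue
            d_star, d_pj = dist(X[p], X[j]), dist(Y[p], Y[j])
            if d_pj == 0.0:
                continue  # coincident map points: skip to avoid dividing by zero
            w = (d_star - d_pj) / (d_star * d_pj)
            for q in range(d):
                grad[p][q] += -2.0 / c * w * (Y[p][q] - Y[j][q])
    return grad

def sammon_map(X, d=2, n_iter=100, step=1.0, seed=0):
    """Gradient descent on E from a random start (assumed step-size rule)."""
    rng = random.Random(seed)
    Y = [[rng.gauss(0, 1) for _ in range(d)] for _ in X]
    err = sammon_error(X, Y)
    for _ in range(n_iter):
        g = sammon_gradient(X, Y)
        trial = [[y - step * gq for y, gq in zip(yr, gr)] for yr, gr in zip(Y, g)]
        trial_err = sammon_error(X, trial)
        if trial_err < err:   # accept the move and try a longer step next time
            Y, err = trial, trial_err
            step *= 1.5
        else:                 # reject the move and shorten the step
            step *= 0.5
    return Y, err

# Usage: map five 3-dimensional points to the plane.
X = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
Y, E = sammon_map(X, d=2)
print("final mapping error:", E)
```

Because a trial step is accepted only when it lowers E, the error never increases from one iteration to the next. The result can still be a local minimum that depends on the random start.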

Page 6: A Nonlinear Mapping for Data Structure Analysis

Computer results

Page 7: A Nonlinear Mapping for Data Structure Analysis

19-dimensional Gaussian simplex distribution

Fig 6. Result of NLM
Fig 7. Result of principal eigenvector plots

Page 8: A Nonlinear Mapping for Data Structure Analysis

Experiments in document classification

A document classification space
  Every document in the library was represented as a 17-dimensional vector.
  Each vector is described by a mapping of 1125 preselected words and phrases into the C-space.

Queries 1 to 5 and their related documents are shown, respectively.
  Documents considered relevant to a given request were clustered.
  Documents tend to be uniformly distributed throughout the space.
  Clusters 2 and 3 tend to overlap, yet they are well separated from clusters 4 and 5.
  In general, the intercluster relationships seem consistent with their respective subject relationships.

Page 9: A Nonlinear Mapping for Data Structure Analysis

Relationship to other related algorithms

Multidimensional Scaling
  Find a configuration of points in a t-space such that the resultant inter-point distances preserve a monotonic relationship to a given set of inter-element similarities (or dissimilarities).

Deficiencies
  The resulting cluster configuration is highly dependent upon a set of control parameters which must be fixed by the user.
  Particularly sensitive to hyper-spherical structure and inefficient in detecting more complex relationships in the data.
  There are no really good ways of evaluating a resultant cluster configuration.
  When two clusters are close, the vectors between them tend to form a bridge and cause spurious mergers.

Page 10: A Nonlinear Mapping for Data Structure Analysis

Nonlinear mapping advantages

A highly promising structure analysis algorithm

1. No control parameters that require a priori knowledge.

2. Highly efficient in identifying complex data structures.

3. Easy to detect and identify data structure.

4. Deals with extraneous data and spurious mergers.

5. Simple and efficient.

Page 11: A Nonlinear Mapping for Data Structure Analysis

Limitations and extensions

Limitations
  Reliability of the scatter diagram in displaying extremely complex high-dimensional structure.
    When the minimum mapping error is too large (E >> 0.1), the 2-dimensional scatter plot fails to portray the true structure.
  Number of vectors that it can handle.
    Limited at present to N < 250 vectors.
    When N > 250, we suggest using a data compression technique to reduce the data set to fewer than 250 vectors.
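One way to see why the number of vectors matters: the error E is defined over every pair of vectors, so the number of inter-point distances grows quadratically with N. The slide does not state the reason for the limit, so reading it as a storage and computation constraint is an assumption. The arithmetic itself is exact:

$$
\frac{N(N-1)}{2}\,\bigg|_{N=250} = \frac{250 \cdot 249}{2} = 31\,125
\qquad\text{versus}\qquad
\frac{N(N-1)}{2}\,\bigg|_{N=1000} = \frac{1000 \cdot 999}{2} = 499\,500
$$

Quadrupling N therefore multiplies the number of distances by about sixteen.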

Extension
  On-Line Pattern Analysis and Recognition System (OLPARS)

Page 12: A Nonlinear Mapping for Data Structure Analysis

Comments

Advantage
  A visualization method for hyper-space data.
  The distances in the data space can be preserved and interpreted as geometric relationships in the low-dimensional map.

Drawback
  Easy to learn but hard to compute; the computational cost seems quite high.

Application
  Applications related to data structure visualization.

Page 13: A Nonlinear Mapping for Data Structure Analysis

MDS

A very simple example, using mileage distances between cities.

1. Start with a map, which illustrates the relative geographic locations of a set of American cities.

2. The map is a geometric model in which cities are represented as points in two-dimensional space. The distances between the points are proportional to the geographic proximities of the cities.

3. Using the map/model it is easy to construct a square matrix containing the distances between any pair of cities.

4. The matrix, itself, is analogous to the mileage chart that is often included with road maps.
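Step 3 above can be sketched in a few lines of Python. The labels "A", "B", "C" and their coordinates are hypothetical placeholders, not real cities or mileages. The function name `distance_matrix` is likewise only illustrative.

```python
import math

# Hypothetical map coordinates in arbitrary units, for illustration only.
cities = {
    "A": (0.0, 0.0),
    "B": (3.0, 4.0),
    "C": (6.0, 0.0),
}

def distance_matrix(points):
    """Return the labels and the square matrix of pairwise Euclidean distances."""
    names = list(points)
    matrix = [[math.dist(points[a], points[b]) for b in names] for a in names]
    return names, matrix

names, D = distance_matrix(cities)
for name, row in zip(names, D):
    print(name, [round(v, 1) for v in row])
```

The matrix is symmetric with zeros on the diagonal, just like a mileage chart.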

Page 14: A Nonlinear Mapping for Data Structure Analysis

MDS algorithm

MDS uses the matrix of distances (i.e., the “mileage chart”) as input data.

The output from MDS consists of two parts:

A model showing the cities as points in space, with the distances between the points proportional to the entries in the input data matrix (i.e., a map).

A goodness-of-fit measure showing how closely the geometric point configuration corresponds to the data values from the input data matrix.
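The slides do not say which goodness-of-fit measure is meant. One common choice is a stress value in the style of Kruskal's stress-1, sketched below. It assumes the input distances are compared directly with the distances in the point configuration, without the monotone transformation used in non-metric MDS. The function name `stress1` is hypothetical.

```python
import math

def stress1(D, Y):
    """Stress between input distances D (square matrix) and map points Y.

    0.0 means the configuration reproduces the input distances exactly;
    larger values mean a poorer fit.
    """
    n = len(Y)
    num = 0.0  # squared mismatch between map distances and input distances
    den = 0.0  # squared map distances, used for normalization
    for i in range(n):
        for j in range(i + 1, n):
            d_map = math.dist(Y[i], Y[j])
            num += (d_map - D[i][j]) ** 2
            den += d_map ** 2
    return math.sqrt(num / den)

# Usage: a configuration that matches the input matrix, and one that does not.
D = [[0.0, 5.0, 6.0],
     [5.0, 0.0, 5.0],
     [6.0, 5.0, 0.0]]
exact = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)]
distorted = [(0.0, 0.0), (3.0, 4.0), (8.0, 0.0)]
print(stress1(D, exact), stress1(D, distorted))
```

The role of this measure is comparable to that of the mapping error E in the nonlinear mapping. Both summarize in a single number how well the low-dimensional distances match the given ones.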