Visual Data Mining for Quantized, Spatial Data€¦ · Visual Data Mining for Quantized, Spatial...

Visual Data Mining for Quantized, Spatial Data

Amy BravermanJet Propulsion Laboratory

California Institute of TechnologyMail Stop 169-237

4800 Oak Grove DrivePasadena, CA 91109-8099

[email protected]

QuickTime™ and a Microsoft Video 1 decompressor are needed to see this picture.

Outline

1. Motivation.

2. Approach.

3. AIRS data collection.

4. Quantization.

5. Visual data mining (I).

6. Visual data mining (II).

7. Hierarchical Quantization.

8. Visual data mining (III).

9. Summary.

• Earth Observing System satellites return “massive” data volume.

• Traditional approach to data exploration: produce maps of one degree averages and standard deviations for each parameter of interest.

• Good news: this is easy, practical, and everybody understands it.

• Bad news: the method throws away almost all of the distributional information in the data including covariance and higher-order statistics.

• Need: to “mine” the data, i.e. how do characteristics of joint distributions change in (time and space) and across resolutions?Characterize forcings and feedbacks.

Motivation

• New approach: produce an estimate of the joint (empirical) probability distribution of variables of interest within each one degree grid cell.

• Use a clustering algorithm such as K-means to partition data into groups, represent each group by its centroid and (normalized) membership count.

• Collection of all 180 x 360 = 64,800 grid cell distribution estimates is a proxy for the original data.

• How to find relationships? We need to visualize multivariate relationships while maintaining spatial context.

Approach

AIRS Data Collection

QuickTime™ and a YUV420 codec decompressor are needed to see this picture.

135 footprints

AIRS Granules

1 degree lat/lon

1500 km

2250 km

90 footprints

Geographic space

Nk = 1[x ∈ k]n=1

N

∑

x1x2

xN 1

1

1

yK

y1

y 2

N1

N2

NK

yk =1

N k

xn1[x ∈ k]n=1

N

∑

X Y = E(X |Y )

High-dimensional data space

(!)

Quantization

Visual Data Mining (I)

• Data: 11 AIRS channels observed over 3 days (July 20-22, 2002).

• Compare joint distributions among grid cells:

• Are the grid cell data homogeneous or heterogeneous?

• What physical processes account for the shapes of the representatives and the distribution?

• What physical processes might account for differences between grid cells?

• Are there “outliers”?

• Data in this region: 10,498 clusters representing 60,681 observations.

• Can we summarize the whole region as one?

Visual Data Mining (II)

W

Y j = E(X j |Y j )

Y = Y jI(V = j)j=1

4

∑

P(V = j) =N j

N j∑

W = E (Y |W )

X = X jj=1

4

∑ I(V = j)

δ(X j ,Yj ) = E X j − Yj

2

δ(X,Y) = E X −Y 2

δ(Y,W ) = E Y −W 2

δ(X,W ) = δ(Y,W ) + δ(X j ,Y j )j=1

4

∑ P(V = j)

1 degree1 degreeX1

X2

X3 X 4

Y4

Y3

Y2

Y1

Y

Hierarchical Quantization

• From where do the clusters come?

More questions:

• How do the distributions change as you move from east to west? (Suggested approach: subdivide the region into western half and eastern half. Summarize separately and compare to each other and summary of the whole. Subdivide again, etc.) North to south?

• What other regions are similar to this one? Are they the ones we expect based on physics? Does spatial resolution matter for answering the question? If so, how?

• Where are the regions of high complexity (variability or distribution entropy)? Do the physics support this?

• How does the regression of channel 1 on channel 2 change spatially?

Visual Data Mining (III)

Summary

• Accept coarser spatial resolution (one degree) to achieve replication and estimate distributions.

• Explore quantized data interactively by comparing distributions at different levels of aggregation and in different locations (and times).

• We are mining the data, not making inferences. No spatial statistical models.

• AIRS data will be available at http://daac.gsfc.nasa.gov/atmodyn/airs/index.html.

• More information about AIRS: http://www-airs.jpl.nasa.gov.

Visual Data Mining for Quantized, Spatial Data€¦ · Visual Data Mining for Quantized, Spatial...

Documents

Transcript of Visual Data Mining for Quantized, Spatial Data€¦ · Visual Data Mining for Quantized, Spatial...