Dynamic Visualization of Transient Data Streams P. Wong, et al The Pacific Northwest National Laboratory Presented by John Sharko Visualization of Massive Datasets

Dynamic Visualization of Transient Data Streams

P. Wong, et al.
The Pacific Northwest National Laboratory

Presented by John Sharko
Visualization of Massive Datasets

Characteristics of Data Streams

• Arrives continuously

• Arrives unpredictably

• Arrives unboundedly

• Arrives without persistent patterns

Examples of Data Streams

• Newswires

• Internet click streams

• Network resource management

• Phone call records

• Remote sensing imagery

Visualization Problem

• Fusing a large amount of previously analyzed information with a small amount of new information

• Reprocessing the whole dataset in full detail

First Objective

• Achieve the best understanding of transient data when the influx rate exceeds the processing rate

Approach: Data stratification to reduce data size

Second Objective

• Incremental visualization technique

Approach: Project new information incrementally onto previous data

Primary Visualization Output: Multidimensional Scaling

OJ Simpson trial

French elections

Oklahoma bombing
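A minimal sketch of how a multidimensional scaling (MDS) layout can be computed, assuming classical (eigenvector-based) MDS on Euclidean distances. The slides do not state which MDS variant was used; the function name `classical_mds` and the parameter `k` are illustrative.

```python
import numpy as np

def classical_mds(X, k=2):
    """Project the rows of X (n items x d features) to k dimensions."""
    X = np.asarray(X, dtype=float)
    # Squared Euclidean distances between all pairs of rows.
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    n = D2.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # Keep the k largest eigenpairs (eigh returns them in ascending order).
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]
    vals = np.clip(vals[order], 0.0, None)  # guard against tiny negative values
    return vecs[:, order] * np.sqrt(vals)
```

Each row of the result is a 2-D scatterplot position. Items with similar feature vectors land close together, which is how themed clusters such as the ones named above would appear.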

Adaptive Visualization Using Stratification

Methods for Adaptive Visualization

• Vector dimension reduction

• Vector sampling

Vector Dimension Reduction

Approach: dyadic wavelets (Haar)

200 terms

100 terms

50 terms
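A rough sketch of dyadic Haar reduction, assuming each level keeps only the approximation (pairwise sum) coefficients, so a 200-term vector becomes 100 terms and then 50. The orthonormal 1/sqrt(2) scaling and the padding rule for odd lengths are assumptions, not taken from the slides; `haar_reduce` is an illustrative name.

```python
import math

def haar_reduce(vector, levels=1):
    """Halve the vector length `levels` times using Haar approximation coefficients."""
    out = [float(v) for v in vector]
    for _ in range(levels):
        if len(out) % 2:
            out.append(out[-1])  # assumption: pad odd lengths by repeating the last term
        # Each output term summarizes one adjacent pair of input terms.
        out = [(out[i] + out[i + 1]) / math.sqrt(2.0) for i in range(0, len(out), 2)]
    return out

doc_vector = [1.0] * 200                 # hypothetical 200-term document vector
half = haar_reduce(doc_vector, levels=1)     # 100 terms
quarter = haar_reduce(doc_vector, levels=2)  # 50 terms
```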

Results of Vector Dimension Reduction

Dimensions: 200, 100, 50

Results of Vector Sampling

Number of documents: 3298, 1649, 824
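The document counts above halve at each step. A minimal sketch of one way to get them, assuming regular sampling that keeps every second vector; the slides do not say how the samples were drawn, and `sample_half` is an illustrative name.

```python
def sample_half(vectors, rounds=1):
    """Keep every second vector, `rounds` times (assumed regular sampling)."""
    out = list(vectors)
    for _ in range(rounds):
        out = out[1::2]  # keeps floor(n / 2) items
    return out

corpus = list(range(3298))             # stand-in for 3298 document vectors
print(len(sample_half(corpus, 1)))     # 1649
print(len(sample_half(corpus, 2)))     # 824
```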

Scatterplot Similarity Matching

Scatterplot Similarity Matching

Procrustes Analysis Results

Documents   200 dimensions   100 dimensions   50 dimensions
All         0.0 (self)       0.022            0.084
1/2         0.016            0.051            0.111
1/4         0.033            0.062            0.141
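A sketch of a Procrustes dissimilarity between two scatterplots of the same items, assuming the usual formulation: center both layouts, scale them to unit norm, and find the best rotation or reflection. A value of 0 means the layouts match exactly, as in the "self" entry above. The slides do not give the exact normalization, so the scale here may differ from the reported numbers.

```python
import numpy as np

def procrustes_distance(A, B):
    """Dissimilarity in [0, 1] between two n x k layouts of the same n items."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Remove translation, then scale each layout to unit Frobenius norm.
    A0 = A - A.mean(axis=0)
    B0 = B - B.mean(axis=0)
    A0 = A0 / np.linalg.norm(A0)
    B0 = B0 / np.linalg.norm(B0)
    # The singular values of A0^T B0 give the best achievable alignment.
    s = np.linalg.svd(A0.T @ B0, compute_uv=False)
    return 1.0 - s.sum() ** 2
```

To compare a reduced layout against the full one, both must contain the same items in the same order, so a sampled layout is compared only on the items it shares with the full layout.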

Incremental Visualization Using Fusion

• Reprocessing by projecting new items onto the existing visualization

• Feature: reprocessing the entire dataset is often not required

Hyperspectral Image Processing

• Apply MDS to scale pixel vectors

• K-means process to assign unique colors

• Stratify the vectors progressively

Robust Eigenvectors

Generate three MDS scatterplots, one for each third of the image

Robust Eigenvectors (cont’d)

Generate MDS scatterplot for entire dataset

Robust Eigenvectors (cont’d)

Extract points from cropped areas

Using Multiple Sliding Windows

Eigenvectors are determined by the long window

New vectors are projected using the eigenvectors of the long window

Diagram: a data stream with a long window and a short window; an arrow marks the sliding direction
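The two-window idea can be sketched as follows, assuming the eigenvectors come from the covariance matrix of the long window (which gives the same layout as classical MDS on Euclidean distances). The function names `fit_long_window` and `project` are illustrative, not from the slides.

```python
import numpy as np

def fit_long_window(long_window, k=2):
    """Return the mean and top-k eigenvectors of the long window's vectors."""
    X = np.asarray(long_window, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)            # ascending eigenvalues
    basis = vecs[:, np.argsort(vals)[::-1][:k]]  # columns = top-k eigenvectors
    return mean, basis

def project(new_vectors, mean, basis):
    """Place new vectors in the existing layout without recomputing it."""
    return (np.asarray(new_vectors, dtype=float) - mean) @ basis
```

Projection costs one matrix product per batch of new vectors, so items in the short window can be placed immediately while the eigenvectors are refreshed only when the long window is reprocessed.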

Dynamic Visualization Steps

1. When influx rate < processing rate, use MDS

2. When influx rate > processing rate, halt MDS

3. Use multiple sliding windows for pre-defined number of steps

4. Use stratification approach for fast overview

5. Check for accumulated error using Procrustes analysis

6. If error threshold not reached, go to step 3; if error threshold reached, go to step 1
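The decision logic in these steps can be sketched as a small function. This is one reading of the slide, assuming that reaching the error threshold triggers a full MDS recomputation and that equal rates are treated like an overload; all names and the returned labels are hypothetical.

```python
def choose_action(influx_rate, processing_rate, accumulated_error, error_threshold):
    """Pick the next visualization action for the current state of the stream."""
    if influx_rate < processing_rate:
        return "full_mds"      # step 1: the system keeps up, so run full MDS
    if accumulated_error >= error_threshold:
        return "full_mds"      # step 6: too much drift, go back to step 1
    return "incremental"       # steps 2-4: sliding windows plus stratification

print(choose_action(10, 20, 0.00, 0.1))  # full_mds
print(choose_action(30, 20, 0.05, 0.1))  # incremental
print(choose_action(30, 20, 0.20, 0.1))  # full_mds
```

Here `accumulated_error` stands for the Procrustes value from step 5, measured after the pre-defined number of sliding-window steps.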

Conclusions

• The data stratification approach can substantially accelerate the visualization process

• The data fusion approach can provide instant updates
