2013.10.24 big data visualization
Visualizing “Big” Data
Sean Kandel & Jeffrey Heer
Trifacta Inc. @trifacta
How can we visualize and interact with billion+ record
databases in real-time?
Two Challenges:
1. Effective visual encoding
2. Real-time interaction
Perceptual and interactive scalability should be limited by the chosen resolution of the visualized data, not the
number of records.
Perception
Data Sampling
Modeling
Binning
Google Fusion Tables (Sampling)
imMens (Binned Aggregation)
Bin > Aggregate (> Smooth) > Plot
1. Bin: Divide data domain into discrete “buckets”
   Categories: Already discrete (but check cardinality)
   Numbers: Choose bin intervals (uniform, quantile, ...)
   Time: Choose time unit: Hour, Day, Month, etc.
   Geo: Bin x, y coordinates after cartographic projection
Number of Bins?
100,000 Data Points
Rectangular Bins
Hexagonal Bins
Hexagonal or Rectangular Bins?
Hex bins better estimate density for 2D plots, but the improvement is marginal [Scott 92], while rectangles support reuse and query processing.
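A minimal sketch of rectangular binning, assuming uniform bin intervals over a known domain. The helper names `bin_index` and `bin_2d` are hypothetical and not taken from imMens or any other tool named in the slides.

```python
from collections import Counter


def bin_index(value, lo, hi, n_bins):
    """Map a value in [lo, hi] to a uniform bin index in 0..n_bins-1."""
    if not lo <= value <= hi:
        raise ValueError("value outside the binned domain")
    i = int((value - lo) / (hi - lo) * n_bins)
    return min(i, n_bins - 1)  # the upper edge falls into the last bin


def bin_2d(points, x_range, y_range, n_bins):
    """Count points per rectangular (x_bin, y_bin) cell."""
    counts = Counter()
    for x, y in points:
        cell = (bin_index(x, *x_range, n_bins), bin_index(y, *y_range, n_bins))
        counts[cell] += 1
    return counts


# Hypothetical sample points in the unit square, binned on a 4 x 4 grid.
points = [(0.1, 0.1), (0.15, 0.2), (0.9, 0.9), (1.0, 1.0)]
counts = bin_2d(points, (0.0, 1.0), (0.0, 1.0), n_bins=4)
```

Because each axis is binned independently, the same 1D bin indices can be reused for histograms of x or y alone, which is one reading of the "reuse" argument for rectangles.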
2. Aggregate: Count, Sum, Average, Min, Max, ...
(3. Smooth: Optionally smooth the aggregates [Wickham ’13])
[1] Wickham 2013
4. Plot: Visualize the aggregate summary values
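The Bin > Aggregate (> Smooth) steps can be sketched in one dimension as follows. This is an illustration under assumptions: the function names are hypothetical, and the smoother is a plain 3-bin moving average swapped in for simplicity, not the method of [Wickham ’13].

```python
def bin_and_aggregate(values, weights, lo, width, n_bins):
    """Steps 1-2: assign uniform bins, then keep a count and a sum per bin."""
    count = [0] * n_bins
    total = [0.0] * n_bins
    for v, w in zip(values, weights):
        i = int((v - lo) // width)
        if 0 <= i < n_bins:  # values outside the domain are skipped
            count[i] += 1
            total[i] += w
    return count, total


def smooth(series):
    """Step 3 (optional): 3-bin moving average, truncated at the edges."""
    out = []
    for i in range(len(series)):
        window = series[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out


# Hypothetical records: a position to bin on and a measure to aggregate.
values = [0.5, 1.2, 1.7, 3.9]
weights = [10, 20, 30, 40]

count, total = bin_and_aggregate(values, weights, lo=0, width=1, n_bins=4)
average = [t / c if c else None for t, c in zip(total, count)]
smoothed = smooth(count)
```

Only the per-bin summaries (`count`, `average`, `smoothed`) go on to step 4, so the plotting cost depends on the number of bins rather than the number of records.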
Plot: Visual Encoding
Choose Most Effective Encoding [Cleveland & McGill ’84]
1D Plot -> Position or Length Encoding
Histograms, line charts, etc.
2D Plot -> Area or Color Encoding
Spatial dimensions (x, y) already allocated.
While less effective than area for magnitude estimation, color can be used at the per-pixel level and provides an overall “gestalt”.
Standard Color Ramp
Counts near zero are white.
-> Outliers are missed
Add Discontinuity after Zero
Counts near zero remain visible.
-> Outliers can be seen
Linear Alpha Interpolation is not perceptually linear.
Cube-Root Alpha Interpolation approximates perceptual linearity.
Color Encoding
Luminance (in range 0-1)
Min. Non-Zero Intensity (α=0.15) [1]
Perceptual Scaling (γ=1/3) [2]
User-Adjustable Min/Max Values [3]
[1] Keep small non-zero values visible (outliers!)
[2] Match color ramp to perceptual distances
[3] Enable exploration across value ranges
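One way the three parameters above could combine is sketched below. The exact formula is an assumption (the slides give only α, γ, and the adjustable min/max), and the function name `intensity` is hypothetical.

```python
def intensity(value, vmin, vmax, alpha=0.15, gamma=1 / 3):
    """Map an aggregate value to an intensity in [0, 1].

    Assumed form: alpha + (1 - alpha) * t**gamma, where t is the value
    rescaled to [0, 1] by the user-adjustable vmin/vmax.
    """
    if value <= 0 or vmax <= vmin:
        return 0.0  # empty bins stay at the background level
    t = (min(max(value, vmin), vmax) - vmin) / (vmax - vmin)
    return alpha + (1.0 - alpha) * t ** gamma


# A single record among bins holding up to 1000 still maps well above zero.
lone_outlier = intensity(1, vmin=0, vmax=1000)
busiest_bin = intensity(1000, vmin=0, vmax=1000)
```

Under this assumed form, the jump from 0 to at least α is the discontinuity after zero, and the cube-root exponent compresses large counts so that equal steps in intensity are closer to equal perceived steps.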
Design Space of Binned Plots
Interaction
Interaction Techniques?
1. Select: Detail-on-Demand
2. Navigate: Pan & Zoom
3. Query: Brush & Link
5-D Data Cube
Month, Day, Hour, X, Y
[Figure: 5-D data cube; axes: X (256…767), Y (512…1023), Month (0…11), Day (0…30), Hour (0…23)]
12 x 31 x 24 x 512 x 512 = ~2.3 billion cells
Brushing January
Month, Day, Hour, X, Y
31 x 24 x 512 x 512 = ~195 million cells
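The two cell counts can be checked with a few lines of arithmetic. The dictionary of dimension sizes simply restates the figures on the slides.

```python
from math import prod

# Bins per dimension, as given on the slides.
dims = {"month": 12, "day": 31, "hour": 24, "x": 512, "y": 512}

# Full 5-D cube: the product of all dimension sizes.
full_cube_cells = prod(dims.values())

# Brushing January fixes the month, so that dimension drops out.
january_cells = prod(n for name, n in dims.items() if name != "month")
```

Here `full_cube_cells` is 2,340,421,632 (~2.3 billion) and `january_cells` is 195,035,136 (~195 million), so even a single brushed slice of the dense cube is far too large to scan per interaction.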
Multivariate Data Tiles
1. Send data, not pixels
2. Embed multi-dim data
Full 5-D Cube
For any pair of 1D or 2D binned plots, the maximum number of dimensions needed to support brushing & linking is four.
X: 512 bins
Y: 512 bins
Full 5-D Cube: ~2.3B bins
13 3-D Data Tiles: ~17.6M bins (in 352KB!)
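The decomposition into lower-dimensional tiles amounts to summing out the dimensions a given pair of plots does not need. The sketch below uses a toy sparse cube with made-up counts; `project` is a hypothetical helper, and it does not reproduce the specific 13 tiles from the slides.

```python
from collections import Counter

DIMS = ("month", "day", "hour", "x", "y")


def project(cube, keep):
    """Sum out every dimension of a sparse cube except those in `keep`."""
    idx = [DIMS.index(name) for name in keep]
    tile = Counter()
    for cell, count in cube.items():
        tile[tuple(cell[i] for i in idx)] += count
    return tile


# Toy sparse 5-D cube: (month, day, hour, x, y) -> count (made-up values).
cube = {
    (0, 3, 9, 10, 20): 5,
    (0, 7, 9, 10, 20): 2,
    (1, 3, 14, 10, 21): 4,
}

# Tile linking the month histogram with the X-Y plot.
month_xy = project(cube, ("month", "x", "y"))
# Tile linking the month histogram with the hour histogram.
month_hour = project(cube, ("month", "hour"))
```

Each tile covers one pair of linked plots, so its size is the product of only those plots' bin counts instead of all five dimensions.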
Query & Render on GPU via WebGL
Pack data tiles as PNG image files, bind to WebGL as image textures.
Invoke program for each output bin. Executes in parallel on GPU.
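A CPU analogue of the per-output-bin query, written in Python rather than as a WebGL shader. It assumes a (month, x, y) tile and a brush over months; the function name `brush_query` and the tile contents are hypothetical.

```python
def brush_query(tile, brushed_months, n_x, n_y):
    """Roll up a (month, x, y) tile into an x-y grid for the brushed months."""
    out = [[0] * n_x for _ in range(n_y)]
    # On the GPU, each (x, y) output bin would get its own parallel
    # invocation; the nested loops here only emulate that serially.
    for y in range(n_y):
        for x in range(n_x):
            out[y][x] = sum(tile.get((m, x, y), 0) for m in brushed_months)
    return out


# Toy tile: (month, x, y) -> count (made-up values).
tile = {(0, 0, 0): 3, (1, 0, 0): 4, (0, 1, 1): 5}

january_only = brush_query(tile, brushed_months=[0], n_x=2, n_y=2)
jan_and_feb = brush_query(tile, brushed_months=[0, 1], n_x=2, n_y=2)
```

The work per output bin is a sum over the brushed range only, which is why the cost tracks the tile resolution and not the number of underlying records.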
Performance Benchmarks
Simulate interaction: brushing & linking across binned plots.
- imMens vs. Profiler
- 4x4 and 5x5 plots
- 10 to 50 bins
Measure time from selection to render.
Test setup:
2.3 GHz MacBook Pro (4-core)
NVIDIA GeForce GT 650M
Google Chrome v.23.0
~50fps querying of visual summaries of 1B data points.
[Chart: In-Memory Data Cube vs. imMens; axis: Number of Data Points; configuration: 5 dimensions x 50 bins/dim x 25 plots]
[1] Lins et al., InfoVis 2013
[2] Sismanis et al., SIGMOD 2002
NanoCubes
Resources
imMens: vis.stanford.edu/projects/immens
Tableau Public: tableausoftware.com/public
BigVis (R): github.com/hadley/bigvis
Nanocubes: nanocubes.net
BlinkDB: blinkdb.org
MapD: geops.csail.mit.edu/docs/
Acknowledgments
Zhicheng “Leo” Liu
Biye Jiang