Unternehmensgedächtnis &...
Visual Analytics Jan 7 & 14, 2013 Vedran Sabol (KMI, TU Graz)
Visual Analytics
Multimedia Information Systems VO/KU (707.021/707.022)
Vedran Sabol
KMI, TU Graz
Jan 7 & 14, 2013
Visual Analytics
Structure of the lectures
• 7.1.2013
Motivation
Introduction to Visualization and Visual Analytics
Visualization examples and demos
• 14.1.2013
In depth analysis of selected Visual Analytics methods
Algorithms, visual interfaces, architecture
Problem solving with Visual Analytics - examples
Motivation
• We are confronted with:
Massive amounts of information
Dynamically changing data sets
Incomplete and conflicting information
Complex knowledge structures, relationships, networks
Multi-dimensional knowledge objects
Heterogeneous information
• Multimedia, geo-spatial, sensory data,…
• Structured and unstructured information
…
How can computers help us to understand and utilize our data?
Explore, analyse, understand
Unveil important facts and knowledge hidden within the data
Motivation Knowledge Discovery Process
• Knowledge Discovery Process [Fayyad, 1996]
Mainly an automatic approach consisting of a chain of processing steps
Goal: discovery of new, relevant, previously unknown patterns and relationships in data
[Figure: Knowledge Discovery Process. Data → Data Selection → Target Data → Preprocessing & Cleaning → Preprocessed Data → Data Transformation → Transformed Data → Data Mining & Pattern Discovery → Patterns & Models → Interpretation & Evaluation → Knowledge; feedback loops involve the USER]
Motivation
• Machines are very powerful
Automatic processing methods for huge data sets
Exponential growth of computer performance over the last 60 years
• Moore's Law: continues until 2020, 2030… ?
Distributed computing: Cloud, Grid, …
• Nevertheless, machines still lag behind humans in
Identification of complex patterns and relationships
Wide knowledge and experience
Abstract thinking
…
Motivation
• Human visual apparatus is an extremely efficient „processing machine“
• Enormous amounts of information are transferred by the optic nerve to the cerebral cortex
• Visual cortex remains unbeatable in recognition of objects and complex patterns (for example rotational invariance)
• Pre-attentive processing
Pre-attentive Processing
• Capability to process certain visual information without focusing our attention
• Criterion 1: Processing time < 200 - 250ms
Eye movements take about 200 ms to initiate → faster processing must be highly parallel
• Criterion 2: Processing time does not correlate with the amount of noise in the data
Pre-attentive Processing
It is immediately possible to determine which data set contains a red spot → pre-attentive processing possible
Pre-attentive Processing
It is still possible to quickly determine where the red spot is → borderline case
Pre-attentive Processing
Scanning is necessary to determine where the red spot is → pre-attentive processing not possible
Motivation Visualization
• Solution
Employ the human visual system for pattern recognition
Use machines to transform the data into a suitable graphical representation
• Challenges
What should the graphical representations look like (design)?
Which operations shall be supported on the graphical representation (interactivity)?
How to compute the graphical representation (algorithms)?
What is Visualization?
Definitions
“Transformation of the symbolic into the geometric.” [McCormick et al., 1987]
“The depiction of information using spatial or graphical representations to facilitate comparison, pattern recognition, change detection, and other cognitive skills by making use of the visual system.” [Hearst, 2003]
The use of visual representation to aid cognition
Graphical representation of data, information and knowledge
Use of human visual system, supported by computer graphics, to analyze and interpret large amounts of data
…
Visualization Examples
Visual Analytics
• Abundance of data:
• problems are becoming too large to be addressed by visualization alone
• Limited resources of the visual front end
• Combine machine processing with human capabilities in a suitable way and get the best of both worlds.
• Integrate humans in the analytical process
• Provide means for explorative analysis
• Visual Analytics: a young research field (2005)
Visual Analytics
• Combines automatic methods with interactive information/data/knowledge visualisation to get the best of both worlds [Keim 2008]
• Supports analytical reasoning facilitated by interactive visual interfaces [Thomas 2005]
• Focuses on interaction between humans and machines through visual interfaces to derive new knowledge
[Figure: Repository → Algorithms, Visualization → New Insights and Knowledge]
Visual Analytics
• Main Idea (Mantra): “analyse first – show the important – zoom, filter and analyse further – details on demand” [Keim 2008]
Initial analysis and visual pattern recognition
Posing a hypothesis
Further analysis steps (automatic and interactive)
Confirmation or rejection of the hypothesis: new facts
Confirm the expected, discover the unexpected
• Challenges [Keim 2009]
Balance between automatic and interactive analysis
Design of effective VA workflows
Data quality
Scalability
Visual Analytics on the Web
• HTML5 provides the basis for visualization on the Web
• Rich, responsive user interfaces
• AJAX
• New elements, advanced forms
• Rendering and visualization
• Canvas
• SVG
• Logic and Interactivity
• JavaScript
• Server-Client Web architectures fit the needs of Visual Analytics
• Model View Controller (MVC) architecture
• Data storage and crunching on the server
Visualization Design Representation Forms
• Fundamental categories of visual representation:
Formalisms
Metaphors
Models
• Formalisms: abstract schematic representations
Defined by a designer
Users must learn how to read and interpret
Example: Percentage is represented by an arc
Visualization Design Representation Forms
• Metaphors: representations based on a real-world equivalent
Intuitive
User can understand the meaning through building analogies
Example: using the geographic map metaphor to represent similarity in non-spatial data
• Models: based on mental representations of real physical world
Data typically has a natural representation in the real world
Examples: visualization of sensory data in 3D, virtual 3D worlds
Visualization Data – Information - Knowledge
• Data/Scientific Visualization
• Information Visualization
• Knowledge Visualization
[Figure: Data, Information and Knowledge arranged along two axes, "Representation complexity, applicability by humans" and "Machine processing capability"]
Visualization Data – Information - Knowledge
• Data
Formal representation of raw, basic facts
Have a fixed format: numbers, dates, strings,…
Have a fixed, predefined meaning (i.e. no interpretation required)
„3162“ – Hotel room number (not a telephone number)
• Information
Result of processing, manipulation and interpretation of data
May not have a fixed format (unstructured or semi-structured)
Meaning is determined by interpretation within some context
“A small, white mouse” – a computer or a field mouse? (determined by context)
Visualization Data – Information - Knowledge
• Knowledge
Information that has been identified, organized and recognized as valid
Representations of reality through abstract, domain-dependent models
Represented by formalized conceptual systems: Taxonomies, Thesauri…
Ontologies are formally defined knowledge representations consisting of concepts, relations and rules (axioms)
• complex graph-structures
[Figure: example ontology graph with the nodes Animal, Mouse, Legs and Jerry, connected by "is a" and "has" relations]
Visualization Systems Categorization Depending on Data
• Data/Scientific Visualization
• Sensory data
• 3D spaces
• Knowledge Visualization
• Knowledge models
• Information Visualization
• Document content: text and multi-media
• Multidimensional data sets
• Structures: hierarchies and networks (graphs)
• Temporal information
• Geo-spatial information
• Multiple data types/aspects
Data/Scientific Visualization
• Visualization of simulation or sensory data
have a natural representation in the real, physical world
• Applications in physics, medicine, astronomy, industry, …
[Figures: Pressure coefficients [NASA]; Coil magnetic field]
Data/Scientific Visualization
[Figures: Weather monitoring - wind direction; Monitoring Myocardial Infarctions using ECG data [University of Dublin]]
Knowledge Visualization
• Knowledge Visualization is about using visual representations to present and transfer existing (explicit) knowledge between people [Eppler]
• The focus is on structured knowledge spaces
Concepts, relations, facts, attributes
Navigation along structures present in the knowledge model
• Use of metaphors is common
Knowledge Visualization Examples
Stairs of Visualisation [Eppler] (Let‘s Focus: http://en.lets-focus.com/ )
Research Map [Bresciani]
Knowledge Visualization Examples
Gyro, Know-Center [Kienreich]
Cultural Heritage Visualization Ancient Theatres [Blaise]
Information Visualization
• Interactive visualization of abstract information spaces
Abstract information has no „natural“, real-world representation
Rely on metaphors and formalisms
• Goal: identifying patterns and relationships
Explorative analysis and navigation
Unveiling of implicit knowledge
• InfoVis Mantra [B. Shneiderman]
„overview first - zoom and filter - details on demand”
Compare to the Visual Analytics mantra
Visualization Examples Document Content Summary
MovieDNA [Ponceleon] TileBars [Hearst]
TagClouds, Know-Center [Seifert]
Visualization Examples Multidimensional Data
Scatterplot [Nowell]
(Demos: http://www.highcharts.com/demo/scatter, http://mbostock.github.com/d3/talk/20111116/iris-splom.html)
Parallel Coordinates [Inselberg]
(http://mbostock.github.com/protovis/ex/cars.html)
Visualization Examples Multidimensional Data Similarity - Text
Know-Center [Sabol et al.]
Galaxies (SPIRE), PNNL [Wise]
Visualization Examples Multidimensional Data Similarity - Images
Image Similarity Layouts [Rodden]
Visualization Examples Hierarchies
TreeMaps [Shneiderman]
(http://philogb.github.com/jit/static/v20/Jit/Examples/Treemap/example1.html#)
Hyperbolic Tree (InXight) [Lamping]
(http://ucjeps.berkeley.edu/map2.html)
Visualization Examples Hierarchies
InfoSky, Know-Center [Andrews et al.]
Circle Packing, D3 library (http://mbostock.github.com/d3/talk/20111116/pack-hierarchy.html)
Visualization Examples Hierarchies
Walrus, CAIDA
Information Pyramids [Andrews]
Visualization Examples Graphs
Gephi, https://gephi.org/
Narcissus [Hendley]
(Web-Demos: • Small: http://mbostock.github.com/d3/talk/20111116/force-collapsible.html • Medium: http://sigmajs.org/examples/gexf_example.html)
Visualization Examples Graphs
Edge-Bundling [Holten & van Wijk]
Concept Networks [Kienreich] (Know-Center)
Visualization Examples Temporal Data
Spiral geometry [Carlis] Perspective Wall [Mackinlay]
Visualization Examples Temporal Data
LifeLines [Plaisant] Themeriver, PNNL [Havre]
(Demos: http://vis4.net/labs/as3streamgraph/ , http://bl.ocks.org/4060954)
Visualization Examples Geo-Spatial Data
Google Maps
APA-Labs component, by Know-Center
Visualization Examples Geo-Spatial Data
LucentVision [Pingali 2001]
Planetarium, Know-Center [Kienreich]
Visualization Examples Multiple Data Aspects – Geo-Temporal
GeoTime, Oculus [Kapler]
Visualization Examples Multiple Data Aspects – Immersive 3D Environments
Starlight, PNNL [Risch et al.]
Caleydo, ICG, TU Graz [Lex et al.]
Visualization Examples Multiple Data Aspects – Coordinated Multiple Views
Coordinated Multiple Views
• Multiple visualizations “fused” into a single, coherent user interface
• Each visualization designed to convey a different aspect of the data
→ Simultaneous navigation and analysis over multiple data aspects becomes possible
• Coordination of state and behavior
• interactions in one visual component influence all others
• Selection, filtering, visual properties, navigation, …
Coordinated Multiple Views
Spotfire DecisionSite [Shneiderman]
Coordinated Multiple Views Media Watch on Climate Change
ECOResearch Portal - Media Watch on Climate Change: http://www.ecoresearch.net/climate/
Web Visualization Toolkits and Rendering Libraries
• D3 (Data-Driven Documents): http://d3js.org/
Protovis: http://mbostock.github.com/protovis/
• JavaScript InfoVis Toolkit: http://philogb.github.com/jit/index.html
• Raphaël: http://raphaeljs.com/
• Charting: jqPlot: http://www.jqplot.com ; gRaphaël: http://g.raphaeljs.com/ ; NVD3: http://nvd3.org/ ; CanvasXpress: http://canvasxpress.org/ ; Highcharts: http://www.highcharts.com/ …
Performance and scalability
• Scalability limited by hardware
Number of pixels on screen
Computing power of the client
• SVG: thousands of items
• Canvas: at least one order of magnitude better
• WebGL: potentially millions of items
– not officially part of HTML5
• Usability issues
Clarity of the representation may be compromised by clutter
Orientation and navigation in large data
• How to scale to large (huge) data sets
Millions (or billions) of data elements.
Level of Detail
• Variable level of detail (LOD)
Technique known from 3D environments
Decrease complexity of representation for “far-away” objects
• Coarse-grained view of the whole data space
• Provide more details when zooming-in
Geovisualisation Google Maps
Aggregation
• Abstract data sets do not have a geometric structure appropriate for LOD
• Structure and organize the data space hierarchically
• Compute a hierarchical visual representation
Coarse-grained view of the whole data space
Render more detail when zooming in
Navigate and explore along the hierarchical structure
• Two examples in the following:
Text Visualization (unstructured data)
Semantic Graph Visualization (structured data)
Visualization of Text Data
• Text remains an essential data type in many domains
• Challenges
Text is not processed pre-attentively
Text is „non-visual“, i.e. has no „natural“ visual representation
Composed of abstract concepts and complex relationships between them
Described by a very large number of features (dimensions)
Vague and ambiguous
• Synonyms, Homonyms
Context
• Interpretation by humans
Selected Visualization Techniques Visualization of Text Data
Questions
What are the dominant topics (and entities, such as organizations and persons) mentioned in the data set?
How do documents correspond topically to each other?
What are the relationships between the dominant topical clusters?
How do topical clusters relate in size?
How do topics develop over time (major trends)?
Are there correlations between topics, entities and temporal developments?
…
Projection Processing Pipeline Summary
[Figure: processing pipeline. Text Data → Natural Language Processing, Feature Engineering → Mathematical Model: Vector Space Model → Similarity/Distance Metrics → Similarities/Distances → Clustering & labelling → Aggregation (Hierarchy), “virtual table of contents” → Projection → Geometry → Rendering → Visualization]
Feature Engineering
Identify features describing document content
Apply natural language processing (NLP) methods
• Sentence detection and part-of-speech (POS) tagging: nouns, verbs, adjectives…
• Named entity recognition (NER): organizations, persons, locations, dates…
• Stemming: reduce words to root form
• Stopword filtering
• …
“Organized by government, services of commemoration are being held in Germany to mark the end of World War I in 1918. ...”
Vector Space Model Bag of Words
Each document represented as vector of terms
• d1: “Services of commemoration are being held around the world to mark the end of World War I in 1918. ...”
• d2: “World War I (abbreviated as WW-I, WWI, or WW1), also known as the First World War ...”
• d3: “We offer world wide service”
Term counts per document (after stemming and stopword filtering):
      servic  commemor  world  end  war
d1    1       1         2      1    1
d2    0       0         2      0    2
d3    1       0         1      0    0
Texts → Feature selection, Weighting (TF/IDF) → Feature Vectors
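As a rough illustration of the weighting step, the sketch below applies one common TF/IDF variant, tf * log(N / df), to the stemmed term counts of d1 to d3. The slides do not say which variant is used, so the formula and the names `term_counts` and `tf_idf` are assumptions made for this example only.

```python
import math

# Stemmed term counts of the three example documents (bag of words).
term_counts = {
    "d1": {"servic": 1, "commemor": 1, "world": 2, "end": 1, "war": 1},
    "d2": {"world": 2, "war": 2},
    "d3": {"servic": 1, "world": 1},
}


def tf_idf(term_counts):
    """Weight each term count by log(N / document frequency).

    This is one common TF/IDF variant, chosen here as an assumption.
    """
    n_docs = len(term_counts)
    doc_freq = {}
    for counts in term_counts.values():
        for term in counts:
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {
        doc: {t: tf * math.log(n_docs / doc_freq[t]) for t, tf in counts.items()}
        for doc, counts in term_counts.items()
    }


weights = tf_idf(term_counts)
# "world" occurs in every document, so its weight drops to zero,
# while "commemor" (only in d1) receives the highest weight per occurrence.
```

With this variant, terms that appear in every document contribute nothing to similarity, which is the motivation for weighting before computing distances.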
Similarity Metrics
Ideally: semantic similarity (elevator = lift)
In practice: statistical similarity
• (Euclidean) Distance between vectors
– Between 0 and infinity
• Cosine similarity: depends on the angle between vectors
– Between 0 and 1 for non-negative term weights
Feature Vectors → Distances
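The two metrics can be sketched in a few lines of plain Python. The vectors below are the raw term counts of d1 to d3 in the column order servic, commemor, world, end, war; the function names are chosen for this example.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two vectors, in [0, infinity)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


d1 = [1, 1, 2, 1, 1]
d2 = [0, 0, 2, 0, 2]
d3 = [1, 0, 1, 0, 0]

sim_23 = cosine_similarity(d2, d3)    # depends only on the angle
dist_23 = euclidean_distance(d2, d3)  # also depends on vector length
```

Cosine similarity ignores document length, which is often preferable for text, where long and short documents on the same topic should still count as similar.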
Clustering Definition
Grouping of data points so that those in the same cluster are more similar to each other than to those in other clusters
Given a set of data points $X = \{x_1, x_2, \ldots, x_{n-1}, x_n\}$
Find groups C1 to Ck (k < n) of data points which optimize a given criterion
• Within Cluster Criterion: Maximize similarity (or minimize distance) of data elements within one cluster
• Between Cluster Criterion: Minimize similarity (or maximize distance) of data elements from different clusters
Clustering Cluster Representation
Centroid: mean of the data point vectors (center of gravity)
Medoid: „best“ data point in the cluster
• the one closest to the center of gravity
Selected subset of cluster elements
All cluster elements
Convex Hull
…
Clustering Labeling a Cluster
Labels enable the user to interpret the cluster
Computation of the most important features of a cluster
• Centroid-Heuristic: 5-10 features with the highest weight
• „similarity; clustering; k-means; centroid;“
Discriminative analysis between clusters
• Documents on computers: “computer” will appear in each label
– Descriptive but useless for discriminating between clusters
• Instead use best features discriminating between data points
– Features frequently appearing only in a fraction of data points
– „operating systems“, „programming languages“ etc.
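The centroid heuristic can be illustrated with a small sketch: average the weighted term vectors of a cluster and take the highest-weighted features as the label. The sparse dictionary representation, the function names and the example weights are assumptions made for illustration; the discriminative variant is not covered here.

```python
def centroid(vectors):
    """Mean of sparse term-weight vectors given as {term: weight} dicts."""
    totals = {}
    for vec in vectors:
        for term, weight in vec.items():
            totals[term] = totals.get(term, 0.0) + weight
    return {term: total / len(vectors) for term, total in totals.items()}


def label_cluster(vectors, top_n=5):
    """Return the top_n centroid features, highest weight first.

    Ties are broken alphabetically so the result is deterministic.
    """
    center = centroid(vectors)
    ranked = sorted(center, key=lambda term: (-center[term], term))
    return ranked[:top_n]


# Hypothetical weighted vectors of two documents in one cluster.
cluster = [
    {"k-means": 2.0, "centroid": 1.0},
    {"k-means": 1.0, "clustering": 3.0},
]
label = label_cluster(cluster, top_n=2)
```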
Clustering Application
Browsing document collections
• Apply clustering recursively to compute a cluster hierarchy
• Use the labeled hierarchy as “virtual table of contents”
Feature Vectors → Distances → Cluster hierarchy
K-means Clustering
Partitional method: partitions the data set into k clusters
Given: document term vectors, number of clusters k
Overview of the algorithm
1. Seeding: choose k documents, use their vectors as cluster centroids
2. Compute similarity of documents to centroids, assign each document to the most similar cluster
3. Centroid update: recompute each centroid from the vectors of the documents assigned to it
4. Repeat from step 2 until (i) no data points move between the clusters, or (ii) the iteration count has reached a predefined threshold I
Converges to a local optimum
• Several passes over data usually sufficient (I very small)
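The four steps can be sketched as follows, assuming cosine similarity as the similarity measure and the mean of the assigned vectors as the centroid. The function and parameter names (`k_means`, `seeds`, `max_iter`) are choices made for this example, not part of the lecture material.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def k_means(vectors, k, max_iter=10, seeds=None):
    """Partition vectors into k clusters; returns (assignment, centroids)."""
    # Step 1, seeding: use k documents as the initial centroids.
    if seeds is None:
        seeds = random.Random(0).sample(range(len(vectors)), k)
    centroids = [list(vectors[i]) for i in seeds]
    assignment = [None] * len(vectors)
    for _ in range(max_iter):  # stop criterion (ii): iteration threshold I
        # Step 2: assign each document to the most similar centroid.
        new_assignment = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        # Stop criterion (i): no document changed its cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
assignment, centroids = k_means(docs, k=2, seeds=[0, 1])
```

Even with both seeds taken from the same natural group, this small example settles after two reassignment passes, which matches the observation that a few passes over the data usually suffice.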
K-means Clustering Properties
Disadvantages
• Number of clusters must be given a-priori
– Constrained by application scenarios (e.g. browsing)
• Sensitive to initial seed choice (often random)
Advantages
• Runtime complexity: O(Ikn)
– I and k << n, both usually bounded by a constant, so effectively O(n)
• Creates hyperspherical clusters
– Good for high-dimensional spaces (e.g. text)
– May underperform in low-dimensional spaces
Ordination
How to visualize high-dimensional data sets
Projection into a „smaller“ (2D) visualization space which can be understood by users
• Navigation and explorative analysis in the projection space
Dimensionality reduction techniques
• Projection of the high-dimensional space into a lower dimensional
• Preservation of distances/similarities
– Related data items placed close together
• Usability and aesthetics play a role
Ordination Methods
Distance-/similarity-preserving methods
• Multidimensional scaling
• Input is a distance-/similarity matrix
• Dimensions of the low-dimensional space have no meaning and no relation to the original dimensions
Transformations of the feature space
• Principal Component Analysis
• Self Organizing Maps
• Input are high-dimensional feature vectors
• Dimensions of the low-dimensional space may be related to the high-dimensional space
Multidimensional Scaling Motivation
Example: Dissimilarity between car makers
Which car makers are similar?
Which car makers build groups?
Impossible to read from large distance(/similarity) matrices
See: http://www.wiwi.uni-wuppertal.de/kappelhoff/papers/mds.pdf
Multidimensional Scaling
Projection into a 2D space while preserving the original distances as well as possible
See: http://www.wiwi.uni-wuppertal.de/kappelhoff/papers/mds.pdf
Multidimensional Scaling
See: http://www.wiwi.uni-wuppertal.de/kappelhoff/papers/mds.pdf
Multidimensional Scaling Example: 2D to 1D space
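[Figure: three points 1, 2, 3 in 2D with pairwise distances a, b, c, and their projection onto a 1D line with approximated distances ~a, ~b, ~c]
Point coordinates:
     x     y
1    0.2   1
2    0.5   1
3    1.5   0.2
Distance computation (Euclidean):
a = dist(1,2) = 0.3, b = dist(2,3) ≈ 1.28, c = dist(1,3) ≈ 1.53
Distance matrix:
     1      2      3
1    0      0.3    1.53
2    0.3    0      1.28
3    1.53   1.28   0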
Information loss is inherent!
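For reference, the pairwise Euclidean distances can be computed directly from the x/y coordinate table of the three points. This is only a small sketch; the names `points` and `distance_matrix` are chosen for this example.

```python
import math

# Coordinates of the three example points (x, y).
points = {1: (0.2, 1.0), 2: (0.5, 1.0), 3: (1.5, 0.2)}


def distance_matrix(points):
    """Symmetric matrix of Euclidean distances, as a nested dict."""
    return {
        i: {j: math.hypot(pi[0] - pj[0], pi[1] - pj[1]) for j, pj in points.items()}
        for i, pi in points.items()
    }


D = distance_matrix(points)
a, b, c = D[1][2], D[2][3], D[1][3]
```

Because a + b is larger than c, the three points do not lie on one line, so no placement on a 1D axis can reproduce all three distances exactly. This is the information loss the slide refers to.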
Multidimensional Scaling Force-Directed Placement
• Heuristic, iterative multidimensional scaling method
• Spring model simulates a physical system
Computes positions and forces between objects
Force depends on similarity between objects in the original space
Similar objects attract, dissimilar objects repel each other
• Advantages: simple to implement, parametrizable, good layout quality, suitable for visualization
• Issues:
Tends to get stuck in local minima
Not scalable: O(n^3) time-complexity for the brute-force version
Force-Directed Placement Algorithm
• Iteration: for each document i
For every other document j
totalForce_i += force(i, j)
Move object into the direction of the total force
Stop condition:
object movements have subsided, positions have stabilized sufficiently
Alternative: stress computation (computationally intensive)
• Layout quality evaluation: stress measure
Difference between pairwise distances in high- and low-dimensional spaces
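The iteration can be sketched in Python under a few stated assumptions: the force between two items is taken as their distance in the layout minus their distance in the original space (positive means attraction), each item moves by the average force-weighted offset to all other items, and stress is the mean squared distance difference over all pairs. The function names, the `eps` threshold and the stress normalization are choices made for this example.

```python
import math


def dist_low(p, q):
    """Euclidean distance between two 2D layout positions."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def force_directed_placement(high, positions, max_iter=1000, eps=1e-6):
    """Brute-force FDP. high[i][j] holds the original-space distances."""
    pos = [list(p) for p in positions]
    n = len(pos)
    for _ in range(max_iter):
        max_move = 0.0
        for i in range(n):
            dx = dy = 0.0
            for j in range(n):
                if j == i:
                    continue
                # Assumed basic force: too far apart attracts, too close repels.
                f = dist_low(pos[i], pos[j]) - high[i][j]
                dx += f * (pos[j][0] - pos[i][0])
                dy += f * (pos[j][1] - pos[i][1])
            dx /= n - 1
            dy /= n - 1
            # Move the item in the direction of the total force.
            pos[i][0] += dx
            pos[i][1] += dy
            max_move = max(max_move, math.hypot(dx, dy))
        if max_move < eps:  # movements have subsided
            break
    return pos


def stress(high, pos):
    """Mean squared difference between original and layout distances."""
    n = len(pos)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum((high[i][j] - dist_low(pos[i], pos[j])) ** 2 for i, j in pairs) / len(pairs)


# Two items that should end up 0.2 apart but start 0.3 apart.
high = [[0.0, 0.2], [0.2, 0.0]]
layout = force_directed_placement(high, [(0.0, 0.0), (0.3, 0.0)])
```

The step size of this simple scheme grows with the distances involved, so input distances are best scaled to a small range before running it.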
Force-Directed Placement Basic Force Model
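$\mathrm{force}(d_i, d_j) = \mathrm{dist}_{low}(d_i, d_j) - \mathrm{dist}_{high}(d_i, d_j)$
$\mathrm{dist}_{low}(d_i, d_j) = \sqrt{(d_{i,x} - d_{j,x})^2 + (d_{i,y} - d_{j,y})^2}$
$\mathrm{dist}_{high}(d_i, d_j) = \sqrt{\sum_{k=1}^{m} (w_{i,k} - w_{j,k})^2}$
$\mathrm{stress} = \frac{1}{N-1} \sum_{i \neq j} \left(\mathrm{dist}_{high}(d_i, d_j) - \mathrm{dist}_{low}(d_i, d_j)\right)^2$
• Attempts to reconstruct the original distances in the low-dimensional space
• Scaling of distances may be unsuitable for visualization
• No parameterization possibilities
[Figure: two cases for a pair d_i, d_j, depending on how dist_high compares with dist_low: in one the items attract, in the other they are pushed apart]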
Force-Directed Placement Improved Force Model
[Equation: force(d_i, d_j) combines three terms: an attractive term based on sim(d_i, d_j) with parameter p, a repulsive term r / dist(d_i, d_j)^d, and the constant grav]
sim: similarity in original space; dist: distance in projection space; grav: constant; r / dist^d: repulsive force
$\mathrm{sim}(d_i, d_j) = \frac{v_i \cdot v_j}{\|v_i\|_2 \, \|v_j\|_2}$
• First term: attractive force proportional to similarity
• Second term: rapidly rising, short-distance range repulsive force
Prevents „gravitational collapse“ of similar items
• Third term: weak cohesive force to prevent endless expansion of non-similar data elements
• Parameterization possibilities → usable, visually appealing layouts
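One way to read the three terms is sketched below. How exactly the terms are combined is an assumption made for this example (similarity raised to the power p, minus r divided by the distance to the power d, plus the constant grav), and so are the default parameter values and the `min_dist` guard against division by zero.

```python
def improved_force(sim, dist, p=1.0, r=0.01, d=2.0, grav=0.001, min_dist=1e-9):
    """Assumed three-term force; positive values attract, negative repel.

    sim  : similarity of the two items in the original space (0..1)
    dist : current distance of the two items in the projection space
    """
    dist = max(dist, min_dist)   # avoid division by zero for coinciding items
    attraction = sim ** p        # first term: grows with similarity
    repulsion = r / dist ** d    # second term: dominates at short range
    return attraction - repulsion + grav  # third term: weak cohesion


far_similar = improved_force(sim=0.9, dist=1.0)     # attraction dominates
near_similar = improved_force(sim=0.9, dist=0.01)   # repulsion dominates
far_unrelated = improved_force(sim=0.0, dist=10.0)  # only weak cohesion remains
```

The parameters p, r, d and grav are the tuning knobs the slide refers to: they trade off cluster compactness against the spacing between unrelated items.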
Force Directed Placement Item Position Computation
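$d_i.x \leftarrow d_i.x + \frac{1}{N-1} \sum_{j=1, j \neq i}^{N} \mathrm{force}(d_i, d_j)\,(d_j.x - d_i.x)$
$d_i.y \leftarrow d_i.y + \frac{1}{N-1} \sum_{j=1, j \neq i}^{N} \mathrm{force}(d_i, d_j)\,(d_j.y - d_i.y)$
[Figure: item d_i with neighbours d_1, d_2, d_3, the individual forces and the resulting force]
force = 0: $d_i.x \leftarrow d_i.x + 0 \cdot (d_j.x - d_i.x) = d_i.x$ (no movement)
force = 1: $d_i.x \leftarrow d_i.x + 1 \cdot (d_j.x - d_i.x) = d_j.x$ (d_i moves onto d_j)
• More complex models consider additional physical components
Friction, viscosity, acceleration, momenta, …
Requires solving differential equations
Computationally very intensive, with hardly any layout improvement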
Force Directed Placement Addressing Scalability Issues
• Solution: do not compare every object with all the others
Stochastic Sampling (neighbor + random sets) [Chalmers 1996]: O(n^2)
Use kernel (sampling), pivots and interpolation [Morrison & Chalmers 2004]: O(n^1.2)
Apply sampling and interpolation recursively [Jourdan & Melançon 04]: O(n*log(n))
Compute a hierarchy automatically using clustering, apply FDP along the hierarchy [Muhr, Sabol, Granitzer 2010]: O(n*log(n))
Hierarchical geometry: support for LOD and navigation
• Alternative Techniques:
Least Square Projection, Random Projections, FASTMAP, IDMAP…
Fast, but mostly inferior layout quality, often not visually pleasing layouts
Scalable Projection Algorithm
Input: term vectors, base area (rectangle)
Output: hierarchy of nested areas, 2D document positions
Recursive Algorithm:
• Aggregation: k-means clustering, labeling using highest weight features
• Similarity layout: force-directed placement, inscribing into area
• Area subdivision: Voronoi diagrams
• For each cluster: cluster size > threshold?
– Yes: apply algorithm recursively on the cluster
– No: layout documents (similarity layout)
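The recursive control flow can be sketched independently of the concrete clustering and layout steps. In the sketch below the clustering is passed in as a function, and a simple halving function stands in for k-means; labeling, force-directed placement and Voronoi subdivision are deliberately left out. All names are hypothetical.

```python
def build_hierarchy(items, cluster, threshold):
    """Recursively split items until every leaf holds at most threshold items.

    cluster is any function that maps a list of items to a list of groups.
    """
    node = {"items": list(items), "children": []}
    if len(items) <= threshold:
        return node  # small enough: documents would be laid out directly
    groups = [g for g in cluster(items) if g]
    if len(groups) < 2:
        return node  # clustering could not split further; stop recursing
    node["children"] = [build_hierarchy(g, cluster, threshold) for g in groups]
    return node


def split_in_half(items):
    """Stand-in for k-means with k = 2, used only to make the sketch runnable."""
    mid = len(items) // 2
    return [items[:mid], items[mid:]]


tree = build_hierarchy(list(range(10)), split_in_half, threshold=3)
```

Because each level only handles a handful of clusters or documents, the expensive layout step always runs on small inputs, which is where the overall O(n*log(n)) behaviour comes from.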
Scalable Projection Algorithm Hierarchical Projection
Scalable Projection Algorithm Advantages
• FDP applied on small number of objects: good layout, fast
• Hierarchical labeled geometry
Navigation and exploration along the hierarchy
Labels adapted to the level of detail, from overview to detail
• Incremental: data set changes integrated seamlessly into the layout
User can recognize unchanged parts of the visualization
• Scalable
Time and space complexity: O(n*log(n))
Parallelization fairly straightforward
• Parameterizable
Adaptable to different data types
Produces visually pleasing layouts
Information Landscape Visualization
• Proximity expresses relatedness
• Hills represent groups of similar data elements
Height indicates size
Compactness indicates topical cohesion
• Labels capture the essence of the underlying data
Orientation and navigation
[Figure: information landscape of 400,000 documents]
Hierarchical Information Landscape Navigation and Orientation
• Conveys relatedness and hierarchy
• Level of detail-sensitive navigation and orientation
Animated transitions: auto-focus on the chosen cluster
Topical-Temporal-Metadata Analysis Visual Interface
Know-Center [Sabol et al.]
Topical-Temporal-Metadata Analysis Coordinated Multiple Views
View coordination
• Colors and transparency
• Icons
• Size
• Selection
• Visibility
• Navigation in the hierarchy
Views
• Cluster hierarchy: tree
• Topical similarity: hierarchical information landscape
• Position in the hierarchy: location bar
• Topical trends: stream-view
• Faceted metadata hierarchy: tree
• Document metadata: table
• Document content: text pane
Topical-Temporal Analysis Example (Demo)
“Japan, Tokyo, Bay” cluster (red)
• 2 temporal peaks
• Topically separated (different hills)
Hypothesis: two different events
Analysis for validation:
• Inspection
• Searching + highlighting
• Correlating metadata
Visual Scatter/Gather Drill Down
• Identify and select relevant parts of the data set
• Retrigger analysis to focus on the chosen subset
Dynamically Changing Repositories
• Data sets change over time
Data elements added, removed, modified
• Consequence: visual representation must change
• Problem: ensure visual representation change is appropriate
Magnitude of visualization change proportional to magnitude of data change
Only areas corresponding to modified data should change
Other areas of the visualization remain (mostly) stable
User retains recognition within a changing visualization
Dynamically Changing Repositories Incremental Integration of Changes
• Change in the layout corresponds to change in the data
User retains recognition and orientation through unchanged parts of the topography
Scalable Graph Visualization Motivation
• Large graphs increasingly common
Social networks
Semantic knowledge bases
Interlinked document repositories
• Need visual approaches for gaining insight into large graphs
• Challenges
Clutter caused by many nodes and intersecting edges
Limited load on the client
• Web and mobile clients
Scalable Graph Visualization Goals
• Provide an overview of the whole graph
Show the overall structure
• Avoid user and client overload
Introduce more details when zooming (LOD)
• Maximize clarity: apply techniques for avoiding overlap and clutter
Edge bundling, edge routing
• Combined Approach
Hierarchical aggregation of graph data (nodes and edges)
Clutter reduction: edge routing and bundling along the hierarchy
Scalable Graph Visualization Hierarchy Generation
• Aggregate nodes to meta-nodes (top-down)
1. Hierarchical clustering
• Node similarity depends on connectivity
2. Hierarchy Extraction from Ontologies
• Class hierarchy: traverse nodes/relations of particular types
• Aggregate edges to meta-edges (bottom-up)
Combine edges propagating outside of a cluster (inter-cluster edges)
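Edge aggregation at one level of the hierarchy can be sketched as follows: every edge whose endpoints lie in different clusters is counted towards a meta-edge between those clusters, while intra-cluster edges are dropped at this level. The undirected treatment and all names are assumptions made for illustration.

```python
from collections import Counter


def aggregate_edges(edges, cluster_of):
    """Combine inter-cluster edges into weighted, undirected meta-edges.

    edges      : iterable of (node, node) pairs
    cluster_of : dict mapping each node to its meta-node (cluster id)
    Returns {(cluster_a, cluster_b): number of underlying edges}.
    """
    meta_edges = Counter()
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:  # intra-cluster edges stay inside the meta-node
            meta_edges[tuple(sorted((cu, cv)))] += 1
    return dict(meta_edges)


edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
cluster_of = {"a": "A", "b": "A", "c": "B", "d": "B"}
meta = aggregate_edges(edges, cluster_of)
```

The edge count stored per meta-edge can drive a visual property such as stroke width, so the overview still conveys how strongly two meta-nodes are connected.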
Scalable Graph Visualization Clutter Reduction Techniques
• Force-directed edge bundling [Holten & van Wijk 2009]
Bundle edges propagating in “similar” direction
• Edge routing along the Voronoi mesh [Lambert et al. 2010]
Dijkstra's shortest path algorithm
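For reference, a compact version of Dijkstra's algorithm on a weighted mesh is sketched below. The mesh is given as an adjacency dictionary; how the Voronoi mesh and its edge weights are derived from the layout is outside the scope of this sketch, and the example graph is hypothetical.

```python
import heapq


def dijkstra(graph, source, target):
    """Shortest path in a graph with non-negative edge weights.

    graph: {node: {neighbour: weight}}. Returns the list of nodes on the
    path from source to target, or None if target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    visited = set()
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale queue entry
        visited.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical mesh: vertices a..d with symmetric edge lengths.
mesh = {
    "a": {"b": 1, "c": 4},
    "b": {"a": 1, "c": 1, "d": 5},
    "c": {"a": 4, "b": 1, "d": 1},
    "d": {"b": 5, "c": 1},
}
route = dijkstra(mesh, "a", "d")
```

Routing a graph edge then amounts to finding such a path between the mesh vertices nearest to its two endpoints and drawing the edge along it.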
Scalable Graph Visualization Edge Bundling – non-Hierarchical vs. Hierarchical
• Reduces edge clutter
• Some node-edge overlap remains
Scalable Graph Visualization Edge Routing – non-Hierarchical vs. Hierarchical
• Reduces edge clutter and eliminates edge-node overlap
• Price: massive edge overlap on Voronoi boundaries
Edge stroke indicator for number of overlapping edges
Scalable Graph Visualization Level of Detail (Demo)
• Client loads more detailed geometry on demand