CIDR 2009: Jeff Heer Keynote

Post on 05-Dec-2014

This is a CIDR 2009 presentation. See http://infoblog.stanford.edu/ for more information and http://www-db.cs.wisc.edu/cidr/cidr2009/program.html for downloads.

Transcript of CIDR 2009: Jeff Heer Keynote

Voyagers and Voyeurs: Supporting Social Data Analysis

Jeffrey Heer, Computer Science Department, Stanford University

CIDR 2009 – Monterey, CA, 5 January 2009

A Tale of Two Visualizations

vizster

Observations

Groups spent more time in front of the visualization than individuals.

Friends encouraged each other to unearth relationships, probe community boundaries, and challenge reported information.

Social play resulted in informal analysis, often driven by story-telling of group histories.

NameVoyager: The Baby Name Voyager

Social Data Analysis

Visual sensemaking can be social as well as cognitive.

Analysis of data coupled with social interpretation and deliberation.

How can user interfaces catalyze and support collaborative visual analysis?

sense.us: A Web Application for Collaborative Visualization of Demographic Data

Voyagers and Voyeurs

Complementary faces of analysis

Voyager – focus on visualized data

Active engagement with the data

Serendipitous comment discovery

Voyeur – focus on comment listings

Investigate others’ explorations

Find people and topics of interest

Catalyze new explorations

Out of the Lab, Into the Wild

Wikimapia.org

DecisionSite posters

Spotfire Decision Site Posters

Tableau Server

Many-Eyes

Social Data Analysis In Action

1. Discussion and Debate

2. Text is Data, Too

3. Data Integrity and Cleaning

4. Integrating Data in Context

5. Pointing and Naming

For each, some thoughts on future directions.

I asked my colleagues: if you could give database researchers a wish list, what would it be?

Discussion and Debate

Tableau X-Box / Quest Diag?

“Valley of Death”

Content Analysis of Comments

Feature prevalence from content analysis (min Cohen’s κ = .74). High co-occurrence of Observations, Questions, and Hypotheses.

[Figure: percentage of comments (0 to 80) in each category, shown separately for Sense.us and Many-Eyes. Categories: Observation, Question, Hypothesis, Data Integrity, Linking, Socializing, System Design, Testing, Tips, To-Do, Affirmation.]
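The agreement statistic quoted for the content analysis, Cohen's kappa, compares how often two coders agree with how often they would agree by chance. The following is a minimal sketch of that calculation, assuming two coders who each assign one label per comment. The labels are made up for illustration and are not the study's data.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(coder_a)
    # Observed agreement: fraction of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels: does each of 10 comments contain a Question (1) or not (0)?
coder_a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
coder_b = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]
kappa = cohens_kappa(coder_a, coder_b)
```

In this made-up example the coders agree on 9 of 10 comments, chance agreement is 0.5, and kappa comes out at about 0.8.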

WANTED: Structured Conversation

Reduce the cost of synthesizing contributions

Wikipedia: Shared Revisions; NASA ClickWorkers: Statistics

Can we represent data, visualizations, and social activity in a unified data model?
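One way to picture the unified data model question is to store datasets, views, and comments as plain records that reference each other by ID, so that social activity can be queried like any other data. The sketch below is purely illustrative. Every class, field, and function name is an assumption and does not describe how sense.us or Many-Eyes is built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    dataset_id: str
    columns: tuple


@dataclass(frozen=True)
class View:
    view_id: str
    dataset_id: str      # which dataset this visualization state draws on
    encoding: str        # e.g. "stacked graph"
    filters: tuple = ()  # view parameters, kept as data


@dataclass(frozen=True)
class Comment:
    comment_id: str
    view_id: str         # the view state the comment is attached to
    author: str
    text: str
    tags: tuple = ()     # e.g. ("question", "hypothesis")


def comments_on_dataset(dataset_id, views, comments):
    """Return every comment attached to any view of the given dataset."""
    view_ids = {v.view_id for v in views if v.dataset_id == dataset_id}
    return [c for c in comments if c.view_id in view_ids]


# Hypothetical records for illustration only.
census = Dataset("census", ("year", "occupation", "count"))
views = [View("v1", "census", "stacked graph"), View("v2", "other", "bar chart")]
comments = [
    Comment("c1", "v1", "ann", "Why the dip?", ("question",)),
    Comment("c2", "v2", "bob", "Nice chart."),
]
found = comments_on_dataset(census.dataset_id, views, comments)
```

Because comments point at views and views point at datasets, a question such as "what has been said about this dataset?" becomes an ordinary join.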

Text is Data, Too

Visualization Popularity

Over 1/3 of Many-Eyes visualizations use free text

[Figure: share of visualizations by type (axis labeled Percentage, 0.0 to 0.5), shown separately for Many-Eyes and Swivel. Types: Tag Cloud, Bubble Graph, Word Tree, Bar Chart, Maps, Network Diagram, Treemap, Matrix Chart, Line Graph, Scatterplot, Stacked Graph, Pie Chart, Histogram.]

Alberto Gonzales

WANTED: Better Tools for Text

Statistical Analysis of text (with ties to source!)

Entity Extraction

Aggregation and Comparison of texts

Get a “global” view of documents

We can do better than Tag Clouds (!?)

Use text analysis tools to enable analysis of structured conversation by the community.
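As a small illustration of "statistical analysis of text with ties to source", the sketch below counts words across a set of documents while keeping, for every occurrence, a pointer back to the document and character offset it came from. The function names and the sample documents are hypothetical, and the tokenizer is deliberately simple.

```python
import re
from collections import defaultdict


def index_terms(documents):
    """Map each lowercased word to the (doc_id, char_offset) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for match in re.finditer(r"[A-Za-z']+", text):
            index[match.group().lower()].append((doc_id, match.start()))
    return index


def top_terms(index, k=3):
    """Most frequent terms, ties broken alphabetically."""
    return sorted(index, key=lambda term: (-len(index[term]), term))[:k]


# Hypothetical comments for illustration only.
documents = {
    "c1": "I do not recall",
    "c2": "I do not recall that meeting",
}
index = index_terms(documents)
recall_hits = index["recall"]   # every occurrence, with its source location
frequent = top_terms(index, k=2)
```

Keeping the offsets means an aggregate view (a count, a ranking) can always be traced back to the exact passages that produced it.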

Data Integrity and Cleaning

No cooks in 1910? … There may have been cooks then. But maybe not.

The great postmaster scourge of 1910?

Or just a bug in the data?

Content Analysis of Comments

16% of sense.us comments and 10% of Many-Eyes comments reference data quality or integrity.

[Figure: percentage of comments by category (0 to 80) for Sense.us and Many-Eyes, repeated from the earlier Content Analysis of Comments slide.]

WANTED: Data Cleaning Tools

Reshape data, reformat rows & columns

Handle missing data: label, repair, interpolate

Entity resolution and de-duplication

Group related values into aggregates

Assist table lookups & data transforms

Provide tools in situ to leverage the collective

Transparency requires provenance
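To make the "label, repair, interpolate" item concrete, here is a minimal sketch that fills interior gaps in a numeric series by linear interpolation and records which positions were repaired, so the repair stays visible rather than silently replacing the data. The function name and the sample values are hypothetical. Leading and trailing gaps are left as missing.

```python
def interpolate_missing(series):
    """Linearly interpolate interior None values.

    Returns (values, repaired), where repaired lists the indices that were
    filled in, as a simple form of provenance for the repair.
    """
    values = list(series)
    repaired = []
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk each pair of neighbouring known points and fill the gap between them.
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            fraction = (i - left) / gap
            values[i] = values[left] + fraction * (values[right] - values[left])
            repaired.append(i)
    return values, repaired


# Hypothetical counts per decade, with gaps marked as None.
raw = [100, None, 300, None, None, None, 700]
cleaned, repaired = interpolate_missing(raw)
```

Returning the repaired indices alongside the values lets an interface label interpolated points differently from recorded ones.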

Integrating Data in Context

College Drug Use

Harry Potter is Freaking Popular

WANTED: In-Situ Data Integration

Search for and suggest related data or views

User input for types, schema matching, or data

Apply in context of the current task

But record mappings for future use

Record provenance: chain of data sources

Examples: Google Web Tables, Pay-As-You-Go, Stanford Vispedia, Utah VisTrails

Pointing and Naming

“Look at that spike.”

“Look at the spike for Turkey.”

“Look at the spike in the middle.”

Free-form / Data-aware

Visual Queries

Model selections as declarative queries over interface elements or underlying data

(-118.371 ≤ lon AND lon ≤ -118.164) AND (33.915 ≤ lat AND lat ≤ 34.089)

Applicable to dynamic, time-varying data

Retarget selection across visual encodings

Support social navigation and data mining
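The longitude/latitude selection shown earlier can be sketched as a small declarative query object that is evaluated against data records rather than screen pixels. This is an illustrative sketch only. The class names, the `column` attribute, and the sample records are assumptions, not the representation used in any of the systems named in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeQuery:
    column: str
    low: float
    high: float

    def matches(self, record):
        return self.low <= record[self.column] <= self.high


@dataclass(frozen=True)
class And:
    clauses: tuple

    def matches(self, record):
        return all(clause.matches(record) for clause in self.clauses)


def select(query, records):
    """Evaluate a selection query against whatever records are current."""
    return [r for r in records if query.matches(r)]


# The bounding-box selection from the slide, expressed over the data.
selection = And((
    RangeQuery("lon", -118.371, -118.164),
    RangeQuery("lat", 33.915, 34.089),
))

# Hypothetical records for illustration only.
records = [
    {"name": "A", "lon": -118.25, "lat": 34.05},
    {"name": "B", "lon": -122.42, "lat": 37.77},
]
selected = select(selection, records)
```

Because the selection refers to data fields, the same query can be re-run when the data changes, or applied to the same records shown in a different visual encoding.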

WANTED: Data-Aware Annotation

Meta-queries linking annotations to views

Visually specifying notification triggers

Annotating data aggregates (use lineage?)

Unified model (again!) to facilitate reference

How to make it work at scale?

How else to use machine-readable annotations?

Can annotations be used to steer data mining?

Conclusion

Social Data Analysis

Collective analysis of data supported by social interaction.

1. Discussion and Debate

2. Text is Data, Too

3. Data Integrity and Cleaning

4. Integrating Data in Context

5. Pointing and Naming

Summary

As visualization becomes common on the web, opportunities for collaborative analysis abound.

Weave visualizations into the web: data access, visualization creation, view sharing and pointing.

Support discovery, discussion, and integration of contributions to leverage the collective.

Improve both processes and technologies for communication and dissemination.

Parting Thoughts

Visualizations may have a catalytic effect on social interaction around data.

Encourage participation by minimizing or offsetting interaction costs.

Provide incentives by fostering the personal relevance of the data.

Acknowledgements

@ Berkeley: Maneesh Agrawala, Wes Willett, danah boyd, Marti Hearst, Joe Hellerstein

@ IBM: Martin Wattenberg, Fernanda Viégas

@ PARC: Stu Card

@ Tableau: Jock Mackinlay, Chris Stolte, Christian Chabot

Jeffrey Heer, Stanford University

jheer@stanford.edu, http://jheer.org

Voyagers and Voyeurs: Supporting Social Data Analysis

With a collaborative spirit, with a collaborative platform where people can upload data, explore data, compare solutions, discuss the results, build consensus, we can engage passionate people, local communities, media and this will raise - incredibly - the amount of people who can understand what is going on.

And this would have fantastic outcomes: the engagement of people, especially new generations; it would increase knowledge, unlock statistics, improve transparency and accountability of public policies, change culture, increase numeracy, and in the end, improve democracy and welfare.

Enrico Giovannini, Chief Statistician, OECD. June 2007.