Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

75
Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012

Transcript of Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Page 1: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Text Visualization

Marti Hearst

Guest Lecture, i247, Spring 2012

Page 2: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Graphing Quantitative Data

Page 3: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Graphing Quantitative Data

Page 4: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Graphing Nominal Data

Page 5: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Preattentive Properties

Page 6: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Text is NOT Preattentive

SUBJECT PUNCHED QUICKLY OXIDIZED TCEJBUS DEHCNUP YLKCIUQ

DEZIDIXO

CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP

SDOHTEM

SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER

SNMULOC

GOVERNS PRECISE EXAMPLE MERCURY SNREVOG ESICERP ELPMAXE

YRUCREM

CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP

SDOHTEM

GOVERNS PRECISE EXAMPLE MERCURY SNREVOG ESICERP ELPMAXE

YRUCREM

SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER

SNMULOC

SUBJECT PUNCHED QUICKLY OXIDIZED TCEJBUS DEHCNUP YLKCIUQ

DEZIDIXO

CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP

SDOHTEM

SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER

SNMULOC

Page 7: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

TEXT AS NOTATION

Page 8: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Notation

Page 9: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Logograms

Page 10: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Syllabary

Page 11: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Syllabary

Page 12: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Alphabet

Page 13: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Alphabet

Page 14: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

HOW DO WE VISUALLY REPRESENT CONTENT?

So….

Page 15: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

HENRY JAMES, TURN OF THE SCREW

“It appeared that the narrative he had promised to read us really

required for a proper intelligence a few words of prologue. Let me

say here distinctly, to have done with it, that this narrative, from an

exact transcript of my own made much later, is what I shall

presently give. Poor Douglas, before his death—when it was in sight

—committed to me the manuscript that reached him on the third of

these days and that, on the same spot, with immense effect, he

began to read to our hushed little circle on the night of the fourth.”

Page 16: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Movies?

Georges Méliès Le Voyage dans la Lune (A Trip to the Moon) (1902)

Page 17: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Graphic Novel

Page 18: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

VISUALIZATION IN SEARCH

Page 19: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

A Comparative Study

• Reiterer et al., SIGIR 2000• Well-done study– They weren’t the creators of the viz’s tested– 40 participants, varied tasks

• Compared:– Plain html web page– Sortable search results (in a table view)– Tilebars-like view – Bar charts view– Scatterplot view

Page 20: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.
Page 21: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.
Page 22: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.
Page 23: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.
Page 24: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

A Comparative Study

• Reiterer et al., SIGIR 2000• Results:– People weren’t any better with viz’s than

with standard web view. Significantly worse with bar charts

– Subjective results: Sortable Table, then Tilebars, then simple web-based view

– People disliked the bar charts and scatter plots

Page 25: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Starfield (Clustering-based) Visualizations

Wise et al 95

Page 26: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Starfield (Clustering-based) Visualizations

Page 27: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Starfield (Clustering-based) Visualizations

Wise et al 95

Page 28: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Starfield (Clustering-based) Visualizations

Page 29: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Are visual clusters useful?

• Four Clustering Visualization Usability Studies

• Conclusions:

– Huge 2D maps may be inappropriate focus for

information retrieval

• cannot see what the documents are about

• space is difficult to browse for IR purposes

• (tough to visualize abstract concepts)

– Perhaps more suited for pattern discovery and gist-like

overviews.

Page 30: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Clustering Algorithm Problems

• Doesn’t work well if data is too homogenous or too

heterogeneous

• Often is difficult to interpret quickly

– Automatically generated labels are unintuitive and occur at

different levels of description

• Often the top-level can be ok, but the subsequent

levels are very poor

• Need a better way to handle items that fall into more

than one cluster

Page 31: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.
Page 32: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Creative Facet Visualization

• Aduna Autofocus

Page 33: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Creative Facet Visualization

• We Feel Fine

Page 34: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Creative Facet Visualization

• Fathumb mobile search interface• http://research.microsoft.com/vibe/projects/FaThumb.aspx

Page 35: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Creative Facet Visualization

• Hutchinson et al.

Page 36: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

VISUALIZATION IN TEXT ANALYSIS

Page 37: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.
Page 38: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.
Page 39: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.
Page 40: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

WHAT’S UP WITH TAG CLOUDS?

Hearst & Rosner, 2007

Page 41: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.
Page 42: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.
Page 43: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Definition

Tag Cloud: A visual representation of

social tags, organized into paragraph-

style layout, usually in alphabetical

order, where the relative size and

weight of the font for each tag

corresponds to the relative frequency

of its use.

Page 44: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

DefinitionTag Cloud: A visual representation of

social tags, organized into

paragraph-style layout, usually in

alphabetical order, where the relative

size and weight of the font for

each tag corresponds to the relative

frequency of its use.

Page 45: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

flickr’s tag cloud

Page 46: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

del.icio.us

Page 47: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

del.icio.us

Page 48: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

blogs

Page 49: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

ma.gnolia.com

Page 50: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

IBM’s manyeyes project

Page 51: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Amazon.com: Tag clouds on term frequenies

Page 52: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Tag Cloud AlternativesProvided by Martin Wattenberg

Page 53: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Alternative: “Semantic” Layout• Improving Tag-

Clouds as Visual

Information

Retrieval

Interfaces, Yusef

Hassan-

Monteroa, 1 and

Víctor Herrero-

Solana,

InSciT2006

• Tags grouped by

“similarity, based

on clustering

techniques and

co-occurrence

analysis”

Page 54: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

I was puzzled by the questions:

• What are designers and authors’

intentions in creating or using tag

clouds?

• How do they expect their readers to

use them?

Page 55: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

On the positive side:

• Compact

• Draws the eye towards the most frequent

(important?) tags

• You get three dimensions simultaneously!

– alphabetical order

– size indicating importance

– the tags themselves

Page 56: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Weirdnesses

• Initial encounters unencouraging

– Some reports from industry:

• Is the computer broken?

• Is this a ransom note?

Page 57: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Weirdnesses• Violates principles of perceptual design

– Longer words grab more attention than shorter

• Length of tag is conflated with its size

– White space implies meaning when there is none intended

• Ascenders and descenders can also effect focus

– Eye moves around erratically, no flow or guides for visual focus

– Proximity does not hold meaning

• The paragraph-style layout makes it quite arbitrary which terms are above,

below, and otherwise near which other terms

– Position within paragraph has saliency effects

– Visual comparisons difficult (see Tufte)

Page 58: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Weirdnesses• Meaningful

associations

are lost

– Where are the

different

country names

in this tag

clouds?

Page 59: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Weirdnesses

Which operating systems are

mentioned?

Page 60: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Tag Cloud Study (1)

• First part compared tag cloud layouts– Independent Variables:

• Tag size• Tag proximity to a large font• Tag quadrant position

– Task: recall after a distractor task– 13 participants; effects for size and

quadrant• Second part compared tag clouds to

lists– 11 participants– Tested recognition (from a set of like

words) and impression formation• Alphabetical lists were best for the latter;

no differences for the former

Getting our head in the clouds: Toward evaluation studies of tagclouds, Walkyria Rivadeneira Daniel M. Gruen Michael J. Muller David R. Millen, CHI 2007 note

Page 61: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Tag Cloud Study (2)• 62 participants did a selection task

– (find this country out of a list of 10 countries)

– Independent Variables:• Horizontal list• Horizontal list, alphabetical• Vertical list• Vertical list, alphabetical• Spatial tag cloud• Spatial tag cloud, alphabetical

– Order for non-alphabetical not described– Alphabetical fastest in all cases, lists

faster than spatial– May have used poor clouds (some

people couldn’t “see” larger font answers)

An Assessment of Tag Presentation Techniques; Martin Halvey, Mark Keane, poster at WWW 2007.

Page 62: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

A Justifying Claim• You get three dimensions

simultaneously!

– alphabetical order

– size indicating importance

– the tags themselves

… but is this really a conscious design

decision?

Page 63: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Solution: Celebrity Interviews• I was really confused about tag

clouds, so I decided to ask the people

behind the puffs

– 15 interviews, conducted at foocamp’06

• Several web 2.0 leaders

– 5 more interviews at Google and

Berkeley

Page 64: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

A Surprise• 7 interviewees DID NOT REALIZE that alphabetical ordering is

standard.

– 2 of these people were in charge of such sites but had had others

write the code

• What was the answer given to “what order are tags shown

in?”

– hadn’t thought about it

– don’t think about tag clouds that way

– random order

– ordered by semantic similarity

• Suggests that perhaps people are too distracted by the layout

to use the alphabetical ordering

Page 65: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Suggested main purposes:• To signal the presence of tags on the site

• A good way to get the gist of the site

• An inviting and fun way to get people interacting with the site

• To show what kinds of information are on the site

– Some of these said they are good for navigation

• Easy to implement

Page 66: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Tag Clouds as Self-Descriptions

• Several noted that a tag cloud showing one’s own tags

can be evocative

– A good summary of what one is thinking and reading about

– Useful for self-reflection

– Useful for showing others one’s thoughts

• One example: comparing someone else’s tags to own’s one to see

what you have in common, and what special interests differentiate

you

• Useful for tracking changes in friends’ lives

– Oh, a new girl’s name has gotten larger; he must have a new girlfriend!

Page 67: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Tag Clouds as showing “Trends”• Several people used this term, that tag clouds show trends

in someone’s behavior

– Trends are usually patterns across time, which are not inherently

visible in tag clouds

– To note a trend using a tag cloud, one must remember what was

there at an earlier time, and what changed

• tracking the girls’ names example

– This suggests a reason for the importance of the large tags –

draws one’s attention to what is big now versus was used to be

large.

– Suggests also why it doesn’t matter that you can’t see small tags.

Page 68: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Follow-up Study• Informed by the interview results, we search for, read, and coded web pages that

mentioned tag clouds.

– Looked at about 140 discussions

– Developed 21 codes

– Looked at another 90 discussions

• Used web queries: “tag clouds”, usability tag clouds, etc

• Sampled every 10th url

– 58% personal blogs

– 20% commercial blogs

– 10% commercial web pages

– rest from group blogs and discussion lists

– Doesn’t tell us what people who don’t write about tag clouds think.

Page 69: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

The Role of Popularity• Popularity in the sense that tag clouds (and tagging) are trendy and

popular.

– Some people liked the visualization, but their popularity made them less

appealing

• Famous post: “Tag clouds are the new mullets”

• Led to self-consciousness about liking them

– Many complained about unaesthetic cloud designs

– Little consensus on if they are a fad or have staying power

• Popularity also in the sense of the large font size for more popular

tags

– Many people like the prominence of large tags, but several commented on the

tyranny of the popular

Page 70: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

The Role of Navigation

• Opinions vary

– Many simply state they are useful for navigation, but

with no support for this claim

– Some claim the compactness makes navigation easier

than a vertical list

– Some object to the varying font size on scannability

– Others object to the lack of organization

– Overall, there is no evidence either way that we could

find in the blog community

Page 71: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Aesthetic Considerations

• Disagreement on the aesthetic and emotional

appeal, especially for lay users.

• Those who like them find them fun and appealing

• Those who don’t find them messy, strange, like a

ransom note

• Informal reports with first time users who are not

in the Web 2.0 community are negative

Page 72: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Trends again

• As in the interviews, the benefit of

“trends” was mentioned many times.

• There is another sense of “trend” as

“tendency or inclination,” and this

might be what people mean.

Page 73: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Summary of Stated Reasons for Tag Clouds(Note: some refuted by studies)

Page 74: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Tag Clouds as Social Information

• An emphasis that tag clouds are meant to show human

behavior.

• We found reports of people commenting on other uses that

were invalid because they did not reflect live user input:

– One blogger noted the incongruity of an online library using

keyword frequencies in a tag cloud rather than having it reflect

patron’s usage of the collection.

– An online community noticed one site’s cloud didn’t change over

time and realized the sizes were decided by marketing. This was

greated with derision.

Page 75: Text Visualization Marti Hearst Guest Lecture, i247, Spring 2012.

Implications

• Assume tag clouds are meant to reflect human

mental activity (individual or group)

• Then what might seem design flaws from an

information conveyance perspective may not be

• A large part of the appeal is the fun and liveliness.

– The informality of the layout reflects the human

activity beneath it.