DB/IR Keynote - Data for the People

Data for the People, by the People

Mor Naaman

Yahoo!

Mor Naaman: Data for the People

Heard of Flickr?

2


Guess the Tags

3

Zion Hiking Mountains Landscape

Nature Valley


Guess the Tags

4

Dog Puppy White Animal

Pet Sad Nepal


What is “Social Media”?

Online media published or shared by individuals and organizations, in an environment that encourages significant individual participation and that promotes curation, discussion and re-use.

5


Social Media Cycle

6

User

Community

Applications

Data

Motivations


Social Media Opportunity

7

User

Community

Applications

Data

Motivations

New data

New applications and experiences

Social environment encourages engagement


The Algorithm is NOT King

8

Application design

User research

Deep understanding of users, tasks


Social Media Science?

9

A Social Media Science?


Outline

- The People

- The Data

- The Multimedia Opportunity

10


Outline

- The People

  - Flickr “interestingness”

  - Why we tag?

  - Role of social constructs?

- The Data


11


Flickr Interestingness

12

Views

Comments

Favorites

...

The most “interesting” photos are likely to generate the most “activity”



13



14


Flickr Tag Affordances

Displayed next to photo

Can be used to search:

Your own photos

Others’ photos

Public photos

15


Why We Tag? A User Study

13 ZoneTag users (23-45, 9m, 4f)

All “taggers” (no use to ask non-taggers why they tag)

Structured interviews

16

published in CHI 2007


Motivation Taxonomy

Sociality (Self, Social) × Function (Organization, Communication)

Self, Organization: Retrieval, Directory; Search

Self, Communication: Context for self; Memory

Social, Organization: Contribution, attention; Ad hoc photo pooling

Social, Communication: Content descriptors; Social signaling

“If I tagged ahead of time I can go back and get all my pictures of [my children]…”

“…I then think ‘well, maybe I should tag this’ so I can find it again later”

“I’m obsessive-compulsive”

17


Motivation Taxonomy

Sociality

Social

Self






18


Motivation Taxonomy

“I tag photos with what I think might be interesting to other people, stuff I think people will like”

“I know that tagging can connect my photos to activities, and get more interest”

“I want to look at all [my neighborhood’s] tags. … That’s definitely a reason I’m putting these tags in”

Sociality

Social

Self






19


Motivation Taxonomy

“I can tell my mom [with the tag] ‘look, we went to…’”

“I left reviews of places – like at the airport, when my flight was delayed, I tagged ‘Aloha Air sucks.’”

Sociality

Social

Self






20


The Numbers Agree

21

published in CHI 2008

[Figure: model predicting Tags (R² = .571). Node labels: Self, Family & Friends, Public, Groups, Contacts, Photos. Coefficients shown: .115*, N/S, .150***, .489***, .270***, .279***]

Photos


Why Not Tag (Others’ Photos)?

22

Not collected: in user’s account

Not identified: as coming from the tagger

Not prominent: in the interface, as “opinion”

Not aggregated: can’t “vote” on tag/item pair


Is Facebook Different?

Social constructs encourage “people” tagging

23

Propagated

Explained

To taggee’s account

To tagger’s viewers


Is the ESP Game Different?

No “social” motivations, game mechanism

Tagging other people’s content

24


Tagging Systems Structure

Source, type of object

Tagging rights

Tagging support/suggestions

Aggregation

Display/functionality

...

25

published in HyperText 2006


Communities, Vocabulary

(Sen et al., CSCW 2006)

26

Figure 1: relationship between community influence and user tendency.

and so on. Personal tendency evolves as people interact with the tagging system.

Figure 1 indicates how users’ own tagging behavior influences their future behavior through creating investment and forming habits. The tags one has applied are an investment in a personal ontology for organizing items. Changing ontologies midstream is costly. For someone who has labeled Pepsi, Coke, and Sprite as “pop”, it would make little sense to label RC and Mountain Dew as “soda”. Further, people are creatures of habit, prone to repeating behaviors they have performed frequently in the past [15]. Both habit and investment argue that people will tend to apply tags in the future much as they have applied them in the past.

There are also other factors that might influence a user’s personal tendency to apply tags: they might lose or gain interest in the system, become more knowledgeable about tagged items, or become more or less favorably disposed to tagging as a way of organizing information. We do not model these factors in this paper.

Community influence. Figure 1 suggests that the community influences tag selection by changing a user’s personal tendency. Golder and Huberman find that the relative proportions of tags applied to a given item in del.icio.us appears to stabilize over time [9]. They hypothesize that the set of people who bookmark an item stabilize on a set of terms in large part because people are influenced by the tagging behavior of other community members. Similarly, Cattuto examines whether the tags most recently applied to an item affect the user’s tag application for the item [4].

The theory of social proof supports the idea that seeing tags influences behavior. Social proof states that people act in ways they observe others acting because they come to believe it is the correct way for people to act [5]. For example, Asch found that people conform to others’ behavior even against the evidence of their own senses [1]. Cosley et al. found that a recommender system can induce conforming behavior, influencing people to rate movies in ways skewed toward a predicted rating the system displays, regardless of the prediction accuracy [6].

Research questions. Our work differs from Golder, Huberman, and Cattuto in an important way. Their analyses focus on how vocabulary emerges around items, i.e., how tags applied to an item affect future tags applied to that item. In contrast, we focus on factors affecting the way individual users apply tags across the domain of tagged items. Our first two research questions address the strength of the two factors we believe most affect the evolution of individuals’ vocabularies:

RQ1: How strongly do investment and habit affect personal tagging behavior?

RQ2: How strongly does community influence affect personal tagging behavior?

To the extent that the community influences individual taggers, system designers have the power to shape the way the community’s vocabulary evolves by choosing which tags to display. In the extreme case, a system might never show others’ tags, thus eliminating community influence entirely. Even systems that do make others’ tags visible will often have too many tags to practically display. Figure 1 shows the tag selection algorithm acts as a filter on the influence of the community. We ask two research questions about the effect of choosing tags to present:

RQ3: How does the tag selection algorithm influence the evolution of the community’s vocabulary?

RQ4: How does the tag selection algorithm affect users’ satisfaction with the system?

Finally, we examine whether communities converge on the classes of tags they use (e.g., factual versus subjective), rather than on individual tags. We explore whether these different classes of tags are more or less valuable to users of tagging systems:

RQ5: Do people find certain tag classes more or less useful for particular user tasks?

Our work differs from prior tag-related research in a number of ways. First, we focus on people rather than items. Second, we study a new tagging system rather than a relatively mature one. Third, we compare behavior across several variations of the same system rather than looking at a single example. Fourth, we study tagging as a secondary feature, rather than as the community’s primary focus.

We believe that our perspective and questions will give fresh insight into the mechanisms that affect the evolution and utility of tagging communities. We use this insight to provide designers with tools and guidelines they can use to shape the behavior of their own systems.

The rest of this paper is organized as follows. In section 2 we discuss the design space of tagging systems and present the tagging system we built for users of the MovieLens recommender system. Section 3 presents our experimental manipulations and metrics within this tagging system. Sections 4, 5, and 6 address our first three research questions related to personal tendency, community influence, and tag selection algorithm. Section 7 covers research questions four and five, which explore the value of a vocabulary to the community. We conclude in section 8 with a discussion of our findings, limitations, design recommendations, and ideas for future research in tagging systems.

2. DESIGN OF TAGGING SYSTEMS

In this section, we briefly outline a design space of tagging systems and then describe the choices we made for the MovieLens tagging system.

2.1 Tagging Design Space

Figure 3: Movie details page tag display.

Figure 4: Adding tags with auto-complete.

links that display a list of movies that have been tagged with the clicked tag. Second, a tag search box with the auto-completion feature is provided to facilitate quick access to lists of movies that have been tagged with a particular tag. Finally, we added a “Your Tags” page that lists all the tags that a user has applied along with a sampling of movies that each tag was applied to.

3. EXPERIMENTAL SETUP

Each user was provided with the common tagging elements described in section 2.2. We now describe the experimental manipulations we performed to gain insight into our research questions.

We randomly assigned users who logged in to MovieLens during the experiment to one of four experimental groups. Each group’s tags were maintained independently (i.e. members of one group could not see another group’s tags).

Each group used a different tag selection algorithm that chose which tags to display, if any, that had been applied by other members of their group. We used these algorithms to manipulate the dimensions of tag sharing and tag visibility.

The unshared group was not shown any community tags, corresponding to a private system where no tags are shared between members.

The shared group saw tags applied by other members of their group to a given movie. If there were more tags available than a widget supported (i.e. three tags on the movie list, seven tags on the auto-complete list), the system randomly selected which tags to display.

The shared-pop group interface was similar to that of the shared group. However, when there were more tags available than a widget supported, the system displayed the most popular tags, i.e., those applied by the greatest number of people. Both the details page and the auto-complete drop-down displayed the number of times a tag was applied in parentheses. We expected this group to exhibit increased community influence compared to the shared group because, since everyone would see the most popular items, people would tend to share the same view of the community’s behavior.

Table 1: Overall tag usage statistics by experimental group. Note that the tags column overall total is smaller than the sum of the groups, because two groups might independently use the same tag.

group        users   taggers   tags    tag applications
unshared       830       108     601              1,546
shared         832       162     809              1,685
shared-pop     877       154   1,697              4,535
shared-rec     827       211   1,007              3,677
overall      3,366       635   3,263             11,443
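The two display rules described for the shared and shared-pop groups can be sketched in a few lines of Python. This is only an illustration of the rules as stated in the excerpt, not the MovieLens implementation; the function names, the `(user, tag)` input format, and the alphabetical tie-break are assumptions made for the example.

```python
import random
from collections import defaultdict


def select_random(tag_applications, limit, rng=None):
    """'shared' rule: if more tags exist than the widget holds, pick at random.

    tag_applications is a list of (user, tag) pairs for one movie (assumed format).
    """
    rng = rng or random.Random()
    tags = sorted({tag for _user, tag in tag_applications})
    if len(tags) <= limit:
        return tags
    return rng.sample(tags, limit)


def select_popular(tag_applications, limit):
    """'shared-pop' rule: show the tags applied by the greatest number of people."""
    users_per_tag = defaultdict(set)
    for user, tag in tag_applications:
        users_per_tag[tag].add(user)
    # Ties are broken alphabetically; the excerpt does not say how ties were handled.
    ranked = sorted(users_per_tag, key=lambda t: (-len(users_per_tag[t]), t))
    return ranked[:limit]


# Hypothetical tag applications for a single movie.
applications = [
    ("u1", "funny"), ("u2", "funny"), ("u3", "funny"),
    ("u1", "pixar"), ("u2", "pixar"),
    ("u3", "overrated"), ("u4", "seen it"),
]
print(select_popular(applications, 3))
print(select_random(applications, 3, random.Random(0)))
```

With a limit of three (the movie-list widget size mentioned above), the popular rule always surfaces the same tags to everyone, which is the mechanism the authors expected to increase community influence.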

The shared-rec group interface used a recommendation algorithm to choose which tags to display for particular movies. When displaying tags for a target movie, the system selected the tags most commonly applied to both the target movie and to the most similar movies to the target movie. Similarity between a pair of movies was defined as the cosine similarity of the ratings provided by MovieLens users. Note that this means that a tag that was never actually applied to a movie could appear as being associated with that movie, and further, that tags could be displayed for a movie that had never had a tag applied to it.
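As a rough illustration of the similarity measure named above, the following Python sketch computes the cosine similarity between two movies from their user ratings. The dictionary representation and the choice to treat a missing rating as zero are assumptions for the example; the excerpt does not say how unrated movies were handled.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity of two movies' rating vectors.

    Each argument maps user id -> rating (assumed format). Users who did not
    rate a movie contribute 0 to that movie's vector (an assumption).
    """
    shared_users = set(ratings_a) & set(ratings_b)
    dot = sum(ratings_a[u] * ratings_b[u] for u in shared_users)
    norm_a = math.sqrt(sum(r * r for r in ratings_a.values()))
    norm_b = math.sqrt(sum(r * r for r in ratings_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical ratings for three movies.
movie_x = {"u1": 5, "u2": 3}
movie_y = {"u1": 5, "u2": 3}
movie_z = {"u3": 4}

print(cosine_similarity(movie_x, movie_y))  # identical rating vectors, close to 1.0
print(cosine_similarity(movie_x, movie_z))  # no raters in common
```

Because the tags shown come from similar movies as well as the target movie, a movie with no tags of its own can still display tags, as the excerpt notes.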

We collected usage data from January 12, 2006 through February 13, 2006. Table 1 lists basic usage statistics overall and by experimental group. During the experiment, 3,366 users logged into MovieLens, 635 of whom applied at least one tag. A total of 3,263 tags were used across 11,443 tag applications. (A tag is a particular word or phrase used in a tagging system. A tag application is when a user applies a particular tag to a given item.)

3.1 Metrics

As shown in Table 1, basic usage metrics differed widely between experimental groups. However, these differences are not statistically significant due to effects from “power taggers.” Most tag applications are generated by relatively few users, approximating a power law distribution (y = 15547x^(-1.4491), R² = 0.9706). The mean number of tag applications per user was about 18, but the median was three. The most prolific user applied 1,521 tags, while 25 users applied 100 or more. Because of these skewed distributions, differences such as the number of tags applied per group, are not statistically significant.
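The gap between the mean and the median quoted above is what a heavy-tailed distribution produces. A short Python check, using the totals from Table 1 plus a made-up list of per-user counts (the list is hypothetical and only illustrates the effect):

```python
from statistics import mean, median

# Totals reported in the excerpt.
tag_applications = 11_443
taggers = 635
print(round(tag_applications / taggers, 1))  # about 18 applications per tagger

# Hypothetical per-user counts: most users apply a few tags, one applies many.
counts = [1, 1, 2, 2, 3, 3, 3, 4, 5, 150]
print(mean(counts), median(counts))
```

A single prolific user is enough to pull the mean far above the median, which is why the authors avoid comparing raw quantities between groups.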

Further, most of our research questions are not about differences in quantity, but rather, about how the tags people apply and view influence their future decisions on which tags to apply. In most cases, we study this influence at the level of categories of tags, which we call tag classes. Golder et al. present seven detailed classes of tags [9]. We collapse Golder’s seven classes into three more general classes that are related to specific user tasks that tags could support in the MovieLens community. We list short descriptions of Golder’s tag classes that were folded into each of our tag classes in parentheses.

1. Factual tags identify “facts” about a movie such as


Communities, Vocabulary

MovieLens tagging experiment

Private tags

Shared tags (several conditions)

27

Tag class by group:

Group          Subjective   Factual   Personal
Unshared              24%       38%        39%
Shared (pop)           9%       82%         9%


MovieLens Social Psychology

Can social psychology principles be used to elicit contribution?

(Ling et al., J. Com. Med. Comm. 05)

28


Outline

- The People

- The Data


29


Outline

- The People

- The Data

  - Social Media Patterns

  - Example: TagMaps / World Explorer


30


Community-contributed data?

Media

Descriptive text (title, caption, tag)

Discussions and comments

Views and view patterns

Item use and feedback

Reuse and remix

Micro- and explicit recommendations

“Context Metadata”

…

31


Social Media Patterns

Semantic space (from any text)

Activity and viewing data

User/personal data

Social network

Location/time metadata

32


E.g., Semantic Patterns

33


E.g., Social Patterns

34


More Flickr Metadata: Location

35


This is not An Arch

“Noisy” data

Photographer biases

Wrong data

...

[Map labels: 6 km, 5 km]

36


Tag Patterns

37


Tag Patterns

38


Tag Patterns

39


Tag Patterns

40


Tag Patterns: for the money!

41


Geo/Temporal Patterns

42

[Chart: time axis from Jan-05 to May-07]


BYOBW!

43

published in SIGIR 2007


Location-driven Modeling

44


Extracting Knowledge

45

More “activity” in a certain location indicates the importance of that location

Tags that are unique to a certain location can be used to represent the location


Translation into simple algorithm

Clustering of photos

Scoring of tags

TF / IDF / UF

46

(u2,bridge)

(u1,car)

(u1,bridge)

(u3,car)

(u3,museum)
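The slide names TF / IDF / UF without defining them. The Python sketch below is one plausible reading, offered as an assumption rather than the authors' exact formula: TF counts photos in a cluster that carry the tag, IDF is the ratio of all photos to photos carrying the tag, and UF is the fraction of the cluster's photographers who used the tag. The cluster assignment of the five (user, tag) pairs is also hypothetical.

```python
from collections import defaultdict


def score_tags(clusters):
    """Score each tag in each cluster as TF * IDF * UF (assumed definitions).

    clusters maps cluster id -> list of photos; a photo is (user, set_of_tags).
    """
    total_photos = sum(len(photos) for photos in clusters.values())
    photos_with_tag = defaultdict(int)
    for photos in clusters.values():
        for _user, tags in photos:
            for tag in tags:
                photos_with_tag[tag] += 1

    scores = {}
    for cluster_id, photos in clusters.items():
        users_in_cluster = {user for user, _tags in photos}
        tf = defaultdict(int)            # photos in this cluster with the tag
        users_by_tag = defaultdict(set)  # distinct users in this cluster using the tag
        for user, tags in photos:
            for tag in tags:
                tf[tag] += 1
                users_by_tag[tag].add(user)
        scores[cluster_id] = {
            tag: tf[tag]
            * (total_photos / photos_with_tag[tag])              # IDF
            * (len(users_by_tag[tag]) / len(users_in_cluster))   # UF
            for tag in tf
        }
    return scores


# The (user, tag) pairs from the slide, split into two hypothetical clusters.
clusters = {
    "A": [("u1", {"bridge"}), ("u2", {"bridge"}), ("u1", {"car"})],
    "B": [("u3", {"car"}), ("u3", {"museum"})],
}
scores = score_tags(clusters)
for cluster_id, tag_scores in scores.items():
    print(cluster_id, max(tag_scores, key=tag_scores.get))
```

The UF term is what keeps one prolific photographer from dominating a cluster: "bridge" scores well in cluster A because two different users applied it, while "car" appears in both clusters and is penalized by IDF.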


Tag Maps - SF

47


Attraction Maps of Paris

Stanley Milgram, 1976. “Psychological Maps of Paris”

48


Tag Maps of Paris

Y!RB, 2006. TagMaps

49


Make a World Explorer

50

published in JCDL 2007

http://tagmaps.research.yahoo.com



51

Better Image Search


Outline

- The People

- The Data


52


Social Media = Context

Context is king

Predictor of content

Modifies perception of content

Social media: context also predicts activity?

53


Social Media = Challenge

Content is still hard…

Unstructured data (no semantics)

Tags, not ground truth labels

Noise

Scale

• Computation

• Long tail means no supervised learning

54


Rolling in Content

We identified the landmarks...

We know where they are...

We can get the matching photos...

55


System Overview

56

published in WWW 2008

published in ACM MM 2007


System Overview

57


Learning from noisy labels

58


Visual Features

• Color: moments over a 5x5 grid

• Texture: Gabor over global image

• Interest points: SIFT
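As an illustration of the first feature in the list, here is a minimal Python sketch of grid color moments for a single color channel. The slide does not say which moments were used; the sketch assumes the common choice of mean, standard deviation, and skewness (the signed cube root of the third central moment) per grid cell, and the function name and input format are hypothetical.

```python
import math


def color_moments(channel, grid=5):
    """Return [mean, std, skew] for each cell of a grid x grid partition.

    channel is a 2-D list of pixel values for one color channel and must be
    at least grid x grid pixels (assumed input format).
    """
    height, width = len(channel), len(channel[0])
    features = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            pixels = [channel[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            n = len(pixels)
            mean = sum(pixels) / n
            variance = sum((p - mean) ** 2 for p in pixels) / n
            third = sum((p - mean) ** 3 for p in pixels) / n
            # Signed cube root keeps the skewness term in pixel-value units.
            skew = math.copysign(abs(third) ** (1 / 3), third)
            features.extend([mean, math.sqrt(variance), skew])
    return features


# A flat 10x10 test image: every cell has mean 7 and no spread.
flat = [[7] * 10 for _ in range(10)]
features = color_moments(flat)
print(len(features), features[:3])
```

Running this once per channel and concatenating the results gives a fixed-length vector that can be compared across photos regardless of image size.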

59


Ranking Clusters (1)

60

Same “objects” that appear often in cluster’s photos suggest relevance


Ranking Clusters (2)

61

Use Visual Features to compare average intra-cluster and inter-cluster similarity

Similarity between photos inside cluster versus outside the cluster suggests coherence
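The comparison described on this slide can be sketched as follows in Python. The sketch assumes each photo is already reduced to a numeric feature vector and uses cosine similarity as the similarity measure; both choices, and the function names, are assumptions for illustration.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def coherence(cluster, outside):
    """Average intra-cluster similarity minus average similarity to outside photos."""
    intra_pairs = list(combinations(cluster, 2))
    inter_pairs = [(a, b) for a in cluster for b in outside]
    if not intra_pairs or not inter_pairs:
        return 0.0
    intra = sum(cosine(a, b) for a, b in intra_pairs) / len(intra_pairs)
    inter = sum(cosine(a, b) for a, b in inter_pairs) / len(inter_pairs)
    return intra - inter


# Hypothetical 2-D feature vectors: two similar photos in the cluster, one outside.
cluster = [[1.0, 0.0], [0.9, 0.1]]
outside = [[0.0, 1.0]]
print(coherence(cluster, outside) > 0)
```

A clearly positive value means photos in the cluster resemble each other more than they resemble the rest of the collection, which is the coherence signal the slide describes.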


Ranking Clusters - More

Number of users: more users -> more shared interest

Temporal spread: persistent over time -> more likely to be location (or use method described earlier)

Visual coherence: measure of diversity of visual cluster

Visual connectivity: same objects?

62


Ranking Images

63


System Overview

64


Sample Results: Golden Gate

Tags-only Tags+Location Tags+Location+Visual

[Image grids for each method; incorrect results are marked with X]

65


Performance: Precision

[Chart: Precision @ 10 (y-axis, 0 to 1.00) for alcatraz, baybridge, coittower, deyoung, ferrybuilding, goldengatebridge, lombardstreet, palaceoffinearts, sfmoma, transamerica, and average; series: Tag-Only, Tag-Location, Tag-Visual, Tag-Location-Visual]

66

+45% w/visual

+30% w/location
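For reference, precision at 10 is the fraction of the top ten returned photos that are relevant. A minimal Python sketch follows; the photo ids and relevance judgments are made up for the example.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for item in top_k if item in relevant_ids)
    return hits / k


# Hypothetical ranking of photo ids for one landmark query.
ranked = ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8", "p9", "p10", "p11"]
relevant = {"p1", "p2", "p4", "p5", "p7", "p8", "p10", "p11"}
print(precision_at_k(ranked, relevant))
```

Only the first ten results count, so "p11" being relevant does not change the score.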


Performance: Representative

[Chart: Representative Photos (y-axis, 0 to 10.0) for alcatraz, baybridge, coittower, deyoung, ferrybuilding, goldengatebridge, lombardstreet, palaceoffinearts, sfmoma, transamerica, and average; series: Tag-Only, Tag-Location, Tag-Visual]

67


Improve Relevance

68


Repeated in Other Context

Analyze context to extract patterns

Reduce content analysis to constrained scenario/task

Leverage content to improve metadata, relevance

69


Social Media @ Music Events

70

Analyze context to get set of media items from a single event

Use content (AF) to robustly synchronize the clips

Increase relevance, findability
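One common way to synchronize clips from content, assuming "AF" on this slide stands for audio fingerprinting, is to match fingerprint hashes between two clips and vote on the time offset. The Python sketch below shows only that voting step; the `(time, hash)` representation, the 0.1-second rounding, and the function name are assumptions, and the hashes are placeholders rather than real fingerprints.

```python
from collections import Counter


def estimate_offset(clip_a, clip_b):
    """Estimate the offset (seconds) to add to clip_b times to align with clip_a.

    Each clip is a list of (time_seconds, fingerprint_hash) pairs (assumed format).
    Returns None when the clips share no fingerprint hashes.
    """
    times_by_hash = {}
    for time_a, fingerprint in clip_a:
        times_by_hash.setdefault(fingerprint, []).append(time_a)

    votes = Counter()
    for time_b, fingerprint in clip_b:
        for time_a in times_by_hash.get(fingerprint, []):
            # Each matching hash votes for one candidate offset.
            votes[round(time_a - time_b, 1)] += 1

    if not votes:
        return None
    offset, _count = votes.most_common(1)[0]
    return offset


# Placeholder fingerprints: clip_b starts 8 seconds into clip_a.
clip_a = [(10.0, "h1"), (11.0, "h2"), (12.0, "h3"), (13.0, "h1")]
clip_b = [(2.0, "h1"), (3.0, "h2"), (4.0, "h3")]
print(estimate_offset(clip_a, clip_b))
```

Voting makes the estimate robust to a few spurious matches: the repeated hash "h1" casts one wrong vote, but the true offset still collects the most votes.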


Summary

New data

New applications

User motivations

71

User

Community

Applications

Data

Motivations


Social Media = Opportunity

To better understand media content

And robustly apply content analysis

To predict and enhance use and engagement

To invent new multimedia systems

72


Notes

73

All photos CC or with permission:

http://www.blog.spoongraphics.co.uk/freebies/vector-resources-part-5-icons

http://flickr.com/photos/oneeighteen/1610814928/

http://flickr.com/photos/klash/858533852/

http://flickr.com/photos/dooptheory/372807360/

http://flickr.com/photos/stuckincustoms/486035954/


http://flickr.com/photos/moriza/126238642/

http://flickr.com/photos/sunsurfr/537823498/

http://flickr.com/photos/708718/2053412156/

























Thanks

With: Lyndon Kennedy, Tye Rattenbury, Alex Jaffe, Shane Ahern, Simon King, Rahul Nair, Jeannie Yang

Some Slides: http://slideshare.net/mor

http://infolab.stanford.edu/~mor

74
