DB/IR Keynote - Data for the People
Data for the People, by the People
Mor Naaman
Yahoo!
Mor Naaman: Data for the People
Heard of Flickr?
Guess the Tags
Zion Hiking Mountains Landscape
Nature Valley
Guess the Tags
Dog Puppy White Animal
Pet Sad Nepal
What is “Social Media”?
Online media published or shared by individuals and organizations, in an environment that encourages significant individual participation and that promotes curation, discussion and re-use.
Social Media Cycle
User
Community
Applications
Data
Motivations
Social Media Opportunity
User
Community
Applications
Data
Motivations
New data
New applications and experiences
Social environment encourages engagement
The Algorithm is NOT King
Application design
User research
Deep understanding of users, tasks
Social Media Science?
A Social Media Science?
Outline
- The People
- The Data
- The Multimedia Opportunity
Outline
- The People
  - Flickr “interestingness”
  - Why we tag?
  - Role of social constructs?
- The Data
- The Multimedia Opportunity
Flickr Interestingness
Views
Comments
Favorites
...
The most “interesting” photos are likely to generate the most “activity”
Flickr Tag Affordances
Displayed next to photo
Can be used to search:
- Your own photos
- Others’ photos
- Public photos
Why We Tag? A User Study
13 ZoneTag users (ages 23-45; 9 male, 4 female)
All “taggers” (no use to ask non-taggers why they tag)
Structured interviews
published in CHI 2007
Motivation Taxonomy
Sociality (rows) x Function (columns)
            Organization                 Communication
Self        * Retrieval, Directory       * Context for self
            * Search                     * Memory
Social      * Contribution, attention    * Content descriptors
            * Ad hoc photo pooling       * Social Signaling
“If I tagged ahead of time I can go back and get all my pictures of [my children]…”
“…I then think “well, maybe I should tag this” so I can find it again later”
“I’m obsessive-compulsive”
Motivation Taxonomy
“I tag photos with what I think might be interesting to other people, stuff I think people will like”
“I know that tagging can connect my photos to activities, and get more interest”
“I want to look at all [my neighborhood’s] tags. … That’s definitely a reason I’m putting these tags in”
Sociality (rows) x Function (columns)
            Organization                 Communication
Self        * Retrieval, Directory       * Context for self
            * Search                     * Memory
Social      * Contribution, attention    * Content descriptors
            * Ad hoc photo pooling       * Social Signaling
Motivation Taxonomy
“I can tell my mom [with the tag] “look, we went to…””
“I left reviews of places – like at the airport, when my flight was delayed, I tagged “Aloha Air sucks.””
Sociality (rows) x Function (columns)
            Organization                 Communication
Self        * Retrieval, Directory       * Context for self
            * Search                     * Memory
Social      * Contribution, attention    * Content descriptors
            * Ad hoc photo pooling       * Social Signaling
The Numbers Agree
published in CHI 2008
[Diagram: model of Tags (R^2 = .571). Variables shown: Self, Family & Friends, Public, Groups, Contacts, Photos. Coefficients shown: .115*, N/S, .150***, .489***, .270***, .279***]
Why Not Tag (Others’ Photos)?
Not collected: In user’s account
Not identified: As coming from the tagger
Not prominent: In the interface, as “opinion”
Not aggregated: Can’t “vote” on tag/item pair
Is Facebook Different?
Social constructs encourage “people” tagging
Propagated
Explained
To taggee’s account
To tagger’s viewers
Is the ESP Game Different?
No “social” motivations, game mechanism
Tagging other people’s content
Tagging Systems Structure
Source, type of object
Tagging rights
Tagging support/suggestions
Aggregation
Display/functionality
...
published in HyperText 2006
Communities, Vocabulary
(Sen et al., CSCW 2006)
Figure 1: relationship between community influence and user tendency.
and so on. Personal tendency evolves as people interact with the tagging system.
Figure 1 indicates how users’ own tagging behavior influences their future behavior through creating investment and forming habits. The tags one has applied are an investment in a personal ontology for organizing items. Changing ontologies midstream is costly. For someone who has labeled Pepsi, Coke, and Sprite as “pop”, it would make little sense to label RC and Mountain Dew as “soda”. Further, people are creatures of habit, prone to repeating behaviors they have performed frequently in the past [15]. Both habit and investment argue that people will tend to apply tags in the future much as they have applied them in the past.
There are also other factors that might influence a user’s personal tendency to apply tags: they might lose or gain interest in the system, become more knowledgeable about tagged items, or become more or less favorably disposed to tagging as a way of organizing information. We do not model these factors in this paper.
Community influence. Figure 1 suggests that the community influences tag selection by changing a user’s personal tendency. Golder and Huberman find that the relative proportions of tags applied to a given item in del.icio.us appears to stabilize over time [9]. They hypothesize that the set of people who bookmark an item stabilize on a set of terms in large part because people are influenced by the tagging behavior of other community members. Similarly, Cattuto examines whether the tags most recently applied to an item affect the user’s tag application for the item [4].
The theory of social proof supports the idea that seeing tags influences behavior. Social proof states that people act in ways they observe others acting because they come to believe it is the correct way for people to act [5]. For example, Asch found that people conform to others’ behavior even against the evidence of their own senses [1]. Cosley et al. found that a recommender system can induce conforming behavior, influencing people to rate movies in ways skewed toward a predicted rating the system displays, regardless of the prediction accuracy [6].
Research questions. Our work differs from Golder, Huberman, and Cattuto in an important way. Their analyses focus on how vocabulary emerges around items, i.e., how tags applied to an item affect future tags applied to that item. In contrast, we focus on factors affecting the way individual users apply tags across the domain of tagged items. Our first two research questions address the strength of the two factors we believe most affect the evolution of individuals’ vocabularies:
RQ1: How strongly do investment and habit affect personal tagging behavior?
RQ2: How strongly does community influence affect personal tagging behavior?
To the extent that the community influences individual taggers, system designers have the power to shape the way the community’s vocabulary evolves by choosing which tags to display. In the extreme case, a system might never show others’ tags, thus eliminating community influence entirely. Even systems that do make others’ tags visible will often have too many tags to practically display. Figure 1 shows the tag selection algorithm acts as a filter on the influence of the community. We ask two research questions about the effect of choosing tags to present:
RQ3: How does the tag selection algorithm influence the evolution of the community’s vocabulary?
RQ4: How does the tag selection algorithm affect users’ satisfaction with the system?
Finally, we examine whether communities converge on the classes of tags they use (e.g., factual versus subjective), rather than on individual tags. We explore whether these different classes of tags are more or less valuable to users of tagging systems:
RQ5: Do people find certain tag classes more or less useful for particular user tasks?
Our work differs from prior tag-related research in a number of ways. First, we focus on people rather than items. Second, we study a new tagging system rather than a relatively mature one. Third, we compare behavior across several variations of the same system rather than looking at a single example. Fourth, we study tagging as a secondary feature, rather than as the community’s primary focus.
We believe that our perspective and questions will give fresh insight into the mechanisms that affect the evolution and utility of tagging communities. We use this insight to provide designers with tools and guidelines they can use to shape the behavior of their own systems.
The rest of this paper is organized as follows. In section 2 we discuss the design space of tagging systems and present the tagging system we built for users of the MovieLens recommender system. Section 3 presents our experimental manipulations and metrics within this tagging system. Sections 4, 5, and 6 address our first three research questions related to personal tendency, community influence, and tag selection algorithm. Section 7 covers research questions four and five, which explore the value of a vocabulary to the community. We conclude in section 8 with a discussion of our findings, limitations, design recommendations, and ideas for future research in tagging systems.
2. DESIGN OF TAGGING SYSTEMS
In this section, we briefly outline a design space of tagging systems and then describe the choices we made for the MovieLens tagging system.
2.1 Tagging Design Space
Figure 3: Movie details page tag display.
Figure 4: Adding tags with auto-complete.
links that display a list of movies that have been tagged with the clicked tag. Second, a tag search box with the auto-completion feature is provided to facilitate quick access to lists of movies that have been tagged with a particular tag. Finally, we added a “Your Tags” page that lists all the tags that a user has applied along with a sampling of movies that each tag was applied to.
3. EXPERIMENTAL SETUP
Each user was provided with the common tagging elements described in section 2.2. We now describe the experimental manipulations we performed to gain insight into our research questions.
We randomly assigned users who logged in to MovieLens during the experiment to one of four experimental groups. Each group’s tags were maintained independently (i.e. members of one group could not see another group’s tags).
Each group used a different tag selection algorithm that chose which tags to display, if any, that had been applied by other members of their group. We used these algorithms to manipulate the dimensions of tag sharing and tag visibility.
The unshared group was not shown any community tags, corresponding to a private system where no tags are shared between members.
The shared group saw tags applied by other members of their group to a given movie. If there were more tags available than a widget supported (i.e. three tags on the movie list, seven tags on the auto-complete list), the system randomly selected which tags to display.
The shared-pop group interface was similar to that of the shared group. However, when there were more tags available than a widget supported, the system displayed the most popular tags, i.e., those applied by the greatest number of people. Both the details page and the auto-complete drop-down displayed the number of times a tag was applied in parentheses. We expected this group to exhibit increased community influence compared to the shared group because, since everyone would see the most popular items, people would tend to share the same view of the community’s behavior.
Table 1: Overall tag usage statistics by experimental group. Note that the tags column overall total is smaller than the sum of the groups, because two groups might independently use the same tag.
group        users   taggers    tags   tag applications
unshared       830       108     601              1,546
shared         832       162     809              1,685
shared-pop     877       154   1,697              4,535
shared-rec     827       211   1,007              3,677
overall      3,366       635   3,263             11,443
The shared-rec group interface used a recommendation algorithm to choose which tags to display for particular movies. When displaying tags for a target movie, the system selected the tags most commonly applied to both the target movie and to the most similar movies to the target movie. Similarity between a pair of movies was defined as the cosine similarity of the ratings provided by MovieLens users. Note that this means that a tag that was never actually applied to a movie could appear as being associated with that movie – and further, that tags could be displayed for a movie that had never had a tag applied to it.
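The shared-rec selection described above can be sketched roughly as follows. This is an illustration only, not the MovieLens implementation: the function names, the dict-based data layout, the neighbor count, and the choice to treat missing ratings as zero are all assumptions.

```python
import math
from collections import Counter


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity of two sparse rating vectors (dict: user -> rating).

    Assumption: a user who did not rate a movie contributes 0 to the vector.
    """
    shared_users = set(ratings_a) & set(ratings_b)
    dot = sum(ratings_a[u] * ratings_b[u] for u in shared_users)
    norm_a = math.sqrt(sum(r * r for r in ratings_a.values()))
    norm_b = math.sqrt(sum(r * r for r in ratings_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_tags(target, ratings, tags, n_neighbors=2, n_tags=3):
    """Pick the tags most often applied to the target movie and its neighbors.

    ratings: dict movie -> dict user -> rating
    tags:    dict movie -> list of tag applications (hypothetical layout)
    """
    others = [m for m in ratings if m != target]
    # Most similar movies first.
    others.sort(
        key=lambda m: cosine_similarity(ratings[target], ratings[m]),
        reverse=True,
    )
    pool = [target] + others[:n_neighbors]
    counts = Counter(t for movie in pool for t in tags.get(movie, []))
    return [tag for tag, _ in counts.most_common(n_tags)]
```

Because tags are pooled over the neighbors, a movie with no tag applications of its own can still be shown tags, which matches the note in the paragraph above.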
We collected usage data from January 12, 2006 through February 13, 2006. Table 1 lists basic usage statistics overall and by experimental group. During the experiment, 3,366 users logged into MovieLens, 635 of whom applied at least one tag. A total of 3,263 tags were used across 11,443 tag applications. (A tag is a particular word or phrase used in a tagging system. A tag application is when a user applies a particular tag to a given item.)
3.1 Metrics
As shown in Table 1, basic usage metrics differed widely between experimental groups. However, these differences are not statistically significant due to effects from “power taggers.” Most tag applications are generated by relatively few users, approximating a power law distribution (y = 15547 x^(-1.4491), R^2 = 0.9706). The mean number of tag applications per user was about 18, but the median was three. The most prolific user applied 1,521 tags, while 25 users applied 100 or more. Because of these skewed distributions, differences such as the number of tags applied per group, are not statistically significant.
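A power law of the form y = a * x^b is commonly fitted by a straight-line regression in log-log space. The sketch below shows that approach; the excerpt does not say how the authors obtained their fit or exactly what x and y denote, so the method and the function name fit_power_law are assumptions.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b on log-log axes.

    Returns (a, b, r_squared), where r_squared is measured in log space.
    All xs and ys must be positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx                      # slope = exponent
    log_a = mean_y - b * mean_x        # intercept = log of the coefficient
    ss_tot = sum((v - mean_y) ** 2 for v in log_y)
    ss_res = sum((v - (log_a + b * u)) ** 2 for u, v in zip(log_x, log_y))
    r_squared = 1.0 - ss_res / ss_tot
    return math.exp(log_a), b, r_squared


# Synthetic check: data generated from the reported curve is recovered.
xs = list(range(1, 51))
ys = [15547 * x ** -1.4491 for x in xs]
a, b, r2 = fit_power_law(xs, ys)
```

With a mean of about 18 applications per user and a median of three, the real data would scatter around such a line rather than lie on it exactly.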
Further, most of our research questions are not about differences in quantity, but rather, about how the tags people apply and view influence their future decisions on which tags to apply. In most cases, we study this influence at the level of categories of tags, which we call tag classes. Golder et al. present seven detailed classes of tags [9]. We collapse Golder’s seven classes into three more general classes that are related to specific user tasks that tags could support in the MovieLens community. We list short descriptions of Golder’s tag classes that were folded into each of our tag classes in parentheses.
1. Factual tags identify “facts” about a movie such as
Communities, Vocabulary
MovieLens tagging experiment
- Private tags
- Shared tags (several conditions)
Tag class by group:
Group          Subjective   Factual   Personal
Unshared       24%          38%       39%
Shared (pop)   9%           82%       9%
MovieLens Social Psychology
Can social psychology principles be used to elicit contribution?
(Ling et al., J. Com. Med. Comm. 05)
Outline
- The People
- The Data
- The Multimedia Opportunity
Outline
- The People
- The Data
  - Social Media Patterns
  - Example: TagMaps / World Explorer
- The Multimedia Opportunity
Community-contributed data?
Media
Descriptive text (title, caption, tag)
Discussions and comments
Views and view patterns
Item use and feedback
Reuse and remix
Micro- and explicit recommendations
“Context Metadata”
…
Social Media Patterns
Semantic space (from any text)
Activity and viewing data
User/personal data
Social network
Location/time metadata
E.g., Semantic Patterns
E.g., Social Patterns
More Flickr Metadata: Location
This is not An Arch
“Noisy” data
Photographer biases
Wrong data
...
[Map labels: 6 kms, 5 kms]
Tag Patterns
Tag Patterns: for the money!
Geo/Temporal Patterns
[Chart: time axis from Jan-05 to May-07]
BYOBW!
published in SIGIR 2007
Location-driven Modeling
Extracting Knowledge
More “activity” in a certain location indicates the importance of that location
Tags that are unique to a certain location can be used to represent the location
Translation into simple algorithm
Clustering of photos
Scoring of tags: TF / IDF / UF
[Figure: photo cluster with (user, tag) labels: (u2,bridge), (u1,car), (u1,bridge), (u3,car), (u3,museum)]
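A minimal sketch of the tag-scoring step, assuming the photos have already been clustered by location. The slide names only TF, IDF, and UF; the exact formulas below (tag count in the cluster, inverse global tag frequency without a logarithm, and the fraction of the cluster's users who used the tag) and the function name score_tags are assumptions for illustration.

```python
from collections import defaultdict


def score_tags(clusters):
    """Score each tag within each cluster as TF * IDF * UF.

    clusters: dict cluster_id -> list of (user, tag) pairs, one per tag
              application on a photo in that cluster (hypothetical layout).
    TF  = times the tag appears in the cluster
    IDF = total tag applications / applications of this tag anywhere
    UF  = share of the cluster's users who applied the tag
    """
    total = sum(len(pairs) for pairs in clusters.values())
    global_count = defaultdict(int)
    for pairs in clusters.values():
        for _, tag in pairs:
            global_count[tag] += 1

    scores = {}
    for cluster_id, pairs in clusters.items():
        cluster_users = {user for user, _ in pairs}
        tf = defaultdict(int)
        tag_users = defaultdict(set)
        for user, tag in pairs:
            tf[tag] += 1
            tag_users[tag].add(user)
        scores[cluster_id] = {
            tag: tf[tag]
            * (total / global_count[tag])
            * (len(tag_users[tag]) / len(cluster_users))
            for tag in tf
        }
    return scores


# The (user, tag) pairs from the slide, split into two made-up clusters.
example = {
    "c1": [("u1", "bridge"), ("u2", "bridge"), ("u1", "car")],
    "c2": [("u3", "car"), ("u3", "museum")],
}
example_scores = score_tags(example)
```

In this toy grouping "bridge" outranks "car" in the first cluster because it is used by both users there, which is the intent of the UF term: a tag pushed by a single prolific photographer counts for less.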
Tag Maps - SF
Attraction Maps of Paris
Stanley Milgram, 1976. “Psychological Maps of Paris”
Tag Maps of Paris
Y!RB, 2006. TagMaps
Make a World Explorer
published in JCDL 2007
http://tagmaps.research.yahoo.com
Better Image Search
Outline
- The People
- The Data
- The Multimedia Opportunity
Social Media = Context
Context is king
- Predictor of content
- Modifies perception of content
Social media: context also predicts activity?
Social Media = Challenge
Content is still hard…
Unstructured data (no semantics)
Tags, not ground truth labels
Noise
Scale
• Computation
• Long tail means no supervised learning
Rolling in Content
We identified the landmarks...
We know where they are...
We can get the matching photos...
System Overview
published in WWW 2008
published in ACM MM 2007
Learning from noisy labels
Visual Features
• Color: moments over a 5x5 grid
• Texture: Gabor over global image
• Interest points: SIFT
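As a rough illustration of the first feature, grid color moments can be computed as below. The slide does not say which moments or which color space were used; this sketch assumes RGB pixels and the first two moments (mean and standard deviation) per channel, and the function name color_moments is hypothetical.

```python
import math


def color_moments(image, grid=5):
    """Mean and standard deviation per channel over a grid x grid layout.

    image: list of rows, each row a list of (r, g, b) tuples.
    The image must be at least `grid` pixels wide and tall so that no
    cell is empty.
    Returns a flat list of grid * grid * 3 * 2 values.
    """
    height, width = len(image), len(image[0])
    features = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            cell = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            for channel in range(3):
                values = [pixel[channel] for pixel in cell]
                mean = sum(values) / len(values)
                variance = sum((v - mean) ** 2 for v in values) / len(values)
                features.extend([mean, math.sqrt(variance)])
    return features
```

Splitting the image into cells keeps coarse spatial layout (sky at the top, water at the bottom) that a single global color histogram would lose.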
Ranking Clusters (1)
Same “objects” that appear often in cluster’s photos suggest relevance
Ranking Clusters (2)
Use Visual Features to compare average intra-cluster and inter-cluster similarity
Similarity between photos inside cluster versus outside the cluster suggests coherence
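The intra-cluster versus inter-cluster comparison could look something like the sketch below. The similarity measure (a simple inverse of Euclidean distance between feature vectors) and the use of a difference of averages as the coherence score are assumptions; the slide does not specify either.

```python
import math
from itertools import combinations


def similarity(a, b):
    """Similarity in (0, 1]; identical feature vectors score 1.0."""
    return 1.0 / (1.0 + math.dist(a, b))


def coherence(cluster, others):
    """Average intra-cluster similarity minus average inter-cluster similarity.

    cluster: feature vectors of photos inside the cluster
    others:  feature vectors of photos outside the cluster
    A positive value suggests a visually coherent cluster.
    """
    intra = [similarity(a, b) for a, b in combinations(cluster, 2)]
    inter = [similarity(a, b) for a in cluster for b in others]
    if not intra or not inter:
        return 0.0  # not enough photos to compare
    return sum(intra) / len(intra) - sum(inter) / len(inter)
```

Clusters can then be ranked by this score, with the highest-scoring ones treated as the most likely to show the landmark itself.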
Ranking Clusters - More
Number of users: More users -> more shared interest
Temporal spread: Persistent over time -> more likely to be location (or use method described earlier)
Visual coherence: Measure of diversity of visual cluster
Visual connectivity: Same objects?
Ranking Images
Sample Results: Golden Gate
Tags-only | Tags+Location | Tags+Location+Visual
[Figure: photo results for each method, with some photos marked X]
Performance: Precision
[Chart: Precision @ 10 (axis 0 to 1.00) for alcatraz, baybridge, coittower, deyoung, ferrybuilding, goldengatebridge, lombardstreet, palaceoffinearts, sfmoma, transamerica, and average. Series: Tag-Only, Tag-Location, Tag-Visual, Tag-Location-Visual]
+45% w/visual
+30% w/location
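For reference, Precision @ 10 is the fraction of the top ten returned photos that are relevant. A minimal sketch follows; the function name and the way relevance judgments are passed in are assumptions.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k ranked items that are in the relevant set.

    ranked:   list of item ids, best first
    relevant: set of item ids judged relevant
    The denominator is always k, so a list shorter than k is penalized.
    """
    top = ranked[:k]
    return sum(1 for item in top if item in relevant) / k
```

The slide does not state which baseline the +45% and +30% figures are measured against.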
Performance: Representative
[Chart: Representative Photos (axis 0 to 10.0) for alcatraz, baybridge, coittower, deyoung, ferrybuilding, goldengatebridge, lombardstreet, palaceoffinearts, sfmoma, transamerica, and average. Series: Tag-Only, Tag-Location, Tag-Visual]
Improve Relevance
Repeated in Other Context
Analyze context to extract patterns
Reduce content analysis to constrained scenario/task
Leverage content to improve metadata, relevance
Social Media @ Music Events
Analyze context to get set of media items from a single event
Use content (AF) to robustly synchronize the clips
Increase relevance, findability
Summary
New data
New applications
User motivations
User
Community
Applications
Data
Motivations
Social Media = Opportunity
To better understand media content
And robustly apply content analysis
To predict and enhance use and engagement
To invent new multimedia systems
Notes
All photos CC or with permission:
http://www.blog.spoongraphics.co.uk/freebies/vector-resources-part-5-icons
http://flickr.com/photos/oneeighteen/1610814928/
http://flickr.com/photos/klash/858533852/
http://flickr.com/photos/dooptheory/372807360/
http://flickr.com/photos/stuckincustoms/486035954/
http://flickr.com/photos/stuckincustoms/500771520/
http://flickr.com/photos/moriza/126238642/
http://flickr.com/photos/sunsurfr/537823498/
http://flickr.com/photos/708718/2053412156/
http://flickr.com/photos/stuckincustoms/497814516/
http://flickr.com/photos/moriza/271942020/
Thanks
With: Lyndon Kennedy, Tye Rattenbury, Alex Jaffe, Shane Ahern, Simon King, Rahul Nair, Jeannie Yang
Some Slides: http://slideshare.net/mor
http://infolab.stanford.edu/[email protected]@cs.stanford.edu