Questioning Yahoo! Answers - Stanford...
Questioning Yahoo! Answers
Zoltán Gyö[email protected]
Question Answering on the Web ● April 22, 2008 2
Outline
Yahoo! Answers model
Statistics
• Basics
• Diversity
• Authority
Problems
• Interaction model
• Others
Yahoo! Answers Model
[Diagram: question lifecycle under “ask.” and “answer.”, with states Open, Undecided, and Resolved; transition labels: hand-picked, voted]
Yahoo! Answers Model
Browse
• Shallow hierarchy of categories
Search
• “Advanced” search
– Keyword match in Qs, best As
– Category
– Filter by language/location
– Question status
– Date submitted
• No sorting criteria
[Screenshot under “discover.”: the Health category and its subcategories]
• Alternative Medicine
• Dental
• Diet & Fitness
• Women’s Health
• Diseases and Conditions
• General Health Care
• Mental Health
• Men’s Health
• Other
Yahoo! Answers Model
Post-resolution opinions
• Thumbs-up/down votes
• Comments
• Interesting
• Save to…
User points/levels
Motivation: Search
Enable web-style search
Improve ranking
• Relevance
– Query dependent
– Text-based
• Importance
– Query independent (global)
Motivation: Search
Importance
• Formulation of questions, answers
– Text analysis
• Popularity/community impact
– Many answers
– Many votes
– Involvement of influential users
» Knowledgeable answerers (many best answers)
» Good askers
» Active users
» Leading users
Data Set
10 months: from August 2005 to May 2006
1,690,459 questions
• 895,137 voted
• 795,322 hand-picked
10,995,265 answers
2,971,000 votes
969,652 users
• 700,634 askers
• 545,104 answerers
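
As a rough check on the user counts, the overlap between askers and answerers can be derived by inclusion-exclusion. This is a minimal sketch that assumes every counted user is an asker, an answerer, or both, which the slide does not state explicitly; the variable and function names are illustrative only.

```python
# Counts taken from the Data Set slide.
users = 969_652
askers = 700_634
answerers = 545_104

# Inclusion-exclusion, ASSUMING users = askers ∪ answerers:
# |both| = |askers| + |answerers| - |users|
both = askers + answerers - users


def share(count):
    """Percentage of all users, rounded to one decimal place."""
    return round(100 * count / users, 1)


asker_share = share(askers)
answerer_share = share(answerers)
both_share = share(both)

print(f"askers: {asker_share}%, answerers: {answerer_share}%, both: {both_share}%")
```

Under that assumption roughly 28% of users both ask and answer, and the asker and answerer shares come out a little above 70% and 55% respectively.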
Qs & As per Period
Months     Questions   Answers (Avg. per Q)
08 & 09          194         365 (1.88)
10 & 11        1,027       2,100 (2.04)
12 & 01      232,459     864,899 (3.72)
02 & 03      539,522   2,953,255 (5.47)
04 & 05      917,257   7,174,483 (7.82)
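
The per-period averages can be recomputed from the question and answer counts. The sketch below does that and also sums the questions across periods; the data structure and names are illustrative, and the figures are copied from the table on this slide.

```python
# (months, questions, answers) per two-month period, in chronological order.
periods = [
    ("08 & 09", 194, 365),
    ("10 & 11", 1_027, 2_100),
    ("12 & 01", 232_459, 864_899),
    ("02 & 03", 539_522, 2_953_255),
    ("04 & 05", 917_257, 7_174_483),
]

# Average number of answers per question in each period.
averages = {
    months: round(answers / questions, 2)
    for months, questions, answers in periods
}

# Total number of questions over the whole data set.
total_questions = sum(questions for _, questions, _ in periods)

for months, avg in averages.items():
    print(f"{months}: {avg:.2f} answers per question")
print(f"total questions: {total_questions:,}")
```

The question counts sum to the 1,690,459 questions reported for the data set, and the average rises in every period.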
Distributions
[Figure: three log-log histograms. (1) Number of users vs. questions per user (x-axis 1 to 1000). (2) Number of users vs. answers per user (x-axis 1 to 10000). (3) Number of questions vs. answers per question (x-axis 1 to 1000).]
Power laws (above)
• Qs, As / user
… and not
• Answers per Q
• Limited exposure
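
One informal way to check whether a histogram looks like a power law is to fit a straight line on log-log axes. The sketch below is only a rough diagnostic, not a rigorous statistical test, and it is not taken from the talk; the function name `loglog_slope` and the synthetic histogram are assumptions made for illustration.

```python
import math


def loglog_slope(histogram):
    """Least-squares slope of log(count) against log(value).

    `histogram` maps a value (e.g. questions per user) to the number
    of users with that value. Zero entries are skipped.
    """
    points = [
        (math.log(value), math.log(count))
        for value, count in histogram.items()
        if value > 0 and count > 0
    ]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var


# SYNTHETIC data (not from the data set): count = 100000 * value^-2.
synthetic = {k: 100_000 * k ** -2.0 for k in range(1, 101)}
slope = loglog_slope(synthetic)
print(f"estimated exponent: {slope:.3f}")
```

A histogram that bends away from a straight line on these axes, as the slide reports for answers per question, would give a poor linear fit.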
Votes per Question
Self votes: 782,583 (26.34% of all votes)
• 536,877 questions influenced (60% of voted questions / 31.7% of all questions)
• 395,965 questions decided by a single vote, which was a self vote (44% of voted questions / 23.4% of all questions)
[Figure: log-log histogram of the number of questions (y-axis) against votes per question (x-axis, 1 to 500)]
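
The self-vote percentages can be reproduced from the counts on the Data Set slide, taking the two figures in each parenthesis as shares of voted questions and of all questions respectively. This is a small sketch with illustrative names; it matches the slide's figures to within a fraction of a percentage point, and small differences are likely due to rounding or truncation on the slide.

```python
# Counts from the Data Set and Votes per Question slides.
total_votes = 2_971_000
self_votes = 782_583
total_questions = 1_690_459
voted_questions = 895_137
influenced = 536_877
decided_by_single_self_vote = 395_965


def pct(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


self_vote_share = pct(self_votes, total_votes)
influenced_of_voted = pct(influenced, voted_questions)
influenced_of_total = pct(influenced, total_questions)
decided_of_voted = pct(decided_by_single_self_vote, voted_questions)
decided_of_total = pct(decided_by_single_self_vote, total_questions)

print(f"self votes: {self_vote_share:.2f}% of all votes")
print(f"influenced: {influenced_of_voted:.1f}% of voted, {influenced_of_total:.1f}% of all")
print(f"decided:    {decided_of_voted:.1f}% of voted, {decided_of_total:.1f}% of all")
```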
Categories
Tree of 728 nodes
• 24 top-level categories (TLCs)
• 218 subcategories
• 486 subsubcategories under 10 TLCs
– Mostly geographical
Average TLCs per asker: 1.5
• 6 out of 7 askers ask questions in a single TLC
Average TLCs per answerer: 3.5
Qs, As per Category
[Figure: bar chart of questions per category (y-axis, 0 to 200,000) over the 24 top-level categories (x-axis): Ar Bu Ca Co Cn Di Ed En Fo Ga He Ho Lo Lv Ne Ot Pe Po Pr Sc So Sp Tr Y!. Each bar is annotated with its A/Q ratio; the values as extracted are 6.55, 3.73, 4.26, 3.5, 3.32, 7.11, 4.44, 7.24, 8.03, 4.5, 6.4, 4.42, 3.21, 9.2, 5.4, 4.09, 8.23, 6.93, 10.05, 5, 10.39, 5.87, 4.45, 4.75.]
# of Questions vs. # of TLCs
[Figure: log-log plot of categories per asker (y-axis, 1 to 20) against questions per asker (x-axis, 1 to 1000)]
# of Answers vs. # of TLCs
[Figure: log-log plot of categories per answerer (y-axis, 1 to 20) against answers per answerer (x-axis, 1 to 10000)]
Assessing Authority
[Diagram: example graph with numbered nodes]
Nodes
• Users (askers and answerers)
• Questions
• Answers (best and not best)
Links
Assessing Authority
Disregard
• Text
• Votes
• Points
• Thumbs-up/down
• Comments
Assessing Authority
Which links to trust?
Assessing Authority
Which links to trust?
Best answers
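Assessing Authority
Randomized hubs & authorities
• Authorities = users providing many best answers to important questions
• Hubs = users asking important questions best-answered by authorities
$\mathrm{auth} = \varepsilon \cdot \mathbf{1} + (1 - \varepsilon) \cdot A_{\mathrm{row}}^{T} \cdot \mathrm{hub}$
$\mathrm{hub} = \varepsilon \cdot \mathbf{1} + (1 - \varepsilon) \cdot A_{\mathrm{col}} \cdot \mathrm{auth}$
• $A$: adjacency matrix
• $\varepsilon$: random jump (~ 0.1)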
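
The two update equations can be iterated until the scores settle. The sketch below is one possible reading, not the authors' implementation. It assumes that a link points from an asker to the user who gave the best answer, and that the "row" and "col" subscripts denote the adjacency matrix with rows, respectively columns, normalized to sum to 1. The function name `randomized_hits`, the fixed iteration count, and the toy data are all hypothetical.

```python
def randomized_hits(best_answers, users, eps=0.1, iterations=50):
    """Iterate the hub/authority updates on a best-answer graph.

    best_answers: list of (asker, answerer) pairs, one per best answer.
    users: list of all user ids.
    eps: random-jump weight (the slide suggests about 0.1).
    """
    # Weighted adjacency: weight[asker][answerer] = number of best answers.
    weight = {u: {} for u in users}
    for asker, answerer in best_answers:
        weight[asker][answerer] = weight[asker].get(answerer, 0) + 1

    # Row sums (per asker) and column sums (per answerer) for normalization.
    row_sum = {u: sum(weight[u].values()) for u in users}
    col_sum = {u: 0 for u in users}
    for asker in users:
        for answerer, w in weight[asker].items():
            col_sum[answerer] += w

    auth = {u: 1.0 for u in users}
    hub = {u: 1.0 for u in users}
    for _ in range(iterations):
        # auth = eps * 1 + (1 - eps) * A_row^T * hub
        new_auth = {u: eps for u in users}
        for asker in users:
            for answerer, w in weight[asker].items():
                new_auth[answerer] += (1 - eps) * (w / row_sum[asker]) * hub[asker]
        auth = new_auth

        # hub = eps * 1 + (1 - eps) * A_col * auth
        new_hub = {u: eps for u in users}
        for asker in users:
            for answerer, w in weight[asker].items():
                new_hub[asker] += (1 - eps) * (w / col_sum[answerer]) * auth[answerer]
        hub = new_hub
    return auth, hub


# Toy data (hypothetical): "a" asks two questions, "b" asks one.
toy_users = ["a", "b", "c"]
toy_best_answers = [("a", "c"), ("b", "c"), ("a", "b")]
auth, hub = randomized_hits(toy_best_answers, toy_users)
print(sorted(auth, key=auth.get, reverse=True))
```

On the toy graph, user "c" gives the most best answers and ends up with the highest authority score, while user "a", who never gives a best answer, keeps only the random-jump weight.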
Problem: Interaction Model
3 types of interaction
1. Focused question / expert answer
– “Can CDs be shipped as Media Mail?”
– Original intention
2. Opinions, discussions
– “Do you like country music?”
– “What do you think about…?”
– No support for (sub)threads
Problem: Interaction Model
3 types of interaction (cont’d)
3. (In between)
– “Where can I get the best French bread in SF?”
4. Chat
– “Anyone online now?”
– No support for real-time interaction
Questions
• How to facilitate each type?
• How to avoid interference?
More Problems
Ranking
Finding similar users
Identifying potential expert answerers
Information push/pull
Suggesting categories, tags
Summary
Yahoo! Answers
• 10 months (08/05-05/06)
• Rapid adoption, increase in average A / Q
• 70% of users ask, 55% answer, 28% do both
• User interest typically spans 2-4 TLCs
• Some correlation between authority, points
Problems
• Interaction model
• Ranking, recommendations