Introduction to Q&A systems Yahoo! Answer and Naver KiN KSE 801 Uichin Lee.
Introduction to Q&A systems Yahoo! Answer and Naver KiN
KSE 801, Uichin Lee
Online Q&A Systems
Baidu Knows
Naver Knowledge iN Naver Mobile Q&A
Closed in Sept 2011
Naver Knowledge iN
• Largest search engine in Korea - 70% of search (Google: 2%)
• Comprehensive portal – integrated news, blogs, ‘knowledge search’
• “Knowledge search is like oozing out knowledge in human brains to the Internet. People who know something better than others can present their know-how, skills or knowledge” --- NHN CEO Chae Hwi-young
• Knowledge-In had 60 million questions and answers as of Feb 2007
Slide from http://www.nd.edu/~netsci/TALKS/Adamic.pdf
Culture of Generosity
• “(It is) the next generation of search… (it) is a kind of collective brain -- a searchable database of everything everyone knows. It's a culture of generosity. The fundamental belief is that everyone knows something.” -- Eckart Walther (Yahoo Research)
Slide from http://www.nd.edu/~netsci/TALKS/Adamic.pdf
Knowledge Sharing and Yahoo Answers:
Everyone Knows Something
Lada A. Adamic, Jun Zhang, Eytan Bakshy, Mark S. Ackerman
University of Michigan, WWW 2008
Original slides from http://140.116.246.92:5000/2008_WebIR_ppt/06.ppt
What is Yahoo Answers (YA)?
• A large and diverse question-answer forum.
• YA has 25 top-level and 1002 (continually expanding) lower-level categories.
• Ask: thread (title) and content.
• Best Answers: answers rated best by the asker or voted best by YA users.
Yahoo! Answer Dataset
• One month of YA activity
• 8,452,337 answers to 1,178,983 questions
  – 433,402 unique repliers
  – 495,414 unique askers
  – 211,372 users both asked and replied
Characterizing YA: Posts/Replies
• Thread length (# replies per post) vs. post length (how verbose the answers are), with some categories named
Factual (programming, chemistry, physics): few but lengthy replies (less interactive)
Discussion: many replies of moderate length (more interactive)
Characterizing YA: Category Clusters
• Classified the most active categories (>1000 posted questions) into 3 clusters using k-means clustering on three metrics: thread length, content length, and asker/replier overlap
189 categories (91% of questions)
(GREEN) Discussion: Politics, Sport, Religion
(BLUE) Advice/Opinion: Fashion, Baby names, Fast Food, Dogs/Cats
(RED) Facts: Biology, Programming, Repairs
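The clustering step can be sketched as follows. This is a minimal illustration, not the authors' code: the feature values are invented for six categories, the columns are standardized before clustering, and the deterministic farthest-point initialization is an assumption made so the example is reproducible.

```python
import math

def standardize(rows):
    """Rescale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def init_centers(points, k):
    """Farthest-point initialization (an assumption, chosen for determinism)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iters=100):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centers = init_centers(points, k)
    labels = None
    for _ in range(max_iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

# Hypothetical values: (thread length, content length, asker/replier overlap).
names = ["Politics", "Religion", "Fashion", "Baby names", "Programming", "Biology"]
raw = [(14, 400, 0.40), (15, 420, 0.42), (10, 150, 0.10),
       (11, 140, 0.12), (3, 900, 0.05), (4, 950, 0.06)]

labels = kmeans(standardize(raw), k=3)
for name, label in zip(names, labels):
    print(f"{name}: cluster {label}")
```

Standardizing matters here because content length is on a much larger numeric scale than the other two metrics and would otherwise dominate the distance.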
Characterizing YA: Network Structure
Thread 1: Large Data, binary search or hashtable? (user A)
Re: Large... (user B)
Re: Large... (user C)
Thread 2: Binary file with ASCII data (user A)
Re: File with... (user C)
[Figure: resulting network among users A, B, and C; node labels read Outdegree 1 / Indegree 0, Outdegree 0 / Indegree 2, Outdegree 1 / Indegree 0]
[Figure: two CCDF plots; y-axis: Prob(Degree >= k)]
• Heavy-tailed degree distributions, yet they differ across categories
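A small sketch of how such a network and its degree CCDF could be computed from threads. The edge direction is an assumption: an edge points from each replier to the asker and is counted once per pair of users, which is one reading consistent with the degree labels in the example above. Reversing the direction simply swaps indegree and outdegree.

```python
from collections import Counter

# Threads as (asker, [repliers]), mirroring the two-thread example.
threads = [
    ("A", ["B", "C"]),  # Thread 1
    ("A", ["C"]),       # Thread 2
]

# Assumption: one directed edge replier -> asker per distinct pair of users.
edges = {(replier, asker)
         for asker, repliers in threads
         for replier in repliers
         if replier != asker}

outdeg = Counter(src for src, _ in edges)
indeg = Counter(dst for _, dst in edges)

def ccdf(degrees):
    """Return {k: Prob(Degree >= k)} for each observed degree value k."""
    n = len(degrees)
    return {k: sum(d >= k for d in degrees) / n for k in sorted(set(degrees))}

users = sorted({u for edge in edges for u in edge})
in_degrees = [indeg[u] for u in users]

for u in users:
    print(u, "outdegree:", outdeg[u], "indegree:", indeg[u])
print(ccdf(in_degrees))
```

On real data the CCDF is usually plotted on log-log axes, where a heavy tail shows up as a slowly decaying, roughly straight curve.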
Characterizing YA: Ego Networks
• Distinguishing an answer person from a discussion person in an online forum with an ego network (Welser 2007)
• An ego network consists of a focal node ("ego"), the nodes to which ego is directly connected (called "alters"), and the ties among the alters, if any
[Figure: example ego networks from Programming, Marriage, and Wrestling]
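The ego-network definition can be illustrated with a short sketch. The edge list is hypothetical, and ties are treated as undirected here, which is a simplifying assumption.

```python
def ego_network(edges, ego):
    """Return (alters, ties) for `ego`, reading `edges` as undirected.

    alters: nodes directly connected to ego
    ties:   ego-alter edges plus any edges among the alters
    """
    undirected = {frozenset(e) for e in edges if e[0] != e[1]}
    alters = {n for e in undirected if ego in e for n in e if n != ego}
    members = alters | {ego}
    ties = {e for e in undirected if e <= members}
    return alters, ties

# Hypothetical reply edges; not data from the study.
edges = [("B", "A"), ("C", "A"), ("B", "C"), ("D", "C"), ("E", "D")]

alters, ties = ego_network(edges, "A")
print(sorted(alters))
print(sorted(tuple(sorted(t)) for t in ties))
```

In this toy graph the tie between B and C belongs to A's ego network because both are alters of A, while the edge between C and D does not because D is not connected to A.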
Characterizing YA: Motif Analysis
• How often interactions are reciprocal (the asker becomes the replier for another question)
• How often the triads are complete (three users who have all replied to one another)
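Both motif counts can be sketched on a toy edge list. The edges are hypothetical, `(u, v)` is taken to mean "u replied to v", and the brute-force triad search is only suitable for small graphs.

```python
from itertools import combinations

def reciprocity(edges):
    """Fraction of directed edges (u, v) whose reverse (v, u) also exists."""
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        return 0.0
    mutual = sum((v, u) in edge_set for u, v in edge_set)
    return mutual / len(edge_set)

def complete_triads(edges):
    """Triples of users in which every pair has replied to each other."""
    edge_set = {(u, v) for u, v in edges if u != v}
    nodes = sorted({n for e in edge_set for n in e})

    def mutual(a, b):
        return (a, b) in edge_set and (b, a) in edge_set

    return [t for t in combinations(nodes, 3)
            if all(mutual(a, b) for a, b in combinations(t, 2))]

# Hypothetical edges: A, B, and C all replied to one another; D only replied to A.
edges = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B"),
         ("A", "C"), ("C", "A"), ("D", "A")]

print(reciprocity(edges))
print(complete_triads(edges))
```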
Characterizing YA: Expertise Depth
• Rate 100 random questions from the Programming category on a 5-level expertise scale
• Outcome: only one question (1%) requires expertise above level 3
YA is very broad but not very deep
Expertise/Knowledge Across Categories
People who answer questions in one category are likely to answer questions in related categories
Expertise/Knowledge Across Categories
Overlap in users who answered in one category (rows) and asked in another (columns)
User Entropy: Category Dispersion
• Goal: measure how focused a user's answers are. Entropy is such a measure: the more concentrated a person's answers, the lower the entropy and the higher the focus.
• For example, if a user is a dog trainer and all her answers are in the Dog subcategory, her entropy is 0.
• Another user's 40 answers are scattered among 17 of the 25 top-level categories and 26 subcategories. He posted no more than 4 answers in any one category, and his combined 2-level entropy is 5.75.
User Entropy: Category Dispersion
• Calculate entropy at each level and then do a summation
Level 1: H1 = −0.3 ∗ log(0.3) − 0.7 ∗ log(0.7) = 0.61
Level 2: H2 = −0.2 ∗ log(0.2) − 0.1 ∗ log(0.1) − 0.7 ∗ log(0.7) = 0.81
Total: HT = H1 + H2 = 1.42
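The two-level calculation can be checked with a few lines of Python. This sketch assumes the natural logarithm, which is the base that reproduces H1 = 0.61. With that base H2 evaluates to about 0.80 and the total to about 1.41, so the slide's 0.81 and 1.42 appear to be rounded slightly differently.

```python
import math

def entropy(probs):
    """Shannon entropy with the natural log; zero-probability terms are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Share of one user's answers per category, taken from the example above.
level1 = [0.3, 0.7]       # top-level categories
level2 = [0.2, 0.1, 0.7]  # subcategories

h1 = entropy(level1)
h2 = entropy(level2)
total = h1 + h2
print(f"H1={h1:.2f} H2={h2:.2f} HT={total:.2f}")

# A user whose answers all fall in one category has zero entropy.
focused = entropy([1.0])
```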
Category Dispersion vs. Best Answers
Predicting Best Answers
Questions in, Knowledge iN?: A Study of Naver's Question Answering
Community
Kevin K. Nam, Mark S. Ackerman, Lada A. Adamic
University of Michigan, CHI 2009
Interactions in Naver KiN
Research Methods
• Naver dataset (via crawling):
  – Expertise score
  – Reward vs. answers
  – Patterns of participation (intensity, active periods)
• Focused interviews (26 KiN users):
  – Motivation for participation
  – Allocation of expertise
Patterns of Participation
• Those who ask don’t answer
• Top answerers’ (called gurus) z-scores
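The slide does not define the z-score. A minimal sketch, assuming the definition used in earlier expertise-network work, z = (a − q) / sqrt(a + q), where a is a user's number of answers and q the number of questions. The user counts below are hypothetical.

```python
import math

def z_score(answers, questions):
    """z = (a - q) / sqrt(a + q); returns 0.0 for users with no activity.

    Assumed definition: positive values indicate a user who mostly answers,
    negative values a user who mostly asks.
    """
    total = answers + questions
    if total == 0:
        return 0.0
    return (answers - questions) / math.sqrt(total)

# Hypothetical users: (answers, questions).
users = {"guru": (90, 10), "asker": (10, 90), "balanced": (50, 50)}
scores = {name: z_score(a, q) for name, (a, q) in users.items()}
print(scores)
```

Under this definition a user who asks and answers equally often scores 0, which matches the observation that those who ask tend not to answer only when scores cluster far from zero.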
Motivation for Participation
• Altruism and helping others
  – “Since I was a doctor, I was browsing the medical directories [in KiN]. I found a lot of wrong answers and information, and was afraid they would cause problems. So I thought I’d contribute in fixing it hoping that it’d be good for the society.” [Sangmin]
  – “I try to answer so that regular people can share knowledge, rather than technical knowledge. ...Someone needs it, and I have the ability to do it, and it’ll be a service to society.” [Mirae]
• Business motives
  – “I’ve been working as an insurance agent for 9 years. I started answering in Knowledge-iN as part of my business activity. In the evening, I answered questions to solicit potential clients.... So when I’d leave an answer, I’d say I would meet with you face-to-face to talk about more details and give you advice.” [Taein]
  – Two interviewees stated that they had originally started on Naver to gain clients, but they found it to be less valuable than they had hoped. Instead, they stayed as a hobby and for altruistic reasons.
Motivation for Participation
• Learning
  – “My first intention [in answering] was to organize and review my knowledge and practice it by explaining it to others.” [Taein]
  – “Answering questions helps me study. I can learn from answering [in Translation]. I get to review what I used to know such as vocabularies and idioms.” [Minhyuk]
• Hobby and personal competence
  – “Yes [I answer everyday]. I am addicted (laughs).” [Nami]
Motivation for Participation
• Points
  – “I don’t care about the points. [but] It’s fun to see points accumulate and my character level up [increase to the next level].” [Jeyeon]
  – “Usually questions w/ points do not seem frivolous. I feel like answering questions with points, not because of the points, but because those questions are more detailed and seek realistic help.”
Higher points elicit more answers
Point bounty for best answers
[Figures: Law category; x-axis: points posted]
Allocation of Expertise
• Knowledge level and quality of Naver KiN:
  – Useful for getting information on commonsense knowledge, current events, basic domain knowledge, advice and recommendations from people, and diverse opinions
  – But users look to Internet cafes for more detailed or expert information
• Why?
  – Answerers try to cover as many questions as possible (and earn points), which creates time pressure
  – Some minimize the effort they put into answering; still others are willing to answer questions “slightly” beyond their expertise
  – Other factors: lack of detailed information in the question, lack of sense of community
Allocation of Expertise
• Intermittent participation
Weekly activity levels of a user
Weekly contributions averaged over users who posted > 100 answers and became active more than a year prior to the crawl.
Inactive periods are due to family obligations, loss of Internet access, etc.
Summary
• YA/KiN:
  – Participation patterns:
    • Facts vs. discussion; heavy-tailed in/out-degree distributions
    • Intermittent participation (due to personal reasons)
  – Low expertise level of answers (due to lack of motivation, lack of sense of community, etc.)
  – Answers mostly focus on a few categories
  – YA: best answers tend to be lengthy; KiN: best answers are generally in the last answer position (or second to last)
  – Motivation (KiN): altruism, business, learning, hobby and personal competence, points, etc.