Post on 18-Nov-2014
A Query Routing Model to Rank
Expert Candidates on Twitter
Cleyton Souza, Jonathas Magalhães, Evandro Costa and
Joseana Fechine LIA - Laboratory of Artificial Intelligence
UFCG - Federal University of Campina Grande Campina Grande - Brazil
Introduction
• What is Social Query?
– It is the process of asking questions through social media (e.g., Twitter, Facebook) [Morris et al.]
– The common strategy is sharing the question with everyone, but this way there is no guarantee that you will receive a good and quick answer
• Directing questions to someone is more efficient.
• What is Query Routing?
– It is the process of directing questions to appropriate
answerers (people able to help)!
cleyton.caetano.souza@copin.ufcg.edu.br 2
Introduction
• What are we proposing?
– A Query Routing Model: a technique that finds the most suitable person to help you based on knowledge, trust and activity.
– We focus on the Twitter context!
Agenda
• Introduction
• Related Work
• Proposal
• Evaluation
– Methodology
– Results
– Threats to Validity
• Conclusion & Future Work
Related Work (1/2)
• How does our proposal differ from previous work?
– Context: we focus on a social network context,
• while previous work focused on the Community Question Answering context.
• Why did we choose Twitter?
– It is one of the most popular online social networks;
– Fewer than 18% of the questions asked on Twitter are answered [Paul et al.];
– [Nichols and Kang] confirmed that directing questions significantly improves the response rate.
Related Work (2/2)
• How does our proposal differ from previous work?
– Problem: we treat the Query Routing problem as a Multi-Criteria Decision Making problem (Weighted Product Model, WPM),
• while previous work mainly applied probabilistic models.
• Why did we choose WPM?
– [Triantaphyllou and Mann] confirmed that, for problems that depend on up to three variables, WPM achieves the best performance.
Proposal
• A user on Twitter has a question
• Our model analyzes the question and orders the user's followers based on three criteria (further details in [Souza et al.])
– Knowledge (K): computed with a bag-of-words strategy;
– Trust (T): a combination of similarity and conversation rate;
– Activity (A): mean latency between consecutive messages.
• What do we want?
– We want to find the best combination of K, T and A!
Knowledge
• We want to ask someone who knows about the topic of the question
• We used the Vector Space Model
– Users and questions are represented by vectors of terms
– We match users and questions using cosine similarity between these vectors
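The matching step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes raw term frequencies and whitespace tokenization, and the names `term_vector`, `cosine_similarity` and `knowledge` are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a bag-of-words vector (term -> frequency) from lowercase tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_a * norm_b)


def knowledge(question, user_tweets):
    """Assumed K score: similarity between the question and all of a user's tweets."""
    user_vec = Counter()
    for tweet in user_tweets:
        user_vec.update(term_vector(tweet))
    return cosine_similarity(term_vector(question), user_vec)
```

A user whose tweets share no terms with the question gets a score of 0, and a user whose term distribution matches the question exactly gets a score of 1.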
Trust/Closeness
• Sometimes, we want to receive answers from people close to us
• How do we automatically discover these people?
– We analyze the conversation rate between the questioner and each follower
– We analyze the followers set similarity between the questioner and each follower
– We established that trust is the product between conversation rate and followers set similarity
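The product above can be sketched as follows. The slides do not define either factor precisely, so both definitions here are assumptions: conversation rate is taken as the fraction of the questioner's tweets that mention the follower, and followers set similarity is taken as the Jaccard index. All function names are hypothetical.

```python
def conversation_rate(questioner_tweets, follower_handle):
    """Assumed definition: share of the questioner's tweets mentioning the follower."""
    if not questioner_tweets:
        return 0.0
    mention = "@" + follower_handle.lower()
    hits = 0
    for tweet in questioner_tweets:
        # Strip trailing punctuation so "@bob!" still counts as a mention of bob.
        tokens = [tok.strip(".,!?:;") for tok in tweet.lower().split()]
        if mention in tokens:
            hits += 1
    return hits / len(questioner_tweets)


def followers_set_similarity(followers_a, followers_b):
    """Assumed definition: Jaccard index of the two follower sets."""
    a, b = set(followers_a), set(followers_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def trust(questioner_tweets, follower_handle, questioner_followers, follower_followers):
    """Trust as the product of conversation rate and followers set similarity."""
    rate = conversation_rate(questioner_tweets, follower_handle)
    similarity = followers_set_similarity(questioner_followers, follower_followers)
    return rate * similarity
```

Because the two factors are multiplied, a follower who never talks to the questioner, or who shares no followers with them, gets a trust of 0.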
Activity
• Sometimes, we prefer a quick, lower-quality answer to a slow, high-quality one
• Our assumption is that people who produce a lot of content in a short time will provide quick answers
• Activity is the mean latency between consecutive posts
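A minimal sketch of the activity measure, assuming each post carries a timestamp and that latency is measured in seconds (the unit and the function name `mean_latency_seconds` are assumptions, not taken from the slides):

```python
from datetime import datetime


def mean_latency_seconds(timestamps):
    """Mean gap, in seconds, between consecutive posts; None with fewer than two posts."""
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        return None
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


# Hypothetical timeline: posts at 10:00, 10:10 and 10:40 on the same day.
posts = [
    datetime(2014, 1, 1, 10, 0),
    datetime(2014, 1, 1, 10, 10),
    datetime(2014, 1, 1, 10, 40),
]
latency = mean_latency_seconds(posts)
```

A lower mean latency indicates a more active user, so this value would need to be inverted or otherwise mapped before it is used as a "higher is better" score.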
Proposal
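• How do we compare the followers' criteria values?
– We use the Weighted Product Model: we compare two users u and v using the following function:
$$\mathit{comp}(u, v) = \left(\frac{\mathit{map}(K_u)}{\mathit{map}(K_v)}\right)^{w_k} \cdot \left(\frac{\mathit{map}(T_u)}{\mathit{map}(T_v)}\right)^{w_t} \cdot \left(\frac{\mathit{map}(A_u)}{\mathit{map}(A_v)}\right)^{w_a}$$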
– The result of the comparison tells us which of the two users is better!
– We sum the victories of each user and order the users by their total number of victories!
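The pairwise comparison and the victory count can be sketched as follows. This is an illustration under stated assumptions: the slides do not define the map function, so the scores passed in are assumed to be already mapped to strictly positive, higher-is-better values; a comparison result above 1 is assumed to count as a victory for the first user; and the names `comp` and `rank_by_victories` are hypothetical.

```python
def comp(u, v, weights):
    """Weighted product of per-criterion ratios between users u and v."""
    result = 1.0
    for criterion, weight in weights.items():
        result *= (u[criterion] / v[criterion]) ** weight
    return result


def rank_by_victories(candidates, weights):
    """Order candidate names by how many pairwise comparisons each one wins."""
    victories = {name: 0 for name in candidates}
    for u_name, u in candidates.items():
        for v_name, v in candidates.items():
            # Assumption: comp > 1 means u beats v; ties give no victory.
            if u_name != v_name and comp(u, v, weights) > 1.0:
                victories[u_name] += 1
    return sorted(victories, key=lambda name: victories[name], reverse=True)


# Hypothetical followers with already-mapped K, T and A scores.
followers = {
    "ana": {"K": 0.8, "T": 0.5, "A": 0.9},
    "bob": {"K": 0.4, "T": 0.5, "A": 0.3},
    "cai": {"K": 0.6, "T": 0.5, "A": 0.6},
}
equal_weights = {"K": 1.0, "T": 1.0, "A": 1.0}
ranking = rank_by_victories(followers, equal_weights)
```

Changing the weights shifts how much each criterion influences the outcome; a weight of 0 removes a criterion from the comparison entirely.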
Evaluation
• We used a Quantitative Approach!
• Methodology
1. We selected 160 questions and their answers published on Twitter
2. We manually ranked the answers of each question based on their utility
Evaluation
Question: How much does it cost to go to Disneyland?
Answer                              | Answer Type                        | Utility
I don’t know                        | An unhelpful answer                | 1
I think @someone knows              | Indicating someone or some source  | 2
Between $1000 and $2000             | An uncertain answer                | 3
I went last year and I spent $700   | A direct answer                    | 4
• We manually ranked the answers of each question based on their utility
• We used the order in which the answers were given as the tie-breaker
Evaluation
• Methodology
4. We crawled information about the questioners and answerers (user profile, followers set, following set, tweets)
5. We ranked the answerers using our proposal
6. We compared both rankings using nDCG
• Our aim is to answer the following questions
– Does our model perform well at predicting the utility of the answers?
– Does WPM perform better than using each criterion individually?
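The rank comparison in step 6 can be sketched as follows. The slides do not say which nDCG variant was used, so this assumes the common form with the raw utility as gain and a log2 position discount; `dcg` and `ndcg` are hypothetical names.

```python
import math


def dcg(utilities):
    """Discounted cumulative gain of utilities listed in ranked order."""
    return sum(u / math.log2(position + 1)
               for position, u in enumerate(utilities, start=1))


def ndcg(utilities_in_predicted_order):
    """DCG of the predicted order divided by the DCG of the ideal order."""
    ideal = sorted(utilities_in_predicted_order, reverse=True)
    ideal_dcg = dcg(ideal)
    if ideal_dcg == 0:
        return 0.0
    return dcg(utilities_in_predicted_order) / ideal_dcg


# Manual utilities (1 to 4) of four answerers, listed in the order the model ranked them.
perfect = ndcg([4, 3, 2, 1])   # the model's order matches the manual rank
reversed_rank = ndcg([1, 2, 3, 4])
```

A value of 1 means the model ordered the answerers exactly as the manual utility ranking did; lower values mean that useful answerers were placed further down.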
Results
Question Type [Morris et al.] | Number of Questions | Mean nDCG
Recommendation                | 56                  | 0.92 ± 0.23
Opinion                       | 17                  | 0.83 ± 0.31
Factual Knowledge             | 40                  | 0.91 ± 0.26
Rhetorical                    | 15                  | 0.90 ± 0.25
Invitation                    | 3                   | 0.99 ± 0.01
Favor                         | 8                   | 1.00 ± 0.00
Social connection             | 12                  | 0.87 ± 0.28
Offer                         | 9                   | 0.84 ± 0.31
Total / Mean                  | 160                 | 0.90
Does our Model perform well to predict the aptitude of the expert candidates?
• Promising results
– We reached a mean nDCG greater than 0.9;
– A one-tailed binomial test statistically confirmed that the QR model predicted the ideal rank in more than 64% of cases (p-value = 0.03219, α = 5%);
• An improvement in comparison with [Souza et al. 2012]
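For readers unfamiliar with the test, a one-tailed binomial test can be sketched as below. The slides do not report how many questions received the ideal rank, so the example uses toy numbers only; `binomial_upper_tail` is a hypothetical helper, not the authors' code.

```python
from math import comb


def binomial_upper_tail(successes, trials, p0):
    """P(X >= successes) for X ~ Binomial(trials, p0): the one-tailed p-value."""
    return sum(comb(trials, k) * p0 ** k * (1 - p0) ** (trials - k)
               for k in range(successes, trials + 1))


# Toy example (not the study's data): 8 successes in 10 trials, null proportion 0.5.
p_value = binomial_upper_tail(8, 10, 0.5)
reject_null = p_value < 0.05
```

The null hypothesis is that the true success proportion is at most p0; it is rejected at α = 5% when the returned p-value falls below 0.05.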
Does WPM perform better than using each criterion individually?
Figure 1: Boxplot comparing WPM with each individual criterion
Does WPM perform better than using each criterion individually?
Hypothesis                                         | p-value   | Conclusion
WPM has a better nDCG distribution than Knowledge  | 1.357e-15 | True
WPM has a better nDCG distribution than Activity   | 6.701e-16 | True
WPM has a better nDCG distribution than Trust      | 4.025e-16 | True
• We performed pairwise comparisons using the Wilcoxon Signed-Rank Test (α = 5%)
Threats to Validity
• Evaluation Methodology
• Few Questions
• Manually ordered answers
Conclusion & Future Work
• We proposed a QR model for Twitter
– We achieved promising results in a young field
– We confirmed the superiority of WPM over using each criterion individually
– We created a public dataset for future research in the area
• Future Work
– Is directing questions to experts more effective than sharing questions with everyone?
– What is the relationship between the weights given to the criteria and the qualities (truth, intimacy, speed) of the received answers?
References
• M. Morris, J. Teevan, and K. Panovich, “What do people ask their social networks, and why?: a survey study of status message Q&A behavior,” Proceedings of the 28th ACM International Conference on Human Factors in Computing Systems, 2010, pp. 1739–1748.
• J. Nichols and J. Kang, “Asking questions of targeted strangers on social networks,” Proceedings of the ACM Conference on Computer Supported Cooperative Work, 2012, pp. 999–1002.
• S. Paul, L. Hong, and E. Chi, “Is Twitter a good place for asking questions? A characterization study,” Proceedings of the 5th International AAAI Conference on Weblogs and Social Media, 2011, pp. 578–581.
• C. Souza, J. Magalhães, and E. Costa, “A Formal Model to the Routing Questions Problem in the Context of Twitter,” Proceedings of the IADIS International Conference WWW/Internet, 2011.
• C. Souza, J. Magalhães, E. Costa, and J. Fechine, “Predicting Potential Responders in Twitter: A Query Routing Algorithm,” Proceedings of the 12th International Conference on Computational Science and Its Applications, 2012, pp. 714–729.
• E. Triantaphyllou and S. Mann, “An examination of the effectiveness of multi-dimensional decision-making methods: A decision-making paradox,” Decision Support Systems, vol. 5, 1989, pp. 303–312.
Questions?