
A Query Routing Model to Rank Expert Candidates on Twitter

Cleyton Souza, Jonathas Magalhães, Evandro Costa and Joseana Fechine

LIA - Laboratory of Artificial Intelligence

UFCG - Federal University of Campina Grande

Campina Grande - Brazil

Introduction

• What is Social Query?

– It is the process of asking questions through social media (e.g., Twitter, Facebook)! [Morris et al.]

– The common strategy is sharing the question with everyone, but this way there is no guarantee that you will receive a good and quick answer

• Directing questions to someone is more efficient.

• What is Query Routing?

– It is the process of directing questions to appropriate

answerers (people able to help)!

[email protected] 2

Introduction

• What are we proposing?

– A Query Routing Model: a technique that finds the most suitable person to help you based on knowledge, trust and activity.

– We are focusing on the Twitter context!



Agenda

• Introduction

• Related Work

• Proposal

• Evaluation

– Methodology

– Results

– Threats to Validity

• Conclusion & Future Work


Related Work (1/2)

• How does our proposal differ from previous work?

– Context: we focus on a social network context, while previous work focused on the Community Question Answering context.

• Why did we choose Twitter?

– It is one of the most popular online social networks;

– Fewer than 18% of the questions asked on Twitter are answered [Paul et al.];

– [Nichols and Kang] confirmed that directing questions significantly improves the response rate.


Related Work (2/2)

• How does our proposal differ from previous work?

– Problem: we treat Query Routing as a Multi-criteria Decision Making problem, using the Weighted Product Model (WPM), while previous work mainly applied probabilistic models.

• Why did we choose WPM?

– [Triantaphyllou and Mann] confirmed that, for problems that depend on up to three variables, WPM achieves the best performance.


Proposal

• A Twitter user has a question

• Our model analyzes the question and ranks the user's followers based on three criteria (further details in [Souza et al.])

– Knowledge (K): computed with a bag-of-words strategy;

– Trust (T): a combination of similarity and conversation rate;

– Activity (A): mean latency time between consecutive messages;

• What do we want?

– We want to find the best combination of K, T and A!


Knowledge

• We want to ask someone who knows about the topic of the question

• We used the Vector Space Model

– Users and questions are each represented by a vector of terms

– We match users and questions using cosine similarity between these vectors
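The matching step can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the whitespace tokenizer, the raw term counts (no stop-word removal or TF-IDF weighting) and the function names `term_vector` and `cosine_similarity` are all assumptions made here.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words vector: lowercase whitespace tokens with their counts."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical question and hypothetical tweet text of one follower.
question = term_vector("how much does a trip to disneyland cost")
user_tweets = term_vector("just booked our disneyland trip and the hotel")
knowledge = cosine_similarity(question, user_tweets)
```

A follower whose tweets share more terms with the question gets a knowledge score closer to 1; a follower with no shared terms gets 0.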


Trust/Closeness

• Sometimes, we want to receive answers from people close to us

• How do we automatically discover these people?

– We analyze the conversation rate between the questioner and each follower

– We analyze the followers set similarity between the questioner and each follower

– We defined trust as the product of the conversation rate and the followers set similarity
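A rough sketch of the trust score follows. The slides do not say how the two factors are computed, so this example assumes the Jaccard index for followers set similarity and the fraction of the questioner's messages addressed to the candidate for conversation rate. Both choices, and all function names, are assumptions for illustration only.

```python
def jaccard(a: set, b: set) -> float:
    """Assumed similarity measure: shared followers over all followers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def conversation_rate(messages_to_candidate: int, total_messages: int) -> float:
    """Assumed definition: share of the questioner's messages sent to the candidate."""
    return messages_to_candidate / total_messages if total_messages else 0.0


def trust(questioner_followers: set, candidate_followers: set,
          messages_to_candidate: int, total_messages: int) -> float:
    """Trust as the product of conversation rate and followers set similarity."""
    rate = conversation_rate(messages_to_candidate, total_messages)
    similarity = jaccard(questioner_followers, candidate_followers)
    return rate * similarity


# Hypothetical data: 2 shared followers out of 5, and 5 of 20 messages exchanged.
score = trust({"ana", "bob", "carl", "dora"}, {"bob", "carl", "eve"}, 5, 20)
```

Because the two factors are multiplied, a candidate needs both frequent conversation and an overlapping audience to score high; a zero in either factor gives zero trust.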


Activity

• Sometimes, we prefer a quick, lower-quality answer to a slow, high-quality one

• Our assumption is that people who produce a lot of content in a short time will provide quick answers

• Activity is the mean latency time between consecutive posts
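The activity measure can be sketched as below. The use of seconds as the unit, the handling of users with fewer than two posts, and the function name `mean_latency_seconds` are assumptions of this example, not details given in the slides.

```python
from datetime import datetime


def mean_latency_seconds(timestamps: list) -> float:
    """Mean gap, in seconds, between consecutive posts (lower means more active)."""
    if len(timestamps) < 2:
        return float("inf")  # assumption: no measurable activity with fewer than two posts
    ordered = sorted(timestamps)
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


# Hypothetical posting times: gaps of 10 and 30 minutes.
posts = [
    datetime(2013, 1, 1, 12, 0),
    datetime(2013, 1, 1, 12, 10),
    datetime(2013, 1, 1, 12, 40),
]
activity = mean_latency_seconds(posts)
```

Note that a smaller latency indicates a more active user, so this value has to be inverted or otherwise rescaled before it is combined with criteria where larger is better.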


Proposal

• How do we compare the criteria values of the followers?

– We use the Weighted Product Model: we compare two users using the following function:

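$$\mathrm{comp}(u, v) = \left(\frac{\mathrm{map}(K_u)}{\mathrm{map}(K_v)}\right)^{w_k} \cdot \left(\frac{\mathrm{map}(T_u)}{\mathrm{map}(T_v)}\right)^{w_t} \cdot \left(\frac{\mathrm{map}(A_u)}{\mathrm{map}(A_v)}\right)^{w_a}$$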

– The result of the comparison tells us which of the two users is better!

– We sum the victories of each user and rank the users by their total number of victories!
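The pairwise comparison and victory count can be sketched as follows. The slides do not define the map function, so this example assumes the scores passed in are already mapped so that larger is better, and only adds a small constant to keep them strictly positive. It also follows the usual weighted product convention that a ratio above 1 favors the first user. The names `comp`, `rank_by_victories` and `EPSILON`, and all scores and weights, are hypothetical.

```python
from itertools import combinations

EPSILON = 1e-6  # assumption: keeps every mapped score strictly positive


def comp(u: dict, v: dict, weights: dict) -> float:
    """Weighted product ratio of u over v; a value above 1 favors u."""
    ratio = 1.0
    for criterion, weight in weights.items():
        ratio *= ((u[criterion] + EPSILON) / (v[criterion] + EPSILON)) ** weight
    return ratio


def rank_by_victories(candidates: dict, weights: dict) -> list:
    """Count pairwise wins for every candidate and sort by that count."""
    victories = {name: 0 for name in candidates}
    for a, b in combinations(candidates, 2):
        result = comp(candidates[a], candidates[b], weights)
        if result > 1:
            victories[a] += 1
        elif result < 1:
            victories[b] += 1
    return sorted(candidates, key=lambda name: victories[name], reverse=True)


# Hypothetical followers with scores for Knowledge, Trust and Activity.
followers = {
    "alice": {"K": 0.8, "T": 0.5, "A": 0.6},
    "bob": {"K": 0.4, "T": 0.9, "A": 0.3},
    "carol": {"K": 0.2, "T": 0.1, "A": 0.2},
}
weights = {"K": 1.0, "T": 1.0, "A": 1.0}  # hypothetical equal weights
ranking = rank_by_victories(followers, weights)
```

Because each criterion enters as a ratio, the comparison is dimensionless: criteria measured on different scales can be combined without normalizing them to a common unit first.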


Evaluation

• We used a Quantitative Approach!

• Methodology

1. We selected 160 questions and their answers published on Twitter

2. We manually ranked the answers to each question based on their utility


Evaluation

Question: How much does it cost to go to Disneyland?

Answer | Answer Type | Utility
I don't know | An unhelpful answer | 1
I think @someone knows | Indicating someone or some source | 2
Between $1000 and $2000 | An uncertain answer | 3
I was there last year and I spent $700 | A direct answer | 4

• We manually ranked the answers to each question based on their utility

• We used the order in which the answers were given as a tie-breaker


Evaluation

• Methodology

4. We crawled information about the questioners and answerers (user profile, followers set, following set, tweets);

5. We ranked the answerers using our proposal

6. We compared both rankings using nDCG

• Our aim is to answer the following questions

– Does our model predict the utility of the answers well?

– Does WPM perform better than each criterion used individually?
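For reference, the rank comparison with nDCG can be sketched as below. Several nDCG variants exist and the slides do not say which one was used; this example assumes the utility itself as the gain and a log2(position + 1) discount. The function names and the example ordering are hypothetical.

```python
import math


def dcg(utilities: list) -> float:
    """Discounted cumulative gain with gain = utility and a log2(position + 1) discount."""
    return sum(u / math.log2(position + 1)
               for position, u in enumerate(utilities, start=1))


def ndcg(utilities_in_predicted_order: list) -> float:
    """DCG of the predicted order divided by the DCG of the ideal (descending) order."""
    ideal = dcg(sorted(utilities_in_predicted_order, reverse=True))
    return dcg(utilities_in_predicted_order) / ideal if ideal > 0 else 0.0


# Manual utilities (1 to 4) listed in the order a model ranked the answerers.
perfect = ndcg([4, 3, 2, 1])        # model order equals the manual order
last_two_swapped = ndcg([4, 3, 1, 2])  # the two least useful answers are swapped
```

A value of 1 means the model reproduced the manual ranking exactly; mistakes near the top of the ranking cost more than mistakes near the bottom.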


Results

Question Type [Morris et al.] | Number of Questions | Mean nDCG
Recommendation | 56 | 0.92 ± 0.23
Opinion | 17 | 0.83 ± 0.31
Factual Knowledge | 40 | 0.91 ± 0.26
Rhetorical | 15 | 0.90 ± 0.25
Invitation | 3 | 0.99 ± 0.01
Favor | 8 | 1.00 ± 0.00
Social connection | 12 | 0.87 ± 0.28
Offer | 9 | 0.84 ± 0.31
Mean | 160 | 0.90


Does our Model perform well to predict the aptitude of the expert candidates?

• Promising results

– We reached a mean nDCG greater than 0.9;

– A one-tailed binomial test statistically confirmed that the QR model predicted the ideal rank in more than 64% of cases (p-value = 0.03219 and α = 5%);

• An improvement in comparison with [Souza et al. 2012]
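A one-tailed binomial test of this kind can be sketched with the standard library alone. The count of ideal rankings used below is hypothetical and is NOT taken from the slides, which report only the p-value; the function name `binomial_upper_tail` is also an assumption of this example.

```python
import math


def binomial_upper_tail(successes: int, trials: int, p0: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p0): the one-tailed p-value."""
    return sum(
        math.comb(trials, k) * p0 ** k * (1 - p0) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical input: 120 ideal rankings out of 160 questions (not the study's count).
# Null hypothesis: the true proportion of ideal rankings is at most 0.64.
p_value = binomial_upper_tail(successes=120, trials=160, p0=0.64)
reject_null = p_value < 0.05  # alpha = 5%, as in the slides
```

Rejecting the null hypothesis supports the claim that the proportion of ideal rankings exceeds 64%.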


Does WPM reach better performance than the individual criteria?


Figure 1: Boxplot comparing WPM with each individual criterion

Does WPM reach better performance than the individual criteria?

Hypothesis | P-value | Conclusion
WPM has a better nDCG distribution than Knowledge | 1.357e-15 | True
WPM has a better nDCG distribution than Activity | 6.701e-16 | True
WPM has a better nDCG distribution than Trust | 4.025e-16 | True

• We performed pairwise comparisons using the Wilcoxon Signed-Rank Test (α = 5%)


Threats to Validity

• Evaluation Methodology

• Few Questions

• Manually ordered answers


Conclusion & Future Work

• We proposed a QR Model for Twitter

– We achieved promising results in a young field

– We confirmed the superiority of using WPM

– We created a public dataset for future research in the area

• Future Work

– Is directing questions to experts more effective than sharing questions with everyone?

– What is the relationship between the weights given to the criteria and the qualities (truth, intimacy, speed) of the received answers?


References

• M. Morris, J. Teevan, and K. Panovich. “What do people ask their social networks, and why?: a survey study of status message q&a behavior”. Proceedings of the 28th ACM International Conference on Human Factors in Computing Systems, 2010, pp. 1739–1748.

• J. Nichols and J. Kang. “Asking questions of targeted strangers on social networks”. Proceedings of the ACM Conference on Computer Supported Cooperative Work, 2012, pp. 999–1002.

• S. Paul, L. Hong, and E. Chi. “Is Twitter a good place for asking questions? a characterization study”. Proceedings of the 5th International AAAI Conference on Weblogs and Social Media, 2011, pp. 578–581.

• C. Souza, J. Magalhães, and E. Costa. “A Formal Model to the Routing Questions Problem in the Context of Twitter”. Proceedings of the IADIS International Conference WWW/Internet, 2011.

• C. Souza, J. Magalhães, E. Costa, and J. Fechine. “Predicting Potential Responders in Twitter: A Query Routing Algorithm”. Proceedings of the 12th International Conference on Computational Science and Its Applications, 2012, pp. 714–729.

• E. Triantaphyllou and S. Mann. “An examination of the effectiveness of multi-dimensional decision-making methods: A decision-making paradox”. Decision Support Systems, vol. 5, 1989, pp. 303–312.


Questions?