Recommending Questions Using the MDL-based Tree Cut Model
Yunbo CAO, Huizhong DUAN, Chin-Yew LIN, Yong YU, and Hsiao-Wuen HON
Natural Language Computing Group, Microsoft Research Asia
Community-based Q&A Service
Question Search
Other Aspects about Hamburg or Berlin
More Aspects (NOT DISCOVERED)
How far is it from Berlin to Hamburg?
Where to see between Hamburg and Berlin?
…
Question Recommendation
• The problem
– You ask:
  • Any cool clubs in Berlin or Hamburg?
– We recommend:
  • How far is it from Berlin to Hamburg?
  • Where to see between Hamburg and Berlin?
  • Any good hostels in Hamburg or Berlin?
• The principle of question recommendation
– A good recommendation should be different from the queried question in question focus but similar in question topic.
Outline
• Question recommendation
• Our approach
– A walk-through of our approach
– The uses of the MDL-based tree cut model
– The flow of question recommendation
• Related work
• Experimental results
• Conclusions
Our Approach
The Principle: A good recommendation should be different from the queried question in question focus but similar in question topic.
Query: Any cool clubs in Hamburg or Berlin?
Topic terms: cool clubs, Hamburg, Berlin
Related question: Where to see in Hamburg or Berlin?
Topic terms: where to see, Hamburg, Berlin
cool clubs / where to see: different
Hamburg, Berlin: same or close
How can we discriminate question topic from question focus?
Specificity – Weighing Terms
Travel @Yahoo! Answers
  Asia Pacific
    China
    Japan
    …
  Europe
  …
China
1. Anyone know where to see the Dragon Boat Festival in Beijing?
2. Where is a good (Less expensive) place to shop in Beijing?
3. What's the cheapest way to get from Beijing to Hong Kong?
Europe
1. How far is it from Berlin to Hamburg?
2. What is the cheapest way from Berlin to Hamburg?
3. Where to see between Hamburg and Berlin?
4. How long does it take from Hamburg to Berlin on the train?
The specificity of a topic term is the inverse entropy of the distribution of the topic term over the sub-categories.
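The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `specificity`, the smoothing constant `epsilon` (added so that a term seen in a single sub-category does not cause a division by zero), the use of the natural logarithm, and the counts are all hypothetical and not taken from the slides.

```python
import math

def specificity(category_counts, epsilon=0.001):
    """Inverse entropy of a term's distribution over sub-categories.

    category_counts maps a sub-category name to how often the term occurs there.
    epsilon is an assumed smoothing constant that keeps the result finite.
    """
    total = sum(category_counts.values())
    if total == 0:
        raise ValueError("the term never occurs in any sub-category")
    entropy = 0.0
    for count in category_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    return 1.0 / (entropy + epsilon)

# Made-up counts: one term concentrated in a single sub-category,
# another spread evenly over two sub-categories.
focused = specificity({"Europe": 9, "Asia Pacific": 1})
spread = specificity({"Europe": 5, "Asia Pacific": 5})
print(focused > spread)
```

A term concentrated in one sub-category has low entropy and therefore high specificity, while a term spread evenly across sub-categories scores low.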
Order Topic Terms by Specificity
Query: Any cool clubs in Hamburg or Berlin?
Topic Terms: cool clubs, Hamburg, Berlin
Topic Chain: Hamburg → Berlin → cool clubs
Related questions: Where to see in Hamburg or Berlin? How far is it from Berlin to Hamburg?
Topic Terms: where to see, Hamburg, Berlin
Topic Chain: Hamburg → Berlin → where to see
Topic Chain: Hamburg → Berlin → how far
Tree of topic chains: Hamburg → Berlin → {cool clubs, where to see, how far}
Question Topic: Hamburg, Berlin
Question Focus: cool clubs, where to see, how far
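Building a topic chain from scored terms might look like the sketch below. It assumes the chain is ordered from most to least specific, which matches the slide's example (Hamburg and Berlin come before cool clubs). The function name `topic_chain` and the specificity values are made up for illustration.

```python
def topic_chain(terms, specificity):
    """Order topic terms from most specific to least specific."""
    return sorted(terms, key=lambda term: specificity[term], reverse=True)

# Hypothetical specificity scores, not taken from the slides.
scores = {"Hamburg": 5.0, "Berlin": 4.0, "cool clubs": 1.2}
chain = topic_chain(["cool clubs", "Hamburg", "Berlin"], scores)
print(" -> ".join(chain))
```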
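Scoring the Candidates
• The recommendation score over a queried question $q$ and a recommendation candidate $\tilde{q}$ is defined as

$$r(\tilde{q} \mid q) = \lambda \cdot sim(H(\tilde{q}) \mid H(q)) + (1 - \lambda) \cdot sim(T(\tilde{q}) \mid T(q))$$

where $H(\cdot)$ is the Question Topic part, $T(\cdot)$ is the Question Focus part, and

$$sim(q_2^c \mid q_1^c) = \frac{1}{|q_1^c|} \sum_{t_1 \in q_1^c} s(t_1) \max_{t_2 \in q_2^c} PMI(t_1, t_2)$$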
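As a rough sketch, the score can be read as a weighted sum of a topic-part similarity and a focus-part similarity, where each similarity averages, over the query's terms, the term's specificity times its best PMI match among the candidate's terms. Everything named here is an assumption made for illustration: the weight `lam`, the helper `pmi_lookup`, the symmetric PMI table, and all numeric values.

```python
def pmi_lookup(pmi, t1, t2):
    """Look up PMI for a term pair in either order; unknown pairs count as 0."""
    return pmi.get((t1, t2), pmi.get((t2, t1), 0.0))

def sim(candidate_terms, query_terms, specificity, pmi):
    """Specificity-weighted best-match similarity of candidate terms to query terms."""
    if not query_terms or not candidate_terms:
        return 0.0
    total = 0.0
    for t1 in query_terms:
        best = max(pmi_lookup(pmi, t1, t2) for t2 in candidate_terms)
        total += specificity.get(t1, 0.0) * best
    return total / len(query_terms)

def recommendation_score(query, candidate, specificity, pmi, lam=0.5):
    """query and candidate are (topic_terms, focus_terms) pairs."""
    topic_part = sim(candidate[0], query[0], specificity, pmi)
    focus_part = sim(candidate[1], query[1], specificity, pmi)
    return lam * topic_part + (1 - lam) * focus_part

# Made-up toy values.
spec = {"Hamburg": 2.0, "Berlin": 2.0, "cool clubs": 0.5}
pmi = {
    ("Hamburg", "Hamburg"): 1.0,
    ("Berlin", "Berlin"): 1.0,
    ("Hamburg", "Berlin"): 0.5,
    ("cool clubs", "where to see"): 0.4,
}
query = (["Hamburg", "Berlin"], ["cool clubs"])
candidate = (["Hamburg", "Berlin"], ["where to see"])
score = recommendation_score(query, candidate, spec, pmi, lam=0.5)
print(round(score, 3))
```

With these toy numbers the topic part is 2.0 and the focus part is 0.2, so an equal weighting gives 1.1.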
The MDL-based Tree Cut Model
• The MDL principle
– Model description length: uniform prior
– Parameter description length: number of parameters
– Data description length: minus log likelihood
• The tree cut model (Li and Abe, 1998)
[Figure: a tree with root node n_0, second-level nodes n_11, n_12, n_13, and third-level nodes n_21, n_22, n_23, n_24]
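A minimal sketch of MDL-based cut selection in the spirit of Li and Abe (1998) follows. It is not the authors' implementation and rests on several assumptions: only leaves carry counts, a node in the cut shares its probability mass equally among its leaves, the parameter cost is half the number of cut nodes times log2 of the sample size, and the constant model description length is ignored. The tree and its counts are made up.

```python
import math

def leaves(node):
    """Collect (name, count) for every leaf under a node."""
    name, count, children = node
    if not children:
        return [(name, count)]
    found = []
    for child in children:
        found.extend(leaves(child))
    return found

def description_length(cut, total):
    """Bits needed for a cut: parameter cost plus data cost."""
    # Parameter cost grows with the number of nodes kept in the cut.
    param_bits = len(cut) / 2 * math.log2(total)
    data_bits = 0.0
    for node in cut:
        members = leaves(node)
        freq = sum(count for _, count in members)
        if freq == 0:
            continue  # unseen subtrees contribute no data cost
        # The node's probability mass is shared equally by its leaves.
        p_leaf = (freq / total) / len(members)
        data_bits -= freq * math.log2(p_leaf)
    return param_bits + data_bits

def find_mdl_cut(node, total):
    """Return whichever cut (list of nodes) has the smaller description length."""
    name, count, children = node
    if not children:
        return [node]
    lower = []
    for child in children:
        lower.extend(find_mdl_cut(child, total))
    upper = [node]
    if description_length(upper, total) < description_length(lower, total):
        return upper
    return lower

# Made-up toy tree: (name, leaf count, children). Internal counts are unused.
tree = ("root", 0, [
    ("A", 0, [("a1", 10, []), ("a2", 10, [])]),
    ("B", 0, [("b1", 100, []), ("b2", 1, [])]),
])
total = sum(count for _, count in leaves(tree))
cut_names = [node[0] for node in find_mdl_cut(tree, total)]
print(cut_names)
```

On this toy tree the evenly distributed leaves a1 and a2 are merged into A, because the extra parameter is not worth its cost, while the skewed leaves b1 and b2 stay separate.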
Reduction of Topic Terms
Topic Term                  Frequency
hotel                       3983
suite hotel                 3
embassy suite hotel         1
nice suite hotel            2
western hotel               40
good western hotel          14
inexpensive western hotel   12
beachfront hotel            5
good beachfront hotel       3
great beachfront hotel      3
nice hotel                  224
affordable hotel            48
Reduction of Topic Terms
Before reduction:
hotel (3983)
  nice (224)
  suite (3)
    embassy (1)
    nice (2)
  western (40)
    good (14)
    inexpensive (12)
  beachfront (5)
    good (3)
    great (3)
  affordable (248)
After reduction:
hotel (3983)
  nice (224)
  suite (6)
  western (66)
    good (14)
    inexpensive (12)
  beachfront (11)
  affordable (248)
Determining the Cut
Query: Any cool clubs in Berlin or Hamburg?
Related questions:
Where to see between Hamburg and Berlin?
How far is it from Berlin to Hamburg?
Any good hostels in Hamburg or Berlin?
What are the best/most fun clubs in Hamburg?
Tree of topic chains:
Hamburg
  Berlin
    cool club
    where to see
    how far
    good hostel
  fun club
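The tree on this slide can be built by merging topic chains that share a prefix. The sketch below uses nested dictionaries for that. The names `build_chain_tree` and `split_chain` are hypothetical, and the set of topic-side terms passed to `split_chain` is chosen by hand purely for illustration; in the approach described here the cut comes from the MDL-based tree cut model.

```python
def build_chain_tree(chains):
    """Merge topic chains into a prefix tree of nested dictionaries."""
    root = {}
    for chain in chains:
        node = root
        for term in chain:
            node = node.setdefault(term, {})
    return root

def split_chain(chain, topic_terms):
    """Split a chain into (topic part, focus part) given the terms above the cut."""
    head = [term for term in chain if term in topic_terms]
    tail = [term for term in chain if term not in topic_terms]
    return head, tail

chains = [
    ["Hamburg", "Berlin", "cool club"],
    ["Hamburg", "Berlin", "where to see"],
    ["Hamburg", "Berlin", "how far"],
    ["Hamburg", "Berlin", "good hostel"],
    ["Hamburg", "fun club"],
]
tree = build_chain_tree(chains)

# Assumed cut for illustration only: Hamburg and Berlin lie on the topic side.
head, tail = split_chain(chains[0], {"Hamburg", "Berlin"})
print(head, tail)
```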
Flow of Question Recommendation
Query: Any cool clubs in Berlin or Hamburg?
STEP 1: Retrieve Related Questions (from the Index)
Related Questions:
1. Where to see between Hamburg and Berlin?
2. How far is it from Berlin to Hamburg?
3. Any good hostels in Hamburg or Berlin?
4. What are the best/most fun clubs in Hamburg?
STEP 2: Discriminate Question Topic from Question Focus
Hamburg
  Berlin
    cool club
    where to see
    how far
    good hostel
  fun club
STEP 3: Rank Questions on the basis of the cut
Related Work
• Question search (Jeon et al., 2005; Sneiders, 2002; Lai et al., 2002; Burke et al., 1997)
– Find semantically equivalent questions given queries
  • Serves a different user need than question recommendation
• Query suggestion (Cucerzan & White, 2007; Jensen et al., 2006; Fonseca et al., 2003)
– Suggest related queries through query log mining
  • Query logs are usually absent for questions
• Query substitution (Jones et al., 2006)
– Generate queries by replacing query terms
  • New queries are close to the original queries
Data and Evaluation Measures
• The data
– Resolved questions from Yahoo! Answers
  • 314,616 about ‘travel’ and 210,785 about ‘computers & internet’
• The test set developed via human judgments
               # Queries   # Returned   # Relevant
TRAVEL TST     100         2,000        405
COM-INT TST    100         2,000        386
Experimental Results (Basic)
• Travel
Methods        R-Precision   P@5     MAP
VSM            0.235         0.226   0.321
PVSM           0.276         0.234   0.291
Our approach   0.324         0.290   0.350
• Computers & Internet
Methods        R-Precision   P@5     MAP
VSM            0.216         0.200   0.307
PVSM           0.242         0.214   0.257
Our approach   0.316         0.248   0.316
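For readers unfamiliar with the three measures, the sketch below uses their standard information-retrieval definitions. The slides do not say exactly how the measures were computed, so this is an assumption, and the ranked relevance list is made up.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k results that are relevant (relevance holds 0/1 flags)."""
    return sum(relevance[:k]) / k

def r_precision(relevance, num_relevant):
    """Precision at rank R, where R is the number of relevant items for the query."""
    if num_relevant == 0:
        return 0.0
    return sum(relevance[:num_relevant]) / num_relevant

def average_precision(relevance, num_relevant):
    """Mean of the precision values at each rank where a relevant item appears."""
    if num_relevant == 0:
        return 0.0
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / num_relevant

def mean_average_precision(runs):
    """MAP over a list of (relevance, num_relevant) pairs, one per query."""
    return sum(average_precision(rel, n) for rel, n in runs) / len(runs)

# Made-up ranking for one query: relevant items at ranks 1 and 3, 3 relevant overall.
ranking = [1, 0, 1, 0, 0]
print(precision_at_k(ranking, 5), r_precision(ranking, 3), average_precision(ranking, 3))
```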
Experimental Results (Basic)
Query: What's a good but cheap hotel/motel/anything in downtown Chicago?
Methods and their recommendations:
VSM
1. What is a clean, cheap hotel near the downtown Chicago?
2. What is a good cheap hotel/motel near Disneyland?
3. What is a good cheap motel in Tapei?
PVSM
1. What is a clean, cheap hotel near the downtown Chicago?
2. Is there any cheap hotel/motel in Alberta?
3. What is there to do in downtown Chicago?
Our approach
1. What is there to do in downtown Chicago?
2. What are some fun cheap/free things to do & see in downtown Chicago?
3. What's the cost of a cab in downtown Chicago?
Effectiveness of MDL
• The baseline methods
– First = our approach without the MDL-based reduction of topic terms
– Second = our approach without the MDL-based discrimination between question topic and question focus
– Third = our approach without the MDL-based reduction of topic terms and without the MDL-based discrimination between question topic and question focus
• The use of the MDL is significant
– The size of the vocabulary is 289,251 before the reduction of topic terms and 173,202 after the reduction. The reduction is about 40%.
– The contribution given by the MDL-based selection of substitution is statistically significant
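The "about 40%" figure follows directly from the two vocabulary sizes quoted above:

```latex
\frac{289{,}251 - 173{,}202}{289{,}251} = \frac{116{,}049}{289{,}251} \approx 0.401
```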
Methods        R-Precision   P@5     MAP
First          0.302         0.296   0.330
Second         0.209         0.198   0.234
Third          0.153         0.156   0.172
Our approach   0.324         0.290   0.350
Conclusions
• Studied question recommendation by identifying question topics and question foci
• Used the MDL-based tree cut model for
– Reducing the set of topic terms
– Discriminating question topics from question foci
• Empirically verified the effectiveness of our approach to question recommendation
Questions and Discussions!