Recommending Questions Using the MDL-based Tree Cut Model


Page 1

Recommending Questions Using the MDL-based Tree Cut Model

Yunbo CAO, Huizhong DUAN, Chin-Yew LIN, Yong YU, and Hsiao-Wuen HON

Natural Language Computing Group, Microsoft Research Asia

Page 2

Community-based Q&A Service

Question Search

Other Aspects about Hamburg or Berlin

More Aspects (NOT DISCOVERED): How far is it from Berlin to Hamburg? Where to see between Hamburg and Berlin? …

Page 3

Question Recommendation
• The problem
  – You ask:
    • Any cool clubs in Berlin or Hamburg?
  – We recommend:
    • How far is it from Berlin to Hamburg?
    • Where to see between Hamburg and Berlin?
    • Any good hostels in Hamburg or Berlin?
• The principle of question recommendation
  – A good recommendation should be different from the queried question in question focus but similar in question topic.

Page 4

Outline
• Question recommendation
• Our approach
  – A walk-through of our approach
  – The uses of the MDL-based tree cut model
  – The flow of question recommendation
• Related work
• Experimental results
• Conclusions

Page 5

Our Approach
The principle: a good recommendation should be different from the queried question in question focus but similar in question topic.

Query: Any cool clubs in Hamburg or Berlin?
Topic terms: cool clubs, Hamburg, Berlin

Related question: Where to see in Hamburg or Berlin?
Topic terms: where to see, Hamburg, Berlin

"cool clubs" vs. "where to see": different. "Hamburg, Berlin": same or close.
How can we discriminate question topic from question focus?

Page 6

Specificity – Weighing Terms
[Figure: category tree of Travel @Yahoo! Answers, including the sub-categories Asia Pacific, Europe, China and Japan]

China
1. Anyone know where to see the Dragon Boat Festival in Beijing?
2. Where is a good (Less expensive) place to shop in Beijing?
3. What's the cheapest way to get from Beijing to Hong Kong?

Europe
1. How far is it from Berlin to Hamburg?
2. What is the cheapest way from Berlin to Hamburg?
3. Where to see between Hamburg and Berlin?
4. How long does it take from Hamburg to Berlin on the train?

The specificity of a topic term is the inverse entropy of the distribution of the topic term over the sub-categories. For example, "Beijing" occurs only under China and is therefore more specific than "cheapest way", which occurs under both China and Europe.
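A minimal sketch of the specificity score, assuming the smoothed form s(t) = 1 / (entropy + ε) with natural-log entropy. The function name and the smoothing constant `eps` are hypothetical; `eps` only keeps a term that occurs in a single sub-category (entropy 0) from dividing by zero. The log base does not change the ordering of terms.

```python
import math


def specificity(category_counts, eps=1e-6):
    """Inverse entropy of a topic term's distribution over sub-categories.

    category_counts: mapping sub-category -> number of questions containing the term.
    eps: hypothetical smoothing constant (not from the slides).
    """
    total = sum(category_counts.values())
    entropy = 0.0
    for n in category_counts.values():
        if n > 0:
            p = n / total
            entropy -= p * math.log(p)
    return 1.0 / (entropy + eps)


# Counts read off the example questions above.
beijing = specificity({"China": 3})                      # concentrated in one category
cheapest_way = specificity({"China": 1, "Europe": 1})    # spread over two categories
```

With these counts "Beijing" scores far higher than "cheapest way", matching the intuition that a term confined to few sub-categories is more specific.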

Page 7

Order Topic Terms by Specificity
Query: Any cool clubs in Hamburg or Berlin?
Topic terms: cool clubs, Hamburg, Berlin
Topic chain: Hamburg → Berlin → cool clubs

Related questions: Where to see in Hamburg or Berlin? How far is it from Berlin to Hamburg?
Topic terms: where to see, Hamburg, Berlin
Topic chains: Hamburg → Berlin → where to see; Hamburg → Berlin → how far

[Figure: the topic chains merged into one tree, Hamburg → Berlin, branching into cool clubs, where to see and how far; the shared part (Hamburg, Berlin) is labeled "Question Topic" and the branches are labeled "Question Focus"]
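The ordering and merging step can be sketched as follows: each question's topic terms are sorted by descending specificity into a topic chain, and the chains are merged into a prefix tree so that shared leading terms (the candidate question topic) collapse into one path. The specificity values and function names below are hypothetical, chosen only to reproduce the chains on this slide.

```python
def topic_chain(terms, spec):
    """Order topic terms from most to least specific."""
    return sorted(terms, key=lambda t: spec[t], reverse=True)


def build_question_tree(chains):
    """Merge topic chains into a prefix tree represented as nested dicts."""
    root = {}
    for chain in chains:
        node = root
        for term in chain:
            node = node.setdefault(term, {})
    return root


# Hypothetical specificity values (not from the slides).
spec = {"Hamburg": 3.2, "Berlin": 3.0, "cool clubs": 0.9, "where to see": 0.6, "how far": 0.5}
chains = [
    topic_chain(["cool clubs", "Hamburg", "Berlin"], spec),
    topic_chain(["where to see", "Hamburg", "Berlin"], spec),
    topic_chain(["how far", "Berlin", "Hamburg"], spec),
]
tree = build_question_tree(chains)
```

The three chains share the prefix Hamburg → Berlin and branch only at their least specific term.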

Page 8

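Scoring the Candidates
• The recommendation score of a recommendation candidate q̃ for a queried question q is defined as

$$ r(\tilde{q} \mid q) = \lambda \, \mathrm{sim}\big(H(\tilde{q}) \mid H(q)\big) + (1 - \lambda) \, \mathrm{sim}\big(T(\tilde{q}) \mid T(q)\big) $$

where H(·) is the question topic and T(·) is the question focus of a question, and

$$ \mathrm{sim}(q_2 \mid q_1) = \frac{1}{|q_1|} \sum_{t_1 \in q_1} s(t_1) \max_{t_2 \in q_2} \mathrm{PMI}(t_1, t_2) $$

with s(t) the specificity of term t and PMI the pointwise mutual information between two terms.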
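A minimal sketch of the scoring formulas. The slides do not say how PMI is estimated, so this assumes it is computed from term co-occurrence within questions, with unseen pairs scored as 0 (independent); the names `make_pmi`, `sim`, `score` and the default `lam=0.5` are hypothetical.

```python
import math
from collections import Counter


def make_pmi(questions):
    """Estimate PMI from co-occurrence of terms within questions (each question = iterable of terms)."""
    n = len(questions)
    single, pair = Counter(), Counter()
    for terms in questions:
        terms = set(terms)
        single.update(terms)
        pair.update(frozenset((a, b)) for a in terms for b in terms if a < b)

    def pmi(a, b):
        joint = single[a] if a == b else pair[frozenset((a, b))]
        if joint == 0:
            return 0.0  # assumption: unseen pairs treated as independent
        # P(a,b) / (P(a) P(b)) with all probabilities estimated over n questions
        return math.log(joint * n / (single[a] * single[b]))

    return pmi


def sim(q2, q1, spec, pmi):
    """sim(q2 | q1) = (1/|q1|) * sum_{t1 in q1} s(t1) * max_{t2 in q2} PMI(t1, t2)."""
    if not q1:
        return 0.0
    total = 0.0
    for t1 in q1:
        best = max((pmi(t1, t2) for t2 in q2), default=0.0)
        total += spec[t1] * best
    return total / len(q1)


def score(cand, query, spec, pmi, lam=0.5):
    """r(cand | query); cand and query are (topic_terms, focus_terms) pairs."""
    (h_c, t_c), (h_q, t_q) = cand, query
    return lam * sim(h_c, h_q, spec, pmi) + (1 - lam) * sim(t_c, t_q, spec, pmi)


questions = [["Hamburg", "Berlin", "cool clubs"], ["Hamburg", "Berlin", "where to see"],
             ["Berlin", "how far"], ["Beijing", "shop"]]
pmi = make_pmi(questions)
spec = {"Hamburg": 1.0, "Berlin": 1.0, "cool clubs": 1.0}
topic_sim = sim(["Hamburg", "Berlin"], ["Hamburg", "Berlin"], spec, pmi)
```

The weight λ trades off agreement on the question topic against relatedness of the question focus.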

Page 9

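The MDL-based Tree Cut Model
• The MDL principle
  – Model description length: constant under a uniform prior
  – Parameter description length: proportional to the number of parameters
  – Data description length: minus log likelihood of the data
• The tree cut model (Li and Abe, 1998)
[Figure: example tree with root n0, second-level nodes n11, n12, n13 and third-level nodes n21, n22, n23, n24]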
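The tree cut model can be sketched with the Find-MDL recursion of Li and Abe (1998): a subtree is collapsed into a single class whenever that yields a shorter parameter plus data description length than the best cut of its children; the model description length is constant under the uniform prior, so it is ignored. This is a minimal sketch for the classic setting where frequencies sit on the leaves, assuming the (k/2)·log2|S| parameter cost; the `Node` class and function names are hypothetical, and the slides do not show how counts on internal nodes are handled.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    count: int = 0                              # observed frequency, read at leaves only
    children: list = field(default_factory=list)


def leaves(node):
    """All leaves under `node` (a leaf is its own only leaf)."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def class_length(node, sample_size):
    """Bits needed to treat the subtree at `node` as one class.

    Parameter part: 0.5 * log2(|S|) per class.
    Data part: -sum log2 P(n), with the class probability f(C)/|S| spread
    uniformly over the class's leaves.
    """
    ls = leaves(node)
    freq = sum(leaf.count for leaf in ls)
    length = 0.5 * math.log2(sample_size)
    if freq > 0:
        length -= freq * math.log2(freq / (len(ls) * sample_size))
    return length


def find_mdl(node, sample_size):
    """Return (cut, length): the minimum-description-length cut of this subtree."""
    if not node.children:
        return [node], class_length(node, sample_size)
    cut, length = [], 0.0
    for child in node.children:
        child_cut, child_length = find_mdl(child, sample_size)
        cut.extend(child_cut)
        length += child_length
    own = class_length(node, sample_size)
    if own < length:                            # collapsing the subtree is cheaper
        return [node], own
    return cut, length


def tree_cut(root):
    sample_size = sum(leaf.count for leaf in leaves(root))
    cut, _ = find_mdl(root, sample_size)
    return [n.name for n in cut]


uniform = Node("A", children=[Node("B", 5), Node("C", 5)])
skewed = Node("A", children=[Node("B", 10), Node("C", 0)])
```

Evenly spread counts collapse into the parent class, while a skewed distribution keeps the children separate. Because both description lengths are sums over classes, the local comparison at each node yields the globally optimal cut.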

Page 10

Reduction of Topic Terms

Topic Term                   Frequency
hotel                        3983
suite hotel                  3
embassy suite hotel          1
nice suite hotel             2
western hotel                40
good western hotel           14
inexpensive western hotel    12
beachfront hotel             5
good beachfront hotel        3
great beachfront hotel       3
nice hotel                   224
affordable hotel             48

Page 11

Reduction of Topic Terms

Before reduction:
hotel (3983)
  nice (224)
  suite (3)
    embassy (1)
    nice (2)
  western (40)
    good (14)
    inexpensive (12)
  beachfront (5)
    good (3)
    great (3)
  affordable (248)

After reduction:
hotel (3983)
  nice (224)
  suite (6)
  western (66)
    good (14)
    inexpensive (12)
  beachfront (11)
  affordable (248)
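How the counts change under a reduction can be sketched by mapping every topic term to its nearest ancestor that is kept by the cut and summing frequencies. Two assumptions: a term's parent is obtained by dropping its leading modifier ("good western hotel" → "western hotel"), and the cut is chosen by hand to reproduce this slide rather than computed by MDL. Frequencies are taken from the table on the previous slide; `parent` and `reduce_terms` are hypothetical names.

```python
def parent(term):
    """Hypothetical parent rule: drop the leading modifier word."""
    words = term.split()
    return " ".join(words[1:]) if len(words) > 1 else None


def reduce_terms(freqs, cut):
    """Map every term to its nearest ancestor in `cut` and sum the frequencies."""
    reduced = {}
    for term, n in freqs.items():
        node = term
        while node is not None and node not in cut:
            node = parent(node)
        if node is None:                 # no kept ancestor: leave the term unchanged
            node = term
        reduced[node] = reduced.get(node, 0) + n
    return reduced


freqs = {
    "hotel": 3983, "suite hotel": 3, "embassy suite hotel": 1, "nice suite hotel": 2,
    "western hotel": 40, "good western hotel": 14, "inexpensive western hotel": 12,
    "beachfront hotel": 5, "good beachfront hotel": 3, "great beachfront hotel": 3,
    "nice hotel": 224, "affordable hotel": 48,
}
cut = {"hotel", "nice hotel", "suite hotel", "western hotel", "beachfront hotel", "affordable hotel"}
reduced = reduce_terms(freqs, cut)
```

The merged counts for suite (6), western (66) and beachfront (11) agree with the reduced tree above.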

Page 12

Determining the Cut
Query: Any cool clubs in Berlin or Hamburg?
Related questions:
Where to see between Hamburg and Berlin?
How far is it from Berlin to Hamburg?
Any good hostels in Hamburg or Berlin?
What are the best/most fun clubs in Hamburg?
[Figure: question tree over the topic terms Hamburg, Berlin, cool club, where to see, how far, good hostel and fun club, with a cut through the tree]
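Once a cut is fixed, each topic chain can be split into a question topic H(q) and a question focus T(q). The sketch below assumes that terms above the cut form the topic and that the cut node and everything below it form the focus; both this rule and the particular cut (nodes identified by their root paths) are hypothetical illustrations, not taken from the slides.

```python
def split_chain(chain, cut):
    """Split a topic chain into (question topic, question focus).

    cut: set of tree nodes, each identified by its path (tuple of terms) from the root.
    Assumption: terms strictly above the cut are topic; the rest is focus.
    """
    for i in range(len(chain)):
        if tuple(chain[: i + 1]) in cut:
            return chain[:i], chain[i:]
    return list(chain), []   # chain ends above the cut: treat it all as topic


# Hypothetical cut for the question tree on this slide.
cut = {
    ("Hamburg", "Berlin", "cool club"), ("Hamburg", "Berlin", "where to see"),
    ("Hamburg", "Berlin", "how far"), ("Hamburg", "Berlin", "good hostel"),
    ("Hamburg", "fun club"),
}
query_split = split_chain(["Hamburg", "Berlin", "cool club"], cut)
fun_split = split_chain(["Hamburg", "fun club"], cut)
```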

Page 13

Flow of Question Recommendation
Query: Any cool clubs in Berlin or Hamburg?

STEP 1: Retrieve related questions from the index
Related questions:
1. Where to see between Hamburg and Berlin?
2. How far is it from Berlin to Hamburg?
3. Any good hostels in Hamburg or Berlin?
4. What are the best/most fun clubs in Hamburg?

STEP 2: Discriminate question topic from question focus
[Figure: question tree over the terms Hamburg, Berlin, cool club, where to see, how far, good hostel and fun club]

STEP 3: Rank questions on the basis of the cut


Page 15

Related Work
• Question search (Jeon et al., 2005; Sneiders, 2002; Lai et al., 2002; Burke et al., 1997)
  – Finds semantically equivalent questions for given queries
    • Serves different user needs from question recommendation
• Query suggestion (Cucerzan & White, 2007; Jensen et al., 2006; Fonseca et al., 2003)
  – Suggests related queries by mining query logs
    • Query logs are usually unavailable for questions
• Query substitution (Jones et al., 2006)
  – Generates new queries by replacing query terms
    • The new queries stay close to the original queries


Page 17

Data and Evaluation Measures
• The data
  – Resolved questions from Yahoo! Answers: 314,616 about 'travel' and 210,785 about 'computers & internet'
• The test sets, developed via human judgments

Test set       # Queries   # Returned   # Relevant
TRAVEL TST     100         2,000        405
COM-INT TST    100         2,000        386

Page 18

Experimental Results (Basic)
• Travel

Methods        R-Precision   P@5     MAP
VSM            0.235         0.226   0.321
PVSM           0.276         0.234   0.291
Our approach   0.324         0.290   0.350

• Computers & Internet

Methods        R-Precision   P@5     MAP
VSM            0.216         0.200   0.307
PVSM           0.242         0.214   0.257
Our approach   0.316         0.248   0.316
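The three measures in these tables are standard ranking metrics. The slides do not define them, so the sketch below assumes the usual IR definitions: R-Precision is precision at rank R (the number of relevant questions for the query), P@5 is precision in the top five, and MAP is the mean over queries of average precision normalised by the number of relevant questions. The function names are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant items."""
    r = len(relevant)
    return precision_at_k(ranked, relevant, r) if r else 0.0


def average_precision(ranked, relevant):
    """Mean of precision values at the ranks of relevant items, over all relevant items."""
    hits, total = 0, 0.0
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c"}
```

For this toy ranking, P@5 is 2/5, R-Precision is 1/2 and average precision is (1/1 + 2/3) / 2 = 5/6.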

Page 19

Experimental Results (Basic)

Query: What's a good but cheap hotel/motel/anything in downtown Chicago?

Methods / Recommendations
VSM
1. What is a clean, cheap hotel near the downtown Chicago?
2. What is a good cheap hotel/motel near Disneyland?
3. What is a good cheap motel in Tapei?
PVSM
1. What is a clean, cheap hotel near the downtown Chicago?
2. Is there any cheap hotel/motel in Alberta?
3. What is there to do in downtown Chicago?
Our approach
1. What is there to do in downtown Chicago?
2. What are some fun cheap/free things to do & see in downtown Chicago?
3. What's the cost of a cab in downtown Chicago?

Page 20

Effectiveness of MDL
• The baseline methods
  – First: our approach without the MDL-based reduction of topic terms
  – Second: our approach without the MDL-based discrimination between question topic and question focus
  – Third: our approach without both the MDL-based reduction of topic terms and the MDL-based discrimination between question topic and question focus
• The use of MDL is significant
  – The vocabulary shrinks from 289,251 terms before the reduction of topic terms to 173,202 after it, a reduction of about 40%.
  – The contribution of the MDL-based selection of substitution is statistically significant.

Methods        R-Precision   P@5     MAP
First          0.302         0.296   0.330
Second         0.209         0.198   0.234
Third          0.153         0.156   0.172
Our approach   0.324         0.290   0.350
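The "about 40%" figure follows directly from the two vocabulary sizes:

```latex
\frac{289{,}251 - 173{,}202}{289{,}251} = \frac{116{,}049}{289{,}251} \approx 0.401
```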

Page 21

Conclusions
• Studied question recommendation by identifying question topics and question foci
• Used the MDL-based tree cut model for
  – Reducing the set of topic terms
  – Discriminating question topics from question foci
• Empirically verified the effectiveness of our approach to question recommendation

Page 22

Questions and Discussions!