Finding question answer pairs from online forum
![Page 1: Finding question answer pairs from online forum](https://reader035.fdocuments.us/reader035/viewer/2022070509/589bcf1c1a28ab92618b5533/html5/thumbnails/1.jpg)
G. Cong, L. Wang, C. Lin, Y. Song, and Y. Sun. (2008)
Presenter: Tan Kent Loong r04944005
Finding Question-Answer Pairs from Online Forums
![Page 2: Finding question answer pairs from online forum](https://reader035.fdocuments.us/reader035/viewer/2022070509/589bcf1c1a28ab92618b5533/html5/thumbnails/2.jpg)
![Page 3: Finding question answer pairs from online forum](https://reader035.fdocuments.us/reader035/viewer/2022070509/589bcf1c1a28ab92618b5533/html5/thumbnails/3.jpg)
Motivation
● Forums contain a huge amount of user-generated content on a variety of topics.
  ○ Human knowledge
  ○ Largely unstructured
![Page 4: Finding question answer pairs from online forum](https://reader035.fdocuments.us/reader035/viewer/2022070509/589bcf1c1a28ab92618b5533/html5/thumbnails/4.jpg)
Flow Chart
Forum → Threads → Question Detection → Answer Detection
![Page 5: Finding question answer pairs from online forum](https://reader035.fdocuments.us/reader035/viewer/2022070509/589bcf1c1a28ab92618b5533/html5/thumbnails/5.jpg)
Question Detection
● Rule-based methods
  ○ 5W1H words (what, when, where, who, why, how)
  ○ Sentence ends with '?'
    ■ 30% of questions do not end with a question mark.
      ● I am wondering where I can visit in Bangkok.
      ● I am having doubt about changing tyre.
    ■ 9% of sentences ending with a question mark are not questions.
      ● Like to enjoy a long walk while enjoying great sights and tastes?
      ● Only have three days to explore this city?
Not good!
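The two rules above can be sketched as a small baseline to show where they fail. This is an illustrative sketch only: the function name `is_question_rule_based` and the choice to test just the first word against the 5W1H list are assumptions, not details taken from the paper.

```python
import re

# The six 5W1H question words. Matching only a sentence-initial word
# is an assumption made for this sketch.
FIVE_W_ONE_H = {"what", "when", "where", "who", "why", "how"}


def is_question_rule_based(sentence: str) -> bool:
    """Return True if the sentence ends with '?' or starts with a 5W1H word."""
    text = sentence.strip()
    if text.endswith("?"):
        return True
    words = re.findall(r"[a-z']+", text.lower())
    return bool(words) and words[0] in FIVE_W_ONE_H


# A real question that the rules miss (no '?', no leading 5W1H word).
missed = is_question_rule_based("I am wondering where I can visit in Bangkok.")
# A non-question that the rules accept because it ends with '?'.
false_alarm = is_question_rule_based("Only have three days to explore this city?")
```

Here `missed` is `False` and `false_alarm` is `True`, which are the two failure modes listed on the slide.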
![Page 6: Finding question answer pairs from online forum](https://reader035.fdocuments.us/reader035/viewer/2022070509/589bcf1c1a28ab92618b5533/html5/thumbnails/6.jpg)
Labeled Sequential Pattern
1. Pre-process each sentence into POS tags (keywords such as “where” and “can” are kept).
   “where can you find a job” → “where can PRP VB DT NN”
2. Build the sequence database.
   <a, d, e, f> → Q
   <a, f, e, f> → Q
   <d, a, f> → NQ
3. Calculate the support and confidence of each pattern.
   - <a, e, f> → Q has support 66.7% and confidence 100%
   - <a, f> → Q has support 66.7% and confidence 66.7%
4. Set a minimum support threshold and a minimum confidence threshold.
F1 score = 97.4%
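The support and confidence figures in step 3 can be reproduced with a few lines of Python. This is a minimal sketch: the helper names are hypothetical, and it assumes that support is the fraction of all sequences that contain the pattern and carry the label, while confidence is the fraction of pattern-containing sequences that carry the label.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support_and_confidence(pattern, label, database):
    """Support and confidence of the labeled pattern `pattern -> label`."""
    labels_of_matches = [lbl for seq, lbl in database if is_subsequence(pattern, seq)]
    with_label = sum(1 for lbl in labels_of_matches if lbl == label)
    support = with_label / len(database)
    confidence = with_label / len(labels_of_matches) if labels_of_matches else 0.0
    return support, confidence


# The toy sequence database from the slide.
DATABASE = [
    (["a", "d", "e", "f"], "Q"),
    (["a", "f", "e", "f"], "Q"),
    (["d", "a", "f"], "NQ"),
]

aef = support_and_confidence(["a", "e", "f"], "Q", DATABASE)  # (2/3, 1.0)
af = support_and_confidence(["a", "f"], "Q", DATABASE)        # (2/3, 2/3)
```

The pattern <a, f> matches all three sequences but only two of them are questions, which is why its confidence drops to 66.7%.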
![Page 7: Finding question answer pairs from online forum](https://reader035.fdocuments.us/reader035/viewer/2022070509/589bcf1c1a28ab92618b5533/html5/thumbnails/7.jpg)
Answer Detection
● Observation: Many-to-many
  ○ Multiple questions and answers within the same thread.
    ■ 1 question may have multiple replies.
    ■ 1 post may contain answers to multiple questions.
![Page 8: Finding question answer pairs from online forum](https://reader035.fdocuments.us/reader035/viewer/2022070509/589bcf1c1a28ab92618b5533/html5/thumbnails/8.jpg)
Answer Detection
● Treat as a traditional document retrieval problem
  ○ Cosine similarity
  ○ Query likelihood language model
  ○ KL-divergence language model
● Classification method
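As an illustration of the retrieval view, the question can be treated as the query and each candidate answer as a document, ranked by cosine similarity. This is a minimal sketch that assumes raw term-frequency vectors and whitespace tokenisation; the function names are hypothetical and the paper's exact weighting may differ.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between raw term-frequency vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(question: str, candidates: list[str]) -> list[str]:
    """Order candidate answers from most to least similar to the question."""
    return sorted(candidates, key=lambda c: cosine_similarity(question, c), reverse=True)


ranked = rank_candidates(
    "which hotel is good in bangkok",
    ["the weather is hot", "world hotel is good"],
)
```

With these toy inputs, "world hotel is good" shares three terms with the question and is ranked first.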
![Page 9: Finding question answer pairs from online forum](https://reader035.fdocuments.us/reader035/viewer/2022070509/589bcf1c1a28ab92618b5533/html5/thumbnails/9.jpg)
KL-divergence
Think of it as a “distance” between the question language model and the answer language model.
p(w|Ma): probability that keyword w appears in the candidate answer.
p(w|Mq): probability that keyword w appears in the question.
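A small sketch of how such a divergence could be computed from two unigram language models follows. Several choices here are assumptions made for illustration rather than details from the slides: add-one smoothing over the joint vocabulary, whitespace tokenisation, and the direction KL(Mq || Ma). Swapping the arguments of `kl_divergence` gives the other direction.

```python
import math
from collections import Counter


def language_model(text, vocabulary, mu=1.0):
    """Unigram model p(w|M) with add-mu smoothing over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocabulary) + mu * len(vocabulary)
    return {w: (counts[w] + mu) / total for w in vocabulary}


def kl_divergence(p, q):
    """KL(p || q) = sum over w of p(w) * log(p(w) / q(w))."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def kl_score(question, answer):
    """Divergence between question and answer models; lower means closer."""
    vocabulary = set(question.lower().split()) | set(answer.lower().split())
    m_q = language_model(question, vocabulary)
    m_a = language_model(answer, vocabulary)
    return kl_divergence(m_q, m_a)


question = "where is a good hotel"
close = kl_score(question, "world hotel is a good hotel")
far = kl_score(question, "the bus leaves at noon")
```

In this toy case `close` is smaller than `far`, so the hotel reply would be ranked above the unrelated one.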
![Page 10: Finding question answer pairs from online forum](https://reader035.fdocuments.us/reader035/viewer/2022070509/589bcf1c1a28ab92618b5533/html5/thumbnails/10.jpg)
Answer Detection
● Treat as a traditional document retrieval problem
  ○ Cosine similarity
  ○ Query likelihood language model
  ○ KL-divergence language model
● Classification method
Cons: These methods do not consider the relationship between candidate answers or forum-specific features.
a1: world hotel is good but I prefer century hotel
a2: world hotel has a very good restaurant
a2 (generator) → a1 (offspring)
![Page 11: Finding question answer pairs from online forum](https://reader035.fdocuments.us/reader035/viewer/2022070509/589bcf1c1a28ab92618b5533/html5/thumbnails/11.jpg)
PageRank (without hyperlinks)
![Page 12: Finding question answer pairs from online forum](https://reader035.fdocuments.us/reader035/viewer/2022070509/589bcf1c1a28ab92618b5533/html5/thumbnails/12.jpg)
Graph-Based Propagation
![Page 13: Finding question answer pairs from online forum](https://reader035.fdocuments.us/reader035/viewer/2022070509/589bcf1c1a28ab92618b5533/html5/thumbnails/13.jpg)
Graph-Based Propagation
● Calculate edge weights based on
  ○ The probability, assigned by a language model, of generating one candidate answer from the other
  ○ The distance of the candidate answer from the question
  ○ The authority of the candidate answer's author
author(ag ; #reply2, #start)
![Page 14: Finding question answer pairs from online forum](https://reader035.fdocuments.us/reader035/viewer/2022070509/589bcf1c1a28ab92618b5533/html5/thumbnails/14.jpg)
Graph-Based Propagation
1. Propagation without initial score:
![Page 15: Finding question answer pairs from online forum](https://reader035.fdocuments.us/reader035/viewer/2022070509/589bcf1c1a28ab92618b5533/html5/thumbnails/15.jpg)
Graph-Based Propagation
1. Propagation without initial score:
2. Propagation with initial score:
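The formulas for the two variants are only on the slide images, so the following is a generic sketch of the idea rather than the paper's exact equations. It assumes a damped, PageRank-style power iteration in which `weights[o][g]` is the edge weight from offspring `o` to generator `g`. Without initial scores the restart distribution is uniform; with initial scores (for example from a retrieval model) the restart distribution is proportional to them. The function name, the damping value, and the handling of nodes without outgoing edges are all assumptions.

```python
def propagate(weights, initial=None, damping=0.85, iterations=50):
    """PageRank-style score propagation over a weighted candidate-answer graph."""
    n = len(weights)
    if initial is None:
        base = [1.0 / n] * n                  # variant 1: no initial score
    else:
        total = sum(initial)
        base = [s / total for s in initial]   # variant 2: normalised initial scores

    # Normalise outgoing weights; a node without outgoing edges
    # redistributes its score according to the base distribution.
    transition = []
    for row in weights:
        row_sum = sum(row)
        transition.append([w / row_sum for w in row] if row_sum > 0 else list(base))

    scores = list(base)
    for _ in range(iterations):
        scores = [
            (1 - damping) * base[g]
            + damping * sum(scores[o] * transition[o][g] for o in range(n))
            for g in range(n)
        ]
    return scores


# Toy graph: candidates 1 and 2 are offspring of candidate 0.
WEIGHTS = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
without_initial = propagate(WEIGHTS)
with_initial = propagate(WEIGHTS, initial=[0.1, 0.1, 0.8])
```

In the toy graph, candidate 0 collects score from both offspring and ends up ranked highest without initial scores, while supplying a high initial score for candidate 2 raises its final score.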
![Page 16: Finding question answer pairs from online forum](https://reader035.fdocuments.us/reader035/viewer/2022070509/589bcf1c1a28ab92618b5533/html5/thumbnails/16.jpg)
Integration with other methods
1. Graph-based propagation → classification
2. Lexical mapping, e.g. “why → because”
![Page 17: Finding question answer pairs from online forum](https://reader035.fdocuments.us/reader035/viewer/2022070509/589bcf1c1a28ab92618b5533/html5/thumbnails/17.jpg)
Evaluation
![Page 18: Finding question answer pairs from online forum](https://reader035.fdocuments.us/reader035/viewer/2022070509/589bcf1c1a28ab92618b5533/html5/thumbnails/18.jpg)
Evaluation
![Page 19: Finding question answer pairs from online forum](https://reader035.fdocuments.us/reader035/viewer/2022070509/589bcf1c1a28ab92618b5533/html5/thumbnails/19.jpg)
Evaluation
![Page 20: Finding question answer pairs from online forum](https://reader035.fdocuments.us/reader035/viewer/2022070509/589bcf1c1a28ab92618b5533/html5/thumbnails/20.jpg)
Evaluation
![Page 21: Finding question answer pairs from online forum](https://reader035.fdocuments.us/reader035/viewer/2022070509/589bcf1c1a28ab92618b5533/html5/thumbnails/21.jpg)
Summary
Forum → Threads → Question Detection (Labeled Sequential Pattern) → Answer Detection (enhanced with graph-based propagation)
![Page 22: Finding question answer pairs from online forum](https://reader035.fdocuments.us/reader035/viewer/2022070509/589bcf1c1a28ab92618b5533/html5/thumbnails/22.jpg)
References
1. Finding Question-Answer Pairs from Online Forums
   http://research.microsoft.com/en-us/people/cyl/sigir2008-gao-msra.pdf
2. PageRank without hyperlinks: Structural re-ranking using links induced by language models
   https://www.cs.cornell.edu/home/llee/papers/lmpagerank.home.html
![Page 23: Finding question answer pairs from online forum](https://reader035.fdocuments.us/reader035/viewer/2022070509/589bcf1c1a28ab92618b5533/html5/thumbnails/23.jpg)
Thank you