Intelligent Database Systems Lab Presenter: WU, MIN-CONG Authors: Zhiyuan Liu, Wenyi Huang, Yabin...
Intelligent Database Systems Lab
Presenter: WU, MIN-CONG
Authors: Zhiyuan Liu, Wenyi Huang,
Yabin Zheng and Maosong Sun
2010, ACM
Automatic Keyphrase Extraction via Topic Decomposition
Outline
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
• Existing graph-based ranking methods for keyphrase extraction compute only a single importance score for each word via a single random walk.
• Both documents and words can be represented by a mixture of semantic topics, which motivates ranking words per topic.
Objectives
• Build a Topical PageRank (TPR) on the word graph to measure word importance with respect to different topics.
• Calculate the ranking scores of words and extract the top-ranked ones as keyphrases.
Methodology - Building Topic Interpreters
α, β obtained from, e.g., Gibbs sampling
LDA output:
• Topic-word distribution ϕ: Pr(w|z) ∈ ϕ(z) ∈ ϕ
• Document-topic distribution θ: Pr(z|d) ∈ θ(d) ∈ θ
Methodology - Topical PageRank for Keyphrase Extraction
Methodology - Constructing Word Graph
• Sliding window size = 3
• The document is regarded as a word sequence.
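A minimal sketch of the word-graph construction, assuming that each word links to the words that follow it inside the sliding window and that the edge weight is the co-occurrence count. The function name `build_word_graph` and the edge direction are illustrative assumptions; the slide only states the window size and that the document is a word sequence.

```python
from collections import defaultdict


def build_word_graph(words, window=3):
    """Return {source: {target: weight}} built from a word sequence.

    Assumption: a window of size `window` covers a word and the
    (window - 1) words that follow it; each co-occurrence adds 1 to
    the weight of the directed edge source -> target.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for i, source in enumerate(words):
        # Words that share a window with `source` and come after it.
        for target in words[i + 1 : i + window]:
            if target != source:  # skip self-loops
                graph[source][target] += 1
    return {word: dict(neighbours) for word, neighbours in graph.items()}


words = "topic model ranks topic words".split()
graph = build_word_graph(words, window=3)
```

With window size 3, the first occurrence of "topic" links to "model" and "ranks", and the second occurrence links to "words". The last word of the sequence has no outgoing edges.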
Methodology - Topical PageRank (PageRank)
Define: weight of link (wi, wj) as e(wi, wj)
Methodology - Topical PageRank (PageRank)
• Edge weights are normalized by the out-degree of the source vertex.
• The random jump goes to all vertices with equal probability.
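The slide's formula did not survive extraction. A standard weighted PageRank that matches the two annotations (out-degree normalization, uniform random jump) can be written as R(wi) = λ · Σ_{j: wj→wi} [e(wj, wi) / O(wj)] · R(wj) + (1 − λ) / |V|, where O(wj) is the sum of the weights of wj's outgoing links. The sketch below iterates this update; the default λ = 0.85 and the fixed iteration count are assumptions, not values from the slides.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Weighted PageRank on {source: {target: weight}}.

    `damping` plays the role of λ. Vertices without outgoing edges
    simply pass on no score in this sketch (no special handling).
    """
    vertices = set(graph)
    for neighbours in graph.values():
        vertices.update(neighbours)
    n = len(vertices)
    # O(w): total weight of the outgoing links of w.
    out_degree = {v: sum(graph.get(v, {}).values()) for v in vertices}

    scores = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Random jump: every vertex gets the same share (1 - λ) / |V|.
        new_scores = {v: (1.0 - damping) / n for v in vertices}
        for source, neighbours in graph.items():
            for target, weight in neighbours.items():
                share = weight / out_degree[source]
                new_scores[target] += damping * share * scores[source]
        scores = new_scores
    return scores


toy_graph = {"a": {"b": 1}, "b": {"a": 1}, "c": {"a": 1}}
scores = pagerank(toy_graph)
```

In the toy graph, "c" has no incoming links, so its score comes only from the random jump, while "a" receives score from both "b" and "c" and ranks highest.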
Methodology - Topical PageRank
Preference values from LDA:
• pr(w|z) = pr(w, z) / pr(z): focuses on word
• pr(z|w) = pr(w, z) / pr(w): focuses on topic
(Cohn and Chang, 2000).
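A sketch of a topic-specific walk, assuming it differs from plain PageRank only in the random jump: instead of jumping uniformly, the walker jumps to word wi with a topic-dependent preference p_z(wi), giving R_z(wi) = λ · Σ_{j: wj→wi} [e(wj, wi) / O(wj)] · R_z(wj) + (1 − λ) · p_z(wi). The function name, the normalization of the preference values to sum to 1, and the default λ are illustrative assumptions; the preference values would come from LDA, for example pr(w|z).

```python
def topical_pagerank(graph, preference, damping=0.85, iterations=100):
    """PageRank biased towards one topic.

    graph:      {source: {target: weight}}
    preference: {word: non-negative preference for this topic},
                e.g. pr(w|z) from LDA (hypothetical input here).
    """
    vertices = set(graph)
    for neighbours in graph.values():
        vertices.update(neighbours)
    total = sum(preference.get(v, 0.0) for v in vertices)
    if total <= 0:
        raise ValueError("preference values must have a positive sum")
    # Normalize so that the jump probabilities sum to 1 over the graph.
    jump = {v: preference.get(v, 0.0) / total for v in vertices}
    out_degree = {v: sum(graph.get(v, {}).values()) for v in vertices}

    scores = dict(jump)
    for _ in range(iterations):
        # Topic-biased random jump replaces the uniform 1 / |V| term.
        new_scores = {v: (1.0 - damping) * jump[v] for v in vertices}
        for source, neighbours in graph.items():
            for target, weight in neighbours.items():
                share = weight / out_degree[source]
                new_scores[target] += damping * share * scores[source]
        scores = new_scores
    return scores


toy_graph = {"a": {"b": 1}, "b": {"a": 1}}
biased = topical_pagerank(toy_graph, {"a": 0.9, "b": 0.1})
uniform = topical_pagerank(toy_graph, {"a": 0.5, "b": 0.5})
```

On this symmetric graph the uniform preference gives both words the same score, while the biased preference lifts "a" above "b". Running the function once per topic yields one ranking per topic.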
Methodology - Extract Keyphrases Using Ranking Scores
Step 1. Annotate the document with POS tags.
Step 2. Select noun phrases as candidate keyphrases.
Step 3. Compute the ranking scores of candidate keyphrases separately for each topic.
Step 4. Integrate the topic-specific rankings of candidate keyphrases into a final ranking.
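Steps 3 and 4 can be sketched as follows, assuming that a candidate's score under a topic is the sum of its words' topic-specific scores and that the final score weights each topic by Pr(z|d). Both aggregation rules and all names below are assumptions for illustration; the slide's own formulas did not survive extraction. Candidate selection (steps 1 and 2) is taken as given.

```python
def rank_phrases(candidates, topic_word_scores, doc_topic_probs):
    """Combine per-topic word scores into one ranking of candidates.

    candidates:        list of phrases, each a tuple of words
    topic_word_scores: {topic: {word: score under that topic}}
    doc_topic_probs:   {topic: Pr(z|d) for the document}
    Returns a list of (phrase, score), best first.
    """
    final = {}
    for phrase in candidates:
        total = 0.0
        for topic, prob in doc_topic_probs.items():
            word_scores = topic_word_scores[topic]
            # Step 3: phrase score under this topic (sum over its words).
            phrase_score = sum(word_scores.get(w, 0.0) for w in phrase)
            # Step 4: weight the topic by its share in the document.
            total += prob * phrase_score
        final[phrase] = total
    return sorted(final.items(), key=lambda item: item[1], reverse=True)


topic_word_scores = {
    0: {"topic": 0.5, "model": 0.3, "walk": 0.2},
    1: {"topic": 0.1, "model": 0.1, "walk": 0.8},
}
doc_topic_probs = {0: 0.75, 1: 0.25}
candidates = [("walk",), ("topic", "model")]
ranked = rank_phrases(candidates, topic_word_scores, doc_topic_probs)
```

Because the document is mostly about topic 0, the phrase ("topic", "model") ends up ahead of ("walk",), even though "walk" dominates topic 1.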
Experiment - Datasets
Datasets:

| Dataset | Articles | Keyphrases |
| --- | --- | --- |
| NEWS | 308 | 2488 |
| RESEARCH | 2000 | 19254 |

Topic model: build topic interpreters with LDA.

| Corpus | Web pages | Words | Topics |
| --- | --- | --- | --- |
| Wikipedia snapshot at March 2008 | 2122618 | 20000 | 50 to 1500 |
Experiment - Evaluation Metrics
However, precision, recall, and F-measure do not take the order of extracted keyphrases into account.
For the rank-aware metrics, larger values are better.
Their values lie between 0 and 1.
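A small sketch of why order matters. Precision, recall, and F-measure are computed on sets, so two extractions with the same phrases in a different order score identically. The reciprocal rank of the first correct phrase is shown only as one example of an order-aware score between 0 and 1; it is an illustrative choice, since the slide's own rank-aware formulas did not survive extraction.

```python
def precision_recall_f1(extracted, gold):
    """Set-based scores; assumes `extracted` has no duplicates."""
    correct = len(set(extracted) & set(gold))
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def reciprocal_rank(extracted, gold):
    """1 / rank of the first correct phrase, or 0 if none is correct."""
    gold_set = set(gold)
    for rank, phrase in enumerate(extracted, start=1):
        if phrase in gold_set:
            return 1.0 / rank
    return 0.0


gold = ["a", "b"]
good_order = ["a", "x", "b", "y"]
bad_order = ["x", "y", "a", "b"]

prf_good = precision_recall_f1(good_order, gold)
prf_bad = precision_recall_f1(bad_order, gold)
rr_good = reciprocal_rank(good_order, gold)
rr_bad = reciprocal_rank(bad_order, gold)
```

Both orderings have precision 0.5 and recall 1.0, but the first has reciprocal rank 1 and the second 1/3.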
Experiment - Influence of Parameters on TPR
Window Size W
The Number of Topics K
Experiment - Influence of Parameters on TPR
Damping Factor λ
Preference Values
• pr(w|z) = pr(w, z) / pr(z): focuses on word
• pr(z|w) = pr(w, z) / pr(w): focuses on topic
Ex.: he, she
Experiment - Comparison with Baseline Methods
• TFIDF and PageRank do not use topic information.
• TPR enjoys the advantages of both LDA and TFIDF/PageRank.
Experiment - Extraction Example
Conclusions
• Experiments on two datasets show that TPR achieves better performance than the baseline methods.
Comments
• Advantages
– TPR incorporates topic information within random walk for keyphrase extraction.
• Applications
– Automatic keyphrase extraction.