TopicRank - Graph-Based Topic Ranking for Keyphrase...

TopicRank: Graph-Based Topic Ranking for Keyphrase Extraction. Adrien Bougouin, Florian Boudin, Béatrice Daille. Université de Nantes, LINA, France. 16 October 2013

Transcript of TopicRank - Graph-Based Topic Ranking for Keyphrase...

Page 1: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase

TopicRank: Graph-Based Topic Ranking for Keyphrase Extraction

Adrien Bougouin, Florian Boudin, Béatrice Daille

Université de Nantes, LINA, France

16 October 2013

Page 2:

Introduction: Problem statement

Keyphrases

• Word or multi-word expressions
• Overview of a document’s content

Applications

• Document indexing
• Document clustering
• Text summarization
• Query expansion
• Targeted advertising
• etc.

Lack of annotated documents: Many documents have no associated keyphrases.

Pages 5-7:

Introduction: Automatic keyphrase extraction

document → Linguistic Preprocessing → Candidate Extraction → Candidate Classification (supervised) or Ranking (unsupervised) → Keyphrase Selection → keyphrases

Page 8:

Introduction: Example

Project Euclid and the role of research libraries in scholarly publishing

Project Euclid, a joint electronic journal publishing initiative of Cornell University Library and Duke University Press is discussed in the broader contexts of the changing patterns of scholarly communication and the publishing scene of mathematics. Specific aspects of the project such as partnerships and the creation of an economic model are presented as well as what it takes to be a publisher. Libraries have gained important and relevant experience through the creation and management of digital libraries, but they need to develop further skills if they want to adopt a new role in the life cycle of scholarly communication.

Pages 9-15:

Related Work: Unsupervised methods

Mostly ranking techniques using:

• language models (Tomokiyo and Hurst, 2003)
• clusters (Liu et al., 2009)
• or graphs of word co-occurrences (Mihalcea and Tarau, 2004, TextRank)
  ◮ weighted with co-occurrence number or semantic measure (Wan and Xiao, 2008; Tsatsaronis et al., 2010)
  ◮ refined with similar documents (Wan and Xiao, 2008)
  ◮ biased with topic probabilities (Liu et al., 2010)

Pages 16-18:

Related Work: Graph-based approach: TextRank

Input: the abstract "Project Euclid and the role of research libraries in scholarly publishing".

[Figure: word co-occurrence graph of the abstract; each node is a word labeled with its score]

Word            Score
university      2.378
duke            0.655
library         0.655
press           0.655
cornell         0.655
further         1.000
skills          1.000
scholarly       1.140
communication   0.634
publishing      2.121
scene           0.601
initiative      0.601
journal         1.095
electronic      1.163
joint           0.644
new             1.000
role            1.000
digital         0.770
libraries       1.459
research        0.770
relevant        1.000
experience      1.000
model           1.000
economic        1.000
specific        1.000
aspects         1.000
life            1.000
cycle           1.000
euclid          0.770
project         1.459
such            0.770

Generated keyphrases

• electronic journal publishing
• scholarly publishing
• libraries
• university
• project
• economic
• relevant
• role

PageRank’s “voting” concept: High-scoring words contribute more to the score of their connected words.
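
The "voting" idea can be illustrated with a minimal sketch. It assumes an unweighted, undirected word graph, a damping factor of 0.85, and a co-occurrence window counted in consecutive words; the function names `cooccurrence_graph` and `textrank_scores` are made up for this illustration and are not taken from the slides.

```python
from collections import defaultdict


def cooccurrence_graph(words, window=2):
    """Link every pair of distinct words that fall inside the same sliding window."""
    graph = defaultdict(set)
    for i, word in enumerate(words):
        # With window=2, only the immediately following word is linked.
        for other in words[i + 1:i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def textrank_scores(graph, damping=0.85, max_iter=200, tol=1e-9):
    """Iterate the unweighted PageRank update until the scores stop changing."""
    if not graph:
        return {}
    scores = {word: 1.0 for word in graph}
    for _ in range(max_iter):
        updated = {
            # Each neighbour splits its own score evenly among its links.
            word: (1 - damping)
            + damping * sum(scores[n] / len(graph[n]) for n in graph[word])
            for word in graph
        }
        delta = max(abs(updated[w] - scores[w]) for w in graph)
        scores = updated
        if delta < tol:
            break
    return scores


# Toy input: "project" is linked to both "euclid" and "such".
graph = cooccurrence_graph(["euclid", "project", "such"])
scores = textrank_scores(graph)
print({word: round(score, 3) for word, score in scores.items()})
```

Under these assumptions the three-word component converges to about 1.459 for "project" and 0.770 for "euclid" and "such", the values shown for those nodes on the slide: the better-connected word collects more votes.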

Page 22:

Related Work: Graph-based approach: TextRank

Limitations

• Word nodes
• Co-occurrence window
• Several nodes for one topic

Page 23:

This Work

Limitations of previous work

• Word nodes
• Co-occurrence window
• Several nodes for one topic

Proposal

1. Topic nodes
2. Complete graph construction

Page 26:

Plan

1. TopicRank
2. Evaluation
3. Conclusion and Future Work

Pages 28-30:

TopicRank

1. Candidate extraction ⇒ (NOUN|ADJ)+ (no linguistic knowledge)
2. Candidate clustering
3. Graph construction
4. Topic ranking
5. Keyphrase selection

Example document: the abstract "Project Euclid and the role of research libraries in scholarly publishing".
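
A minimal sketch of the (NOUN|ADJ)+ pattern follows, assuming the text has already been tokenized and part-of-speech tagged. The function name `extract_candidates` and the hand-assigned tags in the example are illustrative assumptions, not part of the slides.

```python
def extract_candidates(tagged_tokens, keep=("NOUN", "ADJ")):
    """Return (phrase, start_offset) for each longest run of NOUN/ADJ tokens."""
    candidates, current, start = [], [], 0
    for offset, (word, tag) in enumerate(tagged_tokens):
        if tag in keep:
            if not current:
                start = offset  # remember where this run begins
            current.append(word.lower())
        elif current:
            candidates.append((" ".join(current), start))
            current = []
    if current:  # flush a run that reaches the end of the text
        candidates.append((" ".join(current), start))
    return candidates


# Title of the example abstract, with hand-assigned (hypothetical) tags.
tagged_title = [
    ("Project", "NOUN"), ("Euclid", "NOUN"), ("and", "CONJ"), ("the", "DET"),
    ("role", "NOUN"), ("of", "ADP"), ("research", "NOUN"), ("libraries", "NOUN"),
    ("in", "ADP"), ("scholarly", "ADJ"), ("publishing", "NOUN"),
]
print(extract_candidates(tagged_title))
```

Keeping the start offset alongside each phrase is a design choice made here because the later steps (edge weighting and "first appearing" selection) depend on where candidates occur.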

Pages 31-32:

TopicRank

1. Candidate extraction
2. Candidate clustering ⇒ Hierarchical clustering (naive topic similarity)
3. Graph construction
4. Topic ranking
5. Keyphrase selection

ID   Topic
C01  cornell university library; digital libraries; research libraries; libraries
C02  project euclid; project such
C03  publishing scene; scholarly publishing; publisher
C04  role; new role  ← stem overlap ≥ 1/4
C05  important
C06  scholarly communication
C07  further skills
C08  partnerships
C09  mathematics
C10  joint electronic journal publishing initiative
C11  contexts
C12  specific aspects
C13  economic model
C14  duke university press
C15  relevant experience
C16  creation
C17  life cycle
C18  patterns
C19  management
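
The clustering step can be sketched as below. Several choices are assumptions made for illustration only: the overlap is measured as Jaccard similarity over stem sets, clusters are merged with average linkage, and `toy_stem` (a five-character prefix) stands in for a real stemmer such as Porter's.

```python
def toy_stem(word):
    """Crude stand-in for a real stemmer: keep the first five characters."""
    return word[:5]


def jaccard(a, b):
    """Share of stems that two candidates have in common."""
    return len(a & b) / len(a | b)


def cluster_candidates(candidates, threshold=0.25):
    """Greedy average-linkage agglomerative clustering of candidate phrases."""
    stems = {c: frozenset(toy_stem(w) for w in c.split()) for c in candidates}
    clusters = [[c] for c in candidates]
    while len(clusters) > 1:
        best, best_pair = 0.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [jaccard(stems[a], stems[b])
                        for a in clusters[i] for b in clusters[j]]
                average = sum(sims) / len(sims)
                if average > best:
                    best, best_pair = average, (i, j)
        # Stop once no pair of clusters overlaps by at least the threshold.
        if best_pair is None or best < threshold:
            break
        i, j = best_pair
        clusters[i].extend(clusters.pop(j))
    return clusters


topics = cluster_candidates(
    ["role", "new role", "mathematics", "project euclid", "project such"]
)
print(topics)
```

On this toy input, "role" and "new role" share one of two stems and end up in the same topic, in line with the C04 row of the table; "mathematics" shares nothing and stays alone.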

Page 33:

TopicRank

1. Candidate extraction
2. Candidate clustering
3. Graph construction ⇒ Complete graph
4. Topic ranking
5. Keyphrase selection

[Figure: complete graph whose nodes are the topics C01 to C19; edges use offset position weighting]
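
The slide only names "offset position weighting". One way to realize it, sketched here as an assumption, is to sum the reciprocal distances between every pair of occurrence offsets of two topics' candidates, so that topics whose candidates appear close together get heavier edges. The names `edge_weight` and `build_complete_graph` and the offsets in the example are hypothetical.

```python
from itertools import combinations


def edge_weight(topic_a, topic_b, positions):
    """Sum 1/|pa - pb| over all occurrence offsets of the two topics' candidates."""
    return sum(
        1.0 / abs(pa - pb)
        for cand_a in topic_a
        for cand_b in topic_b
        for pa in positions[cand_a]
        for pb in positions[cand_b]
        if pa != pb  # guard against division by zero
    )


def build_complete_graph(topics, positions):
    """Connect every pair of topics; return symmetric weights keyed by (id, id)."""
    weights = {}
    for (id_a, topic_a), (id_b, topic_b) in combinations(topics.items(), 2):
        weight = edge_weight(topic_a, topic_b, positions)
        weights[(id_a, id_b)] = weight
        weights[(id_b, id_a)] = weight
    return weights


# Hypothetical offsets: "role" occurs at words 4 and 100, "mathematics" at 40.
topics = {"C04": ["role"], "C09": ["mathematics"]}
positions = {"role": [4, 100], "mathematics": [40]}
print(build_complete_graph(topics, positions))
```

Because every pair of topics gets an edge, no co-occurrence window has to be tuned, which addresses the window limitation listed for TextRank.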

Page 34:

TopicRank

1. Candidate extraction
2. Candidate clustering
3. Graph construction
4. Topic ranking ⇒ PageRank’s scoring
5. Keyphrase selection

$$\mathrm{score}(C_i) = (1 - \lambda) + \lambda \times \sum_{C_j \neq C_i} \frac{\mathrm{weight}(C_j, C_i) \times \mathrm{score}(C_j)}{\sum_{C_k \neq C_j} \mathrm{weight}(C_j, C_k)}$$

[Figure: topic graph with the score of each node]

ID   Score
C01  2.673
C02  2.237
C03  2.285
C04  1.451
C05  0.612
C06  1.017
C07  0.405
C08  0.717
C09  0.600
C10  0.749
C11  0.600
C12  0.750
C13  0.575
C14  0.669
C15  0.615
C16  1.112
C17  0.455
C18  0.697
C19  0.600
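
The scoring formula can be turned into a short iterative sketch. The damping value λ = 0.85, the convergence tolerance, and the function name `rank_topics` are assumptions for illustration; the slide does not state them.

```python
def rank_topics(topics, weights, damping=0.85, max_iter=200, tol=1e-9):
    """Iterate score(Ci) = (1 - d) + d * sum_j weight(Cj, Ci) * score(Cj) / out(Cj)."""
    # out(Cj): total weight of the edges leaving Cj (denominator of the formula).
    out_weight = {
        j: sum(weights.get((j, k), 0.0) for k in topics if k != j) for j in topics
    }
    scores = {topic: 1.0 for topic in topics}
    for _ in range(max_iter):
        updated = {}
        for i in topics:
            votes = sum(
                weights.get((j, i), 0.0) * scores[j] / out_weight[j]
                for j in topics
                if j != i and out_weight[j] > 0
            )
            updated[i] = (1 - damping) + damping * votes
        delta = max(abs(updated[t] - scores[t]) for t in topics)
        scores = updated
        if delta < tol:
            break
    return scores


# Hypothetical three-topic graph in which A is strongly tied to both B and C.
topic_ids = ["A", "B", "C"]
edge_weights = {
    ("A", "B"): 2.0, ("B", "A"): 2.0,
    ("A", "C"): 2.0, ("C", "A"): 2.0,
}
ranking = sorted(rank_topics(topic_ids, edge_weights).items(),
                 key=lambda item: item[1], reverse=True)
print(ranking)
```

Sorting the returned scores in decreasing order gives the topic ranking used by the selection step.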

Pages 35-44:

TopicRank

1. Candidate extraction
2. Candidate clustering
3. Graph construction
4. Topic ranking
5. Keyphrase selection ⇒ First appearing one

Rank  ID   Topic
01    C01  cornell university library; digital libraries; research libraries; libraries
02    C03  publishing scene; scholarly publishing; publisher
03    C02  project euclid; project such
04    C04  role; new role
05    C16  creation
06    C06  scholarly communication
07    C09  mathematics
08    C12  specific aspects
09    C10  joint electronic journal publishing initiative
10    C08  partnerships
...   ...  ...

Page 45: TopicRank - Graph-Based Topic Ranking for Keyphrase Extractionadrien-bougouin.github.io/publications/2013/topicrank_ijcnlp... · TopicRank Graph-Based Topic Ranking for Keyphrase

TopicRank

1 Candidate extraction

2 Candidate clustering

3 Graph construction

4 Topic ranking

5 Keyphrase selection

⇒ First appearing one

Keyphrase

research libraries
scholarly publishing
project euclid
role
creation
scholarly communication
mathematics
specific aspects
joint electronic journal publishing initiative
partnerships
...
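
The selection step can be sketched in a few lines of Python. This is only an illustration: the `select_first_appearing` name, the `(candidate, offset)` representation and the offsets themselves are assumptions made up for the example, chosen so that the result matches the first three keyphrases listed above.

```python
from typing import List, Tuple

# (candidate, offset of its first occurrence in the document).
# The offsets below are invented for illustration only.
Topic = List[Tuple[str, int]]


def select_first_appearing(ranked_topics: List[Topic]) -> List[str]:
    """For each topic, in rank order, keep the candidate that occurs earliest."""
    return [min(topic, key=lambda pair: pair[1])[0] for topic in ranked_topics]


ranked_topics = [
    [("cornell university library", 40), ("digital libraries", 25),
     ("research libraries", 3), ("libraries", 60)],
    [("publishing scene", 30), ("scholarly publishing", 12), ("publisher", 50)],
    [("project euclid", 18), ("project such", 70)],
]
keyphrases = select_first_appearing(ranked_topics)
print(keyphrases)
```

Because exactly one candidate is kept per topic, the output cannot contain two keyphrases from the same topic.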


Plan

1 TopicRank

2 Evaluation

3 Conclusion and Future Work


Evaluation: Datasets

Two English datasets:
• Inspec contains 500 abstracts of journal papers
  ◮ 136.3 tokens/document
• SemEval (2010) contains 100 scientific papers
  ◮ 5179.6 tokens/document

Two French datasets:
• WikiNews contains 100 news articles
  ◮ 309.6 tokens/document
• DEFT (2012) contains 93 scientific papers
  ◮ 6844.0 tokens/document


Evaluation: Baselines

• TF-IDF weighting
• TextRank
  ◮ Word co-occurrence graph with a window of 2
  ◮ Keyphrase generation based on keywords (10 best)
• SingleRank
  ◮ Word co-occurrence graph with a window of 10
  ◮ Candidate keyphrases scored by the sum of their words' scores
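
The two graph-based baselines differ mainly in the co-occurrence window and in how word scores become keyphrases. The sketch below illustrates both ideas under stated assumptions: the function names are hypothetical, the input is assumed to be an already filtered list of words, and the word scores passed to `score_candidate` are invented rather than produced by an actual ranking.

```python
from collections import defaultdict
from typing import Dict, List


def cooccurrence_graph(words: List[str], window: int) -> Dict[str, Dict[str, int]]:
    """Undirected graph whose edge weights count how often two distinct
    words occur within `window` consecutive tokens of each other."""
    graph: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for i, left in enumerate(words):
        # A window of 2 links only adjacent words; a window of 10 reaches 9 words ahead.
        for right in words[i + 1:i + window]:
            if left != right:
                graph[left][right] += 1
                graph[right][left] += 1
    return graph


def score_candidate(candidate: str, word_scores: Dict[str, float]) -> float:
    """SingleRank-style candidate score: the sum of the scores of its words."""
    return sum(word_scores.get(word, 0.0) for word in candidate.split())


words = ["scholarly", "publishing", "project", "euclid"]
narrow = cooccurrence_graph(words, window=2)   # TextRank-like setting
wide = cooccurrence_graph(words, window=10)    # SingleRank-like setting

# Hypothetical word scores, standing in for the output of the word ranking.
example_score = score_candidate("project euclid", {"project": 0.4, "euclid": 0.6})
```

Summing word scores tends to favour long candidates, since every extra word can only add to the total.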


Evaluation: Measures

• Cut-off at 10 keyphrases

• F-score ⇒ compromise between precision and recall

\[
\text{f-score} = \frac{(1 + \beta^2) \times \text{precision} \times \text{recall}}{(\beta^2 \times \text{precision}) + \text{recall}}, \qquad \beta = 1
\]

• Problem of dealing with gold standard

⇒ Stemmed form comparisons
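
The measure above can be sketched as a small Python function. It is a minimal illustration, not the evaluation script used for the reported results: the `f_score` name and its parameters are assumptions, and the `normalize` argument is only a placeholder for the stemming step (lower-casing by default, since no particular stemmer is named here).

```python
from typing import Callable, Iterable, List


def f_score(extracted: List[str], gold: Iterable[str], cutoff: int = 10,
            beta: float = 1.0,
            normalize: Callable[[str], str] = str.lower) -> float:
    """F-score of the top `cutoff` extracted keyphrases against the gold set.

    `normalize` stands in for stemming; lower-casing is a placeholder only.
    """
    top = {normalize(k) for k in extracted[:cutoff]}
    reference = {normalize(k) for k in gold}
    matches = len(top & reference)
    if matches == 0:
        return 0.0
    precision = matches / len(top)
    recall = matches / len(reference)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


extracted = ["Project Euclid", "role", "mathematics", "creation"]
gold = ["project euclid", "mathematics", "digital libraries"]
score = f_score(extracted, gold)  # precision 2/4, recall 2/3
```

With β = 1 the formula reduces to the harmonic mean of precision and recall.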


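Evaluation: Main results

F-score by dataset:

Method      Inspec  SemEval  WikiNews  DEFT
TF-IDF        33.4     10.5      34.3  13.2
TextRank      12.7      5.6       8.6   5.7
SingleRank    35.2      3.7      19.7   5.9
TopicRank     27.9     12.1      35.6  15.1

• Improvement over TF-IDF on SemEval, WikiNews and DEFT

• Significant improvement over the graph-based methods (TextRank and SingleRank)

• Performance loss on Inspec (27.9, against 33.4 for TF-IDF and 35.2 for SingleRank)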


Evaluation: Individual contributions

Method      Inspec  SemEval  WikiNews  DEFT
SingleRank    35.2      3.7      19.7   5.9
+phrases      22.1      8.0      28.9  13.5
+topics       26.8     11.9      31.4  14.8
+complete     35.5      4.4      20.3   5.8
TopicRank     27.9     12.1      35.6  15.1

• Nodes: topics > candidates > words
• Complete graph ≥ co-occurrence graph
• Each contribution improves performance
• The above statements do not hold on Inspec


Evaluation: Keyphrase selection

Keyphrase selection  Inspec  SemEval  WikiNews  DEFT
First position         27.9     12.1      35.6  15.1
Frequency              26.8      1.4      26.2   2.5
Centroid               24.7      1.5      28.5   3.4
Upper bound            35.6     30.3      42.9  19.3

• Still room for improvement



Conclusion and Future Work

What we have done:

• Proposed TopicRank

• Topic ranking instead of word ranking

• Complete graph

• Experiments conducted on four standard datasets

• Good results

• Promising upper bound results

Still to do:

• Experiment with various topic identification methods

• Provide a keyphrase selection strategy that gets closer to the upper bound


Thank you


Backups: Candidate Extraction

• Focusing on nouns and adjectives is “enough” for English

• Prepositions and determiners should also be considered for French

Statistic                  Corpus
                           SemEval   DEFT
Containing nouns             95.9%  79.3%
Containing proper nouns       5.8%  16.8%
Containing adjectives        40.5%  28.8%
Containing verbs              3.4%   0.5%
Containing adverbs            0.6%   0.5%
Containing prepositions       1.2%  12.7%
Containing determiners        0.0%   8.1%
Containing others             2.1%   5.8%
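
One way to act on these statistics for English is to keep only runs of nouns and adjectives. The sketch below assumes that candidates are the longest uninterrupted runs of such tokens and that the text has already been part-of-speech tagged; the `extract_candidates` name and the coarse `NOUN`/`ADJ` labels are hypothetical, since real taggers use their own tag sets.

```python
from typing import List, Tuple

# Hypothetical coarse tag set; adapt to the labels of the tagger in use.
KEPT_TAGS = {"NOUN", "ADJ"}


def extract_candidates(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return the longest runs of consecutive noun/adjective tokens."""
    candidates: List[str] = []
    current: List[str] = []
    for word, tag in tagged:
        if tag in KEPT_TAGS:
            current.append(word)
        elif current:
            # Any other tag closes the current run.
            candidates.append(" ".join(current))
            current = []
    if current:
        candidates.append(" ".join(current))
    return candidates


tagged = [("the", "DET"), ("new", "ADJ"), ("role", "NOUN"), ("of", "ADP"),
          ("research", "NOUN"), ("libraries", "NOUN")]
candidates = extract_candidates(tagged)
```

For French, the table suggests adding prepositions and determiners to `KEPT_TAGS`, although a run should then probably not start or end with one of them.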


Backups: Candidate Clustering

The hierarchical clustering is an iterative algorithm:

• Initial state: each candidate keyphrase is its own cluster

• The clusters with the highest similarity are merged together

• The similarity between two clusters is the average similarity between their candidates c_i:

\[
\text{similarity}(c_1, c_2) = \frac{\lVert \text{stem}(c_1) \cap \text{stem}(c_2) \rVert}{\lVert \text{stem}(c_1) \cup \text{stem}(c_2) \rVert}
\]

• A similarity threshold is set to 0.25
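
The clustering steps above can be sketched as follows. This is a minimal illustration under stated assumptions: candidates are given as non-empty sets of already stemmed words (the stems in the example are made up), the function names are hypothetical, and merging is assumed to stop as soon as the best similarity falls below the threshold.

```python
from itertools import combinations
from typing import FrozenSet, List

Candidate = FrozenSet[str]  # non-empty set of (already stemmed) words


def jaccard(c1: Candidate, c2: Candidate) -> float:
    """Share of stems that two candidates have in common."""
    return len(c1 & c2) / len(c1 | c2)


def cluster_similarity(a: List[Candidate], b: List[Candidate]) -> float:
    """Average similarity between the candidates of two clusters."""
    return sum(jaccard(x, y) for x in a for y in b) / (len(a) * len(b))


def cluster_candidates(candidates: List[Candidate],
                       threshold: float = 0.25) -> List[List[Candidate]]:
    clusters = [[c] for c in candidates]  # initial state: one cluster each
    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), cluster_similarity(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:  # assumed stopping rule
            break
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


topics = cluster_candidates([
    frozenset({"research", "librari"}),
    frozenset({"digit", "librari"}),
    frozenset({"librari"}),
    frozenset({"mathemat"}),
])
```

Recomputing every pairwise similarity at each step is quadratic per merge, which is acceptable for a sketch but would be worth caching on long documents.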


Backups: Graph Construction

• Nodes are topics

• Every node is connected to every other node

• Connections between topics are weighted by the semantic strength between them

• Topics appearing close to each other have a high semantic strength:

\[
\text{weight}(t_i, t_j) = \sum_{c_i \in t_i} \sum_{c_j \in t_j} \text{dist}(c_i, c_j)
\]

\[
\text{dist}(c_i, c_j) = \sum_{p_i \in \text{pos}(c_i)} \sum_{p_j \in \text{pos}(c_j)} \frac{1}{|p_i - p_j|}
\]
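
The two formulas translate almost directly into code. The sketch below assumes that each candidate comes with the list of offsets at which it occurs and that two different candidates never share an offset (otherwise the reciprocal is undefined); the names and the offsets in the example are hypothetical.

```python
from typing import Dict, List

Positions = Dict[str, List[int]]  # candidate -> offsets of its occurrences


def dist(pos_i: List[int], pos_j: List[int]) -> float:
    """Sum of reciprocal distances over all pairs of occurrences."""
    return sum(1.0 / abs(pi - pj) for pi in pos_i for pj in pos_j)


def weight(topic_i: List[str], topic_j: List[str], positions: Positions) -> float:
    """Edge weight between two topics: dist summed over all candidate pairs."""
    return sum(dist(positions[ci], positions[cj])
               for ci in topic_i for cj in topic_j)


# Made-up offsets for illustration.
positions = {"research libraries": [3, 20], "libraries": [60], "project euclid": [18]}
w = weight(["research libraries", "libraries"], ["project euclid"], positions)
```

Occurrences that are close together dominate the sum, so topics that are repeatedly mentioned near each other end up with the strongest edges.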


Backups: Graph Construction

                    Inspec  SemEval  WikiNews   DEFT
clusters/documents    20.9    272.4      52.4  546.5


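Backups: Topic Ranking

PageRank’s “voting” concept

High-scoring topics contribute more to the score of their connected topics.

\[
\text{score}(C_i) = (1 - \lambda) + \lambda \times \sum_{C_j \neq C_i} \frac{\text{weight}(C_i, C_j) \times \text{score}(C_j)}{\sum_{C_k \neq C_j} \text{weight}(C_j, C_k)}, \qquad \lambda = 0.85
\]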
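
The score can be computed by iterating the formula until it stabilises. The sketch below is one possible implementation, not the authors' code: the `rank_topics` name, the iteration cap and the tolerance are assumptions, the graph is taken to be a symmetric dictionary of edge weights, and every topic is assumed to have at least one edge with positive weight. The weights in the example are invented.

```python
from typing import Dict

Graph = Dict[str, Dict[str, float]]  # symmetric: graph[a][b] == graph[b][a]


def rank_topics(graph: Graph, damping: float = 0.85, iterations: int = 100,
                tol: float = 1e-6) -> Dict[str, float]:
    """Iterate score(Ci) = (1 - d) + d * sum_j w(Ci, Cj) * score(Cj) / sum_k w(Cj, Ck)."""
    scores = {node: 1.0 for node in graph}
    # Denominator of the formula: total edge weight attached to each topic.
    total_weight = {node: sum(edges.values()) for node, edges in graph.items()}
    for _ in range(iterations):
        new_scores = {
            i: (1 - damping) + damping * sum(
                w * scores[j] / total_weight[j] for j, w in graph[i].items()
            )
            for i in graph
        }
        converged = max(abs(new_scores[n] - scores[n]) for n in graph) < tol
        scores = new_scores
        if converged:
            break
    return scores


# Made-up edge weights for three topics.
graph = {
    "C01": {"C02": 2.0, "C03": 1.0},
    "C02": {"C01": 2.0, "C03": 0.5},
    "C03": {"C01": 1.0, "C02": 0.5},
}
scores = rank_topics(graph)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph the topic with the heaviest connections, `C01`, ends up ranked first.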


Backups: Main Results

Method       Inspec            SemEval           WikiNews          DEFT
                P     R     F     P     R     F     P     R     F     P     R     F
TF-IDF       32.7  38.6  33.4  13.2   8.9  10.5  33.9  35.9  34.3  10.3  19.1  13.2
TextRank     14.2  12.5  12.7   7.9   4.5   5.6   9.3   8.3   8.6   4.9   7.1   5.7
SingleRank   34.8  40.4  35.2   4.6   3.2   3.7  19.4  20.7  19.7   4.5   9.0   5.9
TopicRank    27.6  31.5  27.9  14.9  10.3  12.1  35.0  37.5  35.6  11.7  21.7  15.1


Backups: Contributions Evaluation

Method       Inspec            SemEval           WikiNews          DEFT
                P     R     F     P     R     F     P     R     F     P     R     F
SingleRank   34.8  40.4  35.2   4.6   3.2   3.7  19.4  20.7  19.7   4.5   9.0   5.9
+phrases     21.5  25.9  22.1   9.6   7.0   8.0  28.6  30.1  28.9  10.5  19.7  13.5
+topics      26.6  30.2  26.8  14.7  10.2  11.9  31.0  32.8  31.4  11.5  21.4  14.8
+complete    34.9  41.0  35.5   5.5   3.8   4.4  20.0  21.4  20.3   4.4   9.0   5.8
TopicRank    27.6  31.5  27.9  14.9  10.3  12.1  35.0  37.5  35.6  11.7  21.7  15.1


Backups: Keyphrase Selection Evaluation

Keyphrase selection   Inspec            SemEval           WikiNews          DEFT
                         P     R     F     P     R     F     P     R     F     P     R     F
First position        27.6  31.5  27.9  14.9  10.3  12.1  35.0  37.5  35.6  11.7  21.7  15.1
Frequency             26.7  30.2  26.8   1.7   1.2   1.4  25.7  27.6  26.2   1.9   3.8   2.5
Centroid              24.5  28.0  24.7   1.9   1.2   1.5  28.1  29.9  28.5   2.6   5.0   3.4
Upper bound           36.4  39.0  35.6  37.6  25.8  30.3  42.5  44.8  42.9  14.9  28.0  19.3

