TextRank: Bringing Order into Texts
TextRank: Bringing Order into Texts
Rada Mihalcea and Paul Tarau
Presented by:
Sharath T.S
Shubhangi Tandon
The TextRank Algorithm
1. Identify text units that best define the task at hand, and add them as vertices in the graph.
2. Identify relations that connect such text units, and use these relations to draw edges between vertices in the graph. Edges can be directed or undirected, weighted or unweighted.
3. Iterate the graph-based ranking algorithm until convergence.
4. Sort vertices based on their final score. Use the values attached to each vertex for ranking/selection decisions.
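The four steps above can be sketched in Python. This is a minimal illustration on an undirected, unweighted graph using a plain dictionary adjacency representation; the function name and default parameters are illustrative, not from the paper.

```python
# Minimal sketch of the generic graph-based ranking loop (step 3 above).
def textrank(neighbors, d=0.85, tol=1e-6, max_iter=100):
    """neighbors: dict mapping each vertex to its set of adjacent vertices
    (undirected, unweighted). Returns a final score per vertex."""
    scores = {v: 1.0 for v in neighbors}
    for _ in range(max_iter):
        prev = dict(scores)
        for v in neighbors:
            # Each neighbor contributes its score divided by its own degree.
            rank = sum(prev[u] / len(neighbors[u]) for u in neighbors[v])
            scores[v] = (1 - d) + d * rank
        # Converged when no score moves by more than the tolerance (step 3);
        # the caller then sorts vertices by score (step 4).
        if max(abs(scores[v] - prev[v]) for v in scores) < tol:
            break
    return scores
```

Sorting `scores.items()` by value then completes step 4.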
The TextRank Model
■ G = (V, E)
■ V = set of vertices, E = set of edges
■ V(in) = set of incoming edges
■ V(out) = set of outgoing edges
■ d = damping factor
■ In addition, W = set of edge weights
■ Note: for undirected graphs, V(in) = V(out)
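Putting these definitions together, the weighted vertex score from the Mihalcea & Tarau paper (with In(V_i) and Out(V_i) corresponding to V(in) and V(out) above) is:

```latex
WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)}
          \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, WS(V_j)
```

Setting all weights to 1 recovers the original unweighted PageRank-style recursion.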
Convergence
Convergence curves for four different kinds of graphs: directed vs. undirected, weighted vs. unweighted.
Keyword Extraction
How is the graph built?
● Each word (lexical unit) is a node.
● Edges encode a co-occurrence relation: two vertices are connected if their corresponding lexical units co-occur within a window of at most N words, where N can be set anywhere from 2 to 10.
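The graph construction can be sketched as follows, assuming the text is already tokenized and filtered; the helper name is illustrative, and the window semantics (link words at most `window` tokens apart) follow the 2-to-10 range above.

```python
def cooccurrence_edges(words, window=2):
    """Build the undirected edge set: two word vertices are linked if the
    words co-occur within a window of at most `window` tokens."""
    edges = set()
    for i in range(len(words)):
        for j in range(i + 1, min(i + window, len(words))):
            if words[i] != words[j]:
                # frozenset gives an undirected, order-free edge
                edges.add(frozenset((words[i], words[j])))
    return edges
```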
Example
Results for Keyword Extraction
Sentence Extraction
● Goal is to rank entire sentences, so each vertex is a sentence.
● Co-occurrence cannot be used. Why?
● We need a new relation for our edges: similarity.
● Measured as content overlap between two sentences (nodes).
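The content-overlap similarity can be sketched as below. Whitespace tokenization is a simplification; the paper counts shared tokens and normalizes by the logarithm of each sentence's length.

```python
import math

def sentence_similarity(s1, s2):
    """Number of shared words, normalized by log sentence lengths so that
    long sentences are not unduly favored."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    if not w1 or not w2:
        return 0.0
    denom = math.log(len(w1)) + math.log(len(w2))
    if denom == 0:  # both sentences have a single word
        return 0.0
    return len(w1 & w2) / denom
```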
Evaluation
● Single-document summarisation
● Data: DUC (2002), 567 news articles
● Evaluation metric: ROUGE
● Compared against 15 systems, including the baseline provided by DUC
Results
● Highly dense graph
● Output compared to human summaries
Comparison: TextRank and Opinosis
● Both are unsupervised graph-based algorithms.
● Both try to identify the most-traversed regions of a graph (nodes/paths), i.e. the topics or content the text describes most.
● TextRank uses node importance (of words and sentences) for keyword extraction and summarization, whereas Opinosis uses path weights across word nodes to generate fine-grained summaries.
Observations
1. Common pattern: usage of text-unit co-occurrence as a feature in unsupervised topic-modelling algorithms (LDA, BTM, TextRank).
2. Future work: http://web.fi.uba.ar/~fbarrios/tprofesional/articulo-en.pdf
3. Industry adoption: included as a module in gensim.