Lecture: Semantic Word Clouds
![Page 1: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/1.jpg)
Semantic Analysis in Language Technology http://stp.lingfil.uu.se/~santinim/sais/2014/sais_2014.htm
Semantic Word Clouds
Marina Santini
Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden
Autumn 2014
Lect 10: Semantic Word Clouds
![Page 2: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/2.jpg)
Acknowledgements
• Some slides borrowed from Sergey Pupyrev.
![Page 3: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/3.jpg)
Outline
• Word Clouds
• 3 early algorithms
• 3 new algorithms
• Metrics & Quantitative Evaluation
![Page 4: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/4.jpg)
Word Clouds
• Word clouds have become a standard tool for abstracting, visualizing and comparing texts…
• We could apply the same or similar techniques to the huge amounts of tags produced by users interacting in social networks.
![Page 5: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/5.jpg)
Comparison & Conceptualization Tool
• Word clouds as a tool for "conceptualizing" documents (cf. ontologies)
• Example (2008): comparison of speeches by Obama and McCain
![Page 6: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/6.jpg)
Word Clouds and Tag Clouds…
• … are often used to represent importance among terms (e.g., band popularity) or to serve as a navigation tool (e.g., Google search results).
![Page 7: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/7.jpg)
The Problem…
• How can we compute semantics-preserving word clouds, in which semantically related words are placed close to each other?
![Page 8: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/8.jpg)
Wordle http://www.wordle.net
• Practical tools such as Wordle make word cloud visualization easy.
• Shortcoming: they do not capture the relationships between words in any way.
![Page 9: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/9.jpg)
Many word clouds are arranged randomly (note also the scattered colours).
![Page 10: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/10.jpg)
Semantic Patterns
• Humans instinctively tend to pick up patterns.
• Instinctively, one would assume that two words placed close to each other in a word cloud are semantically related.
![Page 11: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/11.jpg)
So it makes sense to place related words close to each other (note also the color distribution).
![Page 12: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/12.jpg)
In linguistics and in LT…
• … if a pair of words often appears together in a sentence, then we can assume that the two words are semantically related.
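A minimal Python sketch of this co-occurrence idea, assuming whitespace tokenization and counting each word pair at most once per sentence (the function name `cooccurrence_counts` is hypothetical, not taken from the lecture):

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count in how many sentences each unordered pair of distinct words appears together."""
    counts = Counter()
    for sentence in sentences:
        # A set, so a word repeated within one sentence is counted once.
        words = sorted(set(sentence.lower().split()))
        for a, b in combinations(words, 2):
            counts[(a, b)] += 1  # pairs are stored in alphabetical order
    return counts


sentences = [
    "word clouds visualize text",
    "semantic word clouds place related words together",
    "clouds of words",
]
pairs = cooccurrence_counts(sentences)
print(pairs[("clouds", "word")])  # 2
```

Counts like these could serve as the edge weights of the word graph used by the layout algorithms in this lecture.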
![Page 13: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/13.jpg)
Semantic word clouds have higher user satisfaction compared to other layouts…
![Page 14: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/14.jpg)
All recent word cloud visualization tools aim to incorporate semantics in the layout…
![Page 15: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/15.jpg)
… but none of them provides any guarantee about the quality of the layout in terms of semantics.
![Page 16: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/16.jpg)
Early Algorithms: Force-Directed Graph
• Most of the existing algorithms are based on force-directed graph layout.
• Force-directed graph drawing algorithms are a class of algorithms for drawing graphs in an aesthetically pleasing way:
– Attractive forces between pairs of words reduce empty space
– Repulsive forces ensure that words do not overlap
– A final force preserves semantic relations between words.
Force-directed graph drawing algorithms assign forces among the set of edges and the set of nodes of a graph drawing. Typically, spring-like attractive forces based on Hooke's law are used to attract pairs of endpoints of the graph's edges towards each other, while simultaneously repulsive forces like those of electrically charged particles, based on Coulomb's law, are used to separate all pairs of nodes.
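To make the two forces concrete, here is a hedged sketch of a single spring-electrical update step on point nodes. The constants (`k_spring`, `rest_len`, `k_rep`, `step`) and the function name are illustrative assumptions; word-cloud algorithms work on word boxes rather than points and add overlap handling on top.

```python
import math


def force_directed_step(pos, edges, k_spring=0.1, rest_len=1.0, k_rep=0.5, step=0.1):
    """Return new positions after one step. pos: {node: (x, y)}; edges: list of (u, v)."""
    force = {v: [0.0, 0.0] for v in pos}
    nodes = list(pos)
    # Coulomb-like repulsion between every pair of nodes (falls off with 1/d^2).
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = k_rep / (d * d)
            force[u][0] += f * dx / d
            force[u][1] += f * dy / d
            force[v][0] -= f * dx / d
            force[v][1] -= f * dy / d
    # Hooke-like spring along each edge, pulling endpoints toward the rest length.
    for u, v in edges:
        dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
        d = math.hypot(dx, dy) or 1e-9
        f = k_spring * (d - rest_len)
        force[u][0] += f * dx / d
        force[u][1] += f * dy / d
        force[v][0] -= f * dx / d
        force[v][1] -= f * dy / d
    return {v: (pos[v][0] + step * force[v][0], pos[v][1] + step * force[v][1]) for v in pos}


pos = {"word": (0.0, 0.0), "cloud": (5.0, 0.0), "graph": (0.0, 5.0)}
edges = [("word", "cloud")]  # only "word" and "cloud" are related
for _ in range(200):
    pos = force_directed_step(pos, edges)
```

After the iterations, the related pair "word"/"cloud" has moved closer together, while the unrelated "graph" has only been pushed away.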
![Page 17: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/17.jpg)
Newer Algorithms: Rectangle Representation of Graphs
• Vertex-weighted and edge-weighted graph:
– The vertices of the graph are the words; their weights correspond to some measure of importance (e.g., word frequencies)
– The edges capture the semantic relatedness of pairs of words (e.g., co-occurrence); their weights correspond to the strength of the relation
• Each vertex can be drawn as a box (rectangle) whose dimensions are determined by its weight
• A realized adjacency is the sum of the edge weights for all pairs of touching boxes.
• The goal is to maximize the realized adjacencies.
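A sketch of the realized-adjacency measure, assuming each word box is given as `(x, y, w, h)` with `(x, y)` its lower-left corner, and that two boxes count as touching only when they share a boundary segment of positive length (corner contact does not count). The function names are hypothetical.

```python
def touching(a, b, eps=1e-9):
    """True if boxes a and b, given as (x, y, w, h), share a side segment of positive length."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Length of the overlap of the two projections on each axis (negative = gap).
    x_overlap = min(ax + aw, bx + bw) - max(ax, bx)
    y_overlap = min(ay + ah, by + bh) - max(ay, by)
    side_by_side = abs(x_overlap) <= eps and y_overlap > eps
    stacked = abs(y_overlap) <= eps and x_overlap > eps
    return side_by_side or stacked


def realized_adjacency(boxes, weights):
    """Sum of edge weights over word pairs whose boxes touch."""
    return sum(w for (u, v), w in weights.items() if touching(boxes[u], boxes[v]))


boxes = {
    "word": (0, 0, 4, 2),
    "cloud": (4, 0, 3, 2),     # touches "word" on its right side
    "semantic": (0, 2, 4, 1),  # sits on top of "word"
    "layout": (10, 10, 2, 1),  # far away from everything
}
weights = {("word", "cloud"): 3, ("word", "semantic"): 2,
           ("semantic", "cloud"): 1, ("word", "layout"): 5}
print(realized_adjacency(boxes, weights))  # 5
```

"semantic" and "cloud" only meet at a corner, so their edge weight is not realized; neither is the heavy "word"/"layout" edge, which is exactly what a good layout would try to fix.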
![Page 18: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/18.jpg)
Experimental Setup:
1) Term Extraction
2) Ranking
3) Similarity Computation
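One simple, hypothetical instantiation of the three steps: regex tokenization with a tiny stopword list for term extraction, raw frequency for ranking, and cosine similarity between per-sentence occurrence vectors for similarity. These are illustrative choices, not necessarily the ones used in the experiments; for instance, no stemming is applied, so "word" and "words" stay distinct.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for", "on"}


def extract_terms(sentence):
    """Step 1: lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", sentence.lower()) if t not in STOPWORDS]


def rank_terms(sentences, top_n):
    """Step 2: rank terms by raw frequency and keep the top_n."""
    freq = Counter(t for s in sentences for t in extract_terms(s))
    return [t for t, _ in freq.most_common(top_n)]


def cosine_similarity(sentences, u, v):
    """Step 3: cosine between the per-sentence occurrence vectors of terms u and v."""
    vec_u = [extract_terms(s).count(u) for s in sentences]
    vec_v = [extract_terms(s).count(v) for s in sentences]
    dot = sum(a * b for a, b in zip(vec_u, vec_v))
    norm = math.sqrt(sum(a * a for a in vec_u)) * math.sqrt(sum(b * b for b in vec_v))
    return dot / norm if norm else 0.0


sentences = ["The cloud shows the words.", "A word cloud is a layout.", "The layout of words."]
print(rank_terms(sentences, 3))                                   # ['cloud', 'words', 'layout']
print(round(cosine_similarity(sentences, "cloud", "layout"), 3))  # 0.5
```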
![Page 19: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/19.jpg)
Early Algorithms
1. Wordle (Random)
2. Context-Preserving Word Cloud Visualization (CPWCV)
3. Seam Carving
![Page 20: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/20.jpg)
Wordle → Random
• The Wordle algorithm places one word at a time in a greedy fashion, aiming to use space as efficiently as possible.
• First, the words are sorted by weight in decreasing order.
• Then, for each word in that order, a position is picked at random.
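The greedy scheme can be sketched as follows. Assumptions: each word's box is sized from its weight with a made-up rule (height = weight, width proportional to the number of characters), and a word whose random position collides with an already placed box simply gets a new random position, up to `max_tries`. Wordle's own placement is more refined (it is described as moving the word along a spiral until it fits), so this only approximates the idea on the slide.

```python
import random


def overlaps(a, b):
    """True if two boxes (x, y, w, h) share interior area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def random_greedy_layout(words, width, height, char_w=0.6, max_tries=1000, seed=0):
    """words: list of (word, weight). Returns {word: (x, y, w, h)} for the words that fit."""
    rng = random.Random(seed)
    placed = {}
    # Heaviest words first, as described on the slide.
    for word, weight in sorted(words, key=lambda t: t[1], reverse=True):
        # Hypothetical sizing rule: font size = weight, width ~ number of characters.
        w, h = weight * len(word) * char_w, weight
        for _ in range(max_tries):
            box = (rng.uniform(0, width - w), rng.uniform(0, height - h), w, h)
            if not any(overlaps(box, other) for other in placed.values()):
                placed[word] = box
                break
    return placed


layout = random_greedy_layout([("cloud", 5.0), ("word", 3.0), ("semantic", 2.0)], 100, 100)
```

Because positions are random, related words end up wherever free space happens to be, which is exactly the shortcoming the semantic algorithms address.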
![Page 21: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/21.jpg)
1: Random
![Page 22: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/22.jpg)
2: Random
![Page 23: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/23.jpg)
3: Random
![Page 24: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/24.jpg)
4: Random
![Page 25: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/25.jpg)
5: Random
![Page 26: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/26.jpg)
6: Random
![Page 27: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/27.jpg)
Context-Preserving Word Cloud Visualization (CPWCV)
• First, a dissimilarity matrix is computed and multidimensional scaling (MDS) is performed.
• Second, the words are rearranged to create a compact layout.
Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset.
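The first CPWCV step can be illustrated with classical (Torgerson) MDS, which turns a distance matrix into 2-D coordinates by double-centering the squared distances and taking the top eigenvectors. This is a plain-Python sketch using power iteration with deflation; it assumes a symmetric matrix of Euclidean-like distances, and a real implementation would use a linear-algebra library instead.

```python
import math
import random


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Embed n items in `dims` dimensions from an n x n symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J D^2 J (row and column means coincide for symmetric D).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration converges to the dominant eigenvector of B.
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        for i in range(n):
            coords[i][k] = v[i] * math.sqrt(max(lam, 0.0))
        # Deflation: subtract lam * v v^T so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


pts = [(0, 0), (2, 0), (2, 1), (0, 1)]  # corners of a 2 x 1 rectangle
dist = [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in pts] for p in pts]
coords = classical_mds(dist)
```

The recovered coordinates reproduce the input distances up to rotation and reflection, which is all MDS can guarantee; in CPWCV the input would be the word dissimilarity matrix instead.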
![Page 28: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/28.jpg)
1: Context-Preserving
![Page 29: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/29.jpg)
2: Context-Preserving: repulsive force
![Page 30: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/30.jpg)
3: Context-Preserving: attractive force
![Page 31: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/31.jpg)
Seam Carving
• Seam carving is a content-aware image resizing technique
• Basically, an algorithm for image resizing
• It was invented at Mitsubishi Electric Research Laboratories (MERL)
![Page 32: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/32.jpg)
1: Seam Carving
![Page 33: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/33.jpg)
2: Seam Carving : space is divided into regions
![Page 34: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/34.jpg)
3: Seam Carving: empty paths trimmed out iteratively
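The "empty paths trimmed out iteratively" step can be sketched on a toy occupancy grid (1 = covered by a word, 0 = empty), using the standard dynamic-programming search for a minimum-cost vertical seam. Treating occupancy as the energy is an assumption made for illustration: a zero-cost seam is then a connected path of empty cells that can be removed to pull the words together. Horizontal seams can be handled the same way on the transposed grid.

```python
def min_vertical_seam(energy):
    """Cheapest top-to-bottom seam (one cell per row, shifting at most one column per row)."""
    rows, cols = len(energy), len(energy[0])
    cost = [row[:] for row in energy]
    # Dynamic programming: cheapest way to reach each cell from the top row.
    for r in range(1, rows):
        for c in range(cols):
            cost[r][c] += min(cost[r - 1][max(c - 1, 0):min(c + 2, cols)])
    # Backtrack from the cheapest cell in the bottom row.
    seam = [min(range(cols), key=lambda c: cost[-1][c])]
    for r in range(rows - 1, 0, -1):
        c = seam[-1]
        seam.append(min(range(max(c - 1, 0), min(c + 2, cols)), key=lambda k: cost[r - 1][k]))
    seam.reverse()
    return cost[-1][seam[-1]], seam


def remove_seam(grid, seam):
    """Drop one cell per row, at the seam's column for that row."""
    return [row[:c] + row[c + 1:] for row, c in zip(grid, seam)]


def carve_empty_seams(grid):
    """Repeatedly remove vertical seams that cross only empty cells (cost 0)."""
    while len(grid[0]) > 1:
        cost, seam = min_vertical_seam(grid)
        if cost > 0:
            break
        grid = remove_seam(grid, seam)
    return grid


# 1 = cell covered by a word box, 0 = empty space (hypothetical occupancy grid).
grid = [
    [1, 0, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1],
]
print(carve_empty_seams(grid))  # [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
```

The remaining empty cell cannot be removed because every seam through it must also cross an occupied cell.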
![Page 35: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/35.jpg)
4: Seam Carving
![Page 36: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/36.jpg)
5: Seam Carving
![Page 37: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/37.jpg)
6: Seam Carving: space divided into regions
![Page 38: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/38.jpg)
7: Seam Carving
![Page 39: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/39.jpg)
3 New Algorithms
1. Inflate and Push
2. Star Forest
3. Cycle Cover
![Page 40: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/40.jpg)
Inflate-and-Push
• Simple heuristic method for word layout, which aims to preserve semantic relations between pairs of words.
![Page 41: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/41.jpg)
1: Inflate
![Page 42: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/42.jpg)
2: Inflate : scaling down
![Page 43: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/43.jpg)
3: Inflate: semantically related words are placed close to each other
![Page 44: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/44.jpg)
4: Inflate : repulsive force to resolve overlaps
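The inflate-and-push idea can be sketched as follows, under stated assumptions: boxes are given by their centers (e.g. from MDS) and full sizes, they are inflated from 10% to 100% of their size in equal steps, and after each step every overlapping pair is pushed apart along the axis that needs the smaller shift. The step count, the push rule and the names are illustrative, not the published algorithm's exact parameters.

```python
def overlap(a, b):
    """Overlap (ox, oy) of two boxes given as (cx, cy, w, h) with centers, or None if disjoint."""
    ox = (a[2] + b[2]) / 2 - abs(a[0] - b[0])
    oy = (a[3] + b[3]) / 2 - abs(a[1] - b[1])
    return (ox, oy) if ox > 0 and oy > 0 else None


def inflate_and_push(centers, sizes, steps=10, max_rounds=200):
    """centers: initial (x, y) per word, e.g. from MDS; sizes: full (w, h) per word."""
    pos = [list(c) for c in centers]
    n = len(pos)
    for k in range(1, steps + 1):
        scale = k / steps  # boxes start small and are inflated step by step
        for _ in range(max_rounds):
            moved = False
            for i in range(n):
                for j in range(i + 1, n):
                    a = (*pos[i], sizes[i][0] * scale, sizes[i][1] * scale)
                    b = (*pos[j], sizes[j][0] * scale, sizes[j][1] * scale)
                    ov = overlap(a, b)
                    if ov is None:
                        continue
                    moved = True
                    # Push both boxes apart along the axis that needs the smaller shift.
                    axis = 0 if ov[0] < ov[1] else 1
                    shift = ov[axis] / 2 + 1e-6
                    sign = 1 if pos[i][axis] >= pos[j][axis] else -1
                    pos[i][axis] += sign * shift
                    pos[j][axis] -= sign * shift
            if not moved:
                break
    return [tuple(p) for p in pos]


centers = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.4)]  # hypothetical MDS output
sizes = [(4.0, 1.0), (3.0, 1.0), (2.0, 1.0)]
final = inflate_and_push(centers, sizes)
```

Inflating gradually keeps each push small, so words that MDS placed near each other tend to stay near each other.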
![Page 45: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/45.jpg)
5: Inflate
![Page 46: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/46.jpg)
Star Forest
• A star is a tree in which one central vertex is adjacent to all other vertices; a star forest is a forest whose connected components are all stars.
![Page 47: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/47.jpg)
Star Forest: star = graph
• Dissimilarity matrix → disjoint stars = star forest
• Attractive force to get a compact layout
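A greedy way to extract disjoint stars from a weighted similarity graph: repeatedly pick as center the word whose edges to still-unassigned words carry the most weight, and attach those neighbours as its leaves. This greedy rule is a hypothetical simplification for illustration; the lecture does not specify how the stars are chosen. Laying out each star (center box surrounded by its leaves) and packing the stars with an attractive force are not shown.

```python
def greedy_star_forest(vertices, weights):
    """Partition vertices into disjoint stars; weights maps (u, v) pairs to similarity."""
    adj = {v: {} for v in vertices}
    for (u, v), w in weights.items():
        adj[u][v] = w
        adj[v][u] = w
    unassigned = set(vertices)
    stars = []
    while unassigned:
        # Center = word collecting the most weight from neighbours not yet in a star.
        # sorted() + max() picks the alphabetically first word on ties (deterministic).
        center = max(sorted(unassigned),
                     key=lambda c: sum(w for nb, w in adj[c].items() if nb in unassigned))
        leaves = sorted(nb for nb in adj[center] if nb in unassigned and nb != center)
        stars.append((center, leaves))
        unassigned -= {center, *leaves}
    return stars


vertices = ["cloud", "word", "tag", "layout", "graph"]
weights = {("cloud", "word"): 5, ("cloud", "tag"): 3,
           ("layout", "graph"): 4, ("word", "layout"): 1}
stars = greedy_star_forest(vertices, weights)
print(stars)  # [('cloud', ['tag', 'word']), ('graph', ['layout'])]
```

Only the weak "word"/"layout" edge is lost, since "word" and "layout" end up in different stars.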
![Page 48: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/48.jpg)
Cycle Cover
• This algorithm is based on a similarity matrix.
• First, a similarity path (= cycle) is created.
• Then, the optimal level of compactness is computed.
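The similarity-path step can be illustrated with a greedy chain: start from one word, repeatedly move to the most similar unvisited word, and close the chain into a cycle. This nearest-neighbour heuristic is a swapped-in simplification, not the Cycle Cover algorithm itself (which works with a cycle cover of the similarity graph), and the compactness step is not shown.

```python
def greedy_similarity_cycle(sim, start=0):
    """sim: symmetric similarity matrix (list of lists). Returns visiting order and cycle weight."""
    n = len(sim)
    order = [start]
    visited = {start}
    while len(order) < n:
        last = order[-1]
        # Move to the unvisited word most similar to the current one.
        nxt = max((j for j in range(n) if j not in visited), key=lambda j: sim[last][j])
        order.append(nxt)
        visited.add(nxt)
    # Closing the path back to the start turns it into a cycle.
    weight = sum(sim[order[i]][order[(i + 1) % n]] for i in range(n))
    return order, weight


sim = [
    [0.0, 0.9, 0.1, 0.3],
    [0.9, 0.0, 0.8, 0.2],
    [0.1, 0.8, 0.0, 0.7],
    [0.3, 0.2, 0.7, 0.0],
]
order, weight = greedy_similarity_cycle(sim)
print(order)  # [0, 1, 2, 3]
```

Consecutive words on the cycle are the ones a layout would try to place next to each other.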
![Page 49: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/49.jpg)
Quantitative Metrics
![Page 50: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/50.jpg)
Criteria
1. Realized Adjacencies – how close are similar words to each other?
2. Distortion – how distant are dissimilar words?
3. Compactness – how well utilized is the drawing area?
4. Uniform Area Utilization – uniformity of the distribution (overpopulated vs. sparse areas in the word cloud)
5. Aspect Ratio – width and height of the bounding box
6. Running Time – execution time
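Two of these criteria are easy to compute from the final boxes. This sketch assumes non-overlapping boxes `(x, y, w, h)`, defines compactness as the share of the bounding box covered by word boxes, and reports the aspect ratio as bounding-box width divided by height; the experiments may use slightly different definitions (for example, a convex hull instead of the bounding box).

```python
def bounding_box(boxes):
    """(x_min, y_min, x_max, y_max) of a list of (x, y, w, h) boxes."""
    return (min(x for x, _, _, _ in boxes), min(y for _, y, _, _ in boxes),
            max(x + w for x, _, w, _ in boxes), max(y + h for _, y, _, h in boxes))


def compactness(boxes):
    """Fraction of the bounding box covered by word boxes (assumes no overlaps)."""
    x0, y0, x1, y1 = bounding_box(boxes)
    used = sum(w * h for _, _, w, h in boxes)
    return used / ((x1 - x0) * (y1 - y0))


def aspect_ratio(boxes):
    """Bounding-box width divided by height."""
    x0, y0, x1, y1 = bounding_box(boxes)
    return (x1 - x0) / (y1 - y0)


boxes = [(0, 0, 4, 2), (4, 0, 3, 2), (0, 2, 4, 1)]
print(round(compactness(boxes), 3))   # 0.857 (18 of 21 area units used)
print(round(aspect_ratio(boxes), 3))  # 2.333 (7 wide, 3 high)
```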
![Page 51: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/51.jpg)
2 datasets
(1) WIKI, a set of 112 plain-text articles extracted from the English Wikipedia, each consisting of at least 200 distinct words
(2) PAPERS, a set of 56 research papers published in conferences on experimental algorithms (SEA and ALENEX) in 2011-2012.
![Page 52: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/52.jpg)
Cycle Cover wins
![Page 53: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/53.jpg)
Seam Carving wins
![Page 54: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/54.jpg)
Random wins
![Page 55: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/55.jpg)
Inflate wins
![Page 56: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/56.jpg)
Random and Seam Carving win
![Page 57: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/57.jpg)
All ok except Seam Carving
![Page 58: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/58.jpg)
Demo
![Page 59: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/59.jpg)
Final Words
![Page 60: Lecture: Semantic Word Clouds](https://reader033.fdocuments.us/reader033/viewer/2022052912/55a203461a28ab33268b4826/html5/thumbnails/60.jpg)
The end