Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree...
Transcript of Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree...
![Page 1: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/1.jpg)
IFCS 2009Dresden – 17/03/2009
Visualising a text with a tree cloud
Philippe Gambette, Jean Véronis
![Page 2: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/2.jpg)
• Tag and word clouds
Outline
• Enhanced tag clouds• Tree clouds• Construction steps• Quality control
![Page 3: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/3.jpg)
• Built from a set of tags• Font size related to frequency
Tag clouds
What is considered the first tag cloud, from
D. Coupland: Microserfs, HarperCollins, Toronto,
1995
![Page 4: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/4.jpg)
• Gained popularity with Flickr
• Built from a set of tags• Font size related to frequency
Tag clouds
Flickr's all time most popular tags
![Page 5: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/5.jpg)
• Gained popularity with Wordle
• Built from a set of words from a text• Font size related to frequency
Word clouds
![Page 6: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/6.jpg)
• Gained popularity with Wordle
• Built from a set of words from a text• Font size related to frequency
Word clouds
GoogleImage(obama inaugural address wordle)
![Page 7: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/7.jpg)
• Gained popularity with Wordle
• Built from a set of words from a text• Font size related to frequency
Word clouds
GoogleImage(obama inaugural address wordle)
![Page 8: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/8.jpg)
• Gained popularity with Wordle
• Built from a set of words from a text• Font size related to frequency
Word clouds
GoogleImage(obama inaugural address wordle)
![Page 9: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/9.jpg)
• shared tags in red on del.icio.us
Add information from the text:• color intensity to express recency in Amazon
Enhanced tag / word clouds
• optimize blank space and semantic proximityKaser & Lemire, WWW'07
• “topigraphy”: 2D placement according to cooccurrenceFujimura, Fujimura, Matsubayashi, Yamada & Okuda, WWW'08
• group together cooccurring tags on the same lineHassan-Montero & Herrero-Solana, InScit'06
![Page 10: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/10.jpg)
• shared tags in red on del.icio.us
Add information from the text:• color intensity to express recency in Amazon
Enhanced tag / word clouds
• optimize blank space and semantic proximityKaser & Lemire, WWW'07
• “topigraphy”: 2D placement according to cooccurrenceFujimura, Fujimura, Matsubayashi, Yamada & Okuda, WWW'08
• group together cooccurring tags on the same lineHassan-Montero & Herrero-Solana, InScit'06
![Page 11: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/11.jpg)
• shared tags in red on del.icio.us
Add information from the text:• color intensity to express recency in Amazon
Enhanced tag / word clouds
• optimize blank space and semantic proximityKaser & Lemire, WWW'07
• “topigraphy”: 2D placement according to cooccurrenceFujimura, Fujimura, Matsubayashi, Yamada & Okuda, WWW'08
• group together cooccurring tags on the same lineHassan-Montero & Herrero-Solana, InScit'06
![Page 12: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/12.jpg)
• optimize blank space and semantic proximityKaser & Lemire, WWW'07
• shared tags in red on del.icio.us
Add information from the text:• color intensity to express recency in Amazon
Enhanced tag / word clouds
• “topigraphy”: 2D placement according to cooccurrenceFujimura, Fujimura, Matsubayashi, Yamada & Okuda, WWW'08
• group together cooccurring tags on the same lineHassan-Montero & Herrero-Solana, InScit'06
![Page 13: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/13.jpg)
• optimize blank space and semantic proximityKaser & Lemire, WWW'07
• shared tags in red on del.icio.us
Add information from the text:• color intensity to express recency in Amazon
Enhanced tag / word clouds
• “topigraphy”: 2D placement according to cooccurrenceFujimura, Fujimura, Matsubayashi, Yamada & Okuda, WWW'08
• group together cooccurring tags on the same lineHassan-Montero & Herrero-Solana, InScit'06
![Page 14: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/14.jpg)
Extract semantic information from a text
• natural language processing:sense desamibiguation
Véronis (Hyperlex)
• literature analysis:philological approach: only consider the text
Brody
• text mining:semantic graph
Grimmer (Wordmapper)
• discourse analysis:tree analysis or cooccurrence graph, geodesic projection
Brunet (Hyperbase), Viprey (Astartex)
![Page 15: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/15.jpg)
Extract semantic information from a text
• natural language processing:sense desamibiguation
Véronis (Hyperlex)
• literature analysis:philological approach: only consider the text
Brody
• text mining:semantic graph
Grimmer (Wordmapper)
• discourse analysis:tree analysis or cooccurrence graph, geodesic projection
Brunet (Hyperbase), Viprey (Astartex)
Mayaffre, Quand travail, famille, et patrie cooccurrent dans le
discours de NicolasSarkozy, JADT'08
![Page 16: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/16.jpg)
Extract semantic information from a text
• natural language processing:sense desamibiguation
Véronis (Hyperlex)
• literature analysis:philological approach: only consider the text
Brody
• text mining:semantic graph
Grimmer (Wordmapper)
• discourse analysis:tree analysis or cooccurrence graph, geodesic projection
Brunet (Hyperbase), Viprey (Astartex)
Brunet, Les séquences (suite),JADT'08
![Page 17: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/17.jpg)
Extract semantic information from a text
• natural language processing:word sense disamibiguation
Véronis (Hyperlex)
• literature analysis:philological approach: only consider the text
Brody
• text mining:semantic graph
Grimmer (Wordmapper)
• discourse analysis:tree analysis or cooccurrence graph, geodesic projection
Brunet (Hyperbase), Viprey (Astartex)
Barry, Viprey,Approche comparative
des résultats d'exploration textuelle des discours
de deux leaders africainsKeita et Touré,
JADT'08
![Page 18: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/18.jpg)
Extract semantic information from a text
• natural language processing:word sense disamibiguation
Véronis (Hyperlex)
• literature analysis:philological approach: only consider the text
Brody
• text mining:semantic graph
Grimmer (Wordmapper)
Peyrat-Guillard,Analyse du discours
syndical sur l’entreprise, JADT'08
• discourse analysis:tree analysis or cooccurrence graph, geodesic projection
Brunet (Hyperbase), Viprey (Astartex)
![Page 19: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/19.jpg)
Extract semantic information from a text
• natural language processing:word sense disambiguation
Véronis (Hyperlex)
• literature analysis:philological approach: only consider the text
Brody
• text mining:semantic graph
Grimmer (Wordmapper)
Disambiguation of word “barrage”: dam, play-off,
roadblock, police cordon.
Véronis, HyperLex:Lexical Cartography for
Information Retrieval, 2004
• discourse analysis:tree analysis or cooccurrence graph, geodesic projection
Brunet (Hyperbase), Viprey (Astartex)
![Page 20: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/20.jpg)
Tag cloud + tree = tree cloud
GPL-licensed Treecloud in Python,available at http://www.treecloud.org
Built with TreeCloud and
SplitsTree: Huson 1998, Huson & Bryant 2006
![Page 21: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/21.jpg)
The first tree cloud
Tree cloud of the blog posts containing “Laurence Ferrari”from 25/11/2007 to 10/12/2007, by Jean Véronis
http://aixtal.blogspot.com/2007/12/actu-une-ferrari-dans-un-arbre.html
![Page 22: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/22.jpg)
Building a tree cloud – extracting the words
Extract words with frequency:
• stoplist?without stoplist with stoplist
![Page 23: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/23.jpg)
Building a tree cloud – extracting the words
Extract words with frequency:
• lemmatization?
• groups words with similar meaning?
70 most frequent words inObama's campaign speeches,
winsize=30, distance=dice, NJ-tree.
no lemmatization...sometimes interesting
![Page 24: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/24.jpg)
Building a tree cloud – dissimilarity matrix
Many semantic distance formulas based on cooccurrence
![Page 25: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/25.jpg)
Building a tree cloud – dissimilarity matrix
Many semantic distance formulas based on cooccurrence
slidingwindow S
Text
width wsliding step s
cooccurrence matricesO
11, O
12, O
21, O
22
semantic dissimilarity matrixchi squared, mutual information, liddel, dice, jaccard, gmean, hyperlex, minimum sensitivity, odds ratio, zscore, log likelihood, poisson-stirling...
Evert,Statistics of words cooccurrences,
PhD Thesis, 2005
![Page 26: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/26.jpg)
Building a tree cloud – dissimilarity matrix
Transformations needed on the dissimilarity:
• transform similarity into dissimilarity
• linear normalization for positive matrices to get distances in [0,1]
• affine normalization for matrices with positive or negative numbers, to get distances in [α,1] (for example α=0.1)
![Page 27: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/27.jpg)
Building a tree cloud – tree reconstruction
Many existing methods:
• Neighbor-JoiningSaitou & Nei, 1987
• Addtree variantsBarthelemy & Luong, 1987
• Quartet heuristicCilibrasi & Vitanyi, 2007
![Page 28: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/28.jpg)
Building a tree cloud – tree decoration
Choice of word sizes:
• computed directly from frequency (apply a log!)
or
• computed from frequency ranking (exponential distribution)
or
• statistical significance with respect to a reference corpus
![Page 29: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/29.jpg)
Building a tree cloud – tree decoration
Colors: chronology
150 most frequent words inObama's campaign speeches, winsize=30, distance=oddsratio, color=chronology, NJ-tree.
oldrecent
Built with TreeCloud and
![Page 30: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/30.jpg)
Building a tree cloud – tree decoration
Colors: dispersion
standarddeviationof the position
150 most frequent words inObama's campaign speeches, winsize=30, distance=oddsratio, color=dispersion, NJ-tree.
sparsedense
too blue?
![Page 31: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/31.jpg)
Building a tree cloud – tree decoration
Colors: dispersion
standarddeviationof the position word frequency
150 most frequent words inObama's campaign speeches, winsize=30, distance=oddsratio, color=norm-dispersion, NJ-tree.
sparsedense
too red?
![Page 32: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/32.jpg)
Building a tree cloud – tree decoration
Edge color or thickness:quality of the inducedcluster.
150 most frequent words inObama's campaign speeches, winsize=30, distance=oddsratio, color=chronology, NJ-tree.
![Page 33: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/33.jpg)
Quality control
Is there an objective quality measure of tree clouds?
![Page 34: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/34.jpg)
Quality control
Is there an objective quality measure of tree clouds?
What is the best method to build a tree cloud from my data?
![Page 35: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/35.jpg)
Quality control
Is there an objective quality measure of tree clouds?
What is the best method to build a tree cloud from my data?
Tree cloud variations if small changes?bootstrap to evaluate:
- stability of the result- robustness of the method
![Page 36: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/36.jpg)
Quality control
Is there an objective quality measure of tree clouds?
What is the best method to build a tree cloud from my data?
Tree cloud variations if small changes?bootstrap to evaluate:
- stability of the result- robustness of the method
Still, is there a more direct method?arboricity to show whether the distance matrix fits with a tree, which should imply stability?
Guénoche & Garreta, 2001Guénoche & Darlu, 2009
![Page 37: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/37.jpg)
Quality control – bootstrap
• Randomly delete words with probability 50%.
• Built tree cloud of original text, and altered text.
• Compute similarity of both trees (1-normalized RobinsonFoulds)
chisquaredmi
liddelldice
jaccardgmean
hyperlexms
oddsratiozscore
loglikelihoodpoissonstirling
0,5
0,55
0,6
0,65
0,7
0,75
0,8
0,85
0,9
0,95
similarity
4 altered versions of 10 Obama's speeches,
3000 words in average,width=30, NJ-tree.
![Page 38: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/38.jpg)
Quality control – arboricity
Relationship between “bootstrap quality” and arboricity:
0,52 0,54 0,56 0,58 0,6 0,62 0,64 0,66 0,68 0,70
10
20
30
40
50
60
70
80
arboricity (%)
bootstrap quality
![Page 39: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/39.jpg)
Quality control – arboricity
Relationship between “bootstrap quality” and arboricity:
0,52 0,54 0,56 0,58 0,6 0,62 0,64 0,66 0,68 0,70
10
20
30
40
50
60
70
80
correlation coefficient: 0.64
arboricity below 50%: danger!
arboricity (%)
bootstrap quality
![Page 40: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/40.jpg)
Perspectives
• Make the tool available on a web interface
• Evaluate tree clouds for discourse analysis
• Build the daily tree cloud of people popular on blogs,
with
http://labs.wikio.net
http://www.treecloud.org
![Page 41: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/41.jpg)
Thank you for your attention!
http://www.treecloud.org - http://www.splitstree.org
Tree cloud of the words appearingtwice or more in the IFCS 2009 call for paper
lemmatization, width=20, distance=dice, NJ-tree.
![Page 42: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/42.jpg)
Tree clouds focused on one word
http://www.treecloud.org - http://www.splitstree.org
Tree cloud of the neighborhood of “McCain” in Obama's campaign speeches
![Page 43: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/43.jpg)
Tree clouds focused on one word
http://www.treecloud.org - http://www.splitstree.org
Tree cloud of the neighborhood of “Bush” in Obama's campaign speeches
![Page 44: Visualising a text with a tree cloud - IGMigm.univ-mlv.fr/~gambette/Re20090317.pdf• Addtree variants Barthelemy & Luong, 1987 • Quartet heuristic Cilibrasi & Vitanyi, 2007 Building](https://reader036.fdocuments.us/reader036/viewer/2022071003/5fbfe5b958086349120abfab/html5/thumbnails/44.jpg)
Tree clouds focused on one word
http://www.treecloud.org - http://www.splitstree.org
Tree cloud of the neighborhood of “world” in Obama's campaign speeches