LaTeCH 2015: Measuring the Structural and Conceptual Similarity of Folktales using Plot Graphs...

Measuring the Structural and Conceptual Similarity of Folktales using Plot Graphs

Victoria Anugrah Lestari & Ruli Manurung

Faculty of Computer Science, Universitas Indonesia

[email protected], [email protected]

Beijing, China, 30 July 2015

Page 1: LaTeCH 2015: Measuring the Structural and Conceptual Similarity of Folktales using Plot Graphs (Lestari & Manurung)

Beijing 30 July ‘15

Folktales Plot graphs Similarity Experiments

Folktales

A folktale is a characteristically anonymous, timeless, and placeless tale circulated orally among a people.

http://onceuponatime.wikia.com/wiki/Rumpelstiltskin_(Fairytale)

http://indonesianfolklore.blogspot.com/2007/10/lutung-kasarung-folklore-from-west-java.html

http://indonesianfolklore.blogspot.com/2007/10/keong-emas-golden-snail-prince-raden.html

Humanities work on folktales

• Vladimir Propp (1928): Morphology of the Folktale (Russian folktales); story grammars

• Aarne-Thompson-Uther (ATU) index (1910, 1961, 2004): story motifs, hierarchy of folktale types

Computational work on folktales

• Vaz Lobo & de Matos (2010): latent semantic mapping and clustering of 453 fairy tales from Project Gutenberg.

• Nguyen et al. (2012): genre classification (e.g. legend, fairy tale, joke, puzzle, urban legend) using lexical, POS, named-entity, and metadata features.

• Nguyen et al. (2013): ranking stories by story type (ATU, Brunvand) using IR, lexical, and subject-verb-object (SVO) triplet features.

• Karsdorp & van den Bosch (2013): topic modelling with Labeled LDA (L-LDA) for multi-label assignment of ATU motifs (defined by types).

Folktales as narratives

• Narratives: focus on the structure of a sequence of related events

• Models of narrative: Turner (1994), Mateas & Stern (2003), Pérez y Pérez & Sharples (2004), etc.

• However, Fisseni & Löwe (2012) found that people tend to focus on motifs and content rather than on structure.

Plot graphs (McIntyre & Lapata, 2010)

Goals of this work

• Construct representations of folktales that capture both structural and conceptual properties.

• Define a similarity metric and use it to organize folktales.

• Compare against bag-of-words (BoW) methods with respect to the ATU index.

Representing folktales as plot graphs

Note that the core structure, the chain of action nodes joined by action edges, is linear.

Node types: action nodes, child nodes, entity nodes.

Edge types: action edges, action-child edges, entity edges.

Example

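Action nodes in story order: live → sleep → come → play

live: subj → lion, in → forest

sleep: subj → it, under → tree

come: subj → mouse

play: child nodes lion and it, with edge labels subj and on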
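A minimal sketch of how the node and edge types above could be held in memory, assuming Python dataclasses. The class and field names (`ChildEdge`, `ActionNode`, `PlotGraph`, `entity_edges`) are hypothetical, not the authors' code. Actions form a linear chain, each action carries labelled edges to its child nodes, and entity edges join child nodes that refer to the same entity. Only the unambiguous part of the lion example is encoded (`play` is left out because its edge assignment is not recoverable), and the `it` → `lion` entity edge is an illustrative assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ChildEdge:
    """Labelled action-child edge, e.g. subj -> lion."""
    label: str
    child: str


@dataclass
class ActionNode:
    """One event of the story with its labelled child nodes."""
    verb: str
    children: list[ChildEdge] = field(default_factory=list)


@dataclass
class PlotGraph:
    """Hypothetical container: a linear chain of actions plus entity edges."""
    actions: list[ActionNode] = field(default_factory=list)
    # Entity edges join two child nodes, each addressed as (action index, child index).
    entity_edges: list[tuple[tuple[int, int], tuple[int, int]]] = field(default_factory=list)

    def action_edges(self) -> list[tuple[str, str]]:
        """Action edges join consecutive actions, so the core chain is linear."""
        return [(a.verb, b.verb) for a, b in zip(self.actions, self.actions[1:])]


graph = PlotGraph(actions=[
    ActionNode("live", [ChildEdge("subj", "lion"), ChildEdge("in", "forest")]),
    ActionNode("sleep", [ChildEdge("subj", "it"), ChildEdge("under", "tree")]),
    ActionNode("come", [ChildEdge("subj", "mouse")]),
])
# Assumed entity edge for illustration: "it" (action 1, child 0) refers to "lion" (action 0, child 0).
graph.entity_edges.append(((1, 0), (0, 0)))

print(graph.action_edges())
```

Keeping the actions in a plain list makes the linear core explicit: the action edges are derived from list order rather than stored separately.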

Automatic construction

Built from the Stanford CoreNLP SemanticGraph (i.e. the dependency parse) of each sentence

From SemanticGraph to plot graph

Some observation-based heuristics for selecting relations:

• Action nodes: governors of nsubj (nominal subject), expl (expletive “there”), and aux (auxiliary) relations.

• Add a child node if relation(parent, child) is not one of conj, comp, adv, aux, cop, dep, expl, mark.
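The two heuristics can be sketched over a plain list of (governor, relation, dependent) triples. This is an illustration under assumptions: the input format and the function name `select_actions` are hypothetical, the code does not call the CoreNLP API, and the excluded list is read as exact relation names (the slide may instead mean whole relation families, such as all complement or adverbial relations).

```python
# Governors of these relations become action nodes.
ACTION_MARKERS = {"nsubj", "expl", "aux"}
# Dependents attached by these relations are NOT added as child nodes.
# Assumption: the names are matched exactly; the slide may mean relation families.
EXCLUDED = {"conj", "comp", "adv", "aux", "cop", "dep", "expl", "mark"}


def select_actions(triples):
    """Return {action: [(relation, child), ...]} for one sentence.

    triples: (governor, relation, dependent) tuples. Nodes are keyed by word form
    for brevity; a real implementation would key them by token index.
    """
    actions = {gov for gov, rel, _ in triples if rel in ACTION_MARKERS}
    graph = {action: [] for action in sorted(actions)}
    for gov, rel, dep in triples:
        if gov in actions and rel not in EXCLUDED:
            graph[gov].append((rel, dep))
    return graph


# "A lion lives in the forest." as hand-written collapsed Stanford dependencies.
triples = [
    ("lives", "nsubj", "lion"),
    ("lion", "det", "A"),
    ("lives", "prep_in", "forest"),
    ("forest", "det", "the"),
]
print(select_actions(triples))
```

Note that aux appears in both sets: its governor is treated as an action, but the auxiliary itself is not attached as a child.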

Construction example

Coreference: Stanford CoreNLP CorefChain, keeping only chains with more than one mention (length > 1)
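A sketch of turning coreference chains into entity edges, assuming each chain has already been reduced to a list of mention identifiers such as (sentence index, token index) pairs in textual order. Dropping singleton chains mirrors the length > 1 filter above. Linking every later mention to the chain's first mention is a hypothetical design choice, and `entity_edges` is an illustrative helper, not a CoreNLP function.

```python
def entity_edges(chains):
    """Link each later mention in a chain to the chain's first mention.

    chains: list of chains; each chain is a list of mention ids, e.g.
    (sentence_index, token_index) tuples, in textual order.
    """
    edges = []
    for chain in chains:
        if len(chain) <= 1:  # singleton chains add no links
            continue
        head = chain[0]
        edges.extend((mention, head) for mention in chain[1:])
    return edges


# "A lion lives in the forest. One day it sleeps under a tree."
chains = [
    [(0, 1), (1, 2)],  # "lion" ... "it"
    [(0, 5)],          # "forest": a singleton, dropped
]
print(entity_edges(chains))
```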

Final result

Measuring plot graph similarity

A lion lives in the forest. One day it sleeps under a tree. Then a mouse plays on the lion and disturbs its sleep.

A lion eats meat. A lion lives in the jungle. One day it rests under a tree.

Alignment of event sequences

Needleman-Wunsch global alignment of the two action-node sequences

Conceptual similarity: Wu-Palmer

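Measures the similarity of two words from their positions in the WordNet taxonomy: twice the depth of their least common subsumer divided by the sum of their individual depths.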

Word pairs Similarity

sleep, live 0.25

disturb, rest 0.33

live, eat 0.29

prince, king 0.94

jungle, forest 0.31

palace, house 0.91
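The Wu-Palmer score is 2 · depth(lcs) / (depth(a) + depth(b)), where lcs is the least common subsumer of the two concepts and depth counts nodes from the root (root = 1). The sketch below computes it on a tiny hand-made taxonomy (hypothetical, not WordNet) so that it runs without extra data. With WordNet itself, NLTK's `Synset.wup_similarity` provides a version of this measure, and the scores in the table above come from WordNet, so they will not match this toy hierarchy.

```python
# Toy taxonomy, child -> parent (hypothetical, for illustration only).
PARENT = {
    "organism": "entity",
    "animal": "organism",
    "lion": "animal",
    "mouse": "animal",
    "location": "entity",
    "forest": "location",
    "jungle": "location",
}


def path_to_root(concept):
    """Return [concept, parent, ..., root]."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))


def wu_palmer(a, b):
    ancestors_b = set(path_to_root(b))
    # In a tree, the first ancestor of a that is also an ancestor of b is the LCS.
    lcs = next(c for c in path_to_root(a) if c in ancestors_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


print(wu_palmer("lion", "mouse"))   # LCS "animal"
print(wu_palmer("lion", "forest"))  # LCS "entity"
```

Siblings deep in the hierarchy score high because their subsumer is deep too; concepts that only meet at the root score low.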

Example mapping

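Wu-Palmer similarity

| | eat | live | rest |
|---|---|---|---|
| live | 0.29 | 1 | 0.33 |
| sleep | 0.22 | 0.25 | 0.43 |
| play | 0.29 | 0.33 | 0.43 |
| disturb | 0.29 | 0.33 | 0.33 |

Alignment scoring & traceback matrix

| | | eat | live | rest |
|---|---|---|---|---|
| | 0 | -1 | -2 | -3 |
| live | -1 | 0.29 | 0 | -1 |
| sleep | -2 | -0.71 | 0.54 | 1 |
| play | -3 | -1.71 | -0.38 | 0.96 |
| disturb | -4 | -2.71 | -1.38 | -0.04 |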
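A sketch of Needleman-Wunsch global alignment over two action sequences, using a pairwise similarity function as the match score and a linear gap penalty. The gap penalty of -1 is read off the first row and column of the matrix above; the function name and the traceback tie-breaking order (diagonal, then up, then left) are choices of this sketch. Fed the rounded scores from the similarity table, a few cells differ from the slide (for example, the final score comes out as -0.03 rather than -0.04), so treat it as an illustration of the recurrence rather than the authors' exact computation.

```python
def needleman_wunsch(seq_a, seq_b, sim, gap=-1.0):
    """Global alignment; returns (score, [(x, y), ...]) with None marking a gap."""
    n, m = len(seq_a), len(seq_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], move[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        score[0][j], move[0][j] = j * gap, "left"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = [
                (score[i - 1][j - 1] + sim(seq_a[i - 1], seq_b[j - 1]), "diag"),
                (score[i - 1][j] + gap, "up"),    # seq_a[i-1] against a gap
                (score[i][j - 1] + gap, "left"),  # seq_b[j-1] against a gap
            ]
            # max() keeps the first of equal candidates: diag, then up, then left.
            score[i][j], move[i][j] = max(candidates, key=lambda c: c[0])

    # Trace back from the bottom-right cell to the origin.
    alignment, i, j = [], n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            alignment.append((seq_a[i - 1], seq_b[j - 1]))
            i, j = i - 1, j - 1
        elif step == "up":
            alignment.append((seq_a[i - 1], None))
            i -= 1
        else:
            alignment.append((None, seq_b[j - 1]))
            j -= 1
    alignment.reverse()
    return score[n][m], alignment


WUP = {  # rounded Wu-Palmer scores from the similarity table above
    ("live", "eat"): 0.29, ("live", "live"): 1.0, ("live", "rest"): 0.33,
    ("sleep", "eat"): 0.22, ("sleep", "live"): 0.25, ("sleep", "rest"): 0.43,
    ("play", "eat"): 0.29, ("play", "live"): 0.33, ("play", "rest"): 0.43,
    ("disturb", "eat"): 0.29, ("disturb", "live"): 0.33, ("disturb", "rest"): 0.33,
}
total, pairs = needleman_wunsch(
    ["live", "sleep", "play", "disturb"], ["eat", "live", "rest"],
    sim=lambda x, y: WUP[(x, y)], gap=-1.0,
)
print(round(total, 2), pairs)
```

Here the traceback aligns live/eat, sleep/live and play/rest, and leaves disturb against a gap.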

Folktale similarity measurement

• p1, p2 = the two plot graphs being compared

• α = weighting for action node similarity

• β = weighting for child node similarity

• (a1i, a2i) = pair of action nodes from the alignment of p1 and p2

• (c1i, c2i) = pair of child nodes from the alignment of p1 and p2

• g = gap penalty

• n = alignment length of p1 and p2
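The symbols above can be combined in more than one way. Purely as a hypothetical reading, and not the authors' formula, the sketch averages over the n alignment columns a weighted sum α · sim(a1i, a2i) + β · sim(c1i, c2i) for aligned pairs, lets gap columns contribute nothing, and assumes g is used only while computing the alignment. The function name `plot_graph_similarity` and its input format are illustrative assumptions.

```python
def plot_graph_similarity(columns, alpha=0.5, beta=0.5):
    """Hypothetical combination of aligned action and child similarities.

    columns: one entry per alignment column (so len(columns) == n); each entry is
    (action_sim, child_sim) for an aligned pair, or None for a gap column.
    Assumption: gap columns add 0 here, and g acts only inside the alignment step.
    """
    n = len(columns)
    if n == 0:
        return 0.0
    aligned = [col for col in columns if col is not None]
    total = sum(alpha * a + beta * c for a, c in aligned)
    return total / n


# Three alignment columns: two aligned pairs and one gap.
print(plot_graph_similarity([(1.0, 1.0), (0.5, 0.0), None], alpha=0.7, beta=0.3))
```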

Initial experiment

• Goal: determine values for α, β, and g.

• For each story, 5 paraphrases were created manually by replacing words, changing sentence structure, and inserting or deleting phrases and sentences.

• Measure similarity between paraphrases of the same story and across different stories, and choose the parameters that maximize the difference.

No. Title #Words

1 A friend in need is a friend indeed 133

2 Honesty is the best policy 129

3 A town mouse and a country mouse 260

4 How to tell a true princess 382

5 The butterfly lovers 572

6 Rumpelstiltskin 1106

http://www.english-for-students.com/Simple-Short-Stories.html

Similarity scores using various parameters

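BP = between paraphrases, AS = across stories.

| α | β | g | BP avg | BP min | AS avg | AS max | BP min - AS max | Diff. between avgs |
|---|---|---|---|---|---|---|---|---|
| 0.7 | 0.3 | -1 | 0.83 | 0.69 | 0.37 | 0.55 | 0.14 | 0.46 |
| 0.7 | 0.3 | -0.5 | 0.80 | 0.61 | 0.30 | 0.45 | 0.16 | 0.50 |
| 0.7 | 0.3 | 0 | 0.74 | 0.53 | 0.15 | 0.25 | 0.28 | 0.59 |
| 0.5 | 0.5 | -1 | 0.83 | 0.69 | 0.41 | 0.55 | 0.14 | 0.42 |
| 0.5 | 0.5 | -0.5 | 0.80 | 0.60 | 0.32 | 0.43 | 0.17 | 0.48 |
| 0.5 | 0.5 | 0 | 0.73 | 0.49 | 0.12 | 0.20 | 0.29 | 0.61 |
| 0.3 | 0.7 | -1 | 0.83 | 0.68 | 0.45 | 0.55 | 0.13 | 0.38 |
| 0.3 | 0.7 | -0.5 | 0.79 | 0.58 | 0.33 | 0.42 | 0.16 | 0.46 |
| 0.3 | 0.7 | 0 | 0.71 | 0.45 | 0.09 | 0.16 | 0.29 | 0.62 |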
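The last two columns can be recomputed from the four measured ones: the between-paraphrase minimum minus the across-story maximum, and the difference between the two averages. The sketch below (dictionary layout and names are illustrative) copies the values from the table, recomputes both margins, and picks the setting with the largest difference between averages, which is one possible reading of "maximize difference".

```python
# (alpha, beta, g) -> (BP avg, BP min, AS avg, AS max), copied from the table above.
SCORES = {
    (0.7, 0.3, -1.0): (0.83, 0.69, 0.37, 0.55),
    (0.7, 0.3, -0.5): (0.80, 0.61, 0.30, 0.45),
    (0.7, 0.3, 0.0): (0.74, 0.53, 0.15, 0.25),
    (0.5, 0.5, -1.0): (0.83, 0.69, 0.41, 0.55),
    (0.5, 0.5, -0.5): (0.80, 0.60, 0.32, 0.43),
    (0.5, 0.5, 0.0): (0.73, 0.49, 0.12, 0.20),
    (0.3, 0.7, -1.0): (0.83, 0.68, 0.45, 0.55),
    (0.3, 0.7, -0.5): (0.79, 0.58, 0.33, 0.42),
    (0.3, 0.7, 0.0): (0.71, 0.45, 0.09, 0.16),
}


def margins(bp_avg, bp_min, as_avg, as_max):
    """Return (BP min - AS max, difference between averages), rounded like the table."""
    return round(bp_min - as_max, 2), round(bp_avg - as_avg, 2)


for setting, row in SCORES.items():
    print(setting, margins(*row))

# Setting with the widest gap between the two averages.
best = max(SCORES, key=lambda s: SCORES[s][0] - SCORES[s][2])
print("largest difference between averages:", best)
```

Under the stricter criterion (BP min - AS max), two settings tie at 0.29, so the choice of criterion matters.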

Main experiment: BoW comparison

24 fairy tales from Andrew Lang's Fairy Books, grouped into 5 clusters by ATU fairy-tale category:

• Supernatural Adversaries — Bluebeard; Hansel and Gretel; Jack and the Beanstalk; Rapunzel; The Twelve Dancing Princesses.

• Supernatural or Enchanted Relatives — Beauty and the Beast; Brother and Sister; East of the Sun, West of the Moon; Snow White and Rose Red; The Bushy Bride; The Six Swans; The Sleeping Beauty.

• Supernatural Helpers — Cinderella; Donkey Skin; Puss in Boots; Rumpelstiltskin; The Goose Girl; The Story of Sigurd.

• Magic Objects — Aladdin and the Wonderful Lamp; Fortunatus and His Purse; The Golden Goose; The Magic Ring.

• Other Stories of the Supernatural — Little Thumb; The Princess and the Pea.

For each story, measure its average similarity to the other stories in its cluster (within) and to the stories in other clusters (across).
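The within/across measurement can be sketched as follows, assuming a bag-of-words cosine similarity over raw word counts (the slides do not specify the BoW weighting) and toy one-line "stories". The names `cosine_bow` and `within_across` are hypothetical. For each story, within is its mean similarity to the other members of its cluster and across is its mean similarity to all stories in other clusters; the final tally counts stories whose within score exceeds their across score, the quantity reported in the last row of the results table.

```python
from collections import Counter
from math import sqrt


def cosine_bow(text_a, text_b):
    """Cosine similarity of raw word-count vectors (assumed BoW weighting)."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def _mean(xs):
    return sum(xs) / len(xs) if xs else 0.0


def within_across(stories, cluster_of, sim):
    """Return {story: (mean within-cluster sim, mean across-cluster sim)}."""
    result = {}
    for s in stories:
        within = [sim(stories[s], stories[t]) for t in stories
                  if t != s and cluster_of[t] == cluster_of[s]]
        across = [sim(stories[s], stories[t]) for t in stories
                  if cluster_of[t] != cluster_of[s]]
        result[s] = (_mean(within), _mean(across))
    return result


stories = {  # toy texts, not the Lang fairy tales
    "A": "the witch lures the children",
    "B": "the witch traps the girl",
    "C": "the magic lamp grants wishes",
    "D": "the magic ring grants wishes",
}
cluster_of = {"A": "adversary", "B": "adversary", "C": "object", "D": "object"}

scores = within_across(stories, cluster_of, cosine_bow)
hits = sum(w > a for w, a in scores.values())
print(scores)
print(f"within > across: {hits} of {len(scores)}")
```

Any pairwise similarity (plot graph, BoW, or a combination of the two) can be passed as `sim`, which is how the three column groups of the results table differ.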

http://www.gutenberg.org/ebooks/30580

| Story type | Story | Plot graph: within | Plot graph: across | Bag of words: within | Bag of words: across | Combination: within | Combination: across |
|---|---|---|---|---|---|---|---|
| Supernatural adversaries | Bluebeard | 0.1000 | 0.1037 | 0.8629 | 0.8618 | 0.4814 | 0.4586 |
| | Hansel and Gretel | 0.1075 | 0.1157 | 0.8492 | 0.8630 | 0.4783 | 0.4894 |
| | Jack and the Beanstalk | 0.1050 | 0.1110 | 0.9050 | 0.8891 | 0.5050 | 0.5001 |
| | Rapunzel | 0.1000 | 0.1047 | 0.8790 | 0.8575 | 0.4895 | 0.4571 |
| | The Twelve Dancing Princesses | 0.1125 | 0.1073 | 0.8808 | 0.8631 | 0.4966 | 0.4610 |
| Supernatural or enchanted relatives | Beauty and the Beast | 0.0767 | 0.0705 | 0.8803 | 0.8605 | 0.4785 | 0.4397 |
| | Brother and Sister | 0.1233 | 0.1135 | 0.8881 | 0.8722 | 0.5057 | 0.4654 |
| | East of the Sun, West of the Moon | 0.1117 | 0.1012 | 0.8914 | 0.8571 | 0.5015 | 0.4525 |
| | Snow White and Rose Red | 0.1200 | 0.1165 | 0.8650 | 0.8566 | 0.4925 | 0.4865 |
| | The Bushy Bride | 0.1200 | 0.1182 | 0.8862 | 0.8739 | 0.5031 | 0.4960 |
| | The Six Swans | 0.0925 | 0.1100 | 0.9006 | 0.8662 | 0.5020 | 0.4881 |
| | The Sleeping Beauty | 0.1125 | 0.1194 | 0.8990 | 0.8918 | 0.5087 | 0.5056 |
| Supernatural helpers | Cinderella | 0.1180 | 0.1144 | 0.8150 | 0.8306 | 0.4665 | 0.4725 |
| | Donkey Skin | 0.1040 | 0.1122 | 0.8873 | 0.9025 | 0.4956 | 0.5074 |
| | Puss in Boots | 0.1175 | 0.1095 | 0.8170 | 0.8486 | 0.4672 | 0.4551 |
| | Rumpelstiltskin | 0.0750 | 0.0858 | 0.8467 | 0.8569 | 0.4609 | 0.4478 |
| | The Goose Girl | 0.1240 | 0.1178 | 0.8617 | 0.8624 | 0.4928 | 0.4643 |
| | The Story of Sigurd | 0.1080 | 0.1178 | 0.8516 | 0.8670 | 0.4800 | 0.4664 |
| Magic objects | Aladdin and the Wonderful Lamp | 0.0975 | 0.0910 | 0.8958 | 0.8664 | 0.4946 | 0.4559 |
| | Fortunatus and His Purse | 0.1133 | 0.1185 | 0.8945 | 0.8306 | 0.5039 | 0.4519 |
| | The Golden Goose | 0.1033 | 0.1155 | 0.9006 | 0.8529 | 0.5012 | 0.4611 |
| | The Magic Ring | 0.1033 | 0.1040 | 0.9120 | 0.8960 | 0.5077 | 0.4762 |
| Other stories | Little Thumb | 0.0300 | 0.1214 | 0.7444 | 0.8562 | 0.3872 | 0.4675 |
| | The Princess and the Pea | 0.0300 | 0.0405 | 0.7444 | 0.7844 | 0.3872 | 0.3945 |
| # Similarity within > across | | 10 (41.67%) | | 15 (62.50%) | | 19 (79.17%) | |

Analysis & Discussion

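• Errors in automatic construction: dependency parses are not true semantic graphs (e.g. “along came a mouse” vs. “a mouse came” yield different structures), and coreference resolution makes mistakes.

• Consistent with the findings of Fisseni & Löwe (2012): perhaps people judge folktale similarity more by content and motifs than by structure?

• Combining plot-graph and BoW similarity yields the best results (19 of 24 stories more similar within their cluster than across clusters).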

THANK YOU
