Toward Sequencing “Narrative DNA”: Tale Types, Motif Strings and Memetic Pathways

Abstract

The Aarne-Thompson-Uther Tale Type Catalog (ATU) is a bibliographic tool that uses metadata from tale content, called motifs, to define tale types as canonical motif sequences. The motifs themselves are listed in another bibliographic tool, the Aarne-Thompson Motif Index (AaTh). Tale types in ATU are defined in an abstracted fashion and can be processed like a corpus. We analyzed 219 types with 1202 motifs from the “Tales of magic” segment (types 300-749) to illustrate that motif sequences show signs of recombination in the storytelling process. By analogy with chromosome mutations in genetics, we offer examples of insertion/deletion, duplication and, possibly, transposition; the sample was too small to reveal inverted motif strings. These initial findings encourage efforts to sequence motif strings like DNA, for instance by finding the longest common motif subsequences in tales. Expressing the network of motif connections as graphs suggests that tale plots, as consolidated pathways of content, help one memorize culturally engraved messages. We anticipate a connection between such networks and Waddington’s epigenetic landscape.

Page 1: Toward Sequencing “Narrative DNA”: Tale Types, Motif Strings and Memetic Pathways

Sándor Darányi, Peter Wittek & László Forró†

Swedish School of Library and Information Science

University of Borås

50190 Borås, Allégatan 1, Sweden

†8220 Balatonalmádi, Remetevölgyi út 27, Hungary

Page 2

Acknowledgements

• AMICUS project 2009-2012 (Automated Motif Discovery in Cultural Heritage and Scientific Communication Texts, Netherlands Organization for Scientific Research, NWO Humanities)

2 CMN-12 Istanbul -- May 26, 2012

Page 3

Structure of presentation

I. Frame of thought, concepts used

II. Experiment design

III. Results

IV. Future research directions

Page 4

I. Frame of thought, concepts used

• Examples of formulaicity

• Standard tools in tale research

• What is a motif?

• The genetic-memetic parallel

Page 5

Examples of formulaicity (= structure) in narrative research

• Propp (1929): Russian fairy tales have 7 actors (dramatis personae), 31 functions (types of actions) and 150 narrative elements

• Lévi-Strauss (1954): both narrative segments within myths and myth variants manifest canonical content transformations

• Harris (1998): Disciplinary scientific content can be expressed by sentences of abstract concepts

Page 6

Standard reference works for tale research

ATU: Uther, H. J. 2004. The Types of International Folktales. A Classification and Bibliography. Based on the System of Antti Aarne and Stith Thompson 1–3 (FFC 284–286). Academia Scientiarum Fennica, Helsinki.

• TALES OF MAGIC, SUPERNATURAL ADVERSARIES 300-399

• Tale type 300: The Dragon-Slayer. “A youth acquires (e.g. by exchange) three wonderful dogs [B421, B312.2]. He comes to a town where people are mourning and learns that once a year a (seven-headed) dragon [B11.2.3.1] demands a virgin as a sacrifice [B11.10, S262]. In the current year, the king's daughter has been chosen to be sacrificed, and the king offers her as a prize to her rescuer [T68.1]. The youth goes to the appointed place. While waiting to fight with the dragon, he falls into a magic sleep [D1975], during which the princess twists a ring (ribbons) into his hair; only one of her falling tears can awaken him [D1978.2]. (…)”

AaTh: Thompson, S. 1955-1958. Motif-Index of Folk-Literature 1–6. Indiana University Press, Bloomington.

• B312. /Helpful animals obtained by purchase or gift./

• B312.1. /Helpful animals a gift./ German Grimm No. 60, 126; Irish myth: Cross; Spanish: Boggs FFC XC 40 No. 300; Icel.: Boberg, Þiðriks saga I 314--18; India: Thompson-Balys; Japanese: Ikeda.

• B312.2. /Helpful animals obtained by exchange./ *Type 300; *Hartland Perseus III 195; De Gubernatis Zool. Myth. III 36 n.--N. A. Indian: Thompson CColl II 329ff.

• B312.3. /Helpful animal(s) bequeathed to hero./ Italian Novella: Rotunda; India: Thompson-Balys; Africa (Hausa): Best Black Folk Tales 71ff., Tremearne Hausa Superstitions and Customs 374ff. No. 79; Madagascar: (Marofotsy) Renel Contes de Madagascar I 65ff. No. 9. (…)

Page 7

What is a motif?

Page 8

Argumentation

• From “narrative DNA” (Bruce 1996, cf. postmoderns) to “narrative genomics” (Malec)

• Perceived formal similarities between the genetic code and a “memetic code” in tale types

– Memes (Dawkins 1976):

• An idea, behavior or style that spreads from person to person within a culture

• Self-replicating unit of cultural transmission with potential significance in explaining human behavior and cultural evolution

– “Memetic pathway”: memory engraving by frequency-based (i.e. repetition-based) content

• If the parallel holds, DNA sequencing techniques become applicable to motif sequences

• Benefits:

– Memetics lacks measurable evidence

– Narrative analysis lacks evolutionary modelling tools

– Tale types as metadata may bridge the gap

Page 9

Ingredients: tale types as motif sequences

300 The Dragon-Slayer. B421 B312.2 B11.2.3.1 B11.10 S262 T68.1 D1975 D1978.2 B11.11

300A The Fight on the Bridge. T511.5.1 F601 B631 B11.2.3.2 B11.2.3.3 B11.2.3.5 B11.2.3 B11.11 B401

301 The Three Stolen Princesses. H1385.1 F102.1 N773 B631 T615 F601 F451.5.2 F92 F96

301D* The Princess's Ring. T68.1 B11.11 K1933 C611 H94 L161

302 The Ogre's (Devil's) Heart in the Egg. B393 B500 D1834 R11.1 D152.2 D182.2 E710 K975.2 E711.1

302B Life Dependent on a Sword. T510 F601 T11.2 E711.10

302C* The Magic Horse. C611

303 The Twins or Blood-Brothers. T511.5.1 T511.1.1 T512 T589.7.1 E761 R111.1.3 K1932 H83 L161

303A Brothers Seek Sisters as Wives. T69.1 D231 R11.1 E715.1 R155.1

304 The Dangerous Night-Watch. F666.1 K912 H83 N711.2 H81.1 H81.1.1 T475.2 Q481 H11.1.1

305 The Dragon's Heart-Blood as Remedy. D1500.1.7.3.3 K1935

306 The Danced-out Shoes. H508.2 D1980 K625.1 D2131 F1015.1.1 H80 L161 F87

307 The Princess in the Coffin. C758.1 S223 E251 N825.2 D791.1.7 L162

310 The Maiden in the Tower (Petrosinella, Rapunzel). G279.2 S222.1 G204 R41.2 F848.1 F555 N455 D642.7 L162

311 Rescue by the Sister. R11.1 T721.5 C611 C227 C913 C920 R157.1 G561 K525

311B* The Singing Bag. K526

312 Maiden-Killer (Bluebeard). S62.1 C611 C920 K551 G551.1 G652

312D Rescue by the Brother. T511.3 F611.1

313 The Magic Flight. B261 S222 S222 S240 G465 H1104 H1113 H1154.8 H335.0.1

314 Goldener. S211 G462 B316 C611 C912 D672 B184.1.6 G461 C912

314A The Shepherd and the Three Giants. D817 L113.1.4 G500 B184.1 R222 L161
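The motif strings above share a regular shape: a type number, a title ending in a period, then motif codes in catalogue order. A minimal Python sketch for turning such lines into sequences follows. The regular expression for a motif code (one capital letter, a number, optional dot-separated sub-numbers) is an assumption inferred from the codes listed here, not the catalogue's formal specification, and `parse_tale_type` and `TaleType` are hypothetical names.

```python
import re
from typing import List, NamedTuple

# Assumed shape of a motif code: one capital letter, a number, then optional
# dot-separated sub-numbers, e.g. "B11.2.3.1" or "H335.0.1".
MOTIF = re.compile(r"[A-Z]\d+(?:\.\d+)*")

class TaleType(NamedTuple):
    type_id: str          # e.g. "300", "300A", "301D*"
    title: str            # e.g. "The Dragon-Slayer"
    motifs: List[str]     # motif codes in catalogue order

def parse_tale_type(line: str) -> TaleType:
    """Split one 'type title. motif motif ...' line into its parts."""
    tokens = line.split()
    type_id, rest = tokens[0], tokens[1:]
    # Walk back from the end while tokens look like motif codes.
    cut = len(rest)
    while cut > 0 and MOTIF.fullmatch(rest[cut - 1]):
        cut -= 1
    title = " ".join(rest[:cut]).rstrip(".")
    return TaleType(type_id, title, rest[cut:])
```

For example, `parse_tale_type("313 The Magic Flight. B261 S222 S222 S240 G465 H1104 H1113 H1154.8 H335.0.1").motifs[:3]` gives `['B261', 'S222', 'S222']`, keeping the repeated S222 that later screening looks for. Codes fused without a separating space would not be split by this sketch and need cleaning first.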

Page 10

Mutant screening in genetics:

Kinds of mutation 1 (DNA)

Page 11

Kinds of mutation 2 (chromosomes)

Page 12

Kinds of mutation 3 (chromosomes)

With tale types as motif sequences, we can answer questions like:

– Are there repeated motif substrings?

– Are there inverted motif substrings?

– Etc.

These are important for plot formation.
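One way to pose these questions computationally is to index every contiguous motif substring of a fixed length. The sketch below illustrates that idea and is not the authors' procedure (the slides describe manual screening); the function names are hypothetical, and an inversion is assumed to mean a substring of two or more motifs whose reversed order appears in the same or another motif string.

```python
from collections import defaultdict

def substring_positions(seq, n):
    """Map each contiguous length-n motif substring (a tuple) to its start indices."""
    positions = defaultdict(list)
    for i in range(len(seq) - n + 1):
        positions[tuple(seq[i:i + n])].append(i)
    return positions

def repeated_substrings(seq, n=1):
    """Length-n substrings that occur more than once in one motif string."""
    return {s: p for s, p in substring_positions(seq, n).items() if len(p) > 1}

def inverted_substrings(a, b, n=2):
    """Length-n substrings of `a` whose reversal occurs in `b`.

    Pass the same list twice to search within one tale type.
    Palindromic substrings (e.g. S222 S222) are skipped.
    """
    in_b = substring_positions(b, n)
    return {s for s in substring_positions(a, n) if s != s[::-1] and s[::-1] in in_b}
```

Applied to type 313 as listed above, `repeated_substrings(motifs, 1)` flags S222 at positions 1 and 2; for type 314 it flags C912 at positions 4 and 8.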

Page 13

The genetic-memetic parallel

Narrative side (alphabet buildup → result):

Character sequence → Lexeme (?)

Lexeme (?) sequence → Meme/motif

Meme/motif sequence → Story/tale

Story/tale set (inherent but not sufficient?) → Corpus

Genetic side (alphabet buildup → result):

Nucleotide sequence → Amino acid

Amino acid sequence → Gene

Gene sequence → Chromosome

Chromosome set (inherent but not sufficient) → Cell

Page 14

II. Experiment design

• Instead of the original natural-language texts, metadata were used: tale types from ATU and motifs from AaTh

– Zooming in = decreased content granularity

• Tale types as motif strings, processed as a corpus

• Binary vs. frequency-based matrix of 219 types × 1202 motifs (“Tales of magic”, types 300-749)

• Block (2-mode) clustering for motif co-occurrence analysis (HCE-3)

• Manual screening of motif strings for mutation types

• Network analysis and visualization
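The matrix step of this design can be sketched in plain Python. The sketch assumes the input is a dict from tale type id to motif list (hypothetical layout). The motif × motif co-occurrence table at the end (the binary matrix transposed times itself) is shown only as a typical input for co-occurrence analysis; it does not reproduce the block clustering done in HCE-3.

```python
from collections import Counter

def type_motif_matrices(tale_types):
    """Build frequency and binary type x motif matrices.

    tale_types: dict mapping a tale type id to its motif list.
    Returns (rows, cols, freq, binary); rows/cols are sorted as strings.
    """
    rows = sorted(tale_types)
    cols = sorted({m for motifs in tale_types.values() for m in motifs})
    col_index = {m: j for j, m in enumerate(cols)}
    freq = [[0] * len(cols) for _ in rows]
    for i, type_id in enumerate(rows):
        for motif, count in Counter(tale_types[type_id]).items():
            freq[i][col_index[motif]] = count
    binary = [[1 if count else 0 for count in row] for row in freq]
    return rows, cols, freq, binary

def cooccurrence(binary):
    """Motif x motif table: number of tale types containing both motifs (B^T B)."""
    n = len(binary[0]) if binary else 0
    return [[sum(row[a] * row[b] for row in binary) for b in range(n)]
            for a in range(n)]
```

The frequency matrix keeps repeated motifs (such as S222 in type 313) that the binary matrix flattens to 1. At the full 219 × 1202 scale, the nested-loop co-occurrence step would be slow; a NumPy matrix product (`B.T @ B`) would do the same computation far faster.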

Page 15

III. Results

• Findings are indicative only, but:

– The top observation unit is not the motif but collocated motif co-occurrences (multiplets)

– Motif sequences show signs of recombination in the storytelling process; most chromosome mutation types are present even in a limited sample

– Motif strings form highly complex networks

Page 16

Motif multiplets

• Tale types are a shorthand for the originals

• Motifs sometimes co-occur in tale types, i.e. motifs are not the ultimate observation units
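The slides do not spell out how multiplets were identified. As one assumed operationalization, a collocated co-occurrence can be read as a run of adjacent motifs shared by several tale types. The hypothetical `candidate_multiplets` below lists such runs of length `n` that occur in at least `min_types` types; both thresholds are illustrative, not values from the study.

```python
from collections import defaultdict

def candidate_multiplets(tale_types, n=2, min_types=2):
    """Runs of n adjacent motifs that recur in at least `min_types` tale types.

    Assumed reading of 'collocated motif co-occurrence'; thresholds are illustrative.
    tale_types: dict mapping a tale type id to its motif list.
    """
    found_in = defaultdict(set)  # motif run -> ids of tale types containing it
    for type_id, motifs in tale_types.items():
        for i in range(len(motifs) - n + 1):
            found_in[tuple(motifs[i:i + n])].add(type_id)
    return {run: sorted(ids) for run, ids in found_in.items() if len(ids) >= min_types}
```

Raising `n` looks for longer collocations. A run that recurs across many types is a candidate for treatment as a single observation unit rather than as separate motifs.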

Page 17

Page 18

Tale type and chromosome mutations are similar

• In a sample of 219 types × 1202 motifs, hints were found of insertion/deletion, duplication and, possibly, transposition; the sample was too small to reveal inversion

• Screening on a larger sample is necessary
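Screening was done manually. To scale it to a larger sample, one could test whether one motif string is a single-step variant of another. The brute-force sketch below checks the chromosome mutation types named above on small lists. Its definitions are assumptions: duplication is taken as a tandem copy of a block, and transposition as moving one block elsewhere. A pair can satisfy more than one label; for instance, every duplication is also an insertion.

```python
def single_mutations(a, b):
    """Labels under which motif string b is a one-step variant of motif string a.

    Brute force over all contiguous blocks of a; fine for strings of ~10 motifs.
    """
    a, b = list(a), list(b)
    if a == b:
        return set()
    labels = set()
    for i in range(len(a)):
        for j in range(i + 1, len(a) + 1):
            block, rest = a[i:j], a[:i] + a[j:]
            if rest == b:
                labels.add("deletion")
            if a[:j] + block + a[j:] == b:
                labels.add("duplication")  # tandem copy right after the block
            if j - i > 1 and a[:i] + block[::-1] + a[j:] == b:
                labels.add("inversion")
            # Transposition: re-insert the block anywhere except where it was.
            for k in range(len(rest) + 1):
                if k != i and rest[:k] + block + rest[k:] == b:
                    labels.add("transposition")
    # Insertion is deletion seen from b's side.
    for i in range(len(b)):
        for j in range(i + 1, len(b) + 1):
            if b[:i] + b[j:] == a:
                labels.add("insertion")
    return labels
```

Swapping two adjacent motifs satisfies both `inversion` and `transposition`, which is why the function returns a set of labels instead of a single verdict.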

Page 19

Tale mutant screening: The needle in the haystack

Page 20

IV. Future research directions

• Sequence mining is feasible: find the longest common motif subsequences in motif strings as if they were DNA
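The longest common subsequence (LCS) of two motif strings is the longest ordered list of motifs that both contain, with gaps allowed. Below is the textbook dynamic-programming LCS in Python, applied to motif lists instead of DNA bases. It is a generic sketch, not the authors' implementation, and it returns one of possibly several optimal subsequences.

```python
def longest_common_subsequence(a, b):
    """Classic dynamic-programming LCS over two motif lists (gaps allowed)."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of the suffixes a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk forward through the table to recover one optimal subsequence.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out
```

On the 300 (The Dragon-Slayer) and 301D* (The Princess's Ring) strings listed earlier, it returns `['T68.1', 'B11.11']`. Finding the longest shared contiguous run instead (a longest common substring) would need a different recurrence.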

Page 21

Motif network processing and visualization

• The network points to fitness landscapes and to evolutionary algorithms (genetic algorithms, GA, and genetic programming, GP)

• Expressing the network of motif connections as directed graphs suggests that tale plots, as consolidated pathways of content, help one memorize culturally engraved messages

• Identifying plot direction with graph direction, we anticipate a connection between such networks and fitness landscapes (e.g. Waddington’s epigenetic landscape)

Page 22

Toward storytelling as a landscape

Translate motif chains to a directed graph

Convert the directed graph into a fitness landscape
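The first step, translating motif chains into a directed graph, can be sketched with an edge counter. Each edge links a motif to the motif that immediately follows it and is weighted by the number of tale types containing that transition. Treating heavily shared edges as "consolidated pathways," and the `min_weight` cut-off, are assumptions for illustration. The slides do not specify how the graph becomes a fitness landscape, so that step is not attempted here.

```python
from collections import Counter

def motif_graph(tale_types):
    """Directed motif graph as an edge counter.

    Edge (x, y) is weighted by the number of tale types in which motif x
    is immediately followed by motif y (each type counted once per edge).
    tale_types: dict mapping a tale type id to its motif list.
    """
    edges = Counter()
    for motifs in tale_types.values():
        edges.update(set(zip(motifs, motifs[1:])))
    return edges

def consolidated_edges(edges, min_weight=2):
    """Transitions shared by at least `min_weight` tale types (assumed cut-off)."""
    return {edge: w for edge, w in edges.items() if w >= min_weight}
```

Repeated adjacent motifs, such as S222 S222 in type 313, appear as self-loops. For layout or further analysis, the counter can be passed to a graph library, e.g. networkx's `DiGraph.add_weighted_edges_from((x, y, w) for (x, y), w in edges.items())`.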

Page 23

Thank you for your attention!

Borås, 11.05.2012