Post on 17-Dec-2015
Shyamala Doraisamy, Faculty of Computer Science and Information Technology, University Putra Malaysia, Malaysia
Stefan Rüger, Knowledge Media Institute, The Open University, United Kingdom
Towards Automatic Topic Detection of Folksongs
THEMATIC CATEGORIES
AUTOMATIC TOPIC MODELLING
PRELIMINARY RESULTS
ID Topic
T1 Child Ballads
T2 Other Ballads and Narrative Songs
T3 Thwarted Love
T4 Love and Courtship
T5 Lover’s Farewell
T6 Lover’s Return
T7 False Hearted Lovers and Seducers
T8 Cuckolds
T9 Burdens of Single and Married Life
T10 Adventurous and Crafty Maidens
T11 Rakes, Robbers and Highwaymen
T12 Country Life
T13 Sports and Pastimes
T14 Sailors and Sea Songs
T15 Soldiers and War Songs
T16 Humorous Songs
T17 Nonsense and Nursery Songs
T18 Cumulative and Enumerative Songs
T19 Carols, Religious and Wassail Songs
T20 Various
T21 Fragments
REFERENCES
•Experimentation
•940 folksongs were obtained from www.folkinfo.org in abc music notation format
•Pre-processed to remove notation tags, hyphens and punctuation marks
•Topic analysis performed using the GibbsLDA++ package [4]
•The number of topics was set to 5, 10, 15, 20 and 25
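The pre-processing step above can be sketched as follows. This is a minimal sketch, assuming the lyrics are taken only from the abc `w:` (aligned) and `W:` (trailing verse) lines, and that GibbsLDA++ expects a data file whose first line is the number of documents followed by one whitespace-separated document per line. The function names `abc_to_tokens` and `to_gibbslda_input` are hypothetical and not taken from the poster.

```python
import re

# Only lyric lines are kept: 'w:' (aligned under the notes) and 'W:' (verses after
# the tune). Header fields (X:, T:, M:, K:, ...) and music lines are skipped.
LYRIC_LINE = re.compile(r"^[wW]:(.*)$")


def abc_to_tokens(abc_text):
    """Extract lower-case word tokens from the lyric lines of one abc tune."""
    tokens = []
    for line in abc_text.splitlines():
        match = LYRIC_LINE.match(line.strip())
        if match is None:
            continue  # header field or music line
        lyric = match.group(1)
        # abc lyric symbols: '~' joins words under one note, '_' holds a note,
        # '*' skips a note, '|' advances to the next bar; treat them as spaces.
        lyric = re.sub(r"[~_*|]", " ", lyric)
        # '-' splits a word into syllables ("coun-try"); removing it rejoins the
        # word (a simplification: a free-standing dash also merges its neighbours).
        lyric = re.sub(r"\s*-\s*", "", lyric)
        # Drop apostrophes ("sailor's" -> "sailors"), then keep only runs of
        # letters, which discards the remaining punctuation marks.
        lyric = lyric.lower().replace("'", "").replace("\u2019", "")
        tokens.extend(re.findall(r"[a-z]+", lyric))
    return tokens


def to_gibbslda_input(docs):
    """Format token lists as a GibbsLDA++ data file (assumed layout: M, then docs)."""
    non_empty = [doc for doc in docs if doc]  # songs with no lyric lines are dropped
    lines = [str(len(non_empty))] + [" ".join(doc) for doc in non_empty]
    return "\n".join(lines) + "\n"
```

The resulting text can be saved as a data file and passed to the GibbsLDA++ `lda` program with `-est -ntopics K -dfile <file>`, once per topic count; keeping each run in its own directory avoids overwriting the model files written next to the data file.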
•Results
•Output topics were analysed and mapped to the topics listed in Cecil Sharp’s Collection of English Folk Songs [2]
•With 10 topics, an approximate mapping could be made, as shown in the Results Table
•With more than 10 topics, too many junk (insignificant) topics were identified
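Manually labelling an output topic usually starts from its most probable words. GibbsLDA++ can write these itself (the `-twords` option); the sketch below shows the same ranking, assuming the `model-final.phi` layout of one topic per line with one probability per word id, and a `wordmap.txt` whose first line is the vocabulary size followed by `word id` pairs. The helper names are hypothetical.

```python
def parse_wordmap(text):
    """Map word id -> word from wordmap.txt text (first line is the word count)."""
    id2word = {}
    for line in text.strip().splitlines()[1:]:
        word, idx = line.split()
        id2word[int(idx)] = word
    return id2word


def parse_phi(text):
    """Parse model-final.phi text: one topic per line, one probability per word id."""
    return [[float(x) for x in line.split()] for line in text.strip().splitlines()]


def top_words(phi, id2word, n=10):
    """Return the n most probable words of each topic, highest probability first."""
    result = []
    for row in phi:
        ranked = sorted(range(len(row)), key=lambda w: row[w], reverse=True)
        result.append([id2word[w] for w in ranked[:n]])
    return result
```

The word lists returned per topic are what a human labeller (or a subject matter expert) would inspect before assigning a label such as "Sailors" or "Hunting Songs".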
Results Table
Topic No. | Manual label of output topic | Mapped to listed topics
0 | Carol | Carols, Religious and Wassail Songs (T19)
1 | Sailors | Sailors and Sea Songs; Soldiers and War Songs (T14, T15)
2 | Ballads | Rakes, Robbers and Highwaymen; Country Life (T3, T4)
3 | Hunting Songs | Sports and Pastimes (T13)
4 | Land/country life | Lover’s Farewell; Lover’s Return; Country Life (T5, T6, T12)
5 | Difficult Life | Burdens of Single and Married Life (T9)
6 | Scottish/old English | -
7 | Junk/Insignificant | -
8 | Happy Love | Love and Courtship (T4)
9 | Grand/Royal | Child Ballads (T1)
Topics from Cecil Sharp’s Collection of English Folk Songs [2]
• Preliminary results show the feasibility of topic modelling of folksongs using LDA
•Further investigation is needed to reduce the number of insignificant topics identified
•Future work
•Test Topic Significance Ranking techniques [3] to eliminate insignificant topics
•Validate performance with subject matter experts
•Larger data collections comprising folksongs in English from America, Australia, etc.
•Formal Folk Song Collections and Bibliographies
•Examples of collections with thematic categorisations
•Cecil Sharp’s Collection of English Folk Songs [2]
•David Atkinson, English Folksong Bibliography: An Introductory Bibliography Based on the Holdings of the Vaughan Williams Memorial Library, 3rd (electronic) edition, 2006
•Informal collections on the Internet
•e.g. http://www.folkinfo.org, an alphabetically organised folksong collection
• Modelling text corpora and other discrete data collections to find short descriptions of the members of a collection that enable efficient processing of large collections
• Topic modelling has been applied to song lyrics text corpora
• Few, if any, related studies exist on English folksong lyrics from the English tradition
[1] Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet Allocation. Journal of Machine Learning Research 3, 993-1022 (2003).
[2] Cecil Sharp’s Collection of English Folk Songs, edited by Maud Karpeles, Vol. 1 & 2, Oxford University Press, 1974.
[3] AlSumait, L., Barbará, D., Gentle, J., Domeniconi, C.: Topic Significance Ranking of LDA Generative Models. In: W. Buntine et al. (Eds.): ECML PKDD 2009, Part I, LNAI 5781, pp. 67-82 (2009).
[4] http://gibbslda.sourceforge.net
OBSERVATIONS
Formal Databases
- e.g. an index search of the Roud Folksong Index from the Vaughan Williams Memorial Library (VWML) at www.library.efdss.org
Informal Databases
- e.g. www.folkinfo.org, indexed alphabetically and providing notation, lyrics, notes and descriptions of songs, plus the song index number (e.g. Roud index) where available
•Folksong collections are generally indexed by the data recorded by the collector, such as title, place collected and performer
• Folksongs come from an oral tradition, so several lyric versions of the same song may exist
•Thematic categorisation of folksongs is commonly performed by collectors or bibliographers
• This task requires a subjective analysis of the lyrics
•Automated topic modelling would be useful to support folksong thematic categorisation tasks
• Topic Significance Ranking
•To evaluate topic significance using the approach proposed by AlSumait et al. [3]
•The distance between a topic's distribution and each of three definitions of a “junk distribution” is computed to determine topic significance
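The distance computation above can be sketched as follows. As we read AlSumait et al. [3], the three junk distributions are a uniform distribution over the vocabulary (W-Uniform), the corpus-wide marginal word distribution (W-Vacuous), and a uniform distribution over documents (D-BGround). This sketch scores each topic against all three using KL divergence only; the full Topic Significance Ranking method also uses other distance measures and combines standardised, weighted scores, which is omitted here. Estimating the topic prior p(k) as the mean document-topic proportion is an assumption, and `junk_distances` is a hypothetical name.

```python
import math


def kl(p, q):
    """KL divergence KL(p || q) in nats; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def junk_distances(phi, theta):
    """Distance of each topic from the three junk distributions (larger = less junk).

    phi:   K x V topic-word probabilities p(w | k)
    theta: D x K document-topic probabilities p(k | d)
    """
    K, V, D = len(phi), len(phi[0]), len(theta)
    # Assumed topic prior: p(k) = mean of p(k | d) over documents.
    p_k = [sum(theta[d][k] for d in range(D)) / D for k in range(K)]
    # W-Vacuous: marginal word distribution p(w) = sum_k p(w | k) p(k).
    vacuous = [sum(phi[k][w] * p_k[k] for k in range(K)) for w in range(V)]
    uniform_words = [1.0 / V] * V
    uniform_docs = [1.0 / D] * D
    results = []
    for k in range(K):
        # p(d | k) with a uniform document prior: normalise column k of theta.
        column = [theta[d][k] for d in range(D)]
        total = sum(column)
        p_d_given_k = [x / total for x in column]
        results.append({
            "W-Uniform": kl(phi[k], uniform_words),
            "W-Vacuous": kl(phi[k], vacuous),
            "D-BGround": kl(p_d_given_k, uniform_docs),
        })
    return results
```

A topic whose three distances are all small looks like one of the junk distributions and is a candidate for removal, such as output topic 7 (Junk/Insignificant).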
[Figure: example song entry with Lyrics, Notation and Discussion Notes panels]
There was a Lady …, Lay the Bent to the …, And she had lovely …, Fa, la la la, fa, la …
There was a Knight of Noble …, Which also lived in the …
• Folksong lyrics vs contemporary music lyrics
•Classification
•Genres vs themes
•Vocabulary
•Modern vs Old English
• To use Latent Dirichlet Allocation (LDA), a generative probabilistic model proposed by Blei et al. [1], for topic modelling
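GibbsLDA++ [4] fits LDA by Gibbs sampling. The pure-Python collapsed Gibbs sampler below illustrates that kind of inference on word-id documents. It is a minimal teaching sketch, not GibbsLDA++'s code; the function name `lda_gibbs`, the fixed seed and the iteration count are illustrative assumptions, and the hyperparameters alpha and beta are left to the caller.

```python
import random


def lda_gibbs(docs, K, V, alpha, beta, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of word ids in range(V).
    Returns (phi, theta): K x V topic-word and D x K document-topic estimates.
    """
    rng = random.Random(seed)
    D = len(docs)
    ndk = [[0] * K for _ in range(D)]  # tokens in document d assigned to topic k
    nkw = [[0] * V for _ in range(K)]  # times word w is assigned to topic k
    nk = [0] * K                       # total tokens assigned to topic k
    z = []                             # current topic of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | z_-i, w), up to a constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    phi = [[(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)] for k in range(K)]
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return phi, theta
```

Each row of `phi` is one topic's word distribution (the basis for labels such as "Sailors"), and each row of `theta` gives one song's mixture over the K topics.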
[Figure: system overview with the stages Folksong Lyrics Collection, Latent Topic Analysis, Topic models and Labeled models]