Ways of searching for the Zeitgeist of Modernity - a corpus-based approach to modern fiction

Post on 23-Feb-2016


Ilina Doykova
Shumen University, Shumen (Bulgaria)

ilina.doykova@abv.bg

Statistical analysis

• Simple things may characterise different styles

– average sentence length
– average word length
– vocabulary richness
– vocabulary growth (homogeneity of text)

• More complex analyses give a more interesting picture

– specific syntactic structures
– degree of modification in NPs
– types of verbs (e.g. verbs of persuasion, speech verbs, action verbs, descriptive verbs)
– distribution of pronouns (1st/2nd/3rd person)
– themes, beliefs, etc.
– authorship

• Especially when used comparatively
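The simple measures listed above can be approximated in a few lines of code. The following is a minimal sketch, not the procedure used in WordSmith or Wmatrix: the regex tokenizer, the sentence splitter, the function names and the sample sentence are all assumptions made for illustration, and vocabulary richness is taken here to mean the type/token ratio.

```python
import re

def tokenize(text):
    """Lower-case word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def split_sentences(text):
    """Split after ., ! or ? followed by whitespace (abbreviations are not handled)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def style_profile(text):
    """Average sentence length, average word length and type/token ratio."""
    sentences = split_sentences(text)
    words = tokenize(text)
    if not words or not sentences:
        raise ValueError("text contains no words")
    return {
        "avg_sentence_length": len(words) / len(sentences),   # words per sentence
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len(set(words)) / len(words),     # vocabulary richness
    }

def vocabulary_growth(text, step=5):
    """Number of distinct word types seen after every `step` tokens."""
    seen, curve = set(), []
    for i, word in enumerate(tokenize(text), start=1):
        seen.add(word)
        if i % step == 0:
            curve.append((i, len(seen)))
    return curve

# Hypothetical sample text, not taken from the corpus
sample = "She was alone. She was not afraid of being alone!"
profile = style_profile(sample)
growth = vocabulary_growth(sample, step=5)
print(profile)
print(growth)
```

Because the type/token ratio falls as texts get longer, comparisons between authors are only meaningful on samples of similar size.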

Linguistic Tools: WordSmith and Wmatrix

Useful features:

+ Tagging = identifies and labels PoS
+ WordList = generates word-frequency lists
+ Concordance = lists occurrences of a word in context and its immediate environment, gives access to collocates

• Identify syntactic use of word
• Identify range of meanings
• Identify relative frequency of different uses/meanings

+ KWIC (key word) = identification of key words through a comparison with a reference corpus
+ Word Clouds = semantic tagsets in 21 domains

• Listings can be customised to show what you want more clearly:
– sort according to next or previous word
– show more or less context
– highlight important information
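To make the WordList and Concordance features concrete, here is a minimal sketch of a word-frequency list and a sortable keyword-in-context listing. It does not reproduce the WordSmith or Wmatrix implementations; the tokenizer, the function names (`word_list`, `concordance`, `show`) and the sample text are assumptions for illustration only.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def word_list(tokens, top=None):
    """Word-frequency list, most frequent first."""
    return Counter(tokens).most_common(top)

def concordance(tokens, node, width=3, sort_by="next"):
    """Return (left context, node, right context) for every hit of `node`.

    `width` sets how much context is shown; `sort_by` orders the lines
    by the next or the previous word.
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, tok, right))
    if sort_by == "next":
        lines.sort(key=lambda line: line[2][:1])
    elif sort_by == "previous":
        lines.sort(key=lambda line: line[0][-1:])
    return lines

def show(lines):
    """Print the lines with the node word aligned and bracketed."""
    for left, node, right in lines:
        print(f"{' '.join(left):>25}  [{node}]  {' '.join(right)}")

# Hypothetical sample text, not taken from the corpus
text = "She lived alone in the house. He felt alone and tired. Nobody is ever truly alone."
tokens = tokenize(text)
freq = word_list(tokens, top=3)
lines = concordance(tokens, "alone", width=2, sort_by="next")
show(lines)
```

Sorting on the next or previous word groups recurring patterns together, which is what makes collocates visible in a long listing.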

Methodology

Word Frequency List (Wmatrix)

WordSmith frequency list of predicative adjectives, Modern British Women Fiction Writers Corpus

Key words list and dispersion plot (ALONE in MBWFW corpus)

Consistency analysis indicates whether a word occurs consistently across many different texts, only in a narrow set of texts, or in a single text.
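The idea behind consistency analysis and the dispersion plot can be sketched as follows. This is an illustrative approximation under stated assumptions, not the WordSmith algorithm: the corpus is assumed to be a dictionary of text name to raw text, and the text names, sample sentences and function names are hypothetical.

```python
import re

def tokenize(text):
    """Lower-case word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def consistency(corpus, word):
    """Count in how many texts `word` occurs.

    Returns (texts containing the word, total texts, per-text counts).
    """
    counts = {name: tokenize(text).count(word) for name, text in corpus.items()}
    hits = sum(1 for c in counts.values() if c > 0)
    return hits, len(corpus), counts

def dispersion(text, word, segments=4):
    """Hits per equal-sized segment of one text: a text-only dispersion plot."""
    tokens = tokenize(text)
    bins = [0] * segments
    for i, tok in enumerate(tokens):
        if tok == word:
            bins[min(i * segments // len(tokens), segments - 1)] += 1
    return bins

# Hypothetical three-text corpus, not the MBWFW corpus
corpus = {
    "text_a": "She was alone. Alone again, she waited.",
    "text_b": "They walked home together.",
    "text_c": "He ate alone.",
}
hits, total, counts = consistency(corpus, "alone")
bins = dispersion(corpus["text_a"], "alone", segments=2)
print(f"'alone' occurs in {hits} of {total} texts: {counts}")
print(bins)
```

A word with a high overall frequency but hits in only one text reflects that text rather than the corpus as a whole.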

Lemmatized results for relational pairs (WordSmith and Wmatrix)

Investigation of semantic domains through semantic tagging (Wmatrix)

Key Domain clouds (for Wmatrix only)

• The larger the word, the greater its “keyness” or uniqueness as compared to the BNC Written Sampler of imaginative texts.
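Keyness against a reference corpus is commonly measured with the log-likelihood statistic. The sketch below shows that calculation under stated assumptions: whether it matches the exact settings behind the clouds shown here is not confirmed by the slides, and the function names, the 3.84 cut-off (the 5% critical value for one degree of freedom) and the toy token lists are chosen for illustration only.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Log-likelihood keyness.

    a, b: frequency of the word in the study and reference corpus
    c, d: total tokens in the study and reference corpus
    """
    e1 = c * (a + b) / (c + d)   # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)   # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def key_words(study_tokens, ref_tokens, min_ll=3.84):
    """Words relatively more frequent in the study corpus, ranked by keyness."""
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    c, d = len(study_tokens), len(ref_tokens)
    keys = []
    for word, a in study.items():
        b = ref[word]
        if a / c > b / d:                      # overused in the study corpus
            ll = log_likelihood(a, b, c, d)
            if ll >= min_ll:
                keys.append((word, ll))
    return sorted(keys, key=lambda k: k[1], reverse=True)

# Toy token lists, not real corpus data
study_tokens = ["alone"] * 10 + ["the"] * 90
ref_tokens = ["alone"] * 1 + ["the"] * 99
keys = key_words(study_tokens, ref_tokens)
for word, ll in keys:
    print(f"{word}\t{ll:.2f}")
```

In a cloud, the font size of each item would then be scaled by its keyness score.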

Comparison of linguistic software

Research and language learning

Word frequency: knowledge in present-day language textbooks (grammatical, collocational, semantic) is frequency-based;

Real usage: corpora represent actual, not prescribed, usage;

Translation: find the best equivalent;

Grammar: investigate word classes and specific syntactic structures;

Teaching: collocations, e.g. ‘trouble and strife’, ‘the elephant in the room’, ‘blue murder’;

Decoding: specific content (sexist, racist, ideological, etc.);

Authorship: identification of true authorship;

Analysis of texts written in any language and any alphabet.


Electronic text resources

• http://www.stylist.co.uk/books
• http://www.newyorker.com
• http://narrativemagazine.com
• http://www.one-story.com
• http://www.teachingenglish.org.uk/teaching-resources
• http://www.guardian.co.uk/books
• http://gutenberg.net.au/