Contextual Information and Metadata Sheena Gardner and Dawn Hindle CELTE, University of Warwick Corpus Linguistics Birmingham July 2005

Contextual vs. Linguistic

“the social context (place, time, participants) .. is arguably .. as significant as .. intrinsic linguistic features”

Burnard 2004:8

External vs. Internal

External: essentially non-linguistic

Internal: essentially linguistic

“…the interrelation between them is one of the areas of study for which a corpus is of primary value. In general external criteria can be determined without reading the text.”

Atkins et al. 1992:5

BAWE Grid

                    1    2    3    4
Arts & Humanities
Social Sciences
Life Sciences
Physical Sciences
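
One way to picture the grid is as a tally of assignments per cell. The sketch below is only an illustration: it assumes the columns 1 to 4 are levels of study, reuses the grouping codes AH, SS, LS, PS listed under Disciplinary Grouping in the assignment metadata, and the function name `tally_grid` and the sample data are hypothetical.

```python
from collections import Counter

# Rows and columns of the 4 x 4 grid.
# Assumption: the columns 1-4 are levels of study.
GROUPINGS = ("AH", "SS", "LS", "PS")
LEVELS = (1, 2, 3, 4)


def tally_grid(assignments):
    """Count assignments per (grouping, level) cell; empty cells are reported as 0."""
    counts = Counter()
    for grouping, level in assignments:
        if grouping not in GROUPINGS or level not in LEVELS:
            raise ValueError(f"outside the grid: {grouping!r}, {level!r}")
        counts[(grouping, level)] += 1
    # Return every cell so that gaps in coverage are visible.
    return {(g, lv): counts[(g, lv)] for g in GROUPINGS for lv in LEVELS}


# Hypothetical sample: two Social Sciences texts at level 1, one Arts & Humanities text at level 3.
sample = [("SS", 1), ("SS", 1), ("AH", 3)]
grid = tally_grid(sample)
```

Reporting all sixteen cells, including the empty ones, makes it easy to see where collection is still thin.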

Contextual perspectives: from the discourse community

a. Department documentation

b. Tutor interviews & surveys

c. Student submission forms

Sociology Assignment Types (interview)

a. Essays

b. Book reviews

c. Book reports

d. Projects

e. Urban Ethnography Assignments

f. Fieldwork Reports

g. Dissertations

‘Linguistic’ perspectives: from text analysis

Multidimensional analysis will tell us about the language of texts in our 4 x 4 grid. Broadly, a description of register.

SFL Genre analysis will tell us about the generic stages of texts. Broadly, a description of text organization and purpose.

Assignment Type, Register & Genre

Are essays in Sociology written in the same general academic register as essays in History?

Are essays in Sociology organised in the same way as essays in Biology?

And if the answers are ‘No’, do we still call them all ‘essays’? i.e. Can we blend contextual and linguistic categories?

Category blends?

Atkins et al. assume realist categories:

style = prose/verse/rhyme “as determined by surface features of the text or author’s claims” (1992:8)

region = regional type - ‘to be refined by internal evidence’

Burnard suggests it is not possible to entirely distinguish contextual and linguistic information (2004:5)

Diverse Issues in Blending

Linguistic choices can be interpreted differently by members of different discourse communities e.g. business and EFL (Forey 2004)

Same labels (e.g. ‘essay’), different values - whose meaning predominates?

Genre analysis is of language in context – requires both

Three Discrete classifications

1. ‘Contextual’ Assignment Types (from the discourse community)

2. ‘Linguistic’ Registers (from analysis of surface linguistic features)

3. ‘Contextual + Linguistic’ Genres (from analysis of language in context)

Issues in contextual information

“[Metadata] extends from straightforward labelling and identification of individual items to the detailed representation of complex interpretive data associated with their linguistic components”

Burnard 2004:15

Year of study?

In terms of university administration, a third year student could be:

a. in their third year of successfully completed study;

b. a student in the final year of their course, but who spent their “third year” doing an industrial placement (intercalated year).

How will we classify this?

Year of study?

Categories: 1, 2, 3, 3*, 4, 4*

3 = 3rd year, no intercalated year

3* = 3rd year, intercalated year

4 = postgraduate

4* = 4th year undergraduate Masters
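
The coding scheme above could be applied mechanically once a few facts about each student are known. The following is a minimal sketch, not the project's actual procedure: the function name `level_code`, the programme labels "UG", "UG_MASTERS" and "PG", and the choice to count `study_year` in years of taught study (excluding any placement year) are all assumptions made for illustration.

```python
def level_code(programme, study_year, intercalated=False):
    """Return one of the categories 1, 2, 3, 3*, 4, 4* as a string.

    programme:    "UG", "UG_MASTERS" or "PG" (hypothetical labels)
    study_year:   year of taught study, not counting any placement year
    intercalated: True if the student has taken an intercalated year
    """
    if programme == "PG":
        return "4"  # 4 = postgraduate
    if programme == "UG_MASTERS" and study_year == 4:
        return "4*"  # 4* = 4th year undergraduate Masters
    if programme in ("UG", "UG_MASTERS"):
        if study_year in (1, 2):
            return str(study_year)
        if study_year == 3:
            # 3* marks a third year that follows an intercalated year.
            return "3*" if intercalated else "3"
    raise ValueError(f"no category for {programme!r}, year {study_year!r}")
```

For example, `level_code("UG", 3, intercalated=True)` yields `"3*"`, while the same student without a placement year is coded `"3"`.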

Language and educational background

Native vs. non-native English speakers

Submission form data:

1. first language

2. secondary education

Is it in or is it out?

What counts as ‘assessed’?

What counts as English?

What counts as ‘writing’?

Creative writing / Fiction

Tradition vs. innovation

“We’re quite a traditional department in that we still use mainly essays, we’re very conscious that we would like to, and perhaps need to, do something about that”

Mike Neary, Director of Undergraduate Studies, Sociology, Warwick, 26/04/2005

Contextual Info as Metadata (1)

Student (unique code)

Gender M,F

First Language Eng, Ger, Fre …

UK Education UKA, OSA, UK1, UK2, ..

Course BA English Studies, BSc Maths
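
The student fields above map naturally onto a small record type. This is a sketch under stated assumptions rather than the project's own schema: the class name `Student`, the attribute names, and the sample code "S001" are hypothetical, and only the gender values are validated because the full lists of language and education codes are not given here.

```python
from dataclasses import dataclass

GENDERS = {"M", "F"}  # the two values listed on the slide


@dataclass(frozen=True)
class Student:
    code: str            # unique student code (format is an assumption)
    gender: str          # "M" or "F"
    first_language: str  # e.g. "Eng", "Ger", "Fre"
    uk_education: str    # e.g. "UKA", "OSA", "UK1", "UK2"; full code list not given
    course: str          # e.g. "BA English Studies", "BSc Maths"

    def __post_init__(self):
        if not self.code:
            raise ValueError("student code must not be empty")
        if self.gender not in GENDERS:
            raise ValueError(f"unknown gender code: {self.gender!r}")


# Hypothetical record built from the example values on the slide.
student = Student("S001", "F", "Eng", "UKA", "BA English Studies")
```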

Contextual Info as Metadata (2)

Assignment (unique code)

Title (text)

Date written (year-month) 2002-2, 2004-5, ..

Level (of student) 1,2,3,3*,4,4*

Module title Critical Theory, …

Module code EN3001, …

Assignment Type Essay, Report, …

JACS Code A, E, D …

Disciplinary Grouping AH, LS, PS, SS

Grade M(erit), D(istinction)

Authors 1,2,3,…
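
The assignment fields can be sketched as a validated record in the same spirit. Everything named here is an assumption for illustration: the class name `Assignment`, the attribute names, the sample values, and the decision to check only the fields whose permitted values are fully listed above (level, disciplinary grouping, grade) plus the year-month date shape.

```python
import re
from dataclasses import dataclass

LEVELS = {"1", "2", "3", "3*", "4", "4*"}
GROUPINGS = {"AH", "LS", "PS", "SS"}
GRADES = {"M", "D"}  # M(erit), D(istinction)
DATE_PATTERN = re.compile(r"^(\d{4})-(\d{1,2})$")  # year-month, e.g. "2004-5"


@dataclass(frozen=True)
class Assignment:
    code: str             # unique assignment code (format is an assumption)
    title: str
    date_written: str     # year-month
    level: str            # level of the student
    module_title: str
    module_code: str
    assignment_type: str  # e.g. "Essay", "Report"; open-ended list
    jacs_code: str        # not validated: the full code list is not given
    grouping: str         # disciplinary grouping
    grade: str
    authors: int          # number of authors

    def __post_init__(self):
        match = DATE_PATTERN.match(self.date_written)
        if match is None or not 1 <= int(match.group(2)) <= 12:
            raise ValueError(f"bad year-month: {self.date_written!r}")
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level!r}")
        if self.grouping not in GROUPINGS:
            raise ValueError(f"unknown grouping: {self.grouping!r}")
        if self.grade not in GRADES:
            raise ValueError(f"unknown grade: {self.grade!r}")
        if self.authors < 1:
            raise ValueError("an assignment needs at least one author")


# Hypothetical record; the values are illustrative only.
essay = Assignment(
    code="A0001", title="(hypothetical title)", date_written="2004-5",
    level="3*", module_title="Critical Theory", module_code="EN3001",
    assignment_type="Essay", jacs_code="Q", grouping="AH",
    grade="M", authors=1,
)
```

Leaving assignment type as free text keeps the contextual label supplied by the discourse community separate from any later register or genre classification.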