Measuring Linguistic Complexity Kristopher Kyle 3-5-2015.

Measuring Linguistic Complexity

Kristopher Kyle, 3-5-2015

Who is this guy?

Interested in:

L2 Writing Quality/Development

Assessment

Natural Language Processing

Productive Vocabulary

Productive Syntax

Outline of Workshop

Why measure linguistic complexity?

How can linguistic complexity measures be conceptualized?

How do we actually measure linguistic complexity?

Hands-on workshop I: Measuring syntactic complexity

Hands-on workshop II: From raw data to findings (if time)

Why measure linguistic complexity?

In the 1970s, SLA researchers (e.g., Larsen-Freeman, 1978) wanted to measure language development.

Larsen-Freeman proposed three constructs of development: complexity, accuracy, and fluency (CAF).

The general hypothesis (with regard to complexity) has been: As language learners develop, their language will become more complex.

How complexity is measured has been the subject of much debate (e.g., Bulté & Housen, 2012)

How can linguistic complexity measures be conceptualized?

Wolfe-Quintero et al. (1998) provide a compendium of CAF measures used through the late 1990s.

Lexical complexity: a variety of general and part-of-speech-specific type/token ratio counts.

Syntactic complexity: a variety of clause, sentence, and T-unit measures that focus on clausal complexity.
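
As a rough illustration of a type/token ratio, the sketch below divides the number of distinct word forms (types) by the total number of words (tokens). The tokenizer (lowercasing plus a simple regular expression) and the function name are assumptions made for this example, not the procedure used in any of the studies cited here.

```python
import re

def type_token_ratio(text):
    """Return types / tokens for a text, using a deliberately simple tokenizer."""
    # Assumption: a token is a run of letters or apostrophes, lowercased.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

sample = "I eat pizza because I like pizza."
print(type_token_ratio(sample))
```

A part-of-speech-specific version would apply the same ratio to only the tokens carrying a given tag, which requires a tagger and is not shown here.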

How can linguistic complexity measures be conceptualized?

Most syntactic complexity indices are ratio scores: (Structure A)/(Structure B).

The denominator (Structure B) is one of:

clause: a main verb and its dependents (I eat pizza.)

T-unit: an independent clause and any attached dependent clauses (I eat pizza because it is delicious.)

sentence: A string of words that starts with a capital letter and ends with sentence-ending punctuation (I think you know what a sentence is.)

How can linguistic complexity measures be conceptualized?

The numerator (Structure A) has included many structures:

clauses

dependent clauses

adverbial clauses

T-units

complex T-units

coordinate phrases

complex nominals

verb phrases

passives
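
To make the (Structure A)/(Structure B) idea concrete, here is a minimal sketch that turns raw counts into a few ratio indices. The counts are invented for illustration, and the function and dictionary names are hypothetical; which structures count as a clause or T-unit is left to whoever does the counting.

```python
def ratio_index(numerator_count, denominator_count):
    """Return (Structure A) / (Structure B), or 0.0 if Structure B never occurs."""
    if denominator_count == 0:
        return 0.0
    return numerator_count / denominator_count

# Hypothetical hand counts for one short essay (illustrative values only).
counts = {"clauses": 12, "dependent_clauses": 4, "t_units": 8, "sentences": 6}

indices = {
    "dependent clauses per clause": ratio_index(counts["dependent_clauses"], counts["clauses"]),
    "clauses per T-unit": ratio_index(counts["clauses"], counts["t_units"]),
    "T-units per sentence": ratio_index(counts["t_units"], counts["sentences"]),
}

for name, value in indices.items():
    print(f"{name}: {value:.2f}")
```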

How can linguistic complexity measures be conceptualized?

Length of unit measures have also been prominent (e.g., Ortega, 2003; Lu, 2011).

Mean length of clause (MLC)

Mean length of T-unit (MLTU)

Mean length of sentence (MLS)
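
A length-of-unit measure is simply words divided by units. The sketch below computes MLS using a naive sentence splitter (split at sentence-ending punctuation), which is an assumption for this example and would need replacing for real learner data. MLC and MLTU follow the same formula but need clause and T-unit boundaries, which plain punctuation cannot provide.

```python
import re

def mean_length_of_sentence(text):
    """Return the mean number of words per sentence (MLS)."""
    # Assumption: a sentence ends at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    word_counts = [len(s.split()) for s in sentences]
    return sum(word_counts) / len(sentences)

sample = "I eat pizza. I eat pizza because it is delicious."
print(mean_length_of_sentence(sample))
```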

How can linguistic complexity measures be conceptualized?

The rise of phrasal complexity:

Biber, Gray, and Poonpon (2011) suggested that clausal subordination (i.e., what most syntactic complexity indices measure) is NOT a prominent feature of academic writing.

Informal speech includes many dependent clauses, but academic writing includes many dependent phrases (especially noun phrases).

How can linguistic complexity measures be conceptualized?

Some important issues:

Definition of measures: What counts as a clause?

Prominence of broad indices: What does MLC really tell us about development?

Often only a limited range of measures is used.

How do we actually measure linguistic complexity?

To measure linguistic complexity, we have two options.

Option #1: Count features by hand

Option #2: Count features using a computer

How do we actually measure linguistic complexity?

Advantages of Option 1:

Researcher has full control over how syntactic complexity is measured

Human counts may be more accurate

Disadvantages of Option 1:

Expensive!

Intra-rater reliability

Inter-rater reliability (who is qualified?)

How do we actually measure linguistic complexity?

Advantages of Option 2:

Very cheap

Reliable (same results every time)

Usually accurate: Biber (e.g., 2004) and Lu (2010, 2011) report accuracies above 90%

Can analyze a broad range of indices at once

Disadvantages of Option 2:

Researcher has less control (is at the mercy of available programs)

Some data is not well suited to automatic analysis

Some linguistic features cannot be reliably captured

Hands-on workshop I: Measuring syntactic complexity

Go to www.kristopherkyle.com/workshop/ and download the “short_samples.zip” file.

Without talking with your neighbor(s), fill in the included Excel sheet for examples 1-5.

What were your answers?

Any issues with example 5?

Now do the same for example 6…

Hands-on workshop I: Measuring syntactic complexity

Tool for the Automatic Analysis of Syntactic Complexity (TAASC): Prototype!

Includes indices created by Xiaofei Lu (Syntactic Complexity Analyzer; Lu, 2011)

Also includes some replications of the Biber Tagger

Hands-on workshop I: Measuring syntactic complexity

How TAASC works:

Reads file

Splits file into sentences

Parses each sentence (uses the Stanford Parser)

Uses regular expressions (a way to search for patterns) to identify particular structures in the parse tree (uses Stanford Tregex: regular expressions for parse trees)

Hands-on workshop I: Measuring syntactic complexity

Now, let's check to see if your computer is set up correctly.

First, search for Terminal (Mac) or Command Prompt (Windows)

Then type: java -version

Then type: python

Go to www.kristopherkyle.com/workshop/ and download the appropriate version of TAASC (Windows or Mac).

Extract it to your Desktop

Copy the example files to the “to_process_2” folder
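
If typing the Java and Python checks by hand is inconvenient, the same check can be sketched in Python once it is running. This only reports whether the commands are on the PATH; it does not check version numbers, and the function name is made up for this example.

```python
import shutil

def find_tools(names=("java", "python")):
    """Map each command name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in names}

for name, path in find_tools().items():
    print(f"{name}: {path if path else 'not found on PATH'}")
```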

Hands-on workshop I: Measuring syntactic complexity

Now, in Terminal/Command Prompt type:

cd [location of TAASC folder] (then press “return”)

python [name of the appropriate TAASC program] (then press “return”)

Your results should now be in a file called “results.csv”

If you want to examine the accuracy of the parse trees, open the files in the “parsed_files” folder with Tregex.
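
The results file can also be inspected without a spreadsheet program. The sketch below reads CSV rows into dictionaries keyed by the header row. Because the actual column names in “results.csv” are not listed here, the demo uses an in-memory stand-in with invented columns; the commented lines show how the real file might be opened, assuming it sits in the current folder.

```python
import csv
import io

def read_results(handle):
    """Read CSV rows from an open file into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle))

# Stand-in for results.csv: these column names and values are invented for the demo.
demo = io.StringIO("filename,index_a\nessay1.txt,1.5\nessay2.txt,2.0\n")
rows = read_results(demo)
print(list(rows[0].keys()))  # column names
print(len(rows))             # number of rows

# With the real file (assumed to be in the current folder):
# with open("results.csv", newline="", encoding="utf-8") as handle:
#     rows = read_results(handle)
```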

Hands-on workshop I: Measuring syntactic complexity

Some simple patterns:

VP

VP<S

Some important patterns:

clause: S|SINV|SQ <<# MD|VBP|VBZ|VBD

T-unit: S|SBARQ|SINV|SQ > ROOT | [$-- S|SBARQ|SINV|SQ !>> SBAR|VP]
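
To show what the two simple patterns are searching for, here is a plain-Python sketch that mimics “VP” (any verb phrase node) and “VP<S” (a verb phrase that immediately dominates an S). It is not Tregex and does not cover the clause or T-unit patterns; the parse string is hand-written for illustration rather than real parser output, and all function names are hypothetical.

```python
import re

def parse_tree(bracketed):
    """Parse a Penn Treebank style string into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", bracketed)
    pos = 0

    def read_node():
        nonlocal pos
        pos += 1                                  # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())      # nested phrase
            else:
                children.append(tokens[pos])      # a word (leaf)
                pos += 1
        pos += 1                                  # skip ")"
        return (label, children)

    return read_node()

def subtrees(node):
    """Yield every labelled node in the tree, top down."""
    yield node
    for child in node[1]:
        if isinstance(child, tuple):
            yield from subtrees(child)

def count_label(tree, label):
    """Count nodes with the given label (like the pattern 'VP')."""
    return sum(1 for node in subtrees(tree) if node[0] == label)

def count_dominating(tree, parent, child):
    """Count 'parent' nodes that have a 'child' node directly below them (like 'VP<S')."""
    return sum(
        1 for node in subtrees(tree)
        if node[0] == parent
        and any(isinstance(c, tuple) and c[0] == child for c in node[1])
    )

# Hand-written parse for "I want to eat pizza." (illustrative, not parser output).
parse = ("(ROOT (S (NP (PRP I)) (VP (VBP want) (S (VP (TO to) "
         "(VP (VB eat) (NP (NN pizza)))))) (. .)))")
tree = parse_tree(parse)
print(count_label(tree, "VP"))
print(count_dominating(tree, "VP", "S"))
```

The longer clause and T-unit patterns combine the same kind of dominance checks with alternation (|) and negation (!), which is why a dedicated tool such as Tregex is used instead of hand-rolled code.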

Hands-on workshop II: From raw data to findings

Go to www.kristopherkyle.com/workshop/ and download the “Workshop_Data.zip” file.

58 participants, three timed essays over 1 year.

IEP Levels 3-4 (Intermediate/Advanced)

Now let’s analyze some data!

NOTE: We didn’t get to this in class…