Overview

LING 5200
Computational Corpus Linguistics
Martha Palmer

LING 5200, 2006. Based on Kevin Cohen’s LING 5200.

What’s a corpus?

McEnery & Wilson:
  (i) (loosely) any body of text
  (ii) (most commonly) a body of machine-readable text
  (iii) (more strictly) a finite collection of machine-readable text, sampled to be maximally representative of a language or variety


What’s corpus linguistics?

“the study of language based on examples of ‘real life’ language use” (McEnery & Wilson)
A methodology, not a branch of linguistics

Biber et al.:
  Uses computers
  “Natural” texts
  Large & principled collection
  Both quantitative and qualitative


What was Chomsky’s complaint?

Linguistics should model competence, not performance.
What are the underlying rules that allow us to generate language?

Context: structuralists believed in collecting linguistic data about a language without taking meaning and communication into consideration.

Mirrors the debate between the rationalists and the empiricists.

But does Chomsky account for meaning? (see Searle)


Which linguistic branches can make use of corpus linguistics?

Phonetics
Phonology
Morphology
Syntax
Semantics
Pragmatics
Psycholinguistics
Computational linguistics
Descriptive linguistics
Historical linguistics
Sociolinguistics


Corpus linguistics in context

[Diagram relating Corpus Linguistics, Natural Language Processing, and Computational Linguistics, with the labels "data", "applications", and "models"]


What’s LING 5200 Corpus Linguistics?

Tools
Techniques


Overview

Quick intro to Unix
A little corpus design
Quick tour of corpora and annotation
Tools for working with corpora
Programming in Python
Some software engineering


Why Python?

It works
Many advantages
It’s a bona fide programming language
You’ll need it for CSCI 5832


Administrative things

Textbooks: Unix, Python
Office hours: Mon 5-6, Tues 1-2
verbs.colorado.edu/mpalmer/ling5200
Prerequisites: none
Grades: homeworks/project
Accounts on babel


Logging on for the first time

First thing to do: change your password.
  passwd
Give it your current password, then your new password.
Repeat the new one (to catch typos).


Connecting with another computer

ssh -l your_name babel.colorado.edu

You are prompted to log in.


Logging on for the first time, again

First thing to do: change your password.
  passwd
Give it your current password, then your new password.
Repeat the new one. (Why?)


Where am I?

Type pwd
You see something like this:

/home/mpalmer


What's that mean??

/
  bin
  home
  etc
  usr



Important directories

/
  bin
  home
    mpalmer
      ling5200
        RCS
  etc
  usr
    local
      bin

/home/mpalmer/ling5200
/usr/local/bin
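One way to read a full path is as the list of directories you pass through on the way down from the root. The sketch below uses Python's standard pathlib module on the two paths from the slide; the variable names are illustrative only, and nothing here touches a real file system.

```python
from pathlib import PurePosixPath

# The two full paths named on the slide.
course_dir = PurePosixPath("/home/mpalmer/ling5200")
tools_dir = PurePosixPath("/usr/local/bin")

# .parts splits a path into the directories you walk through,
# starting at the root "/".
print(course_dir.parts)
print(tools_dir.parts)

# .parent is the directory one level up the tree.
print(course_dir.parent)

# .name is the last step of the walk.
print(tools_dir.name)
```

Note that the bin at the end of /usr/local/bin is a different directory from the bin directly under /: the same name can appear at several places in the tree, which is why the full path matters.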


Navigating directories

ls to list contents, cd to change directory
Directories are just like Windows folders

Shortcut for your home directory (here /home/mpalmer): ~
“The directory above this one”: ..
“This directory”: .
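To see what these shortcuts expand to, here is a minimal sketch in Python using the standard posixpath module. The resolve function, the home_dir parameter, and the starting directory are assumptions made up for illustration; a real shell does this expansion for you.

```python
import posixpath

HOME = "/home/mpalmer"            # what "~" stands for in this sketch
here = "/home/mpalmer/ling5200"   # an assumed current directory


def resolve(current, path, home_dir=HOME):
    """Turn a path containing ~, . or .. into a plain absolute path."""
    if path == "~" or path.startswith("~/"):
        path = home_dir + path[1:]
    # join() ignores `current` when `path` is already absolute;
    # normpath() then collapses "." and ".." steps.
    return posixpath.normpath(posixpath.join(current, path))


print(resolve(here, "."))       # /home/mpalmer/ling5200
print(resolve(here, ".."))      # /home/mpalmer
print(resolve(here, "~"))       # /home/mpalmer
print(resolve(here, "../.."))   # /home
```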


What's in the neighborhood?

Type ls
You see a list of directories and files that are contained within the current directory:

Homework_1.txt  tools  buglog.txt
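Plain ls does not say which names are files and which are directories. As a rough Python counterpart, the sketch below recreates the three names in a throwaway directory and labels each one. It assumes that tools is a directory and the two .txt names are files, which the listing itself does not state.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    # Recreate the three entries from the listing (assumed kinds, see above).
    open(os.path.join(scratch, "Homework_1.txt"), "w").close()
    open(os.path.join(scratch, "buglog.txt"), "w").close()
    os.mkdir(os.path.join(scratch, "tools"))

    # os.scandir() yields one entry per file or directory, much like ls.
    listing = sorted(
        (entry.name, "directory" if entry.is_dir() else "file")
        for entry in os.scandir(scratch)
    )

for name, kind in listing:
    print(f"{kind:9} {name}")
```

On the command line, ls -l or ls -F gives similar information.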


I'd like to go somewhere else…

Type pwd
Type cd
Where are you?
Type cd ..
Where are you?
Type cd your_user_id
Where are you?
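If you want to check your answers, the exercise can be modeled in a few lines of Python. This is only a sketch of how cd behaves, not how the shell implements it: the cd function below is hypothetical, and it assumes the user mpalmer with home directory /home/mpalmer, as in the pwd example. It relies on one fact about the real command: cd with no argument takes you to your home directory.

```python
import posixpath

HOME = "/home/mpalmer"   # assumed home directory, from the pwd example


def cd(current, target=None):
    """Return the directory you would be in after typing `cd target`."""
    if target is None:
        # Bare `cd`: no destination given, so go home.
        return HOME
    # Otherwise interpret the target relative to where we are now.
    return posixpath.normpath(posixpath.join(current, target))


where = "/home/mpalmer"
where = cd(where)              # cd
print(where)                   # /home/mpalmer
where = cd(where, "..")        # cd ..
print(where)                   # /home
where = cd(where, "mpalmer")   # cd your_user_id
print(where)                   # /home/mpalmer
```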


Unix is a verb-initial language

cd ..

cd = "go"
.. = where to go


Unix is a verb-initial language

cd

cd = "go"
If no argument, I assume you mean "home"


Making a new directory

Type cd
Type ls
Type mkdir ling5200
Type ls
Go to the directory you just made (how?)
Type pwd
Type ls
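The same sequence can be rehearsed from Python with the standard os module. The sketch below runs in a temporary directory that stands in for your home directory, so it leaves nothing behind. The variable names are illustrative, and the comments show the shell command each call roughly corresponds to.

```python
import os
import tempfile

start = os.getcwd()                          # remember where we began
with tempfile.TemporaryDirectory() as fake_home:
    os.chdir(fake_home)                      # cd (stand-in for going home)
    before = os.listdir(".")                 # ls
    os.mkdir("ling5200")                     # mkdir ling5200
    after = os.listdir(".")                  # ls
    os.chdir("ling5200")                     # cd ling5200
    inside = os.path.basename(os.getcwd())   # pwd (last component only)
    contents = os.listdir(".")               # ls
    os.chdir(start)                          # step back out before cleanup

print(before, after, inside, contents)       # [] ['ling5200'] ling5200 []
```

The first ls shows nothing, the second shows the new directory, and the last ls is empty again because a freshly made directory has nothing in it yet.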