Corpus linguistics: What is corpus linguistics?

Posted on 30-Dec-2015


Corpus linguistics

What is corpus linguistics?

•Corpus linguistics uses large collections of spoken or written natural texts that are stored on computers.

•One of the major contributions of corpus linguistics is in exploring patterns of language use.
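One common way to explore patterns of language use is a keyword-in-context (KWIC) listing, which shows each occurrence of a word together with its neighbours. The following is only a minimal sketch: the function names `tokenize` and `concordance`, the crude regular-expression tokenizer and the sample sentence are all illustrative assumptions, not part of any particular corpus tool.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

# Hypothetical sample text standing in for a real corpus.
sample = "The cat sat on the mat. The dog saw the cat run."
tokens = tokenize(sample)

for left, kw, right in concordance(tokens, "cat", width=2):
    print(f"{left:>12} [{kw}] {right}")
```

On a real corpus the same idea applies, although a proper tokenizer and much more text would be needed before any pattern could be trusted.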

Corpus design and compilation: A corpus is a large and principled collection of texts stored in electronic format.

There is no minimum size for a text collection to be considered a corpus; a standard size set by the creators of the Brown Corpus was one million words.

Types of corpora: There are many corpora, such as:

1: LOB corpus

2: COCA corpus

3: BNC corpus

Issues in corpus design: One of the most important factors in corpus linguistics is the design of the corpus.

The composition of a corpus reflects the anticipated research goals.

A corpus used to explore lexical questions needs to be very large to allow accurate representation of a large number of words and of the different senses or meanings that a word might have.
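The need for size in lexical work can be illustrated by counting how many distinct words occur only once: with little text, most words are seen too rarely to say anything about their senses. The sketch below is an illustration under assumptions; `word_frequencies` and `hapax_share` are hypothetical helper names and the sample sentence is invented.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each lowercase word form occurs."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def hapax_share(freqs):
    """Fraction of distinct word types that occur exactly once."""
    if not freqs:
        return 0.0
    once = sum(1 for count in freqs.values() if count == 1)
    return once / len(freqs)

# Invented sample; a real study would read a full corpus instead.
sample_text = "the cat sat on the mat and the dog sat by the door"
freqs = word_frequencies(sample_text)

print(freqs.most_common(2))
print(f"share of word types seen only once: {hapax_share(freqs):.2f}")
```

In this tiny sample most word types appear a single time, which is far too little evidence for studying their different meanings.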

Corpus compilation: When creating a corpus, data collection involves obtaining or creating electronic versions of the target texts, then storing and organizing them.

Written corpora are far less labour-intensive to collect than spoken corpora.

Data collection for a written corpus may mean using a scanner and optical character recognition (OCR) software to scan paper documents into electronic text files.
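Once the texts exist as electronic files, storing and organizing them can be as simple as keeping one plain-text file per document in a folder tree. The following sketch assumes that layout and UTF-8 encoding; `load_corpus`, `corpus_size` and the file names are hypothetical, and the temporary folder only stands in for a real collection.

```python
import tempfile
from pathlib import Path

def load_corpus(root):
    """Map each .txt file's relative path to its text content."""
    root = Path(root)
    corpus = {}
    for path in sorted(root.rglob("*.txt")):
        key = path.relative_to(root).as_posix()
        corpus[key] = path.read_text(encoding="utf-8")
    return corpus

def corpus_size(corpus):
    """Total number of whitespace-separated tokens in all texts."""
    return sum(len(text.split()) for text in corpus.values())

# Build a tiny throwaway collection to show the functions in use.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "news").mkdir()
    (Path(tmp) / "news" / "a.txt").write_text("one two three", encoding="utf-8")
    (Path(tmp) / "b.txt").write_text("four five", encoding="utf-8")
    demo = load_corpus(tmp)

print(sorted(demo), corpus_size(demo))
```

Using the relative path as the key keeps a record of which subfolder, for example a text category, each document came from.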

Markup and annotation:

A simple corpus consists of raw text, with no additional information about the origin, authors, speakers, structure or content of the texts themselves.

Encoding some of this information as markup makes a corpus much richer and more useful, especially to researchers who were not involved in its compilation.
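As an illustration of how such information might be encoded, the sketch below stores the author and source of a text in an XML header and reads them back with Python's standard `xml.etree.ElementTree` module. The tag names (`text`, `header`, `author`, `source`, `body`, `p`) and the sample content are made up for this example and do not follow any specific markup standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up text: metadata in <header>, content in <body>.
doc = """<text id="t1">
  <header>
    <author>Jane Doe</author>
    <source>Example Newspaper</source>
  </header>
  <body>
    <p>First paragraph of the text.</p>
    <p>Second paragraph.</p>
  </body>
</text>"""

root = ET.fromstring(doc)

# Collect the metadata fields stored in the header.
metadata = {child.tag: child.text for child in root.find("header")}

# Collect the paragraphs, which are marked with <p> tags.
paragraphs = [p.text for p in root.iter("p")]

print(root.get("id"), metadata)
print(len(paragraphs), "paragraphs")
```

A researcher who did not compile the corpus can recover who wrote the text and where it came from without any outside documentation.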

Structural markup refers to the use of codes in the text to identify structural features of the text