Sketch engine for Chinese Discussion notes. Wordsketch, subsequently Sketch Engine Was developed by...


Sketch engine for Chinese

Discussion notes


Wordsketch, subsequently Sketch Engine

• Was developed by Kilgarriff et al. at Brighton

• Gives automatic, corpus-based summaries of a word’s grammatical and collocational behaviour

• Captures information in a more accessible way than hundreds of KWIC lines

• Uses an MI-based (mutual information) salience algorithm
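
To make the last bullet concrete, here is a minimal Python sketch of a mutual-information salience score. The notes only say "MI based", so the exact formula Sketch Engine uses is not given here; this is plain pointwise mutual information, and the function name and the toy word pairs are invented for illustration.

```python
import math
from collections import Counter

def pmi_salience(pairs):
    """Return {(word, collocate): PMI} for a list of (word, collocate) pairs.

    Assumption: plain pointwise mutual information, not necessarily the
    exact salience statistic used by Sketch Engine.
    """
    pair_counts = Counter(pairs)
    word_counts = Counter(w for w, _ in pairs)
    coll_counts = Counter(c for _, c in pairs)
    total = len(pairs)
    scores = {}
    for (w, c), n_wc in pair_counts.items():
        # PMI = log2( P(w, c) / (P(w) * P(c)) )
        scores[(w, c)] = math.log2(n_wc * total / (word_counts[w] * coll_counts[c]))
    return scores

# Toy co-occurrence data, invented for illustration only.
pairs = [("drink", "tea"), ("drink", "tea"), ("drink", "water"),
         ("eat", "rice"), ("eat", "rice"), ("eat", "tea")]
scores = pmi_salience(pairs)
```

A positive score means the pair co-occurs more often than chance would predict, and a negative score means less often. Raw MI is known to favour rare pairs, so a practical salience measure usually also weights by frequency.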


Other corpus query tools do collocational salience too, but…

• Sketch Engine uses lemmata not word-forms

– So that eat and eats are treated the same

• And it takes account of grammatical relations

– So that The plane banks and The investment banks are treated separately

– And (if the corpus is appropriately parsed) He robs banks and He robbed the bank would be accorded similar treatment
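
The two points above can be sketched as a counting scheme keyed on lemma and grammatical relation instead of on word-form. This is only an illustration: the relation names, the tuple layout, and the function `build_sketch_counts` are hypothetical, and the input is assumed to be already lemmatised and parsed.

```python
from collections import Counter, defaultdict

# Hypothetical parsed input: (word_form, lemma, relation, collocate_lemma).
# The relation labels are invented for this sketch.
observations = [
    ("banks", "bank", "verb_with_subject", "plane"),      # The plane banks
    ("banks", "bank", "noun_modified_by", "investment"),  # The investment banks
    ("robs", "rob", "verb_with_object", "bank"),          # He robs banks
    ("robbed", "rob", "verb_with_object", "bank"),        # He robbed the bank
]

def build_sketch_counts(observations):
    """Count collocates per (lemma, relation), ignoring the surface form."""
    counts = defaultdict(Counter)
    for _form, lemma, relation, collocate in observations:
        counts[(lemma, relation)][collocate] += 1
    return counts

counts = build_sketch_counts(observations)
```

Here the two uses of banks end up under different keys, while robs and robbed are pooled under the same (lemma, relation) key.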


Grammatical relations example

• Unary relations: Word2 and Prep are not specified

• Binary relations: Prep not specified

• Binary relations, Word2 not specified

• Trinary relations
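
One way to read the unary, binary and trinary distinction is as a record with optional Word2 and Prep slots. The Python sketch below is an assumption about the data shape, not the tool's actual storage format; the class name, field names and example relations are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class RelationInstance:
    """One grammatical-relation hit; Word2 and Prep may be left unspecified."""
    relation: str
    word1: str
    word2: Optional[str] = None
    prep: Optional[str] = None

    @property
    def arity(self) -> str:
        # Word1 is always present; count which optional slots are filled.
        filled = 1 + (self.word2 is not None) + (self.prep is not None)
        return {1: "unary", 2: "binary", 3: "trinary"}[filled]

# Hypothetical examples of each case.
unary = RelationInstance("intransitive", "sleep")
binary = RelationInstance("object", "rob", word2="bank")
binary_prep = RelationInstance("pp", "look", prep="through")
trinary = RelationInstance("pp", "look", word2="window", prep="through")
```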


Sketch engine modules

• Concordance

– KWIC or sentence context

• Thesaurus

– A list of “similar” words

• Sketch differences, for distinguishing near-synonyms

– If both lemmata x and y have strong collocational salience with the same collocate a, then they are near-synonyms

• Wordsketch
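
The sketch-difference idea can be illustrated by comparing the salient collocates of two lemmata. This is a minimal Python sketch under stated assumptions: the salience scores are invented, the threshold is arbitrary, and `sketch_difference` is a hypothetical helper, not part of the tool.

```python
def sketch_difference(salience_x, salience_y, threshold=1.0):
    """Split strongly salient collocates into shared and distinctive sets.

    salience_x / salience_y map collocate -> salience score for lemmata x and y.
    The threshold for "strong" salience is an arbitrary assumption.
    """
    strong_x = {c for c, s in salience_x.items() if s >= threshold}
    strong_y = {c for c, s in salience_y.items() if s >= threshold}
    return {
        "shared": sorted(strong_x & strong_y),   # evidence of near-synonymy
        "only_x": sorted(strong_x - strong_y),   # what distinguishes x
        "only_y": sorted(strong_y - strong_x),   # what distinguishes y
    }

# Invented scores for two near-synonyms, for illustration only.
clever = {"idea": 3.2, "trick": 2.8, "student": 1.5, "weather": 0.1}
intelligent = {"student": 2.9, "idea": 1.4, "life": 2.2, "trick": 0.3}
diff = sketch_difference(clever, intelligent)
```

The shared collocates suggest the two words are near-synonyms, and the distinctive ones show where their usage diverges.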


Sample of grammatical relation definitions script (m4 macro language)

• define(`wh_word',`[tag="AVQ"|tag="DTQ"|tag="PNQ"]')

• define(`whether_if',`[tag="PNQ" & word="if" |word="whether"]')

• define(`determiner',`[tag="AT."|tag="DT."|tag=poss_pro]')

• define(`conjunction',`"CJC"')

• define(`simple_neg',`"XX."')

• define(`rel_start',`[tag="DTQ"|tag="PNQ"|tag=that_comp]')

• define(`adv_neg',`[tag=any_adv|tag=simple_neg]')

• define(`number',`"[OC]RD"')

• define(`goal_adv',`[word="back"|word="over"|word="home"|word="away"|word="out"]')

• define(`long_np',`[tag="AT."|tag="DT."|tag=poss_pro|tag=number|tag=any_adv|tag=any_adj|tag=genitive]{0,3} any_noun{0,2} 2:any_noun [tag!=any_noun & tag != genitive]')

• define(`np_start',`[tag="AT."|tag="DT."|tag=poss_pro|tag=number|tag=any_adj|tag=any_noun]')


Applications

• Intended as an aid to lexicographers

• At least one paper on MT application

• Could be used in pedagogical applications

– Earlier NSF grant aimed at a complete Chinese learning platform, with Wordsketch as a module

– Comparison of similar lexemes cross-linguistically

• Yiching is publishing about express vs biaoshi, and this work may use Wordsketch


Chinese Wordsketch

• Kilgarriff et al. report that Wordsketch can be ported to any language

– Pavel Rychly in the Czech Republic has implemented concordancing at Chinese character level only

• AS has acquired Chinese Gigaword, and POS-tagged it automatically

– No parsing has been attempted so far

• Grammatical relations ruleset for Chinese is needed

• I would plan to

– contribute to the writing of this ruleset

– collaborate on cross-linguistic lexical analyses, using Wordsketch where possible
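
For reference, concordancing at Chinese character level, as mentioned in the first point above, needs no word segmentation or tagging. The Python sketch below shows the general idea only; it is not Rychly's implementation, the function `kwic` and the sample sentence are invented, and 表示 is assumed to be the characters for biaoshi.

```python
def kwic(text, keyword, width=5):
    """Return (left, keyword, right) contexts, measured in characters."""
    lines = []
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        left = text[max(0, start - width):start]   # up to `width` characters before
        right = text[end:end + width]              # up to `width` characters after
        lines.append((left, keyword, right))
        start = text.find(keyword, start + 1)
    return lines

# Invented sample text; no segmentation or POS tags are needed.
text = "他表示同意。她也表示感谢。"
hits = kwic(text, "表示", width=3)
```

Because the match is on raw characters, it also returns hits where the character string falls across word boundaries. That is one reason a tagged corpus and a grammatical relations ruleset are still needed for word sketches.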


links

• http://nlp.fi.muni.cz/projects/bonito2/chinese/

– test chin

• http://www.sketchengine.co.uk/sampler/

– ssmith ssmith