Developing affordable technologies for resource-poor languages

Developing affordable technologies for resource-poor languages. Ariadna Font Llitjós, Language Technologies Institute, Carnegie Mellon University. September 22, 2004.


Page 1: Developing affordable technologies for resource-poor languages

Developing affordable technologies for resource-poor languages

Ariadna Font Llitjós

Language Technologies Institute

Carnegie Mellon University

September 22, 2004

Page 2: Developing affordable technologies for resource-poor languages

October 11, 2002 AMTA 2002 2

dot = language

Page 3: Developing affordable technologies for resource-poor languages

Motivation

Resource-poor scenarios
- Indigenous communities have difficulty accessing crucial information that directly affects their lives (such as land laws, health warnings, etc.)
- Formalize a potentially endangered language

Affordable technologies, such as
- spell-checkers
- on-line dictionaries
- Machine Translation (MT) systems
- computer-assisted tutoring

Page 4: Developing affordable technologies for resource-poor languages

AVENUE Partners

Language | Country | Institutions
Mapudungun (in place) | Chile | Universidad de la Frontera, Institute for Indigenous Studies, Ministry of Education
Quechua (started) | Peru | Ministry of Education
Iñupiaq (discussion) | US (Alaska) | Ilisagvik College, Barrow school district, Alaska Rural Systemic Initiative, Trans-Arctic and Antarctic Institute, Alaska Native Language Center
Siona (discussion) | Colombia | OAS-CICAD, Plante, Department of the Interior

Page 5: Developing affordable technologies for resource-poor languages

Chile

Official Language: Spanish

Population: ~15 million

~1/2 million Mapuche people

Language: Mapudungun

Mapudungun for the Mapuche

Page 6: Developing affordable technologies for resource-poor languages

What’s Machine Translation (MT)?

Japanese sentence

Swahili sentence

Page 7: Developing affordable technologies for resource-poor languages


Speech to Speech MT

Page 8: Developing affordable technologies for resource-poor languages

Why Machine Translation for resource-poor (indigenous) languages?

• Commercial MT is economically feasible for only a handful of major languages with large resources (corpora, human developers)
• Benefits include:
– Better government access to indigenous communities (epidemics, crop failures, etc.)
– Better participation of indigenous communities in information-rich activities (health care, education, government) without giving up their languages
– Language preservation
– Civilian and military applications (disaster relief)

Page 9: Developing affordable technologies for resource-poor languages

MT for resource-poor languages: Challenges

• Minimal amount of parallel text (oral tradition)
• Possibly competing standards for orthography/spelling
• Often relatively few trained linguists
• Access to native informants possible
• Need to minimize development time and cost

Page 10: Developing affordable technologies for resource-poor languages

Machine Translation Pyramid

[Figure: MT pyramid. Levels: Interlingua, Transfer rules, Corpus-based methods. Stages: analysis, interpretation, generation. Example: I saw you / Yo vi tú]

Page 11: Developing affordable technologies for resource-poor languages

AVENUE MT system overview

\spa Una mujer se quedó en casa
\map Kie domo mlewey ruka mew
\eng One woman stayed at home.

{VP,3}

VP::VP : [VP NP] -> [VP NP]

( (X1::Y1) (X2::Y2)

((x2 case) = acc)

((x0 obj) = x2)

((x0 agr) = (x1 agr))

(y2 == (y0 obj))

((y0 tense) = (x0 tense))

((y0 agr) = (y1 agr)))

V::V |: [stayed] -> [quedó]

((X1::Y1)

((x0 form) = stay)

((x0 actform) = stayed)

((x0 tense) = past-pp)

((y0 agr pers) = 3)

((y0 agr num) = sg))
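
The lexical rule above can be pictured as a small record: a source word, a target word, and feature constraints on each side. The following is only a minimal sketch of that idea, not the AVENUE implementation. The class name `LexicalRule`, the function `transfer`, and the reading of `x0` as the source side and `y0` as the target side are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalRule:
    """Hypothetical container for one word-level rule, e.g. V::V [stayed] -> [quedó]."""
    category: str
    source: str
    target: str
    x_features: dict = field(default_factory=dict)  # constraints on x0 (assumed: source side)
    y_features: dict = field(default_factory=dict)  # constraints on y0 (assumed: target side)


# Feature values copied from the lexical rule on the slide.
STAYED = LexicalRule(
    category="V",
    source="stayed",
    target="quedó",
    x_features={"form": "stay", "actform": "stayed", "tense": "past-pp"},
    y_features={"agr": {"pers": 3, "num": "sg"}},
)


def transfer(word, rules):
    """Return (target word, target features) for every rule whose source side matches."""
    return [(rule.target, rule.y_features) for rule in rules if rule.source == word]


print(transfer("stayed", [STAYED]))
```

A real transfer engine would also unify the grammar-rule constraints (such as agreement between `y0` and `y1`); this sketch covers only the lexical lookup.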

Page 12: Developing affordable technologies for resource-poor languages

Interactive and Automatic Refinement of Translation Rules

Or: How to recycle corrections of MT output back into the MT system by adjusting and adapting the grammar and lexical rules

Page 13: Developing affordable technologies for resource-poor languages


Error correction by non-expert bilingual users

Page 14: Developing affordable technologies for resource-poor languages

Interactive elicitation of MT errors

Assumptions:
• non-expert bilingual users can reliably detect and minimally correct MT errors, given:
– SL sentence (I saw you)
– TL sentence (Yo vi tú)
– word-to-word alignments (I-yo, saw-vi, you-tú)
– (context)
• using an online GUI: the Translation Correction Tool (TCTool)

Goal:
• simplify MT correction task maximally
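
The information shown to the user (SL sentence, TL sentence, word-to-word alignments) fits in a very small data structure. The sketch below uses the slide's own example; the function names `parse_alignments` and `aligned_source` and the dictionary layout are hypothetical and say nothing about how the TCTool stores its data.

```python
def parse_alignments(spec):
    """Turn a string such as 'I-yo, saw-vi, you-tú' into (source, target) pairs."""
    pairs = []
    for item in spec.split(","):
        src, tgt = item.strip().split("-", 1)
        pairs.append((src, tgt))
    return pairs


def aligned_source(tl_word, alignments):
    """Return the SL words aligned to a given TL word."""
    return [src for src, tgt in alignments if tgt == tl_word]


# One correction session, using the example from the slide.
session = {
    "SL": "I saw you",
    "TL": "Yo vi tú",
    "alignments": parse_alignments("I-yo, saw-vi, you-tú"),
}

print(aligned_source("tú", session["alignments"]))
```

Keeping the alignments explicit lets a later step trace a corrected TL word back to the SL word, and from there to the rule that produced it.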

Page 15: Developing affordable technologies for resource-poor languages

Translation Correction Tool

Actions:

Page 16: Developing affordable technologies for resource-poor languages


SL + best TL picked by user

Page 17: Developing affordable technologies for resource-poor languages


Changing “grande” into “gran”

Page 18: Developing affordable technologies for resource-poor languages


Page 19: Developing affordable technologies for resource-poor languages


Page 20: Developing affordable technologies for resource-poor languages

Automatic Rule Refinement Framework

• Find best RR operations given a:
  • grammar (G),
  • lexicon (L),
  • (set of) source language sentence(s) (SL),
  • (set of) target language sentence(s) (TL),
  • its parse tree (P), and
  • minimal correction of TL (TL’)
such that TQ2 > TQ1
• Which can also be expressed as:

max TQ(TL|TL’,P,SL,RR(G,L))
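
The maximization can be read as a search over candidate refinements, keeping the one whose output scores highest and only if it beats the score before refinement (TQ2 > TQ1). The sketch below illustrates that reading under strong assumptions. The slides do not define TQ, so `toy_tq` is a hypothetical stand-in (fraction of tokens matching TL’), and each candidate is modeled as a plain function from an SL sentence to a TL sentence.

```python
def toy_tq(output_tokens, corrected_tokens):
    """Hypothetical stand-in for TQ: fraction of positions that match TL'."""
    if not corrected_tokens or len(output_tokens) != len(corrected_tokens):
        return 0.0
    matches = sum(a == b for a, b in zip(output_tokens, corrected_tokens))
    return matches / len(corrected_tokens)


def best_refinement(candidates, sl, tl, tl_corrected):
    """Pick the candidate whose output maximizes TQ, requiring TQ2 > TQ1."""
    target = tl_corrected.split()
    best_name, best_tq = None, toy_tq(tl.split(), target)  # TQ1: before refinement
    for name, translate in candidates.items():
        tq2 = toy_tq(translate(sl).split(), target)
        if tq2 > best_tq:
            best_name, best_tq = name, tq2
    return best_name, best_tq


# Toy candidates: each stands for the MT system after one refinement operation.
candidates = {
    "no_change": lambda sl: "el auto roja",
    "add_gender_agreement": lambda sl: "el auto rojo",
}
print(best_refinement(candidates, "the red car", "el auto roja", "el auto rojo"))
```

If no candidate improves on TQ1, the function returns `None` as the name, which corresponds to leaving the grammar and lexicon unchanged.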

Page 21: Developing affordable technologies for resource-poor languages

Types of RR operations

• Grammar:
– R0 -> R0 + R1 [=R0’ + constr]   Cov[R0] -> Cov[R0,R1]
– R0 -> R1 [=R0 + constr]   Cov[R0] -> Cov[R1]
– R0 -> R1 [=R0 + constr= -] + R2 [=R0’ + constr=c +]   Cov[R0] -> Cov[R1,R2]

• Lexicon:
– Lex0 -> Lex0 + Lex1 [=Lex0 + constr]
– Lex0 -> Lex1 [=Lex0 + constr]
– Lex0 -> Lex1 [Lex0 + TLword]
– Lex1 (adding lexical item)
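
Two of the grammar operations can be illustrated on a toy rule representation. This is a sketch under assumptions, not the author's implementation. A rule is modeled as a frozenset of constraint strings; the first operation is read as "keep R0 and add a constrained copy R1", the second as "replace R0 with a more constrained R1". The names `bifurcate` and `refine` are hypothetical labels, and R0’ is simplified to an exact copy of R0.

```python
def refine(rule, constraint):
    """R0 becomes R1 [= R0 + constr]: the original rule is replaced by a stricter one."""
    return [rule | {constraint}]


def bifurcate(rule, constraint):
    """R0 becomes R0 + R1 [= R0' + constr]: the original is kept, a constrained copy is added."""
    return [rule, rule | {constraint}]


# Toy rule: one agreement constraint taken from the slide's VP rule.
r0 = frozenset({"((x0 agr) = (x1 agr))"})
new_constraint = "((y1 agr gen) = (y2 agr gen))"  # hypothetical gender-agreement constraint

print(refine(r0, new_constraint))
print(bifurcate(r0, new_constraint))
```

Refining narrows what the grammar covers, while bifurcating leaves the old coverage in place and adds a more specific rule beside it.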

Page 22: Developing affordable technologies for resource-poor languages


Questions & Discussion

Thanks!

Page 23: Developing affordable technologies for resource-poor languages


Formalizing Error Information

Wi = error

Wi’ = correction

Wc = clue word

Example:

SL: the red car
TL: *el auto roja
TL’: el auto rojo

Wi = roja
Wi’ = rojo
Wc = auto
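
Given TL and the user's minimal correction TL’, the pair (Wi, Wi’) can be recovered with a token-level diff. The sketch below uses Python's standard `difflib` for this; the function name `find_error` is hypothetical, and the clue word Wc is entered by hand here because the slides do not say how it is identified.

```python
import difflib


def find_error(tl, tl_corrected):
    """Return (Wi, Wi') pairs for each single-word replacement between TL and TL'."""
    a, b = tl.split(), tl_corrected.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only one-word-for-one-word substitutions count as a minimal correction here.
        if tag == "replace" and i2 - i1 == 1 and j2 - j1 == 1:
            pairs.append((a[i1], b[j1]))
    return pairs


wi, wi_corrected = find_error("el auto roja", "el auto rojo")[0]
error_info = {"Wi": wi, "Wi'": wi_corrected, "Wc": "auto"}  # Wc supplied manually
print(error_info)
```

Insertions, deletions, and reorderings would appear under other opcode tags and are ignored in this sketch.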

Page 24: Developing affordable technologies for resource-poor languages

Finding Triggering Features

Once we have the user’s correction (Wi’), we can compare it with Wi at the feature level and find the triggering feature.

If the set of differing features is empty, we need to postulate a new binary feature
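
The feature-level comparison can be sketched as a set difference over feature dictionaries. Everything in this example is an assumption made for illustration: the toy lexicon and its feature values, the function names, the default feature name `feat1`, and the use of "+" and "-" as the two values of a newly postulated binary feature.

```python
# Hypothetical feature lexicon; the feature names and values are illustrative only.
LEXICON = {
    "roja": {"pos": "adj", "gender": "fem", "num": "sg"},
    "rojo": {"pos": "adj", "gender": "masc", "num": "sg"},
    "grande": {"pos": "adj", "num": "sg"},
    "gran": {"pos": "adj", "num": "sg"},
}


def differing_features(wi, wi_corrected, lexicon):
    """Names of the features whose values differ between Wi and Wi'."""
    f1, f2 = lexicon[wi], lexicon[wi_corrected]
    return {name for name in f1.keys() | f2.keys() if f1.get(name) != f2.get(name)}


def triggering_features(wi, wi_corrected, lexicon, new_feature="feat1"):
    """Return (triggering features, lexicon), adding a binary feature if none differ."""
    delta = differing_features(wi, wi_corrected, lexicon)
    if delta:
        return delta, lexicon
    # Empty set: postulate a new binary feature that separates the two words.
    updated = {word: dict(feats) for word, feats in lexicon.items()}
    updated[wi][new_feature] = "-"
    updated[wi_corrected][new_feature] = "+"
    return {new_feature}, updated


print(triggering_features("roja", "rojo", LEXICON)[0])
print(triggering_features("grande", "gran", LEXICON)[0])
```

In the first call the two words differ in `gender`, so that is the triggering feature. In the second call the toy lexicon gives "grande" and "gran" identical features, so a new binary feature is introduced to tell them apart.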

Delta function: