Semi-Automatic Learning of Transfer Rules for Machine Translation of Low-Density Languages

Katharina Probst

April 5, 2002

Overview of the talk

Introduction and Motivation
Overview of the AVENUE project
Elicitation of bilingual data
Rule Learning
  Seed Generation
  Seeded Version Space Learning
Conclusions and Future Work

Introduction and Motivation

Basic idea: opening up Machine Translation to minority languages.

Resources for minority languages are scarce:
  Bilingual text
  Monolingual text
  Target language grammar

Because of these scarce resources, statistical and example-based methods will likely not perform as well.

Our approach:
  A system that elicits the necessary information about the target language from a bilingual informant.
  The elicited information is used in conjunction with any other available target language information to learn syntactic transfer rules.

System overview

User

Learning Module

ElicitationProcess

SVSLearning Process

TransferRules

Run-Time Module SLInput

SL Parser

TransferEngine

TLGenerator

EBMTEngine

UnifierModule

TLOutput

Elicitation

Elicitation is the process of presenting a bilingual speaker with sets of sentences. The user translates the sentences and specifies how the words align.

The elicitation process serves multiple purposes:
  Collection of data
  Feature detection

Feature Detection

Feature detection is a process by which the learning module answers questions such as “Does the target language mark number on nouns?”

The elicitation corpus is organized in minimal pairs, i.e. pairs of sentences that differ in only one feature. For example:

1. You (John) are falling. [2nd person m, subj, present tense]
2. You (Mary) are falling. [2nd person f, subj, present tense]
3. You (Mary) fell. [2nd person f, subj, past tense]

Sentences 1 and 2 and sentences 2 and 3 are minimal pairs.

By comparing the translations of “you” across such a pair, the system gets an indication of whether the target language marks the feature that distinguishes the pair (here, gender or tense).

The results of feature detection will be used to guide the system in navigating the elicitation corpus, eliminating parts of it based on Implicational Universals.

The results will also be used by the rule learning module.
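As a rough illustration, here is a minimal sketch in Python (hypothetical data structures, not the AVENUE code) of how comparing a minimal pair can signal whether a feature is marked:

```python
# Minimal sketch of feature detection over a minimal pair (hypothetical data
# structures, not the AVENUE code). Each elicited sentence records the
# target-language form aligned to the probe word, as given by the informant.

def feature_is_marked(sentence_a, sentence_b, probe_word):
    """Return True if the target forms aligned to probe_word differ across a
    minimal pair, suggesting the target language marks the differing feature."""
    return sentence_a["aligned_tl"][probe_word] != sentence_b["aligned_tl"][probe_word]

# Sentences 1 and 2 differ only in the gender of "You" (placeholder TL forms).
sent1 = {"sl": "You (John) are falling.", "aligned_tl": {"You": "YOU-MASC"}}
sent2 = {"sl": "You (Mary) are falling.", "aligned_tl": {"You": "YOU-FEM"}}
print(feature_is_marked(sent1, sent2, "You"))  # True -> gender appears to be marked
```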

More on the elicitation corpus

Eliciting data from bilingual informants entails a number of challenges:

1. The bilingual informant him/herself
2. Morphology and the lexicon
3. Learning grammatical features
4. Compositional elicitation
5. Elicitation of non-compositional data
6. Verb subcategorization
7. Alignment issues
8. Bias towards the source language

Rule Learning in the AVENUE project - Introduction

The goal is to semi-automatically (i.e. with the help of the user) infer syntactic transfer rules.

Rule learning can be divided into two main steps:

Seed Generation: The system produces an initial “guess” at a transfer rule based on only one sentence. The produced rule is quite specific to the input sentence.

Version Space Learning: The system takes the seed rules and generalizes them.

Transfer rule formalism

A transfer rule (TR) consists of the following components:

1. The source language sentence and the target language sentence that the TR was produced from
2. Word alignments
3. Phrase information such as NP, S, …
4. Part-of-speech sequences for the source and target language
5. X-side constraints, i.e. constraints on the source language. These are used for parsing.
6. Y-side constraints, i.e. constraints on the target language. These are used for generation.
7. XY-constraints, i.e. constraints that transfer features from the source to the target language. These are used for transfer.
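As an illustration only, a minimal Python sketch of a container for these components (field names are assumptions, not the AVENUE rule formalism):

```python
# Minimal sketch of a transfer rule container (field names are hypothetical;
# the actual AVENUE formalism is a textual rule notation, not Python).
from dataclasses import dataclass, field

@dataclass
class TransferRule:
    sl_sentence: str                      # source language sentence the rule came from
    tl_sentence: str                      # target language sentence the rule came from
    alignments: list[tuple[int, int]]     # word alignments as (SL index, TL index) pairs
    phrase_type: str                      # phrase information, e.g. "NP" or "S"
    sl_pos_sequence: list[str]            # part-of-speech sequence, source side
    tl_pos_sequence: list[str]            # part-of-speech sequence, target side
    x_constraints: list[str] = field(default_factory=list)   # source-side constraints (parsing)
    y_constraints: list[str] = field(default_factory=list)   # target-side constraints (generation)
    xy_constraints: list[str] = field(default_factory=list)  # transfer constraints (SL -> TL)

# Example seed-like rule for an NP, with placeholder target-language words:
rule = TransferRule(
    sl_sentence="the man",
    tl_sentence="DET-TL MAN-TL",
    alignments=[(0, 0), (1, 1)],
    phrase_type="NP",
    sl_pos_sequence=["DET", "N"],
    tl_pos_sequence=["DET", "N"],
    x_constraints=["((X2 agr) = *3sg)"],
    y_constraints=["((Y1 agr) = (Y2 agr))"],
)
```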

Seed Generation

Type of Information     Source of Information
SL, TL sentence         Informant
Alignment               Informant
Phrase information      Elicitation corpus; same as SL on the TL side
SL POS sequence         English parse (c-structure, f-structure)
TL POS sequence         English parse, TL dictionary
X-side constraints      English parse (f-structure)
Y-side constraints      English parse, list of projecting features, TL dictionary
XY constraints          ---
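As an illustration of how these information sources come together, a minimal Python sketch (argument and field names are hypothetical, not AVENUE's):

```python
# Minimal sketch of assembling a seed rule from the information sources in the
# table above (argument and field names are hypothetical, not AVENUE's).

def build_seed_rule(informant, english_parse, tl_dictionary, projecting_features):
    """Assemble one seed transfer rule for a single elicited sentence pair."""
    # TL POS tags come from the TL dictionary where available, otherwise from
    # the aligned English word's tag (simplifying: assumes one-to-one alignment).
    tl_pos = [
        tl_dictionary.get(tl_word, eng_pos)
        for tl_word, eng_pos in zip(informant["tl_words"], english_parse["pos_sequence"])
    ]
    return {
        "sl_sentence": informant["sl_sentence"],            # informant
        "tl_sentence": informant["tl_sentence"],            # informant
        "alignments": informant["alignments"],              # informant
        "phrase_type": informant["phrase_type"],            # elicitation corpus; reused on the TL side
        "sl_pos_sequence": english_parse["pos_sequence"],   # English parse
        "tl_pos_sequence": tl_pos,                          # English parse + TL dictionary
        "x_constraints": english_parse["constraints"],      # English f-structure
        "y_constraints": [                                  # only features that project onto the TL
            c for c in english_parse["constraints"] if c["feature"] in projecting_features
        ],
        "xy_constraints": [],                               # none produced at seed generation time
    }
```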

A word on compositionality

Basic idea: if you produce a transfer rule for a sentence, and there already exist transfer rules that can translate parts of that sentence, why not use them?

The alignments, part-of-speech sequences, and constraints are adjusted accordingly.

The trickiest part is to find new constraints that cannot be stated in the lower-level rule, but are necessary to translate correctly in the context of the full sentence.

Clustering

Seed rules are “clustered” into groups within which merging will be attempted.

Clustering criteria: POS sequences, phrase information, alignments.

Main reason for clustering: divide the large version space into a number of smaller version spaces and run the algorithm on each version space separately.

Possible danger: rules that should be considered together (such as “the man” and “men”) will not be.
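A minimal sketch in Python (dict-based rules are a hypothetical representation, not the AVENUE implementation) of clustering seed rules by these criteria:

```python
# Minimal sketch of clustering seed rules by POS sequences, phrase information,
# and alignments (hypothetical dict-based rules; not the AVENUE implementation).
from collections import defaultdict

def cluster_seed_rules(seed_rules):
    """Group seed rules whose POS sequences, phrase type, and alignments match.
    Each cluster defines its own (smaller) version space."""
    clusters = defaultdict(list)
    for rule in seed_rules:
        key = (
            tuple(rule["sl_pos_sequence"]),
            tuple(rule["tl_pos_sequence"]),
            rule["phrase_type"],
            tuple(sorted(rule["alignments"])),
        )
        clusters[key].append(rule)
    return list(clusters.values())
```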

The Version Space

A set of seed rules in a cluster defines a version space as follows: the seed rules form the specific boundary (S). A virtual rule with the same POS sequences, alignments, and phrase information, but no constraints, forms the general boundary (G).

G boundary: virtual rule with no constraints
S boundary: seed rules
Between the boundaries: generalizations of the seed rules, more general than the rules in S but more specific than the rule in G

The partial ordering of rules in the version space

A rule TR2 is said to be strictly more general than another rule TR1 if the set of f-structures that satisfy TR2 is a proper superset of the set of f-structures that satisfy TR1. TR2 is equivalent to TR1 if the set of f-structures that satisfy TR1 is the same as the set of f-structures that satisfy TR2.

We have defined three operations that move a transfer rule to a strictly more general rule.

Generalization operations

Operation 1: delete a value constraint, e.g.
  ((X1 agr) = *3pl) → NULL

Operation 2: delete an agreement constraint, e.g.
  ((X1 agr) = (X2 agr)) → NULL

Operation 3: merge two value constraints into an agreement constraint, e.g.
  ((X1 agr) = *3pl), ((X2 agr) = *3pl) → ((X1 agr) = (X2 agr))
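A minimal sketch of the three operations over a rule's constraint set (the tuple encoding of constraints is an assumption for illustration, not AVENUE's representation):

```python
# Minimal sketch of the three generalization operations over a rule's constraint
# set (constraints as simple tuples; hypothetical representation, not AVENUE's).
# A value constraint:      ("value", "X1", "agr", "*3pl")   means ((X1 agr) = *3pl)
# An agreement constraint: ("agree", "X1", "X2", "agr")     means ((X1 agr) = (X2 agr))

def delete_value_constraint(constraints, c):
    """Operation 1: drop a value constraint."""
    return [k for k in constraints if k != c]

def delete_agreement_constraint(constraints, c):
    """Operation 2: drop an agreement constraint."""
    return [k for k in constraints if k != c]

def merge_value_constraints(constraints, c1, c2):
    """Operation 3: replace two value constraints on the same feature and value
    by a single agreement constraint between the two indices."""
    _, idx1, feat1, val1 = c1
    _, idx2, feat2, val2 = c2
    assert feat1 == feat2 and val1 == val2, "only identical feature/value pairs merge"
    remaining = [k for k in constraints if k not in (c1, c2)]
    return remaining + [("agree", idx1, idx2, feat1)]

# Example: ((X1 agr) = *3pl), ((X2 agr) = *3pl) -> ((X1 agr) = (X2 agr))
cs = [("value", "X1", "agr", "*3pl"), ("value", "X2", "agr", "*3pl")]
print(merge_value_constraints(cs, cs[0], cs[1]))  # [("agree", "X1", "X2", "agr")]
```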

Merging two transfer rules

At the heart of the seeded version space learning algorithm is the merging of two transfer rules (TR1 and TR2) into a more general rule (TR3):

1. All constraints that appear in both TR1 and TR2 are inserted into TR3 and removed from TR1 and TR2.
2. Perform all instances of Operation 3 on TR1 and TR2 separately.
3. Repeat step 1.
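A minimal sketch of this merge over constraint sets (same hypothetical tuple encoding of constraints as above; not the AVENUE implementation):

```python
# Minimal sketch of merging two transfer rules into a more general rule
# (constraints as hashable tuples; hypothetical representation, not AVENUE's).

def apply_all_operation3(constraints):
    """Merge every pair of value constraints with the same feature and value
    into a single agreement constraint (Operation 3)."""
    values = [c for c in constraints if c[0] == "value"]
    rest = [c for c in constraints if c[0] != "value"]
    merged, used = [], set()
    for i, c1 in enumerate(values):
        for j in range(i + 1, len(values)):
            if i in used or j in used:
                continue
            c2 = values[j]
            if c1[2:] == c2[2:]:  # same feature and same value
                merged.append(("agree", c1[1], c2[1], c1[2]))
                used.update({i, j})
    unmerged = [c for k, c in enumerate(values) if k not in used]
    return rest + unmerged + merged

def merge_rules(tr1, tr2):
    """Return the constraint set of the merged rule TR3."""
    c1, c2 = set(tr1), set(tr2)
    tr3 = c1 & c2                       # step 1: shared constraints go into TR3
    c1, c2 = c1 - tr3, c2 - tr3
    c1 = set(apply_all_operation3(c1))  # step 2: Operation 3 on each rule separately
    c2 = set(apply_all_operation3(c2))
    tr3 |= c1 & c2                      # step 3: repeat step 1
    return tr3
```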

Seeded Version Space Algorithm

1. Remove duplicate rules from the S boundary.
2. Try to merge each pair of transfer rules.
3. A merge is successful only if the CSet (set of covered sentences, i.e. sentences that are translated correctly) of the merged rule is a superset of the union of the CSets of the two unmerged rules.
4. Pick the successful merge that optimizes an evaluation criterion.
5. Repeat until no more merges are found.
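A minimal sketch of this loop (the helpers merge_rules, cset, and evaluation_score are assumed to be supplied; rules are treated as constraint sets as in the sketches above):

```python
# Minimal sketch of the seeded version space loop (hypothetical helpers:
# merge_rules, cset, and evaluation_score are assumed, not part of AVENUE).
from itertools import combinations

def seeded_version_space(rules, merge_rules, cset, evaluation_score):
    """Greedily merge rules as long as some merge loses no coverage."""
    rules = list({frozenset(r): r for r in rules}.values())  # 1. drop duplicate rules
    while True:
        candidates = []
        for r1, r2 in combinations(rules, 2):                # 2. try each pair
            merged = merge_rules(r1, r2)
            # 3. successful only if no sentence coverage is lost
            if cset(merged) >= (cset(r1) | cset(r2)):
                candidates.append((r1, r2, merged))
        if not candidates:                                   # 5. stop when no merge is found
            return rules
        # 4. pick the merge that optimizes the evaluation criterion
        r1, r2, merged = max(candidates, key=lambda c: evaluation_score(c[2]))
        rules = [r for r in rules if r not in (r1, r2)] + [merged]
```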

Evaluating a set of transfer rules

Initial thought: evaluate a merge based on the “goodness” of the new rule, i.e. its CSet, and on the size of the resulting rule set.

Goal: maximize coverage and minimize the size of the rule set.

Currently: merges are only successful if there is no loss in coverage, so the size of the rule set is the only criterion used.

Future (1): coverage should be measured on a test set.

Future (2): relax the constraint that a successful merge cannot result in a loss of coverage.

Conclusions and Future Work

Novel approach to data-driven MT: less data, more encoded linguistic knowledge.

The project is still in its first stages, so the system is under heavy development and subject to major changes.

Current work: compositionality.

Future work includes:
  Expanding coverage
  Addressing (much) more complex constructions
  Eliminating some assumptions