Generation

Aims of this talk: discuss MRS and LKB generation; describe a larger research programme (modular generation); mention some interactions with other work.

Posted: 21-Dec-2015

Page 1

Generation

Page 2

Aims of this talk
  • Discuss MRS and LKB generation
  • Describe larger research programme: modular generation
  • Mention some interactions with other work in progress: RMRS, SEM-I

Page 3

Outline of talk
  • Towards modular generation
  • Why MRS?
  • MRS and chart generation
  • Data-driven techniques
  • SEM-I and documentation

Page 4

Modular architecture
  language-independent component
    → meaning representation
      → language-dependent realization
        → string or speech output

Page 5

Desiderata for a portable realization module
  • Application independent
  • Any well-formed input should be accepted
  • No grammar-specific/conventional information should be essential in the input
  • Output should be idiomatic

Page 6

Architecture (preview)

[Diagram: external LF → (via SEM-I) internal LF → chart generator → string,
 with control modules and specialization modules]

Page 7

Why MRS?
  • Flat structures
      – independence of syntax: conventional LFs partially mirror tree structure
      – manipulation of individual components: can ignore scope structure etc.
      – lexicalised generation: composition by accumulation of EPs; robust composition
  • Underspecification

Page 8

An excursion: Robust MRS
  • Deep Thought: integration of deep and shallow processing via compatible semantics
  • All components construct RMRSs
  • Principled way of building robustness into deep processing
  • Requirements for consistency etc. help human users too

Page 9

Extreme flattening of deep output

[Diagram: two scoped logical-form trees for "every cat chases some dog"
 (one per quantifier-scope reading), flattened into the RMRS below]

lb1:every_q(x), RSTR(lb1,h9), BODY(lb1,h6), lb2:cat_n(x),
lb4:some_q(y), RSTR(lb4,h8), BODY(lb4,h7), lb5:dog_n_1(y),
lb3:chase_v(e), ARG1(lb3,x), ARG2(lb3,y),
h9 qeq lb2, h8 qeq lb5
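The flat structure above lends itself to a direct encoding as data. A minimal sketch in Python, transcribing the slide's RMRS; the class and field names (`EP`, `Arg`, and so on) are invented for illustration, not any DELPH-IN library's API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EP:
    """One elementary predication: label, predicate, intrinsic variable."""
    label: str
    pred: str
    var: str

@dataclass(frozen=True)
class Arg:
    """A separated argument relation, e.g. ARG1(lb3, x)."""
    role: str
    label: str
    value: str

# The RMRS for "every cat chases some dog", transcribed from the slide.
eps = [
    EP("lb1", "every_q", "x"), EP("lb2", "cat_n", "x"),
    EP("lb4", "some_q", "y"), EP("lb5", "dog_n_1", "y"),
    EP("lb3", "chase_v", "e"),
]
args = [
    Arg("RSTR", "lb1", "h9"), Arg("BODY", "lb1", "h6"),
    Arg("RSTR", "lb4", "h8"), Arg("BODY", "lb4", "h7"),
    Arg("ARG1", "lb3", "x"), Arg("ARG2", "lb3", "y"),
]
qeqs = [("h9", "lb2"), ("h8", "lb5")]
```

Because relations, arguments, and scope constraints are separate records, a shallow processor can emit only the pieces it knows, which is exactly the factorization the next slide describes.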

Page 10

Extreme underspecification
  • Factorize deep representation into minimal units
  • Only represent what you know
  • Robust MRS:
      – separate relations
      – separate arguments
      – explicit equalities
      – conventions for predicate names and sense distinctions
      – hierarchy of sorts on variables

Page 11

Chart generation with the LKB
  1. Determine lexical signs from MRS
  2. Determine possible rules contributing EPs (`construction semantics': compound rule etc.)
  3. Instantiate signs (lexical and rule) according to variable equivalences
  4. Apply lexical rules
  5. Instantiate chart
  6. Generate by parsing without string position
  7. Check output against input
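The steps above can be sketched as a short Python loop, assuming toy data structures (edges as (string, covered-EP-set) pairs); none of the names below are the LKB's actual API:

```python
def lexical_lookup(input_eps, lexicon):
    """Step 1: map each input EP's predicate onto candidate lexical signs."""
    return [sign for ep in input_eps for sign in lexicon.get(ep["pred"], [])]

def generate(input_eps, lexicon, combine, covers_input):
    """Steps 5-7: seed the chart with lexical signs, combine edges as in
    chart parsing (but with no string positions), and keep only complete
    edges whose semantics matches the input."""
    chart = lexical_lookup(input_eps, lexicon)
    agenda = list(chart)
    while agenda:
        edge = agenda.pop()
        for other in list(chart):
            new = combine(edge, other)
            if new is not None and new not in chart:
                chart.append(new)
                agenda.append(new)
    return [edge for edge in chart if covers_input(edge, input_eps)]
```

The caller supplies `combine` (grammar rule application) and `covers_input` (the root condition); the loop itself is the usual agenda-driven chart algorithm, minus string indices.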

Page 12

Lexical lookup for generation
  • _like_v_1(e,x,y) – returns the lexical entry for sense 1 of the verb like
  • temp_loc_rel(e,x,y) – returns multiple lexical entries
  • multiple relations in one lexical entry: e.g., who, where
  • entries with null semantics: heuristics
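Lookup amounts to an index from predicate names to candidate entries; one predicate may yield one entry, several, or none. A toy illustration (the entry names are invented, not the ERG's):

```python
# Illustrative lexical index: predicate name -> candidate lexical entries.
LEX = {
    "_like_v_1": ["like_v1"],                           # unique entry
    "temp_loc_rel": ["on_day", "in_month", "at_time"],  # several candidates
}

def lookup(pred):
    """Return every lexical entry whose semantics supplies `pred`."""
    return LEX.get(pred, [])
```

The ambiguous case (`temp_loc_rel`) is what later makes preposition choice a data-driven problem, as discussed on Page 25.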

Page 13

Instantiation of entries
  • _like_v_1(e,x,y) & named(x,"Kim") & named(y,"Sandy")
  • find locations corresponding to `x's in all FSs; replace all `x's with a constant; repeat for `y's etc.
  • Also for rules contributing construction semantics
  • `Skolemization' (a misleading name ...)
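The variable-to-constant replacement can be sketched as a substitution over EPs, assuming EPs are plain dicts (a toy stand-in for the LKB's feature structures):

```python
def skolemize(eps, bindings):
    """Replace each MRS variable with a unique constant (the slide's loose
    sense of 'Skolemization'), so that occurrences of the same variable
    unify while distinct variables cannot accidentally be identified."""
    return [{k: bindings.get(v, v) for k, v in ep.items()} for ep in eps]
```

After substitution, every slot that held `x` holds the same constant, so unification during chart generation enforces the input's variable equivalences for free.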

Page 14

Lexical rule application
  • Lexical rules that contribute EPs are only used if the EP is in the input
  • Inflectional rules will only apply if the variable has the correct sort
  • Lexical rule application does morphological generation (e.g., liked, bought)
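The sort condition on inflectional rules can be illustrated with a toy past-tense rule that fires only when the event variable carries the right sort (the spelling table is a tiny invented sample, not the ERG's morphology):

```python
# Toy irregular/spelling table; real morphological generation is richer.
PAST_FORMS = {"buy": "bought", "like": "liked"}

def inflect_past(stem, event_sort):
    """Apply a past-tense inflectional rule, but only if the event
    variable has the correct sort; otherwise the rule is inapplicable."""
    if event_sort != "past":
        return None  # wrong sort on the variable: rule does not apply
    return PAST_FORMS.get(stem, stem + "ed")
```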

Page 15

Chart generation proper
  • Possible lexical signs added to a chart structure
  • Currently no indexing of chart edges: chart generation can use semantic indices, but current results suggest this doesn't help
  • Rules applied as for chart parsing; edges checked for compatibility with the input semantics (bag of EPs)

Page 16

Root conditions
  • Complete structures must consume all the EPs in the input MRS
  • Should check for compatibility of scopes:
      – precise qeq matching is (probably) too strict
      – requiring exactly the same scopes is (probably) unrealistic and too slow
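The first root condition is a multiset check: every input EP must be accounted for exactly once, with duplicates counting. A minimal sketch using Python's `Counter`:

```python
from collections import Counter

def consumes_all(edge_eps, input_eps):
    """Root condition: a complete edge must account for the input MRS's
    bag of EPs exactly -- a multiset comparison, so repeated EPs matter."""
    return Counter(edge_eps) == Counter(input_eps)
```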

Page 17

Generation failures due to MRS issues
  • Well-formedness check prior to input to generator (optional)
  • Lexical lookup failure: predicate doesn't match entry, wrong arity, wrong variable types
  • Unwanted instantiations of variables
  • Missing EPs in input: syntax (e.g., no noun), lexical selection
  • Too many EPs in input: e.g., two verbs and no coordination

Page 18

Improving generation via corpus-based techniques
  • CONTROL: e.g., intersective modifier order; the logical representation does not determine order
      wet(x) & weather(x) & cold(x)
  • UNDERSPECIFIED INPUT: e.g., determiners (none/a/the), prepositions (in/on/at)

Page 19

Constraining generation for idiomatic output
  • Intersective modifier order: e.g., adjectives, prepositional phrases
  • The logical representation does not determine order:
      wet(x) & weather(x) & cold(x)

Page 20

Adjective ordering
  • Constraints / preferences:
      big red car        * red big car
      cold wet weather   wet cold weather (OK, but dispreferred)
  • Difficult to encode in a symbolic grammar

Page 21

Corpus-derived adjective ordering
  • n-grams perform poorly
  • Thater: direct evidence plus clustering; positional probability
  • Malouf (2000): memory-based learning plus positional probability: 92% on the BNC
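The positional-probability idea can be sketched as counting how often each adjective appears first in observed adjective pairs and ordering by that estimate. This is a toy estimator for illustration, not Malouf's actual system:

```python
from collections import defaultdict

def positional_stats(pairs):
    """Estimate, for each adjective, the probability of occurring first
    in an observed (first, second) adjective pair."""
    first = defaultdict(int)
    total = defaultdict(int)
    for a, b in pairs:
        first[a] += 1
        total[a] += 1
        total[b] += 1
    return {adj: first[adj] / total[adj] for adj in total}

def order(adjectives, stats):
    """Place adjectives with a higher first-position probability earlier;
    unseen adjectives default to 0.5 (no preference)."""
    return sorted(adjectives, key=lambda a: -stats.get(a, 0.5))
```

This kind of soft preference is easy to apply as a post-hoc ranker on generator output, exactly where the symbolic grammar declines to decide.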

Page 22

Underspecified input to generation: We bought a car on Friday
  Accept:
    pron(x) & a_quant(y,h1,h2) & car(y) & buy(epast,x,y) & on(e,z) & named(z,Friday)
  and:
    pron(x) & general_q(y,h1,h2) & car(y) & buy(epast,x,y) & temp_loc(e,z) & named(z,Friday)
  And maybe:
    pron(x1pl) & car(y) & buy(epast,x,y) & temp_loc(e,z) & named(z,Friday)

Page 23

Guess the determiner
  • We went climbing in _ Andes
  • _ president of _ United States
  • I tore _ pyjamas
  • I tore _ duvet
  • George doesn't like _ vegetables
  • We bought _ new car yesterday

Page 24

Determining determiners
  • Determiners are partly conventionalized, often predictable from local context
  • Applications: translation from Japanese etc., speech prosthesis
  • More `meaning-rich' determiners assumed to be specified in the input
  • Minnen et al.: 85% on the WSJ (using TiMBL)
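A memory-based guesser in the spirit of the TiMBL experiments the slide cites can be sketched as nearest-neighbour lookup over stored context/determiner instances; the training instances and feature choice below are invented for illustration:

```python
# Stored instances: ((preceding word, head noun), determiner).
# An empty-string determiner means "no determiner".
TRAIN = [
    (("in", "Andes"), "the"),
    (("of", "United_States"), "the"),
    (("bought", "car"), "a"),
    (("like", "vegetables"), ""),
]

def guess_det(context):
    """Predict the determiner of the stored instance whose context
    features overlap most with the query (1-nearest-neighbour)."""
    def overlap(instance):
        return sum(a == b for a, b in zip(instance[0], context))
    return max(TRAIN, key=overlap)[1]
```

Memory-based learning fits this task well because determiner choice is heavily conventionalized: many decisions really are recalls of a near-identical local context rather than generalizations.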

Page 25

Preposition guessing
  • Choice between temporal in/on/at:
      in the morning     in July
      on Wednesday       on Wednesday morning
      at three o'clock   at New Year
  • ERG uses hand-coded rules and lexical categories
  • A machine learning approach gives very high precision and recall on the WSJ, and good results on a balanced corpus (Lin Mei, 2004, Cambridge MPhil thesis)
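The hand-coded rule style can be sketched as a lookup keyed on a coarse class of the temporal noun; the class names below are invented for illustration and are not the ERG's actual lexical categories:

```python
def temporal_prep(noun_class):
    """Toy in/on/at chooser for temporal expressions, keyed on a coarse
    class of the head noun; returns None for an unknown class."""
    return {
        "part_of_day": "in",   # in the morning
        "month": "in",         # in July
        "day": "on",           # on Wednesday
        "day_plus_part": "on", # on Wednesday morning
        "clock_time": "at",    # at three o'clock
        "holiday": "at",       # at New Year
    }.get(noun_class)
```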

Page 26

SEM-I: semantic interface
  • Meta-level: manually specified `grammar' relations (constructions and closed-class)
  • Object-level: linked to the lexical database for deep grammars
  • Definitional: e.g., lemma+POS+sense
  • Linked test suites, examples, documentation

Page 27

SEM-I development
  • SEM-I eventually forms the `API': stable, changes negotiated
  • SEM-I vs Verbmobil SEMDB:
      – technical limitations of SEMDB: too painful!
      – `munging' rules: external vs internal
  • SEM-I development must be incremental

Page 28

Role of SEM-I in architecture
  • Offline:
      – definition of `correct' (R)MRS for developers
      – documentation
      – checking of test suites
  • Online:
      – in unifier/selector: reject invalid RMRSs
      – patching up input to generation
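The online check can be sketched as validating each EP against a SEM-I table of predicate signatures, rejecting unknown predicates, wrong arity, or wrong variable sorts. The table entries below are invented examples, not the real ERG SEM-I:

```python
# Illustrative SEM-I fragment: predicate -> expected variable sorts,
# where 'e' is an event variable and 'x' an individual.
SEM_I = {
    "_like_v_1": ("e", "x", "x"),
    "_cat_n_1": ("x",),
}

def valid_ep(pred, variables):
    """Accept an EP only if its predicate is known to the SEM-I and its
    variables match the recorded signature in number and sort."""
    signature = SEM_I.get(pred)
    if signature is None or len(signature) != len(variables):
        return False
    return all(v.startswith(sort) for sort, v in zip(signature, variables))
```

A generator front-end can run this filter over the whole input RMRS before lexical lookup, turning many of the failure modes listed on Page 17 into early, diagnosable rejections.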

Page 29

Goal: semi-automated documentation

[Diagram: the lexical DB and an [incr tsdb()] semantic test suite feed the
 object-level and meta-level SEM-I; documentation (e.g. ERG documentation)
 is produced semi-automatically, with strings, examples, and an appendix
 autogenerated on demand]

Page 30

Robust generation
  • SEM-I is an important preliminary: check whether generator input is semantically compatible with the grammars
  • Eventually: a hierarchy of relations outside the grammars, allowing underspecification
      – `fill-in' of underspecified RMRS
      – exploit work on determiner guessing etc.

Page 31

Architecture (again)

[Diagram repeated from Page 6: external LF → (via SEM-I) internal LF →
 chart generator → string, with control modules and specialization modules]

Page 32

Interface
  • External representation: public, documented, reasonably stable
  • Internal representation: syntax/semantics interface, convenient for analysis
  • External/internal conversion via the SEM-I

Page 33

Guaranteed generation?
  • Given a well-formed input MRS/RMRS, with elementary predications found in the SEM-I (and dependencies), can we generate a string?
      – with input fix-up? negotiation?
  • Semantically bleached lexical items: which, one, piece, do, make
  • Defective paradigms, negative polarity, anti-collocations etc.?

Page 34

Next stages
  • SEM-I development
  • Documentation and test suite integration
  • Generation from RMRSs produced by shallower parsers (or deep/shallow combination)
  • Partially fixed text in generation (cogeneration)
  • Further statistical modules: e.g., locational prepositions, other modifiers
  • More underspecification
  • Gradually increase flexibility of the interface to generation