Post on 30-Dec-2015
Course 10: Natural Language Generation
A highly complex task, both for people and for machines
Slides borrowed from Michael Zock
LIMSI-CNRS, Orsay, France

Some preliminary issues: Warning
This is not a state-of-the-art talk. If you are interested in one, the following could be a starting point:
Bateman & Zock (2003). Natural Language Generation. In R. Mitkov (Ed.), Handbook of Computational Linguistics, Oxford University Press, pp. 284-304.
List of systems: http://www.fb10.uni-bremen.de/anglistik/langpro/NLG-table/NLG-table-root.htm
Anything related to NLG: http://www.siggen.org/

Some preliminary issues: Background material
• Willem Levelt (1989). Speaking: From Intention to Articulation, MIT Press
• E. Reiter & R. Dale (2000). Building Natural Language Generation Systems, Cambridge University Press

Overview of this talk
Part 1: General problems
• knowledge and constraints, architecture, process, etc.
Part 2: Deep generation
• message planning
• message ordering (text plan, outline)
Part 3: Surface generation
• lexical choice (access and synthesis)
• computation of syntactic structure

Different ways to look at text generation
• NLG FOR people: fully automated text generation
• NLG LIKE people: simulation of psychological processes (connectionism, online processing, incremental generation)
• NLG WITH people: semi-automated, machine-mediated generation (writer's workbench, foreign language learning)

What is NLG? Ask Google
Fort méconnue du grand public, la génération de textes demeure une discipline sportive essentiellement universitaire, pratiquée par d'obscurs chercheurs dans des laboratoires tristes et exigus. Cette discipline pousse ses malheureux adeptes à des pratiques honteuses : la génération par ordinateur interposé de textes longs et soporifiques à partir d'une composition sémantique produite mécaniquement.
Largely unknown to the general public, text generation remains a sport practiced mainly in academia, by obscure researchers in sad and cramped laboratories. The discipline drives its unfortunate adepts to shameful practices: the computer-mediated generation of long and soporific texts from a mechanically produced semantic representation.

What is NLG? In search of a definition
The focus and definition may depend on the domain (psychology, linguistics, computer science)
Mapping problem: translate meanings into linguistic form
Linguistically-mediated problem solving
Language as a search problem

What is NL-Generation? (I)
Generation as a mapping process
NLG viewed as a process of mapping a conceptual structure (meaning) onto a linguistic form
Input: concepts
Output: words
[Figure: concepts C1, C2, C3 mapped onto words W1, W2, W3]

Catch me if you can
We tend to think faster than we can find the corresponding words and convert them into sounds
[Figure: conceptualization (C1, C2, C3, C4) runs ahead of expression (W1, W2, W3)]

There is no one-to-one mapping between linguistic structures and conceptual structures

The same conceptual structure may map onto different linguistic structures (synonyms, paraphrase)
Possession:
• This car belongs to the president. (verb)
• This is the car of the president. (preposition)
• This is the president's car. (genitive)
• This is his car. (possessive adjective)
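The one-to-many mapping on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `realize_possession` and its parameters are hypothetical, and the four templates simply reproduce the four example sentences.

```python
def realize_possession(owner: str, thing: str, possessive: str = "his") -> dict[str, str]:
    """Return four English realizations of one concept: POSSESSION(owner, thing).

    Hypothetical helper; the keys name the linguistic device used.
    """
    return {
        "verb": f"This {thing} belongs to the {owner}.",
        "preposition": f"This is the {thing} of the {owner}.",
        "genitive": f"This is the {owner}'s {thing}.",
        "possessive adjective": f"This is {possessive} {thing}.",
    }


forms = realize_possession("president", "car")
for device, sentence in forms.items():
    print(f"{device:22}{sentence}")
```

A generator has to pick exactly one of these forms, which is where the pragmatic and linguistic choices discussed later come in.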

The same linguistic structure may map onto different conceptual structures
Linguistic resource: genitive
• Peter's car is broken. (possession)
• Peter's brother is sick. (family relationship)
• Peter's leg hurts. (inalienable possession, part of)

NLG as language-mediated problem solving

A simple generation model

Nature of choices
• pragmatic
• conceptual
• linguistic

Pragmatic choices
Languages are indirect means for achieving goals
• mediating devices
Different linguistic means serve different discourse purposes
• i.e. different forms are used in order to achieve different goals

Pragmatic choices: language as a resource
active vs. passive voice [topic, perspective]
main vs. subordinate clause [relative prominence]

Conceptual choices
Different meanings generally yield different forms
• NUMBER: he sings vs. they sing
• TENSE: he sings vs. he sang

Linguistic choices
The same meaning can be expressed by different words or syntactic forms (synonyms, paraphrases)
• GROWN-UP MALE PERSON: man, guy, chap
• HELP: help, give a hand, assist
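As a minimal sketch of lexical choice, the two synonym sets above can be stored in a small lexicon and filtered by some selection criterion. The register labels below are an illustrative assumption (the slides do not say what drives the choice), and `choose` is a hypothetical helper.

```python
# Concept -> candidate words, taken from the slide.
LEXICON = {
    "GROWN-UP MALE PERSON": ["man", "guy", "chap"],
    "HELP": ["help", "give a hand", "assist"],
}

# Assumed register labels, added only for illustration.
REGISTER = {
    "man": "neutral", "guy": "informal", "chap": "informal",
    "help": "neutral", "give a hand": "informal", "assist": "formal",
}


def choose(concept: str, register: str = "neutral") -> str:
    """Pick the first candidate matching the register, else the first candidate."""
    for word in LEXICON[concept]:
        if REGISTER[word] == register:
            return word
    return LEXICON[concept][0]


print(choose("HELP", "formal"))
print(choose("GROWN-UP MALE PERSON", "informal"))
```

The fallback to the first candidate is a design shortcut: a real system would need a principled default when no word satisfies the constraint.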

What is NL-Generation? Tentative definition (III)
Generation as a search problem
Size of the mental lexicon: approx. 30,000 words

An abstract view

An example

Input: analysis

Input: synthesis

Different search spaces

Fundamental problems
Analysis : ambiguity
Generation : choice

Why bother about generation? (1)
Different kinds of motivation
• Theoretical
• Practical
• Industrial

Theoretical reasons: building and testing a theory
Testbed for a linguistic theory:
• coverage (over/undergeneration), correctness
Testbed for a psychological model:
• simulation of cognitive processes (on-line processing, language learning)

Practical reasons (industrial, full automation)
machine translation
text generation (business letters)
generation of summaries (stock market reports, weather forecasts, etc.)
help systems (audit trail, access to DB)
abstracting

Practical reasons (help systems, semi-automation)
Computer assisted language learning (tools)
Writer's workbench (pre/postediting: correction of grammar, style, spelling, text organization)

The decomposition of the task: NLG architectures

A two-stage model: division of labor
[Figure label: GOAL]

Four components

Procedural know-how
Planning (determine the order of the different steps - textual organisation)
Searching (find the words; access)
Reasoning-inferencing (« see » possible links between ideas)

Basic memory processes
• Sensory memory: 1 second
• STM (short-term memory): less than 30 seconds
• LTM (long-term memory): up to a lifetime
[Figure example item: Rose]

Number of choices (space + time constraints)
We have to make a great number of choices under severe space and time constraints
• space constraint: limitation of STM
• time constraint: speed
• speech is fast: 3-5 words per second
• average number of decisions per word: 4
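The two figures on this slide can be combined into a rough decision rate. The short calculation below only multiplies the numbers given above; the derived time budget per decision is simple arithmetic, not a claim from the slides.

```python
WORDS_PER_SECOND = (3, 5)   # speech rate range given on the slide
DECISIONS_PER_WORD = 4      # average given on the slide

# Decisions the speaker has to make per second, at the slow and fast rate.
decisions_per_second = tuple(rate * DECISIONS_PER_WORD for rate in WORDS_PER_SECOND)

# Average time available per decision, in milliseconds (rounded).
ms_per_decision = tuple(round(1000 / d) for d in decisions_per_second)

print(decisions_per_second)  # (12, 20)
print(ms_per_decision)       # (83, 50)
```

In other words, a fluent speaker makes on the order of 12 to 20 decisions per second, leaving roughly 50 to 83 ms for each one.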

Diversity of choices
• Conceptual choices
• Linguistic choices
• Pragmatic choices

The necessary information for synthesis is scattered all over
[Figure: GIVE with three arguments: LISTENER (subject), BOOK (direct object), SPEAKER (indirect object); two of the arguments are labeled "pronoun"]

How to express the notion of the speaker?
What do the different forms depend upon?
SPEAKER:
• je: I
• me: me
• moi: me
• nous: we / us

Number:
• Tu me donnes le livre. (You give me the book.)
• Tu nous donnes le livre. (You give us the book.)
Person:
• Tu ME donnes le livre. (You give me the book.)
• Tu lui donnes le livre. (You give him/her the book.)
[Figure: GIVE with LISTENER (Subj.), BOOK (DO), SPEAKER (IO)]

Polarity:
• Donne-moi ce livre ! (Give me this book!)
• Ne me le donne pas ! (Don't give it to me!)
Tense:
• Tu m'as donné le livre. (You have given me the book.)
• Tu me donnes le livre. (You give me the book.)
Speech act:
• Donne-le-moi ! (Give it to me!)
• Tu me donnes le livre. (You give me the book.)
[Figure: GIVE with LISTENER (Subj.), BOOK (DO), SPEAKER (IO)]
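The examples above suggest that the French form used for SPEAKER depends on at least syntactic function, number, speech act and polarity. A minimal sketch of that dependency follows; the function `speaker_form` and its parameter values are hypothetical, and elision (me becoming m' before a vowel, as in "Tu m'as donné") is deliberately left out.

```python
def speaker_form(function: str,
                 number: str = "singular",
                 speech_act: str = "statement",
                 polarity: str = "positive") -> str:
    """Choose the French pronoun for SPEAKER from a few feature values."""
    if number == "plural":
        return "nous"              # "Tu nous donnes le livre."
    if function == "subject":
        return "je"
    if speech_act == "command" and polarity == "positive":
        return "moi"               # "Donne-moi ce livre !"
    return "me"                    # "Tu me donnes le livre." / "Ne me le donne pas !"


print(speaker_form("indirect object"))
print(speaker_form("indirect object", speech_act="command"))
print(speaker_form("indirect object", speech_act="command", polarity="negative"))
```

Even this toy rule set shows the point of the slide: the information needed to pick a single word comes from several different places in the input.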

Input: HELP (Agent: PAUL, Object: MARY), tense: present
PRAGMATIC CHOICE: Paul = topic, Marie = given, aider = new
LEXICALIZATION: HELP = aider, PAUL = Paul, MARY = Marie
PART OF SPEECH: HELP = verb, Paul = noun, Mary = pronoun
SYNT. FUNCT. & VOICE: voice = active, Paul = subject, Mary = direct object
WORD ORDER: SUBJECT (noun), DIR. OBJECT (pronoun), VERB (verb)
MORPHOLOGY: verb (3rd person, singular, present) = aide; subject (noun) = Paul; direct object (pronoun) = la
PHONO-GRAPH. SYNTH.: Paul l'aide.
Output: Paul l'aide. (Paul helps her.)
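The stages named on this slide can be chained into a toy pipeline. The sketch below is an assumption-laden illustration, not the architecture of any actual system: the order of the stages, the `GENDER` table, the regular -er verb rule and the function names are all hypothetical, and only the single example HELP(PAUL, MARY) is covered.

```python
LEXICON = {"HELP": "aider", "PAUL": "Paul", "MARY": "Marie"}
GENDER = {"PAUL": "masc", "MARY": "fem"}          # assumed, needed to pick le/la
OBJECT_PRONOUN = {"masc": "le", "fem": "la"}
VOWELS = "aeiouyéèê"


def synthesize(words: list[str]) -> str:
    """Phono-graphic synthesis: elide le/la before a vowel, then join."""
    out: list[str] = []
    for word in words:
        if out and out[-1] in ("le", "la") and word[0].lower() in VOWELS:
            out[-1] = "l'" + word
        else:
            out.append(word)
    return " ".join(out) + "."


def generate(predicate: str, agent: str, obj: str,
             given: frozenset = frozenset()) -> str:
    lemma = LEXICON[predicate]            # lexicalization
    object_is_pronoun = obj in given      # part of speech: given -> pronoun
    subject = LEXICON[agent]              # synt. function & voice: active, agent = subject
    verb = lemma[:-2] + "e"               # morphology: -er verb, 3rd person singular present
    if object_is_pronoun:
        # word order: a pronominal direct object precedes the verb
        words = [subject, OBJECT_PRONOUN[GENDER[obj]], verb]
    else:
        words = [subject, verb, LEXICON[obj]]
    return synthesize(words)


print(generate("HELP", "PAUL", "MARY", given=frozenset({"MARY"})))
print(generate("HELP", "PAUL", "MARY"))
```

Note how one pragmatic decision (Marie is given) changes the part of speech, the word order and finally the spelling of the output.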

Consequences for languages, architecture & processing
languages are and need to be flexible
information does not become available in a strict order: it may vary on every occasion
EVENT-TIME-PLACE vs. PLACE-EVENT-TIME, etc.
Consequences (interaction and accommodation)
• Data: accommodation of the different data structures (interaction between words and syntax) in the different modules (conceptual, lexical, syntactic)
• Process: feedback to higher components

Example illustrating the consequences (i.e. functional dependencies) of the choices

Conceptual input

Let's consider the consequences of the following 2 choices
• Topicalization: the concept to start the sentence with
• Lexical choice: synonyms

Topicalize Agent
Consequences:
• Agent --> Subject
• Voice --> active
• Patient --> Direct Object

Consequences of topicalization

Topicalize Patient
Consequences:
• Agent --> PP
• Voice --> passive
• Patient --> grammatical Subject

Consequences of topicalization

Summary of the consequences of the topicalization choice at the top level
           Strategy 1             Strategy 2
Topic      agent                  patient
Agent      grammatical subject    prepositional phrase
Patient    direct object          grammatical subject
Voice      active                 passive
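The summary table can be read as a function from the topicalization choice to its syntactic consequences. The sketch below encodes the two strategies and realizes them for one example; the participants (Paul, Mary), the `VERB_FORMS` table and the function names are hypothetical, since the conceptual input of the original slide is not recoverable here.

```python
def consequences(topic: str) -> dict[str, str]:
    """Return the consequences of topicalizing the agent or the patient."""
    if topic == "agent":       # strategy 1
        return {"voice": "active",
                "agent": "grammatical subject",
                "patient": "direct object"}
    if topic == "patient":     # strategy 2
        return {"voice": "passive",
                "agent": "prepositional phrase",
                "patient": "grammatical subject"}
    raise ValueError(f"unknown topic: {topic!r}")


# Assumed English verb forms, for illustration only.
VERB_FORMS = {"HELP": {"active": "helps", "passive": "is helped"}}


def realize(topic: str, predicate: str = "HELP",
            agent: str = "Paul", patient: str = "Mary") -> str:
    plan = consequences(topic)
    verb = VERB_FORMS[predicate][plan["voice"]]
    if plan["voice"] == "active":
        return f"{agent} {verb} {patient}."
    return f"{patient} {verb} by {agent}."


print(realize("agent"))
print(realize("patient"))
```

A single choice at the top level thus fixes voice and both grammatical functions, which is what the slides mean by functional dependencies.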

Assumptions - Conclusion
• No super-expert, but a set of cooperating agents
• Competition and accommodation
• No algorithmic processing, but opportunistic planning
• Various orders of processing
• Various components need the same information
• The system is heterarchical rather than hierarchical