English version
Parsing natural languages: from "combinatorial" to "deterministic" parsing
Jacques Vergne, GREYC - Université de Caen
http://www.info.unicaen.fr/~jvergne
3/7/2001 © Jacques Vergne, TALN 2001
Introduction:
• our 1998 parser:
- 1st place in the GRACE evaluation (1995-1998)
- GRACE: Grammaires et Ressources pour les Analyseurs de Corpus et leur Évaluation (Grammars and Resources for Corpus Analysers and their Evaluation)
- 22 participants from France, Switzerland, Germany, Québec and the USA: labs and companies (AT&T, IBM, Xerox, France-Télécom, ...)
- decision = 100% (= tokens with a unique tag / total number of tokens)
- precision = 94.5% (= tokens with the same tag as the human annotator / tokens with a unique tag)
• what are the features of this parser?
it is a deterministic parser
Introduction: our aims
• highlighting the evolution of concepts and methods in parsing
• understanding why compiling led to combinatorial parsing
• understanding the principles of deterministic parsing
Introduction: our workspace
• data to be parsed —> parser (parsing process + resources of the process) —> parsed data
• parsed languages: programming languages —> natural languages
• process: combinatorial —> deterministic
• resources: declarative —> procedural
• resources: static —> dynamic
Introduction: our way into this space
• roadmap: parsing programming languages —> combinatorial parsing of natural languages —> tagging, chunking —> deterministic parsing
• parsing models: origin, historical evolution
• criteria: the parsing process, the resources of the process
Plan of the lecture
• 1. parsing programming languages —> parsing natural languages
• 2. parsing natural languages: combinatorial —> deterministic
• 3. some features of our parsers
• 1. Parsing programming languages —> parsing natural languages
• 1.1 Formal grammars: a modelling tool for natural language syntax
• model: a simplified and formalised representation of an object or a process
• Noam Chomsky
- first trained as a mathematician
- then as a linguist, a student of Harris (who was himself a student of Bloomfield),
but at odds with them over the object of study: attested material —> the speaker's "competence"
- 1957: Syntactic Structures, a linguist's book
- for Chomskyan linguists: modelling the "competence" of the native speaker (in generation)
- for NLP researchers ("NLists"): modelling the syntax of natural languages, as attested material (in parsing)
- the two trends diverged around 1971 (Extended Standard Theory)
• 1.2 Formal grammars: a modelling tool for programming language syntax
• 1958-1960: ALGOL 60, the first programming language whose syntax is defined by a (context-free) formal grammar
• formal grammar —> a method for designing compilers
• the first algorithm-oriented programming language (ALGOrithmic Language): no more goto in control structures, but:
- alternative: if ... then ... else
- repetitive: for ... step ... until ... do, and for ... while ... do
• 1.2 Formal grammars: a modelling tool for programming language syntax
ALGOL 60: the first language with recursive block structures
formal grammar:
program —> complex_statement *
complex_statement —> simple_statement | block
block —> complex_statement *
UML diagram (same structure): a program and a block each contain any number (*) of complex_statements; a complex_statement is either a simple_statement or a block
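A recursive grammar like this one translates almost mechanically into a recursive-descent recognizer, the classic compiler technique. The sketch below is a minimal Python illustration, not ALGOL 60's real grammar: it assumes that a block is delimited by the tokens `begin` and `end` (the slide leaves the delimiters out) and treats any other token as a simple_statement; the function name `parse_program` is hypothetical.

```python
def parse_program(tokens):
    """Recursive-descent recognizer for the toy grammar (begin/end delimiters assumed):
    program           -> complex_statement*
    complex_statement -> simple_statement | block
    block             -> "begin" complex_statement* "end"
    Returns nested lists: a block becomes a list, a simple statement stays a string."""
    pos = 0

    def complex_statement():
        nonlocal pos
        if tokens[pos] == "begin":
            return block()
        statement = tokens[pos]  # any other token is a simple_statement
        pos += 1
        return statement

    def block():
        nonlocal pos
        pos += 1  # consume "begin"
        body = []
        while pos < len(tokens) and tokens[pos] != "end":
            body.append(complex_statement())  # recursion: a block may contain blocks
        if pos == len(tokens):
            raise SyntaxError("missing 'end'")
        pos += 1  # consume "end"
        return body

    program = []
    while pos < len(tokens):
        if tokens[pos] == "end":
            raise SyntaxError(f"unexpected 'end' at position {pos}")
        program.append(complex_statement())
    return program


print(parse_program("a begin b begin c end d end e".split()))
# ['a', ['b', ['c'], 'd'], 'e']
```

Each token is read once and never reconsidered: since a token of a programming language has exactly one role, the parser needs no backtracking.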
• 1.3 A member of the ALGOL group: Bernard Vauquois
• 7 countries: Germany, Denmark, USA, France, Great Britain, the Netherlands, Switzerland
• 14 delegates: Backus, Bauer, Green, Katz, McCarthy, Naur, Perlis, Rutishauser, Samelson, Turanski, Vauquois, Wegstein, van Wijngaarden, Woodger
• conferences: Zurich (1958), Mainz (1958), Copenhagen (1959), Paris (June 1959, January 1960)
• descendants of ALGOL 60 include Pascal, C, C++, Java and Ada
• 1.4 Bernard Vauquois, director of the CETA in 1961
• astronomer and mathematician —> computer scientist and linguist
• computer science teacher (formal language theory) at the University of Grenoble
• his ideas for founding second-generation Machine Translation:
- using formal language theory
- basing Machine Translation on the compiling model
• Christian Boitet, "L'apport scientifique de Bernard Vauquois" (Analectes, 1989), translated from French:
"The credit for introducing the analogy between MT and compiling no doubt goes to the CETA, on the initiative of B. Vauquois. An MT system is thus seen as a kind of 'natural language compiler'."
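• 1.5 Compiling <—> Machine Translation
• compiling (automatic translation of programming languages):
human: statements in a natural language
—> human translation (designing, programming) —> statements in a programming language
—> automatic translation (compiling) —> statements in machine language —> processor
• automatic translation of natural languages:
human —> texts in a source NL —> automatic translation —> texts in a target NL —> human
• the parsed languages are different: a programming language in one case, a natural language in the other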
• 1.6 Transpositions of formal grammars into NLP
• linguistics: Extended Standard Theory (Chomsky): modelling the "competence" of the native speaker, in generation
• computer science: modelling the syntax of programming languages (ALGOL 60)
• combinatorial NLP: modelling the syntax of natural languages (attested material), in parsing, through the analogy between MT and compiling (Vauquois)
• linguistics —> NLP: abandonment of the object modelled by Chomsky
• 1.7 Compiling —> Natural Language parsing
• compiling: program —> programming language parser —> constituent tree
resources: exhaustive lexical resources (primitives), exhaustive syntactic resources (formal grammar)
• parsing natural languages: sentence —> natural language parser —> constituent tree
resources: dictionary, formal grammar
• resources: the same static model of the parsed language
• 1.7 Compiling —> Natural Language parsing
criteria | compiling | parsing natural languages
process | repetitive per token, deterministic | combinatorial, non-deterministic
complexity | theoretical: polynomial; practical: linear | theoretical: exponential in time; practical: polynomial
language | formal language | natural language
the model of the parsed language is transposed, but the process is not:
the same model of the parsed language, but a different process
• 1.7 What is the difference between programming languages and natural languages?
criteria | programming languages | natural languages
dictionary | closed and frozen | open and changing
tags per token | 1 token <—> 1 unique tag | 1 token <—> several tags
=> compiling = a deterministic process —> parsing natural languages = a non-deterministic process
• 2. Parsing natural languages: combinatorial —> deterministic
• combinatorial parsing: resources = static models of the expected structures: formal grammars
• tagging, chunking, deterministic parsing: resources = dynamic models of the computation process
• 2.0 An example of two ways of solving a problem
• a problem: a father is 4 times as old as his son, and their age difference is 30 years
• its combinatorial resolution:
- let s be the age of the son and f the age of the father
- with 0 < s < 100 and 0 < f < 100 (assuming an integer solution):
for each of the 99 × 99 = 9,801 couples (s, f), if both constraints are satisfied, output the couple (s, f)
- number of solutions: unknown a priori: 0, 1 or n
• its deterministic resolution:
- pose the system of 2 equations with 2 unknowns: f = 4s and f - s = 30
- solve the system => a unique solution:
f = 4s and f - s = 30 => 4s - s = 3s = 30 => s = 10 => f = 30 + s = 40
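Both resolutions fit in a few lines of Python. This sketch follows the slide's bounds (0 < s < 100, 0 < f < 100, integer ages); the function names are illustrative only.

```python
from fractions import Fraction


def combinatorial(limit=100):
    """Enumerate every couple (s, f) with 0 < s, f < limit and keep those satisfying both constraints."""
    solutions = []
    checked = 0
    for s in range(1, limit):
        for f in range(1, limit):
            checked += 1
            if f == 4 * s and f - s == 30:
                solutions.append((s, f))
    return solutions, checked  # the number of solutions is only known at the end


def deterministic(ratio=4, difference=30):
    """Solve f = ratio * s and f - s = difference directly: (ratio - 1) * s = difference."""
    s = Fraction(difference, ratio - 1)
    return s, ratio * s


print(combinatorial())   # ([(10, 40)], 9801)
print(deterministic())   # (Fraction(10, 1), Fraction(40, 1))
```

The combinatorial version checks 9,801 couples to find a single one; the deterministic version performs one division and one multiplication, and the uniqueness of its answer follows from the algebra rather than from an exhaustive search.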
• 2.1 Parsing natural languages: combinatorial
• 2.1 Posing a problem in a combinatorial way
• "combinatorial" characterises a way of posing and solving a problem, not the problem itself
• posing a problem in a combinatorial way:
- the attributes of a set of units have several possible values
- there are constraints on the attribute values
- we want to find the attribute values that satisfy the constraints
= posing it as a Constraint Satisfaction Problem (CSP)
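In standard CSP notation (the symbols are chosen here, not taken from the slides), a problem posed this way has units x_1, ..., x_n, a domain D_i of possible values for each unit x_i, and constraints C_1, ..., C_m. The solution set and the size of the space a combinatorial resolution must search are:

```latex
S = \{\, (v_1, \dots, v_n) \in D_1 \times \dots \times D_n \;\mid\; C_j(v_1, \dots, v_n) \text{ holds for all } j = 1, \dots, m \,\}

\left| D_1 \times \dots \times D_n \right| = \prod_{i=1}^{n} |D_i| = k^{n} \quad \text{if every } |D_i| = k
```

|S| may be 0, 1 or more, which is the "0, 1, n" of the slides. With the bounds of the father-and-son example (2 units, 99 values each), the search space already holds 99^2 = 9,801 couples.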
• 2.1 Solving a problem in a combinatorial way
• in a combinatorial resolution (units to process, each with attributes that have several possible values):
-1- all possible combinations of values are enumerated
-2- for each combination, the satisfaction of the constraints is verified —> 0, 1 or n verified combinations
• theoretical time complexity: exponential in the number of units
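The two steps above can be sketched as a generic brute-force CSP solver. The function name and the three-unit example are hypothetical; only `itertools.product` and `math.prod` come from the Python standard library.

```python
from itertools import product
from math import prod


def solve_by_enumeration(domains, constraints):
    """Combinatorial resolution of a CSP.
    domains: one list of possible values per unit.
    constraints: predicates taking a full combination (a tuple of values)."""
    solutions = []
    for combination in product(*domains):              # step 1: enumerate all combinations
        if all(c(combination) for c in constraints):   # step 2: verify every constraint
            solutions.append(combination)
    return solutions                                   # 0, 1 or n solutions


# Three hypothetical units with two possible values each: 2 ** 3 = 8 combinations.
domains = [[0, 1], [0, 1], [0, 1]]
constraints = [lambda v: v[0] != v[1], lambda v: v[1] != v[2]]
print(prod(len(d) for d in domains))                          # 8
print(solve_by_enumeration(domains, constraints))             # [(0, 1, 0), (1, 0, 1)]
print(solve_by_enumeration(domains, [lambda v: sum(v) > 3]))  # []
```

The loop body runs once per combination whatever the constraints are, which is where the exponential cost in the number of units comes from.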
• 2.1 Posing NL parsing in a combinatorial way
• the problem of NL parsing is traditionally posed and solved in a combinatorial way
• the problem is posed as follows:
- the words of a sentence have several possible tags
- all possible attribute values of all words are "exhaustively" enumerated in the dictionary
- the constraints on tags are the possible syntactic structures of sentences and phrases, made explicit in the formal grammar
- we want to find the word tags that satisfy the constraints (= "disambiguation")
• 2.1 Solving NL parsing in a combinatorial way
• units to process = the words of the sentence; possible values of attributes = the possible tags (dictionary)
-1- enumerate the combinations of tags
-2- verify the constraints (formal grammar) —> 0, 1 or n verified combinations
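The same enumerate-and-verify scheme applied to tagging can be sketched on the well-known French ambiguity "la petite brise la glace" ("the little breeze chills her" or "the little girl breaks the ice"). Everything here is a hypothetical toy: the one-letter tags, the four-entry dictionary, and a regular expression standing in for the formal grammar (sentence -> NP VP, NP -> D A? N, VP -> P? V NP?).

```python
import re
from itertools import product

# Toy dictionary: every possible tag of each form
# (D = determiner, A = adjective, N = noun, P = clitic pronoun, V = verb).
DICTIONARY = {"la": "DP", "petite": "AN", "brise": "NV", "glace": "NV"}

# Toy grammar over tag strings: NP VP, with NP = D A? N and VP = P? V NP?
GRAMMAR = re.compile(r"DA?NP?V(?:DA?N)?")


def tag_by_enumeration(words):
    """Return every tag sequence, among all dictionary combinations, accepted by the grammar."""
    domains = [DICTIONARY[w] for w in words]  # each string is a set of one-letter tags
    accepted = []
    for tags in product(*domains):           # 2 ** 5 = 32 combinations for the example
        sequence = "".join(tags)
        if GRAMMAR.fullmatch(sequence):      # constraint check against the grammar
            accepted.append(sequence)
    return accepted


print(tag_by_enumeration("la petite brise la glace".split()))
# ['DANPV', 'DNVDN']
```

Two of the 32 combinations survive, one per reading: the result is "n" parses, and choosing between them needs knowledge that the grammar does not contain.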
• 2.2 Combinatorial NL parsing —> tagging
• 2.2 Combinatorial parsing —> tagging
• combinatorial NL parsing:
- sentence —> parsing process —> parsed sentence: 0, 1 or n phrase trees
- combinatorial process: recognising expectations
- resources of the process: expected structures; declarative or static resources (formal grammar)
- complexity: theoretical: exponential, practical: polynomial
• tagging, chunking:
- text —> parsing process —> parsed text: 1 unique result
- deterministic process: interpreting rules
- resources of the process: contextual rules; procedural or dynamic resources
- complexity: theoretical: linear, practical: linear
• 2.2 Tagging, a process from forms and their position
Le site allemand de Dasa à Hambourg
devra assembler ce nouvel avion .
(The German site of Dasa in Hamburg will have to assemble this new aircraft.)
• some linguistic properties of forms & their position:
word tag => constraint on the tag of the following word
det. => noun or adjective; prep. => det., noun or adjective
—> tagging process (of words)
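A deterministic tagger built on such properties reads the words once, left to right, and lets the tag just chosen restrict the candidates of the next word. This sketch encodes only the two properties on the slide (det. => noun or adjective; prep. => det., noun or adjective); the lexicon, the tag names and the "keep the first remaining candidate" default are hypothetical.

```python
# Hypothetical lexicon: candidate tags per form, in default order.
LEXICON = {"à": ["PREP"], "la": ["PRON", "DET"], "glace": ["VERB", "NOUN"]}

# The two contextual properties from the slide: tag of a word => allowed tags of the next word.
NEXT_ALLOWED = {"DET": {"NOUN", "ADJ"}, "PREP": {"DET", "NOUN", "ADJ"}}


def tag(words):
    """One left-to-right pass, one decision per word, never revised."""
    tagged = []
    previous = None
    for word in words:
        candidates = LEXICON.get(word.lower(), ["NOUN"])  # unknown form: default tag (assumption)
        allowed = NEXT_ALLOWED.get(previous)
        if allowed:
            filtered = [t for t in candidates if t in allowed]
            if filtered:               # apply the constraint only if it leaves a candidate
                candidates = filtered
        previous = candidates[0]       # default: first remaining candidate
        tagged.append((word, previous))
    return tagged


print(tag(["à", "la", "glace"]))
# [('à', 'PREP'), ('la', 'DET'), ('glace', 'NOUN')]
```

Without the preposition, the defaults win ("la" PRON, "glace" VERB); after "à", the two properties override both defaults. Each word costs one lookup and one filter, so the pass is linear and yields exactly one result.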
• 2.2 Chunking, a process from forms and their position
Le site allemand de Dasa à Hambourg
devra assembler ce nouvel avion .
the same sentence reduced to its function words and one word ending:
Le .... ........ de .... à ........
devra .......er ce ...... ..... .
• some linguistic properties of forms & their position:
function word => beginning and type of a chunk (Abney 1991)
—> segmentation process (chunking), tagging (of chunks)
[Le site allemand]N [de Dasa]pN [à Hambourg]pN
[devra assembler]V [ce nouvel avion]N .
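Chunking from function words can be sketched in the same one-pass style: a function word opens a new chunk and gives it its type, and the following words join it. The table of chunk-opening forms (including "devra" as a known verb form) is a hypothetical reconstruction for this one sentence, not the author's resources.

```python
# Hypothetical chunk-opening forms and the chunk type each one opens.
CHUNK_OPENERS = {"le": "N", "la": "N", "ce": "N", "de": "pN", "à": "pN", "devra": "V"}
PUNCTUATION = {".", ",", ";"}


def chunk(tokens):
    """Segment and type chunks in a single pass; returns (type, words) pairs."""
    chunks = []
    is_open = False
    for token in tokens:
        if token in PUNCTUATION:
            is_open = False  # punctuation closes the current chunk
            continue
        kind = CHUNK_OPENERS.get(token.lower())
        if kind is not None or not is_open:
            chunks.append((kind or "?", [token]))  # a new chunk starts here
            is_open = True
        else:
            chunks[-1][1].append(token)            # other words join the open chunk
    return chunks


sentence = "Le site allemand de Dasa à Hambourg devra assembler ce nouvel avion ."
for kind, words in chunk(sentence.split()):
    print(kind, " ".join(words))
# N Le site allemand
# pN de Dasa
# pN à Hambourg
# V devra assembler
# N ce nouvel avion
```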
• 2.3 Tagging —> deterministic parsing
• 2.3 Tagging —> deterministic parsing
• both: text —> parsing process —> parsed text, 1 unique result
- deterministic process: interpreting rules
- resources of the process: contextual rules; procedural or dynamic resources
- complexity: theoretical: linear, practical: linear
• almost identical, so what are the differences?
- lexical resources: function words + word-ending morphemes, with only one tag by default
- rules "conditions => actions": tagging rules assign tags; deterministic parsing rules also link units
• 2.4 Parsing natural languages: deterministic
• 2.4 Posing and solving a problem in a deterministic way
• to pose and solve a problem in a deterministic way:
- pose the problem in terms of computing from data, not in terms of choosing among known possible values
- know the properties of the units to process better, and exploit this knowledge better
=> find more properties, and find how to use them to compute the value of an attribute directly and definitively
• deterministic resolution:
- units to process —> properties —> compute —> unique solution
- the properties make it possible to build operations on the units to process (operands, operations) —> compute —> unique solution
• 2.4 Posing and solving NL parsing in a deterministic way
• explicitly taking into account the openness of NL:
- unlike a programming language, a natural language cannot be described exhaustively
- minimal lexical resources: function words + word-ending morphemes
- no formal grammar: no (exhaustive) inventory of expected structures, no grammaticality test
• "conditions => actions" rules make explicit, in the same formalism:
- minimal typographical, lexical and morphological resources,
- linguistic properties (explicit use of context),
- the linking process of units
=> from a static model of the expected structures to a dynamic model of the computing process
• 2.4 Solving NL parsing in a deterministic way
• computing process:
- a rule engine triggers "conditions => actions" rules once on each unit: characters, tokens, phrases, clauses, sentences (paragraphs, ...)
- conditions bear on unit attributes and on links between units (contiguities, constituencies, dependencies, co-ordinations, ...)
- actions: assigning values to attributes, setting links (dependencies and co-ordinations), and generating units of the level above
• a process of linear complexity: flow processing at a constant rate
• the lexicon and the syntactic structures of the parsed text are computed and output
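A rule engine of this kind can be sketched as a list of "conditions => actions" rules tried once on each unit during a single left-to-right pass. In this minimal Python sketch the `Unit` and `Rule` classes, the rule set, and treating "devra" as a known auxiliary form are all assumptions for illustration; conditions only see the current and the previous unit, and the generation of higher-level units is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Unit:
    form: str
    attrs: dict = field(default_factory=dict)


@dataclass
class Rule:
    name: str
    condition: Callable[[Unit, Optional[Unit]], bool]     # (current unit, previous unit)
    action: Callable[[Unit, Optional[Unit], list], None]  # sets attributes or adds links


def set_cat(cat):
    def action(unit, previous, links):
        unit.attrs["cat"] = cat
    return action


def link_to_auxiliary(unit, previous, links):
    unit.attrs["cat"] = "VINF"
    links.append((previous.form, unit.form, "aux"))       # setting a dependency link


RULES = [
    Rule("det", lambda u, p: u.form.lower() in {"le", "ce"}, set_cat("DET")),
    Rule("prep", lambda u, p: u.form.lower() in {"de", "à"}, set_cat("PREP")),
    Rule("aux", lambda u, p: u.form.lower() == "devra", set_cat("AUX")),
    Rule("infinitive",
         lambda u, p: u.form.endswith("er") and p is not None and p.attrs.get("cat") == "AUX",
         link_to_auxiliary),
]


def run(units, rules):
    """Each rule is tried once on each unit: cost = len(units) * len(rules)."""
    links = []
    previous = None
    for unit in units:
        for rule in rules:
            if rule.condition(unit, previous):
                rule.action(unit, previous, links)
        previous = unit
    return links


words = "Le site allemand de Dasa à Hambourg devra assembler ce nouvel avion .".split()
units = [Unit(w) for w in words]
print(run(units, RULES))  # [('devra', 'assembler', 'aux')]
print([u.attrs.get("cat") for u in units])
# ['DET', None, None, 'PREP', None, 'PREP', None, 'AUX', 'VINF', 'DET', None, None, None]
```

With a fixed rule set, the cost grows linearly with the number of units, which matches the "flow processing at a constant rate" claim.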
• 3. Some features of our parsers
• 3.1 A process from forms and their position
Le site allemand de Dasa à Hambourg
devra assembler ce nouvel avion .
Le .... ........ de .... à ........
devra .......er ce ...... ..... .
• some linguistic properties of forms & their position:
—> segmenting process (chunking), tagging and linking (of chunks)
[Le site allemand]N [de Dasa]pN [à Hambourg]pN
[devra assembler]V [ce nouvel avion]N .
open questions while chunks arrive: V ?, N ?, pN ? (where does each prepositional chunk attach?)
links: subject - verb ([Le site allemand] - [devra assembler]), object - verb ([ce nouvel avion] - [devra assembler])
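The subject - verb and object - verb links can be computed on the chunk types alone, in a single pass. This is a minimal sketch under simplifying assumptions: the nearest preceding N chunk becomes the subject, the first N chunk after the verb becomes the object, and pN attachment is left out; the function name is hypothetical.

```python
def link_chunks(kinds):
    """kinds: chunk types in text order. Returns (dependent, head, relation) index triples."""
    links = []
    waiting_subject = None   # last unlinked N chunk, may become a subject
    verb = None              # index of the most recent V chunk
    verb_has_object = False
    for i, kind in enumerate(kinds):
        if kind == "N":
            if verb is not None and not verb_has_object:
                links.append((i, verb, "object"))
                verb_has_object = True
            else:
                waiting_subject = i
        elif kind == "V":
            verb, verb_has_object = i, False
            if waiting_subject is not None:
                links.append((waiting_subject, i, "subject"))
                waiting_subject = None
        # pN chunks are not attached in this sketch
    return links


# Chunks of the example: [Le site allemand]N [de Dasa]pN [à Hambourg]pN [devra assembler]V [ce nouvel avion]N
print(link_chunks(["N", "pN", "pN", "V", "N"]))
# [(0, 3, 'subject'), (4, 3, 'object')]
```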
• 3.2 Another language, same process
The ............ of .... in .......
have .......ed a ............. .
[The ............]N [of ....]pN [in .......]pN
[have .......ed]V [a .............]N .
open questions while chunks arrive: V ?, N ?, pN ?
links: subject - verb, object - verb
• first rules package: written forms —> unit attributes
• following packages: computing on attributes, identical for several natural languages
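The split into packages can be sketched as follows: only the first package, which maps written forms to unit attributes, depends on the language, while the following package computes on attributes alone. The two form tables and the English sentence (filling the blanks of the slide) are hypothetical illustrations.

```python
# Package 1 (language-specific): written form -> attribute "opens a chunk of type ...".
FRENCH = {"le": "N", "ce": "N", "de": "pN", "à": "pN", "devra": "V"}
ENGLISH = {"the": "N", "a": "N", "of": "pN", "in": "pN", "have": "V"}


def forms_to_attributes(tokens, table):
    return [(token, table.get(token.lower())) for token in tokens if token not in {".", ","}]


# Following package (language-independent): looks at attributes only, never at forms.
def chunk_types(units):
    types = []
    for _form, opens in units:
        if opens is not None or not types:
            types.append(opens or "?")
    return types


french = "Le site allemand de Dasa à Hambourg devra assembler ce nouvel avion ."
english = "The engineers of Dasa in Hamburg have assembled a new aircraft ."
print(chunk_types(forms_to_attributes(french.split(), FRENCH)))    # ['N', 'pN', 'pN', 'V', 'N']
print(chunk_types(forms_to_attributes(english.split(), ENGLISH)))  # ['N', 'pN', 'pN', 'V', 'N']
```

Adding a language then means writing a new form table, not new computing rules.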
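• 3.3 Another level, same process
Qu'il s'agisse d'un logement vide ou meublé ,
le propriétaire ne peut pas s'opposer
à ce que le locataire héberge un animal familier .
(Whether the accommodation is unfurnished or furnished, the landlord cannot object to the tenant keeping a pet.)
Qu'il ........ .... ........ .... .. ...... ,
.. ............ .. ..... ... .........
à ce que .. ......... ....... .. ...... ........ .
[Qu'il ... meublé ,] subordinated clause —> subordination —> [le propriétaire ... s'opposer] main clause
[à ce que ... familier .] subordinated clause —> subordination —> main clause
open questions while clauses arrive: main clause ? (after the first subordinated clause), sub. clause ?
• some linguistic properties of forms and their position:
—> segmentation process (into clauses), tagging and linking (of clauses)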
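At the clause level, the same single pass only needs different opening forms. In this sketch the subordinators (a form beginning with "qu'", or the sequence "à ce que"), the comma as a clause boundary, and the rule that a clause opened without a subordinator is the main clause are all simplifying assumptions.

```python
PUNCTUATION = {",", "."}


def segment_clauses(tokens):
    """Single pass: a subordinator opens a 'sub' clause, punctuation closes the current clause,
    and a word arriving when no clause is open starts a 'main' clause."""
    clauses = []
    is_open = False
    for i, token in enumerate(tokens):
        if token in PUNCTUATION:
            is_open = False
            continue
        opens_sub = (token.lower().startswith("qu'")
                     or [t.lower() for t in tokens[i:i + 3]] == ["à", "ce", "que"])
        if opens_sub or not is_open:
            clauses.append(["sub" if opens_sub else "main", [token]])
            is_open = True
        else:
            clauses[-1][1].append(token)
    return clauses


sentence = ("Qu'il s'agisse d'un logement vide ou meublé , le propriétaire ne peut pas "
            "s'opposer à ce que le locataire héberge un animal familier .")
for kind, words in segment_clauses(sentence.split()):
    print(kind, "|", " ".join(words))
# sub | Qu'il s'agisse d'un logement vide ou meublé
# main | le propriétaire ne peut pas s'opposer
# sub | à ce que le locataire héberge un animal familier
```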
• 3.4 A process of linear complexity to link units
• a 2-step process, while units are arriving:
- step 1 (rule 1): when unit i arrives, a condition on its type triggers an action that creates a virtual unit; this virtual unit is an intermediary in the process and can always be invoked in conditions
- step 2 (rule 2): when unit j arrives, a condition on its type and on the virtual unit triggers the action that links unit j to unit i
• a process of linear complexity, independent of the units arriving between the two linked units
• this process models valence saturation
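The two-step linking can be sketched with a dictionary of pending expectations standing in for the virtual units: step 1 records what a unit is waiting for, step 2 saturates that expectation when a matching unit arrives. This is a minimal illustration assuming one pending expectation per expected type (a newer one replaces an older one); the valence table and the names are hypothetical.

```python
def link_by_valence(units, valences):
    """units: unit types in arrival order.
    valences: unit type -> (expected partner type, relation name).
    Returns (i, j, relation) links; each unit costs O(1) dictionary work."""
    pending = {}  # expected type -> (index of unit i, relation): the "virtual units"
    links = []
    for j, kind in enumerate(units):
        # step 2: does unit j saturate a pending expectation?
        if kind in pending:
            i, relation = pending.pop(kind)
            links.append((i, j, relation))
        # step 1: does unit j open a new expectation?
        if kind in valences:
            expected, relation = valences[kind]
            pending[expected] = (j, relation)
    return links


VALENCES = {"N": ("V", "subject - verb")}
print(link_by_valence(["N", "pN", "pN", "V"], VALENCES))
# [(0, 3, 'subject - verb')]
print(link_by_valence(["N"] + ["pN"] * 1000 + ["V"], VALENCES))
# [(0, 1001, 'subject - verb')]
```

The 1,000 intervening units each cost one failed lookup; the link itself is made in constant time when the verb arrives, whatever the distance.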
• Conclusion
• combinatorial NLP:
- resources: a static model of all possible expected forms (formal grammar, exhaustive dictionary)
- grain: the word
• deterministic NLP:
- computing from forms and their position
- the computing rules are based on some linguistic properties
- grains: document, paragraph, sentence, clause, chunk, ...
- partial resources: a dynamic model of the computing process
=> abandonment of the static model of all expected forms
end of the lecture
• you can download this presentation from http://www.info.unicaen.fr/~jvergne/TALN2001_JVergne_en.ppt
• see also the Coling 2000 tutorial "Trends in Robust Parsing" at http://www.info.unicaen.fr/~jvergne/tutorialColing2000.html (presentation and references)