HPSG Alpino System
Universität des Saarlandes
Seminar: Recent Advances in Parsing Technology
Winter Semester 2011-2012
Jesús Calvillo
Outline
Introduction
Overview
Part of Speech Tagging
Lexical Ambiguity
HMM Tagger
Tagger Training
Results
Disambiguation Component
Parsing
Recovery of Best Parse
Accuracy
References
Introduction
What is Alpino?
A computational analyzer for Dutch.
It exploits knowledge-based (HPSG grammar and HPSG lexicon) and corpus-based technologies.
It aims at accurate, full parsing of unrestricted text, with coverage and accuracy comparable to state-of-the-art parsers for English.
Introduction
Grammar
Wide-coverage computational HPSG.
About 600 construction-specific rules, rather than general rule schemata and abstract linguistic principles.
Lexicon
About 100,000 entries and 200,000 named entities.
Lexical rules for dates, temporal expressions, etc.
Large variety of unknown-word heuristics.
Morphological constructor.
Overview
POS Tagging
Lexical ambiguity has an important negative effect on parsing efficiency.
In some cases, an assigned category is obviously wrong. For example, the reading of “called” that takes the particle “up” fits “I called the man up” but not “I called the man”.
Applying hand-written rules to filter out such categories relies on human experts and is bound to introduce mistakes.
POS Tagging
The training corpus used by the tagger is labeled by the parser itself (unsupervised learning).
The tagger is not forced to disambiguate all words. It only removes about half of the tags assigned by the dictionary.
The resulting system can be much faster, while parsing accuracy actually increases slightly.
HMM Tagger
Variant of a standard trigram HMM tagger.
To discard tags, compute probabilities for each tag individually:
$P(t_i = t) = \alpha_i(t)\,\beta_i(t)$
α and β are the forward and backward probabilities, defined as follows:
$\alpha_i(t)$ is the total probability of all paths through the model that end at tag t at position i;
$\beta_i(t)$ is the total probability of all paths starting at tag t in position i, to the end.
HMM Tagger
After calculating the probabilities for all the potential tags, a tag t on position i is removed if there is another tag t′ at that position whose probability exceeds that of t by more than a margin set by τ, where τ is a constant threshold value.
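The tag-filtering idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Alpino's implementation: it uses a bigram HMM rather than the trigram model mentioned on the slide, it takes the per-tag probability to be the product of the forward and backward probabilities, and it assumes the threshold is applied additively to negative log probabilities. The toy model and every name in it (`tag_posteriors`, `filter_tags`, `START`, `TRANS`, `EMIT`) are hypothetical.

```python
import math

def tag_posteriors(words, tags, start, trans, emit):
    """Per-position scores alpha_i(t) * beta_i(t) for a bigram HMM (a simplification)."""
    n = len(words)
    alpha = [{} for _ in range(n)]
    beta = [{} for _ in range(n)]
    # Forward pass: probability of all paths that end at tag t at position i.
    for t in tags:
        alpha[0][t] = start[t] * emit[t].get(words[0], 0.0)
    for i in range(1, n):
        for t in tags:
            incoming = sum(alpha[i - 1][p] * trans[p][t] for p in tags)
            alpha[i][t] = incoming * emit[t].get(words[i], 0.0)
    # Backward pass: probability of all paths from tag t at position i to the end.
    for t in tags:
        beta[n - 1][t] = 1.0
    for i in range(n - 2, -1, -1):
        for t in tags:
            beta[i][t] = sum(
                trans[t][q] * emit[q].get(words[i + 1], 0.0) * beta[i + 1][q]
                for q in tags
            )
    return [{t: alpha[i][t] * beta[i][t] for t in tags} for i in range(n)]

def filter_tags(posteriors, tau):
    """Keep a tag unless some other tag beats it by more than tau in -log space.

    The additive -log threshold is an assumption made for this sketch.
    """
    kept = []
    for dist in posteriors:
        scores = {t: -math.log(p) for t, p in dist.items() if p > 0.0}
        best = min(scores.values())
        kept.append({t for t, s in scores.items() if s <= best + tau})
    return kept

# Hypothetical toy model: determiner, noun, verb.
TAGS = ["D", "N", "V"]
START = {"D": 0.8, "N": 0.1, "V": 0.1}
TRANS = {
    "D": {"D": 0.0, "N": 0.9, "V": 0.1},
    "N": {"D": 0.1, "N": 0.3, "V": 0.6},
    "V": {"D": 0.5, "N": 0.4, "V": 0.1},
}
EMIT = {"D": {"the": 1.0}, "N": {"man": 0.9}, "V": {"man": 0.1}}

posteriors = tag_posteriors(["the", "man"], TAGS, START, TRANS, EMIT)
strict = filter_tags(posteriors, 4.25)   # lower threshold, fewer tags survive
lenient = filter_tags(posteriors, 5.0)   # higher threshold, more tags survive
```

With this toy model the verb reading of “man” is discarded at the lower threshold and survives at the higher one, which mirrors the trade-off between parsing speed and the number of categories left for the parser.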
Training the Tagger
The training corpus is constructed by the parser: run the parser on a large set of example sentences and collect the sequences of lexical category classes used by what the parser believed to be the best parse.
The corpus contains errors. The tagger does not learn the “correct” lexical category sequences, but rather which sequences are favored by the parser.
Corpus: 4 years of Dutch daily newspaper text, using only “easy” sentences (sentences of fewer than 20 words, or sentences that take less than 20 seconds of CPU time).
Experimental Results
Applied to the first 220 sentences of the Alpino Treebank. 4 sentences were removed.
Low threshold -> small number of tags -> fast parsing.
High threshold -> higher accuracy -> decreased efficiency.
If all lexical categories for a given sentence are allowed, the parser can almost always find a single (but sometimes bad) parse.
If the parser is limited to the more plausible lexical categories, it will more often come up with a robust parse containing two or more partial parses.
A modest decrease in coverage results in a modest increase in accuracy.
Best threshold: 4.25
Disambiguation Component
Simple rule frequency methods known from context-free parsing cannot be used directly for HPSG-like formalisms, since these methods rely crucially on the statistical independence of context-free rule applications.
Solution: Maximum Entropy Models.
Stochastic Attribute Value Grammars
A typically large set of features of parses is identified. These features distinguish “good” parses from “bad” parses.
Parses are represented as vectors. Each cell contains the frequency of a particular feature (40,000 features in Alpino).
The features encode rule names, local trees of rule names, pairs of words and their lexical category, lexical dependencies between words, etc.
There is also a variety of more global syntactic features: features that recognize whether coordinations are parallel in structure, features that recognize whether the dependency in a WH-question or a relative clause is local or not, etc.
Stochastic Attribute Value Grammars
In training, a weight is established for each feature, indicating whether parses containing that feature should be preferred or not.
The parse evaluation function is the sum, over all features, of the frequency of each feature times its weight.
The parse with the largest sum is the best parse.
Drawback: to train the model, we need access to all parses of a corpus sentence.
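The evaluation function described on this slide is a weighted sum of feature frequencies, which can be sketched as follows. The feature names and weights are invented for illustration and are not Alpino's actual features; parses are assumed to be given as sparse dictionaries mapping a feature to its frequency.

```python
def parse_score(feature_counts, weights):
    """Sum of feature frequency times feature weight; unknown features count as 0."""
    return sum(count * weights.get(name, 0.0) for name, count in feature_counts.items())

def best_parse(parses, weights):
    """Return the parse (a feature-count dict) with the largest score."""
    return max(parses, key=lambda counts: parse_score(counts, weights))

# Hypothetical weights: positive means "prefer", negative means "avoid".
weights = {"rule:np_det_n": 0.5, "rule:vp_v_np": 0.25, "dep:nonlocal_wh": -1.0}

parse_a = {"rule:np_det_n": 2, "rule:vp_v_np": 1}                         # local analysis
parse_b = {"rule:np_det_n": 2, "rule:vp_v_np": 1, "dep:nonlocal_wh": 1}   # adds a dispreferred feature

chosen = best_parse([parse_a, parse_b], weights)
```

A sparse dictionary is used here because, with tens of thousands of features, most cells of a parse's feature vector are zero.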
Stochastic Attribute Value Grammars
It suffices to train on the basis of representative samples of parses for each training sentence (Osborne, 2000).
Any sub-sample of the parses in the training data which yields unbiased estimates of feature expectations should result in as accurate a model as the complete set of parses.
Dependency Problem
Problem: the Alpino treebank contains correct dependency structures.
Dependency Structures abstract away from syntactic details.
The training data should contain the full parse as produced by the grammar.
Possible Solution: Use the grammar to parse a given sentence and then select the parse with the correct dependency structure.
However, the parser will not always be able to produce a parse with the correct dependency structure.
Dependency Problem
Mapping the accuracy of a parse to the frequency of that parse in the training data.
Rather than distinguishing correct and incorrect parses, we determine the “quality” of each parse: Concept Accuracy (CA).
$CA^i = 1 - \frac{D_f^i}{\max(D_t^i, D_p^i)}$
$D_p^i$ is the number of relations produced by the parser for sentence i, $D_t^i$ is the number of relations in the treebank parse, and $D_f^i$ is the number of incorrect and missing relations produced by the parser.
Thus, if a parse has a CA of 85%, we add the parse to the training data marked with a weight of 0.85.
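As a small worked example of this weighting, the sketch below assumes that CA is computed as 1 - D_f / max(D_t, D_p) from the three relation counts named on the slide. The function names and the sample counts are hypothetical.

```python
def concept_accuracy(d_p, d_t, d_f):
    """CA = 1 - d_f / max(d_t, d_p).

    d_p: relations produced by the parser
    d_t: relations in the treebank parse
    d_f: incorrect and missing relations produced by the parser
    """
    denominator = max(d_t, d_p)
    if denominator == 0:
        raise ValueError("at least one of d_t and d_p must be positive")
    return 1.0 - d_f / denominator

def weighted_training_parses(parses):
    """Turn (parse_id, d_p, d_t, d_f) tuples into (parse_id, weight) pairs."""
    return [(pid, concept_accuracy(d_p, d_t, d_f)) for pid, d_p, d_t, d_f in parses]

# Hypothetical counts for three candidate parses of one sentence.
candidates = [("parse_1", 20, 20, 3), ("parse_2", 20, 20, 0), ("parse_3", 18, 20, 10)]
training_weights = weighted_training_parses(candidates)
```

Here `parse_1` has 3 wrong or missing relations out of 20, so it enters the training data with a weight of about 0.85, matching the example on the slide.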
Parse Forest
The left-corner parser constructs all possible parses.
The Parse Forest is a tree substitution grammar, which derives exactly all derivation trees of the input sentence.
Each tree in the tree substitution grammar is a left-corner spine.
Example: “I see a man at home”
Best Parse Recovery
For each state in the search space maintain only the b best candidates, where b is a small integer (the beam).
If the beam is decreased, we run a larger risk of missing the best parse (the result will typically still be a “good” parse); if the beam is increased, then the amount of computation increases.
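The beam idea can be illustrated with a minimal sketch over a toy packed forest. This is not Alpino's recovery algorithm: the forest encoding (a dictionary from a node to its alternative analyses), the scoring function, and all names are assumptions made for the example. The score deliberately includes one non-local feature, because with purely local scores a beam of 1 would already find the best parse.

```python
from itertools import product

def labels(tree):
    """Flatten a (label, children) tree into a list of labels."""
    label, children = tree
    found = [label]
    for child in children:
        found.extend(labels(child))
    return found

def best_parses(forest, node, beam, score, cache=None):
    """Return up to `beam` trees for `node`, best first.

    Only the `beam` best candidates are kept per forest node.
    The forest is assumed to be acyclic.
    """
    if cache is None:
        cache = {}
    if node in cache:
        return cache[node]
    candidates = []
    for label, children in forest[node]:
        child_options = [best_parses(forest, c, beam, score, cache) for c in children]
        for combination in product(*child_options):
            candidates.append((label, tuple(combination)))
    cache[node] = sorted(candidates, key=score, reverse=True)[:beam]
    return cache[node]

# Hypothetical forest: S has one analysis, NP and VP have two each.
FOREST = {
    "S": [("s", ["NP", "VP"])],
    "NP": [("np_a", []), ("np_b", [])],
    "VP": [("vp_a", []), ("vp_b", [])],
}
WEIGHTS = {"s": 0.0, "np_a": 1.0, "np_b": 0.9, "vp_a": 1.0, "vp_b": 0.9}

def score(tree):
    found = labels(tree)
    total = sum(WEIGHTS[label] for label in found)
    if "np_b" in found and "vp_b" in found:
        total += 1.0  # non-local feature: bonus when both "b" analyses co-occur
    return total

narrow = best_parses(FOREST, "S", 1, score)[0]
wide = best_parses(FOREST, "S", 2, score)[0]
```

With a beam of 1 the locally best NP and VP are combined and the globally best parse is missed; with a beam of 2 it is found, at the cost of scoring four combinations at S instead of one.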
[Figure slides: Beam Recovery; Effect of Beam Size]
Accuracy
Alpino: development set optimized.
CLEF: Dutch questions from the CLEF Question Answering competition (2003, 2004 and 2005).
Trouw: First 1400 sentences of the Trouw 2001 newspaper, from the Twente News corpus.
References
[Mal04] Robert Malouf and Gertjan van Noord. Wide coverage parsing with stochastic attribute value grammars. In Proceedings of the IJCNLP-04 workshop: beyond shallow analyses - formalisms and statistical modeling for deep analyses, Hainan Island, China, 2004.
[van06] Gertjan van Noord. At Last Parsing Is Now Operational. In Actes de la 13e conference sur le traitement automatique des langues naturelles (TALN 2006), pages 20–42, Leuven, Belgium, 2006.
Questions??