HPSG Alpino System


Universität des Saarlandes
Seminar: Recent Advances in Parsing Technology
Winter Semester 2011-2012

Jesús Calvillo

Outline
- Introduction
- Overview
- Part of Speech Tagging
- Lexical Ambiguity
- HMM Tagger
- Tagger Training
- Results
- Disambiguation Component
- Parsing
- Recovery of Best Parse
- Accuracy
- References

Introduction

What is Alpino? A computational analyzer for Dutch that exploits both knowledge-based (HPSG grammar and lexicon) and corpus-based technologies.

Aims at accurate, full parsing of unrestricted text, with coverage and accuracy comparable to state-of-the-art parsers for English.

Introduction

Grammar: wide-coverage computational HPSG with about 600 construction-specific rules, rather than general rule schemata and abstract linguistic principles.

Lexicon: about 100,000 entries and 200,000 named entities. Lexical rules for dates, temporal expressions, etc. A large variety of unknown-word heuristics. A morphological constructor.

Overview

POS Tagging
Lexical ambiguity has an important negative effect on parsing efficiency.

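In some cases, an assigned category is obviously wrong. Compare “I called the man up” with “I called the man”: in the second sentence, a category for “called” that requires the particle “up” can never be used.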

Discarding such categories with hand-written rules relies on human experts and is bound to introduce mistakes.

POS Tagging
The training corpus used by the tagger is labeled by the parser itself (unsupervised learning).

The tagger is not forced to disambiguate all words: it removes only about half of the tags assigned by the dictionary.

The resulting system can be much faster, while parsing accuracy actually increases slightly.

HMM Tagger
A variant of a standard trigram HMM tagger. To discard tags, the probability of each tag is computed individually:

$P(t_i = t) = \frac{\alpha_i(t)\,\beta_i(t)}{\sum_{j} \alpha_i(j)\,\beta_i(j)}$

$\alpha$ and $\beta$ are the forward and backward probabilities, defined as follows:

$\alpha_i(t)$ is the total probability of all paths through the model that end at tag $t$ at position $i$;

$\beta_i(t)$ is the total probability of all paths starting at tag $t$ in position $i$, to the end.
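A minimal sketch of how these per-tag probabilities can be computed with the forward-backward algorithm. To keep it short it assumes a first-order (bigram) HMM rather than the trigram model used in Alpino; the toy dictionaries `start`, `trans` and `emit` and the function name `tag_posteriors` are hypothetical, not Alpino's data structures.

```python
from typing import Dict, List


def tag_posteriors(words: List[str],
                   tags: List[str],
                   start: Dict[str, float],
                   trans: Dict[str, Dict[str, float]],
                   emit: Dict[str, Dict[str, float]]) -> List[Dict[str, float]]:
    """Return P(t_i = t | words) for every position i and tag t.

    Assumption: a bigram HMM (Alpino uses a trigram variant).
    """
    n = len(words)
    # alpha[i][t]: probability of all paths ending in tag t at position i
    # (includes the emission of words[i])
    alpha = [dict.fromkeys(tags, 0.0) for _ in range(n)]
    # beta[i][t]: probability of all continuations from tag t at i to the end
    # (excludes the emission of words[i], so it is not counted twice)
    beta = [dict.fromkeys(tags, 0.0) for _ in range(n)]

    for t in tags:
        alpha[0][t] = start.get(t, 0.0) * emit[t].get(words[0], 0.0)
    for i in range(1, n):
        for t in tags:
            incoming = sum(alpha[i - 1][s] * trans[s].get(t, 0.0) for s in tags)
            alpha[i][t] = incoming * emit[t].get(words[i], 0.0)

    for t in tags:
        beta[n - 1][t] = 1.0
    for i in range(n - 2, -1, -1):
        for t in tags:
            beta[i][t] = sum(trans[t].get(u, 0.0) * emit[u].get(words[i + 1], 0.0)
                             * beta[i + 1][u] for u in tags)

    # Normalize alpha * beta per position to get a distribution over tags.
    posteriors = []
    for i in range(n):
        scores = {t: alpha[i][t] * beta[i][t] for t in tags}
        z = sum(scores.values())
        posteriors.append({t: (s / z if z > 0 else 0.0) for t, s in scores.items()})
    return posteriors
```

Unlike Viterbi decoding, this keeps a full distribution over tags at each position, which is what the discarding step needs.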

HMM Tagger

After calculating all the probabilities for all the potential tags...

A tag $t$ on position $i$ is removed if there is another tag $t'$ such that:

$-\log P(t_i = t) - \left(-\log P(t_i = t')\right) > \tau$

where $\tau$ is a constant threshold value.
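The discarding step can be sketched as below, assuming the rule is applied in log space: a tag is dropped when its log-probability lies more than the threshold τ below that of some other tag at the same position, which is equivalent to comparing it with the best tag there. The function `prune_tags` and its input format (one dict of tag probabilities per position) are hypothetical.

```python
import math
from typing import Dict, List


def prune_tags(posteriors: List[Dict[str, float]], tau: float) -> List[List[str]]:
    """Keep, per position, only tags within tau (in -log space) of the best tag."""
    kept = []
    for probs in posteriors:
        best = max(probs.values())
        # "Some t' beats t by more than tau" holds iff the best tag does,
        # so one comparison against the maximum suffices.
        # Zero-probability tags are always removed (infinite gap).
        survivors = [t for t, p in probs.items()
                     if p > 0 and math.log(best) - math.log(p) <= tau]
        kept.append(survivors)
    return kept
```

Under this assumption, τ = 4.25 keeps any tag that is at most about e^4.25 ≈ 70 times less likely than the best one; lowering τ discards more tags.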

Training the Tagger
The training corpus is constructed by the parser.

It is obtained by running the parser on a large set of example sentences and collecting the sequences of lexical category classes used in what the parser believed to be the best parse.

The corpus contains errors: the tagger does not learn the “correct” lexical category sequences, but rather which sequences are favored by the parser.

Corpus: 4 years of Dutch daily newspaper text, using only “easy” sentences (sentences of fewer than 20 words, or sentences that take less than 20 seconds of CPU time).
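A sketch of how the parser-labeled corpus could be turned into trigram counts for the tagger. The record layout (words, lexical category classes from the parser's best parse, CPU seconds) and the helper names are hypothetical; the “easy” filter follows the criteria above.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical record: (words, categories from the best parse, CPU seconds)
ParsedSentence = Tuple[List[str], List[str], float]


def is_easy(words: List[str], cpu_seconds: float,
            max_words: int = 20, max_seconds: float = 20.0) -> bool:
    """Fewer than 20 words, or parsed in less than 20 s of CPU time."""
    return len(words) < max_words or cpu_seconds < max_seconds


def count_trigrams(corpus: Iterable[ParsedSentence]) -> Counter:
    """Count category trigrams over the easy sentences only."""
    counts: Counter = Counter()
    for words, categories, cpu_seconds in corpus:
        if not is_easy(words, cpu_seconds):
            continue
        # Pad with sentence boundary markers so every category has two predecessors.
        padded = ["<s>", "<s>"] + categories + ["</s>"]
        for i in range(2, len(padded)):
            counts[tuple(padded[i - 2:i + 1])] += 1
    return counts
```

Since the labels come from the parser, these counts encode the parser's preferences, including its mistakes.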

Experimental Results
Applied to the first 220 sentences of the Alpino Treebank; 4 sentences were removed.

Low threshold -> small number of tags -> fast parsing

High threshold -> larger number of tags -> higher accuracy, lower efficiency

If all lexical categories for a given sentence are allowed, the parser can almost always find a single (but sometimes bad) parse.

If the parser is limited to the more plausible lexical categories, it will more often come up with a robust parse consisting of two or more partial parses.

A modest decrease in coverage results in a modest increase in accuracy.

Best threshold: 4.25

Disambiguation Component
Simple rule-frequency methods known from context-free parsing cannot be used directly for HPSG-like formalisms, since these methods rely crucially on the statistical independence of context-free rule applications.

Solution: Maximum Entropy Models.

Stochastic Attribute Value Grammars
A typically large set of parse features is identified. These features distinguish “good” parses from “bad” parses.

Parses are represented as vectors, where each cell contains the frequency of a particular feature (40,000 features in Alpino).

The features encode: rule names, local trees of rule names, pairs of words and their lexical category, lexical dependencies between words, etc.

Among them is a variety of more global syntactic features: features that recognize whether coordinations are parallel in structure, features that recognize whether the dependency in a WH-question or a relative clause is local or not, etc.

Stochastic Attribute Value Grammars
In training, a weight is established for each feature, indicating whether parses containing the corresponding feature should be preferred or not.

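The parse evaluation function is the sum, over all features, of each feature's frequency in the parse times that feature's weight: $\phi(y) = \sum_j \lambda_j f_j(y)$, where $f_j(y)$ is the frequency of feature $j$ in parse $y$ and $\lambda_j$ is its weight.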

The parse with the largest sum is the best parse.
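A sketch of the evaluation function: each parse is a sparse vector of feature frequencies (a `Counter`), scored as the weighted sum of those frequencies, and the highest-scoring parse is selected. The feature names and weight values are made up for illustration; in Alpino the weights come from training the maximum entropy model.

```python
from collections import Counter
from typing import Dict, List


def score(features: Counter, weights: Dict[str, float]) -> float:
    """Weighted sum of feature frequencies; unknown features get weight 0."""
    return sum(count * weights.get(name, 0.0) for name, count in features.items())


def best_parse(parses: List[Counter], weights: Dict[str, float]) -> int:
    """Index of the parse with the largest score."""
    return max(range(len(parses)), key=lambda k: score(parses[k], weights))
```

For example, with hypothetical weights `{"rule:np_det_n": 0.5, "dep:see-man": 1.2, "pp_attach_noun": -0.3}`, a parse containing `dep:see-man` outscores an otherwise identical parse that contains `pp_attach_noun` instead.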

Drawback: to train the model, we need access to all parses of each corpus sentence.

Stochastic Attribute Value Grammars
It suffices to train on the basis of representative samples of parses for each training sentence (Osborne, 2000).

Any sub-sample of the parses in the training data which yields unbiased estimates of feature expectations should result in as accurate a model as the complete set of parses.

Dependency Problem
Problem: the Alpino treebank contains correct dependency structures.

Dependency Structures abstract away from syntactic details.

The training data should contain the full parse as produced by the grammar.

Possible Solution: Use the grammar to parse a given sentence and then select the parse with the correct dependency structure.

However, the parser will not always be able to produce a parse with the correct dependency structure.

Dependency Problem
Map the accuracy of a parse to the frequency of that parse in the training data. Rather than distinguishing correct and incorrect parses, we determine the “quality” of each parse: Concept Accuracy (CA)

$CA = 1 - \frac{\sum_i D_f^i}{\sum_i \max(D_g^i, D_p^i)}$

where $D_p^i$ is the number of relations produced by the parser for sentence $i$, $D_g^i$ is the number of relations in the treebank parse, and $D_f^i$ is the number of incorrect and missing relations produced by the parser.

Thus, if a parse has a CA of 85%, we add the parse to the training data marked with a weight of 0.85.
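A sketch of the CA computation and of how a parse could be turned into a weighted training event. The helper names and the tuple layout are hypothetical, and treating a case with no relations on either side as fully correct is an assumption, since the formula is undefined there.

```python
from typing import List, Sequence, Tuple


def concept_accuracy(d_f: Sequence[int], d_g: Sequence[int], d_p: Sequence[int]) -> float:
    """CA = 1 - sum(D_f) / sum(max(D_g, D_p)) over the given sentences."""
    errors = sum(d_f)
    denom = sum(max(g, p) for g, p in zip(d_g, d_p))
    # Assumption: no relations on either side counts as fully correct.
    return 1.0 - errors / denom if denom else 1.0


def weighted_events(parses: List[Tuple[object, int, int, int]]) -> List[Tuple[object, float]]:
    """Pair each parse's features with its CA, used as its training weight.

    Each input tuple is (features, D_f, D_g, D_p) for one parse of one sentence.
    """
    return [(feats, concept_accuracy([f], [g], [p])) for feats, f, g, p in parses]
```

For a parse with 20 relations, a treebank analysis with 20 relations and 3 incorrect or missing relations, CA = 1 - 3/20 = 0.85, which yields the weight of 0.85 from the example above.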

Parse Forest

The left-corner parser constructs all possible parses.

The parse forest is a tree substitution grammar that derives exactly the set of all derivation trees of the input sentence.

Each tree in the tree substitution grammar is a left-corner spine.

Example: “I see a man at home”

Parse Forest

Best Parse Recovery

For each state in the search space, maintain only the b best candidates, where b is a small integer (the beam).

If the beam is decreased, we run a larger risk of missing the best parse (the result will typically still be a “good” parse); if the beam is increased, then the amount of computation increases.
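A simplified sketch of beam pruning for one state of the search space: the b-best candidate lists of the child states are combined, and only the b best results are kept. It assumes scores combine additively; the function name and candidate format are hypothetical, and this is not Alpino's actual recovery procedure. With purely additive scores the top candidate always survives; once non-local features affect the score, a candidate pruned at a lower state might have ranked first higher up, which is why a smaller beam risks missing the best parse.

```python
import heapq
import itertools
from typing import List, Tuple

# Hypothetical candidate: (score, derivation as a nested tuple)
Candidate = Tuple[float, tuple]


def combine(rule_name: str, rule_score: float,
            child_beams: List[List[Candidate]], b: int) -> List[Candidate]:
    """Combine children's b-best lists and keep only the b best results."""
    candidates = []
    # Try every combination of one candidate per child state.
    for combo in itertools.product(*child_beams):
        total = rule_score + sum(s for s, _ in combo)
        derivation = (rule_name,) + tuple(d for _, d in combo)
        candidates.append((total, derivation))
    # The beam: discard everything except the b highest-scoring candidates.
    return heapq.nlargest(b, candidates, key=lambda c: c[0])
```

The cost per state grows with the product of the child beam sizes, which is why increasing b increases the amount of computation.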

Beam Recover

Effect of Beam Size

Accuracy

Alpino: the development set, on which the system was optimized.

CLEF: Dutch questions from the CLEF Question Answering competition (2003, 2004 and 2005).

Trouw: First 1400 sentences of the Trouw 2001 newspaper, from the Twente News corpus.

References

[Mal04] Robert Malouf and Gertjan van Noord. Wide coverage parsing with stochastic attribute value grammars. In Proceedings of the IJCNLP-04 Workshop: Beyond Shallow Analyses - Formalisms and Statistical Modeling for Deep Analyses, Hainan Island, China, 2004.

[van06] Gertjan van Noord. At Last Parsing Is Now Operational. In Actes de la 13e conférence sur le traitement automatique des langues naturelles (TALN 2006), pages 20–42, Leuven, Belgium, 2006.

Questions?
