Example-Based Treebank Querying
Liesbeth Augustinus
Vincent Vandeghinste
Frank Van Eynde
CLARIN Sofia, 2012-10-28
NEDERBOOMS
• Exploitation of Dutch treebanks for research in linguistics
• CLARIN-VL Project
• Centre for Computational Linguistics (CCL)
• Dutch Grammar and Language Use (NGTG)
• Goals:
– User-friendly tools
– Access to large data files
How can we combine the data-oriented approach of treebank mining with the knowledge-oriented method of theoretical and descriptive linguistics?
QUERYING LASSY
Existing search tools:
• dtsearch (Kloosterman 2007)
• Dact (de Kok 2010)
stand-alone tools
query language: XPath = standard query language for XML trees
Some examples:
• “look for all NP nodes in which the head noun is modified by the adjective ‘politiek’, e.g. politieke discussies (political discussions)”
//node[@cat="np" and node[@rel="mod" and @pos="adj" and @root="politiek"] and node[@rel="hd" and @pos="noun"]]
• “look for verb clusters in which a separable verb particle occurs between two verb forms, e.g. ‘Hij zegt dat ze spoedig zullen kennis maken’ (He says that they will soon get acquainted)”
//node[node[@rel="hd" and @pos="verb" and @begin < ../node[@rel="vc"]/node[@rel="svp" and @pos="part"]/@begin] and node[@rel="vc" and node[@rel="svp" and @begin < ../node[@rel="hd" and @pos="verb"]/@begin]]]
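To make the logic of the two queries above concrete, here is a minimal sketch that re-expresses them as plain Python tests. Python's standard-library ElementTree supports only a subset of XPath (no `and`, no `<` comparisons), so the conditions are written out by hand. The XML fragments are hand-written toy trees that imitate Alpino output, not real LASSY data, and all function names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hand-written toy trees imitating Alpino XML; NOT real LASSY data.
NP_XML = """
<node cat="top">
  <node cat="np" rel="su" begin="0">
    <node rel="mod" pos="adj" root="politiek" begin="0"/>
    <node rel="hd" pos="noun" root="discussie" begin="1"/>
  </node>
</node>
"""

# "... ze spoedig zullen kennis maken"; 'kennis' is tagged pos="part"
# here only so that it satisfies the query as given on the slide.
CLUSTER_XML = """
<node cat="ssub" begin="3">
  <node rel="su" pos="pron" root="ze" begin="3"/>
  <node rel="hd" pos="verb" root="zullen" begin="5"/>
  <node rel="vc" cat="inf" begin="4">
    <node rel="mod" pos="adv" root="spoedig" begin="4"/>
    <node rel="svp" pos="part" root="kennis" begin="6"/>
    <node rel="hd" pos="verb" root="maken" begin="7"/>
  </node>
</node>
"""

def children(node, **attrs):
    """Direct <node> children of `node` carrying all given attribute values."""
    return [c for c in node.findall("node")
            if all(c.get(k) == v for k, v in attrs.items())]

def begin(node):
    """Start position of a node as an integer."""
    return int(node.get("begin"))

def find_politiek_nps(tree):
    """NP nodes with a head noun and the modifying adjective 'politiek'."""
    return [n for n in tree.iter("node")
            if n.get("cat") == "np"
            and children(n, rel="mod", pos="adj", root="politiek")
            and children(n, rel="hd", pos="noun")]

def find_split_clusters(tree):
    """Nodes where a particle sits between the head verb and the lower verb."""
    hits = []
    for n in tree.iter("node"):
        vcs = children(n, rel="vc")
        # The head verb precedes a particle inside the verbal complement ...
        head_before_particle = any(
            begin(h) < begin(p)
            for h in children(n, rel="hd", pos="verb")
            for vc in vcs
            for p in children(vc, rel="svp", pos="part"))
        # ... and a particle precedes the head verb of that complement.
        particle_before_lower_verb = any(
            begin(p) < begin(v)
            for vc in vcs
            for p in children(vc, rel="svp")
            for v in children(vc, rel="hd", pos="verb"))
        if head_before_particle and particle_before_lower_verb:
            hits.append(n)
    return hits

print(len(find_politiek_nps(ET.fromstring(NP_XML))))
print([n.get("cat") for n in find_split_clusters(ET.fromstring(CLUSTER_XML))])
```

Even in this form, the user has to know the attribute names (`rel`, `cat`, `pos`, `root`, `begin`) and the shape of the parse, which is the usability problem discussed next.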
XPath
– Not user-friendly
– Knowledge of Alpino grammar necessary
= problematic for non-technical linguists
• Verify theory through data with corpus or treebank examples
• Time consuming, requires some effort
How to make interaction between computational linguistics and theoretical linguistics possible?
GrETEL
Greedy Extraction of Trees for Empirical Linguistics
• Search tool based on example sentences
• Input = natural language
• No explicit knowledge of a formal query language or of the Alpino grammar required
• Bridge gap between descriptive and computational linguistics
• Available online http://nederbooms.ccl.kuleuven.be (optimised for Mozilla Firefox)
GrETEL - online
GrETEL - input
Green versus red word order in Dutch
– green: past participle – auxiliary
De NAVO stelt dat ze er alles aan gedaan heeft
– red: auxiliary – past participle
De NAVO stelt dat ze er alles aan heeft gedaan
“NATO claims that it has done everything in its power” (deredactie.be)
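The two orders differ only in the relative position of the auxiliary and the past participle, so they can be told apart by comparing word positions. A minimal sketch, assuming whitespace-tokenised clauses and assuming we already know which words are the auxiliary and the participle; the function name `word_order` is illustrative and not part of GrETEL.

```python
def word_order(tokens, auxiliary, participle):
    """Label a clause 'green' (participle first) or 'red' (auxiliary first)."""
    aux_pos = tokens.index(auxiliary)
    part_pos = tokens.index(participle)
    return "green" if part_pos < aux_pos else "red"

# The two example sentences from the slide, split on whitespace.
green = "De NAVO stelt dat ze er alles aan gedaan heeft".split()
red = "De NAVO stelt dat ze er alles aan heeft gedaan".split()

print(word_order(green, "heeft", "gedaan"))
print(word_order(red, "heeft", "gedaan"))
```

A string comparison like this only works once the two verb forms have been identified; finding them across a corpus is what the parse and the treebank query are for.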
>> parsed with Alpino
GrETEL - annotation
>> info added to Alpino parse
GrETEL – query info
input example >> Alpino parse >> query tree >> XPath query >> treebanks
GrETEL – query tree
>> subtree extraction
query tree >> XPath generator >> XPath expression
//node[@cat="ssub" and node[@rel="vc" and @cat="ppart" and node[@rel="hd" and @pos="verb"]] and node[@rel="hd" and @pos="verb" and @root="heb"]]
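The XPath generator step can be sketched as a recursive walk over the query tree that turns each node's attributes and children into conditions. This is a minimal illustration under stated assumptions, not GrETEL's actual implementation: the query tree is represented as a hand-written XML fragment, and the names `ATTRS`, `predicate` and `to_xpath` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attributes copied into the query, in the order they are written out.
ATTRS = ("rel", "cat", "pos", "root")

def predicate(node):
    """Join a node's attribute tests and child tests with 'and'."""
    conditions = [f'@{a}="{node.get(a)}"' for a in ATTRS
                  if node.get(a) is not None]
    # Each child becomes a nested node[...] condition.
    conditions += [f"node[{predicate(child)}]"
                   for child in node.findall("node")]
    return " and ".join(conditions)

def to_xpath(query_tree):
    """Match the query tree anywhere in a treebank sentence."""
    return f"//node[{predicate(query_tree)}]"

# Hand-written query tree for the green word order example.
QUERY_TREE = ET.fromstring("""
<node cat="ssub">
  <node rel="vc" cat="ppart">
    <node rel="hd" pos="verb"/>
  </node>
  <node rel="hd" pos="verb" root="heb"/>
</node>
""")

print(to_xpath(QUERY_TREE))
```

For this query tree the sketch prints the XPath expression shown above. Which attributes are kept on each node decides how specific the query is; here only the auxiliary keeps its `root`, so any past participle matches. The sketch assumes every node carries at least one of the listed attributes or has children.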
GrETEL – treebanks
LASSY Small in PostgreSQL database
>> no local installation required
GrETEL – results
>> (adapted) query
>> quantitative information
>> tree viewer
>> list of results