Example-Based Treebank Querying

Post on 29-Jan-2016

28 views 0 download

Tags:

description

Example-Based Treebank Querying. Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde. CLARIN Sofia, 2012-10-28. NEDERBOOMS. Exploitation of Dutch treebanks for research in linguistics CLARIN-VL Project Centre for Computational Linguistics (CCL) Dutch Grammar and Language Use (NGTG) - PowerPoint PPT Presentation

Transcript of Example-Based Treebank Querying

Example-Based Treebank Querying

Liesbeth AugustinusVincent Vandeghinste

Frank Van Eynde

CLARIN Sofia, 2012-10-28

NEDERBOOMS• Exploitation of Dutch treebanks for research in

linguistics

• CLARIN-VL Project• Centre for Computational Linguistics (CCL)• Dutch Grammar and Language Use (NGTG)

• Goals:– User-friendly tools– Access to large data files

NEDERBOOMS

How can we combine the data-oriented approach of treebank mining with the knowledge-oriented method of theoretical and descriptive linguistics?

QUERYING LASSYExisting search tools:

• dtsearch (Kloosterman 2007)

• Dact (de Kok 2010)

stand-alone tools

query language: XPath= standard query language for xml trees

QUERYING LASSYSome examples:

• “look for all NP nodes in which the head noun is modified by the adjective ‘politiek’, e.g. politieke discussies”

//node[@cat="np" and node[@rel="mod" and @pos="adj" and @root="politiek"] and node[@rel="hd" and @pos="noun"]

QUERYING LASSYSome examples:

• “look for all NP nodes in which the head noun is modified by the adjective ‘politiek’, e.g. politieke discussies”

//node[@cat="np" and node[@rel="mod" and @pos="adj" and @root="politiek"] and node[@rel="hd" and @pos="noun"]

• “look for verb clusters in which a separable verb particle occurs between two verb forms, e.g. “Hij zegt dat ze spoedig zullen kennis maken”

//node[node[@rel="hd" and @pos="verb" and @begin < ../node[@rel="vc"]/node[@rel="svp" and @pos="part"]/@begin] and node[@rel="vc" and node[@rel="svp" and @begin < ../node[@rel="hd" and @pos="verb"]/@begin]]]

QUERYING LASSYXPath– Not user-friendly– Knowledge of Alpino grammar necessary

= problematic for non-technical linguists• Verify theory through data with corpus or treebank examples• Time consuming, requires some effort

QUERYING LASSYXPath– Not user-friendly– Knowledge of Alpino grammar necessary

= problematic for non-technical linguists• Verify theory through data with corpus or treebank examples• Time consuming, requires some effort

How to make interaction between computational linguistics and theoretical linguistics possible?

GrETELGreedy Extraction of Trees for Empirical Linguistics

• Search tool based on example sentences

• Input = natural language

• No explicit knowledge of formal query language nor Alpino grammar required

• Bridge gap between descriptive and computational linguistics

• Available online http://nederbooms.ccl.kuleuven.be (optimised for Mozilla Firefox)

GrETEL - online

GrETEL - online

GrETEL - input

Green versus red word order in Dutch

– green: past participle – auxiliary

De NAVO stelt dat ze er alles aan gedaan heeft

– red: auxiliary – past participleDe NAVO stelt dat ze er alles aan heeft gedaan

“The NATO claim that they have done everything in their power” (deredactie.be)

GrETEL - input

GrETEL - input

>> parsed with Alpino

GrETEL - annotation

GrETEL - annotation

>> info added to Alpino parse

GrETEL – query info

GrETEL – query infoinput example

GrETEL – query infoinput example

Alpino parse

GrETEL – query infoinput example

query treeAlpino parse

GrETEL – query infoinput example

query treeAlpino parse

XPath query

GrETEL – query infoinput example

query treeAlpino parse

XPath query

treebanks

GrETEL – query tree

GrETEL – query tree

>> subtree extraction

GrETEL – query tree

query tree

GrETEL – query tree

query tree >> XPath generator >>

GrETEL – query tree

query tree

//node[@cat="ssub" andnode[@rel="vc" and @cat="ppart" and node[@rel="hd" and @pos="verb"]] and node[@rel="hd" and @pos="verb" and @root="heb"]]

XPath expression>> XPath generator >>

GrETEL – treebanks

GrETEL – treebanks

LASSY Small in PostgreSQL database

>> no local installation required

GrETEL – results

GrETEL – results

>> (adapted) query

GrETEL – results

>> (adapted) query

>> quantitative

information

GrETEL – results

GrETEL – results

>> tree viewer

GrETEL – results

>> tree viewer

>> list of results

Thanks for your attention!

Questions?

liesbeth@ccl.kuleuven.be

http://nederbooms.ccl.kuleuven.be