Example-Based Treebank Querying

36
Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde CLARIN Sofia, 2012-10-28

description

Example-Based Treebank Querying. Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde. CLARIN Sofia, 2012-10-28. NEDERBOOMS. Exploitation of Dutch treebanks for research in linguistics CLARIN-VL Project Centre for Computational Linguistics (CCL) Dutch Grammar and Language Use (NGTG) - PowerPoint PPT Presentation

Transcript of Example-Based Treebank Querying

Page 1: Example-Based Treebank Querying

Example-Based Treebank Querying

Liesbeth AugustinusVincent Vandeghinste

Frank Van Eynde

CLARIN Sofia, 2012-10-28

Page 2: Example-Based Treebank Querying

NEDERBOOMS• Exploitation of Dutch treebanks for research in

linguistics

• CLARIN-VL Project• Centre for Computational Linguistics (CCL)• Dutch Grammar and Language Use (NGTG)

• Goals:– User-friendly tools– Access to large data files

Page 3: Example-Based Treebank Querying

NEDERBOOMS

How can we combine the data-oriented approach of treebank mining with the knowledge-oriented method of theoretical and descriptive linguistics?

Page 4: Example-Based Treebank Querying

QUERYING LASSYExisting search tools:

• dtsearch (Kloosterman 2007)

• Dact (de Kok 2010)

stand-alone tools

query language: XPath= standard query language for xml trees

Page 5: Example-Based Treebank Querying

QUERYING LASSYSome examples:

• “look for all NP nodes in which the head noun is modified by the adjective ‘politiek’, e.g. politieke discussies”

//node[@cat="np" and node[@rel="mod" and @pos="adj" and @root="politiek"] and node[@rel="hd" and @pos="noun"]

Page 6: Example-Based Treebank Querying

QUERYING LASSYSome examples:

• “look for all NP nodes in which the head noun is modified by the adjective ‘politiek’, e.g. politieke discussies”

//node[@cat="np" and node[@rel="mod" and @pos="adj" and @root="politiek"] and node[@rel="hd" and @pos="noun"]

• “look for verb clusters in which a separable verb particle occurs between two verb forms, e.g. “Hij zegt dat ze spoedig zullen kennis maken”

//node[node[@rel="hd" and @pos="verb" and @begin < ../node[@rel="vc"]/node[@rel="svp" and @pos="part"]/@begin] and node[@rel="vc" and node[@rel="svp" and @begin < ../node[@rel="hd" and @pos="verb"]/@begin]]]

Page 7: Example-Based Treebank Querying

QUERYING LASSYXPath– Not user-friendly– Knowledge of Alpino grammar necessary

= problematic for non-technical linguists• Verify theory through data with corpus or treebank examples• Time consuming, requires some effort

Page 8: Example-Based Treebank Querying

QUERYING LASSYXPath– Not user-friendly– Knowledge of Alpino grammar necessary

= problematic for non-technical linguists• Verify theory through data with corpus or treebank examples• Time consuming, requires some effort

How to make interaction between computational linguistics and theoretical linguistics possible?

Page 9: Example-Based Treebank Querying

GrETELGreedy Extraction of Trees for Empirical Linguistics

• Search tool based on example sentences

• Input = natural language

• No explicit knowledge of formal query language nor Alpino grammar required

• Bridge gap between descriptive and computational linguistics

• Available online http://nederbooms.ccl.kuleuven.be (optimised for Mozilla Firefox)

Page 10: Example-Based Treebank Querying

GrETEL - online

Page 11: Example-Based Treebank Querying

GrETEL - online

Page 12: Example-Based Treebank Querying

GrETEL - input

Green versus red word order in Dutch

– green: past participle – auxiliary

De NAVO stelt dat ze er alles aan gedaan heeft

– red: auxiliary – past participleDe NAVO stelt dat ze er alles aan heeft gedaan

“The NATO claim that they have done everything in their power” (deredactie.be)

Page 13: Example-Based Treebank Querying

GrETEL - input

Page 14: Example-Based Treebank Querying

GrETEL - input

>> parsed with Alpino

Page 15: Example-Based Treebank Querying

GrETEL - annotation

Page 16: Example-Based Treebank Querying

GrETEL - annotation

>> info added to Alpino parse

Page 17: Example-Based Treebank Querying

GrETEL – query info

Page 18: Example-Based Treebank Querying

GrETEL – query infoinput example

Page 19: Example-Based Treebank Querying

GrETEL – query infoinput example

Alpino parse

Page 20: Example-Based Treebank Querying

GrETEL – query infoinput example

query treeAlpino parse

Page 21: Example-Based Treebank Querying

GrETEL – query infoinput example

query treeAlpino parse

XPath query

Page 22: Example-Based Treebank Querying

GrETEL – query infoinput example

query treeAlpino parse

XPath query

treebanks

Page 23: Example-Based Treebank Querying

GrETEL – query tree

Page 24: Example-Based Treebank Querying

GrETEL – query tree

>> subtree extraction

Page 25: Example-Based Treebank Querying

GrETEL – query tree

query tree

Page 26: Example-Based Treebank Querying

GrETEL – query tree

query tree >> XPath generator >>

Page 27: Example-Based Treebank Querying

GrETEL – query tree

query tree

//node[@cat="ssub" andnode[@rel="vc" and @cat="ppart" and node[@rel="hd" and @pos="verb"]] and node[@rel="hd" and @pos="verb" and @root="heb"]]

XPath expression>> XPath generator >>

Page 28: Example-Based Treebank Querying

GrETEL – treebanks

Page 29: Example-Based Treebank Querying

GrETEL – treebanks

LASSY Small in PostgreSQL database

>> no local installation required

Page 30: Example-Based Treebank Querying

GrETEL – results

Page 31: Example-Based Treebank Querying

GrETEL – results

>> (adapted) query

Page 32: Example-Based Treebank Querying

GrETEL – results

>> (adapted) query

>> quantitative

information

Page 33: Example-Based Treebank Querying

GrETEL – results

Page 34: Example-Based Treebank Querying

GrETEL – results

>> tree viewer

Page 35: Example-Based Treebank Querying

GrETEL – results

>> tree viewer

>> list of results

Page 36: Example-Based Treebank Querying

Thanks for your attention!

Questions?

[email protected]

http://nederbooms.ccl.kuleuven.be