Deep Processing for Restricted Domain QA
Yi Zhang
Universität des Saarlandes
yzhang@coli.uni-sb.de
Why Deep?
- Is shallow processing enough?
  - For TREC-like QA evaluation: (in most cases) YES
  - However, for restricted domain QA:
    - More complicated questions
    - Less information redundancy for data-intensive approaches
    - Domain knowledge available
Deep Processing Provides
- More fine-grained linguistic analysis
  - Long-distance dependencies
  - Agreement
  - …
- Semantic representation
  - MRS/RMRS
General Problems with Deep Processing
- Robustness
  - Lexicon
  - Compound NP
- Specificity
  - “John saw Mary”
- Efficiency (not discussed here)
Deep Processing
- MRS/RMRS
  - (Robust) semantic representation with underspecification.
- HPSG grammars
  - LinGO ERG grammar
  - Other grammars (German, Japanese, Modern Greek, Norwegian, Chinese, …)
- HoG
  - Hybrid shallow & deep processing architecture with a uniform semantic representation (RMRS).
QA in QUETAL (1)
- Hybrid shallow & deep approach
- Cross-lingual QA
- QA on
  - Texts
  - Semi-structured documents
  - Databases
QA in QUETAL (2)
[Figure: QUETAL architecture diagram. Recoverable labels: NLQ; IR Schema; Syntax Ana. (Dependency Parser, TAG for En/De Q.); Seman Ana. (Seman Q. Ana., Q-type, A-type, Q-focus); Ans. Planning & Generation; GetData; IR Query Planner; Info Source; Texts; IE; Fact DB; Result Merge]
QA in QUETAL (3)
- Deep processing in QUETAL
  - HPSG grammar used for question analysis.
  - Documents are processed with relatively shallow methods.
  - Answer matching with RMRS.
Restricted Domain QA
- More complicated questions
- Fewer documents, with better quality
- Domain-specific ontology available
Restricted Domain QA – an Example
- Document: Shanghai City Planning Exhibition Hall [LOC_1] is located to the east of the City Hall [LOC_2], …, setting off with the crystal-like Grand Theatre [LOC_3] to the west.
- Question: Where is the City Hall of Shanghai?
- Answer: Between Shanghai City Planning Exhibition Hall and the Grand Theatre. (Domain ontology)
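The inference behind this answer can be sketched in a few lines. This is a minimal illustration, not the QUETAL implementation: it assumes the domain ontology contributes a rule of the form "if A is east of T and B is west of T, then T is between A and B". The fact encoding and the function name `locate` are hypothetical.

```python
# Facts extracted from the document: (figure, relation, ground) reads
# "figure is <relation> ground". The encoding is an assumption.
FACTS = [
    ("Shanghai City Planning Exhibition Hall", "east_of", "City Hall"),
    ("Grand Theatre", "west_of", "City Hall"),
]


def locate(target, facts):
    """Describe where `target` is, using landmarks on opposite sides of it.

    Assumed ontology rule: if A is east of T and B is west of T,
    then T lies between A and B.
    """
    east = [a for a, rel, t in facts if rel == "east_of" and t == target]
    west = [b for b, rel, t in facts if rel == "west_of" and t == target]
    if east and west:
        return f"Between {east[0]} and {west[0]}"
    return None  # no pair of opposite landmarks is known for this target


answer = locate("City Hall", FACTS)
print(answer)
```

Neither fact states the answer on its own; the "between" relation only appears once the two facts are combined through the ontology rule.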
Open Topics
- Grammar extension & automated lexicon acquisition
- Robust deep processing
- Semantic answer matching
- Cross-lingual
Grammar Extension
- Tourism domain
- ERG extended for
  - “RONDANE”: Norway mountain area tourism
    - 1.4K sentences
    - 15 words/sentence
    - coverage > 74%
  - Shanghai tourist guide from http://www.shanghai.gov.cn
    - 1,600 sentences
    - 18 words/sentence
Test on RONDANE corpus
Test on RONDANE Corpus
Grammar Extension
- ERG lexicon
- It is relatively easy to automate lexicon acquisition for nouns

|      | Lexicon Entry # | Top 10 Leaf Types Lexicon Coverage |
|------|-----------------|------------------------------------|
| Verb | 2891            | 77%                                |
| Noun | 6873            | 96%                                |
| Adj. | 2505            | 90%                                |
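One plausible reading of the coverage column is the share of lexicon entries whose leaf lexical type is among the ten most frequent types for that part of speech. The slide does not define the measure, so this reading is an assumption. The sketch below computes it on made-up data; `top_k_coverage` and the type labels are hypothetical.

```python
from collections import Counter


def top_k_coverage(entry_types, k=10):
    """Fraction of lexicon entries whose leaf type is among the k most frequent.

    `entry_types` is a list with one leaf-type label per lexicon entry.
    """
    counts = Counter(entry_types)
    if not counts:
        return 0.0
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(entry_types)


# Toy data: ten noun entries with invented leaf-type labels.
toy_nouns = ["count"] * 6 + ["mass"] * 3 + ["proper"] * 1
print(top_k_coverage(toy_nouns, k=2))  # 9 of the 10 entries are covered
```

Under this reading, high coverage means a small set of types accounts for most entries, so a type predictor has few classes to choose between.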
Automated Lexicon Acquisition
- POS tagging
- Named entity recognition
- Statistical models for finding the best lexical type for an unknown noun.
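The slide does not say which statistical model is used. As a stand-in, the sketch below uses a small naive Bayes classifier with add-one smoothing over toy features (POS tag, capitalisation, two-letter suffix). The feature set, the training examples, and the type labels `count_noun`, `mass_noun` and `proper_name` are all invented for illustration and are not ERG type names.

```python
import math
from collections import Counter, defaultdict


def features(word, pos_tag):
    """Toy features: POS tag, capitalisation, and the last two letters."""
    return [f"pos={pos_tag}", f"cap={word[0].isupper()}", f"suf={word[-2:].lower()}"]


class LexTypeGuesser:
    """Naive Bayes over the toy features, with add-one smoothing."""

    def __init__(self):
        self.type_counts = Counter()             # training examples per type
        self.feat_counts = defaultdict(Counter)  # feature counts per type
        self.vocab = set()                       # all features seen in training

    def train(self, examples):
        for word, pos_tag, lex_type in examples:
            self.type_counts[lex_type] += 1
            for f in features(word, pos_tag):
                self.feat_counts[lex_type][f] += 1
                self.vocab.add(f)

    def predict(self, word, pos_tag):
        total = sum(self.type_counts.values())

        def score(lex_type):
            # log prior + sum of smoothed log likelihoods
            s = math.log(self.type_counts[lex_type] / total)
            denom = sum(self.feat_counts[lex_type].values()) + len(self.vocab)
            for f in features(word, pos_tag):
                s += math.log((self.feat_counts[lex_type][f] + 1) / denom)
            return s

        return max(self.type_counts, key=score)


guesser = LexTypeGuesser()
guesser.train([
    ("mountain", "NN", "count_noun"),
    ("valley", "NN", "count_noun"),
    ("cabin", "NN", "count_noun"),
    ("snow", "NN", "mass_noun"),
    ("information", "NN", "mass_noun"),
    ("Rondane", "NNP", "proper_name"),
    ("Shanghai", "NNP", "proper_name"),
])
print(guesser.predict("Oslo", "NNP"))
print(guesser.predict("fountain", "NN"))
```

The POS tag and the named-entity signal (approximated here by capitalisation) correspond to the first two bullets; a real system would draw them from a tagger and an NE recogniser rather than from the surface string.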
Robust Deep Processing
- Back off to RMRS generated by intermediate or shallow parsers (HoG architecture).
- Keep the charts of incomplete parses and the corresponding MRS fragments for semantic answer matching.
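The back-off idea can be sketched as a cascade that tries the deepest analyser first and falls through to shallower ones. This is only an illustration of the control flow. The two analysers are stand-ins rather than HoG components, and the dictionary they return is a placeholder for an RMRS.

```python
def analyse(sentence, analysers):
    """Try analysers from deepest to shallowest; return the first result.

    `analysers` is a list of (name, function) pairs. Each function returns
    an RMRS-like result, or None when it cannot analyse the sentence.
    """
    for name, fn in analysers:
        result = fn(sentence)
        if result is not None:
            return name, result
    return None, None


def tokens(sentence):
    return sentence.lower().rstrip("?.").split()


def deep_parser(sentence):
    # Stand-in: pretend the deep grammar fails on out-of-lexicon words.
    known = {"where", "is", "the", "city", "hall"}
    words = tokens(sentence)
    return {"preds": words} if all(w in known for w in words) else None


def shallow_parser(sentence):
    # Stand-in: always produces some (less informative) analysis.
    return {"preds": tokens(sentence)}


CASCADE = [("deep", deep_parser), ("shallow", shallow_parser)]
print(analyse("Where is the City Hall?", CASCADE)[0])
print(analyse("Where is the Grand Theatre?", CASCADE)[0])
```

Because every level returns the same kind of structure, downstream answer matching does not need to know which analyser produced it.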
Parse Disambiguation
- Select the best parse with statistical models (Toutanova et al. 2002)
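As a rough illustration of statistical parse selection, the sketch below scores each candidate parse with a linear model over feature counts and keeps the highest-scoring one. In a log-linear model the parse probability is a softmax of these scores, so the argmax is the same. The feature names, weights and parse ids are invented, and this is not the specific model of the cited work.

```python
def score(parse_features, weights):
    """Linear score of one parse: sum of weight * feature count."""
    return sum(weights.get(f, 0.0) * n for f, n in parse_features.items())


def best_parse(candidates, weights):
    """Return the id of the highest-scoring candidate parse."""
    return max(candidates, key=lambda pid: score(candidates[pid], weights))


# Two candidate parses of one sentence, as bags of (invented) features.
candidates = {
    "parse_a": {"rule:subj-head": 1, "rule:head-comp": 1, "lex:type_1": 1},
    "parse_b": {"rule:subj-head": 1, "rule:head-comp": 1, "lex:type_2": 1},
}
# Hypothetical weights; in practice they would be estimated from a treebank.
weights = {
    "rule:subj-head": 0.5,
    "rule:head-comp": 0.4,
    "lex:type_1": 1.2,
    "lex:type_2": -0.8,
}
print(best_parse(candidates, weights))
```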
Answer Matching with (R)MRS
- Semantic answer matching
  - Create semantic patterns for each question type.
    - where -> locate_v(e, x1, x2)
  - Semantic distance measurement.
    - pred1(x)&pred2(x) <-> pred1(x)&pred2(y)
- Query expansion
  - Synonym substitution
  - Semantic structure replacement
    - give_v(e1, x1, x2, x3) => receive_v(e2, x2, x1, x3)
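Two of these ideas can be sketched on a simplified representation in which an elementary predication is a `(predicate, [variables])` pair. The rewrite function applies the give/receive rule shown above. The distance function is an invented toy measure, since the slide does not define one: it counts predicates present on only one side and adds 0.5 for each pair of predicates that share a variable in one structure but not in the other. All names are hypothetical.

```python
from itertools import combinations


def rewrite_give_to_receive(ep, new_event="e2"):
    """Apply give_v(e1, x1, x2, x3) => receive_v(e2, x2, x1, x3)."""
    pred, args = ep
    if pred != "give_v" or len(args) != 4:
        return ep  # rule does not apply; leave the predication unchanged
    _event, x1, x2, x3 = args
    return ("receive_v", [new_event, x2, x1, x3])


def expand(query):
    """Return the query plus its rewritten variant, if the rule changed it."""
    rewritten = [rewrite_give_to_receive(ep) for ep in query]
    return [query] if rewritten == query else [query, rewritten]


def shared_pairs(eps):
    """Pairs of predicates whose argument lists share at least one variable."""
    return {
        frozenset((p, q))
        for (p, a), (q, b) in combinations(eps, 2)
        if set(a) & set(b)
    }


def distance(eps1, eps2):
    """Toy distance: predicate mismatches + 0.5 per co-indexation mismatch."""
    preds1 = {p for p, _ in eps1}
    preds2 = {p for p, _ in eps2}
    pred_diff = len(preds1 ^ preds2)
    link_diff = len(shared_pairs(eps1) ^ shared_pairs(eps2))
    return pred_diff + 0.5 * link_diff


query = [("give_v", ["e1", "x1", "x2", "x3"])]
print(expand(query))

same_var = [("pred1", ["x"]), ("pred2", ["x"])]
diff_var = [("pred1", ["x"]), ("pred2", ["y"])]
print(distance(same_var, diff_var))
```

The second example mirrors the slide's pred1(x)&pred2(x) versus pred1(x)&pred2(y): the predicates are identical, so only the differing variable sharing contributes to the distance.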
Work Plan
- Narrow down my focus to one of the topics above.
- Continue the Chinese HPSG grammar development.
References
Timothy Baldwin, Emily M. Bender, Dan Flickinger, Ara Kim, and Stephan Oepen. To appear. Road-testing the English Resource Grammar over the British National Corpus. In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004), Lisbon, Portugal.
Ulrich Callmeier. 2002. PET – a platform for experimentation with efficient HPSG processing techniques. In Collaborative Language Engineering. CSLI Publications, Stanford, USA.
Hans Uszkoreit. 2002. New chances for deep linguistic processing. In Proc. of the 19th International Conference on Computational Linguistics (COLING 2002), Taipei, Taiwan.
Ann Copestake, Dan Flickinger, Ivan A. Sag, and Carl Pollard. 2003. Minimal recursion semantics: An introduction. Under review.
Timothy Baldwin and Francis Bond. 2003. Learning the countability of English nouns from corpus data. In Proc. of the 41st Annual Meeting of the ACL, pages 463–70, Sapporo, Japan.
J. Carroll and A. Fang. 2004. Automatic Acquisition of Verb Subcategorisations and their Impact on the Performance of an HPSG Parser. In IJCNLP 2004.
Stephan Oepen, Dan Flickinger, Kristina Toutanova, and Christopher D. Manning. 2002. LinGO Redwoods: A Rich and Dynamic Treebank for HPSG. In Proceedings of The First Workshop on Treebanks and Linguistic Theories (TLT2002), Sozopol, Bulgaria.
Kristina Toutanova, Christopher D. Manning, and Stephan Oepen. 2002. Parse Ranking for a Rich HPSG Grammar. In Proceedings of The First Workshop on Treebanks and Linguistic Theories (TLT2002), Sozopol, Bulgaria.
Stephan Oepen. [incr tsdb()] - Competence and Performance Laboratory. User Manual. Technical Report. Computational Linguistics, Saarland University (in preparation).
Robert Malouf and Gertjan van Noord. 2004. Wide coverage parsing with stochastic attribute value grammars. In IJCNLP-04 Workshop: Beyond shallow analyses - Formalisms and statistical modeling for deep analyses.
Kristina Toutanova, Christopher D. Manning, Stuart M. Shieber, Dan Flickinger, and Stephan Oepen. 2002. Parse Disambiguation for a Rich HPSG Grammar. In First Workshop on Treebanks and Linguistic Theories (TLT2002), pp. 253-263, Sozopol, Bulgaria.