Natural Language Processing COMPSCI 423/723 Rohit Kate
1
Natural Language Processing
COMPSCI 423/723
Rohit Kate
2
Semantic Parsing
Some of the slides have been adapted from the ACL 2010 tutorial by Rohit Kate and Yuk Wah Wong, and from Raymond Mooney’s NLP course at UT Austin.
Introduction to the Semantic Parsing Task
4
Semantic Parsing
• “Semantic Parsing” is, ironically, a semantically ambiguous term:
– Semantic role labeling (find agent, patient etc. for a verb)
– Finding generic relations in text (find part-whole, member-of relations)
– Transforming a natural language sentence into its meaning representation
5
Semantic Parsing
• Semantic Parsing: Transforming natural language (NL) sentences into computer executable complete meaning representations (MRs) for domain-specific applications
• Realistic semantic parsing currently entails domain dependence
• Example application domains:
– ATIS: Air Travel Information Service
– CLang: RoboCup Coach Language
– Geoquery: A Database Query Application
6
ATIS: Air Travel Information Service

• Interface to an air travel database [Price, 1990]
• Widely-used benchmark for spoken language understanding
May I see all the flights from Cleveland to Dallas?

  ↓ Semantic Parsing

Air-Transportation
  Show: (Flight-Number)
  Origin: (City "Cleveland")
  Destination: (City "Dallas")

  ↓ Query

NA 1439, TQ 23, …
7
CLang: RoboCup Coach Language
• In the RoboCup Coach competition, teams compete to coach simulated players [http://www.robocup.org]
• The coaching instructions are given in a computer language called CLang [Chen et al. 2003]
If the ball is in our goal area then player 1 should intercept it.

  ↓ Semantic Parsing

CLang: (bpos (goal-area our) (do our {1} intercept))

[Figure: simulated soccer field]
8
Geoquery: A Database Query Application
• Query application for U.S. geography database containing about 800 facts [Zelle & Mooney, 1996]
Which rivers run through the states bordering Texas?

  ↓ Semantic Parsing (Query)

answer(traverse(next_to(stateid('texas'))))

  ↓ Answer

Arkansas, Canadian, Cimarron, Gila, Mississippi, Rio Grande, …
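Geoquery's MRs have a procedural reading: each predicate can be executed as a function over the database. Below is a minimal Python sketch; the fact tables and function bodies are illustrative assumptions standing in for the real ~800-fact database, and only the MR predicate names come from the slide.

```python
# Toy fact base -- an illustrative stand-in for Geoquery's real database.
BORDERS = {'texas': ['new mexico', 'oklahoma', 'arkansas', 'louisiana']}
TRAVERSES = {                      # river -> states it runs through
    'rio grande': ['new mexico', 'texas'],
    'arkansas':   ['oklahoma', 'arkansas'],
    'canadian':   ['new mexico', 'oklahoma', 'texas'],
}

def stateid(name):                 # ground a state name
    return name

def next_to(state):                # states bordering the given state
    return BORDERS.get(state, [])

def traverse(states):              # rivers running through any of the states
    return sorted({river for river, through in TRAVERSES.items()
                   if any(s in through for s in states)})

def answer(result):                # top-level wrapper
    return result

# "Which rivers run through the states bordering Texas?"
rivers = answer(traverse(next_to(stateid('texas'))))
```

Evaluating the MR bottom-up mirrors its nesting: stateid grounds the constant, then next_to and traverse each consume the previous result.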
9
What is the meaning of “meaning”?
• Representing the meaning of natural language is ultimately a difficult philosophical question
• Many attempts have been made to define a generic formal semantics of natural language
– Can they really be complete?
– What can they do for us computationally?
– Not so useful if the meaning of Life is defined as Life′
• Our meaning representation for semantic parsing does something useful for an application
• Procedural Semantics: The meaning of a sentence is a formal representation of a procedure that performs some action that is an appropriate response
– Answering questions
– Following commands
10
Meaning Representation Languages
• A meaning representation language (MRL) for an application is assumed to be given
• The MRL is designed by the creators of the application to suit the application's needs, independent of natural language
• CLang was designed by the RoboCup community to send formal coaching instructions to simulated players
• Geoquery's MRL was based on its Prolog database
• The MRL is unambiguous by design
11
Meaning Representation Languages

MR: answer(traverse(next_to(stateid('texas'))))

Productions:
ANSWER → answer(RIVER)
RIVER → TRAVERSE(STATE)
STATE → NEXT_TO(STATE)
TRAVERSE → traverse
NEXT_TO → next_to
STATEID → 'texas'

Unambiguous parse tree of the MR:

ANSWER
└─ answer(RIVER)
   └─ TRAVERSE(STATE)
      ├─ TRAVERSE → traverse
      └─ STATE → NEXT_TO(STATE)
         ├─ NEXT_TO → next_to
         └─ STATE → STATEID → stateid('texas')
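Because the MRL is unambiguous by design, an MR string parses deterministically. A small recursive-descent parser sketch; the (name, children) tree representation is an assumption for illustration, and only the MR itself comes from the slide:

```python
import re

def parse_mr(s):
    """Parse a functional MR like f(g(x)) into a nested (name, children) tree."""
    tokens = re.findall(r"[\w']+|[()]", s)   # names (incl. quoted ids) and parens
    pos = 0

    def parse():
        nonlocal pos
        name = tokens[pos]
        pos += 1
        children = []
        if pos < len(tokens) and tokens[pos] == '(':
            pos += 1                          # consume '('
            while tokens[pos] != ')':
                children.append(parse())
            pos += 1                          # consume ')'
        return (name, children)

    return parse()

tree = parse_mr("answer(traverse(next_to(stateid('texas'))))")
```

The resulting tree has exactly the nesting of the parse tree above, one node per MR symbol.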
12
Engineering Motivation for Semantic Parsing
• Applications of domain-dependent semantic parsing:
– Natural language interfaces to computing systems
– Communication with robots in natural language
– Personalized software assistants
– Question-answering systems
• Machine learning makes developing semantic parsers for specific applications more tractable
• Training corpora can be easily developed by tagging natural-language glosses with formal statements
13
Cognitive Science Motivation for Semantic Parsing
• Most natural-language learning methods require supervised training data that is not available to a child
– No POS-tagged or treebank data
• Assuming a child can infer the likely meaning of an utterance from context, NL-MR pairs are more cognitively plausible training data
14
Distinctions from Other NLP Tasks: Deeper Semantic Analysis
• Information extraction involves shallow semantic analysis
Show the long email Alice sent me yesterday

| Sender | Sent-to | Type | Time |
|--------|---------|------|-----------|
| Alice | Me | Long | 7/10/2010 |
15
Distinctions from Other NLP Tasks: Deeper Semantic Analysis
• Semantic role labeling also involves shallow semantic analysis
Show the long email Alice sent me yesterday

(roles of "sent": sender = Alice, recipient = me, theme = the long email)
16
Distinctions from Other NLP Tasks: Deeper Semantic Analysis
• Semantic parsing involves deeper semantic analysis to understand the whole sentence for some application
Show the long email Alice sent me yesterday
Semantic Parsing
17
Distinctions from Other NLP Tasks: Final Representation
• Part-of-speech tagging, syntactic parsing, SRL, etc. generate an intermediate linguistic representation, typically for later processing; in contrast, semantic parsing generates a final representation
Show the long email Alice sent me yesterday
(The slide annotates the sentence with its POS tags (verb, determiner, adjective, noun, pronoun) and its syntactic phrase structure (noun phrases, verb phrases, sentence).)
18
Distinctions from Other NLP Tasks: Computer Readable Output
• The output of some NLP tasks, like question answering, summarization, and machine translation, is in NL and meant for humans to read
• Since humans are intelligent, there is some room for incomplete, ungrammatical, or incorrect output in these tasks; credit is given for partially correct output
• In contrast, the output of semantic parsing is in a formal language and is meant for computers to read; it is critical to get the output exactly right, so evaluation is strict, with no partial credit
19
Distinctions from Other NLP Tasks
• Shallow semantic processing
– Information extraction
– Semantic role labeling
• Intermediate linguistic representations
– Part-of-speech tagging
– Syntactic parsing
– Semantic role labeling
• Output meant for humans
– Question answering
– Summarization
– Machine translation
20
Relations to Other NLP Tasks: Word Sense Disambiguation
• Semantic parsing includes performing word sense disambiguation
Which rivers run through the states bordering Mississippi?
("Mississippi": a state or a river? Here, the state)

  ↓ Semantic Parsing

answer(traverse(next_to(stateid('mississippi'))))
21
Relations to Other NLP Tasks: Syntactic Parsing
• Semantic parsing inherently includes syntactic parsing but as dictated by the semantics
A semantic derivation of "our player 2 has the ball":

our → our    player → player(_,_)    2 → 2    has → bowner(_)    the → null    ball → null

"our player 2" → player(our,2)        "has the ball" → bowner(_)

whole sentence → bowner(player(our,2))

MR: bowner(player(our,2))
22
Relations to Other NLP Tasks: Syntactic Parsing
• Semantic parsing inherently includes syntactic parsing but as dictated by the semantics
A semantic derivation with syntactic categories:

our → PRP$-our    player → NN-player(_,_)    2 → CD-2    has → VB-bowner(_)    the → null    ball → null

"our player 2" → NP-player(our,2)        "has the ball" → VP-bowner(_)

whole sentence → S-bowner(player(our,2))

MR: bowner(player(our,2))
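The composition steps in such a derivation amount to filling argument slots of partial MRs with the MRs of their children. A sketch; the fill helper and the '_' slot convention are illustrative assumptions, not part of any slide:

```python
def fill(template, *args):
    """Fill the '_' slots of a partial MR with child MRs, left to right."""
    for arg in args:
        template = template.replace('_', arg, 1)
    return template

player_mr = fill('player(_,_)', 'our', '2')   # from "our player 2"
sentence_mr = fill('bowner(_)', player_mr)    # from "... has the ball"
```

Each fill corresponds to one combination step in the derivation tree.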
23
Relations to Other NLP Tasks: Machine Translation
• The MR could be looked upon as another NL [Papineni et al., 1997; Wong & Mooney, 2006]
Which rivers run through the states bordering Mississippi?
answer(traverse(next_to(stateid(‘mississippi’))))
24
Relations to Other NLP Tasks: Natural Language Generation
• Reversing a semantic parser yields a natural language generation system [Jacobs, 1985; Wong & Mooney, 2007a]
Which rivers run through the states bordering Mississippi?
  ↓ Semantic Parsing    ↑ NL Generation
answer(traverse(next_to(stateid('mississippi'))))
25
Relations to Other NLP Tasks
• Tasks performed within semantic parsing
– Word sense disambiguation
– Syntactic parsing as dictated by semantics
• Tasks closely related to semantic parsing
– Machine translation
– Natural language generation
26
References

Chen et al. (2003). Users manual: RoboCup soccer server manual for soccer server version 7.07 and later. Available at http://sourceforge.net/projects/sserver/
P. Jacobs (1985). PHRED: A generator for natural language interfaces. Comp. Ling., 11(4):219-242.
K. Papineni, S. Roukos, T. Ward (1997). Feature-based language understanding. In Proc. of EuroSpeech, pp. 1435-1438. Rhodes, Greece.
P. Price (1990). Evaluation of spoken language systems: The ATIS domain. In Proc. of the Third DARPA Speech and Natural Language Workshop, pp. 91-95.
Y. W. Wong, R. Mooney (2006). Learning for semantic parsing with statistical machine translation. In Proc. of HLT-NAACL, pp. 439-446. New York, NY.
Y. W. Wong, R. Mooney (2007a). Generation by inverting a semantic parser that uses statistical machine translation. In Proc. of NAACL-HLT, pp. 172-179. Rochester, NY.
J. Zelle, R. Mooney (1996). Learning to parse database queries using inductive logic programming. In Proc. of AAAI, pp. 1050-1055. Portland, OR.
27
An Early Hand-Built System: Lunar (Woods et al., 1972)
• English as the query language for a 13,000-entry lunar geology database built after the Apollo program
• Hand-built system to do syntactic analysis and then semantic interpretation
• A non-standard logic as the formal language
• System contains:
– Grammar for a subset of English
– Semantic interpretation rules
– Dictionary of 3,500 words
28
Lunar (Woods et al., 1972)
How many breccias contain olivine?

(FOR THE X12 / (SEQL (NUMBER X12 / (SEQ TYPECS) :
    (CONTAIN X12 (NPR* X14 / (QUOTE OLIV)) (QUOTE NIL)))) : T ;
    (PRINTOUT X12))

Answer: 5

What are they?

(FOR EVERY X12 / (SEQ TYPECS) :
    (CONTAIN X12 (NPR* X14 / (QUOTE OLIV)) (QUOTE NIL)) ;
    (PRINTOUT X12))

Answer: S10019, S10059, S10065, S10067, S10073
29
References
J. Dowding, R. Moore, F. Andry, D. Moran (1994). Interleaving syntax and semantics in an efficient bottom-up parser. In Proc. of ACL, pp. 110-116. Las Cruces, NM.
S. Seneff (1992). TINA: A natural language system for spoken language applications. Comp. Ling., 18(1):61-86.
W. Ward, S. Issar (1996). Recent improvement in the CMU spoken language understanding system. In Proc. of the ARPA HLT Workshop, pp. 213-216.
D. Warren, F. Pereira (1982). An efficient easily adaptable system for interpreting natural language queries. American Journal of CL, 8(3-4):110-122.
W. Woods, R. Kaplan, B. Nash-Webber (1972). The lunar sciences natural language information system: Final report. Tech. Rep. 2378, BBN Inc., Cambridge, MA.
30
Learning for Semantic Parsing: Motivations
• Manually programming robust semantic parsers is difficult
• It is easier to develop training corpora by associating natural-language sentences with meaning representations
• The increasing availability of training corpora, and the decreasing cost of computation, relative to engineering cost, favor the learning approach
31
Learning Semantic Parsers
Training Sentences & Meaning Representations → Semantic Parser Learner → Semantic Parser

Novel sentence → Semantic Parser → Meaning Representation
32
Learning Semantic Parsers

Training Sentences & Meaning Representations:

Which rivers run through the states that are next to Texas?
answer(traverse(next_to(stateid('texas'))))

What is the lowest point of the state with the largest area?
answer(lowest(place(loc(largest_one(area(state(all)))))))

What is the largest city in states that border California?
answer(largest(city(loc(next_to(stateid('california'))))))

…

  ↓ Semantic Parser Learner

Semantic Parser applied to a novel sentence:

Which rivers run through the states bordering Mississippi?
  ↓
answer(traverse(next_to(stateid('mississippi'))))
33
Outline: Recent Semantic Parser Learners
• Wong & Mooney (2006, 2007a, 2007b)
– Syntax-based machine translation methods
• Zettlemoyer & Collins (2005, 2007)
– Structured learning with combinatory categorial grammars (CCG)
• Kate & Mooney (2006), Kate (2008a, 2008b)
– SVM with kernels for robust semantic parsing
• Lu et al. (2008)
– A generative model for semantic parsing
• Ge & Mooney (2005, 2009)
– Exploiting syntax for semantic parsing
Semantic Parsing using Machine Translation Techniques
35
WASP: A Machine Translation Approach to Semantic Parsing
• Based on a semantic grammar of the natural language
• Uses machine translation techniques
– Synchronous context-free grammars (SCFG)
– Word alignments (Brown et al., 1993)
Wong & Mooney (2006)
36
Synchronous Context-Free Grammar
• Developed by Aho & Ullman (1972) as a theory of compilers that combines syntax analysis and code generation in one phase
– Formal to formal languages
• Used for syntax-based machine translation (Wu, 1997; Chiang, 2005)
– Natural to natural languages
• Generates a pair of strings in a derivation
37
Context-Free Semantic Grammar

QUERY → What is CITY
CITY → the capital CITY
CITY → of STATE
STATE → Ohio

Parse of "What is the capital of Ohio":

QUERY
└─ What is CITY
            └─ the capital CITY
                            └─ of STATE
                                   └─ Ohio
38
QUERY → What is CITY
Production of Context-Free Grammar
39
QUERY → What is CITY / answer(CITY)

Production of Synchronous Context-Free Grammar
(left of "/": natural language; right of "/": formal language)
40
Synchronous Context-Free Grammar Derivation

QUERY → What is CITY / answer(CITY)
CITY → the capital CITY / capital(CITY)
CITY → of STATE / loc_2(STATE)
STATE → Ohio / stateid('ohio')

The two parse trees grow in lockstep, yielding the pair:

What is the capital of Ohio  ⇔  answer(capital(loc_2(stateid('ohio'))))
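The synchronous derivation can be simulated by rewriting the leftmost occurrence of each nonterminal in the NL string and the MR string in lockstep. The rewriting loop below is an illustrative sketch; the productions themselves are the slide's:

```python
# Each production: (nonterminal, NL right-hand side, MR right-hand side).
PRODUCTIONS = [
    ('QUERY', 'What is CITY',     'answer(CITY)'),
    ('CITY',  'the capital CITY', 'capital(CITY)'),
    ('CITY',  'of STATE',         'loc_2(STATE)'),
    ('STATE', 'Ohio',             "stateid('ohio')"),
]

nl, mr = 'QUERY', 'QUERY'
for lhs, nl_rhs, mr_rhs in PRODUCTIONS:
    nl = nl.replace(lhs, nl_rhs, 1)   # rewrite the nonterminal on the NL side...
    mr = mr.replace(lhs, mr_rhs, 1)   # ...and the same nonterminal on the MR side
```

After the four steps, the NL side reads "What is the capital of Ohio" and the MR side reads answer(capital(loc_2(stateid('ohio')))).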
41
Probabilistic SCFG for Semantic Parsing
• S (start symbol) = QUERY
• L (lexicon) =

QUERY → What is CITY / answer(CITY)
CITY → the capital CITY / capital(CITY)
CITY → of STATE / loc_2(STATE)
STATE → Ohio / stateid('ohio')

• w (feature weights)
• Features:
– fi(x, d): number of times production i is used in derivation d
• Log-linear model: Pw(d | x) ∝ exp(w · f(x, d))
• Best derivation: d* = argmaxd w · f(x, d)
42

Probabilistic Parsing Model

Derivation d1 (Ohio as a state):

CITY → capital CITY / capital(CITY)
CITY → of STATE / loc_2(STATE)
STATE → Ohio / stateid('ohio')

capital of Ohio ⇔ capital(loc_2(stateid('ohio')))
43

Probabilistic Parsing Model

Derivation d2 (Ohio as a river):

CITY → capital CITY / capital(CITY)
CITY → of RIVER / loc_2(RIVER)
RIVER → Ohio / riverid('ohio')

capital of Ohio ⇔ capital(loc_2(riverid('ohio')))
44

Probabilistic Parsing Model

d1 (state reading) uses:

CITY → capital CITY / capital(CITY)
CITY → of STATE / loc_2(STATE)
STATE → Ohio / stateid('ohio')

d2 (river reading) uses:

CITY → capital CITY / capital(CITY)
CITY → of RIVER / loc_2(RIVER)
RIVER → Ohio / riverid('ohio')

The production weights λ sum to 1.3 for d1 (0.5 + 0.3 + 0.5) and to 1.05 for d2 (0.5 + 0.05 + 0.5), so:

Pr(d1 | capital of Ohio) = exp(1.3) / Z
Pr(d2 | capital of Ohio) = exp(1.05) / Z

where Z is the normalization constant.
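The slide's numbers can be checked directly: a derivation's score is the sum of its production weights, and the two candidate derivations compete in a softmax:

```python
import math

w_d1 = [0.5, 0.3, 0.5]    # production weights used by d1 (Ohio as a state)
w_d2 = [0.5, 0.05, 0.5]   # production weights used by d2 (Ohio as a river)

score1, score2 = sum(w_d1), sum(w_d2)      # 1.3 and 1.05
Z = math.exp(score1) + math.exp(score2)    # normalization constant
p1 = math.exp(score1) / Z
p2 = math.exp(score2) / Z                  # d1 wins: the state reading is likelier
```

Because only the weight sums differ, the model prefers d1 by a margin determined by exp(1.3 − 1.05).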
45

Overview of WASP

Training: the training set {(e, f)} and the unambiguous CFG of the MRL feed Lexical acquisition, which produces the lexicon L; Parameter estimation then yields a parsing model parameterized by λ.

Testing: input sentence e′ → Semantic parsing → output MR f′
46
Lexical Acquisition
• SCFG productions are extracted from word alignments between an NL sentence, e, and its correct MR, f, for each training example, (e, f)
47
Word Alignments
• A mapping from French words to their meanings expressed in English
And the program has been implemented
Le programme a été mis en application
48
Lexical Acquisition
• Train a statistical word alignment model (IBM Model 5) on training set
• Obtain most probable n-to-1 word alignments for each training example
• Extract SCFG productions from these word alignments
• Lexicon L consists of all extracted SCFG productions
49
Lexical Acquisition
• SCFG productions are extracted from word alignments between training sentences and their meaning representations
50

Use of MRL Grammar

NL: The goalie should always stay in our half

MR derivation (top-down, left-most derivation of an unambiguous CFG), to which the words are aligned n-to-1:

RULE → (CONDITION DIRECTIVE)
CONDITION → (true)
DIRECTIVE → (do TEAM {UNUM} ACTION)
TEAM → our
UNUM → 1
ACTION → (pos REGION)
REGION → (half TEAM)
TEAM → our
51

Extracting SCFG Productions

The word "our" is aligned to the production TEAM → our, so extract the SCFG production:

TEAM → our / our
52

Extracting SCFG Productions

With TEAM already extracted, the word "half" and the TEAM constituent are aligned to REGION → (half TEAM), so extract:

REGION → TEAM half / (half TEAM)

REGION now derives (half our).
53

Extracting SCFG Productions

Likewise, "stay in" and the REGION constituent are aligned to ACTION → (pos REGION), so extract:

ACTION → stay in REGION / (pos REGION)

ACTION now derives (pos (half our)).
54
Output SCFG Productions
TEAM → our / our
REGION → TEAM half / (half TEAM)
ACTION → stay in REGION / (pos REGION)
UNUM → goalie / 1
RULE → [the] UNUM should always ACTION / ((true) (do our {UNUM} ACTION))
• Phrases can be non-contiguous
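These extracted productions can regenerate the (NL, MR) pair by expanding each nonterminal on both sides simultaneously. A sketch, with the optional "[the]" treated as obligatory and each nonterminal assumed to occur at most once per template (both simplifying assumptions):

```python
RULES = {   # nonterminal -> (NL template, MR template), from the slide
    'RULE':   ('the UNUM should always ACTION',
               '((true) (do our {UNUM} ACTION))'),
    'UNUM':   ('goalie', '1'),
    'ACTION': ('stay in REGION', '(pos REGION)'),
    'REGION': ('TEAM half', '(half TEAM)'),
    'TEAM':   ('our', 'our'),
}

def generate(symbol):
    """Expand a nonterminal, rewriting the NL and MR sides together."""
    nl, mr = RULES[symbol]
    for nt in RULES:
        if nt in nl:                       # nonterminal still to expand
            sub_nl, sub_mr = generate(nt)
            nl = nl.replace(nt, sub_nl)
            mr = mr.replace(nt, sub_mr)
    return nl, mr

nl, mr = generate('RULE')
```

Expanding RULE reproduces the training pair: "the goalie should always stay in our half" paired with ((true) (do our {1} (pos (half our)))).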
55

Probabilistic Parsing Model

• Based on a maximum-entropy model:

  Pr_λ(d | e) = (1 / Z_λ(e)) exp( Σ_i λ_i f_i(d) )

• Features f_i(d) are the number of times each transformation rule is used in a derivation d
• The output translation is the yield of the most probable derivation:

  f* = yield( argmax_d Pr_λ(d | e) )
56

Parameter Estimation

• Maximum conditional log-likelihood criterion:

  λ* = argmax_λ Σ_(e, f) log Pr_λ(f | e)

• Since correct derivations are not included in the training data, the parameters λ* are computed iteratively to maximize the probability of the training data
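A toy numeric sketch of this criterion: derivations are latent, so Pr(f | e) sums over all derivations yielding the correct f, and λ is adjusted iteratively to increase the conditional log-likelihood. The feature vectors and the crude numeric ascent below are made-up illustrations, not WASP's actual estimator:

```python
import math

# Three competing derivations for one training pair (e, f):
# (feature vector, does this derivation yield the correct MR f?)
derivations = [
    ([1, 0], True),
    ([0, 1], True),
    ([1, 1], False),
]

def log_likelihood(lam):
    """log Pr_lambda(f | e): sum over correct derivations / sum over all."""
    scores = [math.exp(sum(l * f for l, f in zip(lam, feats)))
              for feats, _ in derivations]
    correct = sum(s for s, (_, ok) in zip(scores, derivations) if ok)
    return math.log(correct / sum(scores))

lam = [0.0, 0.0]
for _ in range(50):                 # crude coordinate-wise numeric ascent
    for i in range(len(lam)):
        eps = 1e-4
        base = log_likelihood(lam)
        lam[i] += eps
        grad = (log_likelihood(lam) - base) / eps
        lam[i] += 0.5 * grad - eps  # undo the probe step, then move uphill
```

Each sweep nudges λ uphill on log Pr_λ(f | e), so the likelihood of the training pair improves over the initial λ = 0.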
57
References
A. Aho, J. Ullman (1972). The Theory of Parsing, Translation, and Compiling. Prentice Hall.
P. Brown, V. Della Pietra, S. Della Pietra, R. Mercer (1993). The mathematics of statistical machine translation: Parameter estimation. Comp. Ling., 19(2):263-312.
D. Chiang (2005). A hierarchical phrase-based model for statistical machine translation. In Proc. of ACL, pp. 263-270. Ann Arbor, MI.
P. Koehn, F. Och, D. Marcu (2003). Statistical phrase-based translation. In Proc. of HLT-NAACL. Edmonton, Canada.
Y. W. Wong, R. Mooney (2006). Learning for semantic parsing with statistical machine translation. In Proc. of HLT-NAACL, pp. 439-446. New York, NY.
58
References
Y. W. Wong, R. Mooney (2007a). Generation by inverting a semantic parser that uses statistical machine translation. In Proc. of NAACL-HLT, pp. 172-179. Rochester, NY.
Y. W. Wong, R. Mooney (2007b). Learning synchronous grammars for semantic parsing with lambda calculus. In Proc. of ACL, pp. 960-967. Prague, Czech Republic.
D. Wu (1997). Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Comp. Ling., 23(3):377-403.
Semantic Parsing using CCG
60
Combinatory Categorial Grammar (CCG)
• Highly structured lexical entries
• A few general parsing rules (Steedman, 2000; Steedman & Baldridge, 2005)
• Each lexical entry is a word paired with a category
Texas := NP
borders := (S \ NP) / NP
Mexico := NP
New Mexico := NP
61
Parsing Rules (Combinators)
• Describe how adjacent categories are combined
• Functional application:

  A / B   B   ⇒   A   (>)
  B   A \ B   ⇒   A   (<)

Texas        borders          New Mexico
 NP        (S \ NP) / NP          NP
           ──────────────────────────  >
                    S \ NP
 ─────────────────────────────────────  <
                  S

(> : forward, < : backward)
62
CCG for Semantic Parsing
• Extend categories with semantic types (Lambda calculus expressions)
• Functional application with semantics:
Texas := NP : texas
borders := (S \ NP) / NP : λx.λy.borders(y, x)
A / B : f B : a ⇒ A : f(a) (>)
B : a A \ B : f ⇒ A : f(a) (<)
63
Sample CCG Derivation
Texas          borders                              New Mexico
NP : texas     (S \ NP) / NP : λx.λy.borders(y, x)  NP : new_mexico
               ──────────────────────────────────────────────────  >
               S \ NP : λy.borders(y, new_mexico)
──────────────────────────────────────────────────────────────────  <
S : borders(texas, new_mexico)
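The lambda-calculus bookkeeping in this derivation maps directly onto Python closures; the strings built below are just a readable stand-in for logical forms:

```python
# Lexicon semantics (the syntactic categories license the application order).
texas = 'texas'
new_mexico = 'new_mexico'
borders = lambda x: lambda y: f'borders({y}, {x})'   # λx.λy.borders(y, x)

vp = borders(new_mexico)   # forward application (>): combine with the object NP
s = vp(texas)              # backward application (<): combine with the subject NP
```

The intermediate closure vp plays the role of S \ NP : λy.borders(y, new_mexico).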
64
Another Sample CCG Derivation
Texas          touches                              New Mexico
NP : texas     (S \ NP) / NP : λx.λy.borders(y, x)  NP : new_mexico
               ──────────────────────────────────────────────────  >
               S \ NP : λy.borders(y, new_mexico)
──────────────────────────────────────────────────────────────────  <
S : borders(texas, new_mexico)

(A different verb, "touches", carries the same semantics in its lexical entry.)
65
Probabilistic CCG for Semantic Parsing
Zettlemoyer & Collins (2005)

• L (lexicon) =

Texas := NP : texas
borders := (S \ NP) / NP : λx.λy.borders(y, x)
Mexico := NP : mexico
New Mexico := NP : new_mexico

• w (feature weights)
• Features:
– fi(s, d): number of times lexical item i is used in derivation d of sentence s
• Log-linear model: Pw(d | s) ∝ exp(w · f(s, d))
• Best derivation: d* = argmaxd w · f(s, d)
– Consider all possible derivations d for the sentence s given the lexicon L
Learning Probabilistic CCG
Training Sentences & Logical Forms → Lexical Generation → Lexicon L
Training Sentences & Logical Forms → Parameter Estimation → Feature weights w
Sentences → CCG Parser → Logical Forms
Lexical Generation
• Input:
• Output lexicon:
Texas borders New Mexico
borders(texas, new_mexico)
Texas := NP : texas
borders := (S \ NP) / NP : λx.λy.borders(y, x)
New Mexico := NP : new_mexico
Lexical Generation
Input sentence:
Texas borders New Mexico
Output substrings:
Texas
borders
New
Mexico
Texas borders
borders New
New Mexico
Texas borders New
…
Input logical form:
borders(texas, new_mexico)
Output categories:
Category Rules
Input Trigger Output Category
constant c NP : c
arity one predicate p N : λx.p(x)
arity one predicate p S \ NP : λx.p(x)
arity two predicate p (S \ NP) / NP : λx.λy.p(y, x)
arity two predicate p (S \ NP) / NP : λx.λy.p(x, y)
arity one predicate p N / N : λg.λx.p(x) ∧ g(x)

arity two predicate p and constant c N / N : λg.λx.p(x, c) ∧ g(x)

arity two predicate p (N \ N) / NP : λx.λg.λy.p(y, x) ∧ g(y)
arity one function f NP / N : λg.argmax/min(g(x), λx.f(x))
arity one function f S / NP : λx.f(x)
Lexical Generation
Input sentence:
Texas borders New Mexico
Output substrings:
Texas
borders
New
Mexico
Texas borders
borders New
New Mexico
Texas borders New
…
Input logical form:
borders(texas, new_mexico)
Output categories:
NP : texas
NP : new_mexico
(S \ NP) / NP : λx.λy.borders(y, x)
(S \ NP) / NP : λx.λy.borders(x, y)
…
Take the cross product to form an initial lexicon; also include some domain-independent entries, e.g.:

What := S / (S \ NP) / N : λf.λg.λx.f(x) ∧ g(x)
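The cross-product step above can be sketched as follows; the candidate categories are the ones generated from the logical form, and the code is only an illustration of the enumeration:

```python
# Sketch of the cross-product step in lexical generation: pair every
# contiguous substring of the sentence with every candidate output category.
sentence = "Texas borders New Mexico".split()

substrings = [" ".join(sentence[i:j])
              for i in range(len(sentence))
              for j in range(i + 1, len(sentence) + 1)]

categories = [                      # categories produced from the logical form
    "NP : texas",
    "NP : new_mexico",
    "(S\\NP)/NP : λx.λy.borders(y, x)",
    "(S\\NP)/NP : λx.λy.borders(x, y)",
]

initial_lexicon = [(sub, cat) for sub in substrings for cat in categories]
print(len(substrings), len(initial_lexicon))  # 10 40
```

Most of these pairings are wrong (e.g. "borders New" := NP : texas); parameter estimation and lexicon pruning later discard them.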
Parameter Estimation
• A lexicon can lead to multiple parses, parameters decide the best parse
• Maximum conditional likelihood: iteratively find the parameters that maximize the probability of the training data
• Derivations d are not annotated, treated as hidden variables
• Stochastic gradient ascent (LeCun et al., 1998)
• Keep only those lexical items that occur in the highest scoring derivations of training set
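With hidden derivations, the gradient of the conditional log-likelihood is the difference between feature expectations over the correct derivations and over all derivations. A toy gradient-ascent sketch (illustrative feature vectors, not Zettlemoyer & Collins's actual update schedule):

```python
import math

# Two candidate derivations of one sentence; d1 yields the correct logical
# form. Gradient: E_{d in correct}[f] - E_{all d}[f].
feats = {"d1": [1.0, 0.0], "d2": [0.0, 1.0]}   # illustrative feature vectors
correct = {"d1"}
w, lr = [0.0, 0.0], 0.5

for _ in range(50):
    s = {d: math.exp(sum(wi * fi for wi, fi in zip(w, f)))
         for d, f in feats.items()}
    Z, Zc = sum(s.values()), sum(s[d] for d in correct)
    grad = [sum((s[d] / Zc) * feats[d][k] for d in correct) -
            sum((s[d] / Z) * feats[d][k] for d in feats)
            for k in range(len(w))]
    w = [wi + lr * g for wi, g in zip(w, grad)]

# Probability mass shifts onto the correct derivation as training proceeds.
p_correct = math.exp(w[0]) / (math.exp(w[0]) + math.exp(w[1]))
print(round(p_correct, 3))
```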
References
Y. LeCun, L. Bottou, Y. Bengio, P. Haffner (1998). Gradient-based learning applied to document recognition. In Proc. of the IEEE, 86(11):2278-2324.
M. Steedman (2000). The Syntactic Process. MIT Press.
M. Steedman, J. Baldridge (2005). Combinatory categorial grammar. To appear in: R. Borsley, K. Borjars (eds.) Non-Transformational Syntax, Blackwell.
L. Zettlemoyer, M. Collins (2005). Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In Proc. of UAI. Edinburgh, Scotland.
L. Zettlemoyer, M. Collins (2007). Online learning of relaxed CCG grammars for parsing to logical form. In Proc. of EMNLP-CoNLL, pp. 678-687. Prague, Czech Republic.
T. Kwiatkowski, L. Zettlemoyer, S. Goldwater, M. Steedman (2010). Inducing probabilistic CCG grammars from logical form with higher-order unification. In Proc. of EMNLP.
Homework 7
Given the following synchronous grammar, parse the sentence “player three should always intercept” and hence find its meaning representation.
TEAM → our / our
TEAM → opponent’s / opponent
REGION → TEAM half / (half TEAM)
ACTION → stay in REGION / (pos REGION)
ACTION → intercept / (intercept)
UNUM → goalie / 1
UNUM → player three / 3
RULE → UNUM should always ACTION / ((true) (do our {UNUM} ACTION))
Semantic Parsing Using Kernels
KRISP: Kernel-based Robust Interpretation for Semantic Parsing
• Learns semantic parser from NL sentences paired with their respective MRs given MRL grammar
• Productions of MRL are treated like semantic concepts
• A string classifier is trained for each production to estimate the probability of an NL string representing its semantic concept
• These classifiers are used to compositionally build MRs of the sentences
Kate & Mooney (2006), Kate (2008a)
Meaning Representation Language

MR: answer(traverse(next_to(stateid(‘texas’))))

Productions used in the parse tree of the MR:
ANSWER → answer(RIVER)
RIVER → TRAVERSE(STATE)
TRAVERSE → traverse
STATE → NEXT_TO(STATE)
NEXT_TO → next_to
STATE → STATEID
STATEID → ‘texas’
Overview of KRISP
Semantic Parser
Semantic Parser Learner
MRL Grammar
NL sentences with MRs
String classification probabilities
Novel NL sentences → Best MRs
Semantic Parsing by KRISP
• String classifier for each production gives the probability that a substring represents the semantic concept of the production
Which rivers run through the states bordering Texas?
NEXT_TO → next_to:   0.02   0.01   0.95
(probabilities the string classifier assigns to different substrings of the sentence)
Semantic Parsing by KRISP
• String classifier for each production gives the probability that a substring represents the semantic concept of the production
Which rivers run through the states bordering Texas?
TRAVERSE → traverse:   0.21   0.91
Semantic Derivation of an NL Sentence
Which rivers run through the states bordering Texas?

ANSWER → answer(RIVER)
RIVER → TRAVERSE(STATE)
TRAVERSE → traverse
STATE → NEXT_TO(STATE)
NEXT_TO → next_to
STATE → STATEID
STATEID → ‘texas’

Semantic derivation: each node covers an NL substring.
Semantic Derivation of an NL Sentence
Through the states that border Texas which rivers run?

ANSWER → answer(RIVER)
RIVER → TRAVERSE(STATE)
TRAVERSE → traverse
STATE → NEXT_TO(STATE)
NEXT_TO → next_to
STATE → STATEID
STATEID → ‘texas’

Substrings in the NL sentence may be in a different order.
Semantic Derivation of an NL Sentence
Through the states that border Texas which rivers run?
(ANSWER → answer(RIVER), [1..10])
(RIVER → TRAVERSE(STATE), [1..10])
(TRAVERSE → traverse, [7..10])
(STATE → NEXT_TO(STATE), [1..6])
(NEXT_TO → next_to, [1..5])
(STATE → STATEID, [6..6])
(STATEID → ‘texas’, [6..6])

Nodes are allowed to permute the children productions from the original MR parse.
Semantic Parsing by KRISP
• Semantic parsing reduces to finding the most probable derivation of the sentence
• Efficient dynamic programming algorithm with beam search (extended Earley’s algorithm) [Kate & Mooney 2006]

Which rivers run through the states bordering Texas?

ANSWER → answer(RIVER)
RIVER → TRAVERSE(STATE)
TRAVERSE → traverse
STATE → NEXT_TO(STATE)
NEXT_TO → next_to
STATE → STATEID
STATEID → ‘texas’
0.91
0.95
0.89
0.92
0.81
0.98
0.99
Probability of the derivation is the product of the probabilities at the nodes.
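The product over the node probabilities shown above can be checked directly:

```python
from math import prod

# String-classifier probabilities at the nodes of the derivation above.
node_probs = [0.91, 0.95, 0.89, 0.92, 0.81, 0.98, 0.99]

# Probability of the whole derivation = product over its nodes.
derivation_prob = prod(node_probs)
print(round(derivation_prob, 3))  # 0.556
```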
Overview of KRISP
Semantic Parser
Semantic Parser Learner
MRL Grammar
NL sentences with MRs
String classification probabilities
KRISP’s Training Algorithm
• Takes NL sentences paired with their respective MRs as input
• Obtains MR parses using the MRL grammar
• Induces the semantic parser and refines it over iterations
• In the first iteration, for every production:
  – Call those sentences positive whose MR parses use that production
  – Call the remaining sentences negative
  – Train a Support Vector Machine (SVM) classifier [Cristianini & Shawe-Taylor 2000] using a string-subsequence kernel
Overview of KRISP
Semantic Parser
Semantic Parser Learner
MRL Grammar
NL sentences with MRs
Overview of KRISP
Train string-kernel-based SVM classifiers
Semantic Parser
Collect positive and negative examples
MRL Grammar
NL sentences with MRs
Training
KRISP’s Training Algorithm contd.
STATE → NEXT_TO(STATE)
•which rivers run through the states bordering texas?
•what is the most populated state bordering oklahoma ?
•what is the largest city in states that border california ?
…
•what state has the highest population ?
•what states does the delaware river run through ?
•which states have cities named austin ?
•what is the lowest point of the state with the largest area ?
…
Positives Negatives
String-kernel-based SVM classifier
First Iteration
Support Vector Machines (SVMs)
• A very popular machine learning method for classification
• Finds the linear separator that maximizes the margin between the classes
• Based on computational learning theory, which explains why max-margin is a good approach (Vapnik, 1995)
• Good at avoiding over-fitting in high-dimensional feature spaces
• Performs well on various text and language problems, which tend to be high-dimensional
Picking a Linear Separator
• Which of the alternative linear separators is best?
Classification Margin
• Consider the distance of points from the separator.
• Examples closest to the hyperplane are support vectors.
• The margin ρ of the separator is the width of separation between the classes.
SVM Algorithms
• Finding the max-margin separator is an optimization problem
• Algorithms that guarantee an optimal margin take at least O(n²) time and do not scale well to large data sets
• Approximation algorithms like SVM-light (Joachims, 1999) and SMO (Platt, 1999) allow scaling to realistic problems
SVM and Kernels
• An interesting observation: SVM sees the training and test data only through their dot products
• This observation has been exploited with the idea of a kernel
Kernels
• SVMs can be extended to learning non-linear separators by using kernel functions.
• A kernel function is a similarity function between two instances, K(x1,x2), that must satisfy certain mathematical constraints.
• A kernel function implicitly maps instances into a higher dimensional feature space where (hopefully) the categories are linearly separable.
• A kernel-based method (like SVMs) can use a kernel to implicitly operate in this higher-dimensional space without having to explicitly map instances into this much larger (perhaps infinite) space (called “the kernel trick”).
• Kernels can be defined on non-vector data like strings, trees, and graphs, allowing the application of kernel-based methods to complex, unbounded, non-vector data structures.
Non-linear SVMs: Feature spaces
• General idea: the original feature space can always be mapped to some higher-dimensional feature space where the training set is separable:
Φ: x → φ(x)
String Subsequence Kernel
• Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002]
s = “states that are next to”
t = “the states next to”
K(s,t) = ?
String Subsequence Kernel
• Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002]
s = “states that are next to”
t = “the states next to”
u = states
K(s,t) = 1+?
String Subsequence Kernel
• Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002]
s = “states that are next to”
t = “the states next to”
u = next
K(s,t) = 2+?
String Subsequence Kernel
• Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002]
s = “states that are next to”
t = “the states next to”
u = to
K(s,t) = 3+?
String Subsequence Kernel
• Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002]
s = “states that are next to”
t = “the states next to”
u = states next
K(s,t) = 4+?
String Subsequence Kernel
• Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002]
s = “states that are next to”
t = “the states next to”
u = states to
K(s,t) = 5+?
String Subsequence Kernel
• Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002]
s = “states that are next to”
t = “the states next to”
u = next to
K(s,t) = 6+?
String Subsequence Kernel
• Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002]
s = “states that are next to”
t = “the states next to”
u = states next to
K(s,t) = 7
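The count built up above can be verified by brute force, treating tokens as units (subsequences need not be contiguous); this enumeration is only practical for short strings:

```python
from itertools import combinations

def subsequences(tokens):
    """All non-empty subsequences of a token list, as tuples (order preserved)."""
    return {tuple(tokens[i] for i in idx)
            for r in range(1, len(tokens) + 1)
            for idx in combinations(range(len(tokens)), r)}

s = "states that are next to".split()
t = "the states next to".split()

# Common subsequences: states; next; to; states next; states to; next to;
# states next to.
common = subsequences(s) & subsequences(t)
print(len(common))  # 7
```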
String Subsequence Kernel contd.
• The kernel is normalized to remove any bias due to different string lengths
• Lodhi et al. [2002] give O(n|s||t|) algorithm for computing string subsequence kernel
Knormalized(s, t) = K(s, t) / √( K(s, s) · K(t, t) )
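The normalization can be checked numerically with a brute-force kernel (adequate only for short strings; Lodhi et al.'s O(n|s||t|) dynamic program is what one would use in practice):

```python
from itertools import combinations
from math import sqrt

def subsequences(tokens):
    return {tuple(tokens[i] for i in idx)
            for r in range(1, len(tokens) + 1)
            for idx in combinations(range(len(tokens)), r)}

def K(a, b):
    """Unnormalized kernel: number of common subsequences (brute force)."""
    return len(subsequences(a) & subsequences(b))

s = "states that are next to".split()
t = "the states next to".split()

# K(s,s) = 2^5 - 1 = 31 and K(t,t) = 2^4 - 1 = 15, since all tokens
# within each string are distinct.
k_norm = K(s, t) / sqrt(K(s, s) * K(t, t))
print(round(k_norm, 3))  # 7 / sqrt(31 * 15) ≈ 0.325
```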
String Subsequence Kernel

• Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002]
• The examples are implicitly mapped to the feature space of all subsequences, and the kernel computes the dot products there

(Figure: example strings for STATE → NEXT_TO(STATE) plotted in subsequence feature space: “the states next to”, “states bordering”, “states that border”, “states that share border”, “state with the capital of”, “states through which”, “states with area larger than”)
Support Vector Machines
• SVMs find a separating hyperplane such that the margin is maximized
• A probability estimate of an example belonging to a class can be obtained using its distance from the hyperplane [Platt, 1999]

(Figure: strings for STATE → NEXT_TO(STATE) on either side of the separating hyperplane: “the states next to”, “states bordering”, “states that border”, “states that share border”, “next to state”, “states that are next to”, “state with the capital of”, “states with area larger than”, “states through which”; probability estimates 0.97 and 0.63 are shown for two examples)
Support Vector Machines
• SVMs find a separating hyperplane such that the margin is maximized
• SVMs with the string subsequence kernel softly capture different ways of expressing the semantic concept
KRISP’s Training Algorithm contd.
STATE → NEXT_TO(STATE)
•which rivers run through the states bordering texas?
•what is the most populated state bordering oklahoma ?
•what is the largest city in states that border california ?
…
•what state has the highest population ?
•what states does the delaware river run through ?
•which states have cities named austin ?
•what is the lowest point of the state with the largest area ?
…
Positives Negatives
String classification probabilities
String-kernel-based SVM classifier
First Iteration
Overview of KRISP
Train string-kernel-based SVM classifiers
Semantic Parser
Collect positive and negative examples
MRL Grammar
NL sentences with MRs
Training
Overview of KRISP
Train string-kernel-based SVM classifiers
Semantic Parser
Collect positive and negative examples
MRL Grammar
NL sentences with MRs
Training
String classification probabilities
Overview of KRISP
Train string-kernel-based SVM classifiers
Semantic Parser
Collect positive and negative examples
MRL Grammar
NL sentences with MRs
Best MRs (correct and incorrect)
Training
Overview of KRISP
Train string-kernel-based SVM classifiers
Semantic Parser
Collect positive and negative examples
MRL Grammar
NL sentences with MRs
Best semantic derivations (correct and incorrect)
Training
KRISP’s Training Algorithm contd.
• Using these classifiers, KRISP tries to parse the sentences in the training data
• Some of the resulting derivations give the correct MR (correct derivations); others give incorrect MRs (incorrect derivations)
• For the next iteration, collect positive examples from correct derivations and negative examples from incorrect derivations
• The extended Earley’s algorithm can be forced to produce only correct derivations by ensuring that every subtree it generates exists in the correct MR parse
• Collect negatives from incorrect derivations whose probability is higher than that of the most probable correct derivation
KRISP’s Training Algorithm contd.
Most probable correct derivation:
Which rivers run through the states bordering Texas?
(ANSWER → answer(RIVER), [1..9])
(RIVER → TRAVERSE(STATE), [1..9])
(TRAVERSE → traverse, [1..4])
(STATE → NEXT_TO(STATE), [5..9])
(NEXT_TO → next_to, [5..7])
(STATE → STATEID, [8..9])
(STATEID → ‘texas’, [8..9])
1 2 3 4 5 6 7 8 9
KRISP’s Training Algorithm contd.
Most probable correct derivation: Collect positive examples
Which rivers run through the states bordering Texas?
(ANSWER → answer(RIVER), [1..9])
(RIVER → TRAVERSE(STATE), [1..9])
(TRAVERSE → traverse, [1..4])
(STATE → NEXT_TO(STATE), [5..9])
(NEXT_TO → next_to, [5..7])
(STATE → STATEID, [8..9])
(STATEID → ‘texas’, [8..9])
1 2 3 4 5 6 7 8 9
KRISP’s Training Algorithm contd.

Incorrect derivation with probability greater than the most probable correct derivation:

Which rivers run through the states bordering Texas?

(ANSWER → answer(RIVER), [1..9])
(RIVER → TRAVERSE(STATE), [1..9])
(TRAVERSE → traverse, [1..7])
(STATE → STATEID, [8..9])
(STATEID → ‘texas’, [8..9])
1 2 3 4 5 6 7 8 9
Incorrect MR: answer(traverse(stateid(‘texas’)))
KRISP’s Training Algorithm contd.

Incorrect derivation with probability greater than the most probable correct derivation: collect negative examples.

Which rivers run through the states bordering Texas?

(ANSWER → answer(RIVER), [1..9])
(RIVER → TRAVERSE(STATE), [1..9])
(TRAVERSE → traverse, [1..7])
(STATE → STATEID, [8..9])
(STATEID → ‘texas’, [8..9])
1 2 3 4 5 6 7 8 9
Incorrect MR: answer(traverse(stateid(‘texas’)))
KRISP’s Training Algorithm contd.
Which rivers run through the states bordering Texas?

Incorrect derivation:
(ANSWER → answer(RIVER), [1..9])
(RIVER → TRAVERSE(STATE), [1..9])
(TRAVERSE → traverse, [1..7])
(STATE → STATEID, [8..9])
(STATEID → ‘texas’, [8..9])

Most probable correct derivation:
(ANSWER → answer(RIVER), [1..9])
(RIVER → TRAVERSE(STATE), [1..9])
(TRAVERSE → traverse, [1..4])
(STATE → NEXT_TO(STATE), [5..9])
(NEXT_TO → next_to, [5..7])
(STATE → STATEID, [8..9])
(STATEID → ‘texas’, [8..9])

Traverse both trees in breadth-first order till the first nodes where their productions differ are found.
KRISP’s Training Algorithm contd.
Which rivers run through the states bordering Texas?

Incorrect derivation:
(ANSWER → answer(RIVER), [1..9])
(RIVER → TRAVERSE(STATE), [1..9])
(TRAVERSE → traverse, [1..7])
(STATE → STATEID, [8..9])
(STATEID → ‘texas’, [8..9])

Most probable correct derivation:
(ANSWER → answer(RIVER), [1..9])
(RIVER → TRAVERSE(STATE), [1..9])
(TRAVERSE → traverse, [1..4])
(STATE → NEXT_TO(STATE), [5..9])
(NEXT_TO → next_to, [5..7])
(STATE → STATEID, [8..9])
(STATEID → ‘texas’, [8..9])

Mark the words under these nodes.
![Page 127: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/127.jpg)
127
Consider all the productions covering the marked words. Collect negatives for productions that cover any marked word in the incorrect derivation but not in the correct derivation.
KRISP’s Training Algorithm contd.
Which rivers run through the states bordering Texas?
(ANSWER → answer(RIVER), [1..9])
(RIVER → TRAVERSE(STATE), [1..9])
(TRAVERSE → traverse, [1..7])
(STATE → STATEID, [8..9])
(STATEID → ‘texas’, [8..9])
Which rivers run through the states bordering Texas?
(ANSWER → answer(RIVER), [1..9])
(RIVER → TRAVERSE(STATE), [1..9])
(TRAVERSE → traverse, [1..4])
(STATE → NEXT_TO(STATE), [5..9])
(STATE → STATEID, [8..9])
(STATEID → ‘texas’, [8..9])
(NEXT_TO → next_to, [5..7])
Most probable correct derivation:
Incorrect derivation:
![Page 128: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/128.jpg)
128
Consider the productions covering the marked words. Collect negatives for productions that cover any marked word in the incorrect derivation but not in the correct derivation.
KRISP’s Training Algorithm contd.
Which rivers run through the states bordering Texas?
(ANSWER → answer(RIVER), [1..9])
(RIVER → TRAVERSE(STATE), [1..9])
(TRAVERSE → traverse, [1..7])
(STATE → STATEID, [8..9])
(STATEID → ‘texas’, [8..9])
Which rivers run through the states bordering Texas?
(ANSWER → answer(RIVER), [1..9])
(RIVER → TRAVERSE(STATE), [1..9])
(TRAVERSE → traverse, [1..4])
(STATE → NEXT_TO(STATE), [5..9])
(STATE → STATEID, [8..9])
(STATEID → ‘texas’, [8..9])
(NEXT_TO → next_to, [5..7])
Most probable correct derivation:
Incorrect derivation:
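The refinement step described on these slides can be sketched in code. The following is an illustrative reconstruction, not KRISP's actual implementation: derivation trees are plain node objects, and all helper names are invented.

```python
from collections import deque

class Node:
    """One node of a semantic derivation: a production applied to a word span."""
    def __init__(self, production, span, children=()):
        self.production = production      # e.g. "STATE -> NEXT_TO(STATE)"
        self.span = span                  # (start, end) word positions, inclusive
        self.children = list(children)

def first_differing_nodes(correct, incorrect):
    """Traverse both trees in breadth-first order, stopping at the first
    pair of nodes whose productions differ."""
    queue = deque([(correct, incorrect)])
    while queue:
        a, b = queue.popleft()
        if a.production != b.production:
            return a, b
        queue.extend(zip(a.children, b.children))
    return None

def collect_negatives(correct, incorrect):
    """Return (production, span) pairs that cover a marked word in the
    incorrect derivation but not in the correct one."""
    diff = first_differing_nodes(correct, incorrect)
    if diff is None:
        return []
    marked = set()                        # words under the differing nodes
    for node in diff:
        marked.update(range(node.span[0], node.span[1] + 1))

    def covering(node, acc):
        if marked.intersection(range(node.span[0], node.span[1] + 1)):
            acc.add((node.production, node.span))
        for child in node.children:
            covering(child, acc)
        return acc

    return sorted(covering(incorrect, set()) - covering(correct, set()))

# The two derivations from the slides (production text abbreviated):
correct = Node("ANSWER -> answer(RIVER)", (1, 9), [
    Node("RIVER -> TRAVERSE(STATE)", (1, 9), [
        Node("TRAVERSE -> traverse", (1, 4)),
        Node("STATE -> NEXT_TO(STATE)", (5, 9), [
            Node("NEXT_TO -> next_to", (5, 7)),
            Node("STATE -> STATEID", (8, 9))])])])
incorrect = Node("ANSWER -> answer(RIVER)", (1, 9), [
    Node("RIVER -> TRAVERSE(STATE)", (1, 9), [
        Node("TRAVERSE -> traverse", (1, 7)),
        Node("STATE -> STATEID", (8, 9))])])

print(collect_negatives(correct, incorrect))
# [('TRAVERSE -> traverse', (1, 7))]  -- traverse wrongly covering words 1..7
```

Here the only collected negative is `traverse` applied to words 1..7, exactly the span that the incorrect derivation stretched too far.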
![Page 129: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/129.jpg)
129
Overview of KRISP
Train string-kernel-based SVM classifiers
Semantic Parser
Collect positive and negative examples
MRL Grammar
NL sentences with MRs
Best semantic derivations (correct and incorrect)
Training
![Page 130: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/130.jpg)
130
Overview of KRISP
Train string-kernel-based SVM classifiers
Semantic Parser
Collect positive and negative examples
MRL Grammar
NL sentences with MRs
Best semantic derivations (correct and incorrect)
Training
![Page 131: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/131.jpg)
131
Overview of KRISP
Train string-kernel-based SVM classifiers
Semantic Parser
Collect positive and negative examples
MRL Grammar
NL sentences with MRs
Best semantic derivations (correct and incorrect)
Novel NL sentences Best MRs
Testing
![Page 132: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/132.jpg)
132
A Dependency-based Word Subsequence Kernel
• A word subsequence kernel can count linguistically meaningless subsequences
– A fat cat was chased by a dog.
– A cat with a red collar was chased two days ago by a fat dog.
• A new kernel that counts only the linguistically meaningful subsequences
Kate (2008a)
![Page 133: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/133.jpg)
133
A Dependency-based Word Subsequence Kernel
• Count the number of common paths in the dependency trees; an efficient algorithm computes this
• Outperforms word subsequence kernel on semantic parsing
[Figure: dependency trees for “A fat cat was chased by a dog.” and “A cat with a red collar was chased two days ago by a fat dog.” Both trees are rooted at “was” with children “cat” and “chased”, so paths such as cat → was → chased and chased → by → dog are common to both despite the different surface word orders.]
Kate (2008a)
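To make the path-counting idea concrete, here is a naive sketch that enumerates paths explicitly; Kate (2008a) gives an efficient algorithm that avoids this enumeration. The tree encoding and function names are illustrative only.

```python
from collections import Counter

def path_features(words, edges):
    """Multiset of word sequences along the unique path between every
    ordered pair of nodes in a dependency tree (undirected edge list)."""
    adj = {i: [] for i in words}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    feats = Counter()
    for src in words:
        stack = [(src, (src,))]
        seen = {src}
        while stack:
            node, path = stack.pop()
            if len(path) > 1:             # record the path's word sequence
                feats[tuple(words[n] for n in path)] += 1
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, path + (nxt,)))
    return feats

def dependency_path_kernel(tree1, tree2):
    """Count the paths (as word sequences) common to both trees."""
    f1, f2 = path_features(*tree1), path_features(*tree2)
    return sum(min(f1[p], f2[p]) for p in f1.keys() & f2.keys())

# Two toy trees: "cat chased dog" with and without a modifier on "dog".
chase = ({0: "cat", 1: "chased", 2: "dog"}, [(1, 0), (1, 2)])
chase_fat = ({0: "cat", 1: "chased", 2: "dog", 3: "fat"},
             [(1, 0), (1, 2), (2, 3)])
print(dependency_path_kernel(chase, chase_fat))   # 6
```

The extra word "fat" adds new paths to the second tree, but the six paths of the first tree survive intact, so the kernel value is unaffected by the insertion.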
![Page 134: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/134.jpg)
134
References
Huma Lodhi, Craig Saunders, John Shawe-Taylor, Nello Cristianini, and Chris
Watkins (2002). Text classification using string kernels. Journal of Machine Learning Research, 2:419--444.
John C. Platt (1999). Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in Large Margin Classifiers, pages 185--208. MIT Press.
Rohit J. Kate and Raymond J. Mooney (2006). Using string-kernels for learning semantic parsers. In Proc. of COLING/ACL-2006, pp. 913-920, Sydney, Australia, July 2006.
Rohit J. Kate (2008a). A dependency-based word subsequence kernel. In Proc. of EMNLP-2008, pp. 400-409, Waikiki, Honolulu, Hawaii, October 2008.
Rohit J. Kate (2008b). Transforming meaning representation grammars to improve semantic parsing. In Proc. of CoNLL-2008, pp. 33-40, Manchester, UK, August 2008.
![Page 135: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/135.jpg)
Exploiting Syntax for Semantic Parsing
![Page 136: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/136.jpg)
136
• Semantic Composition that Integrates Syntax and Semantics to get Optimal Representations
• Integrated syntactic-semantic parsing– Allows both syntax and semantics to be used
simultaneously to obtain an accurate combined syntactic-semantic analysis
• A statistical parser is used to generate a semantically augmented parse tree (SAPT)
SCISSOR Ge & Mooney (2005)
![Page 137: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/137.jpg)
137
Syntactic Parse
PRP$ NN CD VB
DT NN
NP
VPNP
S
our player 2 has
the ball
![Page 138: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/138.jpg)
138
SAPT
PRP$-P_OUR NN-P_PLAYER CD- P_UNUM VB-P_BOWNER
DT-NULL NN-NULL
NP-NULL
VP-P_BOWNERNP-P_PLAYER
S-P_BOWNER
our player 2 has
the ball
Non-terminals now have both syntactic and semantic labels
![Page 139: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/139.jpg)
139
SAPT
PRP$-P_OUR NN-P_PLAYER CD- P_UNUM VB-P_BOWNER
DT-NULL NN-NULL
NP-NULL
VP-P_BOWNERNP-P_PLAYER
S-P_BOWNER
our player 2 has
the ball
MR: (bowner (player our {2}))
Compose MR
![Page 140: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/140.jpg)
140
SCISSOR Overview
Integrated Semantic ParserSAPT Training Examples
TRAINING
learner
![Page 141: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/141.jpg)
141
Integrated Semantic Parser
SAPT
Compose MR
MR
NL Sentence
TESTING
SCISSOR Overview
![Page 142: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/142.jpg)
142
Integrated Syntactic-Semantic Parsing
• Find a SAPT with the maximum probability
• A lexicalized head-driven syntactic parsing model
• Extended Collins (1997) syntactic parsing model to generate semantic labels simultaneously with syntactic labels
• Smoothing
– Each label in a SAPT is the combination of a syntactic label and a semantic label, which increases data sparsity
– Break the parameters down into the syntactic label and the semantic label given the syntactic label
![Page 143: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/143.jpg)
143
Experimental Corpora
• CLang (Kate, Wong & Mooney, 2005)
– 300 pieces of coaching advice
– 22.52 words per sentence
• Geoquery (Zelle & Mooney, 1996)
– 880 queries on a geography database
– 7.48 words per sentence
– MRL: Prolog and FunQL
![Page 144: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/144.jpg)
144
Prolog: answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas))))
What are the rivers in Texas?
FunQL: answer(river(loc_2(stateid(texas))))
Logical forms: widely used as MRLs in computational semantics, support reasoning
Prolog vs. FunQL (Wong & Mooney, 2007b)
![Page 145: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/145.jpg)
145
Prolog: answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas))))
What are the rivers in Texas?
FunQL: answer(river(loc_2(stateid(texas))))
Flexible order
Strict order
Better generalization on Prolog
Prolog vs. FunQL (Wong & Mooney, 2007b)
![Page 146: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/146.jpg)
146
Experimental Methodology
• Standard 10-fold cross validation
• Correctness
– CLang: exactly matches the correct MR
– Geoquery: retrieves the same answers as the correct MR
• Metrics
– Precision: % of the returned MRs that are correct
– Recall: % of NL sentences with their MRs correctly returned
– F-measure: harmonic mean of precision and recall
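As a concrete illustration (the variable names are ours, not from the tutorial), the three metrics can be computed as:

```python
def evaluate(num_correct, num_returned, num_sentences):
    """Precision, recall and F-measure as defined above.

    num_correct:   returned MRs that are correct
    num_returned:  sentences for which the parser returned an MR
    num_sentences: total test sentences
    """
    precision = num_correct / num_returned
    recall = num_correct / num_sentences
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# 80 correct MRs out of 90 returned, on a 100-sentence test set:
p, r, f = evaluate(num_correct=80, num_returned=90, num_sentences=100)
print(round(p, 3), round(r, 3), round(f, 3))   # 0.889 0.8 0.842
```

Note that precision and recall differ only in the denominator: a parser can trade recall for precision by declining to return an MR on sentences it is unsure about.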
![Page 147: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/147.jpg)
147
Compared Systems
• COCKTAIL (Tang & Mooney, 2001)– Deterministic, inductive logic programming
• WASP (Wong & Mooney, 2006)– Semantic grammar, machine translation
• KRISP (Kate & Mooney, 2006)– Semantic grammar, string kernels
• Z&C (Zettlemoyer & Collins, 2007)– Syntax-based, combinatory categorial grammar (CCG)
• LU (Lu et al., 2008)– Semantic grammar, generative parsing model
• SCISSOR (Ge & Mooney 2005)– Integrated syntactic-semantic parsing
![Page 148: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/148.jpg)
148
Compared Systems
• COCKTAIL (Tang & Mooney, 2001)– Deterministic, inductive logic programming
• WASP (Wong & Mooney, 2006)– Semantic grammar, machine translation
• KRISP (Kate & Mooney, 2006)– Semantic grammar, string kernels
• Z&C (Zettlemoyer & Collins, 2007)– Syntax-based, combinatory categorial grammar (CCG)
• LU (Lu et al., 2008)– Semantic grammar, generative parsing model
• SCISSOR (Ge & Mooney 2005)– Integrated syntactic-semantic parsing
Hand-built lexicon for Geoquery
Small part of the lexicon
![Page 149: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/149.jpg)
149
Compared Systems
• COCKTAIL (Tang & Mooney, 2001)– Deterministic, inductive logic programming
• WASP (Wong & Mooney, 2006, 2007b)– Semantic grammar, machine translation
• KRISP (Kate & Mooney, 2006)– Semantic grammar, string kernels
• Z&C (Zettlemoyer & Collins, 2007)– Syntax-based, combinatory categorial grammar (CCG)
• LU (Lu et al., 2008)– Semantic grammar, generative parsing model
• SCISSOR (Ge & Mooney 2005)– Integrated syntactic-semantic parsing
λ-WASP, handling logical forms
![Page 150: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/150.jpg)
150
Results on CLang
Precision Recall F-measure
COCKTAIL - - -
SCISSOR 89.5 73.7 80.8
WASP 88.9 61.9 73.0
KRISP 85.2 61.9 71.7
Z&C - - -
LU 82.4 57.7 67.8
(LU: F-measure after reranking is 74.4%)
Memory overflow
Not reported
![Page 151: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/151.jpg)
151
Results on CLang
Precision Recall F-measure
SCISSOR 89.5 73.7 80.8
WASP 88.9 61.9 73.0
KRISP 85.2 61.9 71.7
LU 82.4 57.7 67.8
(LU: F-measure after reranking is 74.4%)
![Page 152: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/152.jpg)
152
Results on Geoquery
Precision Recall F-measure
SCISSOR 92.1 72.3 81.0
WASP 87.2 74.8 80.5
KRISP 93.3 71.7 81.1
LU 86.2 81.8 84.0
COCKTAIL 89.9 79.4 84.3
λ-WASP 92.0 86.6 89.2
Z&C 95.5 83.2 88.9
(LU: F-measure after reranking is 85.2%)
Prolog
FunQL
![Page 153: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/153.jpg)
153
Results on Geoquery (FunQL)
Precision Recall F-measure
SCISSOR 92.1 72.3 81.0
WASP 87.2 74.8 80.5
KRISP 93.3 71.7 81.1
LU 86.2 81.8 84.0
(LU: F-measure after reranking is 85.2%)
competitive
![Page 154: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/154.jpg)
154
When the Prior Knowledge of Syntax Does Not Help
• Geoquery: 7.48 words per sentence
• Short sentences
– Sentence structure can be feasibly learned from NLs paired with MRs
• Gain from knowledge of syntax vs. flexibility loss
![Page 155: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/155.jpg)
155
Limitation of Using Prior Knowledge of Syntax
What state
is the smallest
N1
N2
answer(smallest(state(all)))
Traditional syntactic analysis
![Page 156: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/156.jpg)
156
Limitation of Using Prior Knowledge of Syntax
What state
is the smallest
state is the smallest
N1
What N2
N1
N2
answer(smallest(state(all))) answer(smallest(state(all)))
Traditional syntactic analysis Semantic grammar
Isomorphic syntactic structure with MRBetter generalization
![Page 157: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/157.jpg)
157
When the Prior Knowledge of Syntax Does Not Help
• Geoquery: 7.48 words per sentence
• Short sentences
– Sentence structure can be feasibly learned from NLs paired with MRs
• Gain from knowledge of syntax vs. flexibility loss
![Page 158: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/158.jpg)
158
Clang Results with Sentence Length
[Chart: F-measure by sentence length; bins 0-10 (7% of sentences), 11-20 (33%), 21-30 (46%), 31-40 (13%)]
Knowledge of syntax improves performance on long sentences.
![Page 159: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/159.jpg)
159
SYNSEM
• SCISSOR requires extra SAPT annotation for training
• Must learn both syntax and semantics from same limited training corpus
• High performance syntactic parsers are available that are trained on existing large corpora (Collins, 1997; Charniak & Johnson, 2005)
Ge & Mooney (2009)
![Page 160: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/160.jpg)
160
SCISSOR Requires SAPT Annotation
PRP$-P_OUR NN-P_PLAYER CD- P_UNUM VB-P_BOWNER
DT-NULL NN-NULL
NP-NULL
VP-P_BOWNERNP-P_PLAYER
S-P_BOWNER
our player 2 has
the ball
Time consuming. Automate it!
![Page 161: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/161.jpg)
161
SYNSEM Overview
NL
Sentence
Syntactic Parser
Semantic Lexicon
Composition
Rules
Disambiguation
Model
Syntactic
Parse
Multiple word
alignments
Multiple
SAPTS
Best
SAPT
Ge & Mooney (2009)
![Page 162: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/162.jpg)
162
SYNSEM Training : Learn Semantic Knowledge
NL
Sentence
Syntactic Parser
Semantic Lexicon
Composition
Rules
Syntactic
Parse
Multiple word
alignments
![Page 163: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/163.jpg)
163
Syntactic Parser
PRP$ NN CD VB
DT NN
NP
VPNP
S
our player 2 has
the ball
Use a statistical syntactic parser
![Page 164: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/164.jpg)
164
SYNSEM Training: Learn Semantic Knowledge
NL
Sentence
Syntactic Parser
Semantic Lexicon
Composition
Rules
Syntactic
Parse
Multiple word
alignmentsMR
![Page 165: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/165.jpg)
165
Semantic Lexicon
P_OUR P_PLAYER P_UNUM P_BOWNER NULL NULL
our player 2 hasthe ball
Use a word alignment model (Wong & Mooney, 2006)
our player 2 has ballthe
P_PLAYERP_BOWNER P_OUR P_UNUM
![Page 166: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/166.jpg)
166
Learning a Semantic Lexicon
• IBM Model 5 word alignment (GIZA++)
• Top 5 word/predicate alignments for each training example
• Assume each word alignment and syntactic parse defines a possible SAPT for composing the correct MR
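A toy version of the lexicon-extraction step might look as follows. The data format, function name, and threshold are invented for illustration; GIZA++ itself is run separately to produce the word/predicate alignments that are aggregated here.

```python
from collections import Counter

def build_lexicon(alignments, min_count=2):
    """Aggregate top-k word/predicate alignments (e.g. from GIZA++)
    into a lexicon mapping each word to its most frequent predicate."""
    counts = Counter()
    for alignment in alignments:              # one alignment = list of pairs
        for word, predicate in alignment:
            counts[(word, predicate)] += 1
    lexicon = {}
    # most_common() visits pairs from most to least frequent, so each
    # word keeps its best-supported predicate.
    for (word, predicate), c in counts.most_common():
        if c >= min_count and word not in lexicon:
            lexicon[word] = predicate
    return lexicon

# Two hypothetical alignments for "our player 2 has ...":
alignments = [
    [("our", "P_OUR"), ("player", "P_PLAYER"), ("2", "P_UNUM"), ("has", "P_BOWNER")],
    [("our", "P_OUR"), ("player", "P_PLAYER"), ("2", "P_UNUM"), ("has", "NULL")],
]
print(build_lexicon(alignments))
# {'our': 'P_OUR', 'player': 'P_PLAYER', '2': 'P_UNUM'}
```

The word "has" is dropped because its two candidate predicates split the counts below the threshold, mirroring how noisy alignments get filtered out.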
![Page 167: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/167.jpg)
167
SYNSEM Training : Learn Semantic Knowledge
NL
Sentence
Syntactic Parser
Semantic Lexicon
Composition
Rules
Syntactic
Parse
Multiple word
alignmentsMR MR
![Page 168: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/168.jpg)
168
Introducing λ variables in semantic labels for missing arguments (a1: the first argument)
our player 2 has ballthe
VP
S
NP
NP
P_OUR
λa1λa2P_PLAYER
λa1P_BOWNER
P_UNUM NULLNULL
NP
Introduce λ Variables
![Page 169: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/169.jpg)
169
our player 2 has ballthe
VP
S
NP
NP
P_OUR
λa1λa2P_PLAYER
λa1P_BOWNER
P_UNUM NULLNULL
P_BOWNER
P_PLAYER
P_UNUMP_OUR
Internal Semantic Labels
How to choose the dominant predicates?
NP
From Correct MR
![Page 170: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/170.jpg)
170
λa1λa2P_PLAYER P_UNUM
?
player 2
P_BOWNER
P_PLAYER
P_UNUMP_OUR
λa1λa2P_PLAYER + P_UNUM → {λa1P_PLAYER, a2=c2}
(c2: child 2)
Collect Semantic Composition Rules
![Page 171: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/171.jpg)
171
our player 2 has ballthe
VP
S
NPP_OUR
λa1λa2P_PLAYER
λa1P_BOWNER
P_UNUM NULLNULL
λa1P_PLAYER
?
λa1λa2P_PLAYER + P_UNUM → {λa1P_PLAYER, a2=c2}
P_BOWNER
P_PLAYER
P_UNUMP_OUR
Collect Semantic Composition Rules
![Page 172: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/172.jpg)
172
our player 2 has ballthe
VP
S
P_OUR
λa1λa2P_PLAYER
λa1P_BOWNER
P_UNUM NULLNULL
λa1P_PLAYER ?
P_PLAYER
P_OUR + λa1P_PLAYER → {P_PLAYER, a1=c1}
P_BOWNER
P_PLAYER
P_UNUMP_OUR
Collect Semantic Composition Rules
![Page 173: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/173.jpg)
173
our player 2 has ballthe
P_OUR
λa1λa2P_PLAYER
λa1P_BOWNER
P_UNUM NULLNULL
λa1P_PLAYER
P_PLAYER
NULL
λa1P_BOWNER
?
P_PLAYER
P_UNUMP_OUR
Collect Semantic Composition Rules
P_BOWNER
![Page 174: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/174.jpg)
174
our player 2 has ballthe
P_OUR
λa1λa2P_PLAYER
λa1P_BOWNER
P_UNUM NULLNULL
λa1P_PLAYER
P_PLAYER
NULL
λa1P_BOWNER
P_BOWNER
P_PLAYER + λa1P_BOWNER → {P_BOWNER, a1=c1}
P_BOWNER
P_PLAYER
P_UNUMP_OUR
Collect Semantic Composition Rules
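The argument-filling performed by these composition rules can be mimicked with a small sketch. The (predicate name, filled arguments, arity) representation below is our own simplification, and the rendered MR uses plain functional syntax rather than the CLang notation on the slides:

```python
def compose(parent, child, slot):
    """Fill argument `slot` of a partially applied predicate with the child's
    MR, as in the rule  λa1λa2P_PLAYER + P_UNUM -> {λa1P_PLAYER, a2=c2}."""
    pred, args, arity = parent
    args = {**args, slot: child}
    if len(args) == arity:                    # fully applied: render the MR
        return pred + "(" + ", ".join(str(args[i]) for i in sorted(args)) + ")"
    return (pred, args, arity)                # still waiting for arguments

player = ("player", {}, 2)                    # λa1 λa2 player
step1 = compose(player, "2", slot=2)          # a2 = c2: still missing a1
step2 = compose(step1, "our", slot=1)         # a1 = c1: fully applied
result = compose(("bowner", {}, 1), step2, slot=1)
print(result)                                 # bowner(player(our, 2))
```

Each rule application either produces a smaller λ-expression (an argument is still missing) or, once every slot is filled, a complete MR fragment that the parent node can consume in turn.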
![Page 175: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/175.jpg)
175
Ensuring Meaning Composition
What state
is the smallest
N1
N2
answer(smallest(state(all)))
Non-isomorphism
![Page 176: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/176.jpg)
176
Ensuring Meaning Composition
• Non-isomorphism between NL parse and MR parse
– Various linguistic phenomena
– Word alignment between NL and MRL
– Use of automated syntactic parses
• Introduce macro-predicates that combine multiple predicates
• Ensure that MR can be composed using a syntactic parse and word alignment
![Page 177: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/177.jpg)
177
SYNSEM Training: Learn Disambiguation Model
NL
Sentence
Syntactic Parser
Semantic Lexicon
Composition
Rules
Disambiguation
Model
Syntactic
Parse
Multiple word
alignments
Multiple
SAPTS
Correct
SAPTs
MR
![Page 178: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/178.jpg)
178
Parameter Estimation
• Apply the learned semantic knowledge to all training examples to generate possible SAPTs
• Use a standard maximum-entropy model similar to that of Zettlemoyer & Collins (2005), and Wong & Mooney (2006)
• Training finds parameters that (approximately) maximize the sum of the conditional log-likelihoods of the training set, including syntactic parses
• Incomplete data since SAPTs are hidden variables
![Page 179: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/179.jpg)
179
Features
• Lexical features:
– Unigram features: # of times a word is assigned a predicate
– Bigram features: # of times a word is assigned a predicate given its previous/subsequent word
• Rule features: # of times a composition rule is applied in a derivation
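For concreteness, the feature counts for one derivation could be tallied as below. This is a sketch with invented names, and only the previous-word bigram is shown:

```python
from collections import Counter

def extract_features(words, predicates, rules):
    """Count unigram, bigram and rule features for a single derivation.
    words[i] is the i-th word, predicates[i] the predicate assigned to it,
    and rules lists the composition rules applied in the derivation."""
    feats = Counter()
    for i, (word, pred) in enumerate(zip(words, predicates)):
        feats[("unigram", word, pred)] += 1
        prev = words[i - 1] if i > 0 else "<s>"     # sentence-start marker
        feats[("bigram", prev, word, pred)] += 1
    for rule in rules:
        feats[("rule", rule)] += 1
    return feats

feats = extract_features(
    words=["our", "player", "2"],
    predicates=["P_OUR", "P_PLAYER", "P_UNUM"],
    rules=["P_OUR + λa1P_PLAYER -> {P_PLAYER, a1=c1}"],
)
print(feats[("bigram", "our", "player", "P_PLAYER")])   # 1
```

The model then scores a candidate SAPT as a weighted sum of such counts, so the same feature extractor is used during both training and decoding.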
![Page 180: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/180.jpg)
180
SYNSEM Testing
NL
Sentence
Syntactic Parser
Semantic Lexicon
Composition
Rules
Disambiguation
Model
Syntactic
Parse
Multiple word
alignments
Multiple
SAPTS
Best
SAPT
![Page 181: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/181.jpg)
181
Syntactic Parsers (Bikel, 2004)
• WSJ only– CLang(SYN0): F-measure=82.15% – Geoquery(SYN0) : F-measure=76.44%
• WSJ + in-domain sentences– CLang(SYN20): 20 sentences, F-measure=88.21%
– Geoquery(SYN40): 40 sentences, F-measure=91.46%
• Gold-standard syntactic parses (GOLDSYN)
![Page 182: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/182.jpg)
182
Questions
• Q1. Can SYNSEM produce accurate semantic interpretations?
• Q2. Can more accurate Treebank syntactic parsers produce more accurate semantic parsers?
![Page 183: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/183.jpg)
183
Results on CLang
Precision Recall F-measure
GOLDSYN 84.7 74.0 79.0
SYN20 85.4 70.0 76.9
SYN0 87.0 67.0 75.7
SCISSOR 89.5 73.7 80.8
WASP 88.9 61.9 73.0
KRISP 85.2 61.9 71.7
LU 82.4 57.7 67.8
(LU: F-measure after reranking is 74.4%)
SYNSEM
SAPTs
GOLDSYN > SYN20 > SYN0
![Page 184: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/184.jpg)
184
Questions
• Q1. Can SynSem produce accurate semantic interpretations? [yes]
• Q2. Can more accurate Treebank syntactic parsers produce more accurate semantic parsers? [yes]
• Q3. Does it also improve on long sentences?
![Page 185: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/185.jpg)
185
Detailed Clang Results on Sentence Length
[Chart: F-measure by sentence length; bins 0-10 (7%), 11-20 (33%), 21-30 (46%), 31-40 (13%)]
Prior knowledge + flexibility vs. syntactic errors = ?
![Page 186: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/186.jpg)
186
Questions
• Q1. Can SynSem produce accurate semantic interpretations? [yes]
• Q2. Can more accurate Treebank syntactic parsers produce more accurate semantic parsers? [yes]
• Q3. Does it also improve on long sentences? [yes]
• Q4. Does it improve on limited training data due to the prior knowledge from large treebanks?
![Page 187: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/187.jpg)
187
Results on Clang (training size = 40)
Precision Recall F-measure
GOLDSYN 61.1 35.7 45.1
SYN20 57.8 31.0 40.4
SYN0 53.5 22.7 31.9
SCISSOR 85.0 23.0 36.2
WASP 88.0 14.4 24.7
KRISP 68.35 20.0 31.0
SYNSEM
SAPTs
The quality of syntactic parser is critically important!
![Page 188: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/188.jpg)
188
Questions
• Q1. Can SynSem produce accurate semantic interpretations? [yes]
• Q2. Can more accurate Treebank syntactic parsers produce more accurate semantic parsers? [yes]
• Q3. Does it also improve on long sentences? [yes]
• Q4. Does it improve on limited training data due to the prior knowledge from large treebanks? [yes]
![Page 189: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/189.jpg)
189
References
Bikel (2004). Intricacies of Collins’ Parsing Model. Computational Linguistics, 30(4):479-511.
Eugene Charniak and Mark Johnson (2005). Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proc. of ACL-2005, pp. 173-180, Ann Arbor, MI, June 2005.
Michael Collins (1997). Three generative, lexicalized models for syntactic parsing. In Proc. of ACL-97, pp. 16-23, Madrid, Spain.
Ruifang Ge and Raymond J. Mooney (2005). A statistical semantic parser that integrates syntax and semantics. In Proc. of CoNLL-2005, pp. 9-16, Ann Arbor, MI, June 2005.
Ruifang Ge and Raymond J. Mooney (2006). Discriminative reranking for semantic parsing. In Proc. of COLING/ACL-2006, pp. 263-270, Sydney, Australia, July 2006.
Ruifang Ge and Raymond J. Mooney (2009). Learning a compositional semantic parser using an existing syntactic parser. In Proc. of ACL-2009, pp. 611-619, Suntec, Singapore, August 2009.
![Page 190: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/190.jpg)
Underlying Commonalities and Differences between Semantic Parsers
![Page 191: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/191.jpg)
191
Underlying Commonalities between Semantic Parsers
• A model to connect language and meaning
![Page 192: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/192.jpg)
192
A Model to Connect Language and Meaning
• Zettlemoyer & Collins: CCG grammar with semantic types
• WASP: Synchronous CFG
• KRISP: Probabilistic string classifiers
borders := (S \ NP) / NP : λx.λy.borders(y, x)
QUERY → What is CITY / answer(CITY)
Which rivers run through the states bordering Texas?
NEXT_TO → next_to (0.95)
![Page 193: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/193.jpg)
193
A Model to Connect Language and Meaning
• Lu et al.: Hybrid tree, hybrid patterns
• SCISSOR/SynSem: Semantically annotated parse trees
[Hybrid tree for “How many states do not have rivers?”, pairing the NL words with the MR parse: QUERY:answer(NUM), NUM:count(STATE), STATE:exclude(STATE STATE), STATE:state(all), STATE:loc_1(RIVER), RIVER:river(all)]
w: the NL sentence, m: the MR, T: the hybrid tree
PRP$-P_OUR NN-P_PLAYER CD- P_UNUM VB-P_BOWNER
DT-NULL NN-NULL
NP-NULL
VP-P_BOWNERNP-P_PLAYER
S-P_BOWNER
our player 2 has
the ball
![Page 194: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/194.jpg)
194
• A model to connect language and meaning• A mechanism for meaning composition
Underlying Commonalities between Semantic Parsers
![Page 195: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/195.jpg)
195
A Mechanism for Meaning Composition
• Zettlemoyer & Collins: CCG parsing rules• WASP: Meaning representation grammar• KRISP: Meaning representation grammar• Lu et al.: Meaning representation grammar• SCISSOR: Semantically annotated parse
trees• SynSem: Syntactic parses
![Page 196: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/196.jpg)
196
• A model to connect language and meaning• A mechanism for meaning composition• Parameters for selecting a meaning
representation out of many
Underlying Commonalities between Semantic Parsers
![Page 197: 1 Natural Language Processing COMPSCI 423/723 Rohit Kate.](https://reader038.fdocuments.us/reader038/viewer/2022110205/56649c7f5503460f949361ba/html5/thumbnails/197.jpg)
197
Parameters to Select a Meaning Representation
• Zettlemoyer & Collins: Weights for lexical items and CCG parsing rules
• WASP: Weights for grammar productions
• KRISP: SVM weights
• Lu et al.: Generative model parameters
• SCISSOR: Parsing model weights
• SynSem: Parsing model weights
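At decode time these parameterizations mostly reduce to the same shape: score each candidate MR (or derivation) for a sentence and return the argmax. A minimal Python sketch of that shared shape; the feature extractor and weight table here are invented for illustration and are not any listed system's actual features:

```python
def score(weights, features):
    """Dot product of a sparse feature vector with a weight table."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def best_mr(sentence, candidates, featurize, weights):
    """Return the candidate meaning representation with the highest
    linear-model score: the common core behind production weights
    (WASP), SVM weights (KRISP), and lexical-item weights (Z&C)."""
    return max(candidates, key=lambda mr: score(weights, featurize(sentence, mr)))
```

For instance, a single indicator feature pairing "how many" with a count predicate is enough to make the model prefer a counting MR for a "how many" question.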
Underlying Commonalities between Semantic Parsers

• A model to connect language and meaning
• A mechanism for meaning composition
• Parameters for selecting a meaning representation out of many
• An iterative EM-like method for training to find the right associations between NL and MR components
An Iterative EM-like Method for Training
• Zettlemoyer & Collins: Stochastic gradient ascent, structured perceptron
• WASP: Quasi-Newton method (L-BFGS)
• KRISP: Re-parses the training sentences to find more refined positive and negative examples
• Lu et al.: Inside-outside algorithm with EM
• SCISSOR: None
• SynSem: Quasi-Newton method (L-BFGS)
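The shared loop can be caricatured as a hard-EM-style alternation: use the current parameters to pick the best hidden derivation (the NL-MR association) for each training pair, then refit the parameters on those derivations. This is a schematic sketch only, not any listed system's exact algorithm; `best_derivation` and `reestimate` are placeholder callbacks:

```python
def train_em_like(pairs, init_params, best_derivation, reestimate, iterations=10):
    """Alternate an E-like step (commit to the currently best hidden
    derivations under the current parameters) with an M-like step
    (re-estimate parameters from those derivations)."""
    params = init_params
    for _ in range(iterations):
        # E-like step: best NL-MR association for each (nl, mr) pair
        derivations = [best_derivation(nl, mr, params) for nl, mr in pairs]
        # M-like step: refit parameters to the chosen derivations
        params = reestimate(derivations)
    return params
```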
Underlying Commonalities between Semantic Parsers

• A model to connect language and meaning
• A mechanism for meaning composition
• Parameters for selecting a meaning representation out of many
• An iterative EM-like loop in training to find the right associations between NL and MR components
• A generalization mechanism
A Generalization Mechanism
• Zettlemoyer & Collins: Different combinations of lexicon items and parsing rules
• WASP: Different combinations of productions
• KRISP: Different combinations of meaning productions, string similarity
• Lu et al.: Different combinations of hybrid patterns
• SCISSOR: Different combinations of parsing productions
• SynSem: Different combinations of parsing productions
Differences between Semantic Parsers
• Learn lexicon or not
Learn Lexicon or Not
• Learn it as a first step:
  – Zettlemoyer & Collins
  – WASP
  – SynSem
  – COCKTAIL
• Do not learn it:
  – KRISP
  – Lu et al.
  – SCISSOR
Differences between Semantic Parsers
• Learn lexicon or not
• Utilize knowledge of natural language or not
Utilize Knowledge of Natural Language or Not
• Utilize knowledge of English syntax:
  – Zettlemoyer & Collins: CCG
  – SCISSOR: Phrase Structure Grammar
  – SynSem: Phrase Structure Grammar
  Can leverage existing knowledge of natural language
• Do not utilize:
  – WASP
  – KRISP
  – Lu et al.
  Portable to other natural languages
[Figure: precision learning curve for GeoQuery (WASP)]
[Figure: recall learning curve for GeoQuery (WASP)]
Differences between Semantic Parsers
• Learn lexicon or not
• Utilize general syntactic parsing grammars or not
• Use matching patterns or not
Use Matching Patterns or Not
• Use matching patterns:
  – Zettlemoyer & Collins
  – WASP
  – Lu et al.
  – SCISSOR/SynSem
  These systems can be inverted to form a generation system, e.g. WASP⁻¹
• Do not use matching patterns:
  – KRISP
  This makes it robust to noise
Robustness of KRISP

• KRISP does not use matching patterns
• String-kernel-based classification softly captures a wide range of natural language expressions

Robust to rephrasing and noise

[Figure: for "Which rivers run through the states bordering Texas?", KRISP's classifier for the production TRAVERSE → traverse fires with probability 0.95.]
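The "soft capture" comes from a gap-weighted string subsequence kernel (Lodhi et al. 2002), which KRISP applies at the word level: two spans are similar to the extent that they share (possibly non-contiguous) word subsequences, with gaps penalized by a decay factor λ. Below is a sketch of the standard dynamic-programming computation; the subsequence length n, decay λ, and normalization are illustrative choices, not KRISP's exact settings:

```python
import math
from functools import lru_cache

def subseq_kernel(s, t, n=2, lam=0.5):
    """Gap-weighted subsequence kernel K_n(s, t) over word lists:
    counts shared word subsequences of length n, each occurrence
    weighted by lam raised to the length of the span it covers."""
    @lru_cache(maxsize=None)
    def kp(i, j, l):
        # K'_l over the prefixes s[:i] and t[:j]
        if l == 0:
            return 1.0
        if min(i, j) < l:
            return 0.0
        total = lam * kp(i - 1, j, l)
        for k in range(1, j + 1):
            if t[k - 1] == s[i - 1]:
                total += kp(i - 1, k - 1, l - 1) * lam ** (j - k + 2)
        return total

    return sum(kp(i - 1, j - 1, n - 1) * lam ** 2
               for i in range(1, len(s) + 1)
               for j in range(1, len(t) + 1)
               if s[i - 1] == t[j - 1])

def similarity(a, b, n=2, lam=0.5):
    """Kernel value normalized to [0, 1]."""
    return subseq_kernel(a, b, n, lam) / math.sqrt(
        subseq_kernel(a, a, n, lam) * subseq_kernel(b, b, n, lam))
```

Dropping a word from the query lowers but does not destroy the kernel similarity, which is why the classifier's confidence degrades gracefully (0.95 to 0.65 in these examples) instead of failing outright.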
Robustness of KRISP

• KRISP does not use matching patterns
• String-kernel-based classification softly captures a wide range of natural language expressions

Robust to rephrasing and noise

[Figure: with a word dropped, "Which rivers through the states bordering Texas?", the classifier for TRAVERSE → traverse still fires, now with probability 0.65.]
Experiments with Noisy NL Sentences contd.
• Noise was introduced into the NL sentences by:
  – Adding extra words chosen according to their frequencies in the BNC
  – Dropping words randomly
  – Substituting words with phonetically close high-frequency words
• Four levels of noise were created by increasing the probabilities of the above
• We show best F-measures (harmonic mean of precision and recall)
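The F-measure reported here is the harmonic mean of precision (correct MRs out of MRs produced) and recall (correct MRs out of all test sentences), as typically defined in this literature:

```python
def f_measure(precision, recall):
    """F1: harmonic mean of precision and recall."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```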
Results on Noisy CLang Corpus
[Figure: F-measure (0 to 100) vs. noise level (0 to 5) for KRISP, WASP, and SCISSOR]
Differences between Semantic Parsers
• Learn lexicon or not
• Utilize general syntactic parsing grammars or not
• Use matching patterns or not
• Different amounts of supervision
Different Amounts of Supervision
• Zettlemoyer & Collins
  – NL-MR pairs, CCG category rules, a small number of predefined lexical items
• WASP, KRISP, Lu et al.
  – NL-MR pairs, MR grammar
• SCISSOR
  – Semantically annotated parse trees
• SynSem
  – NL-MR pairs, syntactic parser
Differences between Semantic Parsers
• Learn lexicon or not
• Utilize general syntactic parsing grammars or not
• Use matching patterns or not
• Different amounts of supervision
Conclusions
• Semantic parsing maps NL sentences to completely formal MRs.
• Semantic parsing is an interesting and challenging task, and it is critical for developing computing systems that can understand and process natural language input.
• Semantic parsers can be effectively learned from supervised corpora consisting of only sentences paired with their formal MRs (and possibly also SAPTs).
• The state of the art in semantic parsing has advanced significantly in recent years through a variety of statistical machine learning techniques and grammar formalisms.
Resources
ACL 2010 Tutorial:
http://www.cs.utexas.edu/users/ml/slides/semantic-parsing-tutorial-acl10.ppt
Geoquery and CLang data:
http://www.cs.utexas.edu/users/ml/nldata/geoquery.html
http://www.cs.utexas.edu/users/ml/nldata/clang.html
WASP and KRISP semantic parsers: http://www.cs.utexas.edu/~ml/wasp/
http://www.cs.utexas.edu/~ml/krisp/
Geoquery Demo:
http://www.cs.utexas.edu/users/ml/geo.html
Homework 8
• What will be the features that an SVM with a string subsequence kernel will consider implicitly when considering the example string "near our goal"? And when considering the example string "near goal area"? What will it explicitly take as the only information about the two examples when it considers them together?