The Web as an Implicit Training Set:
Application to Noun Compound Syntax and Semantics
Preslav Nakov, Qatar Computing Research Institute (joint work with Marti Hearst, UC Berkeley)
MWE’2014 April 26, 2014 Gothenburg, Sweden
2
Web-scale Computational Linguistics
3
The Big Dream (2001: A Space Odyssey)
Dave Bowman: “Open the pod bay doors, HAL”
HAL 9000: “I’m sorry Dave. I’m afraid I can’t do that.”
This is too hard!
So, we tackle sub-problems instead.
4
The Rise of Corpora
• The field was stuck for quite some time. – e.g., CYC: manually annotate all semantic concepts and relations
• A new statistical approach started in the 90s – Get large text collections. – Compute statistics over the words.
5
Size Matters
Banko & Brill: “Scaling to Very, Very Large Corpora for Natural Language Disambiguation”, ACL’2001
• Spelling correction – Which word should we use? <principal> <principle>
– In a given context: • Randy Evans is the Principal of Gothenburg School District 20.
• Sweden’s Foreign Minister declares his support for principles to protect privacy in the face of surveillance.
6
• Log-linear improvement even to a billion words!
• Getting more data is better than fine-tuning algorithms!
(Banko & Brill, 2001)
Great idea! Can it be extended to other tasks?
For this problem, one can get a lot of training data.
Size Matters: Using Billions of Words
7
Language Models for SMT at Google: Using Quadrillions (10^15) of Words!
(Brants & al., 2007)
8
The Web as a Baseline
• “Web as a baseline” (Lapata & Keller 04;05): n-gram models – machine translation candidate selection – article generation – noun compound interpretation – noun compound bracketing – adjective ordering – spelling correction – countability detection – prepositional phrase attachment
• Their conclusion: – The Web should be used as a baseline.
Significantly better than the best supervised algorithm.
Not significantly different from the best supervised algorithm.
These are all UNSUPERVISED!
We can do better…
9
The Web as an Implicit Training Set
• Much more can be achieved using – surface features – paraphrases – linguistic knowledge
• I will demonstrate this on noun compounds (and on some other problems)
10
Noun Compounds
11
Noun Compound
• Def: Sequence of nouns that function as a single noun, e.g. – healthcare reform – plastic water bottle – colon cancer tumor suppressor protein – Korpuslinguistikkonferenz (German)
Three problems: 1. Segmentation 2. Syntax 3. Semantics
12
Noun Compounds • Encode Implicit Relations – hard to interpret
– malaria mosquito – CAUSE – plastic bottle - MATERIAL – water bottle - CONTAINER
• Abundant – cannot be ignored – 4% of the tokens in the Reuters corpus
• Highly productive – cannot be listed in a dictionary – 60.3% of the compounds in the British National Corpus occur just once – only 27% of English compounds of freq. >=10 are in an English-Japanese dictionary
• Also – ambiguous – context-dependent – (partially) lexicalized
13
Noun Compounds: Applications
• Question Answering, Machine Translation, Information Extraction, Information Retrieval – WTO Geneva headquarters can be paraphrased as
– headquarters of the WTO located in Geneva – Geneva headquarters of the WTO
• Information Retrieval – Query: migraine treatment – verbs like relieve and prevent – for ranking and query refinement
14
Noun Compound Syntax
15
Noun Compound Syntax: The Problem
plastic water bottle: which bracketing?
• right: [ plastic [ water bottle ] ] – water bottle made of plastic
• left: [ [ plastic water ] bottle ] – bottle containing plastic water
16
Measuring Word Association
• Frequencies – Dependency: #(w1,w2) vs. #(w1,w3) – Adjacency: #(w1,w2) vs. #(w2,w3)
• Probabilities – Dependency: Pr(w1→w2|w2) vs. Pr(w1→w3|w3) – Adjacency: Pr(w1→w2|w2) vs. Pr(w2→w3|w3)
• Also: Pointwise Mutual Information, Chi Square, etc.
Simple Word-based Models
[Diagram: w1 w2 w3 = plastic water bottle; the adjacency model compares (w1,w2) with (w2,w3), the dependency model compares (w1,w2) with (w1,w3)]
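The two frequency-based models above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `count` callback, the toy counts, and the choice to return `None` on ties are all assumptions made for the example.

```python
def bracket(count, w1, w2, w3, model="dependency"):
    """Bracket the compound 'w1 w2 w3' as 'left' or 'right'.

    count(a, b) is a hypothetical callback returning how often the
    word pair (a, b) occurs; a real system would query a corpus.
    """
    left_evidence = count(w1, w2)          # supports [[w1 w2] w3]
    if model == "dependency":
        right_evidence = count(w1, w3)     # supports [w1 [w2 w3]]
    else:                                  # adjacency model
        right_evidence = count(w2, w3)
    if left_evidence == right_evidence:
        return None                        # tie: no decision (assumption)
    return "left" if left_evidence > right_evidence else "right"


# Toy counts, invented purely for illustration.
TOY_COUNTS = {
    ("plastic", "water"): 5,
    ("water", "bottle"): 900,
    ("plastic", "bottle"): 700,
}


def toy_count(a, b):
    return TOY_COUNTS.get((a, b), 0)


print(bracket(toy_count, "plastic", "water", "bottle", model="dependency"))
print(bracket(toy_count, "plastic", "water", "bottle", model="adjacency"))
```

With these invented counts both models choose the right bracketing, i.e. a water bottle made of plastic.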
17
Web-derived Surface Features
• Observations – Authors often disambiguate noun compounds using surface markers. – The size of the Web makes such markers frequent enough to be useful.
• Ideas – Look for instances where the compound occurs with surface markers. – Also try • paraphrases • linguistic knowledge
The Web as an Implicit Training Set
18
Web-derived Surface Features: Dash (hyphen)
• Left dash – cell-cycle analysis → left
• Right dash – donor T-cell → right
CoNLL'05: Nakov&Hearst
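A dash feature of this kind could be detected in text snippets roughly as below. This is only a sketch: the function name `dash_votes` and the idea of scanning local snippets with regular expressions are assumptions for illustration, whereas the talk collects such evidence from Web search results.

```python
import re


def dash_votes(snippets, w1, w2, w3):
    """Count dash evidence for left vs. right bracketing of 'w1 w2 w3'."""
    a, b, c = (re.escape(w) for w in (w1, w2, w3))
    # "w1-w2 w3" suggests [[w1 w2] w3]; "w1 w2-w3" suggests [w1 [w2 w3]].
    left_pattern = re.compile(rf"\b{a}-{b}\s+{c}\b", re.IGNORECASE)
    right_pattern = re.compile(rf"\b{a}\s+{b}-{c}\b", re.IGNORECASE)
    votes = {"left": 0, "right": 0}
    for text in snippets:
        votes["left"] += len(left_pattern.findall(text))
        votes["right"] += len(right_pattern.findall(text))
    return votes


# Invented snippets built around the two examples on the slide.
snippets = ["We ran a cell-cycle analysis.", "The donor T-cell count rose."]
print(dash_votes(snippets, "cell", "cycle", "analysis"))
print(dash_votes(snippets, "donor", "T", "cell"))
```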
19
Web-derived Surface Features: Possessive Marker
• After the first word – world’s food production → right
• After the second word – cell cycle’s analysis → left
CoNLL'05: Nakov&Hearst
20
Web-derived Surface Features: Capitalization
• don’t-care – lowercase – uppercase – Plasmodium vivax Malaria → left – plasmodium vivax Malaria → left
• lowercase – uppercase – don’t-care – tumor Necrosis Factor → right – tumor Necrosis factor → right
CoNLL'05: Nakov&Hearst
21
Web-derived Surface Features: Embedded Slash
• Left embedded slash – leukemia/lymphoma cell → right
CoNLL'05: Nakov&Hearst
22
Web-derived Surface Features: Parentheses
• Single word – growth factor (beta) → left – (tumor) necrosis factor → right
• Two words – (cell cycle) analysis → left – adult (male rat) → right
CoNLL'05: Nakov&Hearst
23
Web-derived Surface Features: Comma, period, colon, semicolon, …
• Following the second word – lung cancer: patients → left – health care, provider → left
• Following the first word – home. health care → right – adult, male rat → right
CoNLL'05: Nakov&Hearst
24
Web-derived Surface Features: Abbreviation
• After the second word – tumor necrosis (TN) factor → left
• After the third word – tumor necrosis factor (NF) → right
CoNLL'05: Nakov&Hearst
25
Web-derived Surface Features: Concatenation
Consider “health care reform”
• Dependency model – healthcare vs. healthreform
• Adjacency model – healthcare vs. carereform
• Triples – “healthcare reform” vs. “health carereform”
[Diagram: w1 w2 w3 = health care reform, with adjacency and dependency links]
CoNLL'05: Nakov&Hearst
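The concatenation feature amounts to generating pairs of competing strings and comparing their frequencies. The helper below only builds those pairs; the function name and the convention that the first string of each pair supports the left bracketing are assumptions made for this sketch.

```python
def concatenation_queries(w1, w2, w3):
    """Return (left_evidence, right_evidence) strings per model.

    In each pair the first string supports [[w1 w2] w3] and the
    second supports [w1 [w2 w3]]; frequencies would then be compared.
    """
    return {
        "dependency": (w1 + w2, w1 + w3),
        "adjacency": (w1 + w2, w2 + w3),
        "triples": (f"{w1}{w2} {w3}", f"{w1} {w2}{w3}"),
    }


for model, pair in concatenation_queries("health", "care", "reform").items():
    print(model, pair)
```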
26
Web-derived Surface Features: Internal Inflection Variability
• First word – bone mineral density – bones mineral density
• Second word – bone mineral density – bone minerals density
CoNLL'05: Nakov&Hearst
27
Web-derived Surface Features: Switch The First Two Words
• Predict right if we can reorder – adult male rat as – male adult rat
CoNLL'05: Nakov&Hearst
28
• Prepositional – cells in (the) bone marrow → left (61,700) – cells from (the) bone marrow → left (16,500) – marrow cells from (the) bone → right (12)
• Verbal – cells extracted from (the) bone marrow → left (17) – marrow cells found in (the) bone → right (1)
• Copula – cells that are bone marrow → left (3)
“bone marrow cell”: left or right?
Paraphrases
CoNLL'05: Nakov&Hearst
[Diagram: “left” sum vs. “right” sum → compare]
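The sum-and-compare step can be worked through with the hit counts shown on this slide for “bone marrow cell”. The data layout and the function name `vote` are assumptions for this sketch; only the paraphrases and counts come from the slide.

```python
# (paraphrase, predicted bracketing, hit count) as listed on the slide.
PARAPHRASE_HITS = [
    ("cells in (the) bone marrow", "left", 61700),
    ("cells from (the) bone marrow", "left", 16500),
    ("marrow cells from (the) bone", "right", 12),
    ("cells extracted from (the) bone marrow", "left", 17),
    ("marrow cells found in (the) bone", "right", 1),
    ("cells that are bone marrow", "left", 3),
]


def vote(hits):
    """Sum the counts per bracketing and pick the larger total."""
    totals = {"left": 0, "right": 0}
    for _paraphrase, direction, n in hits:
        totals[direction] += n
    if totals["left"] == totals["right"]:
        return totals, None  # tie handling is an assumption
    winner = "left" if totals["left"] > totals["right"] else "right"
    return totals, winner


totals, winner = vote(PARAPHRASE_HITS)
print(totals, winner)  # {'left': 78220, 'right': 13} left
```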
29
• Word associations
• Surface features and paraphrases
Evaluation Results
On 244 noun compounds from Grolier’s encyclopedia (Lauer dataset)
Size does matter! Using MEDLINE instead of the Web (a million times smaller) • 9.43% Coverage (23 out of 244 NCs) • 47.83% Accuracy (12 out of 23 wrong)
CoNLL'05: Nakov&Hearst
Acc. Cov.
Acc. Cov.
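The MEDLINE figures above follow from simple ratios: coverage is the share of compounds for which any decision was made, and accuracy is the share of those decisions that were correct. The helper name `percent` is an assumption for this small check.

```python
def percent(part, whole):
    """Percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


total_compounds = 244
covered = 23
wrong = 12

coverage = percent(covered, total_compounds)   # 23 of 244 compounds
accuracy = percent(covered - wrong, covered)   # 11 of 23 correct
print(coverage, accuracy)  # 9.43 47.83
```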
30
Application to Other Syntactic Problems
31
Syntactic Application 1: Prepositional Phrase Attachment
HLT-EMNLP'05: Nakov&Hearst
(a) Peter spent millions of dollars. (noun)
(b) Peter spent time with his family. (verb)
Can be represented as a quadruple: (v, n1, p, n2)
(a) (spent, millions, of, dollars)
(b) (spent, time, with, family)
• Accuracy – Surface features & paraphrases: 83.63% – Best unsupervised (Lin&Pantel’00): 84.30%
• Human performance – quadruple: 88% – whole sentence: 93%
32
PP Attachment: n-gram models
• (i) Pr(p|n1) vs. Pr(p|v) • (ii) Pr(p,n2|n1) vs. Pr(p,n2|v)
– I eat/v spaghetti/n1 with/p a fork/n2. – I eat/v spaghetti/n1 with/p sauce/n2.
HLT-EMNLP'05: Nakov&Hearst
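Model (ii) can be sketched by estimating each conditional probability as a ratio of counts, Pr(p,n2|x) ≈ #(x p n2) / #(x). The `count` callback, the toy counts, and the tie handling are assumptions for illustration; the slide does not specify how the probabilities are estimated.

```python
def attach(count, v, n1, p, n2):
    """Decide noun vs. verb attachment for the quadruple (v, n1, p, n2).

    count(phrase) is a hypothetical callback returning the frequency
    of a phrase; a real system would use n-gram or Web counts.
    """
    def ratio(head):
        denom = count(head)
        return count(f"{head} {p} {n2}") / denom if denom else 0.0

    pr_noun = ratio(n1)   # estimate of Pr(p, n2 | n1)
    pr_verb = ratio(v)    # estimate of Pr(p, n2 | v)
    if pr_noun == pr_verb:
        return None       # tie: no decision (assumption)
    return "noun" if pr_noun > pr_verb else "verb"


# Toy counts, invented purely for illustration.
TOY = {
    "spaghetti": 1000, "eat": 5000,
    "spaghetti with sauce": 50, "eat with sauce": 2,
    "spaghetti with fork": 1, "eat with fork": 40,
}


def toy_count(phrase):
    return TOY.get(phrase, 0)


print(attach(toy_count, "eat", "spaghetti", "with", "sauce"))
print(attach(toy_count, "eat", "spaghetti", "with", "fork"))
```

Model (i) would drop `n2` from both ratios, which is why it cannot separate the two spaghetti sentences on its own.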
33
PP Attachment: Web-derived Surface Features
• Example features, with (accuracy, coverage):
– open the door / with a key → verb (100.00%, 0.13%)
– open the door (with a key) → verb (73.58%, 2.44%)
– open the door – with a key → verb (68.18%, 2.03%)
– open the door , with a key → verb (58.44%, 7.09%)
– eat Spaghetti with sauce → noun (100.00%, 0.14%)
– eat ? spaghetti with sauce → noun (83.33%, 0.55%)
– eat , spaghetti with sauce → noun (65.77%, 5.11%)
– eat : spaghetti with sauce → noun (64.71%, 1.57%)
[Diagram: sum the evidence for each attachment, then compare]
HLT-EMNLP'05: Nakov&Hearst
34
PP Attachment Paraphrases (1)
(1) v n1 p n2 → v n2 n1 (noun)
• Turn “n1 p n2” into a noun compound “n2 n1” – meet/v demands/n1 from/p customers/n2 → meet/v the customer/n2 demands/n1
HLT-EMNLP'05: Nakov&Hearst
35
PP Attachment Paraphrases (2)
(2) v n1 p n2 → v p n2 n1 (verb)
• Swap direct and indirect objects: – had/v a program/n1 in/p place/n2 → had/v in/p place/n2 a program/n1
HLT-EMNLP'05: Nakov&Hearst
36
PP Attachment Paraphrases (3)
(3) v n1 p n2 → p n2 * v n1 (verb)
• Look for apposition of “p n2” – I gave/v an apple/n1 to/p him/n2 → (It was) to/p him/n2 (that) I gave/v an apple/n1
HLT-EMNLP'05: Nakov&Hearst
37
PP Attachment Paraphrases (4)
(4) v n1 p n2 → n1 p n2 v (noun)
• Look for apposition of “n1 p n2” – shaken/v confidence/n1 in/p markets/n2 → confidence/n1 in/p markets/n2 shaken/v
HLT-EMNLP'05: Nakov&Hearst
38
PP Attachment Paraphrases (5)
(5) v n1 p n2 → v PRONOUN p n2 (verb)
• Substitute n1 with a pronoun (him, her) – put/v a client/n1 at/p odds/n2 → put/v him at/p odds/n2
HLT-EMNLP'05: Nakov&Hearst
39
PP Attachment Paraphrases (6)
(6) v n1 p n2 → BE n1 p n2 (noun)
• Substitute v with is/are/was/were, e.g. – eat/v spaghetti/n1 with/p sauce/n2 → is spaghetti/n1 with/p sauce/n2
HLT-EMNLP'05: Nakov&Hearst
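The six paraphrase patterns can be collected into one generator that maps a quadruple to query strings and the attachment each one supports. This is a simplified sketch: the function name is hypothetical, and it ignores the determiners, inflection changes (customers → customer), and optional words that the slide examples include.

```python
def pp_paraphrases(v, n1, p, n2):
    """Return (query pattern, predicted attachment) pairs for (v, n1, p, n2).

    '*' stands for a wildcard slot, as in pattern (3) on the slides.
    """
    patterns = [
        (f"{v} {n2} {n1}", "noun"),          # (1) noun compound "n2 n1"
        (f"{v} {p} {n2} {n1}", "verb"),      # (2) swap the two objects
        (f"{p} {n2} * {v} {n1}", "verb"),    # (3) "p n2" moved to the front
        (f"{n1} {p} {n2} {v}", "noun"),      # (4) "n1 p n2" before the verb
    ]
    for pronoun in ("him", "her"):           # (5) replace n1 with a pronoun
        patterns.append((f"{v} {pronoun} {p} {n2}", "verb"))
    for be in ("is", "are", "was", "were"):  # (6) replace v with a form of BE
        patterns.append((f"{be} {n1} {p} {n2}", "noun"))
    return patterns


for query, label in pp_paraphrases("meet", "demands", "from", "customers"):
    print(f"{label:5s} {query}")
```

Counts for the "noun" and "verb" queries would then be summed and compared, in the same way as for the surface features.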
40
Syntactic Application 2: Noun Compound Coordination & Ellipsis
• Penn Treebank
– ellipsis: (NP bar/NN and/CC pie/NN graph/NN)
– no ellipsis: (NP (NP president/NN) and/CC (NP chief/NN executive/NN))
• Accuracy – Surface features & paraphrases: 80.61%
HLT-EMNLP'05: Nakov&Hearst
Real-world coordinations can be more complex: The Department of Chronic Diseases and Health Promotion leads and strengthens global efforts to prevent and control chronic diseases or disabilities and to promote health and quality of life.
41
NP Coordination: N-gram models
(n1,c,n2,h)
• (i) #(n1,h) vs. #(n2,h) • (ii) #(n1,h) vs. #(n1,c,n2)
HLT-EMNLP'05: Nakov&Hearst
42
NP Coordination: Surface Features
[Diagram: sum the evidence for each reading, then compare]
HLT-EMNLP'05: Nakov&Hearst
43
NP Coordination Paraphrases (1)
(1) n1 c n2 h → n2 c n1 h (ellipsis)
• Swap n1 and n2 – bar/n1 and/c pie/n2 graph/h → pie/n2 and/c bar/n1 graph/h
HLT-EMNLP'05: Nakov&Hearst
44
NP Coordination Paraphrases (2)
(2) n1 c n2 h → n2 h c n1 (NO ellipsis)
• Swap n1 and n2 h – president/n1 and/c chief/n2 executive/h → chief/n2 executive/h and/c president/n1
HLT-EMNLP'05: Nakov&Hearst
45
NP Coordination Paraphrases (3)
(3) n1 c n2 h → n1 h c n2 h (ellipsis)
• Insert the elided head h – bar/n1 and/c pie/n2 graph/h → bar/n1 graph/h and/c pie/n2 graph/h
HLT-EMNLP'05: Nakov&Hearst
46
NP Coordination Paraphrases (4)
(4) n1 c n2 h → n2 h c n1 h (ellipsis)
• Insert the head h; also switch n1 and n2 – bar/n1 and/c pie/n2 graph/h → pie/n2 graph/h and/c bar/n1 graph/h
HLT-EMNLP'05: Nakov&Hearst
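The four coordination paraphrases can be generated mechanically from the quadruple (n1, c, n2, h). The function name and the label strings are assumptions for this sketch; the patterns themselves follow the slides.

```python
def coordination_paraphrases(n1, c, n2, h):
    """Return (paraphrase, predicted reading) pairs for 'n1 c n2 h'."""
    return [
        (f"{n2} {c} {n1} {h}", "ellipsis"),         # (1) swap n1 and n2
        (f"{n2} {h} {c} {n1}", "no ellipsis"),      # (2) swap n1 and "n2 h"
        (f"{n1} {h} {c} {n2} {h}", "ellipsis"),     # (3) insert the elided head
        (f"{n2} {h} {c} {n1} {h}", "ellipsis"),     # (4) insert head and swap
    ]


for text, reading in coordination_paraphrases("bar", "and", "pie", "graph"):
    print(f"{reading:12s} {text}")
```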
47
More Applications (1): Bracketing the Penn Treebank NPs
• Augmenting the Penn Treebank
ACL’07: Adding Noun Phrase Structure to the Penn Treebank, David Vadas and James R. Curran
• Constituency Parsing – 0.5% drop in F1
• But – useful for QA, etc. – helps fix the CCG-bank
48
More Applications (2): Search Engine Query Segmentation
• Query Segmentation – [ tumor suppressor protein ] – [ tumor suppressor ] [ protein ] – [ tumor ] [ suppressor protein ] – [ tumor ] [ suppressor ] [ protein ]
• Bracketing
[ [ tumor suppressor ] protein ]
[ tumor [ suppressor protein ] ]
EMNLP'07: Learning Noun Phrase Query Segmentation Shane Bergsma and Qin Iris Wang
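The four segmentations listed above are exactly the 2^(n-1) ways of placing or omitting a break between adjacent words. The enumeration below is a generic sketch under that observation, not the learning method of the cited paper; the function name is an assumption, and the input is assumed to be non-empty.

```python
from itertools import product


def segmentations(words):
    """Enumerate all ways to split a non-empty word list into contiguous segments."""
    results = []
    # One boolean per gap between adjacent words: True means "break here".
    for breaks in product([False, True], repeat=len(words) - 1):
        segments = [[words[0]]]
        for word, is_break in zip(words[1:], breaks):
            if is_break:
                segments.append([word])
            else:
                segments[-1].append(word)
        results.append(segments)
    return results


for seg in segmentations(["tumor", "suppressor", "protein"]):
    print(" ".join("[ " + " ".join(s) + " ]" for s in seg))
```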
49
More Applications (3): Full Syntactic Parsing
ACL’11: Web-Scale Features for Full-Scale Parsing, Mohit Bansal and Dan Klein
• Learn features for – all head-argument structures – individual attachments (not competing pairs)
• Generalize over POS
• Use Google 1T 5-grams instead of Web
• Error reduction – dependency parser: 7% (MSTParser) – constituency parser: 9.2% (Berkeley parser) – re-ranker: 3.4%
For constituency parsing, improvement is due to – 40% affinity – 60% paraphrases
50
Noun Compound Semantics
51
Noun Compound Semantics
• Typically, choose one abstract relation
– Fixed set of abstract relations (Girju&al.,2005) • malaria mosquito → CAUSE • olive oil → SOURCE
– Prepositions (Lauer,1995) • malaria mosquito → with • olive oil → from
• Proposed approach: use multiple paraphrasing verbs
– Paraphrasing verbs • malaria mosquito → carries, spreads, causes, transmits, brings, has • olive oil → comes from, is obtained from, is extracted from
– Distribution over paraphrasing verbs
ACL'08: Nakov&Hearst
52
Extracting Paraphrasing Verbs Using a Linguistic Paraphrasing Pattern
• Given “malaria mosquito”, query Google for
“mosquito THAT * malaria”
• Extract verbs: 23 carry, 16 spread, 12 cause, 9 transmit, 7 bring, 4 have, 3 be infected with, 3 infect with, 2 give
post-modifier
pre-modifier
53
Comparing to Girju&al. (2005)
shown are 14 out of 21 relations
54
Amazon’s Mechanical Turk: Malaria Mosquito
• 10 human judges: – 8 carries – 4 causes – 2 transmits – 2 is infected with – 2 infects with – 1 has – 1 gives – 1 spreads – 1 supplies – …
• The program: – 23 carry – 16 spread – 12 cause – 9 transmit – 7 bring – 4 have – 3 be infected with – 3 infect with – 2 give
On 250 noun-noun compounds and 25-30 human judges: 32% cosine correlation
SemEval-2010 task 9: The Interpretation of Noun Compounds Using Paraphrasing Verbs and Prepositions
C. Butnariu, Su Nam Kim, P. Nakov, D. Ó Séaghdha, S. Szpakowicz, T. Veale
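A cosine comparison between the human and the program verb distributions can be sketched with the “malaria mosquito” counts from this slide. Several things here are assumptions: the human verbs are lemmatized by hand so they line up with the program's verbs, only the entries shown on the slide are used (the human list is truncated with “…”), and this single-compound value is not the 32% figure, which is reported over 250 compounds.

```python
import math

# Human judgments from the slide, hand-normalized to lemmas (assumption).
HUMAN = {
    "carry": 8, "cause": 4, "transmit": 2, "be infected with": 2,
    "infect with": 2, "have": 1, "give": 1, "spread": 1, "supply": 1,
}

# Verbs extracted by the program, as listed on the slide.
PROGRAM = {
    "carry": 23, "spread": 16, "cause": 12, "transmit": 9, "bring": 7,
    "have": 4, "be infected with": 3, "infect with": 3, "give": 2,
}


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


print(round(cosine(HUMAN, PROGRAM), 3))
```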
55
Relational Componential Analysis
• Classic componential analysis
• Relational componential analysis
56
Noun Compound Semantics Using Abstract Relations
[colon cancer] [[tumor suppressor] protein] ABSTRACT RELATIONS: [ [colon cancer]/LOCATION [ [tumor suppressor]/PURPOSE protein]/AGENT ]/LOCATION
57
Noun Compound Semantics Using Prepositional Paraphrases
[colon cancer] [[tumor suppressor] protein] PREPOSITIONS:
{ {protein that is a {suppressor of tumors} }
in {cancer of/in the colon}
}
58
Noun Compound Semantics Using Paraphrasing Verbs
[colon cancer] [[tumor suppressor] protein]
VERBS:
{ {protein that acts as a {suppressor that inhibits tumors} }
which is implicated in {cancer that occurs in the colon}
} prevent/stop/keep from
developing/growing/arising
59
Free Paraphrasing of Noun Compounds: Going Beyond Verbs and Prepositions
“onion tears”
tears from onions
tears due to cutting onion
tears induced when cutting onions
tears that onions induce
tears that come from chopping onions
tears that sometimes flow when onions are chopped
tears that raw onions give you
SemEval-2013 task 4: Free Paraphrases of Noun Compounds
C. Butnariu, I. Hendrickx, S. N. Kim, Z. Kozareva, P. Nakov, D. Ó Séaghdha, S. Szpakowicz, T. Veale
60
Application to Other
Semantic Tasks
61
The V+P+C Semantic Vector
• For “noun1 noun2”, query: "noun2 * noun1" "noun1 * noun2"
• Extract: – V: verbs – P: prepositions – C: coordinating conjunctions
committee member
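A minimal sketch of how such a vector could be assembled from text snippets returned for the two wildcard queries. The tiny word lists below are hypothetical stand-ins for a real POS tagger and lemmatizer, and the function name `vpc_vector`, the `max_gap` window, and the feature keys are assumptions made for illustration; the slides do not describe the actual extraction pipeline at this level of detail.

```python
import re
from collections import Counter

# Hypothetical toy lexicons; a real system would use a POS tagger instead.
VERBS = {"is", "appoints", "joins", "chairs"}
PREPOSITIONS = {"of", "on", "in", "for", "from", "to"}
CONJUNCTIONS = {"and", "or"}


def vpc_vector(noun1, noun2, snippets, max_gap=5):
    """Count verbs (V), prepositions (P) and coordinating conjunctions (C)
    that occur between the two nouns, in either order."""
    features = Counter()
    for snippet in snippets:
        tokens = re.findall(r"[a-z]+", snippet.lower())
        # Mirror the two query patterns: "noun2 * noun1" and "noun1 * noun2".
        for left, right, direction in ((noun2, noun1, "n2_n1"),
                                       (noun1, noun2, "n1_n2")):
            for i, token in enumerate(tokens):
                if token != left:
                    continue
                # The wildcard must cover at least one word, hence i + 2.
                for j in range(i + 2, min(i + 2 + max_gap, len(tokens))):
                    if tokens[j] != right:
                        continue
                    for word in tokens[i + 1:j]:
                        if word in VERBS:
                            features[("V", word, direction)] += 1
                        elif word in PREPOSITIONS:
                            features[("P", word, direction)] += 1
                        elif word in CONJUNCTIONS:
                            features[("C", word, direction)] += 1
                    break  # use only the nearest occurrence of `right`
    return features
```

Keeping the direction in the feature key is a design choice of this sketch: it lets "member of the committee" and "committee appoints member" contribute different features for the same pair.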
62
Semantic Application 1: Predicting Levi’s 12 Recoverably Deletable Predicates
• Accuracy (212 noun-noun compounds) – V+P+C: 50.0%±6.7% (baseline: 19.6%)
ACL'08: Nakov&Hearst
63
Semantic Application 2: SAT Analogy Questions
• Accuracy (174 noun:noun examples) – LRA : 67.4%±7.1% – V+P+C : 71.3%±7.0%
ACL'08: Nakov&Hearst
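One way to picture how a V+P+C vector could be applied to an analogy question: build a vector for the stem pair and for each candidate pair, then pick the candidate whose vector is closest to the stem's. The sketch below assumes cosine similarity as the comparison measure and uses made-up feature counts; the slide does not say which measure was used, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def answer_analogy(stem_vector, candidate_vectors):
    """Return the index of the candidate pair most similar to the stem pair."""
    scores = [cosine(stem_vector, candidate) for candidate in candidate_vectors]
    return max(range(len(scores)), key=scores.__getitem__)
```

When several candidates tie, this sketch returns the first one; a real system would need an explicit tie-breaking or abstention policy.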
64
Semantic Application 3: Relations Between Complex Nominals
• Accuracy – Task #4 winner : 66.0% – V+P+C : 67.0% – + web-based argument generalization : 71.3%
SemEval-2007 task 4: Classification of semantic relations between nominals
R. Girju, P. Nakov, V. Nastase, S. Szpakowicz, P. Turney, D. Yuret
Follow-up: SemEval-2010 task 8: Multi-Way Classification of Semantic Relations Between Pairs of Nominals
I. Hendrickx, Su Nam Kim, Z. Kozareva, P. Nakov, D. Ó Séaghdha, S. Padó, M. Pennacchiotti, L. Romano, S. Szpakowicz
ACL'08: Nakov&Hearst RANLP'11: Nakov&Kozareva
65
Semantic Application 4: 30 Head-Modifier Relations
• Accuracy (600 examples) – LRA : 39.8%±3.8% – V+P+C : 40.5%±3.9%
ACL'08: Nakov&Hearst
66
Semantic Application 5: Textual Entailment
“WTO Geneva headquarters” =
“headquarters of the WTO are located in Geneva”
(1) Geneva headquarters of the WTO
(2) WTO headquarters are located in Geneva
67
Application
to Machine Translation
68
Statistical Machine Translation: Trained on Parallel Text
69
Noun Compounds in Phrase-based SMT
English: After the oil price hikes of 1974 and 1980, Japan's economy recovered through export growth.
Spanish: Después de las alzas en los precios del petróleo de 1974 y 1980, la economía nipona se recuperó a través del crecimiento basado en las exportaciones.
Idea: paraphrase the source phrase to increase coverage
oil price hikes → alzas en los precios del petróleo
hikes in oil prices → alzas en los precios del petróleo
hikes in prices of oil → alzas en los precios del petróleo
hikes in prices for oil → alzas en los precios del petróleo
hikes in the prices of oil → alzas en los precios del petróleo
hikes in the prices for oil → alzas en los precios del petróleo
70
Paraphrasing a Source-Language Sentence
Pair each new sentence with the original translation, thus generating a synthetic corpus. Train an SMT system on it.
Improvement: equivalent to 33-50% of what could be achieved by doubling the amount of training data.
ECAI'08: Nakov
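The pairing step can be sketched roughly as follows. Everything here is illustrative: `PARAPHRASES` is a hypothetical lookup table standing in for the actual paraphrase generator, the function names are made up, and keeping the original sentence pair in the output is an assumption of this sketch rather than something the slide states.

```python
# Hypothetical paraphrase table; the real system generates paraphrases
# from syntax and Web statistics rather than from a fixed list.
PARAPHRASES = {
    "oil price hikes": ["hikes in oil prices", "hikes in the prices of oil"],
}


def toy_paraphrase(sentence):
    """Return source-side variants of a sentence using the lookup table."""
    variants = []
    for phrase, alternatives in PARAPHRASES.items():
        if phrase in sentence:
            variants.extend(sentence.replace(phrase, alt) for alt in alternatives)
    return variants


def build_synthetic_corpus(parallel_pairs, paraphrase):
    """Pair every paraphrased source sentence with the original translation."""
    corpus = []
    for source, target in parallel_pairs:
        corpus.append((source, target))  # assumption: original pair is kept
        for variant in paraphrase(source):
            if variant != source:
                corpus.append((variant, target))
    return corpus
```

The resulting list of (source, target) pairs would then be passed to an ordinary SMT training pipeline; only the source side changes, so no new translations are needed.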
Looking forward to at least two papers on noun compounds in MT at this MWE’14:
• German Compounds and Statistical Machine Translation. Can they get along? Carla Parra Escartín, Stephan Peitz and Hermann Ney
• Paraphrasing Swedish Compound Nouns in Machine Translation. Edvin Ullman and Joakim Nivre
71
Paraphrasing the Phrase Table
ECAI'08: Nakov
72
Paraphrasing NPs & Noun Compounds
purely syntactic
use Web statistics
ECAI'08: Nakov
73
Paraphrasing Noun Compounds
• Split the noun compound – N1=“beef”, N2=“import ban lifting” – N1=“beef import”, N2=“ban lifting” – N1=“beef import ban”, N2=“lifting”
• lt=word before • rt=word after
ECAI'08: Nakov
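The splitting step above amounts to trying every boundary between adjacent words. A minimal sketch (the function name `split_points` is hypothetical, and whitespace tokenization is an assumption):

```python
def split_points(compound):
    """Enumerate every way to cut a noun compound into a left part N1
    and a right part N2 at a word boundary."""
    words = compound.split()
    return [
        (" ".join(words[:i]), " ".join(words[i:]))
        for i in range(1, len(words))
    ]
```

For “beef import ban lifting” this yields the three (N1, N2) splits listed on the slide; a compound of n words always has n-1 candidate splits.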
74
Summary
75
Summary
• Syntactic Tasks – Noun Compound Syntax – Prepositional Phrase Attachment – Noun Compound Coordination – Full syntactic parsing, etc.
• Semantic Tasks – Noun Compound Semantics – Predicting
• Abstract Semantic Relations • Relations Between Complex Nominals • Head-Modifier Relations
– Solving SAT Analogy Problems
• Application to a Real-World Task – Machine Translation
Tapped the potential of very large corpora for corpus linguistics by going beyond the n-gram:
• surface markers • paraphrases • linguistic knowledge
76
Some Useful Tools and Resources
• Yahoo! BOSS
• Google 1T 5-gram corpus
• Microsoft Web N-gram services
• IBM Web Fountain
• WaCKy
• Sketch engine
77
Future Directions
78
The Big Dream (2001: A Space Odyssey)
Dave Bowman: “Open the pod bay doors, HAL”
HAL 9000: “I’m sorry Dave. I’m afraid I can’t do that.”
79
Semantics: Revolution is Needed?
• If we want the dream to come true, we should – not rely on superficial statistics alone – get to the meaning of text
• A revolution in semantics is needed
– looking at words is not enough – we need better models for
• multi-word expressions (~70% of terminology) • semantic relations (meaning is in the links!)
• Key elements (in my opinion) – Web-scale corpora – linguistic knowledge – paraphrases
“Moving Lexical Semantics from Alchemy to Science”
Recent discussion on [Corpora-List]
• This is what Chomsky has done with syntax. • Should we expect the same for lexical semantics?
80
Semantics: Community Efforts
• Evaluations on shared corpora
– SemEval (18 tasks in 2015: fragmentation or community expansion?) – Shared tasks at *SEM and workshops
• Special journal issues – Computational Linguistics, LRE, JNLE, etc.
• Workshops – Really, really fragmented!
• MWE, RELMS, Disco, GEMS, TextInfer,…
– But now we also have *SEM! – And established workshops such as MWE:
• 2-day, 10 years old, … • MWE section of SIGLEX
81
The Future?
Three words: Web, paraphrases, linguistics