Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.
Multiword Expressions and LMF
Jan Odijk
PARSEME Workshop
Iaşi, 21-22 Sep 2015
Overview
• MWEs
• Lexical Representation of MWEs
• DuELME
• DuELME and LMF
• Extensions
• Summary
Overview
• MWEs
• Lexical Representation of MWEs
• DuELME
• DuELME and LMF
• Extensions
• Summary
What is an MWE?
• MWE = Multiword Expression
• Focus is on MWEs in an NLP context
What is an MWE?
• sequence of words
• that has linguistic (lexical, orthographic, phonological, morphological, syntactic, semantic, pragmatic) or translational properties
• not predictable from the individual component words and the normal rules for combining them
What is an MWE?
• sequence of
– Not necessarily contiguous in a concrete utterance
• ...omdat hij de plaat wilde poetsen
• …because he the plate wanted polish
• ‘…because he bolted’
– Not necessarily always in the same order in each utterance
• Hij poetste gisteren de plaat
• He polished yesterday the plate
• ‘he bolted yesterday’
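The two properties above (non-contiguity and variable order) can be illustrated with a minimal sketch. It assumes the utterances have already been lemmatized; the lemma lists and the function name `contains_mwe` are hand-written for illustration and are not the output of any particular parser.

```python
def contains_mwe(utterance_lemmas, mwe_lemmas):
    """Return True if every MWE component lemma occurs in the utterance.

    Order and adjacency are deliberately ignored; a real system would also
    check the syntactic relations between the components.
    """
    remaining = list(utterance_lemmas)
    for lemma in mwe_lemmas:
        if lemma not in remaining:
            return False
        remaining.remove(lemma)  # consume one occurrence per component
    return True


# Hand-lemmatized versions of the two example utterances (an assumption).
subordinate = ["omdat", "hij", "de", "plaat", "willen", "poetsen"]
main_clause = ["hij", "poetsen", "gisteren", "de", "plaat"]
mwe = ["de", "plaat", "poetsen"]

print(contains_mwe(subordinate, mwe), contains_mwe(main_clause, mwe))
```

A check like this only finds candidates: it cannot tell the idiomatic reading from the literal one.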
What is an MWE?
• words
– Ambiguity between type and token (intentional)
– Inflected word form v. lemma (both are needed)
– Ambiguity between
• Character sequences separated from other character sequences by spaces and other separators (Narrow interpretation)
– Bibliotheekzaal v. library hall
– Compounds in Dutch, German, Norwegian, Swedish are NOT included
– Compounds in English are included (parts separated by space)
• Abstract lexical units of the grammar (Broad interpretation)
– Dutch, German compounds ARE included if they meet the other criteria
What is an MWE?
• that has linguistic (lexical, orthographic, phonological, morphological, syntactic, semantic, pragmatic) or translational properties not predictable from the individual components and the normal rules for combining them
What is an MWE?
• the normal rules for combining them
– Assumptions about this must be made explicit
• In some cases they are not known
– For each concrete NLP-system: the rules of that NLP-system
What is an MWE?
• Whether a word sequence is an MWE is an empirical hypothesis (or, in NLP, a proposed engineering solution)
• Intuitions about the status of expressions as MWEs have limited validity
• MWE-status must be argued for (or against)
– Using the definition as a guide
Types of MWEs (I)
• Fixed
• Semi-flexible
• Flexible
Fixed MWEs
• Fixed MWEs
– Words of the MWE in a fixed order
– No variation in lexical item choice
– Always contiguous (no other elements in between)
– No inflectional processes except at the edges
Fixed MWEs
• Fixed MWEs
– ad hoc, stante pede, ter plaatse
– Hong Kong, Kuala Lumpur, New York, San Francisco
– credit card, travel agency, real estate agency
• NOT
– in plaats van (cf. in plaats daarvan) (‘instead of’)
– carta telefonica (cf. carte telefoniche)
– de plaat poetsen (‘polish the plate’, ‘bolt’)
Semi-Flexible MWEs
• Semi-Flexible MWEs
– MWEs with fixed order of elements
– That are impenetrable for other words
– Parts can be inflected
Semi-Flexible MWEs
• Examples:
– Chambre des représentants
• House of representatives
– Patatas fritas
• French fries
– Mise au point automatique
• Autofocus
– Calculateur analogique
• Analogue computer
Semi-Flexible MWEs
• Examples:
– Cité plus haut
• Above-stated
– Résistant aux acides
• Acid-proof
– Malade en altitude
• Airsick
Flexible MWEs
• Flexible MWEs
• Allow or require inflection in multiple parts, and
• Allow permutations of subphrases, or
• Allow intrusion by other phrases, or
• Have controlled variation (bound pronouns)
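The three-way distinction between fixed, semi-flexible and flexible MWEs can be written down as a small decision procedure. This is a simplified sketch, not a definition from the talk: the class `MweBehaviour`, its field names and the function `classify` are assumptions, and the rule treats any one of permutation, intrusion or controlled variation as sufficient for "flexible".

```python
from dataclasses import dataclass


@dataclass
class MweBehaviour:
    internal_inflection: bool = False   # parts other than the edges can inflect
    permutation: bool = False           # subphrases may change order
    intrusion: bool = False             # other phrases may intervene
    controlled_variation: bool = False  # e.g. a bound pronoun


def classify(behaviour: MweBehaviour) -> str:
    """Map observed behaviour to one of the three flexibility types."""
    if (behaviour.permutation or behaviour.intrusion
            or behaviour.controlled_variation):
        return "flexible"
    if behaviour.internal_inflection:
        return "semi-flexible"
    return "fixed"


print(classify(MweBehaviour()))                                   # ad hoc
print(classify(MweBehaviour(internal_inflection=True)))           # carta telefonica
print(classify(MweBehaviour(internal_inflection=True,
                            permutation=True, intrusion=True)))   # de plaat poetsen
```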
Flexible MWEs
– de plaat poetsen (‘bolt’)
• Hij heeft gisteren de plaat gepoetst
• …omdat hij de plaat wilde poetsen
• Hij poetste gisteren de plaat
– But of course not just anything:
• *Hij gepoetst plaat de heeft
• *..omdat wilde poetsen hij de plaat
– to lose one’s temper
• He lost his temper
• She lost her temper
Types of MWEs (II)
• Idioms
• Semi-idioms
• Support-verb constructions
Types of MWEs (II)
• Idioms
– Meaning not predictable from the components
– The components have no or an unpredictable meaning
– Fixed (or very limited) lexical item selection
Types of MWEs (II)
• Idioms
– Non-transparent
• de plaat poetsen, kick the bucket, casser sa pipe
• Many restrictions on syntactic behavior (see handout example (4))
Types of MWEs (II)
• Idioms
– Semi-transparent
• een bok schieten
– Bok (male goat) = blunder (but only with schieten)
– Schieten (shoot) = make (but only with bok)
• dat varkentje wassen
– Varkentje (little pig) = problem (only with wassen)
– Wassen (wash) = address, take care of (only with varkentje)
– Few restrictions on syntactic behaviour. See handout example (5)
Types of MWEs (II)
• Semi-idioms (collocations)
– One element occurs in its normal meaning
– The lexical selection of the other element is fixed or very limited
– The other element has a special meaning
– Very few restrictions on syntactic behaviour. See handout example (6)
Types of MWEs (II)
• Semi-idioms (collocations)
– Examples
• Zware / *sterke tabak (heavy / *strong tobacco) ‘strong tobacco’
• Scherpe kritiek (sharp criticism) ‘severe criticism’
• Heavy / *strong smoker
Types of MWEs (II)
• Support verb constructions
– Type I
• Direct object + verb
• Verb idiosyncratically determined by the direct object head noun
• Arguments of the noun often realized outside the NP in the VP. See handout (8), (9)
Types of MWEs (II)
• Support verb constructions
– Type I Examples
• Een poging wagen ‘dare an attempt’
• Een lezing houden / geven ‘hold / give a lecture’
• With hebben ‘have’: see handout (7)
• To pay attention to (aandacht schenken aan)
• To take advantage of
Types of MWEs (II)
– Type II
• Predicative complement (AP, PP)
– often itself idiomatic
– expressing a state or property
• Combination with intransitive or transitive verb is idiosyncratic
Types of MWEs (II)
| Predicate | Literal gloss | Intransitive verb | Meaning |
| --- | --- | --- | --- |
| In de war | In the tangle | Zijn / raken / *gaan / *komen / *zitten | Confused (of humans) |
| In de war | In the tangle | *zijn / *raken / *gaan / komen / zitten | Entangled, mixed-up |
| In zijn nopjes | In his studs-DIM | Zijn / raken / *gaan / *komen / *zitten | Delighted |
| De pijp uit | The pipe out | Zijn / *raken / gaan / *komen / *zitten | Dead |

Verb glosses: zijn / raken / gaan / komen / zitten = be / get / go / come / sit
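Because the verb choice in Type II constructions is idiosyncratic, it has to be listed per predicate and per meaning. A minimal sketch of such a listing, using only the combinations marked acceptable in the table; the names `ALLOWED_VERBS` and `licensed` are illustrative assumptions, not part of DuELME.

```python
# Acceptable intransitive verbs per (predicate, meaning), taken from the table.
ALLOWED_VERBS = {
    ("in de war", "confused (of humans)"): {"zijn", "raken"},
    ("in de war", "entangled, mixed-up"): {"komen", "zitten"},
    ("in zijn nopjes", "delighted"): {"zijn", "raken"},
    ("de pijp uit", "dead"): {"zijn", "gaan"},
}


def licensed(predicate, meaning, verb):
    """Return True if the verb is listed for this predicate in this meaning."""
    return verb in ALLOWED_VERBS.get((predicate, meaning), set())


print(licensed("in de war", "confused (of humans)", "raken"))
print(licensed("in de war", "confused (of humans)", "zitten"))
```

Keying on the meaning as well as the predicate is what lets the two readings of "in de war" select different verbs.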
Overview
• MWEs
• Lexical Representation of MWEs
• DuELME
• DuELME and LMF
• Extensions
• Summary
Lexical representation
• Focus on flexible MWEs
• Lexical representation for (grammar-based) NLP systems
• NLP:
– A sequence of words that is an MWE must be parsed / generated
– A sequence of words that is an MWE must be recognized as an MWE
– And mapped to the appropriate semantics / translation
Lexical representation
• Flexibility
– Can be accounted for by assuming a syntactic structure for an MWE
– Is usually identical to the syntactic structure of the literal expression
• So there is no problem parsing or generating sequences of strings that are MWEs
– Syntactic structure canNOT be determined automatically by an NLP system (ambiguities)
Lexical representation
• Flexibility (cont.)
– Restrictions on flexibility must follow from general principles or additional MWE-specific properties
– But: the syntactic structure is of course highly framework / theory / implementation-dependent
Lexical representation
• Examples of syntactic structures in
– Lexical Functional Grammar (LFG): (1)
– Tree Adjoining Grammar (TAG): (2)
– M-Grammar: (3)
Lexical representation
• Syntactic structure contains references to lexical items from the lexicon used in the NLP-system
– Otherwise it cannot be parsed / generated correctly
– And the lexical properties must be correct!
• Inflection
• Syntactic and semantic selection
– Extremely framework / grammar / implementation-dependent!
Lexical representation
• Summary: MWE lexical representation
– Syntactic structure compatible with NLP-system
– Correct references to lexical items in the NLP-system’s lexicon corresponding to the MWE components
– Maximally framework / theory / implementation-independent
Overview
• MWEs
• Lexical Representation of MWEs
• DuELME
• DuELME and LMF
• Extensions
• Summary
DuELME
• Dutch Electronic Lexicon of Multiword Expressions
• Approx. 5000 entries
• MWEs of different types:
– Mostly flexible idioms
– Collocations (semi-idioms)
– Mostly verbal
• Focus on syntax
DuELME
• Maximally theory-neutral:
– (parameterized) Equivalence Class Method (ECM):
• Method to lexically represent MWEs
• Procedure to incorporate MWEs thus represented into a concrete NLP system
• See
– Odijk 2004a, 2004b, 2013a, 2013b
– Grégoire 2010
DuELME Lexical Representation
• Lexical Entries
– MWEs with the same syntactic structure
• by means of an MWE pattern id
– Components: sequence of their lemmas
• Any order but the same order within one pattern
– Example sentence
• Identical syntactic structure for each example in one equivalence class
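The entry layout described above can be sketched as a small data structure in which entries sharing a pattern id form one equivalence class. The class name `LexicalEntry`, its fields, the pattern id "P1" and the example sentences are all illustrative assumptions, not the actual DuELME schema or data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LexicalEntry:
    pattern_id: str              # identifies the equivalence class
    components: Tuple[str, ...]  # lemmas, in the order fixed by the pattern
    example: str                 # example sentence with the pattern's structure


def equivalence_classes(entries):
    """Group entries by pattern id; each group is one equivalence class."""
    classes = defaultdict(list)
    for entry in entries:
        classes[entry.pattern_id].append(entry)
    return dict(classes)


# Hand-written illustrations; "P1" is a hypothetical pattern id.
entries = [
    LexicalEntry("P1", ("de", "plaat", "poetsen"), "Hij heeft de plaat gepoetst"),
    LexicalEntry("P1", ("een", "bok", "schieten"), "Hij heeft een bok geschoten"),
]

for pattern_id, members in equivalence_classes(entries).items():
    print(pattern_id, [" ".join(m.components) for m in members])
```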
DuELME Lexical Representation
• MWE Pattern descriptions
– MWE pattern id
– Description (free text)
DuELME Lexical Representation
• DuELME is a proto-lexicon
– Lexical resource from which a lexicon can be derived automatically or semi-automatically
– By a well-defined procedure
• Link to DuELME description
• Search GUI, User Documentation
• Metadata
• Product and license
Incorporation Procedure
• Incorporation in some NLP system
• Assumes the NLP system contains a parser
• For each MWE pattern P do
– Bootstrap part
• Contains some manual actions
– Repeat part (for each MWE of pattern P)
• Fully automatic
• Procedure and example (no parameters)
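The control flow of the incorporation procedure can be sketched as follows. Only the two-part loop (one bootstrap per pattern, then an automatic step per MWE) comes from the slide; the function names, the callable interface and the toy stand-ins below are assumptions made for illustration, and a real bootstrap step would involve a parser and manual inspection.

```python
def incorporate(entries_by_pattern, bootstrap, instantiate):
    """Derive system-specific entries for every MWE, pattern by pattern.

    bootstrap(pattern_id, first_entry) stands for the once-per-pattern step
    that needs manual work; instantiate(template, entry) stands for the
    fully automatic step repeated for each MWE of that pattern.
    """
    lexicon = []
    for pattern_id, entries in entries_by_pattern.items():
        if not entries:
            continue
        template = bootstrap(pattern_id, entries[0])      # bootstrap part
        for entry in entries:                             # repeat part
            lexicon.append(instantiate(template, entry))
    return lexicon


# Toy stand-ins (hypothetical): the template is just a tuple of slot names.
def toy_bootstrap(pattern_id, first_entry):
    return ("det", "noun", "verb")


def toy_instantiate(template, entry):
    return dict(zip(template, entry))


entries_by_pattern = {
    "P1": [("de", "plaat", "poetsen"), ("een", "bok", "schieten")],
}
lexicon = incorporate(entries_by_pattern, toy_bootstrap, toy_instantiate)
print(lexicon)
```

The point of the split is cost: manual effort scales with the number of patterns, not with the number of MWEs.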
Further properties
• DuELME does contain models for syntactic structures
– Based on de facto standard for Dutch
– Used in Alpino, LASSY, CGN treebanks
• DuELME assumes the parameterized ECM
• Encodes several lexical properties
– Auxiliary used for perfect tenses (conjugation)
– Negative and positive polarity (polarity)
– Gender of nouns in an MWE
– …
Further properties
• MWEs have been extracted from corpora
– After automatic parsing with Alpino
– Using a variety of statistical and (morpho-)syntactic measures
• Corpus statistics have been included in DuELME
– E.g., for een rol spelen ‘play a role’, tuple= rol spelen, freq=1612
• Number of ‘rol’: mor1: "sg 1563,pl 49,"
• Dim form of ‘rol’: dim1: "nodim 1612,"
• Det with ‘rol’: Det1: "een 918,de 311,die 98,zijn 48,NO 44,deze 38,geen 36,hun 31,welk 20,haar 19,"
• Ten example sentences from these corpora have been included for each MWE
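The corpus statistics shown for "een rol spelen" are comma-separated "label count" pairs with a trailing comma. A minimal sketch of reading such a field into a dictionary, assuming exactly the format of the strings quoted above; the function name `parse_counts` is an illustrative choice.

```python
def parse_counts(field):
    """Parse a field like "sg 1563,pl 49," into {"sg": 1563, "pl": 49}."""
    counts = {}
    for item in field.split(","):
        item = item.strip()
        if not item:              # skip the empty piece after the trailing comma
            continue
        label, _, number = item.rpartition(" ")
        counts[label] = int(number)
    return counts


number = parse_counts("sg 1563,pl 49,")
determiners = parse_counts(
    "een 918,de 311,die 98,zijn 48,NO 44,deze 38,geen 36,hun 31,welk 20,haar 19,"
)

# The singular and plural counts add up to the tuple frequency of 1612.
print(sum(number.values()))
# Share of occurrences with the determiner "een".
print(round(determiners["een"] / 1612, 2))
```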
Overview
• MWEs
• Lexical Representation of MWEs
• DuELME
• DuELME and LMF
• Extensions
• Summary
DuELME and LMF
• LMF
– Abstract metamodel for computational lexicons
– Represented through UML class diagrams
– Multiple serialisation options
• DuELME-LMF
– UML class model created for DuELME
– Serialized in XML
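To give an impression of what an XML serialisation of such a class model might look like, here is a minimal sketch built with Python's standard `xml.etree.ElementTree`. The `feat` elements with `att` and `val` attributes follow the style of the informative LMF DTD; the attribute names "expression" and "patternId", the value "P1" and the helper `feat` are assumptions for illustration and do not reproduce the actual DuELME-LMF schema.

```python
import xml.etree.ElementTree as ET


def feat(parent, att, val):
    """Attach an attribute-value pair as a <feat att="..." val="..."/> child."""
    return ET.SubElement(parent, "feat", att=att, val=val)


lexicon = ET.Element("Lexicon")
entry = ET.SubElement(lexicon, "LexicalEntry")
feat(entry, "expression", "de plaat poetsen")  # hypothetical attribute name
feat(entry, "patternId", "P1")                 # hypothetical pattern id

xml_text = ET.tostring(lexicon, encoding="unicode")
print(xml_text)
```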
DuELME Class Model
DuELME Lexicon
• Lexicon
– Lexical Entry 0..*
– MWE Pattern 0..*
• MWE Pattern
– MWE Pattern attributes
– MappingList
– MWE Node
• (see the example MWE and pattern in the handout)
DuELME and LMF
• DuELME-LMF v. LMF
– Compare DuELME Class Model with LMF Core Package
– Compare DuELME Class Model with LMF NLP MWE patterns extension (normative)
LMF Core Package
LMF NLP MWE extension
DuELME and LMF
• DuELME Class Model v. LMF Core Package
– no Lexical Resource and Global Information
• This is an error
– Lexical Entry: no Form Class (but LMF requires one)
• Not needed for MWEs
• Not desirable for components of MWEs since DuELME is a proto-lexicon
LMF Core Package
DuELME and LMF
• DuELME Class Model v. LMF NLP MWE Extension
– Richer but compatible:
• DataRecords: corpus-derived information
• ExampleSentence
• Alternative Components in ComponentList
• MWE Pattern
LMF MWE Pattern Example
Overview
• MWEs
• Lexical Representation of MWEs
• DuELME
• DuELME and LMF
• Extensions
• Summary
NOT in DuELME
• Meaning
• Semantic selection restrictions
• Translation
Meaning
• MWEs are described as a special kind of Lexical Entry
• Sense class, and all its dependents, can be used as with single word lexical entries
LMF Core Package
Meaning
• For collocations and semi-transparent idioms the meaning of each part?
– Zware shag (lit. heavy tobacco, ‘strong tobacco’) -> zwaar-a-3 shag-n-1
– Varkentje wassen (lit. pig-DIM wash) -> varkentje-n-1, wassen-v-7
– Flater slaan (lit. blunder hit) -> flater-n-1 slaan-v-10
• (Sense IDs from Cornetto or should be added to Cornetto)
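The per-component sense assignments above amount to a mapping from an MWE to a list of sense IDs of the form lemma-pos-number. A minimal sketch, using only the three examples from the slide; the names `COMPONENT_SENSES` and `split_sense_id` are illustrative assumptions, and no claim is made about how Cornetto itself exposes these IDs.

```python
# Component sense IDs as given on the slide.
COMPONENT_SENSES = {
    "zware shag": ["zwaar-a-3", "shag-n-1"],
    "varkentje wassen": ["varkentje-n-1", "wassen-v-7"],
    "flater slaan": ["flater-n-1", "slaan-v-10"],
}


def split_sense_id(sense_id):
    """Split "slaan-v-10" into ("slaan", "v", 10)."""
    lemma, pos, number = sense_id.rsplit("-", 2)
    return lemma, pos, int(number)


for mwe, senses in COMPONENT_SENSES.items():
    print(mwe, [split_sense_id(s) for s in senses])
```

Splitting from the right keeps lemmas that themselves contain a hyphen intact.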
Meaning
• And how they are combined(?)
– Or maybe this follows from their syntactic manner of combination?
• LMF makes no specific provisions for this
• Perhaps by adding an MWE in the other languages’ lexicons (‘address problem’)
Semantic selection restrictions
• DuELME already specifies
– Syntactic variables, and syntactic selection restrictions
– Semantic variables, and semantic selection restrictions
– Their mutual relation
• But not linked to Sense
– This should be adapted
DuELME Class Model
Translation
• Elements for Translation in the Multilingual Notations Model ([ISO 08] Annex I, J, p. 48ff)
• Supports semantics based translation, possibly interlingual, and transfer
• Relations between entries from lexicons of different languages
• Can be adopted straightforwardly for MWEs in DuELME
Translation
Overview
• MWEs
• Lexical Representation of MWEs
• DuELME
• DuELME and LMF
• Extensions
• Summary
Summary
• DuELME
– Lexical entries for MWEs
– With focus on syntax
• Almost no semantics
• No translational equivalence
– Still very incomplete
• Lacks many syntactic restrictions (e.g. passivisation)
• Semantic restrictions mostly not specified
Summary
• DuELME
– Encoded in LMF
• But some improvements are needed
• Proposes some deviations
– Explicit Semantics:
• only partly (ISOCAT, CLARIN Concept Registry)
• not formally encoded in the schema yet
Summary
• DuELME
– Highly theory-neutral but
• Specifically aimed at NLP systems with an explicit grammar
• Some parts are highly Dutch-specific
THANKS FOR YOUR ATTENTION
References
[Grégoire, 2010] Nicole Grégoire. DuELME: A Dutch electronic lexicon of multiword expressions. Journal of Language Resources and Evaluation, 44(1/2):23-40, 2010.
[ISO 08] ISO. Language Resource Management – Lexical Markup Framework (LMF), ISO working document ISO/TC 37/SC 4 N453, ISO FDIS 24613:2008, 2008.
[Odijk, 2004a] Jan Odijk. Reusable lexical representations for idioms. In LREC-2004, number III, pages 903-906, Lisbon, Portugal, May 26-28, 2004. ELRA.
[Odijk, 2004b] Jan Odijk. A proposed standard for the lexical representation of idioms. In Geoffrey Williams and Sandra Vessier, editors, EURALEX 2004 Proceedings, volume I, pages 153-164, Lorient, France, July 6-10, 2004. Université de Bretagne Sud.
[Odijk, 2013a] Jan Odijk. DuELME: Dutch electronic lexicon of multiword expressions. In G. Francopoulo, editor, LMF - Lexical Markup Framework, pages 133-144. ISTE / Wiley, London, UK / Hoboken, US, 2013.
[Odijk, 2013b] Jan Odijk. Identification and lexical representation of multiword expressions. In P. Spyns and J.E.J.M. Odijk, editors, Essential Speech and Language Technology for Dutch. Results by the STEVIN-programme, Theory and Applications of Natural Language Processing, pages 201-217. Springer, Berlin/Heidelberg, 2013.
[Zonneveld, 1978] Wim Zonneveld. A Formal Theory of Exceptions in Generative Phonology. Foris Publ., Dordrecht, 1978.
DO NOT ENTER HERE
DuELME Lexicon
• Lexical Entry (see also the example)
– Lexical Entry attributes
– List of Components
– DataRecords
– Example Sentence
– List of SyntacticVariables
– List of SemanticVariables
– List of SynSemVar Maps
DuELME Lexicon
• List of Components
– {Component}
– Component attributes to express the parameters
– Lemma with attributes for the writtenform and the (separable) particle
DuELME Class Model
DuELME Lexicon
• Example Sentence
– Full sentence and a tokenized version
DuELME Class Model
DuELME Lexicon
• DataRecords
– For tuples identified as candidate MWEs
– Contains statistics on occurring arguments, modifiers, determiners, morphosyntactic properties, etc.
– Formally structured but not in the class model, hence not in XML
– Tuple =/= MWE
DuELME Class Model
79
![Page 80: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/80.jpg)
DuELME Lexicon
• List of SyntacticVariables
– Syntactic open slots and restrictions
– Restrictions: syntactic selection
– E.g. HETVP, VP, NOHETSSUB, …
• List of SemanticVariables
– Semantic open slots and restrictions
– Restrictions: a limited number of semantic selection restrictions
– E.g. ANIM, NONANIM, FEM PL, …
80
![Page 81: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/81.jpg)
DuELME Lexicon
• List of SynSemVar Maps
– Relates syntactic and semantic open slots
• Analogous to the NLP syntax and NLP Semantics extensions [ISO 08, pp 32, 38]
81
![Page 82: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/82.jpg)
DuELME Class Model
82
![Page 83: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/83.jpg)
DuELME Lexicon
• Lexical Entry attributes
– Expression (text)
– PatternId (text)
– Type: collocation or unspecified
– [Conjugation]: H (have), Z (be) or B (both)
– [Comments] (text)
– [Polarity]: NPI or PPI
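As an illustration only, the attribute list above could be modelled as a small record type. This is a sketch under assumptions: the class name, field names and the example values are invented here and are not DuELME's actual XML element names; bracketed attributes on the slide are treated as optional.

```python
from dataclasses import dataclass
from typing import Optional

# Value sets as listed on the slide; all Python names here are hypothetical.
ENTRY_TYPES = {"collocation", "unspecified"}
CONJUGATIONS = {"H", "Z", "B"}  # H = have, Z = be, B = both
POLARITIES = {"NPI", "PPI"}


@dataclass
class LexicalEntryAttributes:
    expression: str                     # Expression (text)
    pattern_id: str                     # PatternId (text)
    entry_type: str = "unspecified"     # Type
    conjugation: Optional[str] = None   # [Conjugation], optional
    comments: Optional[str] = None      # [Comments], optional
    polarity: Optional[str] = None      # [Polarity], optional

    def __post_init__(self) -> None:
        # Reject values outside the closed sets given on the slide.
        if self.entry_type not in ENTRY_TYPES:
            raise ValueError(f"unknown Type: {self.entry_type!r}")
        if self.conjugation is not None and self.conjugation not in CONJUGATIONS:
            raise ValueError(f"unknown Conjugation: {self.conjugation!r}")
        if self.polarity is not None and self.polarity not in POLARITIES:
            raise ValueError(f"unknown Polarity: {self.polarity!r}")


# Illustrative values only; the pattern id is made up for this sketch.
entry = LexicalEntryAttributes("de plaat poetsen", "PATTERN_X", conjugation="H")
```

Validating in `__post_init__` keeps the closed value sets (Type, Conjugation, Polarity) in one place, which mirrors how the slide presents them.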
83
![Page 84: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/84.jpg)
DuELME Class Model
84
![Page 85: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/85.jpg)
DuELME Lexicon
• MWE Pattern attributes
– ID
– Description
– [comments]
• MappingList
– Needed to relate actual example to tree model
• MWE Node
– Used to define the syntactic tree model
85
![Page 86: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/86.jpg)
DuELME Class Model
86
![Page 87: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/87.jpg)
Lexical
• Lexical
– De plaat poetsen ‘the plate polish’
• NOT any synonym:
– Poetsen: afnemen-v-4, doen-v-8, kuisen-v-2, reinigen-v-1, schoonmaken-v-1
– Plaat: afbeelding-n-1, plaatje-n-4, plaatje-n-6, draaischijf-n-1, grammofoonplaat-n-1, bank-n-3, schol-n-3
– Een poging wagen / doen / *maken
– *dare / *do / make an attempt
– Perdre la tête / la boule / *la cervelle
– Se creuser la tête / *la boule / la cervelle
87
![Page 88: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/88.jpg)
Orthographic
• Orthographic
– viz., Bijv., i.v.m., http://www.uilots.nl
– Yahoo! , Groen!
– Aujourd’hui (v. l’homme)
– ‘s (avonds/morgens/middags)
• D-gen evening-gen / morning-gen / afternoon-gen
• In the evenings / mornings / afternoons
• Is dependent on the tokenization rules (cf. the normal rules of combining them)
88
![Page 89: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/89.jpg)
Phonological
• Intervocalic /d/ deletion, normally optional, is obligatory in some MWEs [Zonneveld 1978]
89
| Expression | Literal | Meaning |
|---|---|---|
| Over de rooie / *rode (gaan/zijn/raken) | Over the red / red (go/be/get) | Lose one’s cool |
| Om de dooie / *dode donder niet | For the dead / dead thunder not | Absolutely not |
| Je niet in de kouwe / *koude kleren gaan zitten | You not in the cold clothes go sit | Affect you seriously |
| Een gouwe / *gouden ouwe / *oude | A gold old | A classical music hit |
![Page 90: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/90.jpg)
Morphological
90
| Phenomenon | Example | Literal | Meaning |
|---|---|---|---|
| Obl. diminutive | Het lood*(je) leggen | The lead-DIM lay | ‘die’ |
| Obl. diminutive | Dat varken*(tje) wassen | That pig-DIM wash | ‘address that problem’ |
| Obl. plural | De *raap is / rapen zijn gaar | The turnip is / turnips are cooked | ‘there is trouble’ |
| Exceptional morphology | Van goeden huize | Of good-EN house-E | From good homes |
| Exceptional morphology | Zonder aanzien des persoons | Without regard the-GEN person-GEN | Without respect of persons |
![Page 91: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/91.jpg)
Syntactic
91
| Syntax | Example | Literal | Meaning |
|---|---|---|---|
| Obl. indefinite | (*de) rekening houden met | (*the) count keep with | ‘take into account’ |
| Obl. no -e suffix | Het bijvoeglijk(*e) naamwoord (v. het klein*(e) meisje) | The adjectival nominal (v. the little girl) | ‘the adjective’ (v. ‘the little girl’) |
| Exceptional government | Ten gevolg*(e) van v. als gevolg(*e) van | To consequence of v. as consequence of | ‘as a consequence of’ |
![Page 92: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/92.jpg)
Semantic
92
| Expression | Literal | Meaning |
|---|---|---|
| De plaat poetsen | Polish the plate | ‘bolt’ |
| Dat varkentje wassen | Wash that little pig | ‘address that problem’ |
| Een bok schieten | Shoot a goat | ‘make a blunder’ |
| Een flater slaan | Hit a blunder | ‘make a blunder’ |
![Page 93: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/93.jpg)
Pragmatic
• Pragmatic
– Ladies and Gentlemen
– Ik heb gezegd. (lit. I have said)
– Eet smakelijk! (Bon appétit!, Enjoy!)
– Sincerely yours
93
![Page 94: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/94.jpg)
Translational
• Translational properties
94
| Expression | Literal | Translation |
|---|---|---|
| Laten zien | Let see | E. show, F. montrer |
| Witte wijn | White wine | P. vinho verde |
| Nuclear power plant | | D. atoomcentrale, G. Kernkraftwerk |
| Space probe | | F. Sonde spatiale |
| Iemand iets laten weten | Someone something let know | E. inform someone of something |
![Page 95: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/95.jpg)
The normal rules
• Example: MWE?
– iemand een zoen geven
– Someone a kiss give
– ‘give someone a kiss’
• Productively related
– van iemand een zoen krijgen
– From someone a kiss get
– ‘be kissed by someone’
95
![Page 96: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/96.jpg)
The normal rules
• Instead of zoen-n-1 one can also have other words meaning ‘body touch’
• kus-n-1 and its hyponyms
– lik-n-4, smak-n-3, smok-n-1, afscheidskus-n-1, kushandje-n-1, french kiss-n-1, tongkus-n-1, tongzoen-n-1, doodskus-n-1, nachtkus-n-1, nachtzoen-n-1, klapzoen-n-1, smakker-n-1, voetkus-n-1, vredekus-n-1, vredeskus-n-1, handkus-n-1, judaskus-n-1, zuigzoen-n-1
• liefkozing-n-1, ‘caress’
• Words meaning ‘kick’, ‘slap’ and other forms of ‘body touching’
• schop-n-1, trap-n-2, fleer-n-1, haal-n-2, klap-n-2, muilpeer-n-1, opflikker-n-1, peer-n-4, klets-n-3, mep-n-1, pats-n-2, pets-n-1, tik-n-1, tikje-n-2, duw-n-1, zet-n-1, zetje-n-1, por-n-1, stoot-n-1, schouderduw-n-1, kontje-n-2, bodycheck-n-1, schop-n-1, trap-n-2, doodschop-n-1, hakje-n-1, kukkel-n-1
• knietje
96
![Page 97: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/97.jpg)
The normal rules
• But not:
– aanraking-n-2, contact-n-1, gefriemel-n-1, gefrunnik-n-1, gepriegel-n-1, aanslag-n-5, steek-n-1, touche-n-3, betasting-n-1, kneep-n-1, handtastelijkheid-n-2, aanraking-n-1, beroering-n-2, gewelddadigheid-n-1, geweldpleging-n-1, molest-n-1, molestatie-n-1, bal-n-7, schot-n-2
– (meaning ‘touch’, ‘contact’, etc.)
• And unclear:
– lik-n-1, aai-n-1, streling-n-1
– (‘lick’, ‘caress’, ‘caress’)
97
![Page 98: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/98.jpg)
The normal rules
• Describe such constructions by means of properties of the verbs geven and krijgen?
– Preferable given its productive nature
– Only if we can characterize the relevant words by means of independently required properties
• NLP context
– We might invent an ad-hoc feature
– But are there resources with this feature? (not Dutch Wordnet (Cornetto))
98
![Page 99: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/99.jpg)
Reflexive Verbs
• Example
– Hij schaamt *(zich)
– He ashamed REFL
– ‘he is ashamed’
• Analysis
– Schamen: reflexivity=true
– Rule that spells out the right reflexive pronoun
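The analysis above (a lexical property plus a spell-out rule) can be sketched as follows. This is a toy illustration, not the rule of any particular NLP system: the lexicon layout and the function name are assumptions, and the pronoun table covers only the standard unstressed Dutch reflexives.

```python
# Toy lexicon: the only MWE-like information is a property on the verb.
LEXICON = {
    "schamen": {"reflexivity": True},
    "houden": {"reflexivity": False},
}

# Reflexive pronoun by (person, number) of the subject.
REFLEXIVE_PRONOUN = {
    (1, "sg"): "me",
    (2, "sg"): "je",
    (3, "sg"): "zich",
    (1, "pl"): "ons",
    (2, "pl"): "je",
    (3, "pl"): "zich",
}


def spell_out_reflexive(lemma, person, number):
    """Return the reflexive pronoun the verb requires, or None if the verb is not reflexive."""
    if not LEXICON.get(lemma, {}).get("reflexivity", False):
        return None
    return REFLEXIVE_PRONOUN[(person, number)]
```

For a third-person singular subject, as in the example sentence, the rule yields zich.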
99
![Page 100: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/100.jpg)
Verb Particle Combinations
• Example
– Houden = ‘keep’, transitive
– Op + houden = ‘stop’, intransitive
• Analysis
– Op + houden:
• houden: particle = op, intransitive
• Rule to introduce / check presence of the right particle
– Houden: particle = _, transitive
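A minimal sketch of this analysis, assuming each reading of a verb is stored as a separate record with a `particle` property (with `None` standing in for `particle = _`). The record type and function name are hypothetical; the checking rule simply filters readings by the particle actually found in the clause.

```python
from typing import NamedTuple, Optional


class VerbReading(NamedTuple):
    lemma: str
    particle: Optional[str]  # None plays the role of "particle = _"
    transitive: bool
    gloss: str


READINGS = [
    VerbReading("houden", None, True, "keep"),
    VerbReading("houden", "op", False, "stop"),
]


def select_readings(lemma, particle_in_clause):
    """Keep only the readings whose particle property matches the clause."""
    return [
        r for r in READINGS
        if r.lemma == lemma and r.particle == particle_in_clause
    ]
```

With the particle op present only the intransitive ‘stop’ reading survives; without a particle only the transitive ‘keep’ reading does.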
100
![Page 101: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/101.jpg)
Prepositional Complements
• Example
– Houden ‘keep’ v.
– Houden van (lit. keep of, ‘love’)
• Analysis
– houden van, intransitive, takes PCOMP
• houden with property: complprep = van
• Rule to introduce / check presence of van
– Houden: complprep = _, transitive
101
![Page 102: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/102.jpg)
Inflection
• Plegen 1, regular conjugation (pleegde) ‘commit’
• Plegen 2, irregular conjugation (placht) ‘do usually’
• Hij pleegde een moord => regular conjugation
• He committed a murder
• *Hij placht een moord
102
![Page 103: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/103.jpg)
Selection
• Example 1
– Nemen 1 subcat=[subj/NP, obj/NP] ‘take’
– Nemen 2 subcat=[subj/NP, obj/NP, compl/PP] ‘accept as’
– Iets in acht nemen
– something in attention take, ‘obey’ (of rules etc.)
– Requires nemen_2
103
![Page 104: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/104.jpg)
Selection
• Example 2
– Geven ‘give’ semantically takes 3 arguments
– Syntactically: subj/NP, obj/NP, iobj/NP or PP
– Indirect object optional
– Absent indirect object still leads to an interpretation with 3 arguments
• But MWE een gil geven (lit. a cry give, ‘give a shout’) requires 2 syntactic arguments. Idem: de geest geven (the ghost give) ‘die’
104
![Page 105: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/105.jpg)
Selection
• Example 3
– Heten ‘be called’ 2 arguments
– Syntactically: subj/NP, predc/NP
– Ik heet Jan
– I am-called Jan
– But MWE iemand welkom heten (lit. someone welcome be-called, ‘welcome someone’) requires 3 syntactic arguments: subj, obj, predc
105
![Page 106: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/106.jpg)
Selection
• Many such cases with support-verb constructions
– Aandacht hebben voor, etc.
– See handout (5)
– These require special treatment
106
![Page 107: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/107.jpg)
SEQCI
• Example:
– Idiom Descriptions
• Idp30;De pijp uit gaan;Hij is de pijp uit gegaan
• Idp30;De boot in gaan;Hij is de boot in gegaan
• Idp30;Het schip in gaan;Hij is het schip in gegaan
– Idiom pattern definition
• Idp30
• Idiom headed by a verb taking a postpositional PP containing a definite singular NP and one free argument as subject
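The idiom descriptions above are simple three-field records (pattern ID, expression, example sentence). A minimal reader could look like the sketch below. The type and function names are hypothetical, and taking the lower-cased words of the expression as the idiom component list (ICL) is an assumption made for illustration.

```python
from typing import List, NamedTuple


class IdiomDescription(NamedTuple):
    pattern_id: str
    components: List[str]  # ICL: words of the expression, in order
    example: str


def parse_idiom_description(line):
    """Split one semicolon-separated idiom description into its three fields."""
    pattern_id, expression, example = (field.strip() for field in line.split(";"))
    return IdiomDescription(pattern_id, expression.lower().split(), example)


desc = parse_idiom_description("Idp30;De pijp uit gaan;Hij is de pijp uit gegaan")
```

Because all syntactic knowledge sits in the pattern (here Idp30), each new idiom only needs such a flat record.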
107
![Page 108: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/108.jpg)
SEQCI
• Incorporation Method
– Bootstrap part, once for each idiom pattern
– Repeat part, for each idiom description
108
![Page 109: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/109.jpg)
SEQCI
• Bootstrap part (‘hij is de pijp uit gegaan’)
1. Parse the example sentence of an idiom description with idiom pattern P, yielding the Reference Parse
2. Define a transformation to turn the reference parse into the idiom structure (Parse Transformation, PT)
3. Determine the list of unique IDs of the lexical items in the idiom structure for the system derived from the reference parse (Idiom Component ID List, ICIL)
4. Define a transformation to relate ICL and ICIL (Idiom Component Transformation, ICT)
5. Apply the ICT to the ICL, yielding the transformed ICL (TICL) and check that each item in it equals the base form of the corresponding element on the ICIL
109
![Page 110: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/110.jpg)
SEQCI
• Repeat part, for each idiom description I (‘hij is de boot in gegaan’)
1. Parse example sentence (Syntactic Structure)
2. Apply IPT and check identity with idiom structure modulo the lexical items
3. Select the component IDs from the parse tree, in order to obtain the ICIL
4. Apply ICT to the ICL of I, yielding the TICL
5. Check that <bf(c1),…bf(cn)> = TICL, where ICIL = <c1, …cn> (TICL check)
110
![Page 111: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/111.jpg)
SEQCI
• Advantages
– Technically simple
– As theory/grammar/implementation-independent as possible
– No need for prescribing syntactic structures
– System-specific aspects are derived from the NLP-system itself
111
![Page 112: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/112.jpg)
SEQCI: Reference Parse
Rdecl [Rperf [Rsubst(j) [Rsent [Rsubst(i) [RVP [$aV_00_ga, RPPpost [$s_prep1286700, VAR_i]], RNPdef [$aN_00_pijp]], VAR_j], RNP [$hij_PRON]]]]
112
![Page 113: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/113.jpg)
SEQCI: Idiom Structure
• IPT: Delete Rdecl, Rperf, Rsubst(j), RNP[$hij_PRON]
• D-tree for vpid30 (simplified):
Rsubst,i [RVP [$aV_00_ga, RPPpost [$s_prep1286700, VAR_i]], RNPdef [$aN_00_pijp]]
113
![Page 114: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/114.jpg)
ICIL
< $aV_00_ga, $prep1286700, $aN_00_pijp >
114
![Page 115: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/115.jpg)
ICT
ICL: <de, pijp, uit, gaan>
Must be turned into: <gaan, uit, pijp>
ICT: 1 2 3 4 => 4 3 2
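Read as a list of 1-based positions into the ICL, the ICT above is a reordering that also drops the determiner. The following sketch shows one way to apply it; the function name and the list encoding of the ICT are assumptions for illustration.

```python
def apply_ict(ict, icl):
    """Apply an ICT given as 1-based positions into the ICL.

    Positions that the ICT does not mention (here position 1, the
    determiner) are dropped from the result.
    """
    return [icl[position - 1] for position in ict]


ICT_IDP30 = [4, 3, 2]  # "1 2 3 4 => 4 3 2"
ticl = apply_ict(ICT_IDP30, ["de", "pijp", "uit", "gaan"])
```

The same ICT is reused unchanged for every idiom description that shares the pattern, for instance <de, boot, in, gaan>.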
115
![Page 116: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/116.jpg)
TICL
TICL = ICT(ICL) = ICT(<de, pijp, uit, gaan>) = <gaan, uit, pijp> = <Bf($aV_00_ga), Bf($prep1286700), Bf($aN_00_pijp)>
116
![Page 117: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/117.jpg)
Syntactic Structure
Rdecl [Rperf [Rsubst(j) [Rsent [Rsubst(i) [RVP [$aV_00_ga, RPPpost [$s_prep1286800, VAR_i]], RNPdef [$aN_00_boot]], VAR_j], RNP [$hij_PRON]]]]
117
![Page 118: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/118.jpg)
Apply IPT
Rsubst,i
[RVP
[$aV_00_ga,
RPPpost
[$s_prep1286800,
VAR_i
]
],
RNPdef [$aN_00_boot]
]
118
![Page 119: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/119.jpg)
ICIL
ICIL = <$aV_00_ga, $s_prep1286800, $aN_00_boot>
119
![Page 120: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/120.jpg)
TICL
ICT(ICL) =
ICT(<de, boot, in, gaan>)=
<gaan, in, boot>
120
![Page 121: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/121.jpg)
TICL check
<bf($aV_00_ga), bf($s_prep1286800), bf($aN_00_boot) > =
TICL = <gaan, in, boot>
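The TICL check compares the base forms of the ICIL elements with the TICL, position by position. A minimal sketch follows; the base-form table is a hypothetical stand-in for the lookup that the NLP system's own lexicon would provide, filled only with the three values shown on the slide.

```python
# Hypothetical stand-in for bf(...): in SEQCI the base forms come from
# the NLP system's lexicon, not from a hand-written table.
BASE_FORM = {
    "$aV_00_ga": "gaan",
    "$s_prep1286800": "in",
    "$aN_00_boot": "boot",
}


def ticl_check(icil, ticl, base_form=BASE_FORM):
    """Return True if <bf(c1), ..., bf(cn)> equals the TICL."""
    return [base_form[component_id] for component_id in icil] == list(ticl)


icil = ["$aV_00_ga", "$s_prep1286800", "$aN_00_boot"]
ok = ticl_check(icil, ["gaan", "in", "boot"])
```

A failed check signals that the parse picked lexical items other than those named in the idiom description, for example a different preposition.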
121
![Page 122: Multiword Expressions and LMF Jan Odijk PARSEME Workshop Iaşi, 21-22 Sep 2015 1.](https://reader036.fdocuments.us/reader036/viewer/2022062322/5697bfb91a28abf838c9fc8b/html5/thumbnails/122.jpg)
The normal rules
• Fixed combinations of open class word and closed class word
– Reflexive verbs
– Verb particle combinations
– Prepositional complements
• Described by means of a property of the open class word + special rules, so these are not MWEs in these systems
122