ISOCAT: Proposed solutions for Problems encountered in DUELME-LMF. Jan Odijk, Nijmegen, 21 Sep 2010.
ISOCAT Proposed solutions for
Problems encountered in DUELME-LMF
Jan Odijk
Nijmegen 21 Sep 2010
Overview
• General
• Standardized DCs?
• Multiple relevant DCs in ISOCAT
• Overlap with other projects
• Container Data Categories
• Almost Identical DCs
• Language Sections
• Existing Tagsets
General
• Always try to map to an existing ISOCAT DC,
  – Where possible
  – Irrespective of whether the ISOCAT DC is part of an official standard
• If not possible, or if there is uncertainty
  – Create a new DC, but
  – Also specify the relation with existing closely related ISOCAT DCs. Provide
    • Type of the relation
      – dropdown list to be provided by RELCAT developers,
        » E.g. equals, almost-equals, is hyponym of, is hyperonym of, etc.
    • Textual clarification of the deviation
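The mapping rule on this slide can be sketched as a small decision function. This is only an illustration under stated assumptions: the names `RelationType`, `Candidate`, `MappingDecision` and `decide` are hypothetical, and the relation types are just the examples listed above, not the final RELCAT list.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class RelationType(Enum):
    # Example relation types from the slide; the real list is to come from RELCAT.
    EQUALS = "equals"
    ALMOST_EQUALS = "almost-equals"
    HYPONYM_OF = "is hyponym of"
    HYPERONYM_OF = "is hyperonym of"


@dataclass
class Candidate:
    """An existing ISOCAT DC judged to be related to the local category."""
    pid: str                  # ISOCAT PID of the existing DC
    relation: RelationType    # type of the relation
    clarification: str = ""   # textual clarification of the deviation


@dataclass
class MappingDecision:
    map_to: Optional[str]     # PID to map to, if an identical DC exists
    create_new: bool          # True when a new DC has to be created
    relations: List[Candidate] = field(default_factory=list)


def decide(candidates: List[Candidate]) -> MappingDecision:
    """Map to an identical DC where possible; otherwise create a new DC
    and record its relations to the closely related existing DCs.
    Standard status of the existing DC is deliberately not consulted."""
    exact = [c for c in candidates if c.relation is RelationType.EQUALS]
    if exact:
        return MappingDecision(map_to=exact[0].pid, create_new=False)
    return MappingDecision(map_to=None, create_new=True, relations=list(candidates))
```

Whether a candidate counts as "equals" remains a human judgement; the function only records the consequence of that judgement.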
General
• Relation to be entered into Relation Registry (RR) as soon as it is available
• Temporarily proposed notation:
  – recordset in CSV format with records consisting of 4 fields:
    • Relation type (from drop-down list; should be ISOCAT DCs themselves)
    • Data-category 1 (ISOCAT PID)
    • Data-category 2 (ISOCAT PID)
    • Clarification (rich text)
    • Plus some administrative info: User id, creation date etc.
  – To import into RR as soon as available
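A minimal sketch of what such a CSV recordset could look like, using Python's standard `csv` module. The column names, the `RelationRecord` class, the user id and the PID `DC-0000` are assumptions made for illustration; the slides fix only the four fields and the existence of some administrative info.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class RelationRecord:
    relation_type: str   # from the drop-down list, e.g. "is hyponym of"
    dc1_pid: str         # Data-category 1 (ISOCAT PID)
    dc2_pid: str         # Data-category 2 (ISOCAT PID)
    clarification: str   # free-text clarification of the deviation
    user_id: str         # administrative info (assumed column)
    creation_date: str   # administrative info, ISO 8601 date (assumed column)


FIELDNAMES = [f.name for f in fields(RelationRecord)]


def write_records(records, out):
    """Write the records as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


def read_records(src):
    """Read records back, e.g. for a later import into the RR."""
    return [RelationRecord(**row) for row in csv.DictReader(src)]


example = RelationRecord(
    relation_type="is hyponym of",
    dc1_pid="http://www.isocat.org/datcat/DC-2704",
    dc2_pid="http://www.isocat.org/datcat/DC-0000",  # hypothetical placeholder PID
    clarification='Polish-specific "noun"; narrower than a general Noun DC.',
    user_id="example-user",                          # hypothetical user id
    creation_date="2010-09-21",
)

buffer = io.StringIO()
write_records([example], buffer)
buffer.seek(0)
round_tripped = read_records(buffer)
```

Letting the `csv` module handle quoting matters here because the clarification field is free text and may itself contain commas or quotation marks.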
Standardized DCs?
• Ignore +/- standard status of DC in ISOCAT
• If needed, use relations in Relation Registry
Multiple ISOCAT DCs
• Map to an existing DC that is identical (wherever possible)
• Use relations to relate it to almost identical DCs in ISOCAT
Overlap with other projects
• Consult with other projects
• Registry of topics people/projects are working on
  – Dieter took some initiative
  – http://spreadsheets.google.com/ccc?key=0Al5Lw-npZ6ZTdDZlT2VjeGhwZm5iRW5IM3BTZFI5WEE&hl=en&authkey=CL_Wl4ID
• This workshop (and others if needed)
Container data categories
• ISOCAT might be extended for this
• Probably not really a problem in the short term(?)
Almost identical DCs
• For ill-defined DCs in ISOCAT
  – Suggest better definitions and submit them to the Thematic Domain Group
  – Use relations to relate your DC to existing slightly different DCs (see later)
Almost identical DCs
• Example: Noun
• Noun is a Part of Speech assigned to words which share specific morphosyntactic (inflectional), morphological, syntactic (and semantic) properties
  – morphosyntactic (inflectional) properties:
    • person, number, gender/class, declension class, case, …
    • specific morphological combinatorial potential (derivation, compounding), in particular diminutives, augmentatives
    • specific syntactic combinatorial potential
• Where each language selects a specific subset of these properties (as illustrated in the language sections).
Language Sections?
• The highly language-specific (Polish) DC
  – http://www.isocat.org/datcat/DC-2704 (noun)
• Noun [subst] contains lexemes inflecting for number and case, with a lexically determined grammatical gender, which do not have the category of person, e.g., woda 'water', profesor 'professor', pięciokrotność 'fivefoldness'; this class also contains defective plurale tantum and singulare tantum lexemes, but not depreciative lexemes. Grammatical categories of noun [subst]: number (http://www.isocat.org/datcat/DC-2709), case (http://www.isocat.org/datcat/DC-2720), gender (http://www.isocat.org/datcat/DC-2728).
• Can now be part of the Polish language section of the DC Noun with the definition given in the previous slide
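The idea of one general DC with language-specific sections could be modelled roughly as follows. This is a sketch, not ISOCAT's actual data model: the classes `DataCategory` and `LanguageSection`, the method `definition_for`, and the shortened definition texts are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LanguageSection:
    language: str                 # language code, e.g. "pl"
    definition: str               # language-specific definition
    category_pids: List[str] = field(default_factory=list)  # properties this language selects


@dataclass
class DataCategory:
    name: str
    definition: str               # general, language-independent definition
    language_sections: Dict[str, LanguageSection] = field(default_factory=dict)

    def definition_for(self, language: str) -> str:
        """Return the language-specific definition if one exists,
        otherwise fall back to the general definition."""
        section = self.language_sections.get(language)
        return section.definition if section else self.definition


noun = DataCategory(
    name="noun",
    definition="Part of Speech assigned to words sharing specific "
               "morphosyntactic, morphological and syntactic properties.",
)

# The Polish-specific definition becomes a language section of the general DC.
noun.language_sections["pl"] = LanguageSection(
    language="pl",
    definition="Lexemes inflecting for number and case, with a lexically "
               "determined grammatical gender, without the category of person.",
    category_pids=[
        "http://www.isocat.org/datcat/DC-2709",  # number
        "http://www.isocat.org/datcat/DC-2720",  # case
        "http://www.isocat.org/datcat/DC-2728",  # gender
    ],
)
```

With this layout a tool asking for the Polish reading of Noun gets the specific definition, while any language without its own section falls back to the general one.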
Existing Tagsets
• Make sure all DCs of an existing de facto standard tag set are in ISOCAT
  – Either existing DCs
  – Or newly added DCs
• Assign all DCs from such a tag set to a new closed complex category
  – E.g. DC d-coiTagset, ipipanTagset, etc.
  – (and/or to datacategory set?)
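The two steps above (check that every tag has a DC, then group the DCs in a closed category) might be sketched like this. The class `ClosedDataCategory`, both helper functions, the tag `hypotheticalTag` and the PID `DC-0000` are hypothetical; only `ipipanTagset` and `DC-2704` come from the slides.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class ClosedDataCategory:
    """A closed category whose value domain is a fixed set of DC PIDs."""
    identifier: str
    value_domain: FrozenSet[str]

    def allows(self, pid: str) -> bool:
        return pid in self.value_domain


def tags_missing_from_registry(tag_to_pid: Dict[str, Optional[str]]) -> List[str]:
    """List the tags that do not yet have a DC (PID is None)."""
    return sorted(tag for tag, pid in tag_to_pid.items() if pid is None)


def build_tagset_category(identifier: str,
                          tag_to_pid: Dict[str, Optional[str]]) -> ClosedDataCategory:
    """Build the closed category only once every tag has a DC."""
    missing = tags_missing_from_registry(tag_to_pid)
    if missing:
        raise ValueError(f"tags without a DC: {missing}")
    return ClosedDataCategory(identifier, frozenset(tag_to_pid.values()))


ipipan_tags: Dict[str, Optional[str]] = {
    "subst": "http://www.isocat.org/datcat/DC-2704",
    "hypotheticalTag": None,   # illustrative tag with no DC yet
}
missing = tags_missing_from_registry(ipipan_tags)

# After a new DC has been added for the missing tag (placeholder PID):
ipipan_tags["hypotheticalTag"] = "http://www.isocat.org/datcat/DC-0000"
ipipan_tagset = build_tagset_category("ipipanTagset", ipipan_tags)
```

Refusing to build the category while tags are missing mirrors the order on the slide: first complete the coverage in ISOCAT, then close the set.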
More…
• Problems and proposed solutions
  – Odijk, J. (2009), "Data Categories and ISOCAT: some remarks from a simple linguist", presentation held at FLaReNet/CLARIN Standards Workshop, Helsinki, September 27, 2009
  – Odijk, J. (2010), "Relations between Data Categories", presentation held at the CLARIN Relation Registry Workshop, MPI, Nijmegen, January 8, 2010
• Both to be found (inter alia) on http://www.clarin.nl/node/80
CLARIN-NL
Thanks for your attention!
http://www.clarin.nl/