CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)
![Page 1: CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)](https://reader035.fdocuments.us/reader035/viewer/2022062804/56814932550346895db67824/html5/thumbnails/1.jpg)
CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)
Ineke Schuurman
Menzo Windhouwer
![Page 2: CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)](https://reader035.fdocuments.us/reader035/viewer/2022062804/56814932550346895db67824/html5/thumbnails/2.jpg)
• Issues brought up by participants
  – Which elements are to be included in ISOcat
  – (CLARIN) standards, TEI etc.
  – Type of DC
  – When to create a new DC/adapt an existing one
  – When to create several DCSs
  – Name of DC, several DCs with same name
  – How to deal with larger amounts of data
![Page 3: CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)](https://reader035.fdocuments.us/reader035/viewer/2022062804/56814932550346895db67824/html5/thumbnails/3.jpg)
What to include?
• ALL concepts dealing with linguistics/metadata
  – Van Dale EN-NE
include (overgankelijk werkwoord)
1) omvatten
2) (mede) opnemen
==> 'overgankelijk werkwoord' / 'transitive verb' is to be included, same for 'overg.ww', 'trns.v.'
• One and the same DC!
![Page 4: CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)](https://reader035.fdocuments.us/reader035/viewer/2022062804/56814932550346895db67824/html5/thumbnails/4.jpg)
What to include?
‘transitive verb’
• Several entries in ISOcat
  – DC-1405: A verb which takes a direct object; that is, a verb that expresses an action which directly affects another person or thing.
  – DC-3532: A transitive verb is a verb that takes a direct object, and describes a relation between two participants [Crystal 1997: 397; Payne 1997: 171]
  – And several more, so... which one to select?
![Page 5: CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)](https://reader035.fdocuments.us/reader035/viewer/2022062804/56814932550346895db67824/html5/thumbnails/5.jpg)
• When (not) to adopt an existing DC
  – It should ‘match’ the way you use a specific notion in your annotation scheme, application, …
  – It should come with the same profile and type
• That being said
  – Reuse a CLARIN NL/VL DC when possible (contact Ineke when such a definition is incorrect)
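
The profile/type condition above can be checked mechanically before the definitions are compared by hand. The following is a minimal sketch, not an ISOcat API: the `DataCategory` record, the `adoption_candidates` helper, the identifiers `example-1`/`example-2`, and the profile and type values are all hypothetical and only illustrate the filtering step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCategory:
    # Hypothetical in-memory record; field names are assumptions.
    identifier: str
    name: str
    profile: str
    dc_type: str
    definition: str


def adoption_candidates(candidates, profile, dc_type):
    """Keep only DCs whose profile and type both equal what the scheme needs."""
    return [
        dc for dc in candidates
        if dc.profile == profile and dc.dc_type == dc_type
    ]


# Made-up entries sharing one name; they do not describe real ISOcat DCs.
registry = [
    DataCategory("example-1", "transitive verb", "Morphosyntax", "simple", "..."),
    DataCategory("example-2", "transitive verb", "Syntax", "simple", "..."),
]

shortlist = adoption_candidates(registry, profile="Morphosyntax", dc_type="simple")
print([dc.identifier for dc in shortlist])
```

The shortlist only narrows the search. Whether a remaining definition matches your own use of the notion is still a human decision.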
![Page 6: CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)](https://reader035.fdocuments.us/reader035/viewer/2022062804/56814932550346895db67824/html5/thumbnails/6.jpg)
Same name
• Not really a problem when they are good DCs, not even when they come with the same profile
• PositivePolarity
  – In general, positive polarity refers to an assertion that contains no marker of negation [Crystal 1980: 299]. (DC-3405)
  – The property of a word or concept to express positive sentiment (myDC-xx)
• Whether you can reuse DC-3405 depends on your use of the concept!
![Page 7: CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)](https://reader035.fdocuments.us/reader035/viewer/2022062804/56814932550346895db67824/html5/thumbnails/7.jpg)
Same name
• Do not avoid reuse of a name when it is the name commonly used!
• Another type of duplicate names where one concept entails the other one:
– meewerkend voorwerp – meewerkend en belanghebbend voorwerp
– event (also called 'eventuality', and including 'state')
– event (sister of 'state')
![Page 8: CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)](https://reader035.fdocuments.us/reader035/viewer/2022062804/56814932550346895db67824/html5/thumbnails/8.jpg)
What defines a good DC?
Reusable definition
NOT
conversation (DC-2661)
  Communication event with more than two participants
mother tongue (DC-2955)
  […] a speaker’s mother tongue
![Page 9: CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)](https://reader035.fdocuments.us/reader035/viewer/2022062804/56814932550346895db67824/html5/thumbnails/9.jpg)
What defines a good DC?
Correct definition
NOT (?)
Actor (DC-4146)
  a participant in an action or process
Question: is an addressee to be considered an actor? (used in DC-4158, no proper definition yet)
![Page 10: CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)](https://reader035.fdocuments.us/reader035/viewer/2022062804/56814932550346895db67824/html5/thumbnails/10.jpg)
What defines a good DC?
Meaningful definition
NOT
annotation format (DC-2562)
  Specifies the annotation format that is used …
source language (DC-2494)
  Indicates if a language is a source language
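
Both definitions above simply repeat the name of the DC. A crude first check for this pattern can be automated. The sketch below is an assumption-laden heuristic, not part of ISOcat: the function name `repeats_own_name` is hypothetical, and it only tests whether the full name occurs as a phrase inside the definition.

```python
import re


def _words(text):
    """Lowercase the text and keep only runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def repeats_own_name(name, definition):
    """Return True when the DC name occurs as a whole phrase in its definition.

    A hit is only a flag for human review: a definition of the form
    'An X is a Y that ...' also repeats the name without being meaningless.
    """
    name_phrase = " " + " ".join(_words(name)) + " "
    definition_phrase = " " + " ".join(_words(definition)) + " "
    return name_phrase in definition_phrase


print(repeats_own_name("source language",
                       "Indicates if a language is a source language"))
print(repeats_own_name("transitive verb",
                       "A verb which takes a direct object"))
```

Padding both strings with spaces keeps the match on word boundaries, so a short name is not found inside a longer, unrelated word.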
![Page 11: CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)](https://reader035.fdocuments.us/reader035/viewer/2022062804/56814932550346895db67824/html5/thumbnails/11.jpg)
Not-so-good examples
Mother tongue (DC-2955)
  Specifies whether the language is a speaker’s mother tongue
Mother’s language (DC-4516)
  […] NOT necessarily the mother tongue […]
- There is no definition of the concept ‘mother tongue’
  (Relation with /home language/, /primary language/, /heritage language/?)
- And why ‘speaker’?
![Page 12: CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)](https://reader035.fdocuments.us/reader035/viewer/2022062804/56814932550346895db67824/html5/thumbnails/12.jpg)
Rule
Make your definition
• as general as possible
• as specific as necessary
![Page 13: CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)](https://reader035.fdocuments.us/reader035/viewer/2022062804/56814932550346895db67824/html5/thumbnails/13.jpg)
Standards
• Within ISOcat there are currently few or no standards.
Therefore:
• CLARIN NL and VL will set up their own set of ‘standardized DCs’, Ineke will be in charge, selecting new flag “recommended by CLARIN NL/VL”
![Page 14: CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)](https://reader035.fdocuments.us/reader035/viewer/2022062804/56814932550346895db67824/html5/thumbnails/14.jpg)
Standards
Another issue wrt standards 'included' in ISOcat
- Athens Core DCs (recommended by metadata/CMDI): we are currently adapting them to avoid tautologies and/or correct smaller ‘errors’
  Target language: indicates if the language is the target language
  Conversation: […] three or more participants
The same may be necessary for TEI headers etc.
![Page 15: CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)](https://reader035.fdocuments.us/reader035/viewer/2022062804/56814932550346895db67824/html5/thumbnails/15.jpg)
DC/DCS and profile
• Profiles are not added automatically, a DCS may contain elements with various profiles (although you may decide to create several DCSs) (do select proper names!)
• In case the profile you need is not yet available, contact Menzo and Ineke
![Page 16: CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)](https://reader035.fdocuments.us/reader035/viewer/2022062804/56814932550346895db67824/html5/thumbnails/16.jpg)
Part B: do’s & don’ts
Do’s:
• Create a DCS for your scheme (name project, ann.scheme, …)
• Provide clear definition (short, to the point) for your scheme, application, …
• Take care not to leave concepts used in your definition undefined or vague
• Use appropriate vocabulary (per profile)
• Check ‘adopted’ DCs regularly till standardization!
![Page 17: CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)](https://reader035.fdocuments.us/reader035/viewer/2022062804/56814932550346895db67824/html5/thumbnails/17.jpg)
Do’s (continued)
When creating a DC, fill out
• Justification: used in XYZ, part of tagset N
• Language section
  – Always English language section
  – Strong recommendation: sections for object language(s), for working language manual
  – Sections in the various languages should match (+/- be translations of each other)
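
The fill-out rules above lend themselves to a small completeness check before a DC is submitted. This is a sketch under stated assumptions: the dictionary layout, the keys `justification` and `language_sections`, the language code `en`, and the helper `missing_fields` are all hypothetical and do not mirror ISOcat's actual data model.

```python
def missing_fields(dc):
    """List the required parts that a DC record (a plain dict) still lacks."""
    problems = []
    # A justification is required, e.g. "used in XYZ, part of tagset N".
    if not dc.get("justification", "").strip():
        problems.append("justification")
    # An English language section must always be present.
    if "en" not in dc.get("language_sections", {}):
        problems.append("English language section")
    return problems


complete = {
    "justification": "used in XYZ, part of tagset N",
    "language_sections": {"en": "English definition", "nl": "Dutch definition"},
}
incomplete = {"language_sections": {"nl": "Dutch definition"}}

print(missing_fields(complete))
print(missing_fields(incomplete))
```

The check covers only what is mandatory. Whether the sections in different languages really match each other still has to be verified by a reader of those languages.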
![Page 18: CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)](https://reader035.fdocuments.us/reader035/viewer/2022062804/56814932550346895db67824/html5/thumbnails/18.jpg)
Do’s (continued)
When creating a DC, fill out
• Example section
  – Note that *negative* examples may be very helpful! (jongens, mannen, not: gelovigen (is form of ADJ))
![Page 19: CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)](https://reader035.fdocuments.us/reader035/viewer/2022062804/56814932550346895db67824/html5/thumbnails/19.jpg)
Example sections
Suppose you want to illustrate a German phenomenon:
• Ex.sec. in EN language section
  – German ex with transl in English
• Ex.sec. in NL language section
  – German ex with transl in Dutch
• Ex.sec. in EN linguistic section
  – EN example
• Ex.sec. in NL linguistic section
  – NL example with translation in English
![Page 20: CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)](https://reader035.fdocuments.us/reader035/viewer/2022062804/56814932550346895db67824/html5/thumbnails/20.jpg)
Don’ts
• Confuse Language and Linguistic section
  – Latter contains language specific values for closed domains
• Be (too) language specific in definition
• Mention scheme in definition
• Use several definitions in one DC
• Circular definitions
• Rely on authority
• Rely on standardized status
  – Definition should fit YOUR scheme, etc
![Page 21: CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)](https://reader035.fdocuments.us/reader035/viewer/2022062804/56814932550346895db67824/html5/thumbnails/21.jpg)
Procedure - 1
![Page 22: CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)](https://reader035.fdocuments.us/reader035/viewer/2022062804/56814932550346895db67824/html5/thumbnails/22.jpg)
Procedure - 2