What do cats have to do with explicit semantics?
www.isocat.org
Menzo Windhouwer, MPI for Psycholinguistics
Ineke Schuurman, KU Leuven & Utrecht University
TTNWW and ISOcat
• TTNWW: TST Tools voor het Nederlands als Web services in een Workflow
• CLARIN-NL and VL pilot project
• Goal: to enable researchers in the humanities to use our tools and resources easily, even when a whole chain of tools and resources is involved.
20 January 2012 CLIN22 - TTNWW Project 2
TTNWW and ISOcat
• Issues when making use of such a ‘chain’:
– Is the meaning of notion X in resource/tool A the same as that in resource/tool B?
– Is the meaning of notion X in resource/tool A and that of Y in resource/tool B the same?
– Or, if not the same, are they related? If so, how?
=> ISOcat and friends to the rescue!
Explicit semantics
• Language resources are valuable assets
– store them in an archive to ensure persistence!
– later generations can research material that only now can still be collected
• Problem: used terminology might ‘rot’
– terms get a (slightly) different meaning over (long) periods of time
– later generations need to know what the terms mean today
• Solution: make semantics explicit
The ISOcat Data Category Registry
http://www.isocat.org/
• An ISOcat data category is “an elementary descriptor in a linguistic structure or an annotation scheme” (ISO 12620:2009)
• ISOcat data categories have unique and persistent identifiers, which can be resolved over the web
http://www.isocat.org/datcat/DC-78
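A minimal sketch of working with such identifiers, assuming only the URL pattern of the example above (`http://www.isocat.org/datcat/DC-<number>`). The helper names `datcat_pid` and `datcat_key` are hypothetical and not part of any ISOcat API.

```python
import re

# Assumption: every PID follows the pattern of the example above.
PID_PATTERN = re.compile(r"^http://www\.isocat\.org/datcat/DC-(\d+)$")


def datcat_pid(key: int) -> str:
    """Build the persistent identifier for a numeric data category key."""
    if key <= 0:
        raise ValueError("data category keys are assumed to be positive integers")
    return f"http://www.isocat.org/datcat/DC-{key}"


def datcat_key(pid: str) -> int:
    """Extract the numeric key from a PID, or raise ValueError if it does not match."""
    match = PID_PATTERN.match(pid)
    if match is None:
        raise ValueError(f"not an ISOcat data category PID: {pid!r}")
    return int(match.group(1))
```

Validating the full URL, rather than only the trailing number, catches mistyped hosts or paths before they end up in an archived resource.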
Annotate all elements in a linguistic resource
[Figure: a linguistic resource whose elements are annotated with data categories. Labels: /lexicon/, /entry/, /lemma/, /writtenForm/, /language/, /japanese/, /alphabet/, /ipa/]
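As a rough illustration of what such an annotation could look like in XML, the sketch below attaches a data category reference to each element of a tiny lexicon. Several things are assumptions: the `dcr` namespace URI, the `placeholder_pid` helper (the real keys of /lexicon/, /entry/, /lemma/ and /writtenForm/ are not given here, so `urn:example:` placeholders are used), and the sample word.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI used for the dcr: prefix.
DCR_NS = "http://www.isocat.org/ns/dcr"
ET.register_namespace("dcr", DCR_NS)


def placeholder_pid(name):
    """Hypothetical stand-in for the real persistent identifier of /name/."""
    return f"urn:example:datcat:{name}"


def annotated(tag, parent=None, text=None):
    """Create an element carrying a dcr:datcat reference named after its tag."""
    attrs = {f"{{{DCR_NS}}}datcat": placeholder_pid(tag)}
    if parent is None:
        elem = ET.Element(tag, attrs)
    else:
        elem = ET.SubElement(parent, tag, attrs)
    elem.text = text
    return elem


lexicon = annotated("lexicon")
entry = annotated("entry", lexicon)
lemma = annotated("lemma", entry)
annotated("writtenForm", lemma, "kat")  # sample content, chosen for illustration

xml_text = ET.tostring(lexicon, encoding="unicode")
print(xml_text)
```

Because every element points at a data category, a later reader can look up what /lemma/ meant even if the element name itself has drifted in meaning.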
Sharing structure
• By using ISOcat data category references, specifications of elementary descriptors can be shared between structures
• How to share (annotated) structures?
• A companion registry for ISOcat is under development: SCHEMAcat
• This registry should persistently store any kind of schema, e.g., XML schemata, EBNF grammars
Annotated CGN/DCOI grammar
tag = pos '(' feat* ')'
# @dcr:datcat 'WW' http://www.isocat.org/datcat/DC-1424
# @dcr:datcat 'TW' http://www.isocat.org/datcat/DC-1334
# @dcr:datcat 'VG' http://www.isocat.org/datcat/DC-1226
# @dcr:datcat 'TSW' http://www.isocat.org/datcat/DC-2717
pos = 'N' | 'ADJ' | 'WW' | 'TW' | 'VNW' | 'LID' | 'VZ' | 'VG' | 'BW' | 'TSW'
feat = 'NTYPE' | 'GETAL' | 'GRAAD' | 'GENUS' | 'NAAMVAL' | 'POSITIE' | 'BUIGING' | 'GETAL-N' | 'WVORM' | 'PVTIJD' | 'PVAGR' | 'NUMTYPE' | 'VWTYPE' | 'PDTYPE' | 'PERSOON' | 'STATUS' | 'NPAGR' | 'LWTYPE' | 'VZTYPE' | 'CONJTYPE' | 'SPECTYPE'
NTYPE = 'soortnaam' | 'eigennaam'
GETAL = 'enkelvoud' | 'meervoud' | 'getal'
GRAAD = 'basis' | 'comparatief' | 'superlatief' | 'diminutief'
GENUS = 'genus' | 'zijdig' | 'masculien' | 'feminien' | 'onzijdig'
NAAMVAL = 'standaard' | 'nominatief' | 'oblique' | 'bijzonder' | 'genitief' | 'datief'
POSITIE = 'prenominaal' | 'nominaal' | 'postnominaal' | 'vrij'
BUIGING = 'zonder' | 'met-e' | 'met-s'
GETAL-N = 'zonder-n' | 'meervoud-n'
WVORM = 'persoonsvorm' | 'buigbaar' | 'infinitief' | 'onvdw' | 'voltdw'
# @dcr:datcat PVTIJD http://www.isocat.org/datcat/DC-1286
# @dcr:datcat 'verleden' http://www.isocat.org/datcat/DC-1347
# @dcr:datcat 'conjunctief' http://www.isocat.org/datcat/DC-1843
PVTIJD = 'tegenwoordig' | 'verleden' | 'conjunctief'
PVAGR = 'enkelvoud' | 'meervoud' | 'met-t'
NUMTYPE = 'hoofdtelwoord' | 'rangtelwoord'
VWTYPE = 'pr' | 'persoonlijk' | 'reflexief' | 'reciprook' | 'bezittelijk' | 'vb' | 'vragend' | 'betrekkelijk' | 'exclamatief' | 'aanwijzend' | 'onbepaald'
PDTYPE = 'pronomen' | 'adv-pronomen' | 'determiner' | 'gradeerbaar'
PERSOON = 'persoon' | '1' | '2' | '2v' | '2b' | '3' | '3p' | '3m' | '3v' | '3o'
STATUS = 'vol' | 'gereduceerd' | 'nadruk'
NPAGR = 'agr' | 'evon' | 'rest' | 'evz' | 'mv' | 'agr3' | 'evmo' | 'rest3' | 'evf' | 'mv3'
LWTYPE = 'bepaald' | 'onbepaald'
VZTYPE = 'initieel' | 'versmolten' | 'finaal'
CONJTYPE = 'nevenschikkend' | 'onderschikkend'
SPECTYPE = 'afgebroken' | 'onverstaanbaar' | 'vreemd' | 'deeleigen' | 'meta' | 'commentaar' | 'achtergrond' | 'afkorting' | 'symbool' | 'dialect'
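The `# @dcr:datcat` comment lines in the grammar above can be harvested mechanically. The following is a minimal sketch, assuming each annotation sits on its own comment line in the form `# @dcr:datcat <symbol> <URL>` with optional quotes around the symbol. The function name `datcat_annotations` is hypothetical, and the sample lines follow the annotation syntax of the grammar.

```python
import re

# One annotation per line: "# @dcr:datcat <symbol> <URL>".
# The symbol may be bare or wrapped in straight or curly single quotes.
ANNOTATION = re.compile(r"^#\s*@dcr:datcat\s+['‘’]?([^'‘’\s]+)['‘’]?\s+(\S+)\s*$")


def datcat_annotations(grammar: str) -> dict:
    """Map each annotated grammar symbol to its data category reference."""
    links = {}
    for line in grammar.splitlines():
        match = ANNOTATION.match(line.strip())
        if match:
            links[match.group(1)] = match.group(2)
    return links


sample = """\
# @dcr:datcat 'WW' http://www.isocat.org/datcat/DC-1424
# @dcr:datcat PVTIJD http://www.isocat.org/datcat/DC-1286
PVTIJD = 'tegenwoordig' | 'verleden' | 'conjunctief'
"""

links = datcat_annotations(sample)
for symbol, url in links.items():
    print(symbol, "->", url)
```

Keeping the annotations in comments leaves the grammar itself readable by ordinary EBNF tooling, while a schema registry can still extract the semantic links.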
Sharing relations
• Ontological relationships can be defined among data categories and (other) concepts
• These relationships allow crosswalks between various resource models
– discover related resources which use (different levels of) semantically close data categories
• RELcat is a companion registry which will allow storing (and sharing) a linguist's individual view on these relationships
http://lux13.mpi.nl/relcat/ (alpha)
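To make the idea of a crosswalk concrete, here is a minimal sketch that follows relations between data categories to find semantically close ones. Everything in it is an assumption for illustration: the category names, the relation type names `sameAs` and `almostSameAs`, the choice to treat those relations as symmetric, and the function `related`. None of it reflects the actual RELcat data model.

```python
from collections import deque

# Hypothetical relation set: (subject, relation type, object).
relations = [
    ("tagsetA:noun", "sameAs", "tagsetB:N"),
    ("tagsetB:N", "almostSameAs", "tagsetC:substantive"),
    ("tagsetC:verb", "sameAs", "tagsetA:verb"),
]


def related(start, relations, allowed=("sameAs", "almostSameAs")):
    """Return all categories reachable from start via the allowed relation types.

    Assumption: the allowed relations are symmetric, so links are followed
    in both directions.
    """
    neighbours = {}
    for subj, rel, obj in relations:
        if rel in allowed:
            neighbours.setdefault(subj, set()).add(obj)
            neighbours.setdefault(obj, set()).add(subj)

    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first walk over the relation graph
        current = queue.popleft()
        for nxt in neighbours.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


print(sorted(related("tagsetA:noun", relations)))
print(sorted(related("tagsetA:noun", relations, allowed=("sameAs",))))
```

Passing a narrower `allowed` tuple corresponds to a stricter individual view: a linguist who does not accept near-equivalence simply leaves `almostSameAs` out.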
Semantic network
[Figure: semantic network. Labels: Linguistic knowledge base; Linguistic resource (schema); Data categories; Containers; Concepts; Relation; Data Category Registry - ISOcat; Concept Registry; Relation Registry - RELcat; Schema Registry - SCHEMAcat]
Conclusion
• CLARIN(-NL/-VL), including TTNWW, is working towards a set of registries that enable the community to collaboratively make semantics explicit by:
– sharing elementary descriptors: data categories
• persistently
– sharing structure: schemata
• persistently
– sharing ontological relations
• individual world views
Thank you for your attention!
Visit www.isocat.org
Questions? www.isocat.org/forum/