The ISO 12620 Data Category Registry ISO 12620:2009 introduces – A web-based electronic Data...


The ISO 12620 Data Category Registry

• ISO 12620:2009 introduces:

  – A web-based electronic Data Category Registry (DCR) for simple, complex and (in the future) container Data Categories (DCs)

  – ISO DIS 24619-compliant Persistent IDentifiers (PIDs) for each DC, e.g., http://www.isocat.org/datcat/DC-396

  – The DC Reference schema, a small XML vocabulary for embedding these DC PIDs in XML documents, e.g.,

    <rng:element name="POS" dcr:datcat="http://.../DC-396" />
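Here is a minimal sketch of how a tool might consume such embedded references. The Python snippet below reads every dcr:datcat attribute from a Relax NG schema. It makes two assumptions. First, the dcr prefix is bound to http://www.isocat.org/ns/dcr, which is the binding used in the LMF example later in this deck. Second, the schema carries the full example PID http://www.isocat.org/datcat/DC-396. The schema itself and the function name datcat_references are hypothetical.

```python
import xml.etree.ElementTree as ET

# Relax NG structure namespace (standard) and the dcr namespace as bound in
# this deck's LMF example (assumed to apply to schemas as well).
RNG_NS = "http://relaxng.org/ns/structure/1.0"
DCR_NS = "http://www.isocat.org/ns/dcr"

# Hypothetical one-element schema carrying the example PID for POS.
SCHEMA = """\
<rng:grammar xmlns:rng="http://relaxng.org/ns/structure/1.0"
             xmlns:dcr="http://www.isocat.org/ns/dcr">
  <rng:start>
    <rng:element name="POS" dcr:datcat="http://www.isocat.org/datcat/DC-396">
      <rng:text/>
    </rng:element>
  </rng:start>
</rng:grammar>
"""


def datcat_references(schema_xml: str) -> dict[str, str]:
    """Map each Relax NG element name to the DC PID in its dcr:datcat attribute."""
    root = ET.fromstring(schema_xml)
    refs = {}
    # ElementTree exposes namespaced names in {uri}local form.
    for elem in root.iter(f"{{{RNG_NS}}}element"):
        pid = elem.get(f"{{{DCR_NS}}}datcat")
        if pid is not None:
            refs[elem.get("name")] = pid
    return refs


print(datcat_references(SCHEMA))  # {'POS': 'http://www.isocat.org/datcat/DC-396'}
```

Because the reference is an attribute from a foreign namespace, a Relax NG validator treats it as an annotation. The schema's meaning stays unchanged while the DC link remains machine-readable.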


Standards and Data Category references

• Some standards already provide their own constructs for embedded DC references

• However, these constructs sometimes:

  – Use ambiguous DC identifiers instead of PIDs

  – Are not able to handle the current DC PIDs

  – Do not cover all DC types, i.e., container, complex and simple DCs
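The sketch below makes the distinction between an ambiguous identifier and a PID concrete by classifying a DC reference according to its shape. The regular expression is an assumption inferred from the example PIDs in this deck, such as http://www.isocat.org/datcat/DC-396 and the test-server form http://lux13.mpi.nl/datcat/DC-12094. It is not a grammar defined by ISO DIS 24619, and classify_dc_reference is a hypothetical helper.

```python
import re

# Assumed PID shape, inferred from the examples in this deck:
# http://<host>/datcat/DC-<number>. Not an official ISO DIS 24619 grammar.
PID_PATTERN = re.compile(r"https?://[^/\s]+/datcat/DC-\d+")


def classify_dc_reference(ref: str) -> str:
    """Return 'pid' if ref looks like an ISOcat DC PID, else 'ambiguous'."""
    # A bare name such as "partOfSpeech" does not identify one specific
    # data category, so it is treated as ambiguous.
    return "pid" if PID_PATTERN.fullmatch(ref) else "ambiguous"


for ref in (
    "http://www.isocat.org/datcat/DC-396",
    "http://lux13.mpi.nl/datcat/DC-12094",
    "partOfSpeech",
):
    print(f"{ref} -> {classify_dc_reference(ref)}")
```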


| Specification | Can handle DC PIDs? | Handles DC types | Suggestion |
|---|---|---|---|
| DTDs | No | None | Use Relax NG or XML Schema instead |
| Relax NG | Yes | All | Use the DC Reference vocabulary |
| XML Schema | Yes | All | Use the DC Reference vocabulary |
| TEI ODD | Yes | All | Use `<equiv/>` |
| TMF | Yes | Complex DCs | Use Relax NG or XML Schema instead, and use the DC Reference vocabulary |
| LMF | Unspecified | Unspecified | Use the DC Reference vocabulary for an LMF-compliant schema |
| TBX XCS | Yes | Complex DCs | The value picklist needs to be opened up and may need provisions for the upcoming container DCs |
| Geneter | No | None | Use Relax NG or XML Schema instead, or use the DC Reference vocabulary in the instance |
| MAF | Yes | Complex and simple DCs | May need provisions for the upcoming container DCs |
| LAF | Yes | Complex DCs | Needs provisions for the other DC types |


Improving the current situation

• Use Relax NG, XML Schema or ODD instead of DTDs

• Create open schemas that allow adding attributes and/or elements from foreign namespaces, or embed dcr:datcat or dcr:valueDatcat hooks at the proper places in the schemas

• The DC Reference vocabulary can then be used to embed DC references for various DC types at the right places

• For existing specifications with some support for DC references, make sure all relevant DC types can be covered, and make use of DC PIDs
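If an instance format is open, the DC Reference vocabulary can be attached after the fact. The sketch below uses Python's standard xml.etree.ElementTree to add a dcr:datcat attribute to each element whose name appears in a mapping. The instance document, the mapping and the function add_datcat_hooks are hypothetical. The PIDs are those of the LMF example in this deck, and the dcr namespace URI is the one bound there.

```python
import xml.etree.ElementTree as ET

# dcr namespace as bound in this deck's LMF example.
DCR_NS = "http://www.isocat.org/ns/dcr"
ET.register_namespace("dcr", DCR_NS)  # serialize the attribute as dcr:datcat

# Hypothetical element-name-to-PID mapping; the PIDs are taken from the
# LMF example (partOfSpeech and writtenForm).
DATCAT_MAP = {
    "partOfSpeech": "http://www.isocat.org/datcat/DC-1345",
    "writtenForm": "http://www.isocat.org/datcat/DC-1836",
}


def add_datcat_hooks(xml_text: str, mapping: dict[str, str]) -> str:
    """Set dcr:datcat on every element whose (unqualified) tag is in mapping."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        pid = mapping.get(elem.tag)
        if pid is not None:
            elem.set(f"{{{DCR_NS}}}datcat", pid)
    return ET.tostring(root, encoding="unicode")


# Hypothetical instance of an open schema.
doc = "<entry><writtenForm>clergyman</writtenForm><partOfSpeech>commonNoun</partOfSpeech></entry>"
print(add_datcat_hooks(doc, DATCAT_MAP))
```

A schema-level variant would set the same attribute on the schema's element declarations rather than on instance elements, as in the Relax NG example on the first slide.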


References

• Latest version of the DC Reference vocabulary:

  – http://www.isocat.org/12620/

• Survey of the support for DC references:

  – M.A. Windhouwer, S.E. Wright, M. Kemps-Snijders. Referencing ISOcat data categories. In Proceedings of the LRT Standards Workshop (LREC 2010), Malta, May 18, 2010.

  – http://www.lrec-conf.org/proceedings/lrec2010/workshops/W4.pdf


ODD example

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" module="header" ident="availability">
  <equiv name="availability" uri="http://lux13.mpi.nl/datcat/DC-12094"/>
  …
  <attList>
    <attDef ident="status" usage="opt">
      <equiv name="availabilityStatus" uri="http://lux13.mpi.nl/datcat/DC-12019"/>
      <defaultVal>unknown</defaultVal>
      <valList type="closed">
        <valItem ident="free">
          <equiv name="availabilityStatusFree" uri="http://lux13.mpi.nl/datcat/DC-12020"/>
          <desc>the text is freely available.</desc>
          …
```

Note: this example uses PIDs from the ISOcat test server.
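Because the <equiv/> elements carry the DC references, tools can harvest them directly. The Python sketch below collects them from a trimmed copy of the ODD excerpt. To make the snippet parse, the elided parts are left out and closing tags have been added. The function equiv_pids is hypothetical, and the PIDs are still those of the ISOcat test server.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Trimmed, well-formed copy of the ODD excerpt (elisions dropped,
# closing tags added for this sketch).
ODD = """\
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" module="header" ident="availability">
  <equiv name="availability" uri="http://lux13.mpi.nl/datcat/DC-12094"/>
  <attList>
    <attDef ident="status" usage="opt">
      <equiv name="availabilityStatus" uri="http://lux13.mpi.nl/datcat/DC-12019"/>
      <defaultVal>unknown</defaultVal>
      <valList type="closed">
        <valItem ident="free">
          <equiv name="availabilityStatusFree" uri="http://lux13.mpi.nl/datcat/DC-12020"/>
          <desc>the text is freely available.</desc>
        </valItem>
      </valList>
    </attDef>
  </attList>
</elementSpec>
"""


def equiv_pids(odd_xml: str) -> dict[str, str]:
    """Map the ident of each elementSpec/attDef/valItem to its <equiv/> uri."""
    root = ET.fromstring(odd_xml)
    owners = {f"{{{TEI_NS}}}{name}" for name in ("elementSpec", "attDef", "valItem")}
    result = {}
    for owner in root.iter():  # document order, root included
        if owner.tag not in owners:
            continue
        equiv = owner.find(f"{{{TEI_NS}}}equiv")  # direct child only
        if equiv is not None:
            result[owner.get("ident")] = equiv.get("uri")
    return result


for ident, uri in equiv_pids(ODD).items():
    print(ident, uri)
```

The element, the attribute and the attribute value each receive their own reference. This corresponds to the complex and simple DCs that the table credits ODD with handling.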


LMF example

```xml
<LexicalResource xmlns:dcr="http://www.isocat.org/ns/dcr">
  …
  <LexicalEntry>
    <feat att="partOfSpeech" dcr:datcat="http://www.isocat.org/datcat/DC-1345"
          val="commonNoun" dcr:valueDatcat="http://www.isocat.org/datcat/DC-1256"/>
    <Lemma>
      <feat att="writtenForm" dcr:datcat="http://www.isocat.org/datcat/DC-1836" val="clergyman"/>
    </Lemma>
```

Note: once the DCR supports container data categories, LexicalResource, LexicalEntry and Lemma could also have dcr:datcat attributes.
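The following Python sketch reads the feature and value references back out of the LMF example. It works on a trimmed, well-formed copy of the excerpt, in which the leading elision is dropped and closing tags are added. The function name feat_references is hypothetical.

```python
import xml.etree.ElementTree as ET

DCR_NS = "http://www.isocat.org/ns/dcr"

# Trimmed, well-formed copy of the LMF excerpt above.
LMF = """\
<LexicalResource xmlns:dcr="http://www.isocat.org/ns/dcr">
  <LexicalEntry>
    <feat att="partOfSpeech" dcr:datcat="http://www.isocat.org/datcat/DC-1345"
          val="commonNoun" dcr:valueDatcat="http://www.isocat.org/datcat/DC-1256"/>
    <Lemma>
      <feat att="writtenForm" dcr:datcat="http://www.isocat.org/datcat/DC-1836" val="clergyman"/>
    </Lemma>
  </LexicalEntry>
</LexicalResource>
"""


def feat_references(lmf_xml: str) -> list[tuple]:
    """Return (att, datcat, val, valueDatcat) for every <feat> element."""
    root = ET.fromstring(lmf_xml)
    rows = []
    for feat in root.iter("feat"):  # LMF elements here are not namespaced
        rows.append((
            feat.get("att"),
            feat.get(f"{{{DCR_NS}}}datcat"),
            feat.get("val"),
            # None when the value has no DC of its own (writtenForm here).
            feat.get(f"{{{DCR_NS}}}valueDatcat"),
        ))
    return rows


for row in feat_references(LMF):
    print(row)
```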


LAF example

```xml
<typeDescription loc="http://www.isocat.org/...">
  <name>
  <description>
  <supertypeName>
  <features>
    <featureDescription loc="http://www.isocat.org/...">
      <name>
      <description>
      <values>
        <valueDescription loc="http://www.isocat.org/...">
        …
```

Note: each value needs its own DC reference, hence the addition of the valueDescription element.
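The Python sketch below shows what the valueDescription element adds. It lists every loc reference in a LAF-style type description, so each value appears as a separate entry. Two parts of the sketch are assumptions. First, the nesting of the skeleton is inferred from the excerpt. Second, the loc values are hypothetical urn:example: placeholders, because the slide elides the real ISOcat PIDs. The function name dc_locations is also hypothetical.

```python
import xml.etree.ElementTree as ET

# Skeleton of the LAF excerpt, with an assumed nesting. The loc values are
# hypothetical placeholders: the slide elides the real ISOcat PIDs.
LAF = """\
<typeDescription loc="urn:example:type">
  <name/>
  <description/>
  <supertypeName/>
  <features>
    <featureDescription loc="urn:example:feature">
      <name/>
      <description/>
      <values>
        <valueDescription loc="urn:example:value-1"/>
        <valueDescription loc="urn:example:value-2"/>
      </values>
    </featureDescription>
  </features>
</typeDescription>
"""


def dc_locations(laf_xml: str) -> list[tuple[str, str]]:
    """List (element, loc) pairs in document order for every DC reference."""
    root = ET.fromstring(laf_xml)
    return [(el.tag, el.get("loc")) for el in root.iter() if el.get("loc") is not None]


for tag, loc in dc_locations(LAF):
    print(tag, loc)
```

Without valueDescription, the two values would have no element to carry their loc, and only the type and the feature could point to a DC.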