Platform Interoperability Guidelines
March 16, 2017
Deliverable Code: D5.5
Version: 1.0 – Final
Dissemination level: Public
First version of the guidelines for infrastructure interoperability structured into sets that target the stakeholder groups (providers of content and software resources)
H2020-EINFRA-2014-2015 / H2020-EINFRA-2014-2
Topic: EINFRA-1-2014 Managing, preserving and computing with big research data
Research & Innovation action
Grant Agreement 654021
Platform Interoperability Guidelines
∙ ∙ ∙
Public Page 1 of 30
Document Description
D5.5 – Platform Interoperability Guidelines
WP5 – Interoperability Framework
WP participating organizations: ARC, USFD, UNIMAN, AK, UoG, GRNET
Contractual Delivery Date: 9/2016
Actual Delivery Date: 3/2017
Nature: Report
Version: 1.0 Final
Public Deliverable
Preparation slip (Name, Organization, Date)
From: Penny Labropoulou (ARC), Dimitris Galanis (ARC), Angus Roberts (USFD), Matt Shardlow (UNIMAN), Giulia Dore (UoG), Thomas Margoni (UoG), Byron Georgantopoulos (GRNET), Panagiotis Zervas (AK), Pythagoras Karampiperis (AK), Richard Eckart de Castilho (UKP-TUDA); 21/02/2017
Edited by: Penny Labropoulou (ARC); 16/03/2017
Reviewed by: Vangelis Floros (GRNET), Christian O'Reilly (EPFL), Mappet Walker (FRONTIERS), Lucas Anastasiou (OU); 07/03/2017
Approved by: Androniki Pavlidou (ARC); 16/03/2017
For delivery: Mike Hatzopoulos (ARC); 21/03/2017
Document change record (Issue: Item; Reason for Change; Author, Organization)
V0.1: Draft version; Initial version sent for comments; Penny Labropoulou (ARC)
V0.2: Draft version; Version sent to internal reviewers; Penny Labropoulou (ARC)
V0.3: Draft version; Version from internal reviewers; Vangelis Floros (GRNET), Christian O'Reilly (EPFL), Mappet Walker (FRONTIERS)
V0.4: Draft version; Version sent to internal reviewers (second round); Penny Labropoulou (ARC)
V0.5: Draft version; Versions from internal reviewers; Vangelis Floros (GRNET), Christian O'Reilly (EPFL), Mappet Walker (FRONTIERS), Lucas Anastasiou (OU)
V0.9: Pre-final version; Version incorporating the internal reviewers' comments, pending final approval; Penny Labropoulou (ARC)
V1.0: Final version; Final version, incorporating all comments; Penny Labropoulou (ARC)
Table of Contents
1. Introduction
2. The OpenMinTeD platform
3. Target audience
4. Background and methodology of work
5. The OMTD-SHARE metadata schema
6. Structure of the guidelines
Appendix A - References
Appendix B - Acknowledgements & Contributors
Appendix C - Guidelines
Table of Figures
Figure 1. Overview of the OMTD-SHARE data model
Disclaimer
This document contains a description of the OpenMinTeD project findings, work and products. Certain parts of it might be under partner Intellectual Property Right (IPR) rules, so prior to using its content please contact the consortium head for approval.
In case you believe that this document harms in any way IPR held by you as a person or as a representative of an entity, please do notify us immediately.
The authors of this document have taken every available measure to ensure that its content is accurate, consistent and lawful. However, neither the project consortium as a whole nor the individual partners that implicitly or explicitly participated in the creation and publication of this document accept any responsibility for consequences that might arise from the use of its content.
This publication has been produced with the assistance of the European Union. The content of this publication is the sole responsibility of the OpenMinTeD consortium and can in no way be taken to reflect the views of the European Union.
The European Union is established in accordance with the Treaty on European Union (Maastricht). There are currently 28 Member States of the Union. It is based on the European Communities and the Member States' cooperation in the fields of Common Foreign and Security Policy and Justice and Home Affairs. The five main institutions of the European Union are the European Parliament, the Council of Ministers, the European Commission, the Court of Justice and the Court of Auditors. (http://europa.eu.int/)
OpenMinTeD is a project funded by the European Union (Grant Agreement No 654021).
Acronyms
API Application Programming Interface
LR Language Resource
NLP Natural Language Processing
ML Machine Learning
OA Open Access
OAI-PMH Open Archives Initiative Protocol for Metadata Harvesting
OKFN Open Knowledge Foundation
OMTD OpenMinTeD
OWL Web Ontology Language
PDF Portable Document Format
RDF Resource Description Framework
REST Representational State Transfer
RI Research Infrastructure
SKOS Simple Knowledge Organization System
SOAP Simple Object Access Protocol
TDM Text and Data Mining
VM Virtual Machine
WP Workpackage
XML Extensible Markup Language
XSD XML Schema Definition
Glossary
annotation (text/corpus annotation): A note by way of explanation or comment added to a text or diagram [OED, https://en.oxforddictionaries.com/definition/annotation]. In OpenMinTeD, the term refers mainly to text or corpus annotation, which is the practice of adding interpretative linguistic information grounded in a knowledge resource to a text or corpus respectively. For example, one common type of annotation is the addition of tags, or labels, indicating the word class to which lexical units in a text belong; these tags come from a predefined set (e.g. Noun, Verb, Preposition, etc.). Semantic labeling with terms and concepts from an ontology is another common example of annotation. Relationships such as syntactic dependencies or semantic relations that link entities of the text are also annotations.
annotation resource: Any resource that can be used for annotating a text, including part-of-speech tagsets, annotation schemes, domain-specific ontologies, etc.
annotation scheme: A set of elements and values designed to annotate data. An annotation scheme usually aims to represent a specific level of information, such as morphological features of words, syntactic dependency relations between phrases, discourse level information, etc. It can consist of a flat structure of elements and values (e.g. part-of-speech tags) or it can be more complex with interrelated elements (e.g. specific morphological features to be used for each part-of-speech).
application: Any software program (or group of programs seen as a whole) intended for the end-user and addressing one or multiple related user needs.
component (software component): An algorithm wrapped in a standard way so that it can be integrated as a reusable tool or service within a particular component-oriented framework such as UIMA, GATE, etc.
corpus: A structured collection of pieces of data (textual, audio, video, multimodal/multimedia, etc.) typically of considerable size and selected according to criteria external to these data (e.g. size, type of language, type of producers or expected audience, etc.) to represent as comprehensively as possible the object of study.
data model: A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to properties of the real world entities. [Wikipedia, https://en.wikipedia.org/wiki/Data_model]
distribution: Any form by which a resource can be shared; it can be a downloadable PDF or a plain text file, a form of a corpus accessible only through a web interface, or the source code of a piece of software, etc.
document: A piece of written, printed, or electronic matter that is primarily intended for reading.
interoperability: Interoperability describes the extent to which systems and devices can work together, exchange data, and interpret that shared data. For two systems to be interoperable, they must be able to exchange data and subsequently present that data such that it can be understood by a user. [Research Data Alliance, http://smw-rda.esc.rzg.mpg.de/index.php/Interoperability]
licence: A permission or a written evidence of a permission that confers the licensee the right to do something that otherwise would be prevented by the law.
licence compatibility/interoperability: The condition or state in which two or more licences can co-exist or be combined without conflicting with each other. In OpenMinTeD, licence compatibility and licence interoperability are used as synonyms.
knowledge resource: A resource (data and/or tool) containing, producing or representing knowledge; knowledge is specific information that is relevant for the linguistic and conceptual interpretation of data. For OpenMinTeD purposes, this information is exploited or produced by TDM modules and tools, or exchanged between them.
language description: The resource describes a language or some aspect(s) of a language via a systematic documentation of linguistic structures. [Open Language Archives Community, http://www.language-archives.org/REC/type.html#language_description] Examples include sketch grammar, computational grammar, etc.
language resource: Language Resources (LRs) encompass (a) data sets (textual, multimodal/multimedia and lexical data, grammars, language models, etc.) in machine readable form, used to assist and augment language processing applications, but also, in a broader sense, in language and language-mediated research studies and applications, and (b) tools/technologies/services used for their processing.
lexical/conceptual resource: A resource organised on the basis of lexical or conceptual entries (lexical items, terms, concepts, etc.) with their supplementary information (e.g. grammatical, semantic, statistical information, etc.). In OpenMinTeD, they can be used for annotation purposes.
machine learning (ML) model: The process of training an ML model involves providing an ML algorithm (that is, the learning algorithm) with training data to learn from. The term ML model refers to the model artifact that is created by the training process. [http://docs.aws.amazon.com/machine-learning/latest/dg/training-ml-models.html]
metadata: Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information. [National Information Standards Organization, Understanding metadata, http://www.niso.org/publications/press/UnderstandingMetadata.pdf]
open access (OA): The free and online availability of literature, which allows anyone to read, download, copy, distribute, print, search, or link to the full text, crawl articles for indexing, pass them as data to software, or use them for any other useful purpose. This availability is granted without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself, and those related to giving authors control over the integrity of their work and the right to be properly acknowledged and cited. [Budapest OA Initiative 2002; Bethesda Statement on OA Publishing 2003; Berlin Declaration on OA Knowledge in Science and Humanities 2003]
OpenMinTeD infrastructure: An infrastructure refers to the basic structures and facilities required for the operation of a system. The OpenMinTeD infrastructure consists of different layers of resources: content resources that can be mined, ancillary knowledge resources, tools and web services. Any resource that can be registered in the OpenMinTeD registry is part of the underlying infrastructure.
OpenMinTeD platform: The OpenMinTeD platform brings together all the services that facilitate the interoperability aspects of the underlying infrastructure (e.g. registration, search and browsing, creation of workflows, processing, annotation, etc.) and, thus, becomes an infrastructural service of the wider research ecosystem.
publication: A book, article, etc., that has been made available to the public either via a formal publication service or over the internet and is stored at an archive or repository. For OpenMinTeD purposes, this mainly covers scholarly publications.
resource: Something that you can use to help you to achieve something, especially in your work or study. [MacMillan dictionary, http://www.macmillandictionary.com/dictionary/british/resource_1]
rights statement: Formal or official statement asserting the copyright status and/or the licensing conditions for a given resource. It can be issued by an authoritative body (e.g. http://rightsstatements.org/). For OpenMinTeD purposes, it can be deemed similar to a "licence category", grouping licences that share similar features.
Text and Data Mining: Text and Data Mining (TDM) was initially defined as “the discovery by computer of new, previously unknown information, by automatically extracting and relating information from different (…) resources, to reveal otherwise hidden meanings” (Hearst, 1999), in other words, “an exploratory data analysis that leads to the discovery of heretofore unknown information, or to answers for questions for which the answer is not currently known” (Hearst, 1999). [FutureTDM, http://www.futuretdm.eu/news/tdm-definition/]
service / web service: A piece of software accessible through remote invocation, typically using some REST-style APIs or SOAP protocols.
tool: Piece of (standalone) software typically for a very limited technical purpose, such as a particular implementation of a part-of-speech tagger (e.g. TreeTagger), a tree parsing program (e.g. mstparser), etc. Preferred terms in OpenMinTeD include 'component' and 'workflow'.
workflow: A series of software components assembled together in order to perform a specific task.
Publishable Summary
The current deliverable brings together the guidelines that interested stakeholders must follow in order to be compatible with OpenMinTeD interoperability specifications.
The guidelines intend to present in a user-friendly way the specifications set for empowering interoperability between content and software resources, especially in the framework of the OpenMinTeD platform. They are, therefore, based on input from
● D5.2 and its updated version D5.3 - Interoperability Requirements Reports (in progress), which include the interoperability specifications set for OpenMinTeD,
● D6.1 - Platform Architectural Specification that describes the architecture and functions of the OpenMinTeD platform, and
● the data model adopted by OpenMinTeD for describing resources involved in TDM and implemented in the OMTD-SHARE metadata schema.
The deliverable presents the work and methodology according to which the guidelines have been created, while the actual guidelines are annexed to this report and published online at https://guidelines.openminted.eu.
Four guidelines have been created, targeting respectively the providers of publications, corpora, ancillary knowledge resources and TDM software resources. The specifications determine technical (e.g. data representation formats, transfer protocols), legal and documentation (metadata) issues. Two levels of compliance are foreseen, corresponding to mandatory and recommended specifications, allowing for a gradual adoption by stakeholder groups.
Public review will be solicited from the stakeholder groups and their comments, together with additional requirements from the ongoing work on the project, will be taken into account for the next version of the guidelines (D5.6).
1. Introduction
OpenMinTeD enables the creation of an infrastructure that fosters and facilitates the use of text and data mining technologies in the scientific publications world and beyond, by both application domain users (i.e., scientists, technicians, etc.) and text mining experts. OpenMinTeD builds upon existing tools and text mining platforms. It aims at rendering them discoverable through the OpenMinTeD registry, and interoperable through the interoperability layer, also based on existing and emerging standards and best practices.
The current deliverable puts together the guidelines that interested parties must follow in order to be compatible with OpenMinTeD interoperability specifications. To better serve the needs of the target stakeholder groups and the peculiarities of each resource type, separate guidelines are available per resource type and provider group. Thus, the deliverable is structured as follows:
● a short presentation of the OpenMinTeD platform and the objectives it serves,
● a short presentation of the audience targeted by the guidelines,
● background and methodology of the work,
● a synopsis of the OMTD-SHARE metadata schema, which is used for the documentation of all resources in OpenMinTeD, and the data model it supports.
The guidelines themselves are presented in Appendix C, while an online version is available at: https://guidelines.openminted.eu. Given that the project is still in progress, there will be two new releases in the next twelve months, taking into account stakeholders' feedback and additional specifications coming from the project; backwards compatibility of the new versions will be a priority and, where needed, conversion tools to the new version will be made available.
2. The OpenMinTeD platform
TDM involves a wide range of resource types:
● the content resources to be mined (scholarly publications in the OpenMinTeD project),
● the text mining software and
● ancillary knowledge resources used for the operation of the software (e.g. annotation schemas, linguistic tagsets, lexical or ontological resources used for annotating the resources to be mined, machine learning models, annotated textual corpora).
The OpenMinTeD platform [1] integrates these resources and supports their interaction through appropriate services:
● a Registry service for storing, browsing, downloading, searching and managing the various resources, which will be registered in OpenMinTeD by using a set of specifications/protocols (e.g. OAI-PMH [https://www.openarchives.org/pmh/], Maven [https://maven.apache.org/]) and documented with high-quality metadata;
● the Workflow Editor service of the platform to guide users (via an appropriate User Interface) in creating interoperable workflows of TDM components, which will be executed by the Workflow Execution service in a cloud infrastructure (or on a local machine);
● the Annotation Editor service to allow users to annotate the publications (texts) in order to create datasets that can be used in workflows, e.g. for evaluation purposes.
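As an illustration of the harvesting protocol named in the Registry item above, the following minimal Python sketch builds an OAI-PMH `ListRecords` request and extracts record identifiers from a response. The verb, the `metadataPrefix` and `resumptionToken` arguments and the XML namespace come from the OAI-PMH 2.0 protocol; the repository URL, the sample response and the function names are assumptions made for this example and do not describe the actual OpenMinTeD implementation.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hypothetical, heavily shortened response used instead of a network call.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

# Hypothetical endpoint; a real harvester would fetch this URL.
first_url = build_list_records_url("https://repository.example.org/oai")
record_ids, next_token = parse_list_records(SAMPLE_RESPONSE)
```

A harvester would repeat the request with the returned token until the response no longer carries one.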
The OpenMinTeD platform was designed and is being implemented as a facilitator of TDM in an ecosystem of e-infrastructures and repositories, collecting, transforming and making available resources only as needed for TDM purposes. In other words, it is not one more registry of content and services, and it does not seek to collect and provide information about resources that might be of interest to TDM stakeholders. Resources are uploaded and stored only as required for processing. Thus, for instance, knowledge resources can be registered at the OpenMinTeD registry and continue to reside at locations outside the platform, only to be accessed at the time of processing. Publications, on the other hand, are harvested and stored locally at OpenMinTeD storage facilities to meet processing requirements and improve processing time.
Resources are to be registered into OpenMinTeD only if they can be accessed and deployed in the context of a TDM processing operation.
For this reason, it is imperative that
• the resource itself can be accessed in a single-step process and in a transparent way through the OpenMinTeD mechanisms;
• the resource is properly described with the metadata schema adopted by OpenMinTeD, i.e. the OMTD-SHARE schema (see section 5), at least at the minimal level, ensuring that it can be discovered through the search-or-browse interface by the platform users, and that it can be instantiated when required by the software components at the time of execution of a workflow;
• the resource is in a form that can be exploited as is in the OpenMinTeD context (or can be easily transformed into one of the OpenMinTeD acceptable forms through one of the conversion tools included in the platform);
• the resource adheres to the specifications set by OpenMinTeD (at least at the minimal level) that seek to achieve interoperability among all resources, as described in the guidelines.
[1] For a full description of the platform, see D6.1 - Platform Architectural Specification.
The resources will be registered into OpenMinTeD by trustworthy sources, i.e. registered individuals. Bilateral agreements with repositories, infrastructures and other registries containing useful resources will also be made to facilitate this process.
In addition, new resources created using the OpenMinTeD toolbox and resources (i.e. corpora built by users by selecting scholarly publications, workflows created by TDM developers with components registered in OpenMinTeD, and outputs of running TDM tools and services in the platform) are also registered, stored and made available to the end-users through OpenMinTeD [2] and must follow the above principles. The descriptions of new resources are produced semi-automatically, based on information from the resources used in their composition, and can be edited and enriched by users.
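To make the idea of a semi-automatically produced description more concrete, the sketch below pre-fills a description for a derived resource (here, a workflow) from the descriptions of the resources it was composed of, and leaves the free-text part to the user. All field names, the function name and the sample components are hypothetical and chosen for illustration only; they are not OMTD-SHARE elements.

```python
def draft_description(name, resource_type, sources):
    """Pre-fill a description for a derived resource from its source resources."""
    return {
        "name": name,
        "resource_type": resource_type,
        # Provenance: which registered resources went into this one.
        "derived_from": [s["name"] for s in sources],
        # Carry over every distinct licence so a user can review compatibility.
        "source_licences": sorted({s["licence"] for s in sources}),
        # Left empty on purpose: to be written or enriched by the user.
        "description": "",
    }


# Hypothetical component descriptions, as they might be read from a registry.
components = [
    {"name": "Tokenizer", "licence": "Apache-2.0"},
    {"name": "POS tagger", "licence": "GPL-3.0"},
    {"name": "Term extractor", "licence": "Apache-2.0"},
]

draft = draft_description("Term extraction workflow", "workflow", components)

# The user then edits and enriches the automatically produced draft.
draft["description"] = "Extracts candidate terms from scholarly publications."
```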
Providers of resources interact with the OpenMinTeD Registry service through a specially designed interface, guiding them through the process of registration (uploading resources and their descriptions). All users can browse resources through the catalogue, select a specific resource and view its detailed description; moreover, resources are fed internally through the system into the Workflow and Annotation services, where they are presented to expert users for further operations.
[2] There is an ongoing discussion on the archiving and distribution of the output resources; more information on this will be made available when decisions are reached on this issue.
3. Target audience
OpenMinTeD targets the following groups:
● End users as consumers of the e-Infrastructure, who are further divided into:
  o Domain-specific researchers and research communities (e.g. research labs around the world): Users that are not knowledgeable about TDM and who want to find end-to-end applications (e.g. web services) that fulfill their needs in an off-the-shelf type of situation.
  o Application developers / Research e-Infrastructures data scientists: People who understand the basic usage of NLP and TDM services, but not the (algorithmic) details. They are aware of the research community needs, limitations and goals. They know how to connect and configure components, and which content they must use to get the required results. They need to develop end-to-end applications.
  o e-Infrastructure operators: Users agnostic to the internal specifics of TDM, but who need to integrate and operate TDM services into daily workflows which serve their constituency; the group includes, for instance, researchers of an RI, of a national e-Infrastructure or of a research institution.
● Contributors of content and software resources:
  o For content to be mined (scholarly publications), a potentially wide group of stakeholders can be envisaged; in the current phase, the focus is on publishers and repository managers (research libraries).
  o For TDM software resources, two subgroups are identified:
    ▪ A well-established community of language technology experts, who use specific technologies and frameworks (e.g. UIMA, GATE) to develop and enhance their software, which can be used for TDM purposes. Examples of software include Named Entity Recognizers and Term Extractors that incorporate grammatical taggers and parsers.
    ▪ Non-NLP expert developers, who are creating TDM modules based on off-the-shelf libraries and tools (e.g. Python NLTK [3], Tidytext [4], Scikit-learn [5], Gensim [6], OKFN’s relevant initiative ContentMine [7]). These are not familiar with NLP frameworks and terminology but are eager to publish their TDM software.
  o For ancillary resources, contributions are expected from two main sources:
    ▪ The TDM software developers (see above), who usually bundle the required resources in their software but also make them available as separate entities; this includes, for instance, ML models that come together with the software that uses them but may also be distributed separately and, thus, re-used with other software.
    ▪ Language resources developers (e.g. terminologists, lexicographers, NLP experts producing annotation resources) and members of the various domain communities that already use resources such as ontologies, terminological lexica, thesauri etc. in their work. For this phase, the focus is on the communities targeted by the OpenMinTeD use cases, i.e. research analytics, life sciences, agriculture & biodiversity, social sciences.
[3] https://pypi.python.org/pypi/nltk
[4] https://cran.r-project.org/web/packages/tidytext/index.html
[5] http://scikit-learn.org/stable/
[6] https://radimrehurek.com/gensim/
The guidelines, at the present stage, target only the second group, i.e. contributors of resources. They supply instructions and advice on the registration and uploading process, as well as on the proper packaging and documentation of the resources required for importing resources into the OpenMinTeD platform. They also provide recommendations on technical features and properties that contribute to interoperability.
It should be noted, though, that the needs, expertise, habits and expectations of the first group have also influenced the descriptive schema of the resources as well as the functionalities and services supported by the platform. In addition, to further assist the end-users, the creation of guidelines targeting them, with examples and suggested pathways on the use of the OpenMinTeD platform will be investigated during the second phase of the project.
4. Background and methodology of work
The guidelines provide instructions on how to prepare, package and add new resources using the Registry interface. Their production has been based mainly on the interoperability specifications (WP5), taking into account the overall OpenMinTeD architecture and the platform implementation (WP6) as well as the user requirements (cf. Appendix B for acknowledgements).
[7] http://contentmine.org/
The four working groups participating in WP5 have set a number of abstract requirements, which are described in D5.2 - Interoperability Standards and Specifications Report. The requirements specify ways of assessing and improving interoperability between content resources and software components involved in TDM operations. The next step in this endeavour has been the formulation of concrete requirements (to be included in the updated version of that deliverable, i.e. D5.3 - Interoperability Standards and Specifications Report (2nd edition)) recommending specific implementation strategies, techniques and features that ensure interoperability as envisaged in OpenMinTeD. These requirements have fed and will continue to feed the Guidelines, given that this work is still ongoing and that updated versions will be released during the subsequent phases of the project.
An important instrument designed to support interoperability in OpenMinTeD is the OMTD-SHARE metadata schema, which is used for the description of the resources (see next section). The Guidelines include separate sections on the use of the metadata schema for each resource type, focusing in the first phase on the minimal level, which includes mandatory and strongly recommended elements. In the next release of the Guidelines, we will also include full documentation with examples for all resource types, FAQs and tips/advice. Given the size and complexity of the schema, we have decided to adopt this stepwise process in order to have a first testbed regarding the user-friendliness of the guidelines, and then build upon them following recommendations from the stakeholders.
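The distinction between mandatory and recommended elements can be pictured as a simple two-level check on a resource description. The Python sketch below is only an illustration under stated assumptions: the element names in both sets are hypothetical placeholders, not the actual OMTD-SHARE elements (these are defined in the schema documentation), and the level labels are invented for the example.

```python
# Hypothetical element names; the real mandatory and recommended elements
# are defined by the OMTD-SHARE schema, not by this sketch.
MANDATORY = {"resource_name", "description", "licence", "distribution_location"}
RECOMMENDED = {"version", "contact", "language"}


def compliance_report(description):
    """Classify a resource description against the two element sets."""
    # An element counts as present only if it has a non-empty value.
    present = {key for key, value in description.items() if value not in (None, "", [])}
    missing_mandatory = sorted(MANDATORY - present)
    missing_recommended = sorted(RECOMMENDED - present)
    if missing_mandatory:
        level = "non-compliant"
    elif missing_recommended:
        level = "mandatory"  # all mandatory elements present
    else:
        level = "recommended"  # mandatory and recommended elements present
    return {
        "level": level,
        "missing_mandatory": missing_mandatory,
        "missing_recommended": missing_recommended,
    }


report = compliance_report({
    "resource_name": "Sample term list",
    "description": "A list of agricultural terms.",
    "licence": "CC-BY-4.0",
    "distribution_location": "https://example.org/termlist.tsv",
})
```

Reporting the missing recommended elements alongside the level gives providers a path for gradual adoption rather than a bare pass or fail.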
Additional input for the Guidelines will come from discussions on policies regarding the registration of providers and resources in the platform. Key issues include:
● the interaction with other infrastructures, data and software repositories, in order to manually or automatically harvest all or selected resources from them,
● the involvement of organizations vs. individuals in the process of registering and uploading resources,
● the criteria for accepting resources,
● the criteria for assigning user privileges.
The structure and the content of the Guidelines reflect these decisions by addressing issues related to specific types of providers in specific sections.
The current version is limited to the resource providers listed hereafter, but next releases will broaden this scope to cover additional stakeholders (e.g. the non-NLP expert software developers). More specifically:
● for content resources (scholarly publications), we expect to get input through big aggregators, i.e. OpenAIRE and CORE, which aggregate open access content from various sources, such as repositories, publishers, journals etc. To further support the task of data collection, a connector is implemented in OpenMinTeD targeting specifically content from traditional publishers of open access publications;
● for software resources, we expect input mainly from the Consortium partners, collected through software repositories (e.g. Maven Central), but also through META-SHARE, which hosts resources intended for Language Technology development. In both cases, these belong mainly to the expert language technology oriented communities of developers;
● for ancillary resources, such as lexica, ontologies, ML models etc., we expect input from (a) the TDM software developers, who are wrapping ancillary resources (especially typesystems, models and tagsets) with their software modules; in the first phase, again we are focusing on the Consortium partners; (b) developers of language resources, who are describing and storing their resources in repositories intended for that purpose, such as META-SHARE, and/or in discipline repositories (especially as regards terminologies and ontologies); the main focus will be on the disciplines targeted by the WP9 use cases.
5. The OMTD-SHARE metadata schema
The OMTD-SHARE metadata schema [8] is the recommended schema for the description of the resources. It has been designed in order to support interoperability between the various resources used in TDM processes. This interoperability is achieved by homogenising descriptions of TDM resources from the different scientific communities using a common core vocabulary, which is linked to pre-existing domain-specific vocabularies. Standards and best practices of the source communities are integrated whenever possible. The main principles and strategies employed in the design of the OMTD-SHARE schema consist of the following:
[8] The full OMTD-SHARE schema is documented at: https://openminted.github.io/releases/omtd-share/.
● cover needs of resource discoverability and TDM processing ● cover documentation needs of all resource types involved in TDM ● be flexible enough to support varying degrees of documentation completeness ● organize the schema elements and accommodate common vs. particular features of resources ● reuse what is available vs. create new elements and values ● normalize user input vs. allow for free user input ● document processing procedure and outputs.
It has largely been based on the META-SHARE metadata schema9 [Gavrilidou et al. 2012], which caters for the description of language resources, encompassing both data (textual, multimodal/multimedia and lexical data, grammars, language models, etc.) and technologies (tools/services) used for their processing. OMTD-SHARE is more restricted in the sense that it focuses on text resources only, while it also extends the basic schema in order to include TDM-specific concepts, and enhances the description of processing procedures and workflows.
As in META-SHARE, the schema documents the full lifecycle of a resource, including at least a minimal documentation of its satellite entities (see Figure 1), especially their interrelations. The OMTD-SHARE data model thus comprises the following entities:
● the resources, further classified into:
  ○ corpora, i.e. datasets of text documents, mainly scholarly publications in OMTD-SHARE
  ○ lexical/conceptual resources, including lexica, ontologies, term lists, gazetteers, etc., but also tagsets and annotation schemas, which are used for annotating corpora
  ○ language descriptions, which mainly refer to computational grammars
  ○ machine learning and statistical models10
  ○ software components, pieces of software, tools offered as locally executable code or as web services, wrapped in a workflow or as standalone end-to-end applications, and, finally,
  ○ publications, which constitute a peculiar resource type, as they are viewed in OpenMinTeD only in a collective form, as a "corpus",
● the satellite entities, such as actors, be it persons or organizations that have created the resources, or the projects using or funding them.
9 http://metashare.ilsp.gr/knowledgebase/homePage
10 Models could be considered a subtype of language descriptions, but we decided to keep them distinct because they have many properties that differentiate them from grammars; keeping them apart was also considered better for their discoverability.
Figure 1. Overview of the OMTD-SHARE data model
The schema is composed of metadata elements that are used to describe properties and relationships. Some of these elements, especially those that pertain to administrative features, are common to all types of resources (e.g. identification, contact, licensing information, etc.), while others, mainly technical features about the contents and format of resources, differ across types. As noted above, publications differ from other resource types: their recommended metadata elements mainly describe criteria used for their selection in the corpus building process.
One of the characteristic features of the META-SHARE family of schemas11 is the adoption of the component-based mechanism (Component MetaData Infrastructure, CMDI), according to which semantically coherent elements are grouped together to form components12 [Broeder et al., 2008]. For instance, the licensing module includes elements such as the name and URL of a licence, attribution text, copyright holders, etc. For the sake of simplicity, the container elements used for this grouping will not be presented in the guidelines unless required.
The OMTD-SHARE schema classifies elements into three levels of optionality:
● mandatory: elements that are necessary for intended purposes, i.e. for discovering resources and for triggering operations between content and software components
● recommended: elements that can help the current or future use of the resource, or useful information that providers have not yet standardized
● optional: all remaining information related to the lifecycle of a resource.
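The three levels can be illustrated with a small completeness check. The following is a minimal Python sketch, not part of the OMTD-SHARE tooling: the flat dictionary standing in for a metadata record and the function names `check_record` and `is_valid` are assumptions made for illustration, and only a handful of element names from the recommended schema for publications (Appendix C) are used. The authoritative definition of the levels is the XSD.

```python
from enum import Enum


class Optionality(Enum):
    MANDATORY = "mandatory"
    RECOMMENDED = "recommended"
    OPTIONAL = "optional"


# Illustrative subset of elements. The levels follow the recommended schema
# for publications; the selection of elements is an assumption of this sketch.
ELEMENT_LEVELS = {
    "identifier": Optionality.MANDATORY,
    "title": Optionality.MANDATORY,
    "documentLanguage": Optionality.MANDATORY,
    "abstract": Optionality.RECOMMENDED,
    "author": Optionality.RECOMMENDED,
}


def check_record(record):
    """Return the missing elements of a record, grouped by optionality level."""
    missing = {level: [] for level in Optionality}
    for element, level in ELEMENT_LEVELS.items():
        if not record.get(element):
            missing[level].append(element)
    return missing


def is_valid(record):
    """A record is acceptable when no mandatory element is missing."""
    return not check_record(record)[Optionality.MANDATORY]
```

A record that lacks only recommended elements still passes `is_valid`; the missing recommended elements can be reported to the provider as suggestions rather than errors.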
The XML Schema Definition (XSD) that formally describes the schema has been made publicly available13. An important difference from META-SHARE lies in the organisation with respect to the different resource types covered: while META-SHARE describes all resource types in one common XSD, in OMTD-SHARE the resource types are described in a more modular way, as separate sets of XSDs.
Work is also ongoing on an RDF/OWL version, which will be documented in the next release of the guidelines.
11 Several adaptations based on the META-SHARE schema are now available, among them ELRC-SHARE, clarin:el and OMTD-SHARE. The META-SHARE schema has also been implemented as an RDF/OWL ontology in collaboration with the ld4lt W3C group.
12 To avoid confusion with the term "component" also used for software components, we will from now on refer to this concept as "modules".
13 The current version of the XSDs is available at: https://github.com/openminted/omtd-share_metadata_schema and the documentation of v1.0.0 at: https://openminted.github.io/releases/omtd-share/.
6. Structure of the guidelines
The current release includes four sets of guidelines (cf. Appendix C), three of which correspond to the three major classes of resources involved in TDM processes:
● content resources to be mined, i.e. scholarly publications,
● ancillary (knowledge) resources used for the operation of the software (e.g. annotation schemas, linguistic tagsets, lexical or ontological resources used for annotating the resources to be mined, machine learning models),
● TDM (-related) software,
and one more for
● corpora, as they can be used either as an ancillary resource or as a resource to be mined.
Each set of guidelines contains the following information:
● a brief introduction, specifying the resources expected, potential sources, minimal requirements for the contributions
● packaging and registering instructions for the OpenMinTeD registry
● technical and metadata requirements that empower interoperability
● for each resource type, an overview of the OMTD-SHARE metadata schema (minimal level) with definitions, explanations, recommended usage and mappings to other widespread metadata schemas
● further instructions per type of contributors or resource type/subtype where required.
Appendix A - References
Broeder, D., T. Declerck, E. Hinrichs, S. Piperidis, L. Romary, N. Calzolari and P. Wittenburg (2008) "Foundation of a Component-based Flexible Registry for Language Resources and Technology", Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008). Available at: http://www.lrec-conf.org/proceedings/lrec2008/
Gavrilidou, M., P. Labropoulou, E. Desipri, S. Piperidis, H. Papageorgiou, M. Monachini, F. Frontini, T. Declerck, G. Francopoulo, V. Arranz and V. Mapelli (2012) "The META-SHARE Metadata Schema for the Description of Language Resources", Proceedings of LREC 2012, Istanbul, Turkey. Available at: http://www.lrec-conf.org/proceedings/lrec2012/pdf/998_Paper.pdf
Appendix B – Acknowledgements & Contributors
The guidelines are the product of work carried out mainly in the OpenMinTeD WP5 Interoperability Framework. The following internal and external experts have exchanged ideas and participated in discussions that have shaped the interoperability requirements which these guidelines aim to describe:
Internal experts
• Sophia Ananiadou (University of Manchester, UK)
• Lucas Anastasiou (Open University, UK)
• Sophie Aubin (INRA, France)
• Mouhamadou Ba (INRA, France)
• Kalina Bontcheva (University of Sheffield, UK)
• Robert Bossy (INRA, France)
• Jacob Carter (University of Manchester, UK)
• Louise Deléger (INRA, France)
• Giulia Dore (University of Glasgow, UK)
• Richard Eckart de Castilho (TU Darmstadt, Germany)
• Fred Fenter (Frontiers Media SA, Switzerland)
• Dimitris Galanis (Athena RC, Greece)
• Maria Gavriilidou (Athena RC, Greece)
• Patricia Geretto (INRA, France)
• Mark Greenwood (University of Sheffield, UK)
• Lucie Guibault (University of Amsterdam, Netherlands)
• Masoud Kiaeeha (TU Darmstadt, Germany)
• Petr Knoth (Open University, UK)
• Penny Labropoulou (Athena RC, Greece)
• Antonis Lempesis (Athena RC, Greece)
• Miguel Madrid (CNIO)
• Natalia Manola (Athena RC, Greece)
• Thomas Margoni (University of Glasgow, UK)
• John McNaught (University of Manchester, UK)
• Claire Nedellec (INRA, France)
• Wim Peters (University of Sheffield, UK)
• Stelios Piperidis (Athena RC, Greece)
• Prokopis Prokopidis (Athena RC, Greece)
• Piotr Przybyla (University of Manchester, UK)
• Angus Roberts (University of Sheffield, UK)
• Matt Shardlow (University of Manchester, UK)
• Mappet Walker (Frontiers Media SA, Switzerland)
External experts
• Giulia Ajmone Marsan (The Organisation for Economic Co-operation and Development)
• Enrique Alonso (Consejo de Estado)
• Geoffrey Bilder (CrossRef)
• Lukasz Bolikowski (University of Warsaw, Poland)
• Maurizio Borghi (Bournemouth University, UK)
• Steve Cassidy (Macquarie University Sydney, Australia)
• Christopher Cieri (LDC, USA)
• Christian Chiarcos (Goethe-Universität Frankfurt am Main, Germany)
• Liam Earney (JISC, UK)
• Kristofer Erickson (CREATe)
• Dominique Estival (Western Sydney University, Australia)
• Gwen Franck (Creative Commons, EIFL)
• Thilo Götz (IBM)
• Nancy Ide (Vassar College, USA)
• Pawel Kamocki (Institut für Deutsche Sprache, Germany)
• Andreas Kempf (Deutsche Zentralbibliothek für Wirtschaftswissenschaften, Germany)
• Jin-Dong Kim (Database Center for Life Science, Research Organization of Information and Systems)
• John McCrae (National University of Ireland, Galway, Ireland)
• Federico Morando (Nexa Center for Internet & Society, Italy)
• Eric Nyberg (Carnegie Mellon University, USA)
• Mark Perry (University of New England, Australia)
• Diane Peters (Creative Commons HQ)
• Rafal Rak (UberResearch, UK)
• Jochen Schirrwagen (Universität Bielefeld, Germany)
• Ineke Schuurman (CCL, University of Leuven)
• Peter Suber (Berkman Klein Centre, Harvard University)
• Keith Suderman (Vassar College, LAPPS)
• Prodromos Tsiavos (The Media Institute)
• Paul Uhlir (National Academy of Sciences)
• Maarten van Gompel (Radboud University Nijmegen)
• Marc Verhagen (Brandeis University, LAPPS)
• Piek Vossen (VU University Amsterdam, Netherlands)
• Menzo Windhouwer (MPI for Psycholinguistics, Netherlands)
• Maarten Zeinstra (Kennisland)
Appendix C - Guidelines
Table of Contents
1.1 OpenMinTeD guidelines
1.2 Acknowledgements & Contributors
1.3 Guidelines for providers of publications
  1.3.1 Introduction
  1.3.2 Instructions for publication repositories, libraries, journals, publishers, etc.
    1.3.2.1 How to register your resources
    1.3.2.2 How to make your resources interoperable
    1.3.2.3 How to document your resources
  1.3.3 Instructions for aggregators
    1.3.3.1 How to register your resources
    1.3.3.2 How to document your resources
  1.3.4 Further requirements for annotated publications
  1.3.5 Recommended schema for publications
    1.3.5.1 documentType
    1.3.5.2 publicationType
    1.3.5.3 identifier
    1.3.5.4 title
    1.3.5.5 licence
    1.3.5.6 rightsStmtName
    1.3.5.7 rightsStmtURL
    1.3.5.8 nonStandardLicenceName
    1.3.5.9 nonStandardLicenceTermsURL
    1.3.5.10 version of licence
    1.3.5.11 distributionMedium
    1.3.5.12 downloadURL
    1.3.5.13 documentLanguage
    1.3.5.14 fullText
    1.3.5.15 abstract
    1.3.5.16 author
    1.3.5.17 publisher
    1.3.5.18 journal
    1.3.5.19 mimeType
    1.3.5.20 characterEncoding
    1.3.5.21 publicationDate
    1.3.5.22 subject
    1.3.5.23 keyword
    1.3.5.24 collectedFrom repositoryName or repositoryIdentifier
    1.3.5.25 sourceMetadataLink
    1.3.5.26 originalDataProviderType
    1.3.5.27 originalDataProviderRepository
    1.3.5.28 originalDataProviderJournal
    1.3.5.29 originalDataProviderPublisher
    1.3.5.30 relationType
    1.3.5.31 relatedResource1
    1.3.5.32 relatedResource2
  1.3.6 Metadata schema for annotated publications
    1.3.6.1 annotationLevel
    1.3.6.2 annotationStandoff
    1.3.6.3 mimeType
    1.3.6.4 documentationURL
    1.3.6.5 dataFormatSpecific
    1.3.6.6 characterEncoding
    1.3.6.7 typesystem
    1.3.6.8 tagset
    1.3.6.9 annotationMode
    1.3.6.10 isAnnotatedBy
    1.3.6.11 annotationDate
1.4 Guidelines for providers of corpora
  1.4.1 Introduction
  1.4.2 Instructions for providers of corpora
    1.4.2.1 How to register your resources
    1.4.2.2 How to make your resources interoperable
    1.4.2.3 How to document your resources
    1.4.2.4 Further requirements for annotated corpora
    1.4.2.5 Recommended schema for corpora
      1.4.2.5.1 resourceName
      1.4.2.5.2 resourceType
      1.4.2.5.3 description
      1.4.2.5.4 identifier
      1.4.2.5.5 version
      1.4.2.5.6 licence
      1.4.2.5.7 rightsStmtName
      1.4.2.5.8 rightsStmtURL
      1.4.2.5.9 version of licence
      1.4.2.5.10 nonStandardLicenceName
      1.4.2.5.11 nonStandardLicenceTermsURL
      1.4.2.5.12 distributionMedium
      1.4.2.5.13 downloadURL
      1.4.2.5.14 contactEmail
      1.4.2.5.15 landingPage
      1.4.2.5.16 contactPerson (identifier or personName)
      1.4.2.5.17 contactGroup (identifier or organizationName)
      1.4.2.5.18 mustBeCitedWith
      1.4.2.5.19 resourceCreator
      1.4.2.5.20 creationDate
      1.4.2.5.21 corpusType
      1.4.2.5.22 mediaType
      1.4.2.5.23 lingualityType
      1.4.2.5.24 multilingualityType
      1.4.2.5.25 language
      1.4.2.5.26 sizePerLanguage
      1.4.2.5.27 size
      1.4.2.5.28 mimeType
      1.4.2.5.29 characterEncoding
      1.4.2.5.30 domain
      1.4.2.5.31 subject
      1.4.2.5.32 keyword
      1.4.2.5.33 userQuery
      1.4.2.5.34 relationType
      1.4.2.5.35 relatedResource1
      1.4.2.5.36 relatedResource2
    1.4.2.6 Metadata schema for annotated corpora
1.5 Guidelines for providers of knowledge resources
  1.5.1 Introduction
  1.5.2 Instructions for providers of ancillary knowledge resources
    1.5.2.1 How to register your knowledge resources
    1.5.2.2 How to make your knowledge resources interoperable
    1.5.2.3 How to document your knowledge resources
    1.5.2.4 Recommended schema for lexical/conceptual resources, incl. annotation resources
      1.5.2.4.1 resourceType
      1.5.2.4.2 resourceName
      1.5.2.4.3 description
      1.5.2.4.4 identifier
      1.5.2.4.5 licence
      1.5.2.4.6 rightsStmtName
      1.5.2.4.7 rightsStmtURL
      1.5.2.4.8 nonStandardLicenceName
      1.5.2.4.9 nonStandardLicenceTermsURL
      1.5.2.4.10 version of licence
      1.5.2.4.11 distributionMedium
      1.5.2.4.12 downloadURL
      1.5.2.4.13 accessURL
      1.5.2.4.14 contactEmail
      1.5.2.4.15 landingPage
      1.5.2.4.16 contactPerson (identifier or personName)
      1.5.2.4.17 contactGroup (identifier or organizationName)
      1.5.2.4.18 mustBeCitedWith
      1.5.2.4.19 lexicalConceptualResourceType
      1.5.2.4.20 encodingLevel
      1.5.2.4.21 linguisticInformation
      1.5.2.4.22 conformanceToStandardsBestPractices
      1.5.2.4.23 lingualityType
      1.5.2.4.24 language
      1.5.2.4.25 metalanguage
      1.5.2.4.26 size
      1.5.2.4.27 domain
      1.5.2.4.28 characterEncoding
      1.5.2.4.29 mimeType
      1.5.2.4.30 relationType
      1.5.2.4.31 relatedResource1
      1.5.2.4.32 relatedResource2
    1.5.2.5 Recommended schema for models
      1.5.2.5.1 resourceType
      1.5.2.5.2 resourceName
      1.5.2.5.3 identifier
      1.5.2.5.4 description
      1.5.2.5.5 version
      1.5.2.5.6 licence
      1.5.2.5.7 rightsStmtName
      1.5.2.5.8 rightsStmtURL
      1.5.2.5.9 nonStandardLicenceName
      1.5.2.5.10 nonStandardLicenceTermsURL
      1.5.2.5.11 version of licence
      1.5.2.5.12 distributionMedium
      1.5.2.5.13 downloadURL
      1.5.2.5.14 contactEmail
      1.5.2.5.15 landingPage
      1.5.2.5.16 contactPerson (identifier or personName)
      1.5.2.5.17 contactGroup (identifier or organizationName)
      1.5.2.5.18 mustBeCitedWith
      1.5.2.5.19 resourceCreator (person or organization, described with identifier or name)
      1.5.2.5.20 variantName
      1.5.2.5.21 tagset
      1.5.2.5.22 typesystem
      1.5.2.5.23 algorithm
      1.5.2.5.24 trainingCorpusDetails
      1.5.2.5.25 mediaType
      1.5.2.5.26 lingualityType
      1.5.2.5.27 language
      1.5.2.5.28 size
      1.5.2.5.29 mimeType
      1.5.2.5.30 characterEncoding
      1.5.2.5.31 relationType
      1.5.2.5.32 relatedResource1
      1.5.2.5.33 relatedResource2
1.6 Guidelines for providers of software resources
  1.6.1 Introduction
  1.6.2 Instructions for providers of software components
    1.6.2.1 How to register your components
    1.6.2.2 How to make your components interoperable
    1.6.2.3 How to document your components
    1.6.2.4 Guide for deploying UIMA components in the Argo platform
  1.6.3 Recommended ancillary knowledge resources
  1.6.4 Recommended schema for software resources
    1.6.4.1 resourceType
    1.6.4.2 resourceName
    1.6.4.3 description
    1.6.4.4 identifier
    1.6.4.5 version
    1.6.4.6 componentType
    1.6.4.7 licence
    1.6.4.8 rightsStmtName
    1.6.4.9 rightsStmtURL
    1.6.4.10 nonStandardLicenceTermsURL
    1.6.4.11 version of licence
    1.6.4.12 componentDistributionMedium
    1.6.4.13 accessURL
    1.6.4.14 downloadURL
    1.6.4.15 contactEmail
    1.6.4.16 landingPage
    1.6.4.17 contactPerson (identifier or personName)
    1.6.4.18 contactGroup (identifier or organizationName)
    1.6.4.19 mailingListInfo
    1.6.4.20 onlineHelpURL
    1.6.4.21 issueTracker
    1.6.4.22 mustBeCitedWith
    1.6.4.23 resourceCreator (person or organization, described with identifier or name)
    1.6.4.24 mediaType inside inputContentResourceInfo or outputResourceInfo
    1.6.4.25 resourceType inside inputContentResourceInfo or outputResourceInfo
    1.6.4.26 language inside inputContentResourceInfo or outputResourceInfo
    1.6.4.27 characterEncoding inside inputContentResourceInfo or outputResourceInfo
    1.6.4.28 mimeType inside inputContentResourceInfo or outputResourceInfo
    1.6.4.29 dataFormatSpecific inside inputContentResourceInfo or outputResourceInfo
    1.6.4.30 typesystem inside inputContentResourceInfo or outputResourceInfo
    1.6.4.31 tagset inside inputContentResourceInfo or outputResourceInfo
    1.6.4.32 annotationLevel inside inputContentResourceInfo or outputResourceInfo
    1.6.4.33 typesystem inside componentDependencies
    1.6.4.34 tagset inside componentDependencies
    1.6.4.35 annotationResource inside componentDependencies
    1.6.4.36 framework
    1.6.4.37 relationType
    1.6.4.38 relatedResource1
    1.6.4.39 relatedResource2
1.7 The OMTD-SHARE metadata schema
1.8 Glossary
OpenMinTeD guidelines
Welcome to the OpenMinTeD Guidelines!
OpenMinTeD enables the creation of an infrastructure that fosters and facilitates the use of Text and Data Mining (TDM) technologies in the scientific publications world, builds on existing TDM tools and platforms, and renders them discoverable and interoperable through appropriate registries and a standards-based interoperability layer, respectively.
This is where you'll find information on
● how to make your resources interoperable with other resources for TDM purposes
● how to register your resources at the OpenMinTeD platform (https://services.openminted.eu/)
● how to contribute to the guidelines.
TDM involves a wide range of resource types:
● the content resources to be mined, i.e. scholarly publications in the current phase,
● the TDM software and
● ancillary knowledge resources used for the operation of the software (e.g. annotation schemes, linguistic tagsets, lexical or ontological resources used for annotating the resources to be mined, machine learning models, annotated textual corpora).
Four guidelines are released targeting providers of these resources:
● Guidelines for providers of publications
● Guidelines for providers of corpora
● Guidelines for providers of software resources
● Guidelines for providers of knowledge resources
The OpenMinTeD platform serves as a facilitator of TDM in an ecosystem of e-infrastructures and repositories, collecting, transforming and making available resources only as needed for TDM purposes. In other words, it is not one more registry of content and services, and it doesn't seek to collect and provide information about resources that might be of interest to TDM stakeholders.
Important notice
Resources are to be registered into OpenMinTeD only if they can be accessed and deployed in the context of a TDM processing operation.
Each set of guidelines contains the following information:
● a brief introduction, specifying the resources expected, potential sources, minimal requirements for the contributions
● packaging and registering instructions for the OpenMinTeD registry
● technical and metadata requirements that empower interoperability
● for each resource type, an overview of the OMTD-SHARE metadata schema (minimal level) with definitions, explanations, recommended usage and mappings to other popular metadata schemas
● further instructions per type of contributors or resource type/subtype where required.
Guidelines for providers of publications
● Introduction
● Instructions for publication repositories, libraries, publishers etc.
● Instructions for aggregators
● Further requirements for annotated publications
● Recommended schema for publications
Introduction
OpenMinTeD facilitates the use of TDM technologies in the scientific publications world, ranging from generic scholarly communication to literature related to specific disciplines. Scholarly publications come from a wide range of stakeholders, e.g. institutional and discipline repositories, academic journals, scientific publishers, etc. For the first phase, the focus is on literature repositories and publishers, as regards sources, and on Open Access content, as regards access conditions.
Important notice
It should be noted that only publications that provide the full text or, at least, an abstract are candidates for inclusion in OpenMinTeD.
OpenMinTeD relies on existing infrastructures and standards/best practices for its operation. Thus, to access scholarly publications, it relies on the two main aggregators of such content, OpenAIRE and CORE. Providers of scholarly publications are asked to contribute their resources by depositing them at one of these aggregators, following their respective guidelines and procedures. In addition, OpenAIRE and CORE are developing a content connector that allows harvesting of open access publications through the APIs of publishers that allow this.
Scholarly publications are imported into OpenMinTeD for TDM processing via the creation of corpora upon queries submitted by the end-users. Researchers come to OpenMinTeD not to read publications, but to build a corpus by selecting publications from various sources based on specific criteria, e.g. "a corpus of English articles in the biomedicine area", in order to run TDM services on them.
OpenMinTeD has elaborated several architectural options for integrating existing content providers (such as, but not limited to, OpenAIRE and CORE) and has chosen an approach whereby content is managed in those external services but is accessible in the OpenMinTeD platform through a federated search strategy. Content is made available to the OpenMinTeD platform through a simple API, defining simple operations to search and retrieve content.
As one of the first steps of building a corpus of scholarly publications, end-users are expected to issue a query in the OpenMinTeD registry: they are presented with a faceted view of the OpenMinTeD contents (i.e. of all registered content providers) and, by selecting from a range of criteria, gradually build a query. Results from all registered content providers are presented to the end-user and, after refinement and careful elicitation of the final query, the associated content is transferred to OpenMinTeD's registry and becomes available for the subsequent steps of a TDM workflow. A lazy deposit/caching strategy is employed to avoid redundant queries (in simple terms, a record is fetched only the first time it is requested and remains stored locally for further requests). Extra care is taken to ensure reproducibility of the created corpus by storing an exact version of the content used in it.
Thus, a corpus included in the OpenMinTeD Registry essentially consists of a list of publications. Each publication is identified (equivalent to a primary key) by the hash value of its content (full-text PDF) and is accompanied by a set of metadata files (in the OMTD-SHARE schema) that describe the resource. In most cases, this set consists of just one item, but multiple metadata files may describe the same resource (for example, different metadata files from CORE or OpenAIRE, updates in metadata fields, richer metadata from a content provider, etc.).
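The identification and caching behaviour described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the platform's implementation: the class `LazyPublicationStore`, the `fetch_remote` callable and the choice of SHA-256 are all hypothetical, since the text names neither the hash function nor the storage layer.

```python
import hashlib


def content_key(content: bytes) -> str:
    """Hash of the full-text bytes, used as the publication's primary key.

    SHA-256 is an assumption; the guidelines do not name the hash function.
    """
    return hashlib.sha256(content).hexdigest()


class LazyPublicationStore:
    """Keeps content locally from the first time a record is requested."""

    def __init__(self, fetch_remote):
        self._fetch_remote = fetch_remote  # callable: source id -> bytes
        self._key_by_source = {}           # source id -> content hash
        self._content = {}                 # content hash -> bytes
        self._metadata = {}                # content hash -> metadata records

    def get(self, source_id, metadata_record):
        """Return (key, content); contact the provider only on a cache miss."""
        if source_id not in self._key_by_source:
            content = self._fetch_remote(source_id)
            key = content_key(content)
            self._key_by_source[source_id] = key
            self._content.setdefault(key, content)
        key = self._key_by_source[source_id]
        # Several metadata records may describe the same content.
        records = self._metadata.setdefault(key, [])
        if metadata_record not in records:
            records.append(metadata_record)
        return key, self._content[key]

    def metadata_for(self, key):
        """All metadata records collected so far for one content hash."""
        return list(self._metadata.get(key, []))
```

Because the key is derived from the content itself, the same full text delivered by two aggregators ends up as one publication with two metadata records, which is the situation the paragraph above describes.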
The following sections present a list of instructions, requirements and recommendations that publications must meet to interact with TDM resources.
Instructions for publication repositories, libraries, journals, publishers, etc.
● How to register your resources
● How to make your resources interoperable
● How to document your resources
How to register your resources
If you wish to register publications that can be harvested for TDM purposes through OpenMinTeD, you can do so
● by registering through OpenAIRE, following procedures and guidelines at: https://www.openaire.eu/validator/welcome.action
OR
● by registering through CORE, following procedures at: https://core.ac.uk/join
For each publication to be valid for import into OpenMinTeD, a metadata record conformant with the OMTD-SHARE minimal schema and a file with the contents must be delivered.
How to make your resources interoperable
To be fully compatible with OpenMinTeD, you must:
● provide a file with the actual contents of each publication in any format you desire (e.g. PDF, HTML, etc.).
In addition, if you wish your material to be easily processable and interoperable with TDM tools and services, you should adopt the following recommendations:
● The preferred formats for delivering textual material are plain text, XML and PDF (not proprietary and certainly not of scanned images), which can be read by one of the existing readers.
● If appropriate for your material, use one of the more specific data formats that are already supported by readers and converters included in the OpenMinTeD registry (cf. dataFormatSpecific).
● The preferred character encoding is UTF-8.
Please note that not all of the above requirements are absolute: if your material is not compliant with them, it may still be processable, but their adoption makes it better equipped for TDM and NLP processing.
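A provider could check material against these recommendations before delivery. The sketch below is a hedged illustration in Python: the function names and the expression of "plain text, XML, PDF" as MIME types are assumptions made here, and the result is a list of notes rather than a pass/fail verdict, because the recommendations are not absolute.

```python
# Preferred delivery formats named in the text, expressed as MIME types.
# The MIME-type spelling is an assumption made for this sketch.
PREFERRED_MIME_TYPES = {"text/plain", "application/xml", "application/pdf"}


def is_utf8(data: bytes) -> bool:
    """True when the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def interoperability_notes(mime_type: str, data: bytes) -> list:
    """List deviations from the recommendations; an empty list means none."""
    notes = []
    if mime_type not in PREFERRED_MIME_TYPES:
        notes.append("format is not among the preferred ones: " + mime_type)
    # The encoding check only makes sense for textual formats, not for PDF.
    if mime_type != "application/pdf" and not is_utf8(data):
        notes.append("text is not UTF-8 encoded")
    return notes
```

The sketch does not detect scanned-image PDFs or proprietary variants; that would require inspecting the file contents with a PDF library.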
How to document your resources
To be fully compatible with OpenMinTeD, you must
● provide a metadata record for each publication with at least bibliographic information about it, preferably following the OpenAIRE guidelines
● ensure that the publications are distributed under Open Access conditions
● include in the metadata record of each publication a link to the licence document that describes the terms and conditions under which it is provided, and attach the licence document together with the publication
● if you already have a PID for your publication (preferably DOI), make sure it is included in the metadata record (cf. identifier for more information on identifier schemes).
The following recommendations will help interaction with your resources, but they are not mandatory.
● Further adoption of standards such as the JATS article tag suite or the TEI P5 guidelines for annotating the inner structure of publications is recommended.
● Use standard classification vocabularies (such as MeSH, DDC, LCSH etc.) for adding classification tags to your material and specify the vocabulary you use in the metadata record; provide at least one broad category for your material (e.g. life sciences, computing etc.).
● In all cases where links to other resources or entities (e.g. persons, projects etc.) are added in the metadata records, please try to do this through unique and persistent identifiers of authority lists and sources, to the extent possible, also documenting the authority and/or scheme they adhere to.
Instructions for aggregators
For the first phase of the project, OpenAIRE and CORE will bring content resources into OpenMinTeD through user queries. In subsequent versions, interested content providers will be able to contribute directly to OpenMinTeD if they implement the following:
● Map the metadata of their contents to the OMTD-SHARE schema
● Provide search capabilities on the metadata
● Provide the actual content (e.g. full text in the case of publications)
More specific instructions are found in the next section.
● How to register your resources
● How to document your resources
How to register your resources
Interested content providers must implement a Java interface, called ContentConnector, which can be found at https://github.com/openminted/content-connector-api. The implementation is then included in the code of the Content Service of the OpenMinTeD platform. This interface specifies three methods:
● search, which accepts a Query object describing a query and returns a page of metadata. This method is used for browsing the metadata of the provider and supports keyword search, advanced search in a number of fields and also faceted search. The result of the method is (a) a page (of user-specified size) of metadata, (b) the statistics of the results (total number of hits, etc.), and (c) the facets (if requested).
● fetchMetadata, which accepts a Query but, unlike the previous method, returns all the metadata of the result, without any statistics or facets. The result is a stream containing a single XML element (called "publications"), which in turn contains all the metadata of the content. This method is called when a corpus is being built.
● downloadFullText, which, given a publication identifier (as contained in the metadata), returns a stream containing the actual content. This method is again used when the platform is building a corpus.
Additional technical information is provided in the Java code of the interface.
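To make the division of labour between the three methods concrete, here is a Python analogue of the contract. It is only a sketch for illustration: the authoritative definition is the Java interface in the repository above, and the `Query` and `SearchResult` fields, the snake_case method names and the toy `InMemoryConnector` are all assumptions introduced here.

```python
import io
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from xml.sax.saxutils import quoteattr


@dataclass
class Query:
    """Hypothetical query object: keyword, page window and requested facets."""
    keyword: str = ""
    start: int = 0
    size: int = 10
    facets: list = field(default_factory=list)


@dataclass
class SearchResult:
    """Hypothetical result page: metadata, hit statistics and facet counts."""
    publications: list
    total_hits: int
    facets: dict


class ContentConnector(ABC):
    """Python analogue of the three operations described above."""

    @abstractmethod
    def search(self, query):
        """Return one page of metadata plus statistics and facets."""

    @abstractmethod
    def fetch_metadata(self, query):
        """Return a stream with the metadata of all hits."""

    @abstractmethod
    def download_full_text(self, identifier):
        """Return a stream with the content of one publication."""


class InMemoryConnector(ContentConnector):
    """Toy provider backed by a dict: identifier -> (metadata dict, content)."""

    def __init__(self, records):
        self._records = records

    def _hits(self, query):
        return [(ident, meta) for ident, (meta, _) in self._records.items()
                if query.keyword.lower() in meta["title"].lower()]

    def search(self, query):
        hits = self._hits(query)
        facets = {}
        for name in query.facets:
            counts = {}
            for _, meta in hits:
                value = meta.get(name)
                counts[value] = counts.get(value, 0) + 1
            facets[name] = counts
        page = [meta for _, meta in hits[query.start:query.start + query.size]]
        return SearchResult(page, len(hits), facets)

    def fetch_metadata(self, query):
        # All hits, no statistics or facets, wrapped in one root element.
        items = "".join("<publication id=%s/>" % quoteattr(ident)
                        for ident, _ in self._hits(query))
        xml = "<publications>%s</publications>" % items
        return io.BytesIO(xml.encode("utf-8"))

    def download_full_text(self, identifier):
        return io.BytesIO(self._records[identifier][1])
```

In this sketch `search` is the only method that pages and counts, which mirrors its browsing role, while the two stream-returning methods serve corpus building.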
How to document your resources
In the case of publications, the required metadata records come at two levels:
● one for the whole query-generated corpus of publications, in compliance with the OMTD-SHARE schema for corpora, which is automatically constructed on the basis of the user filters and manually enriched by the user;
● one per publication, with a minimal set of metadata elements in compliance with the OMTD-SHARE schema for publications, automatically converted from the current schemas of the providers.
It should be noted that the original resource providers (e.g. publication repositories, publishers etc.) that offer publications via OpenAIRE and CORE do not have to change their current schemas. Mappings and conversions between the OpenAIRE1 and CORE metadata and the OMTD-SHARE schema are made by the providers themselves in the framework of OpenMinTeD2.
All metadata records for publications must be delivered in XML format.
1 The OpenAIRE schema and guidelines are currently under revision; collaboration with the relevant actors has been established to take into account the new features and, where desired, influence the changes so as to support TDM processes in accordance with the interoperability requirements.
2 Mappings with other metadata schemas, including OpenAIRE and CORE, are included in the presentation of the recommended metadata schema.
Further requirements for annotated publications
Scholarly publications will normally be imported into the OpenMinTeD platform in an unprocessed format and will be annotated by the operation of TDM software also registered in the platform.
However, certain providers may decide to run the TDM or annotation software at their own premises and upload the results of the processing directly into OpenMinTeD (e.g. annotating the publications with structural markup, extracting acknowledgements or citations sections etc.).
In these cases, the annotated output is considered a new resource and, therefore, should be registered as a separate resource from the raw publication in a folder called "annotatedfiles" with its own metadata record, following the instructions for annotated publications.
It should be noted that publications annotated by means of the OpenMinTeD platform will be automatically assigned the appropriate values for these elements.
Recommended schema for publications
This section gives an overview of the recommended OMTD-SHARE schema for publications, i.e. the subset of M(andatory) and strongly R(ecommended) metadata elements. Only elements related to the description of the resource are presented here; additional elements required for the management of the metadata record (e.g. metadataCreationDate, metadataCreator etc.) are handled internally by the OpenMinTeD platform.
For annotated publications, see the section "Metadata schema for annotated publications".
OMTD-SHARE element                                                                 Usage
documentType                                                                       M
publicationType                                                                    M
identifier                                                                         M
title                                                                              M
licence or rightsStmtName & rightsStmtURL (one of the two must be provided)        M
nonStandardLicenceName                                                             R when applicable
nonStandardLicenceTermsURL                                                         M when applicable
version of licence                                                                 M
distributionMedium                                                                 M
downloadURL                                                                        M when applicable
documentLanguage                                                                   M
fullText                                                                           R
abstract                                                                           R
author                                                                             R
publisher                                                                          R
journal                                                                            R
mimeType                                                                           R
characterEncoding                                                                  R
publicationDate                                                                    R
subject                                                                            R
keyword                                                                            R
collectedFrom repositoryName or repositoryIdentifier                               R
sourceMetadataLink                                                                 R
originalDataProviderType                                                           R
originalDataProviderRepository                                                     R when applicable
originalDataProviderJournal                                                        R when applicable
originalDataProviderPublisher                                                      R when applicable
relationType                                                                       R
relatedResource1                                                                   M when applicable
relatedResource2                                                                   M when applicable
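As a rough illustration of how the M entries of the overview table could be checked before a record is submitted, the sketch below treats a metadata record as a plain Python dictionary. The dictionary representation, the key "versionOfLicence" and the function name are assumptions made for this example; the mandatory list simply follows the overview table.

```python
# Unconditionally mandatory elements, as marked "M" in the overview table.
# "versionOfLicence" is a hypothetical key standing for "version of licence".
MANDATORY = [
    "documentType",
    "publicationType",
    "identifier",
    "title",
    "versionOfLicence",
    "distributionMedium",
    "documentLanguage",
]


def missing_mandatory(record):
    """Return the names of mandatory elements that are absent or empty."""
    missing = [name for name in MANDATORY if not record.get(name)]

    # Either a licence, or a rights statement (name and URL), must be given.
    has_licence = bool(record.get("licence"))
    has_rights = bool(record.get("rightsStmtName")) and bool(
        record.get("rightsStmtURL")
    )
    if not (has_licence or has_rights):
        missing.append("licence or rightsStmtName & rightsStmtURL")
    return missing
```

A record that passes this check still needs the conditional elements (for example nonStandardLicenceTermsURL or the relatedResource pair) wherever their conditions apply.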
documentType
Usage
Mandatory
Type
Closed controlled vocabulary
Controlled vocabulary reference and/or values
ms-omtd:documentType: bibliographicRecordOnly, abstract, fullText
Definition/Explanations
Specifies whether the metadata record provides access to the full text, the abstract or serves only as a bibliographic record (i.e. includes only metadata)
Recommended usage
Please select one of the values provided to indicate whether the metadata record includes the full text (either as a link or as a free text field inside the record), the abstract (again, as a link or as a free text description in a metadata element) or none at all. If the record includes both the abstract and the full text, the preferred option is to select "fullText".
Relation to other metadata schemas
DCMI: skos:narrowMatch dct:type
publicationType
Usage
Mandatory
Type
Open controlled vocabulary
Controlled vocabulary reference and/or values
ms-omtd:publicationType: article, bachelorThesis, masterThesis, doctoralThesis, book, bookPart, review, conferenceObject, lecture, workingPaper, prePrint, report, annotation, contributionToPeriodical, patent, inProceedings, booklet, manual, techReport, inCollection, unpublished, other
Definition/Explanations
Specifies the type of the publication (e.g. whether it's a journal article, oral paper or poster in the proceedings of a conference etc.)
Recommended usage
Please select one of the values from the list (compatible with the CASRAI research/scholarly output types, http://dictionary.casrai.org/Output_Types); if none of the values fits, please use "other".
Relation to other metadata schemas
OpenAIRE current version: computed from instanceType
OpenAIRE v4.0: dc:type
CORE: article.types
DCMI: skos:narrowMatch dct:type
DataCite 4.0: skos:closeMatch datacite:resourceTypeGeneral & datacite:resourceType; recommended usage for publications is to use "text" for datacite:resourceTypeGeneral and one of the CASRAI values for datacite:resourceType (e.g. text/ConferenceObject)
identifier
Usage
Mandatory
Type
free text
Attributes
ms-omtd:publicationIdentifierSchemeName or ms-omtd:schemeURI
Definition/Explanations
Reference to a DOI (recommended) or any kind of identifier used for the publication
Recommended usage
Provide a unique identifier already assigned by an authoritative source; the preferred identifier for publications is DOI; you can use either
the attribute "publicationIdentifierSchemeName" to specify the scheme, by selecting one of the pre-defined values (e.g. DOI, ISBN etc.) or the "other" value if the scheme is not listed among them, or
the attribute "schemeURI" to provide a link to the URI that documents the scheme the identifier adheres to.
Relation to other metadata schemas
OpenAIRE current version: doi/pmc/etc. identifiers
OpenAIRE v4.0: dc:identifier
CORE: article.id & article.identifiers
DCMI: skos:closeMatch dct:identifier
DataCite 4.0: skos:broadMatch datacite:identifier (identifierType can only be DOI); datacite:contributor with contributorType="ContactPerson", contributorName (familyName & givenName) or nameIdentifier and nameIdentifierScheme and schemeURI
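The choice between the two attributes can be illustrated with a small helper. This is a minimal sketch under stated assumptions: the dictionary layout and the function name are hypothetical, and KNOWN_SCHEMES lists only the two scheme names mentioned in the text (the actual pre-defined list is longer).

```python
# Only the schemes named as examples in the guidelines; the real
# controlled vocabulary contains more values (assumption for this sketch).
KNOWN_SCHEMES = {"DOI", "ISBN"}


def identifier_entry(value, scheme, scheme_uri=None):
    """Build an identifier entry with the scheme attributes described above."""
    if scheme in KNOWN_SCHEMES:
        return {"value": value, "publicationIdentifierSchemeName": scheme}
    # Scheme not in the pre-defined list: fall back to "other" and
    # require a link that documents the scheme.
    if not scheme_uri:
        raise ValueError("schemeURI is needed for schemes outside the list")
    return {
        "value": value,
        "publicationIdentifierSchemeName": "other",
        "schemeURI": scheme_uri,
    }
```

For example, identifier_entry("10.1234/example", "DOI") needs no further information, while an in-house scheme must come with a documenting URI.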
title
Usage
Mandatory
Type
Multilingual free text
Attributes
xs:lang and ms-omtd:titleType
Definition/Explanations
The title of the publication
Recommended usage
Please provide the title as in the original metadata record; the "lang" attribute can be used to specify the language of the title, and the "titleType" attribute (after DataCite) to differentiate between main title, alternative or translated title and subtitle.
Relation to other metadata schemas
OpenAIRE current version: title
OpenAIRE v4.0: dc:title
CORE: article.title
DCMI: skos:exactMatch dct:title
DataCite 4.0: skos:exactMatch datacite:title
licence
Usage
Mandatory under conditions
Conditions for usage
either licence or rightsStmt must be filled in
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms:licence: CC-BY, CC-BY-NC, CC-BY-NC-ND, CC-BY-NC-SA, CC-BY-ND, CC-BY-SA, CC-ZERO, PDDL, ODC-BY, ODbL, MS-NoReD, MS-NoReD-FF, MS-NoReD-ND, MS-NoReD-ND-FF, MS-NC-NoReD, MS-NC-NoReD-FF, MS-NC-NoReD-ND, MS-NC-NoReD-ND-FF, ELRA_END_USER, ELRA_EVALUATION, ELRA_VAR, CLARIN_PUB, CLARIN_ACA, CLARIN_ACA-NC, CLARIN_RES, AGPL, ApacheLicence_2.0, BSD_4-clause, BSD_3-clause, FreeBSD, GFDL, GPL, LGPL, MIT, Princeton_Wordnet, proprietary, underNegotiation, nonStandardLicenceTerms
Definition/Explanations
The licence of use for the resource
Recommended usage
You can provide information on the rights of accessing and using a resource in one of the following ways, in order of preference:
use the element "licence" and select one of the recommended licences; please note that the list contains licences intended for data resources and components mixed together; for components the recommended licences are the Open Source licences; for data resources, please use a standard licence such as one of the CC family;
if the licence you use is not included in the list, you can use the "nonStandardLicenceTerms" or the "proprietary" values and give further information on your licence in the elements "nonStandardLicenceName", "nonStandardLicenceTermsURL" and "nonStandardLicenceTermsText";
you can also use the "rightsStatementName" and the "rightsStatementURL" (with a link to a URL with more explanations on its usage) if the resource is provided with a general statement of use and not an official licence document; please note that this is an option used mainly to facilitate end-users in accessing your resource, while you are strongly advised to properly license your resource.
For publications harvested from OpenAIRE and CORE, please provide the original licence value if it was included in the original metadata record; in any case, the "rightsStmtName" element must additionally be used for all publications.
Relation to other metadata schemas
OpenAIRE current version: bestlicense provides info for NonStandardLicenceTerms and RightsStatementInfo
OpenAIRE v4.0: dc:rights & file/dc:accessRights
DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rights
rightsStmtName
Usage
Mandatory under conditions
Conditions for usage
either licence or rightsStmt must be filled in
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms-omtd:rightsStmtName: openAccess, closedAccess, embargoedAccess, restrictedAccess
Definition/Explanations
The name of an official statement indicative of licensing terms for the use of a resource (e.g. open access, free to read etc.); its semantics should be clear, preferably formally expressed and stored at a URL.
The current list of predefined values comes from OpenAIRE, but it is under revision.
Recommended usage
The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value to help users understand the licensing terms of a resource.
Relation to other metadata schemas
OpenAIRE current version: conversion from bestlicence classname
DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rights
rightsStmtURL
Usage
Mandatory under conditions
Conditions for usage
either licence or rightsStmt must be filled in
Type
URL pattern
Definition/Explanations
Link to the URL with the text that formally explains the licensing conditions imposed by the rights statement.
Recommended usage
The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value to help users understand the licensing terms of a resource.
Relation to other metadata schemas
OpenAIRE current version: http://api.openaire.eu/vocabularies/dnet:access_modes
DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rightsURI
nonStandardLicenceName
Usage
Mandatory under conditions
Conditions for usage
to be used with ms:licence other (i.e. for licences not included in the list of recommended ones)
Type
free text
Definition/Explanations
The name with which a licence is known; to be used for licences not included in the pre-defined list of recommended licences
Recommended usage
Please provide the name of the licence if it's already known or supply one that can uniquely identify it.
Relation to other metadata schemas
OpenAIRE current version: bestlicense
DCMI: skos:closeMatch dct:title (for dct:licenseDocument)
nonStandardLicenceTermsURL
Usage
Mandatory under conditions
Conditions for usage
to be used with ms:licence other (i.e. for licences not included in the list of recommended ones)
Type
URL pattern
Definition/Explanations
Used to provide a hyperlink to a URL containing the text of a licence not included in the predefined list or describing the terms of use for a language resource or terms of service for web services
Recommended usage
Please provide the link to the full text document of the licence. Please note that this is the preferred option over inserting the licence text in the element "nonStandardLicenceTermsText", as it provides a permanent location for the licence that is accessible to all.
Relation to other metadata schemas
OpenAIRE current version: bestlicense classid
DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rightsURI
version of licence
Usage
Mandatory
Type
free text
Definition/Explanations
The version of the licence
Recommended usage
You are advised to indicate the version of the licence of your resource; the latest version is the preferred option, e.g. "4.0" for all CC licences and "2.0" for the META-SHARE-NoReD ones.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:hasVersion (for dct:licenseDocument)
distributionMedium
Usage
Mandatory
Type
Open controlled vocabulary
Controlled vocabulary reference and/or values
ms:distributionMedium: webExecutable, paperCopy, hardDisk, bluRay, DVD-R, CD-ROM, downloadable, accessibleThroughInterface, other
Definition/Explanations
Specifies the medium (channel) used for delivery or providing access to the resource
Recommended usage
Please use one of the provided values to indicate the medium of distribution. For publications harvested from OpenAIRE and CORE, the default value is "downloadable" if the documentType is "abstract" or "fullText". Please note that if the publication is distributed in different media under different terms of use or licences, you can repeat the whole set of elements ("distributionInfo") to describe them.
Relation to other metadata schemas
OpenAIRE v4.0: distributionInfo are related to web resource or url
DCMI: skos:closeMatch dct:medium
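The two defaulting rules stated in these guidelines (prefer "fullText" when a record carries both the abstract and the full text, and default harvested records to "downloadable" when the documentType is "abstract" or "fullText") can be sketched as below. The function names and boolean inputs are assumptions made for illustration; the guidelines do not prescribe a default for bibliographic-only records, so the sketch returns None in that case.

```python
from typing import Optional


def choose_document_type(has_full_text: bool, has_abstract: bool) -> str:
    """Pick the documentType value; full text wins over abstract."""
    if has_full_text:
        return "fullText"
    if has_abstract:
        return "abstract"
    return "bibliographicRecordOnly"


def default_distribution_medium(document_type: str) -> Optional[str]:
    """Default distributionMedium for records harvested from OpenAIRE/CORE."""
    if document_type in ("abstract", "fullText"):
        return "downloadable"
    # No default is stated for other document types.
    return None
```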
downloadURL
Usage
Recommended under conditions
Conditions for usage
if distributionMedium = downloadable
Definition/Explanations
Any URL where the resource can be downloaded from
Recommended usage
Please use for publications whose actual content is not already uploaded in OpenMinTeD; in this case, please ensure that the URL leads to the actual content of the publication and not to a landing page. For publications harvested from OpenAIRE & CORE, the full content must be uploaded in OpenMinTeD according to the approved guidelines for the user-built corpora of publications.
Relation to other metadata schemas
OpenAIRE current version: url
CORE: article.fulltextURLs
documentLanguage
Usage
Recommended
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms-omtd:documentLanguage (a combination of languageId, scriptId, regionId and variantId according to the IETF BCP 47 guidelines)
Definition/Explanations
The language the document is written in, according to the IETF BCP 47 guidelines
Recommended usage
Please enter the language and, if needed, the region, script and variant identifier that best fits the language of the document (e.g. en-US) according to the IETF BCP 47 guidelines
Relation to other metadata schemas
OpenAIRE current version: language (but to be mapped from ISO 639-2 3-letter codes to ours)
OpenAIRE v4.0: dc:language
CORE: article.language
DCMI: skos:closeMatch dct:language
DataCite 4.0: skos:closeMatch datacite:Language
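To show how a documentLanguage value decomposes into its parts, here is a simplified sketch. It handles only the common language[-script][-region][-variant] shape and deliberately ignores the rest of BCP 47 (extended language subtags, extensions, private use, grandfathered tags, multiple variants); the function name and the returned keys are assumptions for this example.

```python
import re

# Simplified pattern: language, then optional script, region and one variant.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|\d{3}))?"
    r"(?:-(?P<variant>[A-Za-z0-9]{5,8}|\d[A-Za-z0-9]{3}))?$"
)


def split_language_tag(tag):
    """Split a simple BCP 47 tag into its subtags; raise on other shapes."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"not a simple BCP 47 tag: {tag!r}")
    # Keep only the subtags that are actually present.
    return {k: v for k, v in match.groupdict().items() if v is not None}
```

A production validator would also check each subtag against the IANA Language Subtag Registry, which this sketch does not attempt.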
fullText
Usage
Recommended
Type
free text
Attributes
xs:lang
Definition/Explanations
The full text of the publication in simple text format
Recommended usage
You can use this metadata element to include the full text of the publication in simple text format instead of uploading it as a separate file.
Relation to other metadata schemas
OpenAIRE v4.0: file/objectType
CORE: article.fulltext
abstract
Usage
Recommended
Type
free text
Attributes
xs:lang
Definition/Explanations
The abstract of the document in plain text format
Recommended usage
You can use this metadata element to include the abstract of the publication in simple text format; the element can be repeated for the different language versions using the "lang" attribute to specify the language.
Relation to other metadata schemas
OpenAIRE current version: dc:description
OpenAIRE v4.0: dc:description
CORE: article.description
DCMI: skos:exactMatch dct:abstract
DataCite 4.0: skos:exactMatch datacite:description with value "abstract" for datacite:descriptionType
author
Usage
Recommended
Type
identifier or multilingual free text
Attributes
ms:personIdentifierSchemeName (for identifiers) or xs:lang (for name)
Definition/Explanations
Groups information on the person(s) that has/have authored the publication
Recommended usage
The recommended way for referring to a person is by giving their identifier, preferably the ORCID; if you provide the identifier, please also select the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide the name, preferably in the format "Surname, First name", at least in English; if you want to add names in other languages, you can use the "lang" attribute. The element can also be repeated to encode multiple persons.
Relation to other metadata schemas
OpenAIRE current version: rels/rel
OpenAIRE v4.0: datacite:creator
CORE: article.authors
DCMI: skos:closeMatch dct:creator
DataCite 4.0: skos:closeMatch datacite:Creator with datacite:creatorName (familyName & givenName) or datacite:nameIdentifier & datacite:nameIdentifierScheme & datacite:schemeURI
publisher
Usage
Recommended
Type
person or organization, both encoded with identifier or multilingual free text
Attributes
for person: ms:personIdentifierSchemeName (for identifiers) or xs:lang (for name); for organization: ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for name)
Definition/Explanations
Groups information on the person(s) or organization(s) that has/have published the publication
Recommended usage
The recommended way for referring to a person is by giving their identifier, preferably the ORCID; if you provide the identifier, please also select the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide the name, preferably in the format "Surname, First name", at least in English; if you want to add names in other languages, you can use the "lang" attribute.
The recommended way for referring to an organization is by giving its identifier (e.g. ISNI); if you provide the identifier, please also select the relevant value from the list of values in the attribute "organizationIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the organization, you may provide the name at least in English; if you want to add names in other languages, you can use the "lang" attribute.
The element can also be repeated to encode multiple persons/organizations.
Relation to other metadata schemas
OpenAIRE current version: publisher
OpenAIRE v4.0: dc:publisher
CORE: article.publisher
DCMI: skos:exactMatch dct:publisher
DataCite 4.0: skos:exactMatch dct:Publisher
journal
Usage
Mandatory if applicable
Conditions for usage
If the article comes from a journal
Type
identifier or multilingual free text
Attributes
ms-omtd:journalIdentifierSchemeName (for identifiers) or xs:lang (for title)
Definition/Explanations
Groups information on the journal where the publication has appeared
Recommended usage
The recommended way for referring to a journal is by giving its identifier (e.g. ISSN, DOI); if you provide the identifier, please also select the relevant value from the list of values in the attribute "journalIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the journal, you may provide the title at least in English; if you want to add titles in other languages, you can use the "lang" attribute.
Relation to other metadata schemas
OpenAIRE current version: journal
CORE: article.journals
DCMI: skos:exactMatch dct:title (for journals)
mimeType
Usage
Recommended
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms:mimetype (a subset of values (the most popular ones for text files) from the IANA mimetype controlled vocabulary): text/plain, application/vnd.xmi+xml, text/xml, application/x-tmx+xml, application/x-xces+xml, application/tei+xml, application/rdf+xml, application/xhtml+xml, application/emma+xml, application/pls+xml, application/postscript, application/voicexml+xml, text/sgml, text/html, application/x-tex, application/rtf, application/json+ld, application/x-latex, text/csv, text/tab-separated-values, application/pdf, application/x-msaccess, audio/mp4, audio/mpeg, audio/wav, image/bmp, image/gif, image/jpeg, image/png, image/svg+xml, image/tiff, video/jpeg, video/mp4, video/mpeg, video/x-flv, video/x-msvideo, video/x-ms-wmv, application/msword, application/vnd.ms-excel, audio/mpeg3, text/turtle, other, audio/PCMA, audio/flac, audio/speex, audio/vorbis, video/mp2t
Definition/Explanations
The mime-type of the resource (a formalized specifier for the format) or a mime-type that the component accepts, in conformance with the values of the IANA (Internet Assigned Numbers Authority)
Recommended usage
Please select one of the pre-defined values (which are the most popular ones for text files) or add a value, preferably from the IANA media type recommended values (http://www.iana.org/assignments/media-types/media-types.xhtml)
Relation to other metadata schemas
OpenAIRE v4.0: format & file/mimetype
DCMI: skos:closeMatch dct:format
DataCite 4.0: skos:closeMatch datacite:Format
characterEncoding
Usage
Recommended
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms:characterEncoding: a long list of popular character encodings
Definition/Explanations
The name of the character encoding used in the resource or accepted by the component
Recommended usage
Please select one of the pre-defined values; it should be noted, however, that for OpenMinTeD the preferred character encoding is UTF-8 to ensure interoperability between content and components.
publicationDate
Usage
Recommended
Type
date pattern (year or year and month or full date)
Definition/Explanations
The publication date or, for an unpublished work, the date it was written
Recommended usage
If possible, provide at least the year of publication (or creation)
Relation to other metadata schemas
OpenAIRE current version: dateofacceptance
OpenAIRE v4.0: datacite:date with dateType: accepted
CORE: Article.datePublished
DCMI: skos:closeMatch dct:created
DataCite 4.0: skos:closeMatch datacite:CreationDate
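The three accepted precisions (year, year and month, full date) can be checked with a short helper. The guidelines do not spell out the lexical form of the date pattern, so the ISO-style forms YYYY, YYYY-MM and YYYY-MM-DD used here are an assumption, as is the function name.

```python
from datetime import datetime

# Assumed lexical forms, from coarsest to finest precision.
_FORMATS = (("year", "%Y"), ("month", "%Y-%m"), ("day", "%Y-%m-%d"))


def date_precision(value):
    """Return 'year', 'month' or 'day' for a valid date string."""
    for precision, fmt in _FORMATS:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next, finer-grained form
        return precision
    raise ValueError(f"unsupported publicationDate value: {value!r}")
```

Because datetime.strptime validates calendar ranges, impossible dates such as month 13 or 30 February are rejected rather than silently accepted.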
subject
Usage
Recommended
Type
free text
Attributes
ms:classificationSchemeName and ms:schemeURI
Definition/Explanations
Subject or topic of the document
Recommended usage
It is recommended that the subjects are taken from an authoritative source, such as DDC (Dewey Decimal Classification, http://www.oclc.org/dewey/) or UDC (Universal Decimal Classification, http://www.udcc.org/) and that the source is identified; if you do, please use the classificationSchemeName to indicate the source; if this is not included in the list of values, please use "schemeURI" with a link to a URL with more information on the scheme. The recommended way of adding the subject values is the identifier of the subject in the scheme; further instructions on the standardization of the format will be provided.
Relation to other metadata schemas
OpenAIRE current version: subject with schemeid & schemename (after mapping to our values)
OpenAIRE v4.0: dc:subject
CORE: article.subjects & article.topics
DCMI: skos:narrowMatch dct:subject
DataCite 4.0: skos:exactMatch datacite:Subject with datacite:subjectScheme, datacite:schemeURI and datacite:valueURI
keyword
Usage
Recommended
Type
free text
Definition/Explanations
Words used for indexing the document
Recommended usage
A free text element used for encoding keywords for the classification of the publication, only in English; please encode one word/phrase each time and repeat the element for multiple keywords.
Relation to other metadata schemas
OpenAIRE current version: subject with classid equal to keyword
DCMI: skos:narrowMatch dct:subject
collectedFrom repositoryName or repositoryIdentifier
Usage
Recommended
Type
identifier (repositoryIdentifier) or multilingual free text (repositoryName)
Attributes
ms-omtd:repositoryIdentifierSchemeName (for identifiers) or xs:lang (for title)
Definition/Explanations
Refers to the entity (repository, aggregator etc.) from which the metadata record has been harvested into OMTD
Recommended usage
The recommended way for referring to a repository is by giving its identifier (e.g. OpenDOAR); if you provide the identifier, please also select the relevant value from the list of values in the attribute "repositoryIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the repository, you may provide the name at least in English; if you want to add names in other languages, you can use the "lang" attribute.
Relation to other metadata schemas
OpenAIRE v4.0: dc:source
sourceMetadataLink
Usage
Recommended
Type
URL pattern
Definition/Explanations
A link to the original metadata record, in cases of harvesting
Recommended usage
This element can be encoded automatically by OMTD in cases of harvesting.
Relation to other metadata schemas
CORE: article.id
DCMI: skos:narrowMatch dct:source
originalDataProviderType
Usage
Recommended
Type
closed controlled vocabulary
Controlled vocabulary reference and/or values
ms-omtd:originalDataProviderType: repository, journal, publisher
Definition/Explanations
Refers to the type of the original data provider (repository/journal/publisher), in case the metadata record carries information taken from previous repositories/journals/publishers (e.g. in case the OMTD record's source is an aggregator)
Recommended usage
Please select one of the predefined values as appropriate. For records harvested from OpenAIRE and CORE, this element records the type of the original data provider (i.e. the repository/journal/publisher) from which they themselves have harvested the record.
Relation to other metadata schemas
OpenAIRE current version: has to be computed from the identifier of collectedFrom in OpenAIRE
originalDataProviderRepository
Usage
Recommended under conditions
Conditions for usage
if originalDataProviderType = repository
Type
identifier or multilingual free text
Attributes
ms-omtd:repositoryIdentifierSchemeName (for identifiers) or xs:lang (for title)
Definition/Explanations
Refers to the entity (repository, aggregator etc.) from which the metadata record has been harvested
Recommended usage
The recommended way for referring to a repository is by giving its identifier (e.g. from OpenDOAR, re3data etc.); if you provide the identifier, please also select the relevant value from the list of values in the attribute "repositoryIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the repository, you may provide the name at least in English; if you want to add names in other languages, you can use the "lang" attribute.
Relation to other metadata schemas
OpenAIRE current version: collectedFrom
CORE: article.repositories
DCMI: skos:narrowMatch dct:source
originalDataProviderJournal
Usage
Recommended under conditions
Conditions for usage
if originalDataProviderType = journal
Type
identifier or multilingual free text
Attributes
ms-omtd:journalIdentifierSchemeName (for identifiers) or xs:lang (for title)
Definition/Explanations
Refers to the journal from which the metadata record has been harvested
Recommended usage
The recommended way for referring to a journal is by giving its identifier (e.g. ISSN, DOI); if you provide the identifier, please also select the relevant value from the list of values in the attribute "journalIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the journal, you may provide the title at least in English; if you want to add titles in other languages, you can use the "lang" attribute.
Relation to other metadata schemas
OpenAIRE current version: collectedFrom
CORE: article.journals
DCMI: skos:narrowMatch dct:source
originalDataProviderPublisher
Usage
Recommended under conditions
Conditions for usage
if originalDataProviderType = publisher
Type
organization encoded with identifier or multilingual free text
Attributes
ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for name)
Definition/Explanations
Refers to the publisher from which the metadata record has been harvested
Recommended usage
The recommended way for referring to an organization is by giving its identifier (e.g. ISNI); if you provide the identifier, please also select the relevant value from the list of values in the attribute "organizationIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the organization, you may provide the name at least in English; if you want to add names in other languages, you can use the "lang" attribute.
Relation to other metadata schemas
OpenAIRE current version: collectedFrom
DCMI: skos:narrowMatch dct:source
relationType
Usage
Recommended
Type
Open controlled vocabulary
Controlled vocabulary reference and/or values
ms:relationType: isPartOf, isPartWith, hasPart, hasOutcome, isCombinedWith, requiresLR, requiresSoftware, isexactMatch, isSimilarTo, isContinuationOf, isVersionOf, replaces, isReplacedWith, isCreatedBy, isElicitedBy, isRecordedBy, isEditedBy, isAnalysedBy, isEvaluatedBy, isQueriedBy, isAccessedBy, isArchivedBy, isDisplayedBy, isCompatibleWith
Definition/Explanations
Specifies the type of relation holding between two entities (e.g. two resources that together comprise one new resource, a corpus and the s/w component that has been used for its creation, or a corpus and the publication that describes it)
Recommended usage
For publications, the recommended relations are isVersionOf and isSimilarTo, but any relationType can be used as appropriate.
Relation to other metadata schemas
DCMI: skos:narrowMatch hasVersion
DataCite 4.0: skos:closeMatch datacite:relationType
relatedResource1
Usage
Mandatory when applicable
Conditions for usage
when relationType is filled in
Type
ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)
Definition/Explanations
A name or an identifier (e.g. URL reference) to the source resource related to the target resource (relatedResource2) through a relation described in relationType
Recommended usage
The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through the identifier.
relatedResource2
Usage
Mandatory when applicable
Conditions for usage
when relationType is filled in
Type
ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)
Definition/Explanations
A name or an identifier (e.g. URL reference) to the target resource related to the source resource (relatedResource1) through a relation described in relationType
Recommended usage
The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through the identifier.
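A relation is therefore a triple: source resource (relatedResource1), relationType, target resource (relatedResource2). The sketch below checks the "mandatory when relationType is filled in" condition; the dictionary representation and the function name are assumptions for illustration.

```python
def relation_problems(relation):
    """List the related resources that are missing although required."""
    problems = []
    if relation.get("relationType"):
        # Both ends of the relation become mandatory once a type is given.
        for name in ("relatedResource1", "relatedResource2"):
            if not relation.get(name):
                problems.append(
                    f"{name} is mandatory when relationType is filled in"
                )
    return problems
```

For instance, a preprint recorded as isVersionOf a journal article needs both identifiers, whereas a record with no relationType needs neither.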
Metadata schema for annotated publications
Annotated publications are documented as separate resources with a link to the raw publication and their own set of metadata elements providing information on the annotation process, tool etc.
OMTD-SHARE element         Usage
publicationIdentifier      M
annotationLevel            M
annotationStandoff         R
mimeType                   R
dataFormatSpecific         R
documentationURL           R
characterEncoding          R
typesystem                 R
tagset                     R
annotationMode             R
isAnnotatedBy              R
annotationDate             R
annotationLevel
Usage
Mwhenapplicable
Conditionsforusage
forallannotatedresources
Type
opencontrolledvocabulary
Controlledvocabularyreferenceand/orvalues
ms:annotationLevel:alignment,discourseAnnotation,discourseAnnotation-argumentation,discourseAnnotation-audienceReactions,discourseAnnotation-coreference,discourseAnnotation-dialogueActs,discourseAnnotation-discourseRelations,lemmatization,morphosyntacticAnnotation-bPosTagging,morphosyntacticAnnotation-posTagging,segmentation,semanticAnnotation,semanticAnnotation-certaintyLevel,semanticAnnotation-emotions,semanticAnnotation-events,semanticAnnotation-namedEntities,semanticAnnotation-polarity,semanticAnnotation-questionTopicalTarget,semanticAnnotation-readabilty,semanticAnnotation-semanticClasses,semanticAnnotation-semanticRelations,semanticAnnotation-semanticRoles,semanticAnnotation-speechActs,semanticAnnotation-subjectivity,semanticAnnotation-temporalExpressions,semanticAnnotation-textualEntailment,semanticAnnotation-wordSenses,syntacticAnnotation-semanticFrames,speechAnnotation,speechAnnotation-orthographicTranscription,speechAnnotation-paralanguageAnnotation,speechAnnotation-phoneticTranscription,speechAnnotation-prosodicAnnotation,speechAnnotation-soundEvents,speechAnnotation-soundToTextAlignment,speechAnnotation-speakerIdentification,speechAnnotation-speakerTurns,stemming,structuralAnnotation,structuralAnnotation-documentDivisions,structuralAnnotation-sentences,structuralAnnotation-clauses,structuralAnnotation-phrases,structuralAnnotation-words,syntacticAnnotation-subcategorizationFrames,syntacticAnnotation-dependencyTrees,syntacticAnnotation-constituencyTrees,syntacticAnnotation-chunks,syntacticosemanticAnnotation-links,translation,transliteration,modalityAnnotation-bodyMovements,modalityAnnotation-facialExpressions,modalityAnnotation-gazeEyeMovements,modalityAnnotation-handArmGestures,modalityAnnotation-handManipulationOfObjects,modalityAnnotation-headMovements,modalityAnnotation-lipMovements,other
annotationLevel
62
Definition/Explanations
Theannotationleveloftheannotatedresourceorwhatas/wcomponentconsumesorproducesasoutput
annotationStandoff
Usage
Recommended
Type
boolean
Definition/Explanations
Indicates whether the annotation is created inline or in a stand-off fashion.
For interoperability reasons, the recommended format is the stand-off mode.
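To illustrate the stand-off mode, the following minimal sketch keeps the text untouched and lets each annotation point into it through character offsets. The `Annotation` class, the `covered_text` helper and the `posTag` layer name are hypothetical and used only for illustration; they are not part of OMTD-SHARE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A stand-off annotation: it points into the text instead of wrapping it."""
    begin: int   # offset of the first character (inclusive)
    end: int     # offset after the last character (exclusive)
    layer: str   # hypothetical layer name, e.g. "posTag"
    value: str


def covered_text(text: str, annotation: Annotation) -> str:
    """Return the span of the original text that the annotation refers to."""
    return text[annotation.begin:annotation.end]


text = "Corpora can be mined."
annotations = [
    Annotation(0, 7, "posTag", "NOUN"),
    Annotation(15, 20, "posTag", "VERB"),
]

for annotation in annotations:
    print(covered_text(text, annotation), annotation.value)
```

Because the annotations live outside the text, several tools can add layers to the same content without altering it, which is the interoperability benefit the recommendation refers to.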
mimeType
Usage
Recommended
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms:mimetype (a subset of values (the most popular ones for text files) from the IANA mime type controlled vocabulary): text/plain, application/vnd.xmi+xml, text/xml, application/x-tmx+xml, application/x-xces+xml, application/tei+xml, application/rdf+xml, application/xhtml+xml, application/emma+xml, application/pls+xml, application/postscript, application/voicexml+xml, text/sgml, text/html, application/x-tex, application/rtf, application/json+ld, application/x-latex, text/csv, text/tab-separated-values, application/pdf, application/x-msaccess, audio/mp4, audio/mpeg, audio/wav, image/bmp, image/gif, image/jpeg, image/png, image/svg+xml, image/tiff, video/jpeg, video/mp4, video/mpeg, video/x-flv, video/x-msvideo, video/x-ms-wmv, application/msword, application/vnd.ms-excel, audio/mpeg3, text/turtle, other, audio/PCMA, audio/flac, audio/speex, audio/vorbis, video/mp2t
Definition/Explanations
The mime-type of the resource (a formalized specifier for the format) or a mime-type that the component accepts, in conformance with the values of the IANA (Internet Assigned Numbers Authority)
Recommended usage
Please, select one of the pre-defined values (which are the most popular ones for text files) or add a value, preferably from the IANA media type recommended values (http://www.iana.org/assignments/media-types/media-types.xhtml). The element can be repeated for corpora that include files of various formats.
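When a corpus contains many files, the mimeType values can be gathered semi-automatically. The sketch below is one possible approach, assuming that file extensions are reliable: it uses Python's standard `mimetypes` module to guess a type from the file name and checks it against an excerpt of the pre-defined values. The helper names and the `PREDEFINED` excerpt are illustrative assumptions, not part of the OpenMinTeD platform.

```python
import mimetypes
from typing import Optional

# Excerpt of the pre-defined values listed above (not the full vocabulary).
PREDEFINED = {"text/plain", "text/xml", "text/html", "application/pdf"}


def guess_mime_type(filename: str) -> Optional[str]:
    """Guess the mime type from the file extension; None if it is unknown."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed


def is_predefined(mime_type: str) -> bool:
    """Tell whether the value belongs to the (excerpted) pre-defined list."""
    return mime_type in PREDEFINED


for name in ("article.txt", "article.pdf"):
    mime_type = guess_mime_type(name)
    print(name, mime_type, is_predefined(mime_type))
```

A value that is not in the pre-defined list can still be used, since the vocabulary is open; in that case it is worth checking it against the IANA registry by hand.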
documentationURL
Usage
Recommended
Type
URL pattern
Definition/Explanations
Link to the documentation for the specific data format (explanations and examples)
dataFormatSpecific
Usage
Recommended
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms:dataFormatSpecific: aclAnthology, aimedCorpus, alvisEnrichedDocument, bioNLP, bioNLP; format-variant=ST2013a1_a2, bnc, cadixeJSON, conll2000, conll2002, conll2006, conll2007, conll2009, conll2012, dataSift, factoredTagLem, gate, genia, graf, html5Microdata, i2b2, imsCwb, jdbc, keaCorpus, lll, negraExport, pml, ptb; format-variant=chunked, ptb; format-variant=combined, relp, tiger, tupp-dz, twitter, uimaBinaryCas, uimaCASDump, web1t, xces; format-variant=ilsp
Definition/Explanations
The more specific data format of the resource, supplementing the mimeType value
characterEncoding
Usage
Recommended
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms:characterEncoding: a long list of popular character encodings
Definition/Explanations
The name of the character encoding used in the resource or accepted by the component
Recommended usage
Please, select one of the pre-defined values; it should be noted, however, that for OpenMinTeD the preferred character encoding is UTF-8 to ensure interoperability between content and components. The element can be repeated for corpora that include files of various character encodings.
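Since UTF-8 is the preferred encoding, it can be useful to check files before registration and to convert those that use another encoding. The sketch below assumes that the source encoding is already known (for example from existing metadata); the function names are hypothetical.

```python
def is_utf8(data: bytes) -> bool:
    """Return True when the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known source encoding to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


latin1_bytes = "café".encode("latin-1")
print(is_utf8(latin1_bytes))
print(is_utf8(to_utf8(latin1_bytes, "latin-1")))
```

Note that a successful UTF-8 decode does not prove that the file was written as UTF-8 (plain ASCII content, for instance, passes the check as well), so the result is a hint rather than a guarantee.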
typesystem
Usage
Recommended
Type
identifier or multilingual free text
Attributes
ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)
Definition/Explanations
A name or an identifier (e.g. URL reference) to the type system used in the annotation of the resource or used by the component
Recommended usage
The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please select also the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe type systems, tagsets, annotation resources etc. in the OpenMinTeD registry and refer to them through the identifier.
tagset
Usage
Recommended
Type
identifier or multilingual free text
Attributes
ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)
Definition/Explanations
A name or an identifier (e.g. URL reference) to the tagset used in the annotation of the resource or used by the component
Recommended usage
The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please select also the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe type systems, tagsets, annotation resources etc. in the OpenMinTeD registry and refer to them through the identifier.
annotationMode
Usage
Recommended
Type
controlled vocabulary
Controlled vocabulary reference and/or values
ms:annotationMode: manual, automatic, mixed, interactive
Definition/Explanations
Indicates whether the resource is annotated manually or by automatic processes
isAnnotatedBy
Usage
Recommended
Type
identifier or multilingual free text
Attributes
ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)
Definition/Explanations
A name or an identifier (e.g. URL reference) to the component used for the annotation of the resource
Recommended usage
The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please select also the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe type systems, tagsets, annotation resources etc. in the OpenMinTeD registry and refer to them through the identifier.
annotationDate
Usage
Recommended
Type
date or range of dates
Definition/Explanations
The date or range of dates during which the annotation process took place
Guidelines for providers of corpora
- Introduction
- Instructions for providers of corpora
Introduction
OpenMinTeD facilitates the use of TDM technologies in the scientific publications world, ranging from generic scholarly communication to literature related to specific disciplines.
Corpora in the OpenMinTeD framework refer mainly to the collections of publications that will be used as mining source in the TDM process. In fact, the OpenMinTeD platform includes a mechanism for automatically generating corpora based on user criteria selected from a faceted view of all publications provided by the OpenMinTeD partners; more details are included in the Guidelines for publications.
Corpora may also come from repositories of language resources, such as META-SHARE and CLARIN, or discipline-specific repositories, in which case they do not have to be composed of scholarly publications. Examples include reference corpora (i.e. corpora deemed representative of general language or a sublanguage usage), news corpora, collections of domain-specific texts, such as manuals, etc. as well as annotated corpora, such as treebanks, morphologically tagged golden corpora etc. These corpora are not targeted as source of mining but can be used for training components (e.g. train a language model) or for evaluating their performance or for ancillary purposes.
To be valid for registration into OpenMinTeD, all corpora must be accompanied by a metadata record conformant with the OMTD-SHARE schema, and a file with the contents must be made readily accessible during the processing operation.
The following sections present a list of instructions, requirements and recommendations that corpora must meet to interact with TDM resources.
Instructions for providers of corpora
- How to register your resources
- How to make your resources interoperable
- How to document your resources
- Further requirements for annotated corpora
- Recommended schema for corpora
How to register your resources
Corpora can be registered by authorised users.
If you wish to register a corpus, you must:
- provide a metadata record compliant with the OMTD-SHARE schema for corpora, at least at the minimal level, which you can upload to the Registry as an XML file and/or edit with the OpenMinTeD metadata editor
- provide a zipped file with the contents of the corpus or a link to a URL where the corpus is directly accessible (i.e. not a landing page); where possible, the zipped file should follow the folder structure recommended for OpenMinTeD publications, i.e. separate folders for contents, metadata records and licence documents.
If the corpus is stored at the repository of a network or infrastructure that allows harvesting (normally upon agreements made with OpenMinTeD), you can also provide the relevant identifier and this will be uploaded with the appropriate description. Where possible (and this will be appropriately indicated), the metadata description will be automatically converted to the OMTD-SHARE schema and presented to the user for further editing.
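As an illustration of the recommended packaging, the sketch below builds a zipped file with separate folders for contents, metadata records and licence documents, using Python's standard `zipfile` module. The folder names (`contents/`, `metadata/`, `licences/`), the file names and the `build_corpus_zip` helper are assumptions made for this example; the guidelines only ask for the three kinds of material to be kept in separate folders.

```python
import io
import zipfile


def build_corpus_zip(contents, metadata, licences):
    """Pack three {file name: bytes} mappings into one zip archive (as bytes).

    The folder names "contents", "metadata" and "licences" are assumptions.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, files in (("contents", contents),
                              ("metadata", metadata),
                              ("licences", licences)):
            for name, data in files.items():
                archive.writestr(f"{folder}/{name}", data)
    return buffer.getvalue()


zipped = build_corpus_zip(
    contents={"doc1.txt": b"Example text."},
    metadata={"doc1.xml": b"<record/>"},
    licences={"licence.txt": b"Placeholder licence text."},
)
names = zipfile.ZipFile(io.BytesIO(zipped)).namelist()
print(names)
```

Building the archive in memory keeps the example self-contained; for a real corpus the same calls can write directly to a file on disk.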
How to make your resources interoperable
In order to ensure that your corpora can be mined in the OpenMinTeD platform, you must follow the same requirements that are set for scholarly publications. You must therefore
- provide direct access to the contents of each corpus
- describe each corpus with a metadata record compatible with the OMTD-SHARE minimal schema.
In addition, the following recommendations contribute to interoperability and make your corpora easier to process:
- The preferred formats for delivering textual material are plain text, XML and PDF (not proprietary and certainly not of scanned images), which can be read by one of the existing readers.
- If appropriate for your material, use one of the more specific data formats that are already covered by readers and converters (cf. dataFormatSpecific).
- The preferred character encoding is UTF-8.
If you fail to abide by them, it might still be possible to process your corpora via the OpenMinTeD platform, but this cannot be guaranteed and interoperability with other resources will suffer.
So, if you intend to create a new corpus, it is important that you take into account, from the early steps of its design, the requirements, standards, best practices and recommendations promoted by OpenMinTeD and other cooperating infrastructures.
Please, note that there are no general requirements yet for corpora to be used for ancillary purposes (e.g. for training a tool), as these are dependent on the requirements of the software that will use them and on the purpose of use.
How to document your resources
To be fully compliant with OpenMinTeD, you must
- ensure that the corpus is distributed under Open Access conditions
- include in the metadata record a link to the licence document that describes the terms and conditions under which it is provided, and attach the licence document together with the resource
- if you already have a PID for your resource (preferably DOI), make sure it is included in the metadata record (cf. identifier for more information on identifier schemes).
Further recommendations will contribute to the interoperability of your resources:
- Further adoption of standards such as the JATS article tag suite or TEI P5 guidelines for annotating the inner structure of texts is recommended.
- Please, ensure that you version all your resources and label the versions in an unambiguous way, preferably following the Semantic Versioning recommendations.
- It is important that you provide the appropriate documentation for your resource (e.g. publications about the design and construction of the corpus etc.), which you should also version along with the corpus and add as reference to your metadata record.
- Recommend one of the publications about your resource as the one to be cited for scholarly attribution and add this information in the metadata record.
- Make sure that you fill in the metadata record all the elements required for citing your resource, i.e. the creator of the resource, a title, the resource type and an identifier, and optionally, the publication date, the version and the publisher or distributor.
- Use standard classification vocabularies (e.g. MeSH, DDC, LCSH etc.) for adding classification tags to your material and specify the vocabulary you use in the metadata record; provide at least one broad category for your material (e.g. life sciences, computing etc.).
- In all cases, where linking to other resources or entities (e.g. persons, projects etc.) in the metadata records is added, please try to do this through unique and persistent identifiers of authority lists and sources, to the extent possible, documenting also the authority and/or scheme it adheres to.
For citation, OpenMinTeD endorses the Joint Declaration of Data Citation Principles, as well as the more specialised RDA recommendations for data citation of evolving data and DataCite guidelines.
Further requirements for annotated/processed corpora
Corpora can be registered in the OpenMinTeD platform
- in an unprocessed format and annotated by the operation of TDM software also registered in the platform and/or
- in an already processed format; in this case, they must be included as a separate resource with its own metadata record including a specific set of metadata elements (the same as for annotated publications).
It should be noted that corpora annotated by means of the OpenMinTeD platform will be automatically assigned the appropriate values for these elements.
Recommended schema for corpora
Overview
This section includes a synopsis of the minimal schema for corpora, i.e. the subset of M(andatory) and strongly R(ecommended) metadata elements. Additional elements required for the management of the metadata record (e.g. metadataCreationDate, metadataCreator etc.) are not presented here, as they are to be handled by the OMTD platform.
For annotated corpora, see the further requirements for annotated/processed corpora.
OMTD-SHARE element | Usage
resourceType | M
resourceName | M
description | M
identifier | M
version | M
licence or rightsStmtName & rightsStmtURL (one of the two must be provided) | M
nonStandardLicenceName | R when applicable
nonStandardLicenceTermsURL | M when applicable
version of licence | M
distributionMedium | M
downloadURL | M when applicable
contactEmail or landingPage (one of the two must be provided) | M
contactPerson (identifier or personName) | R
contactGroup (identifier or organizationName) | R
mustBeCitedWith | R
resourceCreator | R
creationDate | R (M for query-built corpora)
corpusType | M
mediaType | M
lingualityType | M
multilingualityType | M when applicable
language | M
sizePerLanguage | M
size | M
mimeType | R
characterEncoding | R
domain | R
subject | R
keyword | R
userQuery | M when applicable
relationType | R
relatedResource1 | R
relatedResource2 | R
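The table above can be turned into a simple completeness check. The following sketch assumes that a metadata record has been loaded into a flat Python dictionary keyed by element name, which is a simplification made for illustration (the actual OMTD-SHARE records are XML documents with nested elements). It covers an excerpt of the unconditionally mandatory elements, the two "one of the two" groups, and the two conditions stated in the element descriptions (downloadURL and multilingualityType). The `missing_elements` helper is hypothetical.

```python
# Excerpt of the elements marked M without conditions in the table above.
ALWAYS_MANDATORY = [
    "resourceType", "resourceName", "description", "identifier", "version",
    "distributionMedium", "corpusType", "mediaType", "lingualityType",
    "language", "size",
]


def missing_elements(record: dict) -> list:
    """Return the names of mandatory elements that are absent or empty."""
    def has(name):
        return bool(record.get(name))

    missing = [name for name in ALWAYS_MANDATORY if not has(name)]

    # One of the two must be provided.
    if not (has("licence") or (has("rightsStmtName") and has("rightsStmtURL"))):
        missing.append("licence or rightsStmtName & rightsStmtURL")
    if not (has("contactEmail") or has("landingPage")):
        missing.append("contactEmail or landingPage")

    # Mandatory when applicable.
    if record.get("distributionMedium") == "downloadable" and not has("downloadURL"):
        missing.append("downloadURL")
    if (record.get("lingualityType") in ("bilingual", "multilingual")
            and not has("multilingualityType")):
        missing.append("multilingualityType")
    return missing


record = {
    "resourceType": "corpus",
    "resourceName": "Example corpus",
    "description": "A small corpus used for illustration.",
    "identifier": "example-id",
    "version": "1.0.0",
    "licence": "CC-BY",
    "distributionMedium": "downloadable",
    "contactEmail": "contact@example.org",
    "corpusType": "raw",
    "mediaType": "text",
    "lingualityType": "bilingual",
    "language": ["en", "el"],
    "size": "20000 words",
}
print(missing_elements(record))
```

In this example the record is reported as lacking downloadURL and multilingualityType, because it is declared as downloadable and bilingual.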
resourceName
Usage
Mandatory
Type
Multilingual free text
Attributes
xs:lang
Definition/Explanations
The full name by which the resource is known
Recommended usage
Please, provide a short but descriptive and unique name for the resource, e.g. "British National Corpus" instead of just "corpus of English". Provide the name in English; if you want to add names in other languages, you can use the "lang" attribute. For corpora created through the OMTD corpus building process, please use an indicative name with the sources and the dates (e.g. "Subcorpus of OpenAIRE with biochemistry articles created on 4/10/2016").
Relation to other metadata schemas
DCMI: skos:exactMatch dct:title
DataCite 4.0: skos:exactMatch datacite:title
resourceType
Usage
Mandatory
Type
Closed controlled vocabulary
Attributes
Controlled vocabulary reference and/or values
ms:resourceType: corpus, lexicalConceptualResource, languageDescription, model, component
Definition/Explanations
Specifies the type of the resource being described or the type of the resource that a tool or service takes as input or produces as output
Recommended usage
For corpora, the fixed value "corpus" must be added automatically
Relation to other metadata schemas
DCMI: skos:narrowMatch dct:type
DataCite 4.0: skos:closeMatch datacite:resourceTypeGeneral & datacite:resourceType; recommended usage for text corpora is to use "dataset" but the values "collection" and "text" can also be used
description
Usage
Mandatory
Type
Multilingual free text
Attributes
xs:lang
Definition/Explanations
Provides the description of the resource in prose
Recommended usage
Give a brief yet informative description of the corpus contents, mentioning at least language(s), subject(s)/domain(s) and, if possible, size and provenance. Please, provide the text in English; if you want to add texts in other languages, you can add them using the "lang" attribute to specify the language.
Relation to other metadata schemas
DCMI: skos:exactMatch dct:abstract
DataCite 4.0: skos:exactMatch datacite:description with value "abstract" for datacite:descriptionType
identifier
Usage
Mandatory
Type
free text
Attributes
ms:resourceIdentifierSchemeName or ms:schemeURI
Definition/Explanations
Reference to a PID, DOI or any kind of identifier used by the resource provider for the resource
Recommended usage
Provide a unique identifier already assigned by an authoritative source; you can use either
- the attribute "resourceIdentifierSchemeName" to specify the scheme, by selecting one of the pre-defined values (e.g. DOI, HDL, ISLRN etc.) or,
- if the scheme is not listed among them, select the "other" value and use the attribute "schemeURI" to provide a link to the URL that documents the scheme it adheres to.
If the resource doesn't have a unique identifier, an identifier will be assigned by OpenMinTeD. For corpora created through the OMTD corpus building process, the identifier must be assigned automatically.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:identifier
DataCite 4.0: skos:broadMatch datacite:identifier (identifierType can only be DOI)
version
Usage
Recommended
Type
free text
Definition/Explanations
Any string, usually a number, that identifies the version of a resource
Recommended usage
Please, keep this only for versions of the same resource (e.g. corrected, enlarged etc.) and not for variants or for versions with additional or different information. The recommended practice for versioning is to follow the semantic versioning guidelines (http://semver.org/).
Relation to other metadata schemas
DCMI: skos:exactMatch dct:hasVersion
DataCite 4.0: skos:exactMatch datacite:Version
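To show what a semantic version label looks like in practice, the sketch below parses MAJOR.MINOR.PATCH strings with an optional pre-release and build part, and compares two versions numerically. The regular expression is a simplified approximation of the grammar published at semver.org (for instance, it does not reject leading zeros and it ignores pre-release precedence), and `parse_version` is a hypothetical helper.

```python
import re

# Simplified: MAJOR.MINOR.PATCH, optional "-prerelease" and "+build" parts.
SEMVER = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+([0-9A-Za-z.-]+))?$"
)


def parse_version(value: str):
    """Return (major, minor, patch) as integers, or None if there is no match."""
    match = SEMVER.match(value)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups()[:3])


print(parse_version("2.1.0"))
print(parse_version("v2"))
# Tuples compare number by number, so 1.10.0 is newer than 1.9.3.
print(parse_version("1.10.0") > parse_version("1.9.3"))
```

Comparing the parsed tuples rather than the raw strings avoids the common mistake of sorting "1.10.0" before "1.9.3".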
licence
Usage
Mandatory under conditions
Conditions for usage
either licence or rightsStmt must be filled in
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms:licence: CC-BY, CC-BY-NC, CC-BY-NC-ND, CC-BY-NC-SA, CC-BY-ND, CC-BY-SA, CC-ZERO, PDDL, ODC-BY, ODbL, MS-NoReD, MS-NoReD-FF, MS-NoReD-ND, MS-NoReD-ND-FF, MS-NC-NoReD, MS-NC-NoReD-FF, MS-NC-NoReD-ND, MS-NC-NoReD-ND-FF, ELRA_END_USER, ELRA_EVALUATION, ELRA_VAR, CLARIN_PUB, CLARIN_ACA, CLARIN_ACA-NC, CLARIN_RES, AGPL, ApacheLicence_2.0, BSD_4-clause, BSD_3-clause, FreeBSD, GFDL, GPL, LGPL, MIT, Princeton_Wordnet, proprietary, underNegotiation, nonStandardLicenceTerms
Definition/Explanations
The licence of use for the resource
Recommended usage
You can provide information on the rights of accessing and using a resource in one of the following ways, in order of preference:
- use the element "licence" and select one of the recommended licences; please, note that the list contains licences intended for data resources & components mixed together; for components the recommended licences are the Open Source licences; for data resources, please use a standard licence such as one of the CC family;
- if the licence you use is not included in the list, you can use the "nonStandardLicenceTerms" or the "proprietary" values and give further information on your licence in the elements: "nonStandardLicenceName", "nonStandardLicenceTermsURL" and "nonStandardLicenceTermsText"
- you can also use the "rightsStmtName" and the "rightsStmtURL" (with a link to a URL with more explanations on its usage) if the resource is provided with a general statement of use and not an official licence document; please, note that this is an option used mainly to facilitate end-users in accessing your resource, while you are strongly advised to properly license your resource.
For corpora created through the OMTD corpus building process, the licence values can be automatically aggregated from the licence values of the metadata records included in them; in any case, the "rightsStmtName" can also be computed automatically.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rights
rightsStmtName
Usage
Mandatory under conditions
Conditions for usage
either licence or rightsStmt must be filled in
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms-omtd:rightsStmtName: openAccess, closedAccess, embargoedAccess, restrictedAccess
Definition/Explanations
The name of an official statement indicative of licensing terms for the use of a resource (e.g. open access, free to read etc.); its semantics should be clear, preferably formally expressed and stored at a URL.
The current list of predefined values comes from OpenAIRE, but it's under revision.
Recommended usage
The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value in order to help users understand the licensing terms of a resource.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rights
rightsStmtURL
Usage
Mandatory under conditions
Conditions for usage
either licence or rightsStmt must be filled in
Type
URL pattern
Definition/Explanations
Link to the URL with the text that formally explains the licensing conditions imposed by the rights statement.
Recommended usage
The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value in order to help users understand the licensing terms of a resource.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rightsURI
version of licence
Usage
Mandatory
Type
free text
Definition/Explanations
The version of the licence
Recommended usage
You are advised to indicate the version of the licence of your resource; the latest version is the preferred option, e.g. "4.0" for all CC licences and "2.0" for the META-SHARE NoReD ones.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:hasVersion (for dct:licenseDocument)
nonStandardLicenceName
Usage
Mandatory under conditions
Conditions for usage
to be used with ms:licence other (i.e. for licences not included in the list of recommended ones)
Type
free text
Definition/Explanations
The name with which a licence is known; to be used for licences not included in the pre-defined list of recommended licences
Recommended usage
Please, provide the name of the licence if it's already known or supply one that can uniquely identify it.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:title (for dct:licenseDocument)
nonStandardLicenceTermsURL
Usage
Mandatory under conditions
Conditions for usage
to be used with ms:licence other (i.e. for licences not included in the list of recommended ones)
Type
URL pattern
Definition/Explanations
Used to provide a hyperlink to a URL containing the text of a licence not included in the predefined list or describing the terms of use for a language resource or terms of service for web services
Recommended usage
Please, provide the link to the full text document of the licence. Please note that this is the preferred option over inserting the licence text in the element "nonStandardLicenceTermsText", as it provides a permanent location for the licence that is accessible to all.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rightsURI
distributionMedium
Usage
Mandatory
Type
Open controlled vocabulary
Controlled vocabulary reference and/or values
ms:distributionMedium: webExecutable, paperCopy, hardDisk, bluRay, DVD-R, CD-ROM, downloadable, accessibleThroughInterface, other
Definition/Explanations
Specifies the medium (channel) used for delivery or providing access to the resource
Recommended usage
Please, use one of the provided values to indicate the medium of distribution. For corpora created through the OMTD corpus building process, the default value is "downloadable". Please, note that if the resource is distributed in different mediums under different terms of use or licences, you can repeat the whole set of elements ("distributionInfo") to describe them.
downloadURL
Usage
Mandatory under conditions
Conditions for usage
if distributionMedium = downloadable
Definition/Explanations
Any URL where the resource can be downloaded from
Recommended usage
Please, use for corpora whose actual content is not uploaded in OpenMinTeD; in this case, please ensure that the URL link leads to the actual content of the corpus and not to a landing page. For corpora created through the OMTD corpus building process, the full content is already uploaded in OpenMinTeD, and therefore the downloadURL is automatically inserted (public URL link from which the corpus can be downloaded).
contactEmail
Usage
Mandatory under conditions
Conditions for usage
An email or a landingPage must be provided
Type
email pattern
Definition/Explanations
A general email address that can be used as contact point for a resource
Recommended usage
You can indicate a contact point where users can solicit further information in one of the following ways, in order of preference:
- give a general email address at the "contactEmail" element, or
- provide at "landingPage" the link to a web page that documents the resource (e.g. a page with documentation, examples and links to the resource itself).
You can also indicate the person(s) or group(s) that are responsible for communication in the "contactPerson" and "contactGroup" elements.
For corpora created through the OMTD corpus building process, a contactEmail is automatically inserted, filled in with the email address of the user that has built it.
landingPage
Usage
Mandatory under conditions
Conditions for usage
An email or a landingPage must be provided
Type
URL pattern
Definition/Explanations
A URL used as the landing page of a resource providing general information; for instance, it may present a description of the resource, its creators and possibly include links to the URL where it can be accessed from
Recommended usage
You can indicate a contact point where users can solicit further information in one of the following ways, in order of preference:
- give a general email address at the "contactEmail" element, or
- provide at "landingPage" the link to a web page that documents the resource (e.g. a page with documentation, examples and links to the resource itself).
You can also indicate the person(s) or group(s) that are responsible for communication in the "contactPerson" and "contactGroup" elements.
For corpora created through the OMTD corpus building process, a landingPage will also be automatically created with information on the user query and the contents of the results.
contactPerson (identifier or personName)
Usage
Recommended
Type
identifier or multilingual free text
Attributes
ms:personIdentifierSchemeName (for identifiers) or xs:lang (for name)
Definition/Explanations
Groups information on the person(s) that is/are responsible for providing further information regarding the resource
Recommended usage
The recommended way for referring to a person is by giving their identifier, preferably the ORCID; if you provide the identifier, please select also the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide the name, preferably in the format "Surname, First name" at least in English; if you want to add names in other languages, you can use the "lang" attribute. The element can also be repeated to encode multiple persons. If you decide to add a contactPerson instead of a general contactEmail, please ensure that the data (including the email) of this person are also uploaded in OpenMinTeD.
Relation to other metadata schemas
DataCite 4.0: contributor with datacite:contributorType="ContactPerson", (datacite:contributorName (familyName & givenName) or datacite:nameIdentifier and datacite:nameIdentifierScheme and datacite:schemeURI)
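When an ORCID is provided as the person identifier, a quick sanity check can catch typing errors before the record is submitted. The sketch below verifies the final check character of an ORCID iD, which is computed with the ISO 7064 MOD 11-2 algorithm. The function names are hypothetical, and the identifier used in the example is a commonly cited sample iD rather than one taken from these guidelines.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ORCID check character (ISO 7064 MOD 11-2) for 15 digits."""
    total = 0
    for char in base_digits:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the 16-character shape and the final check character."""
    compact = orcid.replace("-", "")
    if len(compact) != 16:
        return False
    base, check = compact[:15], compact[15].upper()
    if not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == check


print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

The check only confirms that the identifier is well formed; it does not confirm that the iD belongs to the intended person.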
contactGroup (identifier or organizationName)
Usage
Recommended
Type
identifier or multilingual free text
Attributes
ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for name)
Definition/Explanations
Groups information on the group(s) that is/are responsible for providing further information regarding the resource
Recommended usage
The recommended way for referring to a group (currently modelled as an organization) is by giving their identifier (e.g. ISNI, fundref); if you provide the identifier, please select also the relevant value from the list of values in the attribute "organizationIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the group (organization), you may provide the name at least in English; if you want to add names in other languages, you can use the "lang" attribute. If you decide to add a contactGroup instead of another contact option, please ensure that the data (including the communication data) of this group (organization) are also uploaded in OpenMinTeD.
mustBeCitedWith
Usage
Recommended
Type
free text or identifier
Definition/Explanations
Publication to be used for citation purposes as requested by resource providers (usually a scientific article that describes the resource)
Recommended usage
The preferred option to refer to a publication is by providing its unique identifier already assigned by an authoritative source; the preferred identifier for publications is DOI; you can use either
- the attribute "publicationIdentifierSchemeName" to specify the scheme, by selecting one of the pre-defined values (e.g. DOI, ISBN etc.) or,
- if the scheme is not listed among them, use the "other" value and use the attribute "schemeURI" to provide a link to the URL that documents the scheme it adheres to.
If you don't know the publication identifier, you can provide the full bibliographic record in free text format. N.B. The citation publication should not be confused with the attribution data, which is a legal obligation; citation through publications is a common practice in research.
resourceCreator
Usage
Recommended
Type
person or organization, both encoded with identifier or multilingual free text
Attributes
for person: ms:personIdentifierSchemeName (for identifiers) or xs:lang (for name); for organization: ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for name)
Definition/Explanations
Groups information on the person(s) or organization(s) that has/have created the resource
Recommended usage
The recommended way for referring to a person is by giving their identifier, preferably the ORCID; if you provide the identifier, please select also the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide the name, preferably in the format "Surname, First name" at least in English; if you want to add names in other languages, you can use the "lang" attribute.
The recommended way for referring to an organization is by giving their identifier (e.g. ISNI); if you provide the identifier, please select also the relevant value from the list of values in the attribute "organizationIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the organization, you may provide the name at least in English; if you want to add names in other languages, you can use the "lang" attribute.
The element can also be repeated to encode multiple persons/organizations. For corpora created through the OMTD corpus building process, the resource creator is considered to be the person that has put together the corpus through the user query.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:creator
DataCite 4.0: skos:closeMatch datacite:Creator with datacite:creatorName (familyName & givenName) or datacite:nameIdentifier & datacite:nameIdentifierScheme & datacite:schemeURI
creationDate
Usage
Recommended
Type
date pattern or date range
Definition/Explanations
The date of the creation of the resource, expressed as a range between starting and end date or as an exact date
Recommended usage
Please, indicate at least the year of creation, or a time interval. For corpora created through the OMTD corpus building process, the creationDate is automatically inserted.
Relation to other metadata schemas
DCMI: skos:exactMatch dct:created
DataCite 4.0: skos:exactMatch datacite:CreationDate
corpusType
Usage
Mandatory
Type
controlled vocabulary
Controlled vocabulary reference and/or values
ms:corpusType: raw, annotated, annotations
Definition/Explanations
The subtype of the corpus in terms of processing (i.e. whether it is raw/unprocessed, annotated or composed only of annotations with links to the original raw corpus)
Recommended usage
Please, select the appropriate value. For corpora created through the corpus building process of OMTD, the value is automatically set to "raw".
Relation to other metadata schemas
DCMI: skos:narrowMatch dc:type
mediaType
Usage
Mandatory
Type
controlled vocabulary
Controlled vocabulary reference and/or values
ms:mediaType: text, audio, video, image
Definition/Explanations
Specifies the media type of the resource and basically corresponds to the physical medium of the content representation. Each media type is described through a distinctive set of features. A resource may consist of parts attributed to different types of media. A component may take as input/output more than one media type.
Recommended usage
OpenMinTeD only handles text resources, so the default value is set to "text".
Recommendedschemaforcorpora
107
lingualityType
Usage
Mandatory
Type
controlled vocabulary
Controlled vocabulary reference and/or values
ms:lingualityType: monolingual, bilingual, multilingual
Definition/Explanations
Indicates whether the resource contains one, two or more languages
Recommended usage
Please select one of the values. For corpora created through the OMTD corpus building process, the value can be computed automatically.
multilingualityType
Usage
Mandatory under conditions
Conditions for usage
if lingualityType = bilingual or multilingual
Type
controlled vocabulary
Controlled vocabulary reference and/or values
ms:multilingualityType: parallel, comparable, multilingualSingleText, originalTranslationsInSameText, other
Definition/Explanations
Indicates whether the corpus is parallel, comparable or mixed
Recommended usage
Please select one of the values.
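The dependency between lingualityType and multilingualityType described above can be checked mechanically. The sketch below assumes, purely for illustration, that a metadata record is held as a flat Python dictionary keyed by element name; that representation and the function name are hypothetical and not defined by OMTD-SHARE.

```python
LINGUALITY_TYPES = {"monolingual", "bilingual", "multilingual"}
MULTILINGUALITY_TYPES = {
    "parallel",
    "comparable",
    "multilingualSingleText",
    "originalTranslationsInSameText",
    "other",
}


def check_linguality(record: dict) -> list:
    """Return a list of problems; an empty list means both elements are consistent."""
    problems = []
    linguality = record.get("lingualityType")
    multilinguality = record.get("multilingualityType")

    if linguality not in LINGUALITY_TYPES:
        problems.append("lingualityType is mandatory and must be a listed value")

    # multilingualityType becomes mandatory only for bilingual/multilingual corpora.
    if linguality in {"bilingual", "multilingual"}:
        if multilinguality is None:
            problems.append("multilingualityType is required for this lingualityType")
        elif multilinguality not in MULTILINGUALITY_TYPES:
            problems.append("multilingualityType must be a listed value")
    return problems
```

For example, `check_linguality({"lingualityType": "bilingual"})` reports the missing multilingualityType, while a monolingual record passes without it.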
language
Usage
Mandatory
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms:language (a combination of languageId, scriptId, regionId and variantId according to the IETF BCP 47 guidelines)
Definition/Explanations
The language(s) of the corpus according to the IETF BCP 47 guidelines.
For corpora created through the OMTD corpus building process, the value can be computed automatically.
The element can be repeated to encode multiple languages.
Recommended usage
Please enter the language and, if needed, the script, region and variant identifiers that best fit the language of the document (e.g. en-US), according to the IETF BCP 47 guidelines.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:language
DataCite 4.0: skos:closeMatch datacite:Language
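To illustrate how a language value splits into the four subtags named above, here is a deliberately simplified sketch. It covers only the language-script-region-variant shape and ignores other parts of BCP 47 (extended language subtags, extensions, private use, grandfathered tags); it also checks shape only, not whether a subtag is registered. The function name is illustrative.

```python
import re

# Simplified shape: language[-script][-region][-variant]*
_TAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)"
)


def split_language_tag(tag: str):
    """Split a tag into its subtags, or return None if the shape does not match."""
    match = _TAG.fullmatch(tag)
    if match is None:
        return None
    parts = match.groupdict()
    # The variants group holds zero or more "-xxxxx" pieces; turn it into a list.
    parts["variants"] = [v for v in parts["variants"].split("-") if v]
    return parts
```

With this sketch, `split_language_tag("en-US")` yields language "en" and region "US", and a value such as "english" is rejected.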
sizePerLanguage
Usage
Recommended
Type
size pattern (size and sizeUnit)
Definition/Explanations
Provides information on the size per language subset
Recommended usage
You may indicate the size of the subsets of the corpus per language; to do that, fill in the appropriate number (without spaces) and select the appropriate sizeUnit (e.g. 20000 words). For corpora created through the OMTD corpus building process, this can be computed automatically, for instance, for files/publications.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:extent
DataCite 4.0: skos:closeMatch datacite:size
size
Usage
Mandatory
Type
size pattern (size and sizeUnit)
Definition/Explanations
Provides information on the size of the resource or of resource parts.
Recommended usage
You may indicate the size of the entire corpus (or corpus parts) by filling in the appropriate number and selecting the appropriate sizeUnit (e.g. 20000 words). The preferred sizeUnit is words or sentences. If nothing else is known, please indicate at least files. For corpora created through the OMTD corpus building process, this can be computed automatically, for instance, for files/publications.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:extent
DataCite 4.0: skos:closeMatch datacite:size
mimeType
Usage
Recommended
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms:mimetype (a subset of values, the most popular ones for text files, from the IANA mime type controlled vocabulary): text/plain, application/vnd.xmi+xml, text/xml, application/x-tmx+xml, application/x-xces+xml, application/tei+xml, application/rdf+xml, application/xhtml+xml, application/emma+xml, application/pls+xml, application/postscript, application/voicexml+xml, text/sgml, text/html, application/x-tex, application/rtf, application/json+ld, application/x-latex, text/csv, text/tab-separated-values, application/pdf, application/x-msaccess, audio/mp4, audio/mpeg, audio/wav, image/bmp, image/gif, image/jpeg, image/png, image/svg+xml, image/tiff, video/jpeg, video/mp4, video/mpeg, video/x-flv, video/x-msvideo, video/x-ms-wmv, application/msword, application/vnd.ms-excel, audio/mpeg3, text/turtle, other, audio/PCMA, audio/flac, audio/speex, audio/vorbis, video/mp2t
Definition/Explanations
The mime-type of the resource (a formalized specifier for the format) or a mime-type that the component accepts, in conformance with the values of the IANA (Internet Assigned Numbers Authority)
Recommended usage
Please select one of the pre-defined values (which are the most popular ones for text files) or add a value, preferably from the IANA media type recommended values (http://www.iana.org/assignments/media-types/media-types.xhtml). The element can be repeated for corpora that include files of various formats.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:format
DataCite 4.0: skos:closeMatch datacite:Format
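Because mimeType can be repeated when a corpus mixes formats, the set of values can be derived from the file names. The sketch below is one possible approach, using Python's standard mimetypes module, which guesses from the file extension only; falling back to "other" for unrecognised extensions is an assumption made for this example, and the function name is illustrative.

```python
import mimetypes


def mime_type_values(file_names):
    """Return the sorted, de-duplicated mime types guessed for the given file names."""
    values = set()
    for name in file_names:
        guessed, _encoding = mimetypes.guess_type(name)
        # guess_type returns None when the extension is unknown.
        values.add(guessed or "other")
    return sorted(values)
```

Guessing by extension is only a starting point: the result should be reviewed against the pre-defined list, since an extension does not guarantee the actual content format.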
characterEncoding
Usage
Recommended
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms:characterEncoding: a long list of popular character encodings
Definition/Explanations
The name of the character encoding used in the resource or accepted by the component
Recommended usage
Please select one of the pre-defined values; it should be noted, however, that for OpenMinTeD the preferred character encoding is UTF-8, to ensure interoperability between content and components. The element can be repeated for corpora that include files of various character encodings.
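Given the preference for UTF-8, it can be useful to test files before registering a corpus and to convert those whose encoding is known. This is a minimal sketch using only built-in byte/string methods; the function names are illustrative, and the source encoding has to be known in advance, since it cannot be reliably inferred from the bytes alone.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known source encoding to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")
```

Note that pure ASCII content always passes the check, and that a successful decode shows the bytes are valid UTF-8, not that UTF-8 was the encoding the author intended.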
domain
Usage
Recommended
Type
free text
Attributes
ms:classificationSchemeName and ms:schemeURI
Definition/Explanations
Domain of the corpus
Recommended usage
It is recommended that domain values are taken from an authoritative source, such as DDC (Dewey Decimal Classification, http://www.oclc.org/dewey/) or UDC (Universal Decimal Classification, http://www.udcc.org/), and that the source is identified; if you do, please use the classificationSchemeName to indicate the source; if this is not included in the list of values, please use "schemeURI" with a link to a URL with more information on the scheme. The recommended way of adding the domain values is the identifier of the domain in the scheme; further instructions on the standardization of the format will be provided.
Relation to other metadata schemas
DCMI: skos:narrowMatch dct:subject
DataCite 4.0: skos:exactMatch datacite:Subject with datacite:subjectScheme, datacite:schemeURI and datacite:valueURI
subject
Usage
Recommended
Type
free text
Attributes
ms:classificationSchemeName and ms:schemeURI
Definition/Explanations
Subject or topic of the corpus
Recommended usage
It is recommended that the subjects are taken from an authoritative source, such as DDC (Dewey Decimal Classification, http://www.oclc.org/dewey/) or UDC (Universal Decimal Classification, http://www.udcc.org/), and that the source is identified; if you do, please use the classificationSchemeName to indicate the source; if this is not included in the list of values, please use "schemeURI" with a link to a URL with more information on the scheme. The recommended way of adding the subject values is the identifier of the subject in the scheme; further instructions on the standardization of the format will be provided.
Relation to other metadata schemas
DCMI: skos:narrowMatch dct:subject
DataCite 4.0: skos:exactMatch datacite:Subject with datacite:subjectScheme, datacite:schemeURI and datacite:valueURI
keyword
Usage
Recommended
Type
free text
Definition/Explanations
Words used for indexing the corpus
Recommended usage
A free text element used for encoding keywords for the classification of the corpus, only in English; please encode one word/phrase each time and repeat the element for multiple keywords.
Relation to other metadata schemas
DCMI: skos:narrowMatch dct:subject
userQuery
Usage
Mandatory when applicable
Type
free text
Definition/Explanations
The query text that has created the corpus of scholarly publications
Recommended usage
To be filled in automatically during the OMTD corpus building process
relationType
Usage
Recommended
Type
Open controlled vocabulary
Controlled vocabulary reference and/or values
ms:relationType: isPartOf, isPartWith, hasPart, hasOutcome, isCombinedWith, requiresLR, requiresSoftware, isexactMatch, isSimilarTo, isContinuationOf, isVersionOf, replaces, isReplacedWith, isCreatedBy, isElicitedBy, isRecordedBy, isEditedBy, isAnalysedBy, isEvaluatedBy, isQueriedBy, isAccessedBy, isArchivedBy, isDisplayedBy, isCompatibleWith
Definition/Explanations
Specifies the type of relation holding between two entities (e.g. two resources that together comprise one new resource, a corpus and the software component that has been used for its creation, or a corpus and the publication that describes it)
Recommended usage
For corpora, the recommended relations are isVersionOf and isSimilarTo, but any relationType can be used as appropriate.
Relation to other metadata schemas
DataCite 4.0: skos:closeMatch datacite:relationType
relatedResource1
Usage
Mandatory when applicable
Conditions for usage
when relationType is filled in
Type
ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)
Definition/Explanations
A name or an identifier (e.g. URL reference) of the source resource related to the target resource (relatedResource2) through a relation described in relationType
Recommended usage
The recommended way of referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through the identifier.
relatedResource2
Usage
Mandatory when applicable
Conditions for usage
when relationType is filled in
Type
ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)
Definition/Explanations
A name or an identifier (e.g. URL reference) of the target resource related to the source resource (relatedResource1) through a relation described in relationType
Recommended usage
The recommended way of referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through the identifier.
Metadata schema for annotated corpora
Annotated corpora are documented either as separate resources including only the annotated data, with a link to the raw corpus and their own set of metadata elements providing information on the annotation process, tool etc., or as a set of raw and annotated files together, with a metadata record that includes all the appropriate elements for raw corpora (cf. above) plus the additional set of metadata elements for annotations, i.e. all the following elements except for "resourceIdentifier".
OMTD-SHARE element Usage
resourceIdentifier M
annotationLevel M
annotationStandoff R
mimeType R
dataFormatSpecific R
documentationURL R
characterEncoding R
typesystem R
tagset R
annotationMode R
isAnnotatedBy R
annotationDate R
Guidelines for providers of ancillary knowledge resources
Introduction
Instructions for providers of ancillary knowledge resources
Introduction
Many TDM tools and services make use of ancillary knowledge resources. By knowledge resources, we mean information from some domain or area of human endeavor (e.g. linguistics, agriculture, or the social sciences), represented in a form that can be used to solve problems computationally in that domain or area [1]. The creation of such knowledge resources is widespread both in linguistics and in the many domains where informatics is applied. These knowledge resources typically include controlled vocabularies, terminologies, lexica, ontologies, and so on.
As OpenMinTeD is about applying TDM to end-user domains, the resources used in those domains are of primary importance. Similarly, as text is central to OpenMinTeD tools and services, linguistic resources (e.g. resources that describe parts of speech) are also important.
OpenMinTeD tools and services may make use of these resources in order to process text. For example, a service may make use of a dictionary of archaeological terms when processing object descriptions. Or a service may make use of parts of speech to find the adjectives in a document, and use this information to help determine the sentiment of the document.
In order to make it easier to share the results of TDM, and to allow TDM tools and services to work together, OpenMinTeD makes a number of recommendations about how knowledge resources are represented. Knowledge resources that do not follow these recommendations can of course be used; however, interoperability will be reduced.
The OpenMinTeD recommendations on knowledge resources are based on the Linked Data paradigm. By "Linked Data", we mean data that is created and made available with the use of semantic web technologies and formats (e.g. RDF, OWL, SPARQL) and, most importantly, that is interrelated with other data.
[1] Poole, David and Alan Mackworth (2010) Artificial Intelligence, Cambridge University Press
Instructions for providers of ancillary knowledge resources
How to register your knowledge resources
How to make your knowledge resources interoperable
How to document your knowledge resources
Recommended schema for lexical/conceptual resources, incl. annotation resources
Recommended schema for models
How to register your knowledge resources
Ancillary knowledge resources can be registered by authorised users, as decided in the OpenMinTeD Policies.
If you wish to register such a resource, the following requirements apply, depending on the mode of registration:
- if the resource is being provided for upload to the OpenMinTeD registry, please package it as a zip file preserving the recommended folder structure
- if the resource is available as part of a Maven artifact, please provide the appropriate Maven coordinates
- if the resource is offered with a SPARQL endpoint or at a URL, please type in the relevant link.
In all cases, you must also
- provide a metadata record compliant with the OMTD-SHARE schema.
Where possible, e.g. in the case of providing a Maven artifact, metadata may be, at least partially, converted from the existing descriptors. In all cases, you will be notified of the availability of converted metadata at the time of uploading.
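For the upload route, the resource has to be packaged as a zip file that keeps its folder structure. The following is a minimal sketch of such packaging with Python's standard zipfile module; the recommended folder structure itself is not specified in this section, so the example simply preserves whatever layout the folder already has, and the function name is illustrative.

```python
import zipfile
from pathlib import Path


def package_resource(folder: Path, archive: Path) -> list:
    """Zip every file under `folder`, storing paths relative to it.

    Returns the archive member names that were written.
    """
    folder = folder.resolve()
    archive = archive.resolve()
    names = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            # Skip directories and the archive itself if it lives inside the folder.
            if path.is_file() and path.resolve() != archive:
                arcname = path.relative_to(folder).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names
```

Storing paths relative to the resource folder keeps sub-folders intact inside the archive while leaving out the machine-specific absolute path.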
How to make your knowledge resources interoperable
In addition, if you want to be fully compliant with the OpenMinTeD interoperability requirements, please ensure that
- you provide the resource in a standard format, preferably in an XML- or JSON-based syntax, or any other RDF serialisation format (e.g. Turtle or N3)
- all elements in the knowledge resource are identified with a URI; for Linked Data resources, the following identifiers should be used:
  - JSON-LD: the @id keyword
  - RDF/XML: the attributes xml:base, rdf:ID and rdf:about
  - XML: the xml:id attribute
- you register knowledge resources independently of any component that uses them, e.g. in a separate Maven artifact.
If you provide the resource
- in another format, given that adherence to Linked Data standards is not imposed, or
- packaged in Maven artifacts with the components that use it, at the expense, however, of reusability
you still qualify for partial compliance.
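The requirement that every element carries a URI can be spot-checked for JSON-LD resources by looking for nodes that lack the @id keyword. The sketch below is a simplified illustration: it inspects only the top-level node objects (those under "@graph", or the document itself), does not expand contexts or descend into nested nodes, and does not verify that the @id value is an absolute URI. The function name is illustrative.

```python
import json


def nodes_without_id(document):
    """Return the top-level node objects of a parsed JSON-LD document that lack "@id"."""
    if isinstance(document, dict):
        # A document either wraps its nodes in "@graph" or is itself a single node.
        nodes = document.get("@graph", [document])
    else:
        nodes = document
    return [node for node in nodes if isinstance(node, dict) and "@id" not in node]
```

A typical use is to load the file with `json.load` and report the returned nodes to the resource provider before registration; the URIs in any test data (e.g. under example.org) are placeholders.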
How to document your knowledge resources
To be fully compatible with OpenMinTeD, you must
- ensure that the resource is distributed under Open Access conditions
- include in the metadata record a link to the licence document that describes the terms and conditions under which it is provided, and attach the licence document together with the resource
- if you already have a PID for your resource (e.g. a URI or a HANDLE), make sure it is included in the metadata record (cf. identifier for more information)
- provide linkage between your resource and other resources (domain-specific or generic resources); for links between knowledge resources in the Linked Data paradigm, mappings should be expressed through RDF statements, using relations from SKOS, together with the following OWL and RDF object properties: owl:sameAs, owl:equivalentClass, owl:equivalentProperty, rdfs:subClassOf, rdfs:subPropertyOf
- version all your resources and label the versions in an unambiguous way, preferably following the Semantic Versioning recommendations.
The following recommendations contribute to interoperability but are not yet enforced:
- Provide the appropriate documentation for your resource (e.g. publications about the design and construction of the resource etc.), which you should also version along with the knowledge resource and add as a reference to your metadata record.
- Recommend one of the publications about your resource as the one to be cited for scholarly attribution and add this information in the metadata record.
- Make sure that you fill in, in the metadata record, all the elements required for citing your resource [1], i.e. the creator of the resource, a title, the resource type and an identifier, and optionally the publication date, the version and the publisher or distributor of the resource.
- Use standard classification vocabularies (e.g. MeSH, DDC, LCSH etc.) for adding classification tags to your material and specify the vocabulary you use in the metadata record; provide at least one broad category for your material (e.g. life sciences, computing etc.).
- In all cases where links to other resources or entities (e.g. persons, projects etc.) are added in the metadata records, please try to do this through unique and persistent identifiers from authority lists and sources, to the extent possible, also documenting the authority and/or scheme each identifier adheres to.
The following sections include a synopsis of the minimal schemas for ancillary knowledge resources, i.e. the subset of M(andatory) and strongly R(ecommended) metadata elements per resource type, given that knowledge resources may take one of the following resource types:
- lexical/conceptual resource: reserved not only for lexica, ontologies, term lists, glossaries etc. but also for any resource that can be used for annotation purposes, i.e. linguistic tagsets, type systems etc.
- language description: reserved mainly for computational grammars
- model: for machine learning and statistical models.
It should also be noted that additional elements required for the management of the metadata record (e.g. metadataCreationDate, metadataCreator etc.) are not presented here, as they are to be handled by the OMTD platform.
[1] For citation, OpenMinTeD endorses the Joint Declaration of Data Citation Principles, as well as the more specialised RDA recommendations for data citation of evolving data and the DataCite guidelines.
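The linkage requirement above asks for mappings expressed as RDF statements using SKOS relations and properties such as owl:sameAs. As a hedged illustration, the sketch below writes such statements in the N-Triples serialisation using plain string formatting; the example.org URIs are placeholders, the helper name is illustrative, and a real resource would more likely be produced with an RDF library.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
OWL = "http://www.w3.org/2002/07/owl#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

_FORBIDDEN = set('<>" \t\n')


def link(subject: str, predicate: str, obj: str) -> str:
    """Format one URI-to-URI statement as an N-Triples line."""
    for uri in (subject, predicate, obj):
        # Reject characters that would break the <...> delimiters.
        if _FORBIDDEN & set(uri):
            raise ValueError(f"not usable as a URI in N-Triples: {uri!r}")
    return f"<{subject}> <{predicate}> <{obj}> ."


# Placeholder URIs: a lexicon entry mapped to a thesaurus concept and an ontology class.
statements = [
    link("http://example.org/lexicon/amphora", SKOS + "exactMatch",
         "http://example.org/thesaurus/c123"),
    link("http://example.org/onto/Vessel", RDFS + "subClassOf",
         "http://example.org/onto/Artefact"),
]
```

Each resulting line is a complete statement, so mappings of this kind can be distributed as a separate file alongside the resource they describe.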
Recommended schema for lexical/conceptual resources, incl. annotation resources
OMTD-SHARE element Usage
resourceType M
resourceName M
description M
identifier M
version M
licence or rightsStmtName & rightsStmtURL (one of the two must be provided) M
version of licence M
distributionMedium M
downloadURL M when applicable
contactEmail or landingPage (one of the two must be provided) M
contactPerson (identifier or personName) R
contactGroup (identifier or organizationName) R
mustBeCitedWith R
lexicalConceptualResourceType M
encodingLevel R
linguisticInformation R
conformanceToStandardsBestPractices R
lingualityType M
language M
metalanguage R
size&sizeUnit M
mimeType R
characterEncoding R
domain R
relationType R
relatedResource1 R
relatedResource2 R
resourceType
Usage
Mandatory
Type
Closed controlled vocabulary
Controlled vocabulary reference and/or values
ms:resourceType: corpus, lexicalConceptualResource, languageDescription, model, component
Definition/Explanations
Specifies the type of the resource being described or the type of the resource that a component takes as input or produces as output
Recommended usage
For lexical/conceptual resources, the fixed value "lexicalConceptualResource" is added automatically.
Relation to other metadata schemas
DCMI: skos:narrowMatch dct:type
DataCite 4.0: skos:closeMatch datacite:resourceTypeGeneral & datacite:resourceType; the recommended usage for lexical/conceptual resources is "dataset", but the values "collection" and "text" can also be used
resourceName
Usage
Mandatory
Type
Multilingual free text
Attributes
xs:lang
Definition/Explanations
The full name by which the resource is known
Recommended usage
Please provide a short but descriptive and unique name for the resource, e.g. "Greek PAROLE lexicon" instead of just "a monolingual lexicon of Greek". Provide the name in English; if you want to add names in other languages, you can use the "lang" attribute.
Relation to other metadata schemas
DCMI: skos:exactMatch dct:title
DataCite 4.0: skos:exactMatch datacite:title
description
Usage
Mandatory
Type
Multilingual free text
Attributes
xs:lang
Definition/Explanations
Provides the description of the resource in prose
Recommended usage
Give a brief yet informative description of the resource contents, mentioning at least language(s), subject(s)/domain(s) and, if possible, size and provenance. Please provide the text in English; if you want to add texts in other languages, you can add them using the "lang" attribute to specify the language.
Relation to other metadata schemas
DCMI: skos:exactMatch dct:abstract
DataCite 4.0: skos:exactMatch datacite:description with value "abstract" for datacite:descriptionType
identifier
Usage
Mandatory
Type
freetext
Attributes
ms:resourceIdentifierSchemeName or ms:schemeURI
Definition/Explanations
Reference to a PID, DOI or any kind of identifier used by the resource provider for the resource
Recommended usage
Provide a unique identifier already assigned by an authoritative source; you can either
- use the attribute "resourceIdentifierSchemeName" to specify the scheme, by selecting one of the pre-defined values (e.g. DOI, HDL, ISLRN etc.), or
- if the scheme is not listed among them, select the "other" value and use the attribute "schemeURI" to provide a link to the URL that documents the scheme it adheres to.
If the resource doesn't have a unique identifier, an identifier will be assigned by OpenMinTeD.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:identifier
DataCite 4.0: skos:broadMatch datacite:identifier (identifierType can only be DOI)
licence
Usage
Mandatory under conditions
Conditions for usage
either licence or rightsStmt must be filled in
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms:licence: CC-BY, CC-BY-NC, CC-BY-NC-ND, CC-BY-NC-SA, CC-BY-ND, CC-BY-SA, CC-ZERO, PDDL, ODC-BY, ODbL, MS-NoReD, MS-NoReD-FF, MS-NoReD-ND, MS-NoReD-ND-FF, MS-NC-NoReD, MS-NC-NoReD-FF, MS-NC-NoReD-ND, MS-NC-NoReD-ND-FF, ELRA_END_USER, ELRA_EVALUATION, ELRA_VAR, CLARIN_PUB, CLARIN_ACA, CLARIN_ACA-NC, CLARIN_RES, AGPL, ApacheLicence_2.0, BSD_4-clause, BSD_3-clause, FreeBSD, GFDL, GPL, LGPL, MIT, Princeton_Wordnet, proprietary, underNegotiation, nonStandardLicenceTerms
Definition/Explanations
The licence of use for the resource
Recommended usage
You can provide information on the rights of accessing and using a resource in one of the following ways, in order of preference:
- use the element "licence" and select one of the recommended licences; please note that the list contains licences intended for data resources and for components mixed together; for components, the recommended licences are the Open Source licences; for data resources, please use a standard licence such as one of the CC family; if the licence you use is not included in the list, you can use the "nonStandardLicenceTerms" or the "proprietary" values and give further information on your licence in the elements "nonStandardLicenceName", "nonStandardLicenceTermsURL" and "nonStandardLicenceTermsText"
- you can also use the "rightsStmtName" and the "rightsStmtURL" (with a link to a URL with more explanations on its usage) if the resource is provided with a general statement of use and not an official licence document; please note that this option is mainly intended to help end-users access your resource, and you are strongly advised to properly license your resource.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rights
rightsStmtName
Usage
Mandatory under conditions
Conditions for usage
either licence or rightsStmt must be filled in
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms-omtd:rightsStmtName: openAccess, closedAccess, embargoedAccess, restrictedAccess
Definition/Explanations
The name of an official statement indicative of licensing terms for the use of a resource (e.g. open access, free to read etc.); its semantics should be clear, preferably formally expressed and stored at a URL.
The current list of predefined values comes from OpenAIRE, but it is under revision.
Recommended usage
The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value in order to help users understand the licensing terms of a resource.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rights
rightsStmtURL
Usage
Mandatory under conditions
Conditions for usage
either licence or rightsStmt must be filled in
Type
URL pattern
Definition/Explanations
Link to the URL with the text that formally explains the licensing conditions imposed by the rights statement.
Recommended usage
The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value in order to help users understand the licensing terms of a resource.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rightsURI
nonStandardLicenceName
Usage
Mandatory under conditions
Conditions for usage
to be used with ms:licence "other" (i.e. for licences not included in the list of recommended ones)
Type
free text
Definition/Explanations
The name by which a licence is known; to be used for licences not included in the pre-defined list of recommended licences
Recommended usage
Please provide the name of the licence if it is already known, or supply one that can uniquely identify it.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:title (for dct:licenseDocument)
nonStandardLicenceTermsURL
Usage
Mandatory under conditions
Conditions for usage
to be used with ms:licence "other" (i.e. for licences not included in the list of recommended ones)
Type
URL pattern
Definition/Explanations
Used to provide a hyperlink to a URL containing the text of a licence not included in the predefined list, or describing the terms of use for a language resource or the terms of service for web services
Recommended usage
Please provide the link to the full text document of the licence. Please note that this is the preferred option over inserting the licence text in the element "nonStandardLicenceTermsText", as it provides a permanent location for the licence that is accessible to all.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rightsURI
version of licence
Usage
Mandatory
Type
free text
Definition/Explanations
The version of the licence
Recommended usage
You are advised to indicate the version of the licence of your resource; the latest version is the preferred option, e.g. "4.0" for all CC licences and "2.0" for the META-SHARE-NoReD ones.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:hasVersion (for dct:licenseDocument)
distributionMedium
Usage
Mandatory
Type
Open controlled vocabulary
Controlled vocabulary reference and/or values
ms:distributionMedium: webExecutable, paperCopy, hardDisk, bluRay, DVD-R, CD-ROM, downloadable, accessibleThroughInterface, other
Definition/Explanations
Specifies the medium (channel) used for delivery or providing access to the resource
Recommended usage
Please use one of the provided values to indicate the medium of distribution. For interoperability reasons, the recommended way of providing annotation resources (e.g. tagsets, ontologies etc.) is to distribute them in a downloadable form or in a way that can be easily accessed by the software.
downloadURL
Usage
Mandatory under conditions
Conditions for usage
if distributionMedium = downloadable
Definition/Explanations
Any URL where the resource can be downloaded from
Recommended usage
Please use for resources whose actual content is not uploaded in OpenMinTeD; in this case, please ensure that the URL leads to the resource itself and not to a landing page.
accessURL
Usage
Mandatory under conditions
Conditions for usage
if distributionMedium = webExecutable or accessibleThroughInterface
Definition/Explanations
A landing page, feed, SPARQL endpoint etc. that gives access to the resource or where the web service/workflow is executed
Recommended usage
Please use for resources that are "accessibleThroughInterface" or "webExecutable"
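The conditions attached to downloadURL and accessURL both depend on the value of distributionMedium, which makes them easy to verify together. The sketch below assumes, for illustration only, a metadata record held as a flat Python dictionary keyed by element name; that representation and the function name are hypothetical.

```python
def check_distribution(record: dict) -> list:
    """Return the problems found in the distribution-related elements."""
    problems = []
    medium = record.get("distributionMedium")

    if not medium:
        problems.append("distributionMedium is mandatory")
    # downloadURL is required only for downloadable resources.
    if medium == "downloadable" and not record.get("downloadURL"):
        problems.append("downloadURL is required when distributionMedium is downloadable")
    # accessURL is required for resources reached through a service or interface.
    if medium in {"webExecutable", "accessibleThroughInterface"} and not record.get("accessURL"):
        problems.append("accessURL is required for webExecutable or accessibleThroughInterface")
    return problems
```

A record with `distributionMedium` set to "downloadable" but no `downloadURL` would thus be flagged, whereas the same record with medium "CD-ROM" would not.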
contactEmail
Usage
Mandatory under conditions
Conditions for usage
An email or a landingPage must be provided
Type
email pattern
Definition/Explanations
A general email address that can be used as contact point for a resource
Recommended usage
You can indicate a contact point where users can solicit further information in one of the following ways, in order of preference:
- give a general email address in the "contactEmail" element, or
- provide in "landingPage" the link to a web page that documents the resource (e.g. a page with documentation, examples and links to the resource itself).
You can also indicate the person(s) or group(s) that are responsible for communication in the "contactPerson" and "contactGroup" elements.
landingPage
Usage
Mandatory under conditions
Conditions for usage
An email or a landingPage must be provided
Type
URL pattern
Definition/Explanations
A URL used as the landing page of a resource, providing general information; for instance, it may present a description of the resource and its creators, and possibly include links to the URL where it can be accessed from
Recommended usage
You can indicate a contact point where users can solicit further information in one of the following ways, in order of preference:
- give a general email address in the "contactEmail" element, or
- provide in "landingPage" the link to a web page that documents the resource (e.g. a page with documentation, examples and links to the resource itself).
You can also indicate the person(s) or group(s) that are responsible for communication in the "contactPerson" and "contactGroup" elements.
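Several elements in this schema are mandatory only as alternatives: a contactEmail or a landingPage must be present, and either licence or rightsStmtName must be filled in. A generic "at least one of" check covers both cases. The sketch below again assumes a hypothetical flat dictionary representation of the record; the grouping labels and the function name are illustrative.

```python
# Each entry: a human-readable label and the alternative element names.
ALTERNATIVES = {
    "contact point": ("contactEmail", "landingPage"),
    "rights information": ("licence", "rightsStmtName"),
}


def missing_alternatives(record: dict, alternatives: dict = ALTERNATIVES) -> list:
    """Report every group for which none of the alternative elements is filled in."""
    problems = []
    for label, names in alternatives.items():
        if not any(record.get(name) for name in names):
            problems.append(f"{label}: provide at least one of {', '.join(names)}")
    return problems
```

Keeping the alternatives in a table rather than in code makes it straightforward to add further either/or conditions without touching the check itself.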
contactPerson (identifier or personName)
Usage
Recommended
Type
identifier or multilingual free text
Attributes
ms:personIdentifierSchemeName (for identifiers) or xs:lang (for name)
Definition/Explanations
Groups information on the person(s) responsible for providing further information regarding the resource
Recommended usage
The recommended way of referring to a person is by giving their identifier, preferably the ORCID; if you provide the identifier, please also select the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide the name, preferably in the format "Surname, First name", at least in English; if you want to add names in other languages, you can use the "lang" attribute. The element can also be repeated to encode multiple persons. If you decide to add a contactPerson instead of a general contactEmail, please ensure that the data (including the email) of this person are also uploaded in OpenMinTeD.
Relation to other metadata schemas
DataCite 4.0: contributor with datacite:contributorType = "ContactPerson", datacite:contributorName (familyName & givenName) or datacite:nameIdentifier and datacite:nameIdentifierScheme and datacite:schemeURI
contactGroup(identifierororganizationName)
Usage
Recommended
Type
identifierormultilingualfreetext
Attributes
ms:organizationIdentifierSchemeName(foridentifiers)orxs:lang(forname)
Definition/Explanations
Groupsinformationonthegroup(s)thatis/areresponsibleforprovidingfurtherinformationregardingtheresource
Recommendedusage
Therecommendedwayforreferringtoagroup(currentlymodelledasanorganization)isbygivingtheiridentifier(e.g.ISNI,fundref);ifyouprovidetheidentifier,pleaseselectalsotherelevantvaluefromthelistofvaluesintheattribute"organizationIdentifierSchemeName";ifnoneisappropriate,pleaseselect"other"andusethe"schemeURI"attributetoprovidealinktoaURLwithmoreinformationabouttheidentifierscheme.Ifyoudon'tknowtheidentifierofthegroup(organization),youmayprovidethenameatleastinEnglish;ifyouwanttoaddnamesinotherlanguages,youcanusethe“lang”attribute.IfyoudecidetoaddacontactGroupinsteadofanothercontactoption,pleaseensurethatthedata(includingthecommunicationdata)ofthisgroup(organization)arealsouploadedinOpenMinTeD.
mustBeCitedWith
Usage
Recommended
Type
identifier or free text
Definition/Explanations
Publication to be used for citation purposes as requested by resource providers (usually a scientific article that describes the resource)
Recommended usage
The preferred option for referring to a publication is to provide its unique identifier already assigned by an authoritative source; the preferred identifier for publications is the DOI. Use the attribute "publicationIdentifierSchemeName" to specify the scheme by selecting one of the pre-defined values (e.g. DOI, ISBN etc.) or, if the scheme is not listed among them, select the "other" value and use the attribute "schemeURI" to provide a link to the URL that documents the scheme it adheres to. If you don't know the publication identifier, you can provide the full bibliographic record as free text. N.B. The citation publication should not be confused with the attribution data, which is a legal obligation; citation through publications is a common practice in research.
lexicalConceptualResourceType
Usage
Mandatory
Type
Open controlled vocabulary
Controlled vocabulary reference and/or values
ms:lexicalConceptualResourceType: wordList, computationalLexicon, ontology, wordnet, thesaurus, framenet, terminologicalResource, machineReadableDictionary, lexicon, typesystem, tagset, mappingOfResources, other
Definition/Explanations
Specifies the type of lexical/conceptual resources
Relation to other metadata schemas
DCMI: skos:narrowMatch dct:type
encodingLevel
Usage
Recommended
Type
Open controlled vocabulary
Controlled vocabulary reference and/or values
ms:encodingLevel: phonetics, phonology, semantics, morphology, syntax, pragmatics, other
Definition/Explanations
Information on the contents of the lexicalConceptualResource as regards the linguistic level of analysis
linguisticInformation
Usage
Recommended
Type
Open controlled vocabulary
Controlled vocabulary reference and/or values
ms:linguisticInformation:accentuation,lemma,lemma-MultiWordUnits,lemma-Variants,lemma-Abbreviations,lemma-Compounds,lemma-CliticForms,partOfSpeech,morpho-Features,morpho-Case,morpho-Gender,morpho-Number,morpho-Degree,morpho-IrregularForms,morpho-Mood,morpho-Tense,morpho-Person,morpho-Aspect,morpho-Voice,morpho-Auxiliary,morpho-Inflection,morpho-Reflexivity,syntax-SubcatFrame,semantics-Traits,semantics-SemanticClass,semantics-CrossReferences,semantics-Relations,semantics-Relations-Hyponyms,semantics-Relations-Hyperonyms,semantics-Relations-Synonyms,semantics-Relations-Antonyms,semantics-Relations-Troponyms,semantics-Relations-Meronyms,usage-Frequency,usage-Register,usage-Collocations,usage-Examples,usage-Notes,definition/gloss,translationEquivalent,phonetics-Transcription,semantics-Domain,semantics-EventType,semantics-SemanticRoles,statisticalProperties,morpho-Derivation,semantics-QualiaStructure,syntacticoSemanticLinks,other
Definition/Explanations
A more detailed account of the linguistic information contained in the lexicalConceptualResource
Relation to other metadata schemas
DataCite 4.0: creator with creatorName or nameIdentifier & nameIdentifierScheme & schemeURI; N.B. creatorName: familyName & givenName in v4
conformanceToStandardsBestPractices
Usage
Recommended
Type
Open controlled vocabulary
Controlled vocabulary reference and/or values
ms:conformanceToStandardsBestPractices: AgroVoc, ALVIS, ARGO, BML, CES, DKPro_Core, EAGLES, EDAMontology, ELSST, EML, EMMA, GATE, GESIS, GMX, GrAF, HamNoSys, HASSET, InkML, ILSP_NLP, ISO12620, ISO16642, ISO1987, ISO26162, ISO30042, ISO704, JATS, LAF, LAPPS, Lemon, LMF, MAF, MLIF, MOSES, MULTEXT, MUMIN, multimodalInteractionFramework, OAXAL, OLIA, OWL, PANACEA, pennTreeBank, pragueTreebank, RDF, SemAF, SemAF_DA, SemAF_NE, SemAF_SRL, SemAF_DS, SKOS, SRX, SynAF, TBX, TMX, TEI, TEI_P3, TEI_P4, TEI_P5, TimeML, XCES, XLIFF, UD, WordNet, other
Definition/Explanations
Specifies the standards or the best practices to which the tagset used for the annotation conforms
lingualityType
Usage
Mandatory
Type
Closed controlled vocabulary
Controlled vocabulary reference and/or values
ms:lingualityType: monolingual, bilingual, multilingual
Definition/Explanations
Indicates whether the resource contains one, two or more languages
Recommended usage
Please select one of the values. Note that the element concerns the language of the resource itself and not the language used for its description; for instance, a lexicon of English with definitions both in English and French is considered monolingual.
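As a rough illustration of how the lingualityType value relates to the languages of the resource contents, the following Python sketch derives it from a list of BCP 47 tags. The function name is hypothetical and not part of OMTD-SHARE; the sketch assumes that only content languages (not metalanguages) are passed in, and that regional variants of the same language count as one language.

```python
def linguality_type(content_languages):
    """Derive a lingualityType value from the BCP 47 tags of the resource contents."""
    # Assumption: only the primary language subtag is compared,
    # so "en-US" and "en-GB" count as a single language.
    primary = {tag.split("-")[0].lower() for tag in content_languages if tag}
    if not primary:
        raise ValueError("at least one content language is required")
    if len(primary) == 1:
        return "monolingual"
    if len(primary) == 2:
        return "bilingual"
    return "multilingual"
```

Under these assumptions, an English lexicon with English and French definitions would be passed as `["en"]` and come out as monolingual, in line with the example above.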
language
Usage
Mandatory
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms:language (a combination of languageId, scriptId, regionId and variantId according to the IETF BCP 47 guidelines)
Definition/Explanations
The language(s) of the resource according to IETF BCP 47 guidelines.
Recommended usage
Please enter the language and, if needed, the region, script and variant identifier that best fits the language of the contents of the resource (e.g. en-US) according to the IETF BCP 47 guidelines; not to be confused with "metalanguage", which is used for the language used to describe the contents of the resource. For instance, a lexicon of English with definitions in English and French must be encoded with "language" "English" and two "metalanguage" values for "English" and "French". The element can be repeated for multiple languages.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:language
DataCite 4.0: skos:closeMatch datacite:Language
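To make the languageId, scriptId, regionId and variantId components more concrete, here is a minimal Python sketch that splits a tag such as en-US into those parts. It is a simplification and not a full BCP 47 parser: extensions, private-use subtags, extended language subtags and grandfathered tags are deliberately ignored, and the function name is hypothetical.

```python
import re

# Simplified shape: language[-Script][-REGION][-variant]*
_TAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)"
)


def parse_language_tag(tag):
    """Return the tag components as a dict, or None if the tag does not fit."""
    match = _TAG.fullmatch(tag)
    if match is None:
        return None
    script = match.group("script")
    region = match.group("region")
    return {
        "languageId": match.group("language").lower(),
        "scriptId": script.title() if script else None,
        "regionId": region.upper() if region else None,
        # Variants are kept as a list because a tag may carry several.
        "variantId": [v for v in match.group("variants").split("-") if v],
    }
```

A production system would more likely validate subtags against the IANA Language Subtag Registry; the sketch only checks the overall shape of the tag.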
metalanguage
Usage
Recommended
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms:language (a combination of languageId, scriptId, regionId and variantId according to the IETF BCP 47 guidelines)
Definition/Explanations
The language(s) used to describe the contents of the resource (the "metalanguage") according to IETF BCP 47 guidelines.
Recommended usage
Please enter the language and, if needed, the region, script and variant identifier that best fits the language used to describe the resource (e.g. en-US) according to the IETF BCP 47 guidelines; not to be confused with "language", which is used for the language of the contents of the resource. For instance, a lexicon of English with definitions in English and French must be encoded with "language" "English" and two "metalanguage" values for "English" and "French". The element can be repeated for multiple languages.
size
Usage
Mandatory
Type
size pattern (size and sizeUnit)
Definition/Explanations
Provides information on the size of the resource or of resource parts.
Recommended usage
You may indicate the size of the lexical/conceptual resource by filling in the appropriate number and selecting the appropriate sizeUnit (e.g. 20000 words).
Relation to other metadata schemas
DCMI: skos:closeMatch dct:extent
DataCite 4.0: skos:closeMatch datacite:size
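The size pattern pairs a number with a unit. The following Python sketch, assuming sizes are written as a whole number followed by a one-word unit (as in "20000 words"), splits such a string into the two parts; the `Size` type and `parse_size` function are illustrative names, and the list of permitted sizeUnit values is not checked here.

```python
import re
from typing import NamedTuple


class Size(NamedTuple):
    size: int
    sizeUnit: str


def parse_size(text):
    """Split a string such as '20000 words' into (size, sizeUnit)."""
    match = re.fullmatch(r"\s*([0-9]+)\s+([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"not a '<number> <unit>' size: {text!r}")
    return Size(int(match.group(1)), match.group(2))
```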
domain
Usage
Recommended
Type
free text
Attributes
ms:classificationSchemeName and ms:schemeURI
Definition/Explanations
Domain of the lexical/conceptual resource
Recommended usage
It is recommended that domain values are taken from an authoritative source, such as DDC (Dewey Decimal Classification, http://www.oclc.org/dewey/) or UDC (Universal Decimal Classification, http://www.udcc.org/), and that the source is identified; if you do, please use the classificationSchemeName to indicate the source; if this is not included in the list of values, please use "schemeURI" with a link to a URL with more information on the scheme. The recommended way of adding the domain values is the identifier of the domain in the scheme; further instructions on the standardization of the format will be provided.
Relation to other metadata schemas
DCMI: skos:narrowMatch dct:subject
DataCite 4.0: skos:exactMatch datacite:Subject with datacite:subjectScheme, datacite:schemeURI and datacite:valueURI
characterEncoding
Usage
Recommended
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms:characterEncoding: a long list of popular character encodings
Definition/Explanations
The name of the character encoding used in the resource or accepted by the component
Recommended usage
Please select one of the pre-defined values; it should be noted, however, that for OpenMinTeD the preferred character encoding is UTF-8 to ensure interoperability between content and components.
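Since UTF-8 is the preferred encoding, a provider may want to check a file before declaring it. A minimal Python sketch (the function name is hypothetical) simply attempts a strict UTF-8 decode of the raw bytes:

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Note that plain ASCII content also passes this check, because ASCII is a subset of UTF-8.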
mimeType
Usage
Recommended
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms:mimetype(asubsetofvalues(themostpopularonesfortextfiles)fromtheIANAmimetypecontrolledvocabulary):text/plain,application/vnd.xmi+xml,text/xml,application/x-tmx+xml,application/x-xces+xml,application/tei+xml,application/rdf+xml,application/xhtml+xml,application/emma+xml,application/pls+xml,application/postscript,application/voicexml+xml,text/sgml,text/html,application/x-tex,application/rtf,application/json+ld,application/x-latex,text/csv,text/tab-separated-values,application/pdf,application/x-msaccess,audio/mp4,audio/mpeg,audio/wav,image/bmp,image/gif,image/jpeg,image/png,image/svg+xml,image/tiff,video/jpeg,video/mp4,video/mpeg,video/x-flv,video/x-msvideo,video/x-ms-wmv,application/msword,application/vnd.ms-excel,audio/mpeg3,text/turtle,other,audio/PCMA,audio/flac,audio/speex,audio/vorbis,video/mp2t
Definition/Explanations
Themime-typeoftheresource(aformalizedspecifierfortheformat)oramime-typethatthecomponentaccepts,inconformancewiththevaluesoftheIANA(InternetAssignedNumbersAuthority)
Recommendedusage
Please,selectoneofthepre-definedvalues(whicharethemostpopularonesfortextfiles)oraddavalue,PREFERABLYFROMTHEIANAMEDIAMIMETYPERECOMMENDEDVALUES(http://www.iana.org/assignments/media-types/media-types.xhtml)Theelementcanberepeatedforcorporathatincludesfilesofvariousformats.
Relationtoothermetadataschemas
DCMI:skos:closeMatchdct:formatDataCite4.0:skos:closeMatchdatacite:Format
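When filling in mimeType for many files, a first guess can be obtained from the file extension. The sketch below uses Python's standard `mimetypes` module; the `PREDEFINED` set contains only a handful of the pre-defined values listed above (it is not the complete vocabulary), and `suggest_mime_type` is a hypothetical helper name.

```python
import mimetypes

# A few of the pre-defined values from the list above (incomplete on purpose).
PREDEFINED = {
    "text/plain", "text/xml", "text/html", "text/csv",
    "application/pdf", "application/rtf",
}


def suggest_mime_type(filename):
    """Return (guessed MIME type or None, whether it is in PREDEFINED)."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed, guessed in PREDEFINED
```

A guess based on the extension alone can be wrong, so the result is best treated as a suggestion to be confirmed by the provider.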
relationType
Usage
Recommended
Type
Open controlled vocabulary
Controlled vocabulary reference and/or values
ms:relationType: isPartOf, isPartWith, hasPart, hasOutcome, isCombinedWith, requiresLR, requiresSoftware, isexactMatch, isSimilarTo, isContinuationOf, isVersionOf, replaces, isReplacedWith, isCreatedBy, isElicitedBy, isRecordedBy, isEditedBy, isAnalysedBy, isEvaluatedBy, isQueriedBy, isAccessedBy, isArchivedBy, isDisplayedBy, isCompatibleWith
Definition/Explanations
Specifies the type of relation holding between two entities (e.g. two resources that together comprise one new resource, a corpus and the software component that has been used for its creation, or a corpus and the publication that describes it)
Recommended usage
For lexical/conceptual resources, the recommended relations are isVersionOf and requiresSoftware, but any relationType can be used as appropriate.
Relation to other metadata schemas
DataCite 4.0: skos:closeMatch datacite:relationType
relatedResource1
Usage
Mandatory when applicable
Conditions for usage
when relationType is filled in
Type
ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)
Definition/Explanations
A name or an identifier (e.g. URL reference) of the source resource related to the target resource (relatedResource2) through a relation described in relationType
Recommended usage
The recommended way of referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through the identifier.
relatedResource2
Usage
Mandatory when applicable
Conditions for usage
when relationType is filled in
Type
ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)
Definition/Explanations
A name or an identifier (e.g. URL reference) of the target resource related to the source resource (relatedResource1) through a relation described in relationType
Recommended usage
The recommended way of referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through the identifier.
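The three relation elements work together as a (source, relation, target) triple, and the two relatedResource elements become mandatory once relationType is filled in. A minimal Python sketch of that condition follows; the `Relation` class and `relation_problems` function are illustrative names, not part of the schema, and the resource names in the usage note are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Relation:
    relationType: Optional[str] = None
    relatedResource1: Optional[str] = None  # source resource (name or identifier)
    relatedResource2: Optional[str] = None  # target resource (name or identifier)


def relation_problems(rel):
    """Return a list of messages for missing conditionally mandatory elements."""
    problems = []
    if rel.relationType:
        for field in ("relatedResource1", "relatedResource2"):
            if not getattr(rel, field):
                problems.append(f"{field} is mandatory when relationType is filled in")
    return problems
```

For example, `Relation("isVersionOf", "example-lexicon-v2", "example-lexicon-v1")` yields no problems, while a relation with only a relationType yields one message per missing resource.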
Recommended schema for models
OMTD-SHARE element Usage
resourceType M
resourceName M
description M
identifier M
version M
licence or rightsStmtName & rightsStmtURL (one of the two must be provided) M
version of licence M
distributionMedium M
downloadURL M when applicable
contactEmail or landingPage (one of the two must be provided) M
contactPerson (identifier or personName) R
contactGroup (identifier or organizationName) R
mustBeCitedWith R
resourceCreator (person or organization, described with identifier or name) R
variantName M
tagset R
typesystem R
algorithm R
trainingCorpusDetails R
mediaType M
lingualityType M
multilingualityType M when applicable
language M
size M
relationType = isCompatibleWith (external relation between models and components that can use them) R
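The mandatory and conditional rules in the table above can be checked mechanically. The following Python sketch assumes a model description held as a flat dict keyed by element name, which is an illustrative representation rather than the actual OMTD-SHARE serialization. The "version of licence" and multilingualityType rows are left out because their exact element key and applicability condition are not given here; the function name is hypothetical.

```python
MANDATORY = [
    "resourceType", "resourceName", "description", "identifier", "version",
    "distributionMedium", "variantName", "mediaType", "lingualityType",
    "language", "size",
]


def model_record_problems(record):
    """List missing mandatory or conditionally mandatory elements."""
    def filled(key):
        return bool(record.get(key))

    problems = [f"{key} is mandatory" for key in MANDATORY if not filled(key)]
    # One of the two licensing options must be present.
    if not (filled("licence") or (filled("rightsStmtName") and filled("rightsStmtURL"))):
        problems.append("licence or rightsStmtName & rightsStmtURL must be provided")
    # One of the two contact options must be present.
    if not (filled("contactEmail") or filled("landingPage")):
        problems.append("contactEmail or landingPage must be provided")
    # downloadURL is required only for downloadable resources.
    if record.get("distributionMedium") == "downloadable" and not filled("downloadURL"):
        problems.append("downloadURL is mandatory when distributionMedium is downloadable")
    return problems
```

Returning a list of messages rather than raising on the first problem lets a provider see everything that is missing in one pass.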
resourceType
Usage
Mandatory
Type
Closed controlled vocabulary
Controlled vocabulary reference and/or values
ms:resourceType: corpus, lexicalConceptualResource, languageDescription, model, component
Definition/Explanations
Specifies the type of the resource being described or the type of the resource that a tool or service takes as input or produces as output
Recommended usage
For models, the fixed value "model" must be added automatically
Relation to other metadata schemas
DCMI: skos:narrowMatch dct:type
DataCite 4.0: skos:closeMatch datacite:resourceTypeGeneral & datacite:resourceType; recommended usage for models is to use "model" but the value "dataset" can also be used
resourceName
Usage
Mandatory
Type
Multilingual free text
Attributes
xs:lang
Definition/Explanations
The full name by which the resource is known
Recommended usage
Please provide a short but descriptive and unique name for the resource, e.g. “OpenNLP POS tagger model for English” instead of just “model for English POS tags”. Provide the name in English; if you want to add names in other languages, you can use the “lang” attribute.
Relation to other metadata schemas
DCMI: skos:exactMatch dct:title
DataCite 4.0: skos:exactMatch datacite:title
identifier
Usage
Mandatory
Type
free text
Attributes
ms:resourceIdentifierSchemeName or ms:schemeURI
Definition/Explanations
Reference to a PID, DOI or any kind of identifier used by the resource provider for the resource
Recommended usage
Provide a unique identifier already assigned by an authoritative source. Use the attribute "resourceIdentifierSchemeName" to specify the scheme by selecting one of the pre-defined values (e.g. DOI, HDL, ISLRN etc.) or, if the scheme is not listed among them, select the "other" value and use the attribute "schemeURI" to provide a link to the URL that documents the scheme it adheres to. If the resource doesn't have a unique identifier, an identifier will be assigned by OpenMinTeD.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:identifier
DataCite 4.0: skos:broadMatch datacite:identifier (identifierType can only be DOI)
description
Usage
Mandatory
Type
Multilingual free text
Attributes
xs:lang
Definition/Explanations
Provides the description of the resource in prose
Recommended usage
Give a brief yet informative description of the model, e.g. the language(s) it applies to, the corpus it has been trained on, theoretical approaches used etc. Please provide the text in English; if you want to add texts in other languages, you can add them using the “lang” attribute to specify the language.
Relation to other metadata schemas
DCMI: skos:exactMatch dct:abstract
DataCite 4.0: skos:exactMatch datacite:description with value "abstract" for datacite:descriptionType
version
Usage
Recommended
Type
free text
Definition/Explanations
Any string, usually a number, that identifies the version of a resource
Recommended usage
Please keep this only for versions of the same resource (e.g. corrected, enlarged etc.) and not for variants or for versions with additional or different information. The recommended practice is to follow the semantic versioning guidelines (http://semver.org/).
Relation to other metadata schemas
DCMI: skos:exactMatch dct:hasVersion
DataCite 4.0: skos:exactMatch datacite:Version
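Because the version element is free text, nothing enforces the recommended semantic versioning format. A Python sketch of a format check is shown below; it follows the MAJOR.MINOR.PATCH shape with optional pre-release and build parts, but it is a simplified pattern rather than the full grammar published at semver.org (for instance, it does not reject leading zeros in numeric pre-release identifiers). The function name is hypothetical.

```python
import re

_SEMVER = re.compile(
    r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)"  # MAJOR.MINOR.PATCH
    r"(?:-[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?"             # optional pre-release
    r"(?:\+[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?"            # optional build metadata
)


def is_semantic_version(version):
    """Return True if the string has the MAJOR.MINOR.PATCH[-pre][+build] shape."""
    return _SEMVER.fullmatch(version) is not None
```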
licence
Usage
Mandatory under conditions
Conditions for usage
either licence or rightsStmt must be filled in
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms:licence: CC-BY, CC-BY-NC, CC-BY-NC-ND, CC-BY-NC-SA, CC-BY-ND, CC-BY-SA, CC-ZERO, PDDL, ODC-BY, ODbL, MS-NoReD, MS-NoReD-FF, MS-NoReD-ND, MS-NoReD-ND-FF, MS-NC-NoReD, MS-NC-NoReD-FF, MS-NC-NoReD-ND, MS-NC-NoReD-ND-FF, ELRA_END_USER, ELRA_EVALUATION, ELRA_VAR, CLARIN_PUB, CLARIN_ACA, CLARIN_ACA-NC, CLARIN_RES, AGPL, ApacheLicence_2.0, BSD_4-clause, BSD_3-clause, FreeBSD, GFDL, GPL, LGPL, MIT, Princeton_Wordnet, proprietary, underNegotiation, nonStandardLicenceTerms
Definition/Explanations
The licence of use for the resource
Recommended usage
You can provide information on the rights of accessing and using a resource in one of the following ways, in order of preference:
1. Use the element "licence" and select one of the recommended licences. Note that the list mixes licences intended for data resources with licences intended for components; for components, the recommended licences are the Open Source licences; for data resources, please use a standard licence such as one of the CC family. If the licence you use is not included in the list, you can use the "nonStandardLicenceTerms" or the "proprietary" values and give further information on your licence in the elements "nonStandardLicenceName", "nonStandardLicenceTermsURL" and "nonStandardLicenceTermsText".
2. Use the "rightsStatementName" and the "rightsStatementURL" (with a link to a URL with more explanations on its usage) if the resource is provided with a general statement of use and not an official licence document. Note that this option is mainly intended to help end-users access your resource; you are strongly advised to license your resource properly.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rights
rightsStmtName
Usage
Mandatory under conditions
Conditions for usage
either licence or rightsStmt must be filled in
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms-omtd:rightsStmtName: openAccess, closedAccess, embargoedAccess, restrictedAccess
Definition/Explanations
The name of an official statement indicative of licensing terms for the use of a resource (e.g. open access, free to read etc.); its semantics should be clear, preferably formally expressed and stored at a URL.
The current list of predefined values comes from OpenAIRE, but it is under revision.
Recommended usage
The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value to help users understand the licensing terms of a resource.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rights
rightsStmtURL
Usage
Mandatory under conditions
Conditions for usage
either licence or rightsStmt must be filled in
Type
URL pattern
Definition/Explanations
Link to the URL with the text that formally explains the licensing conditions imposed by the rights statement.
Recommended usage
The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value to help users understand the licensing terms of a resource.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rightsURI
nonStandardLicenceName
Usage
Mandatory under conditions
Conditions for usage
to be used with ms:licence other (i.e. for licences not included in the list of recommended ones)
Type
free text
Definition/Explanations
The name by which a licence is known; to be used for licences not included in the pre-defined list of recommended licences
Recommended usage
Please provide the name of the licence if it is already known, or supply one that can uniquely identify it.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:title (for dct:licenseDocument)
nonStandardLicenceTermsURL
Usage
Mandatory under conditions
Conditions for usage
to be used with ms:licence other (i.e. for licences not included in the list of recommended ones)
Type
URL pattern
Definition/Explanations
Used to provide a hyperlink to a URL containing the text of a licence not included in the predefined list, or describing the terms of use for a language resource or the terms of service for web services
Recommended usage
Please provide the link to the full text document of the licence. Note that this is the preferred option over inserting the licence text in the element "nonStandardLicenceTermsText", as it provides a permanent location for the licence that is accessible to all.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rightsURI
version of licence
Usage
Mandatory
Type
free text
Definition/Explanations
The version of the licence
Recommended usage
You are advised to indicate the version of the licence of your resource; the latest version is the preferred option, e.g. "4.0" for all CC licences and "2.0" for the META-SHARE-NoReD ones.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:hasVersion (for dct:licenseDocument)
distributionMedium
Usage
Mandatory
Type
Open controlled vocabulary
Controlled vocabulary reference and/or values
ms:distributionMedium: webExecutable, paperCopy, hardDisk, bluRay, DVD-R, CD-ROM, downloadable, accessibleThroughInterface, other
Definition/Explanations
Specifies the medium (channel) used for delivery or providing access to the resource
Recommended usage
Please use one of the provided values to indicate the medium of distribution. For models, the expected value is "downloadable". Note that if the model is distributed through different media and/or under different terms of use or licences, you can repeat the whole set of elements to describe them.
downloadURL
Usage
Mandatory under conditions
Conditions for usage
if distributionMedium = downloadable
Type
URL pattern
Definition/Explanations
Any URL where the resource can be downloaded from
Recommended usage
Please indicate where the model can be downloaded; this element is of particular importance if you have not uploaded the resource in the repository.
contactEmail
Usage
Mandatory under conditions
Conditions for usage
An email or a landing page must be provided
Type
email pattern
Definition/Explanations
A general email address that can be used as contact point for a resource
Recommended usage
You can indicate a contact point where users can request further information in one of the following ways, in order of preference:
1. give a general email address in the "contactEmail" element, or
2. provide in "landingPage" the link to a web page that documents the resource (e.g. a page with documentation, examples and links to the resource itself).
You can also indicate the person(s) or group(s) responsible for communication in the "contactPerson" and "contactGroup" elements.
landingPage
Usage
Mandatory under conditions
Conditions for usage
An email or a landing page must be provided
Type
URL pattern
Definition/Explanations
A URL used as the landing page of a resource, providing general information; for instance, it may present a description of the resource and its creators, and possibly include links to the URL where it can be accessed
Recommended usage
You can indicate a contact point where users can request further information in one of the following ways, in order of preference:
1. give a general email address in the "contactEmail" element, or
2. provide in "landingPage" the link to a web page that documents the resource (e.g. a page with documentation, examples and links to the resource itself).
You can also indicate the person(s) or group(s) responsible for communication in the "contactPerson" and "contactGroup" elements.
contactPerson (identifier or personName)
Usage
Recommended
Type
identifier or multilingual free text
Attributes
ms:personIdentifierSchemeName (for identifiers) or xs:lang (for name)
Definition/Explanations
Groups information on the person(s) that is/are responsible for providing further information regarding the resource
Recommended usage
The recommended way of referring to a person is by giving their identifier, preferably the ORCID; if you provide the identifier, please also select the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide the name, preferably in the format "Surname, First name", at least in English; if you want to add names in other languages, you can use the “lang” attribute. The element can also be repeated to encode multiple persons. If you decide to add a contactPerson instead of a general contactEmail, please ensure that the data (including the email) of this person are also uploaded in OpenMinTeD.
Relation to other metadata schemas
DataCite 4.0: contributor with datacite:contributorType="ContactPerson", datacite:contributorName (familyName & givenName) or datacite:nameIdentifier and datacite:nameIdentifierScheme and datacite:schemeURI
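Since the ORCID is the preferred person identifier, it can be worth checking that a value is well formed before submitting it. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm; the Python sketch below implements that check. The function names are hypothetical, and the sketch only verifies the format and checksum, not that the iD is actually registered.

```python
import re


def orcid_check_character(base_digits):
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the 0000-0000-0000-000X shape and the final check character."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]", orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]
```

The check catches most typing errors, such as a single mistyped digit, before the identifier is stored with the personIdentifierSchemeName attribute.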
contactGroup (identifier or organizationName)
Usage
Recommended
Type
identifier or multilingual free text
Attributes
ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for name)
Definition/Explanations
Groups information on the group(s) that is/are responsible for providing further information regarding the resource
Recommended usage
The recommended way of referring to a group (currently modelled as an organization) is by giving its identifier (e.g. ISNI, fundref); if you provide the identifier, please also select the relevant value from the list of values in the attribute "organizationIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the group (organization), you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. If you decide to add a contactGroup instead of another contact option, please ensure that the data (including the communication data) of this group (organization) are also uploaded in OpenMinTeD.
mustBeCitedWith
Usage
Recommended
Type
free text or identifier
Definition/Explanations
Publication to be used for citation purposes as requested by resource providers (usually a scientific article that describes the resource)
Recommended usage
The preferred option for referring to a publication is to provide its unique identifier already assigned by an authoritative source; the preferred identifier for publications is the DOI. Use the attribute "publicationIdentifierSchemeName" to specify the scheme by selecting one of the pre-defined values (e.g. DOI, ISBN etc.) or, if the scheme is not listed among them, select the "other" value and use the attribute "schemeURI" to provide a link to the URL that documents the scheme it adheres to. If you don't know the publication identifier, you can provide the full bibliographic record as free text. N.B. The citation publication should not be confused with the attribution data, which is a legal obligation; citation through publications is a common practice in research.
resourceCreator (person or organization, described with identifier or name)
Usage
Recommended
Type
identifier or multilingual free text
Attributes
for person: ms:personIdentifierSchemeName (for identifiers) or xs:lang (for name); for organization: ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for name)
Definition/Explanations
Groups information on the person(s) or organization(s) that has/have created the resource
Recommended usage
The recommended way of referring to a person is by giving their identifier, preferably the ORCID; if you provide the identifier, please also select the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide the name, preferably in the format "Surname, First name", at least in English; if you want to add names in other languages, you can use the “lang” attribute.
The recommended way of referring to an organization is by giving its identifier (e.g. ISNI); if you provide the identifier, please also select the relevant value from the list of values in the attribute "organizationIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the organization, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute.
The element can also be repeated to encode multiple persons/organizations. For corpora created through the OMTD corpus building process, the resource creator is considered to be the person that has put together the corpus through the user query.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:creator
DataCite 4.0: skos:closeMatch datacite:Creator with datacite:creatorName (familyName & givenName) or datacite:nameIdentifier & datacite:nameIdentifierScheme & datacite:schemeURI
variantName
Usage
Mandatory
Type
free text
Definition/Explanations
Variant name used for the model
tagset
Usage
Recommended
Type
identifier or multilingual free text
Attributes
ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)
Definition/Explanations
A name or an identifier (e.g. URL reference) of the tagset used in the annotation of the resource or used by the component
Recommended usage
The recommended way of referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. For interoperability reasons, it is recommended to describe typesystems, tagsets, annotation resources etc. in the OpenMinTeD registry and refer to them through the identifier.
typesystem
Usage
Recommended
Type
identifier or multilingual free text
Attributes
ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)
Definition/Explanations
A name or an identifier (e.g. URL reference) of the typesystem used in the annotation of the resource or used by the component
Recommended usage
The recommended way of referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. For interoperability reasons, it is recommended to describe typesystems, tagsets, annotation resources etc. in the OpenMinTeD registry and refer to them through the identifier.
algorithm
Usage
Recommended
Type
free text
Definition/Explanations
Training algorithm used for the model (e.g. maximum entropy, SVM etc.)
Recommended usage
Please provide only the name of the algorithm, not a detailed description of it.
trainingCorpusDetails
Usage
Recommended
Type
free text
Definition/Explanations
Detailed description of the training corpus (e.g. size, number of features etc.)
mediaType
Usage
Mandatory
Type
Closed controlled vocabulary
Controlled vocabulary reference and/or values
ms:mediaType: text, audio, video, image
Definition/Explanations
Specifies the media type of the resource and basically corresponds to the physical medium of the content representation. Each media type is described through a distinctive set of features. A resource may consist of parts attributed to different types of media. A component may take as input/output more than one media type.
Recommended usage
OpenMinTeD only handles text resources, so the default value is set to "text".
lingualityType
Usage
Mandatory
Type
Closed controlled vocabulary
Controlled vocabulary reference and/or values
ms:lingualityType: monolingual, bilingual, multilingual
Definition/Explanations
Indicates whether the resource contains one, two or more languages
Recommended usage
Please select one of the values.
language
Usage
Mandatory
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms:language (a combination of languageId, scriptId, regionId and variantId according to the IETF BCP 47 guidelines)
Definition/Explanations
The language(s) for which the model has been trained, expressed according to the IETF BCP 47 guidelines.
Recommended usage
Please enter the language and, if needed, the script, region and variant identifiers that best fit the language that the model can be used for (e.g. en-US), according to the IETF BCP 47 guidelines. The element can be repeated to encode multiple languages.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:language
DataCite 4.0: skos:closeMatch datacite:Language
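As an illustration of how a value for the language element can be composed, the following minimal Python sketch joins the subtags in the order prescribed by BCP 47 (language, script, region, variant). The function name build_language_tag is hypothetical and not part of OMTD-SHARE; the sketch only normalises letter case and does not validate subtags against the IANA registry.

```python
def build_language_tag(language, script=None, region=None, variant=None):
    """Join subtags in BCP 47 order: language-script-region-variant."""
    subtags = [language.lower()]
    if script:
        subtags.append(script.title())  # scripts are conventionally title case, e.g. Latn
    if region:
        subtags.append(region.upper())  # regions are conventionally upper case, e.g. US
    if variant:
        subtags.append(variant.lower())
    return "-".join(subtags)


print(build_language_tag("en", region="US"))
print(build_language_tag("sr", script="Latn", region="RS"))
```

Only the subtags that are actually needed to identify the language should be supplied; a bare language subtag such as "en" is already a valid tag.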
size
Usage
Mandatory
Type
size pattern (size and sizeUnit)
Definition/Explanations
Provides information on the size of the resource or of resource parts.
Recommended usage
You may indicate the size of the entire model by filling in the appropriate number and selecting the appropriate sizeUnit (e.g. 20000 words). The preferred sizeUnit is words or sentences. If nothing else is known, please indicate at least the number of files.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:extent
DataCite 4.0: skos:closeMatch datacite:size
mimeType
Usage
Recommended
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms:mimetype (a subset of values, the most popular ones for text files, from the IANA mime type controlled vocabulary): text/plain, application/vnd.xmi+xml, text/xml, application/x-tmx+xml, application/x-xces+xml, application/tei+xml, application/rdf+xml, application/xhtml+xml, application/emma+xml, application/pls+xml, application/postscript, application/voicexml+xml, text/sgml, text/html, application/x-tex, application/rtf, application/json+ld, application/x-latex, text/csv, text/tab-separated-values, application/pdf, application/x-msaccess, audio/mp4, audio/mpeg, audio/wav, image/bmp, image/gif, image/jpeg, image/png, image/svg+xml, image/tiff, video/jpeg, video/mp4, video/mpeg, video/x-flv, video/x-msvideo, video/x-ms-wmv, application/msword, application/vnd.ms-excel, audio/mpeg3, text/turtle, other, audio/PCMA, audio/flac, audio/speex, audio/vorbis, video/mp2t
Definition/Explanations
The mime-type of the resource (a formalized specifier for the format) or a mime-type that the component accepts, in conformance with the values of the IANA (Internet Assigned Numbers Authority)
Recommended usage
Please select one of the pre-defined values (which are the most popular ones for text files) or add a value, preferably from the IANA media type values (http://www.iana.org/assignments/media-types/media-types.xhtml). The element can be repeated for corpora that include files of various formats.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:format
DataCite 4.0: skos:closeMatch datacite:Format
characterEncoding
Usage
Recommended
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms:characterEncoding: a long list of popular character encodings
Definition/Explanations
The name of the character encoding used in the resource or accepted by the component
Recommended usage
Please select one of the pre-defined values; it should be noted, however, that for OpenMinTeD the preferred character encoding is UTF-8, to ensure interoperability between content and components.
relationType
Usage
Recommended
Type
Open controlled vocabulary
Controlled vocabulary reference and/or values
ms:relationType: isPartOf, isPartWith, hasPart, hasOutcome, isCombinedWith, requiresLR, requiresSoftware, isexactMatch, isSimilarTo, isContinuationOf, isVersionOf, replaces, isReplacedWith, isCreatedBy, isElicitedBy, isRecordedBy, isEditedBy, isAnalysedBy, isEvaluatedBy, isQueriedBy, isAccessedBy, isArchivedBy, isDisplayedBy, isCompatibleWith
Definition/Explanations
Specifies the type of relation holding between two entities (e.g. two resources that together comprise one new resource, a corpus and the s/w component that has been used for its creation, or a corpus and the publication that describes it)
Recommended usage
For models, the recommended relation is isCompatibleWith, holding with software components, but any relationType can be used as appropriate.
Relation to other metadata schemas
DataCite 4.0: skos:closeMatch datacite:relationType
relatedResource1
Usage
Mandatory when applicable
Conditions for usage
when relationType is filled in
Type
ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)
Definition/Explanations
A name or an identifier (e.g. URL reference) of the source resource related to the target resource (relatedResource2) through a relation described in relationType
Recommended usage
The recommended way of referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name, at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through the identifier.
relatedResource2
Usage
Mandatory when applicable
Conditions for usage
when relationType is filled in
Type
ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)
Definition/Explanations
A name or an identifier (e.g. URL reference) of the target resource related to the source resource (relatedResource1) through a relation described in relationType
Recommended usage
The recommended way of referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name, at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through the identifier.
Guidelines for providers of software resources
- Introduction
- Instructions for providers of software components
- Recommended ancillary knowledge resources
- Recommended metadata schema for software resources
Introduction
OpenMinTeD targets both scholarly researchers who are agnostic to software details and peculiarities, and TDM developers. It therefore allows the registration of
- applications, which can be used as-is to perform TDM operations on content resources, and
- software components, i.e. pieces of software that can, by means of the OpenMinTeD Workflow Editor, be put together and tuned with various ancillary resources in order to create workflows that will be delivered to the end-users and/or further integrated into other workflows.
All of these will be made available to the researchers in a way that will not require any kind of expertise from them, both as locally downloadable and executable tools and as web services.
The OpenMinTeD platform, at the current stage, supports the integration of software components wrapped for the GATE or UIMA/uimaFIT frameworks.
To be fully compatible with OpenMinTeD, you must provide
- a metadata record compliant with the OMTD-SHARE schema, at least at the minimal level (which you can upload to the Registry as an XML file and/or edit with the OpenMinTeD metadata editor),
- the software in an executable form, by uploading it in a compressed file or providing a link to a URL location from which it can be directly accessed (i.e. not a landing page).
Instructions for providers of software components
- How to register your components
- How to make your components interoperable
- How to document your components
- Guide for deploying UIMA components in the Argo platform
How to register your components
The recommended way of providing software components is through the Maven Central repository, according to the following instructions:
1. Put together in a single folder (in the form required by the technologies/frameworks used)
   - all files that implement the component (e.g. Java classes etc.)
   - licence text(s), preferably named "LICENCE.TXT" in order to be unambiguously recognised; in the case of multiple licences, they should all be aggregated in the same file
   - a readme notice that describes the contents of the folder as well as any important notice for the compilation and execution of the component
   - all descriptors (UIMA/uimaFIT, GATE CREOLE [1], OMTD-SHARE etc.) available for the component according to the implementation framework
   - a Maven POM XML file.
2. Pack them as a JAR using the respective Maven plugin.
3. Upload them to the Maven repository according to the Maven guidelines.
4. Finally, submit the Maven coordinates in the OMTD registry; in this case, the metadata record will be partially converted from the Maven POM file and, potentially, from elements included in the metadata descriptors supported by OpenMinTeD (UIMA/uimaFIT, CREOLE), and then you can enrich it using the OpenMinTeD editor.
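To make the notion of Maven coordinates concrete, the following minimal Python sketch reads the groupId, artifactId and version from a POM and joins them in the common groupId:artifactId:version notation. The function name maven_coordinates is hypothetical, and the exact form in which the OMTD registry expects the coordinates is not stated here, so the colon-separated notation is an assumption. The sketch only looks at the top-level elements of the POM and does not resolve values inherited from a parent POM.

```python
import xml.etree.ElementTree as ET

POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def maven_coordinates(pom_xml):
    """Return 'groupId:artifactId:version' from the top-level POM elements."""
    project = ET.fromstring(pom_xml)
    parts = []
    for tag in ("groupId", "artifactId", "version"):
        element = project.find("m:" + tag, POM_NS)
        if element is None or not (element.text or "").strip():
            raise ValueError("POM is missing <%s>" % tag)
        parts.append(element.text.strip())
    return ":".join(parts)


# Placeholder values taken from the pom.xml template for Argo components.
EXAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>xyz.company.uima</groupId>
  <artifactId>uima-component</artifactId>
  <version>1.0</version>
</project>"""

print(maven_coordinates(EXAMPLE_POM))
```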
[1] Details of GATE descriptors can be found at https://gate.ac.uk/userguide/sec:creole-model:config although they do not currently contain much (if any) of the information needed to complete the OMTD-SHARE metadata descriptor. Note that this is changing to include more OpenMinTeD-like information, much of which will be specified in a Maven POM rather than as CREOLE metadata. This is currently not documented, as it relates to the next version of GATE, which is still under active development.
How to make your components interoperable
In addition, if you want to be fully compliant with the OpenMinTeD interoperability requirements, please ensure that you adopt the following rules; if you fail to abide by them, it might still be possible to operate your software resources via the OpenMinTeD platform, but this cannot be guaranteed and interoperability with other resources will suffer.
- Keep ancillary knowledge resources, e.g. models, annotation resources, etc., separate from the component itself; document and upload these in the OpenMinTeD Registry as well, following the procedure described in Guidelines for providers of ancillary knowledge resources. If you want to refer to these resources from the software metadata record, please use the resource identifier for the linking.
- To ensure that provided software components can be scaled as required for different workloads, it is recommended that they are implemented in a stateless fashion, i.e. without the need to maintain information about one or more documents and the need to share this information with other instances of the same component. For example, a component that counts all tokens in a corpus cannot be trivially scaled.
- In addition to plain UIMA/uimaFIT and GATE-CREOLE descriptors, OpenMinTeD also supports Argo descriptors; further instructions for deploying UIMA components in Argo are found in the Guide for deploying UIMA components in the Argo platform.
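The difference between a stateless and a stateful component can be sketched as follows. This is a framework-independent illustration in plain Python, not UIMA or GATE code; the names count_tokens and CorpusTokenCounter are hypothetical, and whitespace splitting stands in for real tokenisation.

```python
def count_tokens(document_text):
    """Stateless: the result depends only on the document passed in."""
    return len(document_text.split())


class CorpusTokenCounter:
    """Stateful: keeps a running total across documents, so two parallel
    instances would each see only part of the corpus."""

    def __init__(self):
        self.total = 0

    def process(self, document_text):
        self.total += len(document_text.split())
        return self.total


documents = ["one two three", "four five"]

# Each call is independent, so documents can be handled by any worker in any order;
# the corpus-wide figure is obtained by aggregating outside the component.
per_document = [count_tokens(d) for d in documents]
print(per_document, sum(per_document))
```

With the stateful variant, distributing the two documents over two instances leaves each instance with a partial total, which is why such components need extra coordination to scale.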
How to document your components
To be fully compatible with OpenMinTeD, you must
- ensure that the software is distributed under a perpetual, world-wide, no-charge, royalty-free copyright/patent licence that permits unrestricted use and allows unlimited redistribution
- include in the metadata record a link to the licence document(s) with the terms and conditions under which it is provided, and attach the licence document(s) together with the resource
- if you already have a PID for your resource (e.g. a URI or a HANDLE), make sure it is included in the metadata record (cf. identifier for more information)
- ensure that you version all your software resources and label the versions in an unambiguous way, preferably following the Semantic Versioning recommendations
- ensure that you provide with your software resource appropriate machine-readable metadata embedded in the source code (where possible) and according to the relevant framework (e.g. uimaFIT Java annotations etc.); make sure that the metadata descriptors are properly identified in an unambiguous way that makes them easy to distinguish and extract
- for Java-based components, ensure that you use the Java fully qualified class naming conventions for naming your components; together with the Maven practices for registering packaging and version, this contributes to unique identifiers of the components
- describe all the execution requirements for the proper operation of the software, i.e. required software libraries, ancillary resources, annotation schema dependencies, etc.
- describe the input and output requirements for your software, at least as regards the type of resource, the language (if required), data format and character encoding, and annotation types of the input/output resource
- declare in the metadata whether the software is downloadable or can only be accessed as a web service
- ensure that you describe appropriately the functionalities of the software, both through the OMTD-SHARE component type vocabulary and in a free text description, supplying more information for the user.
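As an illustration of the versioning recommendation above, the following minimal Python sketch checks that a version label has the MAJOR.MINOR.PATCH shape used by Semantic Versioning and compares two labels numerically. The pattern is a simplified approximation of the Semantic Versioning grammar (for example, it does not reject leading zeros), and the name parse_version is hypothetical.

```python
import re

# MAJOR.MINOR.PATCH with optional pre-release and build metadata (simplified).
SEMVER = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+([0-9A-Za-z.-]+))?$"
)


def parse_version(label):
    """Return (major, minor, patch) as integers, or raise ValueError."""
    match = SEMVER.match(label)
    if match is None:
        raise ValueError("not a MAJOR.MINOR.PATCH version: %r" % label)
    return tuple(int(match.group(i)) for i in (1, 2, 3))


# Numeric comparison orders 1.10.0 after 1.4.0, unlike plain string comparison.
print(parse_version("1.4.0") < parse_version("1.10.0"))
```

A two-part label such as 1.0 is rejected by this pattern, since Semantic Versioning expects all three numeric parts.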
Further recommendations that contribute to interoperability include the following:
- It is important that you provide the appropriate documentation for your resource (e.g. manuals, help files etc.), which you should also version along with the software and add as a reference to your metadata record.
- Recommend one of the publications about your resource as the one to be cited for scholarly attribution and add this information to the metadata record.
- Make sure that you fill in, in the metadata record, all the elements required for citing your resource [1], i.e. the creator of the resource, a title, the resource type and an identifier, and optionally the publication date, the version and the publisher or distributor.
- In all cases where you link to other resources or entities (e.g. persons, projects etc.), please try to do this through unique and persistent identifiers of authority lists and sources, to the extent possible, documenting also the authority and/or scheme they adhere to.
[1] For citation, OpenMinTeD endorses the Joint Declaration of Data Citation Principles, as well as the more specialised RDA recommendations for data citation of evolving data and DataCite guidelines.
Guide for deploying UIMA components in the Argo platform
Argo is able to use standard Java UIMA components; however, they must first be packaged as UIMA PEAR (Processing Engine ARchive) files before they can be deployed within the Argo platform.
It is strongly recommended to use Maven, a build automation tool, to manage UIMA component projects, and a Maven pom.xml template (see further below) is available. The highlighted values within the pom.xml template are those expected to be configured by component developers.
The minimum set of files required to produce a working UIMA component is:
1. A standard UIMA XML descriptor (located under the desc folder at the root of the project).
2. A Java class containing the implementation of the component (located under src/main/java).
3. A Maven pom.xml (adapted from the template).
Figure 1 shows the recommended layout of a very simple component project managed by Maven, using the example placeholder values found in the Maven pom.xml template. The UIMA XML descriptor should be named using the Maven artifactId value (e.g. uima-component) and reside under the desc directory, within a nested set of directories representing the Maven groupId value (e.g. xyz.company.uima).
Figure 1: Basic layout of a Maven-based UIMA component project
It is recommended to use the Maven artifactId and groupId to produce the UIMA Component ID (e.g. the groupId xyz.company.uima and artifactId uima-component should result in a Component ID of xyz.company.uima.uima-component). The default configuration of the PEAR Packaging Maven plugin, within the pom.xml template, automates this procedure. A Component ID is intended to be unique and is not intended to be visible to Argo end-users.
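The naming conventions described above can be sketched in a few lines of Python. The helper names component_id and descriptor_path are hypothetical; the values are the placeholders used in the pom.xml template, where the descriptor location appears as the mainComponentDesc setting.

```python
def component_id(group_id, artifact_id):
    """Component ID recommended above: groupId and artifactId joined by a dot."""
    return group_id + "." + artifact_id


def descriptor_path(group_id, artifact_id):
    """UIMA XML descriptor location: desc/, then one directory per groupId segment."""
    return "desc/" + group_id.replace(".", "/") + "/" + artifact_id + ".xml"


print(component_id("xyz.company.uima", "uima-component"))
print(descriptor_path("xyz.company.uima", "uima-component"))
```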
Any Java dependencies of a UIMA component are expected to be included within the component’s PEAR file. The pom.xml template is configured to automatically package the Maven dependencies when building a PEAR file. However, to achieve Argo compatibility, it is important to exclude the uimaj-core artifact and any artifacts representing UIMA Type Systems. In the pom.xml template this is achieved by supplying the excludeArtifactIds configuration parameter of the Maven Dependency plugin with a comma-delimited list of the affected artifactIds. Argo expects UIMA Type Systems to be installed separately and packaged as PEAR files, as for UIMA components.
A component may also contain an Argo XML descriptor file, although this is entirely optional. It is intended to provide additional metadata for a component. An Argo XML descriptor must:
- Reside in the same directory as the component’s UIMA XML descriptor.
- Have the same filename as the UIMA XML descriptor, but with a .argo.xml suffix.
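A small Python sketch of the naming rule follows. It assumes that the .argo.xml suffix replaces the .xml extension of the UIMA descriptor, which is one reading of the rule above; the helper name argo_descriptor_path is hypothetical.

```python
def argo_descriptor_path(uima_descriptor_path):
    """Derive the Argo descriptor path from the UIMA descriptor path.

    Assumption: '.argo.xml' replaces the trailing '.xml', and the file stays
    in the same directory as the UIMA XML descriptor.
    """
    if not uima_descriptor_path.endswith(".xml"):
        raise ValueError("expected a path ending in .xml")
    return uima_descriptor_path[: -len(".xml")] + ".argo.xml"


print(argo_descriptor_path("desc/xyz/company/uima/uima-component.xml"))
```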
Figure 2 shows the location and name of an Argo XML descriptor file for a component with the ID of xyz.company.uima.uima-component, while Figure 3 shows the general format of the descriptor file itself.
Figure 2: Example file structure of a component containing an Argo XML descriptor file
<argoDescriptor>
<tags>
<tag>{string}</tag>
...
</tags>
<minimumMemoryInMbs>{integer}</minimumMemoryInMbs>
<interactive>[true/false]</interactive>
<configurationParametersMetaData>
<configurationParameterMetaData>...</configurationParameterMetaData>
...
</configurationParametersMetaData>
</argoDescriptor>
Figure 3: Structure of an Argo XML descriptor
Within an Argo descriptor file, all of the sub-elements directly under the argoDescriptor element are optional.
The tags element can contain multiple tag elements, each containing a string value. These tag values are intended to be used within Argo’s component search facility, to assist end-users in finding relevant components.
The minimumMemoryInMbs element holds an integer value, setting the default value for the minimum amount of memory (in megabytes) required by this component when it is run in a distributed workflow. This is important for determining the allocation of components to machines.
The interactive element contains a boolean value. This value is set to true when a component contains a custom web user interface, which requires interaction with the annotation model during a workflow execution. The only existing Argo component with this value set to true is the Manual Annotation Editor.
The configurationParametersMetaData element can contain multiple configurationParameterMetaData elements, each one providing additional information about a component configuration parameter found within the matching UIMA XML descriptor. A configurationParameterMetaData element must contain a name subelement (which has the same name as the configuration parameter it is referencing in the UIMA descriptor) and a uiType subelement (which is used by Argo to provide the most appropriate UI widget to the end-user). Valid values for uiType are time, date, datetime, enum, password, type, document and text.
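The two requirements on configurationParameterMetaData elements can be checked mechanically. The following Python sketch is an illustration only, not an Argo tool: the function name check_argo_descriptor and the deliberately invalid colourParam entry are made up for the example, while the element names and the list of uiType values come from the description above.

```python
import xml.etree.ElementTree as ET

VALID_UI_TYPES = {"time", "date", "datetime", "enum",
                  "password", "type", "document", "text"}


def check_argo_descriptor(xml_text):
    """Return a list of problems found in the parameter metadata (empty if none)."""
    problems = []
    root = ET.fromstring(xml_text)
    for index, meta in enumerate(root.iter("configurationParameterMetaData"), start=1):
        name = (meta.findtext("name") or "").strip()
        ui_type = (meta.findtext("uiType") or "").strip()
        if not name:
            problems.append("entry %d: missing <name>" % index)
        if not ui_type:
            problems.append("entry %d: missing <uiType>" % index)
        elif ui_type not in VALID_UI_TYPES:
            problems.append("entry %d: unknown uiType %r" % (index, ui_type))
    return problems


EXAMPLE = """<argoDescriptor>
  <configurationParametersMetaData>
    <configurationParameterMetaData>
      <name>timeParam</name>
      <uiType>time</uiType>
    </configurationParameterMetaData>
    <configurationParameterMetaData>
      <name>colourParam</name>
      <uiType>colour</uiType>
    </configurationParameterMetaData>
  </configurationParametersMetaData>
</argoDescriptor>"""

print(check_argo_descriptor(EXAMPLE))
```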
Figure 4 shows how configurationParameterMetaData elements are configured if their uiType value is either time, date or datetime. The corresponding UIMA configuration parameter must be of type string. Argo needs to know how to format the time chosen by the end-user using a calendar UI widget, so this has to be specified in the format subelement, as demonstrated in Figure 4.
<configurationParameterMetaData>
<name>timeParam</name>
<uiType>time</uiType>
<uiConfiguration>
<format>HH:mm:ss</format>
</uiConfiguration>
</configurationParameterMetaData>
Figure 4: A date, time or datetime configuration parameter
For configuration parameters that have a fixed set of values, a uiType value of enum is required. These fixed values should be listed as a set of value elements, nested within a values element, as shown in Figure 5.
<configurationParameterMetaData>
<name>enumParam</name>
<uiType>enum</uiType>
<values>
<value>red</value>
<value>green</value>
<value>blue</value>
</values>
</configurationParameterMetaData>
Figure 5: An enum configuration parameter
Configuration parameters containing sensitive information, such as passwords, should use a uiType value of password. This hides the value of the parameter from the user and, once entered, the value does not get transmitted back to the Argo UI, for additional security. It is also possible to specify the minimum and/or the maximum number of characters which this value can hold, using min and max elements within the valueConstraints element. See Figure 6 for an example.
<configurationParameterMetaData>
<name>passwordParam</name>
<uiType>password</uiType>
<valueConstraints>
<min>5</min>
<max>10</max>
</valueConstraints>
</configurationParameterMetaData>
Figure 6: A password configuration parameter
To make it easier for a user to select UIMA type(s) within the Argo UI, any configuration parameters representing types should have a uiType value of type. This will result in a searchable list of all types known to Argo being displayed to the end-user when they are configuring the component, from which the required types can be selected. See Figure 7 for an example.
<configurationParameterMetaData>
<name>typeParam</name>
<uiType>type</uiType>
</configurationParameterMetaData>
Figure 7: A type configuration parameter
Configuration parameters which refer to local files and/or directories should have the uiType value of document. This will allow an end-user to select files from the Argo File Store using a file selector dialog. Figure 8 shows an example configuration, and a table declaring the UI configuration parameters available to configure the file dialog can be found in Figure 9.
<configurationParameterMetaData>
<name>documentParam</name>
<uiType>document</uiType>
<uiConfiguration>
<selectFile>true</selectFile>
<selectFolder>false</selectFolder>
<selectFilesRecursively>false</selectFilesRecursively>
<hideFiles>false</hideFiles>
<windowCaption>Save file as...</windowCaption>
</uiConfiguration>
</configurationParameterMetaData>
Figure 8: A document configuration parameter
Element | Type | Description
selectFile | Boolean | Allows a user to select a file in the dialog
selectFolder | Boolean | Allows a user to select a folder in the dialog
selectFilesRecursively | Boolean | Recursively selects all of the files and/or folders under the selected folders
hideFiles | Boolean | Only show directories in the dialog
windowCaption | String | A caption to display in the file browser window
Figure 9: uiConfiguration elements
Configuration parameters that are likely to hold a large amount of text should use a uiType value of text. This will result in a larger text box being made available to the end-user. The size of the text area is configured using characterWidth and visibleLines elements, nested within the uiConfiguration element, as shown in Figure 10.
<configurationParameterMetaData>
<name>textAreaParam</name>
<uiType>text</uiType>
<uiConfiguration>
<characterWidth>30</characterWidth>
<visibleLines>5</visibleLines>
</uiConfiguration>
</configurationParameterMetaData>
Figure 10: A text area configuration parameter
An example of a UIMA XML descriptor, along with its corresponding Argo XML descriptor, can be found further below.
Maven pom.xml template for Argo components
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>xyz.company.uima</groupId>
<artifactId>uima-component</artifactId>
<version>1.0</version>
<build>
<resources>
<resource>
<directory>desc</directory>
</resource>
<resource>
<directory>src/main/resources</directory>
</resource>
</resources>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<version>2.4</version>
<executions>
<execution>
<id>copy-dependencies</id>
<phase>prepare-package</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<stripVersion>true</stripVersion>
<outputDirectory>${project.build.directory}/pearPackaging/lib</outputDirectory>
<overWriteReleases>true</overWriteReleases>
<overWriteSnapshots>true</overWriteSnapshots>
<includeScope>runtime</includeScope>
<excludeArtifactIds>U_compareTypeSystem,uimaj-core</excludeArtifactIds>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.uima</groupId>
<artifactId>PearPackagingMavenPlugin</artifactId>
<version>2.4.0</version>
<extensions>true</extensions>
<executions>
<execution>
<phase>package</phase>
<configuration>
<mainComponentDesc>desc/xyz/company/uima/uima-component.xml</mainComponentDesc>
<componentId>${project.groupId}.${project.artifactId}</componentId>
</configuration>
<goals>
<goal>package</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-install-plugin</artifactId>
<version>2.3.1</version>
<executions>
<execution>
<phase>install</phase>
<configuration>
<packaging>pear</packaging>
<groupId>${project.groupId}</groupId>
<artifactId>${project.artifactId}</artifactId>
<version>${project.version}</version>
<file>${project.build.directory}/${project.groupId}.${project.artifactId}.pear</file>
</configuration>
<goals>
<goal>install-file</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
<pluginManagement>
<plugins>
<plugin>
<groupId>org.eclipse.m2e</groupId>
<artifactId>lifecycle-mapping</artifactId>
<version>1.0.0</version>
<configuration>
<lifecycleMappingMetadata>
<pluginExecutions>
<pluginExecution>
<pluginExecutionFilter>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<versionRange>[1.0.0,)</versionRange>
<goals>
<goal>copy-dependencies</goal>
</goals>
</pluginExecutionFilter>
<action>
<execute>
<runOnIncremental>false</runOnIncremental>
</execute>
</action>
</pluginExecution>
</pluginExecutions>
</lifecycleMappingMetadata>
</configuration>
</plugin>
</plugins>
</pluginManagement>
</build>
<dependencies>
<dependency>
<groupId>org.apache.uima</groupId>
<artifactId>uimaj-core</artifactId>
<version>2.7.0</version>
</dependency>
<dependency>
<groupId>org.u_compare</groupId>
<artifactId>U_compareTypeSystem</artifactId>
<version>1.1</version>
</dependency>
</dependencies>
</project>
Argo XML Descriptor example
<argoDescriptor>
<tags>
<tag>categoryA</tag>
<tag>finance</tag>
</tags>
<minimumMemoryInMbs>256</minimumMemoryInMbs>
<interactive>false</interactive>
<configurationParametersMetaData>
<configurationParameterMetaData>
<name>timeParam</name>
<uiType>time</uiType>
<uiConfiguration>
<format>HH:mm:ss</format>
</uiConfiguration>
</configurationParameterMetaData>
<configurationParameterMetaData>
<name>dateParam</name>
<uiType>date</uiType>
<uiConfiguration>
<format>yyyy/MM/dd</format>
</uiConfiguration>
</configurationParameterMetaData>
<configurationParameterMetaData>
<name>dateTimeParam</name>
<uiType>datetime</uiType>
<uiConfiguration>
<format>yyyy/MM/dd HH:mm:ss</format>
</uiConfiguration>
</configurationParameterMetaData>
<configurationParameterMetaData>
<name>enumParam</name>
<uiType>enum</uiType>
<values>
<value>red</value>
<value>green</value>
<value>blue</value>
</values>
</configurationParameterMetaData>
<configurationParameterMetaData>
<name>passwordParam</name>
<uiType>password</uiType>
<uiConfiguration>
</uiConfiguration>
<valueConstraints>
<min>5</min>
<max>10</max>
</valueConstraints>
</configurationParameterMetaData>
<configurationParameterMetaData>
<name>typeParam</name>
<uiType>type</uiType>
<uiConfiguration>
</uiConfiguration>
</configurationParameterMetaData>
<configurationParameterMetaData>
<name>documentParam</name>
<uiType>document</uiType>
<uiConfiguration>
<selectFile>true</selectFile>
<selectFolder>false</selectFolder>
<selectFilesRecursively>false</selectFilesRecursively>
<hideFiles>false</hideFiles>
<windowCaption>Save file as...</windowCaption>
</uiConfiguration>
</configurationParameterMetaData>
<configurationParameterMetaData>
<name>textAreaParam</name>
<uiType>text</uiType>
<uiConfiguration>
<characterWidth>30</characterWidth>
<visibleLines>5</visibleLines>
</uiConfiguration>
</configurationParameterMetaData>
</configurationParametersMetaData>
</argoDescriptor>
UIMA Analysis Engine XML Descriptor referenced by the Argo XML Descriptor
<?xml version="1.0" encoding="UTF-8"?>
<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
<frameworkImplementation>org.apache.uima.java</frameworkImplementation>
<primitive>true</primitive>
<annotatorImplementationName>xyz.company.uima.UimaComponent</annotatorImplementationName>
<analysisEngineMetaData>
<name>UIMAComponent</name>
<description/>
<version>1.0</version>
<vendor/>
<configurationParameters>
<configurationParameter>
<name>timeParam</name>
<type>String</type>
<multiValued>false</multiValued>
<mandatory>false</mandatory>
</configurationParameter>
<configurationParameter>
<name>dateParam</name>
<type>String</type>
<multiValued>false</multiValued>
<mandatory>false</mandatory>
</configurationParameter>
<configurationParameter>
<name>dateTimeParam</name>
<type>String</type>
<multiValued>false</multiValued>
<mandatory>false</mandatory>
</configurationParameter>
<configurationParameter>
<name>enumParam</name>
<type>String</type>
<multiValued>false</multiValued>
<mandatory>false</mandatory>
</configurationParameter>
<configurationParameter>
<name>passwordParam</name>
<type>String</type>
<multiValued>false</multiValued>
<mandatory>false</mandatory>
</configurationParameter>
<configurationParameter>
<name>typeParam</name>
<type>String</type>
<multiValued>false</multiValued>
<mandatory>false</mandatory>
</configurationParameter>
<configurationParameter>
<name>documentParam</name>
<type>String</type>
<multiValued>false</multiValued>
<mandatory>false</mandatory>
</configurationParameter>
<configurationParameter>
<name>textAreaParam</name>
<type>String</type>
<multiValued>false</multiValued>
<mandatory>false</mandatory>
</configurationParameter>
</configurationParameters>
<configurationParameterSettings/>
<typeSystemDescription/>
<typePriorities/>
<fsIndexCollection/>
<capabilities>
<capability>
<inputs/>
<outputs/>
<languagesSupported/>
</capability>
</capabilities>
<operationalProperties>
<modifiesCas>true</modifiesCas>
<multipleDeploymentAllowed>true</multipleDeploymentAllowed>
<outputsNewCASes>false</outputsNewCASes>
</operationalProperties>
</analysisEngineMetaData>
<resourceManagerConfiguration/>
</analysisEngineDescription>
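Since every name in the Argo XML descriptor has to match a configuration parameter in the UIMA XML descriptor, the two files can be cross-checked. The following Python sketch shows one way of doing so with the standard library; the function name unmatched_parameters and the typoParam entry are hypothetical, and the two descriptors are shortened versions of the examples above.

```python
import xml.etree.ElementTree as ET

UIMA_NS = {"u": "http://uima.apache.org/resourceSpecifier"}


def unmatched_parameters(argo_xml, uima_xml):
    """Return the Argo parameter names that have no counterpart in the UIMA descriptor."""
    argo_root = ET.fromstring(argo_xml)
    argo_names = {
        (meta.findtext("name") or "").strip()
        for meta in argo_root.iter("configurationParameterMetaData")
    }
    uima_root = ET.fromstring(uima_xml)
    uima_names = {
        (element.text or "").strip()
        for element in uima_root.findall(".//u:configurationParameter/u:name", UIMA_NS)
    }
    return sorted(argo_names - uima_names)


ARGO_XML = """<argoDescriptor>
  <configurationParametersMetaData>
    <configurationParameterMetaData><name>timeParam</name><uiType>time</uiType></configurationParameterMetaData>
    <configurationParameterMetaData><name>typoParam</name><uiType>text</uiType></configurationParameterMetaData>
  </configurationParametersMetaData>
</argoDescriptor>"""

UIMA_XML = """<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <analysisEngineMetaData>
    <configurationParameters>
      <configurationParameter><name>timeParam</name><type>String</type></configurationParameter>
    </configurationParameters>
  </analysisEngineMetaData>
</analysisEngineDescription>"""

print(unmatched_parameters(ARGO_XML, UIMA_XML))
```

Note that the UIMA descriptor declares a default XML namespace while the Argo descriptor, as shown in this guide, does not, which is why only the UIMA lookup uses a namespace prefix.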
Recommended ancillary knowledge resources
In order to further encourage interoperability, OpenMinTeD makes specific recommendations about particular knowledge resources that TDM tools and services should use. These recommendations are in the areas of linguistics and of the initial domains of use targeted by OpenMinTeD. The current recommendations should not be seen as a final and static set. They will evolve with experience, and as OpenMinTeD is used for TDM of new domains. Users are therefore encouraged to use the existing recommendations, but to make use of others where these are not suitable.
TDM tools and services should use resources from the following initial list where possible. Where this is not possible, knowledge resource authors are encouraged to provide linkages between their own resource and those given here, or to any other widely used or standard Linked Data knowledge resource. This list of recommended resources should be seen as a first version, and will be extended.
Social sciences resources
- TheSoz
Agriculture and agronomy resources
- Agrovoc
- Ontologies from AgroPortal
Life sciences resources
- OboInOwl
- MeSH (available in LOD)
- BioC
- NeuroLex
- BioLexicon
Linguistic resources
- LAPPS (vocabulary of core linguistic objects)
- Universal Dependencies (part of speech tags, features for morphology and syntactic dependencies)
- OLIA (reference model and annotation models for morphology, morphosyntax, dependencies)
- Penn Treebank (part of speech tags and features of morphology)
- ISOcat/CCR [1] (linguistic and metadata terminology)
- GOLD (linguistic ontology)
Type systems
- type systems used by the software components integrated in the OpenMinTeD platform (GATE, DKPRO, ALVIS, ARGO and ILSP)
[1] ISOcat has recently moved to the Clarin Concept Registry (CCR) and is currently under curation.
Recommended schema for software resources
This section includes a synopsis of the recommended schema for software resources, i.e. the subset of M(andatory) and strongly R(ecommended) metadata elements, only as regards elements related to the resource itself. Additional elements required for the management of the metadata record (e.g. metadataCreationDate, metadataCreator etc.) are not presented here, as they are to be handled by the OMTD platform.
OMTD-SHARE element | Usage
resourceType | M
resourceName | M
description | M
identifier | M
version | M
componentDistributionMedium | M
componentType | M
licence or rightsStmtName & rightsStmtURL (one of the two must be provided) | M
version of licence | M
contactEmail or landingPage (one of the two must be provided) | M
contactPerson (identifier or personName) | R
contactGroup (identifier or organizationName) | R
mailingList (mailingListName, subscribe, unsubscribe, post, archive, otherArchive) | R
issueTracker | R
onlineHelpURL | R
mustBeCitedWith | R
downloadURL or accessURL (one of the two should be provided) | M when applicable
resourceCreator (person or organization, described with identifier or name) | R
mediaType inside inputContentResourceInfo or outputResourceInfo (i.e. mediaType of input and output resource) | M when applicable
resourceType inside inputContentResourceInfo or outputResourceInfo | R when applicable
language inside inputContentResourceInfo or outputResourceInfo | R when applicable
characterEncoding inside inputContentResourceInfo or outputResourceInfo | R when applicable
mimeType inside inputContentResourceInfo or outputResourceInfo | R when applicable
dataFormatSpecific inside inputContentResourceInfo or outputResourceInfo | R when applicable
typesystem inside inputContentResourceInfo or outputResourceInfo | R when applicable
tagset inside inputContentResourceInfo or outputResourceInfo | R when applicable
annotationLevel inside inputContentResourceInfo or outputResourceInfo | R when applicable
typesystem | R
tagset | R
annotationResource | R
framework | R
for parameters: parameterName, description, parameterType, mandatory, multiValue | M when applicable
relationType = isCompatibleWith (external relation; link to models, annotation resources etc. that can be used with the component) | R
Recommendedschemaforsoftwareresources
228
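The synopsis above can be read as a checklist. The following is a minimal sketch, in Python, of how a provider might check a record against part of it before submission. It assumes the record has already been flattened into a plain dictionary keyed by element name; that flat layout and the function name missing_elements are illustrative assumptions, not part of OMTD-SHARE, and only the unconditional "M" rows and the two "one of the two" rows are covered.

```python
from typing import Dict, List

# Elements marked "M" without conditions in the synopsis table.
ALWAYS_MANDATORY = [
    "resourceType",
    "resourceName",
    "description",
    "identifier",
    "version",
    "componentDistributionMedium",
    "componentType",
]


def missing_elements(record: Dict[str, str]) -> List[str]:
    """Return the mandatory elements (or alternatives) absent from a flat record.

    The flat dictionary is a hypothetical simplification of a real metadata record.
    """

    def present(key: str) -> bool:
        # An element counts as present only if it has a non-blank value.
        return bool(str(record.get(key, "")).strip())

    problems = [key for key in ALWAYS_MANDATORY if not present(key)]

    # Either a licence, or a rights statement given by both name and URL.
    has_rights_stmt = present("rightsStmtName") and present("rightsStmtURL")
    if not (present("licence") or has_rights_stmt):
        problems.append("licence or rightsStmtName & rightsStmtURL")

    # Either a general contact email or a landing page.
    if not (present("contactEmail") or present("landingPage")):
        problems.append("contactEmail or landingPage")

    return problems
```

An empty list means the checked subset is complete; the conditional rows ("when applicable") still need a separate, case-by-case check.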
resourceType
Usage
Mandatory
Type
Closed controlled vocabulary
Attributes
Controlled vocabulary reference and/or values
ms:resourceType: corpus, lexicalConceptualResource, languageDescription, model, component
Definition/Explanations
Specifies the type of the resource being described or the type of the resource that a tool or service takes as input or produces as output
Recommended usage
For components, the fixed value "component" must be added automatically
Relation to other metadata schemas
DCMI: skos:narrowMatch dct:type
DataCite 4.0: skos:closeMatch datacite:resourceTypeGeneral & datacite:resourceType
Recommended usage is to use one of the values "software", "service" or "workflow" for datacite:resourceTypeGeneral
resourceName
Usage
Mandatory
Type
Multilingual free text
Attributes
xs:lang
Definition/Explanations
The full name by which the resource is known
Recommended usage
Please provide a short but descriptive and unique name for the resource, e.g. “OpenNLP tagger” instead of just “tagger of English”. Provide the name in English; if you want to add names in other languages, you can use the “lang” attribute. N.B. This element is intended for a human-readable/human-understandable name for the resource.
Relation to other metadata schemas
Maven POM 4.0.0: name
GATE: name
UIMA/UIMA-fit: name
DCMI: skos:exactMatch dct:title
DataCite 4.0: skos:exactMatch datacite:title
description
Usage
Mandatory
Type
Multilingual free text
Attributes
xs:lang
Definition/Explanations
Provides the description of the resource in prose
Recommended usage
Give a brief yet informative description of the functionalities of the component, the language(s) it works on, input requirements etc. Please provide the text in English; if you want to add texts in other languages, you can add them using the “lang” attribute to specify the language.
Relation to other metadata schemas
Maven POM 4.0.0: description
GATE: comment
UIMA/UIMA-fit: description
DCMI: skos:exactMatch dct:description
DataCite 4.0: Description & descriptionType with value "abstract"
identifier
Usage
Mandatory
Type
free text
Attributes
ms:resourceIdentifierSchemeName or ms:schemeURI
Definition/Explanations
Reference to a PID, DOI or any kind of identifier used by the resource provider for the resource
Recommended usage
Provide a unique identifier already assigned by an authoritative source. You can either use the attribute "resourceIdentifierSchemeName" to specify the scheme, by selecting one of the pre-defined values (e.g. DOI, HDL, ISLRN etc.), or, if the scheme is not listed among them, select the "other" value and use the attribute "schemeURI" to provide a link to the URL that documents the scheme it adheres to. If the resource doesn't have a unique identifier, an identifier will be assigned by OpenMinTeD.
For components harvested from Maven, the Maven id can be used with a reference to the Maven scheme (https://maven.apache.org/pom.html#Maven_Coordinates). This is combined with the Java fully qualified class naming conventions to give the following coordinates: groupId:artifactId:version:(packaging):(classifier)#class
Relation to other metadata schemas
Maven POM 4.0.0: groupId & artifactId & version & packaging & classifier, with resourceIdentifierSchemeURI="https://maven.apache.org/pom.html#Maven_Coordinates"
GATE: class
UIMA/UIMA-fit: class
DCMI: skos:narrowMatch dct:identifier
DataCite 4.0: skos:broadMatch datacite:identifier (identifierType can only be DOI)
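As an illustration of the Maven-based identifier pattern groupId:artifactId:version:(packaging):(classifier)#class, the sketch below assembles such a string in Python. The function name and the example coordinates are hypothetical. The guidelines do not say how omitted optional segments are written, so this sketch assumes they are simply left out, and that a classifier is only given together with a packaging so that the two cannot be confused.

```python
from typing import Optional


def maven_component_id(
    group_id: str,
    artifact_id: str,
    version: str,
    class_name: str,
    packaging: Optional[str] = None,
    classifier: Optional[str] = None,
) -> str:
    """Build an identifier of the form groupId:artifactId:version:(packaging):(classifier)#class.

    Assumption: optional segments are dropped when absent.
    """
    if classifier and not packaging:
        # Without a packaging segment the classifier's position would be ambiguous.
        raise ValueError("a classifier requires a packaging value in this sketch")

    segments = [group_id, artifact_id, version]
    if packaging:
        segments.append(packaging)
    if classifier:
        segments.append(classifier)

    # The fully qualified Java class name follows the '#' separator.
    return ":".join(segments) + "#" + class_name


# Hypothetical coordinates, for illustration only.
example_id = maven_component_id(
    "org.example", "example-tagger", "1.0.0", "org.example.tagger.ExampleTagger"
)
```

The resulting string would go in the "identifier" element, with the Maven coordinates URL given as the scheme reference.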
version
Usage
Recommended
Definition/Explanations
Any string, usually a number, that identifies the version of a resource
Recommended usage
For components, the recommended practice is to follow semantic versioning (http://semver.org/). N.B. "version" should not be confused with the relation that links together a specific resource with its various enriched or modified versions (e.g. annotated version, subset etc.).
Relation to other metadata schemas
Maven POM 4.0.0: version
DCMI: skos:exactMatch dct:hasVersion
DataCite 4.0: skos:exactMatch datacite:Version
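To make the semantic versioning recommendation concrete, the sketch below checks whether a version string has the MAJOR.MINOR.PATCH shape with optional pre-release and build metadata parts. The regular expression is a simplified approximation of the grammar published at semver.org (it does not, for example, reject leading zeros in numeric pre-release identifiers), and the function name is an illustrative choice.

```python
import re

# Simplified semantic-versioning pattern: MAJOR.MINOR.PATCH[-prerelease][+build].
SEMVER_RE = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"      # numeric core, no leading zeros
    r"(?:-[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?"       # optional pre-release identifiers
    r"(?:\+[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?"      # optional build metadata
)


def is_semver(version: str) -> bool:
    """Return True if the whole string matches the simplified pattern."""
    return SEMVER_RE.fullmatch(version) is not None
```

Under this sketch, strings such as "1.2" or "v1.0.0" are rejected, while "1.0.0" and "2.1.3-beta.1" are accepted.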
componentType
Usage
Mandatory
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms-omtd:componentType: access, reader, writer, supportComponent, visualizer, debugger, validator, viewer, corpusViewer, lexiconViewer, editor, mlTrainer, mlPredictor, featureExtractor, dataSplitter, dataMerger, converter, evaluator, flowController, scriptBasedAnalyzer, matcher, gazetteerBasedComponent, crowdSourcingComponent, dataCollector, crawler, processingComponent, annotator, segmenter, stemmer, lemmatizer, tagger, chunker, parser, coreferenceAnnotator, namedEntityRecognizer, semanticsAnnotator, srlAnnotator, readabilityAnnotator, aligner, generator, summarizer, simplifier, naturalLanguageGenerator, prePostProcessor, spellingChecker, grammarChecker, normalizer, filters, extractor, topicExtractor, documentClassifier, languageIdentifier, sentimentAnalyzer, keywordsExtractor, terminologyExtractor, contradictionDetector, emotionRecognizer, eventExtractor, persuasiveExpressionMiner, informationExtractor, lexiconExtractorFromCorpora, lexiconExtractorFromLexica, wordSenseDisambiguator, qualitativeAnalyser
Definition/Explanations
Specifies the type of the component in terms of the function/task it performs
Recommended usage
Please select one of the predefined values. Note that the values are hierarchically organised, so it is recommended to select the most specific applicable value (e.g. "visualizer" rather than the broader "supportComponent"). The current list of values is intended for use mainly by simple components rather than workflows or full applications. The list will be further enriched with values that also target end-users.
Relation to other metadata schemas
DCMI: skos:narrowMatch dct:type
licence
Usage
Mandatory upon conditions
Conditions for usage
either licence or rightsStmt must be filled in
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms:licence: CC-BY, CC-BY-NC, CC-BY-NC-ND, CC-BY-NC-SA, CC-BY-ND, CC-BY-SA, CC-ZERO, PDDL, ODC-BY, ODbL, MS-NoReD, MS-NoReD-FF, MS-NoReD-ND, MS-NoReD-ND-FF, MS-NC-NoReD, MS-NC-NoReD-FF, MS-NC-NoReD-ND, MS-NC-NoReD-ND-FF, ELRA_END_USER, ELRA_EVALUATION, ELRA_VAR, CLARIN_PUB, CLARIN_ACA, CLARIN_ACA-NC, CLARIN_RES, AGPL, ApacheLicence_2.0, BSD_4-clause, BSD_3-clause, FreeBSD, GFDL, GPL, LGPL, MIT, Princeton_Wordnet, proprietary, underNegotiation, nonStandardLicenceTerms
Definition/Explanations
The licence of use for the resource
Recommended usage
You can provide information on the rights of accessing and using a resource in one of the following ways, in order of preference:
use the element "licence" and select one of the recommended licences; please note that the list contains licences intended for data resources and components mixed together; for components the recommended licences are the Open Source licences; for data resources, please use a standard licence such as one of the CC family;
if the licence you use is not included in the list, you can use the "nonStandardLicenceTerms" or the "proprietary" values and give further information on your licence in the elements "nonStandardLicenceName", "nonStandardLicenceTermsURL" and "nonStandardLicenceTermsText";
you can also use the "rightsStatementName" and the "rightsStatementURL" (with a link to a URL with more explanations on its usage) if the resource is provided with a general statement of use and not an official licence document; please note that this is an option used mainly to facilitate end-users in accessing your resource, while you are strongly advised to properly license your resource.
Relation to other metadata schemas
Maven POM 4.0.0: license/name
DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rights
rightsStmtName
Usage
Mandatory upon conditions
Conditions for usage
either licence or rightsStmt must be filled in
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
openAccess, closedAccess, embargoedAccess, restrictedAccess
Definition/Explanations
The name of an official statement indicative of licensing terms for the use of a resource (e.g. open access, free to read etc.); its semantics should be clear, preferably formally expressed and stored at a URL.
The current list of predefined values comes from OpenAIRE, but it is under revision.
Recommended usage
The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value in order to help users understand the licensing terms of a resource.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rights
rightsStmtURL
Usage
Mandatory upon conditions
Conditions for usage
either licence or rightsStmt must be filled in
Type
URL pattern
Definition/Explanations
Link to the URL with the text that formally explains the licensing conditions imposed by the rights statement.
Recommended usage
The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value in order to help users understand the licensing terms of a resource.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rightsURI
nonStandardLicenceTermsURL
Usage
Mandatory upon conditions
Conditions for usage
when one of the values "nonStandardLicenceTerms" or "proprietary" is selected for "licence"
Type
URL pattern
Definition/Explanations
Used to provide a hyperlink to a URL containing the text of a licence not included in the predefined list, or describing the terms of use for a language resource or the terms of service for web services
Recommended usage
Please provide the link to the full text document of the licence. Note that this is the preferred option over inserting the licence text in the element "nonStandardLicenceTermsText", as it provides a permanent location for the licence that is accessible to all.
Relation to other metadata schemas
Maven POM 4.0.0: license/url
DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rightsURI
version of licence
Usage
Mandatory
Type
free text
Definition/Explanations
The version of the licence
Recommended usage
You are advised to indicate the version of the licence of your resource; the latest version is the preferred option, e.g. "4.0" for all CC licences and "2.0" for the META-SHARE-NoReD ones.
Relation to other metadata schemas
DCMI: skos:closeMatch dct:hasVersion (for dct:licenseDocument)
componentDistributionMedium
Usage
Mandatory
Type
closed controlled vocabulary
Controlled vocabulary reference and/or values
ms-omtd:componentDistributionMedium: webService, sourceCode, executableCode, sourceAndExecutableCode
Definition/Explanations
The medium/form of the distribution (e.g. downloadable resource, accessible through interface, source code etc.)
accessURL
Usage
Recommended under conditions
Type
URL pattern
Definition/Explanations
A landing page, feed, SPARQL endpoint etc. that gives access to the resource or where the web service/workflow is executed
Recommended usage
Please use for components that are executable as web services
downloadURL
Usage
Recommended under conditions
Type
URL pattern
Definition/Explanations
Any URL where the resource can be downloaded from
Recommended usage
Please use if the component is distributed as source and/or executable code, and has to be downloaded in order to be executed; this element is of particular importance if you have not uploaded the resource in the repository
Relation to other metadata schemas
Maven POM 4.0.0: can be done through ID
contactEmail
Usage
Mandatory under conditions
Conditions for usage
An email or a landingPage must be provided
Type
email pattern
Definition/Explanations
A general email address that can be used as contact point for a resource ([email protected])
Recommended usage
You can indicate a contact point where users can solicit further information in one of the following ways, in order of preference:
give a general email address in the "contactEmail" element, or
provide in "landingPage" the link to a web page that documents the resource (e.g. a page with documentation, examples and links to the resource itself).
You can also indicate the person(s) or group(s) responsible for communication in the "contactPerson" and "contactGroup" elements.
landingPage
Usage
Mandatory under conditions
Conditions for usage
An email or a landingPage must be provided
Type
URL pattern
Definition/Explanations
A URL used as the landing page of a resource providing general information; for instance, it may present a description of the resource and its creators, and possibly include links to the URL where it can be accessed from
Recommended usage
You can indicate a contact point where users can solicit further information in one of the following ways, in order of preference:
give a general email address in the "contactEmail" element, or
provide in "landingPage" the link to a web page that documents the resource (e.g. a page with documentation, examples and links to the resource itself).
You can also indicate the person(s) or group(s) responsible for communication in the "contactPerson" and "contactGroup" elements.
Relation to other metadata schemas
Maven POM 4.0.0: url
contactPerson (identifier or personName)
Usage
Recommended
Type
identifier or multilingual free text
Attributes
ms:personIdentifierSchemeName (for identifiers) or xs:lang (for name)
Definition/Explanations
Groups information on the person(s) that is/are responsible for providing further information regarding the resource
Recommended usage
The recommended way of referring to a person is by giving their identifier, preferably the ORCID; if you provide the identifier, please also select the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide the name, preferably in the format "Surname, Firstname", at least in English; if you want to add names in other languages, you can use the “lang” attribute. The element can also be repeated to encode multiple persons. If you decide to add a contactPerson instead of a general contactEmail, please ensure that the data (including the email) of this person are also uploaded in OpenMinTeD.
Relation to other metadata schemas
DataCite 4.0: contributor with datacite:contributorType="ContactPerson" (datacite:contributorName (familyName & givenName) or datacite:nameIdentifier and datacite:nameIdentifierScheme and datacite:schemeURI)
contactGroup (identifier or organizationName)
Usage
Recommended
Type
identifier or multilingual free text
Attributes
ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for name)
Definition/Explanations
Groups information on the group(s) that is/are responsible for providing further information regarding the resource
Recommended usage
The recommended way of referring to a group (currently modelled as an organization) is by giving its identifier (e.g. ISNI, fundref); if you provide the identifier, please also select the relevant value from the list of values in the attribute "organizationIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the group (organization), you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. If you decide to add a contactGroup instead of another contact option, please ensure that the data (including the communication data) of this group (organization) are also uploaded in OpenMinTeD.
Relation to other metadata schemas
Maven POM 4.0.0: developers
mailingListInfo
Usage
Recommended
Type
set of metadata elements
Definition/Explanations
Set of metadata elements (name, subscribe, unsubscribe, post, archive, otherArchive) required for documenting a mailing list
Recommended usage
Mailing lists are important for tracking information useful for developers and/or users; the whole set of elements in the mailingList group can be repeated for recording multiple mailing lists.
Relation to other metadata schemas
Maven POM 4.0.0: Mailing list
onlineHelpURL
Usage
Recommended
Type
URL pattern
Definition/Explanations
A URL intended for end-users providing useful information regarding the component usage/application, e.g. execution tips, FAQs, help forums etc.
Relation to other metadata schemas
GATE: helpurl
issueTracker
Usage
Recommended
Type
URL pattern
Definition/Explanations
The URL where issues, bugs, and feature requests should be submitted; this information is important for s/w developers
Relation to other metadata schemas
Maven POM 4.0.0: issueManagement/url
mustBeCitedWith
Usage
Recommended
Type
free text or identifier
Definition/Explanations
Publication to be used for citation purposes as requested by resource providers (usually a scientific article that describes the resource)
Recommended usage
The preferred option for referring to a publication is by providing its unique identifier already assigned by an authoritative source; the preferred identifier for publications is DOI. You can either use the attribute "publicationIdentifierSchemeName" to specify the scheme, by selecting one of the pre-defined values (e.g. DOI, ISBN etc.), or, if the scheme is not listed among them, use the "other" value and the attribute "schemeURI" to provide a link to the URL that documents the scheme it adheres to. If you don't know the publication identifier, you can provide the full bibliographic record as free text. N.B. The citation publication should not be confused with the attribution data, which is a legal obligation; citation through publications is a common practice in research.
resourceCreator (person or organization, described with identifier or name)
Usage
Recommended
Type
identifier or multilingual free text
Attributes
for person: ms:personIdentifierSchemeName (for identifiers) or xs:lang (for name); for organization: ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for name)
Definition/Explanations
Groups information on the person(s) or organization(s) that has/have created the resource
Recommended usage
The recommended way of referring to a person is by giving their identifier, preferably the ORCID; if you provide the identifier, please also select the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide the name, preferably in the format "Surname, Firstname", at least in English; if you want to add names in other languages, you can use the “lang” attribute.
The recommended way of referring to an organization is by giving its identifier (e.g. ISNI); if you provide the identifier, please also select the relevant value from the list of values in the attribute "organizationIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the organization, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute.
The element can also be repeated to encode multiple persons/organizations. For corpora created through the OMTD corpus building process, the resource creator is considered to be the person that has put together the corpus through the user query.
Relation to other metadata schemas
Maven POM 4.0.0: developers
DCMI: skos:closeMatch dct:creator
DataCite 4.0: creator with creatorName or nameIdentifier & nameIdentifierScheme & schemeURI; N.B. creatorName familyName & givenName in v4
mediaType inside inputContentResourceInfo or outputResourceInfo
Usage
Mandatory when applicable
Conditions for usage
if the input content resource (i.e. the resource to be mined) or the output resource (the results of the processing) is to be described, this element is obligatory
Type
Closed controlled vocabulary
Controlled vocabulary reference and/or values
ms:mediaType: text, audio, video, image
Definition/Explanations
Specifies the media type of the resource that the component processes and/or produces.
Recommended usage
OpenMinTeD only handles text resources, so only the value "text" must be allowed.
resourceType inside inputContentResourceInfo or outputResourceInfo
Usage
Mandatory when applicable
Conditions for usage
if the input content resource (i.e. the resource to be mined) or the output resource (the results of the processing) is to be described, this element is obligatory
Type
controlled vocabulary
Controlled vocabulary reference and/or values
ms:resourceType: corpus, document, userInputText, lexicalConceptualResource, languageDescription
Definition/Explanations
The type of the resource that the component takes as input or produces as output
Recommended usage
Please use especially for readers and writers in order to specify the resource type they can process or produce; e.g. for readers, whether they take as input a document (single file) or a collection of files (corpus).
Relation to other metadata schemas
GATE: parameters
UIMA/UIMA-fit: Parameters input/output types
language inside inputContentResourceInfo or outputResourceInfo
Usage
Mandatory when applicable
Conditions for usage
if the input content resource (i.e. the resource to be mined) or the output resource (the results of the processing) is to be described, this element is obligatory
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms:language (a combination of languageId, scriptId, regionId and variantId according to the IETF BCP 47 guidelines)
Definition/Explanations
The language(s) of the text that the component supports (takes as input and/or produces), expressed according to the IETF BCP 47 guidelines. The element can be repeated to encode multiple languages.
Recommended usage
Please enter the language and, if needed, the region, script and variant identifier that best fits the language of the document (e.g. en-US) that the component supports (takes as input and/or produces), expressed according to the IETF BCP 47 guidelines. The element can be repeated for components that support multiple languages.
Relation to other metadata schemas
UIMA/UIMA-fit: @LanguageCapability
DataCite 4.0: language (but this is the language of the resource and not of the input/output)
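For illustration, a rough shape check for language values of the kind described above (language, then optional script, region and variant subtags, as in en-US) might look as follows in Python. This is only a sketch under simplifying assumptions: the pattern covers the common language-script-region-variant layout, ignores extensions, private-use and grandfathered tags, and does not verify that the subtags are actually registered. The function name is hypothetical.

```python
import re

# Simplified BCP 47 shape: language[-script][-region][-variant...]
LANGUAGE_TAG_RE = re.compile(
    r"[A-Za-z]{2,3}"                                 # primary language subtag
    r"(?:-[A-Za-z]{4})?"                             # optional script, e.g. Hant
    r"(?:-(?:[A-Za-z]{2}|\d{3}))?"                   # optional region, e.g. US or 419
    r"(?:-(?:[A-Za-z0-9]{5,8}|\d[A-Za-z0-9]{3}))*"   # optional variant subtags
)


def looks_like_language_tag(tag: str) -> bool:
    """Return True if the tag has the simplified language-script-region-variant shape."""
    return LANGUAGE_TAG_RE.fullmatch(tag) is not None
```

A check like this catches common slips such as "en_US" or a spelled-out language name, but a full validator would also consult the IANA language subtag registry.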
characterEncoding inside inputContentResourceInfo or outputResourceInfo
Usage
Mandatory when applicable
Conditions for usage
if the input content resource (i.e. the resource to be mined) or the output resource (the results of the processing) is to be described, this element is obligatory
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms:characterEncoding: a long list of popular character encodings
Definition/Explanations
The name of the character encoding used in the resource or supported by the component
Recommended usage
Please select one of the pre-defined values; note, however, that for OpenMinTeD the preferred character encoding is UTF-8, to ensure interoperability between content and components. The element can be repeated for components that support various character encodings.
Relation to other metadata schemas
GATE: Parameters/encoding
mimeType inside inputContentResourceInfo or outputResourceInfo
Usage
Mandatory when applicable
Conditions for usage
if the input content resource (i.e. the resource to be mined) or the output resource (the results of the processing) is to be described, this element is obligatory
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms:mimetype (a subset of values (the most popular ones for text files) from the IANA mimetype controlled vocabulary): text/plain, application/vnd.xmi+xml, text/xml, application/x-tmx+xml, application/x-xces+xml, application/tei+xml, application/rdf+xml, application/xhtml+xml, application/emma+xml, application/pls+xml, application/postscript, application/voicexml+xml, text/sgml, text/html, application/x-tex, application/rtf, application/json+ld, application/x-latex, text/csv, text/tab-separated-values, application/pdf, application/x-msaccess, audio/mp4, audio/mpeg, audio/wav, image/bmp, image/gif, image/jpeg, image/png, image/svg+xml, image/tiff, video/jpeg, video/mp4, video/mpeg, video/x-flv, video/x-msvideo, video/x-ms-wmv, application/msword, application/vnd.ms-excel, audio/mpeg3, text/turtle, other, audio/PCMA, audio/flac, audio/speex, audio/vorbis, video/mp2t
Definition/Explanations
The mime-type of the resource (a formalized specifier for the format) or a mime-type that the component supports, in conformance with the values of the IANA (Internet Assigned Numbers Authority)
Recommended usage
Please select one of the pre-defined values (which are the most popular ones for text files) or add a value, preferably from the IANA media (MIME) type recommended values (http://www.iana.org/assignments/media-types/media-types.xhtml). The element can be repeated for components that support multiple mime types.
Relation to other metadata schemas
UIMA/UIMA-fit: @MimeTypeCapability
dataFormatSpecific inside inputContentResourceInfo or outputResourceInfo
Usage
Mandatory when applicable
Conditions for usage
if the input content resource (i.e. the resource to be mined) or the output resource (the results of the processing) is to be described, this element is obligatory
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
aclAnthologyaimedCorpusalvisEnrichedDocumentbioNLPbioNLP;format-variant=ST2013a1_a2bnccadixeJSONconll2000conll2002conll2006conll2007conll2009conll2012dataSiftfactoredTagLemgategeniagrafhtml5Microdatai2b2imsCwbjdbckeaCorpuslllnegraExportpmlptb;format-variant=chunkedptb;format-variant=combinedrelptigertupp-dztwitteruimaBinaryCasuimaCASDumpweb1txces;format-variant=ilsp:
Definition/Explanations
The supplementary level of data format
Recommended usage
Please use to further specify the format of the resource supported by the component (as input or output). For interoperability reasons, it is important to standardise this element as far as possible; this is why a list of values including the formats currently supported by components in the OMTD registry is provided. Where possible, it is also recommended to use the "documentationURL" element with information and examples about the specific data format.
Relation to other metadata schemas
UIMA/UIMA-fit: @MimeTypeCapability
typesystem inside inputContentResourceInfo or outputResourceInfo
Usage
Mandatory when applicable
Conditions for usage
when the s/w component takes as input (or provides as output) a resource that uses a specific typesystem
Type
identifier or multilingual free text
Attributes
ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)
Definition/Explanations
A name or an identifier (e.g. URL reference) to the typesystem used in the annotation of the resource or used by the component
Recommended usage
The recommended way of referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. For interoperability reasons, it is recommended to describe typesystems, tagsets, annotation resources etc. in the OpenMinTeD registry and refer to them through the identifier.
tagset inside inputContentResourceInfo or outputResourceInfo
Usage
Mandatory when applicable
Conditions for usage
if the input content resource (i.e. the resource to be mined) or the output resource (the results of the processing) is to be described, this element is obligatory
Type
identifier or multilingual free text
Attributes
ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)
Definition/Explanations
A name or an identifier (e.g. URL reference) to the tagset used in the annotation of the resource or used by the component
Recommended usage
annotationLevel inside inputContentResourceInfo or outputResourceInfo
Usage
Mandatory when applicable
Conditions for usage
if the input content resource (i.e. the resource to be mined) or the output resource (the results of the processing) is to be described, this element is obligatory
Type
open controlled vocabulary
Controlled vocabulary reference and/or values
ms:annotationLevel: alignment, discourseAnnotation, discourseAnnotation-argumentation, discourseAnnotation-audienceReactions, discourseAnnotation-coreference, discourseAnnotation-dialogueActs, discourseAnnotation-discourseRelations, lemmatization, morphosyntacticAnnotation-bPosTagging, morphosyntacticAnnotation-posTagging, segmentation, semanticAnnotation, semanticAnnotation-certaintyLevel, semanticAnnotation-emotions, semanticAnnotation-events, semanticAnnotation-namedEntities, semanticAnnotation-polarity, semanticAnnotation-questionTopicalTarget, semanticAnnotation-readabilty, semanticAnnotation-semanticClasses, semanticAnnotation-semanticRelations, semanticAnnotation-semanticRoles, semanticAnnotation-speechActs, semanticAnnotation-subjectivity, semanticAnnotation-temporalExpressions, semanticAnnotation-textualEntailment, semanticAnnotation-wordSenses, syntacticAnnotation-semanticFrames, speechAnnotation, speechAnnotation-orthographicTranscription, speechAnnotation-paralanguageAnnotation, speechAnnotation-phoneticTranscription, speechAnnotation-prosodicAnnotation, speechAnnotation-soundEvents, speechAnnotation-soundToTextAlignment, speechAnnotation-speakerIdentification, speechAnnotation-speakerTurns, stemming, structuralAnnotation, structuralAnnotation-documentDivisions, structuralAnnotation-sentences, structuralAnnotation-clauses, structuralAnnotation-phrases, structuralAnnotation-words, syntacticAnnotation-subcategorizationFrames, syntacticAnnotation-dependencyTrees, syntacticAnnotation-constituencyTrees, syntacticAnnotation-chunks, syntacticosemanticAnnotation-links, translation, transliteration, modalityAnnotation-bodyMovements, modalityAnnotation-facialExpressions, modalityAnnotation-gazeEyeMovements, modalityAnnotation-handArmGestures, modalityAnnotation-handManipulationOfObjects, modalityAnnotation-headMovements, modalityAnnotation-lipMovements, other
Definition/Explanations
The annotation level of the annotated resource or what a s/w component consumes or produces as output
Relation to other metadata schemas
UIMA/UIMA-fit: @TypeCapability
typesystem inside componentDependencies
Usage
Mandatory when applicable
Conditions for usage
when the s/w component uses a specific typesystem for its operation
Type
identifier or multilingual free text
Attributes
ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)
Definition/Explanations
A name or an identifier (e.g. URL reference) to the typesystem used in the annotation of the resource or used by the component
Recommended usage
The recommended way of referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. For interoperability reasons, it is recommended to describe typesystems, tagsets, annotation resources etc. in the OpenMinTeD registry and refer to them through the identifier.
tagsetinsidecomponentDependencies
Usage
Mandatorywhenapplicable
Conditionsforusage
whenthes/wcomponentusesaspecifictagsetforitsoperation
Type
identifierormultilingualfreetext
Attributes
ms:resourceIdentifierSchemeNameorms:schemeURI(foridentifiers)andxs:lang(fornames)
Definition/Explanations
Anameoranidentifier(e.g.urlreference)tothetagsetusedintheannotationoftheresourceorusedbythecomponent
Recommendedusage
Therecommendedwayforreferringtoaresourceisbygivingitsidentifier;ifyouprovidetheidentifier,pleaseselectalsotherelevantvaluefromthelistofvaluesintheattribute"resourceIdentifierSchemeName";ifnoneisappropriate,pleaseselect"other"andusethe"schemeURI"attributetoprovidealinktoaURLwithmoreinformationabouttheidentifierscheme.Ifyoudon'tknowtheidentifieroftheresource,youmayprovidethenameatleastinEnglish;ifyouwanttoaddnamesinotherlanguages,youcanusethe“lang”attribute.Forinteroperabilityreasons,itisrecommendedtodescribetypesystems,tagsets,annotationresourcesetc.intheOpenMinTeDregistryandrefertothemthroughtheidentifier.
annotationResource inside componentDependencies
Usage
Mandatory when applicable
Conditions for usage
when the s/w component uses a specific annotation resource for its operation
Type
identifier or multilingual free text
Attributes
ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)
Definition/Explanations
A resource (e.g. ontology, terminological resource) used for annotating a document, corpus, sentence etc.
Recommended usage
The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please select also the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. For interoperability reasons, it is recommended to describe type systems, tagsets, annotation resources etc. in the OpenMinTeD registry and refer to them through the identifier.
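The two ways of referring to a dependency described above (by identifier, or by name as a fallback) can be sketched in Python as follows. This is only an illustration: the attribute names "resourceIdentifierSchemeName", "schemeURI" and "lang" come from the guidelines, but the function names, the sample identifier, the example.org URL and the flat, namespace-free XML layout are assumptions and do not reproduce the actual OMTD-SHARE XSD.

```python
import xml.etree.ElementTree as ET


def reference_by_identifier(element_name, identifier, scheme_name, scheme_uri=None):
    """Build a reference that points to a resource through its identifier."""
    attributes = {"resourceIdentifierSchemeName": scheme_name}
    if scheme_name == "other":
        # The guidelines ask for a schemeURI when no listed scheme is appropriate.
        if not scheme_uri:
            raise ValueError('scheme "other" requires a schemeURI')
        attributes["schemeURI"] = scheme_uri
    node = ET.Element(element_name, attributes)
    node.text = identifier
    return node


def reference_by_name(element_name, name, lang="en"):
    """Fallback: build a reference that gives the resource name in one language."""
    node = ET.Element(element_name, {"lang": lang})
    node.text = name
    return node


if __name__ == "__main__":
    # Hypothetical identifier and scheme URL, for illustration only.
    by_id = reference_by_identifier(
        "tagset",
        "example-tagset-id",
        "other",
        scheme_uri="http://example.org/identifier-scheme",
    )
    by_name = reference_by_name("annotationResource", "Example ontology")
    print(ET.tostring(by_id, encoding="unicode"))
    print(ET.tostring(by_name, encoding="unicode"))
```

The identifier variant is the one the guidelines recommend; the name variant is only meant for cases where the identifier is unknown.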
framework
Usage
Recommended
Controlled vocabulary reference and/or values
UIMA, GATE, AlvisNLP, other
Definition/Explanations
The framework used for developing and deploying the component
relationType
Usage
Recommended
Type
Open controlled vocabulary
Controlled vocabulary reference and/or values
ms:relationType: isPartOf, isPartWith, hasPart, hasOutcome, isCombinedWith, requiresLR, requiresSoftware, isexactMatch, isSimilarTo, isContinuationOf, isVersionOf, replaces, isReplacedWith, isCreatedBy, isElicitedBy, isRecordedBy, isEditedBy, isAnalysedBy, isEvaluatedBy, isQueriedBy, isAccessedBy, isArchivedBy, isDisplayedBy, isCompatibleWith
Definition/Explanations
Specifies the type of relation holding between two entities (e.g. two resources that together comprise one new resource, a corpus and the s/w component that has been used for its creation, or a corpus and the publication that describes it)
Recommended usage
For components, the recommended relation is isCompatibleWith, holding between the component and the models it can be used with, but any relationType can be used as appropriate.
Relation to other metadata schemas
DataCite 4.0: skos:closeMatch datacite:relationType
relatedResource1
Usage
Mandatory when applicable
Conditions for usage
when relationType is filled in
Type
ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)
Definition/Explanations
A name or an identifier (e.g. URL reference) to the source resource related to the target resource (relatedResource2) through a relation described in relationType
Recommended usage
The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please select also the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through the identifier.
relatedResource2
Usage
Mandatory when applicable
Conditions for usage
when relationType is filled in
Type
ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)
Definition/Explanations
A name or an identifier (e.g. URL reference) to the target resource related to the source resource (relatedResource1) through a relation described in relationType
Recommended usage
The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please select also the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through the identifier.
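The interplay between relationType, relatedResource1 and relatedResource2 can be sketched as a small consistency check. The sketch below is a Python illustration under stated assumptions: the Relation class, the check_relation function and the sample resource names are hypothetical, and the listed values are copied from the relationType entry of these guidelines. Because the vocabulary is described as open, an unlisted value is reported as a note rather than as an error.

```python
from dataclasses import dataclass
from typing import Optional

# Values copied from the relationType entry of the guidelines.
LISTED_RELATION_TYPES = {
    "isPartOf", "isPartWith", "hasPart", "hasOutcome", "isCombinedWith",
    "requiresLR", "requiresSoftware", "isexactMatch", "isSimilarTo",
    "isContinuationOf", "isVersionOf", "replaces", "isReplacedWith",
    "isCreatedBy", "isElicitedBy", "isRecordedBy", "isEditedBy",
    "isAnalysedBy", "isEvaluatedBy", "isQueriedBy", "isAccessedBy",
    "isArchivedBy", "isDisplayedBy", "isCompatibleWith",
}


@dataclass
class Relation:
    """Hypothetical container for one relation between two resources."""
    relation_type: Optional[str] = None
    related_resource_1: Optional[str] = None  # source resource
    related_resource_2: Optional[str] = None  # target resource


def check_relation(relation):
    """Return (errors, notes) for a relation, following the conditions for usage."""
    errors, notes = [], []
    if not relation.relation_type:
        # Both related resources are only required once relationType is filled in.
        return errors, notes
    if not relation.related_resource_1:
        errors.append("relatedResource1 is required when relationType is filled in")
    if not relation.related_resource_2:
        errors.append("relatedResource2 is required when relationType is filled in")
    if relation.relation_type not in LISTED_RELATION_TYPES:
        # Open controlled vocabulary: unlisted values are allowed, so only a note.
        notes.append(f"'{relation.relation_type}' is not among the listed values")
    return errors, notes


if __name__ == "__main__":
    # Hypothetical component and model names.
    rel = Relation("isCompatibleWith", "example-component", "example-model")
    print(check_relation(rel))
```

The example relation uses isCompatibleWith between a component and a model, which is the combination the guidelines recommend for components.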
The OMTD-SHARE metadata schema
The OMTD-SHARE metadata schema is the recommended schema for the description of the resources. It has been conceived and designed in order to serve as a facilitator, providing the interoperability bridge between the various resource types involved in TDM processes, and as an intermediary with the target audience, including TDM developers and end-users.
Its design takes into consideration the fact that both resources and users come from different scientific communities and tries to achieve interoperability through a common core vocabulary for the description of resources and their properties, establishing links to the vocabularies already used by the various sources for this purpose. Standards and best practices of the source communities are taken on board to the best extent possible. The main principles and strategies employed in the design of the OMTD-SHARE schema consist of the following:
cover needs of resource discoverability and TDM processing;
cover documentation needs of all resource types involved in TDM;
be flexible enough to support varying degrees of documentation completeness;
organize the schema elements and accommodate common vs. particular features of resources;
reuse what is available vs. create and recommend new elements and values;
standardize/normalize user input vs. allow for free user input;
document processing procedure and outputs.
The schema is largely based on the META-SHARE metadata schema [Gavrilidou et al. 2012], which caters for the description of language resources, encompassing both data (textual, multimodal/multimedia and lexical data, grammars, language models etc.) and technologies (tools/services) used for their processing. The OMTD-SHARE schema is more restricted in the sense that it focuses on text resources only, while it also extends the basic schema in order to include TDM-specific concepts and to describe processing procedures and workflows in an enhanced way.
As in META-SHARE, the schema sets out to document the full lifecycle of a resource, which also includes at least a minimal documentation of the satellite entities that participate in it, especially the relations that hold between them. The OMTD-SHARE data model thus comprises the following entities:
the resources, further classified into:
corpora, i.e. datasets of text documents, mainly scholarly publications in OMTD-SHARE;
lexical/conceptual resources, including lexica, ontologies, term lists, gazetteers etc., but also tagsets and annotation schemas, which are used for annotating corpora;
language descriptions, which mainly refer to computational grammars;
machine learning and statistical models;
software components, pieces of software, tools offered as locally executable code or as web services, wrapped in a workflow or as standalone end-to-end applications; and, finally,
publications, which constitute a peculiar resource type, as they are viewed in OpenMinTeD only in a collective form, as a "corpus";
but also satellite entities, such as the actors, be it persons or organizations that have created the resources, or the projects that have funded them or where they are used.
Lexical/conceptual resources, language descriptions and models are ancillary resources used for the TDM operation. Corpora are an in-between case: they may be used for the TDM operation, such as training or evaluation corpora, and thus play a supportive role, or they can be composed of scholarly publications, in which case they are approached as a proper content resource to be mined.
The schema is composed of metadata elements that are used to describe properties of, and relations between, all these entities. Some of these elements, especially those that pertain to administrative features (e.g. identification, contact, licensing information etc.), are common to all types of resources, while other elements, mainly those representing technical features about the contents and format of resources, differ across types. As mentioned above, publications differ from other resource types: the metadata elements recommended for their description mainly derive from the need to serve as selection criteria in the corpus building process.
One of the characteristic features of the SHARE family of schemas is the adoption of the component-based mechanism (Component MetaData Infrastructure, CMDI), according to which semantically coherent elements are grouped together to form components [Broeder et al., 2008]. For instance, the licensing module includes elements such as the name and URL of a licence, attribution text, copyright holders, etc. For the sake of simplification, the container elements used for this grouping will not be presented in the guidelines unless required.
The OMTD-SHARE schema classifies elements into 3 levels of optionality:
mandatory: elements that are necessary for intended purposes, i.e. for discovering resources and for triggering operations between content and s/w components;
recommended: elements that can help the current or future use of the resource, or useful information that providers have not yet standardized;
optional: all remaining information related to the lifecycle of a resource.
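A validation tool might treat the three optionality levels differently, for example by rejecting a record that lacks mandatory elements and only warning about missing recommended ones. The Python sketch below illustrates that idea under stated assumptions: the function name and the element names in MANDATORY are hypothetical placeholders, while "framework" and "relationType" are used as recommended elements because these guidelines mark them as such. It does not reproduce the checks performed by the actual schema or registry.

```python
# Hypothetical element names, used only to illustrate the mandatory level.
MANDATORY = {"resourceName", "licence"}
# "framework" and "relationType" are marked as Recommended in these guidelines.
RECOMMENDED = {"framework", "relationType"}


def check_optionality(record):
    """Report missing mandatory elements as errors and missing recommended ones as warnings."""
    # An element counts as filled in only if it has a non-empty value.
    filled = {name for name, value in record.items() if value not in (None, "", [])}
    return {
        "errors": sorted(MANDATORY - filled),
        "warnings": sorted(RECOMMENDED - filled),
    }


if __name__ == "__main__":
    # Hypothetical record: the licence is left empty, relationType is absent.
    record = {"resourceName": "Example component", "licence": "", "framework": "UIMA"}
    print(check_optionality(record))
```

Optional elements need no check in this sketch, since their absence has no consequence for discovery or processing.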
The schema is currently implemented as an XSD. An important difference from META-SHARE lies in the organisation vis-a-vis the different resource types covered: while META-SHARE describes all resource types in one common XSD, in OMTD-SHARE the resource types are described in a more modular way, as separate sets of XSDs.
Work is ongoing to also produce an RDF/OWL version, which will be documented in the next release of the guidelines.
1. The full OMTD-SHARE schema is documented at: https://openminted.github.io/releases/omtd-share/.
2. Models could be considered as a subtype of language descriptions, but we decided to keep them distinct because they have many properties that differentiate them from grammars; at this point it was also considered better to keep them apart, as this would enhance their discoverability.
3. Based on the META-SHARE schema, four more adaptations are now available: ELRC-SHARE, clarin:el, and OMTD-SHARE. The META-SHARE schema has also been implemented as an RDF/OWL ontology with the collaboration of the ld4lt W3C group.
4. To avoid confusion with the term "component" also used for software components, we will from now on refer to this concept as "modules".
5. The current version of the XSDs is available at: https://github.com/openminted/omtd-share_metadata_schema and the documentation of v1.0.0 at: https://openminted.github.io/releases/omtd-share/1.0.0/
Glossary
annotation (text/corpus annotation)
A note by way of explanation or comment added to a text or diagram [Oxford English Dictionary, https://en.oxforddictionaries.com/definition/annotation]. In OpenMinTeD, the term refers mainly to text or corpus annotation, which is the practice of adding interpretative linguistic information grounded in a knowledge resource to a text or corpus respectively. For example, one common type of annotation is the addition of tags, or labels, indicating the word class to which lexical units in a text belong; these tags come from a predefined set (e.g. Noun, Verb, Preposition, etc.). Semantic labeling with terms and concepts from an ontology is another common example of annotation. Relationships such as syntactic dependencies or semantic relations that link entities of the text are also annotations.
annotation resource
Any resource that can be used for annotating a text, including part-of-speech tagsets, annotation schemes, domain-specific ontologies, etc.
annotation scheme
A set of elements and values designed to annotate data. An annotation scheme usually aims to represent a specific level of information, such as morphological features of words, syntactic dependency relations between phrases, discourse level information, etc. It can consist of a flat structure of elements and values (e.g. part-of-speech tags) or it can be more complex with interrelated elements (e.g. specific morphological features to be used for each part-of-speech).
application
Any software program (or group of programs seen as a whole) intended for the end-user and addressing one or multiple related user needs.
component (software component)
An algorithm wrapped in a standard way so that it can be integrated as a reusable tool or service within a particular component-oriented framework such as UIMA, GATE, etc.
corpus
A structured collection of pieces of data (textual, audio, video, multimodal/multimedia, etc.) typically of considerable size and selected according to criteria external to these data (e.g. size, type of language, type of producers or expected audience, etc.) to represent as comprehensively as possible the object of study.
data model
A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to properties of the real world entities. [Wikipedia, https://en.wikipedia.org/wiki/Data_model]
distribution
Any form by which a resource can be shared; it can be a downloadable PDF or a plain text file, a form of a corpus accessible only through a web interface, or the source code of a software, etc.
document
A piece of written, printed, or electronic matter that is primarily intended for reading.
interoperability
Interoperability describes the extent to which systems and devices can work together, exchange data, and interpret that shared data. For two systems to be interoperable, they must be able to exchange data and subsequently present that data such that it can be understood by a user. [Research Data Alliance, http://smw-rda.esc.rzg.mpg.de/index.php/Interoperability]
licence
A permission, or written evidence of a permission, that confers on the licensee the right to do something that would otherwise be prevented by the law.
licence compatibility/interoperability
The condition or state in which two or more licences can co-exist or be combined without conflicting with each other. In OpenMinTeD, licence compatibility and licence interoperability are used as synonyms.
knowledge resource
A resource (data and/or tool) containing, producing or representing knowledge; knowledge is specific information that is relevant for the linguistic and conceptual interpretation of data. For OpenMinTeD purposes, this information is exploited or produced by TDM modules and tools, or exchanged between them.
language description
The resource describes a language or some aspect(s) of a language via a systematic documentation of linguistic structures. [Open Language Archives Community, http://www.language-archives.org/REC/type.html#language_description] Examples include sketch grammar, computational grammar, etc.
language resource
Language Resources (LRs) encompass (a) datasets (textual, multimodal/multimedia and lexical data, grammars, language models, etc.) in machine readable form, used to assist and augment language processing applications, but also, in a broader sense, in language and language-mediated research studies and applications, and (b) tools/technologies/services used for their processing.
lexical/conceptual resource
A resource organised on the basis of lexical or conceptual entries (lexical items, terms, concepts, etc.) with their supplementary information (e.g. grammatical, semantic, statistical information, etc.). In OpenMinTeD, such resources can be used for annotation purposes.
machine learning (ML) model
The process of training an ML model involves providing an ML algorithm (that is, the learning algorithm) with training data to learn from. The term ML model refers to the model artifact that is created by the training process. [http://docs.aws.amazon.com/machine-learning/latest/dg/training-ml-models.html]
metadata
Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information. [National Information Standards Organization, Understanding metadata, http://www.niso.org/publications/press/UnderstandingMetadata.pdf]
open access (OA)
The free and online availability of literature, which allows users to read, download, copy, distribute, print, search, or link to the full text, crawl articles for indexing, pass them as data to software, or use them for any other useful purpose. This availability is granted without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself, and those related to giving authors control over the integrity of their work and the right to be properly acknowledged and cited. [Budapest OA Initiative 2002; Bethesda Statement on OA Publishing 2003; Berlin Declaration on OA Knowledge in Science and Humanities 2003]
OpenMinTeD infrastructure
An infrastructure refers to the basic structures and facilities required for the operation of a system. The OpenMinTeD infrastructure consists of different layers of resources: content resources that can be mined, ancillary knowledge resources, tools and web services. Any resource that can be registered in the OpenMinTeD registry is part of the underlying infrastructure.
OpenMinTeD platform
The OpenMinTeD platform brings together all the services that facilitate the interoperability aspects of the underlying infrastructure (e.g. registration, search and browsing, creation of workflows, processing, annotation, etc.) and, thus, becomes an infrastructural service of the wider research ecosystem.
publication
A book, article, etc., that has been made available to the public either via a formal publication service or over the internet and is stored at an archive or repository. For OpenMinTeD purposes, this mainly covers scholarly publications.
resource
Something that you can use to help you to achieve something, especially in your work or study. [MacMillan dictionary, http://www.macmillandictionary.com/dictionary/british/resource_1]
rights statement
Formal or official statement asserting the copyright status and/or the licensing conditions for a given resource. It can be issued by an authoritative body (e.g. http://rightsstatements.org/). For OpenMinTeD purposes, it can be deemed similar to a "licence category", grouping licences that share similar features.
Text and Data Mining
Text and Data Mining (TDM) was initially defined as “the discovery by computer of new, previously unknown information, by automatically extracting and relating information from different (…) resources, to reveal otherwise hidden meanings” (Hearst, 1999), in other words, “an exploratory data analysis that leads to the discovery of heretofore unknown information, or to answers for questions for which the answer is not currently known” (Hearst, 1999). [FutureTDM, http://www.futuretdm.eu/news/tdm-definition/]
service/web service
Piece of software accessible through remote invocation typically using some REST-style APIs or SOAP protocols.
tool
Piece of (standalone) software typically for a very limited technical purpose, such as a particular implementation of a part-of-speech tagger (e.g. TreeTagger), a tree parsing program (e.g. mstparser), etc. Preferred terms in OpenMinTeD include 'component' and 'workflow'.
workflow
A series of software components assembled together in order to perform a specific task.