Metadata in NIR

Fabio Vitali, University of Bologna
Maria Guercio, University of Urbino

Introduction

Metadata support has always been present in NIR (Norme In Rete).

Recently (June/July 2004), the WG has held deep (and heated) discussions about identifying a full set of metadata information.

This presentation reports the status of that discussion so far.

Some terminology

Automatic: any task that can be left entirely to the machine

– All kinds of data format conversion, e.g. XML -> HTML or NIR XML -> NIR RDF

Semi-automatic: any task that can, with a certain degree of precision, be performed by the machine, but that still requires a human for final verification and approval.

– Identification of structures, e.g. partitioning of documents, identification and interpretation of citations

Manual: any task that must be decided upon and performed by a thinking human, even though the machine can provide support to help him/her and ease the task itself.

Some terminology (2)

Objective – an objective datum is something whose value admits no reasonable discussion
– E.g. the title of article 15, the publication date

Subjective – a subjective datum is something that requires an active interpretation by a human, which may be wrong, or about which different opinions exist
– E.g. resolution of implicit citations, classification of provisions

Explicit – a datum that is actually written somewhere in the text

Implicit – a datum that must be deduced from external sources or through the application of specific reasoning

Some terminology (3)

Low competence
– The kind of competence one may expect from a non-specialized employee, such as a secretary, armed with just common sense and some topical experience
– E.g. where article 1 ends and article 2 starts

High competence
– The kind of competence one may expect from highly specialized jurists who reach their results only after careful and painful reasoning
– E.g. dates and times in norms

Editorial intervention
– By the publisher of a document

Authorial intervention
– By the author of a document

Design issues for NIR (1)

Data structure rather than application
– Norme In Rete knows about applications, but does not depend on any particular use of the data and is not targeted towards any specific application (except presentation)
– The same text should be marked up in the same way by different editors (at least in its most fundamental structures)

Rigorous distinction of roles
– The author of a norm is the legislator; the provider of the actual XML document is the editor.
– The legislator is GOD (his decisions cannot be discussed), but He only speaks through the text of the norms.
– The editor can add a large quantity of information, but it has no official status.
– The very act of adding tags is an editorial operation, subjective and open to discussion.
– In fact, any addition coming from editors (structure identification, notes, comments, interpretation) happens outside the document content (in markup structures or in special metadata sections).

Complexity of access to texts
– Many editors, many publishing systems, many copies in different stages of evolution
– There is no authoritative source of XML documents (only of printed documents).
– A web site may fail to update a law to its latest version.
– Using URNs makes it possible to refer to the text of a law without identifying a single existing authoritative source.

Support for description and prescription
– Tagging of existing texts can only be descriptive (supporting any possible mess the legislator may have put in).
– Support for legal drafting can be provided, suggesting or enforcing legal drafting rules during writing.

Everything has a reliable name
– Every legal structure needs to be referenced and accessible.
– References need to be unambiguous, universal and definitive:
– URNs for whole documents
– id attributes for substructures and spans
– XPointers for even smaller entities
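As a minimal sketch of how such names could be resolved, the Python below splits a reference of the assumed form `urn#id` into the document URN and the fragment id, then looks that id up in a parsed document with the standard `xml.etree.ElementTree` module. The toy document, its markup and the URN value are illustrative assumptions rather than actual NIR markup, and XPointer resolution is not covered.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Toy document (illustrative only): element names loosely follow the slides.
DOC = """<legge>
  <articolo id="art1">
    <comma id="art1-com1">First clause.</comma>
  </articolo>
  <articolo id="art2">
    <comma id="art2-com1">Second article, first clause.</comma>
  </articolo>
</legge>"""


def split_reference(ref: str) -> tuple[str, str | None]:
    """Split 'urn...#fragment' into the document URN and the optional id."""
    urn, sep, fragment = ref.partition("#")
    return urn, (fragment if sep else None)


def resolve_id(root: ET.Element, frag_id: str) -> ET.Element | None:
    """Return the first descendant carrying the given id attribute, if any."""
    return root.find(f".//*[@id='{frag_id}']")


root = ET.fromstring(DOC)
urn, frag = split_reference("urn:nir:example:legge;1#art2-com1")  # hypothetical URN
node = resolve_id(root, frag)
print(urn, "->", node.text if node is not None else "not found")
```

The URN part would be handed to a resolver that finds some available copy of the document; only the id part is resolved locally, which matches the idea of naming a law without tying it to one authoritative source.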

Clean separation between objective properties and interpretation
– Objective properties can be marked up by low-level editors, while interpretation requires experts and high-level editors.
– Objective (manifest) properties include the identification of boundaries (articles, clauses, etc.) and official facts about texts (publication dates, etc.).
– Interpretation includes the identification of troublesome dates (dies coactu, dies valens), the identification of the normative content of the text's provisions, and the application of modifications.

Specific support for multiple interpretations
– “Disposizioni” (law provisions) can be identified and specified on the text.
– Multiple different interpretations of the same text must be allowed.
– So they can be placed outside the main document.

Basic structures (1)

Containers
– Documents, parts, subparts, articles, etc.
– All numbered and titled

Text containers
– Clauses (comma), list elements, etc.

Inline elements
– Presentation oriented (bold, italics, etc.): discouraged; we rely on HTML elements and CSS styles
– Legal oriented (references, modifications, specification of dates, organizations, roles, places, etc.): we rely on specific NIR elements

Basic structures (2)

Metadata
– Publication information and other data supplied by editors (publication notes, document evolution, etc.)
– Law provisions for the interpretation of the semantics of the content

Support for irregular texts (those that do not comply with standard legal drafting rules) is available in some cases through a relaxed syntax (documentoNIR).

The schemas for NIR documents

3 different DTDs
– Strict rules (prescriptive)
– Loose rules (descriptive)
– Light rules (support for the most common cases)
– They are intercompatible:

The vocabulary is exactly the same
All light documents are also loose
All strict documents are also loose
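The compatibility claim can be pictured as a containment between content models: if every parent/child pair allowed by the strict or light rules is also allowed by the loose rules, any strict or light document is also a valid loose document. The sketch below is a deliberately simplified Python model of that idea; the three content models and their element lists are hypothetical, and real DTDs also constrain order and cardinality, which it ignores.

```python
# Hypothetical, heavily simplified content models: element -> allowed children.
# Real DTDs also constrain order and cardinality, which this sketch ignores.
STRICT = {"articolo": {"rubrica", "comma"}, "comma": {"num", "corpo"}}
LIGHT = {"articolo": {"comma"}, "comma": {"corpo"}}
LOOSE = {"articolo": {"num", "rubrica", "comma"}, "comma": {"num", "corpo", "el"}}


def is_contained(narrow: dict, broad: dict) -> bool:
    """True if every parent/child pair allowed by `narrow` is allowed by `broad`."""
    return all(kids <= broad.get(elem, set()) for elem, kids in narrow.items())


def is_valid(node: tuple, model: dict) -> bool:
    """Check a (name, [children]) tree against the parent/child pairs of a model."""
    name, children = node
    allowed = model.get(name, set())
    return all(child[0] in allowed and is_valid(child, model) for child in children)


doc = ("articolo", [("rubrica", []), ("comma", [("num", []), ("corpo", [])])])
print(is_valid(doc, STRICT), is_valid(doc, LOOSE), is_valid(doc, LIGHT))
```

Because the vocabulary is shared, the same element names appear in all three models; only the permitted combinations differ, so the containment check reduces to set inclusion.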

The needs for metadata

Metadata are the only place for information that was not explicitly written by the legislator.

All possible types of additional information beyond what the text itself provides need to find a place here.

Uses: archival, analysis, annotation, automatic processing (consolidation), etc.

Official classification of metadata

A starting point is provided by NISO (the US National Information Standards Organization) in the guide “Understanding Metadata” (2004):
– descriptive metadata, to describe a resource “for purposes such as discovery and identification”
– structural metadata, to indicate “how compound objects are put together”
– administrative metadata, to provide information “to help manage a resource”, articulated (only) as rights management metadata and preservation metadata (“information needed to archive and preserve a resource”)

But…

The distinction between descriptive, structural and administrative metadata has no concrete basis in real practice:
– All the communities involved in the preservation of documents have developed and used information about structure identification as a subset of their descriptive systems. They never consider structural data as an independent component.
– The ambiguity of administrative metadata is even more evident, especially in digital systems: technological components are less and less relevant for long-term preservation and serve to physically retrieve a resource in a digital repository, yet they are considered part of the descriptive system in the case of web resources.


Metadata in the NIR DTD

Metadata are any kind of information provided by the editor rather than by the author.

In a way, even tagging text is metadata.

Deriving new versions out of an original and a few modification documents is also adding metadata.

But adding proper metadata means providing additional information about a version of a document that can be used to better search, contextualize and understand it.

[Diagram: an XML “Text” document alongside XML “Changes” documents]

Proper metadata in the NIR DTD

Can be specified:
– In an external document (in RDF, still underspecified)
– In an internal section at the beginning of the document (meta), in a NIR vocabulary
– In many internal sections near the parts of the text they refer to, in a NIR vocabulary

Conversion back and forth is always possible and automatic.

Deals with description, structure and administration, as well as:
– Interpretation of content
– Relationships with other documents
– Comments and notes
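Why conversion between the two internal layouts can be automatic is easiest to see on a toy model: a centralized section only needs each entry to record the position it refers to, and a distributed layout only needs entries grouped by position. The Python below is a hypothetical in-memory model of that round trip (plain dicts, a made-up `pos` key, entries assumed not to use `pos` themselves), not the NIR serialization.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical in-memory model of the two internal layouts:
#   distributed: {part_id: [entry, ...]}  (metadata sections near each part)
#   centralized: [entry with "pos", ...]  (one meta section at the start)


def centralize(distributed: dict[str, list[dict]]) -> list[dict]:
    """Flatten per-part metadata into one section, recording each position."""
    return [
        {**entry, "pos": part_id}
        for part_id, entries in distributed.items()
        for entry in entries
    ]


def distribute(centralized: list[dict]) -> dict[str, list[dict]]:
    """Group centralized entries back under the part they point to."""
    grouped = defaultdict(list)
    for entry in centralized:
        entry = dict(entry)          # copy, so the input list is left untouched
        part_id = entry.pop("pos")
        grouped[part_id].append(entry)
    return dict(grouped)


notes = {"art1": [{"nota": "A"}], "art2": [{"nota": "B"}, {"nota": "C"}]}
flat = centralize(notes)
print(flat)
print(distribute(flat) == notes)  # the round trip loses nothing
```

The design point is that position is the only extra information either layout needs, so no editorial judgement is involved in moving between them.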

Seven types of proper metadata

Reflective information
– Things the document knows about itself

Positioning information
– Things the document knows about the norms it expresses and the legal system it belongs to

Lifecycle information
– Special moments in the history of the document and of its norms, and the list of other documents that justify them

Editorial notes
– Things the editor wants to attach to specific parts of the document but cannot, since the DTD does not allow editorial intervention on content

Iter-connected texts
– The history of the document before its approval

Proprietary extensions

Provisions (disposizioni)

Reflection info (descrittori)

Refers to the document, not its content
– Publication date, re-publications, errata, official clarifications
– URN(s), aliases
– Objective data, easy to find even with low competence

Storing freshness information?
– A document does not usually know whether it is up to date: we may be dealing with stale documents, dead web sites, CD-ROMs.
– The best we can do is to provide it with a last-updated date.
– The normative system will confirm whether this is the latest relevant date, or whether more recent versions of the same document exist.
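The division of labour described above (the copy stores only a last-updated date, the normative system knows the later events) can be reduced to a single comparison. This is a minimal Python sketch; the function name and the idea that the normative system returns a plain list of event dates are assumptions made for illustration.

```python
from __future__ import annotations

from datetime import date


def may_be_stale(last_updated: date, known_event_dates: list[date]) -> bool:
    """True if the normative system knows of an event after the copy's last update."""
    return any(d > last_updated for d in known_event_dates)


# A copy last updated on 1 January 2000, checked against event dates
# that the normative system (hypothetically) reports for the same act.
events = [date(1996, 1, 1), date(1997, 3, 1), date(2001, 1, 1)]
print(may_be_stale(date(2000, 1, 1), events))
```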

Positioning info (inquadramento)

Refers to the norms contained in the document
– Missing parts
– Rank, function, nature and proposers of the law
– Keywords and taxonomies they belong to

Mostly objective data, but requiring high competence to write down.

Lifecycle (altriatti) - 1

Over time, documents undergo changes (in content, efficacy, power and so on).

These changes happen at specific points in time and depend on specific documents (modification documents).

Usually a modification document specifies several changes to the same modified document, and may specify multiple modification dates.

Therefore it makes sense to create a secondary structure where all relevant moments and documents can be matched.

Lifecycle (altriatti) - 2

Relations (documents involved in the lifecycle):

ID    URN of law                Relation
r01   urn:nir:xxxxxxx12/1995    original
r02   urn:nir:xxxxxxx1/1997     passive

Events (dates as dd/mm/yyyy; event labels from the timeline):

ID    Date        idref   Event
t01   1/1/1996    r01     original
t02   1/3/1997    r02     modified
t03   12/6/1998   r03     suspended
t04   24/9/1999   r03     resumed
t05   1/1/2001    r04     repealed
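The point of matching moments with documents is that questions such as "what state was this law in on a given day, and which act caused it?" become simple lookups. The Python below is a minimal sketch over the two tables above; the function names are hypothetical, the elided URNs are kept exactly as on the slide, and r03/r04 resolve to nothing because the slide does not list them.

```python
from __future__ import annotations

from datetime import date

# Relations table from the slide (URNs are elided in the original, kept as-is).
RELATIONS = {
    "r01": ("urn:nir:xxxxxxx12/1995", "original"),
    "r02": ("urn:nir:xxxxxxx1/1997", "passive"),
}

# Events table from the slide; the last field is the label from the timeline.
EVENTS = [
    ("t01", date(1996, 1, 1), "r01", "original"),
    ("t02", date(1997, 3, 1), "r02", "modified"),
    ("t03", date(1998, 6, 12), "r03", "suspended"),
    ("t04", date(1999, 9, 24), "r03", "resumed"),
    ("t05", date(2001, 1, 1), "r04", "repealed"),
]


def status_at(day: date) -> tuple[str, str, str] | None:
    """Return (event id, label, idref) of the latest event on or before `day`."""
    past = [e for e in EVENTS if e[1] <= day]
    if not past:
        return None
    event_id, _, idref, label = max(past, key=lambda e: e[1])
    return event_id, label, idref


def source_urn(idref: str) -> str | None:
    """URN of the document behind an event; None if the relation is not listed."""
    entry = RELATIONS.get(idref)
    return entry[0] if entry else None


event_id, label, idref = status_at(date(1998, 1, 1))
print(event_id, label, source_urn(idref))
```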

Lifecycle (altriatti) - 3

The lifecycle section only provides information about the relation to the documents that cause the modifications.

This information is objective and can be provided with low competence.

Information about each actual modification is optional and is placed in the provisions section.

That information is sometimes subjective and can be provided only with significant competence.

Other sections

Editorial notes (redazionale)
– Footnotes, comments, and any text the editor feels like adding. It can point to specific places in the text through <ndr> elements.

Iter-connected data (lavoripreparatori)
– An official blurb detailing the approval process (iter) of the act, with presentation dates, discussion dates, etc. Plain text.

Proprietary
– An open-ended section where editors can freely add their own metadata.

Provisions

Provisions describe the meaning of each meaningful fragment of the text according to a predefined (and hopefully complete) taxonomy (an ontology?).

Divided into three main sections plus a residual category:
– Justifications
– Analytical provisions
– Modifications
– Other

Justifications

Some norms (e.g. decrees) introduce, before the actual text, a foreword providing a number of justifications:
– Considered…
– Consulted…
– Based on a proposal by…
– Considering…
– Etc.

Analytical provisions

Describe the properties and meaning of fragments of the actual text.

A full taxonomy exists, including concepts such as definition, obligation, right, etc.

Carlo will speak about them.

Modifications

In a modifying law, each modification can be described in detail with a provision. The provision specifies what kind of modification it is, the document it applies to, where inside that document, and when.

Possible modifications are: abrogation, substitution, insertion, renumbering, change of terms, prorogation, repetition, suspension, retroactivity, ultra-activity, etc. (24 different types in total).

There is currently no way to express the normal case (dies coactu = dies valens = 15 days after publication, for the whole act), but a way will be found soon.
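To make the four facets of a modification provision concrete (what kind, which document, where, when), here is a hedged Python sketch. The class and field names, the example URN and the position id are hypothetical, and only five of the 24 modification types are listed. It also computes the normal-case dates that, as noted above, the markup cannot yet express.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class ModKind(Enum):
    """A few of the 24 modification types listed above (not the full set)."""
    ABROGATION = "abrogation"
    SUBSTITUTION = "substitution"
    INSERTION = "insertion"
    RENUMBERING = "renumbering"
    SUSPENSION = "suspension"


@dataclass(frozen=True)
class Modification:
    kind: ModKind        # what kind of modification
    target_urn: str      # the document it is applied to
    position: str        # where inside it (an id or XPointer)
    applies_from: date   # when


# Normal case from the slide: dies coactu = dies valens = 15 days after publication.
DEFAULT_DELAY = timedelta(days=15)


def default_dates(publication: date) -> tuple[date, date]:
    """Return (dies coactu, dies valens) for the normal case."""
    start = publication + DEFAULT_DELAY
    return start, start


coactu, valens = default_dates(date(2004, 7, 1))
mod = Modification(ModKind.SUBSTITUTION, "urn:nir:example:legge;1", "art3-com2", coactu)
print(mod.kind.value, mod.applies_from)
```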

Arguments for provisions

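All provisions have some specific arguments, plus some shared arguments.

E.g., an obligation (obbligo) addressed to the mayor (sindaco), with the tax office (ufficio tributi) as counterpart:

<motivazioni>…</motivazioni>
<regole>
  <obbligo>
    <pos href="#art12com5"/>
    <destinatario>sindaco</destinatario>
    <controparte>ufficio tributi</controparte>
    <termine da="r01" a="r02"/>
  </obbligo>
  …
</regole>

Important shared arguments are positions and terms.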
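The split between specific and shared arguments can be sketched in Python with the standard `xml.etree.ElementTree` module, using the obligation from the example. This is a minimal illustration, assuming that `pos` and `termine` are the only shared arguments (they are named as the important ones, not necessarily the only ones) and that each argument occurs at most once; the function name is hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# The obligation from the example, wrapped in <regole> so it parses standalone.
FRAGMENT = """<regole>
  <obbligo>
    <pos href="#art12com5"/>
    <destinatario>sindaco</destinatario>
    <controparte>ufficio tributi</controparte>
    <termine da="r01" a="r02"/>
  </obbligo>
</regole>"""

# Assumption: treat only positions and terms as shared arguments.
SHARED = {"pos", "termine"}


def split_arguments(provision: ET.Element) -> tuple[dict, dict]:
    """Separate a provision's shared arguments from its type-specific ones."""
    shared: dict = {}
    specific: dict = {}
    for child in provision:
        bucket = shared if child.tag in SHARED else specific
        # Empty elements carry their value in attributes, the others in text.
        bucket[child.tag] = dict(child.attrib) if child.attrib else (child.text or "").strip()
    return shared, specific


obbligo = ET.fromstring(FRAGMENT).find("obbligo")
shared, specific = split_arguments(obbligo)
print(shared)
print(specific)
```

A processor could handle the shared part identically for every provision type (locating the text, resolving the dates) and leave only the specific part to type-aware code.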

Positions

All provisions point to the position inside the document where the text of the provision is placed.

<articolo id="art1">
  …
</articolo>
…
<obbligo>
  <pos href="#art1"/>
</obbligo>

The pos element points to the id, the XPointer, or the text content of the part of the document that contains the provision.

Terms

Terms specify conditions, as well as specific efficacy (dies coactu) and validity (dies valens) intervals.
– No formal language exists yet for specifying conditions, e.g. “after the approval of the corresponding regulation”.
– Dates are specified by referring to the id of the relevant date as placed in the lifecycle section.
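Since terms refer to dates only by id, turning a term into an actual interval is a lookup in the lifecycle section. The Python below is a minimal sketch of that resolution. It assumes the `da`/`a` values hold event ids such as `t02`, as the sentence above states, treats the interval as half-open (start included, end excluded), and lets a missing `a` mean an open-ended term; these choices and the function names are illustrative assumptions.

```python
from __future__ import annotations

from datetime import date

# Event ids and dates from the lifecycle example (t01…t05, dd/mm/yyyy on the slide).
LIFECYCLE_DATES = {
    "t01": date(1996, 1, 1),
    "t02": date(1997, 3, 1),
    "t03": date(1998, 6, 12),
    "t04": date(1999, 9, 24),
    "t05": date(2001, 1, 1),
}


def resolve_term(da: str, a: str | None = None) -> tuple[date, date | None]:
    """Turn a term's date references into (start, end); end=None means open-ended."""
    start = LIFECYCLE_DATES[da]
    end = LIFECYCLE_DATES[a] if a is not None else None
    return start, end


def in_term(day: date, da: str, a: str | None = None) -> bool:
    """Assumes a half-open interval: start included, end excluded."""
    start, end = resolve_term(da, a)
    return start <= day and (end is None or day < end)


print(in_term(date(1998, 1, 1), "t02", "t03"))
```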

Conclusions

Metadata are still evolving heavily within the NIR WG.

In the last 4 months major work has started on a systematic analysis of the metadata information desired for NIR documents.

I haven't even mentioned namespaces.

Some details are still shaky (required elements, repeatable elements, conditions, default values), but the structure should be reasonably stable.

These are not in the published version: it is still way too early.
