Metadata in NIR
Fabio Vitali, University of Bologna
Maria Guercio, University of Urbino
Introduction
Metadata support has always been present in NIR.
Recently (June/July 2004), deep (and heated) discussions took place within the WG about identifying a full set of metadata information.
This is the current state of that discussion.
Some terminology
Automatic: any task that can be left completely to the machine.
– All kinds of data format conversion
– E.g. XML -> HTML or NIR XML -> NIR RDF
Semi-automatic: any task that can, with a certain degree of precision, be performed by the machine, but that still requires a human for final verification and approval.
– Identification of structures
– E.g. partitioning of documents, identification and interpretation of citations
Manual: any task that needs to be decided upon and performed by a thinking human, even though the machine can provide support to help him/her and ease the task itself.
Some terminology (2)
Objective
– An objective datum is something for which no reasonable discussion can exist as to its value.
– E.g. the title of article 15, the publication date
Subjective
– A subjective datum is something that requires an active interpretation from a human that may be wrong, or for which different opinions exist.
– E.g. resolution of implicit citations, classification of provisions
Explicit
– A datum that is actually written somewhere in the text
Implicit
– A datum that needs to be deduced from the external context, or through the application of specific reasoning
Some terminology (3)
Low competence
– The kind of competence one may expect from a non-specialized employee, such as a secretary, armed with just common sense and some topical experience
– E.g. where does article 1 end and article 2 start
High competence
– The kind of competence one may expect from highly specialized jurists who reach their results after careful and painstaking reasoning
– E.g. dates and times in norms
Editorial intervention
– By the publisher of a document
Authorial intervention
– By the author of a document
Design issues for NIR (1)
Data structure rather than application
– Norme In Rete knows about applications, but is not dependent on any use of the data and is not targeted towards any specific application (except presentation)
– The same text should be marked up in the same way by different editors (at least in the most fundamental structures)
Design issues for NIR (2)
Rigorous distinction of roles
– The author of a norm is the legislator; the provider of the actual XML document is the editor.
– The legislator is GOD (his decisions cannot be discussed), but He only speaks through the text of the norms.
– The editor can add a large quantity of information, but it has no official status.
– The very act of adding tags is an editorial operation, subjective and open to discussion.
– In fact, any addition coming from editors (structure identification, notes, comments, interpretation) happens outside of the document content (in markup structures or in special metadata sections).
Design issues for NIR (3)
Complexity of the access to texts
– Many editors, many publishing systems, many copies in different stages of evolution
– There is no authoritative source of XML documents (only of printed documents).
– A web site could fail to update a law to its latest version.
– The use of URNs makes it possible to refer to the text of a law without identifying a single existing authoritative source.
Design issues for NIR (4)
Support for description and prescription
– Tagging of existing texts can only be descriptive (supporting any possible mess that the legislator may have put in)
– Support for legal drafting can be provided, suggesting or enforcing legal drafting rules in the writing.
Design issues for NIR (5)
Everything has a reliable name
– Every legal structure needs to be referenced and accessible.
– References need to be unambiguous, universal, definitive.
– URN for whole documents
– id attributes for substructures and spans
– XPointers for even smaller entities
Design issues for NIR (6)
Clean separation between objective properties and interpretation
– Objective properties can be marked by low-level editors, while interpretation requires experts and high-level editors.
– Objective (manifest) properties include identification of boundaries (articles, clauses, etc.) and official facts about texts (publication dates, etc.)
– Interpretation includes identification of troublesome dates (dies coactu, dies valens), identification of the normative content of the text's provisions, application of modifications.
Design issues for NIR (7)
Specific support for multiple interpretations
– “Disposizioni” (law provisions) can be identified and specified on the text.
– Multiple different interpretations of the same text must be allowed
– So they can be placed outside of the main document.
Basic structures (1)
Containers
– Documents, parts, subparts, articles, etc.
– All numbered and titled
Text containers
– Clauses (comma), list elements, etc.
Inline elements
– Presentation oriented (bold, italics, etc.): discouraged, we rely on HTML elements and CSS styles
– Legal oriented (references, modifications, specification of dates, organizations, roles, places, etc.): we rely on specific NIR elements.
Basic structures (2)
Metadata
– Publication information and other data supplied by editors (publication notes, document evolution, etc.)
– Law provisions for the interpretation of the semantics of the content
Support for irregular texts (those that do not comply with standard legal drafting rules) is available through relaxed syntax in some cases (documentoNIR)
The Schemas for NIR documents
3 different DTDs
– Strict rules (prescriptive)
– Loose rules (descriptive)
– Light rules (support for most common cases)
– They are intercompatible
  – The vocabulary is exactly the same
  – All light documents are also loose
  – All strict documents are also loose
The need for metadata
Metadata are the only place for information that was not explicitly written by the legislator.
All possible types of additional information beyond those provided in the text need to find a place here.
Uses: archival, analysis, annotations, automatic processing (consolidation), etc.
Official classification of metadata
A starting point is provided by NISO (US National Information Standards Organization) in the guide “Understanding metadata” (2004):
– descriptive metadata to describe a resource “for purposes such as discovery and identification”
– structural metadata to indicate “how compound objects are put together”
– administrative metadata to provide information “to help manage a resource”, articulated (only) as rights management metadata and preservation metadata (“information needed to archive and preserve a resource”)
But…
The distinction between descriptive, structural and administrative metadata has no concrete basis in real practice:
– All the communities involved in the preservation of documents have developed and used information about structure identification as a subset of their descriptive systems. They have never considered structural data an independent component.
– The ambiguity of administrative metadata is even more evident, specifically in digital systems, where the technological components are less and less relevant for long-term preservation and serve for the physical retrieval of a resource in a digital repository, but are considered part of the descriptive system in the case of web resources.
Metadata in the NIR DTD
Any kind of information that is provided by the editor rather than by the author.
In a way, even tagging text is metadata.
Deriving new versions out of an original and a few modification documents is also adding metadata.
But adding proper metadata means providing additional information to a version of a document that can be used to better search, contextualize and understand it.
[Figure: an XML "Text" document and several XML "Changes" documents, with the labels "text" and "meta"]
Proper metadata in the NIR DTD
Can be specified
– In an external document (in RDF - still underspecified)
– In an internal section at the beginning of the document (meta) in a NIR vocabulary
– In many internal sections near the parts of the text they refer to, in a NIR vocabulary
Conversion back and forth is always possible and automatic.
Deals with description, structure, administration, as well as:
– Interpretation of content
– Relationships with other documents
– Comments and notes
Seven types of proper metadata
Reflective information
– Things the document knows about itself
Positioning information
– Things the document knows about the norms it expresses and the legal system it belongs to
Lifecycle information
– Special moments in the history of the document and of its norms, and the list of other documents that justify them
Editorial notes
– Things the editor wants to attach to specific parts of the document but cannot, since the DTD does not allow editorial intervention on content
Iter-connected texts
– The history of the document before its approval
Proprietary extensions
Provisions (disposizioni)
Reflection info (descrittori)
Refers to the document, not its content
– Publication date. Re-publications. Errata. Official clarifications.
– URN(s), aliases
– Objective data, easy to find even with low competence
Storing freshness information?
– A document does not usually know whether it is up-to-date. We may deal with stale documents, dead web sites, CD-ROMs
– The best we can do is to provide them with a last-updated date
– The normative system will confirm whether this is the latest relevant date, or whether more recent versions of the same document exist
Positioning info (inquadramento)
Refers to the norms contained in the doc
– Missing parts
– Rank, function, nature and proposers of the law
– Keywords and taxonomies they belong to
Objective data (mostly), but requiring high competence to write down.
Lifecycle (altriatti) - 1
Over time, documents undergo changes (in content, efficacy, power and so on)
These changes happen at specific points in time and depend on specific documents (modification documents).
Usually modification documents specify several changes on the same modified document, and may specify multiple modification dates.
Therefore it makes sense to create a secondary structure where all relevant moments and documents can be matched
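Lifecycle (altriatti) - 2
[Figure: timeline with the events t01 (1/1/1996), t02 (1/3/1997), t03 (12/6/1998), t04 (24/9/1999) and t05 (1/1/2001), and the document states original (v01), modified (v02), suspended, resumed (v02), repealed]

| ID | URN of law | relation |
| --- | --- | --- |
| r01 | urn:nir:xxxxxxx12/1995 | original |
| r02 | urn:nir:xxxxxxx1/1997 | passive |
| r03 | urn:nir:xxxxxxx5/1998 | passive |
| r04 | urn:nir:xxxxxxx12/2000 | passive |

| ID | date | idref |
| --- | --- | --- |
| t01 | 1/1/1996 | r01 |
| t02 | 1/3/1997 | r02 |
| t03 | 12/6/1998 | r03 |
| t04 | 24/9/1999 | r03 |
| t05 | 1/1/2001 | r04 |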
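The two tables can be read as a small relational structure in which each event points, through its idref, to the document that caused it. The following is only a minimal Python sketch of that reading. It assumes the dates are day/month/year, and the names `RELATIONS`, `EVENTS`, `cause_of` and `events_up_to` are hypothetical helpers, not part of the NIR DTD.

```python
from datetime import date

# Relations table from the slide: id -> (placeholder URN, relation type).
RELATIONS = {
    "r01": ("urn:nir:xxxxxxx12/1995", "original"),
    "r02": ("urn:nir:xxxxxxx1/1997", "passive"),
    "r03": ("urn:nir:xxxxxxx5/1998", "passive"),
    "r04": ("urn:nir:xxxxxxx12/2000", "passive"),
}

# Events table from the slide: id -> (date, idref into RELATIONS).
# Assumption: the slide's dates are day/month/year.
EVENTS = {
    "t01": (date(1996, 1, 1), "r01"),
    "t02": (date(1997, 3, 1), "r02"),
    "t03": (date(1998, 6, 12), "r03"),
    "t04": (date(1999, 9, 24), "r03"),
    "t05": (date(2001, 1, 1), "r04"),
}


def cause_of(event_id):
    """Return the URN of the document that justifies the given event."""
    _, idref = EVENTS[event_id]
    urn, _ = RELATIONS[idref]
    return urn


def events_up_to(day):
    """Return the ids of all events dated on or before `day`, in date order."""
    hits = [(when, eid) for eid, (when, _) in EVENTS.items() if when <= day]
    return [eid for _, eid in sorted(hits)]
```

For instance, `cause_of("t03")` and `cause_of("t04")` return the same URN, which reflects the point that one modification document may specify several dates.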
Lifecycle (altriatti) - 3
The lifecycle section only provides information about the relation to the document that causes the modifications
This information is objective and can be provided with low competence
Information about each actual modification is optional and placed in the provision section.
That information is sometimes subjective and can be provided only with significant competence
Other sections
Editorial notes (redazionale)
– Footnotes, comments, and any text the editor feels like adding. It can point to specific places in the text through <ndr> elements
Iter-connected data (lavoripreparatori)
– An official blurb detailing the iter for the approval of the act, with presentation dates, discussion dates, etc. Plain text.
Proprietary
– An open-ended section where editors can add their own metadata with freedom.
Provisions
Provisions describe the meaning of each meaningful fragment of the text according to a predefined (and hopefully complete) taxonomy (ontology???)
Divided into three main sections plus a residual category:
– Justifications
– Analytical provisions
– Modifications
– Other
Justifications
Some norms (e.g., decrees) place a foreword before the actual text, providing a number of justifications:
– Considered…
– Consulted…
– Based on a proposal by
– Considering…
– Etc.
Analytical provisions
Describe properties and meaning of fragments of the actual text.
A full taxonomy exists, including concepts like definition, obligation, right, etc.
Carlo will be speaking about them
Modifications
In a modifying law, each modification can be described in detail with a provision.
The provision describes in detail what kind of modification it is, the document it is applied to, where inside it, and when.
Possible modifications are: abrogation, substitution, insertion, renumbering, change of terms, prorogation, repetition, suspension, retro-activity, ultra-activity, etc. (a total of 24 different types).
Currently there is no way to express the normal case (dies coactu = dies valens = 15 days after publication for the whole act), but a way will be found soon.
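The normal case mentioned here is plain date arithmetic. As a rough sketch only, assuming that "15 days after publication" means adding 15 calendar days to the publication date (the exact legal counting convention is not stated in these slides), and using hypothetical names `DEFAULT_DELAY_DAYS` and `default_terms`:

```python
from datetime import date, timedelta

# From the slide: in the normal case both dates fall 15 days after publication.
DEFAULT_DELAY_DAYS = 15


def default_terms(publication):
    """Return the default efficacy and validity dates for a whole act.

    Assumption: plain calendar-day addition; the real counting rule may differ.
    """
    start = publication + timedelta(days=DEFAULT_DELAY_DAYS)
    return {"dies_coactu": start, "dies_valens": start}
```

A tool could fall back on such a default whenever an act carries no explicit term, which is one possible way of filling the gap described above.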
Arguments for provisions
All provisions have some specific arguments, plus some shared arguments
E.g.:
<motivazioni>
<regole>
  <obbligo>
    <pos href="#art12com5"/>
    <destinatario>sindaco</destinatario>
    <controparte>ufficio tributi</controparte>
    <termine da="r01" a="r02"/>
  </obbligo>
  …
</regole>
Important shared arguments are positions and terms
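To make the split between shared and specific arguments concrete, here is a minimal Python sketch that parses the obligation from the example with the standard `xml.etree.ElementTree` module. The fragment is retyped with straight quotes and without the elided parts so that it parses on its own. The `SHARED` set and the `split_arguments` helper are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# The <regole> fragment from the slide, with straight quotes and without
# the elided siblings, so that it is well-formed by itself.
FRAGMENT = """
<regole>
  <obbligo>
    <pos href="#art12com5"/>
    <destinatario>sindaco</destinatario>
    <controparte>ufficio tributi</controparte>
    <termine da="r01" a="r02"/>
  </obbligo>
</regole>
"""

# The slide names positions and terms as the important shared arguments.
SHARED = {"pos", "termine"}


def split_arguments(provision):
    """Split the children of a provision into shared and specific arguments."""
    shared, specific = {}, {}
    for child in provision:
        if child.tag in SHARED:
            shared[child.tag] = dict(child.attrib)
        else:
            specific[child.tag] = (child.text or "").strip()
    return shared, specific


root = ET.fromstring(FRAGMENT)
shared, specific = split_arguments(root.find("obbligo"))
```

Here `shared` holds the position and the term, while `specific` holds the addressee and the counterpart of the obligation.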
Positions
All provisions point to a position inside the document where the text of the provision is placed.
<articolo id="art1">
  <num>1.</num>
  <comma id="art1-com1">
    <num>1</num>
    <corpo>blah blah</corpo>
…
<obbligo>
  <pos href="#art1-com1"/>
  <destinatario>xxx</destinatario>
  <controparte>y1</controparte>
</obbligo>
The pos element points to the id, or XPointer, or the text content, of the part of the document that contains the provision.
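Resolving a position by id can be sketched in a few lines of Python with `xml.etree.ElementTree`. This is an illustration under assumptions: the `<doc>` wrapper is hypothetical and only there to make the two fragments one well-formed tree, the href is written as `#art1-com1` so that it matches the id of the comma, and `resolve_pos` is a hypothetical helper that handles plain id references only (not XPointers or text content).

```python
import xml.etree.ElementTree as ET

# Hypothetical <doc> wrapper around the article and the provision.
DOC = """
<doc>
  <articolo id="art1">
    <num>1.</num>
    <comma id="art1-com1">
      <num>1</num>
      <corpo>blah blah</corpo>
    </comma>
  </articolo>
  <obbligo>
    <pos href="#art1-com1"/>
    <destinatario>xxx</destinatario>
    <controparte>y1</controparte>
  </obbligo>
</doc>
"""


def resolve_pos(root, pos):
    """Return the element whose id matches pos/@href, or None if not found."""
    href = pos.get("href", "")
    if not href.startswith("#"):
        return None  # XPointer and text-content targets are not handled here
    target_id = href[1:]
    for element in root.iter():
        if element.get("id") == target_id:
            return element
    return None


root = ET.fromstring(DOC)
target = resolve_pos(root, root.find("obbligo/pos"))
```

With this input, `target` is the `comma` element, so the text of the provision can be read from its `corpo` child.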
Terms
Specify conditions, and specific efficacy (dies coactu) and validity (dies valens) intervals.
No formal language exists yet for specifying conditions
– E.g.: “after the approval of the corresponding regulation”
Dates are specified by referring to the id of the relevant date as placed in the lifecycle section.
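Referring to dates by id means that a term is only a pair of pointers into the lifecycle section. A minimal Python sketch of that indirection follows. It assumes day/month/year dates, reuses the `da` and `a` attribute names of the `termine` element, and treats the interval as half-open (start included, end excluded), which is an assumption of this example rather than something the slides state. `LIFECYCLE_DATES`, `resolve_term` and `covers` are hypothetical names.

```python
from datetime import date

# Dates by id, as listed in the lifecycle section (day/month/year assumed).
LIFECYCLE_DATES = {
    "t01": date(1996, 1, 1),
    "t02": date(1997, 3, 1),
    "t03": date(1998, 6, 12),
    "t04": date(1999, 9, 24),
    "t05": date(2001, 1, 1),
}


def resolve_term(da, a, dates=LIFECYCLE_DATES):
    """Turn the ids in termine/@da and termine/@a into actual dates."""
    start, end = dates[da], dates[a]
    if start > end:
        raise ValueError(f"term starts after it ends: {da} > {a}")
    return start, end


def covers(day, da, a, dates=LIFECYCLE_DATES):
    """Tell whether `day` falls inside the term (end date excluded)."""
    start, end = resolve_term(da, a, dates)
    return start <= day < end
```

For example, `covers(date(1998, 1, 1), "t02", "t03")` checks a day against the interval between the second and third lifecycle events.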
Conclusions
Metadata are still under heavy evolution within the NIR WG.
In the last 4 months a major effort has been started to perform a systematic analysis of the desired metadata information for NIR documents.
I haven’t even mentioned namespaces.
Some details are still shaky (required elements, repeatable elements, conditions, default values), but the structure should be reasonably stable.
These are not in the published version: it is still way too early.