The use of SGML and XML at the Publications Office
Dr. Holger Bagola, Dir A – Cell “Methods and Development – Formats”
Table of contents
• Historical overview
• Formex
• Other areas of XML usage
• Conclusion
Historical overview
• Tasks of the Publications Office
• Archiving of legislative publications
• First steps in SGML
• Migration to XML
• Basic advantage: availability of tools
Formex (1)
• Basic principles
  – XML Schema instead of DTD
  – One single schema
  – Number of root elements: 12 instead of 30
  – Number of elements: about 350 instead of 1200
  – Distinction between semantic and physical markup
Formex (2)
ARTICLE (TI.ARTICLE, (PARAG+ | ALINEA+))
TI.ARTICLE (#PCDATA)
PARAG (NO.PARAG, ALINEA+)
NO.PARAG (#PCDATA)
ALINEA ((#PCDATA | NOTE | HT | FT)* | (P | LIST | TABLE)+)
. . .
Colour legend of the original slide: blue = semantic markup, red = physical markup
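The ALINEA rule above allows one of two kinds of content, but not both at once:
- mixed content: text with NOTE, HT and FT;
- block content: a sequence of P, LIST and TABLE.

The sketch below checks that rule with Python's standard-library ElementTree. It covers only the elements named on this slide. Anything hidden behind ". . ." is reported rather than guessed, and the function name is illustrative.

```python
import xml.etree.ElementTree as ET

# Element sets read off the ALINEA content model on this slide.
INLINE = {"NOTE", "HT", "FT"}      # allowed inside mixed (#PCDATA) content
BLOCK = {"P", "LIST", "TABLE"}     # allowed as block-level children

def alinea_kind(alinea: ET.Element) -> str:
    """Classify an ALINEA as 'inline' or 'block'; raise ValueError if it mixes both."""
    tags = {child.tag for child in alinea}
    unknown = tags - INLINE - BLOCK
    if unknown:
        raise ValueError(f"elements not covered by this sketch: {sorted(unknown)}")
    # Non-whitespace text directly inside the ALINEA or between its children.
    has_text = bool((alinea.text or "").strip()) or any(
        (child.tail or "").strip() for child in alinea
    )
    if tags & BLOCK:
        if has_text or tags & INLINE:
            raise ValueError("ALINEA mixes inline and block content")
        return "block"
    return "inline"

print(alinea_kind(ET.fromstring("<ALINEA>Text with <HT>emphasis</HT>.</ALINEA>")))  # inline
```

Whitespace-only text between block elements is ignored, so indented instances still count as block content.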
Formex (3)
• Table model
  – Analysis of CALS, HTML and Formex v. 3
  – Choice:
    • Model close to HTML (top-down approach, nested tables)
    • Retention of semantic information, as in Formex v. 3
Formex (4)
• Footnotes
  – Distinction between notes in the text and notes in tables, for readability and simpler production
  – Text notes are inserted into the surrounding text
  – ID/IDREF signals identical footnotes
  – Numbering is a matter of presentation
  – Table notes are assembled at the top of the table
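Leaving numbering to presentation means a renderer computes the numbers from the notes' positions. A minimal Python sketch of such a numbering pass follows.
- NOTE with a NOTE.ID attribute matches the common grammar fragment of the proposal later in this presentation.
- The NOTE.REF element, whose REF attribute points at an identical note, is a hypothetical name chosen for illustration.

```python
import xml.etree.ElementTree as ET

def number_notes(root: ET.Element) -> tuple[dict[str, int], list[int]]:
    """Number footnotes at presentation time.

    Each NOTE (identified by NOTE.ID) gets the next number in document order.
    A NOTE.REF element (hypothetical name) with a REF attribute points at an
    identical note and reuses its number instead of repeating the text.
    """
    notes = list(root.iter("NOTE"))
    by_id = {note.get("NOTE.ID"): n for n, note in enumerate(notes, start=1)}
    # Callout numbers in reading order, as a renderer would print them.
    callouts = []
    for elem in root.iter():
        if elem.tag == "NOTE":
            callouts.append(by_id[elem.get("NOTE.ID")])
        elif elem.tag == "NOTE.REF":
            callouts.append(by_id[elem.get("REF")])
    return by_id, callouts

doc = ET.fromstring(
    '<DOC><P>First claim<NOTE NOTE.ID="n1"><P>Source A.</P></NOTE>, '
    'second claim<NOTE NOTE.ID="n2"><P>Source B.</P></NOTE>, '
    'third claim<NOTE.REF REF="n1"/>.</P></DOC>'
)
print(number_notes(doc))  # ({'n1': 1, 'n2': 2}, [1, 2, 1])
```

Because the numbers are computed rather than stored, inserting or deleting a note never requires renumbering the instance.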
Formex (5)
• Quotations
  – Structured quotations vs. ‘#PCDATA’ quotations
  – Elements signalling the start and end of a quotation (quotation marks)
  – An element acting as a container for structured quotations
Formex (6)
Example:
Article 2
In article 1(2) of regulation (EC) 1234/94 the word ‘car’ is replaced by ‘bus’.
Article 6 of the same regulation is replaced by the following text:
‘Article 6
This is the new text of article 6.’
Formex (7)
Example:
<ARTICLE IDENTIFIER="002">
  <TI.ARTICLE>Article 2</TI.ARTICLE>
  <ALINEA>In article 1(2) of regulation (EC) 1234/94 the word <QUOT.START ID="QS0001" REF.END="QE0001" CODE="2018"/>car<QUOT.END ID="QE0001" REF.START="QS0001" CODE="2019"/> is replaced by <QUOT.START ID="QS0002" REF.END="QE0002" CODE="2018"/>bus<QUOT.END ID="QE0002" REF.START="QS0002" CODE="2019"/>.</ALINEA>
  <ALINEA>
    <P>Article 6 of the same regulation is replaced by the following text:</P>
    <QUOT.S>
      <ARTICLE IDENTIFIER="006">
        <TI.ARTICLE><QUOT.START ID="QS0003" REF.END="QE0003" CODE="2018"/>Article 6</TI.ARTICLE>
        <ALINEA>This is the new text of article 6.<QUOT.END ID="QE0003" REF.START="QS0003" CODE="2019"/></ALINEA>
      </ARTICLE>
    </QUOT.S>
  </ALINEA>
</ARTICLE>
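The start and end markers of the Article 2 example can be checked and rendered mechanically. This is a minimal Python sketch. It assumes, as the values 2018 and 2019 suggest, that CODE holds the hexadecimal Unicode code point of the quotation mark to print (U+2018 ‘ and U+2019 ’). The function names are illustrative.

```python
import xml.etree.ElementTree as ET

# First ALINEA of the Article 2 example (straight quotes, as XML requires).
ALINEA = (
    '<ALINEA>In article 1(2) of regulation (EC) 1234/94 the word '
    '<QUOT.START ID="QS0001" REF.END="QE0001" CODE="2018"/>car'
    '<QUOT.END ID="QE0001" REF.START="QS0001" CODE="2019"/> is replaced by '
    '<QUOT.START ID="QS0002" REF.END="QE0002" CODE="2018"/>bus'
    '<QUOT.END ID="QE0002" REF.START="QS0002" CODE="2019"/>.</ALINEA>'
)

def check_pairs(root: ET.Element) -> None:
    """Verify that every QUOT.START and QUOT.END point at each other."""
    starts = {e.get("ID"): e for e in root.iter("QUOT.START")}
    ends = {e.get("ID"): e for e in root.iter("QUOT.END")}
    for sid, start in starts.items():
        end = ends.get(start.get("REF.END"))
        if end is None or end.get("REF.START") != sid:
            raise ValueError(f"unmatched quotation start {sid}")
    for eid, end in ends.items():
        if end.get("REF.START") not in starts:
            raise ValueError(f"unmatched quotation end {eid}")

def render(elem: ET.Element) -> str:
    """Flatten an element to text, turning quotation markers into characters."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag in ("QUOT.START", "QUOT.END"):
            # Assumption: CODE is the hexadecimal Unicode code point.
            parts.append(chr(int(child.get("CODE"), 16)))
        else:
            parts.append(render(child))
        parts.append(child.tail or "")
    return "".join(parts)

root = ET.fromstring(ALINEA)
check_pairs(root)
print(render(root))
# In article 1(2) of regulation (EC) 1234/94 the word ‘car’ is replaced by ‘bus’.
```

The markers are empty elements linked by ID, so a quotation may open in TI.ARTICLE and close in a later ALINEA, as in the Article 6 part of the example. For that reason `check_pairs` works over the whole tree rather than per element.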
Formex (8)
• Splitting large documents
  – Fragmentation by defining inclusions for the main document
  – Secondary instances reference the inclusions by means of the XML entity mechanism
  – Inclusions need not be valid XML instances on their own
Formex (9)
main.xml
<?xml version="1.0"?>
<doc>
  <ti>title</ti>
  <chap no="1"><incl ref="frag-1.frg"/></chap>
</doc>
frag-1.frg
<text>…</text><text>…</text>
container.xml
<?xml version="1.0"?>
<!DOCTYPE frag [<!ENTITY cnt SYSTEM "frag-1.frg">]>
<frag>&cnt;</frag>
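Python's standard-library ElementTree does not expand external entities. This sketch therefore reaches the same result as container.xml by a different route: it wraps the fragment text in a single container element before parsing, which is what makes a multi-root fragment parseable.
- The file contents are held in a hypothetical in-memory dictionary.
- The fragment text is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory file store; the fragment text is invented for the demo.
FILES = {
    "main.xml": '<?xml version="1.0"?><doc><ti>title</ti>'
                '<chap no="1"><incl ref="frag-1.frg"/></chap></doc>',
    "frag-1.frg": "<text>first</text><text>second</text>",
}

def parse_fragment(source: str) -> ET.Element:
    """Parse a fragment with several top-level elements.

    On its own the fragment is not a well-formed document (two roots), so it is
    wrapped in a <frag> element, the role container.xml gives to &cnt;.
    """
    return ET.fromstring("<frag>" + source + "</frag>")

def resolve_inclusions(root: ET.Element, files: dict[str, str]) -> ET.Element:
    """Replace every <incl ref="..."/> with the elements of the referenced fragment.

    Text following an <incl> element is dropped (there is none in this example).
    """
    for parent in list(root.iter()):                 # snapshot: the tree is modified below
        children = list(parent)
        for i in range(len(children) - 1, -1, -1):   # right to left keeps indexes valid
            incl = children[i]
            if incl.tag != "incl":
                continue
            parent.remove(incl)
            for offset, node in enumerate(parse_fragment(files[incl.get("ref")])):
                parent.insert(i + offset, node)
    return root

doc = resolve_inclusions(ET.fromstring(FILES["main.xml"]), FILES)
print(ET.tostring(doc, encoding="unicode"))
# <doc><ti>title</ti><chap no="1"><text>first</text><text>second</text></chap></doc>
```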
Formex (10)
• Character set
  – OJ publications in 20 (21) languages
  – Different alphabets
  – Unicode (UTF-8) as the international character set definition
  – Definition of allowed character ranges
  – Special font ‘EU-Albertina’
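A definition of allowed character ranges lends itself to a simple automatic check on each delivery. In the sketch below the ranges are hypothetical and chosen only for illustration. The actual Formex ranges are defined in the specifications and are not reproduced here.

```python
# Hypothetical allowed ranges, for illustration only (not the Formex definition).
ALLOWED_RANGES = [
    (0x0009, 0x000A), (0x000D, 0x000D),  # tab, line feed, carriage return
    (0x0020, 0x007E),                    # Basic Latin (printable)
    (0x00A0, 0x017F),                    # Latin-1 Supplement, Latin Extended-A
    (0x0370, 0x03FF),                    # Greek and Coptic
    (0x0400, 0x04FF),                    # Cyrillic
    (0x2000, 0x206F),                    # General Punctuation (includes ‘ ’ …)
]

def check_utf8(data: bytes) -> str:
    """Decode a delivery as UTF-8; raises UnicodeDecodeError if it is not UTF-8."""
    return data.decode("utf-8")

def disallowed_chars(text: str) -> list[tuple[int, str]]:
    """Return (position, character) for every character outside the allowed ranges."""
    return [
        (pos, ch) for pos, ch in enumerate(text)
        if not any(lo <= ord(ch) <= hi for lo, hi in ALLOWED_RANGES)
    ]

print(disallowed_chars(check_utf8("5 €".encode("utf-8"))))  # [(2, '€')]
```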
Formex (11)
• Meta-data
  – OJ publications are composed of different levels:
    • Publication
    • Document
    • ‘Contents’
  – Meta-data separated according to these levels
Formex (12)
Diagram: meta-data levels
Publication: meta-data concerning the publication; structure of the publication with references to documents
  Document: meta-data for the document; references to components
    Contents, main part (001)
    Contents, Annex 1 (001.001)
    Contents, Annex 2 (001.002)
  Document: meta-data for the document; references to components
    Contents, main part (002)
ProCat
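Keeping meta-data at three separate levels maps naturally onto a nested data structure. This sketch is hypothetical.
- The class and field names are illustrative, not taken from the Formex specifications.
- Only the component numbers 001, 001.001, 001.002 and 002 come from the diagram.

```python
from dataclasses import dataclass, field

# Hypothetical model of the three meta-data levels; names are illustrative.
@dataclass
class Contents:
    number: str          # e.g. "001" (main part) or "001.001" (Annex 1)
    label: str

@dataclass
class Document:
    metadata: dict                                  # document-level meta-data only
    components: list[Contents] = field(default_factory=list)

@dataclass
class Publication:
    metadata: dict                                  # publication-level meta-data only
    documents: list[Document] = field(default_factory=list)

    def component_numbers(self) -> list[str]:
        """Walk the levels top-down and list every 'Contents' component."""
        return [c.number for d in self.documents for c in d.components]

pub = Publication(
    metadata={"level": "publication"},
    documents=[
        Document({"level": "document"}, [
            Contents("001", "main part"),
            Contents("001.001", "Annex 1"),
            Contents("001.002", "Annex 2"),
        ]),
        Document({"level": "document"}, [Contents("002", "main part")]),
    ],
)
print(pub.component_numbers())  # ['001', '001.001', '001.002', '002']
```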
Formex (13)
• Meta-data (continued)
  – Extraction of meta-data by automatic processes (pre-notices)
  – Extension of pre-notices by legal analysis
  – Availability of notices in ProCat for other productions (Celex) and projects
Formex (14)
• Final remark on the Formex specifications
  – Only a few complete production chains from the author to the printer
  – Concentration on the publication of the Official Journal
Formex (15)
• Validation of Formex deliveries
  – In-depth validation necessary
  – Automatic procedures
  – Manual procedures
Formex (16)
• Validation of Formex deliveries (continued)
  – Automatic procedures
    • Check of filename conventions
    • Parsing of the various components
    • Check of completeness
    • Execution of additional validation rules
    • Comparison of contents between Formex and PDF
→ Report (XML instance)
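The automatic procedures produce a report that is itself an XML instance. This sketch covers three of the checks: filename convention, parsing (well-formedness only) and completeness. The additional rules and the Formex/PDF comparison are out of scope. The naming rule, the report element names and the sample delivery are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical naming rule for illustration: lower-case name ending in ".xml".
NAME_RULE = re.compile(r"[a-z0-9_.-]+\.xml")

def validate_delivery(files: dict[str, str], expected: set[str]) -> ET.Element:
    """Run three of the automatic checks and record every result in an XML report."""
    report = ET.Element("report")

    def add(check: str, target: str, ok: bool, detail: str = "") -> None:
        result = ET.SubElement(report, "result", check=check, target=target,
                               status="ok" if ok else "error")
        result.text = detail

    for name, content in sorted(files.items()):
        add("filename", name, NAME_RULE.fullmatch(name) is not None)
        try:
            ET.fromstring(content)      # well-formedness only, no schema validation
            add("parse", name, True)
        except ET.ParseError as err:
            add("parse", name, False, str(err))
    for name in sorted(expected - set(files)):
        add("completeness", name, False, "missing from delivery")
    return report

report = validate_delivery(
    {"main.xml": "<doc/>", "Bad Name.xml": "<doc>"},
    expected={"main.xml", "annex.xml"},
)
print(ET.tostring(report, encoding="unicode"))
```

Emitting the report as XML means the same tools can read it back during the manual verification step.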
Formex (17)
• Validation of Formex deliveries (continued)
  – Manual procedures
    • Verification of the report generated by the automatic validation procedure
    • Check of the use of the Formex specifications in all language versions
→ Report (XML instance) = basis for archiving or rejection
Formex (18)
• Conversion of Formex v. 3 into Formex v. 4
  – Conversion of the character set (ISO 2020 to UTF-8)
  – Transformation of SGML instances into well-formed XML instances
  – Extraction of tables and conversion into an intermediate model
  – Generation of meta-data levels
  – Conversion of old elements and generation of new elements
  – Validation of the results
Formex (19)
• Specifications:
http://formex.publications.eu.int/
Other areas of XML usage (1)
• Index of OJ publications
  – Biannual issues
  – Monthly issues
  – Extraction from Celex/ProCat
  – Transformation into PDF by means of XSLT and XSL FO (biannual version only)
Other areas of XML usage (2)
• Consolidation of legal documents
  – Mainly based on Formex
  – Additional administrative data in XML
  – Relations between historical levels
    • Description of the composition of a given historical level
    • Concordance of information on numbering schemes (articles, …) for each level
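A numbering concordance relates each article to the number it carries at every historical level. This sketch assumes each article has a stable internal key that survives renumbering. The keys, dates and numbers are invented for illustration. In the data, an article inserted as 2a becomes article 2 after the original article 2 disappears.

```python
# Hypothetical data: stable internal article keys mapped to the number shown
# at each historical level (dates and keys are invented for illustration).
LEVELS = {
    "1994-01-01": {"k1": "1", "k2": "2", "k3": "3"},
    "1999-06-30": {"k1": "1", "k2": "2", "k4": "2a", "k3": "3"},
    "2003-01-01": {"k1": "1", "k4": "2", "k3": "3"},
}

def concordance(levels: dict[str, dict[str, str]]) -> dict[str, list]:
    """For every article key, list its number at each level (None if absent)."""
    keys: list[str] = []
    for numbering in levels.values():
        for key in numbering:
            if key not in keys:
                keys.append(key)
    return {key: [numbering.get(key) for numbering in levels.values()] for key in keys}

print(concordance(LEVELS))
# {'k1': ['1', '1', '1'], 'k2': ['2', '2', None], 'k3': ['3', '3', '3'], 'k4': [None, '2a', '2']}
```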
Other areas of XML usage (3)
• Conversion to RTF
  – Compatibility with other EU services
  – Input in SGML or XML
  – Results with LegisWrite templates
Other areas of XML usage (4)
Diagram: conversion chain
SGML instance (Formex v. 3) → Character conversion → Transformation into well-formed XML → Transformation into internal XML format → Transformation into RTF (LegisWrite) → Output in RTF (LegisWrite)
XML instance (Formex v. 4) → Transformation into internal XML format (then as above)
Other areas of XML usage (5)
• Production of the EU budget
  – Creation and maintenance of a common central repository (XML)
  – Markup of modified elements during the decision process, in the working language
  – Translation of the modified parts only
  – Update of the repository after publication
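This minimal sketch shows the "translate only what changed" step. It assumes that modified elements carry a marker attribute and that every language copy of the repository has the same structure. The element name `line`, the attributes `id` and `modified`, and the sample content are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of one language copy of the budget repository.
REPO = """<budget>
  <line id="01.01" modified="no">Staff expenditure</line>
  <line id="01.02" modified="yes">Buildings and related expenditure</line>
  <line id="02.01" modified="yes">Information technology</line>
</budget>"""

def extract_for_translation(repo: ET.Element) -> dict[str, str]:
    """Collect only the lines marked as modified during the decision process."""
    return {e.get("id"): e.text for e in repo.iter("line") if e.get("modified") == "yes"}

def update_repository(repo: ET.Element, translated: dict[str, str]) -> None:
    """Write returned translations into the language copy and clear the marks."""
    for e in repo.iter("line"):
        if e.get("id") in translated:
            e.text = translated[e.get("id")]
            e.set("modified", "no")

repo = ET.fromstring(REPO)
print(extract_for_translation(repo))
# {'01.02': 'Buildings and related expenditure', '02.01': 'Information technology'}
```

Unmodified lines never leave the repository, which is where the translation saving comes from.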
Other areas of XML usage (6)
Diagram (participants and systems): budget services, translation service, Publications Office, budget XML repository, printer, Formex archive; phases: pre-printing, post-printing
Other areas of XML usage (7)
• ‘Secondary legislation’
  – Publication of legislation in force in the ‘new’ languages
  – XML production on the basis of the Formex archive
  – Transformation of the translated input
  – Transformation of the Formex SGML instances into XML
  – Merging of XML instances
Other areas of XML usage (8)
Diagram: production chain
Word document → Conversion into XML → Extraction of text
Formex archive → Conversion into XML → Extraction of skeleton
Merging of skeleton and text → Simplification of structure → Publication, ProCat, Celex
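The merging step combines two inputs:
- the structure taken from the archived Formex instance (the skeleton);
- the text extracted from the translated Word document.

A minimal sketch follows. It assumes both sides share helper identifiers on the text-bearing elements. The `id` attribute, the sample content and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical skeleton from the archived Formex instance: structure kept,
# text-bearing elements tagged with a helper "id" (an assumption of this sketch).
SKELETON = (
    '<ARTICLE IDENTIFIER="001">'
    '<TI.ARTICLE id="t1">Article 1</TI.ARTICLE>'
    '<ALINEA id="a1">This Regulation lays down rules.</ALINEA>'
    '</ARTICLE>'
)

# Hypothetical text extracted from the translated Word document, same keys.
TEXT = {"t1": "[translated] Article 1", "a1": "[translated] This Regulation lays down rules."}

def merge(skeleton: ET.Element, text: dict[str, str]) -> ET.Element:
    """Put translated text into the skeleton; refuse to continue with gaps."""
    for elem in skeleton.iter():
        key = elem.get("id")
        if key is None:
            continue
        if key not in text:
            raise KeyError(f"no translated text for {key}")
        elem.text = text[key]     # assumes keyed elements hold plain text only
        del elem.attrib["id"]     # helper id is no longer needed after merging
    return skeleton

merged = merge(ET.fromstring(SKELETON), TEXT)
print(ET.tostring(merged, encoding="unicode"))
```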
Other areas of XML usage (9)
• European document repository
  – TIFF of publications
  – PDF of publications
  – Formex instances of OJ publications
  – Exchange of information by XML messages
Other areas of XML usage (10)
• Publication of calls for tender (OJ-S)
  – Input in different electronic formats
  – Harmonization in XML
  – Updating of the TED database
  – Production of the CD-ROM version
Conclusion
• Difficult start with SGML
• Successful use of XML as well as of other standards such as XSLT/XPath and XSL FO
• Powerful possibilities for re-use of XML instances
• How to profit from our experiences?
Proposal for a technical solution
• An example: a regulation in the European legislative context and a ‘Verordnung’ in German legislation
• Evident structural differences
• Evident common structural objects
Differences and common objects (1)
• EU regulation
  – Title
  – Preamble
    • Citations
    • Recitals
  – Enacting terms
    • Articles
      – Article header
        » Numbering
      – Paragraphs or alineas
• German regulation
  – Title
  – Preamble
    • Paragraphs
  – Enacting terms
    • Articles
      – Article header
        » Numbering + text
      – Alineas
Differences and common objects (2)
• EU regulation (continued)
  – Final
    • Applicability
    • Signature
• German regulation (continued)
  – Final
    • Signature
Differences and common objects (3)
• preamble
  – European model
    PREAMBLE (PREAMBLE.INIT, CITATION+, RECITAL+, PREAMBLE.FINAL)
    PREAMBLE.INIT (P)
    CITATION (P)
    RECITAL (NP)
    PREAMBLE.FINAL (P)
  – German model
    PREAMBLE (P)
Differences and common objects (4)
• article
  – European model
    ARTICLE (ARTICLE.HEADER, (PARAG+ | ALINEA+))
    ARTICLE.HEADER (#PCDATA)
    PARAG (NO.PARAG, ALINEA+)
    ALINEA (P | LIST)+
  – German model
    ARTICLE (ARTICLE.HEADER, (PARAG+ | ALINEA+))
    ARTICLE.HEADER (NP)
    NP (NO.P, TXT)
    PARAG (NO.PARAG, ALINEA+)
    ALINEA (P | LIST)+
Differences and common objects (5)
• final
  – European model
    FINAL (APPLICABILITY, SIGNATURE)
    APPLICABILITY (P)
    SIGNATURE (PL.DATE, SIGNATORY)
    PL.DATE (P)
    SIGNATORY (P+)
  – German model
    FINAL (SIGNATURE)
    SIGNATURE (PL.DATE, SIGNATORY)
    PL.DATE (P)
    SIGNATORY (P+)
Differences and common objects (6)
Specific models for European regulation
Specific models for German regulation
Common models for European and German regulation
Differences and common objects (7)
• Common grammar fragment
<!ELEMENT ALINEA (P | LIST)+ >
<!ELEMENT ARTICLE (ARTICLE.HEADER, (ALINEA+ | PARAG+)) >
<!ELEMENT ENACTING.TERMS (ARTICLE+) >
<!ELEMENT ITEM (NP, (P | LIST)) >
<!ELEMENT NO.P (#PCDATA) >
<!ELEMENT NOTE (P+) >
<!ATTLIST NOTE NOTE.ID ID #REQUIRED >
<!ELEMENT NP (NO.P, TXT) >
<!ELEMENT P (#PCDATA | NOTE)* >
<!ELEMENT PARAG (PARAG.NO, ALINEA+) >
<!ELEMENT PARAG.NO (#PCDATA) >
<!ELEMENT PL.DATE (P+) >
<!ELEMENT REGULATION (TITLE, PREAMBLE, ENACTING.TERMS, FINAL) >
<!ATTLIST REGULATION CTRY (DE | EU-EN) #REQUIRED >
<!ELEMENT SIGNATORY (P+) >
<!ELEMENT SIGNATURE (PL.DATE, SIGNATORY) >
<!ELEMENT TITLE (P+) >
<!ELEMENT TXT (#PCDATA | LIST | NOTE)* >
Differences and common objects (8)
• Specific grammar for EU regulation
<!ENTITY % common SYSTEM "regulation-common.dtd">
%common;
<!ELEMENT APPLICABILITY (P) >
<!ELEMENT ARTICLE.HEADER (P) >
<!ELEMENT CITATION (P) >
<!ELEMENT FINAL (APPLICABILITY, SIGNATURE) >
<!ELEMENT PREAMBLE (PREAMBLE.INIT, CITATION+, RECITAL.INIT?,
RECITAL+, PREAMBLE.FINAL) >
<!ELEMENT PREAMBLE.FINAL (P) >
<!ELEMENT PREAMBLE.INIT (P) >
<!ELEMENT RECITAL (P | NP) >
<!ELEMENT RECITAL.INIT (P) >
Differences and common objects (9)
• Specific grammar for German regulation
<!ENTITY % common SYSTEM "regulation-common.dtd">
%common;
<!ELEMENT ARTICLE.HEADER (NP) >
<!ELEMENT FINAL (SIGNATURE) >
<!ELEMENT PREAMBLE (P+) >
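Python's standard library offers no DTD validation, so this sketch only illustrates the split between shared and variant-specific grammar.
- The content models that differ between the two specific grammars are hand-translated into regular expressions over child element names.
- The rule set is selected by the CTRY values DE and EU-EN from the common grammar fragment.
- Assumptions: CTRY sits on the root REGULATION element, and the function name and regex encoding are illustrative.

It is a didactic approximation, not a substitute for a validating parser.

```python
import re
import xml.etree.ElementTree as ET

# Variant-specific models, hand-translated into regular expressions over
# child element names (each name followed by one space). Keys are CTRY values.
SPECIFIC_MODELS = {
    "EU-EN": {
        "ARTICLE.HEADER": r"(P )",
        "FINAL": r"(APPLICABILITY )(SIGNATURE )",
        "PREAMBLE": r"(PREAMBLE\.INIT )(CITATION )+(RECITAL\.INIT )?"
                    r"(RECITAL )+(PREAMBLE\.FINAL )",
    },
    "DE": {
        "ARTICLE.HEADER": r"(NP )",
        "FINAL": r"(SIGNATURE )",
        "PREAMBLE": r"(P )+",
    },
}

def check_specific_models(root: ET.Element) -> list[str]:
    """Return tags of elements whose children break the variant-specific model.

    Assumes CTRY sits on the root REGULATION element; models from the common
    grammar are not checked in this sketch.
    """
    models = SPECIFIC_MODELS[root.get("CTRY")]
    errors = []
    for elem in root.iter():
        pattern = models.get(elem.tag)
        if pattern is None:
            continue
        children = "".join(child.tag + " " for child in elem)
        if re.fullmatch(pattern, children) is None:
            errors.append(elem.tag)
    return errors

de = ET.fromstring('<REGULATION CTRY="DE"><FINAL><APPLICABILITY/><SIGNATURE/></FINAL></REGULATION>')
print(check_specific_models(de))  # ['FINAL']
```

Only the few models that really differ are listed per variant. Everything else comes from the shared fragment, which mirrors the `%common;` inclusion in the two specific grammars.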
Final remarks
• Possible objects:
  – Metadata on document level
  – Metadata on archiving level (research aspects)
  – Common models for complex objects: tables, quotations, etc.