Bookshelf: Leafing through XML
NLM Journal Article Tag Suite Conference 2010
Martin Latterner and Marilu Hoeppner
National Center for Biotechnology Information
National Library of Medicine
Latterner M and Hoeppner MA. Bookshelf: Leafing through XML. JATS-CON 2010
NLM Collection Catalog
PubMed Abstracts
Electronic Literature Archive
Books, Monographs, Reports
Journals
Other publication formats
Book chapters, Monographs, Reports
Books in PubMed
Non-PubMed Books
User guides, Documentation
Journal articles
PMC Journals
PubMed Central
Bookshelf
Entrez Literature Resources
Features of the Book DTD
Books and journals within PubMed Central
Bookshelf workflows
Integration of information between databases
NCBI Book DTD 1.0
Based on ISO 12083 Article DTD
Modifications:
• Allowed icon as a child of exlnk.
• Allowed pre as a child of entry.
• Allowed glossary as a child of chapter.
• Added type: ppt.
• Added attributes id and BID to <foot>.
• Added attribute id to <p>.
• Added <title>, child of <bibsect>.
• Added <bb>, <gf> and <figgrp> as children of <linkgrp>.
• Added <email> as child of <txtstyle>.
• Added <pdf> as child of <glossary>.
• Added <figgrp1> as child of <entry>.
• …
BOOKSHELF XML DATA
NCBI BOOK DTD
March 2003: v1.0
December 2004: v2.0
November 2005: v2.1
Book DTD of the NLM Journal Article Tag Suite
Designed to capture the semantic elements of the content, not its form
e.g., bibliographic metadata
<front>
  <div type="titlepage" level="1" id="2001902bddd00001">
    <booktitle>
      <ils style="strong">CONFLICT OF INTEREST IN MEDICAL RESEARCH</ils>
    </booktitle>
    <bookauthor>
      <bookauthor.name>Committee on Conflict of Interest in Medical Research</bookauthor.name>
      <bookauthor.info>Board on Health Sciences Policy</bookauthor.info>
      <bookauthor.info>INSTITUTE OF MEDICINE <ils style="smallcap"> <ils style="emphasis"> OF THE NATIONAL ACADEMIES</ils> </ils> </bookauthor.info>
    </bookauthor>
    <publication.stmt>
      <p style="center">
        <publisher>
          <publisher.name>THE NATIONAL ACADEMIES PRESS</publisher.name>
          <publisher.address><state>Washington, D.C.</state></publisher.address>
        </publisher>
      </p>
    </publication.stmt>
    <page number="ii" id="2001902bppp00002"/>
  </div>
  <div type="copyrightpage" level="1" id="2001902bddd00002">
    <publication.stmt>
      <p style="normal">
        <publisher>
          <publisher.name><ils style="strong">THE NATIONAL ACADEMIES PRESS</ils></publisher.name>
          <publisher.address>
            <street><ils style="strong">500 Fifth Street, N.W.</ils></street>
            <state><ils style="strong">Washington, DC</ils></state>
            <postcode><ils style="strong">20001</ils></postcode>
          </publisher.address>
        </publisher>
      </p>
    </publication.stmt>
    <publication.stmt>
      <p style="flindent">ISBN <isbn>978-0-309-13188-9</isbn> (hardcover)</p>
    </publication.stmt>
    <copyright>Copyright <copyright.year>2009</copyright.year> by the <copyright.holder>National Academy of Sciences</copyright.holder>. All rights reserved.</copyright>
    <printinfo>
      <print>Printed in the United States of America</print>
    </printinfo>
  </div>
</front>
<book-meta>
  <book-title-group>
    <book-title>Conflict of Interest in Medical Research</book-title>
  </book-title-group>
  <contrib-group>
    <contrib contrib-type="author">
      <collab>Institute of Medicine (US) Committee on Conflict of Interest in Medical Research, Education, and Practice</collab>
    </contrib>
  </contrib-group>
  <publisher>
    <publisher-name>National Academies Press (US)</publisher-name>
    <publisher-loc>Washington (DC)</publisher-loc>
  </publisher>
  <isbn>978-0-309-13188-9</isbn>
  <pub-date pub-type="ppub">
    <year>2009</year>
  </pub-date>
  <permissions>
    <copyright-statement>Copyright © 2009, National Academy of Sciences</copyright-statement>
    <copyright-year>2009</copyright-year>
  </permissions>
</book-meta>
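To make the "semantics, not form" contrast concrete, here is a minimal Python sketch (standard-library ElementTree only). It pulls the publisher, ISBN, and copyright values out of a trimmed copy of the older NCBI Book DTD front matter shown earlier and re-emits them as a JATS-style <book-meta>. The function names and the chosen mapping are illustrative assumptions, not the converter NCBI actually used. Note how the presentational <ils> wrappers simply disappear.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the copyright page from the older NCBI Book DTD sample.
OLD_FRONT = """<front>
  <div type="copyrightpage" level="1" id="2001902bddd00002">
    <publication.stmt><p style="normal"><publisher>
      <publisher.name><ils style="strong">THE NATIONAL ACADEMIES PRESS</ils></publisher.name>
    </publisher></p></publication.stmt>
    <publication.stmt><p style="flindent">ISBN <isbn>978-0-309-13188-9</isbn> (hardcover)</p></publication.stmt>
    <copyright>Copyright <copyright.year>2009</copyright.year> by the
      <copyright.holder>National Academy of Sciences</copyright.holder>. All rights reserved.</copyright>
  </div>
</front>"""


def text_of(elem):
    """Flatten an element's text content; presentational <ils> wrappers vanish."""
    return "".join(elem.itertext()).strip()


def old_front_to_book_meta(front):
    """Hypothetical mapping from old front-matter values to a JATS-style <book-meta>."""
    meta = ET.Element("book-meta")
    publisher = ET.SubElement(meta, "publisher")
    ET.SubElement(publisher, "publisher-name").text = text_of(front.find(".//publisher.name"))
    ET.SubElement(meta, "isbn").text = text_of(front.find(".//isbn"))
    permissions = ET.SubElement(meta, "permissions")
    ET.SubElement(permissions, "copyright-year").text = text_of(front.find(".//copyright.year"))
    ET.SubElement(permissions, "copyright-holder").text = text_of(front.find(".//copyright.holder"))
    return meta


book_meta = old_front_to_book_meta(ET.fromstring(OLD_FRONT))
print(ET.tostring(book_meta, encoding="unicode"))
```

The real record goes further and normalizes values (for example "National Academies Press (US)" plus a publisher location), which this sketch does not attempt.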
More granular distinctions are handled at the attribute level
e.g., preface, foreword
<sec sec-type="preface">
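Because the distinction lives in an attribute, software can select sections by role rather than by position or formatting. A minimal sketch with Python's ElementTree, using a simplified, hypothetical fragment that has not been validated against the Book DTD:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified front-matter fragment; sec-type carries the semantics.
BOOK_FRONT = """<book-front>
  <sec sec-type="foreword"><title>Foreword</title><p>...</p></sec>
  <sec sec-type="preface"><title>Preface</title><p>...</p></sec>
</book-front>"""


def sections_of_type(root, sec_type):
    """Return every <sec> whose sec-type attribute equals sec_type."""
    return root.findall(f".//sec[@sec-type='{sec_type}']")


root = ET.fromstring(BOOK_FRONT)
for sec in sections_of_type(root, "preface"):
    print(sec.findtext("title"))  # prints: Preface
```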
DTD v3.0 Elements
Article: <abbrev-journal-title>, <article>, <article-categories>, <article-id>, <article-meta>, <conf-acronym>, <conference>, <conf-num>, <conf-theme>, <floats-group>, <front>, <front-stub>, <issue-sponsor>, <journal-meta>, <journal-subtitle>, <journal-title>, <journal-title-group>, <response>, <series-text>, <series-title>, <string-conf>, <sub-article>, <unstructured-kwd-group>, <x>
Book: <alternate-form>, <area>, <book>, <book-front>, <book-meta>, <book-part>, <book-part-categories>, <book-part-meta>, <book-title>, <book-title-group>, <collection>, <collection-id>, <collection-list>, <collection-member>, <collection-meta>, <collection-name>, <map>, <map-group>, <multi-link>
<map-group>
XML
<map-group id="my-map-id">
  <graphic xlink:href="img-uri"/>
  <map map-name="my-map">
    <area map-shape="rect" map-coords="1,1,51,76" xlink:href="uri1"/>
    <area map-shape="rect" map-coords="54,4,94,74" xlink:href="uri2"/>
  </map>
</map-group>
XHTML
<img src="img-uri" usemap="#my-map-id"/>
<map id="my-map-id" name="my-map">
  <area href="uri1" shape="rect" coords="1,1,51,76"/>
  <area href="uri2" shape="rect" coords="54,4,94,74"/>
</map>
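The correspondence between the two markups is mechanical: graphic/@xlink:href becomes img/@src, map-name becomes name, and map-shape/map-coords become shape/coords. A hedged Python sketch of that conversion follows; the function name is hypothetical, and per the slides the production rendering was done in XSLT, not Python.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
HREF = f"{{{XLINK}}}href"  # Clark notation for xlink:href

SOURCE = f"""<map-group id="my-map-id" xmlns:xlink="{XLINK}">
  <graphic xlink:href="img-uri"/>
  <map map-name="my-map">
    <area map-shape="rect" map-coords="1,1,51,76" xlink:href="uri1"/>
    <area map-shape="rect" map-coords="54,4,94,74" xlink:href="uri2"/>
  </map>
</map-group>"""


def map_group_to_xhtml(group):
    """Convert a <map-group> into an XHTML <img> plus <map> (hypothetical helper)."""
    map_id = group.get("id")
    src_map = group.find("map")
    img = ET.Element("img", src=group.find("graphic").get(HREF), usemap="#" + map_id)
    out_map = ET.Element("map", id=map_id, name=src_map.get("map-name"))
    for area in src_map.findall("area"):
        # Rename attributes one-to-one; coordinates are copied unchanged.
        ET.SubElement(out_map, "area", href=area.get(HREF),
                      shape=area.get("map-shape"), coords=area.get("map-coords"))
    return img, out_map


img, html_map = map_group_to_xhtml(ET.fromstring(SOURCE))
print(ET.tostring(img, encoding="unicode"))
print(ET.tostring(html_map, encoding="unicode"))
```

Like the slide's XHTML, the output omits the alt text that XHTML expects on each <area>; a real converter would need a source for it.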
<multi-link>
XML
<multi-link>
  <term>IDDM2</term>
  <ext-link ext-link-type="url" xlink:href="LINK1">Bookshelf</ext-link>
  <ext-link ext-link-type="url" xlink:href="LINK2">PubMed Central</ext-link>
  …
</multi-link>
DTD v3.0 Attributes
Article: abbrev-type, article-type, response-type
Book: alternate-form-type, book-id, book-part-number, book-part-type, graphic-type (obsolete), indexed, map-alt, map-coords, map-name, map-shape, primary, qualifier, taxonomic-id
Books & Journals in PubMed Central
Source Conversion
(1) Third-party vendor services: tagging rules for journals can be applied to book content, especially for lower-level document objects:
Citations, Figures, Tables
(2) In-house conversion: for content submitted in external DTDs, PMC journal modules are reused to handle:
Dates, Strings, CALS-to-XHTML table conversion
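The CALS-to-XHTML step can be sketched as follows. This is a deliberately simplified Python version, not the PMC module: it maps tgroup/thead/tfoot/tbody/row/entry onto XHTML table sections, rows, and th/td cells, flattens cell content to plain text, and ignores the column specifications and spanning attributes (namest, nameend, morerows) that a real converter must handle. The sample cell values are placeholders.

```python
import xml.etree.ElementTree as ET

CALS = """<table>
  <tgroup cols="2">
    <thead><row><entry>Gene Symbol</entry><entry>Chromosomal Locus</entry></row></thead>
    <tbody><row><entry>A1</entry><entry>B1</entry></row><row><entry>A2</entry><entry>B2</entry></row></tbody>
  </tgroup>
</table>"""


def cals_to_xhtml(cals_table):
    """Simplified CALS to XHTML table conversion (no colspec or spanning support)."""
    out = ET.Element("table")
    for tgroup in cals_table.findall("tgroup"):  # all tgroups merge into one table
        for section in ("thead", "tfoot", "tbody"):  # CALS section order
            src = tgroup.find(section)
            if src is None:
                continue
            dst = ET.SubElement(out, section)
            cell_tag = "th" if section == "thead" else "td"
            for row in src.findall("row"):
                tr = ET.SubElement(dst, "tr")
                for entry in row.findall("entry"):
                    # Flatten any inline markup in the cell to plain text.
                    ET.SubElement(tr, cell_tag).text = "".join(entry.itertext()).strip()
    return out


print(ET.tostring(cals_to_xhtml(ET.fromstring(CALS)), encoding="unicode"))
```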
Data Processing and Ingest
Software to look up PubMed IDs in citations and tag them as <pub-id pub-id-type="pmid">
Image-resizing software and validation checks for graphics and supplementary data files, such as PDFs
Loading code that extracts key information, such as dates and subject categories
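A sketch of the PMID-tagging step, assuming a local lookup table keyed by lower-cased article title and year. The table, its placeholder PMID, and the matching key are hypothetical stand-ins for a real PubMed query; only the resulting <pub-id pub-id-type="pmid"> element comes from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table standing in for a real PubMed query.
# Key: (lower-cased article title, year); the PMID is a placeholder.
PMID_INDEX = {("an example article title", "2001"): "12345678"}

REF_LIST = """<ref-list>
  <ref id="r1"><element-citation publication-type="journal">
    <article-title>An Example Article Title</article-title><year>2001</year>
  </element-citation></ref>
  <ref id="r2"><element-citation publication-type="journal">
    <article-title>Unmatched Title</article-title><year>1999</year>
  </element-citation></ref>
</ref-list>"""


def add_pmids(ref_list, index):
    """Append <pub-id pub-id-type="pmid"> to matched citations; return the match count."""
    matched = 0
    for citation in ref_list.iter("element-citation"):
        if citation.find("pub-id[@pub-id-type='pmid']") is not None:
            continue  # already tagged; keeps the step idempotent
        key = ((citation.findtext("article-title") or "").strip().lower(),
               (citation.findtext("year") or "").strip())
        pmid = index.get(key)
        if pmid:
            ET.SubElement(citation, "pub-id", {"pub-id-type": "pmid"}).text = pmid
            matched += 1
    return matched


refs = ET.fromstring(REF_LIST)
print(add_pmids(refs, PMID_INDEX))  # 1
```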
Output Formats
HTML
Uses the base XSLT article rendering rules to convert XML to HTML, with book-specific overrides or modifications
PDF
Uses the XSL-FO base code for articles, with book-specific overrides or modifications
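The "base rules plus book-specific overrides" arrangement can be pictured as a layered lookup. In XSLT this is commonly done with xsl:import, where templates in the importing stylesheet take precedence over imported ones; the Python sketch below mimics that precedence with collections.ChainMap. The rule tables and templates are hypothetical and only illustrate the layering.

```python
import html
from collections import ChainMap

# Hypothetical base rules shared with journal articles: element -> HTML template.
ARTICLE_RULES = {
    "title": "<h2>{}</h2>",
    "p": "<p>{}</p>",
    "italic": "<em>{}</em>",
}

# Hypothetical book-specific overrides and additions.
BOOK_RULES = {
    "title": "<h1>{}</h1>",
    "book-title": '<h1 class="book-title">{}</h1>',
}

# Lookups try the book rules first, then fall back to the article rules.
RULES = ChainMap(BOOK_RULES, ARTICLE_RULES)


def render(tag, text):
    """Fill the winning template for tag with HTML-escaped text."""
    return RULES[tag].format(html.escape(text))


print(render("title", "Preface"))  # <h1>Preface</h1>
print(render("p", "Body text"))    # <p>Body text</p>
```

The article rule table is never modified, which mirrors the goal of keeping one shared base that both products build on.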
Advantages of using a Shared Tag Set
Share XSLT modules across ingest, conversion, and rendering
Use similar database infrastructure
Enable closer integration of processes such as PubMed submission and indexing
Submission of Content to Bookshelf
• PDF or Word
• XML in NLM Book DTD
• XML in external DTDs
• Word authoring followed by conversion to XML (in-house)
<book>
Submitted Files
PDF, Word
XML (external DTD)
NLM Book DTD XML
Third-party vendor or in-house converters
Requirements:
• Pass validation
• Pass stylecheck
<book-part>
PMC
<book-part> <book-part> <book-part>
CMS
<book>
CHOP-IT-UP
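A minimal sketch of the CHOP-IT-UP idea: take a whole <book> and emit each top-level <book-part> as a standalone document keyed by its id. The sample structure (book-parts directly under <body>) is a simplified assumption, and the slides do not say how the real tool carries book-level metadata into each part, so this sketch does not attempt it.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical, simplified book; not validated against the Book DTD.
BOOK = """<book>
  <book-meta><book-title-group><book-title>Sample Book</book-title></book-title-group></book-meta>
  <body>
    <book-part id="ch1" book-part-type="chapter"><book-part-meta><title-group><title>One</title></title-group></book-part-meta></book-part>
    <book-part id="ch2" book-part-type="chapter"><book-part-meta><title-group><title>Two</title></title-group></book-part-meta></book-part>
  </body>
</book>"""


def chop_it_up(book):
    """Return {id: serialized <book-part>} for each top-level part of the book."""
    parts = {}
    for part in book.iterfind("body/book-part"):
        part_id = part.get("id")
        if not part_id:
            raise ValueError("every top-level book-part needs an id")
        clone = copy.deepcopy(part)
        clone.tail = None  # tostring() would otherwise include trailing whitespace
        parts[part_id] = ET.tostring(clone, encoding="unicode")
    return parts


for part_id, xml_text in chop_it_up(ET.fromstring(BOOK)).items():
    print(part_id, len(xml_text), "characters")
```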
Word Authoring Followed by Conversion to XML
Microsoft Word document
NCBI Word converter
XML
Instant HTML preview
Publish to Bookshelf
Stylechecker
Checks business rules
Goal: uniform source XML data, so that one set of rendering rules can be used
Two checkpoints:
Whole book (modified article stylechecker)
Individual book-part (article stylechecker)
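The two checkpoints can be sketched as a part-level check that is reused inside a book-level check. The business rules here (every book-part has an id and a title, ids are unique across the whole book, the book has a title) are hypothetical examples, not the actual stylechecker rules.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_book_part(part):
    """Part-level checkpoint (hypothetical rules): the part has an id and a title."""
    errors = []
    part_id = part.get("id")
    if not part_id:
        errors.append("book-part without id")
    if not (part.findtext(".//title") or "").strip():
        errors.append(f"{part_id or '?'}: missing title")
    return errors


def check_book(book):
    """Book-level checkpoint (hypothetical rules), then every part-level check."""
    errors = []
    if not (book.findtext("book-meta//book-title") or "").strip():
        errors.append("book: missing book-title")
    # Ids must be unique across the whole book, not just within one part.
    ids = [e.get("id") for e in book.iter() if e.get("id")]
    errors += [f"duplicate id: {i}" for i, n in Counter(ids).items() if n > 1]
    for part in book.iter("book-part"):
        errors += check_book_part(part)
    return errors


SAMPLE = """<book>
  <book-meta><book-title-group><book-title>Sample</book-title></book-title-group></book-meta>
  <body>
    <book-part id="ch1"><book-part-meta><title-group><title>One</title></title-group></book-part-meta></book-part>
    <book-part id="ch1"><book-part-meta><title-group><title/></title-group></book-part-meta></book-part>
  </body>
</book>"""

for problem in check_book(ET.fromstring(SAMPLE)):
    print(problem)
```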
Integrating Content from Different Databases
Processing Instruction in Source XML
<!DOCTYPE sec SYSTEM "book.dtd">
<sec>
  <title/>
  <?get-external-xml molgen.tables?>
</sec>
Data in the JATS Book DTD Delivered from External Database
<sec id="molgen.tables">
  <title/>
  <p content-type="molecular_genetics"><italic>Information in the Molecular Genetics and OMIM tables may differ from that elsewhere in the GeneReview: tables may contain more recent information. —</italic>ED.</p>
  <table-wrap id="pkd-ar.molgen.TA" position="anchor">
    <caption>
      <p>Table A. Polycystic Kidney Disease, Autosomal Recessive: Genes and Databases</p>
    </caption>
    <table>
      <tbody>
        <tr>
          <th>Gene Symbol</th>
          <th>Chromosomal Locus</th>
          <th>Protein Name</th>
          <th>Locus Specific</th>
          <th>HGMD</th>
        </tr>
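A sketch of how such a processing instruction might be resolved at load time. Python's ElementTree discards processing instructions by default, so the parser is built with TreeBuilder(insert_pis=True) (Python 3.8+). The EXTERNAL_DB dictionary is a hypothetical stand-in for the external database, and the sample source omits the DOCTYPE for brevity.

```python
import xml.etree.ElementTree as ET

PI_TARGET = "get-external-xml"

# Hypothetical stand-in for the external database, keyed by section id.
EXTERNAL_DB = {
    "molgen.tables": '<sec id="molgen.tables"><title/><p>Table content from the database</p></sec>',
}

SOURCE = "<sec><title/><?get-external-xml molgen.tables?></sec>"


def parse_keeping_pis(xml_text):
    """Parse XML while keeping processing instructions in the tree."""
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_pis=True))
    return ET.fromstring(xml_text, parser=parser)


def resolve_external_xml(root, fetch):
    """Replace each <?get-external-xml KEY?> with the element parsed from fetch(KEY)."""
    for parent in list(root.iter()):
        for i, child in enumerate(list(parent)):
            # A PI node's text is "target data", e.g. "get-external-xml molgen.tables".
            if child.tag is ET.ProcessingInstruction and child.text.startswith(PI_TARGET + " "):
                key = child.text[len(PI_TARGET) + 1:].strip()
                replacement = ET.fromstring(fetch(key))
                replacement.tail = child.tail  # keep any text that followed the PI
                parent[i] = replacement
    return root


doc = resolve_external_xml(parse_keeping_pis(SOURCE), EXTERNAL_DB.__getitem__)
print(ET.tostring(doc, encoding="unicode"))
```

Keeping the PI in the source and resolving it late means the table content can be updated in its own database without re-tagging the chapter.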