Post on 29-Dec-2015
Lucas Mak and Dao Rong Gong
Michigan State University
Millennium and XML: Repurposing and Customizing Metadata
May 17 - 20, 2009
Today’s Outline
Overview of Metadata
Millennium system and XML
Overview of XSLT
Case Studies
1. Sunday School Books Collection
2. New Book List
Conclusions and Observations
Metadata
Structured data or information about an information resource.
Types of metadata:
– Descriptive
– Administrative/Rights
– Preservation
– Technical
– Structural
Descriptive Metadata
Popular descriptive metadata standards
– Dublin Core (Simple & Qualified)
– MODS
– MARCXML
– VRA Core
– IEEE LOM
– TEI Header
– EAD
Innovative XML
XML records from Millennium
Retrieved through HTTP query
Data arrangement based on MARC fields
– But a MARC field and its subfields are siblings
Optimized for WebPAC display
– Brief record (for search result index page display)
• Contains data from MARC 245, publication year, record ID
– Full record (for both public and staff MARC display of individual record)
Millennium System and XML
[Diagram: Millennium System and XML. Labels: Millennium; Delimited Text; MARC; XML; /xrecord; XML Server; OAI Harvester; Metadata Builder; Content Pro]
XML Server
XML server query string (search for title “xslt”):
http://magic.msu.edu/xmlopac/?xml=<WXREQ_ROOT><KEY>txslt</KEY></WXREQ_ROOT>
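The query string above can be assembled programmatically. The following is a minimal Python sketch, not the presenters' code. It assumes that the leading "t" in the KEY value is a one-letter index prefix for a title search (inferred from the slide's example only), and the function names are hypothetical. It builds the URL without sending any request.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

BASE_URL = "http://magic.msu.edu/xmlopac/"  # taken from the slide


def build_query(term, index="t"):
    """Return the WXREQ_ROOT request XML.

    Assumption: `index` is a one-letter index prefix ("t" = title),
    inferred from the slide's example key "txslt".
    """
    return f"<WXREQ_ROOT><KEY>{index}{escape(term)}</KEY></WXREQ_ROOT>"


def build_url(term, index="t"):
    """Return the full query URL with the XML percent-encoded."""
    return f"{BASE_URL}?xml={quote(build_query(term, index), safe='')}"


print(build_query("xslt"))
print(build_url("xslt"))
```

Percent-encoding the angle brackets is a defensive choice for use outside a browser address bar; the slide shows the unencoded form.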
XSLT
Extensible Stylesheet Language Transformation
Current version: 2.0
“Transformation” means:
– Manipulation of XML documents by creating a new document based on the original document
Usages in library context:
– Crosswalking
• Data selection and manipulation
– Web display
• Example: converting EAD into HTML for web display
XSLT
Uses XPath expressions to select/filter data nodes
– By name of “Element”
• <xsl:for-each select="marc:leader">
– By value of “Element” and/or “Attribute”
• <xsl:for-each select="marc:datafield[@tag=650 and @ind2='0']">
• <xsl:if test="$leader7='c'">
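The same two kinds of selection (by element name, and by attribute value) can be tried outside an XSLT processor. The sketch below uses Python's standard `xml.etree.ElementTree`, whose XPath subset has no `and` operator, so the two attribute tests are written as chained predicates. The sample record is made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical MARCXML record, for illustration only.
SAMPLE = """<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim">
  <marc:leader>00000cam a2200000 a 4500</marc:leader>
  <marc:datafield tag="650" ind1=" " ind2="0">
    <marc:subfield code="a">Sunday schools</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="650" ind1=" " ind2="2">
    <marc:subfield code="a">Religious Education</marc:subfield>
  </marc:datafield>
</marc:record>"""

record = ET.fromstring(SAMPLE)

# Selection by element name (compare select="marc:leader").
leader = record.find("marc:leader", NS).text

# Selection by attribute values (compare @tag=650 and @ind2='0').
lcsh_fields = record.findall("marc:datafield[@tag='650'][@ind2='0']", NS)

# Leader position 07 (zero-based), the value a test like $leader7='c' inspects.
leader7 = leader[7]

print(leader7, [f.find("marc:subfield", NS).text for f in lcsh_fields])
```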
Case Study One
Sunday School Books Collection
– 19th century publications by religious societies
– 170 titles digitized and cataloged
Data conversion needs
– Source: Millennium
– Target: Content Pro
– Conversions in:
• Format: .marc to XML
• Schema and Data Structure: MARC to Qualified Dublin Core
Options for Data Migration
[Diagram: candidate migration paths from Millennium to Content Pro (QDC). Labels: Create Lists; MARC File; MARCEdit; MARCXML; HTTP Query; Innovative XML; XSLT]
Segment of Innovative XML
Siblings
MARC field/subfield as value of element
Field indicator as value of element
Segment of MARC21XML
Parent-Child
MARC field/subfield as value of element attribute
Field indicator as value of element attribute
Segment of MARC21XML
Issues with Innovative XML data conversion needs
– Data structured differently from MARC21XML
• Availability of existing “Innovative XML to DC/QDC” XSLT?
– Not optimized for data manipulation
• Complications in data selection
» Selection of data node by matching criteria against values in individual elements
» A series of matching may be needed for selecting just one node
• Efficiency in processing
» Multiple upward, downward, and lateral movement involved in data selection
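To illustrate why a sibling layout complicates selection, here is a rough Python sketch comparing the two structures. The element names in the flat sample (`tag`, `ind1`, `ind2`, `code`, `data`) are invented for this example and are NOT the actual Innovative XML element names. The nested sample is a simplified, namespace-free imitation of MARC21XML, and the record content is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat layout: field tag, indicators, and subfields are siblings.
FLAT = """<record>
  <tag>245</tag><ind1>1</ind1><ind2>0</ind2>
  <code>a</code><data>Little Mary</data>
  <tag>650</tag><ind1> </ind1><ind2>0</ind2>
  <code>a</code><data>Sunday schools</data>
  <code>y</code><data>19th century</data>
</record>"""

# Simplified parent-child layout in the style of MARC21XML.
NESTED = """<record>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Little Mary</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Sunday schools</subfield>
    <subfield code="y">19th century</subfield>
  </datafield>
</record>"""


def select_flat(record, tag, code):
    """Walk siblings in order, remembering the most recent tag and code."""
    values, current_tag, current_code = [], None, None
    for element in record:
        if element.tag == "tag":
            current_tag = element.text
        elif element.tag == "code":
            current_code = element.text
        elif element.tag == "data" and (current_tag, current_code) == (tag, code):
            values.append(element.text)
    return values


def select_nested(record, tag, code):
    """One path expression is enough when subfields are children of the field."""
    path = f"datafield[@tag='{tag}']/subfield[@code='{code}']"
    return [element.text for element in record.findall(path)]


print(select_flat(ET.fromstring(FLAT), "650", "y"))
print(select_nested(ET.fromstring(NESTED), "650", "y"))
```

In the flat case each value can only be identified by looking back at earlier siblings, which corresponds to the series of matches and lateral movement described above.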
Final Path of Data Migration
[Diagram: final migration path. Labels: Millennium (.marc); Create Lists; MARC File; MARCEdit; MARCXML; XSLT; Content Pro (QDC)]
Design of XSLT
Based on LC’s “MARC To Simple DC” XSLT
– Customized mappings according to LC’s suggestions
– Crosswalking strategies
• Conditional processing (i.e. matching)
» boolean(), contains(), starts-with()
» <xsl:if>, <xsl:choose>, <xsl:when>
• String manipulation
» Used in both conditional processing and data selection for output
» substring(), substring-before(), substring-after(), translate(), concat(), normalize-space()
Design of XSLT
Conditional Processing & String Manipulation in De-duplication
<xsl:for-each select="marc:datafield[@tag=246]/marc:subfield[@code='a']">
  <xsl:if test="not(contains($dataField245Lower,
      translate(substring(normalize-space(.),1,string-length()-1),
      $upperCase,$lowerCase)))">
    <xsl:element name="dcterms:alternative">
      <xsl:value-of select="normalize-space(substring(.,1,string-length()-1))"/>
    </xsl:element>
  </xsl:if>
</xsl:for-each>
Converts 245 & 246 to lower case before comparing
Chops trailing period (.)
Compares MARC 246 against MARC 245
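The de-duplication logic can be restated as a short Python sketch for readers who want to experiment with it outside XSLT. It is a simplification, not a port: the XSLT drops the last character unconditionally, while this version strips it only when it is a period (an assumption about the intent). The function names and sample titles are hypothetical.

```python
def normalize_space(text):
    """Collapse runs of whitespace and trim, like XPath normalize-space()."""
    return " ".join(text.split())


def chop_trailing_period(text):
    """Normalize whitespace, then drop one trailing period if present."""
    text = normalize_space(text)
    return text[:-1] if text.endswith(".") else text


def alternative_titles(field_245, fields_246):
    """Keep only the 246 values that do not already appear inside the 245 text."""
    title_lower = normalize_space(field_245).lower()
    kept = []
    for value in fields_246:
        candidate = chop_trailing_period(value)
        # Case-insensitive containment test, as in the XSLT's translate() call.
        if candidate.lower() not in title_lower:
            kept.append(candidate)
    return kept


print(alternative_titles(
    "The child's  own book : stories for Sunday.",
    ["Stories for Sunday.", "Child's gift."],
))
```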
Design of XSLT
Predicate
• Used for data selection and de-duplication
<!-- Output MARC 650y as <dcterms:temporal> -->
<xsl:for-each select="marc:datafield[@tag=650 and @ind2='0']
    [not(marc:subfield[@code='y'] = preceding-sibling::marc:datafield[@tag=650 and @ind2='0']/marc:subfield[@code='y'])]/
    marc:subfield[@code='y']">
  <xsl:element name="dcterms:temporal">
    <xsl:value-of select="normalize-space(self::node())"/>
  </xsl:element>
</xsl:for-each>
Selects LCSH only
Selects unique 650$y only
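The effect of the predicate (LCSH headings only, each 650$y value emitted once) can be approximated in Python as below. This is a simplified sketch rather than an exact equivalent of the XPath: it de-duplicates individual $y values in document order instead of filtering whole datafields. The sample record is made up.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical record with a repeated chronological subdivision.
SAMPLE = """<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim">
  <marc:datafield tag="650" ind1=" " ind2="0">
    <marc:subfield code="a">Sunday schools</marc:subfield>
    <marc:subfield code="y">19th century</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="650" ind1=" " ind2="0">
    <marc:subfield code="a">Children's stories</marc:subfield>
    <marc:subfield code="y">19th century</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="650" ind1=" " ind2="2">
    <marc:subfield code="y">20th century</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="650" ind1=" " ind2="0">
    <marc:subfield code="y">18th century</marc:subfield>
  </marc:datafield>
</marc:record>"""


def unique_temporal(record):
    """Return distinct 650$y values from LCSH fields (ind2='0'), in document order."""
    path = "marc:datafield[@tag='650'][@ind2='0']/marc:subfield[@code='y']"
    seen, values = set(), []
    for subfield in record.findall(path, NS):
        value = " ".join((subfield.text or "").split())
        if value and value not in seen:
            seen.add(value)
            values.append(value)
    return values


print(unique_temporal(ET.fromstring(SAMPLE)))
```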
Design of XSLT
Hard-coding
Inserted elements that are global to all records
<!-- Output <dc:format>application/pdf</dc:format> -->
<xsl:element name="dc:format">
  <xsl:text>application/pdf</xsl:text>
</xsl:element>
Case Study Two
Existing New Book List
– Newly cataloged books for browse shelf
– New approach using XML and XSLT
New features design
– Sorting
– RSS feed
– Customization
New Book List Based on XML File
Millennium XML server outputs two files
– Entire new book list over a rolling period of time
– List of daily added books
New Book List program output
– Book List in HTML format
– RSS feed for daily added books
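As a rough illustration of the sorting and RSS features, the sketch below builds a minimal RSS 2.0 feed from a list of books sorted by title. It is a Python stand-in, not the presenters' XSLT or PHP. The input structure (dictionaries with `title` and `link` keys), the function name, and the example.org URLs are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def build_rss(books, channel_title, channel_link):
    """Return an RSS 2.0 document (as a string) with one item per book, sorted by title."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Books added today"
    # Case-insensitive sort by title.
    for book in sorted(books, key=lambda b: b["title"].lower()):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = book["title"]
        ET.SubElement(item, "link").text = book["link"]
    return ET.tostring(rss, encoding="unicode")


# Hypothetical daily list; real data would come from the XML server output.
daily_books = [
    {"title": "XSLT basics", "link": "http://example.org/record/2"},
    {"title": "Cataloging today", "link": "http://example.org/record/1"},
]

feed = build_rss(daily_books, "New Books", "http://example.org/newbooks")
print(feed)
```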
Path of Data Processing
[Diagram: path of data processing. Labels: Millennium; EXPECT; XML output; XSLT; Web Server & php; Internet]
Observations and Challenges
Millennium System and XML
– XSLT processor within Millennium and customizing Innovative XML output
Using XML as data source
– Large XML file size
XSLT and data processing
– XSLT data manipulation
– Lack of built-in functions for conditional data looping etc.