XML Basics for Digital Humanists
Alabama Digital Humanities Center
September 19 & 23, 2011
Instructor: Shawn Averkamp, Metadata Librarian, smaverkamp@ua.edu
What is XML?
eXtensible Markup Language
Language
• XML is a language for structuring data. (Other methods of structuring data: databases, Excel spreadsheets, etc.)
• Not a data model, but a way of encoding a data model or knowledge domain so that it is machine-processable.
• XML is composed of syntax rules (just like any other language).
Markup
• XML uses “markup” to structure data.
• XML uses labels within angle brackets (like in HTML) to “tag” text.
Ingredients
3 avocados
1/4 cup onions
1/4 teaspoon garlic salt
12 corn tortillas
1 bunch fresh cilantro leaves
jalapeno pepper sauce
<ingredients>
  <ingredient qty="3">avocados</ingredient>
  <ingredient qty="1/4" unit="cup">onions, diced</ingredient>
  <ingredient qty="1/4" unit="t">garlic salt</ingredient>
  <ingredient qty="12">corn tortillas</ingredient>
  <ingredient qty="1">fresh cilantro leaves</ingredient>
  <ingredient>jalapeno pepper sauce</ingredient>
</ingredients>

Elements = things we care about (here, <ingredients> and <ingredient>)
Attributes = properties of those things (here, qty and unit)
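To see what "machine-processable" means for the recipe markup, here is a minimal sketch that reads the elements and attributes with Python's standard-library ElementTree parser. Python is not part of the workshop (which uses Oxygen); the names RECIPE and read_ingredients are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The recipe markup from the slide, with straight quotes around attribute values.
RECIPE = """\
<ingredients>
  <ingredient qty="3">avocados</ingredient>
  <ingredient qty="1/4" unit="cup">onions, diced</ingredient>
  <ingredient qty="1/4" unit="t">garlic salt</ingredient>
  <ingredient qty="12">corn tortillas</ingredient>
  <ingredient qty="1">fresh cilantro leaves</ingredient>
  <ingredient>jalapeno pepper sauce</ingredient>
</ingredients>
"""


def read_ingredients(xml_text):
    """Return (name, qty, unit) for each <ingredient> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.findall("ingredient"):
        # Element content is the thing itself; attributes are its properties.
        # .get() returns None when an attribute is absent.
        rows.append((item.text, item.get("qty"), item.get("unit")))
    return rows


for name, qty, unit in read_ingredients(RECIPE):
    print(name, qty, unit)
```

An ingredient without a qty or unit attribute simply comes back with None in that position, which is one reason optional properties are convenient to model as attributes.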
eXtensible
• You can extend your data model with other XML data models (“schemas”).
<mods>
  <titleInfo>
    <title>Pac-man shaped magnetic tunnel junctions for magnetic flip flops for space applications</title>
  </titleInfo>
  <name type="personal">
    <namePart>Red Ghost</namePart>
    <role>
      <roleTerm>Author</roleTerm>
    </role>
  </name>
  <name type="personal">
    <namePart>Dot Chomper</namePart>
    <role>
      <roleTerm>Advisor</roleTerm>
    </role>
  </name>
  <abstract>Pac-man shaped magnetic tunnel junctions are proposed for CMOS-based magnetic flip flops for space applications…</abstract>
  <extension>
    <etd:degree>Ph.D.</etd:degree>
    <etd:discipline>Electrical and Computer Engineering</etd:discipline>
  </extension>
</mods>
The etd schema (the elements prefixed with etd:, shown in red on the slide) “extends” the mods schema.
Where is XML?
XML drives applications and information you use every day:
• RSS (Really Simple Syndication) feeds for blogs, podcasts, and more
• iTunes stores your music library metadata and usage data in XML
• Google uses XML to display geographic data in Google Maps and Earth (more info: http://code.google.com/apis/kml/documentation/kml_tut.html)
What’s XML good for?
• Sharing/exchanging data online
• Storing data
• Controlling data display
• Syndication
The XML Family
XML: The document language
XPath: Language for navigating XML documents
XSD: Schema language
XSLT (Extensible Stylesheet Language Transformations): Language for transforming XML into other formats (HTML, text, other XML documents)
XQuery: Language for querying XML (similar to SQL database querying)
XForms: Language for creating web input forms
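To give a feel for what "navigating" a document with XPath looks like, here is a small sketch using Python's standard-library ElementTree, which supports only a limited subset of XPath. The sample data is the workshop's movies example; the names MOVIES, titles_from_year, and title_for_id are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# A compact copy of the workshop's movies example.
MOVIES = """\
<movies>
  <movie id="1"><title>The Green Mile</title><year>1999</year></movie>
  <movie id="2"><title>Taxi Driver</title><year>1976</year></movie>
  <movie id="3"><title>The Matrix: Revolutions</title><year>2004</year></movie>
  <movie id="4"><title>Shrek II</title><year>2004</year></movie>
</movies>
"""

root = ET.fromstring(MOVIES)


def titles_from_year(movies_root, year):
    """Titles of movies whose <year> child equals the given year."""
    # Path: every <movie> child with a <year> child of that text, then its <title>.
    path = f"movie[year='{year}']/title"
    return [title.text for title in movies_root.findall(path)]


def title_for_id(movies_root, movie_id):
    """Title of the movie whose id attribute matches, or None if there is none."""
    # @id addresses an attribute rather than a child element.
    return movies_root.findtext(f"movie[@id='{movie_id}']/title")


print(titles_from_year(root, 2004))
print(title_for_id(root, 2))
```

Full XPath (as available in Oxygen or in XSLT and XQuery processors) offers many more functions and axes than the subset used here.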
XML in the Humanities
• TEI
  – Shakespeare Quartos Archive: http://www.quartos.org/
  – Lewis & Clark Journals: http://lewisandclarkjournals.unl.edu/
• Syriac Reference Portal: http://www.syriac.ua.edu/
Getting Started
• Open Oxygen
• Open the movies.xml example (in the sample.xpr sidebar on the left) or paste the code below into a new document

<?xml version="1.0" encoding="UTF-8"?>
<movies>
  <movie id="1">
    <title>The Green Mile</title>
    <year>1999</year>
  </movie>
  <movie id="2">
    <title>Taxi Driver</title>
    <year>1976</year>
  </movie>
  <movie id="3">
    <title>The Matrix: Revolutions</title>
    <year>2004</year>
  </movie>
  <movie id="4">
    <title>Shrek II</title>
    <year>2004</year>
  </movie>
</movies>
Well-formedness
XML documents must be “well-formed” to be machine-readable.
• XML documents must have a root element
• XML elements must have a closing tag
• XML tags are case sensitive
• XML elements must be properly nested
• XML attribute values must be quoted
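Each of these rules can be tried out by handing a deliberately broken snippet to a parser. The sketch below uses Python's standard-library ElementTree rather than Oxygen's well-formedness check; is_well_formed and SAMPLES are hypothetical names chosen for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if the text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


# One sample per rule; only the first is well-formed.
SAMPLES = {
    "well-formed": '<movie id="1"><title>Taxi Driver</title></movie>',
    "two root elements": '<movie id="1"/><movie id="2"/>',
    "missing closing tag": '<movie id="1"><title>Taxi Driver</movie>',
    "case mismatch": '<movie id="1"><Title>Taxi Driver</title></movie>',
    "improper nesting": '<movie id="1"><title><year>1976</title></year></movie>',
    "unquoted attribute": "<movie id=1><title>Taxi Driver</title></movie>",
}

for label, text in SAMPLES.items():
    ok, message = is_well_formed(text)
    print(label, "->", "OK" if ok else message)
```

The parser stops at the first problem it finds, so a document with several errors has to be fixed and re-checked one error at a time, much as in Oxygen.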
Exercise 1
Copy and paste the following code into a new XML document in Oxygen. Correct all errors necessary to make this a well-formed XML document.

<movie id=1>
  <title>The Green Mile<title>
  <year>1999</year>
</movie>
<movie id="2">
  <title>Taxi Driver</title>
  <year>1976</year>
</movie>
<movie id="3">
  <title>The Matrix: Revolutions</title>
  <Year>2004</year>
</movie>
<movie id="4">
  <title>Shrek II</title>
  <year>2004</movie>
</year>
<!-- Comments -->
Enclose comments within double-hyphen/angle bracket notation:
<!-- a brief comment -->

<!--
This is a very long block of comments…
… … … more comments… … … comments…
(still more comments here…)
-->
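Comments are for human readers, and a parser normally skips them. The following sketch, again using Python's standard-library ElementTree as a stand-in for any XML processor, shows that a "commented out" element disappears from the parsed data. DOC and visible_titles are names invented for this example.

```python
import xml.etree.ElementTree as ET

# The second movie is "commented out", so it is not part of the data.
DOC = """\
<movies>
  <!-- a brief comment -->
  <movie id="1"><title>The Green Mile</title></movie>
  <!--
  <movie id="2"><title>Taxi Driver</title></movie>
  -->
</movies>
"""


def visible_titles(xml_text):
    """Titles of the <movie> elements the parser actually sees."""
    root = ET.fromstring(xml_text)  # by default, comments are dropped while parsing
    return [movie.findtext("title") for movie in root.findall("movie")]


print(visible_titles(DOC))
```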
5 special symbols
To use the following characters in a text value, you must replace them with these entities:
&    &amp;
<    &lt;
>    &gt;
"    &quot;
'    &apos;
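When XML is generated by a program instead of typed by hand, the replacement is usually done by a helper function. As a minimal sketch, Python's standard library provides xml.sax.saxutils.escape, which handles &, <, and > by default and accepts extra replacements for the two quote characters; make_title_element and escape_all are hypothetical helpers written for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def make_title_element(title):
    """Wrap raw text in a <title> element, escaping &, < and >."""
    return "<title>" + escape(title) + "</title>"


def escape_all(text):
    """Escape all five special characters, including both kinds of quotes."""
    return escape(text, {'"': "&quot;", "'": "&apos;"})


raw = "Lewis & Clark: <Journals>"
xml_text = make_title_element(raw)
print(xml_text)

# Parsing the escaped text gives back the original characters.
print(ET.fromstring(xml_text).text)
```

The quote entities matter mainly inside attribute values, where an unescaped quote would end the value early.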
Exercise 2
In your movies.xml document, add another movie to the collection. Add a comment somewhere in the document (or “comment out” a block of elements). When you’ve finished, check for well-formedness (blue check icon).
XML Schemas
Schemas describe the syntax rules for encoding a data model in XML:
– Allowable elements, attributes, and values
– Element types: simple or complex
  • Simple – contains a value
  • Complex – contains other elements
– Constraints on elements, attributes, and values
  • Repeatability (how many instances of each element are allowed)
  • Obligation (is the element or attribute mandatory?)
– Datatypes of values (integer, string, date, etc.)
<movies xmlns="http://example.com/schema.xsd">
  <movie id="1">
    <title>The Green Mile</title>
    <year>1999</year>
  </movie>
  <movie id="2">
    <title>Taxi Driver</title>
    <year>1976</year>
  </movie>
  <movie id="3">
    <title>The Matrix: Revolutions</title>
    <year>2004</year>
  </movie>
  <movie id="4">
    <title>Shrek II</title>
    <year>2004</year>
  </movie>
</movies>
XML Schemas
• Schemas are themselves XML files but with a .xsd file extension.
• In our XML document, we reference the schema by using a “namespace”
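Because a schema is itself XML, it can be opened and read like any other XML document. The sketch below embeds a hypothetical schema for the movies example (it is not a file distributed with the workshop, and the choice of datatypes is an assumption) and lists the elements it declares. Python's standard-library ElementTree only parses the schema; it does not validate documents against it, which is what Oxygen's validation does.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"  # namespace of the XSD language itself

# Hypothetical schema for the movies example; names and datatypes are assumptions.
MOVIES_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/schema.xsd"
           xmlns="http://example.com/schema.xsd"
           elementFormDefault="qualified">
  <xs:element name="movies">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="movie" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <xs:element name="year" type="xs:gYear"/>
            </xs:sequence>
            <xs:attribute name="id" type="xs:positiveInteger" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

schema = ET.fromstring(MOVIES_XSD)


def declared_elements(schema_root):
    """(name, type) for every xs:element declaration, in document order."""
    tag = "{" + XS + "}element"
    return [(el.get("name"), el.get("type")) for el in schema_root.iter(tag)]


for name, datatype in declared_elements(schema):
    # Complex elements (movies, movie) have no simple datatype, so type is None.
    print(name, datatype)
```

In this sketch, maxOccurs expresses repeatability, use="required" expresses obligation, and the type attributes express datatypes.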
Namespaces
The namespace is the unique identifier for the schema.
<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo>
    <title>Pac-man shaped magnetic tunnel junctions for magnetic flip flops for space applications</title>
  </titleInfo>
  ……
</mods>
Namespace prefixes
When two or more schemas are used in an XML document, we use “prefixes” to distinguish between the elements of each.
<mods xmlns="http://www.loc.gov/mods/v3"
      xmlns:etd="http://www.ndltd.org/standards/metadata/etdms/1.0/">
  ……
  <dateIssued>2011</dateIssued>
  <extension>
    <etd:degree>Ph.D.</etd:degree>
    <etd:discipline>Electrical and Computer Engineering</etd:discipline>
  </extension>
</mods>
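Software that reads a namespaced document has to name the namespace of every element it looks for, including elements in the default (unprefixed) namespace. Here is a minimal sketch with Python's standard-library ElementTree; the record is a shortened version of the MODS example (with dateIssued placed inside originInfo), and NS and RECORD are names chosen for this illustration. The prefixes in NS are local to the script and only need to map to the same namespace URIs as the document.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map used for searching.
NS = {
    "mods": "http://www.loc.gov/mods/v3",
    "etd": "http://www.ndltd.org/standards/metadata/etdms/1.0/",
}

RECORD = """\
<mods xmlns="http://www.loc.gov/mods/v3"
      xmlns:etd="http://www.ndltd.org/standards/metadata/etdms/1.0/">
  <originInfo>
    <dateIssued>2011</dateIssued>
  </originInfo>
  <extension>
    <etd:degree>Ph.D.</etd:degree>
    <etd:discipline>Electrical and Computer Engineering</etd:discipline>
  </extension>
</mods>
"""

root = ET.fromstring(RECORD)

# The parser stores each element name together with its namespace URI.
print(root.tag)

# Elements in the default namespace still need a prefix when searching.
date_issued = root.findtext("mods:originInfo/mods:dateIssued", namespaces=NS)
degree = root.findtext("mods:extension/etd:degree", namespaces=NS)
print(date_issued, degree)

# Without the namespace, the same path finds nothing.
print(root.findtext("extension/degree"))
```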
Valid XML
To be “valid”, an XML document must:
• Be well-formed
• Include the schema declaration in the root element (e.g., <mods xmlns="http://www.loc.gov/mods/v3">)
• Conform to the rules of the schema
Exercise 3
Copy and paste the code on the next slide into a new XML document in Oxygen. Add a <name> element to the document, then validate (red check icon). If it validates, then introduce an error into your document to see what error messages Oxygen gives you.
<mods xmlns="http://www.loc.gov/mods/v3"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:etd="http://www.ndltd.org/standards/metadata/etdms/1.0/"
      xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-4.xsd"
      version="3.4">
  <titleInfo>
    <title>Pac-man shaped magnetic tunnel junctions for magnetic flip flops for space applications</title>
  </titleInfo>
  <name type="personal">
    <namePart>Red Ghost</namePart>
    <role>
      <roleTerm>Author</roleTerm>
    </role>
  </name>
  <name type="personal">
    <namePart>Dot Chomper</namePart>
    <role>
      <roleTerm>Advisor</roleTerm>
    </role>
  </name>
  <abstract>Pac-man shaped magnetic tunnel junctions are proposed for CMOS-based magnetic flip flops for space applications…</abstract>
  <originInfo>
    <dateIssued>2011</dateIssued>
  </originInfo>
  <extension>
    <etd:degree>Ph.D.</etd:degree>
    <etd:discipline>Electrical and Computer Engineering</etd:discipline>
  </extension>
</mods>
Using and creating schemas
• Always start with the data model!
• Decide what entities and properties are important to you and your project before choosing or creating a schema.
Things to consider
• Are there existing schemas that meet your needs?
• Are there commonly used schemas within your field?
• If you find a schema that almost meets your needs, can you extend it to cover the entire scope of what you want to model?
• Who (or what software applications) will you be sharing the data with?
• What kind of functionality do you want to support? Indexing? Flexible display? Visualizations?
Tailor schemas to meet your needs
• You can make schema rules more strict (but not more lax)
• Extend schemas with other schemas (Your primary schema must allow extensions)
• If you expect use of your XML data to be very limited, you can change the schema. (Not recommended if you plan to share your data widely or beyond your own software applications)
Documentation
• Data dictionaries, markup guidelines, best practices are important, especially if you have assistants entering your data.
• Examples of documentation:
  – MODS guidelines: http://www.loc.gov/standards/mods/userguide/generalapp.html
  – UVa Library TEI guidelines: http://www.lib.virginia.edu/digital/reports/teiPractices/dlpsPractices_postkb.html
Exercise 4
Work together to create a data model for a dictionary (or a knowledge domain of your choosing). What should the root element be? What are the elements that will be contained within the root? What are the attributes* (properties) of each of your elements?
Create an instance of your data model in XML. What adjustments or enhancements would you need to make for your schema to be extensible?
*How do you know when something should be an attribute or an element? There is often no wrong answer to this. Use your best judgment—if you think you will not need to further refine a property (for instance, in our recipe example we would not need to refine quantity or unit any further), an attribute is probably the best choice.
Resources
• Books, tutorials, and other resources: http://www.lib.ua.edu/digitalhumanities/xml-resources
• http://www.xml.com/