Class 5 – Methods of description, representation and classification
Exercise overview
Last week we explored cataloging models and learned about different types of metadata. We explored the
world of cataloging principles and got a glimpse of how these principles inform metadata
representation and encoding standards. We learned about the development of cataloging practices
and standards in libraries, museums and archives (LAM) and the challenges associated with
developing and maintaining standards that interoperate in these different domains.
This week we will continue exploring our metadata model by learning about a metadata standard
called Dublin Core and finding out more about the MARC bibliographic standard. Our focus this week
is on data fields/structure and data content/values and how these metadata schemas and storage
mechanisms help us create and work with representations. In this class we will compare what we
learned about RDA/AACR2 with our knowledge of Dublin Core to evaluate the suitability of metadata
standards to an information need. Next week we will begin exploring how these metadata standards
are encoded or represented in digital documents.
Instructions:
Work individually or in groups to complete the worksheet. When you get to a section that requires
you to select a resource to explore – pick one resource (please don’t always choose the first one!).
When asked to ‘discuss as a group’, consider your response and continue completing the worksheet.
We’re going to work with computer code today, so here is an important note as you follow the exercises. Computer code is shown on numbered lines enclosed in boxes. The
line numbers are simply a reference aid during instruction and should not be copied into
your program. For example, a line that reads 56. p { visibility:hidden; } should simply be typed in as
p { visibility:hidden; }
Metadata Standards and Web Services Page 1
Erik Mitchell
Suggested readings
1. Mitchell, E. (2013). Chapter 1: Metadata developments in libraries and other cultural heritage institutions. In Library Linked Data: Research and Adoption, Library Technology Reports, 49(5), July/August 2013.
2. Review: Understanding Metadata. NISO Press. http://www.niso.org/publications/press/UnderstandingMetadata.pdf
3. Powell & Johnston. (2007). Guidelines for implementing Dublin Core in XML. http://dublincore.org/documents/dc-xml-guidelines/
4. Read/skim: Taylor, Arlene G. and Daniel N. Joudrey. (2009). “Systems for Vocabulary Control.” The Organization of Information. 3rd Edition.
5. Read / Skim to supplement RDA description. Tillett, Barbara. What is FRBR? http://www.loc.gov/cds/downloads/FRBR.PDF
6. Read section 0 of RDA - RDA Toolkit - Section 0, Introduction, 1–12. Retrieved from http://access.rdatoolkit.org. See TXT file from this week for login information
7. Good resource for cataloging rules. https://sites.google.com/site/opencatalogingrules/
8. Gilliland, Anne J. (2012). Setting the Stage. http://www.getty.edu/research/publications/electronic_publications/intrometadata/setting.html
9. Explore: Introduction to the Dewey Decimal Classification http://www.oclc.org/dewey/versions/ddc22print/intro.pdf
10. Explore: Introduction to Library of Congress Subject Headings http://www.tulane.edu/~techserv/lcsh%20introd.html
11. Explore: Library of Congress Main Classes, http://www.loc.gov/catdir/cpso/lcco/
12. Harden, Jean. (2012). Inadvertent RDA: New Catalogers’ Errors in AACR2. Journal of Library Metadata. 12:2-3. http://www.tandfonline.com.proxy-um.researchport.umd.edu/doi/full/10.1080/19386389.2012.700597
Understanding types of metadata
Take a few minutes to explore/review the definitions of metadata as found in our NISO reading from
last week (http://www.niso.org/publications/press/UnderstandingMetadata.pdf). Fill out Table 1
describing what purpose these types of metadata serve.
Table 1 Metadata types
Metadata type Metadata use
Administrative
Descriptive
Preservation
Structural
Technical
Key Questions
Question 1. What type of metadata is most often used in creating library online catalogs?
Question 2. The NISO document comments on different uses of metadata. In addition to
discovery what other uses are mentioned and which types of metadata would be useful for
each function?
Question 3. In addition to Dublin Core, the NISO document mentions a number of other metadata
standards and their functions. Using the document as your guide, fill out the table below:
Table 2 Metadata standards, their function and examples
Metadata standard Primary type (e.g. Descriptive, structural…) Example application
Dublin Core
Text Encoding Initiative (TEI)
Metadata Encoding and Transmission Standard (METS)
Metadata Object Description Schema (MODS)
Encoded Archival Description (EAD)
Visual Resources Association (VRA Core)
As you can see, metadata standards can fit both very general and specific uses, can be used for any
range of needs and can be loosely or strictly structured. The process of designing a metadata
standard or selecting one often involves a process akin to data modeling in which you design or
select a database or metadata standard that meets your information needs.
Conceptual model
With an understanding of the principles underlying the process of information organization, let’s turn our attention to models that help us understand the elements of representation. For a more in-depth discussion of the building blocks of metadata, you can refer to Chapter 4 in Metadata Standards and Web Services in Libraries, Archives, and Museums.
Step 1: Spend a few minutes with the Mitchell article, paying attention to the tables on page 9.
Then complete Table 3 and answer the following questions:
Key Questions
Question 4. What are the five elements of a metadata standard, what is an example standard and
what need does each element fill?
Table 3 Exploration of metadata standards elements
Element Example What need does the element fill?
Data Model
Content rules
Metadata schema / vocabularies
Data serialization
Data exchange
Question 5. What are some of the challenges in metadata confronting libraries, archives and
museums identified in the Mitchell article?
Reflect back on the cataloging principles studied in class 4 (e.g. Paris Principles, IFLA, ISAD(G)).
These principles are foundational to the entire model represented here in that they inform the data
model required to represent relationships, specify rules for creating content, suggest metadata
schema requirements and identify ways in which the created metadata should be stored, exchanged
and used. Consider, for example, the inherent complexity in the IFLA guidelines as compared to the
Paris Principles. In order to create and manage new relationships between Works, Expressions,
Manifestations and Items we need to think about each of these metadata building blocks and
consider how they need to change to accommodate those relationships.
Data modeling
Data modeling is the process of designing or selecting an information system that is capable of
storing the metadata that you need to adequately represent your resource. Imagine the data model
required to store all of the information that YouTube needs to keep track of the videos, users,
comments, ratings and license data. Data modeling may involve a relational database like MySQL, a
text-file format like MARC (MAchine-Readable Cataloging) or a graph database (Note: a graph
database stores specific relationships between two things, such as “Erik likes the UMD iSchool,” and is
a common structure in social network systems).
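To make the contrast between these data models concrete, here is a minimal Python sketch that stores the same fact two ways: as a row in a table and as a graph statement made of a subject, a relationship and an object. The names likes_table, graph and objects_of are hypothetical and chosen only for illustration; the only fact used is the “Erik likes the UMD iSchool” example from the note above.

```python
# Relational-style model: one table, one row per "like".
# The column names are hypothetical.
likes_table = [
    {"person": "Erik", "organization": "UMD iSchool"},
]

# Graph-style model: every fact is a (subject, relationship, object) statement.
graph = [
    ("Erik", "likes", "UMD iSchool"),
]


def objects_of(statements, subject, relationship):
    """Return every object linked to `subject` by `relationship`."""
    return [o for (s, r, o) in statements if s == subject and r == relationship]


print(objects_of(graph, "Erik", "likes"))
```

In the graph version a new kind of relationship is just another statement, whereas the table version would usually need a new column or table.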
In LIS, data modeling has historically focused on describing books and archival resources but the
increasing adoption of electronic and non-print media has challenged our profession’s ability to
support representations of these objects using our standard metadata systems. This semester we
are going to explore metadata standards that still serve the bulk of book cataloging (e.g. MARC,
AACR2, RDA) and standards, like Dublin Core (DC), that support archival resources. For the scope of
our class, data modeling includes all four of the aspects of the metadata model we explored last week
(fields/structure, content/values, format, data exchange).
Creating metadata
In our work with HTML documents we worked with first-class (aka “primary source”) information
objects. They were not representations or derivatives but rather were used as the ‘endpoint.’ HTML
documents in real life tend to be complex because they contain not only ‘first-class’ information but
also contain representation metadata (e.g. meta tags), contextualization metadata (e.g. <p>, <h1>,
<title> elements) and sometimes contain surrogate metadata (e.g. abstracts and summaries). In
contrast, other metadata standards such as Dublin Core or Qualified Dublin Core are designed to
contain descriptive, administrative, or technical metadata about an information object.
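As a rough illustration of how metadata sits alongside first-class content in an HTML document, the following Python sketch pulls the title and the meta tags out of a page using the standard library html.parser module. The page in SAMPLE_PAGE and the class name MetaTagCollector are invented for this example and do not come from a real website.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and "name" in attributes:
            self.meta[attributes["name"]] = attributes.get("content", "")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical page, invented for this sketch.
SAMPLE_PAGE = """<html><head>
<title>Sample page</title>
<meta name="description" content="A made-up page used to illustrate meta tags.">
<meta name="keywords" content="metadata, example">
</head><body><h1>Sample page</h1><p>First-class content lives here.</p></body></html>"""

collector = MetaTagCollector()
collector.feed(SAMPLE_PAGE)
print(collector.title)
print(collector.meta)
```

Everything the collector ignores (the heading and the paragraph) is the first-class content; what it gathers is the representation metadata.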
Some key uses of descriptive metadata include:
1. Discovery by searching or browsing
2. Identification of resource values and differences
3. Co-location of similar resources
4. Provision of location information
5. Tracking of rights information
6. Differentiation of dissimilar resources
Reflect on the user needs explored in our FRBR model (e.g. Find, Identify, Select, Obtain) and the
connection of these uses to the process of metadata creation. In this class we will explore how a
specific metadata standard, Dublin Core, helps meet these needs.
Identifying metadata
Representations are useful for text-based resources but are almost always essential for media- and image-based
resources, including still images, movies and audio recordings. Representations often, but not always, contain
metadata that describes the information object. Let’s return to one of our readings
for this week – the overview of metadata types by Gilliland.
http://www.getty.edu/research/publications/electronic_publications/intrometadata/setting.html
With this article in mind, review the resource at http://1.usa.gov/qZFg3R. Note: if the resource is
unavailable, you can find PDF files with the printed metadata in the class module.
Step 2: As you look at the resource, notice the tabs for “about this item, obtaining copies, and
access original.” Find the link to the MARC record and use it to see more metadata about
the record. If you have trouble identifying what the MARC fields mean (e.g. 245) – you can
check out http://www.oclc.org/us/en/bibformats/en. Review the page and classify the
metadata on the page according to Gilliland’s types of metadata.
Question 6. What are some example metadata elements for each type of metadata?
a. Descriptive metadata:
b. Technical metadata:
c. Administrative metadata:
d. Structural metadata:
e. Rights metadata:
Exploring representation schemas – Dublin Core and MARC
No matter what type of metadata we are implementing, we need to follow a schema (i.e. a standard
set of elements that enables other people and systems to understand our metadata). One of the most
common schemas in use both in libraries and other information fields is the Dublin Core standard.
Dublin Core consists of 15 core elements that were selected/designed to be applicable to a wide
range of information resources. While the process of working with Dublin Core can get complex, the
basic activity of identifying and assigning metadata to each element is pretty simple. Let’s begin by
exploring this schema and how it implements these different types of metadata.
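One way to picture a simple Dublin Core record is as a set of fifteen named slots, each of which is optional and may hold more than one value. The Python sketch below assumes that picture; the helper names empty_record and unused_elements and the placeholder values are hypothetical and are not taken from any real record.

```python
# The fifteen elements of the simple Dublin Core element set.
DC_ELEMENTS = (
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
)


def empty_record():
    """Return a record with every element present and no values yet."""
    return {element: [] for element in DC_ELEMENTS}


def unused_elements(record):
    """List the elements that have not been given any value."""
    return [element for element in DC_ELEMENTS if not record.get(element)]


record = empty_record()
# Placeholder values, invented for illustration only.
record["title"].append("Example title")
record["subject"].extend(["Example subject A", "Example subject B"])

print(len(DC_ELEMENTS))
print(unused_elements(record))
```

Lists are used for the values because an element such as subject is commonly repeated within one record.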
Question 7. Go to a web browser and retrieve the page http://dublincore.org/documents/dces/.
This is the basic Dublin Core element set: a set of fifteen properties or elements that is
commonly used to describe resources. Review the metadata on the digital image page above
and map the metadata from the LOC image page onto the 15 Dublin Core properties.
Table 4: DC cataloging
Dublin Core property Value from LOC digital image
contributor
coverage
creator
date
description
format
identifier
language
publisher
relation
rights
source
subject
title
type
Key Questions
Review the types of metadata you found and answer the following questions:
Question 8. Are there metadata elements on the LOC page that you were not able to map to one
of the fifteen DC properties? What were they and what type of metadata (e.g. descriptive,
administrative, etc) were they?
Question 9. Were there DC properties that you did not understand or did not use? What were
they?
Dublin Core is a very general standard that does not dive into the detail often found in other metadata
schemas. For example, while Dublin Core has 15 core elements, the MARC metadata schema has
several hundred! In addition, the MARC schema allows us to refine metadata elements through the
use of special attributes (indicators and subfield codes).
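To show what this refinement looks like in practice, here is a small Python sketch of a single MARC field. In MARC 21, field 245 holds the title statement; its first indicator records whether a title added entry is made, its second indicator gives the number of nonfiling characters, and subfields $a, $b and $c hold the title, the remainder of the title and the statement of responsibility. The MarcField class and the sample title are hypothetical and are not drawn from a real catalog record.

```python
from dataclasses import dataclass, field


@dataclass
class MarcField:
    tag: str                 # three-digit field tag, e.g. "245"
    indicator1: str = " "
    indicator2: str = " "
    subfields: list = field(default_factory=list)  # (code, value) pairs, in order

    def display(self) -> str:
        """Render the field in a common tag / indicators / $subfield layout."""
        parts = " ".join(f"${code} {value}" for code, value in self.subfields)
        return f"{self.tag} {self.indicator1}{self.indicator2} {parts}"


# Invented title statement; only the field and subfield meanings come from MARC 21.
title_statement = MarcField(
    tag="245",
    indicator1="1",  # a title added entry is made
    indicator2="4",  # skip 4 nonfiling characters ("The ") when sorting
    subfields=[
        ("a", "The example book :"),
        ("b", "a made-up subtitle /"),
        ("c", "by A. Nonymous."),
    ],
)

print(title_statement.display())
```

A simple Dublin Core record would carry all of this in one title value and one creator value, which is the loss of detail described above.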
One-to-One principle
One of the most complex decisions catalogers have to make when working with digital objects is
which object they are cataloging. In our Library of Congress example we are cataloging a digital
object of some print resource. In addition, we are dealing with only some limited files from that digital
object (notice for example that the administrative metadata discusses a larger collection). Deciding
which resource you are cataloging and then creating metadata that pertains only to that object is
known as the one-to-one principle (http://wiki.dublincore.org/index.php/Glossary/One-to-One_Principle).
While this is an important factor in reducing ambiguity in cataloging, as you may
have seen, it can be difficult to establish and hold to this concept in practice. In fact, some standards
(MODS for example) take a more pragmatic approach and suggest that it is acceptable to mix
description of print and digital objects as long as appropriate metadata elements are used.
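A minimal sketch of the one-to-one principle, assuming a print original and a scanned copy, might keep two separate records and link them rather than mixing their details. All identifiers and values below are invented placeholders, and the helper original_of is hypothetical; the only Dublin Core idea relied on is that the source element points to a resource from which the described resource is derived.

```python
# Two separate records, one per object. All values are invented placeholders.
original_print = {
    "identifier": "example-print-001",
    "title": "Example photograph",
    "format": "photographic print",
    "date": "1905",
}

digital_surrogate = {
    "identifier": "example-digital-001",
    "title": "Example photograph",
    "format": "image/tiff",
    "date": "2004",  # when the scan was made, not when the print was made
    "source": original_print["identifier"],  # points back to the original
}

catalog = {r["identifier"]: r for r in (original_print, digital_surrogate)}


def original_of(record):
    """Follow the 'source' link from a surrogate to the record for its original, if any."""
    return catalog.get(record.get("source"))


print(original_of(digital_surrogate)["date"])
```

Keeping the two dates in separate records avoids the ambiguity of a single record whose date could refer to either object.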
Contents and values
The foundation of good metadata is proper identification of metadata elements and accurate
transcription / translation of metadata from an information resource into those elements. Remember
that data content refers to the formatting of metadata in a field and data values refers to the use of
controlled vocabularies, authority files, subject heading lists and taxonomies as field values. In the
coming weeks we will explore controlled vocabularies in more detail – particularly in the context of
subject heading assignment. For the time being, however, let’s expand our understanding of our
Dublin Core metadata schema by exploring the use of controlled vocabulary and content formatting.
Value schemes in metadata
In addition to defining the elements that make up a metadata schema (e.g. title, creator, contributor),
metadata schemas also include rules for which values are acceptable and how content is formatted. The Dublin Core standard calls
these two types of content control “Vocabulary Encoding Schemes” and “Syntax Encoding Schemes.”
Vocabulary schemes help create good metadata by providing a pre-defined list of acceptable values
for a given metadata element (e.g. language names for the element Language). In contrast, syntax
schemes help create good metadata by defining the formatting of content within a field (e.g. a specific
way of writing the date in the Date element). Vocabulary encoding schemes include subject heading
lists and classification schemes (e.g. Library of Congress Subject Headings (LCSH) and Dewey Decimal Classification (DDC)),
thesauri of place names (e.g. TGN) and Internet media type categories (e.g. MIME types).
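The difference between the two kinds of scheme can be sketched in a few lines of Python: a vocabulary check asks whether a value is on an approved list, while a syntax check asks whether a value has the right shape. The four-code language list and the simplified date pattern below are assumptions made for this example; they stand in for a real controlled vocabulary and a real syntax encoding scheme.

```python
import re

# Tiny stand-in for a controlled vocabulary of language codes
# (an assumption for this sketch, not a complete list).
LANGUAGE_VOCABULARY = {"en", "es", "fr", "de"}

# Simplified date pattern: YYYY, YYYY-MM or YYYY-MM-DD.
# It checks the shape only, not whether the date exists on the calendar.
DATE_SYNTAX = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def follows_vocabulary(value: str) -> bool:
    """Vocabulary check: is the value one of the approved terms?"""
    return value in LANGUAGE_VOCABULARY


def follows_syntax(value: str) -> bool:
    """Syntax check: is the value written in the expected format?"""
    return DATE_SYNTAX.fullmatch(value) is not None


print(follows_vocabulary("en"), follows_vocabulary("English"))
print(follows_syntax("2013-09-30"), follows_syntax("Sept. 30, 2013"))
```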
Step 3: Let’s quickly explore one of these vocabularies – the Thesaurus of Geographic Names
(TGN).
a. Using your web browser, open http://www.getty.edu/vow/TGNSearchPage.jsp
b. Search for Washington DC
Key Questions
Question 10. What is the preferred name in TGN for Washington DC?
Question 11. What are some other names?
Question 12. What “Place Types” does TGN map to Washington DC?
Question 13. What are some other ways to identify Washington DC? Are there any unique
identifiers or system actionable metadata that you might use?
Question 14. How might you use the information contained in this vocabulary to improve the
metadata functions related to FRBR (e.g., Find, Identify, Select, Obtain)?
Content schemes in metadata
Syntax encoding schemes focus on defining how a particular field is formatted. Two fields for which
formatting is particularly important are dc:date and dc:identifier.
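As a hedged illustration of a syntax encoding scheme for dates, the Python sketch below writes one arbitrary example date at three of the levels of detail that the W3C date and time profile allows: year and month, complete date, and complete date plus time with a time zone designator. The function names are hypothetical, and the sketch relies on Python's standard ISO 8601 formatting rather than on any Dublin Core tool.

```python
from datetime import date, datetime, timezone


def w3cdtf_year_month(day: date) -> str:
    """Year and month: YYYY-MM."""
    return f"{day.year:04d}-{day.month:02d}"


def w3cdtf_date(day: date) -> str:
    """Complete date: YYYY-MM-DD."""
    return f"{day.year:04d}-{day.month:02d}-{day.day:02d}"


def w3cdtf_timestamp(moment: datetime) -> str:
    """Complete date plus hours, minutes and seconds, with a time zone offset."""
    if moment.tzinfo is None:
        raise ValueError("a time zone is required for a timestamp")
    return moment.isoformat(timespec="seconds")


example_day = date(2013, 10, 1)  # arbitrary example date
example_moment = datetime(2013, 10, 1, 14, 30, 0, tzinfo=timezone.utc)

print(w3cdtf_year_month(example_day))
print(w3cdtf_date(example_day))
print(w3cdtf_timestamp(example_moment))
```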
Step 4: In order to get a sense of these content formatting schemes, let’s also become acquainted
with the Dublin Core metadata registry, a website that lists all of the fields for Dublin Core.
a. In your web browser go to http://dublincore.org
b. Click on DCMI Specifications and find “The Dublin Core Metadata Registry” on the
webpage.
c. Go to the Metadata Registry and click on the “Browse | Search” link.
d. Select “Syntax Encoding Schemes” and click OK
e. Click on “dcterms:W3CDTF” and explore the link to the W3C document (hint: it is labeled “See” on the
page)
Key Questions
Question 15. How would you format today’s date using the W3C Date and Time Formats
(W3CDTF)?
Question 16. Referring back to the DC metadata registry, pull up the list of properties (also
known as fields or elements in metadata speak). Which elements would be a
good fit for this content scheme?
While we have focused largely on Dublin Core in this exercise, there is a much larger scheme known
as Qualified Dublin Core that contains many more elements. These elements are often refinements
of the simple Dublin Core set. For example, while Dublin Core has the element "date," Qualified
Dublin Core also has "created," "valid," "available," "issued," and "modified" refinements of this
element. As you can see with these elements, QDC metadata moves past descriptive metadata
elements into administrative element types.
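Because each refinement is a narrower form of a simple element, a qualified record can be collapsed back into simple Dublin Core by filing refined values under their parent element, a step Dublin Core documentation sometimes calls dumbing down. The Python sketch below assumes only the five date refinements named above; the function name dumb_down and the sample values are hypothetical.

```python
# The five date refinements named in the text above.
DATE_REFINEMENTS = {"created", "valid", "available", "issued", "modified"}


def dumb_down(qualified_record):
    """Collapse date refinements into the simple 'date' element.

    Elements that are not date refinements pass through unchanged.
    """
    simple = {}
    for element, values in qualified_record.items():
        target = "date" if element in DATE_REFINEMENTS else element
        simple.setdefault(target, []).extend(values)
    return simple


# Invented values, for illustration only.
qualified = {
    "title": ["Example title"],
    "created": ["1905"],
    "issued": ["2004-06-15"],
}

print(dumb_down(qualified))
```

The simple record still holds both dates, but it no longer says which is the creation date and which is the issue date.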
Metadata terminology
Before we become too immersed in metadata land we need to clarify some terminology that we first
learned when working with HTML.
Element: A specific metadata field for which there is a definition and specific use. In Dublin Core,
elements are also referred to as Properties.
Properties: See elements
Fields: See elements (Note that the word 'field' is very common in MARC lingo)
Attributes: A refining concept for an element that gives the main element context. Also called
Classes in Dublin Core, Attributes provide context for the value of an element.
Classes: See Attributes
Values: A value is a string (‘literal’) (e.g. text) or a pointer to a string (“non-literal”) (e.g. a unique
identifier or URI) that belongs to either an element or an attribute. Values are the “data” in metadata.
Metadata services – Creation, extraction, conversion
As the semester progresses we will spend time exploring metadata-rich services. We have already
created one (our JavaScript bookmarklet) which used technical metadata (e.g. the URL) to see if a
web-resource that we had found was available via UMD.
Step 5: Before we wrap this week up, let’s return to our NISO metadata document to get a sense of
what types of metadata services exist.
a. Return to the NISO document and explore pages 10-12 related to metadata creation
and services.
Key Questions
Question 17. In addition to manual metadata creation (i.e. what you did earlier in this
worksheet), what other ways exist to create metadata?
Question 18. The NISO document also explores ways to assess metadata quality. We have
already considered one way (adherence to value and content rules). What are some other
ways to assess quality?
Question 19. The Dublin Core element and class dictionary that we have been exploring in
this exercise is an example of what type of metadata service?
Automatic metadata processes
As our last exercise this week, let’s explore how automated tools can help us create, manipulate,
validate, and index metadata. To do this we will focus on an automatic metadata generation tool.
Step 6: Let’s look at an automatic metadata generation tool. Find the brief URL for the LC resource
we cataloged above (http://1.usa.gov/qZFg3R). Now, visit http://www.ukoln.ac.uk/metadata/dcdot/ and enter the URL
in the metadata harvester box. Review the created metadata and
compare it to your Dublin Core record above.
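If you want to make the comparison systematic, a field-by-field check like the Python sketch below is one option. It sorts element names into those whose values match (ignoring case and surrounding spaces), those that differ, and those that appear in only one record. The function name compare_records and both sample records are hypothetical; the values are placeholders and do not reflect what the harvester actually returns.

```python
def compare_records(manual, automatic):
    """Sort element names into 'same', 'different' and 'only_one'."""
    result = {"same": [], "different": [], "only_one": []}
    for element in sorted(set(manual) | set(automatic)):
        m, a = manual.get(element), automatic.get(element)
        if m is None or a is None:
            result["only_one"].append(element)
        elif m.strip().lower() == a.strip().lower():
            result["same"].append(element)
        else:
            result["different"].append(element)
    return result


# Placeholder values; substitute your own Table 5 entries.
manual = {"title": "Example title", "language": "en", "creator": "Example, Person"}
automatic = {"title": "example title ", "language": "eng"}

print(compare_records(manual, automatic))
```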
Table 5: Manual vs automatic cataloging
Dublin Core property Manually cataloged value Automatically generated value
contributor
coverage
creator
date
description
format
identifier
language
publisher
relation
rights
source
subject
title
type
Key questions
Question 20. Which fields contained identical or similar values?
Question 21. Which fields were widely different? Did the automatic metadata generator do
a better job creating an accurate representation or did you?
Metadata evaluation
While there are a number of measures of metadata quality, in this class we will focus on four that relate
directly to the quality of the metadata in relation to the resource it is representing. These four criteria
are:
1. Specificity (i.e. creating metadata that fits the most specific appropriate level of description)
2. Completeness (i.e. how fully our representation describes our resource)
3. Consistency (i.e. metadata generation and formatting)
4. Accuracy (i.e. is the metadata representation correct?)
Step 7: With these four criteria in mind, take a few moments individually or in your group to
compare the metadata record your group created with the record the automatic metadata
generation tool created. For each criterion, rate the metadata on a scale of 1-3 (1 = poor, 2 =
acceptable, 3 = excellent).
Table 6: Evaluation of manual/automatic
Metadata quality Manually created Automatically generated
Specificity of metadata (Is the metadata appropriately granular?)
Completeness (Is the record complete?)
Consistency (Are fields formatted properly?)
Accuracy (Is the generated metadata correct?)
Total Score
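The total score is simply the sum of the four ratings, so each record can score between 4 and 12. As a small worked sketch in Python, the function below adds the ratings and rejects any value outside the 1-3 scale. The name total_score and the ratings shown are invented for illustration and are not a suggested answer.

```python
CRITERIA = ("specificity", "completeness", "consistency", "accuracy")


def total_score(ratings):
    """Add the four 1-3 ratings; reject missing or out-of-range values."""
    for criterion in CRITERIA:
        if ratings.get(criterion) not in (1, 2, 3):
            raise ValueError(f"{criterion} must be rated 1, 2 or 3")
    return sum(ratings[criterion] for criterion in CRITERIA)


# Invented ratings, for illustration only.
manual = {"specificity": 3, "completeness": 2, "consistency": 2, "accuracy": 3}
automatic = {"specificity": 1, "completeness": 2, "consistency": 3, "accuracy": 2}

print(total_score(manual), total_score(automatic))
```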
Key Questions
Question 22. When you add the scores for each evaluation, which process comes out on
top?
Question 23. Do you agree with the overall score? Why or why not?
Question 24. Were there areas where the manual or automatic processes were better?
Why do you think this is?
Summary
This week we began exploring metadata standards, services and evaluative approaches in more
detail. We found that there are metadata standards for different domains and communities and found
that the particular issues of metadata interoperability, transformation and quality control are
considerable. In the coming weeks we will continue working with our metadata models through the
encoding and transformation processes.