Metadata standards Guidelines, data structures, and file formats to facilitate reliability and quality of description

Outline

• Why create and follow metadata standards?

• What kinds of standards are there?

• How does this all work?

• How do standards evolve?

The world of standards

A standard is any agreed-upon means of doing something.

Standards can be formally created and adopted or merely customary.

With standards, products and processes have a certain level of consistency and predictability that can make production and use more efficient.

Goals of metadata standards

Metadata standards enable more reliable description. For example, if we agree to use separate fields for the first and last names of resource creators, search results can be properly alphabetized by author and read more easily, regardless of whether the first or last name comes first in the display.
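To make the point about separate name fields concrete, here is a minimal Python sketch. The `Creator` class and the `sort_by_family` function are hypothetical names chosen for illustration, not part of any metadata standard. Because given and family names are stored in separate fields, the same records sort correctly by author whichever name order the display uses.

```python
from dataclasses import dataclass


@dataclass
class Creator:
    """A resource creator, with given and family names in separate fields."""
    given: str
    family: str

    def display(self, family_first: bool = False) -> str:
        # Display order is a presentation choice; the stored fields never change.
        if family_first:
            return f"{self.family}, {self.given}"
        return f"{self.given} {self.family}"


def sort_by_family(creators: list[Creator]) -> list[Creator]:
    # Alphabetize on the family-name field, breaking ties on the given name.
    return sorted(creators, key=lambda c: (c.family.casefold(), c.given.casefold()))


creators = [
    Creator("Virginia", "Woolf"),
    Creator("Chinua", "Achebe"),
    Creator("Jane", "Austen"),
]

for c in sort_by_family(creators):
    print(c.display())
for c in sort_by_family(creators):
    print(c.display(family_first=True))
```

Both loops list Achebe, Austen, and Woolf in that order; only the way each name is shown differs.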

Reliable description enables the sharing of data across different systems.

Types of standards

Elings and Waibel describe four types of metadata standards:

• Data structure (fields): MARC and EAD.

• Data content (values): AACR2 (RDA) and DACS.

• Data format: XML.

• Data exchange: Z39.50 and OAI.

These are useful categories, but a standard may straddle them. Take MARC and AACR2: MARC defines data fields in a technical sense, while AACR2 defines the content that populates those fields. But because AACR2 also, to some degree, determines conceptually which fields MARC needs, you could say that MARC reflects AACR2 and not the other way around. In practice, the two become functionally intertwined.

Multiple standards at work

A cataloger uses AACR2 to determine:

• That a book’s title should be part of its description.

• The wording, spelling, capitalization, and punctuation of the title.

The cataloger uses MARC to record the title information in a consistent form that computers can process.
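A rough illustration of what "a consistent form that computers can process" looks like, sketched in Python with the standard library's `xml.etree.ElementTree`. It records a title in MARC field 245 (Title Statement), with subfield a for the title proper and subfield c for the statement of responsibility, serialized as MARCXML, the Library of Congress's XML representation of MARC 21 records. This is a simplified sketch, not a complete record: a real record also carries a leader, control fields, and other data fields, and the helper names `q` and `title_field` are hypothetical. It also shows two types of standards working together: MARC supplies the structure, and XML supplies the format.

```python
import xml.etree.ElementTree as ET

# MARCXML namespace published by the Library of Congress.
MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", MARC_NS)  # serialize without an "ns0:" prefix


def q(name: str) -> str:
    """Qualify an element name with the MARCXML namespace."""
    return f"{{{MARC_NS}}}{name}"


def title_field(title_proper: str, responsibility: str) -> ET.Element:
    """Build a simplified 245 (Title Statement) field.

    Subfield a holds the title proper and subfield c the statement of
    responsibility. Indicator 1 = "1" requests a title added entry;
    indicator 2 = "0" means no leading characters are skipped when filing.
    """
    field = ET.Element(q("datafield"), {"tag": "245", "ind1": "1", "ind2": "0"})
    for code, value in (("a", title_proper), ("c", responsibility)):
        ET.SubElement(field, q("subfield"), {"code": code}).text = value
    return field


record = ET.Element(q("record"))
# ISBD punctuation (" /" and the closing period) travels inside the subfields.
record.append(title_field("Hamlet /", "William Shakespeare."))

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Any system that understands MARCXML can pick out the title proper by looking for field 245, subfield a, without guessing where the title sits in the text.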

Two library systems can use Z39.50 to exchange their MARC catalog records: one system sends a search query, and the other returns the matching records.

The result? A user at Library A can search Library B’s catalog and not discern a difference in the way that information is structured and presented. It just works.

Developing and adopting standards

Organizations agree to adopt standards because the benefits of creating products or services that work together can be great.

However, developing standards and forging that agreement can be a difficult process.

Content standards in particular can be complicated to apply, and they leave plenty of room for interpretation.

Content standards: considerations

Why are content standards so complicated? Because documents are various!

Most content standards will try to implement a few basic guidelines supplemented by rules and options for special cases.

Ideally, the basic guidelines will be based on clearly articulated goals and principles.

Example: RDA goals

RDA has articulated a concrete set of descriptive goals and principles.

A few goals:

• Enable description of any resource (not just printed materials).

• Align with the FRBR conceptual model (works, expressions, manifestations, items) and its objectives (finding, selecting, understanding, and so on).

• Create content descriptions that can be used in multiple encodings and displays.

• Retain backward compatibility with existing records.

Example: RDA Principles

One principle is that descriptions should reflect “the resource’s representation of itself.”

This is a longstanding principle in library cataloging: where possible, description = transcription.

This can be linked to the objective of finding known items: the catalog description should match how the item is known to others, which is most likely from the item itself.

Example: RDA guidelines

This principle of transcription underlies the basic guideline for RDA titles, which is that the “title proper” or primary title should come from the preferred source of information, which for books is the title page.

Although the wording comes from the title page, the capitalization and punctuation are standardized for all titles.
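The split between transcribed wording and standardized capitalization can be sketched in a few lines of Python. The rule here is a deliberate simplification assumed for illustration (capitalize the first word and known proper nouns, lowercase the rest); it is not the full RDA capitalization guidance, and the function name and the sample title page are hypothetical.

```python
def standardize_capitalization(title: str, proper_nouns: set[str]) -> str:
    """Keep the title page's wording but apply one capitalization convention.

    Simplified rule (an assumption for this sketch): capitalize the first word
    and any word listed in proper_nouns, and lowercase everything else.
    Punctuation attached to a word is not handled.
    """
    standard_forms = {p.casefold(): p for p in proper_nouns}
    result = []
    for i, word in enumerate(title.split()):
        key = word.casefold()
        if key in standard_forms:
            result.append(standard_forms[key])  # proper noun: its standard form
        elif i == 0:
            result.append(word.capitalize())    # first word of the title
        else:
            result.append(word.lower())
    return " ".join(result)


as_on_title_page = "THE COMPLETE WORKS OF WILLIAM SHAKESPEARE"
print(standardize_capitalization(as_on_title_page, {"William", "Shakespeare"}))
# The complete works of William Shakespeare
```

The words themselves survive exactly as transcribed; only the capitalization convention changes.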

Example: RDA special cases

What if...

• Some introductory words on the title page seem like they’re not really part of the title (e.g., Walt Disney Presents Sleeping Beauty)?

• The title is given in two languages (e.g., Canadian Literature/Littérature Canadienne)?

• There is a spelling mistake in the title?

• The document is a manifestation of a commonly known work but has a slightly different title than most manifestations (e.g., William Shakespeare’s Hamlet)?

• A subtitle appears under what seems to be the main title (e.g., Museum Informatics an introductory textbook)?

• The title is over one paragraph long?

Keeping standards relevant

Of course, standards start going out of date almost as soon as they are published.

RDA has been in development since 2004, as part of a cooperative effort by U.S., U.K., Canadian, and Australian library associations. These are tremendous efforts!

Particular institutions, such as the Library of Congress, will issue their own rules for interpreting the standards, which smaller organizations (such as the University of Texas) may or may not choose to adopt.

Your mission

Complete your subject classification for next week: introduction, classified structure, alphabetical structure, and reflective essay.

Here are a few notes on the assignment, based on what I’ve seen in meetings with many of you.

Sort like with like

Try to place like kinds of things together (processes, products, people), not just things that have some thematic relation. Remember, a hierarchy in its strict form takes one kind of thing and goes from the most general category to the most specific.

this: Animals -> domesticated animals -> animals raised for food -> pigs

this: Agricultural processes -> farming -> factory farming

this: Effects -> effects of farming practices -> effects on animals -> overcrowding

not this: Animals -> pastures, pens, cages -> overcrowding

not this: Animals -> factory farming -> mercury poisoning
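The difference between the "this" and "not this" chains can be checked mechanically once each term has been tagged with the kind of thing it names. In this minimal Python sketch the kind tags (`animal`, `process`, `effect`, `place`) are hypothetical labels assigned by hand; deciding what kind of thing a term names remains the classifier's judgment, and the code only checks that a broader-to-narrower chain never switches kinds.

```python
# Hypothetical "kind" tags assigned by hand to each term on the slide.
KIND = {
    "Animals": "animal",
    "domesticated animals": "animal",
    "animals raised for food": "animal",
    "pigs": "animal",
    "Agricultural processes": "process",
    "farming": "process",
    "factory farming": "process",
    "Effects": "effect",
    "effects of farming practices": "effect",
    "effects on animals": "effect",
    "overcrowding": "effect",
    "pastures, pens, cages": "place",
    "mercury poisoning": "effect",
}


def is_strict_chain(chain: list[str]) -> bool:
    """True if every term in a broader -> narrower chain is the same kind of thing."""
    return len({KIND[term] for term in chain}) == 1


chains = [
    ["Animals", "domesticated animals", "animals raised for food", "pigs"],
    ["Agricultural processes", "farming", "factory farming"],
    ["Effects", "effects of farming practices", "effects on animals", "overcrowding"],
    ["Animals", "pastures, pens, cages", "overcrowding"],
    ["Animals", "factory farming", "mercury poisoning"],
]

for chain in chains:
    verdict = "this" if is_strict_chain(chain) else "not this"
    print(f"{verdict}: {' -> '.join(chain)}")
```

The last two chains fail because they jump from animals to places or processes and then to effects.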

Levels of abstraction

Wrangling your concepts can be difficult when they are at different levels of abstraction. You may need to generate intermediate levels that weren’t explicit in your source documents.

Source concepts: meat eating, E. coli, cholesterol, sustainability

disadvantages of meat eating

health risks

health risks associated with meat eating

high cholesterol

health risks associated with industrial meat production

bacterial contamination

E. coli contamination

unsustainable practices

effects of industrial meat production

consumption of resources

pollution
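One way to see the intermediate levels at work is to nest the slide's terms as a tree and trace the path from the top concept down to a source concept. The nesting in this Python sketch is an assumption, one plausible reading of the list above, whose original indentation was lost; `HIERARCHY`, `print_tree`, and `path_to` are hypothetical names.

```python
# One plausible nesting of the slide's terms (an assumption: the original
# indentation was lost). Each key is a concept; its value holds its
# narrower concepts.
HIERARCHY = {
    "disadvantages of meat eating": {
        "health risks": {
            "health risks associated with meat eating": {
                "high cholesterol": {},
            },
            "health risks associated with industrial meat production": {
                "bacterial contamination": {
                    "E. coli contamination": {},
                },
            },
        },
        "unsustainable practices": {
            "effects of industrial meat production": {
                "consumption of resources": {},
                "pollution": {},
            },
        },
    },
}


def print_tree(tree: dict, depth: int = 0) -> None:
    """Print each concept indented under its broader concept."""
    for concept, narrower in tree.items():
        print("  " * depth + concept)
        print_tree(narrower, depth + 1)


def path_to(tree: dict, target: str) -> list[str]:
    """Return the broader-to-narrower path ending at target, or [] if absent."""
    for concept, narrower in tree.items():
        if concept == target:
            return [concept]
        below = path_to(narrower, target)
        if below:
            return [concept] + below
    return []


print_tree(HIERARCHY)
print(" -> ".join(path_to(HIERARCHY, "E. coli contamination")))
```

The path for the source concept "E. coli" passes through two levels ("health risks associated with industrial meat production" and "bacterial contamination") that did not appear in the source concepts at all.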

Node labels or subfacet labels

Especially because your classifications are small, many of you may make use of labels that help clarify the principles of division used in your classified structure.

In most cases, you will not use these terms to describe documents, and they are not, strictly speaking, actual concepts in your classification. You don’t need to include them in your alphabetical representation.

Example

Computers
    <by form factor>
        Desktop
        Laptop
    <by operating system>
        MacOS
        Linux
        Windows

<by operating system> is just a structural label. It’s not a concept you’ll use to categorize documents.
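The rule that node labels stay out of the alphabetical representation can be sketched in Python. This assumes, purely for illustration, that node labels are written in angle brackets as on the slide; `COMPUTERS`, `is_node_label`, and `concepts` are hypothetical names.

```python
# Classified structure from the slide. Keys in angle brackets are node
# labels (principles of division), not concepts.
COMPUTERS = {
    "Computers": {
        "<by form factor>": {"Desktop": {}, "Laptop": {}},
        "<by operating system>": {"MacOS": {}, "Linux": {}, "Windows": {}},
    }
}


def is_node_label(term: str) -> bool:
    # Assumption for this sketch: node labels are written as <...>.
    return term.startswith("<") and term.endswith(">")


def concepts(tree: dict) -> list[str]:
    """Collect every term in the tree except node labels."""
    found = []
    for term, narrower in tree.items():
        if not is_node_label(term):
            found.append(term)
        found.extend(concepts(narrower))
    return found


alphabetical = sorted(concepts(COMPUTERS), key=str.casefold)
print(alphabetical)
```

The labels still shape the classified structure, but the alphabetical list contains only the six concepts.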

Non-subject concepts

Don’t include document attributes that aren’t subjects, such as forms or genres (blogs, articles, books, diaries...). Really, I mean it.

You are creating a representation of a subject that can be used to organize documents; you are not describing the types of documents in which users might be interested.

Include in your classification: terms for concepts that relate to gardening, such as types of plants (grasses, cacti, shrubs).

Do not include in your classification: Document types that list such plants (plant databases, seed catalogs). However, you might use your classification to categorize a cactus database with the Cacti concept...

INF 384 C, Spring 2009