WHAT I HAVE FOUND OUT FROM AN ATTEMPT TO BUILD AN RDF MODEL OF FRBR-IZED CATALOGING RULES


BY
Martha M. Yee
Cataloging Supervisor
UCLA Film & Television Archive
[email protected]
http://myee.bol.ucla.edu


INTRODUCTION
1. Some definitions
2. The vision
3. The experiment
4. Some problems?

SOME DEFINITIONS
The semantic web: a way to represent knowledge; a knowledge representation language that provides ways of expressing meaning that are amenable to computation; a means of constructing maps of domains of knowledge consisting of class and property axioms with a formal semantics

SOME DEFINITIONS
The semantic web
The web as huge shared database
Hyperdata replacing hypertext

SOME DEFINITIONS
RDF (Resource Description Framework): a family of specifications for modeling information; it underpins the semantic web and can be expressed in a variety of syntax formats


SOME DEFINITIONS
RDF (Resource Description Framework)
Data encoded as:
the subject of a triple (New York)
the predicate of a triple (has the postal abbreviation)
the object of a triple (NY)
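
As a rough illustration of the triple structure described above, a statement can be sketched in Python as a three-part tuple. The Triple type, the example.org identifiers, and the objects_of helper are hypothetical names chosen for this sketch; they are not part of any RDF library or vocabulary.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    object: str


# Hypothetical identifiers, invented for illustration only.
NEW_YORK = "http://example.org/place/NewYork"
HAS_POSTAL_ABBREVIATION = "http://example.org/prop/hasPostalAbbreviation"

# A "graph" here is simply a list of statements.
graph = [Triple(NEW_YORK, HAS_POSTAL_ABBREVIATION, "NY")]


def objects_of(graph, subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [t.object for t in graph
            if t.subject == subject and t.predicate == predicate]


print(objects_of(graph, NEW_YORK, HAS_POSTAL_ABBREVIATION))
```

In this sketch the subject and predicate are identifiers, while the object is a plain string value.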


SOME DEFINITIONS
RDF (Resource Description Framework)
XML is commonly used to express RDF, but is not a necessity


SOME DEFINITIONS
RDF (Resource Description Framework)
RDFS or RDF Schema is an extensible knowledge representation language providing basic elements for the description of ontologies, AKA RDF vocabularies


SOME DEFINITIONS
RDF (Resource Description Framework)
RDFS data encoded as:
Class (= Entity); the subject of a triple
Class relationship (semantic linkage); the predicate of a triple
Class property (= Attribute); the object of a triple

SOME DEFINITIONS
RDF (Resource Description Framework)
OWL (Web Ontology Language): a family of knowledge representation languages for authoring ontologies compatible with RDF

SOME DEFINITIONS
RDF (Resource Description Framework)
SKOS (Simple Knowledge Organization System): a family of formal languages built upon RDF and designed for representation of thesauri, classification schemes, taxonomies or subject-heading systems

THE VISION
The Web as shared database instead of shared document store

THE VISION
Instead of records, URIs (Uniform Resource Identifiers) for entities:
URI for work containing all work attributes, including preferred name, variant names, but also much more data about the work than our current authority records contain

THE VISION
URI for expression, containing all expression attributes, and linked back to work

THE VISION
URI for manifestation, containing all manifestation attributes, and linked back to expression
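
The chain of links from manifestation to expression to work can be sketched as a small lookup table keyed by URI. This is only a sketch under stated assumptions: the example.org URIs, the attribute names (expression_of, manifestation_of, and so on), and the sample values are all invented for illustration and do not come from the model discussed here.

```python
# Hypothetical URIs, attribute names, and values, invented for illustration.
entities = {
    "http://example.org/work/1": {
        "type": "work",
        "preferred_title": "Example Work",
    },
    "http://example.org/expression/1": {
        "type": "expression",
        "language": "French",
        "expression_of": "http://example.org/work/1",
    },
    "http://example.org/manifestation/1": {
        "type": "manifestation",
        "publication_date": "1995",
        "manifestation_of": "http://example.org/expression/1",
    },
}

# Properties that link an entity "back" to the entity it realizes or embodies.
LINKS_UP = ("manifestation_of", "expression_of")


def chain_to_work(uri):
    """Follow the 'linked back to' properties from an entity up to its work."""
    chain = [uri]
    while True:
        entity = entities[chain[-1]]
        parent = next((entity[k] for k in LINKS_UP if k in entity), None)
        if parent is None:
            return chain
        chain.append(parent)


print(chain_to_work("http://example.org/manifestation/1"))
```

Because each attribute lives at exactly one URI, a change made to the work entry is seen by every expression and manifestation that links to it.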

THE VISION
URIs for persons, corporate bodies, places, subjects, etc., including preferred name, variant names, but also much more data about the person, corporate body, place or subject (concept or object) than our current authority records contain

THE VISION
If any data about a particular entity needed to be changed, it would be changed once at the URI and immediately accessible to all users, libraries and library staff by means of links down to local data such as circulation, acquisitions, and binding data

THE EXPERIMENT
A set of cataloging rules that are more FRBR-ized than RDA in that they more clearly differentiate between:
data applying to the expression vs.
data applying to the manifestation

THE EXPERIMENT
You can find these rules at:
http://myee.bol.ucla.edu

THE EXPERIMENT
I am now in the process of trying to model my cataloging rules in the form of an RDF/RDFS/OWL/SKOS model

THE EXPERIMENT
I don’t seriously expect anyone to adopt these rules!

THE EXPERIMENT
My research questions:
1. Is it possible for catalogers to tell in all cases whether a piece of data pertains to the expression or the manifestation?

THE EXPERIMENT
My research questions:
2. Is it possible to fit our data into RDF/RDFS/OWL/SKOS?

THE EXPERIMENT
My research questions:
3. If it is, is it possible to use that data to design indexes and displays that meet the objectives of the catalog (providing an efficient instrument to allow a user to find a particular work of which the author and title are known, a particular expression of a work, all of the works of an author, all of the works in a given genre or form, or all of the works on a particular subject)?

THE EXPERIMENT
You can find my RDF/RDFS/OWL/SKOS model at:
http://myee.bol.ucla.edu

SOME PROBLEMS?
Can we do what we need to do within the context of the semantic web?

SOME PROBLEMS?
More granularity, or data parsing by catalogers
Those familiar with RDA, FRBR, and FRAD development will recognize that much of that development is directed at increasing granularity in cataloger-produced data

SOME PROBLEMS?
Granularity issues:
More structure and more granularity make possible more powerful indexing and more sophisticated display, but are more complex and expensive to apply and less likely to be adopted in a standard fashion across all communities, i.e. less likely to produce interoperable data.

SOME PROBLEMS?
Granularity issues:
Currently, we demarcate a surname from a forename by putting the surname first, followed by a comma and then the forename.
Even that amount of granularity can sometimes pose a problem for a cataloger who does not necessarily know which part of the name is "surname" and which part is "forename" in a culture unfamiliar to the cataloger.

SOME PROBLEMS?
Granularity issues:
Currently we do not collect information about gender.
If we were to increase the granularity of our data in order to gather that information, we would encounter situations in which the cataloger would not necessarily know whether a given creator was female, male, or of some other gender.

SOME PROBLEMS?
Granularity issues:
Currently, if we are adding a birth and/or death date, whatever dates we use are all together in a $d subfield, without any separate coding to indicate which date is the birth date and which is the death date (although an occasional b. or d. will tell us this kind of information).
We could certainly provide more granularity for dates, but that would make the MARC format just that much more complex and difficult to learn.
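
To make the trade-off concrete, here is a minimal sketch of what software has to do to recover birth and death years from a single undifferentiated date string. The function name split_dates and the three patterns it accepts are assumptions made for this example; real $d subfields contain other forms (approximate dates, centuries, and so on) that this sketch deliberately does not handle.

```python
import re


def split_dates(subfield_d):
    """Guess (birth, death) years from one undifferentiated date string.

    Each element of the result is a four-digit string, or None when that
    year cannot be determined from the patterns this sketch knows about.
    """
    # Drop surrounding whitespace and trailing punctuation such as "." or ",".
    text = subfield_d.strip().rstrip(".,")

    # "1835-1910" or an open-ended "1950-".
    m = re.fullmatch(r"(\d{4})-(\d{4})?", text)
    if m:
        return m.group(1), m.group(2)

    # "b. 1950": only the birth year is known.
    m = re.fullmatch(r"b\.\s*(\d{4})", text)
    if m:
        return m.group(1), None

    # "d. 1600": only the death year is known.
    m = re.fullmatch(r"d\.\s*(\d{4})", text)
    if m:
        return None, m.group(1)

    # Anything else is left unparsed rather than guessed at.
    return None, None


for sample in ["1835-1910", "1950-", "b. 1950", "d. 1600.", "ca. 1500"]:
    print(sample, "->", split_dates(sample))
```

Separate coding for birth and death dates would make this guesswork unnecessary, at the cost of one more distinction for the person encoding the data to learn and apply.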

SOME PROBLEMS?
Granularity issues:
People who dislike the MARC format already argue that it is too granular and therefore demands too steep a learning curve from the people who encode data in it.
How much of the granularity already in MARC is actually used in existing records, or, where it is present, actually exploited by indexing and display software?

SOME PROBLEMS?
Granularity issues:
Granularity costs money and libraries and archives are already starving for resources.
Granularity can only be provided by people, and people are expensive.
One frightening thing about the Internet is that it seems to be based on an economy of free intellectual labor. Only the programmers get paid. Everyone else is a volunteer.

SOME PROBLEMS?
Other issues:
Potentially every piece of data describing a particular entity could be represented by a URI leading out to a SKOS list of data values. Is the Internet really fast enough to assemble a record from hundreds of URIs in a reasonable amount of time?

SOME PROBLEMS?
If the work is represented by a URI and the author of the work is represented by a linked URI, how would it be possible to guarantee success for a user that searched on a variant of the author name in combination with a variant of the title?
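
The question above can be made concrete with a small sketch: a combined search succeeds only if the software follows the link from the work to the author and checks the variant forms held at both ends. The example.org URIs, the dictionary layout, and the function names are assumptions made for illustration, and the in-memory dictionaries stand in for what would really be separate network lookups.

```python
# Hypothetical URIs and record layout, invented for illustration.
authors = {
    "http://example.org/person/1": {
        "preferred": "Twain, Mark, 1835-1910",
        "variants": ["Clemens, Samuel Langhorne, 1835-1910"],
    },
}
works = {
    "http://example.org/work/1": {
        "preferred": "Adventures of Huckleberry Finn",
        "variants": ["Huckleberry Finn"],
        "author": "http://example.org/person/1",  # link to the author URI
    },
}


def names(entity):
    """All name forms for an entity: preferred first, then variants."""
    return [entity["preferred"], *entity["variants"]]


def matches(entity, query):
    """Case-insensitive substring match against any name form."""
    q = query.casefold()
    return any(q in n.casefold() for n in names(entity))


def search(author_query, title_query):
    """Return URIs of works whose title AND linked author both match."""
    hits = []
    for uri, work in works.items():
        author = authors[work["author"]]  # one dereference per candidate work
        if matches(work, title_query) and matches(author, author_query):
            hits.append(uri)
    return hits


print(search("clemens", "huckleberry finn"))
```

Here the dereference is a dictionary lookup; across the open web each such lookup would be a separate request, which is where the earlier question about speed comes back in.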

SOME PROBLEMS?
There is a cross reference from FBI to United States. Federal Bureau of Investigation, but not from FBI Counterterrorism Division to United States. Federal Bureau of Investigation. Counterterrorism Division. For that reason, a search in any OPAC name index for FBI Counterterrorism Division will fail.

SOME PROBLEMS?
The solution to this problem is to define a transitive or inheritance relationship between a corporate body and its corporate subdivisions.
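
A rough sketch of such an inheritance relationship follows. It assumes each corporate body records only its own name, its own variant names, and a link to its parent body; the dictionary keys and function names are hypothetical, chosen for this example. Variant forms for a subdivision are then derived by walking the parent links rather than being entered by hand.

```python
# Hypothetical data layout: each body points to its parent body, and variant
# names are recorded only on the body where a cataloger made the reference.
bodies = {
    "us": {"name": "United States", "variants": [], "parent": None},
    "fbi": {"name": "Federal Bureau of Investigation",
            "variants": ["FBI"], "parent": "us"},
    "ctd": {"name": "Counterterrorism Division",
            "variants": [], "parent": "fbi"},
}


def ancestors(key):
    """Walk the parent links transitively, nearest parent first."""
    chain = []
    parent = bodies[key]["parent"]
    while parent is not None:
        chain.append(parent)
        parent = bodies[parent]["parent"]
    return chain


def heading(key):
    """Build the full hierarchical heading from the top-level body down."""
    keys = [key] + ancestors(key)
    return ". ".join(bodies[k]["name"] for k in reversed(keys))


def inherited_variants(key):
    """Derive variants by combining each ancestor's variant names with the
    names of the subordinate bodies below that ancestor."""
    path = list(reversed([key] + ancestors(key)))  # top-level body first
    derived = []
    for i, k in enumerate(path[:-1]):
        tail = " ".join(bodies[p]["name"] for p in path[i + 1:])
        derived.extend(f"{v} {tail}" for v in bodies[k]["variants"])
    return derived


print(heading("ctd"))
print(inherited_variants("ctd"))
```

With the derived forms added to the name index, a search for FBI Counterterrorism Division would find the subdivision even though no cataloger made that cross reference explicitly.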

SOME PROBLEMS?
Unfortunately, RDF seems to resist hierarchical relationships.
It assumes that you just need to connect everything to everything else without needing to express any hierarchy.

SOME PROBLEMS?
This is bad news for bibliographic data, which is rife with hierarchical relationships.
Hierarchy is one of our major tools for expressing meaning to our users.

SOME PROBLEMS?
Can all bibliographic data be reduced to either a class or a property with a finite list of values? Another way to put this is to ask whether all that catalogers do could be reduced to a set of pull-down menus.

SOME PROBLEMS?
Is there an assumption on the part of semantic web developers that a given type of data, such as publisher name, would be EITHER “literal” (i.e. transcribed or composed) OR represented by a URI (controlled)?

SOME PROBLEMS?
Cataloging is rooted in humanistic practices that require careful recording of evidence. There will always be a value in distinguishing (and labelling as such) the following types of data:
copied as is from an artifact (transcribed)
supplied by a cataloger
categorized by a cataloger (controlled)

SOME PROBLEMS?
I notice that Tim Berners-Lee himself, the inventor of the World Wide Web and the originator of the Semantic Web, emphasizes the importance of recording not just data, but where the data came from, for the sake of authenticity (see the February 7, 2008 interview of Sir Tim Berners-Lee by Talis, http://talis-podcasts.s3.amazonaws.com/twt20080207_TimBL.html)

SOME PROBLEMS?
For many data elements, therefore, it will be important to be able to record BOTH a literal (transcribed and/or composed form) AND a URI (controlled form)
Is this a problem in RDF?
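
One way to picture the requirement is a pair of statements about the same manifestation: one whose object is a literal and one whose object is a URI. This is only a sketch, assuming that two separate properties are defined for the two forms; the property names, URIs, and sample values below are invented for illustration and are not taken from any published vocabulary.

```python
# Hypothetical property names, URIs, and sample values, invented for illustration.
MANIFESTATION = "http://example.org/manifestation/1"

triples = [
    # Transcribed form, copied as is from the artifact (a literal).
    (MANIFESTATION, "publisherNameTranscribed", "Example Press, Inc."),
    # Controlled form, categorized by a cataloger (a URI).
    (MANIFESTATION, "publisherControlled",
     "http://example.org/corporate-body/42"),
]


def is_uri(value):
    """Crude check used only in this sketch to tell URIs from literals."""
    return value.startswith(("http://", "https://"))


def publisher_forms(subject):
    """Collect the literal and the controlled publisher data for one entity."""
    forms = {"literal": [], "uri": []}
    for s, p, o in triples:
        if s == subject and p.startswith("publisher"):
            forms["uri" if is_uri(o) else "literal"].append(o)
    return forms


print(publisher_forms(MANIFESTATION))
```

Keeping the two forms under separate properties also keeps the label attached to each one, so that a display can say which form was transcribed and which was controlled.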
