WHAT I HAVE FOUND OUT FROM AN ATTEMPT TO BUILD AN RDF MODEL OF FRBR-IZED CATALOGING RULES
BY
Martha M. Yee
Cataloging Supervisor
UCLA Film & Television Archive
[email protected]
http://myee.bol.ucla.edu
INTRODUCTION
1. Some definitions
2. The vision
3. The experiment
4. Some problems?
SOME DEFINITIONS
The semantic web: a way to represent knowledge; a knowledge representation language that provides ways of expressing meaning that are amenable to computation; a means of constructing maps of domains of knowledge consisting of class and property axioms with a formal semantics
SOME DEFINITIONS
The semantic web
The web as a huge shared database
Hyperdata replacing hypertext
SOME DEFINITIONS
RDF (Resource Description Framework): a family of specifications for methods of modeling information that underpins the semantic web through a variety of syntax formats
SOME DEFINITIONS
RDF (Resource Description Framework)
Data encoded as:
the subject of a triple (New York)
the predicate of a triple (has the postal abbreviation)
the object of a triple (NY)
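The New York example can be written down as a plain subject/predicate/object tuple. The sketch below uses only the Python standard library; the Triple type, the example.org URIs, and the objects_of helper are illustrative assumptions, not part of any RDF toolkit.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical URIs, invented for this illustration only.
NEW_YORK = "http://example.org/place/NewYork"
HAS_POSTAL_ABBREVIATION = "http://example.org/vocab/hasPostalAbbreviation"

# The slide's example: New York / has the postal abbreviation / NY.
graph = [Triple(NEW_YORK, HAS_POSTAL_ABBREVIATION, "NY")]


def objects_of(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in graph
            if t.subject == subject and t.predicate == predicate]


print(objects_of(graph, NEW_YORK, HAS_POSTAL_ABBREVIATION))
```

Here the subject and predicate are identified by URIs while the object is a plain string, which is one common arrangement; an object can also be a URI.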
SOME DEFINITIONS
RDF (Resource Description Framework)
XML is commonly used to express RDF, but it is not a necessity
SOME DEFINITIONS
RDF (Resource Description Framework)
RDFS or RDF Schema is an extensible knowledge representation language providing basic elements for the description of ontologies, AKA RDF vocabularies
SOME DEFINITIONS
RDF (Resource Description Framework)
RDFS data encoded as:
Class (= Entity); the subject of a triple
Class relationship (semantic linkage); the predicate of a triple
Class property (= Attribute); the object of a triple
SOME DEFINITIONS
RDF (Resource Description Framework)
OWL (Web Ontology Language): a family of knowledge representation languages for authoring ontologies compatible with RDF
SOME DEFINITIONS
RDF (Resource Description Framework)
SKOS (Simple Knowledge Organization System): a family of formal languages built upon RDF and designed for representation of thesauri, classification schemes, taxonomies or subject-heading systems
THE VISION
The Web as shared database instead of shared document store
THE VISION
Instead of records, URIs (Uniform Resource Identifiers) for entities:
URI for work, containing all work attributes, including preferred name and variant names, but also much more data about the work than our current authority records contain
THE VISION
URI for expression, containing all expression attributes, and linked back to work
THE VISION
URI for manifestation, containing all manifestation attributes, and linked back to expression
THE VISION
URIs for persons, corporate bodies, places, subjects, etc., including preferred name and variant names, but also much more data about the person, corporate body, place or subject (concept or object) than our current authority records contain
THE VISION
If any data about a particular entity needed to be changed, it would be changed once at the URI and immediately accessible to all users, libraries and library staff by means of links down to local data such as circulation, acquisitions, and binding data
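A minimal sketch of this vision, assuming entities are stored once under an identifier and everything else only links to them. The "ex:" identifiers stand in for URIs, and every property name (expressionOf, manifestationOf, preferredName, and so on) is a hypothetical label chosen for the illustration, not a term from any published vocabulary.

```python
# Shared entity descriptions, each stored exactly once under its identifier.
entities = {
    "ex:work/1": {
        "preferredName": "Example Work",
        "variantNames": ["Example Wk."],
    },
    "ex:expression/1": {
        "language": "English",
        "expressionOf": "ex:work/1",          # link back to the work
    },
    "ex:manifestation/1": {
        "publicationDate": "2008",
        "manifestationOf": "ex:expression/1",  # link back to the expression
    },
}

# Local data held by individual libraries; it links to the shared
# manifestation instead of copying any descriptive data.
local_holdings = [
    {"library": "Library A", "manifestation": "ex:manifestation/1", "status": "on shelf"},
    {"library": "Library B", "manifestation": "ex:manifestation/1", "status": "at bindery"},
]


def work_name_for(holding):
    """Follow manifestation -> expression -> work and return the work's name."""
    manifestation = entities[holding["manifestation"]]
    expression = entities[manifestation["manifestationOf"]]
    work = entities[expression["expressionOf"]]
    return work["preferredName"]


before = [work_name_for(h) for h in local_holdings]

# One change, made once at the work's identifier ...
entities["ex:work/1"]["preferredName"] = "Corrected Example Work"

# ... is seen by every library that links to it.
after = [work_name_for(h) for h in local_holdings]
print(before)
print(after)
```

In a real deployment the lookups would be network requests rather than dictionary accesses, which is where the speed question raised later comes in.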
THE EXPERIMENT
A set of cataloging rules that are more FRBR-ized than RDA in that they more clearly differentiate between:
data applying to the expression vs.
data applying to the manifestation
THE EXPERIMENT
You can find these rules at:
http://myee.bol.ucla.edu
THE EXPERIMENT
I am now in the process of trying to model my cataloging rules in the form of an RDF/RDFS/OWL/SKOS model
THE EXPERIMENT
I don’t seriously expect anyone to adopt these rules!
THE EXPERIMENT
My research questions:
1. Is it possible for catalogers to tell in all cases whether a piece of data pertains to the expression or the manifestation?
THE EXPERIMENT
My research questions:
2. Is it possible to fit our data into RDF/RDFS/OWL/SKOS?
THE EXPERIMENT
My research questions:
3. If it is, is it possible to use that data to design indexes and displays that meet the objectives of the catalog (providing an efficient instrument to allow a user to find a particular work of which the author and title are known, a particular expression of a work, all of the works of an author, all of the works in a given genre or form, or all of the works on a particular subject)?
THE EXPERIMENT
You can find my RDF/RDFS/OWL/SKOS model at:
http://myee.bol.ucla.edu
SOME PROBLEMS?
Can we do what we need to do within the context of the semantic web?
SOME PROBLEMS?
More granularity, or data parsing by catalogers
Those familiar with RDA, FRBR, and FRAD development will recognize that much of that development is directed at increasing granularity in cataloger-produced data
SOME PROBLEMS?
Granularity issues:
More structure and more granularity make possible more powerful indexing and more sophisticated display, but they are more complex and expensive to apply and less likely to be adopted in a standard fashion across all communities, i.e. less likely to produce interoperable data.
SOME PROBLEMS?
Granularity issues:
Currently, we demarcate a surname from a forename by putting the surname first, followed by a comma and then the forename.
Even that amount of granularity can sometimes pose a problem for a cataloger who does not necessarily know which part of the name is "surname" and which part is "forename" in a culture unfamiliar to the cataloger.
SOME PROBLEMS?
Granularity issues:
Currently we do not collect information about gender.
If we were to increase the granularity of our data in order to gather that information, we would encounter situations in which the cataloger would not necessarily know whether a given creator was female, male, or of some other gender.
SOME PROBLEMS?
Granularity issues:
Currently, if we are adding a birth and/or death date, whatever dates we use are all together in a $d subfield, without any separate coding to indicate which date is birthdate and which is death date (although an occasional b. or d. will tell us this kind of information).
We could certainly provide more granularity for dates, but that would make the MARC format just that much more complex and difficult to learn.
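To make the cost of the missing coding concrete, here is a rough sketch of what software has to do today to recover birth and death dates from a single $d string. The function name and the handful of patterns it recognizes are assumptions for illustration; real $d values take many more forms (approximate dates, centuries, "active" dates), and anything the sketch does not recognize is left unparsed.

```python
import re


def parse_dates_subfield(subfield_d):
    """Split a $d value into (birth, death); either part may be None.

    Handles only four simple shapes: "1922-1999", "1950-",
    "b. 1950" and "d. 1850". Everything else yields (None, None).
    """
    # Drop surrounding whitespace and a trailing comma, if any.
    text = subfield_d.strip().rstrip(",").strip()

    match = re.fullmatch(r"b\.\s*(\d{4})", text)
    if match:
        return (match.group(1), None)

    match = re.fullmatch(r"d\.\s*(\d{4})", text)
    if match:
        return (None, match.group(1))

    # Position alone tells us which date is which: before or after the hyphen.
    match = re.fullmatch(r"(\d{4})?-(\d{4})?", text)
    if match and (match.group(1) or match.group(2)):
        return (match.group(1), match.group(2))

    return (None, None)


for value in ["1922-1999", "1950-", "b. 1950", "d. 1850", "ca. 1500"]:
    print(value, "->", parse_dates_subfield(value))
```

With separately coded birth and death dates none of this guessing would be needed, which is exactly the trade-off between granularity for the machine and complexity for the cataloger.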
SOME PROBLEMS?
Granularity issues:
People who dislike the MARC format already argue that it is too granular and therefore demands too steep a learning curve of the people who encode data with it.
How much of the granularity already in MARC is actually used in existing records, and, even where it is present, how much of it is used by indexing and display software?
SOME PROBLEMS?
Granularity issues:
Granularity costs money, and libraries and archives are already starving for resources.
Granularity can only be provided by people, and people are expensive.
One frightening thing about the Internet is that it seems to be based on an economy of free intellectual labor. Only the programmers get paid. Everyone else is a volunteer.
SOME PROBLEMS?
Other issues:
Potentially every piece of data describing a particular entity could be represented by a URI leading out to a SKOS list of data values. Is the Internet really fast enough to assemble a record from hundreds of URIs in a reasonable amount of time?
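The question can at least be framed with back-of-the-envelope arithmetic. The sketch below is a simple model, not a measurement: the figures of 300 URIs and 100 milliseconds per lookup are invented assumptions, and the function name is hypothetical. It only shows how the total depends on whether lookups happen one after another or several at a time.

```python
import math


def assembly_time_ms(uri_count, ms_per_lookup, parallel_lookups=1):
    """Estimate the time to dereference uri_count URIs.

    Assumes every lookup takes the same time and that up to
    parallel_lookups requests can be in flight at once.
    """
    rounds = math.ceil(uri_count / parallel_lookups)
    return rounds * ms_per_lookup


# Assumed figures, chosen only to illustrate the arithmetic.
URI_COUNT = 300
MS_PER_LOOKUP = 100

sequential = assembly_time_ms(URI_COUNT, MS_PER_LOOKUP)
ten_at_a_time = assembly_time_ms(URI_COUNT, MS_PER_LOOKUP, parallel_lookups=10)

print("one at a time:", sequential, "ms")
print("ten at a time:", ten_at_a_time, "ms")
```

Under these assumptions a record assembled one lookup at a time takes thirty seconds, and ten simultaneous lookups bring that down to three; local caching of frequently used values would change the picture again.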
SOME PROBLEMS?
If the work is represented by a URI and the author of the work is represented by a linked URI, how would it be possible to guarantee success for a user who searched on a variant of the author name in combination with a variant of the title?
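One way to picture what such a search requires: the index has to follow the link from the work to its author and match against the names stored at both places. The sketch below assumes a tiny in-memory store; the "ex:" identifiers, the property names, and the find_works helper are all hypothetical, and matching is a naive exact comparison that ignores case.

```python
# Two linked entity descriptions; identifiers and property names are invented.
entities = {
    "ex:person/1": {
        "preferredName": "Twain, Mark, 1835-1910",
        "variantNames": ["Clemens, Samuel Langhorne, 1835-1910"],
    },
    "ex:work/1": {
        "preferredName": "Adventures of Huckleberry Finn",
        "variantNames": ["Huckleberry Finn"],
        "author": "ex:person/1",   # link to the author's description
    },
}


def names_of(uri):
    """All names (preferred and variant) for an entity, lower-cased."""
    entity = entities[uri]
    return {name.lower() for name in [entity["preferredName"], *entity["variantNames"]]}


def find_works(author_query, title_query):
    """Return works whose title names AND linked author's names both match."""
    hits = []
    for uri, entity in entities.items():
        if "author" not in entity:
            continue  # not a work
        title_matches = title_query.lower() in names_of(uri)
        author_matches = author_query.lower() in names_of(entity["author"])
        if title_matches and author_matches:
            hits.append(uri)
    return hits


# Variant author name combined with variant title.
print(find_works("Clemens, Samuel Langhorne, 1835-1910", "Huckleberry Finn"))
```

The search succeeds here only because both descriptions are available in one place to be joined; whether that can be guaranteed when the two URIs live on different servers is the open question.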
SOME PROBLEMS?
There is a cross reference from FBI to United States. Federal Bureau of Investigation, but not from FBI Counterterrorism Division to United States. Federal Bureau of Investigation. Counterterrorism Division. For that reason, a search in any OPAC name index for FBI Counterterrorism Division will fail.
SOME PROBLEMS?
The solution to this problem is to define a transitive or inheritance relationship between a corporate body and its corporate subdivisions, so that a cross reference made for the parent body also applies to each subdivision.
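A minimal sketch of what such inheritance could look like in software, using the FBI example. The data layout, the "ex:" identifiers, and the property names subdivisionOf and subdivisionName are assumptions made for the illustration; the rule for combining a parent's variant with the subdivision's name (joining with a space) is likewise just one plausible choice.

```python
# Corporate bodies keyed by hypothetical identifiers.
bodies = {
    "ex:body/fbi": {
        "preferredName": "United States. Federal Bureau of Investigation",
        "variantNames": ["FBI"],
        "subdivisionOf": None,
        "subdivisionName": None,
    },
    "ex:body/fbi-ctd": {
        "preferredName": ("United States. Federal Bureau of Investigation. "
                          "Counterterrorism Division"),
        "variantNames": [],            # no cross reference was made here
        "subdivisionOf": "ex:body/fbi",
        "subdivisionName": "Counterterrorism Division",
    },
}


def inherited_variants(uri):
    """Own variant names plus variants derived from every ancestor body."""
    body = bodies[uri]
    variants = list(body["variantNames"])
    parent_uri = body["subdivisionOf"]
    if parent_uri is not None:
        # Recursion makes the relationship transitive: a subdivision of a
        # subdivision also picks up variants from the top-level body.
        for parent_variant in inherited_variants(parent_uri):
            variants.append(f"{parent_variant} {body['subdivisionName']}")
    return variants


def search_name_index(query):
    """Return preferred names of bodies matched by an own or inherited variant."""
    return [body["preferredName"] for uri, body in bodies.items()
            if query in inherited_variants(uri)]


print(search_name_index("FBI Counterterrorism Division"))
```

With the variant derived rather than keyed in, the search for FBI Counterterrorism Division no longer depends on a cataloger having made that particular cross reference.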
SOME PROBLEMS?
Unfortunately, RDF seems to resist hierarchical relationships.
It assumes that you just need to connect everything to everything else without needing to express any hierarchy.
SOME PROBLEMS?
This is bad news for bibliographic data, which is rife with hierarchical relationships.
Hierarchy is one of our major tools for expressing meaning to our users.
SOME PROBLEMS?
Can all bibliographic data be reduced to either a class or a property with a finite list of values? Another way to put this is to ask: could all that catalogers do be reduced to a set of pull-down menus?
SOME PROBLEMS?
Is there an assumption on the part of semantic web developers that a given type of data, such as publisher name, would be EITHER “literal” (i.e. transcribed or composed) OR represented by a URI (controlled)?
SOME PROBLEMS?
Cataloging is rooted in humanistic practices that require careful recording of evidence. There will always be a value in distinguishing (and labelling as such) the following types of data:
copied as is from an artifact (transcribed)
supplied by a cataloger
categorized by a cataloger (controlled)
SOME PROBLEMS?
I notice that Tim Berners-Lee, the father of the World Wide Web and the Semantic Web himself, emphasizes the importance of recording not just data, but where the data came from, for the sake of authenticity (see February 7, 2008 interview of Sir Tim Berners-Lee by Talis http://talis-podcasts.s3.amazonaws.com/twt20080207_TimBL.html)
SOME PROBLEMS?
For many data elements, therefore, it will be important to be able to record BOTH a literal (transcribed and/or composed form) AND a URI (controlled form)
Is this a problem in RDF?
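As a way of making the question concrete, the sketch below records a publisher twice for the same manifestation: once as a transcribed literal and once as a link to a controlled description. It assumes two distinct properties are used for the two statements; the property names, the "ex:" identifiers, the publisher name, and the Literal and UriRef types are all hypothetical. Whether an arrangement like this fully meets the evidentiary needs of cataloging is the open question.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A value transcribed or composed by the cataloger."""
    value: str


class UriRef(NamedTuple):
    """A link to a controlled description of an entity."""
    uri: str


# Two statements about the same manifestation, one of each kind.
statements = [
    ("ex:manifestation/1", "ex:publisherAsTranscribed", Literal("Example Press, Inc.")),
    ("ex:manifestation/1", "ex:publisher", UriRef("ex:body/example-press")),
]


def describe(subject):
    """Group a subject's object values by how they were recorded."""
    result = {"transcribed": [], "controlled": []}
    for s, _predicate, obj in statements:
        if s != subject:
            continue
        if isinstance(obj, Literal):
            result["transcribed"].append(obj.value)
        elif isinstance(obj, UriRef):
            result["controlled"].append(obj.uri)
    return result


print(describe("ex:manifestation/1"))
```

The label for each kind of data is carried here by the type of the object and by the choice of property, so a display could show the transcribed form while an index follows the controlled link.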