WHAT I HAVE FOUND OUT FROM AN ATTEMPT TO BUILD AN RDF MODEL OF FRBR-IZED CATALOGING RULES

BY
Martha M. Yee
Cataloging Supervisor
UCLA Film & Television Archive
[email protected]
http://myee.bol.ucla.edu

INTRODUCTION
1. Some definitions
2. The vision
3. The experiment
4. Some problems?

SOME DEFINITIONS
The semantic web: a way to represent knowledge; a knowledge representation language that provides ways of expressing meaning that are amenable to computation; a means of constructing maps of domains of knowledge consisting of class and property axioms with a formal semantics

SOME DEFINITIONS
The semantic web
The web as huge shared database
Hyperdata replacing hypertext

SOME DEFINITIONS
RDF (Resource Description Framework): a family of specifications for methods of modeling information that underpins the semantic web through a variety of syntax formats

SOME DEFINITIONS
RDF (Resource Description Framework)
Data encoded as:
the subject of a triple (New York)
the predicate of a triple (has the postal abbreviation)
the object of a triple (NY)
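
The subject/predicate/object pattern above can be pictured with a minimal Python sketch. It uses plain strings rather than real URIs, the `Triple` type and `objects` helper are hypothetical names introduced only for illustration, and the California row is an added example that does not come from the slides.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Plain strings stand in for the URIs a real RDF store would use.
triples = [
    Triple("New York", "has the postal abbreviation", "NY"),
    Triple("California", "has the postal abbreviation", "CA"),  # added example
]


def objects(store, subject, predicate):
    """Return every object recorded for the given subject and predicate."""
    return [t.obj for t in store if t.subject == subject and t.predicate == predicate]


print(objects(triples, "New York", "has the postal abbreviation"))
```

Each fact is a separate statement, so new facts about New York could be added without changing the shape of the existing ones.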

SOME DEFINITIONS
RDF (Resource Description Framework)
XML is commonly used to express RDF, but is not a necessity

SOME DEFINITIONS
RDF (Resource Description Framework)
RDFS or RDF Schema is an extensible knowledge representation language providing basic elements for the description of ontologies, AKA RDF vocabularies

SOME DEFINITIONS
RDF (Resource Description Framework)
RDFS data encoded as:
Class (= Entity); the subject of a triple
Class relationship (semantic linkage); the predicate of a triple
Class property (= Attribute); the object of a triple

SOME DEFINITIONS
RDF (Resource Description Framework)
OWL (Web Ontology Language): a family of knowledge representation languages for authoring ontologies compatible with RDF

SOME DEFINITIONS
RDF (Resource Description Framework)
SKOS (Simple Knowledge Organization System): a family of formal languages built upon RDF and designed for representation of thesauri, classification schemes, taxonomies or subject-heading systems

THE VISION
The Web as shared database instead of shared document store

THE VISION
Instead of records, URIs (Uniform Resource Identifiers) for entities:

URI for work, containing all work attributes, including preferred name and variant names, but also much more data about the work than our current authority records contain

THE VISION
URI for expression, containing all expression attributes, and linked back to work

THE VISION
URI for manifestation, containing all manifestation attributes, and linked back to expression

THE VISION
URIs for persons, corporate bodies, places, subjects, etc., including preferred name and variant names, but also much more data about the person, corporate body, place or subject (concept or object) than our current authority records contain

THE VISION
If any data about a particular entity needed to be changed, it would be changed once at the URI and immediately accessible to all users, libraries and library staff by means of links down to local data such as circulation, acquisitions, and binding data
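
As a rough illustration of "changed once at the URI", the sketch below keeps one record per URI and has the manifestation reach its work only by following links. Every URI, attribute name and value here is hypothetical, and a Python dictionary merely stands in for data that would actually live on the Web.

```python
# Every URI and attribute value here is hypothetical, for illustration only.
WORK = "http://example.org/work/1"
EXPRESSION = "http://example.org/expression/1"
MANIFESTATION = "http://example.org/manifestation/1"

entities = {
    WORK: {"preferred_name": "Example Work", "variant_names": ["Sample Work"]},
    EXPRESSION: {"language": "English", "expression_of": WORK},
    MANIFESTATION: {"publisher": "Example Press", "manifestation_of": EXPRESSION},
}


def work_of(uri):
    """Follow the links from any entity back to the work record."""
    entity = entities[uri]
    while "manifestation_of" in entity or "expression_of" in entity:
        next_uri = entity.get("manifestation_of") or entity["expression_of"]
        entity = entities[next_uri]
    return entity


name_before = work_of(MANIFESTATION)["preferred_name"]
# The preferred name is corrected in exactly one place: the work's URI.
entities[WORK]["preferred_name"] = "Example Work (revised)"
name_after = work_of(MANIFESTATION)["preferred_name"]
print(name_before, "->", name_after)
```

Because the manifestation stores a link rather than a copy of the work's name, the correction is visible from the manifestation without touching its own record.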

THE EXPERIMENT
A set of cataloging rules that are more FRBR-ized than RDA in that they more clearly differentiate between:

data applying to the expression vs. data applying to the manifestation

THE EXPERIMENT
You can find these rules at:

http://myee.bol.ucla.edu

THE EXPERIMENT
I am now in the process of trying to model my cataloging rules in the form of an RDF/RDFS/OWL/SKOS model

THE EXPERIMENT
I don't seriously expect anyone to adopt these rules!

THE EXPERIMENT
My research questions:
1. Is it possible for catalogers to tell in all cases whether a piece of data pertains to the expression or the manifestation?

THE EXPERIMENT
My research questions:
2. Is it possible to fit our data into RDF/RDFS/OWL/SKOS?

THE EXPERIMENT
My research questions:
3. If it is, is it possible to use that data to design indexes and displays that meet the objectives of the catalog (providing an efficient instrument to allow a user to find a particular work of which the author and title are known, a particular expression of a work, all of the works of an author, all of the works in a given genre or form, or all of the works on a particular subject)?

THE EXPERIMENT
You can find my RDF/RDFS/OWL/SKOS model at:

http://myee.bol.ucla.edu

SOME PROBLEMS?
Can we do what we need to do within the context of the semantic web?

SOME PROBLEMS?
More granularity, or data parsing by catalogers

Those familiar with RDA, FRBR, and FRAD development will recognize that much of that development is directed at increasing granularity in cataloger-produced data

SOME PROBLEMS?
Granularity issues:
More structure and more granularity make possible more powerful indexing and more sophisticated display, but are more complex and expensive to apply and less likely to be adopted in a standard fashion across all communities, i.e. less likely to produce interoperable data.

SOME PROBLEMS?
Granularity issues:
Currently, we demarcate a surname from a forename by putting the surname first, followed by a comma and then the forename.

Even that amount of granularity can sometimes pose a problem for a cataloger who does not necessarily know which part of the name is "surname" and which part is "forename" in a culture unfamiliar to the cataloger.

SOME PROBLEMS?
Granularity issues:
Currently we do not collect information about gender.

If we were to increase the granularity of our data in order to gather that information, we would encounter situations in which the cataloger would not necessarily know whether a given creator was female, male, or of some other gender.

SOME PROBLEMS?
Granularity issues:
Currently, if we are adding a birth and/or death date, whatever dates we use are all together in a $d subfield, without any separate coding to indicate which date is the birth date and which is the death date (although an occasional b. or d. will tell us this kind of information).

We could certainly provide more granularity for dates, but that would make the MARC format just that much more complex and difficult to learn.
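
To make the trade-off concrete, here is a small Python sketch of what software has to do today to recover birth and death dates from an undifferentiated date string. The `parse_dates` function is hypothetical, the sample strings are invented, and only three simple patterns are handled (a hyphenated range, a leading "b." and a leading "d."); real $d data contains many more forms.

```python
def parse_dates(subfield_d):
    """Split a $d-style date string into (birth, death); either may be None.

    Illustrative only: handles 'YYYY-YYYY', 'YYYY-', 'b. YYYY' and 'd. YYYY'.
    """
    text = subfield_d.strip().rstrip(".,")
    if text.startswith("b."):
        return text[2:].strip() or None, None
    if text.startswith("d."):
        return None, text[2:].strip() or None
    if "-" in text:
        birth, _, death = text.partition("-")
        return birth.strip() or None, death.strip() or None
    # Unrecognized pattern: say nothing rather than guess.
    return None, None


for sample in ["1809-1865.", "1950-", "b. 1950", "d. 1616"]:
    print(sample, "=>", parse_dates(sample))
```

Separate coding for each date would make this guesswork unnecessary, at the cost of the extra encoding effort described above.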

SOME PROBLEMS?
Granularity issues:
People who dislike the MARC format already argue that it is too granular and therefore imposes too steep a learning curve on the people who encode data with it.

How much of the granularity already in MARC is actually used in existing records and, even where it is present, how much of it is exploited by indexing and display software?

SOME PROBLEMS?
Granularity issues:
Granularity costs money and libraries and archives are already starving for resources.

Granularity can only be provided by people, and people are expensive.

One frightening thing about the Internet is that it seems to be based on an economy of free intellectual labor. Only the programmers get paid. Everyone else is a volunteer.

SOME PROBLEMS?
Other issues:
Potentially every piece of data describing a particular entity could be represented by a URI leading out to a SKOS list of data values. Is the Internet really fast enough to assemble a record from hundreds of URIs in a reasonable amount of time?

SOME PROBLEMS?
If the work is represented by a URI and the author of the work is represented by a linked URI, how would it be possible to guarantee success for a user who searched on a variant of the author name in combination with a variant of the title?

SOME PROBLEMS?
There is a cross reference from FBI to United States. Federal Bureau of Investigation, but not from FBI Counterterrorism Division to United States. Federal Bureau of Investigation. Counterterrorism Division. For that reason, a search in any OPAC name index for FBI Counterterrorism Division will fail.

SOME PROBLEMS?
The solution to this problem is to define a transitive or inheritance relationship between a corporate body and its corporate subdivisions.
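
The inheritance idea can be sketched in Python using the FBI example from the preceding slide. The dictionary layout, the keys and the `variant_names` and `find` functions are all hypothetical; the sketch only shows how a subdivision might inherit searchable forms from the variant names of its parent body, not how any existing OPAC or RDF tool behaves.

```python
# Hypothetical authority data modeled on the FBI example above.
bodies = {
    "fbi": {
        "preferred": "United States. Federal Bureau of Investigation",
        "variants": ["FBI"],
        "parent": None,
        "unit": None,
    },
    "fbi-ctd": {
        "preferred": ("United States. Federal Bureau of Investigation. "
                      "Counterterrorism Division"),
        "variants": [],
        "parent": "fbi",
        "unit": "Counterterrorism Division",
    },
}


def variant_names(key):
    """Own variants plus forms built from every ancestor's variants."""
    body = bodies[key]
    names = list(body["variants"])
    if body["parent"] is not None:
        # Recursion makes the inheritance transitive across any depth.
        for parent_variant in variant_names(body["parent"]):
            names.append(f"{parent_variant} {body['unit']}")
    return names


def find(query):
    """Return the keys of bodies whose preferred or variant name matches."""
    q = query.casefold()
    return [
        key for key, body in bodies.items()
        if q == body["preferred"].casefold()
        or q in (v.casefold() for v in variant_names(key))
    ]


print(find("FBI Counterterrorism Division"))
```

No cross reference for the subdivision was entered by hand; the search succeeds because the variant form is derived from the parent's cross reference.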

SOME PROBLEMS?
Unfortunately, RDF seems to resist hierarchical relationships.

It assumes that you just need to connect everything to everything else without needing to express any hierarchy.

SOME PROBLEMS?
This is bad news for bibliographic data, which is rife with hierarchical relationships.

Hierarchy is one of our major tools for expressing meaning to our users.

SOME PROBLEMS?
Can all bibliographic data be reduced to either a class or a property with a finite list of values? Another way to put this is to ask whether all that catalogers do could be reduced to a set of pull-down menus.

SOME PROBLEMS?
Is there an assumption on the part of semantic web developers that a given type of data, such as publisher name, would be EITHER "literal" (i.e. transcribed or composed) OR represented by a URI (controlled)?

SOME PROBLEMS?
Cataloging is rooted in humanistic practices that require careful recording of evidence. There will always be a value in distinguishing (and labelling as such) the following types of data:

copied as is from an artifact (transcribed)
supplied by a cataloger
categorized by a cataloger (controlled)

SOME PROBLEMS?
I notice that Tim Berners-Lee, the father of the World Wide Web and the Semantic Web himself, emphasizes the importance of recording not just data, but where the data came from, for the sake of authenticity (see the February 7, 2008 interview of Sir Tim Berners-Lee by Talis, http://talis-podcasts.s3.amazonaws.com/twt20080207_TimBL.html)

SOME PROBLEMS?
For many data elements, therefore, it will be important to be able to record BOTH a literal (transcribed and/or composed form) AND a URI (controlled form)

Is this a problem in RDF?
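
Setting aside how RDF itself would handle it, the requirement can be pictured with a short Python sketch in which one data element carries both forms, with the literal labelled by how it was obtained. The `DataElement` class, its field names, the publisher string and the URI are all hypothetical, and the sketch makes no claim about any particular RDF vocabulary.

```python
from dataclasses import dataclass
from typing import Optional

LITERAL_KINDS = ("transcribed", "supplied")


@dataclass(frozen=True)
class DataElement:
    """One data element holding a literal form, a controlled form, or both."""
    name: str
    literal: Optional[str] = None         # text copied from the artifact or composed
    literal_kind: Optional[str] = None    # "transcribed" or "supplied"
    controlled_uri: Optional[str] = None  # controlled form chosen by a cataloger

    def __post_init__(self):
        # A literal must always say where it came from.
        if self.literal is not None and self.literal_kind not in LITERAL_KINDS:
            raise ValueError("a literal needs literal_kind 'transcribed' or 'supplied'")


# Hypothetical publisher statement carrying BOTH forms at once.
publisher = DataElement(
    name="publisher name",
    literal="Example Press, Ltd.",
    literal_kind="transcribed",
    controlled_uri="http://example.org/corporate-body/42",
)
print(publisher.literal, "|", publisher.controlled_uri)
```

The transcribed form preserves the evidence from the artifact, while the URI supports collocation under the controlled form.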