The Role Of Metadata
Brian Kelly, UKOLN, University of Bath, Bath, BA2 7AY
A centre of expertise in digital information management
Email: B.Kelly@ukoln.ac.uk
URL: http://www.ukoln.ac.uk/web-focus/presentations

Page 1: The Role Of Metadata

A centre of expertise in digital information management
www.ukoln.ac.uk

Brian Kelly
UKOLN
University of Bath
Bath, BA2 7AY

Email: B.Kelly@ukoln.ac.uk
URL: http://www.ukoln.ac.uk/web-focus/presentations

UKOLN is supported by:

Page 2: Contents

• Introduction
• Background To Metadata
• Metadata Standards
• Metadata Management
• Metadata And Quality
• Conclusions

The Brief
"I know from conversations … I have had with customers, that metadata poses some really difficult questions …"

The talk addresses the questions:
• What is metadata and why is it important?
• What's this Dublin Core I've heard about (and why Dublin?)
• What benefits will I get if I use metadata?
• How should I do it?
• What will it cost me?

Page 3: About UKOLN / Web Focus

UKOLN:
• A national centre of expertise in digital information management (including metadata)
• Based at University of Bath
• Funded by JISC and Resource to support the Higher & Further / cultural heritage sectors

UK Web Focus:
• Provides advice and support on Web issues, especially standards and best practices
• Provided by Brian Kelly
• Funded by JISC from Nov 1996 - August 2003. Now jointly funded by JISC & Resource

QA Focus:
• Developing QA methodology to support JISC digital library programmes

Page 4: About You

How many are:
• Librarians
• Software / systems developers (techies)
• Commercial vendors
• Others

What is the extent of your knowledge of metadata? (Novice / Average / Expert)
• ???
• MARC, Dublin Core, …
• RDF, OAI, CLD, …

Page 5: What is Metadata?

"This metadata you've been talking about …. isn't it just catalogue records?"
Question at metadata seminar, 1998

Metadata can be regarded as:
• Catalogue records for the Web
• Data about data
• Structured information suitable for automated processing

Metadata Demystified (http://www.niso.org/standards/resources/Metadata_Demystified.pdf):
In current practice, the term has come to mean structured information that feeds into automated processes, and this is currently the most useful way to think about metadata

Page 6: The Problem

Back in the mid-1990s:
• Size of Web growing exponentially
• Web being used for both scholarly and non-scholarly (!) purposes
• Need for better searching mechanisms
• Search engines seemed promising, but there were concerns over abuse (e.g. porn index spammers) and difficulties in finding quality information
• Various sectors came together to develop a core set of metadata attributes for resource discovery

Page 7: Dublin Core

In the mid-1990s:
• Meeting held in Dublin, Ohio in 1995
• Involvement from several sectors (libraries, museums, science, IT, …)
• Agreement reached on a core set of metadata attributes for resource discovery
• Given the name Dublin Core (DC)
• DCMI organisation later formed
• DC working parties established to coordinate development of DC
• Regular annual conferences held

See <http://dublincore.org/>

Page 8: Why So Complex?

Why is there a need for working groups, annual events, etc. for developing a standard for catalogue records?

• It's not just documents: an Author record is inappropriate for a painting, a piece of music, etc.
• It's not just for humans: the DC records will be processed by software, for which unambiguity is essential
• It needs to be integrated: with a rapidly-developing Web architecture
• It needs to be future-proofed: so we don't have to do it all again when a new technology emerges

Page 9: Using Dublin Core

Note that DCMI defined a core set of elements:

Title      A name given to the resource.
Creator    An entity primarily responsible for making the content of the resource.
Publisher  An entity responsible for making the resource available.
Date       A date of an event in the lifecycle of the resource.
…          …

How this format could be represented was not defined initially

Page 10: Representing Dublin Core

Initially many people thought that DC would be embedded in HTML pages:

<META NAME="DC.Creator" CONTENT="Brian Kelly">

but how are multiple authors represented:

<META NAME="DC.Creator" CONTENT="Brian Kelly">
<META NAME="DC.Creator" CONTENT="John Smith">

or

<META NAME="DC.Creator" CONTENT="Brian Kelly, John Smith">

It is not possible to describe the potential complexities of DC in the HTML language
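A minimal sketch, in Python, of how software might see the two options for multiple authors. It is an illustration only: the class `DCMetaParser` and the helper `extract_dc` are hypothetical names, and the parsing relies solely on the standard-library `html.parser` module.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect Dublin Core <meta> elements, keeping repeated names as lists."""

    def __init__(self):
        super().__init__()
        self.records = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <META NAME=...> also matches.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.upper().startswith("DC."):
            self.records.setdefault(name, []).append(attributes.get("content") or "")


def extract_dc(html_text):
    """Return a dict mapping each DC.* name to the list of its content values."""
    parser = DCMetaParser()
    parser.feed(html_text)
    return parser.records


# Option 1: one <META> element per author.
repeated = extract_dc(
    '<META NAME="DC.Creator" CONTENT="Brian Kelly">'
    '<META NAME="DC.Creator" CONTENT="John Smith">'
)
# Option 2: both authors in a single CONTENT value.
combined = extract_dc('<META NAME="DC.Creator" CONTENT="Brian Kelly, John Smith">')

print(repeated)
print(combined)
```

With repeated elements the software receives two separate values. With a single combined value it receives one string and cannot tell whether the comma separates two authors or is part of one name written as "Surname, Forename".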

Page 11: Dublin Core Is Too Simple!

Dublin Core was designed as a core set of metadata elements for resource discovery. However:

• The benefits of the standard became apparent and DC became used in many areas
• There was a need to be able to represent richer metadata content and relationships, e.g.
  • Multiple authors and contact details
  • Alternative titles
  • Use of controlled vocabularies from particular schemes

A mechanism known as Qualified Dublin Core was developed to address this.

Page 12: Use In HTML

Dublin Core's potential was recognised and the W3C's release of HTML 4.0 included a mechanism for defining schemes in the <meta> element:

<meta name = "DC.Subject" content = "heart attack">
<meta name = "DC.Subject" scheme = "MeSH" content = "Myocardial Infarction; Pericardial Effusion">

<meta name = "DC.Type" scheme = "DCMIType" content = "Dataset">
<meta name = "DC.Type" scheme = "DCMIType" content = "Event">

See <http://dublincore.org/documents/2001/04/12/usageguide/qualified-html.shtml>

Page 13: XML

XML (Extensible Markup Language):
• Developed by W3C
• A meta-language used to create other languages
• Addresses HTML's lack of extensibility
• A family of standards which form the foundations for a richer and more interoperable Web:
  • XML
  • XML Namespaces
  • XSLT
  • XML Schemas
  • …
• A proven success

Rather than slowly tweaking HTML to allow rich DC to be embedded, XML allows new metadata applications to be developed which can be integrated with existing Web services

Page 14: Beyond Use In HTML

In parallel to the release of HTML 4.0, W3C was working on:
• A rich metadata framework which could be used for any metadata application:
  • Content filtering (this resource contains nudity)
  • Defining collections of related resources (Web site maps)
  • Digital signatures
  • …
• Development of the Semantic Web: an ambitious attempt to allow data from distributed services to be integrated

RDF (Resource Description Framework) was developed as W3C's solution to both problems

Page 15: RDF

RDF:
• An XML application
• Richer than conventional XML applications: a mathematical model which describes relationships is embedded in the RDF
• This richness comes with a price: increased complexity

RDF applications are being developed. However at present it may be advisable to leave RDF to the research community or well-funded pilot studies to prove its benefits before committing to use in a service environment.
(However note that metadata in PDF documents is stored as RDF)
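To make "an XML application" concrete, the following Python sketch builds a small RDF/XML description using only the standard-library `xml.etree.ElementTree` module. The function name `dc_to_rdf`, the resource URL and the title are hypothetical values chosen for illustration. The two namespace URIs are the published RDF and Dublin Core element namespaces.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to use the conventional prefixes when serialising.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def dc_to_rdf(resource_url, title, creators):
    """Describe one resource in RDF/XML, with one dc:creator element per author."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": resource_url}
    )
    ET.SubElement(description, f"{{{DC_NS}}}title").text = title
    for creator in creators:
        ET.SubElement(description, f"{{{DC_NS}}}creator").text = creator
    return ET.tostring(root, encoding="unicode")


# Hypothetical resource used purely as an example.
rdf_xml = dc_to_rdf(
    "http://www.example.org/report",
    "An Example Report",
    ["Brian Kelly", "John Smith"],
)
print(rdf_xml)
```

Each statement is tied explicitly to the resource it describes and to a namespace that says which vocabulary the property comes from. This is a small part of the extra richness, and the extra verbosity, noted above.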

Page 16: Beyond Resource Discovery

Metadata has a role to play beyond item-level resource discovery

Other metadata applications include:
• Metadata for digitised objects: about the object and about the digitisation process
• Management / administrative metadata: review this resource by xx; delete this resource on …; this resource is managed by the XYZ group; …
• Metadata about collections (physical and online)
• …

Page 17: Metadata Modelling (1)

You want to use Dublin Core metadata. How do you choose how to model your metadata?

• Do you use simple Dublin Core (the basic 15 elements)?
• Do you use qualified Dublin Core to enable richer metadata to be described?
• If the latter, how do you decide which qualified DC metadata to use?

These are key issues to address.
In some cases answers may be provided for you.
In other cases, you must answer these questions for yourself.

Page 18: Metadata Modelling (2)

Why do you wish to use metadata?
• Because it is fashionable?
• Because you're a librarian and librarians 'do' metadata?
• Because you want your Web site to be no. 1 in Google?
• Because you are developing an application which requires use of metadata?

Please remember:
• Developing applications which make use of metadata can be expensive
• Creating and managing metadata can be expensive
• Search engines such as Google typically make little or no use of metadata

Page 19: Metadata Modelling (3)

Exploit Interactive case study:
• EU-funded ejournal
• Requirement to provide local searching better than simple free text searching:
  • Search by title, author and keywords
  • Search by funding stream
  • Search by issue and article type
• The end-user interface is illustrated

See <http://www.ukoln.ac.uk/qa-focus/documents/case-studies/case-study-01/>

Page 20: Metadata Modelling (4)

How did we manage and model the metadata?

Article metadata
doc_title = "The XHTML Interview"
author = "Kelly, B."
title = "WebWatching National Node Sites"
description = "In this issue's Web Technologies column we ask Brian Kelly to tell us more about XHTML."
article_type = "regular"

Issue metadata
issue_num = "6"
pub_date = "25 Oct 2002"

Site metadata
name = "Exploit Interactive"
publisher = "UKOLN"

<meta name="DC.Title" content="The XHTML Interview">
<meta name="DC.Creator" content="Kelly, B.">
<meta name="DC.Description" content="In this issue's Web Technologies ….">
<meta name="DC.Relation.IsPartOf" content="http://www.exploit-lib.org/issue6/">
<meta name="DC.Type" content="text.article.regular" scheme="Exploit-categories">

Processed by server-side script
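The slides do not show the server-side script itself. The following Python sketch shows one way such a script could combine the three sets of metadata into the `<meta>` elements above. The function `build_dc_meta`, the `base_url` field and the exact mapping rules are assumptions made for illustration, not the Exploit Interactive implementation.

```python
from html import escape

# Values copied from the slide; "base_url" is an assumed extra field.
article = {
    "doc_title": "The XHTML Interview",
    "author": "Kelly, B.",
    "description": "In this issue's Web Technologies column we ask "
                   "Brian Kelly to tell us more about XHTML.",
    "article_type": "regular",
}
issue = {"issue_num": "6", "pub_date": "25 Oct 2002"}
site = {
    "name": "Exploit Interactive",
    "publisher": "UKOLN",
    "base_url": "http://www.exploit-lib.org/",
}


def build_dc_meta(article, issue, site):
    """Combine article, issue and site metadata into Dublin Core <meta> tags."""
    # Each entry is (DC name, content value, optional scheme).
    fields = [
        ("DC.Title", article["doc_title"], None),
        ("DC.Creator", article["author"], None),
        ("DC.Description", article["description"], None),
        ("DC.Relation.IsPartOf", f"{site['base_url']}issue{issue['issue_num']}/", None),
        ("DC.Type", f"text.article.{article['article_type']}", "Exploit-categories"),
    ]
    tags = []
    for name, content, scheme in fields:
        scheme_attr = f' scheme="{escape(scheme)}"' if scheme else ""
        # escape() protects against quotes and angle brackets in the stored values.
        tags.append(f'<meta name="{name}" content="{escape(content)}"{scheme_attr}>')
    return "\n".join(tags)


meta_tags = build_dc_meta(article, issue, site)
print(meta_tags)
```

Keeping the metadata in simple name and value pairs, and generating the HTML from them, means a change of output format only requires a change to the script rather than to every article.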

Page 21: Storing DC Metadata

It is up to you how you store your metadata. Your choice will be affected by the use which will be made of your metadata and how it will be created and managed.

You may wish to:
• Embed HTML metadata in HTML pages
• Link to HTML metadata from HTML
• Embed RDF
• Store metadata in an application (home-grown scripts, CMS, metadata repository, image management system, …)

You may wish to store your metadata in a database and make it available according to its use.

Author      Book                Pub. Date
G. Orwell   1984                1948
I. Rankin   Question Of Blood   2003

Diagram: metadata management tool, with HTML and RDF outputs
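As a rough illustration of storing metadata once and making it available according to its use, this Python sketch holds the two example records in an in-memory SQLite database and renders a row as HTML `<meta>` elements on request. The table name `books`, the column names and the helper `as_html_meta` are hypothetical. The mapping of the Pub. Date column to `DC.Date` is also an assumption.

```python
import sqlite3
from html import escape

# In-memory database holding the example records from the slide.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE books (author TEXT, title TEXT, pub_date TEXT)")
connection.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        ("G. Orwell", "1984", "1948"),
        ("I. Rankin", "Question Of Blood", "2003"),
    ],
)


def as_html_meta(row):
    """Render one (author, title, pub_date) row as Dublin Core <meta> tags."""
    author, title, pub_date = row
    return "\n".join(
        [
            f'<meta name="DC.Creator" content="{escape(author)}">',
            f'<meta name="DC.Title" content="{escape(title)}">',
            f'<meta name="DC.Date" content="{escape(pub_date)}">',
        ]
    )


rows = connection.execute(
    "SELECT author, title, pub_date FROM books ORDER BY pub_date"
).fetchall()

for row in rows:
    print(as_html_meta(row))
    print()
```

A second function with the same input could produce RDF or any other format, so the stored records do not need to change when a new output is required.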

Page 22: A Simple DC Management Tool

DC-dot:
• Simple Web-based DC creation and management tool
• Output in range of formats (HTML, XHTML, RDF, …)
• Provides validation
• Useful for small-scale metadata creation

But:
• Not ideal for large-scale usage
• Doesn't provide rich management capabilities

http://www.ukoln.ac.uk/metadata/dcdot/

Page 23: Management Tools

Many types of metadata tools:
• Type the metadata by hand
• Use File -> Properties menu in MS Office applications and export data
• Home-grown database systems
• Home-grown scripting solutions
• Use of commercial systems:
  • Library management systems
  • Image management systems
  • …

There is no single ideal solution.
The solution you choose should reflect your needs, expertise, organisational culture, …

Page 24: Quality Assurance

The Need for QA:
• Metadata is the 'glue' for integration of services
• If the metadata quality is poor, services will not be interoperable
• There is therefore a need for quality assurance procedures to ensure fitness for purpose

What Can Go Wrong?
Things that can go wrong include:
• Metadata is out-of-date or incorrect
• Metadata is used inconsistently within service
• Metadata is used inconsistently across services
• Metadata is not modelled correctly
• Metadata not compliant with storage standard
• …

Page 25: Think About The Implementation

It is important that when you deploy metadata systems you can manage and maintain the metadata. For example:

• Details of the person maintaining the data change (name change due to marriage, person leaves, …)
• Organisational details change (mergers, takeovers, …)
• Technology changes

Prepare for change! People change, organisations change, responsibilities change, technologies change, …
Ensure that you can manage the metadata which reflects such changes

Page 26: Need For Cataloguing Rules

Your Cataloguing Rules
• You will need cataloguing rules to support your metadata creation
• You will need to provide necessary training and support (especially if you are dependent on cataloguing by non-professionals)

Interoperability
• How will you interoperate with services which deploy different cataloguing rules:
  • 04/07/03 – what date is this?
  • LSC – what does this stand for?
• Humans use context; software products don't
• There is a need to define the standards you're applying (in a machine understandable way)
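The date question can be shown directly. This short Python sketch parses the string 04/07/03 under three different conventions, using only the standard-library `datetime` module. The variable names are illustrative.

```python
from datetime import datetime

ambiguous = "04/07/03"

# The same string read under three different cataloguing conventions.
uk_reading = datetime.strptime(ambiguous, "%d/%m/%y").date()   # day/month/year
us_reading = datetime.strptime(ambiguous, "%m/%d/%y").date()   # month/day/year
ymd_reading = datetime.strptime(ambiguous, "%y/%m/%d").date()  # year/month/day

print(uk_reading.isoformat())   # 2003-07-04
print(us_reading.isoformat())   # 2003-04-07
print(ymd_reading.isoformat())  # 2004-07-03
```

Software cannot choose between these readings unless the convention is declared. Recording dates in a single agreed form, such as the year-month-day form printed above, and stating which form is in use removes the ambiguity.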

Page 27: Need For QA Procedures

So we have:
• Tools for managing metadata
• Cataloguing rules

But:
• People make mistakes
• Software may have bugs
• Our rules may be ambiguous
• The standards may be ambiguous
• The metadata may be correct but confusing in other contexts
• …

Although humans can adapt to errors and ambiguities, software typically can't. We therefore need quality assurance procedures to ensure that metadata applications will be interoperable.

Page 28: Approaches To QA

We may wish to consider:
• Systematic checking at data creation
• Systematic checking of output
• Semi-automated checking (e.g. duplication, common misspellings, out-of-range checks, …)
• Automated checking
• …

Worst Case Scenario:
Your service is fine, and quality metadata is provided. Your data is integrated with other services to provide an international portal to quality resources. However the other service providers have poor quality metadata. The poor quality of the final service brings your contribution into disrepute.
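A minimal sketch of the semi-automated checks mentioned above (duplication and out-of-range values), written in Python. The function `check_records`, the record fields `title` and `year`, and the year limits are all hypothetical. A real service would apply the checks set out in its own cataloguing rules.

```python
from collections import Counter
from datetime import date


def check_records(records, earliest_year=1990, latest_year=None):
    """Return (record index, problem) pairs found by simple QA checks."""
    latest_year = latest_year or date.today().year
    problems = []
    # Count titles case-insensitively so trivial variations are still flagged.
    title_counts = Counter(r.get("title", "").strip().lower() for r in records)
    for index, record in enumerate(records):
        title = record.get("title", "").strip()
        if not title:
            problems.append((index, "missing title"))
        elif title_counts[title.lower()] > 1:
            problems.append((index, "duplicate title"))
        year = record.get("year")
        if not isinstance(year, int) or not earliest_year <= year <= latest_year:
            problems.append((index, "year out of range"))
    return problems


# Sample records invented for this example.
sample = [
    {"title": "The XHTML Interview", "year": 2002},
    {"title": "the xhtml interview", "year": 2002},
    {"title": "", "year": 2002},
    {"title": "WebWatching National Node Sites", "year": 1899},
]

problems = check_records(sample, latest_year=2003)
for index, problem in problems:
    print(index, problem)
```

Checks of this kind only flag candidates for a person to review. Two articles may legitimately share a title, which is why the checking is described as semi-automated.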

Page 29: Pulling It Together

Page 30: Conclusions

To conclude:
• Metadata can provide richer searching and other services within a service and the glue for integration across several services
• There are several key standards: Dublin Core, HTML, XML, …
• You will need to select the standards appropriate to your service requirements
• You will need to choose the metadata according to your service requirements
• You will need to choose the architectural framework and applications for managing your metadata according to your service requirements
• You will need to ensure that you have appropriate quality assurance mechanisms in place – otherwise the above work will have been wasted!
• It can be worth it!