A centre of expertise in digital information management
www.ukoln.ac.uk
The Role Of Metadata
Brian Kelly
UKOLN
University of Bath
Bath, BA2 7AY
[email protected]
http://www.ukoln.ac.uk/web-focus/presentations
UKOLN is supported by:
Contents
• Introduction
• Background To Metadata
• Metadata Standards
• Metadata Management
• Metadata And Quality
• Conclusions
The Brief
"I know from conversations … I have had with customers, that metadata poses some really difficult questions …"
The talk addresses the questions:
• What is metadata and why is it important?
• What's this Dublin Core I've heard about (and why Dublin?)
• What benefits will I get if I use metadata?
• How should I do it?
• What will it cost me?
About UKOLN / Web Focus
UKOLN:
• A national centre of expertise in digital information management (including metadata)
• Based at University of Bath
• Funded by JISC and Resource to support the Higher & Further Education and cultural heritage sectors
UK Web Focus:
• Provides advice and support on Web issues, especially standards and best practices
• Provided by Brian Kelly
• Funded by JISC from Nov 1996 - August 2003. Now jointly funded by JISC & Resource
QA Focus:
• Developing QA methodology to support JISC digital library programmes
About You
How many are:
• Librarians
• Software / systems developers (techies)
• Commercial vendors
• Others
What is the extent of your knowledge of metadata?
[Figure: a scale running Novice / Average / Expert. Topics shown on the scale: RDF, OAI, CLD, …; MARC, Dublin Core, …; ???]
What is Metadata?
"This metadata you've been talking about …. isn't it just catalogue records?"
Question at metadata seminar, 1998
Metadata can be regarded as:
• Catalogue records for the Web
• Data about data
• Structured information suitable for automated processing
Metadata Demystified
http://www.niso.org/standards/resources/Metadata_Demystified.pdf
In current practice, the term has come to mean structured information that feeds into automated processes, and this is currently the most useful way to think about metadata
The Problem
Back in the mid-1990s:
• Size of Web growing exponentially
• Web being used for both scholarly and non-scholarly (!) purposes
• Need for better searching mechanisms
• Search engines seemed promising, but concerns over abuse (e.g. porn index spammers) and difficulties in finding quality information
• Various sectors came together to develop a core set of metadata attributes for resource discovery
Dublin Core
In the mid-1990s:
• Meeting held in Dublin, Ohio in 1995
• Involvement from several sectors (libraries, museums, science, IT, …)
• Agreement reached on a core set of metadata attributes for resource discovery
• Given the name Dublin Core (DC)
• DCMI organisation later formed
• DC working parties established to coordinate development of DC
• Regular annual conferences held
See <http://dublincore.org/>
Why So Complex?
Why is there a need for working groups, annual events, etc. for developing a standard for catalogue records?
• It's not just documents: an Author record is inappropriate for a painting, a piece of music, etc.
• It's not just for humans: the DC records will be processed by software, for which unambiguity is essential
• It needs to be integrated: with a rapidly-developing Web architecture
• It needs to be future-proofed : so we don't have to do it all again when a new technology emerges
Using Dublin Core
Note that DCMI defined a core set of elements:
Title: A name given to the resource.
Creator: An entity primarily responsible for making the content of the resource.
Publisher: An entity responsible for making the resource available.
Date: A date of an event in the lifecycle of the resource.
…
How this format could be represented was not defined initially.
Representing Dublin Core
Initially many people thought that DC would be embedded in HTML pages:
<META NAME="DC.Creator" CONTENT="Brian Kelly">
but how are multiple authors represented:
<META NAME="DC.Creator" CONTENT="Brian Kelly">
<META NAME="DC.Creator" CONTENT="John Smith">
or
<META NAME="DC.Creator" CONTENT="Brian Kelly, John Smith">
It is not possible to describe the potential complexities of DC in the HTML language
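The first of the two options on this slide (repeating the element once per author) can be sketched in a few lines of Python. This is only an illustration: the helper name dc_meta_tags is hypothetical, and the slide does not say which of the two forms should be preferred.

```python
from html import escape


def dc_meta_tags(element, values):
    """Return one <meta> tag per value for a Dublin Core element.

    Hypothetical helper for illustration; it assumes the
    'repeat the element' form rather than a comma-separated value.
    """
    return [
        '<meta name="DC.{}" content="{}">'.format(
            escape(element, quote=True), escape(value, quote=True)
        )
        for value in values
    ]


creators = ["Brian Kelly", "John Smith"]
tags = dc_meta_tags("Creator", creators)
print("\n".join(tags))
```

Keeping each author in a separate tag avoids having to guess later whether a comma separates two people or belongs to a single "Surname, Initial" name.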
Dublin Core Is Too Simple!
Dublin Core was designed as a core set of metadata elements for resource discovery. However:
• The benefits of the standard became apparent and DC became used in many areas
• There was a need to be able to represent richer metadata content and relationships, e.g.
  • Multiple authors and contact details
  • Alternative titles
  • Use of controlled vocabularies from particular schemes
A mechanism known as Qualified Dublin Core was developed to address this.
Use In HTML
Dublin Core potential was recognised and the W3C's release of HTML 4.0 included a mechanism for defining schemes in the <meta> element:
<meta name = "DC.Subject" content = "heart attack">
<meta name = "DC.Subject" scheme = "MeSH" content = "Myocardial Infarction; Pericardial Effusion">
<meta name = "DC.Type" scheme = "DCMIType" content = "Dataset">
<meta name = "DC.Type" scheme = "DCMIType" content = "Event">
See <http://dublincore.org/documents/2001/04/12/usageguide/qualified-html.shtml>
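Software that consumes such pages has to read the name, scheme and content attributes back out. A minimal sketch using Python's standard html.parser module is shown here; the class name DCMetaParser and the sample markup are assumptions made for illustration, not part of any DCMI tool.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect (name, scheme, content) from <meta> tags whose name starts with 'DC.'."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)  # attribute names arrive lowercased
        name = attributes.get("name") or ""
        if name.lower().startswith("dc."):
            self.records.append(
                (name, attributes.get("scheme"), attributes.get("content"))
            )


# Sample markup based on the slide, plus one non-DC tag that should be skipped.
SAMPLE = """
<meta name="DC.Subject" content="heart attack">
<meta name="DC.Subject" scheme="MeSH" content="Myocardial Infarction; Pericardial Effusion">
<meta name="DC.Type" scheme="DCMIType" content="Dataset">
<meta name="keywords" content="ignored">
"""

parser = DCMetaParser()
parser.feed(SAMPLE)
for name, scheme, content in parser.records:
    print(name, scheme, content)
```

A record without a scheme attribute comes back with scheme set to None, which lets an application treat free-text subjects differently from controlled-vocabulary terms.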
XML
XML (Extensible Markup Language):
• Developed by W3C
• A meta-language used to create other languages
• Addresses HTML's lack of extensibility
• A family of standards which form the foundations for a richer and more interoperable Web:
  • XML
  • XML Namespaces
  • XSLT
  • XML Schemas
  • …
• A proven success
Rather than slowly tweaking HTML to allow rich DC to be embedded, XML allows new metadata applications to be developed which can be integrated with existing Web services
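As a rough illustration of DC expressed in XML rather than HTML, the sketch below builds a small record with Python's standard xml.etree.ElementTree module. The namespace URI is the published Dublin Core element set namespace; the wrapper element name "record" and the helper dc_record are hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dc_record(fields):
    """Build a record element from (element, value) pairs.

    'record' is a hypothetical wrapper name used for this sketch only.
    """
    record = ET.Element("record")
    for element, value in fields:
        child = ET.SubElement(record, "{%s}%s" % (DC_NS, element))
        child.text = value
    return record


record = dc_record(
    [
        ("title", "The XHTML Interview"),
        ("creator", "Brian Kelly"),
        ("creator", "John Smith"),
    ]
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because each value is its own element, repeated creators need no special convention, and the namespace makes clear which vocabulary the element names come from.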
Beyond Use In HTML
In parallel with the release of HTML 4.0, W3C was working on:
• A rich metadata framework which could be used for any metadata application:
  • Content filtering (this resource contains nudity)
  • Defining collections of related resources (Web site maps)
  • Digital signatures
  • …
• Development of the Semantic Web: an ambitious attempt to allow data from distributed services to be integrated
RDF (Resource Description Framework) was developed as W3C's solution to both problems
RDF
RDF:
• An XML application
• Richer than conventional XML applications: a mathematical model which describes relationships is embedded in the RDF
• This richness comes with a price - increased complexity
RDF applications are being developed. However, at present it may be advisable to leave RDF to the research community or well-funded pilot studies to prove its benefits before committing to use in a service environment. (However, note that metadata in PDF documents is stored as RDF.)
Beyond Resource Discovery
Metadata has a role to play beyond item-level resource discovery
Other metadata applications include:
• Metadata for digitised objects: about the object and about the digitisation process
• Management / administrative metadata: review this resource by xx; delete this resource on …; this resource is managed by the XYZ group; …
• Metadata about collections (physical and online)
• …
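Administrative metadata of the "review this resource by" kind only pays off if something acts on it. The following sketch shows one way this might look in Python; the field names (review_by, delete_on, managed_by), the example.org URLs and the helper due are all hypothetical.

```python
from datetime import date

# Hypothetical administrative records; field names are illustrative only.
RESOURCES = [
    {"url": "http://example.org/a", "managed_by": "XYZ group", "review_by": date(2003, 6, 30)},
    {"url": "http://example.org/b", "managed_by": "XYZ group", "review_by": date(2004, 1, 31)},
    {"url": "http://example.org/c", "managed_by": "Web team", "delete_on": date(2003, 9, 1)},
]


def due(resources, field, today):
    """Return URLs of resources whose date in `field` is on or before `today`."""
    return [r["url"] for r in resources if field in r and r[field] <= today]


today = date(2003, 10, 1)
print("Review:", due(RESOURCES, "review_by", today))
print("Delete:", due(RESOURCES, "delete_on", today))
```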
Metadata Modelling (1)
You want to use Dublin Core metadata. How do you choose how to model your metadata?
• Do you use simple Dublin Core (the basic 15 elements)?
• Do you use qualified Dublin Core to enable richer metadata to be described?
• If the latter, how do you decide which qualified DC metadata to use?
These are key issues to address. In some cases answers may be provided for you. In other cases, you must answer these questions for yourself.
Metadata Modelling (2)
Why do you wish to use metadata?
• Because it is fashionable?
• Because you're a librarian and librarians 'do' metadata?
• Because you want your Web site to be no. 1 in Google?
• Because you are developing an application which requires use of metadata?
Please remember:
• Developing applications which make use of metadata can be expensive
• Creating and managing metadata can be expensive
• Search engines such as Google typically make little or no use of metadata
Metadata Modelling (3)
Exploit Interactive case study:
• EU-funded ejournal
• Requirement to provide local searching better than simple free text searching:
  • Search by title, author and keywords
  • Search by funding stream
  • Search by issue and article type
• The end-user interface is illustrated
See <http://www.ukoln.ac.uk/qa-focus/documents/case-studies/case-study-01/>
Metadata Modelling (4)
How did we manage and model the metadata?
Article metadata:
doc_title = "The XHTML Interview"
author = "Kelly, B."
title = "WebWatching National Node Sites"
description = "In this issue's Web Technologies column we ask Brian Kelly to tell us more about XHTML."
article_type = "regular"

Issue metadata:
issue_num = "6"
pub_date = "25 Oct 2002"

Site metadata:
name = "Exploit Interactive"
publisher = "UKOLN"

Processed by a server-side script to give:
<meta name="DC.Title" content="The XHTML Interview">
<meta name="DC.Creator" content="Kelly, B.">
<meta name="DC.Description" content="In this issue's Web Technologies ….">
<meta name="DC.Relation.IsPartOf" content="http://www.exploit-lib.org/issue6/">
<meta name="DC.Type" content="text.article.regular" scheme="Exploit-categories">
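The slide does not show the server-side script itself. As a sketch of the general idea only, the Python below combines site, issue and article metadata into DC meta tags. The function names meta and render and the base_url field are assumptions; the mapping simply mirrors the example output on the slide and is not the actual Exploit Interactive implementation.

```python
from html import escape

# Values taken from the slide; base_url is an assumed extra field.
SITE = {"name": "Exploit Interactive", "publisher": "UKOLN",
        "base_url": "http://www.exploit-lib.org/"}
ISSUE = {"issue_num": "6", "pub_date": "25 Oct 2002"}
ARTICLE = {
    "doc_title": "The XHTML Interview",
    "author": "Kelly, B.",
    "description": "In this issue's Web Technologies column we ask "
                   "Brian Kelly to tell us more about XHTML.",
    "article_type": "regular",
}


def meta(name, content, scheme=None):
    """Format one <meta> tag, escaping the content for use in an attribute."""
    scheme_attr = ' scheme="{}"'.format(escape(scheme, quote=True)) if scheme else ""
    return '<meta name="{}" content="{}"{}>'.format(
        name, escape(content, quote=True), scheme_attr
    )


def render(site, issue, article):
    """Combine the three levels of metadata into a list of DC meta tags."""
    issue_url = "{}issue{}/".format(site["base_url"], issue["issue_num"])
    return [
        meta("DC.Title", article["doc_title"]),
        meta("DC.Creator", article["author"]),
        meta("DC.Description", article["description"]),
        meta("DC.Relation.IsPartOf", issue_url),
        meta("DC.Type", "text.article." + article["article_type"],
             scheme="Exploit-categories"),
    ]


tags = render(SITE, ISSUE, ARTICLE)
print("\n".join(tags))
```

Holding the metadata once, in a neutral form, and generating the tags on demand means a change of site name or publisher is made in one place rather than in every page.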
Storing DC Metadata
It is up to you how you store your metadata. Your choice will be affected by the use which will be made of your metadata and how it will be created and managed.
You may wish to:
• Embed HTML metadata in HTML pages
• Link to HTML metadata from HTML
• Embed RDF
• Store metadata in application (home-grown scripts, CMS, metadata repository, image management system, …)
You may wish to store your metadata in a database and make it available according to its use.
[Figure: a metadata management tool holding the records below, with output as HTML and RDF]
Author      Book                Pub. Date
G. Orwell   1984                1948
I. Rankin   Question Of Blood   2003
A Simple DC Management Tool
DC-dot:
• Simple Web-based DC creation and management tool
• Output in range of formats (HTML, XHTML, RDF, …)
• Provides validation
• Useful for small-scale metadata creation
But:
• Not ideal for large-scale usage
• Doesn't provide rich management capabilities
http://www.ukoln.ac.uk/metadata/dcdot/
Management Tools
Many types of metadata tools:
• Type the metadata by hand
• Use File -> Properties menu in MS Office applications and export data
• Home-grown database systems
• Home-grown scripting solutions
• Use of commercial systems:
  • Library management systems
  • Image management systems
  • …
There is no single ideal solution. The solution you choose should reflect your needs, expertise, organisational culture, …
Quality Assurance
The Need for QA:
• Metadata is the 'glue' for integration of services
• If the metadata quality is poor, services will not be able to interoperate
• There is therefore a need for quality assurance procedures to ensure fitness for purpose
What Can Go Wrong?
• Things that can go wrong include:
  • Metadata is out-of-date or incorrect
  • Metadata is used inconsistently within service
  • Metadata is used inconsistently across services
  • Metadata is not modelled correctly
  • Metadata not compliant with storage standard
  • …
Think About The Implementation
It is important that when you deploy metadata systems you can manage and maintain the metadata. For example:
• Details of the person maintaining the data change (name change due to marriage, person leaves, …)
• Organisational details change (mergers, takeovers, …)
• Technology changes
Prepare for change! People change, organisations change, responsibilities change, technologies change, … Ensure that you can manage the metadata which reflects such changes.
Need For Cataloguing Rules
Your Cataloguing Rules
• You will need cataloguing rules to support your metadata creation
• You will need to provide necessary training and support (especially if you are dependent on cataloguing by non-professionals)
Interoperability
• How will you interoperate with services which deploy different cataloguing rules:
  • 04/07/03 – what date is this?
  • LSC – what does this stand for?
• Humans use context; software products don't
• There is a need to define the standards you're applying (in a machine understandable way)
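The "04/07/03" question can be made concrete with Python's standard datetime module. The three format strings below are simply the readings a cataloguer might plausibly intend; the variable names are illustrative.

```python
from datetime import datetime

raw = "04/07/03"

# The same string parsed under three different cataloguing conventions.
day_first = datetime.strptime(raw, "%d/%m/%y").date()
month_first = datetime.strptime(raw, "%m/%d/%y").date()
year_first = datetime.strptime(raw, "%y/%m/%d").date()

print(day_first.isoformat())    # 2003-07-04
print(month_first.isoformat())  # 2003-04-07
print(year_first.isoformat())   # 2004-07-03
```

Software cannot tell which reading was meant, so the rule has to be stated. Recording dates in an unambiguous form such as ISO 8601 (YYYY-MM-DD) is one common way to do that.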
Need For QA Procedures
So we have:
• Tools for managing metadata
• Cataloguing rules
But:
• People make mistakes
• Software may have bugs
• Our rules may be ambiguous
• The standards may be ambiguous
• The metadata may be correct but confusing in other contexts
• …
Although humans can adapt to errors and ambiguities, software typically can't. We therefore need quality assurance procedures to ensure that metadata applications will be interoperable.
Approaches To QA
We may wish to consider:
• Systematic checking at data creation
• Systematic checking of output
• Semi-automated checking (e.g. duplication, common misspellings, out-of-range checks, …)
• Automated checking
• …
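A few of the semi-automated checks listed above (duplication, missing values, out-of-range dates) might be sketched as follows. The record layout, the date limits and the function name check are assumptions made for illustration; a real service would apply its own cataloguing rules.

```python
from collections import Counter
from datetime import date

# Hypothetical records; ids 1 and 2 are deliberate duplicates, id 3 is faulty.
RECORDS = [
    {"id": 1, "title": "The XHTML Interview", "creator": "Kelly, B.", "date": date(2002, 10, 25)},
    {"id": 2, "title": "The XHTML Interview", "creator": "Kelly, B.", "date": date(2002, 10, 25)},
    {"id": 3, "title": "", "creator": "Smith, J.", "date": date(2102, 1, 1)},
]


def check(records, earliest, latest):
    """Return (record id, problem) pairs for a human to review."""
    problems = []
    counts = Counter((r["title"], r["creator"]) for r in records)
    for r in records:
        if not r["title"].strip():
            problems.append((r["id"], "missing title"))
        if counts[(r["title"], r["creator"])] > 1:
            problems.append((r["id"], "possible duplicate"))
        if not earliest <= r["date"] <= latest:
            problems.append((r["id"], "date out of range"))
    return problems


problems = check(RECORDS, date(1990, 1, 1), date(2003, 12, 31))
for record_id, problem in problems:
    print(record_id, problem)
```

The checks only flag candidates; deciding whether two records really are duplicates is left to a person, which is what makes the approach semi-automated.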
Worst Case Scenario: Your service is fine, and quality metadata is provided. Your data is integrated with other services to provide an international portal to quality resources. However, the other service providers have poor quality metadata. The poor quality of the final service brings your contribution into disrepute.
Pulling It Together
Conclusions
To conclude:
• Metadata can provide richer searching and other services within a service and the glue for integration across several services
• There are several key standards: Dublin Core, HTML, XML, …
• You will need to select the standards appropriate to your service requirements
• You will need to choose the metadata according to your service requirements
• You will need to choose the architectural framework and applications for managing your metadata according to your service requirements
• You will need to ensure that you have appropriate quality assurance mechanisms in place – otherwise the above work will have been wasted!
• It can be worth it!