Metadata: solution or distraction?

Page 1:

Knowledge, Information, Data & Metadata Management (KIDMM)

METADATA: solution or distraction?

Page 2:

Conrad Taylor
Chair of BCS Electronic Publishing Specialist Group

My background is in design & production for print

…and then various electronic media

Page 3:

My briefing paper: ‘Metadata’s many meanings and uses’

www.ideography.co.uk/briefings/

Sticky note (Conrad Taylor): I pointed out that this briefing paper covers metadata in much greater depth than I could in a short talk, and I recommended that people download it from the URL given.
Page 4:

Briefly introducing KIDMM

◆ A ‘knowledge communities’ project within the BCS
◆ Involvement from a number of BCS Specialist Groups & Forums
  Artificial Intelligence SG – Business Information Systems SG – Business/IT Interface Specialist Group – Computer Arts Society – Data Management SG – Electronic Publishing SG – Financial Services SG – Geospatial SG – Human-Computer Interface – Information Retrieval SG – Open Source SG – Primary Health Care SG – Project Management SG – Sociotechnical SG – BCS Women – Engineering & Technology Forum – Health Informatics Forum
◆ Plus people from non-BCS organisations
  XML UK – Data Management Association UK – English Heritage – Victoria & Albert Museum – International Society for Knowledge Organization UK – Universal Decimal Classification Advisory Board – World Health Organization – International Association for Information & Data Quality UK – BSI standards committees IST/40 and IST/41…

Page 5:

Briefly introducing KIDMM

◆ An email discussion list
◆ A web site at www.epsg.org.uk/KIDMM
◆ Conference in preparation: 17 September 2007
◆ Exhibition in preparation, to raise consciousness about a range of issues in informatics
  » 10 panels proposed
  » able to travel, for multiple use
  » content devised through collaboration with Specialist Groups and others
  » perhaps an online ‘exhibition catalogue’ too!

Page 6:

Helping people to find content

This is not new – authors, editors and publishers have been tackling these issues for ages…

◆ Tables of contents:
  Pliny the Elder – Natural History
  Valerius Soranus
◆ Chapter-and-verse divisions to assist reference:
  Archbishop Stephen Langton, 1150–1228
  Robert Estienne, Vulgate Bible of 1551
◆ Indexes:
  13th century, in the context of French university debating

Page 7:

Library catalogues

Note: the record is a convenient surrogate for the real information objects

◆ Sumerian archives of clay tablets, 4,000 years ago:
  catalogues have been found in some of them
◆ Great Library of Alexandria, founded by Ptolemy II:
  catalogue in 120 volumes by Callimachus (305–240 BC)
◆ Chinese state library, Han Dynasty, approx. 2,000 years ago:
  provided with a classification scheme, catalogued on silk scrolls
◆ Onwards to Anthony Panizzi (BL cataloguing rules);
  Melvil Dewey (1876 – classification, card catalogues); to MARC and OPACs

Page 8:

So, just what is ‘Metadata’?

◆ It means different things to different communities of practice.
◆ In Database Management, since the 70s, it has meant descriptions of what each field is supposed to contain, and the relationships between fields.
  » e.g., an important consideration when databases have to be merged (Extract – Transform – Load operations)
◆ In the mid-1990s, the term was hijacked by Librarians and Information Scientists
  » a defining moment: the March 1995 OCLC/NCSA Metadata Workshop in Dublin, Ohio

Sticky note (Conrad Taylor): A joint event of the Online Computer Library Center (OCLC) in Dublin, OH, and the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign. NCSA had just recently developed the Mosaic Web browser.
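
To make the database sense of the term concrete, here is a minimal Python sketch of field-level metadata being used when two databases are merged. Everything in it is an assumption for illustration: the two catalogues, their field names, the sample records and the helper `to_common` are all hypothetical and do not come from the talk.

```python
# Hypothetical field-level metadata for two picture-library databases.
# Each entry describes what a field is supposed to contain (its "meaning")
# and how it is stored. All names here are invented for illustration.
CATALOGUE_A = {
    "ttl": {"meaning": "title", "stored_as": "text"},
    "yr": {"meaning": "year", "stored_as": "integer"},
}
CATALOGUE_B = {
    "name": {"meaning": "title", "stored_as": "text"},
    "pub_year": {"meaning": "year", "stored_as": "text"},  # year kept as a string
}


def to_common(record, field_metadata):
    """Rename each field to its shared meaning and store years as integers."""
    common = {}
    for field, value in record.items():
        meaning = field_metadata[field]["meaning"]
        common[meaning] = int(value) if meaning == "year" else value
    return common


# Invented sample records, one from each source database.
merged = [
    to_common({"ttl": "Example record A", "yr": 1999}, CATALOGUE_A),
    to_common({"name": "Example record B", "pub_year": "2001"}, CATALOGUE_B),
]
print(merged)
```

Without the field descriptions, nothing tells the merge step that `ttl` and `name` hold the same kind of thing, or that `pub_year` holds a year stored as text.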
Page 9:

So, just what is ‘Metadata’? (2)

◆ Lorcan Dempsey, 1994:

Metadata is information about resources, and is of various types, and levels of fullness. In this article it is used inclusively to refer to names, locations and descriptive data which facilitate access or selection. In some cases, the metadata may be no more than a file name and location; in others, in library systems, for example, structured descriptive data may be manually created. Resources are the actual information objects of interest…

Network Resource Discovery: a European Library Perspective.

Page 10:

Dublin Core Metadata Initiative

◆ Launched by the March 1995 OCLC/NCSA conference
  » Belief that free-text indexation of online resources is inadequate
  » Realisation that one cannot realistically compile MARC data for Web pages and electronic documents
  » and that Web resources are far more diverse than MARC was ever designed to cope with
◆ DCMI aims to define a core set of metadata elements
  » to use for cataloguing DLOs (‘Document-Like Objects’)

‘[A] reasonable alternative way to obtain usable metadata for electronic resources is to give authors and information providers a means to describe the resources themselves, without having to undergo the extensive training required to create records conforming to established standards…’

Page 11:

Dublin Core Metadata Element Set

DC Element   Definition
Title        A name given to the resource.
Creator      An entity primarily responsible for making the content of the resource.
Subject      A topic of the content of the resource.
Description  An account of the content of the resource.
Publisher    An entity responsible for making the resource available.
Contributor  An entity responsible for making contributions to the content of the resource.
Date         A date of an event in the lifecycle of the resource.
Type         The nature or genre of the content of the resource.
Format       The physical or digital manifestation of the resource.
Identifier   An unambiguous reference to the resource within a given context.
Source       A reference to a resource from which the present resource is derived.
Language     A language of the intellectual content of the resource.
Relation     A reference to a related resource.
Coverage     The extent or scope of the content of the resource.
Rights       Information about rights held in and over the resource.
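
As an illustration of how the fifteen elements might be used in practice, the following Python sketch builds a small Dublin Core description as XML with the standard library. The namespace URI is the usual one for the Dublin Core element set; the wrapper element name `metadata`, the helper `dublin_core_record` and the choice of which elements to fill in are assumptions made for this example, not something prescribed on the slide.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen element names from the table above, in their XML (lower-case) form.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields):
    """Build an XML element holding one dc:* child per supplied field.

    The wrapper element name "metadata" is an arbitrary choice for this sketch.
    """
    root = ET.Element("metadata")
    for element, value in fields.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return root


# Describe this talk using values taken from its own title slides.
record = dublin_core_record({
    "title": "Metadata: solution or distraction?",
    "creator": "Conrad Taylor",
    "language": "en",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The element set says what each slot means; it does not say which values are acceptable in it, which is where the controlled vocabulary questions discussed later in the talk come in.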

Page 12:

Metadata: a media catalogue example

Good exemplar of many problems that occur in metadata applications

◆ The object to be retrieved has no text in it
  » This feature is shared with museum and gallery cataloguing applications
◆ However, the objects in question are increasingly digital
  » The catalogue has traditionally been built in a database, e.g. FileMaker Pro for small-scale picture libraries
  » (‘Standalone metadata’ – actually, it’s data in a database)
  » But also: certain image file formats can carry embedded metadata
◆ Plenty of issues to do with classification, authority &c

Page 13:

Sticky note (Conrad Taylor): Digital camera images imported into an iView MediaPro image database. One elementary shred of metadata associated with each image is its filename; but by default, the filenames are meaningless unless the image is re-named.
Page 14:

Sticky note (Conrad Taylor): However, the file carries with it a number of descriptions such as file size, file type, camera make and model, and the date and time when the photo was taken (assuming that the camera's clock was set correctly!). (This is a photo of Sir Tim Berners-Lee speaking at the BCS Lovelace Lecture, 2007.)
Page 15:

Sticky note (Conrad Taylor): Furthermore, because digital cameras conform to a Japanese industrial standard called Exif, many camera settings are also recorded, such as focal length of lens, shutter speed, aperture, flash setting and so on.
Page 16:

Sticky note (Conrad Taylor): This embedded metadata can be retrieved by a variety of programs, and sometimes provides an opportunity for a bit of fun. When a young lady in China sent me this photo of herself via Skype, I caused some consternation by asking her what she thought of the Canon Ixus camera… (See next slide.)
Page 17:

Sticky note (Conrad Taylor): Here is the image metadata in her JPEG file, retrieved through the "File Info" command in Adobe Photoshop.
Page 18:

Sticky note (Conrad Taylor): Once the file has been imported into an image database, or an asset/workflow manager such as Adobe Bridge, other kinds of annotation can be added by the photographer or archivist.

Some of the information is obvious, if it has been correctly recorded – for example location, event name, or copyright holder.

Other fields are more contentious: for example, I am invited to assign the image to a Category. What does this mean? What are the categories available for me to choose between?
Page 19:

Sticky note (Conrad Taylor): Keywords and Categories are particularly problematic. Ought we to be able to choose our own tags? For a single photographer-archivist, this may be tolerable and even useful; but if someone else is to search the image database, how do we know what keywords and categories will occur to them? (Also click the Comments balloon next to "Subject Codes".)

Sticky note (Conrad Taylor): It can definitely help if you have access to a controlled vocabulary of Subject Codes for your community of practice, for example the NewsCodes that are used by the newspaper industry. Where do the codes that I have used in this working example come from? See next slide.
Page 20:

Sticky note (Conrad Taylor): This is an extract from the BCS Taxonomy, commissioned by the BCS Knowledge Services Board to assist classification of publications and articles produced by the Society. The numeric coding, however, is my own addition. I chose three of the codes to apply to the photo of Berners-Lee's lecture.

Commissioning the taxonomy was a step in the right direction for the Society, but it does have some deficiencies (including internal inconsistencies), and seems not to be actively revised, maintained and updated. BCS Web content is being tagged within the Content Management System using this taxonomy, though we have heard that BCS staff are not confident in its application. (These are typical problems with taxonomies in use – not unique to the BCS.)
Page 21:

Issues, issues…

◆ How is the metadata ‘tied to’ the information object?
  » (and what, by the way, is the object to which metadata is attached?)
◆ ‘Problematic labels’ – such as keywords, category tags
  » issues of how to classify things
  » matching the vocabulary of the searcher to that of the classifier
◆ How does the metadata-attaching process actually happen?
  » who does it, and are they competent?
  » can it be automated?

Page 22:

The metadata-to-object relationship

◆ The catalogue approach – keep it in a database
  » the only way to go with non-digital assets
  » standardising queries – for example Z39.50
◆ From records to tags – embedding metadata makes a lot of sense!
  » immediately raises the need for standardised embedding technologies
    RDF: Resource Description Framework
    XMP: Extensible Metadata Platform
◆ But database solutions have important merits
  » as a repository for community-contributed metadata
  » metadata adjuncts to Content Management Systems

Page 23:

(A lightning introduction to RDF & XMP)

◆ Resource = anything that can be addressed (via a URI)
◆ Property = resource that can be a property of other resources
  » examples: Creator – CreationDate – LiteraryGenre …
◆ Value = the ‘content’, if you like, of a Property
◆ RDF Statement: combines Resource, Property and Value
    { Hamlet } { has a CreationDate } { value = 1602 }
    { Hamlet } { belongs to a LiteraryGenre } { value = play }
    { Hamlet } { has a Creator } { value = William Shakespeare }
◆ XMP, basically a way of embedding RDF statements in documents
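
The three Hamlet statements can be sketched in a few lines of Python to show the Resource, Property and Value shape. This is a deliberately simplified model: in real RDF, resources and properties are identified by URIs, whereas here they are plain strings, and the names `Statement` and `values_of` are invented for the example.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF-style statement: a resource, one of its properties, and a value."""
    resource: str
    prop: str
    value: str


# The three example statements from the slide, as plain strings.
statements = [
    Statement("Hamlet", "CreationDate", "1602"),
    Statement("Hamlet", "LiteraryGenre", "play"),
    Statement("Hamlet", "Creator", "William Shakespeare"),
]


def values_of(resource, prop, store):
    """Return every value recorded for the given resource and property."""
    return [s.value for s in store if s.resource == resource and s.prop == prop]


print(values_of("Hamlet", "Creator", statements))
```

Because every statement has the same three-part shape, new properties can be added without redesigning a record layout, which is part of the appeal of the approach for embedded metadata.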

Page 24:

Those problematic labels…

◆ Freeform keyword assignment versus controlled vocabularies
  » Government Category List (GCL) – Local Government Category List (LGCL)
  » Systematized Nomenclature of Medicine – Clinical Terms (SNOMED–CT)
  » International Press Telecommunications Council – IPTC NewsCodes

Code      TopicType      English                          Spanish                       German
01000000  Subject        arts, culture and entertainment  arte, cultura y espectáculos  Kultur, Kunst, Unterhaltung
01001000  SubjectMatter  archaeology                      arqueología                   Archäologie
01003000  SubjectMatter  bullfighting                     toros                         Stierkampf
01007001  SubjectDetail  jewellery                        joyas                         Schmuck
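
One way to see what a controlled vocabulary buys you is to treat the four NewsCodes rows above as a lookup table. The Python sketch below is an assumption-laden illustration, not IPTC's own data format or API: the dictionary layout and the helpers `label` and `find_codes` are invented, and only the four codes shown on the slide are included.

```python
# The four NewsCodes rows from the slide, keyed by code.
# The dictionary layout is an assumption made for this sketch.
NEWSCODES = {
    "01000000": {"type": "Subject", "en": "arts, culture and entertainment",
                 "es": "arte, cultura y espectáculos",
                 "de": "Kultur, Kunst, Unterhaltung"},
    "01001000": {"type": "SubjectMatter", "en": "archaeology",
                 "es": "arqueología", "de": "Archäologie"},
    "01003000": {"type": "SubjectMatter", "en": "bullfighting",
                 "es": "toros", "de": "Stierkampf"},
    "01007001": {"type": "SubjectDetail", "en": "jewellery",
                 "es": "joyas", "de": "Schmuck"},
}


def label(code, language="en"):
    """Return the human-readable label for a code in the requested language."""
    return NEWSCODES[code][language]


def find_codes(term, language="en"):
    """Return the codes whose label in that language contains the search term."""
    term = term.casefold()
    return [code for code, entry in NEWSCODES.items()
            if term in entry[language].casefold()]


# A picture tagged once with a code can be found in any supported language.
print(label("01003000", "de"))
print(find_codes("arqueología", "es"))
```

The item is tagged with the code rather than with a word, so the classifier and the searcher no longer need to guess each other's vocabulary, or even share a language.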

Page 25:

Those problematic labels…

◆ Freeform keyword assignment versus controlled vocabularies
◆ Classification issues
  » Listen to the KIDMM panel discussion on this topic, with Leonard Will, John Lindsay and Nic Holt
    http://www.epsg.org.uk/meetings/classification2006
  » Taxonomies, for example the BCS Subject Taxonomy commissioned by the Knowledge Services Board and applied to the BCS Web site
  » Ontologies, Description Logics, OWL and the Semantic Web: read Prof. Ian Horrocks’ 2005 Needham Lecture:
    http://www.epsg.org.uk/pub/needham2005

Page 26:

Ontologies and the Semantic Web – a British Computer Society lecture by Professor Ian Horrocks

If you put all these letters together, you get SHOIN, which is therefore the name by which the logic that underpins W3C’s Web Ontology Language is known.

Ian next presented a slide of ‘Class/Concept Constructors’, bristling with logic notation symbols (see Fig. 4 above). “You can’t come to a talk by a description logic person without seeing one of these slides,” he said. “These are the constructors that are available… the main interesting feature is that I can describe the whole language in this one slide.” The system uses just eight constructors; it can be described very succinctly; it’s very elegant; and it is compositional, so you can build complicated descriptions from these basics.

Figure 4 – Class and Concept Constructors in SHOIN description logic

(Excerpt from the PDF of Ian Horrocks’ 2005 Needham Lecture)

Sticky note (Conrad Taylor): One page from the report of the Needham Lecture 2005… Within the BCS, valuable things are being written and said about knowledge and information management. It will be important to archive these, convert them to useful form, and make them available for study and discussion. This account was a collaboration between Conrad Taylor and Ian Horrocks. Various BCS Specialist Groups – and KIDMM – are keen to amass more documentation and recordings for study…
Page 27:

Some interesting cases

◆ British Medical Journal
  » Here, the issue is granularity of content, to deliver customised information products
  » Achieved within a Content Management System
  » Texts are highly structured, using XML editing tools
  » Elements and their attributes are applied by knowledgeable experts

The account of Phil Caisley's talk will be available later in 2007

Page 28:

Some interesting cases

◆ Improvement and Development Agency
  » Establishing Knowledge Communities to support best practice, and to avoid ‘re-inventing wheels’
  » Impressive online community system for local government officers and elected councillors
  » CMS supports about 70 themed communities with discussion boards, document repositories, wikis, blogs…
  » Anyone can tag anything!
  » For now, the approach is ‘folksonomic’

Page 29:

Some interesting cases

◆ International Society for Knowledge Organization (UK)
  » Recently established, by people in the Librarianship / Information Science community…
  » Many of whom are also KIDMM list-members
  » A promising variety of interesting discussions
  » Currently, for example, discussion of how to combine folksonomy and expert tagging, and the role of faceted classification