Metadata – The What and Why
Alexander König The Language Archive, MPI for Psycholinguistics
CLARIN NL: CMDI Tutorial 13th September 2012
www.clarin.eu The Shape of Metadata
Metadata is “transcendental”
Data about data: the ‘who, what, where and when’ of a document.
Structured data about data.
In the computer age: machine-readable data about data.
Metadata is data describing a (set of) (digital) resource(s).
Why Metadata?
(Re)finding resources:
- Using free-text “key words” (Google-like search)
- Special search engines that exploit the structure of metadata
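The difference between the two search styles can be sketched in a few lines of Python. This is only an illustration: the records, the field names, and the helpers `keyword_search` and `field_search` are hypothetical and do not belong to any CLARIN tool.

```python
# Hypothetical metadata records; field names and values are made up.
records = [
    {
        "title": "Dutch child language recordings",
        "language": "Dutch",
        "description": "Audio sessions recorded in Nijmegen",
    },
    {
        "title": "English learner essays",
        "language": "English",
        "description": "Essays by Dutch students of English",
    },
]


def keyword_search(records, term):
    """Free-text search: match the term anywhere in any field."""
    term = term.lower()
    return [
        r for r in records
        if any(term in str(value).lower() for value in r.values())
    ]


def field_search(records, field, value):
    """Structured search: match the value only in the named field."""
    return [
        r for r in records
        if r.get(field, "").lower() == value.lower()
    ]


# The free-text search also returns the essays, because "Dutch" occurs
# in their description; the structured search only looks at "language".
keyword_hits = keyword_search(records, "dutch")
field_hits = field_search(records, "language", "Dutch")
```

The structured query is more precise here only because every record uses the same field name for the same kind of information.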
Why Metadata?
- Those who don’t take care over their metadata are doorless.
- Content is expensive to create.
- Just what good is your content if the people who need to read it can’t find it?
The Shape of Metadata
Metadata doesn't need to be in any particular form per se, but...
...if you want computers to understand it and be more useful to humans, it should be structured
=> in a standardized fashion, using a metadata model
The Shape of Metadata
There are a lot of different metadata models:
- Dublin Core (DC)
- OLAC
- IMDI
- CMDI
- ...
The Shape of Metadata
Nowadays metadata is more widely used than you might think.
You probably already deal with metadata on a daily basis without even thinking about it.
The Shape of Metadata
Some common metadata models in the linguistic domain are:
- Dublin Core (DC) / OLAC
- IMDI
- CMDI
The Shape of Metadata
Dublin Core (DC) Metadata Set

Content        Intellectual Property   Instance
Title          Creator                 Date
Subject        Publisher               Type
Description    Contributor             Format
Language       Rights                  Identifier
Relation
Coverage
Source
The Shape of Metadata
DC example

[Content]
DC.Title = “American Gods”
DC.Language = “English”

[IP]
DC.Creator = “Neil Gaiman”

[Instance]
DC.Format = “Hardcover”
DC.Date = “2001-06-19”
DC.Identifier = “0-380-97365-0”
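To make the “machine readable” point concrete, the same record can be held as a flat set of key/value pairs and sorted back into the three groups from the Dublin Core slide. This is a minimal sketch: the names `DC_GROUPS` and `group_record` are hypothetical, and the grouping simply follows the slide rather than any official DC serialization.

```python
# Element grouping as shown on the Dublin Core slide (an assumption of
# this sketch, not an official DC data structure).
DC_GROUPS = {
    "Content": ["Title", "Subject", "Description", "Language",
                "Relation", "Coverage", "Source"],
    "Intellectual Property": ["Creator", "Publisher", "Contributor",
                              "Rights"],
    "Instance": ["Date", "Type", "Format", "Identifier"],
}

# The example record from the slide, as a flat dictionary.
record = {
    "Title": "American Gods",
    "Language": "English",
    "Creator": "Neil Gaiman",
    "Format": "Hardcover",
    "Date": "2001-06-19",
    "Identifier": "0-380-97365-0",
}


def group_record(record):
    """Sort the filled-in elements of a record into the slide's groups."""
    grouped = {}
    for group, elements in DC_GROUPS.items():
        present = {e: record[e] for e in elements if e in record}
        if present:
            grouped[group] = present
    return grouped


grouped = group_record(record)
```

Because every element name is fixed by the model, a program can do this regrouping without knowing anything about the book itself.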
The Shape of Metadata
IMDI is a more complex model designed with multimodal resources in mind
The Shape of Metadata
- IMDI contains structural information (corpus files for building a tree)
- Resources are attached to the metadata (session file / bundle)
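The two points above, corpus files that build a tree and sessions that bundle resources, can be pictured with a small data structure. This is only a rough sketch: the classes `Corpus` and `Session`, the helper `list_resources`, and all file names are hypothetical and do not reproduce the actual IMDI schema.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Session:
    """A session (bundle): metadata with its resources attached."""
    name: str
    resources: List[str] = field(default_factory=list)


@dataclass
class Corpus:
    """A corpus node: carries only structure, pointing at its children."""
    name: str
    children: List[Union["Corpus", Session]] = field(default_factory=list)


def list_resources(node):
    """Walk the tree depth-first and collect every attached resource."""
    if isinstance(node, Session):
        return list(node.resources)
    found = []
    for child in node.children:
        found.extend(list_resources(child))
    return found


# Made-up example tree: one sub-corpus with a session, plus one session
# attached directly to the root.
root = Corpus("Example corpus", [
    Corpus("Subcorpus A", [
        Session("session_01", ["session_01.wav", "session_01.eaf"]),
    ]),
    Session("session_02", ["session_02.mpg"]),
])

all_resources = list_resources(root)
```

Keeping structure in the corpus nodes and resources in the sessions means the tree can be browsed without opening any of the media files.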