2009.09.29 chris poppe - metadata
ELIS – Multimedia Lab
Metadata: Starting Points, Priorities, Future Perspectives and Notes from the Margin
Chris Poppe
Multimedia Lab
Department of Electronics and Information Systems
Faculty of Engineering
Ghent University
Multimedia Lab
• Multimedia Lab
  – Research group of Ghent University (Faculty of Engineering)
  – Multimedia
• Video!
  – Coding
  – Processing
  – Transmission
  – Analysis
  – Adaptation
  – Annotation
  – …
MetadataChris Poppe
Levend Geheugen, Brussels, Belgium – September 29 2009
Outline
• What is metadata?
• Metadata vs. tags?
  – Benefits/disadvantages?
• What is a metadata standard?
  – Why is it needed?
  – What does it look like?
  – What are the problems?
• What is the semantic web?
  – Web 2.0?
  – Web 3.0?
  – Semantic Web technologies?
• Conclusions
Metadata
• Data describing data
• Museum for the history of sciences
Metadata
• Data describing data
• Digital content
  – Resolution
  – Dpi
  – Date/Time created
  – Creator
  – Camera used
  – File format (JPG, BMP, GIF, PNG, …)
  – Location shot (GPS)
  – Copyright
  – Title
  – Genre
  – Rating
  – Comment
  – Keywords
  – Depicted event
  – …
Use of Metadata
• Understanding of multimedia content
• Sharing
• Management
• Retrieval
  – Search
  – Browse
• Processing
Metadata: tags
• Tag
  – Free text annotation
  – Keywords, terms, comments
  – Informal
  – Personal
  – Started as taxonomies or vocabularies used to describe content
  – Evolved into folksonomies
    • User-driven
Taxonomies
• Top down
• Pre-defined structure
• Hierarchy
• Controlled vocabularies
• Expert
Taxonomies
• Example
  – Dewey Decimal Classification
  – Library classification
Folksonomy
• Folk + taxonomy
  – Free form text annotation
  – No predefined structure
  – No hierarchy
  – Users add metadata
  – Flat name space
  – Bottom up
• Two types:
  – Broad
  – Narrow
Broad Folksonomy
• Tagging shared content
• Anyone can participate
• Examples
Narrow Folksonomy
• Tagging your own content
• Tagging friends’ content
  – No consolidation
  – No emerging vocabularies
Tagging usage
• Navigation
  – Tag clouds
  – Organization
  – Hints
How to tag?
• Totally free
• Semi-structured
• Hinted
Tagging problems
• Cultural differences: Genghis Khan, for some a hero, for others a criminal
• Communities of users can give different meaning to tags: Movie vs. Film vs. Cinema
• Language issues
• Ambiguity
• Misspelled tags (40% Flickr, 28% del.icio.us)
• Semantics of tags
  – Factual tags: what it is about, what it is: ‘image’, ‘article’, ‘blog’, …
  – Subjective tags: user’s opinion: ‘funny’, ‘hot’, ‘stupid’, …
  – Personal tags: self reference: ‘toread’, ‘mycomments’, …
  – Tag: “nothing to do with Brussels”
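Some of these problems (misspellings, community-specific synonyms) can be softened by mapping free-text tags onto a small controlled vocabulary. The sketch below is only an illustration: the vocabulary, the synonym table and the function name `normalize_tag` are hypothetical and not part of the talk, and fuzzy matching with Python's standard `difflib` is just one possible heuristic.

```python
import difflib

# Hypothetical controlled vocabulary; a real one would come from a thesaurus.
VOCABULARY = ["brussels", "belgium", "museum", "film"]

# Hypothetical synonym table: community-specific terms mapped to one preferred term.
SYNONYMS = {"movie": "film", "cinema": "film"}


def normalize_tag(raw_tag, cutoff=0.8):
    """Return the preferred vocabulary term for a free-text tag, or None."""
    tag = raw_tag.strip().lower()
    tag = SYNONYMS.get(tag, tag)
    if tag in VOCABULARY:
        return tag
    # Fall back to fuzzy matching to catch simple misspellings.
    matches = difflib.get_close_matches(tag, VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for raw in ["Brusels", "Movie", "cinema", "toread"]:
        print(raw, "->", normalize_tag(raw))
```

Personal tags such as ‘toread’ have no counterpart in the vocabulary and are left unmapped, which is the intended behaviour of this sketch.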
Metadata
• Data describing data
• Digital content
  – Resolution
  – Dpi
  – Date/Time created
  – Creator
  – Camera used
  – File format (JPG, BMP, GIF, PNG, …)
  – Location shot (GPS)
  – Copyright
  – Title
  – Genre
  – Rating
  – Comment
  – Keywords
  – Depicted event
  – …
Multimedia
• Compression and container formats
  – Categories: video compression, audio compression, image compression, physical containers
  – Examples: MP2, JPEG, MPEG-2, MXF, JPEG2000, AVI, AAC, H.264/MPEG-4 AVC, PNG, Motion JPEG2000, TIFF, MP4, MPEG, WAV, FLAC, VC-1, Ogg Vorbis, DivX, AIFF, GIF, JPEG-LS, Matroska, OGM/OGG, Windows Media Audio, DIRAC, 3GP, DV, FLV, Betacam, Realmedia, MOV, AC-3/Dolby Digital, Theora, ASF, TTA
• Standards for multimedia
  – Standards for metadata?
Metadata Standard
• Standard which determines the structure of metadata
• Properties to structure: Resolution, Dpi, Date/Time created, Creator, Camera used, File format (JPG, BMP, GIF, PNG, …), Location shot (GPS), Copyright, Title, Genre, Rating, Comment, Keywords, Depicted event, …
• Example: MODS (Metadata Object Description Schema)
  <?xml version="1.0" encoding="UTF-8" ?>
  <mods xmlns="http://www.loc.gov/mods/…">
    <titleInfo>
      <title>De geruchten</title>
    </titleInfo>
    <name type="personal">
      <namePart>Claus, Hugo</namePart>
      <namePart type="date">1929-</namePart>
      <role>
        <text>creator</text>
      </role>
    </name>
    <typeOfResource>text</typeOfResource>
    <originInfo>…</originInfo>
    ...
  </mods>
XML
• XML (Extensible Markup Language)
  – Standardized by W3C (World Wide Web Consortium)
  – Language to define the structure of a document
• Example:
  <?xml version="1.0" encoding="UTF-8" ?>
  <!-- This is a book list. -->
  <boekenlijst>
    <boek categorie="thriller">
      <titel>Het Bernini Mysterie</titel>
      <auteur>Dan Brown</auteur>
    </boek>
    <boek categorie="woordenboek">
      <titel>Van Dale Frans-Nederlands</titel>
      <auteur />
    </boek>
  </boekenlijst>
• Building blocks: XML elements, attributes, values
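As a minimal sketch of how software reads elements, attributes and values, the book list can be parsed with Python's standard `xml.etree.ElementTree` module. The function name `read_books` is an illustrative choice, and the XML declaration and comment are left out for brevity.

```python
import xml.etree.ElementTree as ET

# The book list from the slide, without the XML declaration and comment.
BOOK_LIST = """<boekenlijst>
  <boek categorie="thriller">
    <titel>Het Bernini Mysterie</titel>
    <auteur>Dan Brown</auteur>
  </boek>
  <boek categorie="woordenboek">
    <titel>Van Dale Frans-Nederlands</titel>
    <auteur />
  </boek>
</boekenlijst>"""


def read_books(xml_text):
    """Return one dict per <boek> element."""
    root = ET.fromstring(xml_text)
    books = []
    for boek in root.findall("boek"):
        books.append({
            "categorie": boek.get("categorie"),       # attribute
            "titel": boek.findtext("titel"),          # element value
            "auteur": boek.findtext("auteur") or None,  # empty element -> None
        })
    return books


if __name__ == "__main__":
    for book in read_books(BOOK_LIST):
        print(book)
```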
XML Schema
• XML Schema
  – Uses XML to denote the structure of a document
• For the book list above, an XML schema determines:
  – Elements: boekenlijst, boek, titel, auteur
  – Order
  – Types (of values)
Metadata Standard
• Describe structure of metadata using XML schema
• Textual specification, explains semantics of the elements
  – titleInfo: “A word, phrase, character, or group of characters, normally appearing in a resource, that names it or the work contained in it.”
• The MODS XML schema determines the structure of a MODS document:
  <?xml version="1.0" encoding="UTF-8" ?>
  <mods xmlns="http://www.loc.gov/mods/…">
    <titleInfo>
      <title>De geruchten</title>
    </titleInfo>
    <name type="personal">…</name>
    <typeOfResource>text</typeOfResource>
    <originInfo>…</originInfo>
    ...
  </mods>
Use of Metadata Standards
• Shared information uses common structure
• Standard software can be used to parse information
• Diagram: two databases (DB) exchange MODS documents and therefore speak the same language
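Because every record follows the same structure, one small routine can read any of them. The sketch below uses Python's standard `xml.etree.ElementTree` on a simplified MODS-like record; the namespace declaration is deliberately omitted because its URI is elided on the slides, and the function name `summarize` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified MODS-like record based on the slide; namespace omitted on purpose.
MODS_LIKE_RECORD = """<mods>
  <titleInfo><title>De geruchten</title></titleInfo>
  <name type="personal">
    <namePart>Claus, Hugo</namePart>
    <namePart type="date">1929-</namePart>
    <role><text>creator</text></role>
  </name>
  <typeOfResource>text</typeOfResource>
</mods>"""


def summarize(record_xml):
    """Extract a few fields that every record of this shape shares."""
    root = ET.fromstring(record_xml)
    name = root.find("name")
    name_parts = []
    role = None
    if name is not None:
        # A namePart without a type attribute holds the name itself.
        name_parts = [p.text for p in name.findall("namePart") if p.get("type") is None]
        role = name.findtext("role/text")
    return {
        "title": root.findtext("titleInfo/title"),
        "name": name_parts[0] if name_parts else None,
        "role": role,
        "type": root.findtext("typeOfResource"),
    }


if __name__ == "__main__":
    print(summarize(MODS_LIKE_RECORD))
```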
Metadata Standards
• Different metadata standards exist!
  – Different domains
  – Different communities
  – Different formats
  – Different focus
Problem with Metadata Standards
• Different metadata standards can describe the same thing
• But in different ways!
• Metadata example 1: CVML (Computer Vision Markup Language)
  <object id="0">
    <box xc="77" yc="73" w="21" h="16"/>
  </object>
  – box: “Coordinates of the centre and the dimensions of the bounding box of a detected object in pixels.”
• Metadata example 2: VS7 (Video Surveillance Schema)
  <LLID ="LLID1">
    <Mask>
      <BB mp7:dim="4">67 65 88 91</BB>
    </Mask>
  </LLID>
  – BB: “Coordinates of a rectangular segment.”
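Using both descriptions in one system requires a mapping between them. The sketch below converts a centre-and-size box (as in the CVML definition) into corner coordinates and back. That BB holds corner coordinates in the order x_min, y_min, x_max, y_max is an assumption made here for illustration; the slide only says "coordinates of a rectangular segment". Both function names are hypothetical.

```python
def centre_box_to_corners(xc, yc, w, h):
    """Convert a centre-and-size box to (x_min, y_min, x_max, y_max)."""
    return (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)


def corners_to_centre_box(x_min, y_min, x_max, y_max):
    """Convert corner coordinates back to (xc, yc, w, h)."""
    w = x_max - x_min
    h = y_max - y_min
    return (x_min + w / 2, y_min + h / 2, w, h)


if __name__ == "__main__":
    corners = centre_box_to_corners(77, 73, 21, 16)
    print(corners)
    print(corners_to_centre_box(*corners))
```

Even this tiny mapping has to fix conventions (corner order, rounding of half pixels) that neither textual definition spells out.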
Problems with Metadata Standards
• Current metadata standards define structure of metadata
• Mappings are needed to use different standards within one system
• A metadata standard does not solve everything!
  – For instance: DC (Dublin Core) creator property
    • Creator=“Shakespeare, William”
    • Creator=“William Shakespeare”
    • Creator=“Shakespeare”
    • Creator=“W. Shakespare”
  – Same problems as tagging can occur
• Lack of ways to describe semantics of metadata
  – Currently plain text
  – Not machine readable
• Multimedia content shifts to online repositories
Semantic Web ?.0
The Syntactic Web
• Consider a typical web page:
• Mark-up consists of:
  – Rendering information (e.g., font size and colour)
  – Hyperlinks to related content
• Semantic content is accessible to humans but not (easily) to computers…
Impossible (?) using the Syntactic Web…
• Complex queries involving background knowledge
  – Give me the telephone number of the person within Multimedia Lab responsible for the demo about metadata applications
• Locating information in data repositories
  – Travel enquiries
  – Prices of goods and services
  – Results of human genome experiments
• Finding and using “web services”
  – Visualize surface interactions between two proteins
• Delegating complex tasks to web “agents”
  – Book me a holiday next weekend somewhere warm, not too far away, and where they speak French or English
Semantic Web Technologies
• Technologies developed by the World Wide Web Consortium (W3C)
• Vision: the Web as universal medium for data, information and knowledge exchange
• HTML, XML -> RDF, RDFS, OWL, …
• Technologies to interconnect and exchange information
  – Also applicable to metadata!
Why is XML not enough?
• http://www.w3.org/DesignIssues/RDF-XML.html (Tim Berners-Lee)
• Try to express “The author of the note is Tim” in XML
  – Representation 1:
    <author> <uri>note</uri> <name>Tim</name> </author>
  – Representation 2:
    <document href="note"> <author>Tim</author> </document>
  – Representation 3:
    <document uri="note" author="Tim" />
• For a person, the three representations mean the same, but NOT for a machine!
  – XML contains structures only, no semantics
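A minimal sketch of why a machine sees three different things in the three XML representations of “The author of the note is Tim”: parsed with Python's standard `xml.etree.ElementTree`, each one yields a different tree, and a lookup written for one of them fails on the others. The helper name `shape` is an illustrative choice.

```python
import xml.etree.ElementTree as ET

# The three representations of "The author of the note is Tim".
REPRESENTATIONS = [
    '<author><uri>note</uri><name>Tim</name></author>',
    '<document href="note"><author>Tim</author></document>',
    '<document uri="note" author="Tim" />',
]


def shape(xml_text):
    """Describe a document by root tag, attribute names and child tags."""
    root = ET.fromstring(xml_text)
    return (root.tag, tuple(sorted(root.attrib)), tuple(child.tag for child in root))


if __name__ == "__main__":
    for text in REPRESENTATIONS:
        root = ET.fromstring(text)
        # A lookup written for the second representation only.
        print(shape(text), "author child:", root.findtext("author"))
```

Only the second document answers the lookup for an `author` child element; the other two need different code, although a human reads all three identically.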
RDF
• RDF (Resource Description Framework)
• Triples: subject – predicate – object
• URI to identify resources
• “The author of the note is Tim”
  – Triple: Note – hasAuthor – Tim
• Serialization in XML:
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <Note rdf:about="http://www.example.org/#note">
      <hasAuthor rdf:resource="http://www.example.org/#Tim"/>
    </Note>
  </rdf:RDF>
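The triple model can be sketched with plain Python tuples. This is only an illustration of the data model, not an RDF library; the `hasAuthor` URI and the helper name `objects` are assumptions built on the example.org namespace used on the slide.

```python
# Namespace taken from the slide's example; the property URI is assumed.
EX = "http://www.example.org/#"

# "The author of the note is Tim" as one (subject, predicate, object) triple.
TRIPLES = {
    (EX + "note", EX + "hasAuthor", EX + "Tim"),
}


def objects(triples, subject, predicate):
    """Return all objects stated for a given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


if __name__ == "__main__":
    print(objects(TRIPLES, EX + "note", EX + "hasAuthor"))
```

Whatever the serialization, the statement always reduces to the same triple, which is what makes it comparable across sources.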
RDFS
• RDF Schema
• Standardized vocabulary for describing concepts
• Introduces classes and instances
  – Note1 – type – Class Note
  – Tim – type – Class Person
  – Note1 – hasAuthor – Tim
• Subclasses, sub properties
  – Possible to define hierarchies!
OWL
• Web Ontology Language, W3C recommendation (2004)
• Provides richer vocabulary
• Define advanced relations
  – Data typing
  – Cardinalities
  – Rich typing of properties
  – …
• Example: isAuthorFrom is the inverse of hasAuthor
  – Note1 – type – Class Note; Tim – type – Class Person
  – Note1 – hasAuthor – Tim, therefore Tim – isAuthorFrom – Note1
  <owl:ObjectProperty rdf:ID="isAuthorFrom">
    <owl:inverseOf rdf:resource="#hasAuthor"/>
  </owl:ObjectProperty>
• Allows for intelligent reasoning
• Complex ontologies can be created
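What a reasoner does with an inverse property such as isAuthorFrom / hasAuthor can be sketched in a few lines. This is a hand-written illustration over plain tuples, not an OWL reasoner; the function name `apply_inverse` is hypothetical.

```python
def apply_inverse(triples, prop, inverse_prop):
    """Add the triples implied by declaring inverse_prop the inverse of prop."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == prop:
            inferred.add((o, inverse_prop, s))
        elif p == inverse_prop:
            inferred.add((o, prop, s))
    return inferred


if __name__ == "__main__":
    stated = {("Note1", "hasAuthor", "Tim")}
    for triple in sorted(apply_inverse(stated, "hasAuthor", "isAuthorFrom")):
        print(triple)
```

Only the hasAuthor statement is stored; the isAuthorFrom statement is derived from the ontology.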
Ontology
• Information in a domain is structured using an ontology
• A data model that represents a set of concepts and relations amongst these concepts within a specific domain
• Thesaurus
  – Dictionary
    • Synonyms
• Taxonomy
  – Hierarchy
    • Subclass and siblings
• Ontology
  – Concepts
  – Relations
Ontology (using OWL)
• Example: ontology for domain of science
  – Class: Person
  – Class: Scientist (subClassOf Person)
  – DatatypeProperty: birth date
  – Individual: “Joseph Plateau” (a Scientist), birth date “14/10/1801”
• OWL constructs
  – Class
  – DatatypeProperty
  – subClassOf
  – Individual
  – …
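The science example can be written down as plain Python data to show what subClassOf buys: an individual typed as Scientist is also a Person. The dictionaries and the function name `classes_of` are illustrative assumptions, not an OWL implementation.

```python
# Class hierarchy from the example: Scientist is a subclass of Person.
SUBCLASS_OF = {"Scientist": "Person"}

# One individual with its class and a datatype property value.
INDIVIDUALS = {
    "Joseph Plateau": {"type": "Scientist", "birth date": "14/10/1801"},
}


def classes_of(individual):
    """Return the stated class followed by all of its superclasses."""
    result = []
    cls = INDIVIDUALS[individual]["type"]
    while cls is not None:
        result.append(cls)
        cls = SUBCLASS_OF.get(cls)
    return result


if __name__ == "__main__":
    print(classes_of("Joseph Plateau"))
    print(INDIVIDUALS["Joseph Plateau"]["birth date"])
```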
Semantic Web Technologies
• SPARQL Protocol and RDF Query Language (SPARQL)
  – SQL-like language for RDF
  – Example: search for all the notes of Tim
    • SELECT ?x WHERE { ?x :hasAuthor :Tim }
• Rule Interchange Format (RIF)
  – Example rule: if Tim is the author of the note, he is also a contributor
  – Goal is to create an interchange format for different rule languages and inference engines
  – Closely related to ontologies
    • Rules combine information and derive new information on top of ontologies
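Both ideas can be sketched over plain tuples: a single triple pattern with variables (the core of a SPARQL query) and a rule that derives contributor statements from author statements. This is a toy illustration, not a SPARQL or RIF engine; the names `match`, `add_contributors` and `hasContributor`, and the second note with author Ann, are hypothetical.

```python
def match(triples, pattern):
    """Yield variable bindings for one triple pattern; '?name' terms are variables."""
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value.
                if bindings.get(term, value) != value:
                    break
                bindings[term] = value
            elif term != value:
                break
        else:
            yield bindings


def add_contributors(triples):
    """Rule: whoever is the author of something is also a contributor to it."""
    derived = {(s, "hasContributor", o) for s, p, o in triples if p == "hasAuthor"}
    return set(triples) | derived


if __name__ == "__main__":
    data = {("Note1", "hasAuthor", "Tim"), ("Note2", "hasAuthor", "Ann")}
    # Counterpart of: SELECT ?x WHERE { ?x :hasAuthor :Tim }
    print(sorted(b["?x"] for b in match(data, ("?x", "hasAuthor", "Tim"))))
    print(sorted(add_contributors(data)))
```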
Semantic Web Technologies
• Data on the web can be linked to each other
  – Example: ontology on Brussels
    • DBpedia.org
  – Browsing:
    • Brussels -> cityofbirth -> Raymond_Goethals -> managerclubs -> RSC Anderlecht …
  – Querying: find all people born in Brussels before 1930
  – Reasoning: if a person was born in Brussels, that person was also born in Belgium
Semantic Web ?.0
Conclusions
• Use metadata standards!
  – Allows interchange
  – Structures the metadata
• When no standard is sufficient
  – Apply proprietary format
  – Structures the metadata
• If tagging is needed for search/browsing/retrieval
  – Provide fixed structure
    • E.g., who, what, where, when, …
  – Provide fixed vocabulary
    • Thesaurus
    • Hierarchy
    • Ontology for advanced reasoning
Questions?