DE Conferentie 2007 Jack van Ossenbruggen
Tumbling Walls & Building Bridges
Steps towards a Culture Web
2
Interoperability: tearing down the walls between collections
• Museums have increasingly nice websites
• But: most of them are driven by stand-alone collection databases
• Data is isolated, both syntactically and semantically
• If users can do cross-collection search, the individual collections become more valuable!
3
The Web: “open” documents and links
Diagram: two URLs connected by a web link
4
The Semantic Web: “open” data and links
Diagram: two URLs connected by a web link
Painting: “Green Stripe (Mme Matisse)”, Royal Museum of Fine Arts, Copenhagen
creator (Dublin Core)
Painter: “Henri Matisse” (Getty ULAN)
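The Matisse example on this slide can be read as one subject-predicate-object statement. A minimal sketch in Python, assuming placeholder example.org identifiers for the painting and the painter (the slide does not give the actual museum or Getty ULAN URLs); only the Dublin Core creator URI is a real identifier:

```python
# Dublin Core "creator" property (real URI from the Dublin Core element set).
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

# Hypothetical identifiers, for illustration only.
PAINTING = "http://example.org/museum/green-stripe"
PAINTER = "http://example.org/ulan/henri-matisse"

# Each statement is a (subject, predicate, object) triple.
triples = [
    (PAINTING, DC_CREATOR, PAINTER),
]

# Human-readable labels are kept apart from the identifiers.
labels = {
    PAINTING: "Green Stripe (Mme Matisse)",
    PAINTER: "Henri Matisse",
}


def creators_of(subject, statements):
    """Return the objects linked to `subject` by the creator property."""
    return [o for s, p, o in statements if s == subject and p == DC_CREATOR]


creator_names = [labels[c] for c in creators_of(PAINTING, triples)]
print(creator_names)
```

Because both ends of the link are identifiers rather than text strings, a second collection that uses the same painter identifier would be connected to this record without any extra work.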
5
6
Principle 1: semantic annotation
Description of web objects with “concepts” from a shared vocabulary
7
Principle 2: semantic search
• Search for objects which are linked via concepts (semantic link)
• Use the type of semantic link to provide meaningful presentation of the search results
Example: Montmartre PartOf Paris
Query: “Paris”
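The Paris/Montmartre example can be sketched as a search that follows PartOf links and keeps the path, so the result list can say why an object matched. The object titles and the `annotations` table below are invented for illustration; only the Montmartre PartOf Paris link comes from the slide:

```python
# Hypothetical annotations: object title -> place concept it is linked to.
annotations = {
    "Street scene": "Montmartre",
    "City panorama": "Paris",
    "Harbour view": "Amsterdam",
}

# Semantic links between concepts, from the slide: Montmartre PartOf Paris.
part_of = {"Montmartre": "Paris"}


def link_path(concept, target):
    """Return the chain of PartOf links from `concept` up to `target`, or None."""
    path = [concept]
    while path[-1] != target:
        parent = part_of.get(path[-1])
        if parent is None or parent in path:  # no further link, or a cycle
            return None
        path.append(parent)
    return path


def semantic_search(query):
    """Find objects annotated with the query concept or with a part of it."""
    results = {}
    for title, concept in annotations.items():
        path = link_path(concept, query)
        if path is not None:
            # Keep the path so the presentation can explain the match.
            results[title] = " PartOf ".join(path)
    return results


hits = semantic_search("Paris")
print(hits)
```

The stored path is what allows results to be presented by type of link, for example "located in a part of Paris" separately from "located in Paris".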
8
Principle 3: vocabulary alignment
“Tokugawa”
SVCN period: Edo
(SVCN is a local in-house ethnology thesaurus)
AAT style/period: Edo (Japanese period), Tokugawa
(AAT is Getty’s Art & Architecture Thesaurus)
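A rough sketch of how one alignment link lets a term known only to AAT reach objects described with SVCN. The concept identifiers (`svcn:Edo`, `aat:Edo`) are made up for this example; the labels are the ones on the slide:

```python
# Labels per vocabulary, as shown on the slide. Identifiers are hypothetical.
svcn_labels = {"svcn:Edo": ["Edo"]}
aat_labels = {"aat:Edo": ["Edo (Japanese period)", "Tokugawa"]}

# A single alignment link between the two vocabularies.
alignments = [("svcn:Edo", "aat:Edo")]


def concepts_for(term):
    """Return all concepts reachable from `term`, in either vocabulary."""
    found = set()
    for vocabulary in (svcn_labels, aat_labels):
        for concept, labels in vocabulary.items():
            if term in labels:
                found.add(concept)
    # Follow alignment links one hop, in both directions.
    for left, right in alignments:
        if left in found or right in found:
            found.update((left, right))
    return found


matches = sorted(concepts_for("Tokugawa"))
print(matches)
```

Neither thesaurus is changed or merged here; the alignment list is the only thing that had to be added.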
9
The myth of a unified vocabulary
• In large virtual collections there are always multiple vocabularies
– In multiple languages
• Every vocabulary has its own perspective
– You can’t just merge them
• But you can use vocabularies jointly by defining a limited set of links
– “Vocabulary alignment”
• It is surprising what you can do with just a few links
10
11
12
Part of the Dutch national MultimediaN project
CWI, VU, UvA, DEN, ICN
Alia Amin, Lora Aroyo, Mark van Assem, Victor de Boer, Lynda Hardman, Michiel Hildebrand, Laura Hollink, Marco de Niet, Borys Omelayenko, Marie-France van Orsouw, Jacco van Ossenbruggen, Guus Schreiber, Jos Taekema, Annemiek Teesing, Anna Tordai, Jan Wielemaker, Bob Wielinga
Artchive.com
Rijksmuseum Amsterdam
Dutch ethnology museums (Amsterdam, Leiden)
National Library (Bibliopolis)
http://e-culture.multimedian.nl
13
14
Extra slides
15
From metadata to semantic metadata
16
Example textual annotation
17
Resulting semantic annotation (rendered as HTML with RDFa)
18
Levels of interoperability
• Syntactic interoperability
– Using data formats that you can share
– XML family is the preferred option
• Semantic interoperability
– How to share meaning / concepts
– Technology for finding and representing semantic links
19
Term disambiguation is a key issue in semantic search
• Post-query
– Sort search results based on different meanings of the search term
– Mimics Google-type search
• Pre-query
– Ask user to disambiguate by displaying list of possible meanings
– Interface is more complex, but more search functionality can be offered
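The two options can be contrasted in a few lines of Python. The hits and the two meanings of "Paris" below are invented for illustration and do not come from the slides:

```python
from collections import defaultdict

# Hypothetical search hits: (object title, meaning of the matched term).
hits = [
    ("Portrait of a painter", "Paris (city)"),
    ("Judgement scene on a vase", "Paris (mythological figure)"),
    ("View of Montmartre", "Paris (city)"),
]


def group_by_meaning(results):
    """Post-query: run the search first, then sort the results per meaning."""
    groups = defaultdict(list)
    for title, meaning in results:
        groups[meaning].append(title)
    return dict(groups)


def possible_meanings(results):
    """Pre-query: list the meanings so the user can pick one before searching."""
    return sorted({meaning for _, meaning in results})


grouped = group_by_meaning(hits)
meanings = possible_meanings(hits)
print(grouped)
print(meanings)
```

In the pre-query case the interface would show `meanings` first and only search with the chosen concept, which is the extra complexity the slide refers to.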
20
Semantic autocompletion
21
Faceted search (pre-query)
22
23
24
25
SKOS
26
27
Multi-lingual labels for concepts
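One way to picture this: a concept keeps a single identifier and carries one label per language, in the spirit of SKOS preferred labels with language tags. The identifier `ex:Paris` and the helper functions are assumptions made for this sketch:

```python
# One concept, one identifier, several language-tagged labels.
# "ex:Paris" is a hypothetical identifier.
labels = {
    "ex:Paris": {"en": "Paris", "fr": "Paris", "nl": "Parijs"},
}


def label_for(concept, lang, fallback="en"):
    """Return the label in `lang`, or the fallback language if it is missing."""
    by_lang = labels[concept]
    return by_lang.get(lang, by_lang[fallback])


def concepts_matching(term):
    """Find concepts that carry `term` as a label in any language."""
    wanted = term.lower()
    return sorted(
        concept
        for concept, by_lang in labels.items()
        if wanted in {text.lower() for text in by_lang.values()}
    )


print(label_for("ex:Paris", "nl"))
print(concepts_matching("parijs"))
```

A Dutch query and an English query end up at the same concept, so the objects annotated with it are found regardless of the language of the search term.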
28
Learning alignments
• Learning relations between art styles in AAT and artists in ULAN through NLP of art historic texts
– “Who are Impressionist painters?”
29
Perspectives
• Basic Semantic Web technology is ready for deployment
• Web 2.0 facilities fit well:
– Involving community experts in annotation
– Personalization, myArt
• Social barriers have to be overcome!
– “open door” policy
– Involvement of general public => issues of “quality”
30
Semantic interoperability
• Large, smart web “mash ups”, combining:
– Data: images, metadata & encyclopaedic knowledge (gazetteers, thesauri, Wikipedia, …)
– Visualisations: maps, timelines, social networks, …
• Data too diverse for a traditional database approach
– fixed schemas will not work
– data includes relational data, XML text, images, video, …
• Need to link different data sources together
– focus on light weight, heuristic approaches
– reusing as much as possible (web standards)
• Need new interfaces and search paradigms
– need to find relations between pieces of information
– need to organize (cluster/rank/filter) the many relations we will find
31
Caveats for museum software
• Be wary of Flash
– Accessibility
• Make sure you can connect to others and others can connect to you
– “Don’t buy software which does not support standard open APIs”
• Export facilities to common formats (XML, …)
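As an illustration of the export point, a collection record can be written out as XML with Dublin Core elements using only the Python standard library. The `record` wrapper element and the `export_record` function are assumptions made for this sketch, not a prescribed export format:

```python
import xml.etree.ElementTree as ET

# Dublin Core element set namespace (real URI).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def export_record(title, creator):
    """Serialise one collection record as XML with Dublin Core elements."""
    record = ET.Element("record")  # hypothetical wrapper element
    ET.SubElement(record, f"{{{DC_NS}}}title").text = title
    ET.SubElement(record, f"{{{DC_NS}}}creator").text = creator
    return ET.tostring(record, encoding="unicode")


xml_text = export_record("Green Stripe (Mme Matisse)", "Henri Matisse")
print(xml_text)
```

Any system that reads XML and knows the Dublin Core namespace can pick up the title and creator from this output without knowing the internal database layout.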
32
Semantic Web Myths *)
• Sem Web = Artificial Intelligence on the Web
• Relies on centrally controlled ontologies for “meaning”
– as opposed to a democratic, bottom-up control of terms
• One has to manually add metadata to all Web pages, relational databases, XML data, etc to use it
• It is just ugly XML
• One has to learn formal logic, knowledge representation, description logic, etc.
• An academic project, of no interest for industry
*) Adapted from a slide by Frank van Harmelen, panel WWW2006