Europeana Semantic Data in
Action (a Pilot Service based on OWLIM)
http://europeana.ontotext.com
Mariana Damova (PhD)
(with contribution to the work by Antoine Isaac, Valentine Charles,
Zdravko Tashev, Svetoslav Petrov)
Europeana AGM
November 2012
September 2012
Europeana Data Standards
• ESE – Europeana Semantic Elements
• Unified metadata
• Dublin Core & Europeana fields
• 36 fields: flat, limited ability to express semantic links
Dublin Core fields: dc:title, dc:creator, dc:subject, dc:description, dc:publisher, …
Europeana fields: europeana:provider, europeana:dataProvider, europeana:rights, europeana:type, europeana:isShownBy and/or europeana:isShownAt, …
• EDM - Europeana Data Model
Basic data model Two contextual classes
Europeana Data in EDM
• 268GB of data in RDF
• 20M+ cultural objects data and linkages to other datasets, mainly DBpedia
• EDM model
• SKOS
Semantic Technologies – Main Features
• Semantic technologies (RDF, LOD) allow for an unprecedented ease of
integration of heterogeneous data sources
– Already adopted in the pharmaceutical and publishing industries
– BBC: when MySQL was replaced with OWLIM in their “Dynamic Semantic
Publishing” architecture, the BBC team observed a considerable reduction in
the complexity of database design, query specification, and application
development, and in query evaluation time.
Source: “BBC World Cup 2010 dynamic semantic publishing”, Jem Rayfield,
Senior Technical Architect, BBC News and Knowledge.
http://www.bbc.co.uk/blogs/bbcinternet/2010/07/bbc_world_cup_2010_dynamic_sem.html
Linking Open Data
• Linking Open Data (LOD) W3C SWEO Community project http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
• Initiative for publishing “linked data” – a set of principles,
which allows browsing of RDF data, spread across different
servers, in the way HTML is browsed
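The idea of browsing RDF "in the way HTML is browsed" can be illustrated with HTTP content negotiation: the client asks the same URI for an RDF serialization instead of an HTML page. The following is a minimal sketch, assuming a server that honours the Accept header. The DBpedia resource URI is only an illustrative choice, not taken from the slides, and the request is built but not sent.

```python
from urllib.request import Request


def build_linked_data_request(resource_uri: str) -> Request:
    """Build (but do not send) a GET request asking for RDF instead of HTML."""
    # Prefer Turtle, fall back to RDF/XML. Which formats a given server
    # actually offers is an assumption of this sketch.
    accept = "text/turtle, application/rdf+xml;q=0.9"
    return Request(resource_uri, headers={"Accept": accept})


# Illustrative URI (an assumption, not taken from the slides).
request = build_linked_data_request("http://dbpedia.org/resource/Amedeo_Modigliani")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would send it; following the links found in the returned RDF is what makes the data browsable across servers.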
Semantic Technologies and Cultural Heritage
• Combining facts and knowledge from different datasets
• Need for convincing real-life use cases demonstrating the benefits of these
technologies
The cultural heritage domain can become a useful use case for the
application of semantic technologies.
MacManus, the founder and editor-in-chief of ReadWriteWeb,
defined an exemplary test for the Semantic Web:
find the cities around the world which have Modigliani art works.
FactForge of Ontotext solves the Modigliani query
by combining knowledge from 6 datasets from the Linked Open Data Cloud.
http://factforge.net
OWLIM - a scalable, robust and efficient triple store
– Serving the two most important websites for the London Olympic Games
• Official Olympics website
• BBC Olympics website
– Performance highlights
• OWLIM loads the 100M and the 200M datasets almost twice as fast as the next best product (17
min. for 100M)
• Best query performance among those repositories that can handle update and multi-client query
tasks (5,285 Query-mixes-per-hour, where a query mix contains 25 queries; e.g. about 100
queries/sec)
• OWLIM v5 is 43% faster than v4.3 on the BSBM Explore and Update scenario
• OWLIM v5 requires between 25% and 70% less storage space
• OWL 2 RL-type languages have proven to be the only feasible approach for
reasoning with billions of statements
Reason-able View with Europeana data in EDM
• 268GB of data
• Cultural objects data and linkages to other datasets
Loaded into OWLIM with inference with respect to OWL-Horst Optimized
Dataset size:
NumberOfStatements=3,899,531,218
NumberOfExplicitStatements=993,332,911
NumberOfEntities=264,523,842
EDM model
SKOS
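The dataset statistics above give a rough idea of how much the inference adds. The following is a small worked calculation, assuming that NumberOfStatements counts explicit plus inferred statements; the slide does not state this.

```python
# Figures as reported on the slide.
total_statements = 3_899_531_218
explicit_statements = 993_332_911

# Assumption: the total counts explicit plus inferred statements.
inferred_statements = total_statements - explicit_statements
expansion_ratio = total_statements / explicit_statements

print(f"Inferred statements: {inferred_statements:,}")
print(f"Total / explicit: {expansion_ratio:.2f}")
```

Under that assumption, about 2.9 billion statements are inferred, and the repository holds roughly 3.9 times as many statements as were explicitly loaded.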
SPARQL endpoint
• http://europeana.ontotext.com
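A client could address such an endpoint over HTTP as sketched below, following the SPARQL 1.1 Protocol. The `/sparql` path is an assumption, since the slide gives only the host name; the actual service may expose its endpoint elsewhere. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint path: the slide gives only the host name.
ENDPOINT = "http://europeana.ontotext.com/sparql"


def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol GET request."""
    # The protocol passes the query text in the "query" URL parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
print(request.full_url)
```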
Semantic Queries over Structured Data
• Available objects with their aggregators
• Data providers that have contributed content to Europeana
• Datasets from Italy
• Objects from the 18th century provided to Europeana
• The original URL, the copyright and the creative commons right of objects provided by The
European Library
• Copyrights and Creative Commons rights of Europeana objects per provider
• Enrichment statements produced by Europeana for objects provided by institutions from
the United Kingdom
• List of Europeana enriched objects from Sweden, their equivalents and related entities
• Time enrichment statements produced by Europeana for provided objects
• The complete ordered list of Europeana aggregators and the specific data providers they
gather
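One of the queries listed above, the data providers contributing to Europeana, might look roughly like the sketch below. It assumes the data follows the published EDM pattern in which an ore:Aggregation carries edm:dataProvider; the pilot service's actual queries are not shown in the slides. The helper reads the standard SPARQL JSON results format, and the sample response is made up purely to exercise it.

```python
import json
from typing import List

# Sketch only: assumes ore:Aggregation resources carry edm:dataProvider.
DATA_PROVIDERS_QUERY = """
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
SELECT DISTINCT ?dataProvider
WHERE {
  ?aggregation a ore:Aggregation ;
               edm:dataProvider ?dataProvider .
}
ORDER BY ?dataProvider
"""


def binding_values(results_json: str, variable: str) -> List[str]:
    """Extract one variable's values from a SPARQL JSON results document."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return [row[variable]["value"] for row in bindings if variable in row]


# Made-up response used only to exercise the parser; not real Europeana data.
sample = json.dumps({
    "head": {"vars": ["dataProvider"]},
    "results": {"bindings": [
        {"dataProvider": {"type": "literal", "value": "Example Museum"}},
        {"dataProvider": {"type": "literal", "value": "Example Library"}},
    ]},
})
print(binding_values(sample, "dataProvider"))
```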
Europeana objects with their images
Other cultural heritage sources available for interlinking
Gothenburg City Museum objects
• Oil paintings from the GIM collection
• Paintings of value less than 5000 Swedish kronor
• Paintings with a Gothenburg motif
• Portraits and their painters
• Museum objects from Swedish museums
• Museum objects of height more than 30 centimeters
• Paintings given as a present to the Gothenburg City Museum
http://museum.ontotext.com
Linking Open Data Cloud
Outlook …
Europeana Creative - PSP project
Led by the Austrian National Library
26 partners
Objective: experimenting with re-use of cultural content for creativity
Project: Europeana re-use framework and 6 pilots in different domains such as education, tourism, etc.
Ontotext: participates in the infrastructure for re-use with the semantic repository OWLIM, and data integration
Sofia, 13 March 2012
Ontotext
– Top-5 provider of core Semantic Technology
– Established in year 2000; offices in Bulgaria, UK, USA
– Active both in research and commercial projects (FP7 funding for 10 years)
• 360° semantic technology – unique portfolio:
– Semantic Databases: high-performance RDF DBMS, scalable reasoning
– Semantic Search: text-mining (IE), metadata generation, Information Retrieval (IR)
– Web Mining: focused crawling, screen scraping, data fusion
– Linked Data Management and Data Integration
Good recognition in the SemTech community
– Ontotext pages are ranked #1 for “semantic annotation” and “semantic repository” at
GYM, #3 for “linked data management” at Google
Several joint ventures and subsidiaries
– Innovantage: leading online recruitment intelligence provider in UK
Ontotext Clients (selected)
British Broadcasting Corporation (BBC)
– Run its World Cup 2010 sites on top of OWLIM
– Since Mar’12 BBC Sports
– 2012 Olympics sections are driven by OWLIM and a Concept Extraction service developed by Ontotext
Press Association (UK)
– Analysis of Sports news
– Concept extraction
– Linked data generation
Top-3 USA media (not allowed to name)
The National Archives (UK): contracted Ontotext to implement a semantic KB and semantic search for the Government Web Archive
British Museum (UK): Ontotext leads the development of Phase 3 of the ResearchSpace project on collaborative research in cultural heritage; the British Museum’s public SPARQL end-point is powered by OWLIM
Ontotext in the Cultural Heritage Domain
Selected commercial projects
ResearchSpace project, funded by the Andrew W. Mellon Foundation: support for collaborative web-based research, information sharing and web publishing for the cultural heritage scholarly community. An Ontotext-led international consortium.
The Polish Digital National Museum aggregates artifacts from over 70 contributing cultural institutions in the Digital Libraries Federation PIONIER Network, using the OWLIM repository of Ontotext.
LODAC (Linked Open Data in Academia), Japan's National Institute of Informatics: aggregates various information across multiple Japanese resources as LOD. The system uses 8 OWLIM nodes and aggregates 19 collections with 700 000 entities and 15M triples.
SemTech for Cultural Heritage project, funded by ITCC: semantic publishing of Bulgarian cultural heritage to Europeana; establishing a Bulgarian technical aggregator for Europeana.
Selected research projects
MOLTO FP7 project: a use case in cultural heritage for a semantic knowledge representation infrastructure for querying RDF and presenting query results; includes close to 9K museum objects from two collections of The Gothenburg City
Charisma (Cultural Heritage Advanced Research Infrastructures): an EU-funded integrating activity project with a consortium of 21 partners and metadata from 6 major European cultural institutions; has selected the OWLIM repository of Ontotext