Nuxeo World Session: Semantic Technologies - Update on Recent Research

Presentation from Nuxeo World 2010 (November 17-18, 2010).

Transcript of Nuxeo World Session: Semantic Technologies - Update on Recent Research

Page 1: Nuxeo World Session: Semantic Technologies - Update on Recent Research

Nov. 17 2010 - S. Fermigier & O. Grisel, Nuxeo

Towards semantic ECM: report on the IKS and Scribo projects

Monday, November 22, 2010

Page 2

Outline

• Introduction to semantic technologies

• Collaborative R&D within the Scribo and IKS projects

• FISE & Apache Stanbol / Nuxeo integration

Page 3

1. Introduction to semantic technologies

Page 4

Illustration source: Mills Davis, “Semantic Social Computing”, Sept. 2007

Page 5

Photo source: http://www.flickr.com/photos/pixelydixel/

Page 6

Invented the web in 1989 (yeah!)

Photo source: http://www.flickr.com/photos/pixelydixel/

Page 7

Invented the web in 1989 (yeah!)

Invented the semantic web in 1999 (duh?)

Photo source: http://www.flickr.com/photos/pixelydixel/

Page 8

Historical perspective

• From web 1.0: web of pages, aka the World Wide Web

• To web 2.0: web of people and of participation, aka the Social Web

• To web 3.0: web of data, of meaning and of connected knowledge, aka the Semantic Web

Page 9

Picture source: http://www.flickr.com/photos/pixelydixel/

Page 13

A “layer cake” of technologies

Page 14

“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

Linked Open Data in 2007

Page 15

“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

2008

Page 16

“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

2009

Page 17

“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

2010

Page 18

Good for Enterprise apps too!

Diagram source: http://www.w3.org/2007/Talks/0130-sb-W3CTechSemWeb/

Page 19

Key Enablers

• Open Data and Linked Open Data

• Advances in automatic content analysis (linguistics, image processing)

• Computing power (Moore’s law + MapReduce)

• Classical logic and classical AI

Page 20

The technologies and data are available: let’s put them to use!

Page 21

Diagram: Semantic ECM turns content (text, image, sound, video) into meaning (metadata, entities, tags, relations, reasoning).

Page 22

Goals for Semantic ECM (& Nuxeo)

• Repurpose existing content

• Improve search and collaboration

• Make information contextual

• Extract and use information from your content

• Make your content smarter!

Page 23

Challenges

• Extract meaning from content

• Enrich content with knowledge

• Enhance interaction with content thanks to added meaning

Page 24

Business value from semantic ECM

• Efficiency gains: 20% to 90% (e.g. in search, collaboration)

• Effectiveness gains: better returns from your assets (e.g. news and images from AFP)

• Strategic edge: growth, value capture, new services, an unfair strategic advantage (e.g. vertical ontologies for CEVAs / CCAs)

Page 25

2. SCRIBO and IKS

Page 26

• SCRIBO: a project under the French FUI program, with 9 partners and a budget of 4.7 M€

• Goal: to develop algorithms and collaborative tools for extracting knowledge from unstructured documents and images

• Started in 2008, finishing in Dec. 2010, with results already integrated as a Nuxeo plugin

Page 27

• IKS: a European project under FP7, with 13 partners (6 SMEs) and an 8.5 M€ budget

• Goal: create a semantic software “stack” that will be used by CMS vendors to add semantic features to their products

• Started in Jan. 2009, will last until Dec. 2012

• First tangible result: FISE, already integrated in a Nuxeo plugin

Page 28

3. Linking Semantic Entities: Apache Stanbol - Nuxeo integration

Page 29

What are entities?

Page 31

What is wrong with tags?

• Many terms for the same meaning

• NYC, New York, New York City

• Many meanings for the same term

• Context is needed to remove ambiguity
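A minimal sketch of the first problem: an alias table that collapses several tag spellings onto one entity URI. The table, the helper names (`normalize_tag`, `entity_tags`) and the choice of a DBpedia URI are illustrative assumptions. Mapping "New York" to the city is itself a guess, because the same term can also denote the state, which is exactly the second problem listed above.

```python
from typing import Optional

# Hypothetical alias table: several surface forms point to one entity URI.
# A real system would derive it from a knowledge base instead of hard-coding it.
NYC = "http://dbpedia.org/resource/New_York_City"
ALIASES = {
    "nyc": NYC,
    "new york": NYC,  # a guess: "New York" can also mean the state
    "new york city": NYC,
}


def normalize_tag(tag: str) -> Optional[str]:
    """Map a free-text tag to an entity URI, or None if it is unknown."""
    key = " ".join(tag.lower().split())  # case-fold and collapse whitespace
    return ALIASES.get(key)


def entity_tags(tags: list[str]) -> set[str]:
    """Collapse a list of tags into the set of distinct entities they denote."""
    return {uri for uri in map(normalize_tag, tags) if uri is not None}
```

With this table, `entity_tags(["NYC", "New York", "New  York City"])` returns a one-element set, so three tags become one entity.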

Page 32

Washington is...
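Resolving an ambiguous term such as "Washington" requires looking at its context. The sketch below scores each candidate sense by word overlap between the sentence and a hand-written profile. The candidate URIs, the profiles and the `disambiguate` function are hypothetical. This toy bag-of-words heuristic is not the method used in the demo; real engines rely on much richer statistics.

```python
import re

# Hypothetical candidate senses for "Washington", each with a tiny bag of
# context words. Real engines derive such profiles from a knowledge base.
CANDIDATES = {
    "http://dbpedia.org/resource/Washington,_D.C.": {"capital", "congress", "city", "white", "house"},
    "http://dbpedia.org/resource/George_Washington": {"president", "general", "born", "army", "first"},
    "http://dbpedia.org/resource/Washington_(state)": {"seattle", "state", "pacific", "northwest", "olympia"},
}


def disambiguate(sentence: str, candidates: dict[str, set[str]] = CANDIDATES) -> str:
    """Pick the candidate whose profile shares the most words with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    # max() keeps the first candidate in dictionary order when scores tie.
    return max(candidates, key=lambda uri: len(candidates[uri] & words))
```

For example, a sentence about the first president selects the George Washington URI, while one mentioning Seattle selects the state.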

Page 33

Tagging with Entities

• Global namespace / universal meaning context

• Interoperability across domains

• Interoperability across applications
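The sketch below illustrates interoperability across applications. It assumes two hypothetical applications, a document repository and a CRM, that both store entity URIs instead of free-text tags. Because both use the same global identifiers, relating their records reduces to a set intersection. The data and the `related` helper are invented for the example.

```python
# Two hypothetical applications that tag their records with entity URIs.
PARIS = "http://dbpedia.org/resource/Paris"
BERLIN = "http://dbpedia.org/resource/Berlin"
LONDON = "http://dbpedia.org/resource/London"

DM_DOCS = {"report-2010.pdf": {PARIS}, "minutes.doc": {BERLIN}}
CRM_CONTACTS = {"alice": {PARIS, LONDON}, "bob": {LONDON}}


def related(
    left: dict[str, set[str]], right: dict[str, set[str]]
) -> list[tuple[str, str, str]]:
    """Join the records of two applications on shared entity URIs."""
    return sorted(
        (l_id, r_id, uri)
        for l_id, l_tags in left.items()
        for r_id, r_tags in right.items()
        for uri in l_tags & r_tags  # shared URIs mean shared meaning
    )
```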

Page 34

Demo time!

Screencast online at http://blogs.nuxeo.com/dev

Page 35

How does this work?

Page 37

• Open Source Semantic Engine

• HTTP Services

• For content-driven applications

• OSGi: loosely coupled components

• Analysis Engines

• Knowledge RDF vocabularies
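The "loosely coupled components" and "analysis engines" bullets can be pictured as a plug-in registry. Each engine shares one small interface and enriches the same content item. The slides name OSGi components for this; the Python sketch below only mimics the idea. `ContentItem`, `register` and both toy engines are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ContentItem:
    """Hypothetical stand-in for the content sent to the engine."""
    text: str
    enhancements: list[dict] = field(default_factory=list)


# An engine is any callable that reads the item and appends enhancements.
Engine = Callable[[ContentItem], None]
REGISTRY: dict[str, Engine] = {}


def register(name: str) -> Callable[[Engine], Engine]:
    """Decorator that plugs an engine into the registry without other coupling."""
    def wrap(engine: Engine) -> Engine:
        REGISTRY[name] = engine
        return engine
    return wrap


@register("length")
def length_engine(item: ContentItem) -> None:
    item.enhancements.append({"engine": "length", "chars": len(item.text)})


@register("capitalized-words")
def capitalized_engine(item: ContentItem) -> None:
    # Naive stand-in for entity spotting: collect capitalized tokens.
    words = [w.strip(".,") for w in item.text.split() if w[:1].isupper()]
    item.enhancements.append({"engine": "capitalized-words", "words": words})


def enhance(text: str) -> ContentItem:
    """Run every registered engine over the same item, in registration order."""
    item = ContentItem(text)
    for engine in REGISTRY.values():
        engine(item)
    return item
```

Adding an engine only requires decorating a new function; `enhance` never has to change.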

Page 38

What is a semantic engine?

• Unstructured content => Knowledge

• Language guessing

• Topic classification (Business, Sports, Media, ...)

• Named entity extraction and linking

• Relationship and property extraction
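Language guessing, the first capability listed, can be approximated by counting frequent function words for each language. This is a deliberately naive sketch with tiny made-up stopword lists. Production language identifiers usually rely on character n-gram models instead.

```python
import re

# Tiny illustrative stopword lists; far too small for real use.
STOPWORDS = {
    "en": {"the", "and", "is", "at", "in", "of", "to", "a"},
    "fr": {"le", "la", "et", "est", "à", "dans", "de", "un"},
    "de": {"der", "die", "und", "ist", "in", "bei", "von", "ein"},
}


def guess_language(text: str) -> str:
    """Return the language whose stopwords occur most often (ties: first listed)."""
    words = re.findall(r"\w+", text.lower())
    scores = {lang: sum(w in stop for w in words) for lang, stop in STOPWORDS.items()}
    return max(scores, key=scores.get)
```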

Page 41

RESTful is beautiful

Page 42

curl -X POST \
  -H "Accept: application/json" \
  -H "Content-type: text/plain" \
  --data "John Smith works at Smith Consulting in Paris." \
  http://fise.demo.nuxeo.com/engines

{
  "urn:enhancement-1564680b-861c-df6f-fdf9-d34a75d68dfe": {
    "http://fise.iks-project.eu/ontology/selected-text": [
      { "datatype": "http://www.w3.org/2001/XMLSchema#string", "type": "literal", "value": "Paris" }
    ],
    "http://fise.iks-project.eu/ontology/selection-context": [
      { "datatype": "http://www.w3.org/2001/XMLSchema#string", "type": "literal", "value": "John Smith works at Smith Consulting Paris." }
    ],
    "http://purl.org/dc/terms/type": [
      { "type": "uri", "value": "http://dbpedia.org/ontology/Place" }
    ]
  },
  …
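The same call can be made from Python's standard library. The sketch builds the request without sending it, because the 2010 demo endpoint may no longer be reachable and the path may differ in later Stanbol releases. It also extracts (selected text, type) pairs from a response shaped like the one above. It assumes the layout shown on the slide, where subjects map predicate URIs to lists of value objects. `build_request`, `extract_entities` and `SAMPLE` are hypothetical names.

```python
import json
import urllib.request

FISE = "http://fise.iks-project.eu/ontology/"
DC_TYPE = "http://purl.org/dc/terms/type"


def build_request(text: str, url: str = "http://fise.demo.nuxeo.com/engines") -> urllib.request.Request:
    """Mirror the curl call: POST plain text and ask for JSON back."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Accept": "application/json", "Content-Type": "text/plain"},
        method="POST",
    )


def extract_entities(rdf_json: dict) -> list[tuple[str, str]]:
    """Return (selected text, entity type) pairs from an enhancement graph."""
    found = []
    for props in rdf_json.values():
        texts = [v["value"] for v in props.get(FISE + "selected-text", [])]
        types = [v["value"] for v in props.get(DC_TYPE, [])]
        found.extend((t, ty) for t in texts for ty in types)
    return found


# The response from the slide, reduced to its first enhancement.
SAMPLE = json.loads("""{
  "urn:enhancement-1564680b-861c-df6f-fdf9-d34a75d68dfe": {
    "http://fise.iks-project.eu/ontology/selected-text": [
      {"datatype": "http://www.w3.org/2001/XMLSchema#string", "type": "literal", "value": "Paris"}],
    "http://purl.org/dc/terms/type": [
      {"type": "uri", "value": "http://dbpedia.org/ontology/Place"}]
  }
}""")
```

To actually send the request, pass it to `urllib.request.urlopen` and decode the body with `json.load`.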

Page 45

Apache Stanbol = FISE + fast Linked Data local index + semantic rule engine + more?
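The sketch below hints at what a semantic rule engine adds. It forward-chains two RDFS-style rules, subclass transitivity and type inheritance, over a set of triples until nothing new is derived. It is a naive fixpoint loop written for illustration, not the rule engine shipped with Stanbol. The `infer_types` name and the DBpedia-like class names are assumptions.

```python
# Triples are (subject, predicate, object); prefixed names keep them short.
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def infer_types(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Apply two rules until a fixpoint is reached:
    (C subClassOf D) and (D subClassOf E) => (C subClassOf E)
    (x type C) and (C subClassOf D)       => (x type D)
    """
    facts = set(triples)
    while True:
        subclass = {(s, o) for s, p, o in facts if p == SUBCLASS}
        new = {(c, SUBCLASS, e) for c, d in subclass for d2, e in subclass if d == d2}
        new |= {(x, TYPE, d) for x, p, c in facts if p == TYPE for c2, d in subclass if c == c2}
        if new <= facts:  # nothing new derived: stop
            return facts
        facts |= new
```

Suppose the input says Paris is a City and gives a City, Settlement, PopulatedPlace, Place subclass chain. The result then also states that Paris is a Place, so a search for places can find a document tagged only with the city.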

Page 46

Apache Stanbol / Nuxeo integration

Page 47

Diagram: Local IT infrastructure (LAN) with (1) Nuxeo DM addon, (2) Apache Stanbol, (3) Engine 1, Engine 2, Engine 3; knowledge sources: DBpedia, Freebase, Geonames, LDAP.
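The diagram lists several knowledge sources, one local (LDAP) and others public (DBpedia, Freebase, Geonames). One plausible way to combine them is to query the sources in priority order and keep the first match; this sketch assumes that strategy. The source contents, the URIs and the `link` function are illustrative only, and Freebase is omitted for brevity. Real engines query indexes or remote services rather than in-memory dictionaries.

```python
from typing import Optional

# Hypothetical in-memory stand-ins for the knowledge sources, in priority
# order: the local directory first, then public datasets. Identifiers are
# illustrative, not verified lookups.
SOURCES = [
    ("LDAP", {"john smith": "ldap:///uid=jsmith,ou=people,dc=example,dc=com"}),
    ("Geonames", {"paris": "http://sws.geonames.org/2988507/"}),
    ("DBpedia", {"paris": "http://dbpedia.org/resource/Paris",
                 "new york city": "http://dbpedia.org/resource/New_York_City"}),
]


def link(label: str) -> Optional[tuple[str, str]]:
    """Return (source name, URI) from the first source that knows the label."""
    key = " ".join(label.lower().split())
    for name, index in SOURCES:
        uri = index.get(key)
        if uri is not None:
            return name, uri
    return None
```

Putting the local directory first means in-house data, such as employees, takes precedence over homonyms in public datasets.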

Page 48

• Implemented as an Operation for Nuxeo Studio

• Entities & Relationships stored in Nuxeo Core

• CMIS interoperability
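Storing entities and relationships next to documents makes queries such as "which documents mention Paris?" cheap. The class below is a hypothetical in-memory stand-in for such a store, not the Nuxeo Core API. It keeps (document, predicate, entity) triples plus a reverse index by entity; the predicate names are made up.

```python
from collections import defaultdict

# Illustrative entity URI and a made-up predicate name.
PARIS = "http://dbpedia.org/resource/Paris"
MENTIONS = "mentions"


class RelationStore:
    """Hypothetical in-memory stand-in for a repository-side relation store."""

    def __init__(self) -> None:
        self._triples: set[tuple[str, str, str]] = set()
        # Reverse index: entity URI -> ids of documents linked to it.
        self._by_entity: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, predicate: str, entity_uri: str) -> None:
        """Record that a document is related to an entity."""
        self._triples.add((doc_id, predicate, entity_uri))
        self._by_entity[entity_uri].add(doc_id)

    def documents_about(self, entity_uri: str) -> set[str]:
        """Return ids of documents linked to the entity (a copy, safe to mutate)."""
        return set(self._by_entity.get(entity_uri, ()))

    def relations_of(self, doc_id: str) -> list[tuple[str, str]]:
        """Return the sorted (predicate, entity) pairs attached to one document."""
        return sorted((p, o) for s, p, o in self._triples if s == doc_id)
```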

Page 49

Soon available on marketplace.nuxeo.com

Page 50

• http://iks-project.eu

• http://fise.demo.nuxeo.com

• http://scribo.ws

• http://incubator.apache.org/stanbol

• http://blogs.nuxeo.com/dev

Questions?
