[Webinar] Semantic Technologies

Nuxeo & Semantic Technologies Stefane Fermigier - Nuxeo The Web - May 2011 Wednesday, May 25, 2011

description

Want to get an update on Nuxeo's involvement in semantic search and knowledge extraction? Watch this slideshow to hear all the latest news on this topic and learn how it may impact the future of Enterprise Content Management! If you want to go further, watch the video of a webinar using this slideshow http://www.youtube.com/watch?v=YLgJKx1y6Fk

Transcript of [Webinar] Semantic Technologies

Page 1: [Webinar] Semantic Technologies

Nuxeo & Semantic Technologies

Stefane Fermigier - Nuxeo

The Web - May 2011

Wednesday, May 25, 2011

Page 2: [Webinar] Semantic Technologies

Agenda

• A pragmatic introduction to the Semantic Web

• Experience report and demos from Nuxeo


Page 3: [Webinar] Semantic Technologies

1. Introduction to the Semantic Web


Page 4: [Webinar] Semantic Technologies

Prelude


Page 5: [Webinar] Semantic Technologies

Source: Mills Davis, “Semantic Social Computing”, Sept. 2007

Page 6: [Webinar] Semantic Technologies

Source: Mills Davis, “Semantic Social Computing”, Sept. 2007

Page 7: [Webinar] Semantic Technologies

Source: Mills Davis, “Semantic Social Computing”, Sept. 2007

Page 8: [Webinar] Semantic Technologies

Source: Mills Davis, “Semantic Social Computing”, Sept. 2007

Page 9: [Webinar] Semantic Technologies

History


Page 10: [Webinar] Semantic Technologies


Page 11: [Webinar] Semantic Technologies

Invented the web in 1989 (yeah!)

Page 12: [Webinar] Semantic Technologies

Invented the web in 1989 (yeah!)

Invented the semantic web in 1994 (duh?)

Page 13: [Webinar] Semantic Technologies

Historical perspective

• From web 1.0: web of sites and pages, aka the World Wide Web

• To web 2.0: web of people and of participation, aka the Social Web (Blogs, RSS, tags, Facebook, Wikipedia, etc.)

• To web 3.0: web of data, of meaning and connected knowledge, aka the Semantic Web


Page 14: [Webinar] Semantic Technologies

Semantics & Ontologies


Page 15: [Webinar] Semantic Technologies


Page 16: [Webinar] Semantic Technologies


Page 17: [Webinar] Semantic Technologies


Page 18: [Webinar] Semantic Technologies


Page 19: [Webinar] Semantic Technologies

Some examples

• FOAF: relationships between people (social network)

• SIOC: relationships between websites, articles, blogs, comments

• Rich Snippets: RDFa content syndicated by Google and Yahoo, useful for SEO

• GoodRelations: e-commerce (eBay...)

• rNews: metadata for news agencies (AFP, Reuters...)

Page 20: [Webinar] Semantic Technologies

How is it related to the Web?

Page 21: [Webinar] Semantic Technologies

The traditional Web

• A principle: hypertext

• A protocol: HTTP

• An identification scheme: URNs/URIs

• A language: HTML


Page 22: [Webinar] Semantic Technologies

“To a computer, then, the web is a flat, boring world devoid of meaning”

Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/

Page 23: [Webinar] Semantic Technologies

“This is a pity, as in fact documents on the web describe real objects and imaginary concepts, and give particular relationships between them”

Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/

Page 24: [Webinar] Semantic Technologies

“Adding semantics to the web involves two things: allowing documents which have information in machine-readable forms, and allowing links to be created with relationship values.”

Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/

Page 25: [Webinar] Semantic Technologies

“The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”

Tim Berners-Lee, James Hendler and Ora Lassila, “The Semantic Web”, Scientific American, May 2001

Page 26: [Webinar] Semantic Technologies

The traditional Web

• A principle: hypertext

• A protocol: HTTP

• An identification scheme: URNs/URIs

• A language: HTML


Page 27: [Webinar] Semantic Technologies

The semantic Web

• A principle: hypertext

• A protocol: HTTP

• An identification scheme: URNs/URIs

• A language: RDF (replacing HTML)

Page 28: [Webinar] Semantic Technologies

The W3C “Layer Cake”


Page 29: [Webinar] Semantic Technologies

The W3C “Layer Cake”

Already standardized

Page 30: [Webinar] Semantic Technologies

URIs and the Web of Things

• URIs (Uniform Resource Identifiers) are used to identify things (also called entities) in the real world

• For instance: people, places, events, companies, products, movies, etc.

Page 31: [Webinar] Semantic Technologies

The RDF model

Subject → Predicate → Object

RDF is used to describe relationships between objects, identified by their URIs
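As a rough illustration of this model, a triple can be sketched in Python as a plain tuple of three terms. The `Triple` type, the `objects` helper and the choice of DBpedia and FOAF URIs are assumptions made for this example; they are not taken from the slides.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject --predicate--> object (hypothetical helper type)."""
    subject: str
    predicate: str
    object: str


# URIs chosen for illustration only.
TIM = "http://dbpedia.org/resource/Tim_Berners-Lee"
W3C = "http://dbpedia.org/resource/World_Wide_Web_Consortium"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_MEMBER = "http://xmlns.com/foaf/0.1/member"

# A graph is just a set of triples.
graph = {
    Triple(W3C, FOAF_MEMBER, TIM),              # object is another resource (URI)
    Triple(TIM, FOAF_NAME, "Tim Berners-Lee"),  # object is a literal value
}


def objects(graph, subject, predicate):
    """Return every object attached to `subject` through `predicate`."""
    return {t.object for t in graph
            if t.subject == subject and t.predicate == predicate}


print(objects(graph, TIM, FOAF_NAME))
```

Because subjects and predicates are URIs, two graphs published by different parties can be merged by a simple set union.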


Page 33: [Webinar] Semantic Technologies

RDF serialization

As XML:

Others, ex: N3:
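The serialization samples shown on the original slide are not part of this transcript. As a minimal sketch of what a textual serialization looks like, the Python snippet below writes triples as N-Triples, a simple line-based format from the same family as N3. The `term` and `to_ntriples` helpers and the `example.org` URIs are hypothetical, and only URIs and plain string literals are handled.

```python
def term(value, is_literal=False):
    """Format one RDF term: <uri> for resources, "text" for plain literals."""
    if is_literal:
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return f"<{value}>"


def to_ntriples(triples):
    """Serialize (subject, predicate, object, object_is_literal) tuples, one statement per line."""
    lines = []
    for s, p, o, o_is_literal in triples:
        lines.append(f"{term(s)} {term(p)} {term(o, o_is_literal)} .")
    return "\n".join(lines)


triples = [
    ("http://example.org/people/alice", "http://xmlns.com/foaf/0.1/name",
     "Alice", True),
    ("http://example.org/people/alice", "http://xmlns.com/foaf/0.1/knows",
     "http://example.org/people/bob", False),
]

print(to_ntriples(triples))
```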


Page 34: [Webinar] Semantic Technologies

SPARQL

• Query language for RDF databases

• Several implementations

• OSS: Apache Jena, Sesame, 4Store, Virtuoso, Mulgara, Redland, Open Anzo...

• Proprietary: 5Store, AllegroGraph RDFStore, Stardog, Dydra, OWLIM...

• More expressive than SQL, but scalability is still an open question

Page 35: [Webinar] Semantic Technologies

SPARQL Sample
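The sample from the original slide is not part of this transcript. As a stand-in, the sketch below shows a small SPARQL query (kept as a string, not executed) and a toy Python matcher that evaluates the same two triple patterns over an in-memory list. The data, the `example.org` URIs and the `match` and `solve` helpers are assumptions for illustration; a real deployment would send the query to one of the SPARQL engines listed above.

```python
# SPARQL text for reference only; it is not executed by this script.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE {
  ?person foaf:knows <http://example.org/people/bob> .
  ?person foaf:name  ?name .
}
"""

FOAF = "http://xmlns.com/foaf/0.1/"
ALICE = "http://example.org/people/alice"
BOB = "http://example.org/people/bob"
CAROL = "http://example.org/people/carol"

TRIPLES = [
    (ALICE, FOAF + "name", "Alice"),
    (ALICE, FOAF + "knows", BOB),
    (CAROL, FOAF + "name", "Carol"),
]


def match(pattern, triple, bindings):
    """Unify one triple pattern with one triple; terms starting with '?' are variables."""
    new = dict(bindings)
    for p, v in zip(pattern, triple):
        if p.startswith("?"):
            if new.setdefault(p, v) != v:
                return None  # variable already bound to a different value
        elif p != v:
            return None      # constant term does not match
    return new


def solve(patterns, triples):
    """Join the patterns one after the other, like a basic graph pattern."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


patterns = [("?person", FOAF + "knows", BOB),
            ("?person", FOAF + "name", "?name")]
names = sorted(s["?name"] for s in solve(patterns, TRIPLES))
print(names)
```

The shared variable `?person` is what joins the two patterns, which is how SPARQL expresses graph traversal without explicit join clauses.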


Page 36: [Webinar] Semantic Technologies

Where and how to find these data?

Page 37: [Webinar] Semantic Technologies

Solution 1: “Lift”

• One can use HTML scraping and natural language processing (NLP) techniques to extract semantic information from existing content / sites

• Generic solutions: OpenCalais, Zemanta, Apache Stanbol

• Pro: no need to change existing content

• Con: error prone, needs human checks


Page 38: [Webinar] Semantic Technologies

Example: DBpedia

Page 39: [Webinar] Semantic Technologies

Solution 2: export

• RDFa and microformats are used to embed semantic information (expressed using the RDF model) into regular web pages

• RDFa does it using existing (rel) and additional (about, property, typeof) attributes

• Microformats only use usual HTML attributes (class)
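To make the RDFa attributes above concrete, here is a minimal sketch that reads `about`, `typeof`, `property` and `rel` from an HTML fragment with Python's standard `html.parser` module. The HTML fragment and the `TinyRDFaParser` class are hypothetical; the class handles a single `about` subject, does no prefix expansion, and is not a conforming RDFa processor.

```python
from html.parser import HTMLParser

# Hypothetical page fragment annotated with RDFa attributes.
PAGE = """
<div about="http://example.org/people/alice" typeof="foaf:Person">
  <span property="foaf:name">Alice</span>
  <a rel="foaf:homepage" href="http://example.org/alice/">home page</a>
</div>
"""


class TinyRDFaParser(HTMLParser):
    """Collects (subject, predicate, object) tuples from a few RDFa attributes."""

    def __init__(self):
        super().__init__()
        self.subject = None
        self.pending_property = None
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "about" in a:
            # `about` sets the subject for the statements that follow.
            self.subject = a["about"]
            if "typeof" in a:
                self.triples.append((self.subject, "rdf:type", a["typeof"]))
        if "property" in a:
            # The literal value is the text content, seen later in handle_data.
            self.pending_property = a["property"]
        if "rel" in a and "href" in a:
            # `rel` links the subject to another resource.
            self.triples.append((self.subject, a["rel"], a["href"]))

    def handle_data(self, data):
        if self.pending_property and data.strip():
            self.triples.append((self.subject, self.pending_property, data.strip()))
            self.pending_property = None


parser = TinyRDFaParser()
parser.feed(PAGE)
for triple in parser.triples:
    print(triple)
```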


Page 40: [Webinar] Semantic Technologies

Solution 3: reuse

• Linked Open Data: (usually large) data repositories available on the web (for free or not), expressed using the RDF model

• Interoperability between these repositories (their ontologies) must be defined


Page 41: [Webinar] Semantic Technologies

“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

Linked Open Data in 2007


Page 42: [Webinar] Semantic Technologies

“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

2008


Page 43: [Webinar] Semantic Technologies

“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

2009


Page 44: [Webinar] Semantic Technologies

“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

2010


Page 45: [Webinar] Semantic Technologies

Good for Enterprise apps too!

Diagram source: http://www.w3.org/2007/Talks/0130-sb-W3CTechSemWeb/

Page 46: [Webinar] Semantic Technologies

Why now?


Page 47: [Webinar] Semantic Technologies

Key Enablers

Open Data and Linked Open Data

Advances in automatic content analysis (linguistics, image processing) and machine learning

Classical logic and classical AI

Computing power (Moore’s law + MapReduce)


Page 48: [Webinar] Semantic Technologies

The technologies and data are available, let’s put them to use!

Page 49: [Webinar] Semantic Technologies

2. Nuxeo & Semantic ECM

Page 50: [Webinar] Semantic Technologies

Nuxeo


Page 51: [Webinar] Semantic Technologies

Nuxeo: an open source ECM vendor

Our Focus is Enterprise Content Management

ECM as a Platform for Content Applications

Open Source as Efficient Development Model

Modern architecture for 21st Century business

“Lean, mobile, social, interoperable”

A Social Marketplace in action

Innovation driven by community of customers, partners, and our core developers


Page 52: [Webinar] Semantic Technologies

Nuxeo ECM - From Platform to Products

• Content Infrastructure: Nuxeo Core, a lightweight, scalable, embeddable content repository

• Platform: Nuxeo Enterprise Platform, a complete set of components covering all aspects of ECM

• Horizontal Packages: Document Management, Digital Asset Management, Case Management Framework, Structured Document Server, Content Aggregator

• Business Solutions: Correspondence Management, Contracts Management, Invoice Processing, Records Management

• Construction, Media, Government, Life Sciences

Page 53: [Webinar] Semantic Technologies

Major Customers


Page 54: [Webinar] Semantic Technologies

Goals


Page 55: [Webinar] Semantic Technologies

Goals for Semantic ECM

Repurpose existing content

Improve search and collaboration

Make information contextual

Extract and use information from your content

Make your content smarter!


Page 56: [Webinar] Semantic Technologies

Semantic ECM


Page 57: [Webinar] Semantic Technologies

Semantic ECM

• Content: Text, Image, Sound, Video

Page 58: [Webinar] Semantic Technologies

Semantic ECM

• Content: Text, Image, Sound, Video

• Meaning: Metadata, Relations, Entities, Tags, Reasoning

Page 59: [Webinar] Semantic Technologies


Page 60: [Webinar] Semantic Technologies

Content Stack vs. Knowledge Cake

Architectural Challenge


Page 61: [Webinar] Semantic Technologies

Business value from semantic ECM

Efficiency gains: 20% to 90% (ex: in search, collaboration)

Effectiveness gains: better returns from your assets (ex: news and images from AFP)

Strategic edge: growth, value capture, new services, gain unfair strategic advantage (ex: vertical ontologies for CEVAs / CCAs)


Page 62: [Webinar] Semantic Technologies

Demo

Page 63: [Webinar] Semantic Technologies

How does it work?


Page 64: [Webinar] Semantic Technologies


IKS project

• European project under FP7, with 13 partners (6 SMEs) and an 8.5 MEUR budget

• Goal: create a semantic software “stack” that will be used by CMS vendors to add semantic features to their products

• Started in Jan. 2009, will last until Dec. 2012

• First tangible result: Apache Stanbol, already integrated in a Nuxeo plugin


Page 65: [Webinar] Semantic Technologies


Page 66: [Webinar] Semantic Technologies

Stanbol: a semantic engine

• From unstructured content to Knowledge

• Language guessing

• Topic classification (Business, Sports, Media, ...)

• Named Entities extraction and linking

• Relationships and properties extraction

• Pluggable with proprietary engines (ex: Temis)
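The "named entities extraction and linking" step can be pictured with the toy sketch below: it looks up known surface forms in a small local index and links each hit to a URI. This dictionary lookup is only a stand-in chosen for illustration and is not how Stanbol's engines are implemented; the `ENTITY_INDEX` contents, the URIs and the `extract_entities` helper are all hypothetical.

```python
import re

# Hypothetical local index: surface form -> (entity type, linked URI).
# The URIs follow the DBpedia naming pattern but are illustrative only.
ENTITY_INDEX = {
    "paris": ("Place", "http://dbpedia.org/resource/Paris"),
    "nuxeo": ("Organisation", "http://dbpedia.org/resource/Nuxeo"),
    "tim berners-lee": ("Person", "http://dbpedia.org/resource/Tim_Berners-Lee"),
}


def extract_entities(text):
    """Return the known entities mentioned in `text`, in order of appearance."""
    found = []
    lowered = text.lower()
    for surface, (entity_type, uri) in ENTITY_INDEX.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for m in re.finditer(pattern, lowered):
            found.append({
                "text": text[m.start():m.end()],  # mention as written in the document
                "type": entity_type,
                "uri": uri,                        # the "linking" part
                "start": m.start(),
            })
    return sorted(found, key=lambda e: e["start"])


doc = "Nuxeo will be at the IKS workshop in Paris."
for entity in extract_entities(doc):
    print(entity["text"], entity["type"], entity["uri"])
```

A real engine would also have to disambiguate mentions (for example, Paris the city versus a person named Paris), which is where statistical models and human checks come in.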


Page 67: [Webinar] Semantic Technologies


Page 68: [Webinar] Semantic Technologies


Page 69: [Webinar] Semantic Technologies


RESTful is Beautiful
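As a hedged sketch of what calling such a RESTful engine could look like from Python, the snippet below builds (but does not send) an HTTP POST request carrying plain text and asking for RDF back. The URL, the `/engines` path and the `build_enhancement_request` helper are assumptions for illustration; check the Stanbol documentation of your version for the actual endpoint and supported media types.

```python
import urllib.request

# Assumption: a Stanbol instance running locally and exposing its enhancement
# engines at this path. Adjust the URL to your own deployment.
STANBOL_ENGINES_URL = "http://localhost:8080/engines"


def build_enhancement_request(text, url=STANBOL_ENGINES_URL):
    """Prepare a POST request that submits plain text for semantic analysis."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",  # what we send
            "Accept": "application/rdf+xml",              # what we would like back
        },
        method="POST",
    )


request = build_enhancement_request("Nuxeo is based in Paris.")
print(request.get_method(), request.full_url)

# Sending the request requires a running server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     rdf = response.read().decode("utf-8")
```

Since the exchange is plain HTTP, any client able to issue a POST request can be used in the same way.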


Page 70: [Webinar] Semantic Technologies

Stanbol = Semantic Engines (Apache OpenNLP) + Fast Linked Data local index (Apache Solr) + Semantic Rule Engine (Apache Jena)

Page 71: [Webinar] Semantic Technologies

Local IT infrastructure (LAN)

[Diagram] Nuxeo DM addon → (1) → Apache Stanbol → (2) → Engine 1, Engine 2, Engine 3 → (3) → DBpedia, Freebase, Geonames, LDAP

Page 72: [Webinar] Semantic Technologies

How to try it?


Page 73: [Webinar] Semantic Technologies

https://connect.nuxeo.com/nuxeo/site/marketplace/category/semantic

Page 74: [Webinar] Semantic Technologies

Notes

• Nuxeo EP 5.4.2 (next week) will have significant improvements to enable new features of the semantic plugins

• Source code here: http://hg.nuxeo.org/addons/nuxeo-platform-semantic-entities/

• Join us at the IKS Paris Workshop on July 5-6 to learn much more about Nuxeo and semantic technologies!


Page 75: [Webinar] Semantic Technologies

Resources

• http://iks-project.eu

• http://stanbol.demo.nuxeo.com

• http://incubator.apache.org/stanbol

• http://blogs.nuxeo.com/dev

• http://hadoop.apache.org/

• http://incubator.apache.org/opennlp/


Page 76: [Webinar] Semantic Technologies

Questions?

Page 77: [Webinar] Semantic Technologies

Up Next!

• Live Demo - Nuxeo Studio: June 1, 2011

• Building Packages for the Nuxeo Marketplace: June 8, 2011