Drupal and Apache Stanbol

SEMANTIC ANNOTATION WITH CUSTOM VOCABULARIES

Gabriel Dragomir


Page 1: Drupal and Apache Stanbol

Drupal and Apache Stanbol

SEMANTIC ANNOTATION WITH CUSTOM VOCABULARIES

Gabriel Dragomir

Page 2: Drupal and Apache Stanbol

About me

• Drupal developer, trainer and consultant

• Founding member of Drupal Romania Association

Page 3: Drupal and Apache Stanbol

The Semantic Web

• Tim Berners-Lee:

“The first step is putting data on the Web in a form that machines can naturally understand, or converting it to that form. This creates what I call a Semantic Web – a Web of data that can be processed directly or indirectly by machines.”

Page 4: Drupal and Apache Stanbol

What’s the hype?

• Most organizations need to organize, analyze and relate huge amounts of unstructured textual data scattered across many sources

• Examples:

• keyword extraction from content: annotate abstracts

• text categorization: organize big volumes of text based on a thesaurus

• media monitoring of tags: occurrences of a specific keyword on social media channels

Page 5: Drupal and Apache Stanbol

Linked data

http://lod-cloud.net/

Page 6: Drupal and Apache Stanbol

Linked data

• Project started in 2007

• Aimed at building the Web of Data by:

• identifying open access data sets

• converting them into RDF

• publishing them as open access data sets

Page 7: Drupal and Apache Stanbol

Linked data ecosystem

• Linked Open Vocabularies (LOV): http://lov.okfn.org/dataset/lov/

• Provides a conceptual map of the vocabularies

• Various providers: libraries, governmental actors, NGOs

Page 9: Drupal and Apache Stanbol

Linked data at work!

Page 10: Drupal and Apache Stanbol

Semantic annotation

• Creates specific metadata that enable new ways to retrieve and aggregate information

• Annotations are made against a conceptual scheme, an ontology (e.g. FOAF, Dublin Core)

• For more on ontologies see: http://www.w3.org/wiki/Good_Ontologies

• The annotations build semantic

Page 11: Drupal and Apache Stanbol

Semantic annotation

• Most common uses:

• Named Entity Linking: limited to recognizing entities of type person, organization or place (e.g. OpenCalais)

• Entityhub Linking: annotation based on vocabularies, with no restriction on entity types. Requires more natural language processing prior to annotation.
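
To make vocabulary-based linking concrete, here is a minimal sketch in Python. It is a toy label matcher, not the algorithm Stanbol uses: the vocabulary, its example.org URIs and the function name link_entities are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical custom vocabulary: label -> concept URI.
# The URIs are placeholders, not real vocabulary identifiers.
VOCABULARY = {
    "linked data": "http://example.org/vocab/LinkedData",
    "semantic web": "http://example.org/vocab/SemanticWeb",
    "drupal": "http://example.org/vocab/Drupal",
}


def link_entities(text, vocabulary):
    """Return (start, end, surface form, uri) for each vocabulary label found in text."""
    matches = []
    for label, uri in vocabulary.items():
        # Whole-word, case-insensitive match of the label.
        pattern = r"\b" + re.escape(label) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            matches.append((m.start(), m.end(), m.group(0), uri))
    # Sort by position in the text.
    return sorted(matches)


if __name__ == "__main__":
    sample = "Drupal can publish Linked Data for the Semantic Web."
    for start, end, surface, uri in link_entities(sample, VOCABULARY):
        print(start, end, surface, uri)
```

Because the matcher only looks up labels, any concept in the vocabulary can be annotated regardless of its type. A real engine would add language-aware tokenization and ranking on top of this idea.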

Page 12: Drupal and Apache Stanbol

Apache Stanbol on the fly

• Here comes Apache Stanbol

• A new approach:

• modular semantic analysis of documents

• processing components can be built for virtually any language

• flexible workflows via semantic annotation chains

• any vocabulary (Linked Data, custom) can be used
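
The idea of an annotation chain can be sketched as an ordered list of processing steps that share one content item. The Python below is a conceptual toy under that assumption: the engine functions, the dictionary layout and run_chain are hypothetical stand-ins and do not mirror Stanbol's actual components or API.

```python
# Toy model of an enhancement chain: each engine reads the shared content
# item and adds its own results. All engines are hypothetical stand-ins.

def detect_language(item):
    # Naive placeholder: a real engine would use a statistical language model.
    item["language"] = "en"
    return item


def tokenize(item):
    # Whitespace tokenization, enough for the illustration.
    item["tokens"] = item["text"].split()
    return item


def annotate(item):
    # Depends on the tokens produced by the previous engine.
    vocabulary = item.get("vocabulary", {})
    item["annotations"] = [
        (token, vocabulary[token.lower()])
        for token in item["tokens"]
        if token.lower() in vocabulary
    ]
    return item


def run_chain(text, engines, vocabulary):
    """Run the engines in order over one shared content item."""
    item = {"text": text, "vocabulary": vocabulary}
    for engine in engines:
        item = engine(item)
    return item


if __name__ == "__main__":
    chain = [detect_language, tokenize, annotate]
    vocab = {"stanbol": "http://example.org/vocab/Stanbol"}  # placeholder URI
    print(run_chain("Drupal meets Stanbol", chain, vocab)["annotations"])
```

Swapping, adding or reordering entries in the list changes the workflow without touching the individual engines, which is the flexibility the chain model is meant to provide.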

Page 13: Drupal and Apache Stanbol

Service oriented architecture

• Stanbol is designed to offer service oriented integration

• RESTful web services API returning RDF or JSON/JSON-LD

• Each component exposes an endpoint independently

• OSGi (Open Services Gateway initiative) compliant via Apache Felix and Apache Sling

• Remote component management

Page 14: Drupal and Apache Stanbol

Implementation

• OSGi layer: Apache Felix and Apache Sling

• Build environment: Apache Maven

• RDF framework: Apache Clerezza

• Triple store, reasoning engine: Apache Jena

• Indexing and semantic search: Apache Solr

• Content analysis/metadata extraction: Apache Tika

• Natural language processing: Apache OpenNLP

Page 15: Drupal and Apache Stanbol

Architecture

Page 16: Drupal and Apache Stanbol

Components

• Semantic layer:

• Enhancer, EntityHub, ContentHub

• Enhancement engines: internal, 3rd party

• User interfaces

• Knowledge integration (rule sets, reasoners)

• Storage integration

Page 17: Drupal and Apache Stanbol

Content enhancement

• Examples:

• retrieve additional metadata for a piece of content

• identify the language of a text

• extract entities (persons, places, organizations)

• create annotations to external sources

• use 3rd party services for named entity recognition

Page 18: Drupal and Apache Stanbol

Drupal meets Stanbol

• Several modules implement RDF support, allowing data to be sent to Stanbol for semantic annotation

• Taxonomy system allows for complex annotation

• Fieldable taxonomy terms allow for storage of complex semantic data

Page 19: Drupal and Apache Stanbol

User scenarios

• Semantic indexing via Stanbol (SOLR yard)

• Content enrichment with semantically related information (documents, factual data, images etc.)

• Tag as you type: dynamic annotation of text in editors

Page 20: Drupal and Apache Stanbol

How it works

• POST request sends content via REST API

• content is processed by an enhancement chain

• Returns JSON-LD, RDF/XML, RDF/JSON etc.

• JSON-LD (JavaScript Object Notation for Linked Data): a simple, human-readable linked data transport format

• for best results, an enhancement chain should do language detection, tokenization and POS tagging before performing semantic annotation

• http://stanbol-yle.jelastic.planeetta.net/demo/enhancer
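
The request described above could look roughly like the following Python sketch. It only builds the request and does not send it. The base URL (a local instance on port 8080), the chain name "semweb" and the Accept value are assumptions; check which paths and formats your Stanbol installation actually exposes.

```python
import urllib.request

# Assumption: a Stanbol instance running locally on port 8080.
STANBOL_BASE = "http://localhost:8080"


def build_enhancer_request(text, chain=None, accept="application/json"):
    """Build (but do not send) a POST request for the Stanbol Enhancer."""
    url = STANBOL_BASE + "/enhancer"
    if chain:
        # Address a specific enhancement chain by name.
        url += "/chain/" + chain
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": accept,  # requested serialization of the results
        },
        method="POST",
    )


if __name__ == "__main__":
    # "semweb" is a hypothetical chain name used only for illustration.
    request = build_enhancer_request(
        "Drupal is a content management system.", chain="semweb"
    )
    print(request.get_method(), request.full_url)
    # To send it against a running server:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```

Changing the Accept header is how a client would ask for a different serialization, such as RDF/XML instead of JSON-LD.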

Page 21: Drupal and Apache Stanbol

Drupal integration

Source: blog.iks-project.eu

Page 22: Drupal and Apache Stanbol

Drupal distribution: IKS CE

• IKS CE distribution - Wolfgang Ziegler (fago), Stéphane Corlosquet (scor)

• Components:

• Search API Stanbol

• VIE.js - semantic annotation UI

• https://drupal.org/project/iksce

• http://drupal.org/project/vie

• http://drupal.org/project/search_api_stanbol

• https://github.com/fago/stanbol-for-drupal

Page 23: Drupal and Apache Stanbol

Search API Stanbol

• enables the indexing of Drupal entities such as nodes, users, taxonomy terms, files, etc. in Stanbol EntityHub.

• data sent as RDF

• data can be mashed up with data from other sources (Managed Sites, Remote Sites)
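
As a rough illustration of what "data sent as RDF" means for a Drupal entity, the Python sketch below serializes a node as N-Triples. It is not the module's implementation: the dictionary layout, the example.org URIs and the choice of Dublin Core properties are assumptions made for the example.

```python
def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def node_to_ntriples(node, base_uri):
    """Serialize a dict describing a Drupal node as N-Triples lines."""
    subject = "<%s/node/%d>" % (base_uri, node["nid"])
    triples = [
        (
            subject,
            "<http://purl.org/dc/terms/title>",
            '"%s"' % escape_literal(node["title"]),
        )
    ]
    # Each tag is assumed to be the URI of a vocabulary concept.
    for tag_uri in node.get("tags", []):
        triples.append(
            (subject, "<http://purl.org/dc/terms/subject>", "<%s>" % tag_uri)
        )
    return "\n".join("%s %s %s ." % t for t in triples)


if __name__ == "__main__":
    # Hypothetical node; nid, title and tag URI are made up for the example.
    node = {
        "nid": 42,
        "title": "Drupal and Apache Stanbol",
        "tags": ["http://example.org/vocab/Drupal"],
    }
    print(node_to_ntriples(node, "http://example.org"))
```

Once entities are expressed as triples with stable URIs, they can be combined with statements about the same URIs from other sources.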

Page 24: Drupal and Apache Stanbol

VIE.js

• “Vienna IKS Editables”

• JavaScript library for implementing decoupled Content Management Systems and semantic interaction in web applications.

Page 25: Drupal and Apache Stanbol

Monolithic vs Decoupled Content Management Systems

source: Henri Bergius - http://bergie.iki.fi

Page 26: Drupal and Apache Stanbol

Demo setup

• we store Drupal entities in a SOLR index

• annotations are to be made based on:

• DBpedia - bundled with Apache Stanbol

• a custom vocabulary of terms related to semantic web - Social Semantic Web Thesaurus

• SemWeb is imported as a SOLR index into Apache Stanbol

Page 27: Drupal and Apache Stanbol

Custom vocabularies

• PoolParty Semantic Web

• 224 concepts related to semantic web

• Author: Andreas Blumauer

• http://vocabulary.semantic-web.at/PoolPartySemanticWeb.html

• http://vocabulary.semantic-web.at/PoolPartySemanticWeb/Drupal.html

Page 28: Drupal and Apache Stanbol

Demo

• index Drupal entities in Apache Stanbol

• retrieve annotated entities via REST API

• annotate entities using dbpedia and semweb indexes

• edit Drupal entities and annotate on the fly

• retrieve linked data tag recommendations
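
Turning enhancement results into tag recommendations could be sketched as below. The sample response is hand-written for illustration and is not real Stanbol output: it assumes an expanded JSON-LD shape with a top-level "@graph" and full property IRIs from the fise enhancement namespace, and the confidence value is invented. Actual responses may be structured differently depending on the Stanbol version.

```python
import json

# Namespace used by Stanbol's enhancement structure.
FISE = "http://fise.iks-project.eu/ontology/"

# Hand-written sample in an assumed expanded JSON-LD shape (not real output).
SAMPLE_RESPONSE = json.dumps({
    "@graph": [
        {
            "@id": "urn:enhancement-1",
            "@type": [FISE + "EntityAnnotation"],
            FISE + "entity-reference": [{"@id": "http://dbpedia.org/resource/Drupal"}],
            FISE + "entity-label": [{"@value": "Drupal"}],
            FISE + "confidence": [{"@value": 0.92}],
        },
        {"@id": "urn:enhancement-2", "@type": [FISE + "TextAnnotation"]},
    ]
})


def tag_recommendations(response_text, min_confidence=0.5):
    """Extract suggested entities, highest confidence first."""
    graph = json.loads(response_text).get("@graph", [])
    tags = []
    for node in graph:
        # Only entity annotations carry a suggested entity.
        if FISE + "EntityAnnotation" not in node.get("@type", []):
            continue
        confidence = float(node[FISE + "confidence"][0]["@value"])
        if confidence < min_confidence:
            continue
        tags.append({
            "uri": node[FISE + "entity-reference"][0]["@id"],
            "label": node[FISE + "entity-label"][0]["@value"],
            "confidence": confidence,
        })
    return sorted(tags, key=lambda t: t["confidence"], reverse=True)


if __name__ == "__main__":
    for tag in tag_recommendations(SAMPLE_RESPONSE):
        print(tag["label"], tag["uri"], tag["confidence"])
```

The confidence threshold is a design choice: a lower value yields more suggestions for an editor to review, a higher one yields fewer but more reliable tags.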

Page 29: Drupal and Apache Stanbol

Questions?

Page 30: Drupal and Apache Stanbol

Contact me

[email protected]

• twitter: gabidrg

Page 31: Drupal and Apache Stanbol

Thank you!