Drupal and Apache Stanbol

Post on 11-May-2015

SEMANTIC ANNOTATION WITH CUSTOM VOCABULARIES

About me

Gabriel Dragomir

• Drupal developer, trainer and consultant

• Founding member of Drupal Romania Association

The Semantic Web

• Tim Berners-Lee:

“The first step is putting data on the Web in a form that machines can naturally understand, or converting it to that form. This creates what I call a Semantic Web – a Web of data that can be processed directly or indirectly by machines.”

What’s the hype?

• Most organizations need to organize, analyze and relate huge amounts of unstructured textual data scattered across many sources

• Examples:

• keyword extraction from content: annotate abstracts

• text categorization: organize big volumes of text based on a thesaurus

• media monitoring of tags: occurrences of a specific keyword on social media channels

Linked data

http://lod-cloud.net/

Linked data

• Project started in 2007

• Aimed at building the Web of Data by:

• identifying open access data sets

• converting them into RDF vocabularies

• publishing them as open access data sets

Linked data ecosystem

• Linked Open Vocabularies (LOV): http://lov.okfn.org/dataset/lov/

• Provides a conceptual map of the vocabularies

• Various providers: libraries, governmental actors, NGOs

Linked data at work!

Semantic annotation

• Creates specific metadata that enable new ways to retrieve and aggregate information

• Annotations are made according to a conceptual scheme, an ontology (e.g. FOAF, Dublin Core)

• For more on ontologies see: http://www.w3.org/wiki/Good_Ontologies

• The annotations build semantic
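As a rough illustration of the idea, an annotation can be pictured as a set of subject-predicate-object statements whose predicates come from an ontology. The sketch below uses the FOAF and Dublin Core Terms namespaces; the document and person URIs under example.org, and the helper `objects_of`, are made up for this example.

```python
# Namespaces of two widely used ontologies.
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical URIs, invented for this example.
DOC = "http://example.org/node/1"
AUTHOR = "http://example.org/user/7"

# Each annotation is a (subject, predicate, object) triple.
annotations = [
    (DOC, DCTERMS + "title", "Drupal and Apache Stanbol"),
    (DOC, DCTERMS + "creator", AUTHOR),
    (AUTHOR, FOAF + "name", "Gabriel Dragomir"),
]


def objects_of(triples, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Follow the link from the document to its creator, then to the creator's name.
creator = objects_of(annotations, DOC, DCTERMS + "creator")[0]
creator_names = objects_of(annotations, creator, FOAF + "name")
print(creator_names)
```

Because the creator is a URI rather than a plain string, a machine can follow it to further statements about the same resource.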

Semantic annotation

• Most common uses:

• Named Entity Linking: limited to recognizing entities of type person, organization or place (e.g. OpenCalais)

• Entityhub Linking: annotation based on vocabularies with no limitations of entity types. Requires more natural language processing prior to annotation.

Apache Stanbol on the fly

• Here comes Apache Stanbol

• A new approach:

• modular semantic analysis of documents

• processing components can be built for virtually any language

• flexible workflows via semantic annotation chains

• any vocabulary (Linked Data, custom) can be used
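The chain idea can be pictured with a toy pipeline in which each step enriches a shared result and passes it on. This is only a conceptual sketch in plain Python, not Stanbol code: the engine functions, their placeholder rules, and the example.org URI are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# A hypothetical "engine" takes the shared state and returns an updated copy.
Engine = Callable[[Dict], Dict]


def detect_language(state: Dict) -> Dict:
    # Placeholder rule; a real engine would use a statistical language model.
    lang = "en" if " the " in f" {state['text'].lower()} " else "unknown"
    return {**state, "language": lang}


def tokenize(state: Dict) -> Dict:
    # Split on whitespace and strip surrounding punctuation.
    tokens = [t.strip(".,;:!?") for t in state["text"].split()]
    return {**state, "tokens": [t for t in tokens if t]}


def link_entities(state: Dict) -> Dict:
    # Tiny stand-in vocabulary; the label maps to a made-up URI.
    vocabulary = {"drupal": "http://example.org/vocab/Drupal"}
    found = [vocabulary[t.lower()] for t in state["tokens"] if t.lower() in vocabulary]
    return {**state, "entities": found}


def run_chain(text: str, chain: List[Engine]) -> Dict:
    """Run the engines in order, each one seeing the previous results."""
    state: Dict = {"text": text}
    for engine in chain:
        state = engine(state)
    return state


result = run_chain(
    "Stanbol annotates the Drupal content.",
    [detect_language, tokenize, link_entities],
)
print(result["language"], result["entities"])
```

Swapping, removing or reordering entries in the list changes the workflow without touching the individual engines, which is the flexibility the chain approach is after.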

Service oriented architecture

• Stanbol is designed to offer service oriented integration

• RESTful web services API returning RDF or JSON/JSON-LD

• Each component exposes an endpoint independently

• OSGi (Open Services Gateway initiative) compliant via Apache Felix and Apache Sling

• Remote component management

Implementation

• OSGi layer: Apache Felix and Apache Sling

• Build environment: Apache Maven

• RDF framework: Apache Clerezza

• Triple store, reasoning engine: Apache Jena

• Indexing and semantic search: Apache Solr

• Content analysis/metadata extraction: Apache Tika

• Natural language processing: Apache OpenNLP

Architecture

Components

• Semantic layer:

• Enhancer, EntityHub, ContentHub

• Enhancement engines: internal, 3rd party

• User interfaces

• Knowledge integration (rule sets, reasoners)

• Storage integration

Content enhancement

• Examples:

• retrieve additional metadata for a piece of content

• identify the language of a text

• extract entities (persons, places, organizations)

• create annotations to external sources

• use 3rd party services for named entity recognition

Drupal meets Stanbol

• Several modules implement RDF support, allowing data to be sent to Stanbol for semantic annotation

• Taxonomy system allows for complex annotation

• Fieldable taxonomy terms allow for storage of complex semantic data

User scenarios

• Semantic indexing via Stanbol (SOLR yard)

• Content enrichment with semantically related information (documents, factual data, images etc.)

• Tag as you type: dynamic annotation of text in editors

How it works

• POST request sends content via REST API

• content is processed by an enhancement chain

• Returns JSON-LD, RDF/XML, RDF/JSON, etc.

• JSON-LD (JavaScript Object Notation for Linked Data): a human-readable and simple linked data transport format

• for best results an enhancement chain should do language detection, tokenization and POS tagging prior to performing semantic annotation

• http://stanbol-yle.jelastic.planeetta.net/demo/enhancer
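The request described above might look roughly like the following sketch, written with Python's standard library. It assumes a Stanbol instance at `http://localhost:8080` with the enhancer under `/enhancer` and named chains under `/enhancer/chain/<name>`; the host, port, chain name and the helper `build_enhancer_request` are assumptions to adapt to the actual installation. The request is only built, not sent, so the sketch runs without a server.

```python
import urllib.request
from typing import Optional


def build_enhancer_request(text: str,
                           base_url: str = "http://localhost:8080",
                           chain: Optional[str] = None,
                           accept: str = "application/rdf+xml") -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying plain text to enhance."""
    # Assumed paths: default chain at /enhancer, named chains below /enhancer/chain/.
    path = "/enhancer" if chain is None else f"/enhancer/chain/{chain}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            # The Accept header selects the serialization of the returned annotations.
            "Accept": accept,
        },
        method="POST",
    )


request = build_enhancer_request("Paris is the capital of France.")
print(request.get_method(), request.full_url)

# To send it against a running instance:
# with urllib.request.urlopen(request) as response:
#     body = response.read().decode("utf-8")
```

Changing the `accept` argument is how a client would ask for a different serialization of the same annotations.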

Drupal integration

Source: blog.iks-project.eu

Drupal distribution: IKS CE

• IKS CE distribution - Wolfgang Ziegler (fago), Stéphane Corlosquet (scor)

• Components:

• Search API Stanbol

• VIE.js - semantic annotation UI

• https://drupal.org/project/iksce

• http://drupal.org/project/vie

• http://drupal.org/project/search_api_stanbol

• https://github.com/fago/stanbol-for-drupal

Search API Stanbol

• enables the indexing of Drupal entities such as nodes, users, taxonomy terms, files, etc. in Stanbol EntityHub.

• data sent as RDF

• data can be mashed up with data from other sources (Managed Sites, Remote Sites)
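To give a feel for what "data sent as RDF" can mean, here is a minimal sketch that serializes a simplified node as N-Triples. It is not how the Search API Stanbol module works internally; the node dictionary shape, the example.org base URI and the choice of Dublin Core Terms properties are assumptions made for illustration.

```python
def ntriples_literal(value: str) -> str:
    """Escape a string as an N-Triples plain literal."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'


def node_to_ntriples(node: dict, base_uri: str = "http://example.org/node/") -> str:
    """Serialize a simplified node dict as N-Triples, one statement per line."""
    subject = f"<{base_uri}{node['nid']}>"
    dcterms = "http://purl.org/dc/terms/"
    lines = [f"{subject} <{dcterms}title> {ntriples_literal(node['title'])} ."]
    # Each tag is assumed to be a URI already, e.g. of a taxonomy term.
    for tag_uri in node.get("tags", []):
        lines.append(f"{subject} <{dcterms}subject> <{tag_uri}> .")
    return "\n".join(lines)


rdf = node_to_ntriples({
    "nid": 1,
    "title": 'The "Semantic Web"',
    "tags": ["http://example.org/vocab/Drupal"],
})
print(rdf)
```

Once entities are expressed as statements like these, they can sit in the same store as statements coming from other sources.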

VIE.js

• “Vienna IKS Editables”

• JavaScript library for implementing decoupled Content Management Systems and semantic interaction in web applications.

Monolithic vs Decoupled Content Management Systems

source: Henri Bergius - http://bergie.iki.fi

Demo setup

• we store Drupal entities in a SOLR index

• annotations are to be made based on:

• DBpedia - bundled with Apache Stanbol

• a custom vocabulary of terms related to semantic web - Social Semantic Web Thesaurus

• SemWeb is imported as a SOLR index into Apache Stanbol
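Annotating against more than one vocabulary can be pictured with a toy label matcher. The sketch below is plain Python, not Stanbol's linking engine: the two dictionaries stand in for the DBpedia and SemWeb indexes with a single entry each, and the example.org URI is hypothetical.

```python
import re

# Tiny stand-ins for the two indexes; the real ones hold far more entries.
DBPEDIA_LABELS = {"Drupal": "http://dbpedia.org/resource/Drupal"}
SEMWEB_LABELS = {"linked data": "http://example.org/semweb/LinkedData"}  # hypothetical URI


def annotate(text, *vocabularies):
    """Find vocabulary labels in text; return sorted (start, end, match, uri) tuples."""
    matches = []
    for vocabulary in vocabularies:
        for label, uri in vocabulary.items():
            # Whole-word, case-insensitive match; multi-word labels are supported.
            pattern = r"\b" + re.escape(label) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                matches.append((m.start(), m.end(), m.group(0), uri))
    return sorted(matches)


found = annotate("Drupal can publish Linked Data.", DBPEDIA_LABELS, SEMWEB_LABELS)
for start, end, surface, uri in found:
    print(start, end, surface, uri)
```

A real setup would add language-aware processing before matching, but the principle of resolving text spans to URIs from several vocabularies is the same.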

Custom vocabularies

• PoolParty Semantic Web

• 224 concepts related to semantic web

• Author: Andreas Blumauer

• http://vocabulary.semantic-web.at/PoolPartySemanticWeb.html

• http://vocabulary.semantic-web.at/PoolPartySemanticWeb/Drupal.html

Demo

• index Drupal entities in Apache Stanbol

• retrieve annotated entities via REST API

• annotate entities using dbpedia and semweb indexes

• edit Drupal entities and annotate on the fly

• retrieve linked data tag recommendations
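Turning annotations into tag recommendations can be as simple as filtering by confidence and ranking. The sketch below works on a hand-written, simplified sample rather than real enhancer output; the key names (`entity-label`, `entity-reference`, `confidence`) are assumptions loosely modelled on the enhancement vocabulary, and the threshold is arbitrary.

```python
# Hand-written, simplified sample; real enhancer output is richer RDF.
sample_annotations = [
    {"entity-label": "Drupal",
     "entity-reference": "http://dbpedia.org/resource/Drupal",
     "confidence": 0.92},
    {"entity-label": "Paris",
     "entity-reference": "http://dbpedia.org/resource/Paris",
     "confidence": 0.35},
    {"entity-label": "Linked data",
     "entity-reference": "http://dbpedia.org/resource/Linked_data",
     "confidence": 0.81},
]


def recommend_tags(annotations, min_confidence=0.5, limit=5):
    """Return (label, uri) pairs at or above the threshold, best first."""
    kept = [a for a in annotations if a["confidence"] >= min_confidence]
    kept.sort(key=lambda a: a["confidence"], reverse=True)
    return [(a["entity-label"], a["entity-reference"]) for a in kept[:limit]]


tags = recommend_tags(sample_annotations)
for label, uri in tags:
    print(label, uri)
```

An editor integration could show such a ranked list to the author and store the accepted URIs alongside the content.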

Questions?

Contact me

• gabriel.dragomir@webikon.com

• twitter: gabidrg

Thank you!