Knowledge Streams: Stream Processing of Semantic Web Content

Mike Dean, Principal Engineer, Raytheon BBN Technologies, [email protected]


Page 1: Knowledge Streams:  Stream Processing of Semantic Web Content

Knowledge Streams: Stream Processing of Semantic Web Content

Mike Dean
Principal Engineer
Raytheon BBN Technologies
[email protected]

Page 2: Knowledge Streams:  Stream Processing of Semantic Web Content

Assumptions

• Technology – Intermediate
  – Familiarity with RDF and OWL
• Interest in
  – Stream processing
  – Scalability

Page 3: Knowledge Streams:  Stream Processing of Semantic Web Content

Presenter Background

• Principal Engineer at Raytheon BBN Technologies (1984-present)
• Principal Investigator for DARPA Agent Markup Language (DAML) Integration and Transition (2000-2005)
  – Chaired the Joint US/EU Committee that developed DAML+OIL and SWRL
• Developer and/or Principal Investigator for many Semantic Web tools, datasets, and applications (2000-present)
• Member of the W3C RDF Core, Web Ontology, and Rule Interchange Format Working Groups
  – Co-editor of the W3C OWL Reference
• Local co-chair for ISWC2009
• Other SemTech presentations
  – Semantic Query: Solving the Needs of a Net-Centric Data Sharing Environment (2007, w/ Matt Fisher)
  – Semantic Queries and Mediation in a RESTful Architecture (2008, w/ John Gilman and Matt Fisher)
  – Use of SWRL for Ontology Translation (2008)
  – Semantic Web @ BBN: Application to the Digital Whitewater Challenge (2009, w/ John Hebeler)
  – How is the Semantic Web Being Used? An Analysis of the Billion Triples Challenge Corpus (2009)
  – Finding a Good Ontology: The Open Ontology Repository Initiative (2010, w/ Peter Yim and Todd Schneider)

Page 4: Knowledge Streams:  Stream Processing of Semantic Web Content

Outline

• Motivation
• Vision
• Building Blocks
• Demonstration

Page 5: Knowledge Streams:  Stream Processing of Semantic Web Content

Motivations

• Timeliness
• Performance

Page 6: Knowledge Streams:  Stream Processing of Semantic Web Content

Timeliness

• Streaming minimizes latency
  – Processing elements see events as they occur
  – Resources are expended only when an event occurs
• This is in contrast to polling
  – Latency averages half the polling interval
  – Resources are expended on every poll
  – Popular web syndication mechanisms such as RSS and Atom involve polling
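The statement that polling latency averages half the polling interval can be checked with a short simulation. The sketch below assumes events occur at uniformly random times relative to a fixed poll schedule; the function name and its parameters are illustrative and do not come from the presentation.

```python
import math
import random


def mean_polling_latency(interval, n_events=100_000, seed=0):
    """Estimate the average delay between an event and the next poll.

    Assumes polls happen at t = interval, 2*interval, ... and that events
    occur at uniformly random times (an assumption for this illustration).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_events):
        t = rng.uniform(0.0, 1000.0 * interval)
        # A polling consumer first sees the event at the next poll boundary.
        next_poll = math.ceil(t / interval) * interval
        total += next_poll - t
    return total / n_events


if __name__ == "__main__":
    # With a 60-second polling interval the estimate should be close to 30 s.
    print(round(mean_polling_latency(60.0), 1))
```

A streaming consumer, by contrast, would see each event at roughly its time of occurrence plus delivery delay, independent of any interval.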

Page 7: Knowledge Streams:  Stream Processing of Semantic Web Content

Performance

• Many Semantic Web tools provide streaming parsers rather than, or in addition to, model access
  – Analogous to XML SAX vs. DOM
• For suitable applications, this can be 10x faster than loading all statements into memory or a KB
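The SAX-versus-DOM distinction can be illustrated with a small example that counts predicate usage in two ways: statement by statement, and after materializing every statement. This is a minimal sketch using only the Python standard library. The regular expression handles only simple lines with IRI subjects and predicates, so it is not a complete N-Triples parser, and it is not the API of any particular Semantic Web toolkit.

```python
import io
import re
from collections import Counter

# Simplified pattern: <subject> <predicate> object .
# Assumption: IRI subjects and predicates only; not the full N-Triples grammar.
TRIPLE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')


def stream_statements(lines):
    """Yield (subject, predicate, object) one statement at a time."""
    for line in lines:
        match = TRIPLE.match(line)
        if match:
            yield match.groups()


def count_predicates_streaming(lines):
    """Event-style processing: memory use is bounded by the counters."""
    counts = Counter()
    for _s, p, _o in stream_statements(lines):
        counts[p] += 1
    return counts


def count_predicates_in_memory(lines):
    """Model-style processing: materialize every statement first."""
    statements = list(stream_statements(lines))
    return Counter(p for _s, p, _o in statements)


SAMPLE = """\
<http://example.org/a> <http://xmlns.com/foaf/0.1/name> "Alice" .
<http://example.org/a> <http://xmlns.com/foaf/0.1/knows> <http://example.org/b> .
<http://example.org/b> <http://xmlns.com/foaf/0.1/name> "Bob" .
"""

if __name__ == "__main__":
    print(count_predicates_streaming(io.StringIO(SAMPLE)))
```

Both functions return the same counts; the streaming version never holds more than one statement at a time, which is what makes it suitable for corpora that do not fit in memory.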

Page 8: Knowledge Streams:  Stream Processing of Semantic Web Content

2 Streaming Stories

• dumpont of OpenCyc (circa 2003)
  – HTML-based ontology visualization tool periodically bogged down daml.org server
  – Reimplementation using event-based Jena ARP parser yielded 10x performance and scalability improvements
• Billion Triples Challenge 2009
  – Streaming analysis of the 2009 corpus was performed at an overall rate of 103K statements/sec on a Mac laptop with a portable external disk
  – Compare to loading 10-20K statements/second on a server

Page 9: Knowledge Streams:  Stream Processing of Semantic Web Content

Stream Processing Examples

• Unix pipes
• Dataflow architectures
• Streambase
• IBM System S/InfoSphere Streams

Page 10: Knowledge Streams:  Stream Processing of Semantic Web Content

Vision: Knowledge Streams

[Figure: diagram with three regions labeled Data Sources, Distribution And Processing Elements, and Users. Diagram labels: Sensor Network, Imagery, RSS, IM, Gazetteer, Sensor, Semantic Web, Database, NLP, CEP, aggregation, persistent queries, augmentation, context filter, alerts, correlation, translation, inference, distribution, Archive, User 1, User 2, User 3, Community of Interest 1, Community of Interest 2]

Persistent pipelines
• Streams of statements comprising object subgraphs
• URI naming allows drill-down
• Provenance, timestamps

Processing elements
• Consume and produce subgraphs
• Multiple functions may be combined

Page 11: Knowledge Streams:  Stream Processing of Semantic Web Content

Goals

• Web-scale
  – Decentralized among multiple sites
  – Heterogeneous implementations
• Long-lived, persistent connections
  – User accountability
• Introspection over the processing network for control and optimization
  – E.g. aggregating subscriptions
  – Balance with security, privacy, and autonomy concerns

Page 12: Knowledge Streams:  Stream Processing of Semantic Web Content

Building Blocks

• RDF Content
• Existing stream processing frameworks
• Workflow systems
• Publish/subscribe message oriented middleware

Page 13: Knowledge Streams:  Stream Processing of Semantic Web Content

RDF Payloads

• Malleable data
  – Standards-based graph structure
  – Can easily add, remove, and transform statements
• Self-describing
  – Unique naming via URIs
  – References to vocabularies and ontologies
• Potential for inference
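The malleability of RDF payloads can be illustrated by treating a payload as a plain set of (subject, predicate, object) statements. The sketch below uses no RDF library; the `http://example.org/` namespace and the helper function names are hypothetical and chosen only for this illustration.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.org/"  # hypothetical namespace for this illustration

# A payload is modeled as a set of (subject, predicate, object) tuples.
payload = {
    (EX + "alice", FOAF + "name", "Alice"),
    (EX + "alice", FOAF + "mbox", "mailto:alice@example.org"),
}


def add_statement(graph, triple):
    """Return a new graph with one more statement."""
    return graph | {triple}


def remove_predicate(graph, predicate):
    """Return a new graph without statements that use the given predicate."""
    return {t for t in graph if t[1] != predicate}


def rename_predicate(graph, old, new):
    """Transform statements by swapping one predicate for another."""
    return {(s, new if p == old else p, o) for s, p, o in graph}


if __name__ == "__main__":
    g = add_statement(payload, (EX + "alice", FOAF + "knows", EX + "bob"))
    g = remove_predicate(g, FOAF + "mbox")
    g = rename_predicate(g, FOAF + "name", RDFS_LABEL)
    for statement in sorted(g):
        print(statement)
```

Because every statement is self-contained and named by URIs, each step produces another valid graph without any schema migration.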

Page 14: Knowledge Streams:  Stream Processing of Semantic Web Content

Workflow Systems

• Graphical environments for developing processing pipelines
  – Yahoo Pipes, DERI Pipes, SPARQLMotion
  – Nice user interfaces for development and execution

http://pipes.deri.org

Page 15: Knowledge Streams:  Stream Processing of Semantic Web Content

Semantic Complex Event Processing

• Complex Event Processing
  – One of the leading edges of rules technology
  – Formal specification of higher-level events in terms of lower-level events
    • E.g. alert if the moving average increases 15% within a 10 minute window
  – Engine can be compiled/optimized for a specific rule set
  – High-volume deployments in finance and other industries
  – Most implementations focus on self-contained tuples
• Semantic Complex Event Processing
  – Enrich CEP using Semantic Web technology
  – Emerging topic at recent conferences
• Early implementations
  – Wrappers around open source CEP engines
  – Native implementation
• Provides a powerful set of operators and engines for Knowledge Streams
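The example rule "alert if the moving average increases 15% within a 10 minute window" can be sketched as a small event handler. The rule is open to more than one reading; the version below assumes it means that the current moving average is at least 15% above the lowest moving average observed in the preceding 10 minutes. The class name, the number of samples in the average, and that interpretation are all assumptions, and the code is hand-written rather than the rule language of any CEP engine.

```python
from collections import deque


class MovingAverageAlert:
    """Fire when the moving average rises by a threshold within a time window.

    Assumed interpretation: compare the current moving average with the
    lowest moving average seen during the preceding `window_seconds`.
    """

    def __init__(self, samples=5, window_seconds=600.0, increase=0.15):
        self.values = deque(maxlen=samples)  # readings in the moving average
        self.history = deque()               # (timestamp, moving average)
        self.window_seconds = window_seconds
        self.increase = increase

    def on_event(self, timestamp, value):
        """Process one reading; return True if the rule fires."""
        self.values.append(value)
        average = sum(self.values) / len(self.values)

        # Drop averages that have fallen out of the time window.
        while self.history and timestamp - self.history[0][0] > self.window_seconds:
            self.history.popleft()

        baseline = min((avg for _, avg in self.history), default=None)
        self.history.append((timestamp, average))
        if baseline is None or baseline <= 0:
            return False
        return average >= baseline * (1.0 + self.increase)


if __name__ == "__main__":
    rule = MovingAverageAlert(samples=2)
    for ts, reading in [(0, 100.0), (60, 100.0), (120, 140.0), (3000, 140.0)]:
        print(ts, rule.on_event(ts, reading))
```

Events are handled one at a time with bounded state, which is the property that lets CEP engines sustain high volumes. A semantic variant would carry RDF subgraphs instead of bare numeric readings.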

Page 16: Knowledge Streams:  Stream Processing of Semantic Web Content

Implementation Approach

• Well-defined APIs for implementing operators
• Operator execution containers
  – Could encapsulate existing engines
• Start with manual processing network configuration, then automate
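One way to picture this approach is an abstract operator interface plus a container that runs a manually configured chain of operators. The sketch below is hypothetical: the `Operator`, `PredicateFilter`, `Augmenter`, and `Container` names and the `process` method are invented for illustration and are not the presenter's API. Subgraphs are modeled as sets of (subject, predicate, object) tuples.

```python
from abc import ABC, abstractmethod


class Operator(ABC):
    """Hypothetical operator API: consume one subgraph, emit zero or more."""

    @abstractmethod
    def process(self, subgraph):
        """Return an iterable of output subgraphs (sets of triples)."""


class PredicateFilter(Operator):
    """Pass through only subgraphs that mention a given predicate."""

    def __init__(self, predicate):
        self.predicate = predicate

    def process(self, subgraph):
        if any(p == self.predicate for _s, p, _o in subgraph):
            yield subgraph


class Augmenter(Operator):
    """Add a fixed statement about every subject in the subgraph."""

    def __init__(self, predicate, value):
        self.predicate = predicate
        self.value = value

    def process(self, subgraph):
        subjects = {s for s, _p, _o in subgraph}
        yield set(subgraph) | {(s, self.predicate, self.value) for s in subjects}


class Container:
    """Minimal execution container for a manually configured operator chain."""

    def __init__(self, operators):
        self.operators = list(operators)

    def run(self, subgraphs):
        """Lazily push each input subgraph through every operator in order."""
        stream = iter(subgraphs)
        for operator in self.operators:
            stream = self._apply(operator, stream)
        return stream

    @staticmethod
    def _apply(operator, stream):
        for subgraph in stream:
            yield from operator.process(subgraph)


if __name__ == "__main__":
    network = Container([PredicateFilter("p1"), Augmenter("seen", "true")])
    for result in network.run([{("s1", "p1", "o1")}, {("s2", "p2", "o2")}]):
        print(sorted(result))
```

Here the chain is wired by hand, matching the "manual configuration first" step; an automated version could build the operator list from subscriptions instead.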

Page 17: Knowledge Streams:  Stream Processing of Semantic Web Content

Use Cases

• Dissemination of metadata for new satellite imagery
• Social network changes
• Alerting of friends’ new publications
• …

Page 18: Knowledge Streams:  Stream Processing of Semantic Web Content

Demo

• Processing using DERI Pipes with new operators
  – Ingest of #SemTechBiz tweets using Twitter Streaming API
  – Conversion of JSON to RDF
  – Mapping to SIOC vocabulary using SWRL rules
  – Enrich by matching Twitter @handles with contacts
  – Persistent buffering using Java Message Service
  – Monitoring
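The JSON-to-RDF and SIOC mapping steps can be pictured with a small converter that turns one tweet into N-Triples lines. Note that the demo performs the mapping with SWRL rules inside DERI Pipes; the sketch below instead maps fields directly in Python, purely for illustration. The field names (`id_str`, `text`, `created_at`, `user.screen_name`) follow Twitter's JSON format, while the URI patterns for posts and users and the choice of `dcterms:created` are assumptions.

```python
import json

SIOC = "http://rdfs.org/sioc/ns#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text):
    """Serialize a plain string as an N-Triples literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'


def tweet_to_ntriples(raw_json):
    """Convert one tweet (JSON text) into a list of N-Triples lines.

    The URI patterns below are assumptions made for this illustration.
    """
    tweet = json.loads(raw_json)
    handle = tweet["user"]["screen_name"]
    user = "<http://twitter.com/%s>" % handle
    post = "<http://twitter.com/%s/status/%s>" % (handle, tweet["id_str"])
    return [
        "%s <%s> <%sPost> ." % (post, RDF_TYPE, SIOC),
        "%s <%scontent> %s ." % (post, SIOC, literal(tweet["text"])),
        "%s <%shas_creator> %s ." % (post, SIOC, user),
        "%s <%screated> %s ." % (post, DCTERMS, literal(tweet["created_at"])),
        "%s <%s> <%sUserAccount> ." % (user, RDF_TYPE, SIOC),
    ]


SAMPLE = json.dumps({
    "id_str": "123",
    "text": "Hello #SemTechBiz",
    "created_at": "Mon Jun 06 17:00:00 +0000 2011",
    "user": {"screen_name": "example"},
})

if __name__ == "__main__":
    for line in tweet_to_ntriples(SAMPLE):
        print(line)
```

Each tweet becomes a small self-describing subgraph that downstream operators, such as the contact-matching enrichment step, could extend with further statements.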