BigDataEurope @BDVA Summit2016 1: The BDE Platform

BIG DATA EUROPE'S INTEGRATOR PLATFORM: A ONE-STOP SOLUTION FOR BIG AND SMART DATA MANAGEMENT. BDVA Summit 2016, Valencia, 1 December 2016


Page 1: BigDataEurope @BDVA Summit2016 1: The BDE Platform

BIG DATA EUROPE'S INTEGRATOR PLATFORM
A ONE-STOP SOLUTION FOR BIG AND SMART DATA MANAGEMENT

BDVA Summit 2016, Valencia, 1 December 2016

Page 2: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Talk outline

The BigDataEurope Project, Mission & BDVA Synergies
The Big Data Integrator (BDI) platform
o Stakeholder Requirements
o Architecture
o Supported Components
o Beyond the State-of-the-Art
A look into the BDI platform [DEMO]

6 Dec. 2016, www.big-data-europe.eu

Page 3: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Supporting the Societal Domains with Big Data Technology

BigDataEurope Project


Page 4: BigDataEurope @BDVA Summit2016 1: The BDE Platform

BigDataEurope Action

EC Horizon 2020 Coordination & Support Action
o ~€5 million, 2015-2017
Show societal value of Big Data
o Across all societal challenges addressed by H2020
Lower barrier for using big data technologies
o Effort to set up and deploy use-case workflows
o Lack of skills & expertise
Help establish data value chains across domains & organisations

Page 5: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Consortium

NCSR DEMOKRITOS

Page 6: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Stakeholder Engagement Cycle

Present action, showcase deployments

Raise awareness about BDE results, what they mean for stakeholders

Collect requirements to drive further development

Timeline milestones: M6, M12, M18, M24, M30

Page 7: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Data Value Chain Evolution

[Figure: data value chains plotted against TIME]
Value chain stages: Extraction, Curation, Quality, Linking, Integration, Publication, Visualization, Analysis
Domains: Health, Transport, Security, Food, Societies, Climate, Energy
Other labels: Data Repositories; Linked Open Data; Proprietary, 'locked-in' solutions; OS Solutions, Big Data Stacks

Page 8: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Parallels to BDVA Mission

Task Force 6 (Technical)
o SG1: Management
o SG2: Big Data Architectures and Infrastructures
The Big Data Integrator Platform (SG2)
o Generic Architecture (Blueprint) & Instances
Smart Big Data Management (SG1)
o Support for Semantic Components & Data Lakes

Page 9: BigDataEurope @BDVA Summit2016 1: The BDE Platform

A flexible, generic platform for (Big) Data Value Chain Deployment

1. Stakeholder Requirements

Big Data Integrator


Page 10: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Requirement Elicitation

Workshops: 7 held per year with Societal Communities
Face-to-face interviews
Feedback from each of the 7 pilots, in 3 phases
SUPPORTED BY BDVA

Page 11: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Key Results from the Survey (I)

[Charts: Importance of Volume; Importance of Velocity]

Page 12: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Key Results from the Survey (II)

[Charts: Importance of Variety; Efficiency of Data Infrastructures]

Page 13: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Societal Data Value Chain Requirements

Page 14: BigDataEurope @BDVA Summit2016 1: The BDE Platform

A flexible, generic platform for (Big) Data Value Chain Deployment

2. Architecture

Big Data Integrator


Page 15: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Big Data Integrator Architecture

Prototype developed by BDE
o Incorporates existing Big Data technology
o Facilitates integration and deployment
Main points of the architecture
o Dockerization
o Support layer, including integrated UI
o Semantification layer

Page 16: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Generic Architecture

Plug-and-play Big Data platform
Cloud-deployment ready
Domain independent, customisable
Stacks Open Source solutions

BDI Prototype Releases
1. [July 2016]
2. December 2016
3. ….

Page 17: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Docker containers

Docker offers lightweight virtualization
o Containers can be shared/provisioned on different Linux variations/versions
Identical base system
o NOT required
All BDI components
o Docker containers

Page 18: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Architectural design 1.1

Page 19: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Architectural design 1.2

Page 20: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Architectural design 1.3

Page 21: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Architectural design 1.4 (released)

Page 22: BigDataEurope @BDVA Summit2016 1: The BDE Platform

BDE vs Hadoop distributions

BDE is not built on top of existing distributions
Targets
o Communities
o Research institutions
Bridges scientists and open data
Multi-tier research efforts towards Smart Data

Page 23: BigDataEurope @BDVA Summit2016 1: The BDE Platform

BDE vs Hadoop distributions

Feature | Hortonworks | Cloudera | MapR | Bigtop | BDE
File System | HDFS | HDFS | NFS | HDFS | HDFS
Installation | Native | Native | Native | Native | Lightweight virtualization
Plug & play components (no rigid schema) | no | no | no | no | yes
High Availability | Single failure recovery (YARN) | Single failure recovery (YARN) | Self healing, multiple failure recovery | Single failure recovery (YARN) | Multiple failure recovery
Cost | Commercial | Commercial | Commercial | Free | Free
Scaling | Freemium | Freemium | Freemium | Free | Free
Addition of custom components | Not easy | No | No | No | Yes
Integration testing | yes | yes | yes | yes | --
Operating systems | Linux | Linux | Linux | Linux | All
Management tool | Ambari | Cloudera Manager | MapR Control System | - | Docker Swarm UI + custom

Page 24: BigDataEurope @BDVA Summit2016 1: The BDE Platform

A flexible, generic platform for (Big) Data Value Chain Deployment

3. Supported Components

Big Data Integrator


Page 25: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Dockerized Components

Processing and storage components
o Re-used existing Docker containers (where available)
o Dockerized by BDE otherwise
o Ensuring all can be provisioned through Docker Swarm
Other components
o Semantic Layer
o Support Layer

Page 26: BigDataEurope @BDVA Summit2016 1: The BDE Platform

BDI Docker Containers (...and counting)

Data Acquisition: Apache Flume
Data Storage: Hue, Apache Cassandra, ScyllaDB, Apache Hive, PostGIS
Search/Indexing: Apache Solr
Message Passing: Apache Kafka
Data Processing: Spark, Flink
Semantic Components: SANSA, Silk, Strabon, Sextant, GeoTriples, Semagrow, LIMES, 4store, OpenLink Virtuoso

Page 27: BigDataEurope @BDVA Summit2016 1: The BDE Platform

A flexible, generic platform for (Big) Data Value Chain Deployment

4. In-use: Deployment & Installation

Big Data Integrator


Page 28: BigDataEurope @BDVA Summit2016 1: The BDE Platform

BDI User profiles

Page 29: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Platform installation

Manual installation guide

Using Docker Machine
o On local machine (VirtualBox)
o In cloud (AWS, DigitalOcean, Azure)
o Bare metal
Screencasts (Getting Started with the Platform)

Page 30: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Developing a component

Base Docker images
o Serve as a template for a (Big Data) technology
o Easily extendable with custom algorithm/data
Published components
o Responsibilities divided between partners
o Image repositories on GitHub
o Automated builds on DockerHub
o Documentation on BDE Wiki

Page 31: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Deploying a Big Data Stack

Stack: Collection of communicating components to solve a specific problem

Described in Docker Compose
o Component configuration
o Application topology
Orchestrator required for initialization process
o Components may depend on each other
o Components may require manual intervention
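The point that components may depend on each other can be illustrated with a topological sort over a dependency map. This is only a minimal sketch, not the BDI orchestrator: the component names and the `depends_on` mapping are hypothetical, loosely mirroring what a Docker Compose file might declare.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stack: each component maps to the components it depends on.
# The names are illustrative only, not taken from an actual BDI stack.
depends_on = {
    "namenode": [],
    "datanode": ["namenode"],
    "spark-master": ["namenode"],
    "spark-worker": ["spark-master"],
    "demo-app": ["spark-worker", "datanode"],
}


def startup_order(dependencies):
    """Return an order in which every component starts after its dependencies."""
    # TopologicalSorter takes {node: predecessors} and raises CycleError on cycles.
    return list(TopologicalSorter(dependencies).static_order())


order = startup_order(depends_on)
print(order)
```

A real orchestrator would additionally wait for each component to report readiness, and pause where a step needs manual intervention; the sort only fixes a valid order.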


Page 32: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Support Layer (User Interfaces)

Integrator UI
o Web UIs from BDE Docker containers (including 3rd-party components) follow the BDE stylesheets
Stack Monitor App
o Workflow Builder
o Workflow Monitor
Swarm UI
o Allows scaling up/down multiple Docker instances

Page 33: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Integrator UI

Page 34: BigDataEurope @BDVA Summit2016 1: The BDE Platform

BDE Workflow Builder

Page 35: BigDataEurope @BDVA Summit2016 1: The BDE Platform

BDE Workflow Monitor

Page 36: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Swarm UI

Page 37: BigDataEurope @BDVA Summit2016 1: The BDE Platform

BDI Platform – A Demo

Demonstrating the ease of use in deploying custom instances of the BDI Platform
A recorded video showing an example is available: https://www.youtube.com/watch?v=1zHIhFDDdCg

Page 38: BigDataEurope @BDVA Summit2016 1: The BDE Platform

A flexible, generic platform for (Big) Data Value Chain Deployment

5. Beyond the State-of-the-Art

Big Data Integrator


Page 39: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Smart Big Data

Increase Big Data value by adding meaning to it!


Page 40: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Source: Gesellschaft für Informatik

Variety – The most neglected V?

Data Source Heterogeneity

Lack of interoperability/semantics

Page 41: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Semantic Layer tools

BDE tooling for Semantic Data Lake:
o Swagger: Semantics of RESTful APIs
o Semantic Analytics Stack (SANSA): Distributed data processing over large-scale Knowledge Graphs
o Semagrow: SPARQL over Big Data stores
o Ontario: Querying over Semantic Data Lakes

Page 42: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Semantic Layer

Semantic Data Lakes
o Minimal ingestion pre-processing
o Semantic layer maintains metadata
o Add meaning when retrieving/processing

Data Lake: scalable unstructured data store
Relationship definitions and metadata: JSON-LD, CSVW, R2RML, XML2RDF
Knowledge Graphs
Ongoing Research for Semantic Big Data & Analytics

Page 43: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Ontario: Semantic Data Lakes

Repository of data in its raw format
o Structured, semi-structured, unstructured
Schema-less
o No schema is defined on write; it is defined only on read
Open to any kind of processing
Add a Semantic layer on top of the source datasets
o Semantic data is handled as-is
o Non-semantic data is semantically lifted using existing ontology terms
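Schema-on-read and semantic lifting can be sketched in a few lines. This is an illustration under stated assumptions, not Ontario's implementation: the raw CSV, the `COLUMN_TO_PREDICATE` mapping and the `BASE` namespace are made up for the example, although the FOAF and schema.org terms themselves are existing vocabulary terms.

```python
import csv
import io

# Raw data is stored as-is; no schema was enforced when it was written.
RAW_CSV = "id,name,city\n1,Alice,Bonn\n2,Bob,Valencia\n"

# Hypothetical mapping from column names to existing ontology terms.
COLUMN_TO_PREDICATE = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "city": "http://schema.org/addressLocality",
}
BASE = "http://example.org/person/"  # illustrative namespace, an assumption


def lift(raw_csv):
    """Apply the mapping at read time and yield (subject, predicate, object) triples."""
    for row in csv.DictReader(io.StringIO(raw_csv)):
        subject = BASE + row["id"]
        for column, predicate in COLUMN_TO_PREDICATE.items():
            yield (subject, predicate, row[column])


triples = list(lift(RAW_CSV))
for triple in triples:
    print(triple)
```

The stored file is never rewritten; changing the mapping changes what the data means on the next read, which is the point of deferring the schema.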


Page 44: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Ontario: Architecture

Translate and execute Query via Source-specific Access Method

Decompose to Source-specific Entities

Decompose SPARQL Query
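The diagram labels above suggest a pipeline in which a SPARQL query is split into per-source parts before each part is translated for its source. A minimal sketch of the splitting step follows, assuming a hypothetical catalogue (`SOURCE_OF_PREDICATE`, `ACCESS_METHOD`) that records which source holds which predicate; none of these names come from Ontario itself.

```python
from collections import defaultdict

# Hypothetical catalogue: which source holds which predicate, and how it is accessed.
SOURCE_OF_PREDICATE = {
    "foaf:name": "people_csv",
    "schema:worksFor": "people_csv",
    "schema:location": "orgs_mongo",
}
ACCESS_METHOD = {"people_csv": "SQL", "orgs_mongo": "MongoDB query"}

# A basic graph pattern, already parsed into (subject, predicate, object) tuples.
bgp = [
    ("?p", "foaf:name", "?n"),
    ("?p", "schema:worksFor", "?o"),
    ("?o", "schema:location", "?l"),
]


def decompose(patterns):
    """Group triple patterns into one sub-query per data source."""
    subqueries = defaultdict(list)
    for pattern in patterns:
        source = SOURCE_OF_PREDICATE[pattern[1]]
        subqueries[source].append(pattern)
    return dict(subqueries)


plan = decompose(bgp)
for source, patterns in plan.items():
    print(source, "via", ACCESS_METHOD[source], "->", patterns)
```

In this toy plan the shared variable `?o` is what a later join step would use to combine the results of the two sub-queries.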

Page 45: BigDataEurope @BDVA Summit2016 1: The BDE Platform

SANSA: Semantic Analytics Stack

Abundant machine-readable structured information is available (e.g. in RDF)
o Across societal challenges (SCs), e.g. Life Science Data (Open PHACTS)
o General: DBpedia, Google Knowledge Graph
o Social graphs: Facebook, Twitter
Need for scalable querying, inference & ML
o Link prediction
o Knowledge base completion
o Predictive analytics

Page 46: BigDataEurope @BDVA Summit2016 1: The BDE Platform
Page 47: BigDataEurope @BDVA Summit2016 1: The BDE Platform

SANSA Stack

Page 48: BigDataEurope @BDVA Summit2016 1: The BDE Platform

More Information

Big Data Integrator: https://github.com/big-data-europe

README includes extensive documentation, instructions and information on supported components

Page 49: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Free Workshops, Hangouts & Webinars

BigDataEurope Activities


Page 50: BigDataEurope @BDVA Summit2016 1: The BDE Platform

2nd round of Societal Workshops

Domain | Date | Location | Notes
Transport | 22 September 2016 | Brussels | Collocated with Big Data for Transport, Tisa workshop
Food&Agri | 30 September 2016 | Brussels | Collocated with DG AGRI WP2018-20 stakeholder consultation
Energy | 4 October 2016 | Brussels | Collocated with EC H2020 Info Day on "Smart Grids and Storage"
Climate | 11 October 2016 | Brussels | Collocated with Melodies Project Event – Exploiting Open Data
Security | 18 October 2016 | Brussels | Standalone Workshop
Societies | 5 December 2016 | Cologne | Collocated with EDDI16 - 8th Annual European DDI User Conference
Health | 9 December 2016 | Brussels | Standalone Workshop

Page 51: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Other Activities

Fresh set (7) of Societal Workshops in 2017

Various SC-focussed and general hangouts, follow!
o Apache Flink & BDE (20 Oct) – available online
o BDVA & BDE Webinar planned early next year
o Keep track on BDE Website (Events)

Page 52: BigDataEurope @BDVA Summit2016 1: The BDE Platform

Questions & Contacts

WEB: www.big-data-europe.eu
EMAIL: [email protected]
BIG DATA INTEGRATOR: www.github.com/big-data-europe
#BigDataEurope

PROJECT COORDINATION (Fraunhofer IAIS)
Prof. Sören Auer, auer © cs.uni-bonn · de
Dr. Simon Scerri, scerri © cs.uni-bonn · de
EIS Department/Group, Fraunhofer IAIS & CS Department, Uni-Bonn, Bonn, Germany

leads the Fraunhofer Big Data Alliance

Page 53: BigDataEurope @BDVA Summit2016 1: The BDE Platform


Page 54: BigDataEurope @BDVA Summit2016 1: The BDE Platform

SANSA: Read Write Layer

Ingest RDF and OWL data in different formats using Jena / OWL API style interfaces
Represent data in multiple formats (e.g. RDD, DataFrames, GraphX, Tensors)
Allow transformation among these formats
Compute dataset statistics and apply functions to URIs, literals, subjects, objects → DistributedLODStats
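The kind of dataset statistics mentioned above can be illustrated on a handful of triples. This is a single-machine sketch with made-up data and a hypothetical `dataset_stats` helper; it does not use SANSA's API, which the slides describe as running over distributed representations such as RDDs.

```python
from collections import Counter

# Tiny in-memory stand-in for an RDF dataset (illustrative triples).
triples = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "foaf:name", '"Bob"'),
]


def dataset_stats(data):
    """Compute a few LODStats-style counts over (s, p, o) triples."""
    return {
        "triples": len(data),
        "distinct_subjects": len({s for s, _, _ in data}),
        "distinct_objects": len({o for _, _, o in data}),
        "property_usage": Counter(p for _, p, _ in data),
        # Literals are written with a leading double quote in this toy encoding.
        "literals": sum(1 for _, _, o in data if o.startswith('"')),
    }


stats = dataset_stats(triples)
print(stats)
```

Each of these counts is a map step followed by an aggregation, which is why such statistics distribute naturally across partitions.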


Page 55: BigDataEurope @BDVA Summit2016 1: The BDE Platform

SANSA: Query Layer

Make generic queries efficient and fast using:
o Intelligent indexing
o Splitting strategies
o Distributed storage
o Distributed/federated querying

Early work in progress: query evaluation (SPARQL-to-SQL approaches, Virtual Views)

Provision of W3C SPARQL compliant endpoint
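A SPARQL-to-SQL approach can be illustrated for the simplest case, a single triple pattern over one `triples` table. The table layout and the `pattern_to_sql` helper are assumptions made for this sketch, not SANSA's query engine, and SQLite stands in for a distributed store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:alice", "foaf:knows", "ex:bob"),
        ("ex:bob", "foaf:knows", "ex:carol"),
        ("ex:alice", "foaf:name", "Alice"),
    ],
)


def pattern_to_sql(s, p, o):
    """Translate one triple pattern into SQL; terms starting with '?' are variables."""
    columns = {"s": s, "p": p, "o": o}
    # Variables become projected columns; constants become WHERE conditions.
    select = [f"{col} AS {term[1:]}" for col, term in columns.items() if term.startswith("?")]
    where = [(col, term) for col, term in columns.items() if not term.startswith("?")]
    sql = "SELECT " + (", ".join(select) or "1") + " FROM triples"
    if where:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col, _ in where)
    return sql, [term for _, term in where]


# SPARQL: SELECT ?x ?y WHERE { ?x foaf:knows ?y }
sql, params = pattern_to_sql("?x", "foaf:knows", "?y")
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

Constants are passed as bound parameters, while variable names are interpolated into the SQL text, so a fuller translator would need to validate or quote them. Multi-pattern queries would translate to self-joins on the shared variables.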


Page 56: BigDataEurope @BDVA Summit2016 1: The BDE Platform

SANSA: Inference Layer

W3C Standards for Modelling: RDFS and OWL
Parallel in-memory inference via rule-based forward chaining
Beyond state of the art: dynamically build a rule dependency graph for a rule set
→ Adjustable performance levels
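Rule-based forward chaining can be sketched with two RDFS entailment rules applied until no new facts appear. This is a naive, sequential illustration with made-up data, not SANSA's parallel reasoner.

```python
def forward_chain(triples):
    """Naive forward chaining to a fixpoint with two RDFS rules.

    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       => (x type B)
    """
    facts = set(triples)
    while True:
        sub = {(s, o) for s, p, o in facts if p == "rdfs:subClassOf"}
        new = set()
        # rdfs11: transitivity of subClassOf
        for a, b in sub:
            for b2, c in sub:
                if b == b2:
                    new.add((a, "rdfs:subClassOf", c))
        # rdfs9: instances inherit the superclasses of their class
        for x, p, a in facts:
            if p == "rdf:type":
                for a2, b in sub:
                    if a == a2:
                        new.add((x, "rdf:type", b))
        if new <= facts:  # fixpoint reached: nothing new was derived
            return facts
        facts |= new


base = [
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),
    ("ex:alice", "rdf:type", "ex:Student"),
]
closure = forward_chain(base)
print(sorted(closure - set(base)))
```

Here rdfs9 consumes the subClassOf facts that rdfs11 produces, which is the kind of relationship a rule dependency graph would record in order to schedule rules.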


Page 57: BigDataEurope @BDVA Summit2016 1: The BDE Platform

SANSA: ML Layer

Distributed Machine Learning (ML) algorithms that work on RDF data and make use of its structure/semantics

Work in progress:
o Tensor Factorization for e.g. KB completion (testing stage)
o Simple spatiotemporal analytics (idea stage)
o Graph Clustering (testing stage)
o Association rule mining (evaluation stage)
o Semantic Decision trees (idea stage)
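As one example of using RDF structure for ML, association rule mining can be illustrated by treating the set of predicates used by each subject as a transaction. This is a toy sketch with made-up data and hypothetical helper names, not the SANSA algorithm.

```python
from collections import defaultdict

# Illustrative triples; literals are written with surrounding double quotes.
triples = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:mbox", '"alice@example.org"'),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:bob", "foaf:mbox", '"bob@example.org"'),
    ("ex:carol", "foaf:name", '"Carol"'),
]


def predicate_sets(data):
    """Build one 'transaction' per subject: the set of predicates it uses."""
    transactions = defaultdict(set)
    for s, p, _ in data:
        transactions[s].add(p)
    return dict(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent => consequent."""
    total = len(transactions)
    with_a = [items for items in transactions.values() if antecedent in items]
    with_both = [items for items in with_a if consequent in items]
    support = len(with_both) / total if total else 0.0
    confidence = len(with_both) / len(with_a) if with_a else 0.0
    return support, confidence


tx = predicate_sets(triples)
support, confidence = rule_metrics(tx, "foaf:name", "foaf:mbox")
print(f"support={support:.2f} confidence={confidence:.2f}")
```

A rule with high confidence, such as "subjects with a mailbox also have a name", can hint at facts missing from a knowledge base.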
