PROTEUS H2020 at Ficloud2016
An incremental approach for real-time Big Data visual analytics
Nacho García Fernández
Treelogic S.L.
August 23, 2016
Nacho García Fernández (Treelogic S.L.) BigR&I 2016 August 23, 2016 1 / 23
About me
Academics
BSc in Computer Science
MSc in Computer Science
PhD Student
Professional
R&D Engineer at Treelogic S.L.
Lecturer at Master of Big Data (KSchool)
Others
Computer security enthusiast
About my company: Treelogic
R&D-intensive company with the mission of adapting technological knowledge to improve quality standards in our daily life
8 ongoing H2020 projects (coordinating 3 of them)
8 ongoing FP7 projects (coordinating 5 of them)
Focused on providing Big Data analytics worldwide
Internal Organisation
Research Lines
Big Data
Computer Vision
Data Science
Social Media Analysis
Security
ICT Solutions
Security & Safety
Justice
Health
Transport
Financial Services
ICT tailored solutions
Overview
1 Introduction
2 State-of-the-art
3 PROTEUS
4 Conclusions
5 Work in progress
PROTEUS
PROTEUS: Scalable Online Machine Learning and Real-Time Interactive Visual Analytics
Funding program: H2020 project
Duration: 36 months (Dec 2015 - Nov 2018)
Consortium: Treelogic (Coordinator), ArcelorMittal, DFKI, Novelti, Bournemouth University, Trilateral Research
What is PROTEUS about? Its mission is to investigate and
develop ready-to-use scalable online machine learning algorithms and
real-time interactive visual analytics to deal with extremely large data
sets and data streams.
Content of this talk
What this talk is about
Big Data scalable architecture
Real-time processing
Incremental processing
Real-time visualization
What this talk is not about
Machine Learning
Big Data: Introduction
Big Data: real-time processing & visualization
Big Data real-time processing engines
They are usually general-purpose data analytics systems
Different stream-processing approaches: micro-batches, flexible windows, etc.
They provide multi-purpose libraries that work on top of them
They are also compatible with many distributed data sources: HBase, Apache Kafka, RabbitMQ, S3, etc.
Open source
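To make the windowing approaches concrete, here is a minimal sketch of count-based tumbling windows (micro-batch style) and sliding windows over a stream. It is plain Python rather than the API of any particular engine; the function names and the `size` and `slide` parameters are assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List


def tumbling_windows(stream: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive, non-overlapping windows of `size` items."""
    batch: List[int] = []
    for item in stream:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, window
        yield batch


def sliding_windows(stream: Iterable[int], size: int, slide: int) -> Iterator[List[int]]:
    """Yield overlapping windows of `size` items, advancing `slide` items each time."""
    window: Deque[int] = deque(maxlen=size)  # oldest items fall out automatically
    since_last = 0
    for item in stream:
        window.append(item)
        since_last += 1
        if len(window) == size and since_last >= slide:
            yield list(window)
            since_last = 0


tumbling = list(tumbling_windows(range(1, 8), size=3))
sliding = list(sliding_windows(range(1, 7), size=3, slide=2))
```

Real engines usually define windows over event time or processing time rather than item counts; counts are used here only to keep the sketch short.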
Big Data Visual Analytics
Provide connectors to access and visualize previously stored data from (almost) anywhere
Allow users to create and customize their dashboards with a predefined set of charts
Make data understandable so decisions can be driven by data
Big Data and visual analytics challenges
Big data processing frameworks
Processing time still depends on data volume
You need to write two different programs: back-end (big data) and front-end (visualization)
Visual Analytics tools
Non-customizable visualization methods
Non-Open Source licenses
Most of them require moving data to the cloud
Installation and configuration process: rocket science
Bad performance with Big Data
Solution?
PROTEUS: Architecture
Enables you to process & visualize data in real time
Almost the whole program is written in the back end
Component-based architecture
Data collector
Incremental processing engine
Real-time visualization library
It is based on existing solutions (Apache Kafka, Apache Flink and D3).
PROTEUS: Incremental algorithms
Allow end-users to obtain results in real-time
Avoid recomputing whole data volumes after every small change
Create very interactive applications
Allow decision making in real-time
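As an illustration of avoiding recomputation, the sketch below keeps a mean up to date in constant time per new value instead of re-reading all the data. The `RunningMean` class and the sample readings are hypothetical and are not taken from the PROTEUS code base.

```python
class RunningMean:
    """Keeps a mean up to date in O(1) per new value, without storing the data."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        self.count += 1
        # Move the old mean toward the new value by 1/count of the gap.
        self.mean += (value - self.mean) / self.count
        return self.mean


readings = [10.0, 20.0, 30.0, 40.0]
running = RunningMean()
# Each update returns an intermediate result that can be shown immediately.
intermediate = [running.update(x) for x in readings]
```

Every call to `update` yields a usable intermediate mean, which is what lets a dashboard refresh after each new value rather than after a full batch job.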
PROTEUS: Data collector
It is in charge of collecting data from different sources.
When new data is generated/available, a KafkaProducer stores it in a distributed Kafka cluster.
Serializes, compresses and encrypts data, if needed.
Splits data streams into chunks and generates windows
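The collector steps above can be sketched with the standard library alone. This is an assumption-laden illustration: a Python list stands in for the Kafka topic, JSON and zlib stand in for whatever serialization and compression the real collector uses, encryption is omitted, and the helper names (`chunked`, `encode_chunk`, `decode_chunk`) are hypothetical.

```python
import json
import zlib
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(records: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Split a stream of records into chunks of at most `size` items."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def encode_chunk(chunk: List[dict]) -> bytes:
    """Serialize a chunk to JSON and compress it."""
    return zlib.compress(json.dumps(chunk).encode("utf-8"))


def decode_chunk(payload: bytes) -> List[dict]:
    """Reverse encode_chunk on the consumer side."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


readings = [{"sensor": "s1", "value": v} for v in range(5)]
topic: List[bytes] = []  # stand-in for a Kafka topic (assumption)
for chunk in chunked(readings, size=2):
    topic.append(encode_chunk(chunk))
```

In a real deployment the `topic.append` call would be replaced by a send through a Kafka producer client, and the consumer would call the decoding step before handing each chunk to the processing engine.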
PROTEUS: Incremental processing engine
Receives data in chunks from the previous component
It performs an incremental operation for each data chunk
Arithmetic & statistics: sum, multiply, divide, average, covariance, Pearson correlation, etc.
Others: sorting, cleaning, filtering, etc.
Returns an output for each data chunk
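One way to obtain a statistic such as the Pearson correlation chunk by chunk is to keep a handful of running sums and derive the result from them after each chunk. The sketch below uses that sum-based form, which is simple but can lose precision on large or poorly scaled data; it is not necessarily how the PROTEUS engine computes it, and the `PairStats` class is hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class PairStats:
    """Running sums that are enough to derive covariance and Pearson correlation."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    syy: float = 0.0
    sxy: float = 0.0

    def add_chunk(self, chunk: Iterable[Tuple[float, float]]) -> float:
        """Fold one chunk into the sums and return the updated correlation."""
        for x, y in chunk:
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.syy += y * y
            self.sxy += x * y
        return self.pearson()

    def covariance(self) -> float:
        """Population covariance of all pairs seen so far."""
        return self.sxy / self.n - (self.sx / self.n) * (self.sy / self.n)

    def pearson(self) -> float:
        num = self.n * self.sxy - self.sx * self.sy
        den = math.sqrt(self.n * self.sxx - self.sx ** 2) * math.sqrt(
            self.n * self.syy - self.sy ** 2
        )
        # Undefined when either variable has zero variance so far.
        return num / den if den else float("nan")


stats = PairStats()
outputs = [
    stats.add_chunk([(1.0, 2.0), (2.0, 4.0)]),
    stats.add_chunk([(3.0, 6.0), (4.0, 8.0)]),
]
```

Only six numbers are kept between chunks, so the cost of each update depends on the chunk size and not on the total volume processed so far.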
PROTEUS: Real-time visualization library
Receives outcomes from the previous component
Enables data visualization in real time
Allows users to easily interact and explore data
Wide range of graphs
Classical charts: line chart, bar chart, pie chart, etc.
Novel charts: streamgraph, swimlane, gauge, sunburst
Conclusions
Lets users write the back-end and front-end logic in a single program
Lets users see not only the final result but also intermediate outcomes
Enables decision making before the final result is known
Real-time data exploration and visualization
Open-source solution: http://github.com/proteus-h2020. Feel free to contribute!
Future Work
Work in progress
Extend the current incremental operations
Include not only arithmetic and statistical operations, but also online machine learning techniques
Anomaly detection
Extend the visualization library
Include new interactive charts
Support Canvas image rendering
Research new visualization methods and techniques
That’s all!
Contact us: [email protected]
0xNacho