Time Series With OrientDB - Fosdem 2015

Time flows, on Graph

Managing event sequences and time series with a Document-Graph Database

FOSDEM 2015

Enrico Risa

Orient Technologies LTD

Twitter: @wolf4ood

Emanuele Tagliaferri

Twitter: @tglman

Time What…?

Time series: A time series is a sequence of data points, typically consisting of successive measurements made over a time interval (Wikipedia)

Time What…?

Event sequences:

• A set of events with a timestamp

• A set of relationships “happened before/after”

• Cause and effect relationships

Graph approaches

• Nodes/Edges

• Index-free adjacency

• Fast traversal

• Dynamic structure

Graph approaches

Linked sequence

e1 -next-> e2 -next-> e3 -next-> e4 -next-> e5

(timestamp on vertex)
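The linked sequence can be modelled in a few lines. This is a minimal in-memory sketch, not OrientDB code: the `Event` and `LinkedSequence` classes and their methods are hypothetical names, the timestamp lives on the vertex, and a plain `next` reference stands in for the edge.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of the linked sequence; names are illustrative only.
class Event {
    final String name;
    final long timestamp;   // timestamp stored on the vertex
    Event next;             // stands in for the "next" edge

    Event(String name, long timestamp) {
        this.name = name;
        this.timestamp = timestamp;
    }
}

class LinkedSequence {
    private Event head;
    private Event tail;

    // Appends an event at the end of the chain; events are assumed to arrive in time order.
    Event append(String name, long timestamp) {
        Event e = new Event(name, timestamp);
        if (head == null) {
            head = e;
        } else {
            tail.next = e;
        }
        tail = e;
        return e;
    }

    // Follows "next" edges from a starting event and collects the names of all
    // events whose timestamp is not later than untilTimestamp.
    static List<String> walk(Event start, long untilTimestamp) {
        List<String> names = new ArrayList<>();
        for (Event e = start; e != null && e.timestamp <= untilTimestamp; e = e.next) {
            names.add(e.name);
        }
        return names;
    }
}
```

Once the starting event is known, reading a time window is a pure pointer walk with no index lookup, which is the point of the approach.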

Graph approaches

Linked sequence (tag based)

[Diagram: events e1, e2, … e4, … each labelled with its tags ([Tag1, Tag2], [Tag1], [Tag2]) and chained by one edge type per tag: nextTag1, nextTag2]
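A tag-based sequence keeps one chain per tag, so an event carrying two tags sits on two chains at once. The sketch below is an in-memory illustration under that assumption; `TaggedEvent`, `TaggedSequence` and `nextByTag` are hypothetical names, with the map entry for a tag playing the role of the `nextTag1` / `nextTag2` edges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: one outgoing "next" edge per tag carried by the event.
class TaggedEvent {
    final String name;
    final Set<String> tags;
    final Map<String, TaggedEvent> nextByTag = new HashMap<>();

    TaggedEvent(String name, String... tags) {
        this.name = name;
        this.tags = new LinkedHashSet<>(Arrays.asList(tags));
    }
}

class TaggedSequence {
    // Last event seen for each tag, so a new event can be linked to its predecessor.
    private final Map<String, TaggedEvent> lastByTag = new HashMap<>();

    // Appends an event and links it into the chain of every tag it carries.
    TaggedEvent append(String name, String... tags) {
        TaggedEvent e = new TaggedEvent(name, tags);
        for (String tag : e.tags) {
            TaggedEvent previous = lastByTag.put(tag, e);
            if (previous != null) {
                previous.nextByTag.put(tag, e);
            }
        }
        return e;
    }

    // Follows only the edges of one tag, skipping events that do not carry it.
    static List<String> walk(TaggedEvent start, String tag) {
        List<String> names = new ArrayList<>();
        for (TaggedEvent e = start; e != null; e = e.nextByTag.get(tag)) {
            names.add(e.name);
        }
        return names;
    }
}
```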

Graph approaches

Hierarchy

[Diagram: hierarchy with a Minutes level (vertices 1, 2, … 60) and, below it, a Seconds level holding the events e1, e2, … e60]
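The hierarchy can be read as a small tree keyed by time unit. The sketch below assumes a two-level layout (minute, then second, then events) and uses hypothetical names (`TimeNode`, `TimeTree`, `eventsBetween`); a real deployment would add further levels in the same way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical time-tree node: a minute has second children, a second holds events.
class TimeNode {
    final int value;                                        // e.g. minute 2 or second 59
    final TreeMap<Integer, TimeNode> children = new TreeMap<>();
    final List<String> events = new ArrayList<>();          // used on second nodes only

    TimeNode(int value) {
        this.value = value;
    }

    // Returns the child for the given unit value, creating it on first use.
    TimeNode child(int unitValue) {
        return children.computeIfAbsent(unitValue, TimeNode::new);
    }
}

class TimeTree {
    final TimeNode root = new TimeNode(-1);   // artificial root above the minutes

    void addEvent(int minute, int second, String event) {
        root.child(minute).child(second).events.add(event);
    }

    // Collects the events of one minute between two seconds (both inclusive), in time order.
    List<String> eventsBetween(int minute, int fromSecond, int toSecond) {
        List<String> result = new ArrayList<>();
        TimeNode m = root.children.get(minute);
        if (m == null) {
            return result;
        }
        for (TimeNode s : m.children.subMap(fromSecond, true, toSecond, true).values()) {
            result.addAll(s.events);
        }
        return result;
    }
}
```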

Current approaches

Advantages

• Flexible

• Events can be connected together in different ways

• You can navigate events following a path by time or tag.

Current approaches

Disadvantages

• Slow queries when the number of events is high

Optimization

● Data Pre-Aggregation

Optimization

Pre-aggregate

[Diagram: the Minutes level (vertices 1, 2, … 60) of the time graph]

Optimization

Aggregation logic

• Second 0 -> insert

• …

• Second 59 -> insert + aggregate update

– Write aggregate value on minute vertex

● Minute == 59? Calculate aggregate on hour vertex
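The roll-up rule above can be sketched as plain Java. This is an in-memory illustration, not the POC's hook code: `MinuteVertex`, `HourVertex` and `insert` are hypothetical names, the aggregate is assumed to be a sum, and values are assumed to arrive in time order, one per second.

```java
// Hypothetical sketch of the roll-up; class and field names are illustrative only.
class MinuteVertex {
    final long[] seconds = new long[60];
    Long sum;                 // null until second 59 has been inserted
}

class HourVertex {
    final MinuteVertex[] minutes = new MinuteVertex[60];
    Long sum;                 // null until minute 59 is complete

    HourVertex() {
        for (int i = 0; i < 60; i++) {
            minutes[i] = new MinuteVertex();
        }
    }

    // Inserts one value; assumes one value per second, arriving in time order.
    void insert(int minute, int second, long value) {
        MinuteVertex m = minutes[minute];
        m.seconds[second] = value;                  // second 0..59 -> insert
        if (second == 59) {                         // last second -> aggregate update
            long total = 0L;
            for (long v : m.seconds) {
                total += v;
            }
            m.sum = total;                          // write aggregate on the minute vertex
            if (minute == 59) {                     // last minute -> aggregate on the hour vertex
                long hourTotal = 0L;
                for (MinuteVertex each : minutes) {
                    hourTotal += (each.sum == null ? 0L : each.sum);
                }
                sum = hourTotal;
            }
        }
    }
}
```

The cost of aggregation is paid once, at write time, when a unit closes; readers of a closed minute or hour never touch the individual seconds.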

OrientDB

How to aggregate

Hooks: server-side triggers (Java or JavaScript), executed when DB operations happen (e.g. insert or update)

Java interface:

public RESULT onBeforeInsert(…);

public void onAfterInsert(…);

public RESULT onBeforeUpdate(…);

public void onAfterUpdate(…);

Optimization

[Diagram: Minutes level (vertices 1, 2, … 60). Complete minutes carry their aggregate (sum = 1000, sum = 15000, sum = 300); the incomplete minute has sum = null]

Optimization

Query logic:

• Traverse from root node to specified level (filtering based on vertex data)

• Is there aggregate value?

– Yes: return it

– No: go one level down and do the same

Aggregation on a level will be VERY fast if you have horizontal edges!
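The query logic is a short recursion: use the stored aggregate when a vertex has one, otherwise descend one level. A minimal sketch, assuming a sum aggregate and a hypothetical `AggNode` type in which a `null` aggregate marks an incomplete vertex:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical tree node: either carries a pre-computed aggregate
// or must be resolved from its children.
class AggNode {
    final Long aggregate;            // null means "incomplete, not aggregated yet"
    final long value;                // raw value, meaningful on leaves only
    final List<AggNode> children;

    AggNode(Long aggregate, long value, AggNode... children) {
        this.aggregate = aggregate;
        this.value = value;
        this.children = Arrays.asList(children);
    }
}

class AggregateQuery {
    static long sum(AggNode node) {
        if (node.aggregate != null) {
            return node.aggregate;           // aggregate present: return it, do not descend
        }
        if (node.children.isEmpty()) {
            return node.value;               // leaf: raw value
        }
        long total = 0L;
        for (AggNode child : node.children) {
            total += sum(child);             // no aggregate: go one level down and repeat
        }
        return total;
    }
}
```

Only the incomplete branch (typically the current minute or hour) is ever expanded; every closed unit answers in one step.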

OrientDB

How to calculate aggregate values with a query

Input params:

- Root node (suppose it is #11:11)

select sum(aggregateVal) from (

traverse out() from #11:11

while in().aggregateVal is null

)

With the same logic you can query based on time windows

Time Series Proof of Concept

POC Implementation

Core:

● As OrientDB Plugin

● Rely on Hooks

● Aggregation Engine

● Handles all time units

Data Visualization:

● Simple UI (Realtime/History)

● Query in Studio

● Plugin that registers the hook and some input/output sources (WebSocket, message queue, socket, etc.)

● Hook on Event class (entry point)

- Event can be saved or not.

- Aggregations are made when the lower time unit changes

- Pre-allocation of TimeUnit pointers

● Time units tracked:

- Year

- Month

- Day

- Minute

- Second

Advantages

● Simple (Few lines of code)

● No Indexes

● Easy to use

– Plain OrientDB SQL to insert an event:

insert into event set bets = 1, cpu = 50

● Fast (Especially in plocal mode)

Disadvantages

● Too Simple (For now)

● Aggregator hardcoded (maybe a JavaScript aggregator?)

Data Visualization

Two Charts:

● Realtime data through WebSocket

The engine pushes the events received every second

● Range query for history Data

Using the powerful array range notation we can query for a specific time range

Let's Run It

Data Query Time unit

● Array Notation

select expand(m[1].d[30].h[13].m[5-10])

from year where time = 2015

● Traverse with Next

traverse next from (

select expand(m[1].d[26].h[19].m[37])

from year where time = 2015

) while $depth <= 3

Data Query Aggregation

● Array Notation

select sum(bets) from (

select expand(m[1].d[30].h[13].m[5-10])

from year where time = 2015

)

● Traverse with Next

select sum(bets) from (

traverse next from (

select expand(m[1].d[26].h[19].m[37])

from year where time = 2015

) while $depth <= 3

)

Multi-Model Optimization!

We got OrientDB

• Document database (schema-free, complex properties)

• Graph database (index-free adjacency, fast traversal)

• SQL (extended)

• Operational (schema - ACID)

• OO concepts (Classes, inheritance, polymorphism)

• REST/JSON interface

• Native JavaScript (extend query language, expose services, event hooks)

• Distributed architecture (multi-master replica/sharding)

● Studio 2.0

● Lucene & ETL in bundle

● WAL management (Fuzzy Checkpoint)

● Schema Driven Serialization

● Autosharding strategy on Distributed

OrientDB

First step: put them together

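[Diagram: Minutes level of the graph (vertices 1, 2, … 60); a minute vertex is linked to a document holding one value per second]

{0: 1000, 1: 1500, … 59: 96}

Document <- IT’S A VERTEX TOO!!!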

OrientDB

Second step: put them together

Hours …

{0: {0: 1000, 1: 1500, … 59: 210}, 1: { … }, … 59: { … }}

Document

Where should I stop?

It depends on my domain and requirements.

OrientDB

Third step: Complex domains

[Diagram: Minutes level of the graph (vertices 1, 2, … 60), with a document attached to a minute vertex]

{0: {val: 1000}, 1: {val: 1500}, … 59: {val: 96, eventTags: [tag1, tag2] … }}

Document <- Enrich the domain
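The enriched document can be pictured as a map from second to a small record. The sketch below is an in-memory illustration only: `SecondEntry` and `MinuteDocument` are hypothetical names, while the `val` and `eventTags` fields mirror the ones in the document above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical per-second record: the value plus the tags of the event that produced it.
class SecondEntry {
    final long val;
    final List<String> eventTags;

    SecondEntry(long val, String... eventTags) {
        this.val = val;
        this.eventTags = Arrays.asList(eventTags);
    }
}

// Hypothetical document attached to a minute vertex: second (0..59) -> entry.
class MinuteDocument {
    final TreeMap<Integer, SecondEntry> seconds = new TreeMap<>();

    void put(int second, long val, String... eventTags) {
        seconds.put(second, new SecondEntry(val, eventTags));
    }

    // Sum over every second stored in the document.
    long sum() {
        long total = 0L;
        for (SecondEntry e : seconds.values()) {
            total += e.val;
        }
        return total;
    }

    // Sum restricted to the seconds whose event carries the given tag.
    long sumForTag(String tag) {
        long total = 0L;
        for (SecondEntry e : seconds.values()) {
            if (e.eventTags.contains(tag)) {
                total += e.val;
            }
        }
        return total;
    }
}
```

Keeping the seconds inside one document trades sixty vertices for a single record read, while the extra fields leave room for domain data such as tags.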

One model is not enough

One of the most common issues of my customers is:

“I have a zoo of technologies in my application stack, and it’s getting worse every day”

My answer is: Multi-Model DB

of course ;-)

Thank you!

Enrico Risa

Twitter: @wolf4ood

Emanuele Tagliaferri

Twitter: @tglman
