Big Data BluePrint

Transcript of Big Data BluePrint

Page 1: Big Data BluePrint

Big Data BluePrint

Architect for change

@daangerits #bdbp

Page 2: Big Data BluePrint

Who am I?

@[email protected]

Page 3: Big Data BluePrint

Agenda

Concepts

Architecture

Examples

Page 4: Big Data BluePrint

Concepts

Page 5: Big Data BluePrint

TransCo

Meet TransCo - Parcel delivery service

Page 6: Big Data BluePrint

Common interactions

A customer requesting a quote

A website visitor clicking on a link

Booking a financial transaction

A delivery truck pinging its GPS coördinates

Page 7: Big Data BluePrint

TransCo

All of these have one thing in common:

Events

IT, Finance, Legal, Logistics, Sales, Communications, ...

Page 8: Big Data BluePrint

Events

Events used to manipulate our master data

Page 9: Big Data BluePrint

Events

Today, events ARE our master data

Page 10: Big Data BluePrint

Anatomy of an event

Timestamp

When did it happen?

Origin

Where did it come from?

Actor

Who did it?

Subject

Who was affected?

Facts

What changed?

Event

Page 11: Big Data BluePrint

Anatomy of an event - example

2014-05-03 13:40:51

timestamp

CRM Application

origin

Daan Gerits

actor

Alfred Hitchcock

subject

street=”...” vat=”...”

facts

Event
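
The five parts of an event could be written down as a small record type. This is only a sketch: the class name `Event` and the field types are assumptions, and the fact values stay elided as on the slide.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One event, with the five parts named on the slide."""
    timestamp: datetime                        # when did it happen?
    origin: str                                # where did it come from?
    actor: str                                 # who did it?
    subject: str                               # who was affected?
    facts: dict = field(default_factory=dict)  # what changed?


# The slide's example; the fact values are elided in the source ("...").
event = Event(
    timestamp=datetime(2014, 5, 3, 13, 40, 51),
    origin="CRM Application",
    actor="Daan Gerits",
    subject="Alfred Hitchcock",
    facts={"street": "...", "vat": "..."},
)
```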

Page 12: Big Data BluePrint

Architecture

Page 13: Big Data BluePrint

Overview

Ingest

Translator: Translate entities into events and facts.

Linker: Resolve values to ids, especially subject, actor and origin.

Detonator: Explode a single fact to multiple rollup levels. Only explode if applicable.

Store: Store the raw events so we can replay whenever we want.

View Generator: View generators can perform analytical tasks on the incoming events. The generated view can be stored in a storage system of choice.

[Diagram: Ingest (I), Translator (T), Linker (L), Detonator (D), Store (S), View Generators (V, V)]

Page 14: Big Data BluePrint

Ingest

Get records in from other systems

- Event Bus/Broker

- Ingestion System like Flume / Sqoop / …

- ETL processes (not recommended)

- Backups

- Nagios / Statsd / Ganglia / ...

Page 15: Big Data BluePrint

Translator

Convert records into events

- 1 record field = 1 fact

- record timestamp vs generated timestamp

Only store changed facts

- What changed?

- Compare with existing views
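
A minimal sketch of the translator idea, assuming facts are stored as flat rows and that the "existing view" is a plain dictionary of the last known values per subject. The function name `translate` and the shape of `current_view` are hypothetical. The sample record repeats the unchanged name (which the deck's own update record omits) to show the filter at work.

```python
def translate(record, origin, actor, timestamp, event_id, current_view):
    """Turn one source record into fact rows, keeping only changed fields.

    `current_view` maps subject id -> {fact name: value}; it stands in for
    the existing views the slide says to compare with (assumed shape).
    """
    subject = record["id"]
    known = current_view.get(subject, {})
    rows = []
    for name, value in record.items():
        if name == "id":
            continue  # the id identifies the subject, it is not a fact
        if known.get(name) == value:
            continue  # unchanged, so nothing to store
        rows.append((event_id, origin, actor, subject, timestamp, name, value))
    return rows


view = {"9332-DG": {"name": "Daan Gerits", "address": "container 9"}}
record = {"id": "9332-DG", "name": "Daan Gerits", "address": "container 24"}
rows = translate(record, "erp", "wvl", "20141109", 39, view)
# Only the address row is produced, because the name did not change.
```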

Page 16: Big Data BluePrint

Store

Persist the events as they are

Raw Data

- Source of truth

- Recovery

Optimize Storage

- Parquet, Avro, Thrift, ...

Page 17: Big Data BluePrint

Linker

Resolve event fields

- “Daan Gerits” == id 44543-45436-9928

Optimize for speed

- Use lookup tables

- Group data if needed
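
The lookup-table approach might look roughly like this. It assumes one in-memory dictionary per event field; the names `link` and `ACTOR_IDS` are made up for illustration, and only the id for “Daan Gerits” comes from the slide.

```python
# Lookup table for actors; the single entry is the one given on the slide.
ACTOR_IDS = {"Daan Gerits": "44543-45436-9928"}


def link(event, lookup_tables):
    """Return a copy of `event` with known field values replaced by ids.

    `lookup_tables` maps a field name ("actor", "subject", "origin") to a
    dictionary of value -> id. Values without an entry are left unchanged.
    """
    linked = dict(event)
    for field_name, table in lookup_tables.items():
        if field_name in linked:
            value = linked[field_name]
            linked[field_name] = table.get(value, value)
    return linked


linked = link(
    {"origin": "CRM Application", "actor": "Daan Gerits", "subject": "Alfred Hitchcock"},
    {"actor": ACTOR_IDS},
)
```

Dictionary lookups keep the per-event cost constant, which is in the spirit of the slide's "optimize for speed" point.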

Page 18: Big Data BluePrint

Detonator

Explode a fact to multiple rollup levels

Why?

- Real-time rollups

- Running analytics

When?

- if there is a hierarchy in actor or actee

- if there is a hierarchy in timestamp

IN:

{ts: 2014-05-19, fact: …}

OUT:

{ts: 2014-05-19, fact: …}

{ts: 2014-05, fact: …}

{ts: 2014, fact: …}
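
The IN/OUT example above can be sketched for the timestamp hierarchy as follows. The function name `detonate` and the dictionary layout are assumptions; only the day, month and year levels from the slide are covered.

```python
def detonate(event):
    """Explode one event along the day -> month -> year timestamp hierarchy.

    Expects event["ts"] as "YYYY-MM-DD". Events whose timestamp does not
    have that shape are returned unchanged ("only explode if applicable").
    """
    parts = event["ts"].split("-")
    if len(parts) != 3:
        return [event]
    levels = ["-".join(parts[:n]) for n in (3, 2, 1)]
    # Every rollup level carries the same fact, only the timestamp differs.
    return [{**event, "ts": level} for level in levels]


exploded = detonate({"ts": "2014-05-19", "fact": "..."})
```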

Page 19: Big Data BluePrint

View Generator

Use facts to generate a view

A view is

- not a database view

- read-only

- an optimised data model for a single purpose

- disposable

- based on all facts (facts depth & width)

A view generator manipulates

- RDBMSs, graphs, search indexes, ...
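
As an illustration, a view generator for a "current state per subject" view could fold the stored fact rows into a dictionary. This is a sketch under assumptions: rows use the (event ID, origin, actor, subject, timestamp, fact, value) layout from the deck's example tables, event IDs increase over time, and an empty value is represented as `None`. The function name is hypothetical.

```python
from collections import defaultdict

# Fact rows copied from the deck's add/update customer example.
FACTS = [
    (1, "crm", "fbaker", "9332-DG", "20140514", "name", "Daan Gerits"),
    (1, "crm", "fbaker", "9332-DG", "20140514", "address", "container 9"),
    (39, "erp", "wvl", "9332-DG", "20141109", "address", "container 24"),
]


def current_state_view(rows):
    """Build a disposable, read-only view: subject -> latest value per fact."""
    view = defaultdict(dict)
    # Assumption: a higher event ID means a later event.
    for _event_id, _origin, _actor, subject, _ts, fact, value in sorted(
        rows, key=lambda row: row[0]
    ):
        if value is None:
            view[subject].pop(fact, None)  # an empty value removes the fact
        else:
            view[subject][fact] = value
    return dict(view)


view = current_state_view(FACTS)
```

Because the view is derived entirely from the facts, it can be thrown away and regenerated at any time.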

Page 20: Big Data BluePrint

Rules of the game

Only add and remove are allowed

Events are re-playable

Removal may only be done by BDAs (Big Data Administrators)

Page 21: Big Data BluePrint

Example

Page 22: Big Data BluePrint

Add Customer

IN:

processing system: CRM

user: “fbaker”

data: { id: “9332-DG”, name: “Daan Gerits”, address: “container 9” }

DATA:

event ID | origin | actor | subject | timestamp | fact | value

1 | crm | fbaker | 9332-DG | 20140514 | name | Daan Gerits

1 | crm | fbaker | 9332-DG | 20140514 | address | container 9

Page 23: Big Data BluePrint

Update Customer

IN:

processing system: ERP

user: “wvl”

data: { id: “9332-DG”, address: “container 24” }

DATA:

event ID | origin | actor | subject | timestamp | fact | value

1 | crm | fbaker | 9332-DG | 20140514 | name | Daan Gerits

1 | crm | fbaker | 9332-DG | 20140514 | address | container 9

39 | erp | wvl | 9332-DG | 20141109 | address | container 24

Page 24: Big Data BluePrint

Delete Customer

IN:

processing system: ERP

user: “fbaker”

data: { id: “9332-DG” }

DATA:

event ID | origin | actor | subject | timestamp | fact | value

1 | crm | fbaker | 9332-DG | 20140514 | name | Daan Gerits

1 | crm | fbaker | 9332-DG | 20140514 | address | container 9

39 | erp | wvl | 9332-DG | 20141109 | address | container 24

63 | erp | fbaker | 9332-DG | 20141201 | address |

63 | erp | fbaker | 9332-DG | 20141201 | name |

Page 25: Big Data BluePrint

Aaaarrgghhh!!

IN:

processing system: ERP

user: “fbaker”

data: { id: “9332-DG” }

DATA:

event ID | origin | actor | subject | timestamp | fact | value

1 | crm | fbaker | 9332-DG | 20140514 | name | Daan Gerits

1 | crm | fbaker | 9332-DG | 20140514 | address | container 9

39 | erp | wvl | 9332-DG | 20141109 | address | container 24

63 | erp | fbaker | 9332-DG | 20141201 | address |

63 | erp | fbaker | 9332-DG | 20141201 | name |

64 | erp | wvl | 9332-DG | 20141109 | address | container 24

64 | crm | fbaker | 9332-DG | 20140514 | name | Daan Gerits

Page 26: Big Data BluePrint

Conclusion

Allows fact trending: driver statistics for his whole career

Allows state regeneration: the state of all facts on February 12, 2005

Is human-error-proof: remove the facts with eventId #

Scales very well
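
State regeneration and recovery from a mistaken delete can both be sketched as a replay over the stored fact rows. The assumptions here: rows are kept in arrival order, timestamps are "YYYYMMDD" strings (so string comparison orders them), an empty value is `None`, and `state_as_of` and `removed_event_ids` are made-up names. The rows are the ones from the delete customer example.

```python
FACTS = [
    (1, "crm", "fbaker", "9332-DG", "20140514", "name", "Daan Gerits"),
    (1, "crm", "fbaker", "9332-DG", "20140514", "address", "container 9"),
    (39, "erp", "wvl", "9332-DG", "20141109", "address", "container 24"),
    (63, "erp", "fbaker", "9332-DG", "20141201", "address", None),
    (63, "erp", "fbaker", "9332-DG", "20141201", "name", None),
]


def state_as_of(rows, subject, as_of, removed_event_ids=()):
    """Replay the rows for one subject up to and including `as_of`.

    `removed_event_ids` lists events to skip, mimicking a Big Data
    Administrator removing the facts of a faulty event.
    """
    state = {}
    for event_id, _origin, _actor, subj, ts, fact, value in rows:
        if subj != subject or ts > as_of or event_id in removed_event_ids:
            continue
        if value is None:
            state.pop(fact, None)  # an empty value removes the fact
        else:
            state[fact] = value
    return state


before_update = state_as_of(FACTS, "9332-DG", "20140601")
after_delete = state_as_of(FACTS, "9332-DG", "20141231")
recovered = state_as_of(FACTS, "9332-DG", "20141231", removed_event_ids={63})
```

Skipping event 63 brings the customer back without touching any other row, which is the point of the "human-error-proof" claim.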

Page 27: Big Data BluePrint

We don’t hire data scientists, architects, developers, UX designers or engineers. We hire individuals.

Shameless Plug

Thank You!