Big Data BluePrint
daan-gerits
Big Data BluePrint: Architect for change
@daangerits #bdbp
Who am I?
Agenda
Concepts
Architecture
Examples
Concepts
TransCo
Meet TransCo - Parcel delivery service
Common interactions
A customer requesting a quote
A website visitor clicking on a link
Booking a financial transaction
A delivery truck pinging its GPS coördinates
TransCo
All of these have one thing in common:
Events
IT, Finance, Legal, Logistics, Sales, Communications, ...
Events
Events used to manipulate our master data
Events
Today, events ARE our master data
Anatomy of an event
Timestamp
When did it happen?
Origin
Where did it come from?
Actor
Who did it?
Subject
Who was affected?
Facts
What changed?
Event
Anatomy of an event - example
timestamp: 2014-05-03 13:40:51
origin: CRM Application
actor: Daan Gerits
subject: Alfred Hitchcock
facts: street="...", vat="..."
Event
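The five parts of an event map naturally onto a small record type. The following is only a sketch of one possible shape: the class name `Event` and the field types are assumptions, and the fact values stay as the placeholders shown on the slide.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict


@dataclass(frozen=True)
class Event:
    """One event: when, where from, by whom, about whom, and what changed."""
    timestamp: datetime   # When did it happen?
    origin: str           # Where did it come from?
    actor: str            # Who did it?
    subject: str          # Who was affected?
    facts: Dict[str, str] = field(default_factory=dict)  # What changed?


# The example from the slide. The fact values are elided there,
# so the same "..." placeholders are kept here.
example = Event(
    timestamp=datetime(2014, 5, 3, 13, 40, 51),
    origin="CRM Application",
    actor="Daan Gerits",
    subject="Alfred Hitchcock",
    facts={"street": "...", "vat": "..."},
)
```

The class is frozen so that the event fields cannot be reassigned after creation, which matches the idea that events are recorded, not updated.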
Architecture
Overview
Ingest
Translator: Translate entities into events and facts.
Store: Store the raw events so we can replay whenever we want.
Linker: Resolve values to ids, especially subject, actor and origin.
Detonator: Explode a single fact to multiple rollup levels. Only explode if applicable.
View Generator: View generators can perform analytical tasks on the incoming events. The generated view can be stored in a storage system of choice.
[Diagram: S (Store), I (Ingest), T (Translator), L (Linker), D (Detonator), V, V (View Generators)]
Ingest
Get records in from other systems
- Event Bus/Broker
- Ingestion System like Flume / Sqoop / …
- ETL processes (not recommended)
- Backups
- Nagios / Statsd / Ganglia / ...
Translator
Convert records into events
- 1 record field = 1 fact
- record timestamp vs generated timestamp
Only store changed facts
- What changed?
- Compare with existing views
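The two translator rules (one record field becomes one fact, and only changed facts are kept) can be sketched as follows. The names `Fact` and `translate`, the `id_field` parameter and the shape of `current_view` are assumptions made for illustration. The sample values come from the customer example later in the deck.

```python
from typing import Dict, List, NamedTuple


class Fact(NamedTuple):
    event_id: int
    origin: str
    actor: str
    subject: str
    timestamp: str
    fact: str
    value: str


def translate(event_id: int, origin: str, actor: str, timestamp: str,
              record: Dict[str, str],
              current_view: Dict[str, Dict[str, str]],
              id_field: str = "id") -> List[Fact]:
    """Turn one source record into fact rows, skipping unchanged fields."""
    subject = record[id_field]
    known = current_view.get(subject, {})
    rows = []
    for name, value in record.items():
        if name == id_field:
            continue  # the id identifies the subject, it is not a fact
        if known.get(name) == value:
            continue  # compare with the existing view: nothing changed
        rows.append(Fact(event_id, origin, actor, subject, timestamp, name, value))
    return rows


# Existing view of customer 9332-DG, then an update coming from the ERP.
view = {"9332-DG": {"name": "Daan Gerits", "address": "container 9"}}
record = {"id": "9332-DG", "name": "Daan Gerits", "address": "container 24"}
changed = translate(39, "erp", "wvl", "20141109", record, view)
```

Here `changed` holds a single fact for `address`, because the name in the record matches the value already in the view.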
Store
Persist the events as they are
Raw Data
- Source of truth
- Recovery
Optimize Storage
- Parquet, Avro, Thrift, ...
Linker
Resolve event fields
- “Daan Gerits” == id 44543-45436-9928
Optimize for speed
- Use lookup tables
- Group data if needed
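A lookup-table linker could look roughly like this. It is a sketch under assumptions: the function name `link`, the in-memory dictionary, and the choice to pass unknown values through unchanged are all illustrative. Only the "Daan Gerits" id comes from the slide.

```python
from typing import Dict

# Lookup table from display value to id. Only this one entry is from the slide.
LOOKUP: Dict[str, str] = {"Daan Gerits": "44543-45436-9928"}

# The fields the overview singles out for resolution.
LINKED_FIELDS = ("origin", "actor", "subject")


def link(event: Dict[str, str], lookup: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the event with origin, actor and subject resolved to ids."""
    linked = dict(event)
    for name in LINKED_FIELDS:
        value = linked.get(name)
        # Assumption: values without a lookup entry are left as they are.
        linked[name] = lookup.get(value, value)
    return linked


raw = {"origin": "CRM Application", "actor": "Daan Gerits",
       "subject": "Alfred Hitchcock"}
resolved = link(raw, LOOKUP)
```

A plain dictionary keeps each resolution a constant-time lookup, which is the point of the "use lookup tables" advice.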
Detonator
Explode a fact to multiple rollup levels
Why?
- Real-time rollups
- Running analytics
When?
- if there is a hierarchy in actor or actee (subject)
- if there is a hierarchy in timestamp
IN:
{ts: 2014-05-19, fact: …}
OUT:
{ts: 2014-05-19, fact: …}
{ts: 2014-05, fact: …}
{ts: 2014, fact: …}
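The IN/OUT example above, for the timestamp hierarchy, can be sketched in a few lines. The function name `explode` and the assumption that `ts` is a `YYYY-MM-DD` string are illustrative. Hierarchies in actor or subject would need their own level definitions.

```python
from typing import Dict, List


def explode(event: Dict[str, str]) -> List[Dict[str, str]]:
    """Copy one event to day, month and year level by truncating its timestamp."""
    day = event["ts"]                  # assumed format: YYYY-MM-DD
    levels = [day, day[:7], day[:4]]   # day, month (YYYY-MM), year (YYYY)
    return [{**event, "ts": level} for level in levels]


exploded = explode({"ts": "2014-05-19", "fact": "..."})
timestamps = [e["ts"] for e in exploded]
```

Each output row carries the same fact, so a view can aggregate per day, month or year without rescanning the raw events.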
View Generator
Use facts to generate a view
A view is
- not a database view
- read-only
- optimised data model for a single purpose
- disposable
- based on all facts (facts depth & width)
A view generator manipulates
- RDBMSs, graphs, search indexes, ...
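As an illustration of a disposable, single-purpose view, the sketch below folds fact rows into a "latest value per customer" structure. The function name `build_customer_view` and the row layout are assumptions. The rows themselves are taken from the customer example later in the deck. A real generator would write to an RDBMS, graph or search index instead of a dictionary.

```python
from typing import Dict, List


def build_customer_view(facts: List[dict]) -> Dict[str, Dict[str, str]]:
    """Fold fact rows into {subject: {fact: latest value}}."""
    view: Dict[str, Dict[str, str]] = {}
    # Apply rows in time order so that later facts win.
    for row in sorted(facts, key=lambda r: (r["timestamp"], r["event_id"])):
        customer = view.setdefault(row["subject"], {})
        customer[row["fact"]] = row["value"]
    return view


facts = [
    {"event_id": 1, "subject": "9332-DG", "timestamp": "20140514",
     "fact": "name", "value": "Daan Gerits"},
    {"event_id": 1, "subject": "9332-DG", "timestamp": "20140514",
     "fact": "address", "value": "container 9"},
    {"event_id": 39, "subject": "9332-DG", "timestamp": "20141109",
     "fact": "address", "value": "container 24"},
]
view = build_customer_view(facts)
```

Because the view is rebuilt from the facts alone, it can be thrown away and regenerated at any time.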
Rules of the game
Only add and remove are allowed
Events are re-playable
Removes may only be done by BDAs (Big Data Administrators)
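These three rules can be sketched as a tiny in-memory log. Everything here is an assumption made for illustration: the class name `EventLog`, the reduced row layout (origin, actor and timestamp are left out for brevity), the `is_bda` flag standing in for a real permission check, and the use of `None` for a cleared fact.

```python
class EventLog:
    """Fact log that supports add, remove by event ID, and replay, but no update."""

    def __init__(self):
        self._rows = []

    def add(self, event_id, subject, fact, value):
        # Adding is open to every producer.
        self._rows.append(
            {"event_id": event_id, "subject": subject, "fact": fact, "value": value}
        )

    def remove(self, event_id, is_bda=False):
        # Assumption: the caller's role arrives as a plain flag.
        if not is_bda:
            raise PermissionError("only a BDA may remove events")
        self._rows = [r for r in self._rows if r["event_id"] != event_id]

    def replay(self):
        # Hand out copies so consumers cannot change rows in place.
        return [dict(r) for r in self._rows]


log = EventLog()
log.add(1, "9332-DG", "name", "Daan Gerits")
log.add(1, "9332-DG", "address", "container 9")
log.add(63, "9332-DG", "name", None)       # facts cleared by mistake
log.add(63, "9332-DG", "address", None)
log.remove(63, is_bda=True)                # a BDA removes the faulty event
remaining_ids = sorted({r["event_id"] for r in log.replay()})
```

After the removal, replaying the log yields only the rows of event 1, so any view rebuilt from it no longer reflects the mistake.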
Example
Add Customer
IN:
processing system: CRM
user: "fbaker"
data: { id: "9332-DG", name: "Daan Gerits", address: "container 9" }
DATA:
event ID  origin  actor   subject  timestamp  fact     value
1         crm     fbaker  9332-DG  20140514   name     Daan Gerits
1         crm     fbaker  9332-DG  20140514   address  container 9
Update Customer
IN:
processing system: ERP
user: "wvl"
data: { id: "9332-DG", address: "container 24" }
DATA:
event ID  origin  actor   subject  timestamp  fact     value
1         crm     fbaker  9332-DG  20140514   name     Daan Gerits
1         crm     fbaker  9332-DG  20140514   address  container 9
39        erp     wvl     9332-DG  20141109   address  container 24
Delete Customer
IN:
processing system: ERP
user: "fbaker"
data: { id: "9332-DG" }
DATA:
event ID  origin  actor   subject  timestamp  fact     value
1         crm     fbaker  9332-DG  20140514   name     Daan Gerits
1         crm     fbaker  9332-DG  20140514   address  container 9
39        erp     wvl     9332-DG  20141109   address  container 24
63        erp     fbaker  9332-DG  20141201   address
63        erp     fbaker  9332-DG  20141201   name
Aaaarrgghhh!!
IN:
processing system: ERP
user: "fbaker"
data: { id: "9332-DG" }
DATA:
event ID  origin  actor   subject  timestamp  fact     value
1         crm     fbaker  9332-DG  20140514   name     Daan Gerits
1         crm     fbaker  9332-DG  20140514   address  container 9
39        erp     wvl     9332-DG  20141109   address  container 24
63        erp     fbaker  9332-DG  20141201   address
63        erp     fbaker  9332-DG  20141201   name
64        erp     wvl     9332-DG  20141109   address  container 24
64        crm     fbaker  9332-DG  20140514   name     Daan Gerits
Conclusion
Allows fact trending
- driver statistics for his whole career
Allows state regeneration
- the state of all facts on February 12, 2005
Is human-error-proof
- remove the facts with eventId #
Scales very well
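Fact trending and state regeneration both fall out of keeping every fact. The sketch below shows one way to read them back from the customer example's rows. The function names `history` and `value_at`, the tuple layout, and the use of `None` for the empty values of event 63 are assumptions.

```python
FACTS = [
    # (event_id, timestamp, subject, fact, value); None marks an emptied fact
    (1, "20140514", "9332-DG", "name", "Daan Gerits"),
    (1, "20140514", "9332-DG", "address", "container 9"),
    (39, "20141109", "9332-DG", "address", "container 24"),
    (63, "20141201", "9332-DG", "address", None),
    (63, "20141201", "9332-DG", "name", None),
]


def history(facts, subject, fact):
    """Fact trending: every value a fact has had, oldest first."""
    rows = [r for r in facts if r[2] == subject and r[3] == fact]
    rows.sort(key=lambda r: (r[1], r[0]))
    return [(r[1], r[4]) for r in rows]


def value_at(facts, subject, fact, as_of):
    """State regeneration: the value of a fact on a given day (YYYYMMDD)."""
    value = None
    for timestamp, recorded in history(facts, subject, fact):
        if timestamp > as_of:
            break  # later facts did not exist yet on that day
        value = recorded
    return value


address_trend = history(FACTS, "9332-DG", "address")
address_in_november = value_at(FACTS, "9332-DG", "address", "20141110")
```

Calling `value_at` for every subject and fact with the same `as_of` date would rebuild the full state for that day.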
We don’t hire data scientists, architects, developers, UX designers
or engineers. We hire individuals.
Shameless Plug
Thank You!