Using Riak for Events storage and analysis at...

119
Using Riak for Events storage and analysis at Booking.com Damien Krotkine

Transcript of Using Riak for Events storage and analysis at...

Page 1: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

Using Riak for Events storage and analysis at Booking.com

Damien Krotkine

Page 2: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

Damien Krotkine

• Software Engineer at Booking.com

• github.com/dams

• @damsieboy

• dkrotkine

Page 3: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •
Page 4: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

KEY FIGURES

• 600,000 hotels

• 212 countries

• 800,000 room nights every 24 hours

• guest reviews 43 million+

• offices worldwide 155+

• 8,600 people

• not a small website…

Page 5: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

INTRODUCTION

Page 6: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •
Page 7: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •
Page 8: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •
Page 9: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •
Page 10: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

www APImobi

Page 11: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

www APImobi

front

end

Page 12: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

www APIfro

nten

dba

cken

dmobi

events storage

events: info about subsystems status

Page 13: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

back

end

web mobi api

databases

caches

load balancersavailability

cluster

email

etc…

Page 14: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

WHAT IS AN EVENT ?

Page 15: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

EVENT STRUCTURE

• Provides info about subsystems

• Data

• Deep HashMap

• Timestamp

• Type + Subtype

• The rest: specific data

• Schema-less

Page 16: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

{ timestamp => 12345, type => 'WEB', subtype => 'app', dc => 1, action => { is_normal_user => 1, pageview_id => '188a362744c301c2', # ... }, tuning => { the_request => 'GET /display/...' bytes_body => 35, wallclock => 111, nr_warnings => 0, # ... }, # ... }

Page 17: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

{ type => 'FAV', subtype => 'fav', timestamp => 1401262979, dc => 1, tuning => { flatav => { cluster => '205', sum_latencies => 21, role => 'fav', num_queries => 7 } } }

Page 18: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

EVENTS FLOW PROPERTIES

• Read-only

• Schema-less

• Continuous, sequential, timed

• 15 K events per sec

• 1.25 Billion events per day

• peak at 70 MB/s, min 25MB/s

• 100 GB per hour

Page 19: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

USAGE

Page 20: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

ASSESS THE NEEDS

• Before thinking about storage

• Think about the usage

Page 21: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

USAGE

1. GRAPHS 2. DECISION MAKING 3. SHORT TERM ANALYSIS 4. A/B TESTING

Page 22: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

GRAPHS

• Graph in real-time ( few seconds lag )

• Graph as many systems as possible

• General platform health check

Page 23: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

GRAPHS

Page 24: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

GRAPHS

Page 25: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

DASHBOARDS

Page 26: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

META GRAPHS

Page 27: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

USAGE

1. GRAPHS 2. DECISION MAKING 3. SHORT TERM ANALYSIS 4. A/B TESTING

Page 28: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

DECISION MAKING

• Strategic decision ( use facts )

• Long term or short term

• Technical / Non technical Reporting

Page 29: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

USAGE

1. GRAPHS 2. DECISION MAKING 3. SHORT TERM ANALYSIS 4. A/B TESTING

Page 30: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

SHORT TERM ANALYSIS

• From 10 sec ago -> 8 days ago

• Code deployment checks and rollback

• Anomaly Detector

Page 31: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

USAGE

1. GRAPHS 2. DECISION MAKING 3. SHORT TERM ANALYSIS 4. A/B TESTING

Page 32: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

A/B TESTING

• Our core philosophy: use facts

• It means: do A/B testing

• Concept of Experiments

• Events provide data to compare

Page 33: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

EVENT AGGREGATION

Page 34: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

EVENT AGGREGATION

• Group events

• Granularity we need: second

Page 35: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

SERIALIZATION

• JSON didn’t work for us (slow, big, lack features)

• Created Sereal in 2012

• « Sereal, a new, binary data serialization format that provides high-performance, schema-less serialization »

• https://github.com/Sereal/Sereal

Page 36: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

eventeventevents storage

eventevent

even

t

eventevent

eventevent

event

even

t

event event

Page 37: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

e ee e

ee

e e

e ee

e eee

ee

e

LOGGER

e

e

Page 38: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

web apieeeeee

eee

e ee e

ee

e e

e ee

e eee

ee

ee

e

Page 39: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

web api dbseeeeeeee

eeeeeeeee

eeeeeee

e ee e

ee

e e

e ee

e eee

ee

ee

e

Page 40: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

web api dbseeeeeeee

eeeeeeeee

eeeeeee

e ee e

ee

e e

e ee

e eee

ee

e

1 sec

e

e

Page 41: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

web api dbseeeeeeee

eeeeeeeee

eeeeeee

e ee e

ee

e e

e ee

e eee

ee

e

1 sec

e

e

Page 42: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

web api dbseeeee

eeeee

eeeee

1 sec

events storage

Page 43: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

web api dbs

ee

eeeee

eeeee

1 sec

events storage

ee e reserialize + compress

Page 44: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

events storage

LOGGER …LOGGER LOGGER

Page 45: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

STORAGE

Page 46: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

WHAT WE WANT

• Storage security

• Mass write performance

• Mass read performance

• Easy administration

• Very scalable

Page 47: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

WE CHOSE RIAK

• Security: cluster, distributed, very robust

• Good and predictable read / write performance

• The easiest to setup and administrate

• Advanced features (MapReduce, triggers, 2i, CRDTs …)

• Riak Search

• Multi Datacenter Replication

Page 48: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •
Page 49: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

CLUSTER

• Commodity hardware • All nodes serve data • Data replication

• Gossip between nodes • No master • distributed system

Ring of servers

Page 50: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •
Page 51: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

hash(key)

Page 52: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

KEY VALUE STORE

• Namespaces: bucket

• Values: opaque or CRDTs

Page 53: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

RIAK: ADVANCED FEATURES

• MapReduce

• Secondary indexes (2i)

• Riak Search

• Multi DataCenter Replication

Page 54: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

MULTI-BACKEND FOR STORAGE

• Bitcask

• Eleveldb

• Memory

Page 55: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

BACKEND: BITCASK

• Log-based storage backend

• Append-only files (AOF files)

• Advanced expiration

• Predictable performance (1 disk-seek max)

• Perfect for sequential data

Page 56: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

CLUSTER CONFIGURATION

Page 57: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

DISK SPACE NEEDED

• 8 days

• 100 GB per hour

• Replication 3

• 100 * 24 * 8 * 3

• Need 60 T

Page 58: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

HARDWARE

• 12 then 16 nodes (soon 24)

• 12 CPU cores ( Xeon 2.5Ghz)

• 192 GB RAM

• network 1 Gbit/s

• 8 TB (raid 6)

• Cluster total space: 128 TB

Page 59: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

DATA DESIGN

Page 60: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

web api dbs

ee

eeeee

eeeee

1 sec

events storage

1 blob per EPOCH / DC / CELL / TYPE / SUBTYPE 500 KB max chunks

Page 61: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

DATA

• Bucket name: “data“

• Key: “12345:1:cell0:WEB:app:chunk0“

• Value: List of events (Hashmaps), serialized & compressed

• 200 keys per seconds

Page 62: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

METADATA

• Bucket name: “metadata“

• Key: <epoch>-<dc> “1428415043-2“

• Value: list of data keys:[ “1428415043:1:cell0:WEB:app:chunk0“, “1428415043:1:cell0:WEB:app:chunk1“ … “1428415043:4:cell0:EMK::chunk3“ ]

• As pipe separated value (PSV)

Page 63: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

WRITE DATA

Page 64: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

PUSH DATA IN

• In each DC, in each cell, Loggers push to Riak

• Use ProtoBuf

• Every seconds:

• Push data values to Riak, async

• Wait for success

• Push metadata

Page 65: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

JAVA

Bucket DataBucket = riakClient.fetchBucket("data").execute(); DataBucket.store("12345:1:cell0:WEB:app:chunk0", Data1).execute(); DataBucket.store("12345:1:cell0:WEB:app:chunk1", Data2).execute(); DataBucket.store("12345:1:cell0:WEB:app:chunk2", Data3).execute();

Bucket MetaDataBucket = riakClient.fetchBucket("metadata").execute(); MetaDataBucket.store("12345-1", metaData).execute(); riakClient.shutdown();

Page 66: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

Perl

my $client = Riak::Client->new(…);

$client->put(data => '12345:1:cell0:WEB:app:chunk0', $data1); $client->put(data => '12345:1:cell0:WEB:app:chunk1', $data2); $client->put(data => '12345:1:cell0:WEB:app:chunk2', $data3);

$client->put(metadata => '12345-1', $metadata, 'text/plain' );

Page 67: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

READ DATA

Page 68: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

READ ONE SECOND

• For one second (a given epoch)

• Request metadata for <epoch>-DC

• Parse value

• Filter out unwanted types / subtypes

• Fetch the keys from the “data” bucket

Page 69: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

Perl

my $client = Riak::Client->new(…); my @array = split '\|', $client->get(metadata => '1428415043-1'); @filtered_array = grep { /WEB/ } @array; $client->get(data => $_) foreach @filtered_array;

Page 70: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

READ ONE SECOND

• For an interval epoch1 -> epoch2

• Generate the list of epochs

• Fetch in parallel

Page 71: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

RIAK CLUSTER

Page 72: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

CPU

Page 73: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

DISK IOPS

Page 74: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

DISK IO %

Page 75: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

DISK SPACE RECLAIMED

one day

Page 76: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

REAL TIME PROCESSING OUTSIDE OF RIAK

Page 77: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

STREAMING

• Fetch 1 second every second

• Or a range ( ex: last 10 min )

• Fetch all epochs from Riak

• Use data on the client side

Page 78: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

EXAMPLES

• Streaming => Graphite ( every sec )

• Streaming => Anomaly Detector ( last 2 min )

• Streaming => Experiment analysis ( last day )

• Every minute => Hadoop

• Manual request => test, debug, investigate

• Batch fetch => ad hoc analysis

• => Huge numbers of read requests

Page 79: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

events storage

graphite cluster

Anomaly detector

experimentcluster

hadoop cluster

mysql analysis

manual requests

50 MB/s

50 MB/s

50 M

B/s 50 MB/s

50 MB/s50 MB/s

Page 80: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

THIS IS REALTIME

• 1 second of data

• Stored in < 1 sec

• Available after < 1 sec

• Issue : network saturation

Page 81: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

REAL TIME PROCESSING INSIDE RIAK

Page 82: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

THE IDEA

• Instead of

• Fetch data,

• Crunch data (ex: average),

• Produce a small result

• Do

• Bring code to data

• Crunch data on Riak

• Fetch the result

Page 83: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

WHAT TAKES TIME

• Takes a lot of time

• Fetching data out: network issue

• Decompressing: CPU time issue

• Takes almost no time

• Crunching data

Page 84: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

MAPREDUCE

• Input: epoch-dc

• Map1: metadata keys => data keys

• Map2: data crunching

• Reduce: aggregate

• Realtime: OK

• network usage: OK

• CPU time: NOT OK

Page 85: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

HOOKS

• Every time metadata is written

• Post-Commit hook triggered

• Crunch data on the nodes

Page 86: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •
Page 87: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

Riak post-commit hook

REST serviceRIAK service

key keysocket

result sent for storage

decompressprocess all tasks

NODE HOST

Page 88: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

HOOK CODE

metadata_stored_hook(RiakObject) -> Key = riak_object:key(RiakObject), Bucket = riak_object:bucket(RiakObject), [ Epoch, DC ] = binary:split(Key, <<"-">>), MetaData = riak_object:get_value(RiakObject), DataKeys = binary:split(MetaData, <<"|">>, [ global ]), send_to_REST(Epoch, Hostname, DataKeys), ok.

Page 89: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

send_to_REST(Epoch, Hostname, DataKeys) -> Method = post, URL = "http://" ++ binary_to_list(Hostname) ++ ":5000?epoch=" ++ binary_to_list(Epoch), HTTPOptions = [ { timeout, 4000 } ], Options = [ { body_format, string }, { sync, false }, { receiver, fun(ReplyInfo) -> ok end } ], Body = iolist_to_binary(mochijson2:encode( DataKeys )), httpc:request(Method, {URL, [], "application/json", Body}, HTTPOptions, Options), ok.

Page 90: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

REST SERVICE

• In Perl, using PSGI (WSGI-like), Starman, preforks

• Allow to write data cruncher in Perl

• Also supports loading code on demand

Page 91: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

ADVANTAGES

• CPU usage and execution time can be capped

• Data is local to processing

• Two systems are decoupled

• REST service written in any language

• Data processing done all at once

• Data is decompressed only once

Page 92: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

DISADVANTAGES

• Only for incoming data (streaming), not old data

• Can’t easily use cross-second data

• What if the companion service goes down ?

Page 93: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

FUTURE

• Use this companion to generate optional small values

• Use Riak Search to index and search those

Page 94: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

THE BANDWIDTH PROBLEM

Page 95: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

• PUT - bad case

• n_val = 3

• inside usage = 3 x outside usage

Page 96: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

• PUT - good case

• n_val = 3

• inside usage = 2 x outside usage

Page 97: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

• GET - bad case

• inside usage = 3 x outside usage

Page 98: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

• GET - good case

• inside usage = 2 x outside usage

Page 99: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

• network usage ( PUT and GET ): • 3 x 13/16+ 2 x 3/16= 2.81 • plus gossip • inside network > 3 x outside network

Page 100: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

• Usually it’s not a problem • But in our case: • big values, constant PUTs, lots of GETs • sadly, only 1 Gbit/s

• => network bandwidth issue

Page 101: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

THE BANDWIDTH SOLUTIONS

Page 102: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

THE BANDWIDTH SOLUTIONS

1. Optimize GET for network usage, not speed 2. Don’t choose a node at random

Page 103: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

• GET - bad case

• n_val = 1

• inside usage = 1 x outside

Page 104: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

• GET - good case

• n_val = 1

• inside usage = 0 x outside

Page 105: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

WARNING

• Possible only because data is read-only

• Data has internal checksum

• No conflict possible

• Corruption detected

Page 106: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

RESULT

• practical network usage reduced by 2 !

Page 107: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

THE BANDWIDTH SOLUTIONS

1. Optimize GET for network usage, not speed 2. Don’t choose a node at random

Page 108: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

• bucket = “metadata” • key = “12345”

Page 109: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

• bucket = “metadata”

• key = “12345”

Hash = hashFunction(bucket + key)

RingStatus = getRingStatus

PrimaryNodes = Fun(Hash, RingStatus)

Page 110: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

hashFunction()

getRingStatus()

Page 111: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

hashFunction()

getRingStatus()

Page 112: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

THE IDEA

• Do the hashing on the client • By default:

• “chash_keyfun”:{"mod":"riak_core_util", "fun":"chash_std_keyfun"},

• look at the source on github • riak_core/src/chash.erl

Page 113: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

THE HASH FUNCTION

• Easy to re-implement client-side

• sha1 as a BigInt

• example: sha1(bucket + key) = 2450

Page 114: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

RING STATUS

• If hash = 2450 => nodes 16, 17, 18

$ curl -s -XGET http://$host:8098/riak_preflists/myring{ “0" :"[email protected]" “5708990770823839524233143877797980545530986496” :"[email protected]" “11417981541647679048466287755595961091061972992”:"[email protected]" “17126972312471518572699431633393941636592959488":“[email protected]” …etc…}

Page 115: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

WARNING

• Possible only if • Nodes list is monitored • In case of failed node, default to random • Data is requested in an uniform way

Page 116: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

RESULT

• Network usage even more reduced ! • Especially for GETs

Page 117: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

CONCLUSION

Page 118: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

CONCLUSION

• We used only Riak Open Source

• No training, self-taught, small team

• Riak is a great solution

• Robust, fast, scalable, easy

• Very flexible and hackable

• Helps us continue scaling

Page 119: Using Riak for Events storage and analysis at Bookinggotocon.com/.../slides/...EventsStorageAndAnalysisWithRiakAtBookin… · KEY FIGURES • 600,000 hotels • 212 countries •

Q&A@damsieboy