Buffering to Redis for Efficient Real-Time Processing Percona Live, April 24, 2018


Presenting Today

Jon Hyman, CTO & Co-Founder

Braze (Formerly Appboy)

@jon_hyman


Digital is the main reason just over half of Fortune 500 companies have disappeared since the year 2000

PIERRE NANTERME, CEO, ACCENTURE

[…] the roller coaster will be accelerating faster than ever, only this time it’ll be about actual experiences, with much less emphasis on the way those experiences get made

WALT MOSSBERG, AMERICAN JOURNALIST & FORMER RECODE EDITOR AT LARGE

Mobile is at the vanguard of a new wave of borderless engagement.

SOURCE: DIGITAL DISRUPTION HAS ONLY JUST BEGUN (DAVOS WORLD ECONOMIC FORUM), THE DISAPPEARING COMPUTER (RECODE)


Braze empowers you to humanize your brand-customer relationships at scale.

Tens of Billions of Messages Sent Monthly

More than 1 Billion MAU

Global Customer Presence on Six Continents


TOC

Quick Intro to Redis

Coordinating Customer Journeys with Redis

Buffering Analytics to Redis

Today


Quick Intro to Redis


What is Redis?

• Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs and geospatial indexes with radius queries. Redis has built-in replication, Lua scripting, LRU eviction, transactions and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.

• Braze uses all the data types from Redis

• In today’s talk we’ll look at sorted sets, sets, hashes, and strings


Redis data types

• Strings: key-value storage.

• Redis can atomically set a key only if it doesn’t already exist and give it an expiry in the same command

• You can use this to create a basic locking mechanism

SET key value NX EX 10

• Set “key” to “value” if it does not exist, and expire the key in 10 seconds

• Redis returns whether or not the set succeeded
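As a rough illustration of the locking idea above, here is a Python sketch. It assumes a redis-py style client, where `set(name, value, ex=..., nx=True)` returns `True` when the key was written and `None` when it already existed. The in-memory `FakeRedis` class is a hypothetical stand-in so the sketch runs without a server, and the key `lock:user123:step42` is made up.

```python
import time
import uuid


class FakeRedis:
    """Minimal in-memory stand-in so the sketch runs without a server.

    Mirrors redis-py's return convention for the one call used: set() returns
    True when the key was written and None when NX blocked the write.
    """

    def __init__(self, clock=time.monotonic):
        self._data = {}  # key -> (value, expires_at or None)
        self._clock = clock

    def _alive(self, name):
        entry = self._data.get(name)
        if entry is not None and entry[1] is not None and self._clock() >= entry[1]:
            del self._data[name]  # expire lazily on access
            entry = None
        return entry is not None

    def set(self, name, value, ex=None, nx=False):
        if nx and self._alive(name):
            return None  # NX: key exists, nothing written
        expires_at = None if ex is None else self._clock() + ex
        self._data[name] = (value, expires_at)
        return True


def try_lock(client, name, ttl_seconds=10):
    """SET name <token> NX EX ttl: return a token if we now hold the lock, else None."""
    token = uuid.uuid4().hex  # unique per holder, so a release can check ownership
    if client.set(name, token, ex=ttl_seconds, nx=True):
        return token
    return None


# Demo with a controllable clock (seconds).
now = [0.0]
r = FakeRedis(clock=lambda: now[0])
first = try_lock(r, "lock:user123:step42")   # acquired
second = try_lock(r, "lock:user123:step42")  # blocked: key still exists
now[0] = 10.0                                # the 10 s TTL has elapsed
third = try_lock(r, "lock:user123:step42")   # acquired again after expiry
```

Releasing such a lock safely (deleting it only if it still holds your token) is usually done with a small Lua script; that part is omitted here.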


Redis data types

• Sets: unordered collections of unique strings. Adding a member that is already present has no effect.

SADD key "a"

SADD key "b"

SADD key "a"

SMEMBERS key

[ "a", "b" ]   (order is not guaranteed)


Redis data types

• Hashes: maps from string fields to string values

HSET key foo bar

HSET key bar bang

HGETALL key

{"foo": "bar", "bar": "bang"}

• Hash fields that hold integers can also be incremented atomically

HINCRBY key baz 1

HINCRBY key baz 3

HGET key baz

"4"


Redis data types

• Sorted Sets: like sets, but each member also has a numeric score associated with it. Sorted sets are ordered by that score.

ZADD scores 100 alice

ZADD scores 80 bob

ZADD scores 110 carol

ZRANGEBYSCORE scores -inf +inf WITHSCORES

[ [bob, 80], [alice, 100], [carol, 110] ]

ZREVRANGEBYSCORE scores +inf -inf WITHSCORES

[ [carol, 110], [alice, 100], [bob, 80] ]


Coordinating Customer Journeys with Redis


Canvas

Allows customers to create multi-step, multi-message, multi-day customer journeys


Canvas

• Canvas is distributed and event-driven

• When messages are sent, we fire a “received campaign event”

• Processes listen for the “received campaign event” and determine whether it should schedule a new message

• If a new message should be scheduled, they enqueue a new job to send it


Using Redis as a Job Queue

• Jobs are added to a Redis sorted set, with the Unix timestamp at which they should run as the score and the job data as the member

• One new job is added per message

• Worker processes on each server poll the scheduled set with ZRANGEBYSCORE -INF <now> LIMIT 0 1, then race to ZREM the job; the one worker whose ZREM succeeds runs it

• ZRANGEBYSCORE -INF <now> LIMIT 0 1 is effectively O(1) because of how Redis implements sorted sets (the documented bound is O(log(N)+M), and M is 1 here)

• ZREM has O(log N) runtime

• For Canvas, we enqueued one job per branch

• When a job runs, the process determines whether its branch path is valid and grabs a lock to prevent the other branches from processing

• The lock takes the form of a SET NX EX operation
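The polling pattern above can be sketched in Python as follows. This assumes a redis-py style client (`zadd(name, mapping)`, `zrangebyscore(name, min, max, start=..., num=...)`, `zrem(name, *values)`); the in-memory `FakeRedis` class, the key `scheduled`, and the job strings are hypothetical stand-ins so the sketch runs without a server. The last few lines show why many pollers waste ZREMs: both peeks see the same member, but only one ZREM actually removes it.

```python
import time


class FakeRedis:
    """In-memory stand-in for the redis-py sorted-set calls used below (single process)."""

    def __init__(self):
        self._zsets = {}  # key -> {member: score}

    def zadd(self, name, mapping):
        zset = self._zsets.setdefault(name, {})
        added = sum(1 for member in mapping if member not in zset)
        zset.update(mapping)
        return added

    def zrangebyscore(self, name, min, max, start=None, num=None):
        lo, hi = float(min), float(max)
        # Redis orders by score, then lexicographically by member for equal scores.
        hits = sorted((s, m) for m, s in self._zsets.get(name, {}).items() if lo <= s <= hi)
        members = [m for _, m in hits]
        if start is not None and num is not None:
            members = members[start:start + num]
        return members

    def zrem(self, name, *values):
        zset = self._zsets.get(name, {})
        return sum(1 for v in values if zset.pop(v, None) is not None)


def schedule(client, key, job, run_at):
    """ZADD key run_at job: the score is the Unix time at which the job should run."""
    client.zadd(key, {job: run_at})


def claim_due_job(client, key, now=None):
    """Peek at the earliest due job, then race other workers to ZREM it."""
    now = time.time() if now is None else now
    due = client.zrangebyscore(key, "-inf", now, start=0, num=1)
    if not due:
        return None
    # ZREM reports how many members it removed; 0 means another worker won the race.
    return due[0] if client.zrem(key, due[0]) == 1 else None


r = FakeRedis()
schedule(r, "scheduled", "user123:branchA", run_at=100.0)
schedule(r, "scheduled", "user456:branchA", run_at=200.0)
claimed = claim_due_job(r, "scheduled", now=150.0)  # only the first job is due
nothing = claim_due_job(r, "scheduled", now=150.0)  # the other job is not due yet

# Two workers that peek at the same moment see the same job; only one ZREM succeeds.
schedule(r, "scheduled", "user789:branchB", run_at=120.0)
peek_a = r.zrangebyscore("scheduled", "-inf", 150.0, start=0, num=1)
peek_b = r.zrangebyscore("scheduled", "-inf", 150.0, start=0, num=1)
won_a, won_b = r.zrem("scheduled", peek_a[0]), r.zrem("scheduled", peek_b[0])
```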


Canvas

• This architecture worked great in staging, in beta, and for the first few months of the general release, and all was good

• Processing time depends on the number of branches a Canvas has and the number of users entering it

• In January 2017, one customer created a Canvas with 11 branches targeting more than 10 million users, set to run at 10am the next day

• Because of the Canvas architecture design, we had to process 110 million jobs (11 branches × 10 million users) right at 10am


What happened?


Thundering Herd: Enqueuing Jobs

• This particular Canvas created 110 million jobs, all scheduled to run at the same timestamp, 10am the next morning

• These jobs are stored in a sorted set, which workers poll to move jobs from the sorted set to job queues

• ZRANGEBYSCORE -INF <now> LIMIT 0 1 is effectively O(1) because of how Redis implements sorted sets (the documented bound is O(log(N)+M), and M is 1 here)

• ZREM has O(log N) runtime

• Every worker server’s ZRANGEBYSCORE returned a job (often the same one), but only one process could successfully ZREM each job

• Excessive ZREM operations slowed down Redis

• It took more than 40 minutes just to enqueue the jobs: at 10:35am we still hadn’t finished enqueuing the 10am jobs. This was now a customer-facing incident.


One user per job inefficiencies

• Each job was one {user, branch} pair

• Determining whether the user should go down that path involves querying database state and acquiring Redis locks

• 110 million round trips to each database to determine whether processing should continue

• It took more than 90 minutes to process the next steps


What did we do?


Fixing Canvas architectural issues

• Initial code design was inefficient: one job per {user, branch} pair. Each job needs access to database state, so we made a lot of extra database calls.

• Because messages tend to go to multiple users around the same time, we figured we could buffer them and have a single job process multiple users at once.


Use Redis sets as a buffer


Fixing one user per job inefficiencies

• When a “received campaign event” is fired, instead of enqueueing a new job to send a message, add the user to a set with key “buffer:STEP_ID:TIMESTAMP” (SADD creates the set if it does not exist).

• This lets users buffer up for the same timestamp.

• Periodically flush this set in batches of 100 users:

• Alongside each SADD, also do a SET NX EX on a key to determine whether we should enqueue a job, to run in 3 seconds, that will flush the set. Only the SET that succeeds enqueues the job.

• The job does an SPOP with a count of 100 to pop up to 100 users, and if the set is still non-empty it enqueues another job to continue flushing it.
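The buffering steps above can be sketched in Python. Assumptions: a redis-py style client (`sadd`, `spop(name, count)`, `scard`, `set(..., ex=..., nx=True)`), a plain list standing in for the delayed job queue, and a hypothetical `:flush_scheduled` suffix for the SET NX EX guard key. The in-memory `FakeRedis` does not model expiry; in Redis, the guard key expiring after 3 seconds is what lets a later event schedule the next flush.

```python
BATCH_SIZE = 100         # users per flush, as on the slide
FLUSH_DELAY_SECONDS = 3  # delay before the flush job runs, as on the slide


class FakeRedis:
    """In-memory stand-in for the redis-py calls used here. Key expiry is NOT modeled."""

    def __init__(self):
        self._sets = {}
        self._strings = {}

    def sadd(self, name, *values):
        members = self._sets.setdefault(name, set())
        before = len(members)
        members.update(values)
        return len(members) - before

    def spop(self, name, count=None):
        members = self._sets.get(name, set())
        if count is None:
            return members.pop() if members else None
        return [members.pop() for _ in range(min(count, len(members)))]

    def scard(self, name):
        return len(self._sets.get(name, ()))

    def set(self, name, value, ex=None, nx=False):
        if nx and name in self._strings:
            return None  # NX: key already exists, nothing written
        self._strings[name] = value  # `ex` accepted for parity; expiry not simulated
        return True


def buffer_user(client, jobs, step_id, timestamp, user_id):
    """Handle a 'received campaign event' by buffering the user instead of enqueuing a job."""
    key = f"buffer:{step_id}:{timestamp}"
    client.sadd(key, user_id)
    # Only the first SET NX in the window succeeds, so only one flush job gets scheduled.
    if client.set(f"{key}:flush_scheduled", 1, ex=FLUSH_DELAY_SECONDS, nx=True):
        jobs.append((FLUSH_DELAY_SECONDS, key))


def flush_buffer(client, jobs, key, process_batch):
    """Flush job: SPOP up to BATCH_SIZE users, process them together, re-enqueue if any remain."""
    batch = client.spop(key, BATCH_SIZE)
    if batch:
        process_batch(batch)
    if client.scard(key) > 0:
        jobs.append((0, key))


r = FakeRedis()
jobs = []     # stands in for the delayed job queue: (delay_seconds, buffer_key)
batches = []  # each call to process_batch records its batch here
for uid in range(250):
    buffer_user(r, jobs, "step42", 1524564000, f"user{uid}")
flush_jobs_scheduled = len(jobs)
while jobs:
    _delay, key = jobs.pop(0)  # a real queue would honor the delay
    flush_buffer(r, jobs, key, batches.append)
```

With 250 buffered users this schedules a single flush job, which then drains the set in batches of 100, 100, and 50.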


Fixing the thundering herd

• Added random microsecond jitter to every job’s score in the sorted set, splitting each second into a million pieces

• Existing code used ZRANGEBYSCORE -INF <now> LIMIT 0 1 to consume from the left side of the sorted set

• Also consume from the right side with ZREVRANGEBYSCORE <now> -INF LIMIT 0 1

• Also consume from the middle:

• Keep track of how far backlogged we are in the set

• Randomly add jitter or whole seconds to the starting score to move along the set and consume from the middle
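One way to read the jitter and consume-from-the-middle ideas is sketched below. It is a simulation under stated assumptions, not Braze's code: a sorted Python list stands in for the sorted set, `bisect` plays the role of the ZRANGEBYSCORE/ZREVRANGEBYSCORE lookups named in the comments, and the even split between left, right, and middle polls is an invented heuristic. With jitter, 10,000 jobs scheduled for the same second get almost all distinct scores (a double still resolves microseconds at current Unix timestamps), so workers starting from different points rarely contend for the same member.

```python
import bisect
import random

MICROS_PER_SECOND = 1_000_000


def jittered_score(run_at, rng):
    """ZADD score: the scheduled second plus a random microsecond offset."""
    return run_at + rng.randrange(MICROS_PER_SECOND) / MICROS_PER_SECOND


def pick_candidate(scores, now, rng):
    """Return the score of the job one worker would try to ZREM.

    `scores` is a sorted list standing in for the sorted set. The even
    left/right/middle split is an illustrative assumption, not Braze's heuristic.
    """
    due_end = bisect.bisect_right(scores, now)  # scores[:due_end] are due
    if due_end == 0:
        return None
    roll = rng.random()
    if roll < 1 / 3:  # left edge: ZRANGEBYSCORE key -inf now LIMIT 0 1
        return scores[0]
    if roll < 2 / 3:  # right edge: ZREVRANGEBYSCORE key now -inf LIMIT 0 1
        return scores[due_end - 1]
    # Middle: start from a random point in the backlog [oldest due, now].
    start = scores[0] + rng.random() * (now - scores[0])
    i = bisect.bisect_left(scores, start, 0, due_end)  # ZRANGEBYSCORE key start now LIMIT 0 1
    return scores[i] if i < due_end else scores[due_end - 1]


rng = random.Random(42)
run_at = 1_524_564_000  # every job scheduled for the same second
scores = sorted(jittered_score(run_at, rng) for _ in range(10_000))
picks = {pick_candidate(scores, run_at + 1, rng) for _ in range(12)}  # 12 polling workers
```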


Results of architectural changes

• Saved more than 50 gigabytes of RAM for the original Canvas

• Instead of 110 million jobs, we enqueued only about 1.4 million jobs

• Instead of more than 40 minutes to enqueue from the sorted set, all jobs were enqueued within a few seconds

• Next steps of the canvas processed in about 14 minutes, down from 90 minutes.


We adapted buffering in other places, such as our REST API


REST API Buffering

• Braze has REST APIs to ingest user attribute data, event data and purchases

• Application servers query user state while processing, so it is more efficient to make batched round trips to databases

• We encourage customers to batch data, but some integrations make 1 API call per data point

Less efficient: 2 round trips to query state

POST /users/track

{"attributes": [{"user_id": "123", "first_name": "Alice"}]}

POST /users/track

{"attributes": [{"user_id": "456", "first_name": "Bob"}]}

More efficient: 1 round trip to query state

POST /users/track

{"attributes": [{"user_id": "123", "first_name": "Alice"}, {"user_id": "456", "first_name": "Bob"}]}


• We use the same pattern and SADD data to a Redis set and flush it every second

• This lets us buffer multiple API calls and process them together
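A small Python sketch of the REST variant, under assumptions: each attribute object is serialized to a JSON string before SADD, and the flusher rebuilds the more efficient batched body from the example request. `to_buffer_members` and `merge_buffered` are hypothetical helpers, and a Python `set` stands in for the Redis set. One property of using a set shows up immediately: identical payloads collapse into a single member, which is fine for repeated attribute updates but worth keeping in mind for data where repeats are meaningful.

```python
import json


def to_buffer_members(request_body):
    """Turn one POST /users/track body into set members, one JSON string per attribute object.

    With redis-py this would be followed by client.sadd(buffer_key, *members).
    sort_keys=True makes equal objects serialize to the same string.
    """
    return [json.dumps(obj, sort_keys=True) for obj in request_body.get("attributes", [])]


def merge_buffered(members):
    """Rebuild one batched request body from members popped off the buffer (e.g. with SPOP)."""
    attributes = [json.loads(m) for m in members]
    attributes.sort(key=lambda obj: obj.get("user_id", ""))  # sets are unordered; sort for stable output
    return {"attributes": attributes}


alice = {"attributes": [{"user_id": "123", "first_name": "Alice"}]}
bob = {"attributes": [{"user_id": "456", "first_name": "Bob"}]}

buffer = set()  # stands in for the Redis set
for body in (alice, bob, alice):  # the repeated Alice call maps to an identical member
    buffer.update(to_buffer_members(body))
batch = merge_buffered(buffer)
```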


Improving Writes for Time Series Analytics


We collect a lot of time series analytics


Time series analytics are stored in MongoDB

Ranged (non-hashed) MongoDB sharding divides data into contiguous ranges of shard key values and distributes those ranges across shards


Time series data is easy to pre-aggregate

{ app_id: "www.braze.com", date: 2018-04-24, name: "website_visits", 6: 120, 7: 541, 8: 1200, 9: 800, … }

Each numeric field is an hour of the day, and its value is that hour’s count.
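To make the pre-aggregation concrete, here is a Python sketch that folds raw events into documents of the shape above. The event tuple format, the UTC day boundaries, and the function names are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def pre_aggregate(events):
    """Fold raw (app_id, name, unix_ts) events into one document per app, event name, and day.

    Numeric hour-of-day fields hold counts, mirroring the document on the slide.
    Bucketing by UTC is an assumption of this sketch.
    """
    hours_by_doc = defaultdict(lambda: defaultdict(int))
    for app_id, name, ts in events:
        moment = datetime.fromtimestamp(ts, tz=timezone.utc)
        hours_by_doc[(app_id, moment.date().isoformat(), name)][str(moment.hour)] += 1
    return [
        {"app_id": app_id, "date": date, "name": name, **hours}
        for (app_id, date, name), hours in hours_by_doc.items()
    ]


def at(hour, minute):
    """Unix timestamp for 2018-04-24 at the given UTC time (demo helper)."""
    return datetime(2018, 4, 24, hour, minute, tzinfo=timezone.utc).timestamp()


events = [
    ("www.braze.com", "website_visits", at(8, 15)),
    ("www.braze.com", "website_visits", at(8, 45)),
    ("www.braze.com", "website_visits", at(9, 5)),
]
docs = pre_aggregate(events)
```

Three raw events become a single document with `"8": 2` and `"9": 1`, so the database sees one document per app, event name, and day rather than one row per event.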


Shard on {app_id:1, name:1, date:1}


One document per app, per event name, per day


What happens when more events come in at once than one shard can handle?


🙈🔥😭


Treat Redis hashes as if they were MongoDB sub-documents


MongoDB document:

{ app_id: "www.braze.com", date: 2018-04-23, name: "website_visits", 6: 120, 7: 541, 8: 1200, 9: 800 }

Redis: use a hash keyed on the shard key fields, where the hash fields are hours and the values are the amounts to increment by


HINCRBY "www.braze.com|2018-04-23|website_visits" 8 1

SADD "buffered" "www.braze.com|2018-04-23|website_visits"
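A write-path sketch for the two commands above, in Python, assuming a redis-py style client that returns strings (as with `decode_responses=True`). `FakeRedis`, `buffer_key`, and `record_event` are hypothetical names; the key layout `app|date|name` follows the slide. With a real client, the two calls could share one round trip through a redis-py `pipeline()`.

```python
class FakeRedis:
    """In-memory stand-in for redis-py's hincrby/sadd/hgetall/smembers (string results)."""

    def __init__(self):
        self.hashes = {}
        self.sets = {}

    def hincrby(self, name, key, amount=1):
        fields = self.hashes.setdefault(name, {})
        fields[key] = str(int(fields.get(key, "0")) + amount)  # Redis stores hash values as strings
        return int(fields[key])

    def sadd(self, name, *values):
        members = self.sets.setdefault(name, set())
        before = len(members)
        members.update(values)
        return len(members) - before

    def hgetall(self, name):
        return dict(self.hashes.get(name, {}))

    def smembers(self, name):
        return set(self.sets.get(name, set()))


def buffer_key(app_id, date, name):
    """Serialize the MongoDB shard key fields into one Redis key, as on the slide."""
    return f"{app_id}|{date}|{name}"


def record_event(client, app_id, date, name, hour, amount=1):
    """HINCRBY the hour field, then remember the key in the 'buffered' set for the flusher."""
    key = buffer_key(app_id, date, name)
    client.hincrby(key, str(hour), amount)
    client.sadd("buffered", key)


r = FakeRedis()
for _ in range(3):
    record_event(r, "www.braze.com", "2018-04-23", "website_visits", hour=8)
record_event(r, "www.braze.com", "2018-04-23", "website_visits", hour=9)
```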


Periodically flush from Redis to MongoDB just like we do with Canvas sets


Flush buffer from Redis to MongoDB

keys = redis.smembers("buffered")
increment_hashes = redis.multi do |tx|
  keys.each { |key| tx.hgetall(key) }
  tx.srem("buffered", keys)
  keys.each { |key| tx.del(key) }
end

keys.each_with_index do |key, i|
  app_id, name, date = deserialize(key)
  increments = increment_hashes[i].transform_values(&:to_i) # Redis returns hash values as strings
  db[:my_timeseries].find(app_id: app_id, name: name, date: date).update_one("$inc" => increments)
end

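* This example algorithm is vulnerable to data loss (for example, if the process dies after the MULTI but before the MongoDB updates finish, those increments are gone); do not use it directly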


We do this with 12 Redis servers to spread out the writes destined for a single MongoDB document

The same hash key can be buffered on each Redis server and flushed independently
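The multi-server idea can be simulated in a few lines of Python. Each `Counter` stands in for one Redis server's pending hash increments and `mongo` stands in for the collection; routing each increment to a random server is an assumption, since the talk does not say how writes are distributed. Because `$inc` is additive, the servers can flush in any order and the document still ends up with the full total, while receiving at most one write per server per flush instead of one per event.

```python
import random
from collections import Counter

NUM_REDIS = 12  # number of Redis servers mentioned on the slide


def record(redis_buffers, key, hour, rng, amount=1):
    """HINCRBY on a randomly chosen Redis server (random routing is an assumption)."""
    rng.choice(redis_buffers)[(key, hour)] += amount


def flush(buffer, mongo):
    """Apply one server's pending counts as $inc-style additions, then clear it."""
    for (key, hour), amount in buffer.items():
        mongo.setdefault(key, Counter())[hour] += amount  # additive, so flush order does not matter
    buffer.clear()


rng = random.Random(0)
redis_buffers = [Counter() for _ in range(NUM_REDIS)]  # one Counter per Redis server
mongo = {}  # stands in for the collection: key -> {hour: count}
key = "www.braze.com|2018-04-23|website_visits"
for _ in range(1_000):
    record(redis_buffers, key, "8", rng)
mongo_writes = sum(1 for b in redis_buffers if b)  # one $inc per server holding data
for buffer in rng.sample(redis_buffers, len(redis_buffers)):  # flush independently, any order
    flush(buffer, mongo)
```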


Scale

• We’re doing over 1 million ops per second to Redis

• That’s 1 million writes to Mongo deferred per second

• Mongo flush rate is approximately 7k writes per second

• Redis is handling 142x more writes per second than Mongo for analytics


Summary

• When processing a flurry of events, holding and batching them can improve throughput

• Redis’ multiple data types can be used for buffering

• Braze uses sets to buffer streams of data to process in bulk

• Add with SADD, remove with SPOP

• Reduces database roundtrips and storage costs

• Braze uses hashes to buffer time series analytics using HINCRBY


Thank you! We are hiring!

braze.com/careers



Rate My Session
