DBS302: Driving a Realtime Personalization Engine with Cloud Bigtable Calvin French-Owen, Co-Founder & CTO, Segment

Page 1:

DBS302: Driving a Realtime Personalization Engine with Cloud Bigtable

Calvin French-Owen, Co-Founder & CTO, Segment

Page 2:

You’re making a hard choice...

Page 3:

Our roadmap

- A bit of background
- Personas architecture
- BigQuery + Cloud Bigtable
- Making hard choices

Page 4:

A bit of background

Page 7:

Segment by the numbers

- 19,000 users
- 300B monthly events
- 450B outbound API calls
- Terabytes of data per day

Page 8:

Under the hood...

Page 9:

[Diagram: API, Kafka, Consumer, and DB, with outbound calls to api.google.com, api.salesforce.com, api.intercom.io, and api.mixpanel.com]

Page 10:

The biggest advantage of this system

Page 11:

The biggest advantage of this system

It’s stateless

Page 12:

[Diagram: API, Kafka, Consumer, and DB, with outbound calls to api.google.com, api.salesforce.com, api.intercom.io, and api.mixpanel.com; a second API, Kafka, and Consumer set is added]

Page 13:

In 2018... we started getting a new set of requirements

Page 15:

Personas brought some decidedly stateful use cases

Page 16:

The use cases of Personas

Page 17:

1) Profile API

Page 20:

2) Identity resolution

Page 22:

3) Audience computation

Page 24:

Personas

- Query profiles in real-time
- Match users by identity
- Create audiences of users

Page 25:

Personas architecture

Page 26:

Let’s first talk about lambda architectures...

Page 28:

Lambda architecture

- Data is sent to the batch and speed layers
- Batch layer runs bigger computations
- Speed layer serves real-time updates (+ diffs)
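The split between the two layers can be sketched as follows. This is a minimal illustration, not Segment's implementation: the event shape, the per-user count, and the function names are all assumptions made for the example.

```python
from collections import Counter

def batch_view(events):
    """Batch layer: recompute counts from the full event history."""
    return Counter(e["user"] for e in events)

def speed_view(recent_events):
    """Speed layer: counts for events not yet covered by a batch run (the diff)."""
    return Counter(e["user"] for e in recent_events)

def query(user, batch, speed):
    """Serve a read by combining the batch result with the real-time diff."""
    return batch[user] + speed[user]

# Hypothetical data: three historical events, one that arrived after the batch run.
history = [{"user": "a"}, {"user": "b"}, {"user": "a"}]
recent = [{"user": "a"}]
batch = batch_view(history)
speed = speed_view(recent)
```

Here `query("a", batch, speed)` combines two historical events with one recent event.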

Page 29:

Personas

- Query profiles in real-time (speed)
- Match users by identity (speed)
- Create audiences of users (batch)

Page 30:

Different pipelines, different datastores

Page 31:

[Diagram: Kafka and Pub/Sub feeding two Workers, one for BigQuery (batch) and one for Cloud Bigtable (speed)]

Page 32:

[Diagram: Kafka and Pub/Sub feeding two Workers, one for BigQuery (batch) and one for Cloud Bigtable (speed)]

Page 33:

Kafka -> Pub/Sub

Page 34:

Segment messages

- Tracking things like pageviews, user events, etc.
- Semi-structured JSON
- Typically ~1 KB
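As a rough illustration of such a message, the sketch below builds a small JSON event and measures its encoded size. The field names follow the general shape of a Segment `track` call, but the event name, the property values, and the `context` contents are invented here.

```python
import json

# Hypothetical event; all values are invented for illustration.
message = {
    "type": "track",
    "event": "Order Completed",
    "userId": "user_123",
    "timestamp": "2019-04-10T17:00:00Z",
    "properties": {
        "total": 42.5,
        "currency": "USD",
        "products": [{"sku": "A-1", "quantity": 2}],
    },
    "context": {"page": {"path": "/checkout"}},
}

# Serialize the way it would travel over the wire and measure it.
encoded = json.dumps(message).encode("utf-8")
size_bytes = len(encoded)
```

This toy message is well under 1 KB; real messages presumably carry more context fields.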

Page 36:

Personas architecture

- Hundreds of thousands of 1 KB messages
- Published from Kafka to Cloud Pub/Sub
- Writes data twice: once for real-time, once for batch
- Audience computation in BigQuery
- Real-time reads in Cloud Bigtable
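The "writes data twice" step can be sketched with in-memory stand-ins. The `Sink` class and `fan_out` function are hypothetical helpers for this example, not Pub/Sub, BigQuery, or Bigtable client calls.

```python
class Sink:
    """In-memory stand-in for a datastore (hypothetical, not a client library)."""

    def __init__(self, name):
        self.name = name
        self.rows = []

    def write(self, message):
        self.rows.append(message)

def fan_out(messages, sinks):
    """Deliver every message to every sink, as a topic with two subscribers would."""
    for message in messages:
        for sink in sinks:
            sink.write(message)

batch_sink = Sink("bigquery-batch")   # feeds audience computation
speed_sink = Sink("bigtable-speed")   # feeds real-time reads
fan_out([{"userId": "u1", "event": "Page Viewed"}], [batch_sink, speed_sink])
```

Each sink ends up with its own copy of the message, so the batch and speed paths can be scaled and queried independently.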

Page 37:

BigQuery + Cloud Bigtable

Page 38:

BigQuery + Cloud Bigtable

- Use case
- Architecture
- Data model
- Query patterns

Page 39:

BigQuery: Use case

Page 40:

[Diagram: Kafka, Pub/Sub, two Workers, BigQuery, Cloud Bigtable, and a compute service]

Page 41:

BigQuery: Use case

- Want to find users who meet arbitrary criteria
- Terabytes of data within a few minutes
- Tables have billions of rows
- We rarely care about all of the columns
- Real-time reads are not a big deal
- Tens of concurrent queries

[Diagram: Pub/Sub, two Workers, BigQuery, Cloud Bigtable, and a compute service]

Page 42:

BigQuery: Architecture

Page 43:

2004: MapReduce

Page 44:

2010: Dremel (built in 2006)

Page 45:

BigQuery: architecture

- Designed to interactively query datasets (seconds to minutes)
- Nested, structured data
- Uses SQL, no programming
- Private version: Dremel

Page 46:

BigQuery Architecture: four good ideas

Page 47:

BigQuery idea #1: Column-oriented

Page 48:

Suppose we want to build a database...

Page 49:

A row-oriented database

Page 50:

What if my database has billions of rows...

...and I only need location?

Page 51:

What if my database has billions of rows...

...and I only need location?

Store columns, not rows!

Page 52:

What if we invert the rows?
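The row-versus-column idea from the preceding slides can be sketched in a few lines. Only the `location` field comes from the slides; the other fields and all values are made up, and plain Python lists stand in for on-disk storage.

```python
# Row-oriented layout: one record per user (hypothetical data).
rows = [
    {"name": "alice", "age": 31, "location": "SF"},
    {"name": "bob", "age": 27, "location": "NYC"},
    {"name": "carol", "age": 45, "location": "SF"},
]

def to_columns(rows):
    """Invert a list of row dicts into one list per column."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns

columns = to_columns(rows)

# Row-oriented: every full row is touched to read one field.
locations_from_rows = [row["location"] for row in rows]
# Column-oriented: only the `location` column is read.
locations_from_columns = columns["location"]
```

Both reads return the same values; the difference is how much unrelated data has to be scanned to get them.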

Page 55:

BigQuery idea #2: Compression

Page 56:

BigQuery idea #2: Compression

Page 57:

Columns on disk

- We have a lot of repeated data
- Run-length encoding (RLE)
- Let’s compress it...

Page 58:

Columns on disk

- We have a lot of repeated data
- Run-length encoding (RLE)
- Let’s compress it...
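A minimal sketch of run-length encoding on a single column, assuming the column is an in-memory list of strings (the sample values are invented):

```python
from itertools import groupby

def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]

column = ["SF", "SF", "SF", "NYC", "NYC", "SF"]
encoded = rle_encode(column)
```

RLE pays off most when equal values sit next to each other, which is more likely within a single column than across the mixed fields of a row.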

Page 59:

BigQuery idea #3: Efficient nested decoding

Page 60:

BigQuery idea #3: Efficient nested decoding

Page 63:

What happens when I select *?

Page 64:

FSM (finite state machine)

Page 65:

BigQuery idea #4: More servers, more efficiency

Page 66:

BigQuery idea #4: More servers, more efficiency

Page 68:

[Diagram: a query arriving at a Root node above nine "leaflet" nodes]

Page 69:

[Diagram: a query arriving at a Root node above nine "leaflet" nodes, with one "MERGE!" label]

Page 70:

[Diagram: a query arriving at a Root node, three "Level 1" nodes, and nine "leaflet" nodes]

Page 71:

[Diagram: a query arriving at a Root node, three "Level 1" nodes, and nine "leaflet" nodes, with four "MERGE!" labels]

Page 72:

More servers == more distributed work
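The tree in the preceding diagrams can be sketched as a fan-out followed by level-by-level merges. This is a toy, sequential model under stated assumptions: shards are plain lists, the query is a filtered count, and the fanout of three mirrors the diagram rather than any real configuration. In a real system the leaves would run in parallel.

```python
def leaf_scan(shard, predicate):
    """Leaf: scan one shard and return a partial count."""
    return sum(1 for row in shard if predicate(row))

def merge(partials):
    """Intermediate or root node: combine partial results from its children."""
    return sum(partials)

def run_query(shards, predicate, fanout=3):
    """Fan a query out to the leaves, then merge level by level up to the root."""
    partials = [leaf_scan(shard, predicate) for shard in shards]
    while len(partials) > 1:
        partials = [merge(partials[i:i + fanout])
                    for i in range(0, len(partials), fanout)]
    return partials[0] if partials else 0

# Nine leaf shards holding the integers 0..26, three per shard.
shards = [[i, i + 1, i + 2] for i in range(0, 27, 3)]
total_even = run_query(shards, lambda n: n % 2 == 0)
```

With nine leaves and a fanout of three, the merge runs once at each of three intermediate nodes and once at the root, so no single node has to combine all nine partial results.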

Page 73:

BigQuery’s good ideas

1. Column-oriented
2. Compression
3. Fast, nested data encoding
4. Distribute the work (separate data + compute)

Page 74:

BigQuery: Data model

Page 76:

We want to take user-supplied criteria…

…and turn it into query parameters

Page 77:

[Diagram: UI -> JSON]

Page 78:

[Diagram: JSON -> SQL]
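One way to turn user-supplied criteria into a query is to store them as a small JSON tree and compile that tree into a parameterized WHERE clause. The sketch below is an assumption about how such a step could look, not Segment's code: the node shape (`op`, `children`, `column`, `value`), the column names, and the `@p0`-style parameter names are all hypothetical.

```python
def to_sql(node, params):
    """Compile a criteria tree (nested dicts) into a parameterized WHERE clause."""
    op = node["op"]
    if op in ("and", "or"):
        parts = [to_sql(child, params) for child in node["children"]]
        return "(" + f" {op.upper()} ".join(parts) + ")"
    if op in ("=", "!=", ">", "<", ">=", "<="):
        column = node["column"]
        # Column names cannot be bound as parameters, so validate them instead.
        if not column.isidentifier():
            raise ValueError(f"bad column name: {column!r}")
        # Values are never spliced into the SQL text; they are bound by name.
        params.append(node["value"])
        return f"{column} {op} @p{len(params) - 1}"
    raise ValueError(f"unsupported op: {op!r}")

# Hypothetical criteria as they might arrive from a UI.
criteria = {
    "op": "and",
    "children": [
        {"op": "=", "column": "plan", "value": "enterprise"},
        {"op": ">", "column": "order_count", "value": 3},
    ],
}
params = []
where = to_sql(criteria, params)
```

Keeping the values in `params` rather than in the SQL string means user input only ever reaches the query as bound parameters.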

Page 79:

BigQuery: Data Model

- Dataset per customer
- Table per {collection,event}
- Additional tables for traits, identity, merges

Page 80:

BigQuery: Data Model

- Dataset per customer
- Table per {collection,event}
- Additional tables for traits, identity, merges
- Repeated fields for external_ids

Page 81:

BigQuery: Data Model

- Dataset per customer
- Table per {collection,event}
- Additional tables for traits, identity, merges
- Repeated fields for external_ids
- Explode arbitrary nested properties
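The last two bullets can be illustrated with a small sketch: `external_ids` stays a list of records (the shape a repeated field would hold), while nested properties are flattened into one column per leaf. The underscore naming scheme, the field names inside `external_ids`, and the sample event are assumptions for this example.

```python
def explode(properties, prefix=""):
    """Flatten nested properties into one column per leaf, e.g. cart.total -> cart_total."""
    flat = {}
    for key, value in properties.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(explode(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat

# Hypothetical event with a repeated identity field and nested properties.
event = {
    "external_ids": [
        {"type": "email", "id": "a@example.com"},
        {"type": "user_id", "id": "u1"},
    ],
    "properties": {"cart": {"total": 42.5, "currency": "USD"}, "coupon": "SPRING"},
}

# One table row: the repeated field is kept as a list, properties become columns.
row = {"external_ids": event["external_ids"], **explode(event["properties"])}
```

Flattening properties into columns fits the column-oriented storage described earlier, since a query that filters on one property only needs to read that column.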

Page 82:

BigQuery: Query patterns

Page 83:

[Diagram: Kafka, Pub/Sub, two Workers, BigQuery, Cloud Bigtable, and a compute service]

Page 84:

Compute service runs queries every minute

Page 85:

Scan gigabytes in seconds

Page 87:

2 GB/s scanned (170 TB/day)

800 slots

Page 88:

Batch computations in BigQuery

- Tens of concurrent queries
- Scans terabytes of data independently
- Partitioned by customer
- Query by arrays of external_ids
- Stored AST as JSON and converted to SQL

Page 89:

Cloud Bigtable: Use case

Page 91:

[Diagram: Kafka, Pub/Sub, two Workers, BigQuery, Cloud Bigtable, and a profile API]

Page 92:

Cloud Bigtable: Use case

- Small amounts of data (KB to MB)
- Able to be indexed for a single user
- A high read and write rate (tens of thousands of QPS)
- Data should be reflected in real-time

Page 93:

Cloud Bigtable: Use case

- Small amounts of data (KB to MB)
- Able to be indexed for a single user
- A high read and write rate (tens of thousands of QPS)
- Data should be reflected in real-time

(Not a new idea)

Page 94:

Cloud Bigtable: Architecture

Page 95:

Bigtable (published in 2006)

Page 96:

Cloud Bigtable: Architecture

[Diagram: Client, two BT Nodes each with a memtable, four GFS Tablets]

Page 97:

Cloud Bigtable: Architecture

[Diagram: Client, two BT Nodes each with a memtable, four GFS Tablets; label: write: <k, v>]

Page 98:

Cloud Bigtable: Architecture

[Diagram: Client, two BT Nodes each with a memtable, four GFS Tablets; labels: write: <k, v>, memtable.append(k, v)]

Page 99:

Cloud Bigtable: Architecture

[Diagram: Client, two BT Nodes each with a memtable, four GFS Tablets; labels: write: <k, v>, memtable.append(k, v), append(k, v)]

Page 100:

Writes are fast appends

Page 101:

Cloud Bigtable: Architecture

[Diagram: Client, two BT Nodes each with a memtable, four GFS Tablets; label: read(k)]

Page 102:

Cloud Bigtable: Architecture

[Diagram: Client, two BT Nodes each with a memtable, four GFS Tablets; labels: read(k), memtable[k]]

Page 103:

Cloud Bigtable: Architecture

[Diagram: Client, two BT Nodes each with a memtable, four GFS Tablets; labels: <value>, memtable[k]]

Page 104:

Cloud Bigtable: Architecture

[Diagram: Client, two BT Nodes each with a memtable, four GFS Tablets; labels: read(k), fetch(offset)]

Page 105:

Cloud Bigtable: Architecture

[Diagram: Client, two BT Nodes each with a memtable, four GFS Tablets; labels: <value>, <data>]

Page 106:

Reads check the cache first, then merge
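The write and read paths from the preceding diagrams can be sketched as a toy log-structured store. This is a simplified model, not Bigtable's implementation: the class name, the tiny flush threshold, and the use of in-memory tuples as stand-ins for immutable files on GFS are assumptions, and the commit log that a real system would also write is left out.

```python
class TinyLSM:
    """Toy log-structured store: an in-memory memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []  # newest run last; each run is an immutable sorted tuple
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        # Writes only touch memory; nothing already flushed is modified in place.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into a sorted, immutable run and start a new one.
        self.runs.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        # Check the memtable first, then fall back to runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            for k, v in run:
                if k == key:
                    return v
        return None

store = TinyLSM()
store.write("user:1", "v1")
store.write("user:2", "v1")  # reaches the limit and triggers a flush
store.write("user:1", "v2")  # newer value stays in the memtable
```

Reading `user:1` returns the newer value from the memtable, while `user:2` is served from the flushed run, which is why recent data is cheap to read and older data costs a lookup in storage.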

Page 107:

What about failures?

Page 108:

Cloud Bigtable: Architecture

[Diagram: Client, two BT Nodes each with a memtable, four GFS Tablets; labels: read(k), fetch(offset)]

Page 109:

Cloud Bigtable: Architecture

[Diagram: Client, two BT Nodes each with a memtable, four GFS Tablets; labels: read(k), fetch(offset)]

Page 110:

Cloud Bigtable: Architecture

[Diagram: Client, two BT Nodes each with a memtable, four GFS Tablets; labels: read(k), fetch(offset)]

Page 111:

Cloud Bigtable: Architecture

- Multi-tenant
- Row-oriented
- Log-structured merge tree
- Immutable, with in-memory caching
- Bloom filters save on reads
- Lock service maps nodes to keyspace
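To show why a Bloom filter saves reads, here is a minimal sketch. The bit-array size, the number of hashes, and the use of SHA-256 to derive positions are choices made for this example only; they are not taken from Bigtable.

```python
import hashlib

class BloomFilter:
    """Compact set summary: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        # Derive several bit positions per key by salting the hash input.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means the key is definitely absent, so the file read can be skipped.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

bloom = BloomFilter()
for key in ["user:1", "user:2"]:
    bloom.add(key)
```

A store would keep one such filter per immutable file and consult it before touching the file; a negative answer avoids the read entirely, while a positive answer still requires checking the file.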

Page 112:

Cloud Bigtable: Data model

Page 113:

Cloud Bigtable: Data Model

- Separate tables for different datatypes
  - Records
  - Properties
  - Events
- Keys are ID and time-ordered
- Values are Snappy-encoded

Page 114:

Cloud Bigtable: Data Model

- Records provide metadata to stitch together the full record

- User properties power the profile API

- Events are sorted to query the last range of events
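A key that combines an ID with a time component lets a prefix scan return one user's most recent events. The slides do not give the exact key layout, so the `#` separator, the reversed millisecond timestamp, and the zero-padded width below are assumptions chosen to make newest-first ordering work under lexicographic sorting.

```python
MAX_TS = 10**13 - 1  # largest 13-digit millisecond timestamp (assumed bound)

def event_key(user_id, ts_millis):
    """Row key that sorts a user's newest events first under string ordering."""
    reversed_ts = MAX_TS - ts_millis
    return f"{user_id}#{reversed_ts:013d}"

def latest_events(sorted_keys, user_id, limit):
    """Emulate a prefix scan over lexicographically sorted row keys."""
    prefix = f"{user_id}#"
    return [k for k in sorted_keys if k.startswith(prefix)][:limit]

# Hypothetical events for two users, stored in sorted key order.
keys = sorted(
    event_key(user, ts)
    for user, ts in [("u1", 1000), ("u1", 3000), ("u2", 2000), ("u1", 2000)]
)
newest_two = latest_events(keys, "u1", 2)
```

Because all keys for one user share a prefix and sort by time, reading the last range of events is a short contiguous scan rather than a lookup per event.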

Page 115:

Cloud Bigtable + BigQuery: In production

Page 116:

In production

- Cloud Bigtable
  - 55,000 rows written per second
  - 175,000 rows read per second
  - 10 TB of data
  - 16 nodes
- BigQuery
  - Hundreds of queries per minute
  - Scanning hundreds of GB/minute
  - 500 TB worth of data stored

Page 117:

Back to that hard choice...

Page 118:

BigQuery is hard to compare

Page 119:

A few places Cloud Bigtable shines

Page 120:

1. Identification of hot keys

Page 122:

2. Write-heavy workloads

Page 123:

Split compute

- Compute is separated from storage

- Writes can be spread across many nodes

Page 124:

In summary...

Page 125:

Segment Personas

- Powered by Cloud Bigtable and BigQuery
- Cloud Bigtable for small, random reads
- BigQuery for batch aggregations
- Processes billions of events
- Large, multi-tenant architecture
- SQL for flexible feature development
- Favorable read/write costs
- Millions of dollars in revenue
- Scales to Google levels

Page 126:

Fin
