Scaling Saved Searches - GOTO Conference

Post on 27-May-2020


Scaling Saved Searches: serving real-time push notifications for millions of saved searches

Who are we?

eBay Kleinanzeigen ≠ eBay

What are we?

ads = classified ads

some numbers

22M ads live!

18M searches/day


700k new ads/day, 8M saved searches

48,000,000,000 theoretical matches a day!

Process it!

What?

How?

A cron expression? * * 0/1 * * ?

Real time?

Scalable?

Can we do better?

2015

(image: Elastic logo)

Percolator

Traditionally you design documents based on your data, store them into an index, and then define queries via the search API in order to retrieve these documents. The percolator works in the opposite direction. First you store queries into an index and then, via the percolate API, you define documents in order to retrieve these queries.

Source: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-percolate.html
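In plain Python, the inverted flow looks roughly like this (a toy stand-in with hypothetical query names; the real percolate API matches documents against queries stored in an Elasticsearch index, not Python predicates):

```python
# Toy "percolator": store the queries first, then match documents against them.
stored_queries = {
    "cheap-cars": lambda ad: ad["category"] == "cars" and ad["price"] <= 5000,
    "berlin-lamps": lambda ad: "lamp" in ad["title"] and ad["city"] == "Berlin",
}

def percolate(ad):
    """Return the IDs of all stored queries that match this document."""
    return [qid for qid, matches in stored_queries.items() if matches(ad)]

ad = {"category": "cars", "price": 3500, "title": "VW Golf", "city": "Berlin"}
print(percolate(ad))  # ['cheap-cars'] - the saved searches to notify
```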

How many pushes per day?

~3x

How?

700k new ads/day

match all?

Ask search

How many results?

Create buckets

0-100: RT, 101-1000: 1h, 1001-10000: 2h, >10000: 6h

...
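The bucketing rule above can be sketched as a small function (thresholds from the slide; the "RT"/"1h" labels are illustrative):

```python
def push_bucket(result_count):
    """Map a saved search's result-set size to a push-delivery bucket:
    small result sets get real-time pushes, broad searches are batched."""
    if result_count <= 100:
        return "RT"  # real time
    if result_count <= 1000:
        return "1h"
    if result_count <= 10000:
        return "2h"
    return "6h"

print(push_bucket(50))     # RT
print(push_bucket(50000))  # 6h
```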

Lifetime of a search

Sleep... ZZZZZZ

Setup

cloud

2 data centers

10 data nodes + 3 master nodes

replication x1, shards x80

Solved in ES5

Skip on overload

Elastic is fast on indexing

Filter sleeping searches

Metadata

filter: { "next_pushdate": [* TO NOW] }

Only 30% of searches are online
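A minimal sketch of the sleeping-search filter in Python (the field name is from the slide; the surrounding data model is an assumption):

```python
from datetime import datetime, timedelta

def online_searches(searches, now):
    """Keep only searches whose next_pushdate has passed ([* TO NOW]);
    the rest are asleep and need no percolation at all."""
    return [s for s in searches if s["next_pushdate"] <= now]

now = datetime(2016, 1, 1, 12, 0)
searches = [
    {"id": 1, "next_pushdate": now - timedelta(hours=1)},  # due
    {"id": 2, "next_pushdate": now + timedelta(hours=5)},  # sleeping
]
print([s["id"] for s in online_searches(searches, now)])  # [1]
```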

Desktop

Avoid a DB read per search

Hash per search

Bloom filter in cookie

(image: Bloom filter diagram, Wikimedia Commons)
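A minimal Bloom filter sketch (sizes and hashing here are illustrative, not what the cookie actually used): it is compact enough to ship to the client, can yield false positives, but never false negatives.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions per item, stored in one int."""
    def __init__(self, size_bits=256, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0  # int used as a bit array

    def _positions(self, item):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all(self.bits >> pos & 1 for pos in self._positions(item))

bf = BloomFilter()
bf.add("search-hash-1")
print("search-hash-1" in bf)  # True (no false negatives)
print("search-hash-2" in bf)  # False with very high probability
```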

Apps

Deep link on result size


Store searches locally

Backend sync on actions

Saved Search

Stable?

Stabilize Elastic

Boost your percolator!

Tips & Tricks

“This indeed seems like a large application of percolate.” (Elastic support, June 2015)

Performance linear with number of queries

1. Consider using other systems.

“It is worth noting that simple exist matches on a field are probably not a great application for percolator. This doesn’t utilize any text matching capability or complex boolean.”

Anything, anywhere!

Every ad offering something for free!

2. Optimise your data structure.

3. Filter, filter, filter!

“The filter only works on the metadata fields. The query field isn’t indexed by default.”


CATEGORY: cars

CATEGORY: all

CATEGORY: cars OR all

… what else can we filter?
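One way to express the combined metadata filter, sketched as an Elasticsearch-style bool query built in Python (the exact field mapping and clause layout are assumptions, not the production query):

```python
def metadata_filter(ad_category):
    """Build a bool filter over percolator metadata: only consider
    searches that are due AND watch this category (or all categories)."""
    return {
        "bool": {
            "filter": [
                {"range": {"next_pushdate": {"lte": "now"}}},
                {"terms": {"category": [ad_category, "all"]}},
            ]
        }
    }

f = metadata_filter("cars")
print(f["bool"]["filter"][1]["terms"]["category"])  # ['cars', 'all']
```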


4. Use bulk requests.

5. Use parallel bulk requests.

(diagram: one index, shard A1 on node1, shard A2 on node2)

“Currently, to utilise all of your shards, you would need to consider sending multipercolate requests in parallel.”


https://github.com/elastic/elasticsearch/issues/13177

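A sketch of parallel bulk percolation with a thread pool (percolate_batch is a hypothetical stand-in for one multi-percolate request; a real client would POST the batch to Elasticsearch):

```python
from concurrent.futures import ThreadPoolExecutor

def percolate_batch(batch):
    """Stand-in for one multi-percolate (bulk) request."""
    return [f"matches-for-{doc}" for doc in batch]

def percolate_parallel(docs, batch_size=2, workers=4):
    """Split documents into bulk batches and percolate them in parallel,
    so all shards get work at the same time instead of one by one."""
    batches = [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(percolate_batch, batches)  # preserves batch order
    return [m for batch_result in results for m in batch_result]

print(percolate_parallel(["ad1", "ad2", "ad3"]))
```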

6. Degrade gracefully

Matthias: Antique copper lamps in Pankow

André: Cars in Berlin

HIGH PRIORITY: André: Cars in Berlin

LOW PRIORITY: Matthias: Antique copper lamps in Pankow
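The priority mechanism can be sketched with a heap; which search gets which priority follows the slide's example, and the budget stands in for the capacity available under load:

```python
import heapq

HIGH, LOW = 0, 1  # smaller number = served first

def drain(queue, budget):
    """Process pushes in priority order; when the budget runs out,
    low-priority pushes are the ones that get delayed or dropped."""
    heapq.heapify(queue)
    sent, dropped = [], []
    while queue:
        prio, name = heapq.heappop(queue)
        (sent if len(sent) < budget else dropped).append(name)
    return sent, dropped

queue = [(LOW, "Matthias: antique copper lamps"), (HIGH, "André: cars in Berlin")]
sent, dropped = drain(queue, budget=1)
print(sent)     # ['André: cars in Berlin']
print(dropped)  # ['Matthias: antique copper lamps']
```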

Outcome

Reduced percolation time

Doubled the number of push notifications

Stable?

8,000,000 searches, 700,000 ads/day

Stabilize the platform

eBay Kleinanzeigen saved searches goes 2016

architecture

Before: one DB rules it all

create saved search, change saved search, create ad, found match, got push: everything flows through MySQL

Before...

CreateJob, IndexerJob, AwakeJob, SendJob, CleanupJob, ExpireJob: all communicating through MySQL

Before: bottleneck communication via DB

super high performance

resiliency

scalability..?

Goal: event-driven data pipeline

What is Apache Kafka? A distributed messaging system: persistent, high throughput.

(diagram: producers writing to Topic 1 and Topic 2, multiple consumers reading)
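A toy model of the core idea: topics are append-only logs, and each consumer group tracks its own offset independently (names here are illustrative, not the Kafka client API):

```python
from collections import defaultdict

class MiniBroker:
    """Toy Kafka: append-only topic logs, per-group consumer offsets."""
    def __init__(self):
        self.topics = defaultdict(list)
        self.offsets = defaultdict(int)  # (group, topic) -> next offset

    def produce(self, topic, message):
        self.topics[topic].append(message)

    def consume(self, group, topic):
        """Return everything this group hasn't seen yet, then advance."""
        log = self.topics[topic]
        pos = self.offsets[(group, topic)]
        self.offsets[(group, topic)] = len(log)
        return log[pos:]

broker = MiniBroker()
broker.produce("ads", "ad-1")
broker.produce("ads", "ad-2")
print(broker.consume("percolator", "ads"))  # ['ad-1', 'ad-2']
print(broker.consume("percolator", "ads"))  # [] (offset already advanced)
print(broker.consume("indexer", "ads"))     # ['ad-1', 'ad-2'] (independent group)
```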

But what’s new?

Now: streams and data flows

create ad → percolate → found match → process match → process push → MySQL

Compaction

Compaction: Kafka == source of truth?

Log before compaction (time →): A:23, B:12, B:null, C:0, A:24

After compaction: A:24, C:0

A consumer replaying the compacted topic sees only the latest value per key: A:24, C:0
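Compaction can be simulated in a few lines: keep only the latest value per key, treating null as a tombstone that deletes the key (ordering is simplified relative to real Kafka, which retains segment order):

```python
def compact(log):
    """Kafka-style log compaction over a list of (key, value) records:
    the latest value per key wins; a None value is a delete tombstone."""
    latest = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)  # tombstone removes the key entirely
        else:
            latest[key] = value
    return list(latest.items())

log = [("A", 23), ("B", 12), ("B", None), ("C", 0), ("A", 24)]
print(compact(log))  # [('A', 24), ('C', 0)] - matches the slide's A:24, C:0
```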

Issues encountered

latency - used local cache

some components couldn’t keep up - spot-on optimisation

out of order writes - ?

Wrap up

Simplicity. Fine-tune Elastic. Use streaming.

Thank you

References

Ken Goodhope, Joel Koshy, Jay Kreps, Neha Narkhede, Richard Park, Jun Rao, Victor Yang Ye: “Building LinkedIn’s Real-time Activity Data Pipeline”