Scaling Saved Searches - GOTO Conference · Scaling Saved Searches Serving real time...
Scaling Saved Searches: Serving real-time push notifications for millions of saved searches
Who are we?
eBay Kleinanzeigen ≠ eBay
What are we?
ads = classified ads
some numbers
22M ads live!
18M searches/day
700k new ads/day
8M saved searches
48,000,000,000 theoretical matches a day!
process it!
What?
How?
* * 0/1 * * ?
real time?
scalable?
Can we do better?
2015
Image source: https://www.esciencecenter.nl/img/main/logo-elastic.png
Percolator
Traditionally you design documents based on your data, store them into an index, and then define queries via the search API in order to retrieve these documents. The percolator works in the opposite direction. First you store queries into an index and then, via the percolate API, you define documents in order to retrieve these queries.
Source: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-percolate.html
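The reversed direction described in the quote can be illustrated with a toy model. This is a minimal in-memory sketch, not the Elasticsearch percolate API: the `SavedSearch` class, the keyword-only query model, and the `percolate` function are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SavedSearch:
    search_id: str
    keywords: frozenset  # hypothetical query model: every keyword must occur in the ad


def percolate(ad_text: str, searches: list) -> list:
    """Take ONE new document and return the ids of all stored queries it matches."""
    words = set(ad_text.lower().split())
    return [s.search_id for s in searches if s.keywords <= words]


# Queries are stored first ...
stored = [
    SavedSearch("s1", frozenset({"car"})),
    SavedSearch("s2", frozenset({"antique", "lamp"})),
]
# ... and each incoming ad is then matched against them.
matches = percolate("Antique copper lamp in Pankow", stored)
```

The loop over all stored searches also hints at why cost grows with the number of queries, which is what the later filtering tips try to reduce.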
How many pushes per day?
~3x
How?
700k new ads/day
match all?
ask search
how many results?
create buckets
0 - 100: RT
101 - 1000: 1h
1001 - 10000: 2h
> 10000: 6h
...
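The bucket table can be read as a lookup from result count to push interval. A minimal sketch, assuming that "RT" means real time (no delay) and that the result count comes from asking the search backend; `push_interval` is a hypothetical name.

```python
from datetime import timedelta

# (upper bound of the result count, delay between pushes), bounds taken from the slide
BUCKETS = [
    (100, timedelta(0)),           # 0 - 100: RT, assumed to mean "real time"
    (1000, timedelta(hours=1)),    # 101 - 1000
    (10000, timedelta(hours=2)),   # 1001 - 10000
]
SLOWEST = timedelta(hours=6)       # > 10000


def push_interval(result_count: int) -> timedelta:
    """Return how long a search with this many results waits between pushes."""
    for upper, interval in BUCKETS:
        if result_count <= upper:
            return interval
    return SLOWEST
```

Narrow searches with few results get pushed immediately, while broad searches are throttled so they do not flood the user.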
lifetime of a search
sleep ... ZZZZZZ
Setup
cloud
2 data centers
10 data + 3 master
replication x1
shards x80
SOLVED ES5
skip on overload
elastic fast on indexing
filter sleeping searches
metadata
filter:{“next_pushdate”:[* TO NOW]}
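The effect of this filter can be sketched outside Elasticsearch: only searches whose `next_pushdate` lies in the past take part in matching. The dictionary layout and the `awake_searches` helper are assumptions for illustration; only the `next_pushdate` field name comes from the slide.

```python
from datetime import datetime, timedelta


def awake_searches(searches: list, now: datetime) -> list:
    """Keep only searches that are due, i.e. next_pushdate in [* TO NOW]."""
    return [s for s in searches if s["next_pushdate"] <= now]


now = datetime(2015, 6, 1, 12, 0)
searches = [
    {"id": "a", "next_pushdate": now - timedelta(minutes=5)},  # due
    {"id": "b", "next_pushdate": now + timedelta(hours=2)},    # still sleeping
]
due = awake_searches(searches, now)
```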
only 30% of searches are online
desktop
avoid db-read per search
hash per search
bloom filter in cookie
Image source: https://upload.wikimedia.org/wikipedia/commons/thumb/a/ac/Bloom_filter.svg/2000px-Bloom_filter.svg.png
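One way to read "hash per search" plus "bloom filter in cookie": each saved search is hashed into a small Bloom filter that travels in the user's cookie, so the page can check whether a search is probably saved without a database read. The slides do not show the implementation; the class below, its sizes, and the cookie encoding are assumptions.

```python
import base64
import hashlib


class BloomFilter:
    def __init__(self, size_bits: int = 256, num_hashes: int = 3, bits: int = 0):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bits  # the bit array, stored as one integer

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means "definitely not saved"; True can be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def to_cookie(self) -> str:
        raw = self.bits.to_bytes(self.size_bits // 8, "big")
        return base64.urlsafe_b64encode(raw).decode("ascii")

    @classmethod
    def from_cookie(cls, value: str, num_hashes: int = 3) -> "BloomFilter":
        raw = base64.urlsafe_b64decode(value.encode("ascii"))
        return cls(len(raw) * 8, num_hashes, int.from_bytes(raw, "big"))


saved = BloomFilter()
saved.add("cars|berlin")  # hypothetical canonical form of a search
restored = BloomFilter.from_cookie(saved.to_cookie())
```

A negative answer is always correct, so the database only needs to be consulted when the filter says "maybe".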
a p p s
deep link on result size
store searches locally
backend sync on actions
Saved Search
Stable?
Stabilize elastic
Boost your percolator!
Tips & Tricks
“This indeed seems like a large application of percolate.”
Elastic support, June 2015
Performance linear with number of queries
1. Consider using other systems.
“It is worth noting that simple exist matches on a field are probably not a great application for percolator. This doesn’t utilize any text matching capability or complex boolean.”
Anything,anywhere!
Every ad offering something for free!
2. Optimise your data structure.
3. Filter, filter, filter!
“The filter only works on the metadata fields. The query field isn’t indexed by default.”
CATEGORY: cars
CATEGORY: all
CATEGORY: cars OR all
… what else can we filter?
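The category example amounts to narrowing the candidate set before any query is evaluated. A minimal sketch, assuming each stored search carries a `category` metadata value and that "all" marks searches without a category restriction; the helper names are hypothetical.

```python
from collections import defaultdict


def index_by_category(searches: list) -> dict:
    """Group search ids by their category metadata."""
    index = defaultdict(list)
    for search in searches:
        index[search["category"]].append(search["id"])
    return index


def candidates(index: dict, ad_category: str) -> list:
    """CATEGORY: <ad category> OR all"""
    if ad_category == "all":
        return list(index.get("all", []))
    return index.get(ad_category, []) + index.get("all", [])


index = index_by_category([
    {"id": 1, "category": "cars"},
    {"id": 2, "category": "all"},
    {"id": 3, "category": "furniture"},
])
car_candidates = candidates(index, "cars")
```

Only the candidates need to be evaluated against the new ad; searches in unrelated categories are never touched.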
4. Use bulk requests.
5. Use parallel bulk requests.
index
node1 A1
node2 A2
“Currently, to utilise all of your shards, you would need to consider sending multipercolate requests in parallel.”
https://github.com/elastic/elasticsearch/issues/13177
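Sending several bulk requests at once could look like the sketch below. `percolate_bulk` is a hypothetical stand-in for one multi-percolate request (it returns empty match lists so the example runs without a cluster); the batch size and worker count are arbitrary assumptions, not values from the talk.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items: list, size: int) -> list:
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def percolate_bulk(ads: list) -> dict:
    """Hypothetical stand-in for one bulk request; maps ad id -> matching search ids."""
    return {ad["id"]: [] for ad in ads}


def percolate_parallel(ads: list, bulk_size: int = 100, workers: int = 4,
                       send=percolate_bulk) -> dict:
    """Send the batches concurrently and merge the partial results."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(send, chunked(ads, bulk_size)):
            results.update(partial)
    return results


ads = [{"id": i} for i in range(250)]
all_matches = percolate_parallel(ads)
```

Threads fit here because each request mostly waits on the network, so several batches can be in flight at once.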
6. Degrade gracefully
Matthias: Antique copper lamps in Pankow
André: Cars in Berlin
HIGH PRIORITY
LOW PRIORITY
André: Cars in Berlin
Matthias: Antique copper lamps in Pankow
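Graceful degradation with two priority classes can be sketched as a priority queue with a processing budget: high-priority work is served first, and whatever does not fit is skipped instead of stalling the pipeline. The budget mechanism and the job names are assumptions; the slides do not say how priorities are assigned.

```python
import heapq

HIGH, LOW = 0, 1  # the smaller number is served first


def process_with_budget(jobs: list, budget: int):
    """Serve jobs by priority; once the budget is used up, skip the rest."""
    # The arrival order breaks ties so that equal priorities stay first-in, first-out.
    heap = [(priority, order, name) for order, (priority, name) in enumerate(jobs)]
    heapq.heapify(heap)
    processed, skipped = [], []
    while heap:
        _, _, name = heapq.heappop(heap)
        (processed if len(processed) < budget else skipped).append(name)
    return processed, skipped


jobs = [(LOW, "search-1"), (HIGH, "search-2"), (LOW, "search-3")]
processed, skipped = process_with_budget(jobs, budget=2)
```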
Outcome
Reduced percolation time:
Doubled the number of push notifications:
Stabilize elastic
Stable?
8,000,000 searches
700,000 ads/day
Stabilize platform
eBayK saved searches goes 2016
architecture
Before: one DB rules it all
create ad
create saved search
change saved search
found match
got push
MySQL
Before...
MySQL
CleanupJob
AwakeJob
SendJob
IndexerJob
CreateJob
ExpireJob
Before: bottleneck communication via DB
super high performance
resiliency
scalability..?
Goal: event-driven data pipeline
What is Apache Kafka?
distributed messaging system - persistent - high throughput
Topic 1
Topic 2
Producer
Producer
Consumer
Consumer
Consumer
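The producer, topic, and consumer roles in the diagram can be modelled in a few lines. This is a toy in-memory model to show the idea of a persistent log with independent readers, not the Kafka client API; the `Topic` and `Consumer` classes and the message names are assumptions.

```python
class Topic:
    def __init__(self, name: str):
        self.name = name
        self.log = []  # append-only; messages stay in the log after being read

    def produce(self, message) -> int:
        """Append a message and return its offset."""
        self.log.append(message)
        return len(self.log) - 1


class Consumer:
    def __init__(self, topic: Topic):
        self.topic = topic
        self.offset = 0  # each consumer tracks its own read position

    def poll(self, max_messages: int = 10) -> list:
        batch = self.topic.log[self.offset:self.offset + max_messages]
        self.offset += len(batch)
        return batch


ads = Topic("ads")
ads.produce("ad-1")
ads.produce("ad-2")

percolator = Consumer(ads)
indexer = Consumer(ads)
first = percolator.poll()     # reads both messages
partial = indexer.poll(1)     # an independent reader, still at the beginning
```

Because reading does not remove messages, a slow consumer does not block a fast one, which is the property the database-based job setup lacked.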
But what’s new?
Now: streams and data flows
percolate
process match
process push
create ad
found match
MySQL
Compaction
Compaction: Kafka == source of truth?
A:23
B:12
B:null
C:0
A:24
time
A:24
C:0
Consumer
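The compaction example from the slide can be reproduced with a short function: for every key only the latest value survives, and a null value acts as a delete marker. This is a sketch of the idea, not of Kafka's log cleaner; a real broker keeps delete markers for a while before removing them.

```python
def compact(log: list) -> dict:
    """Keep the latest value per key; a None value (null) deletes the key."""
    latest = {}
    for key, value in log:
        latest[key] = value
    return {key: value for key, value in latest.items() if value is not None}


# The sequence from the slide, oldest entry first.
log = [("A", 23), ("B", 12), ("B", None), ("C", 0), ("A", 24)]
state = compact(log)  # what a consumer reading the compacted topic ends up with
```

A consumer that replays the compacted topic from the start rebuilds the same current state, which is what lets the topic serve as a source of truth.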
Issues encountered
latency - used local cache
some components couldn’t keep up - spot-on optimisation
out of order writes - ?
wrap up
simplicity
fine tune elastic
use streaming
Thank you
References
“Building LinkedIn’s Real-time Activity Data Pipeline”, Ken Goodhope, Joel Koshy, Jay Kreps, Neha Narkhede, Richard Park, Jun Rao, Victor Yang Ye