Scaling Saved Searches - GOTO Conference

Post on 27-May-2020


Scaling Saved Searches: serving real-time push notifications for millions of saved searches

Who are we?

eBay Kleinanzeigen ≠ eBay

What are we?

ads = classified ads

some numbers

22M ads live!

18M searches/day


700k new ads/day, 8M saved searches

48,000,000,000 theoretical matches a day!

Process it!

What?

How?

A cron expression? * * 0/1 * * ?

Real time?

Scalable?

Can we do better?

2015

(image: Elastic logo)

Percolator

Traditionally you design documents based on your data, store them into an index, and then define queries via the search API in order to retrieve these documents. The percolator works in the opposite direction. First you store queries into an index and then, via the percolate API, you define documents in order to retrieve these queries.

Source: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-percolate.html
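In plain Python, the inverted flow looks roughly like this (a toy stand-in with hypothetical query names; the real percolate API matches documents against queries stored in an Elasticsearch index, not Python predicates):

```python
# Toy "percolator": store the queries first, then match documents against them.
stored_queries = {
    "cheap-cars": lambda ad: ad["category"] == "cars" and ad["price"] <= 5000,
    "berlin-lamps": lambda ad: "lamp" in ad["title"] and ad["city"] == "Berlin",
}

def percolate(ad):
    """Return the IDs of all stored queries that match this document."""
    return [qid for qid, matches in stored_queries.items() if matches(ad)]

ad = {"category": "cars", "price": 3500, "title": "VW Golf", "city": "Berlin"}
print(percolate(ad))  # ['cheap-cars'] - the saved searches to notify
```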

How many pushes per day?

~3x

How?

700k new ads/day

match all?

Ask search

How many results?

Create buckets

0-100: RT, 101-1000: 1h, 1001-10000: 2h, >10000: 6h

...
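The bucketing rule above can be sketched as a small function (thresholds from the slide; the "RT"/"1h" labels are illustrative):

```python
def push_bucket(result_count):
    """Map a saved search's result-set size to a push-delivery bucket:
    small result sets get real-time pushes, broad searches are batched."""
    if result_count <= 100:
        return "RT"  # real time
    if result_count <= 1000:
        return "1h"
    if result_count <= 10000:
        return "2h"
    return "6h"

print(push_bucket(50))     # RT
print(push_bucket(50000))  # 6h
```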

Lifetime of a search

Sleep... ZZZZZZ

Setup

cloud

2 data centers

10 data nodes + 3 master nodes

replication x1, shards x80

Solved in ES5

Skip on overload

Elastic is fast on indexing

Filter sleeping searches

Metadata

filter: { "next_pushdate": [* TO NOW] }

Only 30% of searches are online
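A minimal sketch of the sleeping-search filter in Python (the field name is from the slide; the surrounding data model is an assumption):

```python
from datetime import datetime, timedelta

def online_searches(searches, now):
    """Keep only searches whose next_pushdate has passed ([* TO NOW]);
    the rest are asleep and need no percolation at all."""
    return [s for s in searches if s["next_pushdate"] <= now]

now = datetime(2016, 1, 1, 12, 0)
searches = [
    {"id": 1, "next_pushdate": now - timedelta(hours=1)},  # due
    {"id": 2, "next_pushdate": now + timedelta(hours=5)},  # sleeping
]
print([s["id"] for s in online_searches(searches, now)])  # [1]
```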

Desktop

Avoid a DB read per search

Hash per search

Bloom filter in cookie

(image: Bloom filter diagram, Wikimedia Commons)
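A minimal Bloom filter sketch (sizes and hashing here are illustrative, not what the cookie actually used): it is compact enough to ship to the client, can yield false positives, but never false negatives.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions per item, stored in one int."""
    def __init__(self, size_bits=256, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0  # int used as a bit array

    def _positions(self, item):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all(self.bits >> pos & 1 for pos in self._positions(item))

bf = BloomFilter()
bf.add("search-hash-1")
print("search-hash-1" in bf)  # True (no false negatives)
print("search-hash-2" in bf)  # False with very high probability
```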

Apps

Deep link on result size


Store searches locally

Backend sync on actions

Saved Search

Stable?

Stabilize Elastic

Boost your percolator!

Tips & Tricks

“This indeed seems like a large application of percolate.” (Elastic support, June 2015)

Performance linear with number of queries

1. Consider using other systems.

“It is worth noting that simple exist matches on a field are probably not a great application for percolator. This doesn’t utilize any text matching capability or complex boolean.”

Anything, anywhere!

Every ad offering something for free!

2. Optimise your data structure.

3. Filter, filter, filter!

“The filter only works on the metadata fields. The query field isn’t indexed by default.”


CATEGORY: cars

CATEGORY: all

CATEGORY: cars OR all

… what else can we filter?
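One way to express the combined metadata filter, sketched as an Elasticsearch-style bool query built in Python (the exact field mapping and clause layout are assumptions, not the production query):

```python
def metadata_filter(ad_category):
    """Build a bool filter over percolator metadata: only consider
    searches that are due AND watch this category (or all categories)."""
    return {
        "bool": {
            "filter": [
                {"range": {"next_pushdate": {"lte": "now"}}},
                {"terms": {"category": [ad_category, "all"]}},
            ]
        }
    }

f = metadata_filter("cars")
print(f["bool"]["filter"][1]["terms"]["category"])  # ['cars', 'all']
```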


4. Use bulk requests.

5. Use parallel bulk requests.

(diagram: one index, shard A1 on node1, shard A2 on node2)

“Currently, to utilise all of your shards, you would need to consider sending multipercolate requests in parallel.”


https://github.com/elastic/elasticsearch/issues/13177

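A sketch of parallel bulk percolation with a thread pool (percolate_batch is a hypothetical stand-in for one multi-percolate request; a real client would POST the batch to Elasticsearch):

```python
from concurrent.futures import ThreadPoolExecutor

def percolate_batch(batch):
    """Stand-in for one multi-percolate (bulk) request."""
    return [f"matches-for-{doc}" for doc in batch]

def percolate_parallel(docs, batch_size=2, workers=4):
    """Split documents into bulk batches and percolate them in parallel,
    so all shards get work at the same time instead of one by one."""
    batches = [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(percolate_batch, batches)  # preserves batch order
    return [m for batch_result in results for m in batch_result]

print(percolate_parallel(["ad1", "ad2", "ad3"]))
```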

6. Degrade gracefully

Matthias: Antique copper lamps in Pankow

André: Cars in Berlin

HIGH PRIORITY: André: Cars in Berlin

LOW PRIORITY: Matthias: Antique copper lamps in Pankow
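The priority mechanism can be sketched with a heap; which search gets which priority follows the slide's example, and the budget stands in for the capacity available under load:

```python
import heapq

HIGH, LOW = 0, 1  # smaller number = served first

def drain(queue, budget):
    """Process pushes in priority order; when the budget runs out,
    low-priority pushes are the ones that get delayed or dropped."""
    heapq.heapify(queue)
    sent, dropped = [], []
    while queue:
        prio, name = heapq.heappop(queue)
        (sent if len(sent) < budget else dropped).append(name)
    return sent, dropped

queue = [(LOW, "Matthias: antique copper lamps"), (HIGH, "André: cars in Berlin")]
sent, dropped = drain(queue, budget=1)
print(sent)     # ['André: cars in Berlin']
print(dropped)  # ['Matthias: antique copper lamps']
```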

Outcome

Reduced percolation time

Doubled the number of push notifications

Stable?

8,000,000 searches, 700,000 ads/day

Stabilize the platform

eBay Kleinanzeigen saved searches goes 2016

architecture

Before: one DB rules it all

create saved search, change saved search, create ad, found match, got push: everything flows through MySQL

Before...

CreateJob, IndexerJob, AwakeJob, SendJob, CleanupJob, ExpireJob: all communicating through MySQL

Before: bottleneck communication via DB

super high performance

resiliency

scalability..?

Goal: event-driven data pipeline

What is Apache Kafka? A distributed messaging system: persistent, high throughput.

(diagram: producers writing to Topic 1 and Topic 2, multiple consumers reading)
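A toy model of the core idea: topics are append-only logs, and each consumer group tracks its own offset independently (names here are illustrative, not the Kafka client API):

```python
from collections import defaultdict

class MiniBroker:
    """Toy Kafka: append-only topic logs, per-group consumer offsets."""
    def __init__(self):
        self.topics = defaultdict(list)
        self.offsets = defaultdict(int)  # (group, topic) -> next offset

    def produce(self, topic, message):
        self.topics[topic].append(message)

    def consume(self, group, topic):
        """Return everything this group hasn't seen yet, then advance."""
        log = self.topics[topic]
        pos = self.offsets[(group, topic)]
        self.offsets[(group, topic)] = len(log)
        return log[pos:]

broker = MiniBroker()
broker.produce("ads", "ad-1")
broker.produce("ads", "ad-2")
print(broker.consume("percolator", "ads"))  # ['ad-1', 'ad-2']
print(broker.consume("percolator", "ads"))  # [] (offset already advanced)
print(broker.consume("indexer", "ads"))     # ['ad-1', 'ad-2'] (independent group)
```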

But what’s new?

Now: streams and data flows

create ad → percolate → found match → process match → process push → MySQL

Compaction

Compaction: Kafka == source of truth?

Log before compaction (time →): A:23, B:12, B:null, C:0, A:24

After compaction: A:24, C:0

A consumer replaying the compacted topic sees only the latest value per key: A:24, C:0
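Compaction can be simulated in a few lines: keep only the latest value per key, treating null as a tombstone that deletes the key (ordering is simplified relative to real Kafka, which retains segment order):

```python
def compact(log):
    """Kafka-style log compaction over a list of (key, value) records:
    the latest value per key wins; a None value is a delete tombstone."""
    latest = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)  # tombstone removes the key entirely
        else:
            latest[key] = value
    return list(latest.items())

log = [("A", 23), ("B", 12), ("B", None), ("C", 0), ("A", 24)]
print(compact(log))  # [('A', 24), ('C', 0)] - matches the slide's A:24, C:0
```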

Issues encountered

latency - used local cache

some components couldn’t keep up - spot-on optimisation

out of order writes - ?

Wrap up

Simplicity. Fine-tune Elastic. Use streaming.

Thank you

References

Ken Goodhope, Joel Koshy, Jay Kreps, Neha Narkhede, Richard Park, Jun Rao, Victor Yang Ye: “Building LinkedIn’s Real-time Activity Data Pipeline”