Search@airbnb
mousom-gupta -
Technical Stack
____________________________
DropWizard as a service framework (incl. Jetty, Jersey, Jackson)
ZooKeeper (via Smartstack) for service discovery.
Lucene for index storage and simple retrieval.
In-house built forward index, real-time indexing, ranking, advanced filtering.
Search Overview
[Diagram: Web App in front of search nodes Search1, …Search2, SearchN (~30 replicas of the same index data). Each node is a JVM with 150 search threads and a Lucene index.]
[Diagram: a search request goes to 8 Lucene shards, then a Combiner, then Filtering and Ranking.]
Shards
____________________________
Each box has 8 shards of the Lucene index.
Latency is 50% less than with a single-shard index.
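The sharded layout above can be sketched as a parallel fan-out followed by a combine step. This is only an illustration under stated assumptions: `ShardedSearch`, `Hit`, and the representation of a shard as a `Function<String, List<Hit>>` are hypothetical names, not the classes used in the talk, and the real combiner may merge results differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Function;

// Hypothetical sketch: names and types are assumptions, not the production code.
final class ShardedSearch {
    /** A scored hit: listing id plus a score. */
    record Hit(long listingId, double score) {}

    /** Queries every shard in parallel, then combines the hits and keeps the top k. */
    static List<Hit> search(List<Function<String, List<Hit>>> shards,
                            String query, int k, ExecutorService pool) {
        List<CompletableFuture<List<Hit>>> futures = new ArrayList<>();
        for (Function<String, List<Hit>> shard : shards) {
            // Each shard searches its own slice of the index on a pool thread.
            futures.add(CompletableFuture.supplyAsync(() -> shard.apply(query), pool));
        }
        List<Hit> combined = new ArrayList<>();
        for (CompletableFuture<List<Hit>> future : futures) {
            combined.addAll(future.join()); // wait for each shard's hits
        }
        combined.sort((a, b) -> Double.compare(b.score(), a.score())); // best first
        return new ArrayList<>(combined.subList(0, Math.min(k, combined.size())));
    }
}
```

Because each shard holds only part of the index, the per-request work is split across threads, which is one plausible reason for the latency reduction reported above.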
Challenges
____________________________
Bootstrap (creating the index from scratch)
Ensuring consistency of the index with ground truth data in real time
Indexing
What’s in the Lucene index?
____________________________
Positions of listings, indexed using Lucene’s spatial module (RecursivePrefixTreeStrategy)
Categorical and numerical properties like room type and maximum occupancy
Full text (descriptions, reviews, etc.)
~40 fields per listing from a variety of data sources, all updated in real time
Spinaltap
Tails binary update logs from MySQL servers (5.6+).
Converts changes in any of the tables into actionable objects called “Mutations” (inserts, deletes, updates).
Broadcasts them to Medusa using Kafka.
Medusa
Source of truth for search index data.
Listens to updates from Spinaltap and builds new IndexData by querying ~15 MySQL tables from three different databases.
Persists everything in a DataStore and broadcasts the latest version to all search nodes.
Uses ZooKeeper for leader election.
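The Spinaltap-to-Medusa flow can be sketched roughly as follows: a mutation names the listing that changed, and the listing's data is rebuilt from the source of truth rather than patched in place. All names here (`MedusaSketch`, `Mutation`, `IndexData`, the `loader` callback standing in for the MySQL queries, and the in-memory map standing in for the DataStore) are assumptions for illustration. Kafka, ZooKeeper, and the broadcast step are left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Hypothetical sketch: field names and versioning scheme are assumptions.
final class MedusaSketch {
    enum MutationType { INSERT, UPDATE, DELETE }

    /** A row-level change reported upstream, reduced to the listing it affects. */
    record Mutation(MutationType type, String table, long listingId) {}

    /** One versioned document per listing. */
    record IndexData(long listingId, long version, Map<String, Object> fields) {}

    private final Map<Long, IndexData> dataStore = new HashMap<>(); // stand-in for the DataStore
    private final LongFunction<Map<String, Object>> loader;        // stand-in for the MySQL queries

    MedusaSketch(LongFunction<Map<String, Object>> loader) {
        this.loader = loader;
    }

    /** Rebuilds the affected listing from the source tables and returns the new version. */
    IndexData onMutation(Mutation mutation) {
        IndexData previous = dataStore.get(mutation.listingId());
        long nextVersion = previous == null ? 1 : previous.version() + 1;
        IndexData rebuilt = new IndexData(
                mutation.listingId(), nextVersion, loader.apply(mutation.listingId()));
        dataStore.put(mutation.listingId(), rebuilt);
        return rebuilt; // the caller would broadcast this to the search nodes
    }
}
```

Re-querying the full row set on every mutation trades extra reads for simplicity: the rebuilt document never depends on the order in which individual column changes arrive.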
What’s in the forward index?
____________________________
Holds all the metadata about a listing required by scoring and filtering.
We also have complicated business rules to calculate Price, Availability, InstantBook, etc., which need a ton of metadata.
~50 fields built from multiple data sources and updated in real time.
public final class ForwardIndexData {
    private final CalendarData calendarData;
    private final PricingData pricingData;
    private final HostInfo hostInfo;
    . . . . . . . .
}

public final class CalendarData {
    private final DateRanges reservationDates;
    private final SeasonalValues startDayOfWeeks;
    . . . .
}

private final class SeasonalValues<T> {
    private final DateRange startDate;
    private final T value;
    . . . .
}
Forward Index
Availability
____________________________
Depends on the profile of the guest.
The check-in date must be one of the valid start days of the week.
Must satisfy seasonal minimum nights.
There must be enough preparation time for the host.
Import busy dates from external calendars to avoid booking conflicts.
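The availability rules above might be evaluated along these lines. This is a simplified sketch, not the actual rule engine: `AvailabilitySketch`, `ListingCalendar`, and `DateRange` are hypothetical names. Seasonal minimum nights are collapsed into a single `minNights` value, "preparation time" is interpreted as a minimum number of days between today and check-in, and the guest-profile dependency is omitted.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: the real rules are richer (seasonal values, guest profile).
final class AvailabilitySketch {
    /** Half-open date range [start, end). */
    record DateRange(LocalDate start, LocalDate end) {
        boolean overlaps(LocalDate from, LocalDate to) {
            return from.isBefore(end) && start.isBefore(to);
        }
    }

    /** Busy dates include reservations and dates imported from external calendars. */
    record ListingCalendar(List<DateRange> busyDates, Set<DayOfWeek> validStartDays,
                           int minNights, int preparationDays) {}

    static boolean isAvailable(ListingCalendar calendar, LocalDate today,
                               LocalDate checkin, LocalDate checkout) {
        long nights = ChronoUnit.DAYS.between(checkin, checkout);
        if (nights < calendar.minNights()) {
            return false; // stay is shorter than the minimum
        }
        if (!calendar.validStartDays().contains(checkin.getDayOfWeek())) {
            return false; // check-in falls on a disallowed weekday
        }
        if (ChronoUnit.DAYS.between(today, checkin) < calendar.preparationDays()) {
            return false; // host would not have enough notice
        }
        for (DateRange busy : calendar.busyDates()) {
            if (busy.overlaps(checkin, checkout)) {
                return false; // conflicts with an existing busy range
            }
        }
        return true;
    }
}
```

Every check reads only fields that are already in memory, which is why the forward index keeps this metadata next to the listing instead of fetching it per request.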
Pricing
____________________________
Depends on the number of guests and the number of nights.
How near or far away the check-in date is.
How long the trip is, and whether the host has weekly and monthly pricing.
Whether there is a special price override for these nights.
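A rough sketch of how such pricing inputs could combine into a total. Everything specific here is an assumption made for illustration: the names `PricingSketch` and `PricingData` fields, the 7-night and 28-night thresholds for weekly and monthly pricing, modelling those as percentage discounts, and the per-night extra-guest fee. The dependence on how far away the check-in date is has been left out.

```java
import java.time.LocalDate;
import java.util.Map;

// Hypothetical sketch: thresholds and fee structure are assumptions, not the real rules.
final class PricingSketch {
    /** Amounts are in cents to avoid floating-point rounding. */
    record PricingData(long baseNightlyCents, Map<LocalDate, Long> overrideCents,
                       int weeklyDiscountPct, int monthlyDiscountPct,
                       int guestsIncluded, long extraGuestCentsPerNight) {}

    static long totalCents(PricingData pricing, LocalDate checkin, int nights, int guests) {
        long total = 0;
        for (int i = 0; i < nights; i++) {
            LocalDate night = checkin.plusDays(i);
            // A special override for a night replaces the base price for that night.
            total += pricing.overrideCents().getOrDefault(night, pricing.baseNightlyCents());
        }
        // Longer trips pick up the weekly or monthly discount (assumed thresholds).
        int discountPct = nights >= 28 ? pricing.monthlyDiscountPct()
                : nights >= 7 ? pricing.weeklyDiscountPct()
                : 0;
        total = total * (100 - discountPct) / 100;
        long extraGuests = Math.max(0, guests - pricing.guestsIncluded());
        return total + extraGuests * pricing.extraGuestCentsPerNight() * nights;
    }
}
```

For example, under these assumptions a 7-night stay with one overridden night sums six base nights and one override, applies the weekly discount to that sum, and then adds the extra-guest fee.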
Instant Book
____________________________
Depends on the number of guests and the number of nights.
Profile of the guest: positive reviews, whether they have a profile photo.
How much preparation time the host has, etc.
Why did we need our custom Forward Index?
Requirements
Needs to store objects with 50-100 fields as values, keyed by listing id.
Should avoid the cost of serialization/deserialization during every fetch.
Data must be available in memory for fast lookup, but also persisted on disk.
Highly concurrent: the writer shouldn’t block the readers (one writer but >100 reader threads).
Forward Index Interface
// Forward Index
public interface ForwardIndex<V> {
    Map<Long, V> asMap();
    void put(long id, V value);
    void putAll(Map<Long, V> values);
    void remove(long id);
    void commit();
}
// Writer
forwardIndex.put(listingId, listingData);
. . .
// write to disk and also make it visible to readers.
forwardIndex.commit();

// Reader
// Fetch forward index data from in-memory map
Map<Long, ListingData> fwdIndex = forwardIndex.asMap();
ListingData data = fwdIndex.get(listingId);

// Use it to evaluate business rules
checkAvailability(data, searchRequest);
calculatePrice(data, searchRequest);

[Diagram labels: NonBlocking In-Memory HashMap, DiskStore]
Forward Index Implementation
// Forward Index
public class ForwardIndexStore<V> implements ForwardIndex<V> {
    private final DB<V> diskStore;
    private final Cache<V> cache;
    . . . .
    @Override
    public Map<Long, V> asMap() {
        // Readers get a read-only view of the in-memory map.
        return Collections.unmodifiableMap(cache);
    }

    @Override
    public void put(long id, V value) {
        diskStore.put(id, value);
        cache.put(id, value);
    }
    . . . .
    @Override
    public void commit() {
        // Persist to disk, then make the writes visible to readers.
        diskStore.commit();
        cache.commit();
    }
}
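The slides do not show how the in-memory side keeps the single writer from blocking readers. One way to get that behaviour, sketched here purely as an assumption, is to stage writes privately and publish an immutable snapshot on `commit()`. `SnapshotForwardIndex` is a hypothetical class, the disk store is omitted, and copying the whole map on every commit is a simplification that a production index would likely avoid.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of a single-writer, many-reader map with commit semantics.
final class SnapshotForwardIndex<V> {
    // Readers only ever see this immutable snapshot; volatile publishes each swap safely.
    private volatile Map<Long, V> visible = Collections.emptyMap();
    // Staged writes, touched only by the single writer thread. A null value marks a removal.
    private final Map<Long, V> pending = new HashMap<>();

    /** Lock-free read path: returns the last committed snapshot. */
    Map<Long, V> asMap() {
        return visible;
    }

    void put(long id, V value) {
        pending.put(id, Objects.requireNonNull(value));
    }

    void remove(long id) {
        pending.put(id, null);
    }

    /** Applies staged writes to a copy and swaps it in atomically. */
    void commit() {
        Map<Long, V> next = new HashMap<>(visible);
        for (Map.Entry<Long, V> entry : pending.entrySet()) {
            if (entry.getValue() == null) {
                next.remove(entry.getKey());
            } else {
                next.put(entry.getKey(), entry.getValue());
            }
        }
        pending.clear();
        visible = Collections.unmodifiableMap(next);
    }
}
```

A reader that fetched the map before a commit keeps a consistent view for the rest of its request, and values are stored as live objects, so no serialization happens on the read path.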
Ranking Problem
____________________________
Not a text search problem
Users are almost never searching for a specific item; rather, they’re looking to “Discover”
The most common component of a query is location
Highly personalized – the user is a part of the query
Optimizing for conversion (Search -> Inquiry -> Booking)
Evolution through continuous experimentation
Ranking
Ranking Components
____________________________
Relevance
Quality
Bookability
Personalization
Desirability of location
etc.
Ranking
Several hundred signals used to build machine learning models:
Properties of the listing (reviews, location, etc.)
Behavioral signals (mined from request logs)
Image quality and clickability (computer vision)
Host behavior (response time/rate, cancellations, etc.)
Host preferences model
[Diagram labels: DB snapshots, Logs]
Life of a Query
[Diagram labels: Query Understanding (geocoding, configuring retrieval options, choosing ranking models); Retrieval; Populator; First Pass Scorer (quality, bookability, relevance); Second Pass Ranking; Result Generation; AirEvents; Filtering by Price and Availability. Result counts shown on the diagram: 25 results, 2000 results, 25 results.]
Second Pass Ranking
____________________________
Traditional ranking works like this: [formula not preserved in transcript], then sort by [formula not preserved in transcript].
In contrast, second pass operates on the entire list at once: [formula not preserved in transcript]
Makes it possible to implement features like result diversity, etc.
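The contrast between sorting by independent scores and ranking the list as a whole can be illustrated with a small example. This is not the model from the talk: `SecondPassSketch`, `Candidate`, the `neighborhood` field, and the greedy penalty scheme are all assumptions chosen to show how a list-level pass can promote diversity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a greedy diversity rerank standing in for the second pass.
final class SecondPassSketch {
    record Candidate(long listingId, String neighborhood, double score) {}

    /** Pointwise ranking: every item is scored on its own, then the list is sorted. */
    static List<Candidate> firstPass(List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort((a, b) -> Double.compare(b.score(), a.score()));
        return sorted;
    }

    /** List-level ranking: each pick depends on what has already been picked. */
    static List<Candidate> secondPass(List<Candidate> candidates, int k, double penalty) {
        List<Candidate> remaining = new ArrayList<>(candidates);
        List<Candidate> picked = new ArrayList<>();
        Map<String, Integer> pickedPerNeighborhood = new HashMap<>();
        while (picked.size() < k && !remaining.isEmpty()) {
            Candidate best = null;
            double bestAdjusted = 0;
            for (Candidate c : remaining) {
                // Penalize candidates whose neighborhood is already represented.
                double adjusted = c.score()
                        - penalty * pickedPerNeighborhood.getOrDefault(c.neighborhood(), 0);
                if (best == null || adjusted > bestAdjusted) {
                    best = c;
                    bestAdjusted = adjusted;
                }
            }
            remaining.remove(best);
            picked.add(best);
            pickedPerNeighborhood.merge(best.neighborhood(), 1, Integer::sum);
        }
        return picked;
    }
}
```

With a penalty of zero the second pass reproduces the first-pass order. A positive penalty lets a slightly lower-scored listing from a new neighborhood overtake a near-duplicate, which a per-item score alone cannot express.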