How Lazada ranks products to improve customer experience and conversion

Strata Hadoop Singapore 2016

Leading e-commerce platform in South-East Asia

Lazada Data Science

Data App Devs expose, integrate, platform-ize

Data Scientists explore, prepare, model

Data Engineers collect, store, maintain

Start from bottom up

Ranking affects what appears on top

Ranking is different from recommendation

“How can I rank well on an e-commerce platform?”

Ranking products for catalog and search

Introducing new products

Emphasizing product quality

Web Tracker (JavaScript)

Mobile Tracker (Adjust)

3rd Party (e.g., ZenDesk, SurveyGizmo)

Kafka Queues

Bulk Loaders (Spark)

Hadoop

Data Exploration + Data Preparation + Feature Engineering + Modelling (Spark)

Manual Boosting (Django)

Local Validation

A/B Testing

Product

Seller

Transaction

Product rankings

Split traffic and measure outcomes

(Category Managers)

(User devices)

Overall results

Better ranking improved conversion and revenue per session

Introducing new products improved new product engagement

Emphasizing product quality had neutral to positive outcomes

Ranking products for catalog and search

Intent

Provide shoppers quick access to best products in catalog/search results, making shopping easy

Problem

Lazada has millions of products: not easy to navigate

How to identify products that interest users in the future?

How do we measure interest?

Methodology

Measure shoppers’ interest through product engagement as a proxy

Clicks, add-to-cart, checkouts, etc.

Predict future interest

Collecting behavioral data

Track and collect events on web (JavaScript) and app (Adjust)

Stream and process via Kafka

Store in Hive tables

Data preparation

Filter and categorize online behavioral events (e.g., impressions, clicks, etc.)

Merge various views of product data (e.g., price, stock, etc.)

Exclude outliers and potentially fraudulent events
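The filtering and exclusion steps above could look roughly like the sketch below. It is only an illustration: the event fields (`session_id`, `product_id`, `type`), the set of event types, and the per-session cap used as a crude fraud signal are all assumptions, since the talk does not describe the tracker schema or the actual fraud rules.

```python
from collections import Counter

# Hypothetical event types; the real tracker schema is not given in the talk.
KNOWN_EVENT_TYPES = {"impression", "click", "add_to_cart", "checkout"}


def prepare_events(events, max_events_per_session=500):
    """Keep known event types and drop sessions with implausibly many events.

    The per-session cap is an assumed stand-in for "potentially fraudulent".
    """
    known = [e for e in events if e["type"] in KNOWN_EVENT_TYPES]
    per_session = Counter(e["session_id"] for e in known)
    return [
        e for e in known
        if per_session[e["session_id"]] <= max_events_per_session
    ]


def count_by_product(events):
    """Return {product_id: Counter mapping event type to count}."""
    counts = {}
    for e in events:
        counts.setdefault(e["product_id"], Counter())[e["type"]] += 1
    return counts
```

At Lazada's scale this step would run on Spark over Hive tables rather than on in-memory lists; the logic is the part being illustrated.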

Feature engineering

Calculate product engagement metrics (e.g., average clicks, conversion rate, etc.)

Derive product attributes (e.g., age, discount, etc.)

Exclude outliers (e.g., conversion rate > 1.00)
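A minimal sketch of these three steps, assuming per-product daily aggregates are already available. The field names and the definitions used here (click-through rate as clicks per impression, conversion rate as checkouts per click, discount relative to a list price) are assumptions; the talk names the features but does not define them.

```python
from datetime import date


def engagement_features(row, today):
    """Derive engagement metrics and product attributes for one product."""
    impressions = row["impressions"]
    clicks = row["clicks"]
    checkouts = row["checkouts"]
    list_price = row["list_price"]
    return {
        "product_id": row["product_id"],
        # Guard against division by zero for products with no traffic.
        "click_through_rate": clicks / impressions if impressions else 0.0,
        "conversion_rate": checkouts / clicks if clicks else 0.0,
        "age_days": (today - row["listed_on"]).days,
        "discount": 1 - row["price"] / list_price if list_price else 0.0,
    }


def build_features(rows, today):
    """Compute features, then drop rows with an impossible conversion rate."""
    features = [engagement_features(r, today) for r in rows]
    return [f for f in features if f["conversion_rate"] <= 1.0]
```

A conversion rate above 1.00 means more checkouts than clicks were recorded, which usually points to a tracking gap rather than real behavior, so such rows are dropped rather than clipped.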

Modelling (i.e., machine learning)

Predict future (tomorrow’s) product clicks/checkouts

Examine results against a benchmark model

Pandas + XGBoost is faster and more effective than Spark + MLlib; assessing XGBoost4J-Spark
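Comparing a model against a benchmark can be sketched as follows. The talk does not say which benchmark or error metric was used, so this assumes a persistence benchmark (tomorrow's clicks equal today's) and mean absolute error, and it uses a simple moving average as a stand-in candidate where the real pipeline used XGBoost.

```python
def persistence_forecast(history):
    """Benchmark: tomorrow's clicks equal the most recent day's clicks."""
    return history[-1]


def moving_average_forecast(history, window=3):
    """Stand-in candidate model: mean of the last `window` days."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def compare_to_benchmark(histories, actuals, candidate=moving_average_forecast):
    """Score the benchmark and a candidate on the same held-out day.

    histories: one list of daily click counts per product.
    actuals: the observed next-day clicks for each product.
    """
    benchmark_preds = [persistence_forecast(h) for h in histories]
    candidate_preds = [candidate(h) for h in histories]
    return {
        "benchmark_mae": mean_absolute_error(actuals, benchmark_preds),
        "candidate_mae": mean_absolute_error(actuals, candidate_preds),
    }
```

A candidate is only worth shipping to an A/B test if it beats the benchmark on held-out days, since a simple benchmark is often surprisingly hard to beat for day-ahead engagement.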

Boosting products (manually)

Manually increase rank of certain products (e.g., highly anticipated products, campaign tie-ups)

User-friendly interface to drag-and-drop products

Limits on how many products can be boosted
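One way manual boosts with a cap might be layered on top of a model ranking is sketched below. The function name, the pin-to-top behavior, and the default limit of three are assumptions for illustration; the talk only states that boosts are manual and limited in number.

```python
def apply_boosts(ranked_ids, boosted_ids, max_boosts=3):
    """Pin manually boosted products to the top of a model-ranked list.

    ranked_ids: product ids ordered by the model, best first.
    boosted_ids: product ids chosen by a category manager, in display order.
    """
    unique_boosts = list(dict.fromkeys(boosted_ids))  # dedupe, keep order
    if len(unique_boosts) > max_boosts:
        raise ValueError(
            f"{len(unique_boosts)} boosts requested, limit is {max_boosts}"
        )
    present = set(ranked_ids)
    pinned = [pid for pid in unique_boosts if pid in present]
    pinned_set = set(pinned)
    rest = [pid for pid in ranked_ids if pid not in pinned_set]
    return pinned + rest
```

For example, `apply_boosts(["a", "b", "c", "d"], ["c", "a"])` returns `["c", "a", "b", "d"]`: the boosted products lead in the order chosen, and the model's order is preserved for everything else.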

Validation and A/B testing

Local validation is easy, but difficult to ensure similar results via A/B testing

A/B test all updates before production
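To judge whether a conversion difference between control and treatment is more than noise, a two-proportion z-test is one common choice. The talk does not say which test Lazada's A/B platform used, so the sketch below is a generic illustration with assumed inputs (conversions and sessions per group).

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate (B minus A).

    conv_*: number of converting sessions; n_*: total sessions per group.
    Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both groups converted at exactly 0% or 100%: no evidence either way.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

Calling `two_proportion_z_test(100, 1000, 150, 1000)` compares a 10% control rate with a 15% treatment rate; a small p-value suggests the lift is unlikely to be chance alone.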

Results

Increased conversion rate by 3 – 8%

Increased revenue per session by 5 – 20%

Introducing new products

Intent

Provide potentially good new products with exposure

Provide shoppers with new products they like

Keep catalog fresh

Problem

Products with strong engagement stay on top

Products without engagement don’t get traffic

How can we identify new products that are likely to interest users?

Methodology (demand)

Find what people need

Measure needs through internal/external data

Rank new products in terms of demand
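A toy version of the demand approach, assuming search queries are the signal of what people need and that a new product's demand can be approximated by how often its title terms are searched. Both assumptions are illustrative; the talk only says that needs are measured through internal and external data.

```python
from collections import Counter


def term_demand(search_queries):
    """Count how many queries mention each term (at most once per query)."""
    counts = Counter()
    for query in search_queries:
        counts.update(set(query.lower().split()))
    return counts


def rank_by_demand(new_products, search_queries):
    """Order new products by the summed demand of their title terms."""
    demand = term_demand(search_queries)

    def score(product):
        # Counter returns 0 for terms nobody searched for.
        return sum(demand[t] for t in set(product["title"].lower().split()))

    return sorted(new_products, key=score, reverse=True)
```

With queries `["red dress", "dress shoes", "phone case"]`, a new product titled "Summer Dress" outranks "Phone Charger" because "dress" appears in two queries and "phone" in one.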

Methodology (supply)

Find products similar to top products

Measure similarity with top products

Rank new products based on similarity and top product volume
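The supply approach can be illustrated with a small sketch that scores each new product by its similarity to a top product, weighted by that top product's volume. Jaccard similarity over title tokens and the "best match" scoring rule are assumptions made here for brevity; the talk mentions Spark GraphX and ElasticSearch for similarity modelling and does not give the formula.

```python
def tokens(title):
    return title.lower().split()


def jaccard(a, b):
    """Jaccard similarity between two token collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def supply_score(new_title, top_products):
    """Best (similarity x volume) over all top products; 0.0 if none given."""
    return max(
        (
            jaccard(tokens(new_title), tokens(top["title"])) * top["volume"]
            for top in top_products
        ),
        default=0.0,
    )


def rank_new_products(new_products, top_products):
    """Order new products by their supply score, highest first."""
    return sorted(
        new_products,
        key=lambda p: supply_score(p["title"], top_products),
        reverse=True,
    )
```

Weighting by volume means a new product that loosely resembles a very high-selling item can outrank one that closely resembles a niche item, which matches the stated goal of ranking on both similarity and top product volume.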

Data preparation and feature engineering

Parse (log) data to identify shoppers’ needs

Measure potential product demand

Model product similarity (Spark GraphX / ElasticSearch)

Validation and A/B testing

Limited capability on existing A/B testing platforms to track specific products

Measure performance of new products across experimental groups using in-house tracker

Results

Increased new product click-thru rate by 30 – 80%

Increased new product add-to-cart by 20 – 90%

Expected overall conversion to decrease—increased instead (though not statistically significant)

Emphasizing product quality

Intent

Improve customer experience throughout purchase journey

From online browsing to receiving of product

Product quality identified as key driver

Problem

How do we measure product “quality”?

Methodology (online)

Content (e.g., title quality, richness of content)

Reviews (e.g., average rating, negative reviews)

Performance (e.g., click-thru rate, browsing time)

Methodology (offline)

Perfect order rate (i.e., not cancelled, not returned, etc.)

Negative feedback (e.g., counterfeit, complaints, etc.)

Seller metrics (e.g., timely shipped-rate, return rate, etc.)
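As a rough illustration of the offline metrics, the sketch below computes a perfect order rate per product. The order fields (`cancelled`, `returned`) are assumed names, and the "etc." in the slide implies further conditions that are not listed, so only the two named ones are checked.

```python
def perfect_order_rate(orders):
    """Share of orders that were neither cancelled nor returned.

    Returns None when there are no orders, so that "no data" is not
    mistaken for a 0% or 100% rate.
    """
    if not orders:
        return None
    perfect = sum(
        1 for o in orders if not o["cancelled"] and not o["returned"]
    )
    return perfect / len(orders)


def perfect_order_rate_by_product(orders):
    """Group orders by product_id and compute the rate for each product."""
    grouped = {}
    for o in orders:
        grouped.setdefault(o["product_id"], []).append(o)
    return {pid: perfect_order_rate(group) for pid, group in grouped.items()}
```

Returning `None` for products without orders matters for ranking: new products have no offline history yet and should not be penalized as if every order had failed.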

Data preparation and feature engineering

Derive product features (e.g., title quality, image quality, etc.)

Measure content richness (e.g., attributes available, grouping, etc.)

Measure delivery performance and customer feedback

Results

Improved quality of products displayed

Increased conversion by 3 – 5% for some countries

Small conversion change in other countries (non-significant)

Key takeaways

Data science is (i) a team sport, (ii) partly R&D, (iii) iterative

How you use data to solve problems (methodology), data preparation, and feature engineering > machine learning

Thank you
