Making advertising personal, 4th NL Recommenders Meetup

15
Copyright © 2014 Criteo Making Advertising Personal Large Scale Real-Time Product Recommendation at Criteo Olivier Koch & Romain Lerallut May 19 th , 2015

Transcript of Making advertising personal, 4th NL Recommenders Meetup

Page 1: Making advertising personal, 4th NL Recommenders Meetup

Copyright © 2014 Criteo

Making Advertising Personal

Large Scale Real-Time Product Recommendation at CriteoOlivier Koch & Romain Lerallut

May 19th, 2015

Page 2: Making advertising personal, 4th NL Recommenders Meetup

Copyright © 2014 Criteo

Performance Advertising

We buy

• Inventory ! (ad spaces)• Billions of times a day• All over the Internet• For 95% of the population

Where is the need for tech ?

We sell

• Clicks !• (that convert)

• (that convert a lot)

We take the risk

You pay only for what you get

Page 3: Making advertising personal, 4th NL Recommenders Meetup

Copyright © 2013. Confidential

International infrastructureKey figures

Traffic

550 k HTTP requests / sec (peak activity)

23000 impressions /sec (peak activity)

180 k requests / sec on RTB (average)

Less than 10 ms to process an RTB request

3Figures of May2013

Physical infrastructure

6 Data centers on 3 continents operated and conceived in-house

~ 12000 servers, largest Hadoop cluster in Europe

Availability / Uptime >99.95%

More than 20 PB of storage Big Data

Page 4: Making advertising personal, 4th NL Recommenders Meetup

Copyright © 2012. Confidential 4

Arbitrage• Should we bid?• At which price?

Recommendation• Which products should we

display?

Graphical optimization• Big image vs small image• Background color, ...

Prediction• Generic prediction engine

• Specific models trained on TBs of data

Global Engine Architecture

Page 5: Making advertising personal, 4th NL Recommenders Meetup

Copyright © 2014 Criteo

Building ads with personalized content in realtime

Recommendation for Advertising

2.5 billion products~ 8ms response time

Page 6: Making advertising personal, 4th NL Recommenders Meetup

Copyright © 2014 Criteo

Data Sources

• Catalog data• Feed provided by the merchants

• User behavior data• Large scale intent data• All visits to merchant websites• Page views, basket, sales events

• Ad display data• Displayed and clicked ads

6

Page 7: Making advertising personal, 4th NL Recommenders Meetup

Copyright © 2012. Confidential 7

Recommendation execution flow

Candidates Generation• Get candidates from all

sources using user historical products and bestofs

Candidates Aggregation• Remove duplicates• Aggregate features

Preselection • Call degraded prediction to

decrease number of candidates

Scoring• Call full prediction model to

score each candidate

Winner Selector• Select N products on score,

randomization

Glup Logging• Log recommended

products, prediction variables

Page 8: Making advertising personal, 4th NL Recommenders Meetup

Copyright © 2014 Criteo

Issue #1: Retrieving user-specific products

• The need: storing [user → interesting products] vectors• Difficult to store and retrieve at scale (900+ mln users)• Hard to keep up to date

• Using seen products as a proxy• Store the [user → viewed products] vectors

• Easier to maintain• Store [viewed product → interesting products] vectors

• Based on aggregated user behavior data• Computable offline

• Final ranking by a ML model

8

Page 9: Making advertising personal, 4th NL Recommenders Meetup

Copyright © 2014 Criteo

Issue #2: Scoring products

• Need to fuse data from several sources• Product-specific• User-specific• User-product interactions• Display-specific

• 1st solution: regression model• Predict P(product click then sale)• Easy to evaluate

• 2nd solution: ranking model• More appropriate for our needs• Still maximizing post-click sales• Can be evaluated only on multi-product banners

9

Page 10: Making advertising personal, 4th NL Recommenders Meetup

Copyright © 2014 Criteo

Issue #3: Picking the right products

• Several questions:• User-product fatigue• Independent product choice assumption• Explore / exploit

• Solution 1: Randomization in the banners• Keep independent products assumption• Separate optimization process to shuffle displayed items

• Solution 2: A better scoring model• Score a full banner, not independent products• Store all product display counts

10

Page 11: Making advertising personal, 4th NL Recommenders Meetup

Copyright © 2014 Criteo

Issue #4: 8ms response time ! Tweaking for performance

• CTR / Sales prediction takes ~40 µs per candidate using in-house library• 2-step prediction:

• A fast pass to remove most of the candidates• A slow pass to score accurately the remaining candidates

And the technical fine print:

• All real-time code in C#

• Async I/O for better efficiency

• HAProxy to scale the front-end

• Memcached to store all required data in memory

11

Page 12: Making advertising personal, 4th NL Recommenders Meetup

Copyright © 2014 Criteo

12

Upcoming challenges

• Long(er)-term user profiles• More and better product information (images, semantic, NLP)• Vertical-optimized engine

• Classifieds (catalog-free recommendation ?)• Travel

• Instant-update of similarities • (because batch computation is soooo last year)

Page 13: Making advertising personal, 4th NL Recommenders Meetup

Copyright © 2014 Criteo

Fancy a try ?

13

On your own:

With us !http://www.criteo.com/careers/

• Our 1st public dataset is online: http://bit.ly/1vgw2XC• 4GB display and click data, Kaggle challenge in 2014

• NEW : 1TB dataset released a few weeks ago• Hosted on Microsoft Azure, just waiting for you

Page 14: Making advertising personal, 4th NL Recommenders Meetup

Copyright © 2014 Criteo

Questions?