Introduction to Data Science
Week 10, Lecture 19
Jeff Hammerbacher
March 20, 2012


Lecture Outline

▪ 0. In the news

▪ 1. Regularization

▪ 2. Gradient descent

▪ 3. Hyperparameters

▪ 4. Recommender systems at Facebook

▪ 5. Recommender systems at Yahoo!


0. In the news


In the news: The decline of print


▪ Nielsen: television

▪ Arbitron: radio

▪ Scarborough: newspapers

▪ DMA: Designated Market Area

▪ 77 DMAs in the US

▪ “a group of counties in which the commercial television stations in the metro/central area achieve the largest audience share”


1. Regularization


Regularization: Supervised learning

▪ Examples: $\{(x_1, y_1), \ldots, (x_n, y_n)\}$

▪ Features in X

▪ Labels in Y

  ▪ If Y is categorical, it’s classification
  ▪ If Y is numerical, it’s regression

▪ Hypothesis space H

▪ Goal: find an f in H such that f(x) accurately predicts the label for x

▪ Loss function L(f(x), y) measures accuracy


Regularization: Empirical risk minimization

▪ Empirical risk

$$R_{\mathrm{emp}}(f) = \frac{1}{n} \sum_{i=1}^{n} L(f(x_i), y_i)$$

▪ Empirical risk minimization: find the f in H with lowest empirical risk
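A minimal sketch of empirical risk minimization, assuming a squared-error loss and a two-element hypothesis space. The loss, the toy data, and the names `squared_loss`, `empirical_risk`, and `candidates` are illustrative assumptions, not taken from the slides.

```python
def squared_loss(prediction, label):
    """L(f(x), y) for regression; an illustrative choice of loss."""
    return (prediction - label) ** 2


def empirical_risk(f, examples, loss=squared_loss):
    """Average loss of hypothesis f over the examples (x_i, y_i)."""
    return sum(loss(f(x), y) for x, y in examples) / len(examples)


# Hypothetical toy data and a tiny hypothesis space H with two members.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
candidates = {
    "f(x) = x": lambda x: x,
    "f(x) = 2x": lambda x: 2.0 * x,
}

# Empirical risk minimization: pick the f in H with the lowest empirical risk.
best_name = min(candidates, key=lambda name: empirical_risk(candidates[name], examples))
```

In practice H is usually a parameterized family, so the minimum is found by optimization rather than by enumeration.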


Regularization: Motivation


2. Gradient descent


Gradient descent: Batch

▪ Compute cost function and partial derivatives

▪ Update weights at each step

▪ If we’re making progress, increase the learning rate

▪ If not, decrease the learning rate
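The batch steps above can be sketched as follows, assuming a mean-squared-error cost for a one-feature linear model. The cost, the growth and shrink factors, and the function names are assumptions chosen for illustration; the slides do not specify them.

```python
def cost_and_gradient(w, b, examples):
    """Mean squared error of y ~ w*x + b and its partial derivatives."""
    n = len(examples)
    cost = grad_w = grad_b = 0.0
    for x, y in examples:
        err = w * x + b - y
        cost += err * err / n
        grad_w += 2.0 * err * x / n
        grad_b += 2.0 * err / n
    return cost, grad_w, grad_b


def batch_gradient_descent(examples, rate=0.01, steps=500, grow=1.05, shrink=0.5):
    w, b = 0.0, 0.0
    cost, grad_w, grad_b = cost_and_gradient(w, b, examples)
    for _ in range(steps):
        # Each step uses the gradient computed over the full data set.
        new_w, new_b = w - rate * grad_w, b - rate * grad_b
        new_cost, new_gw, new_gb = cost_and_gradient(new_w, new_b, examples)
        if new_cost < cost:
            # Making progress: accept the step and increase the learning rate.
            w, b, cost, grad_w, grad_b = new_w, new_b, new_cost, new_gw, new_gb
            rate *= grow
        else:
            # Not making progress: reject the step and decrease the learning rate.
            rate *= shrink
    return w, b, cost


examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # generated by y = 2x + 1
w, b, cost = batch_gradient_descent(examples)
```

Rejecting a step that raises the cost is one design choice among several; another common option is to keep the step and only shrink the rate.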


Gradient descent: Online

▪ Randomly shuffle the data set

▪ Compute the weight update from one example at a time

▪ Known as stochastic gradient descent
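A minimal sketch of the online variant, assuming the same one-feature linear model with a squared-error loss per example. The fixed learning rate, epoch count, and function name are illustrative assumptions.

```python
import random


def stochastic_gradient_descent(examples, rate=0.05, epochs=200, seed=0):
    """Fit y ~ w*x + b, updating the weights after every single example."""
    rng = random.Random(seed)
    data = list(examples)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # randomly shuffle the data set
        for x, y in data:
            err = w * x + b - y        # error on this one example
            w -= rate * 2.0 * err * x  # gradient of err**2 with respect to w
            b -= rate * 2.0 * err      # gradient of err**2 with respect to b
    return w, b


examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # generated by y = 2x + 1
w, b = stochastic_gradient_descent(examples)
```

The toy data is noise-free, so a constant rate converges here; on noisy data a decaying learning rate schedule is the usual choice.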


3. Hyperparameters


Hyperparameters

▪ Regularization

  ▪ Regularization parameter

▪ Gradient descent

  ▪ Initial weights
  ▪ Learning rate schedule
  ▪ Batch size
  ▪ Momentum
  ▪ Stopping criteria


Hyperparameters: Tuning

▪ Split data set into training, cross-validation (CV), and test sets

▪ Fit model on training set

▪ Tune hyperparameters on CV set

▪ Evaluate on test set
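The tuning procedure above can be sketched as follows, assuming a one-dimensional ridge regression whose regularization parameter `lam` is the hyperparameter being tuned. The synthetic data, the 60/20/20 split, and the candidate values are assumptions made for illustration.

```python
import random


def fit_ridge(train, lam):
    """Closed-form 1-D ridge fit of y ~ w*x with regularization parameter lam."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    return sxy / (sxx + lam)


def mean_squared_error(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


# Hypothetical noisy data scattered around y = 3x.
rng = random.Random(0)
data = []
for _ in range(300):
    x = rng.uniform(-2.0, 2.0)
    data.append((x, 3.0 * x + rng.gauss(0.0, 1.0)))

# Split the data set into training, cross-validation, and test sets (60/20/20).
train, cv, test = data[:180], data[180:240], data[240:]

# Fit on the training set for each candidate value, then score on the CV set.
candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
cv_error = {lam: mean_squared_error(fit_ridge(train, lam), cv) for lam in candidates}
best_lam = min(cv_error, key=cv_error.get)

# Evaluate the chosen model once on the held-out test set.
test_error = mean_squared_error(fit_ridge(train, best_lam), test)
```

The test set is touched only once, after the hyperparameter is fixed, so `test_error` stays an honest estimate of performance on unseen data.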


4. Recommender systems at Facebook


5. Recommender systems at Yahoo!


Deepak Agarwal @ Berkeley’11

Motivating application: Yahoo! front page

Recommend articles:
– Image
– Title, summary
– Links to other pages

For each user visit, pick 4 out of a pool of K

Routes traffic to other pages (e.g., sports, news, finance, …)


Single module: Problem definition

• Display articles on Today Module for every user visit to
  – Maximize total clicks subject to constraints
    • (Voice, freshness, diversity)
  – Clicks on links generate advertising opportunities on content landing pages; content engages users and gets them addicted to Y!
  – Click is a good proxy of positive user experience with content.

• Inventory of articles?
  – Created by human editors
  – Small pool (30-50 articles) but refreshes periodically
    • Article lifetime short (6-24 hours)

• In this talk, for ease of exposition, assume content recommendation on a single slot
  – (the one with maximum exposure)


Where are we today?

• Before this research
  – Articles created and selected for display by editors

• After this research
  – Article selection done through statistical models

• Methods
  – Multi-armed bandit + elaborate statistical models
  – Expensive computation done offline; other updates (e.g., item-level statistics) done online in epochs of 5 minutes.
    • Hundreds of millions of observations/day, ~600M visitors per month; requires non-trivial infrastructure and engineering effort

• How successful? (significant increase in clicks)
  "Just look at our homepage, for example. Since we began pairing our content optimization technology with editorial expertise, we've seen click-through rates in the Today module more than double." (Carol Bartz, CEO Yahoo! Inc, Q4 2009)


Content Optimization in General

Match making between user visits and the inventory of articles:

• User visits
• Statistical algorithm selects article(s) to show
• Gets feedback (click/no-click)
• States (model parameters) updated
• Repeat (for each visit)

Goal: Maximize utility over long time period


Important Factors Affecting Solution in Match-Making Problems

• Items: Articles, web pages, ads, modules, queries, users, updates, etc.

• Opportunities: Users, query keywords, pages, etc.

• Metric (e.g., editorial score, CTR, revenue, engagement)
  – Currently, most applications are single-objective
  – May be multi-objective optimization (maximize X subject to Y, Z, ...)
    • E.g., maximize revenue subject to 5% max loss in clicks

• Properties of the item pool
  – Size (e.g., all web pages vs. 40 stories)
  – Quality of the pool (e.g., anything vs. editorially selected)
  – Lifetime (e.g., mostly old items vs. mostly new items)


Recommendation vs. Other Match-Making Problems

|               | Recommendation | Search | Advertising |
|---------------|----------------|--------|-------------|
| Main Metric   | User engagement | Relevance to the query | Revenue |
| Items         | Anything (except for ads) | Anything (except for ads) | Ads |
| Opportunities | Push (implicit): the system guesses users' info needs | Pull (explicit): users specify their info needs | Push |
| Examples      | Recommend articles, friends, feeds to users; recommend related items given an item | Web search; vertical search | Sponsored search; content match; behavior targeting; display advertising (non-guaranteed) |


This is an Explore/Exploit Problem

Explore/Exploit high level idea

• Two Items: Item 1 CTR = 2/100; Item 2 CTR = 25/1000
  – Greedy: Show Item 2 to all; not a good idea
  – Item 1 CTR estimate noisy; item could be potentially better

• Invest in Item 1 for better overall performance on average
  – Show both Item 1 and Item 2

• Optimal choice of design is the Explore/Exploit problem

• Classical solutions: Multi-armed bandit
  – Gittins’ approach (maximize discounted cumulative reward)
  – Upper confidence bound schemes (minimize regret from best)
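To make the "noisy estimate" point concrete, here is a small worked calculation, assuming a normal-approximation interval for each CTR. The approximation is crude with only 2 clicks and is used purely for illustration; the function name `ctr_interval` is hypothetical.

```python
import math


def ctr_interval(clicks, views, z=1.96):
    """Point estimate and normal-approximation interval for a click-through rate."""
    ctr = clicks / views
    half_width = z * math.sqrt(ctr * (1.0 - ctr) / views)
    return ctr, max(0.0, ctr - half_width), ctr + half_width


item1 = ctr_interval(2, 100)    # estimate 0.020, upper bound about 0.047
item2 = ctr_interval(25, 1000)  # estimate 0.025, upper bound about 0.035
```

Item 2 has the higher point estimate, but Item 1's upper bound is larger, so Item 1 could plausibly be the better article. That gap is what makes showing Item 1 some of the time worthwhile.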


Bandit Problem: quick tutorial

• Consider a slot machine with two arms, with unknown payoff probabilities p1 > p2

• The gambler has 1000 plays. What is the best way to experiment (to maximize total expected reward)?

• The solution to this innocuous-looking problem is notoriously difficult.

• Gittins provided a principled solution under some assumptions

• Lai later provided what are called upper confidence bound (UCB) policies
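A minimal simulation of the two-armed problem, assuming the UCB1 index of Auer, Cesa-Bianchi, and Fischer as one well-known member of the UCB family. It is a stand-in chosen for simplicity, not necessarily the policy the slide refers to, and the payoff probabilities 0.7 and 0.3 are made up.

```python
import math
import random


def ucb1(payoffs, plays=1000, seed=0):
    """Play a Bernoulli bandit with the UCB1 index; return pulls per arm and total reward."""
    rng = random.Random(seed)
    pulls = [0] * len(payoffs)
    wins = [0] * len(payoffs)
    for t in range(1, plays + 1):
        if t <= len(payoffs):
            arm = t - 1  # play every arm once to initialize its estimate
        else:
            # Choose the arm with the highest mean plus exploration bonus.
            arm = max(
                range(len(payoffs)),
                key=lambda a: wins[a] / pulls[a] + math.sqrt(2.0 * math.log(t) / pulls[a]),
            )
        reward = 1 if rng.random() < payoffs[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
    return pulls, sum(wins)


pulls, total_reward = ucb1([0.7, 0.3])  # p1 > p2, unknown to the gambler
```

The bonus term shrinks as an arm is pulled more often, so the weaker arm keeps getting occasional plays while most of the 1000 plays go to the stronger one.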


Content optimization: bandit problem

• Articles are arms of bandits, clicks are rewards, CTRs are unknown payoffs
  – Goal is to converge to the best CTR article quickly
  – But this assumes user visits are iid, no personalization

• Personalization
  – Each user is a separate bandit
  – Hundreds of millions of bandits (huge casino, multi-armed mafia)

• Other differences
  – Set of arms not fixed
  – Delayed response, need batched updates
    • Scheme to serve items for next epoch of 5 minutes

• Math gets challenging
  – For practical applications: Need approximate solutions that are easy to compute at run time (latency constraints)
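The batched-update idea can be sketched as follows, assuming a simple epsilon-greedy serving plan that is fixed within each epoch and refreshed between epochs. Epsilon-greedy is a deliberately simple stand-in, not the method described in the talk, and the CTRs, epoch sizes, and function name are hypothetical.

```python
import random


def serve_in_epochs(true_ctr, epochs=200, visits_per_epoch=500, epsilon=0.1, seed=0):
    """Serve articles with a plan fixed per epoch; update statistics between epochs."""
    rng = random.Random(seed)
    k = len(true_ctr)
    views, clicks = [0] * k, [0] * k
    for _ in range(epochs):
        # Serving plan for the next epoch, built only from past statistics.
        # Unseen articles get an optimistic estimate so they are tried.
        estimates = [clicks[a] / views[a] if views[a] else 1.0 for a in range(k)]
        best = max(range(k), key=lambda a: estimates[a])
        batch_views, batch_clicks = [0] * k, [0] * k
        for _ in range(visits_per_epoch):
            # Explore a random article with probability epsilon, else serve the best.
            arm = rng.randrange(k) if rng.random() < epsilon else best
            batch_views[arm] += 1
            batch_clicks[arm] += 1 if rng.random() < true_ctr[arm] else 0
        # Feedback arrives with delay, so fold it in once per epoch.
        for a in range(k):
            views[a] += batch_views[a]
            clicks[a] += batch_clicks[a]
    return views, clicks


views, clicks = serve_in_epochs([0.10, 0.05, 0.02])
```

Because the plan is computed once per epoch, the per-visit work is a single random draw, which fits the latency constraint mentioned above.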


An experiment on Live Traffic on Y! Front Page

15% e/e (dotted line)
85% serve best using argmax mean (solid line)
Y-axis: CTR lift relative to complete randomization
